Tensors: The Building Blocks of Deep Learning

Lesson, slides, and applied problem sets.

Lesson

Tensors: The Building Blocks of Deep Learning

Why this module exists

Deep learning frameworks like PyTorch and TensorFlow are built on tensors—multi-dimensional arrays with superpowers. Understanding tensors is essential for moving from ML fundamentals to neural networks.


1) What is a tensor?

A tensor is a generalization of vectors and matrices:

  • 0D tensor (scalar): A single number → 5
  • 1D tensor (vector): A list of numbers → [1, 2, 3]
  • 2D tensor (matrix): A table → [[1,2], [3,4]]
  • 3D tensor: A "cube" of numbers
  • nD tensor: Higher-dimensional arrays

In practice, tensors are NumPy arrays or framework-specific arrays (torch.Tensor, tf.Tensor).
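
Moving data between these libraries is mostly a matter of wrapping the same numbers. A minimal sketch, assuming the optional torch and tensorflow packages are installed:

import numpy as np
import torch
import tensorflow as tf

a = np.array([[1, 2], [3, 4]])   # a plain NumPy 2D tensor

t = torch.from_numpy(a)          # torch.Tensor that shares memory with a
print(t.shape)                   # torch.Size([2, 2])

tft = tf.convert_to_tensor(a)    # tf.Tensor built from the same data
print(tft.shape)                 # (2, 2)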


2) Tensor shapes

Shape describes the size along each dimension:

scalar = 5                    # shape: ()
vector = [1, 2, 3]           # shape: (3,)
matrix = [[1,2], [3,4]]      # shape: (2, 2)
tensor3d = [[[1,2], [3,4]],
            [[5,6], [7,8]]]  # shape: (2, 2, 2)

Always know your shapes! Shape mismatches are among the most common bugs in deep learning code.
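
A quick habit that catches most of these bugs is printing .shape before combining arrays. A small sketch of what a mismatch looks like:

import numpy as np

x = np.array([[1, 2], [3, 4]])   # shape: (2, 2)
v = np.array([1, 2, 3])          # shape: (3,)
print(x.shape, v.shape)          # (2, 2) (3,)

try:
    x + v                        # (2, 2) and (3,) cannot be combined
except ValueError as e:
    print("Shape error:", e)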


3) Common tensor shapes in ML

Batch of samples: (batch_size, n_features)

# 32 samples, each with 10 features
X = np.random.randn(32, 10)  # shape: (32, 10)

Images: (batch, height, width, channels) (channels-last, the TensorFlow convention) or (batch, channels, height, width) (channels-first, the PyTorch convention)

# 32 RGB images of 28×28 pixels (channels-last)
images = np.random.randn(32, 28, 28, 3)  # shape: (32, 28, 28, 3)

Sequences: (batch, sequence_length, features)

# 32 sequences of length 100, each with 50 features
sequences = np.random.randn(32, 100, 50)  # shape: (32, 100, 50)
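
Indexing the first axis pulls out a single sample, and transposing reorders the layout. A short sketch using the shapes above:

import numpy as np

X = np.random.randn(32, 10)
images = np.random.randn(32, 28, 28, 3)
sequences = np.random.randn(32, 100, 50)

print(X[0].shape)          # (10,)         one sample
print(images[0].shape)     # (28, 28, 3)   one image
print(sequences[0].shape)  # (100, 50)     one sequence

# channels-last (batch, H, W, C) -> channels-first (batch, C, H, W)
print(images.transpose(0, 3, 1, 2).shape)  # (32, 3, 28, 28)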

4) Creating tensors

import numpy as np

# From lists
a = np.array([1, 2, 3])

# Zeros and ones
zeros = np.zeros((3, 4))      # 3×4 of zeros
ones = np.ones((2, 3))        # 2×3 of ones

# Random
uniform = np.random.rand(3, 4)     # Uniform [0, 1)
normal = np.random.randn(3, 4)     # Normal(0, 1)

# Range
sequence = np.arange(10)           # [0, 1, 2, ..., 9]
linspace = np.linspace(0, 1, 5)    # [0, 0.25, 0.5, 0.75, 1]

# Identity
identity = np.eye(3)               # 3×3 identity matrix

5) Tensor indexing

a = np.array([[1, 2, 3],
              [4, 5, 6]])

a[0]        # First row: [1, 2, 3]
a[0, 1]     # Element at row 0, col 1: 2
a[:, 0]     # First column: [1, 4]
a[0:1, :]   # First row as 2D: [[1, 2, 3]]
a[:, 1:]    # All rows, columns 1 onward: [[2,3], [5,6]]

Basic slicing returns views, not copies. This is memory-efficient, but modifying a view also modifies the original array (see the sketch below).
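
A short sketch of why that caveat matters: writing through a view changes the original array, and .copy() gives an independent array when you need one.

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

view = a[0]          # basic slicing returns a view
view[0] = 99
print(a[0, 0])       # 99, the original changed too

safe = a[1].copy()   # explicit copy when independence matters
safe[0] = -1
print(a[1, 0])       # 4, original untouched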


6) Element-wise operations

Operations apply to each element:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

a + b       # [5, 7, 9]
a * b       # [4, 10, 18]
a ** 2      # [1, 4, 9]
np.sqrt(a)  # [1.0, 1.41, 1.73]
np.exp(a)   # [2.71, 7.39, 20.09]
np.log(a)   # [0.0, 0.69, 1.09]

No explicit loops needed—vectorized and fast.
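
A rough timing sketch of the same computation done with a Python loop and with a vectorized op (exact numbers depend on your machine):

import time
import numpy as np

x = np.random.randn(1_000_000)

start = time.perf_counter()
looped = np.array([v * 2.0 for v in x])   # explicit Python loop
loop_time = time.perf_counter() - start

start = time.perf_counter()
vectorized = x * 2.0                      # single vectorized op
vec_time = time.perf_counter() - start

print(np.allclose(looped, vectorized))    # True
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")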


7) Matrix multiplication

Dot product (vectors):

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.dot(a, b)  # 1*4 + 2*5 + 3*6 = 32

Matrix multiplication:

A = np.array([[1, 2], [3, 4]])  # 2×2
B = np.array([[5, 6], [7, 8]])  # 2×2

np.dot(A, B)   # or A @ B
# [[19, 22],
#  [43, 50]]

Shape rule: (m, n) @ (n, p) → (m, p)
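
The rule in action on non-square matrices, as a small sketch (the shapes here are arbitrary choices):

import numpy as np

A = np.random.randn(4, 3)   # (m, n) = (4, 3)
B = np.random.randn(3, 5)   # (n, p) = (3, 5)
C = A @ B
print(C.shape)              # (4, 5), i.e. (m, p)

try:
    B @ A                   # inner dimensions 5 and 4 do not match
except ValueError as e:
    print("matmul error:", e)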


8) Broadcasting

Operations between tensors of different shapes:

a = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape: (2, 3)
b = np.array([10, 20, 30])   # shape: (3,)

a + b  # b is "broadcast" to match a's shape
# [[11, 22, 33],
#  [14, 25, 36]]

Broadcasting rules:

  1. Align shapes from the right
  2. Dimensions are compatible if: equal, or one is 1
  3. Size-1 dimensions are stretched

9) Broadcasting examples

# Add a scalar to a matrix
matrix = np.arange(6).reshape(2, 3)   # shape: (2, 3)
matrix + 5

# Add a 1D vector to the matrix as a column (insert a new axis first)
vec = np.array([10, 20])              # shape: (2,)
matrix + vec[:, np.newaxis]           # vec[:, np.newaxis] has shape (2, 1)

# Multiply row by column (outer product)
row = np.array([1, 2, 3])      # shape: (3,)
col = np.array([[1], [2]])     # shape: (2, 1)
row * col
# [[1, 2, 3],
#  [2, 4, 6]]

Broadcasting is powerful but can hide shape bugs. Always verify shapes.
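
One way to verify compatibility up front is np.broadcast_shapes, which applies the three rules from the previous section without allocating anything. A sketch, assuming NumPy 1.20 or newer:

import numpy as np

print(np.broadcast_shapes((2, 3), (3,)))    # (2, 3): aligned from the right
print(np.broadcast_shapes((2, 1), (1, 3)))  # (2, 3): size-1 dims stretch

try:
    np.broadcast_shapes((2, 3), (2,))       # trailing dims 3 and 2 differ
except ValueError as e:
    print(e)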


10) Reshaping tensors

a = np.arange(12)  # [0, 1, 2, ..., 11], shape: (12,)

# Reshape to 2D
a.reshape(3, 4)    # shape: (3, 4)
a.reshape(4, 3)    # shape: (4, 3)
a.reshape(2, 2, 3) # shape: (2, 2, 3)

# -1 means "infer this dimension"
a.reshape(-1, 4)   # shape: (3, 4)
a.reshape(3, -1)   # shape: (3, 4)

# Flatten
m = a.reshape(3, 4)
m.flatten()        # 2D → 1D copy, shape: (12,)
m.ravel()          # same result, but returns a view when possible
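
A common use in practice is flattening each image in a batch before a dense layer; the total element count must stay the same, which reshape enforces. A short sketch:

import numpy as np

images = np.random.randn(32, 28, 28, 3)   # batch of images
flat = images.reshape(32, -1)             # flatten each image
print(flat.shape)                         # (32, 2352), since 28*28*3 = 2352

try:
    images.reshape(32, 100)               # wrong total number of elements
except ValueError as e:
    print("reshape error:", e)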

11) Axis operations

Reduce along specific dimensions:

a = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape: (2, 3)

a.sum()          # 21 (all elements)
a.sum(axis=0)    # [5, 7, 9] (sum each column)
a.sum(axis=1)    # [6, 15] (sum each row)

a.mean(axis=0)   # Mean of each column
a.max(axis=1)    # Max of each row
a.argmax(axis=1) # Index of max in each row

With axis=0 the operation collapses the row dimension (it runs down each column); with axis=1 it collapses the column dimension (it runs across each row).
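
keepdims=True keeps the reduced axis as size 1, which lets the result broadcast cleanly back against the original array. A small sketch:

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = a.sum(axis=0)                  # shape: (3,)
row_sums = a.sum(axis=1, keepdims=True)   # shape: (2, 1)

print(a / row_sums)   # each row divided by its own sum
# [[0.166..., 0.333..., 0.5],
#  [0.266..., 0.333..., 0.4]]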


12) Transpose and swapping axes

a = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape: (2, 3)

a.T              # Transpose, shape: (3, 2)
a.transpose()    # Same
np.transpose(a, (1, 0))  # Explicit axis order

# For higher dimensions
b = np.ones((2, 3, 4))
b.transpose((0, 2, 1))  # shape: (2, 4, 3)

13) Stacking and concatenating

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Stack (new dimension)
np.stack([a, b])       # shape: (2, 3)
np.stack([a, b], axis=1)  # shape: (3, 2)

# Concatenate (along existing dimension)
np.concatenate([a, b])  # [1, 2, 3, 4, 5, 6]

# Shortcuts
np.vstack([a, b])  # stacks as rows: shape (2, 3)
np.hstack([a, b])  # joins end-to-end: shape (6,), same as concatenate for 1D inputs
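
In practice this is how batches get built: stack a list of per-sample arrays into one batch tensor, or concatenate two batches along the batch axis. A small sketch:

import numpy as np

samples = [np.random.randn(28, 28, 3) for _ in range(32)]
batch = np.stack(samples)                          # shape: (32, 28, 28, 3)

more = np.random.randn(8, 28, 28, 3)
combined = np.concatenate([batch, more], axis=0)   # shape: (40, 28, 28, 3)
print(batch.shape, combined.shape)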

14) Why tensors for deep learning?

  1. GPU acceleration: Tensor ops run on GPUs in parallel
  2. Automatic differentiation: Frameworks track ops for gradients
  3. Batching: Process many samples at once efficiently
  4. Broadcasting: Flexible without explicit loops
  5. Memory efficiency: Views share data

Neural network = sequence of tensor operations:

# Forward pass (relu and softmax are activation functions; see the sketch below)
z1 = X @ W1 + b1      # Linear layer 1
a1 = relu(z1)         # Non-linear activation
z2 = a1 @ W2 + b2     # Linear layer 2
output = softmax(z2)  # Class probabilities
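
Here is a minimal runnable sketch of the same forward pass in NumPy. The layer sizes, the weight initialization, and the relu/softmax definitions are illustrative assumptions, not a prescribed architecture:

import numpy as np

def relu(z):
    return np.maximum(0, z)                         # element-wise max with 0

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))    # subtract row max for stability
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 10))          # 32 samples, 10 features
W1 = rng.standard_normal((10, 16)); b1 = np.zeros(16)
W2 = rng.standard_normal((16, 3));  b2 = np.zeros(3)

z1 = X @ W1 + b1        # (32, 16)
a1 = relu(z1)
z2 = a1 @ W2 + b2       # (32, 3)
output = softmax(z2)    # each row sums to 1
print(output.shape, output[0].sum())       # (32, 3) 1.0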

Key takeaways

  • Tensors are n-dimensional arrays (scalars, vectors, matrices, and beyond)
  • Shape is everything—know your dimensions
  • Element-wise ops are vectorized (no loops)
  • Matrix multiplication: (m,n) @ (n,p) → (m,p)
  • Broadcasting extends smaller tensors to match larger ones
  • Reshaping, slicing, and axis operations are fundamental
  • Tensors enable GPU acceleration and automatic differentiation
