Tensors: The Building Blocks of Deep Learning
Why this module exists
Deep learning frameworks like PyTorch and TensorFlow are built on tensors—multi-dimensional arrays with superpowers. Understanding tensors is essential for moving from ML fundamentals to neural networks.
1) What is a tensor?
A tensor is a generalization of vectors and matrices:
- 0D tensor (scalar): A single number → 5
- 1D tensor (vector): A list of numbers → [1, 2, 3]
- 2D tensor (matrix): A table → [[1,2], [3,4]]
- 3D tensor: A "cube" of numbers
- nD tensor: Higher-dimensional arrays
In practice, tensors are NumPy arrays or framework-specific arrays (torch.Tensor, tf.Tensor).
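A quick way to see the progression is to build each one in NumPy and check its number of dimensions (variable names here are just illustrative):
import numpy as np
scalar = np.array(5)                   # 0D
vector = np.array([1, 2, 3])           # 1D
matrix = np.array([[1, 2], [3, 4]])    # 2D
print(scalar.ndim, vector.ndim, matrix.ndim)   # 0 1 2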
2) Tensor shapes
Shape describes the size along each dimension:
scalar = 5 # shape: ()
vector = [1, 2, 3] # shape: (3,)
matrix = [[1,2], [3,4]] # shape: (2, 2)
tensor3d = [[[1,2], [3,4]],
[[5,6], [7,8]]] # shape: (2, 2, 2)
Always know your shapes! Shape errors are the most common bugs.
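For example, combining incompatible shapes fails immediately (a minimal sketch with made-up values):
import numpy as np
x = np.array([1, 2, 3])        # shape: (3,)
y = np.array([1, 2, 3, 4])     # shape: (4,)
print(x.shape, y.shape)
# x + y  -> ValueError: operands could not be broadcast together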
3) Common tensor shapes in ML
Batch of samples: (batch_size, n_features)
# 32 samples, each with 10 features
X = np.random.randn(32, 10) # shape: (32, 10)
Images: (batch, height, width, channels) or (batch, channels, height, width)
# 32 RGB images of 28×28 pixels (channels-last)
images = np.random.randn(32, 28, 28, 3) # shape: (32, 28, 28, 3)
Sequences: (batch, sequence_length, features)
# 32 sequences of length 100, each with 50 features
sequences = np.random.randn(32, 100, 50) # shape: (32, 100, 50)
4) Creating tensors
import numpy as np
# From lists
a = np.array([1, 2, 3])
# Zeros and ones
zeros = np.zeros((3, 4)) # 3×4 of zeros
ones = np.ones((2, 3)) # 2×3 of ones
# Random
uniform = np.random.rand(3, 4) # Uniform [0, 1)
normal = np.random.randn(3, 4) # Normal(0, 1)
# Range
sequence = np.arange(10) # [0, 1, 2, ..., 9]
linspace = np.linspace(0, 1, 5) # [0, 0.25, 0.5, 0.75, 1]
# Identity
identity = np.eye(3) # 3×3 identity matrix
5) Tensor indexing
a = np.array([[1, 2, 3],
[4, 5, 6]])
a[0] # First row: [1, 2, 3]
a[0, 1] # Element at row 0, col 1: 2
a[:, 0] # First column: [1, 4]
a[0:1, :] # First row as 2D: [[1, 2, 3]]
a[:, 1:] # All rows, columns 1 onward: [[2,3], [5,6]]
Basic slicing returns views, not copies: this is memory-efficient, but modifying a view also modifies the original array.
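To see this in action (an illustrative check, reusing no names from above):
x = np.arange(5)        # [0, 1, 2, 3, 4]
view = x[1:4]           # a view into x, not a copy
view[0] = 99            # modifies x as well
print(x)                # [ 0 99  2  3  4]
# Use x[1:4].copy() when you need an independent array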
6) Element-wise operations
Operations apply to each element:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
a + b # [5, 7, 9]
a * b # [4, 10, 18]
a ** 2 # [1, 4, 9]
np.sqrt(a) # [1.0, 1.41, 1.73]
np.exp(a) # [2.71, 7.39, 20.09]
np.log(a) # [0.0, 0.69, 1.09]
No explicit loops needed—vectorized and fast.
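An explicit Python loop gives the same result as the vectorized call, just far more slowly on large arrays (illustrative comparison using the array a above):
looped = np.array([x ** 2 for x in a])   # explicit Python loop
vectorized = a ** 2                       # same result, computed in compiled code
assert np.array_equal(looped, vectorized)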
7) Matrix multiplication
Dot product (vectors):
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.dot(a, b) # 1*4 + 2*5 + 3*6 = 32
Matrix multiplication:
A = np.array([[1, 2], [3, 4]]) # 2×2
B = np.array([[5, 6], [7, 8]]) # 2×2
np.dot(A, B) # or A @ B
# [[19, 22],
# [43, 50]]
Shape rule: (m, n) @ (n, p) → (m, p)
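A quick check of the shape rule (the sizes here are arbitrary examples):
A = np.random.randn(4, 3)      # (m, n) = (4, 3)
B = np.random.randn(3, 2)      # (n, p) = (3, 2)
print((A @ B).shape)           # (4, 2) = (m, p)
# Mismatched inner dimensions fail:
# np.random.randn(4, 3) @ np.random.randn(2, 5)   -> ValueError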
8) Broadcasting
Operations between tensors of different shapes:
a = np.array([[1, 2, 3],
[4, 5, 6]]) # shape: (2, 3)
b = np.array([10, 20, 30]) # shape: (3,)
a + b # b is "broadcast" to match a's shape
# [[11, 22, 33],
# [14, 25, 36]]
Broadcasting rules:
- Align shapes from the right
- Dimensions are compatible if: equal, or one is 1
- Size-1 dimensions are stretched
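For example, a (2, 3) array and a (2, 1) array are compatible because the trailing dimensions are 3 and 1; np.broadcast_shapes (available in NumPy 1.20+) can check this without computing anything:
x = np.ones((2, 3))            # shape: (2, 3)
y = np.ones((2, 1))            # shape: (2, 1)
print((x + y).shape)           # (2, 3): y's size-1 axis is stretched
print(np.broadcast_shapes((2, 3), (2, 1)))   # (2, 3)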
9) Broadcasting examples
matrix = np.arange(6).reshape(2, 3)     # shape: (2, 3)
column_vector = np.array([10, 20])      # shape: (2,)
# Add scalar to matrix
matrix + 5
# Add column vector to matrix (reshape (2,) -> (2, 1) first)
matrix + column_vector[:, np.newaxis]
# Multiply row by column (outer product)
row = np.array([1, 2, 3]) # shape: (3,)
col = np.array([[1], [2]]) # shape: (2, 1)
row * col
# [[1, 2, 3],
# [2, 4, 6]]
Broadcasting is powerful but can hide shape bugs. Always verify shapes.
10) Reshaping tensors
a = np.arange(12) # [0, 1, 2, ..., 11], shape: (12,)
# Reshape to 2D
a.reshape(3, 4) # shape: (3, 4)
a.reshape(4, 3) # shape: (4, 3)
a.reshape(2, 2, 3) # shape: (2, 2, 3)
# -1 means "infer this dimension"
a.reshape(-1, 4) # shape: (3, 4)
a.reshape(3, -1) # shape: (3, 4)
# Flatten (using a 2D array)
m = a.reshape(3, 4)
m.flatten() # 2D → 1D, always returns a copy
m.ravel() # Same result, but may return a view
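A common use of reshape with -1 is flattening each image in a batch before a fully connected layer (shapes follow the earlier channels-last image example):
images = np.random.randn(32, 28, 28, 3)   # batch of 32 channels-last images
flat = images.reshape(32, -1)             # shape: (32, 2352), one row per image
print(flat.shape)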
11) Axis operations
Reduce along specific dimensions:
a = np.array([[1, 2, 3],
[4, 5, 6]]) # shape: (2, 3)
a.sum() # 21 (all elements)
a.sum(axis=0) # [5, 7, 9] (sum each column)
a.sum(axis=1) # [6, 15] (sum each row)
a.mean(axis=0) # Mean of each column
a.max(axis=1) # Max of each row
a.argmax(axis=1) # Index of max in each row
axis=0 reduces down the rows (one result per column); axis=1 reduces across the columns (one result per row).
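Passing keepdims=True keeps the reduced axis as size 1, so the result broadcasts cleanly back against the original array (an illustrative pattern, reusing a from above):
row_sums = a.sum(axis=1, keepdims=True)   # shape: (2, 1) instead of (2,)
normalized = a / row_sums                 # broadcasts; each row now sums to 1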
12) Transpose and swapping axes
a = np.array([[1, 2, 3],
[4, 5, 6]]) # shape: (2, 3)
a.T # Transpose, shape: (3, 2)
a.transpose() # Same
np.transpose(a, (1, 0)) # Explicit axis order
# For higher dimensions
b = np.ones((2, 3, 4))
b.transpose((0, 2, 1)) # shape: (2, 4, 3)
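A practical use is converting a batch of images between channels-last and channels-first layouts (shapes follow the earlier image example; this is an illustrative sketch):
images = np.random.randn(32, 28, 28, 3)    # (batch, height, width, channels)
images_cf = images.transpose(0, 3, 1, 2)   # (batch, channels, height, width)
print(images_cf.shape)                     # (32, 3, 28, 28)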
13) Stacking and concatenating
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Stack (new dimension)
np.stack([a, b]) # shape: (2, 3)
np.stack([a, b], axis=1) # shape: (3, 2)
# Concatenate (along existing dimension)
np.concatenate([a, b]) # [1, 2, 3, 4, 5, 6]
# Convenience wrappers
np.vstack([a, b]) # Stack as rows: shape (2, 3)
np.hstack([a, b]) # Join end to end: shape (6,)
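With 2D inputs you choose the axis explicitly; for example, joining two feature matrices (shapes here are arbitrary examples):
X1 = np.ones((2, 3))
X2 = np.ones((4, 3))
print(np.concatenate([X1, X2], axis=0).shape)   # (6, 3): more rows (samples)
print(np.concatenate([X1, X1], axis=1).shape)   # (2, 6): more columns (features)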
14) Why tensors for deep learning?
- GPU acceleration: Tensor ops run on GPUs in parallel
- Automatic differentiation: Frameworks track ops for gradients
- Batching: Process many samples at once efficiently
- Broadcasting: Flexible without explicit loops
- Memory efficiency: Views share data
Neural network = sequence of tensor operations:
# Forward pass
z1 = X @ W1 + b1 # Linear
a1 = relu(z1) # Activation
z2 = a1 @ W2 + b2 # Linear
output = softmax(z2)
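Here is a complete, runnable version of that sketch. The layer sizes and the relu/softmax helpers are illustrative choices for this lesson, not a reference implementation:
import numpy as np
def relu(z):
    return np.maximum(0, z)                         # element-wise max(0, z)
def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))    # stabilized exponentials
    return e / e.sum(axis=1, keepdims=True)         # each row sums to 1
X = np.random.randn(32, 10)                         # batch of 32 samples, 10 features
W1 = np.random.randn(10, 16); b1 = np.zeros(16)     # layer 1 parameters (assumed sizes)
W2 = np.random.randn(16, 3);  b2 = np.zeros(3)      # layer 2 parameters (assumed sizes)
z1 = X @ W1 + b1                                    # (32, 16)
a1 = relu(z1)
z2 = a1 @ W2 + b2                                   # (32, 3)
output = softmax(z2)                                # class probabilities per sample
print(output.shape)                                 # (32, 3)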
Key takeaways
- Tensors are n-dimensional arrays (scalars, vectors, matrices, and beyond)
- Shape is everything—know your dimensions
- Element-wise ops are vectorized (no loops)
- Matrix multiplication: (m,n) @ (n,p) → (m,p)
- Broadcasting extends smaller tensors to match larger ones
- Reshaping, slicing, and axis operations are fundamental
- Tensors enable GPU acceleration and automatic differentiation