Tensors: The Building Blocks of Deep Learning

Lesson, slides, and applied problem sets.

Lesson

Tensors: The Building Blocks of Deep Learning

Why this module exists

Deep learning frameworks like PyTorch and TensorFlow are built on tensors—multi-dimensional arrays with superpowers. Understanding tensors is essential for moving from ML fundamentals to neural networks.


1) What is a tensor?

A tensor is a generalization of vectors and matrices:

  • 0D tensor (scalar): A single number → 5
  • 1D tensor (vector): A list of numbers → [1, 2, 3]
  • 2D tensor (matrix): A table → [[1,2], [3,4]]
  • 3D tensor: A "cube" of numbers
  • nD tensor: Higher-dimensional arrays

In practice, tensors are NumPy arrays or framework-specific arrays (torch.Tensor, tf.Tensor).
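
Moving data between these libraries is mostly a matter of wrapping the same numbers. A minimal sketch, assuming the optional torch and tensorflow packages are installed:

import numpy as np
import torch
import tensorflow as tf

a = np.array([[1, 2], [3, 4]])   # a plain NumPy 2D tensor

t = torch.from_numpy(a)          # torch.Tensor that shares memory with a
print(t.shape)                   # torch.Size([2, 2])

tft = tf.convert_to_tensor(a)    # tf.Tensor built from the same data
print(tft.shape)                 # (2, 2)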


2) Tensor shapes

Shape describes the size along each dimension:

scalar = 5                    # shape: ()
vector = [1, 2, 3]           # shape: (3,)
matrix = [[1,2], [3,4]]      # shape: (2, 2)
tensor3d = [[[1,2], [3,4]],
            [[5,6], [7,8]]]  # shape: (2, 2, 2)

Always know your shapes! Shape mismatches are among the most common bugs in deep learning code.
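
A quick habit that catches most of these bugs is printing .shape before combining arrays. A small sketch of what a mismatch looks like:

import numpy as np

x = np.array([[1, 2], [3, 4]])   # shape: (2, 2)
v = np.array([1, 2, 3])          # shape: (3,)
print(x.shape, v.shape)          # (2, 2) (3,)

try:
    x + v                        # (2, 2) and (3,) cannot be combined
except ValueError as e:
    print("Shape error:", e)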


3) Common tensor shapes in ML

Batch of samples: (batch_size, n_features)

# 32 samples, each with 10 features
X = np.random.randn(32, 10)  # shape: (32, 10)

Images: (batch, height, width, channels) (channels-last, the TensorFlow convention) or (batch, channels, height, width) (channels-first, the PyTorch convention)

# 32 RGB images of 28×28 pixels (channels-last)
images = np.random.randn(32, 28, 28, 3)  # shape: (32, 28, 28, 3)

Sequences: (batch, sequence_length, features)

# 32 sequences of length 100, each with 50 features
sequences = np.random.randn(32, 100, 50)  # shape: (32, 100, 50)
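
Indexing the first axis pulls out a single sample, and transposing reorders the layout. A short sketch using the shapes above:

import numpy as np

X = np.random.randn(32, 10)
images = np.random.randn(32, 28, 28, 3)
sequences = np.random.randn(32, 100, 50)

print(X[0].shape)          # (10,)         one sample
print(images[0].shape)     # (28, 28, 3)   one image
print(sequences[0].shape)  # (100, 50)     one sequence

# channels-last (batch, H, W, C) -> channels-first (batch, C, H, W)
print(images.transpose(0, 3, 1, 2).shape)  # (32, 3, 28, 28)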

4) Creating tensors

import numpy as np

# From lists
a = np.array([1, 2, 3])

# Zeros and ones
zeros = np.zeros((3, 4))      # 3×4 of zeros
ones = np.ones((2, 3))        # 2×3 of ones

# Random
uniform = np.random.rand(3, 4)     # Uniform [0, 1)
normal = np.random.randn(3, 4)     # Normal(0, 1)

# Range
sequence = np.arange(10)           # [0, 1, 2, ..., 9]
linspace = np.linspace(0, 1, 5)    # [0, 0.25, 0.5, 0.75, 1]

# Identity
identity = np.eye(3)               # 3×3 identity matrix

5) Tensor indexing

a = np.array([[1, 2, 3],
              [4, 5, 6]])

a[0]        # First row: [1, 2, 3]
a[0, 1]     # Element at row 0, col 1: 2
a[:, 0]     # First column: [1, 4]
a[0:1, :]   # First row as 2D: [[1, 2, 3]]
a[:, 1:]    # All rows, columns 1 onward: [[2,3], [5,6]]

Basic slicing returns views, not copies. This is memory-efficient, but modifying a view also modifies the original array (see the sketch below).
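
A short sketch of why that caveat matters: writing through a view changes the original array, and .copy() gives an independent array when you need one.

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

view = a[0]          # basic slicing returns a view
view[0] = 99
print(a[0, 0])       # 99, the original changed too

safe = a[1].copy()   # explicit copy when independence matters
safe[0] = -1
print(a[1, 0])       # 4, original untouched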


6) Element-wise operations

Operations apply to each element:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

a + b       # [5, 7, 9]
a * b       # [4, 10, 18]
a ** 2      # [1, 4, 9]
np.sqrt(a)  # [1.0, 1.41, 1.73]
np.exp(a)   # [2.71, 7.39, 20.09]
np.log(a)   # [0.0, 0.69, 1.09]

No explicit loops needed—vectorized and fast.
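
A rough timing sketch of the same computation done with a Python loop and with a vectorized op (exact numbers depend on your machine):

import time
import numpy as np

x = np.random.randn(1_000_000)

start = time.perf_counter()
looped = np.array([v * 2.0 for v in x])   # explicit Python loop
loop_time = time.perf_counter() - start

start = time.perf_counter()
vectorized = x * 2.0                      # single vectorized op
vec_time = time.perf_counter() - start

print(np.allclose(looped, vectorized))    # True
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")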


7) Matrix multiplication

Dot product (vectors):

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.dot(a, b)  # 1*4 + 2*5 + 3*6 = 32

Matrix multiplication:

A = np.array([[1, 2], [3, 4]])  # 2×2
B = np.array([[5, 6], [7, 8]])  # 2×2

np.dot(A, B)   # or A @ B
# [[19, 22],
#  [43, 50]]

Shape rule: (m, n) @ (n, p) → (m, p)
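
The rule in action on non-square matrices, as a small sketch (the shapes here are arbitrary choices):

import numpy as np

A = np.random.randn(4, 3)   # (m, n) = (4, 3)
B = np.random.randn(3, 5)   # (n, p) = (3, 5)
C = A @ B
print(C.shape)              # (4, 5), i.e. (m, p)

try:
    B @ A                   # inner dimensions 5 and 4 do not match
except ValueError as e:
    print("matmul error:", e)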


8) Broadcasting

Operations between tensors of different shapes:

a = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape: (2, 3)
b = np.array([10, 20, 30])   # shape: (3,)

a + b  # b is "broadcast" to match a's shape
# [[11, 22, 33],
#  [14, 25, 36]]

Broadcasting rules:

  1. Align shapes from the right
  2. Dimensions are compatible if: equal, or one is 1
  3. Size-1 dimensions are stretched

9) Broadcasting examples

# Add a scalar to a matrix
matrix = np.arange(6).reshape(2, 3)   # shape: (2, 3)
matrix + 5

# Add a 1D vector to the matrix as a column (insert a new axis first)
vec = np.array([10, 20])              # shape: (2,)
matrix + vec[:, np.newaxis]           # vec[:, np.newaxis] has shape (2, 1)

# Multiply row by column (outer product)
row = np.array([1, 2, 3])      # shape: (3,)
col = np.array([[1], [2]])     # shape: (2, 1)
row * col
# [[1, 2, 3],
#  [2, 4, 6]]

Broadcasting is powerful but can hide shape bugs. Always verify shapes.
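
One way to verify compatibility up front is np.broadcast_shapes, which applies the three rules from the previous section without allocating anything. A sketch, assuming NumPy 1.20 or newer:

import numpy as np

print(np.broadcast_shapes((2, 3), (3,)))    # (2, 3): aligned from the right
print(np.broadcast_shapes((2, 1), (1, 3)))  # (2, 3): size-1 dims stretch

try:
    np.broadcast_shapes((2, 3), (2,))       # trailing dims 3 and 2 differ
except ValueError as e:
    print(e)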


10) Reshaping tensors

a = np.arange(12)  # [0, 1, 2, ..., 11], shape: (12,)

# Reshape to 2D
a.reshape(3, 4)    # shape: (3, 4)
a.reshape(4, 3)    # shape: (4, 3)
a.reshape(2, 2, 3) # shape: (2, 2, 3)

# -1 means "infer this dimension"
a.reshape(-1, 4)   # shape: (3, 4)
a.reshape(3, -1)   # shape: (3, 4)

# Flatten
m = a.reshape(3, 4)
m.flatten()        # 2D → 1D copy, shape: (12,)
m.ravel()          # same result, but returns a view when possible
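
A common use in practice is flattening each image in a batch before a dense layer; the total element count must stay the same, which reshape enforces. A short sketch:

import numpy as np

images = np.random.randn(32, 28, 28, 3)   # batch of images
flat = images.reshape(32, -1)             # flatten each image
print(flat.shape)                         # (32, 2352), since 28*28*3 = 2352

try:
    images.reshape(32, 100)               # wrong total number of elements
except ValueError as e:
    print("reshape error:", e)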

11) Axis operations

Reduce along specific dimensions:

a = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape: (2, 3)

a.sum()          # 21 (all elements)
a.sum(axis=0)    # [5, 7, 9] (sum each column)
a.sum(axis=1)    # [6, 15] (sum each row)

a.mean(axis=0)   # Mean of each column
a.max(axis=1)    # Max of each row
a.argmax(axis=1) # Index of max in each row

With axis=0 the operation collapses the row dimension (it runs down each column); with axis=1 it collapses the column dimension (it runs across each row).
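
keepdims=True keeps the reduced axis as size 1, which lets the result broadcast cleanly back against the original array. A small sketch:

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = a.sum(axis=0)                  # shape: (3,)
row_sums = a.sum(axis=1, keepdims=True)   # shape: (2, 1)

print(a / row_sums)   # each row divided by its own sum
# [[0.166..., 0.333..., 0.5],
#  [0.266..., 0.333..., 0.4]]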


12) Transpose and swapping axes

a = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape: (2, 3)

a.T              # Transpose, shape: (3, 2)
a.transpose()    # Same
np.transpose(a, (1, 0))  # Explicit axis order

# For higher dimensions
b = np.ones((2, 3, 4))
b.transpose((0, 2, 1))  # shape: (2, 4, 3)

13) Stacking and concatenating

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Stack (new dimension)
np.stack([a, b])       # shape: (2, 3)
np.stack([a, b], axis=1)  # shape: (3, 2)

# Concatenate (along existing dimension)
np.concatenate([a, b])  # [1, 2, 3, 4, 5, 6]

# Shortcuts
np.vstack([a, b])  # stacks as rows: shape (2, 3)
np.hstack([a, b])  # joins end-to-end: shape (6,), same as concatenate for 1D inputs
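
In practice this is how batches get built: stack a list of per-sample arrays into one batch tensor, or concatenate two batches along the batch axis. A small sketch:

import numpy as np

samples = [np.random.randn(28, 28, 3) for _ in range(32)]
batch = np.stack(samples)                          # shape: (32, 28, 28, 3)

more = np.random.randn(8, 28, 28, 3)
combined = np.concatenate([batch, more], axis=0)   # shape: (40, 28, 28, 3)
print(batch.shape, combined.shape)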

14) Why tensors for deep learning?

  1. GPU acceleration: Tensor ops run on GPUs in parallel
  2. Automatic differentiation: Frameworks track ops for gradients
  3. Batching: Process many samples at once efficiently
  4. Broadcasting: Flexible without explicit loops
  5. Memory efficiency: Views share data

Neural network = sequence of tensor operations:

# Forward pass (relu and softmax are activation functions; see the sketch below)
z1 = X @ W1 + b1      # Linear layer 1
a1 = relu(z1)         # Non-linear activation
z2 = a1 @ W2 + b2     # Linear layer 2
output = softmax(z2)  # Class probabilities
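
Here is a minimal runnable sketch of the same forward pass in NumPy. The layer sizes, the weight initialization, and the relu/softmax definitions are illustrative assumptions, not a prescribed architecture:

import numpy as np

def relu(z):
    return np.maximum(0, z)                         # element-wise max with 0

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))    # subtract row max for stability
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 10))          # 32 samples, 10 features
W1 = rng.standard_normal((10, 16)); b1 = np.zeros(16)
W2 = rng.standard_normal((16, 3));  b2 = np.zeros(3)

z1 = X @ W1 + b1        # (32, 16)
a1 = relu(z1)
z2 = a1 @ W2 + b2       # (32, 3)
output = softmax(z2)    # each row sums to 1
print(output.shape, output[0].sum())       # (32, 3) 1.0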

Key takeaways

  • Tensors are n-dimensional arrays (scalars, vectors, matrices, and beyond)
  • Shape is everything—know your dimensions
  • Element-wise ops are vectorized (no loops)
  • Matrix multiplication: (m,n) @ (n,p) → (m,p)
  • Broadcasting extends smaller tensors to match larger ones
  • Reshaping, slicing, and axis operations are fundamental
  • Tensors enable GPU acceleration and automatic differentiation
