Linear Algebra Essentials for ML

Lesson, slides, and applied problem sets.

Lesson

Why this module exists

Machine learning is fundamentally about mathematical transformations on data. Linear algebra provides the language and tools for these transformations. Whether you're computing predictions, training neural networks, or reducing dimensions, you're doing linear algebra.

This isn't abstract math for its own sake. Every vector is a data point. Every matrix is a transformation or a dataset. Understanding these concepts viscerally will make everything in ML click.


1) Vectors: The fundamental unit

A vector is an ordered list of numbers. In ML, a vector typically represents:

  • A single data point (features of one sample)
  • A direction in space
  • Parameters of a model (weights)
# A 3-dimensional vector
x = [1, 2, 3]

# In ML context: a sample with 3 features
sample = [height, weight, age]

Notation: Vectors are usually treated as column vectors, written in bold (x) or with an arrow over the symbol. The dimension of a vector is the number of elements it contains.
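
A minimal NumPy sketch of the same idea (the numbers are placeholders, not real data):

import numpy as np

# A 3-dimensional vector: one sample with 3 features
x = np.array([1.0, 2.0, 3.0])
print(x.shape)            # (3,) -- a 1-D array with 3 elements

# Make it an explicit column vector when the distinction matters
x_col = x.reshape(3, 1)
print(x_col.shape)        # (3, 1)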


2) Vector operations

Addition and Subtraction

Element-wise operations on vectors of the same dimension:

a = [1, 2, 3]
b = [4, 5, 6]
a + b = [5, 7, 9]
a - b = [-3, -3, -3]

Scalar Multiplication

Multiply each element by a scalar:

c = 2
c * a = [2, 4, 6]
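
A quick NumPy sketch of these element-wise operations (using the same values as above):

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)    # [5 7 9]
print(a - b)    # [-3 -3 -3]
print(2 * a)    # [2 4 6] -- the scalar multiplies every element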

Dot Product

Arguably the most important operation in ML. It takes two vectors of the same dimension and produces a single scalar:

a · b = a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
      = 1*4 + 2*5 + 3*6 = 32

The dot product measures:

  • Similarity between vectors (higher = more aligned)
  • Projection of one vector onto another
  • Weighted sum (core of neural networks!)
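
To make the weighted-sum point concrete, here is a small NumPy sketch (the weights and features below are made-up numbers):

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a, b))   # 32, matching 1*4 + 2*5 + 3*6
print(a @ b)          # 32, the @ operator does the same thing

# A linear model's prediction is exactly this kind of weighted sum
weights  = np.array([0.2, -0.5, 1.3])
features = np.array([4.0, 1.0, 2.0])
print(weights @ features)   # ~2.9 = 0.2*4 - 0.5*1 + 1.3*2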

3) Vector norms (magnitude)

The L2 norm (Euclidean norm) measures vector length:

||x||₂ = sqrt(x[0]² + x[1]² + ... + x[n-1]²)
||[3, 4]||₂ = sqrt(9 + 16) = 5

The L1 norm (Manhattan norm) is the sum of absolute values:

||x||₁ = |x[0]| + |x[1]| + ... + |x[n-1]|
||[3, -4]||₁ = 3 + 4 = 7

Why it matters: Norms appear in loss functions, regularization, and distance calculations.
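
Both norms are available directly in NumPy (a quick sketch):

import numpy as np

v = np.array([3.0, -4.0])
print(np.linalg.norm(v))          # 5.0 -- L2 (Euclidean) norm, the default
print(np.linalg.norm(v, ord=1))   # 7.0 -- L1 (Manhattan) norm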


4) Matrices: Collections and transformations

A matrix is a 2D array of numbers. Dimensions are rows × columns.

A = [[1, 2, 3],
     [4, 5, 6]]  # 2×3 matrix

In ML, matrices represent:

  • Datasets (rows = samples, columns = features)
  • Transformations (weight matrices in neural networks)
  • Covariance and correlation structures
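
For example, a tiny dataset stored as a NumPy matrix (rows = samples, columns = features; the values are placeholders):

import numpy as np

# 4 samples, 3 features each -> a 4x3 data matrix
X = np.array([
    [1.2, 0.7, 3.1],
    [0.9, 1.5, 2.2],
    [1.1, 0.3, 2.8],
    [1.4, 1.1, 3.0],
])
print(X.shape)   # (4, 3)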

5) Matrix operations

Transpose

Swap rows and columns: A^T

A = [[1, 2],     A^T = [[1, 3],
     [3, 4]]           [2, 4]]

Matrix-Vector Multiplication

A matrix times a vector produces a new vector:

(A · x)[i] = sum(A[i,k] * x[k] for k in 0..n-1)   # dot product of row i with x

This is the core of linear models: prediction = weights · features

Matrix-Matrix Multiplication

For A (m×n) and B (n×p), the result AB is (m×p):

(AB)[i,j] = sum(A[i,k] * B[k,j] for k in 0..n-1)

Key rule: the number of columns of A must equal the number of rows of B.
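
A short NumPy sketch covering transpose, matrix-vector, and matrix-matrix products (shapes noted in comments):

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
x = np.array([1.0, 0.0, -1.0])   # shape (3,)
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])           # shape (3, 2)

print(A.T.shape)   # (3, 2) -- transpose swaps rows and columns
print(A @ x)       # shape (2,): [-2. -2.]
print(A @ B)       # shape (2, 2): [[ 7  8] [16 17]]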


6) Special matrices

Identity Matrix (I)

Square matrix with 1s on diagonal, 0s elsewhere. Multiplying by I leaves a matrix unchanged.

I = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]

Diagonal Matrix

All off-diagonal elements are zero. Multiplying by a diagonal matrix simply scales each coordinate, which makes it cheap to store and apply.

Symmetric Matrix

A = A^T. Covariance matrices are always symmetric.
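
A brief NumPy sketch of these special matrices:

import numpy as np

I = np.eye(3)                  # 3x3 identity matrix
D = np.diag([2.0, 0.5, 1.0])   # diagonal matrix: scales each coordinate independently

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.allclose(A, A.T))     # True -- A is symmetric

M = np.random.rand(3, 3)
print(np.allclose(I @ M, M))   # True -- multiplying by I changes nothing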


7) Linear combinations and span

A linear combination of vectors v₁, v₂, ..., vₙ is:

c₁v₁ + c₂v₂ + ... + cₙvₙ

The span of a set of vectors is the set of all their possible linear combinations. It tells you which part of the space those vectors can reach.

Why it matters: Neural networks compute linear combinations at each layer (before activation).
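
A minimal sketch of a linear combination in NumPy (the coefficients are arbitrary):

import numpy as np

v1 = np.array([1.0, 0.0])
v2 = np.array([0.0, 1.0])

# The combination 3*v1 + 2*v2 reaches the point (3, 2);
# v1 and v2 together span the entire 2-D plane.
combo = 3 * v1 + 2 * v2
print(combo)   # [3. 2.]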


8) Linear independence and rank

Vectors are linearly independent if none can be written as a combination of the others.

The rank of a matrix is the number of linearly independent columns (or rows).

  • Full rank: the rank is the maximum possible (all columns are independent)
  • Rank-deficient: some columns are combinations of others; the matrix carries redundant information, which leads to singularity issues
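
NumPy can compute the rank directly (a small sketch with a deliberately redundant column):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 1.0],
              [3.0, 6.0, 2.0]])   # the second column is exactly 2x the first

print(np.linalg.matrix_rank(A))   # 2 -- rank-deficient (maximum possible is 3)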

9) Eigenvalues and eigenvectors (intuition)

For a square matrix A, if:

A · v = λ · v

Then v is an eigenvector and λ is its eigenvalue.

Eigenvectors are directions that the transformation A does not rotate: vectors along them are only stretched or shrunk by the factor λ.

Why it matters:

  • PCA uses eigenvectors of the covariance matrix
  • Understanding model behavior and stability
  • Spectral methods in clustering
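
A short NumPy sketch that checks the defining equation A · v = λ · v:

import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                   # [2. 3.]

v = eigenvectors[:, 0]               # first eigenvector (a column of the result)
lam = eigenvalues[0]
print(np.allclose(A @ v, lam * v))   # True -- A only scales v, by lam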

10) Practical tips

  1. Shape tracking: Always know the dimensions of your arrays. A large share of bugs in ML code are shape mismatches.
  2. Broadcasting: NumPy/PyTorch automatically expand smaller arrays to match larger ones. Powerful, but it can silently hide shape bugs (see the sketch after this list).
  3. Column vs row: ML conventions vary. Be explicit about shapes.
  4. Numerical stability: Avoid dividing by very small numbers. Use library functions for inverses.
  5. Visualization: For 2D/3D, plot vectors to build intuition.
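
To illustrate tips 1 and 2, here is a small NumPy sketch where broadcasting quietly produces a matrix when a single number was probably intended:

import numpy as np

weights = np.array([0.2, -0.5, 1.3])             # shape (3,)
features_col = np.array([[4.0], [1.0], [2.0]])   # shape (3, 1) -- a column vector

# Broadcasting (3,) against (3, 1) silently produces a (3, 3) matrix
print((weights * features_col).shape)   # (3, 3) -- probably not what was intended

# Being explicit about shapes gives the intended weighted sum (a scalar)
print(weights @ features_col.ravel())   # ~2.9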

Key takeaways

  • Vectors represent data points; matrices represent datasets and transformations
  • Dot products are the core computation (similarity, predictions)
  • Norms measure size; used in loss functions and regularization
  • Matrix multiplication is dimension-sensitive: (m×n) × (n×p) = (m×p)
  • Eigenvectors reveal intrinsic structure (PCA, stability analysis)
