Linear Algebra Essentials for ML
Lesson, slides, and applied problem sets.
Why this module exists
Machine learning is fundamentally about mathematical transformations on data. Linear algebra provides the language and tools for these transformations. Whether you're computing predictions, training neural networks, or reducing dimensions, you're doing linear algebra.
This isn't abstract math for its own sake. Every vector is a data point. Every matrix is a transformation or a dataset. Understanding these concepts viscerally will make everything in ML click.
1) Vectors: The fundamental unit
A vector is an ordered list of numbers. In ML, a vector typically represents:
- A single data point (features of one sample)
- A direction in space
- Parameters of a model (weights)
# A 3-dimensional vector
x = [1, 2, 3]
# In ML context: a sample with 3 features (example values for one person)
height, weight, age = 170.0, 65.0, 30
sample = [height, weight, age]
Notation: Vectors are usually treated as column vectors and written in bold (x) or with an arrow over the symbol. The dimension of a vector is the number of elements it contains.
2) Vector operations
Addition and Subtraction
Element-wise operations on vectors of the same dimension:
a = [1, 2, 3]
b = [4, 5, 6]
a + b = [5, 7, 9]
a - b = [-3, -3, -3]
Scalar Multiplication
Multiply each element by a scalar:
c = 2
c * a = [2, 4, 6]
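In practice these operations are written with NumPy arrays, which apply them element-wise. A minimal sketch with illustrative values:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)    # [5 7 9]
print(a - b)    # [-3 -3 -3]
print(2 * a)    # [2 4 6]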
Dot Product
The most important operation in ML. Produces a scalar:
a · b = a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
= 1*4 + 2*5 + 3*6 = 32
The dot product measures:
- Similarity between vectors (higher = more aligned)
- Projection of one vector onto another
- Weighted sum (core of neural networks!)
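A sketch of the same computation in NumPy; np.dot and the @ operator give the same result for 1-D arrays, and the weights/features values below are purely illustrative:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a, b))   # 32
print(a @ b)          # 32, same result
# Weighted sum: what a single neuron computes (before bias and activation)
weights = np.array([0.2, 0.5, 0.3])
features = np.array([1.0, 2.0, 3.0])
print(weights @ features)   # 0.2*1 + 0.5*2 + 0.3*3 = 2.1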
3) Vector norms (magnitude)
The L2 norm (Euclidean norm) measures vector length:
||x||₂ = sqrt(x[0]² + x[1]² + ... + x[n-1]²)
||[3, 4]||₂ = sqrt(9 + 16) = 5
The L1 norm (Manhattan norm) is the sum of absolute values:
||x||₁ = |x[0]| + |x[1]| + ... + |x[n-1]|
||[3, -4]||₁ = 3 + 4 = 7
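Both norms are available through np.linalg.norm; a short sketch matching the examples above:
import numpy as np
x = np.array([3, -4])
print(np.linalg.norm(x, ord=2))   # 5.0  (L2 / Euclidean norm)
print(np.linalg.norm(x, ord=1))   # 7.0  (L1 / Manhattan norm)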
Why it matters: Norms appear in loss functions, regularization, and distance calculations.
4) Matrices: Collections and transformations
A matrix is a 2D array of numbers. Dimensions are rows × columns.
A = [[1, 2, 3],
     [4, 5, 6]]   # 2×3 matrix
In ML, matrices represent:
- Datasets (rows = samples, columns = features)
- Transformations (weight matrices in neural networks)
- Covariance and correlation structures
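A small sketch of the dataset convention (rows = samples, columns = features), using illustrative numbers:
import numpy as np
# 2 samples, 3 features each  ->  shape (2, 3)
A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(A.shape)   # (2, 3)
print(A[0])      # first sample: [1 2 3]
print(A[:, 1])   # second feature across all samples: [2 5]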
5) Matrix operations
Transpose
Swap rows and columns: A^T
A = [[1, 2],
     [3, 4]]
A^T = [[1, 3],
       [2, 4]]
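In NumPy the transpose is the .T attribute:
import numpy as np
A = np.array([[1, 2],
              [3, 4]])
print(A.T)
# [[1 3]
#  [2 4]]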
Matrix-Vector Multiplication
A matrix times a vector produces a new vector:
A · x = [sum(A[i,:] * x) for each row i]
This is the core of linear models: prediction = weights · features
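A minimal sketch of that prediction step; the data matrix X and weight vector w below are made-up illustrative values, not a fitted model:
import numpy as np
# 3 samples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])   # one weight per feature (illustrative)
predictions = X @ w         # one prediction per sample
print(predictions)          # [-1.5 -2.5 -3.5]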
Matrix-Matrix Multiplication
For A (m×n) and B (n×p), the result AB is (m×p):
(AB)[i,j] = sum(A[i,k] * B[k,j] for k in 0..n-1)
Key rule: the number of columns of A must equal the number of rows of B.
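A sketch of how the shapes compose: a (2×3) matrix times a (3×2) matrix gives a (2×2) result, with illustrative entries:
import numpy as np
A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2×3
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])           # 3×2
C = A @ B                        # 2×2
print(C)
# [[ 4  5]
#  [10 11]]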
6) Special matrices
Identity Matrix (I)
Square matrix with 1s on diagonal, 0s elsewhere. Multiplying by I leaves a matrix unchanged.
I = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
Diagonal Matrix
Only diagonal elements are non-zero. Efficient for scaling.
Symmetric Matrix
A = A^T. Covariance matrices are always symmetric.
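A short sketch of these three kinds of matrices in NumPy, with illustrative values:
import numpy as np
I = np.eye(3)                    # 3×3 identity
D = np.diag([2.0, 0.5, 1.0])     # diagonal matrix: scales each axis
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.allclose(A, A.T))       # True: A is symmetric
x = np.array([1.0, 2.0, 3.0])
print(I @ x)                     # unchanged: [1. 2. 3.]
print(D @ x)                     # scaled per component: [2. 1. 3.]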
7) Linear combinations and span
A linear combination of vectors v₁, v₂, ..., vₙ is:
c₁v₁ + c₂v₂ + ... + cₙvₙ
The span of vectors is all possible linear combinations. This tells you what space the vectors can reach.
Why it matters: Neural networks compute linear combinations at each layer (before activation).
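A minimal sketch of a linear combination and the point it reaches, using two illustrative basis-like vectors:
import numpy as np
v1 = np.array([1.0, 0.0])
v2 = np.array([0.0, 1.0])
# 3*v1 + 2*v2 reaches the point (3, 2); since v1 and v2 are
# independent, their span is all of 2-D space.
c1, c2 = 3.0, 2.0
print(c1 * v1 + c2 * v2)   # [3. 2.]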
8) Linear independence and rank
Vectors are linearly independent if none can be written as a combination of the others.
The rank of a matrix is the number of linearly independent columns (or rows).
- Full rank: maximum possible (all columns independent)
- Rank-deficient: redundant information, singularity issues
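NumPy computes rank with np.linalg.matrix_rank; a sketch contrasting a full-rank matrix with a rank-deficient one (values are illustrative):
import numpy as np
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])       # second row = 2 × first row
print(np.linalg.matrix_rank(A))  # 2: full rank
print(np.linalg.matrix_rank(B))  # 1: rank-deficient (redundant row)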
9) Eigenvalues and eigenvectors (intuition)
For a square matrix A, if:
A · v = λ · v
Then v is an eigenvector and λ is its eigenvalue.
Eigenvectors are directions that the transformation A leaves unchanged except for scaling by λ.
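np.linalg.eig returns the eigenvalues and eigenvectors; a sketch verifying A · v = λ · v on a small illustrative matrix:
import numpy as np
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                   # [2. 3.]
# Check the defining property for the first eigenpair
v = eigenvectors[:, 0]
lam = eigenvalues[0]
print(np.allclose(A @ v, lam * v))   # True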
Why it matters:
- PCA uses eigenvectors of the covariance matrix
- Understanding model behavior and stability
- Spectral methods in clustering
10) Practical tips
- Shape tracking: Always know your dimensions. Most bugs are shape mismatches.
- Broadcasting: NumPy/PyTorch extend smaller arrays to match larger ones. Powerful but can hide bugs (see the sketch after this list).
- Column vs row: ML conventions vary. Be explicit about shapes.
- Numerical stability: Avoid dividing by very small numbers. Use library functions for inverses.
- Visualization: For 2D/3D, plot vectors to build intuition.
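A sketch of broadcasting doing useful work (centering each feature) and of the silent shape expansion that can hide bugs; the arrays are illustrative:
import numpy as np
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])        # (3, 2): 3 samples, 2 features
col_means = X.mean(axis=0)        # shape (2,): one mean per feature
X_centered = X - col_means        # broadcast across rows: intended
row_vec = np.array([1.0, 2.0])             # shape (2,)
col_vec = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1)
print((row_vec + col_vec).shape)  # (3, 2): silently expands, easy to miss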
Key takeaways
- Vectors represent data points; matrices represent datasets and transformations
- Dot products are the core computation (similarity, predictions)
- Norms measure size; used in loss functions and regularization
- Matrix multiplication is dimension-sensitive: (m×n) × (n×p) = (m×p)
- Eigenvectors reveal intrinsic structure (PCA, stability analysis)