- 01 Vectors and Vector Spaces: The Language of Data 02 Matrices and Matrix Operations: Organizing Linear Computation 03 Systems of Linear Equations: From Geometry to Algorithms 04 Determinants: The Volume Factor of Linear Maps 05 Linear Transformations: Matrices as Functions 06 Inner Products, Norms, and Orthogonality: Measuring Geometry 07 Eigenvalues and Eigenvectors: The DNA of a Matrix 08 Matrix Decompositions: Breaking Matrices into Simpler Pieces 09 Linear Algebra in Machine Learning: Putting It All Together 10 Matrix Calculus: Derivatives for Machine Learning 11 Tensor Operations: Beyond Matrices 12 Sparse Matrices and Efficient Computation 13 Randomized Linear Algebra: Speed Through Randomness
Introduction
A linear system is a set of equations where each equation is linear in the unknowns. Solving linear systems is one of the most fundamental operations in computational science — and in machine learning, it appears everywhere: fitting a linear model, computing the normal equation, solving for network weights, and running iterative optimization.
This article covers how to solve systems systematically, what the solution set looks like geometrically, and when a solution does or does not exist. We build on the matrix operations from the previous article.
Setting Up the Problem
A system of linear equations in unknowns can be written as:
In matrix form, this becomes:
where is the coefficient matrix, is the unknown vector, and is the right-hand side.
Key insight: The equation asks: “Can be written as a linear combination of the columns of ?” The solution gives the coefficients of that combination.
Geometric Interpretation
There are two dual ways to visualize :
Row Picture
Each equation defines a hyperplane in . The solution is the intersection of all hyperplanes.
In 2D, each equation is a line. Two lines can:
- Intersect at one point (unique solution)
- Be parallel (no solution)
- Be the same line (infinitely many solutions)
Column Picture
The equation asks whether lies in the column space of — the span of the columns of .
If is in the column space, a solution exists. If not, no exact solution exists — but we can find the closest approximation via least squares, which is the foundation of linear regression.
Gaussian Elimination
Gaussian elimination is the systematic algorithm for solving linear systems. It uses three elementary row operations:
- Swap two rows:
- Scale a row: where
- Add a multiple of one row to another:
These operations do not change the solution set.
The Augmented Matrix
We work with the augmented matrix , which appends as an extra column:
Algorithm
- Start from the leftmost column.
- Find a nonzero entry in the current column (the pivot). If necessary, swap rows to bring it to the top.
- Use the pivot to eliminate all entries below it by subtracting appropriate multiples of the pivot row.
- Move to the next column and repeat.
- The result is row echelon form (REF).
- Optionally, continue to reduced row echelon form (RREF) by eliminating entries above each pivot and scaling pivots to 1.
Worked Example
Solve the system:
Form the augmented matrix and apply row operations:
This is row echelon form. Back-substitute:
The unique solution is .
import numpy as np
A = np.array([[1, 2, 1],
[2, 4, 3],
[3, 7, 4]])
b = np.array([9, 21, 30])
x = np.linalg.solve(A, b)
print(x) # [6. 0. 3.]
Row Echelon Form and Pivots
A matrix is in row echelon form (REF) if:
- All zero rows are at the bottom.
- The leading nonzero entry (pivot) in each row is strictly to the right of the pivot in the row above.
- All entries below a pivot are zero.
It is in reduced row echelon form (RREF) if additionally:
- Each pivot is 1.
- Each pivot is the only nonzero entry in its column.
The positions of the pivots determine everything about the solution:
- Pivot columns correspond to basic variables (determined by the system).
- Non-pivot columns correspond to free variables (can take any value).
The Three Cases
The number of solutions depends on the relationship between equations and unknowns:
| Case | Condition | Solution Set |
|---|---|---|
| Unique solution | One point | |
| No solution | Empty (inconsistent) | |
| Infinite solutions | Affine subspace |
This is the Rouché-Capelli theorem (also called the Kronecker-Capelli theorem).
Key insight: In ML, overdetermined systems (, more equations than unknowns) typically have no exact solution. Instead of solving exactly, we minimize — this is least squares regression.
Homogeneous Systems
A homogeneous system has :
The trivial solution always exists. Nontrivial solutions exist if and only if — that is, there are free variables.
The set of all solutions to is the null space (or kernel) of :
The null space is always a subspace of . Its dimension is called the nullity:
This is the rank-nullity theorem, one of the most important results in linear algebra.
The Four Fundamental Subspaces
Every matrix defines four subspaces:
| Subspace | Definition | Dimension |
|---|---|---|
| Column space | ||
| Row space | Span of the rows of | |
| Null space | ||
| Left null space |
These four subspaces satisfy orthogonal complement relationships:
- The row space and null space are orthogonal complements in
- The column space and left null space are orthogonal complements in
Geometric interpretation: When solving , the column space tells you which vectors are reachable. The null space tells you the directions you can move in -space without changing .
LU Decomposition
In practice, Gaussian elimination is implemented as LU decomposition: factoring into a lower triangular matrix and an upper triangular matrix :
Solving then becomes two triangular solves:
- Solve for (forward substitution)
- Solve for (back substitution)
Each triangular solve costs , and the factorization costs . The advantage: once you have the factorization, solving for a new only costs .
import numpy as np
from scipy.linalg import lu, solve_triangular
A = np.array([[2, 1, 1],
[4, 3, 3],
[8, 7, 9]])
P, L, U = lu(A)
# Solve Ax = b using LU
b = np.array([4, 10, 24])
y = solve_triangular(L, P.T @ b, lower=True)
x = solve_triangular(U, y, lower=False)
print(x) # [1. 1. 1.]
Computational Complexity
| Method | Cost | When to Use |
|---|---|---|
| Gaussian elimination | General dense systems | |
| LU decomposition | once, per solve | Multiple right-hand sides |
| Cholesky () | Symmetric positive definite | |
| Iterative methods (CG, GMRES) | per iteration | Large sparse systems |
For the massive systems that arise in ML (millions of parameters), direct methods are too expensive. Instead, we use iterative methods — especially gradient descent, which can be viewed as an iterative solver for the normal equations.
Why This Matters for ML
Linear systems are the computational engine behind many ML methods:
- Normal equation: The closed-form solution to linear regression is , which is solved as a linear system .
- Least squares: When has no solution, the least squares solution minimizes .
- Gaussian processes: Predictions require solving systems involving the kernel matrix.
- Newton’s method: Each step solves , where is the Hessian and is the gradient.
- Rank and redundancy: Rank-deficient systems signal multicollinearity in features, which destabilizes model fitting.
Summary
- A linear system asks whether is a linear combination of the columns of .
- Gaussian elimination transforms the system to row echelon form using elementary row operations.
- The system has a unique solution, no solution, or infinitely many solutions, determined by the ranks of and .
- The null space captures all directions of freedom in the solution.
- The four fundamental subspaces provide a complete geometric picture of any matrix.
- LU decomposition is the practical implementation of Gaussian elimination.
- In ML, overdetermined systems lead to least squares — the mathematical foundation of regression.
- Next, we explore determinants, which provide a single number that captures whether a system has a unique solution.
References
- Strang, G. (2016). Introduction to Linear Algebra (5th ed.). Wellesley-Cambridge Press. math.mit.edu/~gs/linearalgebra
- Trefethen, L. N., & Bau, D. (1997). Numerical Linear Algebra. SIAM.
- Boyd, S., & Vandenberghe, L. (2018). Introduction to Applied Linear Algebra. Cambridge University Press. vmls-book.stanford.edu