Systems of Linear Equations: From Geometry to Algorithms

Solve linear systems with Gaussian elimination, understand solution geometry through row echelon form, and connect to ML applications.

Linear Algebra March 6, 2026 9 min read

Introduction

A linear system is a set of equations where each equation is linear in the unknowns. Solving linear systems is one of the most fundamental operations in computational science — and in machine learning, it appears everywhere: fitting a linear model, computing the normal equation, solving for network weights, and running iterative optimization.

This article covers how to solve systems systematically, what the solution set looks like geometrically, and when a solution does or does not exist. We build on the matrix operations from the previous article.

Setting Up the Problem

A system of mm linear equations in nn unknowns can be written as:

a11x1+a12x2++a1nxn=b1a21x1+a22x2++a2nxn=b2  am1x1+am2x2++amnxn=bm\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\[6pt] a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\[6pt] &\;\vdots \\[6pt] a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}

In matrix form, this becomes:

Ax=b\mathbf{A}\mathbf{x} = \mathbf{b}

where ARm×n\mathbf{A} \in \mathbb{R}^{m \times n} is the coefficient matrix, xRn\mathbf{x} \in \mathbb{R}^n is the unknown vector, and bRm\mathbf{b} \in \mathbb{R}^m is the right-hand side.

Key insight: The equation Ax=b\mathbf{A}\mathbf{x} = \mathbf{b} asks: “Can b\mathbf{b} be written as a linear combination of the columns of A\mathbf{A}?” The solution x\mathbf{x} gives the coefficients of that combination.

Geometric Interpretation

There are two dual ways to visualize Ax=b\mathbf{A}\mathbf{x} = \mathbf{b}:

Row Picture

Each equation defines a hyperplane in Rn\mathbb{R}^n. The solution is the intersection of all hyperplanes.

In 2D, each equation is a line. Two lines can:

  • Intersect at one point (unique solution)
  • Be parallel (no solution)
  • Be the same line (infinitely many solutions)

Column Picture

The equation asks whether b\mathbf{b} lies in the column space of A\mathbf{A} — the span of the columns of A\mathbf{A}.

If b\mathbf{b} is in the column space, a solution exists. If not, no exact solution exists — but we can find the closest approximation via least squares, which is the foundation of linear regression.

Gaussian Elimination

Gaussian elimination is the systematic algorithm for solving linear systems. It uses three elementary row operations:

  1. Swap two rows: RiRjR_i \leftrightarrow R_j
  2. Scale a row: RicRiR_i \leftarrow cR_i where c0c \neq 0
  3. Add a multiple of one row to another: RiRi+cRjR_i \leftarrow R_i + cR_j

These operations do not change the solution set.

The Augmented Matrix

We work with the augmented matrix [Ab][\mathbf{A} \mid \mathbf{b}], which appends b\mathbf{b} as an extra column:

[Ab]=[a11a12a13b1a21a22a23b2a31a32a33b3][\mathbf{A} \mid \mathbf{b}] = \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{array}\right]

Algorithm

  1. Start from the leftmost column.
  2. Find a nonzero entry in the current column (the pivot). If necessary, swap rows to bring it to the top.
  3. Use the pivot to eliminate all entries below it by subtracting appropriate multiples of the pivot row.
  4. Move to the next column and repeat.
  5. The result is row echelon form (REF).
  6. Optionally, continue to reduced row echelon form (RREF) by eliminating entries above each pivot and scaling pivots to 1.

Worked Example

Solve the system:

x+2y+z=92x+4y+3z=213x+7y+4z=30\begin{aligned} x + 2y + z &= 9 \\[6pt] 2x + 4y + 3z &= 21 \\[6pt] 3x + 7y + 4z &= 30 \end{aligned}

Form the augmented matrix and apply row operations:

[12192432137430]R22R1[1219001337430]\left[\begin{array}{ccc|c} 1 & 2 & 1 & 9 \\ 2 & 4 & 3 & 21 \\ 3 & 7 & 4 & 30 \end{array}\right] \xrightarrow{R_2 - 2R_1} \left[\begin{array}{ccc|c} 1 & 2 & 1 & 9 \\ 0 & 0 & 1 & 3 \\ 3 & 7 & 4 & 30 \end{array}\right] R33R1[121900130113]R2R3[121901130013]\xrightarrow{R_3 - 3R_1} \left[\begin{array}{ccc|c} 1 & 2 & 1 & 9 \\ 0 & 0 & 1 & 3 \\ 0 & 1 & 1 & 3 \end{array}\right] \xrightarrow{R_2 \leftrightarrow R_3} \left[\begin{array}{ccc|c} 1 & 2 & 1 & 9 \\ 0 & 1 & 1 & 3 \\ 0 & 0 & 1 & 3 \end{array}\right]

This is row echelon form. Back-substitute:

z=3y+z=3    y=0x+2y+z=9    x=6\begin{aligned} z &= 3 \\[6pt] y + z &= 3 \implies y = 0 \\[6pt] x + 2y + z &= 9 \implies x = 6 \end{aligned}

The unique solution is x=[6,0,3]T\mathbf{x} = [6, 0, 3]^T.

import numpy as np

A = np.array([[1, 2, 1],
              [2, 4, 3],
              [3, 7, 4]])
b = np.array([9, 21, 30])

x = np.linalg.solve(A, b)
print(x)  # [6. 0. 3.]

Row Echelon Form and Pivots

A matrix is in row echelon form (REF) if:

  1. All zero rows are at the bottom.
  2. The leading nonzero entry (pivot) in each row is strictly to the right of the pivot in the row above.
  3. All entries below a pivot are zero.

It is in reduced row echelon form (RREF) if additionally:

  1. Each pivot is 1.
  2. Each pivot is the only nonzero entry in its column.

The positions of the pivots determine everything about the solution:

  • Pivot columns correspond to basic variables (determined by the system).
  • Non-pivot columns correspond to free variables (can take any value).

The Three Cases

The number of solutions depends on the relationship between equations and unknowns:

CaseConditionSolution Set
Unique solutionrank(A)=rank([Ab])=n\text{rank}(\mathbf{A}) = \text{rank}([\mathbf{A} \mid \mathbf{b}]) = nOne point
No solutionrank(A)<rank([Ab])\text{rank}(\mathbf{A}) < \text{rank}([\mathbf{A} \mid \mathbf{b}])Empty (inconsistent)
Infinite solutionsrank(A)=rank([Ab])<n\text{rank}(\mathbf{A}) = \text{rank}([\mathbf{A} \mid \mathbf{b}]) < nAffine subspace

This is the Rouché-Capelli theorem (also called the Kronecker-Capelli theorem).

Key insight: In ML, overdetermined systems (m>nm > n, more equations than unknowns) typically have no exact solution. Instead of solving Ax=b\mathbf{A}\mathbf{x} = \mathbf{b} exactly, we minimize Axb2\|\mathbf{A}\mathbf{x} - \mathbf{b}\|^2 — this is least squares regression.

Homogeneous Systems

A homogeneous system has b=0\mathbf{b} = \mathbf{0}:

Ax=0\mathbf{A}\mathbf{x} = \mathbf{0}

The trivial solution x=0\mathbf{x} = \mathbf{0} always exists. Nontrivial solutions exist if and only if rank(A)<n\text{rank}(\mathbf{A}) < n — that is, there are free variables.

The set of all solutions to Ax=0\mathbf{A}\mathbf{x} = \mathbf{0} is the null space (or kernel) of A\mathbf{A}:

null(A)={xRn:Ax=0}\text{null}(\mathbf{A}) = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{A}\mathbf{x} = \mathbf{0}\}

The null space is always a subspace of Rn\mathbb{R}^n. Its dimension is called the nullity:

nullity(A)=nrank(A)\text{nullity}(\mathbf{A}) = n - \text{rank}(\mathbf{A})

This is the rank-nullity theorem, one of the most important results in linear algebra.

The Four Fundamental Subspaces

Every matrix ARm×n\mathbf{A} \in \mathbb{R}^{m \times n} defines four subspaces:

SubspaceDefinitionDimension
Column space C(A)C(\mathbf{A}){Ax:xRn}\{\mathbf{A}\mathbf{x} : \mathbf{x} \in \mathbb{R}^n\}r=rank(A)r = \text{rank}(\mathbf{A})
Row space C(AT)C(\mathbf{A}^T)Span of the rows of A\mathbf{A}rr
Null space N(A)N(\mathbf{A}){x:Ax=0}\{\mathbf{x} : \mathbf{A}\mathbf{x} = \mathbf{0}\}nrn - r
Left null space N(AT)N(\mathbf{A}^T){y:ATy=0}\{\mathbf{y} : \mathbf{A}^T\mathbf{y} = \mathbf{0}\}mrm - r

These four subspaces satisfy orthogonal complement relationships:

  • The row space and null space are orthogonal complements in Rn\mathbb{R}^n
  • The column space and left null space are orthogonal complements in Rm\mathbb{R}^m

Geometric interpretation: When solving Ax=b\mathbf{A}\mathbf{x} = \mathbf{b}, the column space tells you which b\mathbf{b} vectors are reachable. The null space tells you the directions you can move in x\mathbf{x}-space without changing Ax\mathbf{A}\mathbf{x}.

LU Decomposition

In practice, Gaussian elimination is implemented as LU decomposition: factoring A\mathbf{A} into a lower triangular matrix L\mathbf{L} and an upper triangular matrix U\mathbf{U}:

A=LU\mathbf{A} = \mathbf{L}\mathbf{U}

Solving Ax=b\mathbf{A}\mathbf{x} = \mathbf{b} then becomes two triangular solves:

  1. Solve Ly=b\mathbf{L}\mathbf{y} = \mathbf{b} for y\mathbf{y} (forward substitution)
  2. Solve Ux=y\mathbf{U}\mathbf{x} = \mathbf{y} for x\mathbf{x} (back substitution)

Each triangular solve costs O(n2)O(n^2), and the factorization costs O(n3)O(n^3). The advantage: once you have the factorization, solving for a new b\mathbf{b} only costs O(n2)O(n^2).

import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[2, 1, 1],
              [4, 3, 3],
              [8, 7, 9]])

P, L, U = lu(A)

# Solve Ax = b using LU
b = np.array([4, 10, 24])
y = solve_triangular(L, P.T @ b, lower=True)
x = solve_triangular(U, y, lower=False)
print(x)  # [1. 1. 1.]

Computational Complexity

MethodCostWhen to Use
Gaussian eliminationO(n3)O(n^3)General dense systems
LU decompositionO(n3)O(n^3) once, O(n2)O(n^2) per solveMultiple right-hand sides
Cholesky (A=LLT\mathbf{A} = \mathbf{L}\mathbf{L}^T)O(n3/3)O(n^3/3)Symmetric positive definite
Iterative methods (CG, GMRES)O(nk)O(nk) per iterationLarge sparse systems

For the massive systems that arise in ML (millions of parameters), direct methods are too expensive. Instead, we use iterative methods — especially gradient descent, which can be viewed as an iterative solver for the normal equations.

Why This Matters for ML

Linear systems are the computational engine behind many ML methods:

  • Normal equation: The closed-form solution to linear regression is w^=(XTX)1XTy\hat{\mathbf{w}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}, which is solved as a linear system (XTX)w^=XTy(\mathbf{X}^T\mathbf{X})\hat{\mathbf{w}} = \mathbf{X}^T\mathbf{y}.
  • Least squares: When Ax=b\mathbf{A}\mathbf{x} = \mathbf{b} has no solution, the least squares solution minimizes Axb2\|\mathbf{A}\mathbf{x} - \mathbf{b}\|^2.
  • Gaussian processes: Predictions require solving systems involving the kernel matrix.
  • Newton’s method: Each step solves HΔx=g\mathbf{H}\Delta\mathbf{x} = -\mathbf{g}, where H\mathbf{H} is the Hessian and g\mathbf{g} is the gradient.
  • Rank and redundancy: Rank-deficient systems signal multicollinearity in features, which destabilizes model fitting.

Summary

  • A linear system Ax=b\mathbf{A}\mathbf{x} = \mathbf{b} asks whether b\mathbf{b} is a linear combination of the columns of A\mathbf{A}.
  • Gaussian elimination transforms the system to row echelon form using elementary row operations.
  • The system has a unique solution, no solution, or infinitely many solutions, determined by the ranks of A\mathbf{A} and [Ab][\mathbf{A} \mid \mathbf{b}].
  • The null space captures all directions of freedom in the solution.
  • The four fundamental subspaces provide a complete geometric picture of any matrix.
  • LU decomposition is the practical implementation of Gaussian elimination.
  • In ML, overdetermined systems lead to least squares — the mathematical foundation of regression.
  • Next, we explore determinants, which provide a single number that captures whether a system has a unique solution.

References

  • Strang, G. (2016). Introduction to Linear Algebra (5th ed.). Wellesley-Cambridge Press. math.mit.edu/~gs/linearalgebra
  • Trefethen, L. N., & Bau, D. (1997). Numerical Linear Algebra. SIAM.
  • Boyd, S., & Vandenberghe, L. (2018). Introduction to Applied Linear Algebra. Cambridge University Press. vmls-book.stanford.edu

Keyboard Shortcuts

Navigation
j
Next heading
k
Previous heading
n
Next article in series
p
Previous article in series
t
Scroll to top
Actions
r
Toggle reading mode
Ctrl K
Search articles
?
Toggle this help
Esc
Close overlay