Limits and Continuity: The Foundation of Calculus

Understand limits, continuity, and the epsilon-delta definition — the bedrock on which derivatives, integrals, and optimization are built.

Calculus & Optimization March 7, 2026 9 min read

Why Limits Matter

Every central idea in calculus — derivatives, integrals, series — is defined through a limit. The derivative is a limit of difference quotients. The integral is a limit of Riemann sums. Without limits, none of these concepts have rigorous meaning.

For machine learning, limits are not just abstract formalism. Gradient descent relies on derivatives, which rely on limits. Convergence guarantees for optimization algorithms are statements about limits of sequences. Even the universal approximation theorem is a limit-based result. Understanding limits gives you the language to reason about why training works, not just how.

Intuitive Idea of a Limit

A limit describes the value a function approaches as its input approaches some target — even if the function never actually reaches that value.

Consider the function:

$$f(x) = \frac{x^2 - 1}{x - 1}$$

At $x = 1$, this function is undefined (division by zero). But what happens as $x$ gets close to 1?

| $x$   | $f(x)$ |
|-------|--------|
| 0.9   | 1.9    |
| 0.99  | 1.99   |
| 0.999 | 1.999  |
| 1.001 | 2.001  |
| 1.01  | 2.01   |
| 1.1   | 2.1    |

The values approach 2 from both sides. We write:

$$\lim_{x \to 1} \frac{x^2 - 1}{x - 1} = 2$$

Algebraically, this is clear: $x^2 - 1 = (x-1)(x+1)$, so for $x \neq 1$, $f(x) = x + 1$, and $x + 1 \to 2$ as $x \to 1$.
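As a quick numerical check, the table can be regenerated in a few lines of Python. This is a minimal sketch; the function name `f` simply mirrors the notation above.

```python
def f(x: float) -> float:
    """f(x) = (x^2 - 1) / (x - 1); raises ZeroDivisionError at x = 1."""
    return (x**2 - 1) / (x - 1)

# Approach x = 1 from the left (x < 1) and from the right (x > 1).
sample_points = [0.9, 0.99, 0.999, 1.001, 1.01, 1.1]
for x in sample_points:
    print(f"x = {x:<6} f(x) = {f(x):.6f}")
```

Calling `f(1.0)` itself raises `ZeroDivisionError`, matching the fact that the function is undefined at 1 even though the limit is 2. Pushing $x$ extremely close to 1 eventually makes floating-point rounding in $x^2 - 1$ visible, so a numerical table suggests a limit rather than proving one.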

Key insight: A limit describes the tendency of a function near a point, not the function’s value at that point. The function does not even need to be defined at the target point for the limit to exist.

The Formal Definition

The intuitive idea of “approaching” needs to be made precise. The epsilon-delta definition, formalized by Karl Weierstrass in the 19th century, does exactly that.

We say $\lim_{x \to a} f(x) = L$ if:

$$\forall \, \epsilon > 0, \; \exists \, \delta > 0 \text{ such that } 0 < |x - a| < \delta \implies |f(x) - L| < \epsilon$$

In plain language: no matter how small a tolerance $\epsilon$ you demand around $L$, I can find a neighborhood of radius $\delta$ around $a$ such that every $x$ in that neighborhood (except $a$ itself) maps to within $\epsilon$ of $L$.

Worked Example

Let us prove that $\lim_{x \to 3} (2x + 1) = 7$.

Let $f(x) = 2x + 1$. We need: given any $\epsilon > 0$, find $\delta > 0$ such that $0 < |x - 3| < \delta \implies |f(x) - 7| < \epsilon$.

$$\begin{aligned} |f(x) - 7| &= |(2x + 1) - 7| \\[6pt] &= |2x - 6| \\[6pt] &= 2|x - 3| \end{aligned}$$

We want $2|x - 3| < \epsilon$, which means $|x - 3| < \epsilon / 2$. So choosing $\delta = \epsilon / 2$ works: for any $\epsilon > 0$, whenever $0 < |x - 3| < \delta = \epsilon/2$, we get $|f(x) - 7| = 2|x - 3| < 2 \cdot \epsilon/2 = \epsilon$.
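The choice $\delta = \epsilon/2$ can be spot-checked numerically. The sketch below samples points in the punctured neighborhood $0 < |x - 3| < \delta$ and confirms $|f(x) - 7| < \epsilon$ at each one. The helper name `holds_on_samples` is hypothetical, and a finite sample can only expose a bad $\delta$; it never replaces the proof.

```python
def f(x: float) -> float:
    return 2 * x + 1

def holds_on_samples(eps: float, delta: float, n: int = 1000) -> bool:
    """Check 0 < |x - 3| < delta  =>  |f(x) - 7| < eps on sample points.

    Samples are x = 3 - t * delta and x = 3 + t * delta for t = i / n with 0 < t < 1,
    so every sample satisfies 0 < |x - 3| < delta. A sanity check, not a proof.
    """
    for i in range(1, n):
        t = i / n
        for x in (3 - t * delta, 3 + t * delta):
            if not abs(f(x) - 7) < eps:
                return False
    return True

# Middle column: the proof's delta = eps / 2. Last column: a too-large delta = eps.
for eps in (1.0, 0.1, 1e-3, 1e-6):
    print(eps, holds_on_samples(eps, delta=eps / 2), holds_on_samples(eps, delta=eps))
```

The middle column is `True` for every tolerance. The last column is `False`, because with $\delta = \epsilon$ points with $|x - 3|$ just under $\epsilon$ give $|f(x) - 7| = 2|x - 3|$, which is close to $2\epsilon$.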

One-Sided Limits

Sometimes a function approaches different values from the left and right. The left-hand limit $\lim_{x \to a^-} f(x)$ considers only $x < a$, while the right-hand limit $\lim_{x \to a^+} f(x)$ considers only $x > a$.

The two-sided limit exists if and only if both one-sided limits exist and are equal:

$$\lim_{x \to a} f(x) = L \iff \lim_{x \to a^-} f(x) = L \text{ and } \lim_{x \to a^+} f(x) = L$$
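To see one-sided limits disagree, consider $g(x) = |x|/x$ near $a = 0$. This example is chosen for illustration and does not appear above. The function equals $-1$ for every $x < 0$ and $1$ for every $x > 0$. So the left-hand limit is $-1$, the right-hand limit is $1$, and the two-sided limit does not exist. A minimal Python sketch that approaches 0 from each side:

```python
def g(x: float) -> float:
    """g(x) = |x| / x, undefined at x = 0."""
    return abs(x) / x

steps = [10.0**-k for k in range(1, 7)]  # h = 0.1, 0.01, ..., 1e-6
left = [g(0.0 - h) for h in steps]       # x -> 0 from below (x < 0)
right = [g(0.0 + h) for h in steps]      # x -> 0 from above (x > 0)
print(left)   # [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0]
print(right)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```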

Computing Limits

Direct Substitution

If $f$ is a “nice” function (polynomial, rational with nonzero denominator, exponential, etc.), simply plug in:

$$\lim_{x \to a} f(x) = f(a)$$

This works whenever $f$ is continuous at $a$, a concept we formalize below.

Algebraic Manipulation

When direct substitution yields $0/0$ (an indeterminate form), simplify first:

$$\lim_{x \to 4} \frac{x^2 - 16}{x - 4} = \lim_{x \to 4} \frac{(x-4)(x+4)}{x-4} = \lim_{x \to 4} (x + 4) = 8$$

Besides factoring, common algebraic techniques include rationalizing (multiplying by the conjugate) and combining fractions over a common denominator.
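As a worked example of rationalizing (chosen for illustration), direct substitution into $\frac{\sqrt{x+1} - 1}{x}$ at $x = 0$ gives $0/0$. Multiplying numerator and denominator by the conjugate $\sqrt{x+1} + 1$ cancels the troublesome factor of $x$:

$$\lim_{x \to 0} \frac{\sqrt{x+1} - 1}{x} = \lim_{x \to 0} \frac{(\sqrt{x+1} - 1)(\sqrt{x+1} + 1)}{x(\sqrt{x+1} + 1)} = \lim_{x \to 0} \frac{x}{x(\sqrt{x+1} + 1)} = \lim_{x \to 0} \frac{1}{\sqrt{x+1} + 1} = \frac{1}{2}$$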

Squeeze Theorem

If $g(x) \leq f(x) \leq h(x)$ for all $x$ near $a$ (except possibly at $a$ itself), and $\lim_{x \to a} g(x) = \lim_{x \to a} h(x) = L$, then $\lim_{x \to a} f(x) = L$.

Example: $\lim_{x \to 0} x^2 \sin(1/x) = 0$, because $-x^2 \leq x^2 \sin(1/x) \leq x^2$ for all $x \neq 0$ and both bounds go to 0.
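The squeeze bound also yields an explicit $\epsilon$-$\delta$ argument in the style of the worked example. Since $|\sin(1/x)| \le 1$, choosing $\delta = \sqrt{\epsilon}$ works:

$$0 < |x - 0| < \delta = \sqrt{\epsilon} \implies \left|x^2 \sin(1/x) - 0\right| = x^2\,|\sin(1/x)| \le x^2 < \delta^2 = \epsilon$$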

Limits at Infinity

The behavior of $f(x)$ as $x \to \infty$ tells us about long-term trends. The key principle is dominant term analysis: the fastest-growing term determines the limit. For a rational function, divide the numerator and denominator by the highest power of $x$ in the denominator:

$$\lim_{x \to \infty} \frac{3x^2 + 5x}{x^2 + 1} = \lim_{x \to \infty} \frac{3 + 5/x}{1 + 1/x^2} = 3$$
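Evaluating this rational function at increasing $x$ shows the approach to 3 numerically. Subtracting the limit gives $\frac{3x^2 + 5x}{x^2 + 1} - 3 = \frac{5x - 3}{x^2 + 1}$, so the gap shrinks roughly like $5/x$. The sketch below prints both the gap and that closed form. The function names `r` and `gap_exact` are illustrative.

```python
def r(x: float) -> float:
    """r(x) = (3x^2 + 5x) / (x^2 + 1), which tends to 3 as x -> infinity."""
    return (3 * x**2 + 5 * x) / (x**2 + 1)

def gap_exact(x: float) -> float:
    """Closed form of r(x) - 3, obtained by putting both terms over x^2 + 1."""
    return (5 * x - 3) / (x**2 + 1)

for x in (1e1, 1e2, 1e3, 1e6):
    print(f"x = {x:.0e}  r(x) = {r(x):.8f}  r(x) - 3 = {r(x) - 3:.3e}  (5x - 3)/(x^2 + 1) = {gap_exact(x):.3e}")
```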

Growth rate hierarchy (slowest to fastest), where $f \ll g$ means $f(x)/g(x) \to 0$ as $x \to \infty$:

$$\ln x \ll x^p \ll b^x \ll x! \ll x^x \quad \text{for any } p > 0, \ b > 1$$

This hierarchy matters in ML when analyzing algorithm complexity: an $O(n \log n)$ algorithm is asymptotically faster than an $O(n^2)$ one, because $(n \log n)/n^2 = (\log n)/n \to 0$.
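The hierarchy can be checked numerically by tabulating the ratio of each function to the next one up. If the slower function is truly dominated, the ratio heads toward 0. The sketch below uses $x^2$ and $2^x$ as the representative power and exponential functions; these are arbitrary illustrative choices. Python's arbitrary-precision integers keep $x!$ and $x^x$ exact until the final division.

```python
import math

# Ratio (slower / faster) for each adjacent pair in the hierarchy.
ratios = {
    "ln x / x^2": lambda x: math.log(x) / x**2,
    "x^2 / 2^x": lambda x: x**2 / 2**x,
    "2^x / x!": lambda x: 2**x / math.factorial(x),
    "x! / x^x": lambda x: math.factorial(x) / x**x,
}

xs = [10, 20, 40, 80]
for name, ratio in ratios.items():
    values = ", ".join(f"{ratio(x):.2e}" for x in xs)
    print(f"{name:<11} {values}")
```

Every row decreases toward 0. A finite table only suggests the trend, however; the hierarchy itself is a statement about limits as $x \to \infty$.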

L’Hopital’s Rule (Preview)

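When substitution gives $0/0$ or $\infty/\infty$, and $f$ and $g$ are differentiable near $a$ with $g'(x) \neq 0$ there (except possibly at $a$ itself), L’Hopital’s rule states:

$$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}$$

provided the limit on the right side of this equation exists (as a finite number or $\pm\infty$). This requires derivatives, which we cover next.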
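For instance (an illustrative example that previews derivatives), $\ln x / x$ is an $\infty/\infty$ form as $x \to \infty$. The rule also applies as $x \to \infty$, and it confirms the first step of the growth hierarchy:

$$\lim_{x \to \infty} \frac{\ln x}{x} = \lim_{x \to \infty} \frac{1/x}{1} = 0$$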

Continuity

A function is continuous at a point $a$ if its limit at $a$ equals its value there: no jumps, holes, or breaks at $a$.

Formally, $f$ is continuous at $a$ if three conditions hold:

  1. $f(a)$ is defined
  2. $\lim_{x \to a} f(x)$ exists
  3. $\lim_{x \to a} f(x) = f(a)$

If $f$ is continuous at every point in an interval, we say $f$ is continuous on that interval.

Types of Discontinuities

When continuity fails, it fails in one of three ways:

  • Removable discontinuity: The limit exists, but $f(a)$ is either undefined or different from it. We can “fill in the hole.” Example: $f(x) = (x^2 - 1)/(x - 1)$ at $x = 1$.

  • Jump discontinuity: Left and right limits both exist but differ. Example: the step function $f(x) = \begin{cases} 0 & x < 0 \\ 1 & x \geq 0 \end{cases}$ at $x = 0$.

  • Essential (infinite or oscillating) discontinuity: At least one one-sided limit fails to exist, because the function blows up or oscillates. Examples: $f(x) = 1/x$ at $x = 0$ (blows up) and $f(x) = \sin(1/x)$ at $x = 0$ (oscillates).

Properties of Continuous Functions

Continuous functions on closed intervals have powerful guarantees:

Intermediate Value Theorem (IVT): If $f$ is continuous on $[a, b]$ and $c$ lies strictly between $f(a)$ and $f(b)$, then there exists some $x^* \in (a, b)$ with $f(x^*) = c$.

Intuition: A continuous function cannot jump over a value. If it starts below $c$ and ends above $c$, it must cross $c$ somewhere. This is the mathematical reason the bisection method works for root-finding.
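Bisection turns the IVT into an algorithm: keep an interval on which $f$ changes sign, halve it, and keep the half that still changes sign. The following is a minimal sketch, assuming $f$ is continuous and $f(a)$, $f(b)$ have opposite signs. The function name `bisection_root` and the tolerance default are illustrative, not a library API.

```python
from typing import Callable

def bisection_root(f: Callable[[float], float], a: float, b: float, tol: float = 1e-12) -> float:
    """Return a point within tol of a root of f in [a, b].

    Assumes f is continuous on [a, b] and f(a), f(b) have opposite signs, so the
    IVT guarantees a root in between. Each step keeps the half-interval whose
    endpoints still have opposite signs, so a root stays bracketed."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if (fa < 0) == (fb < 0):
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if (fm < 0) == (fa < 0):
            a, fa = m, fm   # f(m) has the sign of f(a): the sign change is in [m, b]
        else:
            b = m           # f(m) has the sign of f(b): the sign change is in [a, m]
    return (a + b) / 2

root = bisection_root(lambda x: x**2 - 2, 0.0, 2.0)
print(root)  # within 1e-12 of sqrt(2) = 1.41421356237...
```

Each iteration halves the bracket. Shrinking an initial width $w$ to the tolerance therefore takes about $\log_2(w / \text{tol})$ steps, which is about 41 here.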

Extreme Value Theorem (EVT): If $f$ is continuous on a closed interval $[a, b]$, then $f$ attains both a maximum and a minimum on that interval.

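More generally, a continuous function on a nonempty compact set (in $\mathbb{R}^n$: closed and bounded) attains its maximum and minimum. This guarantees that optimization problems with continuous objectives over such domains always have solutions.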
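Both hypotheses matter. Two standard counterexamples (chosen for illustration) show what goes wrong without them:

$$\begin{aligned} &f(x) = x \text{ on } (0, 1): && \sup_{x \in (0,1)} f(x) = 1, \text{ yet } f(x) < 1 \text{ for every } x \in (0, 1) \quad \text{(not closed: no maximum)} \\[6pt] &f(x) = x \text{ on } [0, \infty): && f(x) \to \infty \text{ as } x \to \infty \quad \text{(not bounded: no maximum)} \end{aligned}$$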

Continuity in Higher Dimensions

For a function $f: \mathbb{R}^n \to \mathbb{R}$, continuity at a point $\mathbf{a}$ means:

$$\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = f(\mathbf{a})$$

where $\mathbf{x} \to \mathbf{a}$ means $\|\mathbf{x} - \mathbf{a}\| \to 0$. The function must approach the same value regardless of the direction or path along which $\mathbf{x}$ approaches $\mathbf{a}$.

This is more subtle than the one-dimensional case — a function can be continuous along every line through a point and still be discontinuous at that point. The standard notion requires convergence along all paths.
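A standard example is $f(x, y) = \frac{x^2 y}{x^4 + y^2}$ with $f(0, 0) = 0$.

- Along any line $y = mx$ with $m \neq 0$, it equals $\frac{m x}{x^2 + m^2} \to 0$, and it is identically 0 on both axes.
- Along the parabola $y = x^2$, it equals $\frac{x^4}{2x^4} = \frac{1}{2}$ for every $x \neq 0$.

So the limit at the origin does not exist, and $f$ is discontinuous there. A short Python sketch evaluating both paths (the line $y = 2x$ is an arbitrary choice):

```python
def f(x: float, y: float) -> float:
    """f(x, y) = x^2 y / (x^4 + y^2), extended by f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**2 * y / (x**4 + y**2)

ts = [10.0**-k for k in range(1, 6)]  # t = 0.1, 0.01, ..., 1e-5

# Along the line y = 2x: values equal 2t / (t^2 + 4), which shrink toward 0.
along_line = [f(t, 2 * t) for t in ts]
# Along the parabola y = x^2: values stay at 1/2 (up to rounding), never approaching f(0, 0) = 0.
along_parabola = [f(t, t**2) for t in ts]

print(along_line)
print(along_parabola)
```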

Lipschitz Continuity

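A function $f$ is Lipschitz continuous with constant $L$ if:

$$|f(x) - f(y)| \leq L \|x - y\|$$

for all $x, y$ in the domain. This is a stronger condition than ordinary continuity: it bounds how fast the function can change. Every Lipschitz function is continuous, but not conversely. For example, $\sqrt{x}$ is continuous on $[0, 1]$ yet not Lipschitz there, since its slope is unbounded near 0.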
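A rough way to probe a Lipschitz constant numerically is to take the largest slope $|f(x) - f(y)|/|x - y|$ over a grid. The sketch below uses a hypothetical helper, `max_grid_slope`, which only gives a lower estimate of the best constant on the sampled interval. It illustrates three cases:

- $\sin$ is 1-Lipschitz.
- $|x|$ is also 1-Lipschitz despite its kink.
- The estimate for $x^2$ grows with the interval, reflecting that $x^2$ is not Lipschitz on all of $\mathbb{R}$.

```python
import math
from typing import Callable

def max_grid_slope(f: Callable[[float], float], lo: float, hi: float, n: int = 2001) -> float:
    """Largest |f(b) - f(a)| / (b - a) over adjacent points of an n-point grid on [lo, hi].

    The slope of any chord between two grid points is a weighted average of the
    adjacent-segment slopes, so checking adjacent pairs covers every grid pair.
    The result is a lower estimate of the best Lipschitz constant on [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(abs(f(b) - f(a)) / (b - a) for a, b in zip(xs, xs[1:]))

print(max_grid_slope(math.sin, -10.0, 10.0))          # close to 1, since |cos x| <= 1
print(max_grid_slope(abs, -10.0, 10.0))               # 1: |x| is 1-Lipschitz despite its kink
for R in (1.0, 10.0, 100.0):
    print(R, max_grid_slope(lambda x: x * x, -R, R))  # close to 2R: grows without bound
```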

Key insight: Lipschitz continuity is the smoothness condition that optimization theory relies on most heavily. When we say “the gradient is $L$-Lipschitz,” we are bounding how quickly the gradient can change, which directly determines the maximum safe learning rate for gradient descent.

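Most loss functions used in ML are Lipschitz continuous or have Lipschitz gradients. Huber loss is both. MSE has a Lipschitz gradient, although MSE itself is not globally Lipschitz. Cross-entropy is well-behaved when its inputs are bounded. These properties are exactly what the standard convergence analyses of gradient-based optimization assume.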

Why This Matters for ML

Limits and continuity are not just abstract prerequisites — they underpin the core mechanisms of machine learning:

  • Loss function continuity enables gradient-based optimization. If the loss were discontinuous, small parameter changes could cause unpredictable jumps in the loss value that the local information in a gradient cannot anticipate, making gradient-based optimization unreliable.

  • Activation functions illustrate the continuity spectrum. The sigmoid $\sigma(x) = 1/(1 + e^{-x})$ is infinitely differentiable (smooth). ReLU $f(x) = \max(0, x)$ is continuous but not differentiable at $x = 0$. The step function is discontinuous at 0 and has zero derivative everywhere else, so it gives gradient descent no learning signal. This is why perceptrons could not be trained with gradient descent, which motivated the switch to smooth activations.

  • Convergence of training is a statement about limits. When we say “SGD converges,” we mean that some sequence has a limit. Typically, either the parameter vectors approach a stationary point (ideally a local minimum) of the loss, or the expected gradient norm tends to 0.

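  • Lipschitz conditions on the gradient determine the maximum learning rate. If the gradient is $L$-Lipschitz, a fixed learning rate $0 < \alpha < 2/L$ guarantees that each gradient descent step decreases the loss (unless the gradient is already zero), and this underlies the standard convergence guarantees.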
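The $2/L$ threshold is easiest to see on the one-dimensional quadratic $f(x) = \frac{L}{2}x^2$, whose gradient $f'(x) = Lx$ is $L$-Lipschitz. Each gradient step multiplies $x$ by $1 - \alpha L$. The iterates therefore shrink to the minimizer $x = 0$ exactly when $|1 - \alpha L| < 1$, that is, when $0 < \alpha < 2/L$. Below is a minimal sketch; the function name and the value $L = 4$ are illustrative. For general functions, the bound comes from more involved analyses and is a sufficient condition rather than an exact threshold.

```python
def gradient_descent_quadratic(L: float, alpha: float, x0: float = 1.0, steps: int = 50) -> float:
    """Run `steps` iterations of gradient descent on f(x) = (L / 2) * x**2 from x0.

    The gradient f'(x) = L * x is L-Lipschitz, and each update
    x <- x - alpha * f'(x) multiplies x by (1 - alpha * L)."""
    x = x0
    for _ in range(steps):
        x = x - alpha * (L * x)
    return x

L = 4.0  # illustrative curvature; the threshold 2 / L is 0.5
for alpha in (0.1, 0.25, 0.45, 0.55):
    print(f"alpha = {alpha}: x after 50 steps = {gradient_descent_quadratic(L, alpha):.3e}")
```

The four runs behave differently:

- With $\alpha = 0.25 = 1/L$, the first step lands exactly on the minimizer.
- With $\alpha = 0.45$, the factor is $-0.8$, so the iterates still converge while flipping sign.
- With $\alpha = 0.55$, the factor is $-1.2$ and the iterates diverge.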

These ideas are formalized through derivatives, which we turn to next.

Summary

  • A limit describes the value a function approaches as its input approaches a target
  • The epsilon-delta definition makes “approaching” rigorous: for any tolerance $\epsilon$, a neighborhood radius $\delta$ exists
  • Direct substitution works for continuous functions; algebraic manipulation handles indeterminate forms
  • A function is continuous at a point if its limit there equals its value: no holes, jumps, or blowups
  • The IVT guarantees intermediate values are hit; the EVT guarantees extrema exist on closed intervals
  • Lipschitz continuity bounds the rate of change and is critical for optimization convergence
  • These foundations enable everything in the series, starting with derivatives and differentiation

