An Introduction to Dual Numbers and Automatic Differentiation
#Introduction
Derivatives crop up everywhere, from biology to machine learning, so being able to calculate them matters. You probably learnt how to do this by hand in secondary school or college.
But have you ever considered how to compute them in code?
The naive approach is to implement the rules of the symbolic method you already know, but if you’ve tried this, you’ll know it quickly gets messy because of the huge number of rules. Another option you might have considered is finite differences, but these only approximate the derivative. What happens when you need exact derivatives, without the mess?
Enter dual numbers: a surprisingly simple number system that can differentiate functions automatically. In this post, I’ll explain dual numbers from scratch, covering what they are, how algebra works with them, and why they compute derivatives automatically.
#Definition
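If you are familiar with complex numbers¹, dual numbers follow a similar pattern. Complex numbers are defined using set-builder notation² as:

$$\mathbb{C} = \{a + bi \mid a, b \in \mathbb{R},\ i^2 = -1\}$$

or equivalently using quotient rings³:

$$\mathbb{C} = \mathbb{R}[x]/(x^2 + 1)$$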
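Dual numbers use the same structure but instead of the imaginary unit $i$, they use the symbol $\varepsilon$. To give dual numbers their special properties, $\varepsilon$ is defined to satisfy two conditions:

$$\varepsilon^2 = 0, \qquad \varepsilon \neq 0$$

Using the same pattern given to us by the complex numbers, we get:

$$\mathbb{D} = \{a + b\varepsilon \mid a, b \in \mathbb{R},\ \varepsilon^2 = 0,\ \varepsilon \neq 0\}$$

or with quotient rings:

$$\mathbb{D} = \mathbb{R}[x]/(x^2)$$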
If the quotient ring construction is unfamiliar, don’t worry. The set-builder definition describes everything you need to know: dual numbers are pairs of real numbers, written as $a + b\varepsilon$, where $\varepsilon^2 = 0$ and $\varepsilon \neq 0$.
These constraints cannot be satisfied by any real number, so $\varepsilon$ lies outside $\mathbb{R}$. At first glance, $\varepsilon^2 = 0$ isn’t that interesting, but as we’ll see later, it’s this property that enables automatic differentiation.
#Algebra with Dual Numbers
Before we can see how automatic differentiation works, we need to know how algebraic operations work with dual numbers. Don’t worry too much if you don’t understand the derivations of these operations, as the final rule is what matters.
#Addition/Subtraction
Similar to complex numbers, addition and subtraction are done component-wise:

$$(a + b\varepsilon) \pm (c + d\varepsilon) = (a \pm c) + (b \pm d)\varepsilon$$
#Multiplication
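Multiplication is also relatively straightforward: expand the brackets and apply $\varepsilon^2 = 0$.

$$(a + b\varepsilon)(c + d\varepsilon) = ac + ad\varepsilon + bc\varepsilon + bd\varepsilon^2 = ac + (ad + bc)\varepsilon$$

Notice the $bd\varepsilon^2$ term vanishes. This truncation of higher-order terms is the reason dual numbers work for automatic differentiation.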
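The addition, subtraction and multiplication rules can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the class name `Dual` and its fields `real` and `dual` are names assumed here for demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number real + dual*eps, where eps**2 == 0 (illustrative names)."""
    real: float
    dual: float = 0.0

    def __add__(self, other: "Dual") -> "Dual":
        # (a + b eps) + (c + d eps) = (a + c) + (b + d) eps
        return Dual(self.real + other.real, self.dual + other.dual)

    def __sub__(self, other: "Dual") -> "Dual":
        # (a + b eps) - (c + d eps) = (a - c) + (b - d) eps
        return Dual(self.real - other.real, self.dual - other.dual)

    def __mul__(self, other: "Dual") -> "Dual":
        # ac + (ad + bc) eps; the bd eps^2 term is dropped because eps^2 = 0
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)


u = Dual(1.0, 2.0)
v = Dual(3.0, 4.0)
eps = Dual(0.0, 1.0)

print(u + v)      # Dual(real=4.0, dual=6.0)
print(u - v)      # Dual(real=-2.0, dual=-2.0)
print(u * v)      # Dual(real=3.0, dual=10.0)
print(eps * eps)  # Dual(real=0.0, dual=0.0)
```

The last line shows the defining property directly: multiplying $\varepsilon$ by itself gives zero.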
#Division
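Division is slightly more involved, but nothing too abnormal. Since the division of two duals should yield a dual⁴, we can express this as $\frac{a + b\varepsilon}{c + d\varepsilon} = p + q\varepsilon$ (assume $c \neq 0$). Then we solve for the numerator:

$$a + b\varepsilon = (c + d\varepsilon)(p + q\varepsilon) = cp + (cq + dp)\varepsilon$$

Since we now have two dual numbers that are equal, we can equate their coefficients for each part:
- Real: $a = cp$, so $p = \frac{a}{c}$
- Dual: $b = cq + dp$, so $q = \frac{b - dp}{c} = \frac{bc - ad}{c^2}$

This gives us the rule for division:

$$\frac{a + b\varepsilon}{c + d\varepsilon} = \frac{a}{c} + \frac{bc - ad}{c^2}\varepsilon$$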
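As a quick sanity check, the division rule can be sketched in Python and verified by multiplying the quotient back by the divisor. The helper names `dual_div` and `dual_mul`, and the choice to represent a dual number as a `(real, dual)` pair, are assumptions made for this illustration.

```python
from fractions import Fraction


def dual_mul(u, v):
    """Multiply two dual numbers given as (real, dual) pairs."""
    a, b = u
    c, d = v
    return (a * c, a * d + b * c)


def dual_div(num, den):
    """Divide two dual numbers given as (real, dual) pairs."""
    a, b = num
    c, d = den
    if c == 0:
        # The rule only holds when the real part of the divisor is non-zero.
        raise ZeroDivisionError("real part of the divisor must be non-zero")
    return (a / c, (b * c - a * d) / (c * c))


# Fractions keep the arithmetic exact, so the round trip can be compared with ==.
num = (Fraction(6), Fraction(4))   # 6 + 4 eps
den = (Fraction(2), Fraction(1))   # 2 + 1 eps
quotient = dual_div(num, den)

print(quotient)                    # (Fraction(3, 1), Fraction(1, 2))
print(dual_mul(den, quotient) == num)  # True
```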
#Automatic Differentiation
Now that we know how to do basic algebra with dual numbers, let’s see what makes them useful. For any differentiable function $f$:

$$f(x + \varepsilon) = f(x) + f'(x)\varepsilon$$

This means that the coefficient of $\varepsilon$ in our result is actually equal to the derivative. This isn’t just a coincidence.
#The Taylor Series
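To understand why this works, let’s express our function as the Taylor series⁵ expansion of $f(x + \varepsilon)$ about $x$:

$$f(x + \varepsilon) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x)}{n!}\varepsilon^n$$

When shown using the summation, it’s not immediately clear what the relevance is, but watch what happens when we expand this for the first few terms:

$$f(x + \varepsilon) = f(x) + f'(x)\varepsilon + \frac{f''(x)}{2!}\varepsilon^2 + \frac{f'''(x)}{3!}\varepsilon^3 + \cdots$$

This is where the magic happens. Since $\varepsilon^2 = 0$, all terms with $\varepsilon^2$ or a higher power will vanish, leaving just:

$$f(x + \varepsilon) = f(x) + f'(x)\varepsilon$$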
#An Example
Let’s verify this by trying a couple of examples. I’ll assume you can do basic differentiation to save writing out the steps.
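As a purely illustrative choice (the specific function is an assumption made for demonstration), take $f(x) = x^2$, whose derivative is $2x$, and evaluate it at $x + \varepsilon$:

$$f(x + \varepsilon) = (x + \varepsilon)^2 = x^2 + 2x\varepsilon + \varepsilon^2 = x^2 + 2x\varepsilon$$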
The coefficient of $\varepsilon$ is exactly $f'(x)$. The derivative appears automatically without needing to use symbolic methods of differentiation. Let’s try another:
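For a second illustrative function (again an assumption, chosen only for demonstration), take $f(x) = x^3 + 2x$, whose derivative is $3x^2 + 2$:

$$f(x + \varepsilon) = (x + \varepsilon)^3 + 2(x + \varepsilon) = x^3 + 3x^2\varepsilon + 3x\varepsilon^2 + \varepsilon^3 + 2x + 2\varepsilon = (x^3 + 2x) + (3x^2 + 2)\varepsilon$$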
Again, the derivative appears in the coefficient.
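The same check can be run numerically. The sketch below is one possible way to do it, not a definitive implementation: the `Dual` class, the `derivative` helper and the polynomial `f` are all hypothetical names and choices made for this illustration.

```python
class Dual:
    """A dual number real + dual*eps with eps**2 == 0 (illustrative sketch)."""

    def __init__(self, real, dual=0.0):
        self.real = real
        self.dual = dual

    @staticmethod
    def _lift(value):
        # Treat plain numbers as dual numbers with a zero eps coefficient.
        return value if isinstance(value, Dual) else Dual(value)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__  # addition is commutative, so 2 + x reuses x + 2

    def __mul__(self, other):
        other = Dual._lift(other)
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)

    __rmul__ = __mul__  # multiplication is commutative, so 2 * x reuses x * 2


def derivative(f, x):
    """Evaluate f at x + eps and read the derivative off the eps coefficient."""
    return f(Dual(x, 1.0)).dual


def f(x):
    # Hypothetical example polynomial; by hand, f'(x) = 3x^2 + 2.
    return x * x * x + 2 * x


print(derivative(f, 2.0))  # 14.0
```

Note that `f` is ordinary Python code with no knowledge of differentiation; the derivative comes entirely from the arithmetic rules on `Dual`.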
#Beyond Polynomials
The examples with polynomials showed the basics, but dual numbers work for any differentiable function. Let’s see how they handle exponentials and trigonometric functions.
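Like earlier, let’s start with a Taylor series expansion, this time of the exponential $e^{x + \varepsilon}$:

$$e^{x + \varepsilon} = \sum_{n=0}^{\infty} \frac{e^x}{n!}\varepsilon^n = e^x + e^x\varepsilon + \frac{e^x}{2!}\varepsilon^2 + \cdots$$

Just like we did in the Taylor series section, we write out the expansion and cancel out all terms that are multiplied by $\varepsilon^2$ (or higher), leaving $e^x$ as the coefficient of $\varepsilon$, the correct derivative⁶. The same pattern applies to all elementary functions:

$$\begin{aligned}
\sin(x + \varepsilon) &= \sin x + \varepsilon\cos x \\
\cos(x + \varepsilon) &= \cos x - \varepsilon\sin x \\
\ln(x + \varepsilon) &= \ln x + \frac{\varepsilon}{x} \quad (x > 0)
\end{aligned}$$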
While this does require knowing the derivative to begin with, we only need the derivatives of the elementary functions, and these are well known.
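In code, this amounts to giving each elementary function one hand-written rule. The sketch below assumes a hypothetical `Dual` pair type and helper names (`d_exp`, `d_sin`, `d_cos`, `d_log`) and relies only on Python’s standard `math` module.

```python
import math
from typing import NamedTuple


class Dual(NamedTuple):
    """A dual number real + dual*eps (illustrative container, no arithmetic)."""
    real: float
    dual: float


def d_exp(z: Dual) -> Dual:
    # exp(a + b eps) = exp(a) + b * exp(a) eps
    e = math.exp(z.real)
    return Dual(e, z.dual * e)


def d_sin(z: Dual) -> Dual:
    # sin(a + b eps) = sin(a) + b * cos(a) eps
    return Dual(math.sin(z.real), z.dual * math.cos(z.real))


def d_cos(z: Dual) -> Dual:
    # cos(a + b eps) = cos(a) - b * sin(a) eps
    return Dual(math.cos(z.real), -z.dual * math.sin(z.real))


def d_log(z: Dual) -> Dual:
    # log(a + b eps) = log(a) + (b / a) eps, valid for a > 0
    return Dual(math.log(z.real), z.dual / z.real)


x = Dual(0.5, 1.0)  # seed the eps coefficient with 1 to differentiate at 0.5
print(d_sin(x).dual, math.cos(0.5))  # the two values agree
print(d_log(x).dual)                 # 2.0, i.e. 1 / 0.5
```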
#Composition and the Chain Rule
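The real power of dual numbers becomes evident when we compose functions. Let’s compute the derivative of a composition $h(x) = f(g(x))$.
If we were doing this symbolically, we’d need to use the chain rule, but with dual numbers we just have to evaluate it like we did before, using $f(a + b\varepsilon) = f(a) + b f'(a)\varepsilon$, which follows from the same Taylor series argument:

$$h(x + \varepsilon) = f(g(x + \varepsilon)) = f(g(x) + g'(x)\varepsilon) = f(g(x)) + f'(g(x))\,g'(x)\,\varepsilon$$

We can see that $f'(g(x))\,g'(x)$ appears automatically; the chain rule falls out of the algebra. This also applies to arbitrarily deep compositions: $f(g(k(x)))$ would work in exactly the same way.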
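To see the chain rule fall out in practice, here is a small sketch that differentiates the hypothetical composition $h(x) = \sin(x^2)$ and compares the result with the hand-derived $2x\cos(x^2)$. The `Dual` class and the `dual_sin` helper are names assumed for this illustration.

```python
import math


class Dual:
    """A dual number real + dual*eps with eps**2 == 0 (illustrative sketch)."""

    def __init__(self, real, dual=0.0):
        self.real = real
        self.dual = dual

    def __mul__(self, other):
        # ac + (ad + bc) eps
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)


def dual_sin(z):
    # sin(a + b eps) = sin(a) + b * cos(a) eps
    return Dual(math.sin(z.real), z.dual * math.cos(z.real))


def h(x):
    # Hypothetical composition: h(x) = sin(x^2). No chain rule is written here.
    return dual_sin(x * x)


x0 = 1.5
result = h(Dual(x0, 1.0))
expected = 2 * x0 * math.cos(x0 * x0)  # chain rule applied by hand

print(result.dual, expected)  # the two values agree up to rounding
```

The inner multiplication produces $x^2 + 2x\varepsilon$, and `dual_sin` then scales the $\varepsilon$ coefficient by $\cos(x^2)$, which is exactly the chain rule.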
#Conclusion
Dual numbers give a very elegant solution for computing derivatives through a single constraint: $\varepsilon^2 = 0$. This causes the Taylor series to be truncated, leaving the derivative in the coefficient of $\varepsilon$. All of this happens without any manipulation of symbols or approximation. Just simple algebra!
In a future post, I’ll cover how this can be implemented in Rust and explore the applications of Forward-Mode Automatic Differentiation. We’ll also see how this method can become inefficient, and when alternative approaches such as Reverse-Mode Automatic Differentiation become necessary.