Automatic Differentiation in ΦML

Colab   •   🌐 ΦML   •   📖 Documentation   •   🔗 API   •   ▶ Videos   •   Examples

In [1]:
%%capture
!pip install phiml

from phiml import math

Like JAX, ΦML provides a functional approach to automatic differentiation. You can obtain the derivative of a function using math.gradient(). Note that we have to set the backend to JAX, PyTorch or TensorFlow, since NumPy does not support automatic differentiation.

In [2]:
math.use('torch')

def loss_function(x, y):
    return x ** 2 * y

dx_function = math.gradient(loss_function, wrt='x')
dx_function(x=1., y=1.)
Out[2]:
(tensor(1., grad_fn=<MulBackward0>), tensor(2.))

By default, the gradient function also returns the output of the original function. In the above case, the loss value is 1 and the gradient is 2.

We can get the gradient only by passing get_output=False.

In [3]:
dx_function = math.gradient(loss_function, wrt='x', get_output=False)
dx_function(x=1., y=1.)
Out[3]:
tensor(2.)

Since we passed in native types (not ΦML tensors), we also get native types as a result. Let's pass a tensor for x instead.

In [4]:
x = math.wrap([0, 1, 2], math.channel('values'))
try:
    dx_function(x, y=1.)
except Exception as exc:
    print(exc)
Loss must be reduced to a scalar

This failed because gradient() requires our function to return a scalar or batched scalar, but we returned three values along a channel axis. This restriction applies to all dimension types except for batch dimensions, which are automatically summed over.

In [5]:
x = math.wrap([0, 1, 2], math.batch('values'))
dx_function(x, y=1.)
Out[5]:
(0.000, 2.000, 4.000) along valuesᵇ

A simple way to reduce all non-batch dimensions is l2_loss() or simply sum(); both operations reduce all dimensions except for batch dimensions.

In [6]:
def loss_function(x, y):
    return math.l2_loss(x ** 2 * y)

dx_function = math.gradient(loss_function, wrt='x', get_output=False)
dx_function(x, y=1.)
Out[6]:
(0.000, 2.000, 16.000) along valuesᵇ

We can get the gradients w.r.t. multiple values by passing multiple strings or a comma-separated string.

In [7]:
math.gradient(loss_function, wrt='x,y', get_output=False)(1, 1)
Out[7]:
[tensor(2.), tensor(1.)]

You can also compute the gradient w.r.t. pytrees and dataclasses.

In [8]:
from dataclasses import dataclass

@dataclass
class Vec:
    x1: math.Tensor
    x2: math.Tensor

    def __mul__(self, other):
        return Vec(self.x1 * other, self.x2 * other)

    def __pow__(self, power, modulo=None):
        return Vec(self.x1 ** power, self.x2 ** power)

    def __value_attrs__(self):
        return 'x1', 'x2'

dx_function(x=Vec(1, 2), y=1.)
Out[8]:
Vec(x1=tensor(2.), x2=tensor(16.))

Here, we create the custom class Vec which holds two fields, x1 and x2. In __value_attrs__, we declare that both members should be considered as values for value operations, such as l2_loss. The analogous method __variable_attrs__ defines which attributes should be considered for automatic differentiation. If it is not implemented, all attributes are considered variable.

Further Reading

ΦML also provides finite difference differential operators.

Another important function transformation is JIT-compilation.

When training neural networks, the gradient is typically computed under-the-hood.
