PhiFlow

Optimization and Training

The backends PyTorch, TensorFlow and Jax have built-in automatic differentiation functionality. However, the respective APIs vary widely in how the gradients are computed. ΦFlow seeks to unify optimization and gradient computation so that code written against the ΦFlow API will work with all backends. Nevertheless, we recommend using backend-specific optimization for certain tasks like neural network training.

The following overview table shows which ΦFlow optimization functions are supported by which backends.

Backend solve_linear minimize functional_gradient NN Training
PyTorch
TensorFlow
Jax  
NumPy      

Physics Optimization

ΦFlow provides functions for solving linear equations as well as finding optimal parameters for non-linear problems. These functions work with all backends but run much slower with NumPy due to the lack of analytic gradients.

Nonlinear Optimization

The function math.minimize() and solve_nonlinear() solve unconstrained nonlinear optimization problems. The following example uses L-BFGS-B to find a solution to a nonlinear optimization problem using the phi.field API:

def loss(x1: Grid, x2: Grid) -> math.Tensor:
  return field.l2_loss(physics(x1, x2) - target)

x0 = x0_1, x0_2
solution = math.minimize(math.jit_compile(loss), math.Solve('L-BFGS-B', 0, 1e-3, x0=x0))

Linear Equations

For solving linear systems of equations, ΦFlow provides the function math.solve_linear(). The following example uses the conjugate gradient algorithm to solve_linear f(x) = y:

@math.jit_compile_linear
def f(x: Grid) -> Grid:
    return field.where(mask, 2 * field.laplace(x), x)

x = math.solve_linear(f, y, math.Solve('CG', 1e-3, 0, x0=x0))

Which solver implementation is used, depends on the backend. However, all backends support conjugate gradient ('CG') and conjugate gradient with adaptive step size ('CG-adaptive'). Specify 'auto' to let ΦFlow chose an appropriate solver.

Overview: Implementations

Method PyTorch TensorFlow Jax NumPy
‘CG’ Φ-Flow Φ-Flow jax.scipy.sparse.linalg.cg (only in jit mode, no trajectory) scipy.sparse.linalg.cg (no trajectory)
‘CG-adaptive’ Φ-Flow Φ-Flow Φ-Flow (only function-based) Φ-Flow
‘auto’ Φ-Flow Φ-Flow jax.scipy.sparse.linalg.cg (only in jit mode, no trajectory) scipy.sparse.linalg.spsolve (only for sparse matrices)

Handling Failed Optimizations

Both solve_linear and minimize return only the solution. To access further information about the optimization, run the optimization within a SolveTape context.

When a solve does not find a solution, a subclass of ConvergenceException is thrown.

solve = math.Solve('CG', 1e-3, 0, max_iterations=300)
try:
    solution = field.solve_linear(..., solve=solve)
    solution = field.minimize(..., solve=solve)
    converged = True
except NotConverged as not_converged:
    print(not_converged)
    last_estimate = not_converged.x
except Diverged as diverged:
    print(diverged)
iterations_performed = solve.result.iterations  # available in any case

Backpropagation

Currently, backpropagation is supported for solve_linear(). The following code shows how to specify different settings for the backpropagation solve:

from phi.flow import *
gradient_solve = math.Solve('auto', 1e-5, 0, max_iterations=100, x0=x0)
solve = math.Solve('CG', 1e-5, 1e-5, gradient_solve=gradient_solve, x0=None)

Solves during backpropagation raise the same exceptions as in the forward pass. The only exception is TensorFlow where a warning message is printed instead since exceptions are not supported during backpropagation. Information about backpropagation solves can be obtained the same ways with the forward solves, using a SolveTape context around the gradient computation.

Computing Gradients

There are two ways to evaluate gradients using ΦFlow: computing gradients of functions and computing gradients of tensors. The former is the preferred way and works will all backends except for NumPy.

Gradients of functions

The functions math.functional_gradient() and field.functional_gradient() compute the gradient of a Python function with respect to one or multiple arguments. The following example evaluates the gradient of physics w.r.t. two of its inputs.

def physics(x: Grid, y: Grid, target: Grid) -> math.Tensor:
    x, y = step(x, y)
    return field.l2_loss(y - target), x

gradient = field.functional_gradient(physics, wrt=[0, 1], get_output=True)

loss, x, dx, dy = gradient(...)

In the above example, wrt=[0, 1] specifies that we are interested in the gradient with respect to the first and second argument of physics. With get_output=True, evaluating the gradient also returns all regular outputs of physics. Otherwise, only dx, dy would be returned.

Note that the first output of physics must be a scalar Tensor object. All other outputs are not part of the gradient computation.

Note that higher-order derivatives might not work as expected with some backends.

Gradients of Tensors

This method is only supported by PyTorch and TensorFlow. Operations within a math.record_gradients context record gradient information. Calling math.gradients then computes the gradients with respect to a previously marked tensor.

Warning: You have to manually detach PyTorch tensors that will be used outside the context. Higher-order derivatives are not supported for PyTorch.

Backend-specific Optimization

Recommended for neural network training.

Unfortunately all supported backends have a different approach to computing gradients:

TensorFlow and PyTorch include various optimizers for neural network (NN) training. Additionally, NN variables created through the respective layer functions are typically marked as variables by default, meaning the computational graph for derived tensors is created automatically.

PyTorch Neural Network

The following script shows how a PyTorch neural network can be trained. See the demo network_training_pytorch.py for a full example.

net = u_net(2, 2)
optimizer = optim.Adam(net.parameters(), lr=1e-3)

for training_step in range(100):
    data: Grid = load_training_data(training_step)
    optimizer.zero_grad()
    prediction: Grid = field.native_call(net, data)
    simulation_output: Grid = simulate(prediction)
    loss = field.l2_loss(simulation_output)
    loss.native().backward()
    optimizer.step()

In the above example, field.native_call() extracts the field values as PyTorch tensors with shape (batch_size, channels, spatial...), then calls the network and returs the result again as a Field.

Since loss is a phi.math.Tensor, we need to invoke native() to call PyTorch’s backward() function.

TensorFlow Neural Network

For TensorFlow, a GradientTape context is required around network evaluation, physics and loss definition.