In this notebook, we will train a fully-connected neural network to solve an inverse ballistics problem. We will compare supervised training to differentiable physics training, both numerically and visually.
# !pip install phiflow
# from phi.tf.flow import *
from phi.torch.flow import *
# from phi.jax.stax.flow import *
To define the physics problem, we write a function to simulate the forward physics. This function takes the initial position, height, speed and angle of a thrown object and computes where the object will land, assuming the object follows a parabolic trajectory. Neglecting friction, we can compute this by solving a quadratic equation.
def simulate_hit(pos, height, vel, angle, gravity=1.):
    vel_x, vel_y = math.cos(angle) * vel, math.sin(angle) * vel
    height = math.maximum(height, .5)
    hit_time = (vel_y + math.sqrt(vel_y**2 + 2 * gravity * height)) / gravity
    return pos + vel_x * hit_time, hit_time, height, vel_x, vel_y
simulate_hit(10, 1, 1, 0)[0]
11.414213
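As a sanity check, the same closed-form solution can be written in plain Python, independent of ΦFlow. This is just a sketch mirroring the function above (the name `simulate_hit_py` is ours, not part of any library):

```python
import math

def simulate_hit_py(pos, height, vel, angle, gravity=1.0):
    """Plain-Python version of the closed-form hit computation above."""
    vel_x, vel_y = math.cos(angle) * vel, math.sin(angle) * vel
    height = max(height, 0.5)
    # positive root of: height + vel_y*t - gravity/2 * t^2 = 0
    hit_time = (vel_y + math.sqrt(vel_y**2 + 2 * gravity * height)) / gravity
    return pos + vel_x * hit_time

print(simulate_hit_py(10, 1, 1, 0))  # 10 + sqrt(2) ≈ 11.414214
```

For a flat throw from height 1, the flight time is √2, so the object lands √2 units downrange, matching the value printed above.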
Let's plot the trajectory! We define y(x) and sample it on a grid. We have to drop invalid values, such as negative flight times and below-ground points.
def sample_trajectory(pos, height, vel, angle, gravity=1.):
    hit, hit_time, height, vel_x, vel_y = simulate_hit(pos, height, vel, angle, gravity)
    t = math.linspace(0, hit_time, spatial(t=100))
    return vec(x=pos + vel_x * t, y=height + vel_y * t - gravity / 2 * t ** 2)
vis.plot(sample_trajectory(tensor(10), 1, 1, math.linspace(-PI/4, 1.5, channel(angle=7))), title="Varying Angle")
<Figure size 1200x500 with 1 Axes>
Before we train neural networks on this problem, let's perform a classical optimization using gradient descent on the initial velocity vel. We need to define a loss function to optimize. Here, we want the object to hit at x=0.
vel = tensor(1)
def loss_function(vel):
    return math.l2_loss(simulate_hit(10, 1, vel, 0)[0] - 0)
loss_function(0)
50.0
ΦFlow uses the selected library (PyTorch / TensorFlow / JAX) to compute analytic derivatives. By default, the gradient function also returns the function value.
gradient = math.functional_gradient(loss_function, get_output=True)
gradient(vel)
(65.14213, [16.142136])
trj = [vel]
for i in range(10):
    loss, (grad,) = gradient(vel)
    vel = vel - .2 * grad
    trj.append(vel)
    print(f"vel={vel} - loss={loss}")
trj = math.stack(trj, channel('opt'))
vis.plot(sample_trajectory(tensor(10), 1, trj, 0))
vel=-2.2284272 - loss=65.14213
vel=-4.1654835 - loss=23.451168
vel=-5.3277173 - loss=8.442421
vel=-6.0250573 - loss=3.0392709
vel=-6.4434614 - loss=1.0941381
vel=-6.694504 - loss=0.39388976
vel=-6.8451295 - loss=0.14180061
vel=-6.935505 - loss=0.051048037
vel=-6.9897304 - loss=0.018377367
vel=-7.0222654 - loss=0.0066157645
<Figure size 1200x500 with 1 Axes>
As the printed values show, repeatedly subtracting the gradient times a learning rate $\eta = 0.2$ converges towards the solution.
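This particular optimization can be verified by hand. For angle 0 and height 1, the flight time is constant at √2, so the hit position is 10 + vel·√2 and the loss is quadratic in vel. A dependency-free sketch of the same gradient descent (illustrative, not using ΦFlow):

```python
import math

SQRT2 = math.sqrt(2.0)

def loss(vel):
    # hit position for pos=10, height=1, angle=0 is 10 + vel*sqrt(2)
    return 0.5 * (10 + vel * SQRT2) ** 2

def grad(vel):
    # d/d(vel) 0.5*(10 + vel*sqrt(2))^2 = (10 + vel*sqrt(2)) * sqrt(2)
    return (10 + vel * SQRT2) * SQRT2

vel = 1.0
for _ in range(10):
    vel -= 0.2 * grad(vel)
print(vel)  # approaches the analytic optimum -10/sqrt(2) ≈ -7.071
```

Ten steps land at vel ≈ -7.022, matching the values printed above; the remaining gap to -10/√2 shrinks geometrically with further iterations.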
Next, we generate a training set and test by sampling random values.
def generate_data(shape):
    pos = math.random_normal(shape)
    height = math.random_uniform(shape) + .5
    vel = math.random_uniform(shape)
    angle = math.random_uniform(shape) * PI/2
    return math.stack(dict(pos=pos, height=height, vel=vel, angle=angle), channel('vector'))
x_train = generate_data(batch(examples=1000))
x_test = generate_data(batch(examples=1000))
y_train = simulate_hit(*x_train.vector)[0]
y_test = simulate_hit(*x_test.vector)[0]
Now, let's create a fully-connected neural network with three hidden layers. We can reset the seed to make the weight initialization predictable.
math.seed(0)
net_sup = dense_net(1, 4, [32, 64, 32])
net_sup
DenseNet(
  (linear0): Linear(in_features=1, out_features=32, bias=True)
  (linear1): Linear(in_features=32, out_features=64, bias=True)
  (linear2): Linear(in_features=64, out_features=32, bias=True)
  (linear_out): Linear(in_features=32, out_features=4, bias=True)
)
# net_sup.trainable_weights[0] # TensorFlow
net_sup.state_dict()['linear0.weight'].flatten() # PyTorch
# net_sup.parameters[0][0] # Stax
tensor([ 0.5317, 0.5576, 0.8676, -0.0228, 0.9547, 0.9196, 0.9320, 0.9324, -0.4410, 0.0031, -0.6949, 0.7739, 0.1919, 0.6169, -0.9259, 0.4986, -0.0187, -0.9493, -0.4455, 0.8291, 0.7643, 0.1477, 0.4587, -0.4650, 0.9987, -0.2630, -0.9132, 0.5971, -0.5870, -0.5184, -0.2275, -0.2534])
For the differentiable physics network we do the same thing again.
math.seed(0)
net_dp = dense_net(1, 4, [32, 64, 32])
# net_dp.trainable_weights[0] # TensorFlow
net_dp.state_dict()['linear0.weight'].flatten() # PyTorch
# net_dp.parameters[0][0] # Stax
tensor([ 0.1588, -0.6739, 0.9272, 0.1877, 0.5815, 0.7158, -0.1305, -0.8672, 0.8932, -0.3480, 0.5963, -0.4821, 0.4355, 0.7216, 0.6033, 0.6483, -0.2183, 0.9263, 0.3853, 0.4165, 0.5594, 0.5004, 0.6511, 0.7634, -0.6957, -0.0052, 0.3889, 0.8045, -0.4385, 0.3754, -0.9979, -0.9813])
Indeed, the networks were initialized identically! Alternatively, we could have saved and loaded the network state.
Now we can train the network. We feed the desired hit position into the network and predict a possible initial state. For supervised training, we compare the network prediction to the ground truth from our training set.
opt_sup = adam(net_sup)
def supervised_loss(x, y, net=net_sup):
    prediction = math.native_call(net, y)
    return math.l2_loss(prediction - x)
print(f"Supervised loss (training set): {supervised_loss(x_train, y_train)}")
print(f"Supervised loss (test set): {supervised_loss(x_test, y_test)}")
Supervised loss (training set): (examplesᵇ=1000) 1.742 ± 0.996 (2e-01...9e+00)
Supervised loss (test set): (examplesᵇ=1000) 1.670 ± 0.930 (2e-01...9e+00)
for i in range(100):
    update_weights(net_sup, opt_sup, supervised_loss, x_train, y_train)
print(f"Supervised loss (training set): {supervised_loss(x_train, y_train)}")
print(f"Supervised loss (test set): {supervised_loss(x_test, y_test)}")
Supervised loss (training set): (examplesᵇ=1000) 0.347 ± 0.266 (2e-03...4e+00)
Supervised loss (test set): (examplesᵇ=1000) 0.330 ± 0.235 (3e-03...2e+00)
What? That's almost no progress! Feel free to run more iterations, but there is a deeper problem at work here. Before we get into that, let's train the network again, but with a differentiable physics loss function.
For the differentiable physics loss, we simulate the trajectory given the initial conditions predicted by the network. Then we can measure how close to the desired location the network got.
def physics_loss(y, net=net_dp):
    prediction = math.native_call(net, y)
    y_sim = simulate_hit(*prediction.vector)[0]
    return math.l2_loss(y_sim - y)
opt_dp = adam(net_dp)
for i in range(100):
    update_weights(net_dp, opt_dp, physics_loss, y_train)
print(f"Supervised loss (training set): {supervised_loss(x_train, y_train, net=net_dp)}")
print(f"Supervised loss (test set): {supervised_loss(x_test, y_test, net=net_dp)}")
Supervised loss (training set): (examplesᵇ=1000) 1.343 ± 0.847 (1e-01...8e+00)
Supervised loss (test set): (examplesᵇ=1000) 1.322 ± 0.775 (1e-01...6e+00)
This looks even worse! The differentiable physics network seems to stray even further from the ground truth. Well, we're not trying to match the ground truth, though. Let's instead measure how close to the desired location the network threw the object.
print(f"Supervised network (test set): {physics_loss(y_test, net=net_sup)}")
print(f"Diff.Phys. network (test set): {physics_loss(y_test, net=net_dp)}")
Supervised network (test set): (examplesᵇ=1000) 0.004 ± 0.009 (1e-10...2e-01)
Diff.Phys. network (test set): (examplesᵇ=1000) 6.24e-04 ± 1.3e-03 (3e-08...3e-02)
Now this is much more promising. The diff.phys. network hits the desired location very accurately, considering it was only trained for 100 iterations. With more training steps, this loss will go down even further, unlike the supervised network's.
So what is going on here? Why does the supervised network perform so poorly? The answer lies in the problem itself. The task is multi-modal, i.e. there are many initial states that will hit the same target. The network only gets the target position and must decide on a single initial state. With supervised training, there is no way to know which ground truth solution occurs in the test set. The best the network can do is average nearby solutions from the training set. But since the problem is non-linear, this will give only a rough guess.
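The multi-modality is easy to demonstrate directly: two different throws can hit the same target, but their average misses it. A small plain-Python sketch using the same physics (the specific values here are hand-picked for illustration; pos=0, height=1, gravity=1):

```python
import math

def hit(pos, height, vel, angle, gravity=1.0):
    """Closed-form hit position, mirroring simulate_hit."""
    vel_x, vel_y = math.cos(angle) * vel, math.sin(angle) * vel
    t = (vel_y + math.sqrt(vel_y**2 + 2 * gravity * height)) / gravity
    return pos + vel_x * t

# Two distinct initial states (pos, height, vel, angle) that both hit x = 2:
a = (0.0, 1.0, math.sqrt(2.0), 0.0)                # flat throw
b = (0.0, 1.0, math.sqrt(4.0 / 3.0), math.pi / 4)  # 45-degree throw
print(hit(*a), hit(*b))  # both land at x = 2.0

# Averaging the two valid solutions does NOT yield a valid solution:
avg = tuple((x + y) / 2 for x, y in zip(a, b))
print(hit(*avg))  # ≈ 2.36, missing the target
```

Because the physics is non-linear, the mean of two correct initial states overshoots the target, which is exactly the kind of blurry compromise the supervised network is forced into.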
The diff.phys. network completely ignores the ground truth solutions, which are not even passed to the physics_loss function. Instead, it learns to hit the desired spot, which is exactly what we want.
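The same principle can be shown in miniature without any deep-learning framework. Consider a one-weight "network" vel = w·y trained by backpropagating through the physics of a flat throw from pos=0, height=1, where the hit position is simply vel·√2. Everything here is illustrative, not part of ΦFlow:

```python
import math
import random

SQRT2 = math.sqrt(2.0)  # flight time for height=1, gravity=1, angle=0

def hit(vel):
    return vel * SQRT2  # hit position for a flat throw from pos=0

w = 0.0   # toy one-weight "network": predicts vel = w * target
lr = 0.01
random.seed(0)
for _ in range(500):
    y = random.uniform(1.0, 5.0)   # desired hit position
    vel = w * y                    # network prediction
    residual = hit(vel) - y        # differentiable-physics residual
    # chain rule through the simulator: dL/dw = residual * sqrt(2) * y
    w -= lr * residual * SQRT2 * y
print(w)  # converges to 1/sqrt(2), the exact inverse of the physics
```

No ground-truth initial states are used anywhere; the gradient flows through the simulator, and the "network" learns the exact inverse mapping, just as net_dp does at larger scale.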
We can visualize the difference by looking at a couple of trajectories.
predictions = math.stack({
    "Ground Truth": x_test.examples[:4],
    "Supervised": math.native_call(net_sup, y_test.examples[:4]),
    "Diff.Phys": math.native_call(net_dp, y_test.examples[:4]),
}, channel('curves'))
vis.plot(sample_trajectory(*predictions.vector), size=(16, 4))
<Figure size 1600x400 with 4 Axes>
We can see that the differentiable physics network matches the hit point much more precisely than the supervised network, as expected from the loss values.