APEbench 2D Datasets

The APEbench 2D datasets are a collection of two-dimensional partial differential equation (PDE) simulations. They are generated with the exponax package and are intended as physics-based training and evaluation data for machine learning research.

Installation

To generate these datasets, you'll need to set up a separate conda environment with the required dependencies:

conda create -n exponax python=3.12
conda activate exponax
pip install -r pdetransformer/data/simulations_apebench/requirementsExponax.txt

Available Datasets

The following PDEs are supported in 2D:

Advection (`adv`)

Data Shape: [s=60, t=30, c=1, x=2048, y=2048]
Channels: Density
Varied Parameters: Velocity X/Y (random), Initial Conditions

Diffusion (`diff`)

Data Shape: [s=60, t=30, c=1, x=2048, y=2048]
Channels: Density
Varied Parameters: Viscosity X/Y (random), Initial Conditions

Advection-Diffusion (`adv_diff`)

Data Shape: [s=60, t=30, c=1, x=2048, y=2048]
Channels: Density
Varied Parameters: Velocity X/Y, Viscosity X/Y (random), Initial Conditions

Dispersion (`disp`)

Data Shape: [s=60, t=30, c=1, x=2048, y=2048]
Channels: Density
Varied Parameters: Dispersivity X/Y (random), Initial Conditions

Hyper-Diffusion (`hyp_diff`)

Data Shape: [s=60, t=30, c=1, x=2048, y=2048]
Channels: Density
Varied Parameters: Hyper-Diffusivity (random), Initial Conditions

Burgers Equation (`burgers`)

Data Shape: [s=60, t=30, c=2, x=2048, y=2048]
Channels: Velocity X/Y
Varied Parameters: Viscosity (random), Initial Conditions

Korteweg-de Vries Equation (`kdv`)

Data Shape: [s=60, t=30, c=2, x=2048, y=2048]
Channels: Velocity X/Y
Varied Parameters: Domain Extent, Viscosity (random), Initial Conditions

Kuramoto-Sivashinsky Equation (`ks`)

Data Shape: [s=60, t=30, c=1, x=2048, y=2048]
Channels: Density
Varied Parameters: Domain Extent (random), Initial Conditions
Longer Rollout Test Set: [s=5, t=200, c=1, x=2048, y=2048]

Fisher-KPP Equation (`fisher`)

Data Shape: [s=60, t=30, c=1, x=2048, y=2048]
Channels: Concentration
Varied Parameters: Diffusivity, Reactivity (random), Initial Conditions

Gray-Scott Equation

Multiple configurations available: gs_alpha, gs_beta, gs_gamma, gs_delta, gs_epsilon, gs_theta, gs_iota, gs_kappa

Data Shape: [s=10, t=30, c=2, x=2048, y=2048]
Channels: Concentration A, Concentration B
Varied Parameters: Initial Conditions
Longer Rollout Test Sets: Available for gs_alpha, gs_beta, gs_gamma, gs_epsilon with [s=3, t=100, c=2, x=2048, y=2048]

Swift-Hohenberg Equation (`sh`)

Data Shape: [s=60, t=30, c=1, x=2048, y=2048]
Channels: Concentration
Varied Parameters: Reactivity, Critical Number (random), Initial Conditions

Navier-Stokes: Decaying Turbulence (`decay_turb`)

Data Shape: [s=60, t=30, c=1, x=2048, y=2048]
Channels: Vorticity
Varied Parameters: Viscosity (random), Initial Conditions
Longer Rollout Test Set: [s=5, t=200, c=1, x=2048, y=2048]

Navier-Stokes: Kolmogorov Flow (`kolm_flow`)

Data Shape: [s=60, t=30, c=1, x=2048, y=2048]
Channels: Vorticity
Varied Parameters: Viscosity (random), Initial Conditions
Longer Rollout Test Set: [s=5, t=200, c=1, x=2048, y=2048]
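
Because every dataset is stored at 2048x2048, the listed shapes translate into substantial storage requirements. The following minimal sketch estimates the uncompressed size from the shapes above. It assumes 4 bytes per value (float32); the actual on-disk dtype and any compression are not stated here, so treat the numbers as rough estimates only.

```python
import math

# Shapes [s, t, c, x, y] copied from the dataset listings above.
SHAPES = {
    "adv": (60, 30, 1, 2048, 2048),
    "burgers": (60, 30, 2, 2048, 2048),
    "gs_alpha": (10, 30, 2, 2048, 2048),
}

# Assumption: values are stored as float32 (4 bytes each).
BYTES_PER_VALUE = 4


def size_gib(shape, bytes_per_value=BYTES_PER_VALUE):
    """Return the uncompressed size of an array with the given shape in GiB."""
    return math.prod(shape) * bytes_per_value / 2**30


for tag, shape in SHAPES.items():
    print(f"{tag}: {size_gib(shape):.3f} GiB")
```

Under the float32 assumption, a single-channel dataset with s=60 and t=30 comes to roughly 28 GiB, and a two-channel one to about twice that.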

Data Generation

To generate these datasets, use the simulation.py script with the following arguments:

python simulation.py --pde <pde_type> --out_name <output_name> --out_path <output_path> --num_sims <number_of_simulations> [--gpu_id <gpu_id>]

For example, to generate the advection dataset:

python simulation.py --pde adv --out_name adv --out_path ./datasets --num_sims 100

The script will automatically create visualization images for each simulation in a directory named after your output name.
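
To generate several datasets in one go, the command above can be assembled programmatically. The sketch below uses only the flags documented in this section. The helper name `build_command` and the choice to reuse the PDE tag as the output name are illustrative assumptions, not part of simulation.py.

```python
import subprocess
import sys


def build_command(pde, out_path, num_sims, out_name=None, gpu_id=None):
    """Assemble a simulation.py invocation from the documented flags."""
    cmd = [
        sys.executable, "simulation.py",
        "--pde", pde,
        "--out_name", out_name or pde,  # assumption: default the name to the PDE tag
        "--out_path", out_path,
        "--num_sims", str(num_sims),
    ]
    if gpu_id is not None:
        cmd += ["--gpu_id", str(gpu_id)]
    return cmd


if __name__ == "__main__":
    for pde in ["adv", "diff", "burgers"]:
        cmd = build_command(pde, "./datasets", num_sims=60)
        print(" ".join(cmd))
        # Uncomment to run the generation (requires the exponax environment):
        # subprocess.run(cmd, check=True)
```

The loop only prints the commands by default, so they can be inspected before any long-running simulation is started.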

Note

Store the datasets in the folder specified under paths.PBDL_index.2D_APE in the environment YAML file env/local.yaml. The dataloader looks for them there by default.
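
As an illustration of how a dotted key such as paths.PBDL_index.2D_APE addresses a nested entry, the sketch below resolves it against a plain dictionary. The nested layout, the `lookup` helper, and the path `/data/apebench_2d` are hypothetical placeholders; the real structure of env/local.yaml may differ.

```python
from functools import reduce

# Hypothetical in-memory stand-in for env/local.yaml; the nesting is assumed
# from the dotted key and the path is a placeholder.
config = {"paths": {"PBDL_index": {"2D_APE": "/data/apebench_2d"}}}


def lookup(cfg, dotted_key):
    """Walk a nested dict following the parts of a dotted key."""
    return reduce(lambda node, key: node[key], dotted_key.split("."), cfg)


print(lookup(config, "paths.PBDL_index.2D_APE"))
```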

Data Format

Each dataset consists of:

Multiple simulations (specified by --num_sims)
Each simulation contains multiple timesteps (t=30 by default)
Each timestep has one or more channels (c=1 or c=2 depending on the PDE)
The spatial resolution is 2048x2048 for all 2D datasets; for training and inference, it is downsampled to 256x256.
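
Going from 2048x2048 to 256x256 corresponds to a reduction factor of 8 per spatial axis. One simple way to do this is block averaging, sketched below with NumPy. This is an assumption for illustration: the downsampling method actually used by the dataloader is not specified here, and the `downsample` helper is hypothetical.

```python
import numpy as np


def downsample(field, factor):
    """Block-average the last two (spatial) axes by an integer factor."""
    *lead, nx, ny = field.shape
    if nx % factor or ny % factor:
        raise ValueError("spatial size must be divisible by factor")
    # Split each spatial axis into (coarse index, position within block).
    blocks = field.reshape(*lead, nx // factor, factor, ny // factor, factor)
    # Average over the two within-block axes.
    return blocks.mean(axis=(-3, -1))


# Small stand-in with layout [s, t, c, x, y]; real data would be 2048x2048.
data = np.random.rand(2, 3, 1, 64, 64)
coarse = downsample(data, factor=8)
print(coarse.shape)  # (2, 3, 1, 8, 8)
```

The same call with `factor=8` on a 2048x2048 field yields 256x256 while leaving the simulation, time, and channel axes untouched.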

Simulation Type Label

The PDE type is encoded as a class label in the metadata, which can be used as an input to the model. The mapping is shown in the following table:

PDE Type	Class Label	Dataset tag
Advection	1	`adv`
Diffusion	2	`diff`
Advection-Diffusion	3	`adv_diff`
Dispersion	4	`disp`
Hyper-Diffusion	5	`hyp_diff`
Burgers Equation	6	`burgers`
Korteweg-de Vries	7	`kdv`
Kuramoto-Sivashinsky	8	`ks`
Fisher-KPP	9	`fisher`
Gray-Scott	10	`gs_*`
Swift-Hohenberg	11	`sh`
Navier-Stokes: Decaying Turbulence	15	`decay_turb`
Navier-Stokes: Kolmogorov Flow	16	`kolm_flow`

Note

A class label equal to num_classes=1000 is used as a special indicator that the PDE type is unknown and must be inferred from the data itself.
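
The table and the special unknown label can be captured in a small lookup, sketched below. The values are copied from the table above; the helper `class_label` and its choice to fall back to the unknown label for unrecognized tags are illustrative assumptions, not the project's actual implementation.

```python
# Special label indicating that the PDE type is unknown (num_classes=1000).
UNKNOWN_LABEL = 1000

# Dataset tag -> class label, copied from the table above.
PDE_LABELS = {
    "adv": 1,
    "diff": 2,
    "adv_diff": 3,
    "disp": 4,
    "hyp_diff": 5,
    "burgers": 6,
    "kdv": 7,
    "ks": 8,
    "fisher": 9,
    "sh": 11,
    "decay_turb": 15,
    "kolm_flow": 16,
}

# All Gray-Scott configurations (gs_*) share one label.
GRAY_SCOTT_LABEL = 10


def class_label(tag):
    """Map a dataset tag to its class label (hypothetical helper)."""
    if tag.startswith("gs_"):
        return GRAY_SCOTT_LABEL
    # Assumption: unrecognized tags are treated as "unknown PDE type".
    return PDE_LABELS.get(tag, UNKNOWN_LABEL)


print(class_label("adv"), class_label("gs_alpha"), class_label("kolm_flow"))
```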

Visualization

The render.py script visualizes the generated data. In addition, simulation.py automatically creates visualization images for each simulation during generation.
