Scrape Datasets from APEBench
APEBench tightly integrates its differentiable solver framework and therefore regenerates the training data procedurally for each run. This notebook shows how to export the generated arrays programmatically so they can be used in other settings, for example with PyTorch.
import jax.numpy as jnp
import apebench
Reading from a scenario
Let's instantiate the default scenario for 1d advection in difficulty mode.
advection_1d_difficulty = apebench.scenarios.difficulty.Advection()
Using the methods get_train_data() and get_test_data() procedurally generates the corresponding JAX arrays.
train_data = advection_1d_difficulty.get_train_data()
test_data = advection_1d_difficulty.get_test_data()
train_data.shape, test_data.shape
From here on, you could use your preferred way to serialize the data or use it further in your application.
# jnp.save("advection_1d_train_data.npy", train_data)
# jnp.save("advection_1d_test_data.npy", test_data)
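As one possible serialization route, the arrays can be round-tripped through NumPy's .npy format and read back without JAX. The sketch below uses a random stand-in array rather than real APEBench output; its shape (4, 6, 1, 16) is purely illustrative.

```python
import os
import tempfile

import numpy as np

# Stand-in for `train_data`; the shape is illustrative, not an APEBench default.
stand_in_train = (
    np.random.default_rng(0).normal(size=(4, 6, 1, 16)).astype(np.float32)
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "advection_1d_train_data.npy")
    # np.save also accepts a JAX array; it is converted to a NumPy array first.
    np.save(path, stand_in_train)
    reloaded = np.load(path)

print(reloaded.shape, reloaded.dtype)
```

The reloaded NumPy array can then be handed to whichever framework you use downstream.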
Modifying the scenario
The important attributes that affect the size of the generated data are:
num_train_samples
train_temporal_horizon
num_test_samples
test_temporal_horizon
Additionally, num_spatial_dims, num_points, and num_channels affect the trailing axes of the data arrays.
The seeds for data generation can be altered by:
train_seed
test_seed
modified_advection_1d_difficulty = apebench.scenarios.difficulty.Advection(
    num_train_samples=81,
    train_temporal_horizon=42,
    train_seed=-1,
    num_test_samples=3,
    test_temporal_horizon=101,
    test_seed=-3,
)
modified_train_data = modified_advection_1d_difficulty.get_train_data()
modified_test_data = modified_advection_1d_difficulty.get_test_data()
modified_train_data.shape, modified_test_data.shape
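To make the relation between attributes and array size concrete, here is a small sketch of a hypothetical helper, expected_shape, that is not part of APEBench. It assumes the layout (samples, time, channels, *spatial) and assumes the time axis holds the initial state plus one entry per step; the values num_channels=1 and num_points=160 are illustrative. Compare its result against the shapes printed above rather than treating it as authoritative.

```python
def expected_shape(
    num_samples, temporal_horizon, num_channels, num_points, num_spatial_dims
):
    """Array shape under the ASSUMED layout (samples, time, channels, *spatial)."""
    # ASSUMPTION: the time axis stores the initial state plus `temporal_horizon` steps.
    num_time_entries = temporal_horizon + 1
    spatial_axes = (num_points,) * num_spatial_dims
    return (num_samples, num_time_entries, num_channels) + spatial_axes


# Attribute values from the modified scenario; channels and points are illustrative.
train_shape_guess = expected_shape(
    num_samples=81,
    temporal_horizon=42,
    num_channels=1,
    num_points=160,
    num_spatial_dims=1,
)
print(train_shape_guess)
```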
Exporting Metadata
To get additional information on the data, it can be helpful to extract the attributes of the scenario. Since each scenario is a dataclass, its members can easily be converted into a dictionary.
Let's first print the representation of the scenario.
modified_advection_1d_difficulty
Then import the asdict function from the dataclasses module and convert the scenario to a dictionary.
from dataclasses import asdict
modified_metadata = asdict(modified_advection_1d_difficulty)
modified_metadata
You can dump this data to a JSON file or use it in any other way you see fit.
# import json
# with open("modified_advection_1d_difficulty.json", "w") as f:
#     json.dump(modified_metadata, f)
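The dataclass-to-JSON round trip can be tried without APEBench. The sketch below uses a hypothetical stand-in dataclass, ToyScenario, with three of the attributes named earlier. Passing default=str is a defensive assumption on my part: it stringifies any value the json module cannot encode, which may or may not be needed for real scenario metadata.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ToyScenario:  # hypothetical stand-in, not an APEBench class
    num_train_samples: int = 81
    train_temporal_horizon: int = 42
    train_seed: int = -1


metadata = asdict(ToyScenario())

# `default=str` only kicks in for values json cannot encode natively.
serialized = json.dumps(metadata, indent=2, default=str)
restored = json.loads(serialized)

print(restored)
```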
Using the scraping API
APEBench provides a structured way to get train data, test data, and metadata from a scenario.
train_data_ks, test_data_ks, meta_data_ks = apebench.scraper.scrape_data_and_metadata(
    scenario="diff_ks"
)
train_data_ks.shape, test_data_ks.shape
meta_data_ks
You can provide any keyword argument that matches the attributes of the scenario to modify the produced data. Let's decrease the resolution.
apebench.scraper.scrape_data_and_metadata(scenario="diff_ks", num_points=64)[0].shape
Having the scraper write to disk
If you provide a folder name, the scraper does not return the data but instead writes it to disk as .npy files and dumps the metadata as a JSON file.
# apebench.scraper.scrape_data_and_metadata(".", scenario="diff_ks")
# Creates the following files:
# 1d_diff_ks_train.npy
# 1d_diff_ks_test.npy
# 1d_diff_ks.json
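Files written this way can be read back with NumPy and the json module alone. The sketch below defines a hypothetical helper, load_scraped, that follows the file naming listed above. So that it runs without APEBench, it first writes small stand-in files whose shapes and metadata are made up.

```python
import json
import os
import tempfile

import numpy as np


def load_scraped(folder, prefix):
    """Load `<prefix>_train.npy`, `<prefix>_test.npy`, and `<prefix>.json`."""
    train = np.load(os.path.join(folder, f"{prefix}_train.npy"))
    test = np.load(os.path.join(folder, f"{prefix}_test.npy"))
    with open(os.path.join(folder, f"{prefix}.json")) as f:
        meta = json.load(f)
    return train, test, meta


# Stand-in files (illustrative shapes and metadata) so the sketch is runnable.
with tempfile.TemporaryDirectory() as tmp_dir:
    np.save(os.path.join(tmp_dir, "1d_diff_ks_train.npy"), np.zeros((2, 3, 1, 8)))
    np.save(os.path.join(tmp_dir, "1d_diff_ks_test.npy"), np.zeros((1, 5, 1, 8)))
    with open(os.path.join(tmp_dir, "1d_diff_ks.json"), "w") as f:
        json.dump({"num_points": 8}, f)

    train, test, meta = load_scraped(tmp_dir, "1d_diff_ks")

print(train.shape, test.shape, meta)
```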
Creating a collection of datasets
You can loop over a list of dictionaries that contain scenarios and additional attributes to create a collection of datasets.
Your scenario name must match the short identifier as detailed in apebench.scenarios.scenario_dict.
# scenario_list = [
# {"scenario": "diff_adv", "num_train_samples": 81},
# {"scenario": "diff_ks", "num_points": 64},
# ]
# for scenario in scenario_list:
#     apebench.scraper.scrape_data_and_metadata(".", **scenario)
Export of curated lists
APEBench ships with curated lists of scenarios, for example the set of datasets used for the original APEBench paper.
The export for CURATION_APEBENCH_V1 should take ~3min on a modern GPU and should produce ~40GB of data.
# from tqdm import tqdm
# import os
# DATA_PATH = "data"
# os.makedirs(DATA_PATH, exist_ok=True)
# for config in tqdm(apebench.scraper.CURATION_APEBENCH_V1):
#     apebench.scraper.scrape_data_and_metadata(DATA_PATH, **config)
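Before starting a large export, it can help to estimate how much disk space a single array will need. The sketch below is a hypothetical helper, not part of APEBench. It assumes 4 bytes per element (float32) and ignores the small .npy header, and the example shape is illustrative.

```python
import math


def estimated_size_gib(shape, bytes_per_element=4):
    """Rough on-disk size in GiB for a dense array of the given shape."""
    # ASSUMPTION: float32 storage (4 bytes per element); the .npy header is ignored.
    return math.prod(shape) * bytes_per_element / 1024**3


# Illustrative shape in the layout (samples, time, channels, points).
size_gib = estimated_size_gib((50, 51, 1, 160))
print(f"{size_gib:.5f} GiB")
```

Summing this estimate over the train and test arrays of every entry in a curated list gives a rough lower bound on the space the export will occupy.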