Hugging Face

Atompack can reopen datasets directly from remote Hugging Face repositories and can also publish local datasets through a small API in atompack.hub.

Open Remote Datasets

A common workflow is to read Atompack shards directly from a remote dataset repository. For example, LeMaterial/Atompack exposes layouts such as omat/train and omol/train:

import atompack

db = atompack.hub.open(
    repo_id="LeMaterial/Atompack",
    path_in_repo="omat/train",
)
print(len(db))
print(db[0].energy)
db.close()

db = atompack.hub.open(
    repo_id="LeMaterial/Atompack",
    path_in_repo="omol/train",
)
batch = db.get_molecules([0, 1, 2])
db.close()

AtompackReader presents one flat index space whether the source is a single .atp file or a directory of shard files. Shard ordering is lexicographic by path, and the reader is read-only. Call close() when you are done.

Download Remote Data First

Download only:

import atompack

local_path = atompack.hub.download(
    repo_id="LeMaterial/Atompack",
    path_in_repo="omat/train",
)

print(local_path)

Open the downloaded data through the same reader API:

import atompack

db = atompack.hub.open_path(local_path)
print(db[0].energy)
db.close()

Upload A Local Dataset

Upload a single .atp file:

import atompack

atompack.hub.upload(
    "data/train.atp",
    repo_id="your-org/atompack-demo",
    path_in_repo="exports",
)

Upload a shard directory:

import atompack

atompack.hub.upload(
    "exports/omat/train",
    repo_id="your-org/atompack-demo",
    path_in_repo="omat/train",
)

Directory uploads include **/*.atp plus **/manifest.json when present.