atompack

Atompack: append-only molecule storage for atomistic ML datasets.

A Python API backed by a Rust storage engine for writing, reopening, and serving molecular structures with forces, energies, charges, stress, and custom properties. Built for dataset pipelines, random-access reads, batched array loading, and ASE interoperability.

Examples

Create a molecule and add properties:

>>> import atompack
>>> import numpy as np
>>>
>>> positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], dtype=np.float32)
>>> atomic_numbers = np.array([6, 8], dtype=np.uint8)
>>> mol = atompack.Molecule.from_arrays(positions, atomic_numbers)
>>> mol.energy = -123.456
>>> mol.forces = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dtype=np.float32)

Save to database:

>>> db = atompack.Database("data.atp", overwrite=True)
>>> db.add_molecule(mol)
>>> db.flush()

Read back from database:

>>> db = atompack.Database.open("data.atp")
>>> mol = db[0]
>>> print(mol.energy)
-123.456

Database.open(…) is read-only by default and uses mmap. Reopen with Database.open(path, mmap=False) if you want to append molecules.

Classes

Atom

Low-level PyO3-backed atom with 3D coordinates and atomic number.

Database

Low-level PyO3-backed database for storing molecules with compression.

Molecule

Low-level PyO3-backed molecule with optional builtin and custom properties.

Functions

add_ase_batch(db, atoms_list, *[, copy_info, ...])

Write many ASE Atoms objects efficiently, preserving supported metadata.

from_ase(atoms[, energy, forces, charges, velocities, ...])

Convert one ASE Atoms object to an atompack Molecule.

to_ase(molecule, *[, attach_calc, calc_mode, ...])

Convert an atompack molecule to ase.Atoms.

to_ase_batch(source[, indices, attach_calc, ...])

Convert many atompack molecules to ASE Atoms efficiently.

Package Contents

class atompack.Atom(x: float, y: float, z: float, atomic_number: int)

Low-level PyO3-backed atom with 3D coordinates and atomic number.

Parameters

x : float

X coordinate in Angstroms

y : float

Y coordinate in Angstroms

z : float

Z coordinate in Angstroms

atomic_number : int

Atomic number (1=H, 6=C, 8=O, etc.)

Attributes

atomic_number : int

The atomic number

position() -> tuple[float, float, float]

Get the position as a tuple.

Returns

tuple of float

(x, y, z) coordinates in Angstroms

property atomic_number: int

Get the atomic number.

Returns

int

Atomic number (1=H, 6=C, 8=O, etc.)

distance_to(other: PyAtom) -> float

Calculate distance to another atom.

Parameters

other : PyAtom

The other atom

Returns

float

Distance in Angstroms
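
As a cross-check, the distance can be recomputed from the two position tuples. The sketch below assumes distance_to returns the Euclidean distance, which the description above does not state explicitly. reference_distance is a hypothetical helper, not part of atompack, and works on any object that exposes position().

```python
import math


def reference_distance(a, b) -> float:
    """Euclidean distance between two atom-like objects.

    Hypothetical helper: it relies only on the documented position()
    method, which returns an (x, y, z) tuple in Angstroms.
    """
    return math.dist(a.position(), b.position())
```

If the Euclidean assumption holds, reference_distance(a, b) should be close to a.distance_to(b) for any two atompack.Atom objects.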

class atompack.Database(path: str, compression: str = 'none', level: int = 3, overwrite: bool = False)

Low-level PyO3-backed database for storing molecules with compression.

Supports parallel writes and random access reads, making it useful for training and dataset preparation workflows.

Parameters

path : str

Path to database file

compression : {"none", "lz4", "zstd"}, default="none"

Compression type

level : int, default=3

Compression level for zstd (1-22)

overwrite : bool, default=False

If True, recreates the database file when it already exists.

static open(path: str, mmap: bool = True, populate: bool = False) -> PyAtomDatabase

Open an existing database.

By default this uses a memory-mapped index and is read-only. Pass mmap=False to reopen the database for appends.

Parameters

path : str

Path to existing database file

mmap : bool, default=True

If True, use a memory-mapped index and return a read-only handle. If False, load the index into memory and allow writes.

populate : bool, default=False

Only valid when mmap=True. Prefaults mapped pages on Linux.
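
The two open modes can be wrapped in a small helper. This is a sketch, not part of atompack: open_kwargs and reopen are hypothetical names, and rejecting populate=True together with mmap=False up front is an assumption based on the "Only valid when mmap=True" note above.

```python
def open_kwargs(writable: bool = False, populate: bool = False) -> dict:
    """Build keyword arguments for Database.open (hypothetical helper).

    writable=False -> mmap=True  (memory-mapped index, read-only handle)
    writable=True  -> mmap=False (index loaded into memory, appends allowed)
    """
    if writable and populate:
        # populate is documented as only valid when mmap=True.
        raise ValueError("populate=True requires a read-only (mmap=True) handle")
    return {"mmap": not writable, "populate": populate}


def reopen(path: str, *, writable: bool = False, populate: bool = False):
    """Open an existing database read-only or for appends."""
    import atompack  # imported lazily so open_kwargs works without atompack

    return atompack.Database.open(path, **open_kwargs(writable, populate))
```

For example, db = reopen("data.atp", writable=True) followed by db.add_molecule(mol) and db.flush() would append to an existing file.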

add_molecule(molecule: PyMolecule) -> None

Add a single molecule to the database.

Parameters

molecule : PyMolecule

Molecule to add

add_molecules(molecules: Sequence[PyMolecule]) -> None

Add multiple molecules in parallel.

Parameters

molecules : sequence of PyMolecule

Molecules to add

add_arrays_batch(positions: numpy.ndarray, atomic_numbers: numpy.ndarray, *, energy: numpy.ndarray | None = None, forces: numpy.ndarray | None = None, charges: numpy.ndarray | None = None, velocities: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, stress: numpy.ndarray | None = None, pbc: numpy.ndarray | None = None, name: Sequence[str] | None = None, properties: dict[str, Any] | None = None, atom_properties: dict[str, Any] | None = None) -> None

Add a stacked batch of molecules directly from numpy arrays.

Custom properties can be supplied as batched columns via properties (per-molecule) and atom_properties (per-atom).

get_molecule(index: int) -> PyMolecule

Get a molecule by index.

Parameters

index : int

Molecule index (0-based)

Returns

PyMolecule

The requested molecule

get_molecules(indices: Sequence[int]) -> list[PyMolecule]

Get multiple molecules by indices (batch read).

Parameters

indices : sequence of int

Molecule indices (0-based)

Returns

list of PyMolecule

The requested molecules
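
Batch reads pair naturally with fixed-size index chunks, for example when feeding a training loop. The sketch below is an illustration under stated assumptions: iter_molecule_batches is a hypothetical helper, and the total molecule count is passed in explicitly because this page does not document a length API.

```python
from typing import Iterator, List


def iter_molecule_batches(db, n_molecules: int, batch_size: int = 256) -> Iterator[List]:
    """Yield lists of molecules read with db.get_molecules (hypothetical helper).

    db only needs the documented get_molecules(indices) method.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_molecules, batch_size):
        stop = min(start + batch_size, n_molecules)
        # One batched call per chunk of 0-based indices.
        yield db.get_molecules(list(range(start, stop)))
```

Calling get_molecules_flat instead of get_molecules in the loop body would yield stacked arrays rather than molecule objects.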

get_molecules_flat(indices: Sequence[int]) -> dict[str, Any]

Get multiple molecules as contiguous batch arrays.

Returns a mapping containing the stacked builtin arrays plus nested properties and atom_properties dictionaries when present.

flush() -> None

Flush and save the database to disk.

This writes the index and ensures all data is persisted.

class atompack.Molecule(positions: numpy.ndarray, atomic_numbers: numpy.ndarray, *, energy: float | None = None, forces: numpy.ndarray | None = None, charges: numpy.ndarray | None = None, velocities: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, stress: numpy.ndarray | None = None, pbc: tuple[bool, bool, bool] | None = None, name: str | None = None)

Low-level PyO3-backed molecule with optional builtin and custom properties.

Parameters

positions : ndarray of float32, shape (n_atoms, 3)

Atomic positions

atomic_numbers : ndarray of uint8, shape (n_atoms,)

Atomic numbers

Attributes

forces : ndarray of float32, shape (n_atoms, 3), optional

Per-atom forces

energy : float, optional

Total energy

charges : ndarray of float64, shape (n_atoms,), optional

Per-atom partial charges

velocities : ndarray of float32, shape (n_atoms, 3), optional

Per-atom velocities

cell : ndarray of float64, shape (3, 3), optional

Unit cell for periodic systems

positions : ndarray of float32, shape (n_atoms, 3)

Atomic positions (read-only)

atomic_numbers : ndarray of uint8, shape (n_atoms,)

Atomic numbers (read-only)

static from_arrays(positions: numpy.ndarray, atomic_numbers: numpy.ndarray, *, energy: float | None = None, forces: numpy.ndarray | None = None, charges: numpy.ndarray | None = None, velocities: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, stress: numpy.ndarray | None = None, pbc: tuple[bool, bool, bool] | None = None, name: str | None = None) -> PyMolecule

Create a molecule from numpy arrays (fast path).

Parameters

positions : ndarray of float32, shape (n_atoms, 3)

Atomic positions (Angstroms)

atomic_numbers : ndarray of uint8, shape (n_atoms,)

Atomic numbers

atoms() -> list[PyAtom]

Get the list of atoms.

Returns

list of PyAtom

All atoms in the molecule

to_owned() -> PyMolecule

Materialize the molecule into an owned, self-contained object.

This is useful before pickling or sending a database-fetched lazy view across process boundaries.
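
A typical use is preparing fetched molecules for a worker process. The following is a minimal sketch: serialize_molecules is a hypothetical helper, and it assumes owned molecules are picklable, as the note above suggests.

```python
import pickle


def serialize_molecules(molecules) -> bytes:
    """Pickle molecules after materializing each one (hypothetical helper)."""
    # to_owned() turns a lazy, database-backed view into a self-contained
    # object, so the pickle payload does not depend on the open database.
    owned = [mol.to_owned() for mol in molecules]
    return pickle.dumps(owned)
```

The receiving process can restore the list with pickle.loads, provided atompack is importable there.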

property forces: numpy.ndarray | None

Per-atom forces.

Returns

ndarray of float32, shape (n_atoms, 3) or None

Forces on each atom, or None if not set

property energy: float | None

Total energy.

Returns

float or None

Energy value, or None if not set

property charges: numpy.ndarray | None

Per-atom partial charges.

Returns

ndarray of float64, shape (n_atoms,) or None

Charges on each atom, or None if not set

property velocities: numpy.ndarray | None

Per-atom velocities.

Returns

ndarray of float32, shape (n_atoms, 3) or None

Velocities of each atom, or None if not set

property cell: numpy.ndarray | None

Unit cell for periodic systems.

Returns

ndarray of float64, shape (3, 3) or None

Unit cell vectors, or None if not set

property stress: numpy.ndarray | None

Virial stress tensor.

Returns

ndarray of float64, shape (3, 3) or None

Stress tensor, or None if not set

property pbc: tuple[bool, bool, bool] | None

Periodic boundary condition flags.

Returns

tuple of bool or None

Periodicity along (x, y, z), or None if not set

property positions: numpy.ndarray

Atomic positions (read-only).

Returns

ndarray of float32, shape (n_atoms, 3)

Position of each atom in Angstroms

property atomic_numbers: numpy.ndarray

Atomic numbers (read-only).

Returns

ndarray of uint8, shape (n_atoms,)

Atomic number of each atom

get_property(key: str) -> Any

Get a custom property by key.

Parameters

key : str

Property key

Returns

Any

Property value

Raises

KeyError

If property key does not exist

set_property(key: str, value: Any, *, scope: Literal['molecule', 'atom'] | None = None) -> None

Set a custom property.

Parameters

key : str

Property key

value : Any

Property value

scope : {"molecule", "atom"}, optional

Property scope. Defaults to molecule for new keys.

property_keys(*, scope: Literal['molecule', 'atom'] | None = None) -> list[str]

Get all property keys.

Returns

list of str

All property keys

has_property(key: str, *, scope: Literal['molecule', 'atom'] | None = None) -> bool

Check if a property exists.

Parameters

key : str

Property key

scope : {"molecule", "atom"}, optional

Restrict the lookup to one scope.

Returns

bool

True if property exists, False otherwise

delete_property(key: str) -> None

Delete a custom property by key.
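
The property methods compose into a simple snapshot of a molecule's custom properties. This is a sketch under stated assumptions: custom_properties is a hypothetical helper, and it assumes every key returned by property_keys can be read back with get_property.

```python
from typing import Any, Dict, Optional


def custom_properties(mol, scope: Optional[str] = None) -> Dict[str, Any]:
    """Collect custom properties into a dict (hypothetical helper).

    scope may be "molecule", "atom", or None for all keys, mirroring the
    keyword-only scope argument of property_keys.
    """
    return {key: mol.get_property(key) for key in mol.property_keys(scope=scope)}
```

For a molecule on which set_property("source", "dft") was called, custom_properties(mol) would be expected to include the "source" key.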

atompack.add_ase_batch(db, atoms_list, *, copy_info=True, copy_arrays=True, info=None, batch_size=512)

Write many ASE Atoms objects efficiently, preserving supported metadata.

atompack.from_ase(atoms, energy=None, forces=None, charges=None, velocities=None, cell=None, stress=None, copy_info=True, copy_arrays=True, info=None)

Convert one ASE Atoms object to an atompack Molecule.

Custom values from atoms.info, atoms.arrays, calculator results, and explicit info= overrides are stored as molecule-scope properties. Array shape is not used to infer atom-property scope during ingestion.

atompack.to_ase(molecule, *, attach_calc=True, calc_mode='singlepoint', copy_info=True, copy_arrays=True)

Convert an atompack molecule to ase.Atoms.

The conversion reads directly from the molecule getters, so it works for both owned and view-backed molecules without going through molecule.atoms(). That keeps the path compatible with lazy SOA-backed molecules, although ASE object creation still requires Python/NumPy allocations.

Mapping rules:

  • positions and atomic_numbers always become the ASE geometry.

  • cell and pbc are copied when present.

  • velocities are attached with atoms.set_velocities(...).

  • energy, forces, stress, and charges are attached through an ASE calculator when attach_calc=True. calc_mode="singlepoint" preserves ASE’s snapshot semantics, while calc_mode="nocopy" is faster but does not snapshot the atoms state.

  • Custom properties shaped like per-atom arrays are stored in atoms.arrays when copy_arrays=True.

  • Remaining custom properties are stored in atoms.info when copy_info=True.
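
The last two rules amount to a routing decision per custom property. The sketch below restates that decision in plain Python; it is not atompack's implementation. route_custom_properties is a hypothetical name, and two points are assumptions: that "shaped like a per-atom array" means a non-string sequence whose length equals the atom count, and that per-atom values are dropped rather than moved to atoms.info when copy_arrays=False.

```python
from typing import Any, Dict, Tuple


def route_custom_properties(
    props: Dict[str, Any],
    n_atoms: int,
    *,
    copy_info: bool = True,
    copy_arrays: bool = True,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split custom properties into (arrays, info) dicts.

    Assumption: a value counts as per-atom when it is a sized object
    (not str, bytes, or dict) with len(value) == n_atoms.
    """
    arrays: Dict[str, Any] = {}
    info: Dict[str, Any] = {}
    for key, value in props.items():
        per_atom = (
            not isinstance(value, (str, bytes, dict))
            and hasattr(value, "__len__")
            and len(value) == n_atoms
        )
        if per_atom:
            if copy_arrays:
                arrays[key] = value  # would land in atoms.arrays
        elif copy_info:
            info[key] = value  # would land in atoms.info
    return arrays, info
```

With three atoms, a property such as tags=[1, 2, 3] would be routed to the arrays dict, while a scalar such as temperature=300.0 would be routed to the info dict.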

Parameters

molecule : atompack.Molecule

Molecule to convert.

attach_calc : bool, default=True

Attach supported builtin results through an ASE calculator.

calc_mode : {"singlepoint", "nocopy", "none"}, default="singlepoint"

Calculator attachment mode. "singlepoint" uses ASE’s standard snapshotting calculator, "nocopy" skips the internal atoms copy for higher throughput, and "none" suppresses calculator attachment.

copy_info : bool, default=True

Copy non-array custom properties into atoms.info.

copy_arrays : bool, default=True

Copy per-atom custom arrays into atoms.arrays.

Returns

ase.Atoms

Converted ASE object.

atompack.to_ase_batch(source, indices=None, *, attach_calc=True, calc_mode='singlepoint', copy_info=True, copy_arrays=True)

Convert many atompack molecules to ASE Atoms efficiently.