atompack¶

Atompack: append-only molecule storage for atomistic ML datasets.

A Python API backed by a Rust storage engine for writing, reopening, and serving molecular structures with forces, energies, charges, stress, and custom properties. Built for dataset pipelines, random-access reads, batched array loading, and ASE interoperability.

Examples¶

Create a molecule and add properties:

>>> import atompack
>>> import numpy as np
>>>
>>> positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], dtype=np.float32)
>>> atomic_numbers = np.array([6, 8], dtype=np.uint8)
>>> mol = atompack.Molecule.from_arrays(positions, atomic_numbers)
>>> mol.energy = -123.456
>>> mol.forces = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dtype=np.float32)

Save to database:

>>> db = atompack.Database("data.atp", overwrite=True)
>>> db.add_molecule(mol)
>>> db.flush()

Read back from database:

>>> db = atompack.Database.open("data.atp")
>>> mol = db[0]
>>> print(mol.energy)
-123.456

Database.open(…) is read-only by default and uses mmap. Reopen with Database.open(path, mmap=False) if you want to append molecules.
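The append workflow described above could be wrapped in a small helper, sketched below. The name `append_and_flush` is hypothetical and not part of atompack. It assumes `db` is a writable handle, for example one returned by `Database.open(path, mmap=False)`, and it calls only the documented `add_molecule` and `flush` methods.

```python
def append_and_flush(db, molecules):
    """Append molecules to a writable database handle and persist them.

    `db` is assumed to be a writable atompack.Database, for example one
    returned by atompack.Database.open(path, mmap=False). This helper is
    illustrative only and is not part of atompack.
    """
    count = 0
    for mol in molecules:
        db.add_molecule(mol)  # documented single-molecule append
        count += 1
    db.flush()  # documented: writes the index and persists the data
    return count
```

A typical call would be `db = atompack.Database.open("data.atp", mmap=False)` followed by `append_and_flush(db, new_molecules)`.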

Classes¶

`Atom`	Low-level PyO3-backed atom with 3D coordinates and atomic number.
`Database`	Low-level PyO3-backed database for storing molecules with compression.
`Molecule`	Low-level PyO3-backed molecule with optional builtin and custom properties.

Functions¶

`add_ase_batch`(db, atoms_list, *[, copy_info, ...])	Write many ASE Atoms objects efficiently, preserving supported metadata.
`from_ase`(atoms[, energy, forces, charges, velocities, ...])	Convert one ASE Atoms object to an atompack Molecule.
`to_ase`(molecule, *[, attach_calc, calc_mode, ...])	Convert an atompack molecule to `ase.Atoms`.
`to_ase_batch`(source[, indices, attach_calc, ...])	Convert many atompack molecules to ASE Atoms efficiently.

Package Contents¶

class atompack.Atom(x: float, y: float, z: float, atomic_number: int)¶

Low-level PyO3-backed atom with 3D coordinates and atomic number.

Parameters¶

x (float): X coordinate in Angstroms
y (float): Y coordinate in Angstroms
z (float): Z coordinate in Angstroms
atomic_number (int): Atomic number (1=H, 6=C, 8=O, etc.)

Attributes¶

atomic_number (int): The atomic number

position() → tuple[float, float, float]¶

Get the position as a tuple.

Returns¶

tuple of float: (x, y, z) coordinates in Angstroms

property atomic_number: int¶

Get the atomic number.

Returns¶

int: Atomic number (1=H, 6=C, 8=O, etc.)

distance_to(other: PyAtom) → float¶

Calculate distance to another atom.

Parameters¶

other (PyAtom): The other atom

Returns¶

float: Distance in Angstroms
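As a cross-check for `distance_to`, the same quantity can be computed from `position()` with the standard library. This sketch assumes `distance_to` is a plain Euclidean distance with no periodic-image handling, which the reference above does not state. The helper name `euclidean_distance` is hypothetical.

```python
import math


def euclidean_distance(atom_a, atom_b):
    """Straight-line distance between two atoms.

    Both arguments only need a position() method returning (x, y, z),
    as documented for atompack.Atom. The result is in the same units as
    the coordinates (Angstroms for atompack atoms).
    """
    return math.dist(atom_a.position(), atom_b.position())
```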

class atompack.Database(path: str, compression: str = 'none', level: int = 3, overwrite: bool = False)¶

Low-level PyO3-backed database for storing molecules with compression.

Supports parallel writes and random-access reads, making it useful for training and dataset preparation workflows.

Parameters¶

path (str): Path to database file
compression ({"none", "lz4", "zstd"}, default="none"): Compression type
level (int, default=3): Compression level for zstd (1-22)
overwrite (bool, default=False): If True, recreates the database file when it already exists.

static open(path: str, mmap: bool = True, populate: bool = False) → PyAtomDatabase¶

Open an existing database.

By default this uses a memory-mapped index and is read-only. Pass mmap=False to reopen the database for appends.

Parameters¶

path (str): Path to existing database file
mmap (bool, default=True): If True, use a memory-mapped index and return a read-only handle. If False, load the index into memory and allow writes.
populate (bool, default=False): Only valid when mmap=True. Prefaults mapped pages on Linux.

add_molecule(molecule: PyMolecule) → None¶

Add a single molecule to the database.

Parameters¶

molecule (PyMolecule): Molecule to add

add_molecules(molecules: Sequence[PyMolecule]) → None¶

Add multiple molecules in parallel.

Parameters¶

molecules (sequence of PyMolecule): Molecules to add
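When a dataset is too large to hand to `add_molecules` in one call, the writes can be split into chunks. The following is a minimal sketch: `write_in_chunks` and its `chunk_size` default are hypothetical, and `db` is assumed to be a writable `Database` handle. Only the documented `add_molecules` and `flush` methods are called.

```python
def write_in_chunks(db, molecules, chunk_size=1024):
    """Write a list of molecules in fixed-size chunks, then flush once.

    Illustrative helper, not part of atompack. `molecules` must support
    len() and slicing (for example a list).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    written = 0
    for start in range(0, len(molecules), chunk_size):
        chunk = molecules[start:start + chunk_size]
        db.add_molecules(chunk)  # documented batch append
        written += len(chunk)
    db.flush()  # persist the index after the last chunk
    return written
```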

add_arrays_batch(positions: numpy.ndarray, atomic_numbers: numpy.ndarray, *, energy: numpy.ndarray | None = None, forces: numpy.ndarray | None = None, charges: numpy.ndarray | None = None, velocities: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, stress: numpy.ndarray | None = None, pbc: numpy.ndarray | None = None, name: Sequence[str] | None = None, properties: dict[str, Any] | None = None, atom_properties: dict[str, Any] | None = None) → None¶

Add a stacked batch of molecules directly from numpy arrays.

Custom properties can be supplied as batched columns via properties (per-molecule) and atom_properties (per-atom).

get_molecule(index: int) → PyMolecule¶

Get a molecule by index.

Parameters¶

index (int): Molecule index (0-based)

Returns¶

PyMolecule: The requested molecule

get_molecules(indices: Sequence[int]) → list[PyMolecule]¶

Get multiple molecules by indices (batch read).

Parameters¶

indices (sequence of int): Molecule indices (0-based)

Returns¶

list of PyMolecule: The requested molecules
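For training loops, batch reads are often driven by a shuffled index list. The sketch below shows one way to do that with `get_molecules`. The generator name `iter_minibatches` and its `seed` parameter are hypothetical, and the caller is assumed to know the valid index range.

```python
import random


def iter_minibatches(db, indices, batch_size, seed=None):
    """Yield lists of molecules, batch_size at a time.

    Illustrative helper, not part of atompack. If seed is given, the
    indices are shuffled reproducibly before batching.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    order = list(indices)
    if seed is not None:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        # documented batch read; the last batch may be shorter
        yield db.get_molecules(order[start:start + batch_size])
```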

get_molecules_flat(indices: Sequence[int]) → dict[str, Any]¶

Get multiple molecules as contiguous batch arrays.

Returns a mapping containing the stacked builtin arrays plus nested properties and atom_properties dictionaries when present.
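Because the nested `properties` and `atom_properties` entries appear only when present, downstream code may want to separate them from the builtin arrays defensively. The helper below is a hypothetical sketch that relies only on those two key names. The names of the builtin keys are not listed above, so it treats every other key as builtin.

```python
def split_flat_batch(batch):
    """Split a get_molecules_flat() result into three dictionaries.

    Returns (builtin, properties, atom_properties). Illustrative helper,
    not part of atompack; missing nested dictionaries become empty dicts.
    """
    nested_keys = ("properties", "atom_properties")
    builtin = {k: v for k, v in batch.items() if k not in nested_keys}
    properties = batch.get("properties", {})
    atom_properties = batch.get("atom_properties", {})
    return builtin, properties, atom_properties
```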

flush() → None¶

Flush and save the database to disk.

This writes the index and ensures all data is persisted.

class atompack.Molecule(positions: numpy.ndarray, atomic_numbers: numpy.ndarray, *, energy: float | None = None, forces: numpy.ndarray | None = None, charges: numpy.ndarray | None = None, velocities: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, stress: numpy.ndarray | None = None, pbc: tuple[bool, bool, bool] | None = None, name: str | None = None)¶

Low-level PyO3-backed molecule with optional builtin and custom properties.

Parameters¶

positions (ndarray of float32, shape (n_atoms, 3)): Atomic positions
atomic_numbers (ndarray of uint8, shape (n_atoms,)): Atomic numbers

Attributes¶

forces (ndarray of float32, shape (n_atoms, 3), optional): Per-atom forces
energy (float, optional): Total energy
charges (ndarray of float64, shape (n_atoms,), optional): Per-atom partial charges
velocities (ndarray of float32, shape (n_atoms, 3), optional): Per-atom velocities
cell (ndarray of float64, shape (3, 3), optional): Unit cell for periodic systems
stress (ndarray of float64, shape (3, 3), optional): Virial stress tensor
pbc (tuple of bool, optional): Periodic boundary condition flags along (x, y, z)
positions (ndarray of float32, shape (n_atoms, 3)): Atomic positions (read-only)
atomic_numbers (ndarray of uint8, shape (n_atoms,)): Atomic numbers (read-only)

static from_arrays(positions: numpy.ndarray, atomic_numbers: numpy.ndarray, *, energy: float | None = None, forces: numpy.ndarray | None = None, charges: numpy.ndarray | None = None, velocities: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, stress: numpy.ndarray | None = None, pbc: tuple[bool, bool, bool] | None = None, name: str | None = None) → PyMolecule¶

Create a molecule from numpy arrays (fast path).

Parameters¶

positions (ndarray of float32, shape (n_atoms, 3)): Atomic positions (Angstroms)
atomic_numbers (ndarray of uint8, shape (n_atoms,)): Atomic numbers

atoms() → list[PyAtom]¶

Get the list of atoms.

Returns¶

list of PyAtom: All atoms in the molecule

to_owned() → PyMolecule¶

Materialize the molecule into an owned, self-contained object.

This is useful before pickling or sending a database-fetched lazy view across process boundaries.
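A minimal sketch of that pattern: fetch a batch from the database and materialize each molecule before handing the list to another process. The helper name `fetch_owned` is hypothetical. It uses only the documented `get_molecules` and `to_owned` calls.

```python
def fetch_owned(db, indices):
    """Fetch molecules and detach them from the database.

    Illustrative helper, not part of atompack. Each molecule returned by
    db.get_molecules() may be a lazy view, so to_owned() is called on it
    to get a self-contained object before pickling or cross-process use.
    """
    return [mol.to_owned() for mol in db.get_molecules(indices)]
```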

property forces: numpy.ndarray | None¶

Per-atom forces.

Returns¶

ndarray of float32, shape (n_atoms, 3) or None: Forces on each atom, or None if not set

property energy: float | None¶

Total energy.

Returns¶

float or None: Energy value, or None if not set

property charges: numpy.ndarray | None¶

Per-atom partial charges.

Returns¶

ndarray of float64, shape (n_atoms,) or None: Charges on each atom, or None if not set

property velocities: numpy.ndarray | None¶

Per-atom velocities.

Returns¶

ndarray of float32, shape (n_atoms, 3) or None: Velocities of each atom, or None if not set

property cell: numpy.ndarray | None¶

Unit cell for periodic systems.

Returns¶

ndarray of float64, shape (3, 3) or None: Unit cell vectors, or None if not set

property stress: numpy.ndarray | None¶

Virial stress tensor.

Returns¶

ndarray of float64, shape (3, 3) or None: Stress tensor, or None if not set

property pbc: tuple[bool, bool, bool] | None¶

Periodic boundary condition flags.

Returns¶

tuple of bool or None: Periodicity along (x, y, z), or None if not set

property positions: numpy.ndarray¶

Atomic positions (read-only).

Returns¶

ndarray of float32, shape (n_atoms, 3): Position of each atom in Angstroms

property atomic_numbers: numpy.ndarray¶

Atomic numbers (read-only).

Returns¶

ndarray of uint8, shape (n_atoms,): Atomic number of each atom

get_property(key: str) → Any¶

Get a custom property by key.

Parameters¶

key (str): Property key

Returns¶

Any: Property value

Raises¶

KeyError: If property key does not exist
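Since `get_property` raises `KeyError` for missing keys, code that reads optional metadata may prefer a dictionary-style lookup with a default. The wrapper below is a hypothetical sketch and not part of atompack.

```python
def get_property_or(molecule, key, default=None):
    """Return a custom property, or default if the key is missing.

    Illustrative helper built on the documented get_property(), which
    raises KeyError when the key does not exist.
    """
    try:
        return molecule.get_property(key)
    except KeyError:
        return default
```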

set_property(key: str, value: Any, *, scope: Literal['molecule', 'atom'] | None = None) → None¶

Set a custom property.

Parameters¶

key (str): Property key
value (Any): Property value
scope ({"molecule", "atom"}, optional): Property scope. Defaults to molecule for new keys.
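Passing `scope` explicitly avoids relying on the default for new keys. The sketch below attaches two groups of properties in one call. The helper name `attach_properties` and its argument names are hypothetical, and atom-scope values are assumed to be per-atom arrays supplied by the caller.

```python
def attach_properties(molecule, molecule_props=None, atom_props=None):
    """Set custom properties with an explicit scope for each key.

    Illustrative helper, not part of atompack. molecule_props and
    atom_props are plain dicts mapping property keys to values.
    """
    for key, value in (molecule_props or {}).items():
        molecule.set_property(key, value, scope="molecule")
    for key, value in (atom_props or {}).items():
        molecule.set_property(key, value, scope="atom")
```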

property_keys(*, scope: Literal['molecule', 'atom'] | None = None) → list[str]¶

Get all property keys.

Returns¶

list of str: All property keys

has_property(key: str, *, scope: Literal['molecule', 'atom'] | None = None) → bool¶

Check if a property exists.

Parameters¶

key (str): Property key
scope ({"molecule", "atom"}, optional): Restrict the lookup to one scope.

Returns¶

bool: True if property exists, False otherwise

delete_property(key: str) → None¶

Delete a custom property by key.

atompack.add_ase_batch(db, atoms_list, *, copy_info=True, copy_arrays=True, info=None, atom_keys=None, batch_size=512)¶

Write many ASE Atoms objects efficiently, preserving supported metadata.

atompack.from_ase(atoms, energy=None, forces=None, charges=None, velocities=None, cell=None, stress=None, copy_info=True, copy_arrays=True, info=None, atom_keys=None)¶

Convert one ASE Atoms object to an atompack Molecule.

Custom values from atoms.info, atoms.arrays, calculator results, and explicit info= overrides are stored as molecule-scope properties. Keys listed in atom_keys are stored as atom-scope custom properties. Array shape is not used to infer atom-property scope during ingestion.

atompack.to_ase(molecule, *, attach_calc=True, calc_mode='singlepoint', copy_info=True, copy_arrays=True)¶

Convert an atompack molecule to ase.Atoms.

The conversion reads directly from the molecule getters, so it works for both owned and view-backed molecules without going through molecule.atoms(). That keeps the path compatible with lazy SOA-backed molecules, although ASE object creation still requires Python/NumPy allocations.

Mapping rules:

positions and atomic_numbers always become the ASE geometry.
cell and pbc are copied when present.
velocities are attached with atoms.set_velocities(...).
energy, forces, stress, and charges are attached through an ASE calculator when attach_calc=True. calc_mode="singlepoint" preserves ASE’s snapshot semantics, while calc_mode="nocopy" is faster but does not snapshot the atoms state.
Custom properties shaped like per-atom arrays are stored in atoms.arrays when copy_arrays=True.
Remaining custom properties are stored in atoms.info when copy_info=True.

Parameters¶

molecule (atompack.Molecule): Molecule to convert.
attach_calc (bool, default=True): Attach supported builtin results through an ASE calculator.
calc_mode ({"singlepoint", "nocopy", "none"}, default="singlepoint"): Calculator attachment mode. "singlepoint" uses ASE’s standard snapshotting calculator, "nocopy" skips the internal atoms copy for higher throughput, and "none" suppresses calculator attachment.
copy_info (bool, default=True): Copy non-array custom properties into atoms.info.
copy_arrays (bool, default=True): Copy per-atom custom arrays into atoms.arrays.

Returns¶

ase.Atoms: Converted ASE object.
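The routing of custom properties in the mapping rules above can be pictured with the following sketch. It is one plausible reading and not atompack's actual implementation. It assumes "shaped like a per-atom array" means a non-string sequence whose length equals the atom count, and that per-atom values are dropped, not moved to `info`, when `copy_arrays=False`. The function names are hypothetical.

```python
def _looks_per_atom(value, n_atoms):
    """Assumed test: a non-string, non-dict sequence of length n_atoms."""
    if isinstance(value, (str, bytes, dict)):
        return False
    try:
        return len(value) == n_atoms
    except TypeError:  # scalars have no len()
        return False


def route_custom_properties(properties, n_atoms, copy_info=True, copy_arrays=True):
    """Split custom properties into (arrays, info) dictionaries.

    Mirrors the documented rules under the assumptions stated above:
    per-atom shaped values go to arrays, everything else goes to info.
    """
    arrays, info = {}, {}
    for key, value in properties.items():
        if _looks_per_atom(value, n_atoms):
            if copy_arrays:
                arrays[key] = value
        elif copy_info:
            info[key] = value
    return arrays, info
```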

atompack.to_ase_batch(source, indices=None, *, attach_calc=True, calc_mode='singlepoint', copy_info=True, copy_arrays=True)¶

Convert many atompack molecules to ASE Atoms efficiently.