atompack¶
Atompack: append-only molecule storage for atomistic ML datasets.
A Python API backed by a Rust storage engine for writing, reopening, and serving molecular structures with forces, energies, charges, stress, and custom properties. Built for dataset pipelines, random-access reads, batched array loading, and ASE interoperability.
Examples¶
Create a molecule and add properties:
>>> import atompack
>>> import numpy as np
>>>
>>> positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], dtype=np.float32)
>>> atomic_numbers = np.array([6, 8], dtype=np.uint8)
>>> mol = atompack.Molecule.from_arrays(positions, atomic_numbers)
>>> mol.energy = -123.456
>>> mol.forces = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dtype=np.float32)
Save to database:
>>> db = atompack.Database("data.atp", overwrite=True)
>>> db.add_molecule(mol)
>>> db.flush()
Read back from database:
>>> db = atompack.Database.open("data.atp")
>>> mol = db[0]
>>> print(mol.energy)
-123.456
Database.open(…) is read-only by default and uses mmap. Reopen with Database.open(path, mmap=False) if you want to append molecules.
Submodules¶
Classes¶
Functions¶
| add_ase_batch | Write many ASE Atoms objects efficiently, preserving supported metadata. |
| from_ase | Convert one ASE Atoms object to an atompack Molecule. |
| to_ase | Convert an atompack molecule to ase.Atoms. |
| | Convert many atompack molecules to ASE Atoms efficiently. |
Package Contents¶
- class atompack.Atom(x: float, y: float, z: float, atomic_number: int)¶
Low-level PyO3-backed atom with 3D coordinates and atomic number.
Parameters¶
- x : float
X coordinate in Angstroms
- y : float
Y coordinate in Angstroms
- z : float
Z coordinate in Angstroms
- atomic_number : int
Atomic number (1=H, 6=C, 8=O, etc.)
Attributes¶
- atomic_number : int
The atomic number
- class atompack.Database(path: str, compression: str = 'none', level: int = 3, overwrite: bool = False)¶
Low-level PyO3-backed database for storing molecules with compression.
Supports parallel writes and random access reads, making it useful for training and dataset preparation workflows.
Parameters¶
- path : str
Path to database file
- compression : {"none", "lz4", "zstd"}, default="none"
Compression type
- level : int, default=3
Compression level for zstd (1-22)
- overwrite : bool, default=False
If True, recreates the database file when it already exists.
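The constructor options above can be checked before a database file is created. The sketch below is a hypothetical helper, `check_compression_options`, that is not part of atompack. It only encodes the documented choices ("none", "lz4", "zstd") and the documented zstd level range (1-22); how atompack itself reacts to out-of-range values is not stated here.

```python
VALID_COMPRESSION = ("none", "lz4", "zstd")


def check_compression_options(compression="none", level=3):
    """Return keyword arguments for atompack.Database after basic validation.

    Hypothetical helper: it mirrors the documented parameter choices and is
    not part of the atompack API.
    """
    if compression not in VALID_COMPRESSION:
        raise ValueError(
            f"compression must be one of {VALID_COMPRESSION}, got {compression!r}"
        )
    # The documented level range (1-22) applies to zstd only.
    if compression == "zstd" and not 1 <= level <= 22:
        raise ValueError(f"zstd level must be in 1..22, got {level}")
    return {"compression": compression, "level": level}
```

The returned dictionary can be unpacked into the documented constructor, for example `atompack.Database("data.atp", **check_compression_options("zstd", 5))`.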
- static open(path: str, mmap: bool = True, populate: bool = False) → PyAtomDatabase¶
Open an existing database.
By default this uses a memory-mapped index and is read-only. Pass mmap=False to reopen the database for appends.
Parameters¶
- path : str
Path to existing database file
- mmap : bool, default=True
If True, use a memory-mapped index and return a read-only handle. If False, load the index into memory and allow writes.
- populate : bool, default=False
Only valid when mmap=True. Prefaults mapped pages on Linux.
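The two open modes and the restriction on populate can be captured in a small helper. This is a sketch under the rules stated above; `open_options` is a hypothetical name and not part of atompack.

```python
def open_options(append=False, populate=False):
    """Build keyword arguments for atompack.Database.open.

    Hypothetical helper reflecting the documented rules:
    - mmap=True returns a read-only handle; mmap=False allows writes.
    - populate is only valid together with mmap=True.
    """
    mmap = not append
    if populate and not mmap:
        raise ValueError("populate=True is only valid when mmap=True (read-only)")
    return {"mmap": mmap, "populate": populate}
```

A read-only handle would then be opened with `atompack.Database.open("data.atp", **open_options())`, and an appendable one with `open_options(append=True)`.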
- add_molecule(molecule: PyMolecule) → None¶
Add a single molecule to the database.
Parameters¶
- molecule : PyMolecule
Molecule to add
- add_molecules(molecules: Sequence[PyMolecule]) → None¶
Add multiple molecules in parallel.
Parameters¶
- molecules : sequence of PyMolecule
Molecules to add
- add_arrays_batch(positions: numpy.ndarray, atomic_numbers: numpy.ndarray, *, energy: numpy.ndarray | None = None, forces: numpy.ndarray | None = None, charges: numpy.ndarray | None = None, velocities: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, stress: numpy.ndarray | None = None, pbc: numpy.ndarray | None = None, name: Sequence[str] | None = None, properties: dict[str, Any] | None = None, atom_properties: dict[str, Any] | None = None) → None¶
Add a stacked batch of molecules directly from numpy arrays.
Custom properties can be supplied as batched columns via properties (per-molecule) and atom_properties (per-atom).
- get_molecule(index: int) → PyMolecule¶
Get a molecule by index.
Parameters¶
- index : int
Molecule index (0-based)
Returns¶
- PyMolecule
The requested molecule
- get_molecules(indices: Sequence[int]) → list[PyMolecule]¶
Get multiple molecules by indices (batch read).
Parameters¶
- indices : sequence of int
Molecule indices (0-based)
Returns¶
- list of PyMolecule
The requested molecules
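When many indices are needed, the batch read can be issued in fixed-size chunks so that only one chunk of molecules is held at a time. The generator below is a minimal sketch: `iter_molecules` is a hypothetical helper, the default chunk size of 512 is an arbitrary choice, and `db` only needs to expose the documented `get_molecules(indices)` method.

```python
def iter_molecules(db, indices, batch_size=512):
    """Yield molecules for `indices`, fetching `batch_size` at a time.

    `db` is expected to be an atompack.Database, or any object exposing
    get_molecules(indices). Hypothetical helper, not part of atompack.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    indices = list(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        # One batch read per chunk instead of one get_molecule call per index.
        yield from db.get_molecules(chunk)
```

Typical use would be `for mol in iter_molecules(db, range(1000)): ...` on a database opened with `atompack.Database.open`.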
- class atompack.Molecule(positions: numpy.ndarray, atomic_numbers: numpy.ndarray, *, energy: float | None = None, forces: numpy.ndarray | None = None, charges: numpy.ndarray | None = None, velocities: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, stress: numpy.ndarray | None = None, pbc: tuple[bool, bool, bool] | None = None, name: str | None = None)¶
Low-level PyO3-backed molecule with optional builtin and custom properties.
Parameters¶
- positions : ndarray of float32, shape (n_atoms, 3)
Atomic positions
- atomic_numbers : ndarray of uint8, shape (n_atoms,)
Atomic numbers
Attributes¶
- forces : ndarray of float32, shape (n_atoms, 3), optional
Per-atom forces
- energy : float, optional
Total energy
- charges : ndarray of float64, shape (n_atoms,), optional
Per-atom partial charges
- velocities : ndarray of float32, shape (n_atoms, 3), optional
Per-atom velocities
- cell : ndarray of float64, shape (3, 3), optional
Unit cell for periodic systems
- positions : ndarray of float32, shape (n_atoms, 3)
Atomic positions (read-only)
- atomic_numbers : ndarray of uint8, shape (n_atoms,)
Atomic numbers (read-only)
- static from_arrays(positions: numpy.ndarray, atomic_numbers: numpy.ndarray, *, energy: float | None = None, forces: numpy.ndarray | None = None, charges: numpy.ndarray | None = None, velocities: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, stress: numpy.ndarray | None = None, pbc: tuple[bool, bool, bool] | None = None, name: str | None = None) → PyMolecule¶
Create a molecule from numpy arrays (fast path).
Parameters¶
- positions : ndarray of float32, shape (n_atoms, 3)
Atomic positions (Angstroms)
- atomic_numbers : ndarray of uint8, shape (n_atoms,)
Atomic numbers
- to_owned() → PyMolecule¶
Materialize the molecule into an owned, self-contained object.
This is useful before pickling or sending a database-fetched lazy view across process boundaries.
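A sketch of that pattern follows. `pickle_molecules` is a hypothetical helper, and it assumes that the objects returned by `to_owned()` can be pickled, which the note above suggests but which is not verified here.

```python
import pickle


def pickle_molecules(molecules):
    """Serialize molecules after materializing each one with to_owned().

    Hypothetical helper. Assumes the owned objects returned by to_owned()
    are picklable.
    """
    # Detach every molecule from its database-backed view first.
    owned = [mol.to_owned() for mol in molecules]
    return pickle.dumps(owned)
```

The resulting bytes can be sent to a worker process and restored there with `pickle.loads`.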
- property forces: numpy.ndarray | None¶
Per-atom forces.
Returns¶
- ndarray of float32, shape (n_atoms, 3) or None
Forces on each atom, or None if not set
- property charges: numpy.ndarray | None¶
Per-atom partial charges.
Returns¶
- ndarray of float64, shape (n_atoms,) or None
Charges on each atom, or None if not set
- property velocities: numpy.ndarray | None¶
Per-atom velocities.
Returns¶
- ndarray of float32, shape (n_atoms, 3) or None
Velocities of each atom, or None if not set
- property cell: numpy.ndarray | None¶
Unit cell for periodic systems.
Returns¶
- ndarray of float64, shape (3, 3) or None
Unit cell vectors, or None if not set
- property stress: numpy.ndarray | None¶
Virial stress tensor.
Returns¶
- ndarray of float64, shape (3, 3) or None
Stress tensor, or None if not set
- property pbc: tuple[bool, bool, bool] | None¶
Periodic boundary condition flags.
Returns¶
- tuple of bool or None
Periodicity along (x, y, z), or None if not set
- property positions: numpy.ndarray¶
Atomic positions (read-only).
Returns¶
- ndarray of float32, shape (n_atoms, 3)
Position of each atom in Angstroms
- property atomic_numbers: numpy.ndarray¶
Atomic numbers (read-only).
Returns¶
- ndarray of uint8, shape (n_atoms,)
Atomic number of each atom
- get_property(key: str) → Any¶
Get a custom property by key.
Parameters¶
- key : str
Property key
Returns¶
- Any
Property value
Raises¶
- KeyError
If property key does not exist
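Because a missing key raises KeyError, optional properties are easiest to read through a small wrapper. The sketch below relies only on that documented behavior; `get_property_or` is a hypothetical helper, not part of atompack.

```python
def get_property_or(molecule, key, default=None):
    """Return a custom property, or `default` when the key is absent.

    Relies only on the documented behavior that get_property raises
    KeyError for a missing key. Hypothetical helper, not part of atompack.
    """
    try:
        return molecule.get_property(key)
    except KeyError:
        return default
```

For example, `get_property_or(mol, "source", "unknown")` returns the stored value when the hypothetical key "source" exists and "unknown" otherwise.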
- set_property(key: str, value: Any, *, scope: Literal['molecule', 'atom'] | None = None) → None¶
Set a custom property.
Parameters¶
- key : str
Property key
- value : Any
Property value
- scope : {"molecule", "atom"}, optional
Property scope. Defaults to molecule for new keys.
- property_keys(*, scope: Literal['molecule', 'atom'] | None = None) → list[str]¶
Get all property keys.
Returns¶
- list of str
All property keys
- atompack.add_ase_batch(db, atoms_list, *, copy_info=True, copy_arrays=True, info=None, batch_size=512)[source]¶
Write many ASE Atoms objects efficiently, preserving supported metadata.
- atompack.from_ase(atoms, energy=None, forces=None, charges=None, velocities=None, cell=None, stress=None, copy_info=True, copy_arrays=True, info=None)[source]¶
Convert one ASE Atoms object to an atompack Molecule.
Custom values from atoms.info, atoms.arrays, calculator results, and explicit info= overrides are stored as molecule-scope properties. Array shape is not used to infer atom-property scope during ingestion.
- atompack.to_ase(molecule, *, attach_calc=True, calc_mode='singlepoint', copy_info=True, copy_arrays=True)[source]¶
Convert an atompack molecule to ase.Atoms.
The conversion reads directly from the molecule getters, so it works for both owned and view-backed molecules without going through molecule.atoms(). That keeps the path compatible with lazy SOA-backed molecules, although ASE object creation still requires Python/NumPy allocations.
Mapping rules:
- positions and atomic_numbers always become the ASE geometry.
- cell and pbc are copied when present.
- velocities are attached with atoms.set_velocities(...).
- energy, forces, stress, and charges are attached through an ASE calculator when attach_calc=True.
- calc_mode="singlepoint" preserves ASE’s snapshot semantics, while calc_mode="nocopy" is faster but does not snapshot the atoms state.
- Custom properties shaped like per-atom arrays are stored in atoms.arrays when copy_arrays=True.
- Remaining custom properties are stored in atoms.info when copy_info=True.
Parameters¶
- molecule : atompack.Molecule
Molecule to convert.
- attach_calc : bool, default=True
Attach supported builtin results through an ASE calculator.
- calc_mode : {"singlepoint", "nocopy", "none"}, default="singlepoint"
Calculator attachment mode. "singlepoint" uses ASE’s standard snapshotting calculator, "nocopy" skips the internal atoms copy for higher throughput, and "none" suppresses calculator attachment.
- copy_info : bool, default=True
Copy non-array custom properties into atoms.info.
- copy_arrays : bool, default=True
Copy per-atom custom arrays into atoms.arrays.
Returns¶
- ase.Atoms
Converted ASE object.
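The choice among the three calc_mode values can be written down as a small decision function. This is a sketch of the trade-off described above, not a recommendation from the library; `pick_calc_mode` and its argument names are hypothetical.

```python
def pick_calc_mode(want_results, need_snapshot=True):
    """Choose a calc_mode string for atompack.to_ase.

    Hypothetical helper based on the documented modes:
    - "none" suppresses calculator attachment,
    - "singlepoint" keeps ASE's snapshot semantics,
    - "nocopy" skips the internal atoms copy for higher throughput.
    """
    if not want_results:
        return "none"
    return "singlepoint" if need_snapshot else "nocopy"
```

A throughput-oriented conversion could then be written as `atompack.to_ase(mol, calc_mode=pick_calc_mode(True, need_snapshot=False))`.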