atompack
========

.. py:module:: atompack

.. autoapi-nested-parse::

   Atompack: append-only molecule storage for atomistic ML datasets.

   A Python API backed by a Rust storage engine for writing, reopening, and serving
   molecular structures with forces, energies, charges, stress, and custom properties.
   Built for dataset pipelines, random-access reads, batched array loading, and ASE
   interoperability.

   Examples
   --------
   Create a molecule and add properties:

   >>> import atompack
   >>> import numpy as np
   >>>
   >>> positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], dtype=np.float32)
   >>> atomic_numbers = np.array([6, 8], dtype=np.uint8)
   >>> mol = atompack.Molecule.from_arrays(positions, atomic_numbers)
   >>> mol.energy = -123.456
   >>> mol.forces = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dtype=np.float32)

   Save to database:

   >>> db = atompack.Database("data.atp", overwrite=True)
   >>> db.add_molecule(mol)
   >>> db.flush()

   Read back from database:

   >>> db = atompack.Database.open("data.atp")
   >>> mol = db[0]
   >>> print(mol.energy)
   -123.456

   ``Database.open(...)`` is read-only by default and uses mmap. Reopen with
   ``Database.open(path, mmap=False)`` if you want to append molecules.


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/atompack/ase_bridge/index
   /autoapi/atompack/hub/index


Classes
-------

.. autoapisummary::

   atompack.Atom
   atompack.Database
   atompack.Molecule


Functions
---------

.. autoapisummary::

   atompack.add_ase_batch
   atompack.from_ase
   atompack.to_ase
   atompack.to_ase_batch


Package Contents
----------------

.. py:class:: Atom(x: float, y: float, z: float, atomic_number: int)

   Low-level PyO3-backed atom with 3D coordinates and atomic number.

   Parameters
   ----------
   x : float
       X coordinate in Angstroms
   y : float
       Y coordinate in Angstroms
   z : float
       Z coordinate in Angstroms
   atomic_number : int
       Atomic number (1=H, 6=C, 8=O, etc.)

   Attributes
   ----------
   atomic_number : int
       The atomic number


   .. py:method:: position() -> tuple[float, float, float]

      Get the position as a tuple.

      Returns
      -------
      tuple of float
          ``(x, y, z)`` coordinates in Angstroms


   .. py:property:: atomic_number
      :type: int

      Get the atomic number.

      Returns
      -------
      int
          Atomic number (1=H, 6=C, 8=O, etc.)


   .. py:method:: distance_to(other: PyAtom) -> float

      Calculate distance to another atom.

      Parameters
      ----------
      other : PyAtom
          The other atom

      Returns
      -------
      float
          Distance in Angstroms


.. py:class:: Database(path: str, compression: str = 'none', level: int = 3, overwrite: bool = False)

   Low-level PyO3-backed database for storing molecules with compression.

   Supports parallel writes and random access reads, making it useful for
   training and dataset preparation workflows.

   Parameters
   ----------
   path : str
       Path to database file
   compression : {"none", "lz4", "zstd"}, default="none"
       Compression type
   level : int, default=3
       Compression level for zstd (1-22)
   overwrite : bool, default=False
       If True, recreates the database file when it already exists.


   .. py:method:: open(path: str, mmap: bool = True, populate: bool = False) -> PyAtomDatabase
      :staticmethod:

      Open an existing database.

      By default this uses a memory-mapped index and is read-only. Pass
      ``mmap=False`` to reopen the database for appends.

      Parameters
      ----------
      path : str
          Path to existing database file
      mmap : bool, default=True
          If True, use a memory-mapped index and return a read-only handle.
          If False, load the index into memory and allow writes.
      populate : bool, default=False
          Only valid when ``mmap=True``. Prefaults mapped pages on Linux.


   .. py:method:: add_molecule(molecule: PyMolecule) -> None

      Add a single molecule to the database.

      Parameters
      ----------
      molecule : PyMolecule
          Molecule to add


   .. py:method:: add_molecules(molecules: Sequence[PyMolecule]) -> None

      Add multiple molecules in parallel.

      Parameters
      ----------
      molecules : sequence of PyMolecule
          Molecules to add


   .. py:method:: add_arrays_batch(positions: numpy.ndarray, atomic_numbers: numpy.ndarray, *, energy: numpy.ndarray | None = None, forces: numpy.ndarray | None = None, charges: numpy.ndarray | None = None, velocities: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, stress: numpy.ndarray | None = None, pbc: numpy.ndarray | None = None, name: Sequence[str] | None = None, properties: dict[str, Any] | None = None, atom_properties: dict[str, Any] | None = None) -> None

      Add a stacked batch of molecules directly from numpy arrays.

      Custom properties can be supplied as batched columns via ``properties``
      (per-molecule) and ``atom_properties`` (per-atom).


   .. py:method:: get_molecule(index: int) -> PyMolecule

      Get a molecule by index.

      Parameters
      ----------
      index : int
          Molecule index (0-based)

      Returns
      -------
      PyMolecule
          The requested molecule


   .. py:method:: get_molecules(indices: Sequence[int]) -> list[PyMolecule]

      Get multiple molecules by indices (batch read).

      Parameters
      ----------
      indices : sequence of int
          Molecule indices (0-based)

      Returns
      -------
      list of PyMolecule
          The requested molecules


   .. py:method:: get_molecules_flat(indices: Sequence[int]) -> dict[str, Any]

      Get multiple molecules as contiguous batch arrays.

      Returns a mapping containing the stacked builtin arrays plus nested
      ``properties`` and ``atom_properties`` dictionaries when present.


   .. py:method:: flush() -> None

      Flush and save the database to disk.

      This writes the index and ensures all data is persisted.


.. py:class:: Molecule(positions: numpy.ndarray, atomic_numbers: numpy.ndarray, *, energy: float | None = None, forces: numpy.ndarray | None = None, charges: numpy.ndarray | None = None, velocities: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, stress: numpy.ndarray | None = None, pbc: tuple[bool, bool, bool] | None = None, name: str | None = None)

   Low-level PyO3-backed molecule with optional builtin and custom properties.

   Parameters
   ----------
   positions : ndarray of float32, shape (n_atoms, 3)
       Atomic positions
   atomic_numbers : ndarray of uint8, shape (n_atoms,)
       Atomic numbers

   Attributes
   ----------
   forces : ndarray of float32, shape (n_atoms, 3), optional
       Per-atom forces
   energy : float, optional
       Total energy
   charges : ndarray of float64, shape (n_atoms,), optional
       Per-atom partial charges
   velocities : ndarray of float32, shape (n_atoms, 3), optional
       Per-atom velocities
   cell : ndarray of float64, shape (3, 3), optional
       Unit cell for periodic systems
   positions : ndarray of float32, shape (n_atoms, 3)
       Atomic positions (read-only)
   atomic_numbers : ndarray of uint8, shape (n_atoms,)
       Atomic numbers (read-only)


   .. py:method:: from_arrays(positions: numpy.ndarray, atomic_numbers: numpy.ndarray, *, energy: float | None = None, forces: numpy.ndarray | None = None, charges: numpy.ndarray | None = None, velocities: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, stress: numpy.ndarray | None = None, pbc: tuple[bool, bool, bool] | None = None, name: str | None = None) -> PyMolecule
      :staticmethod:

      Create a molecule from numpy arrays (fast path).

      Parameters
      ----------
      positions : ndarray of float32, shape (n_atoms, 3)
          Atomic positions (Angstroms)
      atomic_numbers : ndarray of uint8, shape (n_atoms,)
          Atomic numbers


   .. py:method:: atoms() -> list[PyAtom]

      Get the list of atoms.

      Returns
      -------
      list of PyAtom
          All atoms in the molecule


   .. py:method:: to_owned() -> PyMolecule

      Materialize the molecule into an owned, self-contained object.

      This is useful before pickling or sending a database-fetched lazy view
      across process boundaries.


   .. py:property:: forces
      :type: numpy.ndarray | None

      Per-atom forces.

      Returns
      -------
      ndarray of float32, shape (n_atoms, 3) or None
          Forces on each atom, or None if not set


   .. py:property:: energy
      :type: float | None

      Total energy.

      Returns
      -------
      float or None
          Energy value, or None if not set


   .. py:property:: charges
      :type: numpy.ndarray | None

      Per-atom partial charges.

      Returns
      -------
      ndarray of float64, shape (n_atoms,) or None
          Charges on each atom, or None if not set


   .. py:property:: velocities
      :type: numpy.ndarray | None

      Per-atom velocities.

      Returns
      -------
      ndarray of float32, shape (n_atoms, 3) or None
          Velocities of each atom, or None if not set


   .. py:property:: cell
      :type: numpy.ndarray | None

      Unit cell for periodic systems.

      Returns
      -------
      ndarray of float64, shape (3, 3) or None
          Unit cell vectors, or None if not set


   .. py:property:: stress
      :type: numpy.ndarray | None

      Virial stress tensor.

      Returns
      -------
      ndarray of float64, shape (3, 3) or None
          Stress tensor, or None if not set


   .. py:property:: pbc
      :type: tuple[bool, bool, bool] | None

      Periodic boundary condition flags.

      Returns
      -------
      tuple of bool or None
          Periodicity along ``(x, y, z)``, or None if not set


   .. py:property:: positions
      :type: numpy.ndarray

      Atomic positions (read-only).

      Returns
      -------
      ndarray of float32, shape (n_atoms, 3)
          Position of each atom in Angstroms


   .. py:property:: atomic_numbers
      :type: numpy.ndarray

      Atomic numbers (read-only).

      Returns
      -------
      ndarray of uint8, shape (n_atoms,)
          Atomic number of each atom


   .. py:method:: get_property(key: str) -> Any

      Get a custom property by key.

      Parameters
      ----------
      key : str
          Property key

      Returns
      -------
      Any
          Property value

      Raises
      ------
      KeyError
          If property key does not exist


   .. py:method:: set_property(key: str, value: Any, *, scope: Literal['molecule', 'atom'] | None = None) -> None

      Set a custom property.

      Parameters
      ----------
      key : str
          Property key
      value : Any
          Property value
      scope : {"molecule", "atom"}, optional
          Property scope. Defaults to molecule for new keys.


   .. py:method:: property_keys(*, scope: Literal['molecule', 'atom'] | None = None) -> list[str]

      Get all property keys.

      Returns
      -------
      list of str
          All property keys


   .. py:method:: has_property(key: str, *, scope: Literal['molecule', 'atom'] | None = None) -> bool

      Check if a property exists.

      Parameters
      ----------
      key : str
          Property key
      scope : {"molecule", "atom"}, optional
          Restrict the lookup to one scope.

      Returns
      -------
      bool
          True if property exists, False otherwise


   .. py:method:: delete_property(key: str) -> None

      Delete a custom property by key.


.. py:function:: add_ase_batch(db, atoms_list, *, copy_info=True, copy_arrays=True, info=None, batch_size=512)

   Write many ASE Atoms objects efficiently, preserving supported metadata.


.. py:function:: from_ase(atoms, energy=None, forces=None, charges=None, velocities=None, cell=None, stress=None, copy_info=True, copy_arrays=True, info=None)

   Convert one ASE Atoms object to an atompack Molecule.

   Custom values from ``atoms.info``, ``atoms.arrays``, calculator results, and
   explicit ``info=`` overrides are stored as molecule-scope properties. Array
   shape is not used to infer atom-property scope during ingestion.


.. py:function:: to_ase(molecule, *, attach_calc=True, calc_mode='singlepoint', copy_info=True, copy_arrays=True)

   Convert an atompack molecule to ``ase.Atoms``.

   The conversion reads directly from the molecule getters, so it works for both
   owned and view-backed molecules without going through ``molecule.atoms()``.
   That keeps the path compatible with lazy SOA-backed molecules, although ASE
   object creation still requires Python/NumPy allocations.

   Mapping rules:

   - ``positions`` and ``atomic_numbers`` always become the ASE geometry.
   - ``cell`` and ``pbc`` are copied when present.
   - ``velocities`` are attached with ``atoms.set_velocities(...)``.
   - ``energy``, ``forces``, ``stress``, and ``charges`` are attached through an
     ASE calculator when ``attach_calc=True``. ``calc_mode="singlepoint"``
     preserves ASE's snapshot semantics, while ``calc_mode="nocopy"`` is faster
     but does not snapshot the atoms state.
   - Custom properties shaped like per-atom arrays are stored in
     ``atoms.arrays`` when ``copy_arrays=True``.
   - Remaining custom properties are stored in ``atoms.info`` when
     ``copy_info=True``.

   Parameters
   ----------
   molecule : atompack.Molecule
       Molecule to convert.
   attach_calc : bool, default=True
       Attach supported builtin results through an ASE calculator.
   calc_mode : {"singlepoint", "nocopy", "none"}, default="singlepoint"
       Calculator attachment mode. ``"singlepoint"`` uses ASE's standard
       snapshotting calculator, ``"nocopy"`` skips the internal atoms copy for
       higher throughput, and ``"none"`` suppresses calculator attachment.
   copy_info : bool, default=True
       Copy non-array custom properties into ``atoms.info``.
   copy_arrays : bool, default=True
       Copy per-atom custom arrays into ``atoms.arrays``.

   Returns
   -------
   ase.Atoms
       Converted ASE object.


.. py:function:: to_ase_batch(source, indices=None, *, attach_calc=True, calc_mode='singlepoint', copy_info=True, copy_arrays=True)

   Convert many atompack molecules to ASE Atoms efficiently.
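
The ``Molecule`` reference documents ``positions`` as ``float32`` with shape
``(n_atoms, 3)`` and ``atomic_numbers`` as ``uint8`` with shape ``(n_atoms,)``.
The following is a minimal sketch of how input data might be normalized to those
dtypes before calling ``Molecule.from_arrays``. The helper name
``prepare_arrays`` is hypothetical and not part of atompack, and this page does
not state whether atompack itself casts or rejects other dtypes, so the checks
below are an assumption about what is worth validating up front.

.. code-block:: python

   import numpy as np


   def prepare_arrays(positions, atomic_numbers):
       """Cast inputs to the dtypes and shapes documented for Molecule.from_arrays.

       Hypothetical helper for illustration only; it is not part of atompack.
       """
       pos = np.ascontiguousarray(positions, dtype=np.float32)
       if pos.ndim != 2 or pos.shape[1] != 3:
           raise ValueError(f"positions must have shape (n_atoms, 3), got {pos.shape}")

       z = np.asarray(atomic_numbers)
       if z.ndim != 1 or z.shape[0] != pos.shape[0]:
           raise ValueError("atomic_numbers must have shape (n_atoms,)")
       if not np.issubdtype(z.dtype, np.integer):
           raise ValueError("atomic_numbers must be integers")
       # Check the range before casting so large values do not wrap around in uint8.
       if z.min() < 0 or z.max() > 255:
           raise ValueError("atomic numbers must lie in 0..255 to fit uint8")
       return pos, np.ascontiguousarray(z, dtype=np.uint8)


   pos, z = prepare_arrays([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [6, 8])

The returned arrays can then be passed on, for example as
``atompack.Molecule.from_arrays(pos, z)``, following the module-level example.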