Changelog 2025

Note

Get notified by watching releases for git repositories: lamindb, laminhub, laminr, and bionty.

For other years, see: 2024 · 2023 · 2022

2025-06-03 db 1.6.1 | bionty 1.5.0

Bionty.

LaminDB.

  • 🚸 Enable passing --branch and --space to lamin save PR @falexwolf

  • 🐛 Fix query of feature-associated labels from non-ULabel registries PR @sunnyosun

2025-06-01 db 1.6.0 | bionty 1.4.0

⚠️ Consider lamin migrate deploy

All instances connected to LaminHub have been migrated and there is no need to act.

If you are an admin of a self-managed instance, please migrate your database with lamin migrate deploy.

The migrations in this release do not break old LaminDB clients with the exception of writing to the Param registry: the data in the corresponding SQL table got moved into the Feature registry.

The bulk of database-level changes was made in this PR @falexwolf.

  • remove unique constraint from Feature.name

  • replace hard unique constraint on Transform.hash and Artifact.hash, with conditional unique constraint: hash can be duplicated for different keys

  • new names for how the instances of a type are referred to, which don’t clash with the new Record concept: ULabel.ulabels, Feature.features, Schema.schemas, Project.projects

  • default space uid is now "A" for "All"

  • Feature._expect_many now defaults to None so that the auto-display of single values as opposed to a set makes sense, and a user can enforce one (single) or the other (many) in the future

  • hash is populated for all FeatureValue records so that there is an easy way to universally identify a unique feature value
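
The conditional unique constraint on hash listed above can be pictured with a small SQLite sketch. The table name `artifact`, its columns, and the choice of a composite unique index on `(key, hash)` are assumptions made for illustration; the constraint in the actual lamindb migrations may be defined differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifact (id INTEGER PRIMARY KEY, key TEXT, hash TEXT)")
# Hypothetical stand-in for the conditional constraint: uniqueness is enforced
# on the (key, hash) pair rather than on hash alone.
conn.execute("CREATE UNIQUE INDEX unique_key_hash ON artifact (key, hash)")

conn.execute("INSERT INTO artifact (key, hash) VALUES (?, ?)", ("a/data.parquet", "abc123"))
# The same hash under a different key is accepted.
conn.execute("INSERT INTO artifact (key, hash) VALUES (?, ?)", ("b/data.parquet", "abc123"))


def insert_is_rejected(key, hash_):
    """Return True if the database refuses the insert as a duplicate."""
    try:
        conn.execute("INSERT INTO artifact (key, hash) VALUES (?, ?)", (key, hash_))
    except sqlite3.IntegrityError:
        return True
    return False


# The same hash under the same key is rejected.
duplicate_rejected = insert_is_rejected("a/data.parquet", "abc123")
n_rows = conn.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]
```

In this sketch, identical content stored under two different keys yields two rows, while a second row with the same key and hash is refused.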

Changes to registries.

  • 🏗️ Integrate the Param into the Feature registry PR @falexwolf – the change is backward compatible on the Python/R level – on the SQL level, records are transferred from the lamindb_param table to the lamindb_feature table during migrations

  • ✨ Introduce a Branch registry PR @falexwolf

  • ♻️ Rename Record to SQLRecord PR PR @falexwolf

  • ✨ Introduce a flexible Record registry to manage any kind of entity without database migrations PR @falexwolf
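
The transfer of records from the lamindb_param table to the lamindb_feature table might look roughly like the following SQLite sketch. Only the two table names come from this changelog; the columns `name` and `dtype`, the example rows, and the deletion of the transferred rows are assumptions for illustration, not the actual migration code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE lamindb_feature (id INTEGER PRIMARY KEY, name TEXT, dtype TEXT);
    CREATE TABLE lamindb_param (id INTEGER PRIMARY KEY, name TEXT, dtype TEXT);
    INSERT INTO lamindb_feature (name, dtype) VALUES ('cell_type', 'cat');
    INSERT INTO lamindb_param (name, dtype) VALUES ('learning_rate', 'float'), ('n_epochs', 'int');
    """
)

# Copy every param row into the feature table, then empty the param table.
conn.execute(
    "INSERT INTO lamindb_feature (name, dtype) "
    "SELECT name, dtype FROM lamindb_param ORDER BY id"
)
conn.execute("DELETE FROM lamindb_param")

feature_names = [row[0] for row in conn.execute("SELECT name FROM lamindb_feature ORDER BY id")]
n_params_left = conn.execute("SELECT COUNT(*) FROM lamindb_param").fetchone()[0]
```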

Data curation.

  • ✨ Add schema-based TiledbsomaExperimentCurator PR @Zethson

  • ✨ Support curating lists as values in DataFrameCurator PR @sunnyosun

Bug fixes.

  • 🐛 Fix transfer for cases in which genes are insufficiently populated PR @falexwolf

Dependency changes.

UX improvements.

  • 🚸 No longer duplicate tracking of predecessors through the corresponding link table on Transform PR @falexwolf

  • 🚸 Add is_run_input to Artifact.get() and Collection.get() PR @Koncopd

  • 🚸 Improve suffix mismatch error message PR @Zethson

  • 🚸 Clearer error in parse_cat_dtype if cat dtype contains a module name and the module is not found PR @Koncopd

  • 🚸 Better error message when user passes manual uid to track() + anticipate that the user might want to create new transforms in some cases also if hash matches PR @falexwolf

  • 🚸 Improve setting relationships of unsaved records UX PR @Zethson

  • 🚸 Improve DoesNotExist error message upon DBRecord.get() PR @Zethson

  • ♻️ Set current space when transferring records PR @Koncopd

  • ♻️ Mark internal lamindb-produced artifacts with kind="__lamindb__" instead of _branch_code=0 PR @falexwolf

2025-05-13 db 1.5.3

2025-05-13 db 1.5.2

  • 🐛 Reset SpatialData path when accessing the in-memory representation PR @Zethson

  • 🚸 Do not validate twice within Artifact.from_X(...) when passing schema PR @falexwolf

2025-05-08 db 1.5.1

  • 🐛 Fix a too strict unique constraint in composite schemas PR @falexwolf

  • 🐛 Fix display of parents & children in view_parents(with_children) PR @falexwolf

  • ⬆️ Adapt save_tiledbsoma_experiment to tiledbsoma==1.16.2 PR @Koncopd

2025-05-07 db 1.5.0 | bionty 1.3.2

Data lineage.

  • 🚸 Make notebook & script tracking via ln.track() robust to renames PR @falexwolf

  • ✨ Enable executing notebooks via jupyter nbconvert --execute PR @falexwolf

CLI updates.

  • ✨ Enable cloud paths for lamin save PR @Koncopd

    lamin save s3://my-bucket/my-file.txt
    
  • ✨ Enable labeling with project during lamin save PR @falexwolf

    lamin save ./my-folder --project my-project
    

Streaming artifacts.

  • ✨ Enable polars in Artifact.open() and Collection.open() PR @Koncopd

  • ✨ Enable .load(), .open(), and .mapped() on query sets of artifacts PR @Koncopd

Curation & schemas.

  • ✨ Enable curating the index of a dataframe PR @falexwolf

    schema = ln.Schema(
        features=[
            ln.Feature(name="required_feature", dtype=str).save(),
        ],
        index=ln.Feature(name="sample", dtype=ln.ULabel).save(),
    ).save()
    
  • 🚸 Enable passing a ULabel type to dtype PR @falexwolf

    perturbation_type = ln.ULabel.get(name="Perturbation")  # perturbation_type.is_type is True
    ln.Feature(name="perturbation", dtype=perturbation_type)
    
  • 🚸 Handle schema updates decently PR @falexwolf

  • 🚸 Do not annotate with more than n_max_records = 1000 PR @falexwolf

  • 🚸 Introduce a submodule lamindb.examples with schemas PR @falexwolf

  • 🚸 Enable validating against nested dicts in spatialdata PR @falexwolf

  • 🚸 Better handle validation of ensembl gene IDs and add curator representation PR @Zethson

  • 🚸 Prettier Schema.describe() PR @sunnyosun

  • 🚸 AnnData: enable explicit transposition in var schema definition PR @falexwolf

  • 🚚 Rename the components argument of Schema() to slots PR @falexwolf

  • 🐛 Fix respecting schema.ordered_set in DataFrame validation PR @sunnyosun

Bulk annotation with features & queries via features.

  • ✨ Support feature dtype dict PR @falexwolf

    ln.Feature(name="metadata_details", dtype=dict).save()
    
  • 🚸 For artifacts, improve (1) bulk annotation with features + (2) queries by features PR @falexwolf

General UX improvements.

  • 🚸 Do not raise exceptions on problems with copy_or_move_to_cache within Artifact.save PR @Koncopd

  • 🚸 Allow passing key to save_vitessce_config() PR @namsaraeva

  • 🚸 Add .pt and .ckpt to valid suffixes PR @Zethson

Docs.

  • 📝 Document uid generation, prettify API reference docs PR @falexwolf

Refactoring.

General refactoring.

  • ♻️ Eliminate monkey patching of django.db.models.QuerySet and django.db.models.Manager PR @Koncopd

  • ♻️ Avoid non-lazy loads of settings on import of lamindb.models PR @Koncopd

Refactoring for curation & schemas.

  • ♻️ Restore validation error messages & add their fine-grained testing PR @falexwolf

  • ♻️ Can save csv artifacts in DataFrameCurator PR @sunnyosun

  • ♻️ Clearer naming conventions in the internal curator codebase PR @falexwolf

  • ♻️ Separate CatManager usage for .cat attribute and as legacy interface PR @falexwolf

  • ♻️ Separate legacy curators from new curators PR @falexwolf

  • ♻️ Execute curator examples and also show them in the curation guide PR @falexwolf

  • ♻️ Refactor annotating with inferred feature sets PR @falexwolf

Fine-grained access management (in beta).

  • 🚸 Better access management errors on Record.save() PR @Koncopd

  • ✨ Refresh db token on expiration PR @Koncopd

  • 🐛 Fix .using with fine-grained access instances and permissions test PR @Koncopd

  • ✅ Test auth errors PR @Koncopd

  • ✅ Temp table based authentication (adapt tests) PR @Koncopd

  • ✅ Decrypt token inside RLS (adapt tests) PR @Koncopd

  • 🚸 Delete version family if user wants to retain storage by passing storage=False to artifact.delete(), but retain warning PR @falexwolf

2025-04-25 bionty 1.3.1

🐛 Fixed downloading old Ensembl versions. PR @sunnyosun

If you upgraded to bionty 1.3.0 and used Ensembl versions below 108, please clear the cached ontology source files.

import bionty as bt
import shutil

shutil.rmtree(bt.base.settings.dynamicdir)

2025-04-24 R 1.1.0

LaminR is now documented on docs.lamin.ai.

The previous docs site laminr.lamin.ai continues to host developer docs.

  • 📝 Update documentation site to match the main docs website PR PR @lazappi

  • 👷 Separate Seurat analysis from rest of the introduction notebook PR @falexwolf

  • ♻️ Make R and Python quickstarts parallel PR @falexwolf

  • ♻️ Move setup.Rmd to lamin-docs PR @falexwolf

New features.

  • ✨ Improved Python dependency management with reticulate, deprecated install_lamindb() PR @lazappi

  • ✨ Add tracking of the R environment using pak lockfiles PR @lazappi

  • ✨ Enable artifact$view_lineage() PR @lazappi

Bug fixes.

  • 🐛 Enable setting wrapped object slots like artifact$description, artifact$key, etc. PR @lazappi

  • 🐛 Fix an issue that was preventing lamin_connect() from being run multiple times with the same instance PR @lazappi

  • 🐛 Properly clear and delete temporary instances created using lamin_init_temp() PR @lazappi

Other changes.

  • ♻️ Set minimum reticulate dependency >= 1.38.0 PR @lazappi

  • 🚸 Improve inheritance of arguments when wrapping and overwriting Python functions PR @lazappi

  • ♻️ Dispatch CI from pre-release events in lamindb PR @falexwolf

2025-04-15 db 1.4.0 | bionty 1.3.0

✨ Add schema as an argument to Artifact.from_X(). PR @falexwolf

artifact = ln.Artifact.from_df(df, key="my_dataset.parquet", schema=schema).save()

✨ Enable defining simple schemas that merely enforce a feature identifier type. PR @falexwolf

schema = ln.Schema(itype=ln.Feature).save()  # <-- enforce valid feature identifiers, no need to define specific required features

✨ Enable defining optional features on a per-schema level & improve schema hash calculation. PR @sunnyosun

schema = ln.Schema(
  features=[
    ln.Feature(name="sample_id", dtype=str).save(),  # required
    ln.Feature(name="sample_name", dtype=str).with_config(optional=True)  # optional
  ],
).save()

✨ Introduce lamin run with a Modal backend. PR @ragyhaddad

lamin run my_script.py --project my_project  # <-- will run the script on Modal

✨ Support auto-download of Ensembl genes of all organisms. Guide PR @sunnyosun

gene_ontology = bt.base.Gene(source="ensembl", organism="rabbit", version='release-103')
gene_ontology.register_source_in_lamindb()  # register the new ontology source in lamindb
source = bt.Source.get(entity="bionty.Gene", name="ensembl", organism="rabbit", version='release-103')
bt.Gene.import_source(source=source)  # import all genes from that source

🚸 Enable querying by features & params through Artifact.filter() and Run.filter(). Guide PR @falexwolf

ln.Artifact.filter(scientist="Barbara McClintock")

User experience.

  • 🚸 from_source no longer returns None but throws a NoResultFound exception if the look up in the public ontology fails PR @sunnyosun

  • 🚸 Allow renaming artifacts & transforms within the same version family PR @falexwolf

  • 🚸 Better support minimal_set, maximal_set, ordered_set in curators PR @sunnyosun

  • 🚸 Enable passing the stem uid to lamin save PR @falexwolf

  • 🚸 No longer throw an error but merely print a warning when attempting to update a schema PR @falexwolf

  • 🚸 Enable plain notebook uploads by creating a default run for the notebook if no run is found PR @falexwolf

  • 🚸 Enable authenticating and setting the current instance through environment variables PR @falexwolf

  • 🚸 Show link to hub in view_lineage() and render lineage through graphviz also in scripts PR @falexwolf

  • 🚸 Order IsVersioned.versions query set PR @falexwolf

  • 🚸 Do not print warning about missing schema modules PR @falexwolf
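
As a rough illustration of the minimal_set, maximal_set, and ordered_set flags mentioned in the list above, here is a plain-Python sketch. The interpretation is an assumption: minimal_set requires all schema features to be present, maximal_set forbids additional columns, and ordered_set requires schema order. The helper `check_columns` is hypothetical and not part of lamindb.

```python
def check_columns(columns, features, minimal_set=True, maximal_set=False, ordered_set=False):
    """Return a list of problems found when comparing dataset columns to schema features."""
    problems = []
    missing = [f for f in features if f not in columns]
    extra = [c for c in columns if c not in features]
    if minimal_set and missing:
        problems.append(f"missing features: {missing}")
    if maximal_set and extra:
        problems.append(f"unexpected columns: {extra}")
    if ordered_set:
        # Compare the relative order of the columns that are part of the schema.
        present = [c for c in columns if c in features]
        expected = [f for f in features if f in columns]
        if present != expected:
            problems.append("columns are not in schema order")
    return problems


features = ["sample_id", "perturbation"]
ok = check_columns(["sample_id", "perturbation", "note"], features)
strict = check_columns(["sample_id", "perturbation", "note"], features, maximal_set=True)
unordered = check_columns(["perturbation", "sample_id"], features, ordered_set=True)
```

With the defaults, the extra column "note" is tolerated; it is only reported once maximal_set is switched on.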

Refactors.

  • ♻️ Eliminate duplicated parsing & record creation during curation PR @falexwolf

  • ♻️ Remove verbosity and organism arguments on CatManager level PR PR @falexwolf

  • ♻️ Organize categorical curation code with CatColumn PR @sunnyosun

  • ♻️ Add return_graph argument to view_lineage() PR @lazappi

  • ♻️ Suppress aiobotocore traceback logging PR @Koncopd

  • ⬆️ Upgrade supabase to <2.15.0 PR @Koncopd

  • ⬆️ Upgrade anndata to 0.11.4 PR @Koncopd

Docs.

  • 📝 Compare lamindb with pydantic and pandera in an FAQ doc PR @falexwolf

  • 📝 Document how to access any Ensembl genes PR @sunnyosun

Bugs.

  • 🐛 Fix validation of var_index PR @sunnyosun

  • 🐛 Fix numcodecs==0.16.0 incompatibility with zarr v2 PR @Koncopd

  • 🐛 Fix SpatialData and MuData check PR @Zethson

  • 🐛 Fix organism passing to from_source PR @sunnyosun

  • 🐛 Return an empty set not a dict for modules in instance settings PR @falexwolf

Bionty.

  • 🚸 Make the default organism "human" instead of None PR @falexwolf

  • ⬆️ Support Python 3.13 & remove support for Python 3.9 PR @Zethson

  • ♻️ Improve Ensembl prefix detection PR @sunnyosun

  • ♻️ Use UPath.synchronize in s3_bionty_assets PR @Koncopd

2025-03-27 db 1.3.2 | bionty 1.2.1

  • 🐛 Fix bionty ontology sources sync through reticulate PR @falexwolf

  • 🐛 Fix data transfer when the target instance has no schema modules PR @falexwolf

2025-03-26 db 1.3.1 | bionty 1.2.0

In Bionty, you can now add custom ontology sources through the Source registry.

df = pd.read_csv("./our_inhouse_genes.csv")  # a csv describing gene metadata e.g. from parsing a GTF file
custom_source = bt.Source(entity="bionty.Gene", organism="human", name="Our genes", version="2025-04-01").save()
bt.Gene.add_source(custom_source, df=df)  # couple the custom source to the Gene registry

Detailed changes

Bionty now relies on a single file source.yaml to reference public sources.

  • ✨ Enable updating existing records to a new ontology PR PR @sunnyosun

  • ✨ Robust support of custom sources PR @sunnyosun

  • ➖ Remove pronto from main dependencies PR @Zethson

  • ♻️ Refactor sync_public_sources PR @sunnyosun

  • ♻️ Refactor default source configuration PR @sunnyosun

  • ♻️ Make EFO parsing the same as other ontologies PR @sunnyosun

  • ♻️ No longer use local source yaml files PR @sunnyosun

  • ♻️ Move source tests from lamindb to bionty PR @sunnyosun

  • ♻️ Standardize organism scientific names from ensembl source PR @sunnyosun

  • ♻️ Increase uid length for Source to 8 chars PR @falexwolf

  • 🍱 New ExperimentalFactor version: efo-3.69.0 PR @Zethson

  • 🍱 New CellType version: cl-2024-08-16 PR @Zethson

  • 🍱 New Disease version: mondo-2024-08-06 PR @Zethson

LaminDB changes.

  • 🐛 Fix incompatibility with gotrue==2.12.0 PR @Koncopd

  • 🐛 Enable transferring features pointing to multiple labels PR @sunnyosun

  • 🐛 More extensive validation for updates to artifact.key and artifact.suffix PR @falexwolf

  • 🚸 Refactor conventions for files written during init: the SQLite file is now .lamindb/lamin.db and the storage marker is .lamindb/storage_uid.txt PR @falexwolf

  • 🚸 Make upload of large directories more robust by reducing batch size PR @Koncopd

  • 🚸 Avoid requiring coerce_dtype for "int" and "float" in case an integer or float pd.Series.dtype only deviates by numerical precision/range PR @falexwolf

  • 🚸 In AnnDataCurator, make 'obs' schema optional and allow 'uns' schema PR @falexwolf

2025-03-16 db 1.3.0

New features.

Other changes.

  • ⬆️ Python 3.13 support PR @Zethson

  • ⬆️ Support CELLxGENE schema 5.2.0 PR1 PR2 @sunnyosun

  • 🚸 Skip ln.track() when connected in read-only mode PR @falexwolf

  • 🚸 Error if trying to register an instance without a storage in the hub PR @Koncopd

  • 🚸 Refactor organism constraints during validation PR @sunnyosun

  • 🚸 Add more constructor signatures and specific inherited types PR @falexwolf

  • 🚸 No logging message if database is behind by minor version PR @falexwolf

  • 📝 Re-structure curation guides PR1 PR2 @falexwolf

  • 📝 Integrate tutorials into introduction guide PR @falexwolf

2025-03-10 R 1.0.0

laminr now has feature parity with lamindb. PR @lazappi

  • Run install_lamindb(), which will ensure lamindb >= 1.2 in the Python environment used by reticulate.

  • Replace db <- connect() with ln <- import_module("lamindb") and see the “Detailed changes” dropdown.

The ln object is largely similar to the db object in laminr < v1 and matches lamindb’s Python API (with . replaced by $).

Detailed changes

| What | Before | After |
| --- | --- | --- |
| Connect to the default LaminDB instance | db <- connect() | ln <- import_module("lamindb") |
| Start tracking | db$track() | ln$track() |
| Get an artifact from another instance | new_instance <- connect("another/instance"); new_instance$Artifact$get(...) | ln$Artifact$using("another/instance")$get(...) |
| Create an artifact from a path | db$Artifact$from_path(path) | ln$Artifact(path) |
| Finish tracking | db$finish() | ln$finish() |

See the updated “Get started” vignette for more information.

User-facing changes:

  • Add an import_module() function to import Python modules with additional functionality, e.g., import_module("lamindb") for lamindb

  • Add functions for accessing more lamin CLI commands

  • Add a new “Introduction” vignette that replicates the code from the Python lamindb introduction guide

Internal changes:

  • Add an internal wrap_python() function to wrap Python objects while replacing Python methods with R methods as needed, leaving most work to {reticulate}

  • Update the internal check_requires() function to handle Python packages

  • Add custom cache()/load() methods to the Artifact class

  • Add custom track()/finish() methods to the lamindb module

2025-03-09 db 1.2.0

✨ Enable auto-linking entities to projects. Guide PR @falexwolf

ln.track(project="My project")

🚸 Better support for spatialdata with Artifact.from_spatialdata() and artifact.load(). PR1 PR2 @Zethson

🚸 Introduce .slots in Schema, Curator, and artifact.features to access schemas and curators by dataset slot. PR @sunnyosun

schema.slots["obs"]  # -> schema for .obs slot of AnnData
curator.slots["obs"]  # -> curator for .obs slot of AnnData
artifact.features["obs"]  # -> feature set for .obs slot of AnnData

🏗️ Re-structured the internal API away from monkey-patching Django models. PR @falexwolf

⚠️ Use of internal API

If you used the internal API, you might experience a breaking change. The most drastic change is that all internal registry-related functionality is now re-exported under lamindb.models.

🚸 When re-creating an Artifact, link subsequent runs instead of updating .run and linking previous runs. PR @falexwolf

On the hub.

More details here. @chaichontat

| Before | After |
| --- | --- |
| An artifact is only shown as an output for the latest run that created the artifact. Previous runs don’t show it. | All runs that (re-)create an artifact show it as an output. |

More changes:

  • ✨ Support R2 PR @Koncopd

  • ✨ Enable Artifact.open() and Artifact.load() for .gz files PR @Koncopd

  • 🐛 Fix passing a path to ln.track() when no path found by nbproject PR @Koncopd

  • 🐛 Do not overwrite ._state_db of records when the current instance is passed to .using PR @Koncopd

  • 🚸 Do not show track warning for read-only connections PR @Koncopd

  • 🚸 Raise NotImplementedError in Artifact.load() if there is no loader PR @Koncopd

2025-02-27 db 1.1.1

  • 🚸 Make the obs and var DataFrameCurator objects accessible via AnnDataCurator.slots PR @sunnyosun

  • 🚸 Better error message upon re-creation of schema with same name and different hash PR @falexwolf

  • 🚸 Raise consistency error if a source path suffix doesn’t match the artifact key suffix PR @falexwolf

  • 🚸 Automatically add missing columns upon DataFrameCurator.standardize() if nullable is True PR @falexwolf

  • 🚸 Allow specifying fsspec upload options in Artifact.save PR @Koncopd

  • 🚸 Populate Artifact.n_observations in Artifact.from_df() PR @Koncopd

  • 🐛 Fix UPath.view_tree on first call on gs PR @Koncopd

  • 🐛 Fix .add_new_from message PR @Zethson

  • 🐛 Run pip freeze with current Python interpreter PR @ap--

  • 🐛 Do not resolve http links when registering PR @Koncopd

  • 🐛 Fix notebook re-run with same hash PR @falexwolf
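
The suffix consistency check from the list above can be sketched with the standard library. The function name `check_suffix_consistency` and the use of `ValueError` are stand-ins chosen for illustration; the exception type and the handling of composite suffixes in lamindb may differ.

```python
from pathlib import PurePosixPath


def check_suffix_consistency(source_path, key):
    """Raise if the suffix of the source path differs from the suffix of the key."""
    source_suffix = PurePosixPath(source_path).suffix
    key_suffix = PurePosixPath(key).suffix
    if source_suffix != key_suffix:
        raise ValueError(
            f"source suffix {source_suffix!r} does not match key suffix {key_suffix!r}"
        )


# Matching suffixes pass silently.
check_suffix_consistency("./data/my_dataset.parquet", "datasets/my_dataset.parquet")

# A mismatch raises an error with an explicit message.
try:
    check_suffix_consistency("./data/my_dataset.csv", "datasets/my_dataset.parquet")
    mismatch_message = None
except ValueError as error:
    mismatch_message = str(error)
```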

2025-02-18 db 1.1.0

⚠️ The FeatureSet registry got renamed to Schema.

All your code is backward compatible. The Schema registry encompasses feature sets as a special case.

✨ Conveniently track functions including inputs, outputs, and parameters with a decorator: ln.tracked(). PR1 PR2 @falexwolf

@ln.tracked()
def subset_dataframe(
    input_artifact_key: str,  # all arguments tracked as parameters of the function run
    output_artifact_key: str,
    subset_rows: int = 2,
    subset_cols: int = 2,
) -> None:
    artifact = ln.Artifact.get(key=input_artifact_key)
    df = artifact.load()  # auto-tracked as input
    new_df = df.iloc[:subset_rows, :subset_cols]
    ln.Artifact.from_df(new_df, key=output_artifact_key).save()  # auto-tracked as output
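
To give an intuition for how a decorator can capture all arguments of a call as parameters, here is a minimal standard-library sketch. It is not the implementation of ln.tracked(): the names `tracked` and `run_log` are hypothetical, and the sketch only records parameters in memory, without tracking input or output artifacts.

```python
import functools
import inspect

run_log = []  # hypothetical in-memory stand-in for a run registry


def tracked(func):
    """Record the bound arguments of each call as the parameters of a run."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()  # include defaults that were not passed explicitly
        run_log.append({"function": func.__name__, "params": dict(bound.arguments)})
        return func(*args, **kwargs)

    return wrapper


@tracked
def subset_rows(rows, n_rows=2):
    return rows[:n_rows]


result = subset_rows([1, 2, 3])
```

Binding the call against the function signature is what lets default values such as n_rows=2 show up among the recorded parameters.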

✨ Make sub-types of ULabel, Feature, Schema, Project, Param, and Reference. PR @falexwolf

On the hub.

More details here. @awgaan @chaichontat

perturbation = ln.ULabel(name="Perturbation", is_type=True).save()
ln.ULabel(name="DMSO", type=perturbation).save()
ln.ULabel(name="IFNG", type=perturbation).save()

✨ Use an overhauled dataset curation flow. @falexwolf @Zethson @sunnyosun

  • support persisting validation constraints as a pandera-compatible schema

  • support validating any feature type, no longer just categoricals

  • make the relationship between features, dataset schema, and curator evident

Detailed changes for the overhauled curation flow.

⚠️ The API gained the lamindb.curators module as the new way to access Curator classes for different data structures.

  • This release introduces the schema-based DataFrameCurator and AnnDataCurator

  • The old-style curation flow for categoricals based on lamindb.Curator.from_objecttype() continues to work

Key PRs.

  • ✨ Overhaul curation guides + enable default values and filters on valid categories for features PR @falexwolf

  • ✨ Schema-based curators: AnnDataCurator PR @falexwolf

  • ✨ Schema-based curators: DataFrameCurator PR @falexwolf

Enabling PRs.

  • ✨ Allow passing artifact to Curator PR @sunnyosun

  • 🎨 A ManyToMany between Schema.components and .composites PR @falexwolf

  • ♻️ Mark Schema fields as non-editable PR @falexwolf

  • ✨ Add auxiliary field nullable to Feature PR @falexwolf

  • ♻️ Prettify AnnDataCurator implementation PR @falexwolf

  • 🚸 Better error for malformed categorical dtype PR @falexwolf

  • 🚚 Restore .feature_sets as a ManyToManyField PR @falexwolf

  • 🚚 Rename CatCurator to CatManager PR @falexwolf

  • 🎨 Let Curator.validate() throw an error PR @falexwolf

  • ♻️ Re-purpose BaseCurator as Curator, introduce CatCurator and consolidate shared logic under CatCurator PR @falexwolf

  • ♻️ Refactor organism handling in curators PR @falexwolf

  • 🔥 Eliminate all logic related to using_key in curators PR @falexwolf

  • 🚚 Bulk-rename old-style curators to CatCurator PR @falexwolf

  • 🎨 Self-contained definition of CellxGene schema / validation constraints PR @falexwolf

  • 🚚 Move PertCurator from wetlab here and add CellxGene Curator test PR @falexwolf

  • 🚚 Move CellXGene Curator from cellxgene-lamin here PR @falexwolf

schema = ln.Schema(
    name="small_dataset1_obs_level_metadata",
    features=[
        ln.Feature(name="CD8A", dtype=int).save(),  # integer counts for CD8A marker
        ln.Feature(name="perturbation", dtype=ln.ULabel).save(),  # a categorical feature that validates against the ULabel registry
        ln.Feature(name="sample_note", dtype=str).save(),   # a note for the sample
    ],
).save()

df = pd.DataFrame({
    "CD8A": [1, 4, 0],
    "perturbation": ["DMSO", "IFNG", "DMSO"],
    "sample_note": ["value_1", "value_2", "value_3"],
    "temperature": [22.2, 25.7, 27.3],
})
curator = ln.curators.DataFrameCurator(df, schema)
artifact = curator.save_artifact(key="example_datasets/dataset1.parquet")  # validates compliance with schema, annotates with metadata
assert artifact.schema == schema  # the validating schema

✨ Easily filter on a validating schema. @falexwolf @Zethson @sunnyosun

On the hub.

With the Schema filter button, find all datasets that satisfy a given schema (→ explore).

schema = ln.Schema.get(name="small_dataset1_obs_level_metadata")  # get a schema
ln.Artifact.filter(schema=schema).df()  # filter all datasets that were validated by the schema

Collection.open() returns a pyarrow dataset. PR @Koncopd

df = pd.DataFrame({"feat1": [0, 0, 1, 1], "feat2": [6, 7, 8, 9]})
df[:2].to_parquet("df1.parquet", engine="pyarrow")
df[2:].to_parquet("df2.parquet", engine="pyarrow")

artifact1 = ln.Artifact("df1.parquet", key="df1.parquet").save()
artifact2 = ln.Artifact("df2.parquet", key="df2.parquet").save()
collection = ln.Collection([artifact1, artifact2], key="parquet_col")

dataset = collection.open() # backed by files in the cloud storage
dataset.to_table().to_pandas().head()

✨ Support s3-compatible endpoint urls, such as an on-prem MinIO deployment. PR @Koncopd

Speed up instance creation through squashed migrations.

Tiledbsoma.

  • ✨ Support endpoint_url in operations with tiledbsoma PR1 PR2 @Koncopd

  • ✨ Add Artifact.from_tiledbsoma to populate n_observations PR @Koncopd

MappedCollection.

  • 🐛 Allow filtering on np.nan in obs_filter of MappedCollection PR @Koncopd

  • 🐛 Fix labels for NaN in categorical columns for MappedCollection PR @Koncopd

SpatialDataCurator.

  • 🐛 Fix var_index standardization of SpatialDataCurator PR1 PR2 @Zethson

  • 🐛 Fix sample level metadata optional in SpatialDataCatManager PR @Zethson

Core functionality.

  • ✨ Allow checking the need for syncing without actually syncing PR @Koncopd

  • ✨ Check for corrupted cache in Artifact.load() & Artifact.open() PR PR @Koncopd

  • ✨ Infer n_observations in Artifact.from_anndata PR @Koncopd

  • 🐛 Account for VSCode appending languageid to markdown cell in notebook tracking PR @falexwolf

  • 🐛 Fix dangling folders on upload failures PR @Koncopd

  • 🐛 Normalize module names for robust checking in _check_instance_setup() PR @Koncopd

  • 🐛 Fix idempotency of Feature creation when description is passed and improve filter and get error behavior PR @Zethson

  • 🐛 Fix caching logic in Artifact.open() PR @Koncopd

  • 🚸 Make new version upon passing existing key to Collection PR @falexwolf

  • 🚸 Throw better error upon checking instance.modules when loading a lamindb schema module PR @Koncopd

  • 🚸 Validate existing records in the DB irrespective of whether an ontology source is passed or not PR @sunnyosun

  • 🚸 Full guarantee of avoiding duplicating Transform, Artifact & Collection in concurrent runs PR @falexwolf

  • 🚸 Fix RemovedInDjango60Warning PR @Zethson

  • 🚸 Better user feedback during keyword validation in Record constructor PR @Zethson

  • 🚸 Fix warning about artifacts in trash PR @ap--

  • 🚸 Improved error message when saving via CLI PR @Zethson

  • 🚸 Improve local storage not found warning message PR @Zethson

  • 🚸 Better error message when attempting to save a file while not being connected to an instance PR @Zethson

  • 🚸 Error for non-keyword parameters for Artifact.from_x methods PR @Zethson

Housekeeping.

  • 🚸 Error at runtime with old s3fs PR @Koncopd

  • 🚸 Safer resolve in check_path_is_child_of_root() PR @Koncopd

  • ⬆️ Upgrade fsspec packages (s3fs, gcsfs, universal_pathlib) PR @Koncopd

  • ➕ Add pyyaml to dependencies PR @Koncopd

2025-01-23 db 1.0.5

  • 🚸 No longer throw a NotebookNotSaved error in ln.finish() but wait for the user or gracefully exit PR @falexwolf

  • 🚸 Resolve save FutureWarning PR @Zethson

  • 🐛 Fix Artifact.replace() for folder-like artifacts PR @Koncopd

  • 🐛 Filter the latest transform on saving by filename PR @Koncopd

2025-01-21 db 1.0.4

🚚 Revert Collection.description back to unlimited length TextField. PR @falexwolf

2025-01-21 db 1.0.3

🚸 In track(), improve logging in RStudio sessions. PR @falexwolf

2025-01-20 R 0.4.0

  • 🚚 Migrate to lamindb v1 PR @falexwolf

  • 🚸 Improve the user experience for setting up Python & reticulate PR @lazappi

2025-01-20 db 1.0.2

🚚 Improvements for lamindb v1 migrations. PR @falexwolf

  • add a .description field to Schema

  • enable labeling Run with ULabel

  • add a .predecessors and .successors field to Project akin to what’s present on Transform

  • make .uid fields not editable

2025-01-18 db 1.0.1

🐛 Block non-admin users from confirming the dialogue for integrating lnschema-core. PR @falexwolf

2025-01-17 db 1.0.0

This release makes the API consistent, integrates lnschema_core & ourprojects into the lamindb package, and introduces a breadth of database migrations to enable future features without disruption. You’ll now need at least Python 3.10.

Your code will continue to run as is, but you will receive warnings about a few renamed API components.

| What | Before | After |
| --- | --- | --- |
| Dataset vs. model | Artifact.type | Artifact.kind |
| Python object for Artifact | Artifact._accessor | Artifact.otype |
| Number of files | Artifact.n_objects | Artifact.n_files |
| name arg of Transform | Transform(name="My notebook", key="my-notebook.ipynb") | Transform(key="my-notebook.ipynb", description="My notebook") |
| name arg of Collection | Collection(name="My collection") | Collection(key="My collection") |
| Consecutiveness field | Run.is_consecutive | Run._is_consecutive |
| Run initiator | Run.parent | Run.initiated_by_run |
| --schema arg | lamin init --schema bionty,wetlab | lamin init --modules bionty,wetlab |

Migration guide:

  1. Upon lamin connect account/instance you will be prompted to confirm migrating away from lnschema_core

  2. After that, you will be prompted to call lamin migrate deploy to apply database migrations

New features:

  • ✨ Allow http storage backend for Artifact PR @Koncopd

  • ✨ Add SpatialDataCurator PR @Zethson

  • ✨ Allow filtering by multiple obs columns in MappedCollection PR @Koncopd

  • ✨ In git sync, also search git blob hash in non-default branches PR @Zethson

  • ✨ Add relationship with Project to everything except Run, Storage & User so that you can easily filter for the entities relevant to your project PR @falexwolf

  • ✨ Capture logs of scripts during ln.track() PR1 PR2 @falexwolf @Koncopd

  • ✨ Support "|"-separated multi-values in Curator PR @sunnyosun

  • 🚸 Accept None in connect() and improve migration dialogue PR @falexwolf
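
The idea behind "|"-separated multi-values from the list above can be sketched as follows: each cell is split on the separator and every part is validated on its own. The helpers `split_multi_value` and `find_invalid` are hypothetical and only illustrate the concept; they are not the Curator implementation.

```python
def split_multi_value(value, separator="|"):
    """Split a cell like 'DMSO|IFNG' into its individual, non-empty values."""
    return [part.strip() for part in value.split(separator) if part.strip()]


def find_invalid(values, valid_categories):
    """Return the unique parts that are not among the valid categories, in order of appearance."""
    invalid = []
    for value in values:
        for part in split_multi_value(value):
            if part not in valid_categories and part not in invalid:
                invalid.append(part)
    return invalid


valid = {"DMSO", "IFNG"}
column = ["DMSO", "DMSO|IFNG", "IFNG|TNF"]
invalid = find_invalid(column, valid)
```

Here only "TNF" is reported, because the other parts of the multi-value cells are valid categories.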

UX improvements:

  • 🚸 Simplify the ln.track() experience PR @falexwolf

    1. you can omit the uid argument

    2. you can organize transforms in folders

    3. versioning is fully automated (requirement for 1.)

    4. you can save scripts and notebooks without running them (corollary of 1.)

    5. you avoid the interactive prompt in a notebook and the throwing of an error in a script (corollary of 1.)

    6. you are no longer required to add a title in a notebook

  • 🚸 Raise error when modifying Artifact.key in problematic ways PR1 PR2 @sunnyosun @Koncopd

  • 🚸 Better error message on running ln.track() within Python terminal PR @Koncopd

  • 🚸 Hide traceback for InstanceNotEmpty using Click Exception PR @Zethson

  • 🚸 Hide underscore attributes in __repr__ PR @Zethson

  • 🚸 Only auto-search ._name_field in sub-classes of CanCurate PR @falexwolf

  • 🚸 Simplify installation & API overview PR @falexwolf

  • 🚸 Make lamin_run_uid categorical in tiledbsoma stores PR @Koncopd

  • 🚸 Add defensive check for organism arg PR @Zethson

  • 🚸 Raise ValueError when trying to search a None value PR @Zethson

Bug fixes:

  • 🐛 Skip deleting storage when deleting outdated versions of folder-like artifacts PR @Koncopd

  • 🐛 Let SOMACurator() validate and annotate all .obs columns PR @falexwolf

  • 🐛 Fix renaming of feature sets PR @sunnyosun

  • 🐛 Do not raise an exception when default AWS credentials fail PR @Koncopd

  • 🐛 Only map synonyms when field is name PR @sunnyosun

  • 🐛 Fix source in .from_values PR @sunnyosun

  • 🐛 Fix creating instances with storage in the current local working directory PR @Koncopd

  • 🐛 Fix NA values in Curator.add_new_from() PR @sunnyosun

Refactors, renames & maintenance:

  • 🏗️ Integrate lnschema-core into lamindb PR1 PR2 @falexwolf @Koncopd

  • 🏗️ Integrate ourprojects into lamindb PR @falexwolf

  • ♻️ Manage created_at, updated_at on the database-level, make created_by not editable PR @falexwolf

  • 🚚 Rename transform type “glue” to “linker” PR @falexwolf

  • 🚚 Deprecate the --schema argument of lamin init in favor of --modules PR @falexwolf

  • ⬆️ Compatibility with tiledbsoma==1.15.0 PR @Koncopd

DevOps:

Detailed list of database migrations

Those not yet announced above will be announced with the functionality they enable.

  • ♻️ Add contenttypes Django plugin PR @falexwolf

  • 🚚 Prepare introduction of persistable Curator objects by renaming FeatureSet to Schema on the database-level PR @falexwolf

  • 🚚 Add a .type foreign key to ULabel, Feature, FeatureSet, Reference, Param PR @falexwolf

  • 🚚 Introduce RunData, TidyTable, and TidyTableData in the database PR @falexwolf

All remaining database schema changes were made in this PR @falexwolf. Data migrations happen automatically.

  • remove _source_code_artifact from Transform, it’s been deprecated since 0.75

    • data migration: for all transforms that have _source_code_artifact populated, populate source_code

  • rename Transform.name to Transform.description because it’s analogous to Artifact.description

    • backward compat:

      • in the Transform constructor, use name to populate key whenever only name is passed

      • return the same transform based on key in case source_code is None via ._name_field = "key"

    • data migrations:

      • there already was a legacy description field that was never exposed on the constructor; to be safe, any data in it was concatenated onto the new description field

      • for all transforms that have key=None and name!=None, use name to pre-populate key

  • rename Collection.name to Collection.key for consistency with Artifact & Transform and because you will likely want to organize collections hierarchically

  • a _branch_code integer on every record to model pull requests

    • include visibility within that code

    • repurpose visibility=0 as _branch_code=0 as “archive”

    • put an index on it

    • code a “draft” as _branch_code = 2, and “draft PRs” as negative branch codes

  • rename the value "number" to "num" in dtype

  • an ._aux JSON field on Record

  • a SmallInteger run._status_code that allows writing finished_at in clean-up operations so that aborted runs also have a run time

  • rename Run.is_consecutive to Run._is_consecutive

  • a _template_id FK to store the information of the generating template (whether a record is a template is coded via _branch_code)

  • rename _accessor to otype to publicly declare the data format as the pair of suffix and accessor

  • rename Artifact.type to Artifact.kind

  • a FK to artifact, run._logfile, which holds logs

  • a hash field on ParamValue and FeatureValue to enforce uniqueness without risking failure for large dictionaries

  • add a boolean field ._expect_many to Feature/Param that defaults to True for Feature and False for Param and indicates whether values for this feature/param are expected to occur once or multiple times for each artifact/run

    • for feature

      • if it’s True (default), the values come from an observation-level aggregation and a dtype of datetime on the observation level means set[datetime] on the artifact level

      • if it’s False it’s an artifact-level value and datetime means datetime; this is an edge case because an arbitrary artifact would always be a set of arbitrary measurements that would need to be aggregated (“one just happens to measure a single cell line in that artifact”)

    • for param

      • if it’s False (default), the values are artifact/run-level values and datetime means datetime

      • if it’s True, the values come from an aggregation; this seems like an edge case but could be relevant, say, when characterizing a model ensemble trained with different parameters

  • remove the .transform foreign key from artifact and collection for consistency with all other records; introduce a property and a simple filter statement instead that maintains the same UX

  • store provenance metadata for TransformULabel, RunParamValue, ArtifactParamValue

  • enable linking projects & references to transforms & collections

  • rename Run.parent to Run.initiated_by_run

  • introduce a boolean flag on artifact that’s called _overwrite_versions, which indicates whether versions are overwritten or stored separately; it defaults to False for file-like artifacts and to True for folder-like artifacts

  • rename n_objects to n_files for clarity

  • add a Space registry to lamindb with an FK on every BasicRecord

  • add a name column to Run so that a specific run can serve as a named analysis

  • remove _previous_runs field on everything except Artifact & Collection
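
The ._expect_many semantics from the list above can be sketched in plain Python. This is a minimal illustration and not LaminDB's implementation: the helper artifact_level_dtype and the two default constants are hypothetical names introduced here. Only the True/False defaults and the datetime versus set[datetime] reading are taken from the list.

```python
# Hypothetical constants mirroring the defaults stated in the list above.
FEATURE_DEFAULT_EXPECT_MANY = True   # features: values aggregated over observations
PARAM_DEFAULT_EXPECT_MANY = False    # params: a single artifact/run-level value


def artifact_level_dtype(dtype: str, expect_many: bool) -> str:
    """Return how a declared dtype reads at the artifact/run level.

    If many values are expected, the annotation is a set of values of the
    declared dtype; otherwise it is a single value of that dtype.
    """
    return f"set[{dtype}]" if expect_many else dtype


if __name__ == "__main__":
    # A feature with dtype datetime annotates an artifact with a set of datetimes.
    print(artifact_level_dtype("datetime", FEATURE_DEFAULT_EXPECT_MANY))
    # A param with dtype datetime annotates a run with a single datetime.
    print(artifact_level_dtype("datetime", PARAM_DEFAULT_EXPECT_MANY))
```

Flipping the flag on a feature yields the edge case named above, in which an artifact happens to hold a single measured value.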