Query & search registries

Find & access data using registries.

Setup

!lamin init --storage ./mydata
💡 connected lamindb: testuser1/mydata
import lamindb as ln

ln.settings.verbosity = "info"
💡 connected lamindb: testuser1/mydata

We’ll need some toy data:

ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'e3WV1jNNz5xg2FDL7z72' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/e3WV1jNNz5xg2FDL7z72.jpg'
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact '1COm2aQq3Bb4tesE3vg5' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/1COm2aQq3Bb4tesE3vg5.parquet'
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'gx2cD7lye8RhSkCXXdSz' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/gx2cD7lye8RhSkCXXdSz.fastq.gz'
Artifact(uid='gx2cD7lye8RhSkCXXdSz', description='My fastq', suffix='.fastq.gz', size=20, hash='hi7ZmAzz8sfMd3vIQr-57Q', hash_type='md5', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1, updated_at='2024-06-17 11:30:30 UTC')

Look up metadata

For registries that hold no more than about 100k records, a lookup object is a convenient way to select a record.

Consider the User registry:

users = ln.User.lookup(field="handle")

With auto-complete, we find a user:

user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-06-17 11:30:29 UTC')

Note

You can also auto-complete in a dictionary:

users_dict = ln.User.lookup().dict()
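
To illustrate why a lookup object auto-completes, here is a minimal pure-Python sketch of the idea. It is not LaminDB's implementation: the records are plain dictionaries, and the second user, testuser2, is hypothetical and exists only for this example.

```python
from collections import namedtuple

# Stand-ins for User records; "testuser2" is a hypothetical second user.
records = [
    {"handle": "testuser1", "name": "Test User1"},
    {"handle": "testuser2", "name": "Test User2"},
]

# A namedtuple exposes one attribute per handle, which is what lets an
# editor or notebook offer tab completion on the lookup object.
Lookup = namedtuple("Lookup", [record["handle"] for record in records])
lookup = Lookup(*records)

# Attribute access, analogous to users.testuser1
print(lookup.testuser1["name"])

# Dictionary access, analogous to the .dict() variant
lookup_dict = lookup._asdict()
print(lookup_dict["testuser2"]["name"])
```

Attribute names must be valid Python identifiers, which is one reason a dictionary variant is useful for values that are not.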

Filter by metadata

Filter for all artifacts created by a user:

ln.Artifact.filter(created_by=user).df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 e3WV1jNNz5xg2FDL7z72 None My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.720027+00:00
2 1COm2aQq3Bb4tesE3vg5 None The iris collection None .parquet DataFrame 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.821023+00:00
3 gx2cD7lye8RhSkCXXdSz None My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.829012+00:00

To access the results encoded in a filter statement, execute its return value with one of:

  • .df(): A pandas DataFrame with each record stored as a row.

  • .all(): An indexable django QuerySet.

  • .list(): A list of records.

  • .one(): Exactly one record. Will raise an error if there is none or more than one.

  • .one_or_none(): Either one record or None if there is no query result.
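
The difference between the last two accessors can be sketched in plain Python. The helper functions below are illustrations that operate on a list; they assume the SQLAlchemy-style semantics in which more than one match is an error in both cases, and they are not LaminDB's own code.

```python
def one(results):
    """Return the only element of `results`; raise if there are none or several."""
    if len(results) == 0:
        raise LookupError("no result found")
    if len(results) > 1:
        raise LookupError("multiple results found")
    return results[0]


def one_or_none(results):
    """Return the only element, or None if `results` is empty; raise if several."""
    if len(results) == 0:
        return None
    return one(results)


suffixes = [".jpg", ".parquet", ".fastq.gz"]

# Exactly one match: both helpers return it.
jpg = one([s for s in suffixes if s == ".jpg"])

# No match: one_or_none() returns None, whereas one() would raise.
missing = one_or_none([s for s in suffixes if s == ".csv"])

print(jpg, missing)
```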

Note

filter() returns a QuerySet.

The ORM classes in LaminDB are Django models, so any Django query works. LaminDB extends Django’s API for data scientists.

Under the hood, any .filter() call translates into a SQL SELECT statement.

.one() and .one_or_none() are two parts of LaminDB’s API that are borrowed from SQLAlchemy.
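
To make the translation into SQL concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name artifact and its reduced set of columns are assumptions for illustration; the statement Django actually generates will differ in detail.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# A simplified, hypothetical artifact table with only a few columns.
con.execute(
    "CREATE TABLE artifact ("
    "id INTEGER PRIMARY KEY, description TEXT, suffix TEXT, created_by_id INTEGER)"
)
con.executemany(
    "INSERT INTO artifact (description, suffix, created_by_id) VALUES (?, ?, ?)",
    [
        ("My image", ".jpg", 1),
        ("The iris collection", ".parquet", 1),
        ("My fastq", ".fastq.gz", 1),
    ],
)

# Roughly what a filter on created_by could translate to.
rows = con.execute(
    "SELECT id, description, suffix FROM artifact "
    "WHERE created_by_id = ? ORDER BY id",
    (1,),
).fetchall()

for row in rows:
    print(row)
```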

Search for metadata

ln.Artifact.search("iris").df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 1COm2aQq3Bb4tesE3vg5 None The iris collection None .parquet DataFrame 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.821023+00:00

Let us create 500 notebook objects with fake titles and save them:

ln.save(
    [
        ln.Transform(name=title, type="notebook")
        for title in ln.core.datasets.fake_bio_notebook_titles(n=500)
    ]
)

We can now search for any combination of terms:

ln.Transform.search("intestine").df().head()
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
7 ZD30lUiDTnhOUKAA None Study Outer root sheath intestine. None None notebook None None None None 1 2024-06-17 11:30:35.520370+00:00
18 ql8QFtZ6XJtMXxEB None Study intestine IgG3 IgY Lugaro cells IgG IgM. None None notebook None None None None 1 2024-06-17 11:30:35.522080+00:00
34 0SMA9vaqienmGnxo None Igg3 candidate intestine result Medulla oblong... None None notebook None None None None 1 2024-06-17 11:30:35.524537+00:00
47 uHhzID7T8BvrFtDS None Efficiency intestine IgY rank Tonsils IgG3. None None notebook None None None None 1 2024-06-17 11:30:35.526555+00:00
49 xCvAcodAd67fTLak None Intestine Tanycytes Erythrocyte IgY IgG2. None None notebook None None None None 1 2024-06-17 11:30:35.526862+00:00

Leverage relations

Django has a double-underscore syntax to filter based on related tables.

This syntax enables you to traverse several layers of relations:

ln.Artifact.filter(run__created_by__handle__startswith="testuse").df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id

The filter selects all artifacts based on the users who ran the generating notebook. The result is empty here because the toy artifacts were saved without a linked run.

(Under the hood, in the SQL database, it’s joining the artifact table with the run and the user table.)
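
The join can be sketched with Python's built-in sqlite3 module. The table names (artifact, run, user_account), the reduced columns, and the fourth artifact are assumptions for illustration only; the real schema and the SQL generated by Django will differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE user_account (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE run (id INTEGER PRIMARY KEY, created_by_id INTEGER);
    CREATE TABLE artifact (id INTEGER PRIMARY KEY, description TEXT, run_id INTEGER);

    INSERT INTO user_account (id, handle) VALUES (1, 'testuser1');
    INSERT INTO run (id, created_by_id) VALUES (1, 1);

    -- The three toy artifacts have no run; the fourth one is hypothetical.
    INSERT INTO artifact (id, description, run_id) VALUES
        (1, 'My image', NULL),
        (2, 'The iris collection', NULL),
        (3, 'My fastq', NULL),
        (4, 'Hypothetical tracked artifact', 1);
    """
)

# Roughly what run__created_by__handle__startswith="testuse" could become:
# two joins followed by a prefix match on the handle.
query = """
    SELECT artifact.id, artifact.description
    FROM artifact
    JOIN run ON artifact.run_id = run.id
    JOIN user_account ON run.created_by_id = user_account.id
    WHERE user_account.handle LIKE 'testuse%'
    ORDER BY artifact.id
"""
joined = con.execute(query).fetchall()
print(joined)
```

Artifacts without a run drop out of the inner join, so only the hypothetical tracked artifact is returned.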

Beyond __startswith, Django supports about two dozen field comparators, all written in the form field__comparator=value.

Here are some of them.

and

ln.Artifact.filter(suffix=".jpg", created_by=user).df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 e3WV1jNNz5xg2FDL7z72 None My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.720027+00:00

less than / greater than

Or subset to artifacts smaller than 10 kB. Numeric comparators such as __lt (less than) and __gt (greater than) are passed as keyword arguments, like any other comparator:

ln.Artifact.filter(created_by=user, size__lt=1e4).df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 1COm2aQq3Bb4tesE3vg5 None The iris collection None .parquet DataFrame 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.821023+00:00
3 gx2cD7lye8RhSkCXXdSz None My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.829012+00:00

or

from django.db.models import Q

ln.Artifact.filter().filter(Q(suffix=".jpg") | Q(suffix=".fastq.gz")).df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 e3WV1jNNz5xg2FDL7z72 None My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.720027+00:00
3 gx2cD7lye8RhSkCXXdSz None My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.829012+00:00

in

ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 e3WV1jNNz5xg2FDL7z72 None My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.720027+00:00
3 gx2cD7lye8RhSkCXXdSz None My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.829012+00:00
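
The "or" and "in" queries above return the same records. A minimal sketch with Python's built-in sqlite3 module shows the equivalence at the SQL level; the table name artifact and its single suffix column are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE artifact (id INTEGER PRIMARY KEY, suffix TEXT)")
con.executemany(
    "INSERT INTO artifact (suffix) VALUES (?)",
    [(".jpg",), (".parquet",), (".fastq.gz",)],
)

# Two conditions combined with OR, as with Q(...) | Q(...)
with_or = con.execute(
    "SELECT id FROM artifact "
    "WHERE suffix = '.jpg' OR suffix = '.fastq.gz' ORDER BY id"
).fetchall()

# One membership test, as with suffix__in=[...]
with_in = con.execute(
    "SELECT id FROM artifact "
    "WHERE suffix IN ('.jpg', '.fastq.gz') ORDER BY id"
).fetchall()

print(with_or == with_in)
```

Q objects remain the more general tool, since they can combine conditions on different fields.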

order by

ln.Artifact.filter().order_by("-updated_at").df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
3 gx2cD7lye8RhSkCXXdSz None My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.829012+00:00
2 1COm2aQq3Bb4tesE3vg5 None The iris collection None .parquet DataFrame 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.821023+00:00
1 e3WV1jNNz5xg2FDL7z72 None My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-17 11:30:30.720027+00:00

contains

ln.Transform.filter(name__contains="search").df().head(10)
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
1 IUM5pqDTCJDrhaUo None Platelets result IgG3 research IgG. None None notebook None None None None 1 2024-06-17 11:30:35.519403+00:00
33 798YwkvCjLERdr9X None Ige candidate research IgG3 IgG3 Crystallin-co... None None notebook None None None None 1 2024-06-17 11:30:35.524382+00:00
43 EnYPOkQ9wQDY57N9 None Cluster research IgG IgM efficiency. None None notebook None None None None 1 2024-06-17 11:30:35.525939+00:00
45 TEUSX6cNpze75oCW None Igg3 cluster IgY IgG4 IgY research. None None notebook None None None None 1 2024-06-17 11:30:35.526247+00:00
52 EePtCYvk38nLQSR0 None Platelets Crystallin-containing lens fiber cel... None None notebook None None None None 1 2024-06-17 11:30:35.527327+00:00
57 oLCZwqvPXFbGYG12 None Rank classify IgY research Gonadotropes Midbrain. None None notebook None None None None 1 2024-06-17 11:30:35.528094+00:00
75 u7AGyP3qvWdp17zT None Ige IgG IgD research IgE Tanycytes. None None notebook None None None None 1 2024-06-17 11:30:35.530852+00:00
82 hg6XXbL6Z6ohEPtw None Igy IgG IgG2 IgE Heart Lugaro cells research. None None notebook None None None None 1 2024-06-17 11:30:35.535282+00:00
86 H7n2XDt6nz9GYMj3 None Igg4 IgM research IgG3 Platelets. None None notebook None None None None 1 2024-06-17 11:30:35.535871+00:00
88 MynWBRvPqCuU0maA None Intestine IgG4 IgG3 Lugaro cells research. None None notebook None None None None 1 2024-06-17 11:30:35.536164+00:00

And case-insensitive:

ln.Transform.filter(name__icontains="Search").df().head(10)
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
1 IUM5pqDTCJDrhaUo None Platelets result IgG3 research IgG. None None notebook None None None None 1 2024-06-17 11:30:35.519403+00:00
33 798YwkvCjLERdr9X None Ige candidate research IgG3 IgG3 Crystallin-co... None None notebook None None None None 1 2024-06-17 11:30:35.524382+00:00
43 EnYPOkQ9wQDY57N9 None Cluster research IgG IgM efficiency. None None notebook None None None None 1 2024-06-17 11:30:35.525939+00:00
45 TEUSX6cNpze75oCW None Igg3 cluster IgY IgG4 IgY research. None None notebook None None None None 1 2024-06-17 11:30:35.526247+00:00
52 EePtCYvk38nLQSR0 None Platelets Crystallin-containing lens fiber cel... None None notebook None None None None 1 2024-06-17 11:30:35.527327+00:00
57 oLCZwqvPXFbGYG12 None Rank classify IgY research Gonadotropes Midbrain. None None notebook None None None None 1 2024-06-17 11:30:35.528094+00:00
75 u7AGyP3qvWdp17zT None Ige IgG IgD research IgE Tanycytes. None None notebook None None None None 1 2024-06-17 11:30:35.530852+00:00
82 hg6XXbL6Z6ohEPtw None Igy IgG IgG2 IgE Heart Lugaro cells research. None None notebook None None None None 1 2024-06-17 11:30:35.535282+00:00
86 H7n2XDt6nz9GYMj3 None Igg4 IgM research IgG3 Platelets. None None notebook None None None None 1 2024-06-17 11:30:35.535871+00:00
88 MynWBRvPqCuU0maA None Intestine IgG4 IgG3 Lugaro cells research. None None notebook None None None None 1 2024-06-17 11:30:35.536164+00:00
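
One caveat is worth a sketch. Assuming this local instance is backed by SQLite, the two queries above would match the same rows even if the casing differed, because SQLite's LIKE operator ignores ASCII case by default; Django documents that contains behaves like icontains on SQLite. The snippet uses only Python's built-in sqlite3 module and a title taken from the output above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
title = "Cluster research IgG IgM efficiency."

# SQLite's LIKE matches regardless of ASCII case by default.
lower_pattern = con.execute("SELECT ? LIKE '%search%'", (title,)).fetchone()[0]
upper_pattern = con.execute("SELECT ? LIKE '%SEARCH%'", (title,)).fetchone()[0]

# For contrast, Python's substring test is case-sensitive.
python_match = "SEARCH" in title

print(lower_pattern, upper_pattern, python_match)
```

On a database with case-sensitive pattern matching, the two comparators can return different results.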

startswith

ln.Transform.filter(name__startswith="Research").df()
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
131 i30fL634TKa76xWI None Research Platelets IgY IgM study IgG3. None None notebook None None None None 1 2024-06-17 11:30:35.542517+00:00
143 M0zagdmJ121lPGRl None Research intestine research. None None notebook None None None None 1 2024-06-17 11:30:35.544275+00:00
170 m6VBO5QR6smfsulN None Research IgG intestinal IgY Midbrain intestinal. None None notebook None None None None 1 2024-06-17 11:30:35.551167+00:00
180 CTsrt1uFhpl6mi8r None Research IgM visualize Fallopian tubes. None None notebook None None None None 1 2024-06-17 11:30:35.552616+00:00
235 DdytpfuW64E2MAlj None Research Surface epithelial cell Surface epith... None None notebook None None None None 1 2024-06-17 11:30:35.563817+00:00
363 OUFY2osrmIE414Ok None Research candidate IgG Tonsils IgY IgG3. None None notebook None None None None 1 2024-06-17 11:30:35.585212+00:00
366 rPneM032YpLURMUL None Research IgM candidate intestine classify Medu... None None notebook None None None None 1 2024-06-17 11:30:35.585890+00:00
371 3ghoAlgiGsgde4YC None Research intestine IgM Tonsils Tanycytes IgG I... None None notebook None None None None 1 2024-06-17 11:30:35.586638+00:00
412 RTDIjKH8rKe3UeZ5 None Research IgG Heart IgM IgM IgG. None None notebook None None None None 1 2024-06-17 11:30:35.595320+00:00
482 PIhyZb1WBxjUbjuX None Research Gonadotropes IgG3 investigate IgG4. None None notebook None None None None 1 2024-06-17 11:30:35.608125+00:00
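
The field__comparator=value pattern used throughout this section can be mimicked in a few lines of pure Python. This is a sketch of the naming convention only, applied to dictionaries: it covers a hypothetical subset of comparators, does not traverse relations, and is unrelated to how Django builds SQL.

```python
import operator

# A small, hypothetical subset of comparators; Django ships many more.
COMPARATORS = {
    "exact": operator.eq,
    "lt": operator.lt,
    "gt": operator.gt,
    "in": lambda value, options: value in options,
    "contains": lambda value, part: part in value,
    "icontains": lambda value, part: part.lower() in value.lower(),
    "startswith": lambda value, prefix: value.startswith(prefix),
}


def matches(record, **conditions):
    """Return True if `record` (a dict) satisfies every field__comparator=value."""
    for key, expected in conditions.items():
        # "size__lt" splits into field "size" and comparator "lt";
        # a bare field name such as "suffix" falls back to "exact".
        field, _, comparator = key.partition("__")
        check = COMPARATORS[comparator or "exact"]
        if not check(record[field], expected):
            return False
    return True


artifacts = [
    {"description": "My image", "suffix": ".jpg", "size": 29358},
    {"description": "The iris collection", "suffix": ".parquet", "size": 5629},
    {"description": "My fastq", "suffix": ".fastq.gz", "size": 20},
]

small = [a["description"] for a in artifacts if matches(a, size__lt=1e4)]
selected = [
    a["description"]
    for a in artifacts
    if matches(a, suffix__in=[".jpg", ".fastq.gz"])
]
print(small)
print(selected)
```

Several keyword arguments in one call are combined with "and", mirroring the behavior of filter().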
# clean up test instance
!lamin delete --force mydata
!rm -r mydata