lamindb.FeatureSet

class lamindb.FeatureSet(features: Iterable[Registry], type: str | None = None, name: str | None = None)

Bases: Registry, TracksRun

Feature sets.

Stores references to sets of Feature and other registries that may be used to identify features (e.g., class:~bionty.Gene or class:~bionty.Protein).

Why does LaminDB model feature sets, not just features?
  1. Performance: Imagine you measure the same panel of 20k transcripts in 1M samples. By modeling the panel as a feature set, you can link all your artifacts against one feature set and only need to store 1M instead of 1M x 20k = 20B links.

  2. Interpretation: Model protein panels, gene panels, etc.

  3. Data integration: Feature sets provide the currency that determines whether two collections can be easily concatenated.

These reasons do not hold for label sets. Hence, LaminDB does not model label sets.

Parameters:
  • featuresIterable[Registry] An iterable of Feature records to hash, e.g., [Feature(...), Feature(...)]. Is turned into a set upon instantiation. If you’d like to pass values, use from_values() or from_df().

  • typestr | None = None The simple type. Defaults to None for sets of Feature records, and otherwise defaults to "number" (e.g., for sets of Gene).

  • namestr | None = None A name.

Note

A feature set is identified by the hash of the feature uids in the set.

A slot provides a string key to access feature sets. It’s typically the accessor within the registered data object, here pd.DataFrame.columns.

See also

from_values()

Create from values.

from_df()

Create from dataframe columns.

Examples

Create a featureset from df with types:

>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> feature_set = ln.FeatureSet.from_df(df)

Create a featureset from features:

>>> features = ln.Feature.from_values(["feat1", "feat2"], type=float)
>>> feature_set = ln.FeatureSet(features)

Create a featureset from feature values:

>>> import bionty as bt
>>> feature_set = ln.FeatureSet.from_values(adata.var["ensemble_id"], Gene.ensembl_gene_id, orgaism="mouse")
>>> feature_set.save()

Link a feature set to an artifact:

>>> artifact.features.add_feature_set(feature_set, slot="var")

Link features to an artifact (will create a featureset under the hood):

>>> artifact.features.add_values(features)

Attributes

members: QuerySet

A queryset for the individual records of the set..

objects Manager

Fields

created_at DateTimeField

Time of creation of record.

created_by ForeignKey

Creator of record, a User.

run ForeignKey

Last run that created or updated the record, a Run.

id AutoField

Internal id, valid only in one DB instance.

uid CharField

A universal id (hash of the set of feature values).

name CharField

A name (optional).

n IntegerField

Number of features in the set.

dtype CharField

Data type, e.g., “number”, “float”, “int”. Is None for Feature.

For Feature, types are expected to be heterogeneous and defined on a per-feature level.

registry CharField

The registry that stores the feature identifiers, e.g., 'core.Feature' or 'bionty.Gene'.

Depending on the registry, .members stores, e.g. Feature or Gene records.

hash CharField

The hash of the set.

Methods

classmethod from_df(df, field=FieldAttr(Feature.name), name=None, mute=False, organism=None, public_source=None)

Create feature set for validated features..

Return type:

FeatureSet | None

classmethod from_values(values, field=FieldAttr(Feature.name), type=None, name=None, mute=False, organism=None, public_source=None, raise_validation_error=True)

Create feature set for validated features.

Parameters:
  • values (List[str] | Series | array) – A list of values, like feature names or ids.

  • field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a reference registry to map values.

  • type (str | None, default: None) – The simple type. Defaults to None if reference registry is Feature, defaults to "float" otherwise.

  • name (str | None, default: None) – A name.

  • organism (str | Registry | None, default: None) – An organism to resolve gene mapping.

  • public_source (Registry | None, default: None) – A public ontology to resolve feature identifier mapping.

  • raise_validation_error (bool, default: True) – Whether to raise a validation error if some values are not valid.

Raises:

ValidationError – If some values are not valid.

Return type:

FeatureSet

Examples

>>> features = ["feat1", "feat2"]
>>> feature_set = ln.FeatureSet.from_values(features)
>>> genes = ["ENS980983409", "ENS980983410"]
>>> feature_set = ln.FeatureSet.from_values(features, bt.Gene.ensembl_gene_id, float)

.

save(*args, **kwargs)

Save.

Return type:

None