lamindb.Feature¶
- class lamindb.Feature(name: str, type: str | list[type[Registry]], unit: str | None, description: str | None, synonyms: str | None)¶
Bases: Registry, CanValidate, TracksRun, TracksUpdates
Dataset dimensions.
A feature is a random variable or, equivalently, a dimension of a dataset. The Feature registry helps to:
- manage metadata of features
- annotate datasets by whether they measured a feature
Learn more: Tutorial: Features & labels.
- Parameters:
name – str
Name of the feature, typically a column name.
type – str | list[Type[Registry]]
Data type (“number”, “cat”, “int”, “float”, “bool”, “datetime”). For categorical types, can define from which registry values are sampled, e.g., cat[ULabel] or cat[bionty.CellType].
unit – str | None = None
Unit of measure, ideally SI ("m", "s", "kg", etc.) or "normalized" etc.
description – str | None = None
A description.
synonyms – str | None = None
Bar-separated synonyms.
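To make the string conventions in the parameters above concrete, here is a minimal sketch in plain Python. The helpers `parse_dtype` and `split_synonyms` are hypothetical names introduced for illustration; they are not part of lamindb and only mirror the formats described here (a base type with an optional registry in brackets, and `|`-separated synonyms).

```python
from __future__ import annotations

import re

# Base dtype names as listed in the parameter description above.
BASE_DTYPES = {"number", "cat", "int", "float", "bool", "datetime"}


def parse_dtype(dtype: str) -> tuple[str, str | None]:
    """Split a dtype string into (base, registry).

    Hypothetical helper, not lamindb API: "cat[ULabel]" -> ("cat", "ULabel").
    """
    match = re.fullmatch(r"(\w+)(?:\[([\w.]+)\])?", dtype)
    if match is None:
        raise ValueError(f"malformed dtype: {dtype!r}")
    base, registry = match.group(1), match.group(2)
    if base not in BASE_DTYPES:
        raise ValueError(f"unknown base dtype: {base!r}")
    # Assumption: only categorical dtypes name a registry to sample values from.
    if registry is not None and base != "cat":
        raise ValueError("only 'cat' dtypes can name a registry")
    return base, registry


def split_synonyms(synonyms: str | None) -> list[str]:
    """Split a bar-separated synonyms string into a clean list."""
    if not synonyms:
        return []
    return [s.strip() for s in synonyms.split("|") if s.strip()]
```

For example, `parse_dtype("cat[bionty.CellType]")` yields the base type `"cat"` together with the registry name `"bionty.CellType"`, while a plain `"float"` yields no registry.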
Note
For more control, you can use bionty registries to manage basic biological entities like genes, proteins & cell markers. Or you define custom registries to manage high-level derived features like gene sets.
See also
from_df()
Create feature records from DataFrame.
features
Feature manager of an artifact or collection.
ULabel
Universal labels.
FeatureSet
Feature sets.
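The from_df() entry in the list above creates feature records from the columns of a DataFrame. As a rough illustration of that idea only, the sketch below derives a name and a dtype for each column of a plain dict-of-lists table. The inference rule and the helper names `infer_dtype` and `feature_specs_from_columns` are assumptions made for this example; how lamindb actually infers dtypes is not stated on this page.

```python
from __future__ import annotations

from datetime import datetime


def infer_dtype(values: list) -> str:
    """Map a column's values to one of the dtype names used on this page.

    Hypothetical rule for illustration, not lamindb's implementation.
    """
    present = [v for v in values if v is not None]
    if not present:
        return "cat"
    # bool is checked first because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in present):
        return "bool"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in present):
        return "int"
    if all(isinstance(v, float) for v in present):
        return "float"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "number"  # mixed ints and floats
    if all(isinstance(v, datetime) for v in present):
        return "datetime"
    return "cat"


def feature_specs_from_columns(columns: dict[str, list]) -> list[dict[str, str]]:
    """Return one {"name", "dtype"} spec per column, in column order."""
    return [
        {"name": name, "dtype": infer_dtype(values)}
        for name, values in columns.items()
    ]
```

Each resulting spec carries the two pieces of information a feature record needs at minimum: a column name and a data type.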
Example
>>> import lamindb as ln
>>> ln.Feature("cell_type_by_expert", dtype="cat", description="Expert cell type annotation").save()
Hint
Features and labels denote two ways of using entities to organize data:
- A feature qualifies what is measured, i.e., a numerical or categorical random variable.
- A label is a measured value, i.e., a category.
Consider annotating a dataset by the fact that it measured the expression of 30k genes: the genes relate to the dataset as feature identifiers through a feature set with 30k members. Now consider annotating the artifact by the fact that it measured the knock-out of 3 genes: here, the 3 genes act as labels of the dataset.
Re-shaping data can introduce ambiguity between features & labels. If this happens, ask yourself what the joint measurement was: a feature qualifies variables in a joint measurement. The canonical data matrix lists jointly measured variables in the columns.
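The distinction can be sketched with a tiny data matrix in plain Python. The column names and values below are made up for illustration; the point is only that features correspond to the columns of a joint measurement, while labels are the categories observed in a categorical column.

```python
# A tiny data matrix: each row is an observation,
# each key a jointly measured variable (illustrative values only).
rows = [
    {"cell_type_by_expert": "T cell", "expression_gene_a": 1.5},
    {"cell_type_by_expert": "B cell", "expression_gene_a": 0.2},
    {"cell_type_by_expert": "T cell", "expression_gene_a": 2.1},
]

# Features qualify what is measured: the column names.
features = sorted({name for row in rows for name in row})

# Labels are measured values of a categorical feature: its categories.
labels = sorted({row["cell_type_by_expert"] for row in rows})

print(features)
print(labels)
```

Here `cell_type_by_expert` is a feature, whereas `"T cell"` and `"B cell"` are labels.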
Attributes¶
- featurevalue_set ReverseManyToOneDescriptor¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- objects Manager¶
Fields¶
- created_at DateTimeField¶
Time of creation of record.
- updated_at DateTimeField¶
Time of last update to record.
- id AutoField¶
Internal id, valid only in one DB instance.
- uid CharField¶
Universal id, valid across DB instances.
- name CharField¶
Name of feature (required).
- dtype CharField¶
Data type (“number”, “cat”, “int”, “float”, “bool”, “datetime”).
For categorical types, can define from which registry values are sampled, e.g., cat[ULabel] or cat[bionty.CellType].
- unit CharField¶
Unit of measure, ideally SI (m, s, kg, etc.) or ‘normalized’ etc. (optional).
- description TextField¶
A description.
- synonyms TextField¶
Bar-separated (|) synonyms (optional).
- previous_runs ManyToManyField¶
Sequence of runs that created or updated the record.
- feature_sets ManyToManyField¶
Feature sets linked to this feature.
Methods¶
- classmethod from_df(df, field=None)¶
Create Feature records for the columns of a DataFrame.
- Return type: