feat: Added Model Metadata support in Registry #5365

Draft: wants to merge 1 commit into base: master

@ntkathole (Member) commented May 18, 2025

What this PR does / why we need it:

Store model-related metadata (e.g., model name, features used, project name, timestamp) inside the registry, alongside other Feast objects (like Feature-views, Features, etc.).

This will allow users to link features to models and visualize the relationships between them, so they can answer questions such as:

  • Which features were used to train this model?
  • If I change this feature, which models will be affected?
  • When was a model trained?

In [1]: from feast import FeatureStore

In [2]: from feast.model import ModelMetadata

In [3]: fs = FeatureStore("/feast/feature_repo")

In [4]: model = ModelMetadata(name="fraud_detection_v1", project="fraud_detection", tags={"team": "data_scientist"})

In [5]: fs.apply(model)

In [6]: fs.list_models()
Out[6]:
[ModelMetadata(
   name='fraud_detection_v1',
   project='fraud_detection',
   feature_view=[],
   feature_service=[],
   features=[],
   tags={'team': 'data_scientist'},
   training_timestamp=None,
   description='',
 )]
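
As a rough illustration of the lineage questions listed above, the sketch below builds a reverse index from feature references to model names. It uses plain Python only: `ModelRecord`, `build_feature_index`, and the `view:feature` strings are hypothetical stand-ins for illustration, not part of this PR or of the Feast API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ModelRecord:
    """Hypothetical stand-in for the ModelMetadata object proposed in this PR."""
    name: str
    features: List[str] = field(default_factory=list)


def build_feature_index(models: List[ModelRecord]) -> Dict[str, Set[str]]:
    """Map each feature reference to the names of the models that use it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for model in models:
        for feature in model.features:
            index[feature].add(model.name)
    return dict(index)


# Illustrative data only; these feature references are assumptions.
models = [
    ModelRecord("fraud_detection_v1", ["transactions:amount", "users:age"]),
    ModelRecord("churn_v2", ["users:age"]),
]
index = build_feature_index(models)

# Which features were used to train this model?
print(models[0].features)
# If I change this feature, which models will be affected?
print(sorted(index["users:age"]))
```

The forward direction (model to features) is just the stored list; the reverse direction (feature to models) is what a lineage view would need to compute or cache.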

Misc

Next follow-up steps:

  • Add unit/integration tests
  • Add documentation
  • Implement remote registry support
  • Process features/Feature-views/Feature-services based on Model Metadata.
  • Add a “Models” tab in Feast UI
  • Add lineage showing features-model relations
  • Add gRPC and REST registry endpoints
  • Add CLI for models

@ntkathole ntkathole requested a review from a team as a code owner May 18, 2025 12:00

import "google/protobuf/timestamp.proto";

message ModelMetadata {

Should we follow Model Registry or MLFlow's schema? Can we?

I would prefer MLflow, tbh

@ntkathole ntkathole marked this pull request as draft May 18, 2025 12:09
@tokoko (Collaborator) commented May 18, 2025

Let me suggest a different approach, if I may. If I understand correctly, this would essentially be part of Feast only to enable proper model-feature lineage, right? I think introducing a separate proto object (with all that it entails, RBAC and so on) might be overkill for a non-essential bit of information. Also, in my experience, users are highly likely to neglect documenting models in Feast, especially when this much effort is necessary.

Can't we instead go for a lighter integration by simply introducing a field (a string or maybe something a bit more complicated) in FeatureService that would act as a link to whatever model registry users use? FeatureService is supposed to have a one-to-one relationship with models as-is anyway. Even if, for some reason, a model queries Feast without a FeatureService, creating a dummy FeatureService just for better lineage would essentially be equivalent to the ModelMetadata approach in terms of effort required.

@franciscojavierarceo (Member)

@tokoko what do you think about MLFlow here?

@tokoko (Collaborator) commented May 18, 2025

Do we have to make a choice? I would either go with an open-text string that the user is free to fill in however they like, or with a oneof with MlFlowModel and ModelRegistryModel as possible options. For MLflow, the tracking server URL and model name (maybe the model version as well?) should probably be enough. I'm not familiar with Model Registry, but something similar would probably work there as well.
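
To make the two options concrete, here is a minimal Python sketch of a link that is either free text or one of two typed references, mirroring what a proto `oneof` would express. All class and field names (`MlflowModelRef`, `ModelRegistryRef`, `tracking_uri`, `server_url`, and so on) are assumptions for illustration; nothing here exists in Feast, and the Model Registry fields in particular are a guess.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class MlflowModelRef:
    """Hypothetical reference to a model tracked in MLflow."""
    tracking_uri: str
    model_name: str
    model_version: Optional[str] = None


@dataclass(frozen=True)
class ModelRegistryRef:
    """Hypothetical reference to a Model Registry entry (fields are a guess)."""
    server_url: str
    model_name: str


# Either an open-text string or exactly one typed reference, like a oneof.
ModelLink = Union[str, MlflowModelRef, ModelRegistryRef]


def describe(link: ModelLink) -> str:
    """Render a link for display, dispatching on which variant is set."""
    if isinstance(link, MlflowModelRef):
        version = f" version {link.model_version}" if link.model_version else ""
        return f"mlflow model '{link.model_name}'{version} at {link.tracking_uri}"
    if isinstance(link, ModelRegistryRef):
        return f"model registry entry '{link.model_name}' at {link.server_url}"
    return f"free-text link: {link}"


print(describe(MlflowModelRef("http://mlflow.example", "fraud", "3")))
print(describe("see the team wiki"))
```

The free-text variant costs users the least effort; the typed variants are what a UI or lineage tool could resolve into clickable links.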

@ntkathole (Member, Author)

> Can't we instead go for a lighter integration by simply introducing a field (a string or maybe something a bit more complicated) in FeatureService that would act as a link to whatever model registry users use? FeatureService is supposed to have a one-to-one relationship with models as-is anyway. Even if, for some reason, a model queries Feast without a FeatureService, creating a dummy FeatureService just for better lineage would essentially be equivalent to the ModelMetadata approach in terms of effort required.

I see your point: adding a simple reference field to FeatureService is a practical solution that solves the immediate need with minimal implementation and maintenance overhead. For many users, especially those not deeply invested in formal model management infrastructure, this lightweight feature service linkage might be more than sufficient. However, this approach might not scale well for users who have large teams and need tighter integration with model training pipelines or registries.

My thinking was that FeatureService is fundamentally a construct for combining and serving a set of features, often composed from multiple FeatureViews, and that it can be reused across multiple models. Users might tweak a few features between models, or reuse the same FeatureService across different experiments. Using the feature service as the primary location for model-specific metadata might therefore look like a workaround.

> Also in my experience it's highly likely users will very often neglect documenting models in feast, especially when this much effort is necessary.

That’s a fair concern. But one of the key advantages of having a structured ModelMetadata proto is that it opens the door for automation. Metadata could be auto-populated as part of model training or deployment pipelines. It would also allow users to reference specific training runs.
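
A minimal sketch of what such auto-population could look like at the end of a training job, assuming a plain dataclass that mirrors the fields shown in the `list_models()` output above. `ModelMetadataSketch`, `metadata_from_training_run`, the `training_run` tag, and the run ID are all hypothetical; the real object in this PR may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ModelMetadataSketch:
    """Hypothetical mirror of the fields shown in the list_models() output."""
    name: str
    project: str
    features: List[str] = field(default_factory=list)
    tags: Dict[str, str] = field(default_factory=dict)
    training_timestamp: Optional[datetime] = None
    description: str = ""


def metadata_from_training_run(
    name: str, project: str, feature_refs: List[str], run_id: str
) -> ModelMetadataSketch:
    """Build the metadata inside the pipeline, with no manual entry."""
    return ModelMetadataSketch(
        name=name,
        project=project,
        features=sorted(set(feature_refs)),  # de-duplicate, stable order
        tags={"training_run": run_id},  # assumed way to reference a run
        training_timestamp=datetime.now(timezone.utc),
    )


record = metadata_from_training_run(
    "fraud_detection_v1",
    "fraud_detection",
    ["transactions:amount", "users:age", "users:age"],
    run_id="run-0001",
)
print(record)
```

In a real pipeline the resulting object would then be passed to something like the `fs.apply(model)` call shown in the PR description.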

Having a dedicated ModelMetadata, even if it is lightweight or a oneof with MlFlowModel and ModelRegistryModel, gives us better flexibility. Thoughts?

Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
@franciscojavierarceo (Member)

we should discuss with @tarilabs @HumairAK and @szaher

@tarilabs commented Jul 1, 2025

> we should discuss with @tarilabs @HumairAK and @szaher

Thank you for tagging me. If it's of any help, here is an entry point for KF MR (Kubeflow Model Registry) references, with the caveat that we're effectively moving away from the Google MLMD dependency (transparently for the user).

4 participants