Skip to content

Fixing some bugs in example feature repo for spark #5407

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Felix-neko
Copy link

Hi folks!

I've just tried to do feast init -t spark and to run generated test_workflow.py (Python 3.11, PySpark 3.5, OS Ubuntu 24.04).

There were some errors that I was able to fix, maybe it will be helpful.

@Felix-neko Felix-neko requested a review from a team as a code owner May 31, 2025 18:13
- creating parquet files with timestamps in us units instead of ns (pyspark has problems reading ns timestamps)
- removing on-the-fly features that aren't declared before
- settings join keys for entities.

Signed-off-by: Felix-neko <felix-neko@list.ru>
@@ -109,8 +107,6 @@ def fetch_online_features(store, use_feature_service: bool):
features_to_fetch = [
"driver_hourly_stats:acc_rate",
"driver_hourly_stats:avg_daily_trips",
"transformed_conv_rate:conv_rate_plus_val1",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh these didn't work?

Copy link
Author

@Felix-neko Felix-neko May 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This on-the-fly feature was not declared in the example feature repo for Spark and i have just removed it from this example.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code under aws/feature_repo/example_repo.py had this ODFV:

# Define an on demand feature view which can generate new features based on
# existing feature views and RequestSource features
@on_demand_feature_view(
    sources=[driver_stats_fv, input_request],
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
)
def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, it works. I've restored it.

…ving unimplemented part with stream data pushing).

Signed-off-by: Felix-neko <felix-neko@list.ru>
Signed-off-by: Felix-neko <felix-neko@list.ru>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants