Fixing some bugs in example feature repo for spark #5407
base: master
Conversation
- creating parquet files with timestamps in us units instead of ns (PySpark has problems reading ns timestamps)
- removing on-the-fly features that aren't declared beforehand
- setting join keys for entities

Signed-off-by: Felix-neko <felix-neko@list.ru>
@@ -109,8 +107,6 @@ def fetch_online_features(store, use_feature_service: bool):
     features_to_fetch = [
         "driver_hourly_stats:acc_rate",
         "driver_hourly_stats:avg_daily_trips",
-        "transformed_conv_rate:conv_rate_plus_val1",
Oh these didn't work?
This on-the-fly feature was not declared in the example feature repo for Spark, so I just removed it from this example.
Looking at the code: aws/feature_repo/example_repo.py had this ODFV:
# Imports added for context; driver_stats_fv and input_request are
# defined elsewhere in the example repo.
import pandas as pd
from feast import Field, on_demand_feature_view
from feast.types import Float64

# Define an on demand feature view which can generate new features based on
# existing feature views and RequestSource features
@on_demand_feature_view(
    sources=[driver_stats_fv, input_request],
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
)
def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df
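The transformation body itself is plain pandas, so it can be sanity-checked standalone without a feature store; the input values below are made up for illustration.

```python
import pandas as pd

# Same transformation as in the ODFV above, checked in isolation.
def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df

# Illustrative inputs: one row of feature-view data plus request-source values.
inputs = pd.DataFrame({"conv_rate": [0.5], "val_to_add": [1], "val_to_add_2": [10]})
out = transformed_conv_rate(inputs)
print(out["conv_rate_plus_val1"].iloc[0])  # 1.5
print(out["conv_rate_plus_val2"].iloc[0])  # 10.5
```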
Great, it works. I've restored it.
…ving unimplemented part with stream data pushing). Signed-off-by: Felix-neko <felix-neko@list.ru>
Signed-off-by: Felix-neko <felix-neko@list.ru>
Hi folks!
I've just tried to do
feast init -t spark
and to run the generated test_workflow.py
(Python 3.11, PySpark 3.5, Ubuntu 24.04). There were some errors that I was able to fix; maybe this will be helpful.
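For anyone reproducing, the steps might look roughly like this; the project name is illustrative and the exact layout of the generated template may differ by Feast version.

```shell
# Sketch of the reproduction steps (names/paths are assumptions).
pip install "feast[spark]"        # Feast with the Spark offline-store extras
feast init -t spark my_project    # scaffold the spark example repo
cd my_project/feature_repo
python test_workflow.py           # location within the generated project may vary
```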