Description
Hi @josephmachado,
I could not find the corresponding repo for the Great Expectations Tutorial, so I am opening an issue here!
In the tutorial, you instruct us to run the following code, which raises an `AttributeError` with Great Expectations >= 1.0, the release line that new students will most likely install. I am assuming the tutorial was written against Great Expectations 0.18.19, where the error does not appear.
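Because whether the tutorial works depends only on the installed release, a quick check of the version can save some confusion. This is a minimal sketch using only the standard library; `major_version`, `supports_legacy_from_pandas` and `installed_gx_version` are hypothetical helper names, and the `< 1` cut-off encodes the assumption from this issue that `ge.from_pandas` exists before 1.0 and is gone from 1.0 onward.

```python
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '0.18.19' or '1.0.5'."""
    return int(version_string.split(".")[0])


def supports_legacy_from_pandas(version_string: str) -> bool:
    """Assumption from this issue: ge.from_pandas exists before 1.0 and is gone from 1.0 on."""
    return major_version(version_string) < 1


def installed_gx_version() -> Optional[str]:
    """Return great_expectations.__version__, or None if the package is not importable."""
    try:
        import great_expectations
    except ImportError:
        return None
    return great_expectations.__version__


if __name__ == "__main__":
    installed = installed_gx_version()
    if installed is None:
        print("great_expectations is not installed")
    elif supports_legacy_from_pandas(installed):
        print(f"{installed}: the tutorial's ge.from_pandas code should run")
    else:
        print(f"{installed}: ge.from_pandas is gone; use the 1.x Data Context API")
```

Pinning the version in the tutorial's install step would make this check unnecessary for readers who follow it exactly.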
Here is the tutorial code and the error it raises:
```python
import great_expectations as ge
import numpy as np
import pandas as pd
df_raw = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df = ge.from_pandas(df_raw)
df.expect_column_values_to_not_be_null('A')
```
```
AttributeError                            Traceback (most recent call last)
Cell In[12], line 2
      1 df_raw = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
----> 2 df = ge.from_pandas(df_raw)
      3 df.expect_column_values_to_not_be_null('A')

AttributeError: module 'great_expectations' has no attribute 'from_pandas'
```
Some suggestions:
- Specify the Great Expectations version in the blog post.
- Update the tutorial to follow the current Great Expectations documentation, for example:
```python
import great_expectations as gx
import pandas as pd

# Create a file-backed Data Context rooted in ./data
context = gx.get_context(mode="file", project_root_dir="./data")

# Define the Data Source name
data_source_name = "data_folder"
# Add a pandas Data Source to the Data Context
data_source = context.data_sources.add_pandas(name=data_source_name)
# Retrieve the Data Source (how you would fetch it in a later session)
data_source = context.data_sources.get(data_source_name)

# Define the Data Asset name
data_asset_name = "my_dataframe_data_asset"
# Add a DataFrame Data Asset to the Data Source
data_asset = data_source.add_dataframe_asset(name=data_asset_name)
# Retrieve the Data Asset
data_asset = context.data_sources.get(data_source_name).get_asset(data_asset_name)

# Define the Batch Definition name
batch_definition_name = "my_batch_definition"
# Add a Batch Definition that covers the whole DataFrame
batch_definition = data_asset.add_batch_definition_whole_dataframe(
    batch_definition_name
)

# Set the path to the data
path = "./data/sample_data.csv"
# Read the data with pandas
dataframe = pd.read_csv(path)
# Define the Batch Parameters dictionary
batch_parameters = {"dataframe": dataframe}

# Retrieve the DataFrame Batch Definition
batch_definition = (
    context.data_sources.get(data_source_name)
    .get_asset(data_asset_name)
    .get_batch_definition(batch_definition_name)
)

# Create an Expectation that column Customer_Name is not NULL
expectation = gx.expectations.ExpectColumnValuesToNotBeNull(
    column="Customer_Name"
)

# Get the DataFrame as a Batch
batch = batch_definition.get_batch(batch_parameters=batch_parameters)
# Test the Expectation
validation_results = batch.validate(expectation)
print(validation_results)
```
```
Calculating Metrics: 100%|██████████| 8/8 [00:00<00:00, 1072.20it/s]
{
  "success": true,
  "expectation_config": {
    "type": "expect_column_values_to_not_be_null",
    "kwargs": {
      "batch_id": "data_folder-my_dataframe_data_asset",
      "column": "Customer_Name"
    },
    "meta": {}
  },
  "result": {
    "element_count": 100,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "partial_unexpected_counts": [],
    "partial_unexpected_index_list": []
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
```
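When this check runs in a script or CI job rather than a notebook, it may be more useful to act on the result than to print it. This is a minimal stdlib-only sketch that works on a plain dictionary shaped like the output above. `summarize_result` is a hypothetical helper, `RAW_RESULT` is a trimmed copy of that output, and getting such a dictionary from a live result (for example via `validation_results.to_json_dict()`) is an assumption to verify against your installed GX version.

```python
import json

# Trimmed copy of the validation output shown above
RAW_RESULT = """
{
  "success": true,
  "expectation_config": {
    "type": "expect_column_values_to_not_be_null",
    "kwargs": {"batch_id": "data_folder-my_dataframe_data_asset", "column": "Customer_Name"},
    "meta": {}
  },
  "result": {
    "element_count": 100,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": []
  }
}
"""


def summarize_result(result: dict) -> str:
    """Build a one-line summary from a validation result dictionary."""
    config = result["expectation_config"]
    column = config["kwargs"].get("column", "<no column>")
    stats = result.get("result", {})
    status = "PASSED" if result["success"] else "FAILED"
    return (
        f"{status}: {config['type']} on {column} "
        f"({stats.get('unexpected_count', 0)} of {stats.get('element_count', 0)} rows unexpected)"
    )


if __name__ == "__main__":
    parsed = json.loads(RAW_RESULT)
    print(summarize_result(parsed))
    # Non-zero exit code lets a CI job fail when the Expectation fails
    if not parsed["success"]:
        raise SystemExit(1)
```

For the output above this prints `PASSED: expect_column_values_to_not_be_null on Customer_Name (0 of 100 rows unexpected)`.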