Great Expectations Tutorial Updates #8

@arrismo

Hi @josephmachado,
I could not find the corresponding repo for the Great Expectations tutorial, so I am opening an issue here.
The tutorial instructs readers to run the following code, which raises an AttributeError. The error occurs with Great Expectations >= 1.0, the version that new students will most likely install. I assume the tutorial was written against Great Expectations 0.18.19, where the error does not appear.

See below:

import great_expectations as ge
import numpy as np
import pandas as pd

df_raw = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df = ge.from_pandas(df_raw)
df.expect_column_values_to_not_be_null('A')


AttributeError                            Traceback (most recent call last)
Cell In[12], line 2
      1 df_raw = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
----> 2 df = ge.from_pandas(df_raw)
      3 df.expect_column_values_to_not_be_null('A')

AttributeError: module 'great_expectations' has no attribute 'from_pandas'

Some suggestions:

  1. Specify the Great Expectations version in the blog post
  2. Update the tutorial to follow the current Great Expectations documentation
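
For suggestion 1, a minimal sketch of a version guard that a tutorial notebook could run first. It only uses the standard library; the helper names (`major_version`, `uses_legacy_api`, `installed_gx_version`) are hypothetical, and the 1.0 cutoff is taken from the behaviour reported in this issue, not from a changelog.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '0.18.19'."""
    return int(version.split(".")[0])


def uses_legacy_api(version: str) -> bool:
    """True for versions before 1.0, where (per this issue) ge.from_pandas still works."""
    return major_version(version) < 1


def installed_gx_version() -> Optional[str]:
    """Return the installed great_expectations version, or None if it is not installed."""
    try:
        return metadata.version("great_expectations")
    except metadata.PackageNotFoundError:
        return None


version = installed_gx_version()
if version is None:
    print("great_expectations is not installed")
elif uses_legacy_api(version):
    print(f"great_expectations {version}: the tutorial's ge.from_pandas code should run")
else:
    print(f"great_expectations {version}: use the Data Context workflow instead")
```

Pinning the version in the post (for example `great_expectations==0.18.19`) would make a guard like this unnecessary, but it gives readers a clearer message than the AttributeError.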

I rewrote the beginning of the tutorial for Great Expectations 1.2.1, using the sample_data.csv that you provide:

import great_expectations as gx
import pandas as pd


# Create a file-backed Data Context rooted at ./data
context = gx.get_context(mode="file", project_root_dir="./data")

# Define the Data Source name
data_source_name = "data_folder"

# Add a pandas Data Source to the Data Context
data_source = context.data_sources.add_pandas(name=data_source_name)

# Retrieve the Data Source by name
data_source = context.data_sources.get(data_source_name)

# Define the Data Asset name
data_asset_name = "my_dataframe_data_asset"

# Add a DataFrame Data Asset to the Data Source
data_asset = data_source.add_dataframe_asset(name=data_asset_name)

# Retrieve the Data Asset by name
data_asset = context.data_sources.get(data_source_name).get_asset(data_asset_name)

# Define the Batch Definition name
batch_definition_name = "my_batch_definition"

# Add a whole-DataFrame Batch Definition to the Data Asset
batch_definition = data_asset.add_batch_definition_whole_dataframe(
    batch_definition_name
)

# Set the path to the data
path = "./data/sample_data.csv"

# Read the data with pandas
dataframe = pd.read_csv(path)

# Define the Batch Parameters dictionary
batch_parameters = {"dataframe": dataframe}

# Retrieve the DataFrame Batch Definition
batch_definition = (
    context.data_sources.get(data_source_name)
    .get_asset(data_asset_name)
    .get_batch_definition(batch_definition_name)
)

# Create an Expectation that column Customer_Name is not NULL
expectation = gx.expectations.ExpectColumnValuesToNotBeNull(
    column="Customer_Name"
)

# Get the DataFrame as a Batch
batch = batch_definition.get_batch(batch_parameters=batch_parameters)

# Validate the Batch against the Expectation
validation_results = batch.validate(expectation)
print(validation_results)

Calculating Metrics: 100%|██████████| 8/8 [00:00<00:00, 1072.20it/s]
{
  "success": true,
  "expectation_config": {
    "type": "expect_column_values_to_not_be_null",
    "kwargs": {
      "batch_id": "data_folder-my_dataframe_data_asset",
      "column": "Customer_Name"
    },
    "meta": {}
  },
  "result": {
    "element_count": 100,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "partial_unexpected_counts": [],
    "partial_unexpected_index_list": []
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
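
In case it helps the tutorial, here is a rough sketch of how the printed result could be condensed into one line for readers. It works on a hand-copied subset of the JSON output above, parsed with the standard library, so it does not depend on any Great Expectations API; `summarize` is a hypothetical helper name.

```python
import json

# Hand-copied subset of the validation output printed above (not generated here).
RESULT_JSON = """
{
  "success": true,
  "expectation_config": {
    "type": "expect_column_values_to_not_be_null",
    "kwargs": {"column": "Customer_Name"}
  },
  "result": {"element_count": 100, "unexpected_count": 0, "unexpected_percent": 0.0}
}
"""


def summarize(result: dict) -> str:
    """Condense a validation result dictionary into a one-line summary."""
    config = result["expectation_config"]
    counts = result["result"]
    status = "PASS" if result["success"] else "FAIL"
    return (
        f"{config['type']} on {config['kwargs']['column']}: {status} "
        f"({counts['unexpected_count']} of {counts['element_count']} unexpected)"
    )


summary = summarize(json.loads(RESULT_JSON))
print(summary)
# prints: expect_column_values_to_not_be_null on Customer_Name: PASS (0 of 100 unexpected)
```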
