DataFrameModel¶
pandera
offer a class-based DataFrameModel schema for data validation.
Validate with DataFrameModel¶
Alternatively, pandera
support a class-based DataFrameModel
schema.
An example of schema look like:
import pandera as pa
from pandera.typing import Series
class ExampleIrisDataSchema(pa.DataFrameModel):
sepal_length: Series[float] = pa.Field(gt=2000)
We suggest to create a schemas
folder to keep things organised. You can put this in src/kedro_pandera_tutorial/schemas/example_iris_data.py
and create a src/kedro_pandera_tutorial/schemas/__init__.py
file.
The __init__.py
need to import the class.
from .example_iris_data import ExampleIrisDataSchema
The file structure should look like this:
src/kedro_pandera_tutorial/schemas/
├── __init__.py
└── example_iris_data.py
Update the catalog¶
example_iris_data:
type: pandas.CSVDataset
filepath: data/01_raw/iris.csv
metadata:
pandera:
schema: ${pa.python:kedro_pandera_tutorial.schemas.ExampleIrisDataSchema}
Here you will use the pa.python
resolver to resolve the Python-based schema class.