How to validate data with pydantic [Cognite Official]

  • 1 September 2022

Users are often asked to fill in information (a YAML file, a form) that is then processed by a script. Other developers might also send data to an API you created. In both cases, there is no guarantee that the data has the correct type. You probably want two things:

  • First, make sure the data you process is of the correct type

  • Second, if it is not, give the user meaningful error messages

 

Pydantic is a library that helps with data validation. Thanks to type hints and validators, you can enforce data types, from simple to complex, and apply validation rules to data. It relies on Python classes and is quite easy to use.

 

As an example, let’s say we have a machine. This machine has a name, a make, and an ID. We can create a specific class for it.

 

from pydantic import BaseModel


class Machine(BaseModel):
    id: int
    name: str
    make: str


external_data = {
    "id": "123",
    "name": "Fantastic Machine",
    "make": "Machine Manufacturer",
}

my_machine = Machine(**external_data)

print(my_machine)

 

Output:

id=123 name='Fantastic Machine' make='Machine Manufacturer'

 

 

 

So far, this looks much like a regular class. What pydantic adds is data validation, through what are called field types and validators.

 

As you can see in the example above, pydantic converted the id from string to int, which is the type of the id in the class. This happens because that string can be converted to an integer. If that’s not the case, pydantic will raise a meaningful error.

 

class Machine(BaseModel):
    id: int
    name: str
    make: str


external_data = {
    "id": "abc",
    "name": "Fantastic Machine",
    "make": "Machine Manufacturer",
}

my_machine = Machine(**external_data)

 

Output:

pydantic.error_wrappers.ValidationError: 1 validation error for Machine

id

  value is not a valid integer (type=type_error.integer)


 

An error is also raised if a field is missing, because all of these fields are mandatory. We’ll see further down how to make a field optional.
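To turn these errors into messages for the user, one option is to catch the exception and read its `errors()` list. The snippet below is a minimal sketch: the helper `describe_errors` is a hypothetical name introduced for this illustration, and the exact wording of each message depends on the pydantic version installed.

```python
from pydantic import BaseModel, ValidationError


class Machine(BaseModel):
    id: int
    name: str
    make: str


def describe_errors(data):
    """Return (field, message) pairs; the list is empty if the data is valid."""
    try:
        Machine(**data)
    except ValidationError as exc:
        # exc.errors() yields one dict per problem, with "loc", "msg" and "type" keys
        return [(err["loc"][0], err["msg"]) for err in exc.errors()]
    return []


# "make" is missing and "id" cannot be converted to an integer
problems = describe_errors({"id": "abc", "name": "Fantastic Machine"})
for field, message in problems:
    print(f"{field}: {message}")
```

Both problems are reported in a single exception, so the user can fix everything in one pass.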

 

Now, let’s say we would like machine names to always be uppercase. If the data is entered by a human, there is no guarantee that this will be the case. Thanks to validators, we can enforce it.

 

from pydantic import BaseModel, validator


class Machine(BaseModel):
    id: int
    name: str
    make: str

    # Runs on the "name" field and returns the value that will be stored
    @validator("name")
    def uppercase_name(cls, v):
        return v.upper()


external_data = {
    "id": 123,
    "name": "Fantastic Machine",
    "make": "Machine Manufacturer",
}

my_machine = Machine(**external_data)

print(my_machine)

 

Output:

id=123 name='FANTASTIC MACHINE' make='Machine Manufacturer'



 

Then let’s suppose we have factories, which have a name, a location, and several machines. As an example, we want to make sure that each factory has at least two machines. The name is optional and has a default value.

 

from typing import Optional

from pydantic import BaseModel, conlist


# Machine is the class with the uppercase_name validator defined above
class Factory(BaseModel):
    name: Optional[str] = "default"
    location: str
    # A list of Machine objects with at least two items
    machines: conlist(Machine, min_items=2)


external_data_machine_1 = {
    "id": 123,
    "name": "Fantastic Machine",
    "make": "Machine Manufacturer",
}

external_data_machine_2 = {
    "id": 234,
    "name": "Awesome Machine",
    "make": "Machine Manufacturer",
}

machine_1 = Machine(**external_data_machine_1)
machine_2 = Machine(**external_data_machine_2)

my_factory = Factory(location="Oslo", machines=[machine_1, machine_2])
print(my_factory)

 

Output:

name='default' location='Oslo' machines=[Machine(id=123, name='FANTASTIC MACHINE', make='Machine Manufacturer'), Machine(id=234, name='AWESOME MACHINE', make='Machine Manufacturer')]

 

Passing only one machine in the factory’s machines field leads to the following error:

my_factory = Factory(location="Oslo", machines=[machine_1])

 

Output:

pydantic.error_wrappers.ValidationError: 1 validation error for Factory

machines

  ensure this value has at least 2 items (type=value_error.list.min_items; limit_value=2)
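When the factory data comes from outside (a YAML file or an API payload), the machines usually arrive as plain dictionaries rather than Machine objects. The sketch below shows that nested dictionaries are validated and converted as well. It deliberately swaps conlist for a hand-written validator, to show that the same rule can be expressed with the validator mechanism introduced earlier. It is written against the pydantic 1.x API used in this post, and newer releases may emit deprecation warnings for `validator`.

```python
from typing import List

from pydantic import BaseModel, ValidationError, validator


class Machine(BaseModel):
    id: int
    name: str
    make: str


class Factory(BaseModel):
    location: str
    machines: List[Machine]

    @validator("machines")
    def at_least_two_machines(cls, v):
        # Hand-written stand-in for a minimum-length constraint
        if len(v) < 2:
            raise ValueError("a factory needs at least 2 machines")
        return v


raw_machines = [
    {"id": 123, "name": "Fantastic Machine", "make": "Machine Manufacturer"},
    {"id": "234", "name": "Awesome Machine", "make": "Machine Manufacturer"},
]

# Each nested dictionary is validated and turned into a Machine instance
my_factory = Factory(location="Oslo", machines=raw_machines)
print(my_factory.machines[1])

try:
    Factory(location="Oslo", machines=raw_machines[:1])
except ValidationError as exc:
    too_few = exc.errors()
    print(too_few[0]["loc"], too_few[0]["msg"])
```

The error is attached to the machines field, just as with conlist, so the user still sees which part of the input to fix.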

 


 

With pydantic, you can also set dynamic defaults. For example, let’s say you want to keep track of when your Python object was created. You can do that as follows:

 

from datetime import datetime

from pydantic import BaseModel, validator


class Machine(BaseModel):
    id: int
    name: str
    make: str
    creation_datetime: datetime = None

    @validator("name")
    def uppercase_name(cls, v):
        return v.upper()

    # The incoming value is ignored: the field is always set to the current time
    @validator("creation_datetime", pre=True, always=True)
    def set_creation_datetime(cls, _):
        return datetime.now()


my_machine = Machine(id=123, name="Fantastic Machine", make="Machine Manufacturer")
print(my_machine)

 

Output:

id=123 name='FANTASTIC MACHINE' make='Machine Manufacturer' creation_datetime=datetime.datetime(2022, 9, 1, 8, 37, 36, 110031)

 

Setting the “always” parameter to True makes the validator run even when no value is supplied. The “pre” parameter causes the validator to be called before any other validation. This is necessary here: without it, pydantic would try to validate None as a datetime, which would raise an error.
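If the default only needs to be computed when no value is given, another option is the `default_factory` argument of `Field`. This is a sketch of an alternative, not the approach used above, and it behaves differently in one respect: a creation time supplied by the caller is kept, whereas the validator above always overwrites it.

```python
from datetime import datetime

from pydantic import BaseModel, Field


class Machine(BaseModel):
    id: int
    name: str
    make: str
    # datetime.now is called each time an instance is created without this field
    creation_datetime: datetime = Field(default_factory=datetime.now)


my_machine = Machine(id=123, name="Fantastic Machine", make="Machine Manufacturer")
print(my_machine.creation_datetime)

# A value supplied by the caller is validated and kept as-is
explicit = Machine(
    id=234,
    name="Awesome Machine",
    make="Machine Manufacturer",
    creation_datetime=datetime(2022, 9, 1),
)
print(explicit.creation_datetime)
```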

 

 

 

Root validation is another feature: it lets you validate your inputs at the level of the whole model. This is useful when several fields depend on each other, for example. More on that topic in the documentation (https://pydantic-docs.helpmanual.io/usage/validators/#root-validators)
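As a rough illustration of interdependent fields, the sketch below requires a reason whenever a machine is flagged as decommissioned. The fields `is_decommissioned` and `decommission_reason` are hypothetical and exist only for this example. It uses the pydantic 1.x `root_validator` decorator with `skip_on_failure=True`, so the check only runs when every field has passed its own validation.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, root_validator


class Machine(BaseModel):
    id: int
    name: str
    make: str
    # Hypothetical fields, added only for this illustration
    is_decommissioned: bool = False
    decommission_reason: Optional[str] = None

    @root_validator(skip_on_failure=True)
    def reason_required_when_decommissioned(cls, values):
        # values is a dict holding all the validated fields of the model
        if values.get("is_decommissioned") and not values.get("decommission_reason"):
            raise ValueError("decommission_reason is required for a decommissioned machine")
        return values


# Valid: the machine is not decommissioned, so no reason is needed
my_machine = Machine(id=123, name="Fantastic Machine", make="Machine Manufacturer")

try:
    Machine(id=234, name="Awesome Machine", make="Machine Manufacturer", is_decommissioned=True)
except ValidationError as exc:
    root_errors = exc.errors()
    print(root_errors[0]["msg"])
```

Because the rule involves two fields, it does not belong to either of them, which is why the check is attached to the model as a whole.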


 

Pydantic is definitely a big help when it comes to validating data: it is easy to use, easy to test, and well documented. It offers more than we covered in this introduction, so I encourage you to have a look at the documentation to discover everything it can do. @Håkon V. Treider, I saw you also use pydantic. Are there any best practices you would like to share?

 

Documentation: https://pydantic-docs.helpmanual.io/

