Applied Python Chronicles: A Gentle Intro to Pydantic


What about default values and argument extractions?

from pydantic import validate_call

@validate_call(validate_return=True)
def add(*args: int, a: int, b: int = 4) -> int:
    return str(sum(args) + a + b)

# ----
add(4, 3, 4)
> ValidationError: 1 validation error for add
a
Missing required keyword only argument [type=missing_keyword_only_argument, input_value=ArgsKwargs((4, 3, 4)), input_type=ArgsKwargs]
For further information visit

# ----

add(4, 3, 4, a=3)
> 18

# ----

@validate_call
def add(*args: int, a: int, b: int = 4) -> int:
    return str(sum(args) + a + b)

# ----

add(4, 3, 4, a=3)
> '18'

Takeaways from this example:

  • You can annotate the type of the variable number of arguments declaration (*args).
  • Default values are still an option, even when you are annotating the argument data types.
  • validate_call accepts a validate_return argument, which makes it validate the function's return value as well. Data type coercion is also applied in this case. validate_return is set to False by default. If it is left as is, the function may not return what is declared in its type hints (see the sketch after this list).
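
To make that last point concrete, here is a small sketch of my own (not part of the original example) showing what happens when validate_return=True but the returned value cannot be coerced to the annotated type; the error message will look roughly like this:

from pydantic import validate_call

@validate_call(validate_return=True)
def shout(text: str) -> int:
    # the annotation promises an int, but the returned string cannot be coerced to one
    return text.upper()

# ----

shout('hey')
> ValidationError: 1 validation error for shout
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='HEY', input_type=str]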

What if you want to not only validate the data type but also constrain the values the variable can take? Example:

from pydantic import validate_call, Field
from typing import Annotated

type_age = Annotated[int, Field(lt=120)]

@validate_call(validate_return=True)
def add(age_one: int, age_two: type_age) -> int:
    return age_one + age_two

add(3, 300)
> ValidationError: 1 validation error for add
1
Input should be less than 120 [type=less_than, input_value=300, input_type=int]
For further information visit

This example shows:

  • You can use Annotated and pydantic.Field to not only validate the data type but also add metadata that Pydantic uses to constrain variable values and formats.
  • ValidationError is once again very verbose about what went wrong with our function call. This can be really helpful.

Here is another example of how you can both validate and constrain variable values. We'll simulate a payload (dictionary) that you want to process in your function after it has been validated:

from pydantic import HttpUrl, PastDate
from pydantic import Field
from pydantic import validate_call
from typing import Annotated

Name = Annotated[str, Field(min_length=2, max_length=15)]

@validate_call(validate_return=True)
def process_payload(url: HttpUrl, name: Name, birth_date: PastDate) -> str:
    return f'{name=}, {birth_date=}'

# ----

payload = {
    'url': 'httpss://example.com',
    'name': 'J',
    'birth_date': '2024-12-12'
}

process_payload(**payload)
> ValidationError: 3 validation errors for process_payload
url
URL scheme should be 'http' or 'https' [type=url_scheme, input_value='httpss://example.com', input_type=str]
For further information visit
name
String should have at least 2 characters [type=string_too_short, input_value='J', input_type=str]
For further information visit
birth_date
Date should be in the past [type=date_past, input_value='2024-12-12', input_type=str]
For further information visit

# ----

payload = {
    'url': 'https://example.com',
    'name': 'Joe-1234567891011121314',
    'birth_date': '2020-12-12'
}

process_payload(**payload)
> ValidationError: 1 validation error for process_payload
name
String should have at most 15 characters [type=string_too_long, input_value='Joe-1234567891011121314', input_type=str]
For further information visit

That covers the basics of how to validate function arguments and their return values.

Now, we will move on to the second major way Pydantic can be used to validate and process data: by defining models.

This part is more interesting for data processing purposes, as you will see.

So far, we have used validate_call to decorate functions and specified function arguments with their corresponding types and constraints.

Here, we define models by defining model classes, where we specify fields, their types, and constraints. This is very similar to what we did previously. By defining a model class that inherits from Pydantic's BaseModel, we get a hidden mechanism that does the data validation, parsing, and serialization. What this gives us is the ability to create objects that conform to the model specification.

Here is an example:

from pydantic import Field
from pydantic import BaseModel

class Person(BaseModel):
    name: str = Field(min_length=2, max_length=15)
    age: int = Field(gt=0, lt=120)

# ----

john = Person(name='john', age=20)
> Person(name='john', age=20)

# ----

mike = Person(name='m', age=0)
> ValidationError: 2 validation errors for Person
name
String should have at least 2 characters [type=string_too_short, input_value='m', input_type=str]
For further information visit
age
Input should be greater than 0 [type=greater_than, input_value=0, input_type=int]
For further information visit

You can use Annotated here as well, and you can also specify default values for fields. Let's see another example:

from pydantic import Field
from pydantic import BaseModel
from typing import Annotated

Name = Annotated[str, Field(min_length=2, max_length=15)]
Age = Annotated[int, Field(default=1, ge=0, le=120)]

class Person(BaseModel):
    name: Name
    age: Age

# ----

mike = Person(name='mike')
> Person(name='mike', age=1)

Things get very interesting when your use case gets a bit more complex. Remember the payload that we defined? I'll define another, more complex structure that we will go through and validate. To make it more interesting, let's create a payload that we will use to query a service that acts as an intermediary between us and LLM providers. Then we will validate it.

Here is an example:

from pydantic import Field
from pydantic import BaseModel
from pydantic import ConfigDict

from typing import Literal
from typing import Annotated
from enum import Enum

payload = {
    "req_id": "test",
    "text": "This is a sample text.",
    "instruction": "embed",
    "llm_provider": "openai",
    "llm_params": {
        "llm_temperature": 0,
        "llm_model_name": "gpt4o"
    },
    "misc": "what"
}

ReqID = Annotated[str, Field(min_length=2, max_length=15)]

class LLMProviders(str, Enum):
    OPENAI = 'openai'
    CLAUDE = 'claude'

class LLMParams(BaseModel):
    temperature: int = Field(validation_alias='llm_temperature', ge=0, le=1)
    llm_name: str = Field(validation_alias='llm_model_name',
                          serialization_alias='model')

class Payload(BaseModel):
    req_id: str = Field(exclude=True)
    text: str = Field(min_length=5)
    instruction: Literal['embed', 'chat']
    llm_provider: LLMProviders
    llm_params: LLMParams

    # model_config = ConfigDict(use_enum_values=True)

# ----

validated_payload = Payload(**payload)
validated_payload
> Payload(req_id='test',
text='This is a sample text.',
instruction='embed',
llm_provider=<LLMProviders.OPENAI: 'openai'>,
llm_params=LLMParams(temperature=0, llm_name='gpt4o'))

# ----

validated_payload.model_dump()
> {'text': 'This is a sample text.',
'instruction': 'embed',
'llm_provider': <LLMProviders.OPENAI: 'openai'>,
'llm_params': {'temperature': 0, 'llm_name': 'gpt4o'}}

# ----

validated_payload.model_dump(by_alias=True)
> {'text': 'This is a sample text.',
'instruction': 'embed',
'llm_provider': <LLMProviders.OPENAI: 'openai'>,
'llm_params': {'temperature': 0, 'model': 'gpt4o'}}

# ----

# After adding
# model_config = ConfigDict(use_enum_values=True)
# to the Payload model definition, you get

validated_payload.model_dump(by_alias=True)
> {'text': 'This is a sample text.',
'instruction': 'embed',
'llm_provider': 'openai',
'llm_params': {'temperature': 0, 'model': 'gpt4o'}}

Some of the important insights from this elaborate example are:

  • You can use Enums or Literal to define a list of specific values that are expected.
  • If you want to name a model's field differently from the field's name in the data being validated, you can use validation_alias. It specifies the field name in the data being validated.
  • serialization_alias is used when the model's internal field name isn't necessarily the same name you want to use when you serialize the model.
  • A field can be excluded from serialization with exclude=True.
  • Model fields can be Pydantic models as well. The validation process in that case is done recursively. This part is really awesome, since Pydantic does the job of going into depth while validating nested structures.
  • Fields that are not accounted for in the model definition are not parsed (see the sketch after this list for how to forbid them instead).
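
To expand on that last point, here is a small sketch of my own (not part of the original example): by default Pydantic silently drops unknown fields such as misc above, but ConfigDict lets you forbid them instead:

from pydantic import BaseModel, ConfigDict

class StrictPayload(BaseModel):
    # extra='forbid' turns unknown fields into validation errors instead of dropping them
    model_config = ConfigDict(extra='forbid')
    req_id: str

StrictPayload(req_id='test', misc='what')
> ValidationError: 1 validation error for StrictPayload
misc
Extra inputs are not permitted [type=extra_forbidden, input_value='what', input_type=str]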

Here I'll show you snippets of code that demonstrate where and how you can use Pydantic in your day-to-day tasks.

Say you have data that you need to validate and process. It can be stored in CSV or Parquet files, or, for example, in a NoSQL database in the form of documents. Let's take the example of a CSV file, and let's say you want to process its content.

Here is the CSV file (test.csv) example:

name,age,bank_account
johnny,0,20
matt,10,0
abraham,100,100000
mary,15,15
linda,130,100000

And here is how it is validated and parsed:

from pydantic import BaseModel
from pydantic import Field
from pydantic import field_validator
from pydantic import ValidationError
from pydantic import ValidationInfo
from typing import List
import csv

FILE_NAME = 'test.csv'

class DataModel(BaseModel):
    name: str = Field(min_length=2, max_length=15)
    age: int = Field(ge=1, le=120)
    bank_account: float = Field(ge=0, default=0)

    @field_validator('name')
    @classmethod
    def validate_name(cls, v: str, info: ValidationInfo) -> str:
        return str(v).capitalize()

class ValidatedModels(BaseModel):
    validated: List[DataModel]

validated_rows = []

with open(FILE_NAME, 'r') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        try:
            validated_rows.append(DataModel(**row))
        except ValidationError as ve:
            # print out the error and disregard the record
            print(f'{ve=}')

validated_rows
> [DataModel(name='Matt', age=10, bank_account=0.0),
DataModel(name='Abraham', age=100, bank_account=100000.0),
DataModel(name='Mary', age=15, bank_account=15.0)]

validated = ValidatedModels(validated=validated_rows)
validated.model_dump()
> {'validated': [{'name': 'Matt', 'age': 10, 'bank_account': 0.0},
{'name': 'Abraham', 'age': 100, 'bank_account': 100000.0},
{'name': 'Mary', 'age': 15, 'bank_account': 15.0}]}
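
If you also want to serialize the cleaned-up data back out, for example to write it to a file or return it from an API, the same model can do it. A minimal sketch (my own addition), continuing from the validated object above:

# model_dump_json serializes the whole validated collection to a JSON string
validated.model_dump_json()
> '{"validated":[{"name":"Matt","age":10,"bank_account":0.0},{"name":"Abraham","age":100,"bank_account":100000.0},{"name":"Mary","age":15,"bank_account":15.0}]}'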

FastAPI is already integrated with Pydantic, so this one is going to be very brief. The way FastAPI handles requests is by passing them to a function that handles the route. By passing the request to this function, validation is performed automatically. This is similar to the validate_call that we mentioned at the beginning of this article.

Example of an app.py that is used to run a FastAPI-based service:

from fastapi import FastAPI
from pydantic import BaseModel, HttpUrl

class Request(BaseModel):
    request_id: str
    url: HttpUrl

app = FastAPI()

@app.post("/search/by_url/")
async def create_item(req: Request):
    # by the time we get here, the request body has been validated against the Request model
    return req
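
To see the integration in action, here is a minimal sketch of my own (assuming the code above is saved as app.py) that exercises the route with the test client FastAPI ships with:

from fastapi.testclient import TestClient
from app import app

client = TestClient(app)

# a valid payload passes Pydantic validation and reaches the route handler
ok = client.post('/search/by_url/', json={'request_id': '1', 'url': 'https://example.com'})
ok.status_code
> 200

# an invalid URL never reaches the handler; FastAPI answers with Pydantic's error details
bad = client.post('/search/by_url/', json={'request_id': '1', 'url': 'not-a-url'})
bad.status_code
> 422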

Pydantic is an extremely powerful library and has numerous mechanisms for a multitude of different use cases and edge cases as well. Today, I explained the most basic parts of how you can use it, and I'll provide references below for those who are not faint-hearted.

Go and explore. I'm sure it will serve you well on different fronts.
