A Brief Introduction to the Dataclass and Pydantic Modules in Python
What is a Python class?
Python is an object-oriented programming language. A Python class is like a template or blueprint for creating new objects. An object is anything that you wish to create or change while working through the code. Every time a class is instantiated, a new object is created from scratch. A class can be reused to create as many objects as needed.
Creating a class in Python:
The following defines a Person class with two attributes, name and age:
class Person:
    def __init__(self, name, age):
        # These assignments go through the property setters below
        self.name = name
        self.age = age

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        self._age = value

    def __eq__(self, other):
        # Compare by value; let Python handle unrelated types
        if not isinstance(other, Person):
            return NotImplemented
        return self.name == other.name and self.age == other.age

    def __hash__(self):
        # Hash the same values that __eq__ compares
        return hash((self.name, self.age))

    def __str__(self):
        return f'Person(name={self.name},age={self.age})'

    def __repr__(self):
        return f'Person(name={self.name},age={self.age})'
Typically, when defining a new class, you need to:
Define a list of the object's properties.
Define an __init__ method to initialize the object's attributes.
Implement the __str__ and __repr__ methods to represent the objects in human-readable and machine-readable formats.
Implement the __eq__ method to compare objects by the values of all properties.
Implement the __hash__ method to use the objects of the class as keys of a dictionary or elements of a set.
As you can see, it requires a lot of code.
What is a Python dataclass?
The dataclasses module was introduced in Python 3.7 as a utility for building structured classes intended mainly for storing data. It generates the methods such classes need to handle the data and its representation.
Although the module was introduced in Python 3.7, it can also be used in Python 3.6 by installing the dataclasses backport library:
pip install dataclasses
Dataclasses are implemented by applying the @dataclass decorator to a class. Attributes are declared using type hints, which specify the expected data type of each variable.
Creating a basic dataclass:
# Import the dataclass decorator
from dataclasses import dataclass

@dataclass
class Person:
    """A class for holding a person's details"""
    # Attributes declared using type hints
    name: str
    age: int

# A dataclass object
person1 = Person("sugumar", 22)
print(person1)
Output:
Person(name='sugumar', age=22)
Two points are noticeable in the above code:
Without an explicit __init__() constructor, the class accepted the values and assigned them to the appropriate attributes.
Printing the object gives a neat representation of the data in it, without any explicitly coded function to do this. That means the class has a generated __repr__() method.
The dataclass decorator provides a built-in __init__() constructor that handles the data and object creation for the class.
Also, if you compare two Person objects with the same attribute values, the comparison returns True. For example:
p1 = Person('John', 25)
p2 = Person('John', 25)
print(p1 == p2)
Output:
True
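As a minimal sketch of what the generated __eq__ does (the variable names p3 and hashable are only illustrative), the comparison is made field by field, so objects with different values are not equal. One side effect worth knowing: because the decorator adds __eq__, a plain (non-frozen) dataclass is unhashable by default.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


p1 = Person("John", 25)
p2 = Person("John", 25)
p3 = Person("John", 30)

print(p1 == p2)  # True: same field values
print(p1 == p3)  # False: age differs
print(p1 is p2)  # False: still two separate objects

# A non-frozen dataclass with a generated __eq__ has __hash__ set to None
try:
    hash(p1)
    hashable = True
except TypeError:
    hashable = False
print(hashable)  # False
```

If the objects need to serve as dictionary keys or set elements, making the dataclass frozen is one common way to get a generated __hash__.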
The following sections discuss other features that a dataclass provides.
Default values in a dataclass:
When using a regular class, you can define default
values for attributes. For example, the following Person class has
the iq parameter with the default value of 100.
class Person:
    def __init__(self, name, age, iq=100):
        self.name = name
        self.age = age
        self.iq = iq
To define a default value for an attribute in the
dataclass, you assign it to the attribute like this:
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    iq: int = 100

print(Person('John Doe', 25))
As with function parameters, attributes with default values must appear after those without default values. Therefore, the following code will not work and raises a TypeError when the class is defined:
from dataclasses import dataclass

@dataclass
class Person:
    iq: int = 100
    name: str
    age: int
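To see the failure without stopping the script, the class definition can be wrapped in a try block. This is only a sketch: BrokenPerson and error_message are illustrative names, and the exact wording of the message may differ between Python versions.

```python
from dataclasses import dataclass

error_message = None

try:
    # The decorator runs at class-definition time, so the error is raised here
    @dataclass
    class BrokenPerson:
        iq: int = 100
        name: str  # no default, but it follows a field that has one
        age: int
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The error is raised when the class is defined, not when an object is created, so the mistake shows up as soon as the module is imported.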
Convert to a tuple or a dictionary
The dataclasses module has the astuple() and asdict() functions, which convert an instance of the dataclass to a tuple and a dictionary, respectively.
For example:
from dataclasses import dataclass, astuple, asdict

@dataclass
class Person:
    name: str
    age: int
    iq: int = 100

p = Person('John Doe', 25)
print(astuple(p))
print(asdict(p))
Output:
('John Doe', 25, 100)
{'name': 'John Doe', 'age': 25, 'iq': 100}
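Both functions also recurse into fields that are themselves dataclasses. The sketch below assumes a hypothetical Address class, which is not part of the original example, to show the nested result.

```python
from dataclasses import dataclass, asdict, astuple


@dataclass
class Address:  # hypothetical nested dataclass, for illustration only
    city: str
    country: str


@dataclass
class Person:
    name: str
    age: int
    address: Address


p = Person("John Doe", 25, Address("Chennai", "India"))

person_dict = asdict(p)
person_tuple = astuple(p)

print(person_dict)
# {'name': 'John Doe', 'age': 25, 'address': {'city': 'Chennai', 'country': 'India'}}
print(person_tuple)
# ('John Doe', 25, ('Chennai', 'India'))
```

The nested dictionary form is convenient when the object has to be serialized, for example with the json module.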
Create immutable objects with a dataclass:
To create read-only objects from a dataclass, you can set the frozen argument of the dataclass decorator to True. For example:
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    name: str
    age: int
    iq: int = 100
If you attempt to change the attributes of the
object after it is created, you’ll get an error. For example:
p = Person('Jane Doe', 25)
p.iq = 120
Error:
dataclasses.FrozenInstanceError: cannot assign to field 'iq'
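When a changed value is needed, one option is to build a modified copy with dataclasses.replace() instead of assigning to the frozen object. The following is a small sketch; the names smarter and people are only illustrative. It also shows that frozen dataclasses are hashable, so they can be used as dictionary keys.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Person:
    name: str
    age: int
    iq: int = 100


p = Person("Jane Doe", 25)

# Direct assignment is rejected on a frozen instance
try:
    p.iq = 120
except FrozenInstanceError as exc:
    print(f"Rejected: {exc}")

# replace() returns a new object and leaves the original untouched
smarter = replace(p, iq=120)
print(p)        # Person(name='Jane Doe', age=25, iq=100)
print(smarter)  # Person(name='Jane Doe', age=25, iq=120)

# Frozen dataclasses get a generated __hash__, so they work as dict keys
people = {p: "original", smarter: "copy"}
print(len(people))  # 2
```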
Customize attribute behaviors in a dataclass:
If you don’t want an attribute to be a parameter of the generated __init__ method, you can use the field() function from the dataclasses module.
The following example defines the can_vote attribute, which is excluded from the __init__ method:
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    iq: int = 100
    # Excluded from the generated __init__; must be set some other way
    can_vote: bool = field(init=False)
The field() function has several other useful parameters, such as repr, hash, compare, and metadata.
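A short sketch of three of these parameters follows. The password and nickname fields and the metadata key "unit" are assumptions made up for this illustration: repr=False hides a field from the printed representation, compare=False leaves it out of equality checks, and metadata attaches arbitrary read-only information to a field.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Person:
    name: str
    age: int
    password: str = field(default="", repr=False)     # hidden from repr
    nickname: str = field(default="", compare=False)  # ignored by ==
    iq: int = field(default=100, metadata={"unit": "points"})


a = Person("John", 25, password="secret", nickname="Johnny")
b = Person("John", 25, password="secret", nickname="JD")

print(a)       # Person(name='John', age=25, nickname='Johnny', iq=100)
print(a == b)  # True: nickname is not compared

# metadata is not used by dataclasses itself; it is read back via fields()
iq_field = [f for f in fields(Person) if f.name == "iq"][0]
print(iq_field.metadata["unit"])  # points
```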
If you want to initialize an attribute that depends on the value of another attribute, you can use the __post_init__ method. As its name implies, the generated __init__ method calls __post_init__ after it has assigned the other attributes.
The following uses the __post_init__ method to initialize the can_vote attribute based on the age attribute:
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    iq: int = 100
    can_vote: bool = field(init=False)

    def __post_init__(self):
        # Runs right after the generated __init__ has set the other fields
        self.can_vote = 18 <= self.age <= 70

p = Person('Jane Doe', 25)
p1 = Person('sugu', 16)
print(p)
print(p1)
Output:
Person(name='Jane Doe', age=25, iq=100, can_vote=True)
Person(name='sugu', age=16, iq=100, can_vote=False)
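Two consequences of this design are easy to miss, and the sketch below illustrates them (the variable name accepted is illustrative). First, can_vote cannot be passed to the constructor, because init=False removes it from __init__. Second, __post_init__ runs only once, so changing age later does not update can_vote.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    age: int
    iq: int = 100
    can_vote: bool = field(init=False)

    def __post_init__(self):
        self.can_vote = 18 <= self.age <= 70


# can_vote is not an __init__ parameter, so passing it is rejected
try:
    Person("Jane Doe", 25, can_vote=True)
    accepted = True
except TypeError:
    accepted = False
print(accepted)  # False

# __post_init__ ran at creation time; a later change is not picked up
p = Person("Jane Doe", 25)
p.age = 16
print(p.can_vote)  # True (stale value)
```

If the derived value must always follow age, a read-only property is one alternative to storing it as a field.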
Pydantic module:
Pydantic is an external Python package that can be used as an alternative to dataclasses. Its models are similar to dataclasses; the main difference is that pydantic must be installed separately, whereas dataclasses are built into Python. Pydantic offers more functionality when it comes to validation, and it is also integrated into FastAPI.
Installation:
pip install pydantic
Creating a basic pydantic model:
import pydantic
from typing import Optional

class Person(pydantic.BaseModel):
    Name: str
    Age: int
    Password: int
    # An explicit default keeps this field optional across pydantic versions
    Interests: Optional[str] = None

p1 = Person(Name="sugu", Age=22, Password=123123)
print(p1)
Output:
Name='sugu' Age=22 Password=123123 Interests=None
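The practical difference from a dataclass shows up when the input has the wrong type. The following sketch uses two hypothetical classes, PersonData and PersonModel, and assumes pydantic's default (non-strict) behavior: the dataclass stores whatever it is given, while the pydantic model converts a numeric string to an int and rejects a value that cannot be converted.

```python
from dataclasses import dataclass

from pydantic import BaseModel, ValidationError


@dataclass
class PersonData:  # hypothetical dataclass version
    name: str
    age: int


class PersonModel(BaseModel):  # hypothetical pydantic version
    name: str
    age: int


# dataclass: type hints are not checked at runtime
loose = PersonData(name="sugu", age="twenty-two")
print(loose.age)  # twenty-two

# pydantic: a numeric string is converted to the declared type
converted = PersonModel(name="sugu", age="22")
print(converted.age)  # 22

# pydantic: a value that cannot be converted raises ValidationError
try:
    PersonModel(name="sugu", age="twenty-two")
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```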
Validating the fields:
Pydantic is most helpful when you add validation to your data. Validating the fields means attaching a validator to a specific field in the model, for example to check the maximum or minimum length, the presence of special characters, or the data type. For this, we first need validators.
Example:
# validator is the decorator from pydantic v1; newer releases still
# accept it but recommend field_validator instead
from pydantic import BaseModel, validator

class TestModel(BaseModel):
    Name: str
    password: str

    @validator("Name")
    def name_is_lower_case(cls, value):
        if not value.islower():
            raise ValueError("Must be lower")
        return value

    @validator("Name")
    def name_is_long_enough(cls, value):
        if len(value) < 5:
            raise ValueError("Too short")
        return value

    @validator("password")
    def is_lower_case(cls, value):
        if not value.islower():
            raise ValueError("Must be lower")
        return value

    @validator("password")
    def is_long_enough(cls, value):
        if len(value) < 5:
            raise ValueError("Too short")
        return value
Input:
1) person1 = TestModel(Name="sugumar", password="sugumar123")
2) person2 = TestModel(Name="sugu", password="Sugumar123")
Output:
1) Name='sugumar' password='sugumar123'
2) person2=TestModel(Name="sugu",password="Sugumar123")
File "pydantic\main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 2 validation errors for TestModel
Name
  Too short (type=value_error)
password
  Must be lower (type=value_error)
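In application code the error is usually caught rather than left to crash the program. The sketch below makes a few assumptions: Account and failed_fields are illustrative names, and a single validator is attached to both fields to keep the example short. It uses the same validator decorator as above and reads the failing field names from the errors() method of the exception.

```python
from pydantic import BaseModel, ValidationError, validator


class Account(BaseModel):  # hypothetical model, for illustration only
    Name: str
    password: str

    # One validator function attached to two fields
    @validator("Name", "password")
    def is_long_enough(cls, value):
        if len(value) < 5:
            raise ValueError("Too short")
        return value


try:
    Account(Name="sugu", password="abc")
    failed_fields = []
except ValidationError as exc:
    # Each error is a dict; "loc" holds the path to the failing field
    failed_fields = sorted(str(error["loc"][0]) for error in exc.errors())

print(failed_fields)  # ['Name', 'password']
```

Collecting every failing field in one pass is what makes the report useful: all problems with the input come back together rather than one at a time.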
Pydantic also provides root validators, which validate the whole model rather than a single field. It also lets you reuse validators across fields and can parse JSON data.
Conclusion:
When to use Dataclasses
Dataclasses are mainly about 'grouping' variables together. Choose dataclasses if:
The main concern is the type of each variable, not its value.
When to use Pydantic
Pydantic is about thorough data validation. Choose pydantic if:
You want to validate the values inside each class.