A Brief Introduction to the Dataclass & Pydantic Modules in Python

 

What is a Python class?

Python is an object-oriented programming language. A Python class is like a template or blueprint for creating objects. An object bundles data (attributes) together with the functions that work on that data (methods). Every time a class is instantiated, that is, called like a function to create an instance, Python creates a new, independent object. A class can be reused to create as many objects as needed.


Creating a class in Python:

The following defines a Person class with two attributes, name and age:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        self._age = value

    def __eq__(self, other):
        # Compare by value; let Python handle comparisons with other types
        if not isinstance(other, Person):
            return NotImplemented
        return (self.name, self.age) == (other.name, other.age)

    def __hash__(self):
        # Hash the same fields that __eq__ compares
        return hash((self.name, self.age))

    def __str__(self):
        # Human-readable form
        return f'Person(name={self.name}, age={self.age})'

    def __repr__(self):
        # Machine-readable form: !r quotes strings, like Python literals
        return f'Person(name={self.name!r}, age={self.age!r})'

 

Typically, when defining a new class, you need to:

Define the object's properties.

Define an __init__ method to initialize the object's attributes.

Implement the __str__ and __repr__ methods to represent objects in human-readable and machine-readable formats.

Implement the __eq__ method to compare objects by the values of all their properties.

Implement the __hash__ method so that objects of the class can be used as dictionary keys or set elements.

As you can see, this requires a lot of boilerplate code.

 

What is a Python dataclass?

The dataclasses module was introduced in Python 3.7 as a utility for creating structured classes whose main purpose is storing data. Its @dataclass decorator generates the usual boilerplate methods, such as __init__(), __repr__() and __eq__(), from the attributes you declare.

Although the module was introduced in Python 3.7, you can also use it in Python 3.6 by installing the dataclasses backport from PyPI:


                   pip install dataclasses

 

A dataclass is created by applying the @dataclass decorator to a class. Its attributes are declared with type hints, which specify the expected data type of each variable. Note that these type hints are not enforced at runtime: the decorator only reads them to find the fields.
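To see what the decorator works from, the sketch below lists the fields it collected from the type hints using dataclasses.fields(), and then shows that the hints are not checked: passing the string "22" for age is accepted without complaint. The Person class here is a minimal illustration, not the one used later.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:
    name: str
    age: int

# The decorator builds its list of fields from the annotated class attributes
print([(f.name, f.type) for f in fields(Person)])

# Type hints are not enforced: a string is stored as-is for age
p = Person("sugumar", "22")
print(type(p.age).__name__)  # str
```

If you need the values checked or converted, you have to do it yourself, or use a validation library such as Pydantic.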

 

Creating a basic dataclass:

 

# Importing the dataclass decorator
from dataclasses import dataclass

@dataclass
class Person:
    """A class for holding a person's details"""

    # Attributes declared using type hints
    name: str
    age: int

# A dataclass object
person1 = Person("sugumar", 22)
print(person1)

Output:

        Person(name='sugumar', age=22)

 

There are two noteworthy points in the code above:

Without an explicit __init__() constructor, the class accepted values and assigned them to the appropriate attributes.

Printing the object produces a neat representation of its data, even though no method was written to do this. That is because the decorator generates a __repr__() method.

The @dataclass decorator generates the __init__() constructor, which assigns the arguments to the fields when an object is created.

Also, because the decorator generates an __eq__() method, comparing two Person objects with the same attribute values returns True. For example:

p1 = Person('John', 25)

p2 = Person('John', 25)

print(p1 == p2)

Output:

True
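Hashing works differently from the hand-written class at the top of this post. When a plain @dataclass generates __eq__(), it sets __hash__ to None, so its instances cannot be set elements or dictionary keys. With frozen=True (or unsafe_hash=True), the decorator generates a __hash__() from the fields instead. This sketch shows both cases; FrozenPerson is a hypothetical name used only here.

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

@dataclass(frozen=True)
class FrozenPerson:
    name: str
    age: int

p1 = Person('John', 25)
p2 = Person('John', 25)
print(p1 == p2)         # True: the generated __eq__ compares the fields
print(Person.__hash__)  # None: a mutable dataclass is unhashable by default

# A frozen dataclass gets a generated __hash__, so equal objects collapse in a set
people = {FrozenPerson('John', 25), FrozenPerson('John', 25)}
print(len(people))      # 1
```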

 

The following sections discuss other features that a dataclass provides.

Default values in a dataclass:

When using a regular class, you can define default values for attributes. For example, the following Person class has an iq parameter with a default value of 100.

class Person:

    def __init__(self, name, age, iq=100):

        self.name = name

        self.age = age

        self.iq = iq

 

To define a default value for an attribute in the dataclass, you assign it to the attribute like this:

from dataclasses import dataclass

@dataclass

class Person:

    name: str

    age: int

    iq: int = 100

print(Person('John Doe', 25))
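One caveat about defaults: a mutable value such as a list cannot be assigned directly as a default, because every instance would share the same list, so the decorator raises a ValueError when the class is defined. The usual fix is field(default_factory=list), which builds a fresh list for each object. The hobbies field and the BrokenPerson class below are hypothetical, added only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    iq: int = 100
    # Hypothetical field: default_factory gives each Person its own empty list
    hobbies: list = field(default_factory=list)

a = Person('John Doe', 25)
b = Person('Jane Doe', 30)
a.hobbies.append('chess')
print(a)          # Person(name='John Doe', age=25, iq=100, hobbies=['chess'])
print(b.hobbies)  # [] because b's list is a separate object

# A plain mutable default is rejected when the class is defined
try:
    @dataclass
    class BrokenPerson:
        hobbies: list = []
except ValueError as exc:
    print("ValueError:", exc)
```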

As with function parameters, attributes with default values must appear after those without default values. Therefore, the following code will not work:

from dataclasses import dataclass

@dataclass

class Person:

    iq: int = 100

    name: str

    age: int
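The failure happens as soon as the class is defined, not when an object is created: the decorator raises a TypeError while generating __init__(). This sketch catches the error so it can be inspected; the exact wording of the message can vary slightly between Python versions.

```python
from dataclasses import dataclass

error_message = None
try:
    @dataclass
    class Person:
        iq: int = 100  # a field with a default value...
        name: str      # ...followed by fields without defaults
        age: int
except TypeError as exc:
    # Raised while the decorator builds __init__, before any Person exists
    error_message = str(exc)
    print("TypeError:", error_message)
```

Moving iq: int = 100 below name and age, as in the earlier example, fixes the error.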

 

Convert to a tuple or a dictionary

The dataclasses module has the astuple() and asdict() functions that convert an instance of the dataclass to a tuple and a dictionary. For example:

from dataclasses import dataclass, astuple, asdict

@dataclass

class Person:

    name: str

    age: int

    iq: int = 100

p = Person('John Doe', 25)

print(astuple(p))

print(asdict(p))

 

Output:

('John Doe', 25, 100)

{'name': 'John Doe', 'age': 25, 'iq': 100}
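Both functions also work recursively: a field that holds another dataclass is converted too. In the sketch below, the Address class and the address field are hypothetical additions. The dataclasses module has no built-in "from dict" helper, so rebuilding an object from asdict() output means recreating any nested objects yourself.

```python
from dataclasses import dataclass, asdict, astuple

@dataclass
class Address:  # hypothetical nested dataclass, for illustration only
    city: str

@dataclass
class Person:
    name: str
    age: int
    address: Address

p = Person('John Doe', 25, Address('Chennai'))
d = asdict(p)
print(d)           # {'name': 'John Doe', 'age': 25, 'address': {'city': 'Chennai'}}
print(astuple(p))  # ('John Doe', 25, ('Chennai',))

# Going back: the nested dict must be turned into an Address by hand
p2 = Person(**{**d, 'address': Address(**d['address'])})
print(p2 == p)     # True
```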

 

Creating immutable objects with a dataclass:

To create read-only objects from a dataclass, set the frozen argument of the dataclass decorator to True. For example:

from dataclasses import dataclass

@dataclass(frozen=True)

class Person:

    name: str

    age: int

    iq: int = 100

 

If you attempt to change the attributes of the object after it is created, you’ll get an error. For example:

p = Person('Jane Doe', 25)

p.iq = 120

Error:

dataclasses.FrozenInstanceError: cannot assign to field 'iq'
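Since a frozen object cannot be changed in place, the usual way to get a "modified" version is dataclasses.replace(), which creates a new instance with some fields changed and leaves the original untouched. A minimal sketch:

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Person:
    name: str
    age: int
    iq: int = 100

p = Person('Jane Doe', 25)
try:
    p.iq = 120
except FrozenInstanceError as exc:
    print(exc)  # cannot assign to field 'iq'

# replace() builds a new object; p itself is unchanged
p2 = replace(p, iq=120)
print(p)   # Person(name='Jane Doe', age=25, iq=100)
print(p2)  # Person(name='Jane Doe', age=25, iq=120)
```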

 

Customizing attribute behavior in a dataclass:

If you don't want an attribute to be initialized by the __init__ method, you can use the field() function from the dataclasses module.

The following example defines a can_vote attribute that is not initialized by the __init__ method:

from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    iq: int = 100
    can_vote: bool = field(init=False)

The field() function has multiple interesting parameters such as repr, hash, compare, and metadata.
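A short sketch of some of these parameters: repr=False hides a field from the generated __repr__(), compare=False leaves it out of the generated __eq__(), and metadata stores arbitrary information that the dataclasses module itself ignores but your own code can read back through fields(). The password, nickname and "unit" metadata below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field, fields

@dataclass
class Person:
    name: str
    age: int
    # Hypothetical field, hidden from the generated __repr__
    password: str = field(default='', repr=False)
    # Hypothetical field, ignored by the generated __eq__
    nickname: str = field(default='', compare=False)
    # metadata is stored for your own use; dataclasses does not interpret it
    iq: int = field(default=100, metadata={'unit': 'points'})

a = Person('John', 25, password='secret', nickname='Johnny')
b = Person('John', 25, password='secret', nickname='J')
print(a)       # Person(name='John', age=25, nickname='Johnny', iq=100)
print(a == b)  # True, because nickname is not compared
print(fields(Person)[-1].metadata['unit'])  # points
```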

If you want to initialize an attribute that depends on the value of another attribute, you can use the __post_init__ method. As its name implies, it runs after initialization: the generated __init__ method calls __post_init__ as its last step.

The following example uses the __post_init__ method to initialize the can_vote attribute based on the age attribute:

from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    iq: int = 100
    can_vote: bool = field(init=False)

    def __post_init__(self):
        # Computed from age after __init__ has assigned the other fields
        self.can_vote = 18 <= self.age <= 70

p = Person('Jane Doe', 25)
p1 = Person('sugu', 16)
print(p)
print(p1)

Output:

Person(name='Jane Doe', age=25, iq=100, can_vote=True)
Person(name='sugu', age=16, iq=100, can_vote=False)
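Because __post_init__ runs every time an object is created, it is also a natural place for simple hand-written checks, since dataclasses do not validate values on their own. The checks below are an illustrative sketch, not a built-in dataclass feature; this kind of validation is exactly what Pydantic automates.

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        # Hand-written checks: the decorator does not enforce type hints
        if not isinstance(self.age, int):
            raise TypeError(f"age must be an int, got {type(self.age).__name__}")
        if self.age < 0:
            raise ValueError("age must not be negative")

p = Person('sugu', 16)  # passes both checks

for bad_age in ('16', -1):
    try:
        Person('sugu', bad_age)
    except (TypeError, ValueError) as exc:
        print(type(exc).__name__, exc)
```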

 

 

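Pydantic module:

Pydantic is an external (third-party) Python library that can replace dataclasses in many situations. Pydantic models look similar to dataclasses; the main difference is that Pydantic must be installed separately, whereas dataclasses is part of the standard library. Pydantic offers much more functionality for validation: it checks field values against their type hints at runtime and converts compatible input where possible. It is also integrated into FastAPI.

Installation:

The examples below use the Pydantic v1 API (for example, the validator decorator and pydantic.error_wrappers). Pydantic 2 changed several of these APIs, so to run the examples as written, install a 1.x release:

      pip install "pydantic<2"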

 

Creating a basic Pydantic model:

from typing import Optional

import pydantic

class Person(pydantic.BaseModel):
    Name: str
    Age: int
    Password: int
    # Optional field with an explicit default of None
    Interests: Optional[str] = None

p1 = Person(Name="sugu", Age=22, Password=123123)
print(p1)

Output:

Name='sugu' Age=22 Password=123123 Interests=None
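Pydantic does more than store the values: it checks them against the type hints and converts compatible input. In this sketch (which assumes Pydantic is installed and sticks to APIs shared by versions 1 and 2), Age is passed as the string "22" and becomes the integer 22, while a non-numeric string raises ValidationError. A dataclass would simply keep whatever it was given. The Password field is left out here for brevity.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    Name: str
    Age: int
    Interests: Optional[str] = None

# Compatible input is converted to the declared type
p = Person(Name="sugu", Age="22")
print(repr(p.Age))  # 22, an int rather than the string "22"

# Input that cannot be converted is rejected
try:
    Person(Name="sugu", Age="twenty-two")
except ValidationError as exc:
    print(exc)
```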

 

 

Validating the fields:

Pydantic is most helpful when you add validation to your data. Validating a field means attaching a validator to a specific field in the model, for example to check its minimum or maximum length, whether it contains special characters, or its data type. For custom checks like these, you write validators.

 

Example:

from pydantic import BaseModel, validator

class TestModel(BaseModel):
    Name: str
    password: str

    # Each validator receives the class (cls) and the field value,
    # and must return the value if it is acceptable
    @validator("Name")
    def name_is_lower_case(cls, value):
        if not value.islower():
            raise ValueError("Must be lower")
        return value

    @validator("Name")
    def name_is_long_enough(cls, value):
        if len(value) < 5:
            raise ValueError("Too short")
        return value

    @validator("password")
    def is_lower_case(cls, value):
        if not value.islower():
            raise ValueError("Must be lower")
        return value

    @validator("password")
    def is_long_enough(cls, value):
        if len(value) < 5:
            raise ValueError("Too short")
        return value

 

Input:

1) person1 = TestModel(Name="sugumar", password="sugumar123")

2) person2 = TestModel(Name="sugu", password="Sugumar123")

Output:

1) Name='sugumar' password='sugumar123'

2) person2 = TestModel(Name="sugu", password="Sugumar123")
  File "pydantic\main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 2 validation errors for TestModel
Name
  Too short (type=value_error)
password
  Must be lower (type=value_error)

 

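A single validator can also cover several fields, for example @validator("Name", "password"), which avoids duplicating the same check for each field. Pydantic also provides root validators (the root_validator decorator), which validate several fields of a model together, and it can parse and validate JSON input directly, for example with the parse_raw() class method in Pydantic v1.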
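A minimal sketch of a root validator and JSON parsing with the v1 API; the SignupForm model and its fields are hypothetical. The root validator receives all field values as a dict, so it can compare two fields, and skip_on_failure=True tells Pydantic not to run it if an individual field has already failed validation. parse_raw() parses a JSON string and validates the result in one step.

```python
from pydantic import BaseModel, ValidationError, root_validator

class SignupForm(BaseModel):  # hypothetical model for illustration
    password: str
    confirm_password: str

    # A root validator sees the values of all fields at once (v1-style API)
    @root_validator(skip_on_failure=True)
    def passwords_match(cls, values):
        if values.get("password") != values.get("confirm_password"):
            raise ValueError("Passwords do not match")
        return values

# parse_raw (v1 API) parses a JSON string and validates it
form = SignupForm.parse_raw('{"password": "sugumar123", "confirm_password": "sugumar123"}')
print(form)

try:
    SignupForm.parse_raw('{"password": "sugumar123", "confirm_password": "other"}')
except ValidationError as exc:
    print(exc)
```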

 

Conclusion:

When to use Dataclasses

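Dataclasses are mainly about grouping related variables together with little boilerplate. Choose dataclasses if:

You want to declare the structure and types of your data, but you don't need the values checked at runtime (dataclass type hints are not enforced).

When to use Pydantic

Pydantic is about thorough data validation. Choose Pydantic if:

You want the values of each field validated, and converted where possible, at runtime, for example when handling external input such as API requests.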


 
