Before PEP 557 landed in Python 3.7, every value class meant writing the same five methods by hand: __init__ to assign the fields, __repr__ for debuggable output, __eq__ for value comparison, often __hash__ to keep dict-key usage working, sometimes the ordering dunders for sorting. Thirty lines of boilerplate for what felt like four attributes. The @dataclass decorator collapses all of it into a single line above the class, infers the methods from your type-annotated fields, and gives you opt-ins for immutability (frozen=True), memory layout (slots=True), comparison (order=True), and validation (__post_init__). This entry covers what the decorator actually generates, the traps that survived the migration from hand-written classes, and the cases where alternatives like NamedTuple or Pydantic fit better. One of 10 explainers in our Python OOP complete guide.
Key Takeaways
@dataclass auto-generates __init__, __repr__, and __eq__ from your type-annotated fields. Opt in to __hash__, ordering, and __slots__ with parameters.
Mutable defaults need field(default_factory=list). A bare items: list = [] raises ValueError at class creation.
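frozen=True makes instances immutable and, with the default eq=True, hashable. It is the simplest way to use a dataclass as a set member or dict key, since plain dataclasses are unhashable by default.
slots=True (Python 3.10+) drops the per-instance __dict__, which can cut per-instance memory by roughly 40 to 50% and speed up attribute access; exact gains depend on the Python version and the number of fields.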
Choose the alternative when its differentiator matters: NamedTuple for positional access, attrs for validators, Pydantic for runtime input validation.
Five lines you write, seven-plus methods Python generates. The opt-in row activates with decorator parameters.
The Minimum @dataclass
A decorator, a class statement, and two annotated fields give you a fully functional value object:
from dataclasses import dataclass
@dataclass
class Point:
    x: int
    y: int
p1 = Point(3, 4)
p2 = Point(3, 4)
print(p1)          # Point(x=3, y=4)
print(p1 == p2)    # True
print(p1.x, p1.y)  # 3 4
The annotated fields (x: int, y: int) tell the decorator what to wire up. Type hints are required; without them, the field is not recognized. Note that the type annotation isn't enforced at runtime by @dataclass itself, so Point("hello", "world") still constructs an instance. The hint is for documentation, type checkers, and the decorator's introspection.
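Both points in the paragraph above can be checked directly. The sketch below reuses the Point class and adds an unannotated attribute named scale (a hypothetical name used only for illustration) to show that it is not picked up as a field, and that the annotations are not enforced at runtime.

```python
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: int
    y: int
    scale = 1  # no annotation: a plain class attribute, not a dataclass field


# Only annotated names become fields.
field_names = [f.name for f in fields(Point)]
print(field_names)  # ['x', 'y']

# Annotations are not checked at runtime, so strings are accepted.
p = Point("hello", "world")
print(p)  # Point(x='hello', y='world')
```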
What Gets Generated
The PSF docs on dataclasses spell out the full list. From a plain @dataclass decoration you always get:
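__init__(self, ...) accepting all fields as parameters in declaration order, with their declared defaults.
__repr__ returning ClassName(field1=value1, field2=value2), with fields in declaration order.
__eq__(other) comparing field-by-field, and returning NotImplemented when other is not an instance of the same class.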
__hash__ set to None, which makes instances unhashable by default (matching the rule covered in the dunders guide: define __eq__ and you must explicitly handle __hash__).
Add parameters to the decorator and more methods activate:
@dataclass(frozen=True) makes assignment raise FrozenInstanceError and generates a real __hash__ based on the fields.
@dataclass(order=True) generates __lt__, __le__, __gt__, __ge__ using tuple comparison of fields in declaration order.
@dataclass(slots=True) (Python 3.10+) returns a new class with __slots__ set to the declared fields, dropping the per-instance __dict__.
If you want a default __init__ but a custom __repr__, write your own __repr__ in the class body; the decorator skips generating methods you've already defined.
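As a small illustration of two of the points above, the sketch below combines order=True with a hand-written __repr__. Version is a hypothetical class chosen for the example; the comparison is tuple-based, so minor version 10 sorts after minor version 2.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:
    major: int
    minor: int

    # Hand-written __repr__: the decorator leaves it in place.
    def __repr__(self):
        return f"v{self.major}.{self.minor}"


versions = [Version(1, 10), Version(1, 2), Version(0, 9)]
ordered = sorted(versions)  # compares (major, minor) tuples
print(ordered)  # [v0.9, v1.2, v1.10]
```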
The Mutable Default Trap
The single most common @dataclass error. You write:
@dataclass
class ShoppingCart:
    items: list = []  # ValueError!
# ValueError: mutable default <class 'list'> for field items
# is not allowed: use default_factory
The reason: a bare = [] at class level creates a SINGLE list shared across every instance. Without the dataclass machinery you'd hit the classic shared-default bug (covered in the mutable default guide). The decorator catches the mistake at class-creation time and forces you to use field(default_factory=list):
from dataclasses import dataclass, field
@dataclass
class ShoppingCart:
    items: list = field(default_factory=list)
c1 = ShoppingCart()
c2 = ShoppingCart()
c1.items.append("apple")
print(c2.items)  # [], not affected
default_factory takes any zero-argument callable. list, dict, set, and any custom function work. The factory is called once per new instance, giving each one a fresh value.
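A factory does not have to be a built-in type. The following sketch uses a named function and a lambda; Ticket and default_tags are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


def default_tags():
    # Hypothetical factory: called once for every new instance.
    return ["new"]


@dataclass
class Ticket:
    title: str
    tags: list = field(default_factory=default_tags)
    counts: dict = field(default_factory=lambda: {"views": 0})


a = Ticket("first")
b = Ticket("second")
a.tags.append("urgent")
a.counts["views"] += 1
print(a.tags, a.counts)  # ['new', 'urgent'] {'views': 1}
print(b.tags, b.counts)  # ['new'] {'views': 0}
```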
field() for Per-Field Configuration
field() is the per-field configuration tool. Common arguments:
from dataclasses import dataclass, field
@dataclass
class Order:
    id: str
    items: list = field(default_factory=list)
    total: float = field(default=0.0)
    notes: str = field(default="", repr=False)            # Skip in __repr__
    cached_total: float = field(init=False, default=0.0)  # Not in __init__
    secret: str = field(default="", compare=False)        # Skip in __eq__
What each parameter controls:
default: literal default value (only for immutable types).
default_factory: callable producing the default per instance.
init: include the field as an __init__ parameter (default True).
repr: include the field in __repr__ (default True).
compare: include the field in __eq__ and ordering (default True).
hash: include in __hash__ (default None, which mirrors compare).
metadata: arbitrary dict for third-party tools, ignored by @dataclass itself.
You rarely need all of these in one class, but each solves a real need.
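To see the effect of repr, compare, and init on the generated methods, here is a minimal sketch. Account and its field names are hypothetical and exist only for the demonstration.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    owner: str
    token: str = field(default="", repr=False)            # hidden from __repr__
    last_seen: float = field(default=0.0, compare=False)  # ignored by __eq__
    visits: int = field(init=False, default=0)            # not a constructor parameter


a = Account("alice", token="s3cret", last_seen=10.0)
b = Account("alice", token="s3cret", last_seen=99.0)
print(a)       # Account(owner='alice', last_seen=10.0, visits=0)
print(a == b)  # True

try:
    Account("bob", visits=3)  # init=False fields cannot be passed in
except TypeError:
    print("TypeError")
```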
__post_init__ for Validation and Derived Fields
The auto-generated __init__ only assigns the fields. If you need validation, computed values, or any side effect after construction, define __post_init__:
from dataclasses import dataclass, field
from math import pi
@dataclass
class Circle:
    radius: float
    area: float = field(init=False)  # Computed, not passed in
    def __post_init__(self):
        if self.radius < 0:
            raise ValueError(f"radius must be non-negative, got {self.radius}")
        self.area = pi * self.radius ** 2
c = Circle(radius=5)
print(c.area)      # 78.53981633974483
Circle(radius=-1)  # ValueError
The area field uses init=False, so it's not part of the constructor signature. __post_init__ computes it after the radius is set.
Frozen + __post_init__ caveat: with frozen=True, you can't assign attributes normally. Use object.__setattr__(self, "area", value) to bypass the frozen check inside __post_init__. The technique is documented in the PSF reference but easy to miss.
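The caveat is easier to see in code. This is a minimal sketch with a hypothetical Square class: the derived field is set through object.__setattr__, while ordinary assignment after construction still fails.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class Square:
    side: float
    area: float = field(init=False)

    def __post_init__(self):
        # A plain "self.area = ..." would raise FrozenInstanceError here.
        object.__setattr__(self, "area", self.side ** 2)


s = Square(3.0)
print(s.area)  # 9.0

try:
    s.side = 4.0
except FrozenInstanceError as exc:
    print(type(exc).__name__)  # FrozenInstanceError
```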
frozen=True for Immutability
frozen=True rejects post-construction mutation:
from dataclasses import dataclass
@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float
c = Coordinate(40.7, -74.0)
c.lat = 41.0
# dataclasses.FrozenInstanceError: cannot assign to field 'lat'
Two benefits: instances behave as proper immutable values (no surprise mutations across function calls), and Python auto-generates __hash__ from the fields, so frozen dataclasses are hashable by default as long as every field value is itself hashable.
locations = {Coordinate(40.7, -74.0), Coordinate(40.7, -74.0)}
print(len(locations)) # 1 — equal coordinates collapse in the set
For mutable working state (records being edited, builders, accumulators), leave frozen=False. For value objects (config, identifiers, message envelopes, geometry), default to frozen=True.
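When a frozen value object needs to "change", the usual approach is to build a modified copy. The standard library's dataclasses.replace does this; the sketch below applies it to the Coordinate class from this section.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float


home = Coordinate(40.7, -74.0)
moved = replace(home, lat=41.0)  # new instance; home is untouched
print(home)   # Coordinate(lat=40.7, lon=-74.0)
print(moved)  # Coordinate(lat=41.0, lon=-74.0)
```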
slots=True for Memory Wins
Python 3.10 added slots=True as a decorator parameter. The decorator creates a new class with __slots__ set to the field names, eliminating the per-instance __dict__:
from dataclasses import dataclass
import sys
@dataclass
class PointRegular:
    x: int
    y: int
@dataclass(slots=True)
class PointSlotted:
    x: int
    y: int
# Size of the dict alone, e.g. 104 bytes on some CPython builds; varies by version
print(sys.getsizeof(PointRegular(1, 2).__dict__))
print(hasattr(PointSlotted(1, 2), '__dict__'))  # False
The memory saving compounds with instance count. With a million Point instances, dropping a __dict__ of roughly 100 bytes from each one saves on the order of 100 MB; the exact figure depends on the Python version and build.
The trade-off: instances reject ad-hoc attributes. p.z = 5 raises AttributeError because z isn't in the slots. This catches typos early but blocks dynamic attribute attachment, which some patterns rely on. Use slots=True for high-volume value objects; skip it when the speedup doesn't matter.
InitVar for Init-Only Parameters
Sometimes you need a value at construction time that isn't itself a field. InitVar handles this:
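from dataclasses import dataclass, field, InitVar
import hashlib
@dataclass
class User:
    name: str
    password_hash: str = field(init=False)
    # No default here: a default would remain as a class attribute,
    # so u.password would return it instead of raising.
    password: InitVar[str]
    def __post_init__(self, password):
        # sha256 is for illustration only; real systems need a dedicated password-hashing scheme
        self.password_hash = hashlib.sha256(password.encode()).hexdigest()
u = User(name="alice", password="secret123")
print(u.name, u.password_hash)
# alice, followed by a 64-character hex digest
print(u.password)
# AttributeError: not an actual field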
InitVar declares a constructor parameter that gets passed to __post_init__ but is NOT stored as an instance attribute. Useful for inputs that get transformed into derived state.
Inheritance with Dataclasses
Dataclasses inherit cleanly. The subclass's fields are added after the parent's in the constructor signature:
from dataclasses import dataclass
@dataclass
class Animal:
    name: str
    legs: int = 4
@dataclass
class Dog(Animal):
    breed: str = "mixed"  # Defaulted to keep ordering valid
d = Dog(name="Rex", legs=4, breed="Labrador")
print(d)  # Dog(name='Rex', legs=4, breed='Labrador')
The constraint to know: if the parent has any field with a default value, EVERY subclass field added later must also have a default. Python's __init__ signature rules forbid required parameters after defaulted ones, and that constraint flows through dataclass inheritance.
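To make the constraint concrete, the sketch below defines a hypothetical Cat subclass with a required field after the parent's defaulted legs. The decorator raises TypeError while the class is being created; the exact wording of the message differs between Python versions, so it is printed rather than quoted.

```python
from dataclasses import dataclass


@dataclass
class Animal:
    name: str
    legs: int = 4


error = None
try:
    @dataclass
    class Cat(Animal):
        indoor: bool  # required field after the inherited default 'legs'
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```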
kw_only=True for Cleaner Constructors
Python 3.10 added kw_only=True as a decorator parameter (and a per-field option). It forces callers to use keyword arguments, which improves readability when classes have many fields:
from dataclasses import dataclass
@dataclass(kw_only=True)
class APIClient:
    base_url: str
    timeout: float = 30.0
    max_retries: int = 3
    user_agent: str = "MyApp/1.0"
    verify_ssl: bool = True
# Positional construction fails:
APIClient("https://api.example.com", 60)
# TypeError: APIClient.__init__() takes 1 positional argument but 3 were given
# Keyword-only works and reads clearly:
client = APIClient(
    base_url="https://api.example.com",
    timeout=60,
    max_retries=5,
)
The benefit is readability at the call site: APIClient(base_url=..., timeout=60) is unambiguous, while APIClient("https://api.example.com", 60, 5, "X", True) requires the reader to remember the parameter order. For classes with three or more fields, kw_only=True is the safer default.
Per-field kw_only works the same way:
from dataclasses import dataclass, field
@dataclass
class Mixed:
    name: str                         # Positional or keyword
    age: int                          # Positional or keyword
    email: str = field(kw_only=True)  # Keyword-only
Mixed("Alice", 30, email="a@b.com")  # OK
Mixed("Alice", 30, "a@b.com")        # TypeError
This sidesteps the "non-default after default" inheritance constraint covered above; kw_only fields don't participate in positional ordering, so they can be added without breaking parent class signatures.
Introspection with fields()
The dataclasses.fields() function returns a tuple of Field objects describing each field. This is the foundation for any serialization or generic processing code:
from dataclasses import dataclass, fields
@dataclass
class Product:
    name: str
    price: float
    in_stock: bool = True
for f in fields(Product):
    print(f.name, f.type, f.default)
# name <class 'str'> <dataclasses._MISSING_TYPE object at 0x...>
# price <class 'float'> <dataclasses._MISSING_TYPE object at 0x...>
# in_stock <class 'bool'> True
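The standard library also ships asdict() and astuple() for converting an instance to a dict or tuple. Neither is shallow: both walk the field list, recurse into nested dataclasses, lists, tuples, and dicts, and deep-copy everything else, so the result shares no state with the original.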
asdict is especially useful for JSON serialization: pair it with json.dumps and a dataclass becomes API-ready. For the inverse direction (parsing JSON into a dataclass), the standard library doesn't help, which is one of the cases where Pydantic earns its keep.
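A short sketch of both directions, using hypothetical Product and Price classes. Serialization is a single asdict call; the way back is manual, because nested dataclasses have to be rebuilt explicitly.

```python
import json
from dataclasses import dataclass, asdict, astuple


@dataclass
class Price:
    amount: float
    currency: str


@dataclass
class Product:
    name: str
    price: Price
    in_stock: bool = True


p = Product("mug", Price(4.5, "EUR"))
print(asdict(p))   # {'name': 'mug', 'price': {'amount': 4.5, 'currency': 'EUR'}, 'in_stock': True}
print(astuple(p))  # ('mug', (4.5, 'EUR'), True)

payload = json.dumps(asdict(p))

# Going back is manual: the nested Price must be reconstructed by hand.
raw = json.loads(payload)
restored = Product(name=raw["name"], price=Price(**raw["price"]), in_stock=raw["in_stock"])
print(restored == p)  # True
```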
Comparing the Alternatives
@dataclass isn't the only way to build value-like classes. The honest comparison:
Tool | Use when | Skip when
@dataclass | Standard-library only, mutable or frozen value objects, type-hinted fields | You need runtime input validation or JSON parsing
NamedTuple | Tiny positional records where tuple-style access matters | You need mutation or per-field configuration beyond simple defaults
TypedDict | Working with dict shapes from JSON or external APIs, type-checking only | You want real classes with methods
attrs | Need converters, validators, or features not yet in @dataclass | Standard library is enough
Pydantic | Runtime validation, JSON parsing, schema generation, API boundaries | In-process value objects where validation overhead isn't justified
The general rule: @dataclass for internal value objects, Pydantic at the boundary where untrusted data arrives, NamedTuple for tiny positional records, attrs when you specifically need its features.
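The "positional access" differentiator can be shown with the standard library alone. In this sketch, PointNT and PointDC are hypothetical names for the same two-field record written both ways.

```python
from dataclasses import dataclass
from typing import NamedTuple


class PointNT(NamedTuple):
    x: int
    y: int


@dataclass(frozen=True)
class PointDC:
    x: int
    y: int


x, y = PointNT(3, 4)            # tuple unpacking works
print(PointNT(3, 4)[0])         # 3
print(PointNT(3, 4) == (3, 4))  # True: it is a tuple
print(PointDC(3, 4) == (3, 4))  # False: compares only with its own class
```

The flip side is that a NamedTuple compares equal to any tuple with the same values, which may or may not be what a value object should do.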
Real-World Example: An HTTP Request Wrapper
A typical service-layer value object combining several features:
from dataclasses import dataclass, field
from typing import Optional
@dataclass(frozen=True, slots=True)
class HTTPRequest:
    method: str
    url: str
    # dicts are unhashable, so keep headers out of the generated __hash__
    headers: dict = field(default_factory=dict, hash=False)
    body: Optional[bytes] = None
    timeout: float = 30.0
    def __post_init__(self):
        if self.method not in ("GET", "POST", "PUT", "DELETE", "PATCH"):
            raise ValueError(f"unsupported method: {self.method}")
req = HTTPRequest(method="GET", url="https://api.example.com/items")
print(req)
# HTTPRequest(method='GET', url='https://api.example.com/items',
# headers={}, body=None, timeout=30.0)
# Hashable because frozen=True and headers is excluded from the hash
cache = {req: "cached response"}
print(cache[req])
The class rejects attribute reassignment (the headers dict itself can still be mutated in place, so treat it as read-only), is hashable (works as a dict key for caching, with the unhashable headers dict excluded via hash=False), uses __slots__ for memory efficiency at scale, validates the method on construction, and uses default_factory for the headers dict. Without hash=False, hashing the instance would raise TypeError: unhashable type: 'dict'. Five @dataclass features composed in about a dozen lines.
Common Mistakes
1. Forgetting type annotations
Fields without type hints are NOT picked up by @dataclass. x = 5 in the class body is a class attribute, not a dataclass field. Always annotate: x: int = 5.
2. Using list = [] instead of field(default_factory=list)
The decorator catches it and raises ValueError, but new readers often misread the error. The fix is always default_factory.
3. Mixing required and defaulted fields
@dataclass
class Bad:
    a: int = 0
    b: int  # TypeError: non-default argument 'b' follows default argument
Defaulted fields must come AFTER required ones, matching standard Python function signature rules.
4. Trying to mutate frozen instances
If you find yourself wanting to mutate a frozen instance, the answer is usually to make a new one. dataclasses.replace(obj, field=new_value) does this cleanly.
5. Using @dataclass for input validation
@dataclass doesn't validate types at runtime. If you're parsing untrusted data, use Pydantic instead. @dataclass is a structuring tool, not a validation framework.
Frequently Asked Questions
What does the Python @dataclass decorator actually generate?
@dataclass auto-generates __init__ (accepting all declared fields as parameters with their declared defaults), __repr__ (returning ClassName(field1=value1, field2=value2)), and __eq__ (comparing instances field-by-field). With eq=True (the default) it also sets __hash__ to None, making instances unhashable. Add frozen=True and it generates a real __hash__ based on the fields. Add order=True and it generates the ordering dunders __lt__, __le__, __gt__, __ge__ using tuple comparison of fields in declaration order. Add slots=True (Python 3.10+) and it adds __slots__ to reduce per-instance memory. The decorator is shorthand for 20 to 40 lines of repetitive method definitions.
Why can't I use a mutable default like an empty list in a Python dataclass?
Python raises ValueError if you write items: list = [] in a dataclass field because mutable class-level defaults are a classic bug: every instance would share the SAME list, and appending to one instance's items would mutate the shared default. The fix is field(default_factory=list), which calls list() once per instance to create a fresh empty list. The same pattern applies to dict, set, and any custom mutable type. The dataclass machinery raises this at class-creation time rather than letting the bug appear at runtime, which is one of @dataclass's quiet wins over hand-written __init__.
When should I use frozen=True for a Python dataclass?
Use frozen=True when the instance represents an immutable value: configuration records, message envelopes, identifiers, anything that should not change after construction. Frozen dataclasses raise FrozenInstanceError on attribute assignment after __init__, and they automatically get a real __hash__ (so instances can serve as dict keys or set members). The trade-off: any computed-field logic inside __post_init__ must use object.__setattr__ to bypass the frozen check. For most domain-value cases the immutability is worth that small ceremony. Use frozen=False (the default) for mutable records like rows being updated, in-progress builders, or anything edited after construction.
What does slots=True do for a Python dataclass?
slots=True (added in Python 3.10) generates a __slots__ tuple listing exactly the dataclass's declared fields, which tells Python to skip the per-instance __dict__ and lay attributes out in a fixed-size array instead. The benefits are real: per-instance memory can drop by roughly 40 to 50% on common workloads, and attribute access can get a measurable speedup, though exact numbers depend on the Python version. The cost: instances no longer accept ad-hoc attributes (writing a new attribute name raises AttributeError), and __slots__ classes don't work with helpers that need an instance __dict__, such as functools.cached_property. Use slots=True for high-volume value objects (millions of instances, hot loops, memory-bound workloads); skip it for one-off configuration records where the savings don't matter.
When should I use NamedTuple, attrs, or Pydantic instead of @dataclass?
NamedTuple gives immutability and tuple-like positional access in one cheap built-in, but offers no validation and no per-field options such as repr or compare. Use it for lightweight return types from functions where (x, y) syntax matters. attrs (the third-party library that inspired @dataclass) offers more features like validators and converters and has a longer history, which makes it useful in mature codebases or when you need its specific advanced features. Pydantic adds runtime validation, JSON parsing and serialization, and schema generation, ideal for parsing untrusted input from APIs or config files. Choose @dataclass for in-process value objects where the standard library is preferred; choose the others when their specific differentiator (positional access, validation, serialization) matches your need.
The Bottom Line: One Decorator, Most of the Boilerplate Gone
@dataclass turns a type-annotated class into a fully functional value object with one line of decoration. Default behavior covers __init__, __repr__, and __eq__; opt-in parameters add immutability (frozen=True), ordering (order=True), memory efficiency (slots=True), and structured init hooks (__post_init__). The mutable default trap is the main pitfall; field(default_factory=...) fixes it. For runtime input validation reach for Pydantic; for positional return values use NamedTuple; for everything else, @dataclass is the standard-library winner. Next up: what metaclasses actually are (and when you genuinely need one). Or browse the full Python OOP guide.