classification
Title: dataclasses.asdict() incorrectly calls __deepcopy__() on values.
Type: behavior
Components: Library (Lib)
process
Status: open
Nosy List: eric.smith, tfish2
Priority: normal

Created on 2021-09-08 09:18 by tfish2, last changed 2021-09-08 17:39 by tfish2.

Messages (3)
msg401360 Author: Thomas Fischbacher (tfish2) Date: 2021-09-08 09:18
This problem may also be the issue underlying some other dataclasses.asdict() bugs:

https://bugs.python.org/issue?%40columns=id%2Cactivity%2Ctitle%2Ccreator%2Cassignee%2Cstatus%2Ctype&%40sort=-activity&%40filter=status&%40action=searchid&ignore=file%3Acontent&%40search_text=dataclasses.asdict&submit=search&status=-1%2C1%2C2%2C3

The documentation of dataclasses.asdict() states:

https://docs.python.org/3/library/dataclasses.html#dataclasses.asdict

===
Converts the dataclass instance to a dict (by using the factory function dict_factory). Each dataclass is converted to a dict of its fields, as name: value pairs. dataclasses, dicts, lists, and tuples are recursed into. For example: (...)
===

Given this documentation, the expectation about behavior is roughly:

def _dataclasses_asdict_equivalent_helper(obj, dict_factory=dict):
  rec = lambda x: (
    _dataclasses_asdict_equivalent_helper(x,
                                          dict_factory=dict_factory))
  if isinstance(obj, (list, tuple)):
    return type(obj)(rec(x) for x in obj)
  elif isinstance(obj, dict):
    return type(obj)((k, rec(v)) for k, v in obj.items())
  elif not hasattr(type(obj), '__dataclass_fields__'):
    # ^ approx check for "is this a dataclass instance"?
    # Not 100% correct. For illustration only.
    # Any other value is returned as the identical object, uncopied.
    return obj
  # Otherwise, we are looking at a dataclass-instance.
  ret = []
  for field in type(obj).__dataclass_fields__:
    val = getattr(obj, field)
    ret.append((field, rec(val)))
  return dict_factory(ret)

def dataclasses_asdict_equivalent(x, dict_factory=dict):
  if not hasattr(type(x), '__dataclass_fields__'):
    raise ValueError(f'Not a dataclass: {x!r}')
  return _dataclasses_asdict_equivalent_helper(x, dict_factory=dict_factory)


In particular, field values that are not dicts, lists, tuples, or dataclass instances are expected to end up in the result as the identical objects, uncopied.

What actually happens, however, is that asdict() DOES call __deepcopy__ on field values it has no business inspecting:

===
import dataclasses


@dataclasses.dataclass
class Demo:
  field_a: object


class Obj:
  def __init__(self, x):
    self._x = x

  def __deepcopy__(self, *args):
    raise ValueError('BOOM!')


###
d1 = Demo(field_a=Obj([1,2,3]))
dd = dataclasses.asdict(d1)

# ...Execution does run into a "BOOM!" ValueError.
===
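
The copying can also be observed without a raising __deepcopy__. The following is a minimal sketch, assuming CPython's current implementation, in which values that are not dataclasses, lists, tuples, or dicts are passed to copy.deepcopy(). The names Holder and Plain are made up for this illustration.

```python
import dataclasses


@dataclasses.dataclass
class Holder:
    payload: object


class Plain:
    """An ordinary object with no custom copy hooks."""


original = Holder(payload=Plain())
converted = dataclasses.asdict(original)

# The value stored in the dict is a copy, not the object held by the dataclass.
print(converted['payload'] is original.payload)  # False
```

An identity check like this is one way for a test to detect the copy even when nothing raises.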

Apart from this, it would be very useful if dataclasses.asdict() had a boolean recurse parameter with which one can turn off recursive translation of value objects.
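
As a rough sketch of what such a switch could look like from the outside: the wrapper below is hypothetical (asdict_with_recurse_flag and its recurse argument are not part of the dataclasses module). With recurse=False it returns the field values as the identical objects; with recurse=True it simply delegates to the standard asdict(), including its copying behavior.

```python
import dataclasses


def asdict_with_recurse_flag(obj, *, recurse=True, dict_factory=dict):
    """Hypothetical helper; not an API of the dataclasses module."""
    if not dataclasses.is_dataclass(obj) or isinstance(obj, type):
        raise TypeError(f'Not a dataclass instance: {obj!r}')
    if recurse:
        # Delegate to the standard library, deepcopy behavior included.
        return dataclasses.asdict(obj, dict_factory=dict_factory)
    # Shallow conversion: each field value is stored uncopied.
    return dict_factory(
        [(f.name, getattr(obj, f.name)) for f in dataclasses.fields(obj)])


@dataclasses.dataclass
class Demo:
    field_a: object


class Obj:
    def __deepcopy__(self, memo):
        raise ValueError('BOOM!')


d1 = Demo(field_a=Obj())
shallow = asdict_with_recurse_flag(d1, recurse=False)
print(shallow['field_a'] is d1.field_a)  # True
```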
msg401393 Author: Eric V. Smith (eric.smith) (Python committer) Date: 2021-09-08 15:47
The intent was that asdict() returns something that, if mutated, doesn't affect the original object tree. I'd sort of like to just deprecate it; it's got a lot of corner cases that are poorly handled.

It probably needs the same kind of controls that attrs.asdict() does.
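
The mutation-isolation intent can be illustrated with a small sketch. The classes Inner and Outer are invented for this example; the set field is included because sets are not among the container types that asdict() recurses into, so isolating them relies on the copy.

```python
import dataclasses


@dataclasses.dataclass
class Inner:
    tags: list


@dataclasses.dataclass
class Outer:
    inner: Inner
    seen: set


original = Outer(inner=Inner(tags=['a']), seen={1})
snapshot = dataclasses.asdict(original)

# Mutate the snapshot at two levels.
snapshot['inner']['tags'].append('b')  # list: rebuilt during recursion
snapshot['seen'].add(2)                # set: not recursed into, so copied

print(original.inner.tags)  # ['a']
print(original.seen)        # {1}
```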
msg401418 Author: Thomas Fischbacher (tfish2) Date: 2021-09-08 17:39
The current behavior deviates from the documentation in a way that might evade tests and hence has the potential to cause production outages.

Is there a way to fix the documentation so that it correctly describes current behavior - without having to wait for a new release? Eliminating the risk in such a way would be highly appreciated.

In the longer run, there may be some value in having a differently named method (perhaps .as_dict()?) that basically returns {k: v for k, v in self.__dict__.items()}, but without going through reflection. The current approach to recursion looks as if it were based on quite a few doubtful assumptions.
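
For illustration only, such a method could be sketched as a mixin. AsDictMixin is a hypothetical name, and the sketch assumes instances have a __dict__ (so it would not work with slots=True) and that any extra instance attributes besides the fields may be included.

```python
import dataclasses


class AsDictMixin:
    """Hypothetical mixin; as_dict() is not provided by dataclasses."""

    def as_dict(self):
        # Shallow copy of the instance dictionary: no recursion, no deepcopy.
        return dict(vars(self))


@dataclasses.dataclass
class Point(AsDictMixin):
    x: object
    y: object


payload = [1, 2, 3]
p = Point(x=payload, y='label')
d = p.as_dict()
print(d['x'] is payload)  # True
```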

(Context: some style guides, such as Google's Python style guide, limit the use of reflection in order to keep some overall undesirable processes in check: https://google.github.io/styleguide/pyguide.html#2191-definition)
History
Date                 User        Action  Args
2021-09-08 17:39:37  tfish2      set     messages: + msg401418
2021-09-08 15:47:13  eric.smith  set     messages: + msg401393
2021-09-08 09:24:50  xtreak      set     nosy: + eric.smith
2021-09-08 09:18:41  tfish2      create