Dataclasses and mutable defaults
Posted on Tue 14 January 2020 in programming
One common Python gotcha is the use of mutable objects as defaults for function keyword arguments. There are approximately one billion questions on SO about this or nice discussions elsewhere. I came across a nice feature in Python's dataclasses library that addresses a similar problem.
Mutable defaults are bad
As a reminder, in this example, every invocation of foo without a parameter for d
provided by the caller would modify the same default dictionary (equivalent to the second
example).
# this...
def foo(d={}):
...
# is equivalent to this
default = {} # this is a global object!
def foo(d=default):
...
You always want to instead use a value like None
and check in the function or method body
whether a default needs to be supplied, and document the function's behavior accordingly as
it cannot be inferred from the type signatures.
# better
def foo(d=None):
"""Does foo
Args:
d (dict): some parameter. Defaults to {}.
"""
if d is None:
d = {}
...
Named tuples
A similar gotcha is using mutable defaults for named tuples.
from typing import NamedTuple
class Foo(NamedTuple):
d: dict = {} # no
Again, every instance of Foo
that uses the default value will share a reference to a
single underlying object. Instead, set the default to None
and require consumers to check
whether d is None
.
Dataclasses
Dataclasses are like named tuples, but better in every way. One feature I recently discovered is the ability to use a default factory to avoid this issue.
from dataclasses import dataclass, field
@dataclass
class Foo:
d: dict = field(default_factory=dict)
With this paradigm, every time a default needs to be supplied for d
, the factory function
will be called, creating a new dictionary object each time and side-stepping the gotcha.
Additionally, consumers of Foo
can always assume that d
is a dict and simplify their
code accordingly.