This post describes the more fundamental underpinnings of how `copy` and `deepcopy` interact with callable attributes. See the previous post for some practical examples of unexpected behavior.
Methods vs Instance Attributes
Consider the following code. Does the statement `self.method = self.method` alter the object state? Surprisingly, the answer is yes.

    class TestClass1:
        def method(self, arg):
            print(self)
            print(arg)

        def update(self):
            self.method = self.method

In Python, methods are attributes of a class. Class `TestClass1` literally contains a free-standing function that takes two arguments: `self` and `arg`. We can even call it directly, and the first argument need not be of type `TestClass1`:

    TestClass1.method("Fake self", 42)

Output:

    Fake self
    42

Now, consider this code:

    instance = TestClass1()
    instance.method(99)

Output:

    <__main__.TestClass1 object at 0x7c840fd12b90>
    99

What happens when we refer to `instance.method`? There is no attribute named `"method"` in `instance`. The answer is that Python uses the “descriptor protocol”.
Descriptors
When we write something like `instance.attr`, Python performs several steps behind the scenes. A somewhat simplified version of the logic looks as follows:
- Python looks for an attribute named `"attr"` in the `instance` object.
- If it is not found, Python looks for the attribute named `"attr"` in the `instance`'s class.
  - If this attribute is found and has a `__get__()` method, it is considered a descriptor. Python calls the `__get__()` method and returns the value it produces.
  - Otherwise, Python returns the value of the class attribute as is.
- If the attribute is not found in the class, Python raises an `AttributeError`.
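The simplified steps above can be sketched in plain Python. This is only an illustration: `simplified_getattr`, `Greeter` and its attributes are hypothetical names, and the sketch deliberately ignores details of the real lookup such as data descriptors taking priority over instance attributes and `__getattr__()` fallbacks.

```python
def simplified_getattr(instance, name):
    """Hypothetical helper that mirrors the simplified lookup steps."""
    # Step 1: look in the instance's own state.
    instance_dict = getattr(instance, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]

    # Step 2: look in the class and its base classes.
    for klass in type(instance).__mro__:
        if name in vars(klass):
            attr = vars(klass)[name]
            getter = getattr(type(attr), "__get__", None)
            if getter is not None:
                # The class attribute is a descriptor: let it produce the value.
                return getter(attr, instance, type(instance))
            return attr  # plain class attribute, returned as is

    # Step 3: nothing found anywhere.
    raise AttributeError(name)


class Greeter:
    greeting = "hello"

    def greet(self):
        return self.greeting


g = Greeter()
g.name = "instance attribute"

print(simplified_getattr(g, "name"))      # instance attribute
print(simplified_getattr(g, "greeting"))  # hello
print(simplified_getattr(g, "greet")())   # hello (greet is a function, so __get__ binds it)
```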
Let’s test this in practice:

    class CountDescriptor:
        def __init__(self):
            self.count = 0

        def __get__(self, obj, objtype=None):
            self.count += 1
            return self.count

    class TestClass2:
        class_attr = 42
        count = CountDescriptor()

        def __init__(self):
            self.instance_attr = 100

    instance = TestClass2()
    print(instance.instance_attr)  # 100, from the instance attribute
    print(instance.class_attr)     # 42, from TestClass2.class_attr
    print(instance.count)          # 1, produced by CountDescriptor.__get__()
    print(instance.count)          # 2, produced by CountDescriptor.__get__()

Functions are Descriptors
Going back to our previous example, `TestClass1.method` is a function, and the type `function` does have a method named `__get__()`, making every function a descriptor.
When invoked, this method produces a “bound method” object. It is a callable whose job is to prepend `instance` to its list of arguments and call the original function. The bound method object keeps `instance` as an attribute, and thus is “bound” to it. The real code is written in C, but in Python pseudo code it looks like this:

    class BoundMethod:
        def __init__(self, func, obj):
            self.__func__ = func
            self.__self__ = obj

        def __call__(self, *args, **kwargs):
            func = self.__func__
            obj = self.__self__
            return func(obj, *args, **kwargs)

    class Function:
        ...
        def __get__(self, obj, objtype=None):
            if obj is None:
                return self  # accessed through the class: return the plain function
            return BoundMethod(self, obj)  # the real type is types.MethodType

Let’s check how it works:

    class TestClass1:
        def method(self, arg):
            print(self)
            print(arg)

        def update(self):
            self.method = self.method

    instance = TestClass1()
    bound_method = TestClass1.method.__get__(instance, TestClass1)
    print(f"instance is {instance}")  # instance is <__main__.TestClass1 object at 0x784b3023b190>

    # we can now call instance.method in three different ways
    TestClass1.method(instance, 42)  # invocation 1
    bound_method(42)                 # invocation 2
    instance.method(42)              # invocation 3

    # All three invocations print
    # <__main__.TestClass1 object at 0x784b3023b190>
    # 42

Each access to `instance.method` returns a new bound method object. The resulting objects are equal, but not identical.

    b1 = instance.method
    b2 = instance.method
    print(bound_method)  # <bound method TestClass1.method of <__main__.TestClass1 object at 0x784b3023b190>>
    print(b1)            # <bound method TestClass1.method of <__main__.TestClass1 object at 0x784b3023b190>>
    print(b2)            # <bound method TestClass1.method of <__main__.TestClass1 object at 0x784b3023b190>>
    print(b1 is b2)            # False
    print(b1 == b2)            # True
    print(bound_method == b1)  # True
    print(bound_method == b2)  # True

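The two attributes stored by the pseudo code, `__func__` and `__self__`, are also visible on real bound methods, and the standard library exposes the bound method type as `types.MethodType`. The short sketch below builds a bound method by hand; the `Counter` class is a hypothetical example, not part of the code above.

```python
import types


class Counter:
    """Hypothetical class used only for this illustration."""

    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount
        return self.total


counter = Counter()
bound = counter.add

# A real bound method carries the same two attributes as the pseudo code.
print(bound.__self__ is counter)      # True
print(bound.__func__ is Counter.add)  # True

# types.MethodType builds a bound method by hand, much like __get__() does.
manual = types.MethodType(Counter.add, counter)
print(manual is bound)  # False: a separate object
print(manual == bound)  # True: same function, same instance
print(manual(5))        # 5
```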
Bound Method as Instance Attribute
If the instance itself has an attribute named `"method"`, it takes precedence over the function stored in the class, so the descriptor mechanism is no longer involved. The attribute is now part of the instance state (`__dict__`), and the same object is returned each time.

    class TestClass1:
        def method(self, arg):
            print(self)
            print(arg)

        def update(self):
            self.method = self.method

    instance = TestClass1()
    print(instance.__dict__)  # {} -- no instance attributes
    b1 = instance.method
    print(id(b1))  # 134705937658240
    b2 = instance.method
    print(id(b2))  # 134705937658368 -- new object

    instance.update()  # create new instance attribute named "method"
    print(instance.__dict__)  # {'method': <bound method TestClass1.method of <__main__.TestClass1 object at 0x7a83ab902dd0>>}
    b3 = instance.method
    print(id(b3))  # 134705937658432
    b4 = instance.method
    print(id(b4))  # 134705937658432 -- same as b3; descriptor code is no longer called
    b5 = instance.method
    print(id(b5))  # 134705937658432 -- same as b3 and b4; descriptor code is no longer called

So, it is now clear that `self.method = self.method` does alter the visible behavior somewhat: the instance now has a new attribute whose value is fixed, and accessing it no longer goes through the descriptor logic. This becomes important when we start copying things.
How Copy Works
The source code for Python's `copy()` is somewhat confusing, but barring special customizations the workflow is as follows:
- `copy()` calls the magic method `__reduce_ex__()` on whatever is being copied.
- The default implementation of this function, found in `object`, supplies a tuple with the following members:
  - `func`: a function that creates a new instance of the object without calling the constructor.
  - `args`: a tuple containing the arguments to that function. Typically it has only one member: the object type.
  - `state`: the dictionary containing the object state.
  - `listiter`: usually `None`.
  - `dictiter`: usually `None`.
The copy logic then calls the `_reconstruct()` function, which creates a new uninitialized object via `func(*args)` and does `y.__dict__.update(state)`, where `y` is the new object. It therefore sets the object attributes directly, bypassing the `__init__()` method. All these details are specific to CPython, but it is a rather widespread implementation of Python.
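The workflow above can be reproduced by hand for a simple object. The following is a minimal sketch, assuming a plain class with no copy customizations, no `__slots__` and a non-empty `__dict__`; the `Point` class and its `init_calls` counter are hypothetical and exist only to show that `__init__()` is not called again.

```python
class Point:
    """Hypothetical class; init_calls counts how often __init__() runs."""

    init_calls = 0

    def __init__(self, x, y):
        Point.init_calls += 1
        self.x = x
        self.y = y


original = Point(1, 2)

# The same call copy() makes: ask the object how to rebuild itself.
reduced = original.__reduce_ex__(4)
func, args, state = reduced[:3]

clone = func(*args)               # new uninitialized Point; __init__() is not called
if state:
    clone.__dict__.update(state)  # attributes are assigned directly

print(type(clone) is Point)   # True
print(clone is original)      # False
print(clone.x, clone.y)       # 1 2
print(Point.init_calls)       # 1 -- only the original went through __init__()
```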
Let’s consider an example:

    class TestClass3:
        def __init__(self):
            self.value = 42
            self.method = self.amethod

        def amethod(self):
            print(self.value)

    source_instance = TestClass3()
    data = source_instance.__reduce_ex__(4)  # 4 is the pickle protocol version that copy() passes
    print(data)

Formatted output:

    (
        <function __newobj__ at 0x7bc5100f8ae0>,
        (<class '__main__.TestClass3'>,),
        {'value': 42, 'method': <bound method TestClass3.amethod of <__main__.TestClass3 object at 0x7bc50ff36dd0>>},
        None,
        None
    )

The `state` dictionary contains the value 42 and the bound method, which is bound to the source instance.
When we create a shallow copy of `source_instance`, the `method` attribute of the copy is still bound to the original instance:

    from copy import copy

    copy_instance = copy(source_instance)
    print(copy_instance.method.__self__ is source_instance)  # True
    copy_instance.value = 100
    copy_instance.amethod()  # 100
    copy_instance.method()   # 42 <--- it still uses source_instance

This is, of course, not ideal, since calling `copy_instance.method()` will operate on `source_instance`, not on `copy_instance`.
How Deep Copy Works
`deepcopy()` generally follows the same steps as `copy()`, but it recursively copies the arguments of the initialization function and the members of the object state before passing them on. It also keeps track of already copied objects in the `memo` dictionary, to prevent copying something twice. It is implemented by these lines in `copy.py`:

    if deep and args:
        args = (deepcopy(arg, memo) for arg in args)
    ...
    if deep:
        state = deepcopy(state, memo)

Let’s re-examine the state dictionary in our example:

    {'value': 42, 'method': <bound method TestClass3.amethod of <__main__.TestClass3 object at 0x7bc50ff36dd0>>}

Certain types are subject to `_deepcopy_atomic`: the copy process simply returns the original object. Among other things, this applies to integers and functions, but not to bound methods. The reducer for the bound method returns the following data:

    print(source_instance.method.__reduce_ex__(4))

Output:

    (<built-in function getattr>, (<__main__.TestClass3 object at 0x7e27d788ad90>, 'amethod'))

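Note that this data is somewhat unusual: the “new” function does not create a new object, but retrieves the attribute `amethod` of an existing one. Under `deepcopy()`, the function arguments are deep-copied as well. When we do `deep = deepcopy(source_instance)`, the `__main__.TestClass3 object at 0x7e27d788ad90` in the arguments is the original object `source_instance`. By the time its state is copied, `source_instance` is already recorded in `memo` as being copied to `deep`, and therefore `deepcopy()` replaces it with `deep`. (Current CPython versions of `copy.py` also register a dedicated handler for bound methods, `_deepcopy_method()`, which rebuilds the method around a deep copy of `__self__` and arrives at the same result through the same `memo` lookup.)
As a result it almost accidentally does the right thing: the newly created bound method object points to `deep`! Let’s verify it: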

    from copy import deepcopy

    deep = deepcopy(source_instance)
    print(deep.method.__self__ is source_instance)  # False
    deep.value = 100
    deep.amethod()  # 100
    deep.method()   # 100

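The role of `memo` can be observed directly by passing our own dictionary to `deepcopy()`. This is a small sketch with a hypothetical `Holder` class; it relies on the CPython behavior that `memo` is keyed by the `id()` of the original object.

```python
from copy import deepcopy


class Holder:
    """Hypothetical class with a self-bound callable attribute."""

    def __init__(self):
        self.value = 1
        self.callback = self.show

    def show(self):
        return self.value


source = Holder()
memo = {}
clone = deepcopy(source, memo)

# memo maps id(original) to the copy, so nested references to the original
# are replaced with the copy instead of being copied again.
print(memo[id(source)] is clone)         # True
print(clone.callback.__self__ is clone)  # True

# Deep-copying the bound method on its own starts with an empty memo,
# so the instance it is bound to gets copied as well.
lonely = deepcopy(source.callback)
print(lonely.__self__ is source)         # False
print(type(lonely.__self__) is Holder)   # True
```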
Deep Copy Can’t Change Bound Functions
Deep copy will properly copy bound methods, but it won’t deep copy functions that capture bound methods in their closures. Example:

    def decorate(method):
        def inner():
            method()
        return inner

    class TestClass4:
        def __init__(self):
            self.value = 42
            self.method = decorate(self.amethod)

        def amethod(self):
            print(self.value)

    source_instance = TestClass4()
    data = source_instance.__reduce_ex__(4)
    print(data)

Formatted output:

    (
        <function __newobj__ at 0x7e9dae9dcae0>,
        (<class '__main__.TestClass4'>,),
        {'value': 42, 'method': <function decorate.<locals>.inner at 0x7e9dae80c720>},
        None,
        None
    )

Note that the state now contains a function object instead of a bound method. Functions are one of the special types whose copy is the original value itself. So, both regular copy and deep copy will keep the function bound to the original object, `source_instance`:

    from copy import copy, deepcopy

    copy_instance = copy(source_instance)
    copy_instance.value = 100
    copy_instance.amethod()  # 100
    copy_instance.method()   # 42 <-- bound to source_instance

    deep = deepcopy(source_instance)
    deep.value = 100
    deep.amethod()  # 100
    deep.method()   # 42 <-- bound to source_instance

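We can confirm where the stale reference lives by inspecting the closure of the copied function. The sketch below uses a hypothetical `Widget` class that mirrors the example above but returns values instead of printing them.

```python
from copy import deepcopy


def decorate(method):
    def inner():
        return method()
    return inner


class Widget:
    """Hypothetical stand-in for the class in the example above."""

    def __init__(self):
        self.value = 42
        self.method = decorate(self.amethod)

    def amethod(self):
        return self.value


source = Widget()
deep = deepcopy(source)

# Functions are treated as atomic: the copy shares the very same function object.
print(deep.method is source.method)  # True

# The bound method captured by the closure still points at the original instance.
captured = deep.method.__closure__[0].cell_contents
print(captured.__self__ is source)   # True

deep.value = 100
print(deep.amethod())  # 100
print(deep.method())   # 42
```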
Disabling copy and deep copy
It is possible to disable `copy()` and `deepcopy()` by overriding the magic methods `__copy__()` and `__deepcopy__()` respectively, and raising an error:

    from copy import copy, deepcopy

    class Uncopyable:
        def __copy__(self):
            raise RuntimeError("Uncopyable class cannot be copied")

        def __deepcopy__(self, memo):
            raise RuntimeError("Uncopyable class cannot be deepcopied")

    instance = Uncopyable()
    copy(instance)      # RuntimeError: Uncopyable class cannot be copied
    deepcopy(instance)  # RuntimeError: Uncopyable class cannot be deepcopied

Keep in mind, however, that in Python a user can always copy things manually: there is no such thing as private attributes.
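As an illustration of such a manual copy, the sketch below sidesteps the overridden magic methods by creating a bare object and copying its `__dict__`. The `value` attribute is added here only so that there is some state to copy; it is not part of the example above.

```python
from copy import copy


class Uncopyable:
    def __init__(self):
        self.value = 42  # hypothetical state, added for this illustration

    def __copy__(self):
        raise RuntimeError("Uncopyable class cannot be copied")

    def __deepcopy__(self, memo):
        raise RuntimeError("Uncopyable class cannot be deepcopied")


instance = Uncopyable()

try:
    copy(instance)
except RuntimeError as error:
    print(error)  # Uncopyable class cannot be copied

# Nothing prevents a determined user from rebuilding the object by hand.
clone = Uncopyable.__new__(Uncopyable)    # bare object, __init__() not called
clone.__dict__.update(instance.__dict__)  # shallow copy of the state

print(clone is instance)  # False
print(clone.value)        # 42
```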
Implementing Custom Copy
We can implement our own copy mechanism instead of relying on the standard `copy()`, which does not always do the right thing.

    from copy import copy

    class CopyableIntroduction:
        def __init__(self, name, default_language):
            self.name = name
            self._default_language = default_language  # logically const; changing it won't have effect
            self.introduce = (self.introduce_in_english
                              if default_language == "English"
                              else self.introduce_in_french)

        def introduce_in_english(self):
            print(f"I am {self.name}")

        def introduce_in_french(self):
            print(f"Je m'appelle {self.name}")

        def __copy__(self):
            return CopyableIntroduction(self.name, self._default_language)

        def __deepcopy__(self, memo):
            return self.__copy__()

    paul = CopyableIntroduction("Paul", "English")
    paul.introduce()   # I am Paul
    harry = copy(paul)
    harry.name = "Harry"
    harry.introduce()  # I am Harry

Without the custom copier, `harry.introduce()` would print `"I am Paul"`, because the mechanically copied `introduce` method would still be bound to `paul`.
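Re-running the constructor is one option. Another possible approach, sketched below under the assumption that all state lives in `__dict__`, is to copy the attributes mechanically and rebind any bound method that points back at the source object. `RebindingCopyMixin` and `Introduction` are hypothetical names, and the sketch does not handle bound methods hidden inside closures or containers.

```python
import types
from copy import copy


class RebindingCopyMixin:
    """Hypothetical mixin: shallow-copies state and rebinds self-bound methods."""

    def __copy__(self):
        cls = type(self)
        clone = cls.__new__(cls)  # skip __init__(), like the default copy does
        for name, value in self.__dict__.items():
            if isinstance(value, types.MethodType) and value.__self__ is self:
                # Bound to the source object: rebind the same function to the clone.
                value = types.MethodType(value.__func__, clone)
            clone.__dict__[name] = value
        return clone


class Introduction(RebindingCopyMixin):
    def __init__(self, name):
        self.name = name
        self.introduce = self.introduce_in_english

    def introduce_in_english(self):
        return f"I am {self.name}"


paul = Introduction("Paul")
harry = copy(paul)
harry.name = "Harry"

print(harry.introduce())  # I am Harry
print(paul.introduce())   # I am Paul
```

The trade-off is that this keeps the attribute-by-attribute behavior of the standard copy, so it also preserves any state that was changed after construction, while the constructor-based version resets everything that `__init__()` computes.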
Summary
Python provides rather ingenious machinery to implement methods of a class, but it has some interesting side effects.
- Statements like `self.method = self.method` are not a no-op. They create a new instance attribute that contains a bound callable.
- Copying instances with self-bound callable attributes can get tricky. The callable in the copy may still operate on the original instance.
- Deepcopy “accidentally” copies bound methods correctly, but it can’t handle functions that capture and call bound methods.
- Also: `copy()` and `deepcopy()` bypass `__init__()` by creating an uninitialized object and assigning its attributes directly. This means that copying bound methods cannot be fixed by adding precautions to the `__init__()` method.
There are several ways to avoid copying bound callable attributes:
- Avoid bound callable attributes.
- Make objects that contain such attributes immutable, although this is easier said than done in Python.
- Avoid copying such objects. Disable `copy()` and `deepcopy()` by overriding `__copy__()` and `__deepcopy__()` to raise an error.
- Implement custom copying logic by providing `__copy__()` and `__deepcopy__()` methods.