Python methods vs callable attributes and (deep)copy

This post describes the more fundamental underpinnings of how copy and deepcopy interact with callable attributes. See the previous post for some practical examples of unexpected behavior.

Methods vs Instance Attributes

Consider the following code. Does the statement self.method = self.method alter the object state? Surprisingly, the answer is yes.

class TestClass1:
    def method(self, arg):
        print(self)
        print(arg)
    
    def update(self):
        self.method = self.method

In Python, methods are attributes of a class. Class TestClass1 literally contains a free-standing function that takes two arguments: self and arg. We can even call it directly, and the first argument does not have to be of type TestClass1:

TestClass1.method("Fake self", 42)

Output:
Fake self
42

Now, consider this code:

instance = TestClass1()
instance.method(99) 

Output:
<__main__.TestClass1 object at 0x7c840fd12b90>
99

What happens when we refer to instance.method? There is no attribute named "method" in the instance itself. The answer is that Python uses the “descriptor protocol”.

Descriptors

When we write something like instance.attr, Python performs several steps behind the scenes. A somewhat simplified version of the logic looks as follows:

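  1. Python looks for an attribute named "attr" in the instance object.
  2. If it is not found, Python looks for an attribute named "attr" in the instance’s class (and its base classes).
  3. If this attribute is found and has a __get__() method, it is considered a descriptor. Python calls the __get__() method and returns the value it produces.
  4. Otherwise, Python returns the class attribute value as is.
  5. If the attribute is not found in the class, Python raises an AttributeError.

(In the full protocol, data descriptors, which also define __set__() or __delete__(), such as property, take precedence over instance attributes. Functions define only __get__(), so the simplified order above applies to them.)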
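The five steps above can be expressed as a short Python function. This is a minimal sketch, not CPython's actual lookup code: simplified_getattr() and Demo are hypothetical names, and the function deliberately ignores data descriptors, __getattr__() and __slots__.

```python
def simplified_getattr(obj, name):
    """Hypothetical helper following the simplified lookup steps.

    Unlike the real object.__getattribute__(), it ignores data
    descriptors, __getattr__() and __slots__.
    """
    # Step 1: look in the instance's own dictionary
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]
    # Step 2: look in the class and its base classes, in MRO order
    objtype = type(obj)
    for klass in objtype.__mro__:
        if name in vars(klass):
            attr = vars(klass)[name]
            # Step 3: a class attribute whose type defines __get__() is a descriptor
            getter = getattr(type(attr), "__get__", None)
            if getter is not None:
                return getter(attr, obj, objtype)
            # Step 4: otherwise return the class attribute as is
            return attr
    # Step 5: the attribute was not found anywhere
    raise AttributeError(name)


class Demo:
    class_attr = 42

    def __init__(self):
        self.instance_attr = 100

    def method(self):
        return "called"


d = Demo()
print(simplified_getattr(d, "instance_attr"))       # 100
print(simplified_getattr(d, "class_attr"))          # 42
print(simplified_getattr(d, "method")())            # called
print(simplified_getattr(d, "method") == d.method)  # True
```

For a plain function stored in the class, step 3 produces exactly the same bound method that normal attribute access returns.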

Let’s test this in practice:

class CountDescriptor:
    def __init__(self):
        self.count = 0

    def __get__(self, obj, objtype=None):
        self.count += 1
        return self.count

class TestClass2:
    class_attr = 42
    count = CountDescriptor()

    def __init__(self):
        self.instance_attr = 100

instance = TestClass2()
print(instance.instance_attr)  # 100, from the instance attribute
print(instance.class_attr)     #  42, from TestClass2.class_attr
print(instance.count)          #   1, from TestClass2.count.__get__()
print(instance.count)          #   2, from TestClass2.count.__get__()
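The __get__() method also receives the object being accessed and its type. A minimal sketch, using a hypothetical ShowArgs descriptor and Host class, shows what Python passes in, and that an instance attribute with the same name hides a descriptor that defines only __get__():

```python
class ShowArgs:
    """Hypothetical non-data descriptor: defines __get__() but not __set__()."""

    def __get__(self, obj, objtype=None):
        # Return the arguments Python passed in, so we can inspect them
        return (obj, objtype)


class Host:
    attr = ShowArgs()


h = Host()
print(h.attr[0] is h)     # True: accessed via an instance, obj is that instance
print(Host.attr[0])       # None: accessed via the class, obj is None
h.attr = "shadow"         # no __set__(), so this creates an instance attribute
print(h.attr)             # shadow: the instance __dict__ now wins
```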

Functions are Descriptors

Going back to our previous example, TestClass1.method is a function, and the function type defines a __get__() method, which makes every function a descriptor.
When invoked, this method produces a “bound method” object: a callable whose job is to prepend the instance to its argument list and call the original function. The bound method object keeps the instance as an attribute, and is thus “bound” to it. In CPython this is implemented in C, but equivalent Python pseudocode looks like this:

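class BoundMethod:
    def __init__(self, func, obj):
        self.__func__ = func   # the original function
        self.__self__ = obj    # the instance the method is bound to

    def __call__(self, *args, **kwargs):
        func = self.__func__
        obj = self.__self__
        return func(obj, *args, **kwargs)  # prepend the instance and call

class Function:
    ...  # the rest of the function type is omitted

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                # accessed via the class: plain function
        return BoundMethod(self, obj)  # accessed via an instance: bind it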
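The pseudocode above cannot wrap a real function as written, so here is a runnable variant under stated assumptions: PyFunction is a hypothetical wrapper class that plays the role of the built-in function type, and Greeter is a hypothetical example class. A method decorated with PyFunction goes through this BoundMethod model instead of the built-in machinery.

```python
class BoundMethod:
    """Pure-Python model of a bound method."""

    def __init__(self, func, obj):
        self.__func__ = func   # the wrapped function
        self.__self__ = obj    # the instance the method is bound to

    def __call__(self, *args, **kwargs):
        # Prepend the bound instance to the argument list
        return self.__func__(self.__self__, *args, **kwargs)


class PyFunction:
    """Hypothetical stand-in for the built-in function type."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                # accessed via the class
        return BoundMethod(self, obj)  # accessed via an instance


class Greeter:
    def __init__(self, name):
        self.name = name

    @PyFunction
    def greet(self, greeting):
        return f"{greeting}, {self.name}"


g = Greeter("Alice")
print(g.greet("Hello"))          # Hello, Alice
print(Greeter.greet(g, "Hi"))    # Hi, Alice
print(type(g.greet).__name__)    # BoundMethod
print(g.greet.__self__ is g)     # True
```

As with real methods, every access to g.greet creates a fresh BoundMethod object.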

Let’s check how it works:

class TestClass1:
    def method(self, arg):
        print(self)
        print(arg)
    
    def update(self):
        self.method = self.method
        
instance = TestClass1()
bound_method = TestClass1.method.__get__(instance, TestClass1)

print(f"instance is {instance}")  # instance is <__main__.TestClass1 object at 0x784b3023b190>

# we can now call instance.method in three different ways
TestClass1.method(instance, 42)   # invocation 1
bound_method(42)                  # invocation 2
instance.method(42)               # invocation 3

# All three invocations print
# <__main__.TestClass1 object at 0x784b3023b190>
# 42

Each access to instance.method returns a new bound method object. The resulting objects are equal, but not identical.

b1 = instance.method
b2 = instance.method
print(bound_method)         # <bound method TestClass1.method of <__main__.TestClass1 object at 0x784b3023b190>>
print(b1)                   # <bound method TestClass1.method of <__main__.TestClass1 object at 0x784b3023b190>>
print(b2)                   # <bound method TestClass1.method of <__main__.TestClass1 object at 0x784b3023b190>>
print(b1 is b2)             # False
print(b1 == b2)             # True
print(bound_method == b1)   # True
print(bound_method == b2)   # True
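The real bound method exposes the same two attributes as the BoundMethod pseudocode, and the standard types.MethodType constructor can build one by hand. A small check, with TestClass1 simplified to return its argument instead of printing:

```python
import types


class TestClass1:
    def method(self, arg):
        return arg


instance = TestClass1()
b1 = instance.method
print(b1.__func__ is TestClass1.method)  # True: the plain function from the class
print(b1.__self__ is instance)           # True: the instance it is bound to

# types.MethodType builds the same kind of bound method by hand
manual = types.MethodType(TestClass1.method, instance)
print(manual == b1)                      # True
print(type(manual) is type(b1))          # True
print(manual(7))                         # 7
```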

Bound Method as Instance Attribute

If the instance itself has an attribute named "method", the descriptor mechanism is bypassed. The attribute is now part of the instance state (its __dict__), and the same object is returned on every access.

class TestClass1:
    def method(self, arg):
        print(self)
        print(arg)

    def update(self):
        self.method = self.method
        
instance = TestClass1()
print(instance.__dict__)  # {} -- no instance attributes
b1 = instance.method
print(id(b1))             # 134705937658240
b2 = instance.method
print(id(b2))             # 134705937658368 -- new object

instance.update()         # create new instance attribute named "method"
print(instance.__dict__)  # {'method': <bound method TestClass1.method of <__main__.TestClass1 object at 0x7a83ab902dd0>>}

b3 = instance.method
print(id(b3))             # 134705937658432
b4 = instance.method
print(id(b4))             # 134705937658432 -- same as b3; descriptor code is no longer called
b5 = instance.method
print(id(b5))             # 134705937658432 -- same as b3 and b4; descriptor code is no longer called

So self.method = self.method does alter the object in a visible way: the instance now has a new attribute whose value is fixed, and lookups no longer go through the descriptor logic. This becomes important when we start copying things.
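The change is reversible: deleting the instance attribute lets the descriptor take over again. A short sketch, with TestClass1 simplified to return its argument:

```python
class TestClass1:
    def method(self, arg):
        return arg

    def update(self):
        self.method = self.method


instance = TestClass1()
instance.update()
print("method" in vars(instance))           # True
print(instance.method is instance.method)   # True: the same stored object

del instance.method                         # remove the instance attribute
print("method" in vars(instance))           # False
print(instance.method is instance.method)   # False: a new bound method per access
```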

References:

  1. Descriptor guide: Functions and Methods.
  2. Descriptor Protocol.

How Copy Works

The source code of copy() is somewhat convoluted, but barring special customizations the workflow is as follows:

  1. copy() calls the magic method __reduce_ex__(4) on whatever is being copied.
  2. The default implementation of this method, inherited from object, returns a tuple with the following members:
    1. func: a function that creates a new instance of the object without calling its __init__() method.
    2. args: a tuple of arguments to that function. Typically it has only one member: the object type.
    3. state: the dictionary containing the object state.
    4. listiter: usually None.
    5. dictiter: usually None.

The copy logic then calls the _reconstruct() function, which creates a new uninitialized object y via func(*args) and then calls y.__dict__.update(state). It therefore sets the object attributes directly, bypassing the __init__() method. All these details are specific to CPython, but CPython is the most widely used implementation of Python.
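The workflow above can be approximated in a few lines. This is a minimal sketch assuming a plain class with a __dict__ and no __copy__(), __setstate__() or __slots__; simple_copy() and Counter are hypothetical names, not part of the copy module.

```python
def simple_copy(x):
    """Hypothetical, simplified version of copy() for plain classes."""
    func, args, state, listiter, dictiter = x.__reduce_ex__(4)
    y = func(*args)                  # new object; __init__() is not called
    if state:
        y.__dict__.update(state)     # attributes are set directly
    return y


class Counter:
    init_calls = 0

    def __init__(self):
        type(self).init_calls += 1   # count how often __init__() runs
        self.value = 42


c = Counter()
d = simple_copy(c)
print(d.value, d is c)       # 42 False
print(Counter.init_calls)    # 1: __init__() ran only for c, not for the copy
```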

Let’s consider an example:

class TestClass3:
    def __init__(self):
        self.value = 42
        self.method = self.amethod
        
    def amethod(self):
        print(self.value)
        
source_instance = TestClass3()
data = source_instance.__reduce_ex__(4)  # copy() itself calls __reduce_ex__(4)
print(data)

Formatted output:

(
  <function __newobj__ at 0x7bc5100f8ae0>, 
  (<class '__main__.TestClass3'>,), 
  {'value': 42, 'method': <bound method TestClass3.amethod of <__main__.TestClass3 object at 0x7bc50ff36dd0>>}, 
  None, 
  None
)

The state dictionary contains the value 42 and the bound method, which is bound to the source instance.

When we create a shallow copy of source_instance, its method attribute is still bound to the original instance:

from copy import copy
copy_instance = copy(source_instance)
print(copy_instance.method.__self__ is source_instance)  # True
copy_instance.value = 100
copy_instance.amethod()                                  # 100
copy_instance.method()                                   # 42 <--- it still uses source_instance

This is, of course, not ideal, since calling copy_instance.method() will operate on source_instance, not on copy_instance.
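It is not just a method bound to the same instance: the shallow copy reuses the very same bound method object, because the state values are placed into the new __dict__ by reference. A quick check, with amethod() changed to return the value so it can be compared:

```python
from copy import copy


class TestClass3:
    def __init__(self):
        self.value = 42
        self.method = self.amethod

    def amethod(self):
        return self.value


source_instance = TestClass3()
copy_instance = copy(source_instance)

# The shallow copy holds the very same bound method object as the source
print(copy_instance.method is source_instance.method)    # True
print(copy_instance.method.__self__ is source_instance)  # True
copy_instance.value = 100
print(copy_instance.method())                             # 42
print(copy_instance.amethod())                            # 100
```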

How Deep Copy Works

deepcopy() generally follows the same steps as copy(), but it recursively copies the arguments of the creation function and the members of the object state before passing them on. It also keeps track of already copied objects in the `memo` dictionary, so that nothing is copied twice. It is implemented by these lines in `copy.py`:

  if deep and args:
    args = (deepcopy(arg, memo) for arg in args)
  ...
  if deep:
    state = deepcopy(state, memo)
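The memo dictionary is also what keeps shared references and cycles intact during a deep copy. An object that stores a bound method of itself forms exactly such a cycle (object -> __dict__ -> bound method -> object). A small illustration with plain lists:

```python
from copy import deepcopy

shared = [1, 2]
outer = [shared, shared]              # two references to one inner list
copied_outer = deepcopy(outer)
print(copied_outer[0] is copied_outer[1])  # True: the second reference reuses the first copy
print(copied_outer[0] is shared)           # False: the inner list itself was copied

cycle = []
cycle.append(cycle)                   # a list that contains itself
copied_cycle = deepcopy(cycle)
print(copied_cycle[0] is copied_cycle)     # True: the cycle points at the new list
```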

Let’s re-examine the state dictionary in our example:

{'value': 42, 'method': <bound method TestClass3.amethod of <__main__.TestClass3 object at 0x7bc50ff36dd0>>}

Certain types are handled by _deepcopy_atomic(), which simply returns the original object. Among other things, this applies to integers and functions, but not to bound methods. The reducer for a bound method returns the following data:

print(source_instance.method.__reduce_ex__(4))

Output:
(<built-in function getattr>, (<__main__.TestClass3 object at 0x7e27d788ad90>, 'amethod'))
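To compare the two cases side by side: deepcopy() returns a function unchanged, but produces a new bound method whose __self__ is a deep copy of the instance. The sketch below uses a simplified TestClass3 and a hypothetical plain_function(); here the bound method is deep-copied on its own (with a fresh memo), so the instance is copied too.

```python
from copy import deepcopy


def plain_function():
    return "plain"


class TestClass3:
    def __init__(self):
        self.value = 42

    def amethod(self):
        return self.value


source_instance = TestClass3()
bound = source_instance.amethod

# Functions are copied atomically: deepcopy() hands back the same object
print(deepcopy(plain_function) is plain_function)  # True

# Bound methods are not: the instance they are bound to gets deep-copied
copied = deepcopy(bound)
print(copied.__func__ is TestClass3.amethod)       # True: same underlying function
print(copied.__self__ is source_instance)          # False: bound to a new instance
print(copied())                                    # 42

# The reducer rebuilds an equal bound method via getattr(instance, "amethod")
func, args = bound.__reduce_ex__(4)
print(func(*args) == bound)                        # True
```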

This data is somewhat unusual: the “new” function does not create a new object, but retrieves the attribute amethod of an existing one. Its arguments are deep-copied as well. When we call deep = deepcopy(source_instance), the first argument, <__main__.TestClass3 object at 0x7e27d788ad90>, is the original source_instance. By then it has already been copied to deep (_reconstruct() stores the new object in memo before it deep-copies the state), so deepcopy() finds it in memo and substitutes deep.

As a result, deepcopy() almost accidentally does the right thing: the newly created bound method object is bound to deep! Let’s verify it:

from copy import deepcopy

deep = deepcopy(source_instance)
print(deep.method.__self__ is source_instance)  # False
deep.value = 100
deep.amethod()                      # 100
deep.method()                       # 100

Deep Copy Can’t Change Bound Functions

Deep copy properly copies bound methods, but it won’t deep-copy functions that capture bound methods in their closures. Example:

def decorate(method):
    def inner():
        method()
    return inner

class TestClass4:
    def __init__(self):
        self.value = 42
        self.method = decorate(self.amethod)
        
    def amethod(self):
        print(self.value)

source_instance = TestClass4()
data = source_instance.__reduce_ex__(4)
print(data)

Formatted output:

(
  <function __newobj__ at 0x7e9dae9dcae0>, 
  (<class '__main__.TestClass4'>,), 
  {'value': 42, 'method': <function decorate.<locals>.inner at 0x7e9dae80c720>}, 
  None, 
  None
)

Note that the state now contains a function object instead of a bound method. Functions are one of the special types that are copied atomically: the copy is the original object itself. So both copy() and deepcopy() keep the original function, whose closure still holds the method bound to source_instance:

from copy import copy, deepcopy
copy_instance = copy(source_instance)
copy_instance.value = 100
copy_instance.amethod()  # 100
copy_instance.method()   # 42 <-- bound to source_instance

deep = deepcopy(source_instance)
deep.value = 100
deep.amethod()  # 100
deep.method()   # 42 <-- bound to source_instance
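The captured bound method can be inspected directly through the closure of inner(). A sketch based on TestClass4, with inner() and amethod() changed to return instead of print:

```python
from copy import deepcopy


def decorate(method):
    def inner():
        return method()
    return inner


class TestClass4:
    def __init__(self):
        self.value = 42
        self.method = decorate(self.amethod)

    def amethod(self):
        return self.value


source_instance = TestClass4()
deep = deepcopy(source_instance)

# The only closure cell of inner() holds the bound method captured in __init__()
captured = deep.method.__closure__[0].cell_contents
print(deep.method is source_instance.method)  # True: the function was not copied
print(captured.__self__ is source_instance)   # True
deep.value = 100
print(deep.method())                          # 42
```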

Disabling copy and deep copy

It is possible to disable copy() and deepcopy() by overriding the magic methods __copy__() and __deepcopy__(), respectively, and raising an error:

from copy import copy, deepcopy

class Uncopyable:
    def __copy__(self):
        raise RuntimeError("Uncopyable class cannot be copied")

    def __deepcopy__(self, memo):
        raise RuntimeError("Uncopyable class cannot be deepcopied")

instance = Uncopyable()
copy(instance)      # RuntimeError: Uncopyable class cannot be copied
deepcopy(instance)  # RuntimeError: Uncopyable class cannot be deepcopied

Keep in mind, however, that in Python a user can always copy things manually: there is no such thing as truly private attributes.
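For example, nothing stops a caller from building a copy by hand, much like _reconstruct() does. A sketch with a hypothetical variant of Uncopyable that also stores a value:

```python
class Uncopyable:
    def __init__(self):
        self.value = 42

    def __copy__(self):
        raise RuntimeError("Uncopyable class cannot be copied")

    def __deepcopy__(self, memo):
        raise RuntimeError("Uncopyable class cannot be deepcopied")


original = Uncopyable()
clone = object.__new__(Uncopyable)      # create an instance without calling __init__()
clone.__dict__.update(vars(original))   # copy the attributes by hand
print(clone.value, clone is original)   # 42 False
```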

Implementing Custom Copy

We can implement our own copy mechanism instead of relying on the standard copy(), which does not always do the right thing.

from copy import copy

class CopyableIntroduction:
    def __init__(self, name, default_language):
        self.name = name
        self._default_language = default_language  # logically const; changing it later has no effect
        self.introduce = (self.introduce_in_english 
            if default_language == "English"
            else self.introduce_in_french)
 
    def introduce_in_english(self):
        print(f"I am {self.name}")
 
    def introduce_in_french(self):
        print(f"Je m'appelle {self.name}")

    def __copy__(self):
        return CopyableIntroduction(self.name, self._default_language)

    def __deepcopy__(self, memo):
        return self.__copy__()

paul = CopyableIntroduction("Paul", "English")
paul.introduce()    # I am Paul

harry = copy(paul)
harry.name = "Harry"
harry.introduce()   # I am Harry

Without the custom copier, harry.introduce() would print "I am Paul", because the mechanically copied introduce attribute would still be bound to paul.
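A more generic alternative is to copy the __dict__ and rebind every bound method that points back to the original. This is a sketch under one assumption: every bound method of self stored on the instance should follow the copy. RebindingCopy and Introduction are hypothetical names; the approach does not help with closures such as the TestClass4 case, and a __deepcopy__() would need similar treatment.

```python
import types
from copy import copy


class RebindingCopy:
    """Hypothetical mixin: copies __dict__ and rebinds methods bound to the original."""

    def __copy__(self):
        new = object.__new__(type(self))   # bypass __init__(), like copy() does
        for name, value in vars(self).items():
            if isinstance(value, types.MethodType) and value.__self__ is self:
                value = types.MethodType(value.__func__, new)  # rebind to the copy
            new.__dict__[name] = value
        return new


class Introduction(RebindingCopy):
    def __init__(self, name):
        self.name = name
        self.introduce = self.introduce_in_english

    def introduce_in_english(self):
        return f"I am {self.name}"


paul = Introduction("Paul")
harry = copy(paul)
harry.name = "Harry"
print(harry.introduce())   # I am Harry
print(paul.introduce())    # I am Paul
```

Unlike the CopyableIntroduction approach, this does not need to know the constructor arguments, at the cost of handling only callables stored as plain bound methods.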

Summary

Python provides rather ingenious machinery for implementing methods, but it has some interesting side effects.

  • Statements like self.method = self.method are not a no-op. They create a new instance attribute that contains a bound callable.
  • Copying instances with self-bound callable attributes can get tricky. The callable in the copy may still operate on the original instance.
  • Deepcopy “accidentally” copies bound methods correctly, but it can’t handle functions that capture and call bound methods.
  • Also: copy() and deepcopy() bypass __init__() by creating an uninitialized object and assigning its attributes directly. This means that problems with copied bound methods cannot be fixed by adding precautions to the __init__() method.

There are several ways to avoid copying bound callable attributes:

  1. Avoid bound callable attributes.
  2. Make objects that contain such attributes immutable. Although, this is easier said than done in Python.
  3. Avoid copying such objects. Disable copy() and deepcopy() by overriding __copy__() and __deepcopy__() to raise an error.
  4. Implement custom copying logic by providing __copy__() and __deepcopy__() methods.
