
{"id":5037,"date":"2024-07-14T23:09:42","date_gmt":"2024-07-15T03:09:42","guid":{"rendered":"https:\/\/ikriv.com\/blog\/?p=5037"},"modified":"2024-07-14T23:09:42","modified_gmt":"2024-07-15T03:09:42","slug":"python-methods-vs-callable-attributes-and-deepcopy","status":"publish","type":"post","link":"https:\/\/ikriv.com\/blog\/?p=5037","title":{"rendered":"Python methods vs callable attributes and (deep)copy"},"content":{"rendered":"<p>This post describes more fundamental underpinnings of how <code>copy<\/code> and <code>deepcopy<\/code> interact with callable attributes. See <a href=\"https:\/\/ikriv.com\/blog\/?p=4987\">previous post<\/a> for some practical examples of unexpected behavior.<\/p>\n<h1>Methods vs Instance Attributes<\/h1>\n<p>Consider the following code. Does statement <code>self.method = self.method<\/code> alter the object state? Surprisingly, the answer is yes.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nclass TestClass1:\r\n    def method(self, arg):\r\n        print(self)\r\n        print(arg)\r\n    \r\n    def update(self):\r\n        self.method = self.method\r\n<\/pre>\n<p>In Python, methods are attributes of a class. Class <code>TestClass1<\/code> literally contains a free-standing function that takes two arguments: <code>self<\/code> and <code>arg<\/code>. We can even call it directly, and the first argument may not be of type <code>TestClass1<\/code>:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nTestClass1.method(&quot;Fake self&quot;, 42)\r\n\r\nOutput:\r\nFake self\r\n42\r\n<\/pre>\n<p>Now, consider this code:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\ninstance = TestClass1()\r\ninstance.method(99) \r\n\r\nOutput:\r\n&lt;__main__.TestClass1 object at 0x7c840fd12b90&gt;\r\n43\r\n<\/pre>\n<p>What happens when we refer to <code>instance.method<\/code>? There is no attribute named <code>\"method\"<\/code> in <code>instance<\/code>. The answer is, Python uses the &#8220;descriptor protocol&#8221;.<\/p>\n<h1>Descriptors<\/h1>\n<p>When we write something like <code>instance.attr<\/code>, Python performs several steps behind the scenes. Somewhat simplified version of the logic looks as follows:<\/p>\n<ol>\n<li>Python looks for attribute named <code>\"attr\"<\/code> in the <code>instance<\/code> object.<\/li>\n<li>If it is not found, Python looks for the attribute named <code>\"attr\"<\/code> in the <code>instance<\/code>&#8216;s class.<\/li>\n<li>If this attribute is found and has a <code>__get__()<\/code> method, it is considered a <em>descriptor<\/em>. Python calls the <code>__get__()<\/code> method, and returns the value it produces.<\/li>\n<li>Otherwise, Python returns the value of the class attribute value as is.<\/li>\n<li>If the attribute is not found in the class, Python raises an <code>AttributeError<\/code>.\n<\/ol>\n<p>Let&#8217;s test this in practice:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nclass CountDescriptor:\r\n    def __init__(self):\r\n        self.count = 0\r\n        \r\n    def __get__(self, obj, objtype = None):\r\n        self.count += 1\r\n        return self.count\r\n        \r\nclass TestClass2:\r\n    class_attr = 42\r\n    count = CountDescriptor()\r\n    def __init__(self):\r\n        self.instance_attr = 100\r\n        \r\ninstance = TestClass2()\r\nprint(instance.instance_attr) # 100, from the instance attribute\r\nprint(instance.class_attr)    #  42, from TestClass.class_attr\r\nprint(instance.count)         #   1  from TestClass.count.__get__()\r\nprint(instance.count)         #   2  from TestClass.count.__get__()     \r\n<\/pre>\n<h1>Functions are Descriptors<\/h1>\n<p>Going back to our previous example, <code>TestClass1.method<\/code> is a function, and type <code>function<\/code> does have a method named <code>__get__()<\/code>, making every function a descriptor.<br \/>\nWhen invoked, this method produces a &#8220;bound method&#8221; object. It&#8217;s a callable whose job is to prepend <code>instance<\/code> to its list of arguments, and call the original function. The bound method object keeps <code>instance<\/code> as an attribute, and thus is &#8220;bound&#8221; to it. This code is written in C, but the Python pseudo code looks like this:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nclass BoundMethod:\r\n    def __init__(self, func, obj):\r\n        self.__func__ = func\r\n        self.__self__ = obj\r\n\r\n    def __call__(self, *args, **kwargs):\r\n        func = self.__func__\r\n        obj = self.__self__\r\n        return func(obj, *args, **kwargs)\r\n\r\nclass Function:\r\n    ...\r\n\r\n    def __get__(self, obj, objtype=None):\r\n        if obj is None:\r\n            return self\r\n        return MethodType(self, obj)\r\n<\/pre>\n<p>Let&#8217;s check how it works:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nclass TestClass1:\r\n    def method(self, arg):\r\n        print(self)\r\n        print(arg)\r\n    \r\n    def update(self):\r\n        self.method = self.method\r\n        \r\ninstance = TestClass1()\r\nbound_method = TestClass1.method.__get__(instance, TestClass1)\r\n\r\nprint(f&quot;instance is {instance}&quot;)  # instance is &lt;__main__.TestClass1 object at 0x784b3023b190&gt;\r\n\r\n# we can now call instance.method in three different ways\r\nTestClass1.method(instance, 42)   # invocation 1\r\nbound_method(42)                  # invocation 2\r\ninstance.method(42)               # invocation 3\r\n\r\n# All three invocations print\r\n# &lt;__main__.TestClass1 object at 0x784b3023b190&gt;\r\n# 42\r\n<\/pre>\n<p>Each call to <code>instance.method<\/code> returns a new bound method object. The resulting objects are equal, but not identical.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nb1 = instance.method\r\nb2 = instance.method\r\nprint(bound_method)         # &lt;bound method TestClass1.method of &lt;__main__.TestClass1 object at 0x784b3023b190&gt;&gt;\r\nprint(b1)                   # &lt;bound method TestClass1.method of &lt;__main__.TestClass1 object at 0x784b3023b190&gt;&gt;\r\nprint(b2)                   # &lt;bound method TestClass1.method of &lt;__main__.TestClass1 object at 0x784b3023b190&gt;&gt;\r\nprint(b1 is b2)             # False\r\nprint(b1 == b2)             # True\r\nprint(bound_method == b1)   # True\r\nprint(bound_method == b2)   # True\r\n<\/pre>\n<h1>Bound Method as Instance Attribute<\/h1>\n<p>If the instance itself has attribute <code>\"method\"<\/code>, it turns off the descriptor mechanism. The attribute is now part of the instance state (<code>__dict__<\/code>), and the same object is returned each time.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nclass TestClass1:\r\n    def method(self, arg):\r\n        print(self)\r\n        print(arg)\r\n\r\n    def update(self):\r\n        self.method = self.method\r\n        \r\ninstance = TestClass1()\r\nprint(instance.__dict__)  # {} -- no instance attributes\r\nb1 = instance.method      \r\nprint(id(b1))             # 134705937658240\r\nb2 = instance.method      \r\nprint(id(b2))             # 134705937658368 - new object\r\n\r\ninstance.update()         # create new instance attribute named &quot;method&quot;\r\nprint(instance.__dict__)  # {&#039;method&#039;: &lt;bound method TestClass1.method of &lt;__main__.TestClass1 object at 0x7a83ab902dd0&gt;&gt;}\r\n\r\nb3 = instance.method      # 134705937658432\r\nprint(id(b3))\r\nb4 = instance.method      # 134705937658432 -- same as b3; descriptor code is no longer called\r\nprint(id(b4))\r\nb5 = instance.method      # 134705937658432 -- same as b3 and b4; descriptor code is no longer called\r\nprint(id(b5))\r\n<\/pre>\n<p>So, it is now clear that <code>self.method = self.method<\/code> does alter the visible behavior somewhat: the instance now has a new attribute, and its value is fixed, the calls are no longer going through the descriptor logic. This becomes important when we start copying things.<\/p>\n<p>References: <\/p>\n<ol>\n<li><a href=\"https:\/\/docs.python.org\/3\/howto\/descriptor.html#functions-and-methods\">Descriptor guide: Functions and Methods<\/a>.<\/li>\n<li><a href=\"https:\/\/python-reference.readthedocs.io\/en\/latest\/docs\/dunderdsc\/\">Descriptor Protocol<\/a>.<\/li>\n<\/ol>\n<h1>How Copy Works<\/h1>\n<p>The <a href=\"https:\/\/github.com\/python\/cpython\/blob\/3.12\/Lib\/copy.py\">source code for Python <code>copy()<\/code><\/a> is somewhat confusing. but barring special customizations the workflow is as follows:<\/p>\n<ol>\n<li><code>copy()<\/code> calls magic method <code>__reduce_ex__()<\/code> on whatever is being copied.<\/li>\n<li>Default implementation of this function found in <code>object<\/code> supplies a tuple with the following members:\n<ol>\n<li><code>func<\/code>: a function that creates new instance of the object without constructor.<\/li>\n<li><code>args<\/code>: tuple containing arguments to that function. Typically it has only one member: the object type.<\/li>\n<li><code>state<\/code>: the dictionary containing object state.<\/li>\n<li><code>listiter<\/code>: usually <code>None<\/code>.<\/li>\n<li><code>dictiter<\/code>: usually <code>None<\/code><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p>The copy logic then calls <code>_reconstruct()<\/code> function, that creates a new uninitialized object via <code>func(*args)<\/code>, and does <code>y.__dict__.update(state)<\/code>, where <code>y<\/code> is the new object. It therefore sets the object attributes directly, bypassing the <code>__init__()<\/code> method. All these details are specific to CPython, but it is a rather widespread implementation of Python.<\/p>\n<p>Let&#8217;s consider an example:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nclass TestClass3:\r\n    def __init__(self):\r\n        self.value = 42\r\n        self.method = self.amethod\r\n        \r\n    def amethod(self):\r\n        print(self.value)\r\n        \r\nsource_instance = TestClass3()\r\ndata = source_instance.__reduce_ex__(4)  # 4 is the current pickle protocol version\r\nprint(data)\r\n<\/pre>\n<p>Formatted output:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n(\r\n  &lt;function __newobj__ at 0x7bc5100f8ae0&gt;, \r\n  (&lt;class &#039;__main__.TestClass3&#039;&gt;,), \r\n  {&#039;value&#039;: 42, &#039;method&#039;: &lt;bound method TestClass3.amethod of &lt;__main__.TestClass3 object at 0x7bc50ff36dd0&gt;&gt;}, \r\n  None, \r\n  None\r\n)\r\n<\/pre>\n<p>The <code>state<\/code> dictionary contains the value 42, and the bound method, which is bound to the source instance.<\/p>\n<p>When we create a shallow copy of <code>source_instance<\/code>, it is still bound to the original instance:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nfrom copy import copy\r\ncopy_instance = copy(source_instance)\r\nprint(dest_instance.method.__self__ is source_instance)  # True\r\ncopy_instance.value = 100                  \r\ncopy_instance.amethod()                                  # 100\r\ncopy_instance.method()                                   # 42 &lt;--- it still uses  source_instance\r\n<\/pre>\n<p>This is, of course, not ideal, since calling <code>copy_instance.method()<\/code> will operate on <code>source_instance<\/code>, not on <code>copy_instance<\/code>.<\/p>\n<h1>How Deep Copy Works<\/h1>\n<p><code>deepcopy()<\/code> generally follows the same steps as <code>copy()<\/code>, but it recursively copies the arguments of the initialization function, and members of the object state, before passing the on. It also keeps track of already copied objects in `memo` dictionary, to prevent copying something twice. It is implemented by these lines in `copy.py`:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n  if deep and args:\r\n    args = (deepcopy(arg, memo) for arg in args)\r\n  ...\r\n  if deep:\r\n    state = deepcopy(state, memo)\r\n<\/pre>\n<p>Let&#8217;s re-examine the state dictionary in our example:<br \/>\n<code>{'value': 42, 'method': &lt;bound method TestClass3.amethod of &lt;__main__.TestClass3object at 0x7bc50ff36dd0&gt;&gt;}<\/code>.<\/p>\n<p>Certain types are subject to <code>_deepcopy_atomic<\/code>: the copy process simply returns the original object. Among other things, this applies to integers and functions, but not to bound methods. The reducer for the bound method returns the following data:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nprint(source_instance.method.__reduce_ex__(4))\r\n\r\nOutput:\r\n(&lt;built-in function getattr&gt;, (&lt;__main__.TestClass3 object at 0x7e27d788ad90&gt;, &#039;amethod&#039;))\r\n<\/pre>\n<p>Note that this data is somewhat unusual, the &#8220;new&#8221; function does not create a new object, but retrieves attribute <code>amethod<\/code> of the existing one. Function arguments will be also deep-copied. But when we do <code>deep = deepcopy(source_instance)<\/code>, note that <code>__main__.TestClass3 object at 0x7e27d788ad90<\/code> points to the original object <code>source_instance<\/code>, which already has been deep-copied to <code>deep<\/code>, and therefore <code>deepcopy()<\/code> replaces it with <code>deep<\/code>.<\/p>\n<p>As a result it almost accidentally does the right thing: newly created bound method object points to <code>deep<\/code>! Let&#8217;s verify it:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nfrom copy import deepcopy\r\n\r\ndeep = deepcopy(source_instance)\r\nprint(deep.method.__self__ is source_instance)  # False\r\ndeep.value = 100\r\ndeep.amethod()                      # 100\r\ndeep.method()                       # 100\r\n<\/pre>\n<h1>Deep Copy Can&#8217;t Change Bound Functions<\/h1>\n<p>Deep copy will properly copy bound methods, but it won&#8217;t deep copy functions that capture bound methods in their code. Example:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\ndef decorate(method):\r\n    def inner():\r\n        method()\r\n    return inner\r\n\r\nclass TestClass4:\r\n    def __init__(self):\r\n        self.value = 42\r\n        self.method = decorate(self.amethod)\r\n        \r\n    def amethod(self):\r\n        print(self.value)\r\n\r\nsource_instance = TestClass4()\r\ndata = source_instance.__reduce_ex__(4)\r\nprint(data)\r\n<\/pre>\n<p>Formatted output:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n(\r\n  &lt;function __newobj__ at 0x7e9dae9dcae0&gt;, \r\n  (&lt;class &#039;__main__.TestClass4&#039;&gt;,), \r\n  {&#039;value&#039;: 42, &#039;method&#039;: &lt;function decorate.&lt;locals&gt;.inner at 0x7e9dae80c720&gt;}, \r\n  None, \r\n  None\r\n)\r\n<\/pre>\n<p>Note that the state now contains a function object instead of a bound method. Functions are one of the special types whose copy is the original value itself. So, both regular copy and deep copy will keep the function bound to the original variable <code>foo<\/code>:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nfrom copy import copy, deepcopy\r\ncopy_instance = copy(source_instance)\r\ncopy_instance .value = 100\r\ncopy_instance .amethod()  # 100\r\ncopy_instance .method()   # 42 &lt;-- bound to source_instance\r\n\r\ndeep = deepcopy(source_instance)\r\ndeep.value = 100\r\ndeep.amethod()  # 100\r\ndeep.method()   # 42 &lt;-- bound to source_instance\r\n<\/pre>\n<h1>Disabling copy and deep copy<\/h1>\n<p>It is possible to disable <code>copy()<\/code> and <code>deepcopy()<\/code> by overriding magic methods <code>__copy__()<\/code> and <code>__deepcopy__()<\/code> respectfully and raising an error:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nfrom copy import copy, deepcopy\r\n\r\nclass Uncopyable:\r\n   def __copy__(self):\r\n       raise RuntimeError(&quot;Uncopyable class cannot be copied&quot;)\r\n\r\n   def __deepcopy__(self, memo):\r\n       raise RuntimeError(&quot;Uncopyable class cannot be deepcopied&quot;)\r\n\r\ninstance = Uncopyable()\r\ncopy(instance)      # RuntimeError: Uncopyable class cannot be copied\r\ndeepcopy(instance)  # RuntimeError: Uncopyable class cannot be deepcopied\r\n<\/pre>\n<p>Keep in mind, however that in Python user can always copy things manually, there is no such thing as private attributes.<\/p>\n<h1>Implementing Custom Copy<\/h1>\n<p>We can implement our own copy mechanism, instead of relying on the standard <code>copy()<\/code> that does not always to the right thing.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nfrom copy import copy\r\n\r\nclass CopyableIntroduction:\r\n    def __init__(self, name, default_language):\r\n        self.name = name\r\n        self._default_language = default_language  # logically const; changing it won&#039;t have effect\r\n        self.introduce = (self.introduce_in_english \r\n            if default_language == &quot;English&quot;\r\n            else self.introduce_in_french)\r\n \r\n    def introduce_in_english(self):\r\n        print(f&quot;I am {self.name}&quot;)\r\n \r\n    def introduce_in_french(self):\r\n        print(f&quot;Je m&#039;apelle {self.name}&quot;)\r\n\r\n    def __copy__(self):\r\n        return CopyableIntroduction(self.name, self._default_language)\r\n\r\n    def __deepcopy__(self, memo):\r\n        return self.__copy__()\r\n\r\npaul = CopyableIntroduction(&quot;Paul&quot;, &quot;English&quot;)\r\npaul.introduce()    # I am Paul\r\n\r\nharry = copy(paul)\r\nharry.name = &quot;Harry&quot;\r\nharry.introduce()   # I am Harry\r\n<\/pre>\n<p>Without the custom copier, <code>harry.introduce()<\/code> would print <code>\"I am Paul\"<\/code>, because mechanically copied <code>introduce<\/code> method would still be bound to <code>paul<\/code>.<\/p>\n<h1>Summary<\/h1>\n<p>Python provides a rather ingenious machinery to implement class methods, but it has some interesting side effects. <\/p>\n<ul>\n<li>Statements like <code>self.method = self.method<\/code> are not a no-op. They create a new instance attribute that contains a bound callable.<\/li>\n<li>Copying instances with self-bound callable attributes can get tricky. The callable in the copy may still operate on the original instance.<\/li>\n<li>Deepcopy &#8220;accidentally&#8221; copies bound methods correctly, but it can&#8217;t handle functions that capture and call bound methods.<\/li>\n<li>Also: <code>copy()<\/code> and <code>deepcopy<\/code> bypass <code>__init__()<\/code> by creating an uninitalized object and assigning its attributes directly. This means that copying bound methods cannot be fixed by adding precautions into the <code>__init__()<\/code> method.<\/li>\n<\/ul>\n<p>There are several ways to avoid copying bound callable attributes:<\/p>\n<ol>\n<li>Avoid bound callable attributes.<\/li>\n<li>Make objects that contain such attributes immutable. Although, this is easier said than done in Python.<\/li>\n<li>Avoid copying such objects. Disable <code>copy()<\/code> and <code>deepcopy<\/code> by overriding <code>__copy__()<\/code> and <code>__deepcopy__()<\/code> to raise an error.<\/li>\n<li>Implement custom copying logic by providing <code>__copy__()<\/code> and <code>__deepcopy__()<\/code> methods.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>This post describes more fundamental underpinnings of how copy and deepcopy interact with callable attributes. See previous post for some practical examples of unexpected behavior. Methods vs Instance Attributes Consider <a href=\"https:\/\/ikriv.com\/blog\/?p=5037\" class=\"more-link\">[&hellip;]<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"Layout":"","footnotes":""},"categories":[4],"tags":[],"class_list":["entry","author-ikriv","post-5037","post","type-post","status-publish","format-standard","category-hack"],"_links":{"self":[{"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5037","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5037"}],"version-history":[{"count":57,"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5037\/revisions"}],"predecessor-version":[{"id":5094,"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5037\/revisions\/5094"}],"wp:attachment":[{"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5037"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5037"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5037"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}