Python: modifying imported value

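TL;DR: The Python statement from module import obj creates a new variable in the importing scope. That variable holds a copy of the reference, not a copy of the object, and it is not an alias for module.obj.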

In other words, from module import obj is roughly equivalent to obj = __import__('module').obj. It creates a new module-level variable that initially refers to the same object, but is otherwise independent of the original module.obj. The Python documentation says, somewhat cryptically, that “it binds the results of that [import] search to a name in the local scope”, which as far as I understand is equivalent to “creates a new variable”.
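To make the "new variable, same object" point concrete, here is a minimal single-file sketch. The module name demo_module is hypothetical, and the module is built in memory with types.ModuleType only so that the example does not need a second file; a real module on disk should behave the same way.

```python
import sys
import types

# Build a throwaway module in memory so the example fits in one file.
# The name "demo_module" is hypothetical.
demo_module = types.ModuleType("demo_module")
demo_module.obj = ["original"]
sys.modules["demo_module"] = demo_module

from demo_module import obj  # binds a new name `obj` in this namespace

# No copy was made: both names refer to the very same list object.
same_object_after_import = obj is demo_module.obj

# Rebinding our name leaves demo_module.obj untouched.
obj = ["replacement"]
same_object_after_rebind = obj is demo_module.obj

print(same_object_after_import)   # True
print(same_object_after_rebind)   # False
print(demo_module.obj)            # ['original']
```

The first check shows that the import did not duplicate the list, and the second shows that the two names go their separate ways as soon as one of them is reassigned.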

This is sort of obvious after the fact, but it took me a while to fully internalize all the consequences. I subconsciously assumed that from module import obj is just syntactic sugar that lets you write obj instead of module.obj, similar to how using works in C++ and C#, and import in Java. I already knew that this is not the case when patching is involved, but I did not think much about it.

None of the above matters as long as imported variables are not re-assigned. If they point to a mutable object, such as a list, all modifications to that list will be visible through all variables. However, assigning a new list to one variable will not be reflected in the others.

# module_a.py
global_list = []                 # declares module_a.global_list

# module_b.py
from module_a import global_list # module_b.global_list = module_a.global_list

# main.py
from module_a import global_list # main.global_list = module_a.global_list
global_list.append(42)           # the change is visible in all modules, the list now contains [42]
global_list = [1,2,3,4]          # the change only affects main.global_list, other lists still contain [42]

However, if we only import the modules, no new variables are created.

# module_a.py
global_list = []                 # module_a.global_list

# module_b.py
import module_a
# ... code that uses module_a.global_list ...

# main.py
import module_a 
module_a.global_list.append(42)   # the change is visible in all modules
module_a.global_list = [1,2,3,4]  # the change is visible in modules that use module_a.global_list 
                                  # modules that use from module_a import global_list keep the old value
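The two snippets above can be combined into one runnable sketch. As an assumption made purely for brevity, module_a is created in memory instead of living in its own file, and imported_name stands in for the global_list variable of a module that used from module_a import global_list.

```python
import sys
import types

# Hypothetical in-memory stand-in for module_a.py.
module_a = types.ModuleType("module_a")
module_a.global_list = []
sys.modules["module_a"] = module_a

# Stand-in for a module that did `from module_a import global_list`.
from module_a import global_list as imported_name

# Mutation goes through the shared object, so every name sees it.
module_a.global_list.append(42)
print(imported_name)            # [42]

# Rebinding the module attribute replaces only module_a.global_list.
module_a.global_list = [1, 2, 3, 4]
print(module_a.global_list)     # [1, 2, 3, 4]
print(imported_name)            # [42]
```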

The situation may become rather chaotic if a mix of import module and from module import obj is used
throughout the code with the same variables.

Consider this example:

#=================== config.py =================== 
NUM_THREADS = 10

#===================  runner_a.py =================== 
import config

def run_a():
    print(f"Runner A uses {config.NUM_THREADS} threads")

#===================  runner_b.py =================== 
from config import NUM_THREADS

def run_b():
    print(f"Runner B uses {NUM_THREADS} threads")

#===================  main.py =================== 
import config
import runner_a
import runner_b

config.NUM_THREADS = 20
runner_a.run_a()
runner_b.run_b()

This prints

Runner A uses 20 threads
Runner B uses 10 threads 

I spent about an hour today trying to debug a problem similar to this one.

The discrepancy can be fixed by never reassigning the top-level config variable, and making the modifiable parts attributes of the object it refers to. This is a little bit more verbose, but since every module shares the same object no matter how it was imported, it is much harder to end up with different config values in different modules.

#===================  config.py =================== 
class Config:
    def __init__(self):
        self.NUM_THREADS = 10

global_config = Config()

#===================  runner_a.py =================== 
import config

def run_a():
    print(f"Runner A uses {config.global_config.NUM_THREADS} threads")

#===================  runner_b.py =================== 
from config import global_config

def run_b():
    print(f"Runner B uses {global_config.NUM_THREADS} threads")

#===================  main.py =================== 
import config
import runner_a
import runner_b

config.global_config.NUM_THREADS = 20
runner_a.run_a()
runner_b.run_b()

This version prints

Runner A uses 20 threads
Runner B uses 20 threads 

Better yet, avoid the global config altogether and pass the config object explicitly to the code that needs it, to prevent any possibility of confusion.

#===================  config.py =================== 
class Config:
    def __init__(self):
        self.NUM_THREADS = 10

#===================  runner_a.py =================== 
import config

def run_a(cfg: config.Config) -> None:
    print(f"Runner A uses {cfg.NUM_THREADS} threads")

#===================  runner_b.py =================== 
import config

def run_b(cfg: config.Config) -> None:
    print(f"Runner B uses {cfg.NUM_THREADS} threads")

#===================  main.py =================== 
import config
import runner_a
import runner_b

main_config = config.Config()
main_config.NUM_THREADS = 20
runner_a.run_a(main_config)
runner_b.run_b(main_config)

The conclusion is two-fold:

Avoid assuming that constructs in different programming languages work the same way, even when they look alike and do similar things. Java import is not the same as Python import.

Avoid mixing from module import obj and import module ... module.obj in the same codebase. Prefer the latter if obj may be reassigned to a new value. This includes both production usage and “patching” for unit tests.
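The patching case deserves a quick illustration. The sketch below reuses the config and runner_b names from the earlier example, but builds both modules in memory (an assumption made so that it runs as a single file), and threads() is a hypothetical helper that returns the value instead of printing it. It uses unittest.mock.patch from the standard library.

```python
import sys
import types
from unittest import mock

# Hypothetical in-memory stand-ins for config.py and runner_b.py.
config = types.ModuleType("config")
config.NUM_THREADS = 10
sys.modules["config"] = config

runner_b = types.ModuleType("runner_b")
exec(
    "from config import NUM_THREADS\n"
    "def threads():\n"
    "    return NUM_THREADS\n",
    runner_b.__dict__,
)
sys.modules["runner_b"] = runner_b

# Patching the name in config does not reach runner_b's own binding.
with mock.patch("config.NUM_THREADS", 20):
    seen_when_patching_config = runner_b.threads()

# Patching the name where it is looked up does.
with mock.patch("runner_b.NUM_THREADS", 20):
    seen_when_patching_runner = runner_b.threads()

print(seen_when_patching_config)   # 10
print(seen_when_patching_runner)   # 20
print(runner_b.threads())          # 10, the patch is undone on exit
```

If runner_b had used import config and read config.NUM_THREADS instead, patching config.NUM_THREADS would have been enough, which is one more reason to prefer that style for values that tests need to replace.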

Happy importing!
