TL;DR: The Python statement from module import obj creates a new variable in the importing scope that refers to the same object; it does not copy the object, and it does not create an alias for module.obj. In other words, from module import obj is roughly equivalent to obj = __import__('module').obj. It creates a new module-level variable, which is independent of the original module.obj. The Python documentation says, somewhat cryptically, that “it binds the results of that [import] search to a name in the local scope”, but as far as I understand this is equivalent to “creates a new variable”.
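A minimal sketch to check what the import actually binds. To keep it in a single file, it registers a throwaway in-memory module under the hypothetical name demo_module instead of using a real demo_module.py; the names demo_module and obj are illustrative only.

```python
import sys
import types

# Hypothetical in-memory module standing in for a real file demo_module.py.
demo_module = types.ModuleType("demo_module")
demo_module.obj = [1, 2, 3]
sys.modules["demo_module"] = demo_module

from demo_module import obj

# Right after the import, both names refer to the very same list object.
same_object = obj is demo_module.obj

# Rebinding the imported name leaves the attribute on the module untouched.
obj = ["something", "else"]
module_value = demo_module.obj

print(same_object)   # True
print(module_value)  # [1, 2, 3]
```

The identity check shows that no copy of the list is made; only the name is new, which is why rebinding it has no effect on the module.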
This is sort of obvious after the fact, but it took me a while to fully internalize all the consequences. I subconsciously assumed that from module import obj is just syntactic sugar that allows writing obj instead of module.obj, similar to how using works in C++ and C#, and import in Java. I already knew that this is not the case when patching is involved, but I did not think much about it.
None of the above matters as long as imported variables are not reassigned. If they point to a mutable object, such as a list, all modifications to that list will be visible through all variables. However, assigning a new list to one variable will not be reflected in the others.
# module_a.py
global_list = []  # declares module_a.global_list

# module_b.py
from module_a import global_list  # module_b.global_list = module_a.global_list

# main.py
from module_a import global_list  # main.global_list = module_a.global_list
global_list.append(42)  # the change is visible in all modules, the list now contains [42]
global_list = [1, 2, 3, 4]  # the change only affects main.global_list, other lists still contain [42]
However, if we only import the modules, no new variables are created.
# module_a.py
global_list = []  # module_a.global_list

# module_b.py
import module_a
...module_a.global_list...

# main.py
import module_a
module_a.global_list.append(42)  # the change is visible in all modules
module_a.global_list = [1, 2, 3, 4]  # the change is visible in modules that use module_a.global_list
# modules that use from module_a import global_list keep the old value
The situation may become rather chaotic if a mix of import module and from module import obj is used throughout the code with the same variables.
Consider this example:
#=================== config.py ===================
NUM_THREADS = 10

#=================== runner_a.py ===================
import config

def run_a():
    print(f"Runner A uses {config.NUM_THREADS} threads")

#=================== runner_b.py ===================
from config import NUM_THREADS

def run_b():
    print(f"Runner B uses {NUM_THREADS} threads")

#=================== main.py ===================
import config
import runner_a
import runner_b

config.NUM_THREADS = 20
runner_a.run_a()
runner_b.run_b()
This prints
Runner A uses 20 threads
Runner B uses 10 threads
I spent about an hour today trying to debug a problem similar to this one.
The discrepancy can be fixed by treating the top-level config reference as a constant that is never reassigned, and making the modifiable parts its attributes. This is a little more verbose, but this way it is much harder to end up with different config values in different modules.
#=================== config.py ===================
class Config:
    def __init__(self):
        self.NUM_THREADS = 10

global_config = Config()

#=================== runner_a.py ===================
import config

def run_a():
    print(f"Runner A uses {config.global_config.NUM_THREADS} threads")

#=================== runner_b.py ===================
from config import global_config

def run_b():
    print(f"Runner B uses {global_config.NUM_THREADS} threads")

#=================== main.py ===================
import config
import runner_a
import runner_b

config.global_config.NUM_THREADS = 20
runner_a.run_a()
runner_b.run_b()
This version prints
Runner A uses 20 threads
Runner B uses 20 threads
Better yet, avoid the global config altogether and pass the config object explicitly, to prevent any possibility of confusion.
#=================== config.py ===================
class Config:
    def __init__(self):
        self.NUM_THREADS = 10

#=================== runner_a.py ===================
import config

def run_a(cfg: config.Config) -> None:
    print(f"Runner A uses {cfg.NUM_THREADS} threads")

#=================== runner_b.py ===================
import config

def run_b(cfg: config.Config) -> None:
    print(f"Runner B uses {cfg.NUM_THREADS} threads")

#=================== main.py ===================
import config
import runner_a
import runner_b

main_config = config.Config()
main_config.NUM_THREADS = 20
runner_a.run_a(main_config)
runner_b.run_b(main_config)
The conclusion is two-fold:
Avoid mentally mapping similar concepts across programming languages, even when they look similar and do similar things. Java import is not the same as Python import.
Avoid mixing from module import obj and import module ... module.obj in the same codebase. Prefer the latter if obj can potentially be reassigned to a new value. This includes both production usage and “patching” for unit tests.
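The patching case can be sketched in a single file with unittest.mock.patch. The module names demo_settings and demo_worker, the TIMEOUT value, and the two accessor functions are all hypothetical; the modules are built in memory here only so the example runs without extra files.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for a file demo_settings.py.
demo_settings = types.ModuleType("demo_settings")
demo_settings.TIMEOUT = 30
sys.modules["demo_settings"] = demo_settings

# Hypothetical stand-in for a file demo_worker.py that uses both import styles.
demo_worker = types.ModuleType("demo_worker")
sys.modules["demo_worker"] = demo_worker
exec(
    "from demo_settings import TIMEOUT\n"
    "import demo_settings\n"
    "\n"
    "def via_name():\n"
    "    return TIMEOUT\n"
    "\n"
    "def via_module():\n"
    "    return demo_settings.TIMEOUT\n",
    demo_worker.__dict__,
)

# Patching the defining module rebinds demo_settings.TIMEOUT only.
with mock.patch("demo_settings.TIMEOUT", 5):
    seen_via_module = demo_worker.via_module()
    seen_via_name = demo_worker.via_name()

# To affect the from-import, the name has to be patched where it is used.
with mock.patch("demo_worker.TIMEOUT", 5):
    seen_after_patching_worker = demo_worker.via_name()

print(seen_via_module)             # 5
print(seen_via_name)               # 30
print(seen_after_patching_worker)  # 5
```

With import module ... module.obj there is a single place to patch; with from module import obj every importing module carries its own name that has to be patched separately.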
Happy importing!