Learn all about weak references in Python: reference counting, garbage collection, and practical uses of the weakref module
Chances are you have never touched, and possibly never even heard of, Python's weakref module. While you may not use it often in your own code, it is fundamental to the inner workings of many libraries, frameworks and even Python itself. So, in this article we are going to explore what it is, how it is useful, and how you can incorporate it into your code as well.
To understand the weakref module and weak references, we first need a short introduction to garbage collection in Python.
Python uses reference counting as a mechanism for garbage collection. In simple terms, Python keeps a reference count for every object we create. The count is incremented every time the object is referenced in code, and decremented when the object is de-referenced (e.g. a variable is set to None). If the reference count ever drops to zero, the memory for the object is deallocated (garbage-collected).
Let's look at some code to understand it a bit better:
import sys

class SomeObject:
    def __del__(self):
        # Called when the object is garbage-collected
        print(f"(Deleting {self=})")

obj = SomeObject()
print(sys.getrefcount(obj))  # 2

obj2 = obj
print(sys.getrefcount(obj))  # 3

obj = None
obj2 = None
# (Deleting self=<__main__.SomeObject object at 0x7d303fee7e80>)
Here we define a class that only implements a __del__ method, which is called when the object is garbage-collected (GC'ed). We do this so that we can see when the garbage collection happens.
After creating an instance of this class, we use sys.getrefcount to get the current number of references to this object. We might expect to get 1 here, but the count returned by getrefcount is generally one higher than you might expect. That is because when we call getrefcount, the object is passed as the function's argument, and that argument temporarily holds one more reference to it.
Next, if we declare obj2 = obj and call getrefcount again, we get 3, since the object is now referenced by both obj and obj2. Conversely, if we assign None to both variables, the reference count drops to zero, and we get the message from the __del__ method telling us that the object was garbage-collected.
So how do weak references fit into this? If the only remaining references to an object are weak references, the Python interpreter is free to garbage-collect the object. In other words, a weak reference to an object is not enough to keep the object alive:
import weakref

obj = SomeObject()

reference = weakref.ref(obj)
print(reference)  # a live weakref pointing to the SomeObject instance
print(reference())  # <__main__.SomeObject object at 0x707038c0b700>
print(obj.__weakref__)  # the same weakref object as reference
print(sys.getrefcount(obj))  # 2

obj = None
# (Deleting self=<__main__.SomeObject object at 0x70744d42b700>)

print(reference)  # <weakref at 0x...; dead>
print(reference())  # None
Here we again create an instance of our class in obj, but this time, instead of making a second strong reference to this object, we create a weak reference in the reference variable.
If we then check the reference count, we can see that it didn't increase, and if we set the obj variable to None, we can see that the object is immediately garbage-collected even though the weak reference still exists.
Finally, if we try to access the weak reference to the already garbage-collected object, printing it shows a "dead" reference, and calling it returns None.
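weakref.ref also accepts an optional callback as its second argument, which lets you react to the object disappearing instead of polling for None. A minimal sketch, assuming CPython's immediate collection on the last strong reference (events and on_collect are illustrative names):

```python
import weakref

class SomeObject:
    pass

events = []

def on_collect(ref):
    # Receives the weak reference itself, which is already dead at this point
    events.append(ref() is None)

obj = SomeObject()
reference = weakref.ref(obj, on_collect)

obj = None  # last strong reference dropped, so the callback fires
print(events)  # [True]
```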
Also notice that to access the object through the weak reference, we had to call it as a function (reference()). Therefore, it is often more convenient to use a proxy instead, especially if you need to access the object's attributes:
obj = SomeObject()

reference = weakref.proxy(obj)
print(reference)  # <__main__.SomeObject object at 0x78a420e6b700>

obj.attr = 1
print(reference.attr)  # 1
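One trade-off of the proxy is that it cannot return None once its object is gone. Using a dead proxy raises ReferenceError instead, so code holding proxies should be ready for that. A short sketch of this behavior (the error_message variable exists only for illustration):

```python
import weakref

class SomeObject:
    pass

obj = SomeObject()
proxy = weakref.proxy(obj)
obj.attr = 1
print(proxy.attr)  # 1

obj = None  # the referent is collected; the proxy is now dead

try:
    proxy.attr  # any attribute access on a dead proxy raises
except ReferenceError as exc:
    error_message = str(exc)
    print("ReferenceError:", error_message)
    # ReferenceError: weakly-referenced object no longer exists
```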
Now that we understand how weak references work, let's look at some examples of how they can be useful.
A typical use-case for weak references is tree-like data structures:
class Node:
    def __init__(self, value):
        self.value = value
        self._parent = None
        self.children = []

    def __repr__(self):
        return "Node({!r:})".format(self.value)

    @property
    def parent(self):
        # Dereference the weak reference; None if unset or the parent is gone
        return self._parent if self._parent is None else self._parent()

    @parent.setter
    def parent(self, node):
        # Store only a weak reference, so a child never keeps its parent alive
        self._parent = weakref.ref(node)

    def add_child(self, child):
        self.children.append(child)
        child.parent = self

root = Node("parent")
n = Node("child")
root.add_child(n)
print(n.parent)  # Node('parent')

del root
print(n.parent)  # None
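Here we implement a tree using a Node class where child nodes hold a weak reference to their parent. Because parents reference their children strongly but children reference their parents only weakly, there is no strong reference cycle. In this relationship, the child Node can live without its parent Node, which allows the parent to be silently removed/garbage-collected.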
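For contrast, here is a sketch of what happens if the child keeps a strong reference to its parent instead. StrongNode is a hypothetical variant made up for this comparison. The parent and child now form a reference cycle, so reference counting alone cannot free them, and they linger until CPython's cyclic garbage collector runs. The gc module is disabled temporarily only to make the timing deterministic.

```python
import gc
import weakref

class StrongNode:
    # Hypothetical variant of Node whose child holds a *strong* parent reference
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        child.parent = self  # root -> child -> root: a reference cycle

results = []
gc.disable()  # keep the cyclic collector from running on its own mid-demo
try:
    root = StrongNode("parent")
    root.add_child(StrongNode("child"))
    probe = weakref.ref(root)  # weak "probe" to observe when root is freed

    del root
    results.append(probe() is None)  # False: the cycle keeps root alive

    gc.collect()  # an explicit collection still runs while gc is disabled
    results.append(probe() is None)  # True: the cycle has been reclaimed
finally:
    gc.enable()

print(results)  # [False, True]
```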
Alternatively, we can flip this around:
class Node:
    def __init__(self, value):
        self.value = value
        # Values are held weakly: entries vanish once a child is GC'ed
        self._children = weakref.WeakValueDictionary()

    def __repr__(self):
        return "Node({!r:})".format(self.value)

    @property
    def children(self):
        return list(self._children.items())

    def add_child(self, key, child):
        self._children[key] = child

root = Node("parent")
n1 = Node("child one")
n2 = Node("child two")
root.add_child("n1", n1)
root.add_child("n2", n2)
print(root.children)  # [('n1', Node('child one')), ('n2', Node('child two'))]

del n1
print(root.children)  # [('n2', Node('child two'))]
Here, instead, the parent keeps a dictionary of weak references to its children. This uses WeakValueDictionary: whenever an object stored as a value in the dictionary loses its last strong reference elsewhere in the program, its entry is automatically removed from the dictionary too, so we don't have to manage the lifecycle of the dictionary items.
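The weakref module also has the mirror image, WeakKeyDictionary, which holds its keys weakly. A common use is a side table that attaches extra data to objects without keeping them alive. A minimal sketch; the Item class and the metadata contents are hypothetical:

```python
import weakref

class Item:
    # Hypothetical object we want to annotate without owning it
    def __init__(self, name):
        self.name = name

metadata = weakref.WeakKeyDictionary()

a = Item("a")
b = Item("b")
metadata[a] = {"visited": True}
metadata[b] = {"visited": False}
print(len(metadata))  # 2

del a  # 'a' has no strong references left, so its entry disappears
print(len(metadata))  # 1
```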
Another use of weakref is in the Observer design pattern:
class Observable:
    def __init__(self):
        # Observers are held weakly, so registering doesn't keep them alive
        self._observers = weakref.WeakSet()

    def register_observer(self, obs):
        self._observers.add(obs)

    def notify_observers(self, *args, **kwargs):
        for obs in self._observers:
            obs.notify(self, *args, **kwargs)

class Observer:
    def __init__(self, observable):
        observable.register_observer(self)

    def notify(self, observable, *args, **kwargs):
        print("Got", args, kwargs, "From", observable)

subject = Observable()
observer = Observer(subject)
subject.notify_observers("test", kw="python")
# Got ('test',) {'kw': 'python'} From <__main__.Observable object at 0x757957b892d0>
The Observable class keeps weak references to its observers, since it doesn't care if they get removed. As with the previous examples, this avoids having to manage the lifecycle of dependent objects. As you probably noticed, in this example we used WeakSet, another class from the weakref module. It behaves much like WeakValueDictionary (entries vanish once their objects are garbage-collected), but it is a set-like collection rather than a mapping.
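To see the WeakSet doing its job, here is a sketch that reuses the Observable class from the example above. RecordingObserver is a hypothetical observer that records notifications instead of printing them. No unregister method is needed: once an observer is deleted, it simply stops receiving notifications.

```python
import weakref

class Observable:
    def __init__(self):
        self._observers = weakref.WeakSet()

    def register_observer(self, obs):
        self._observers.add(obs)

    def notify_observers(self, *args, **kwargs):
        for obs in self._observers:
            obs.notify(self, *args, **kwargs)

class RecordingObserver:
    # Hypothetical observer that records what it receives
    def __init__(self, observable):
        self.received = []
        observable.register_observer(self)

    def notify(self, observable, *args, **kwargs):
        self.received.append(args)

subject = Observable()
first = RecordingObserver(subject)
second = RecordingObserver(subject)
subject.notify_observers("ping")
print(len(subject._observers))  # 2

del second  # no explicit unregister: the WeakSet forgets it automatically
subject.notify_observers("pong")
print(len(subject._observers))  # 1
print(first.received)  # [('ping',), ('pong',)]
```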
The final example for this section is borrowed from the weakref docs:
import shutil
import tempfile
import weakref
from pathlib import Path

class TempDir:
    def __init__(self):
        self.name = tempfile.mkdtemp()
        # Call shutil.rmtree(self.name) when this object is GC'ed (or at exit)
        self._finalizer = weakref.finalize(self, shutil.rmtree, self.name)

    def __repr__(self):
        return "TempDir({!r:})".format(self.name)

    def remove(self):
        self._finalizer()

    @property
    def removed(self):
        return not self._finalizer.alive

tmp = TempDir()
print(tmp)  # TempDir('/tmp/tmp8o0aecl3')
print(tmp.removed)  # False
print(Path(tmp.name).is_dir())  # True
This showcases yet another feature of the weakref module: weakref.finalize. As the name suggests, it allows executing a finalizer function/callback when the dependent object is garbage-collected. In this case we implement a TempDir class which can be used to create a temporary directory. Ideally, we would always remember to clean up the TempDir when we don't need it anymore, but if we forget, the finalizer will automatically run rmtree on the directory when the TempDir object is GC'ed, which includes when the program exits.
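Both cleanup paths can be exercised directly. The sketch below uses a trimmed copy of TempDir (without __repr__). It relies on the documented behavior that a finalize object runs its callback at most once, and on CPython freeing the forgotten instance as soon as its last reference is deleted.

```python
import shutil
import tempfile
import weakref
from pathlib import Path

class TempDir:
    def __init__(self):
        self.name = tempfile.mkdtemp()
        self._finalizer = weakref.finalize(self, shutil.rmtree, self.name)

    def remove(self):
        self._finalizer()

    @property
    def removed(self):
        return not self._finalizer.alive

# 1) Explicit cleanup: calling the finalizer runs rmtree and marks it dead
tmp = TempDir()
tmp.remove()
print(tmp.removed, Path(tmp.name).exists())  # True False
tmp.remove()  # finalizers run at most once, so this second call is a no-op

# 2) Forgotten cleanup: dropping the last reference triggers the finalizer
forgotten = TempDir()
path = Path(forgotten.name)
del forgotten
print(path.exists())  # False
```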
The previous section showed a couple of practical uses for weakref, but let's also take a look at some real-world examples, one of them being creating cached instances:
import logging

a = logging.getLogger("first")
b = logging.getLogger("second")
print(a is b)  # False

c = logging.getLogger("first")
print(a is c)  # True
The above is basic usage of Python's builtin logging module. We can see that it associates only a single logger instance with a given name, meaning that when we retrieve the same logger multiple times, it always returns the same cached logger instance.
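If we wanted to implement this kind of cache ourselves using weak references, it could look something like this (note that the logging module itself keeps its loggers in a regular dictionary, so they are never evicted):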
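class Logger:
    def __init__(self, name):
        self.name = name

# Cache holds loggers weakly: an unused logger is evicted automatically
_logger_cache = weakref.WeakValueDictionary()

def get_logger(name):
    # A single .get() avoids a race where the entry could vanish
    # between an "in" check and the following lookup
    logger = _logger_cache.get(name)
    if logger is None:
        logger = Logger(name)
        _logger_cache[name] = logger
    return logger

a = get_logger("first")
b = get_logger("second")
print(a is b)  # False

c = get_logger("first")
print(a is c)  # True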
And finally, Python itself uses weak references, e.g. in the implementation of OrderedDict:
from _weakref import proxy as _proxy

class OrderedDict(dict):
    def __new__(cls, /, *args, **kwds):
        self = dict.__new__(cls)
        # _Link is a small helper class defined elsewhere in the same module
        self.__hardroot = _Link()
        self.__root = root = _proxy(self.__hardroot)
        root.prev = root.next = root
        self.__map = {}
        return self
The above is a snippet from CPython's collections module (its pure-Python OrderedDict implementation). Here, weakref.proxy is used to prevent circular references (see the docstrings for more details).
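The same idea applies to any doubly linked list: if next links are strong and prev links are weak, the list contains no strong reference cycles. Below is a minimal sketch of that approach; ListNode and build are hypothetical helpers, not CPython's code.

```python
import weakref

class ListNode:
    # Hypothetical doubly linked list node: 'next' is strong, 'prev' is weak
    def __init__(self, value):
        self.value = value
        self.next = None
        self._prev = None

    @property
    def prev(self):
        return None if self._prev is None else self._prev()

    @prev.setter
    def prev(self, node):
        self._prev = None if node is None else weakref.ref(node)

def build(values):
    head = tail = None
    for value in values:
        node = ListNode(value)
        if tail is None:
            head = node
        else:
            tail.next = node
            node.prev = tail
        tail = node
    return head

head = build([1, 2, 3])
second = head.next
print(second.prev.value)  # 1

del head  # no strong cycle, so the first node is freed right away
print(second.prev)  # None
```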
weakref is a fairly obscure, but at times very useful tool that you should keep in your toolbox. It can be very helpful when implementing caches or data structures that have reference loops in them, such as doubly linked lists.
With that said, be aware of how weakref support varies: everything said here and in the docs is CPython-specific, and other Python implementations may have different weakref behavior. Also, many of the builtin types don't support weak references, such as list, tuple or int.
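You can check this quickly: weakref.ref raises TypeError for unsupported objects. According to the weakref docs, list and dict gain support when subclassed, while in CPython tuple and int do not support weak references even when subclassed. A small sketch; supports_weakref and MyList are hypothetical helpers:

```python
import weakref

def supports_weakref(obj):
    # True if weakref.ref() accepts the object, False if it raises TypeError
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True

class MyList(list):
    # Subclassing list adds a __weakref__ slot, enabling weak references
    pass

print(supports_weakref([1, 2]))  # False
print(supports_weakref(42))  # False
print(supports_weakref((1, 2)))  # False
print(supports_weakref(MyList([1, 2])))  # True
```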
This article was originally posted at martinheinz.dev