Using Guppy to debug Django memory leaks
There are some rather long-winded posts about Django memory leaks, but none of them cut to the chase: how do I debug the actual web application side of things? Guppy is the preferred tool, but its instructions are tailored toward interactive processes. There is also django-performance, from which I cribbed some of the following, but this post should get you up and profiling in less than two minutes.
If you are running your application in a virtualenv, then getting the heap analyzer should be as easy as `pip install guppy`.
Next, insert the following code in a file that gets loaded once by Django. A prime place for this is your root `urls.py`:

```python
import guppy
from guppy.heapy import Remote

Remote.on()  # starts Heapy's remote monitoring thread inside this process
```
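If you would rather not leave the monitor thread running unconditionally, here is a minimal sketch of the same hook guarded by a settings flag. `HEAPY_REMOTE` is a hypothetical setting of my own, not part of Guppy or Django, and the URL conf syntax assumes a Django 1.x-era `urls.py`:

```python
# urls.py -- a sketch; HEAPY_REMOTE is a made-up settings flag used here so
# the Heapy monitor thread only starts when you explicitly want to profile.
from django.conf import settings
from django.conf.urls.defaults import patterns

if getattr(settings, 'HEAPY_REMOTE', False):
    from guppy.heapy import Remote
    Remote.on()  # remote monitoring thread for this process

urlpatterns = patterns('',
    # your usual url() entries go here
)
```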
Fire up your Django process. Remember that if you have `DEBUG = True`, you will see an ever-increasing heap because every query is cached on the connection object. For any real profiling you will want `DEBUG` turned off.
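If you want to see that query caching for yourself, here is a quick sketch you can paste into `python manage.py shell` with `DEBUG = True`; it assumes the stock `django.contrib.auth` `User` model is installed:

```python
# Watch connection.queries grow while DEBUG = True -- this is the per-connection
# query log Django keeps for debugging, and it grows for as long as the process
# lives, which is why a DEBUG deployment looks like a memory leak.
from django.db import connection
from django.contrib.auth.models import User

for _ in range(100):
    list(User.objects.all())      # each executed query is appended to the log

print(len(connection.queries))    # django.db.reset_queries() clears it if needed
```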
Now, in another terminal (with your virtualenv activated if you installed it that way), you will be able to do something like this. In the monitor, `lc` lists the connected processes, `sc 1` selects connection 1, and `int` drops you into an interactive console running inside the Django process, where `hp` is already bound to its heap:
```
$ python -c "from guppy import hpy;hpy().monitor()"
<Monitor> *** Connection 1 opened ***
<Monitor> lc
CID PID   ARGV
  1 19597 ['bin/django', 'runserver']
<Monitor> sc 1
Remote connection 1. To return to Monitor, type <Ctrl-C> or .<RETURN>
<Annex> int
Remote interactive console. To return to Annex, type '-'.
>>> hp.heap()
Partition of a set of 122382 objects. Total size = 17204200 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  54342  44  4995408  29   4995408  29 str
     1  28455  23  2426432  14   7421840  43 tuple
     2   1711   1  1323112   8   8744952  51 dict (no owner)
     3    525   0  1186104   7   9931056  58 dict of module
     4   8495   7  1019400   6  10950456  64 function
     5    949   1   853984   5  11804440  69 type
     6   6790   6   814800   5  12619240  73 types.CodeType
     7    947   1   759752   4  13378992  78 dict of type
     8   3225   3   403080   2  13782072  80 list
     9    285   0   236472   1  14018544  81 dict of class
<501 more rows. Type e.g. '_.more' to view.>
>>> hp.setref()
>>> hp.heap()
Partition of a set of 69 objects. Total size = 9480 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0     27  39     2432  26      2432  26 tuple
     1      4   6     1888  20      4320  46 dict (no owner)
     2      1   1     1048  11      5368  57 dict of 0x2e1b3a0
     3      3   4      888   9      6256  66 django.utils.datastructures.SortedDict
     4      3   4      840   9      7096  75 dict of django.utils.datastructures.SortedDict
     5     11  16      600   6      7696  81 str
     6      5   7      424   4      8120  86 list
     7      4   6      360   4      8480  89 unicode
     8      1   1      280   3      8760  92 dict of django.db.models.base.ModelState
     9      3   4      264   3      9024  95 __builtin__.weakref
<6 more rows. Type e.g. '_.more' to view.>
>>> q
<Annex> close
*** Connection 1 closed ***
<Monitor> q
```
After calling `hp.setref()`, do whatever you need to do in your application to cause the memory leak, and then call `hp.heap()` again; the second partition only shows objects allocated since the `setref()` call, so the leaked objects stand out. There are more operations available than the two I've highlighted, and you will find details of them in the sites I linked at the start of this post.
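As a concrete example of that middle step, I usually hammer the suspect view from a third terminal while the remote console sits at its prompt. The URL and request count below are placeholders for whatever reproduces your leak:

```python
# Exercise the suspected leak between hp.setref() and the second hp.heap().
# LEAKY_URL is a placeholder -- point it at whichever view you think leaks.
import urllib2

LEAKY_URL = 'http://127.0.0.1:8000/some/leaky/view/'

for i in range(500):
    urllib2.urlopen(LEAKY_URL).read()
```

Anything that is still sitting in the second `hp.heap()` partition after a run like this is a good candidate for the leak.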