toofishes.net

Using Guppy to debug Django memory leaks

There are some rather long-winded posts about Django memory leaks, but none of them cut to the chase: how do I debug the actual web application side of things? Using Guppy is the preferred method, but its instructions are tailored toward interactive processes. I cribbed some of the following knowledge from django-performance, but following the steps in this post should get you up and profiling in less than two minutes.

  1. If you are running your application in a virtualenv, then getting the heap analyzer should be as easy as pip install guppy.

  2. Next, insert the following code in a file that gets loaded once by Django. A prime place for this is your root urls.py.

    from guppy.heapy import Remote
    # Allow a heapy Monitor running in another process to attach to this one.
    Remote.on()
    
  3. Fire up your Django process. Remember that if you have DEBUG = True, you will see an ever-increasing heap because of the queries being cached on the connection object. For any real profiling you will want DEBUG turned off.

  4. Now, in another terminal (with your virtualenv activated if you installed it that way), you will be able to do something like this:

     $ python -c "from guppy import hpy;hpy().monitor()"
     <Monitor> 
     *** Connection 1 opened ***
     <Monitor> lc
     CID PID   ARGV
       1 19597 ['bin/django', 'runserver']
     <Monitor> sc 1
     Remote connection 1. To return to Monitor, type <Ctrl-C> or .<RETURN>
     <Annex> int
     Remote interactive console. To return to Annex, type '-'.
     >>> hp.heap()
     Partition of a set of 122382 objects. Total size = 17204200 bytes.
      Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
          0  54342  44  4995408  29   4995408  29 str
          1  28455  23  2426432  14   7421840  43 tuple
          2   1711   1  1323112   8   8744952  51 dict (no owner)
          3    525   0  1186104   7   9931056  58 dict of module
          4   8495   7  1019400   6  10950456  64 function
          5    949   1   853984   5  11804440  69 type
          6   6790   6   814800   5  12619240  73 types.CodeType
          7    947   1   759752   4  13378992  78 dict of type
          8   3225   3   403080   2  13782072  80 list
          9    285   0   236472   1  14018544  81 dict of class
     <501 more rows. Type e.g. '_.more' to view.>
     >>> hp.setref()
     >>> hp.heap()
     Partition of a set of 69 objects. Total size = 9480 bytes.
      Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
          0     27  39     2432  26      2432  26 tuple
          1      4   6     1888  20      4320  46 dict (no owner)
          2      1   1     1048  11      5368  57 dict of 0x2e1b3a0
          3      3   4      888   9      6256  66 django.utils.datastructures.SortedDict
          4      3   4      840   9      7096  75 dict of django.utils.datastructures.SortedDict
          5     11  16      600   6      7696  81 str
          6      5   7      424   4      8120  86 list
          7      4   6      360   4      8480  89 unicode
          8      1   1      280   3      8760  92 dict of django.db.models.base.ModelState
          9      3   4      264   3      9024  95 __builtin__.weakref
     <6 more rows. Type e.g. '_.more' to view.>
     >>> q
     <Annex> close
     *** Connection 1 closed ***
     <Monitor> q
    
  5. After calling hp.setref(), do whatever you need to do in your application to trigger the memory leak, and then call hp.heap() again; it now reports only objects allocated since the setref() call that are still alive. There are more operations available than the two I’ve highlighted, and you will find details of those in the sites I linked at the start of this post.
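
If you would rather script the setref/heap cycle than type it into the remote console, the same two calls work in-process. This is a minimal sketch, assuming guppy is importable in the process you are measuring; `measure_growth`, `demo`, and `leaky_action` are hypothetical names for illustration and not part of Guppy.

```python
def measure_growth(hp, action):
    """Run action() and return a heap snapshot of what it left behind.

    hp is a heapy session object such as the one returned by guppy.hpy();
    only its setref() and heap() methods are used here.
    """
    hp.setref()       # mark a reference point; existing objects are ignored from now on
    action()          # exercise the code path suspected of leaking
    return hp.heap()  # objects allocated since setref() that are still alive


def demo():
    # Imported here so the helper above has no hard dependency on guppy.
    from guppy import hpy

    leaked = []

    def leaky_action():
        # Hypothetical stand-in for a request handler that keeps references around.
        leaked.append("x" * 10000)

    print(measure_growth(hpy(), leaky_action))
```

Calling `demo()` should print a partition table like the second one in the transcript above, limited to what `leaky_action` allocated and kept alive.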
