Converting an existing Django model to Django-MPTT
For a long time I’ve been painfully aware of a glaring inefficiency in my custom blog software built off of Django. A lot of what it does relies on the tree model: folders within folders. Finding all the front-page articles for my main blog means getting all the children of each category, and then any descendants of those children, and so on. Previewing a new blog article can take a minute or more, mainly because of the recursive nature of building the sidebar. There are only seven articles on the front page, but they require looking through a tree with 1,505 articles spread over five levels.
I’ve been aware of various solutions for turning Django into more of a tree-hugger, but they’ve tended either to not work well with an existing installation, or if they did, their documentation hid that. For example, django-mptt until a few years ago hid the very important .rebuild method, which is necessary for building the tree data for an existing dataset.
Having just finished the basic installation, however, django-mptt is very helpful. On my very simple, not very tree-like blog, the Walkerville Weekly Reader, publish time dropped by around 13% to 21%; since it only took about two minutes anyway, that’s not necessarily a big deal. But Mimsy Were the Borogoves, which I’ve been running since the nineties, has a lot more articles and a much more complex structure. Publish time dropped from about 30 minutes to about 10 minutes. That’s worth the installation troubles.
Converting my model to use django-mptt
I installed django-mptt by downloading version 0.8.0, unpacking it, and running setup, but, despite the documentation claiming that it’s okay with Django 1.8.x, it performed an automatic upgrade to Django 1.9.1. I had to downgrade again using:
- sudo pip uninstall django
- sudo pip install django==1.8.8
I’m using django-ajax-selects version 1.3.6, which doesn’t work with Django 1.9.x. While the 1.4 version is compatible with Django 1.9, the 1.4 version doesn’t work on inlines, and inlines are the main reason I use ajax-selects. I have a list of 9,000 URLs and 14,000 pages that can be attached via inline to any page, not to mention the keywords and authors and so forth. Without ajax-select, it takes forever for browsers to render pages because those become pull-down menus.
Converting to django-mptt is very easy. I added this to the top of my models.py:
- from mptt.models import MPTTModel, TreeForeignKey, TreeManager
I had to include TreeManager because I have a custom manager. If you use the standard manager, you shouldn’t need it.
[toggle code]
- class PageManager(models.Manager):
- …
-
class Page(models.Model):
- parent = models.ForeignKey(‘self’, blank=True, null=True)
became:
[toggle code]
- class PageManager(TreeManager):
- …
-
class Page(MPTTModel):
- parent = TreeForeignKey(‘self’, blank=True, null=True)
I also had a “level” method on my Page model, which I needed to rename because it interfered with the MPTTModel’s level column.
I used “./manage.py sqlall pages” to find the new columns and indexes, and then added them by hand in SQLite:
- ALTER TABLE pages_page ADD COLUMN lft INTEGER UNSIGNED NOT NULL DEFAULT 0;
- ALTER TABLE pages_page ADD COLUMN rght INTEGER UNSIGNED NOT NULL DEFAULT 0;
- ALTER TABLE pages_page ADD COLUMN tree_id INTEGER UNSIGNED NOT NULL DEFAULT 0;
- ALTER TABLE pages_page ADD COLUMN level INTEGER UNSIGNED NOT NULL DEFAULT 0;
- CREATE INDEX "pages_page_329f6fb3" ON "pages_page" ("lft");
- CREATE INDEX "pages_page_e763210f" ON "pages_page" ("rght");
- CREATE INDEX "pages_page_ba470c4a" ON "pages_page" ("tree_id");
- CREATE INDEX "pages_page_20e079f4" ON "pages_page" ("level");
Your index names are likely to be different.
After verifying that everything still worked, I built the tree data:
- Page.objects.rebuild()
It took about ten minutes for it to create the tree data.
At that point, I left it alone for a few days to make sure I hadn’t introduced anything that broke the blog.
Code
Your code changes will depend on your code, of course. In my case, I had two methods that needed to go:
[toggle code]
- #get all levels of responses to a blog article or members of a blog category
-
def getBlogArticles(self):
- articles = self.children().exclude(provider='nisus')
- possiblyUpdated = Page.objects.live().filter(parent__in=articles, provider='db').exclude(pagekey__keyword__key='Voices')
-
for article in possiblyUpdated:
- articles = articles | article.parent.getBlogArticles()
-
if self.blogCategory:
- articles = articles.order_by('rank', '-added')
-
else:
- articles = articles.order_by('-added')
- return articles
- #uses list to reduce size of query, as I think that without it the query is too long for sqlite
-
def alldescendants(self):
- idescendants = list(Page.objects.filter(parent=self).values_list('id', flat=True))
- descendants = []
-
while idescendants:
- descendants.extend(idescendants)
- idescendants = list(Page.objects.filter(parent__in=idescendants).values_list('id', flat=True))
- descendants = Page.objects.filter(parent__in=descendants, live='public')
- return descendants
-
def modified(self):
- descendants = self.alldescendants()
-
try:
- mostrecent = descendants.order_by('-edited')[:1].get().edited
- lastmodified = max(mostrecent, self.edited)
-
except:
- lastmodified = self.changed()
- return lastmodified
As you can see, I had two methods of getting all descendants before installing django-mptt. One was to recurse the method on every child and then every child’s child, as I did with getBlogArticles. This had the advantage of always working, but being very slow. The second, in alldescendants was to get the id of all children, and then get every page that had a parent in that list of ids, and then repeat that for the new set of pages, and so on, until no pages had parents in the furthest generation. This had the advantage of being faster, but the disadvantage of creating gigantic SQL statements.1
[toggle code]
- #get all levels of responses to a blog article or members of a blog category
-
def getBlogArticles(self):
- articles = self.get_descendants().filter(provider='db').exclude(pagekey__keyword__key='Voices')
- articles = articles & Page.objects.live()
-
if self.blogcategory:
- articles = articles.order_by('rank', '-added')
-
else:
- articles = articles.order_by('-added')
- return articles
-
def alldescendants(self, include_self=False):
- return self.get_descendants(include_self=include_self).filter(live='public')
-
def modified(self):
- descendants = self.alldescendants(include_self=True)
- return descendants.order_by('-edited')[:1].get().edited
One immediate benefit, besides the massive speed increase for the more complex blog, is that I no longer have to deal with those incredibly large flat lists of ids. When trying to find the most recent modification within a folder, doing so under a parent with too many descendants ran the risk of an error, at which point I just defaulted to the modification time of the parent
. Using MPTT, however, the modified function no longer needs the try/except clause.If you’re running a large tree, and traversing the tree is becoming time-consuming, look into django-mptt.
Note that there’s actually a bug in alldescendants which I only noticed checking the behavior of the new method against the behavior of the old one: the final parent__in should have included not just the descendants of self, but self as well. Otherwise, it won’t include self’s immediate children, but only its grandchildren and greater grandchildren.
↑
- Django MPTT
- “Utilities for implementing Modified Preorder Tree Traversal with your Django Models and working with trees of Model instances.”
- django-ajax-selects
- “Enables editing of ForeignKey, ManyToMany and CharField using jQuery UI Autocomplete.”
- Mimsy Were the Borogoves
- The official blog of Jerry Stratton.
- The Walkerville Weekly Reader
- In the end times, one newspaper dared to call God to task for His hypocrisy. That newspaper was not us, we swear it. Not the eternal flames!
More Django
- Two search bookmarklets for Django
- Bookmarklets—JavaScript code in a bookmark—can make working with big Django databases much easier.
- Fixing Django’s feed generator without hacking Django
- It looks like it’s going to be a while before the RSS feed generator in Django is going to get fixed, so I looked into subclassing as a way of getting a working guid in my Django RSS feeds.
- ModelForms and FormViews
- This is just a notice because when I did a search, nothing came up. Don’t use ModelForm with FormView, use UpdateView instead.
- Django: fix_ampersands and abbreviations
- The fix_ampersands filter will miss some cases where ampersands need to be replaced.
- Custom managers for Django ForeignKeys
- I’ve got one really annoying model for keywords. There’s one category of keywords that, by default, should not show up when used as a ForeignKey for most models. Key word: most.
- 29 more pages with the topic Django, and other related pages