Thinking Python: Django cache expiration time
We have a task/project manager in which each task can have any other task as its parent, and different tasks can be assigned to different employees. Showing the projects for any one employee involves collecting the tasks assigned to them, grouping connected tasks into projects, and then counting the completed tasks against the incomplete ones to show a progress bar and an estimated time of completion.
Our skunkworks Django server is a bit old, and can be slow to display complex projects. I’ve tried to convince people not to create complex projects (most of the offenders are never-ending projects that should be broken into smaller projects that can actually be completed), but that’s an uphill battle. So I started looking into Django’s caching.
The first thing I noticed is that Django’s caching is built solely around cache entries expiring on their own after a timeout. There’s nothing within the cache object for dynamic expiration based on new data coming in. It doesn’t even provide a way to find out when a cache entry was created.
I ended up looking into the cache code, and found that cache.get() looks at the expiration time before returning the cached data. So I added a get_stamp() method to CacheClass in django/core/cache/backends/filebased.py:
    def get_stamp(self, key, timeout=None):
        fname = self._key_to_file(key)
        try:
            f = open(fname, 'rb')
            #the first pickled object in the cache file is the expiration time
            exp = pickle.load(f)
            f.close()
            if timeout:
                #subtract the timeout to estimate when the cache was written
                exp = exp - timeout
            return exp
        except (IOError, OSError, EOFError, pickle.PickleError):
            pass
        return 0
As you can see, it runs into a problem immediately: what I want is when the cache was created, but Django’s caches are very simple. They store only the absolutely necessary information from the perspective of the cache: when does the cache expire, and what does the cache contain? They do not store the time the cache was created. So this method accepts the same timeout value that cache.set() was given, and subtracts it from the expiration time to calculate when the cache was most likely created.
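To make the arithmetic concrete, here is a minimal standalone sketch of the same idea outside Django. It assumes the file layout that get_stamp() above relies on, namely a pickled expiration timestamp followed by the pickled value. The names write_entry, read_stamp and demo are hypothetical and are not part of Django.

```python
import os
import pickle
import tempfile
import time


def write_entry(path, value, timeout):
    """Write a cache file: pickled expiration time first, then the value."""
    created = time.time()
    with open(path, 'wb') as f:
        pickle.dump(created + timeout, f, pickle.HIGHEST_PROTOCOL)
        pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)
    return created


def read_stamp(path, timeout=None):
    """Return the expiration time, or the estimated creation time when the
    timeout used for writing is supplied. Returns 0 if the file is unreadable."""
    try:
        with open(path, 'rb') as f:
            exp = pickle.load(f)  # only the first pickled object is read
    except (IOError, OSError, EOFError, pickle.PickleError):
        return 0
    if timeout:
        exp = exp - timeout
    return exp


def demo():
    """Round-trip one entry; return (real creation time, estimated creation time)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        one_day = 60 * 60 * 24
        created = write_entry(path, ['task 1', 'task 2'], one_day)
        return created, read_stamp(path, one_day)
    finally:
        os.remove(path)
```

The estimate is only as good as the timeout passed in: if the entry was written with a different timeout than the one handed to read_stamp, the computed creation time will be off by the difference.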
Here’s how I used it:
    from django.core.cache import cache
    import datetime

    def personalProjects(self):
        #cache for a long time (one day)--if there's new stuff, we'll recache anyway
        cacheTime = 60*60*24
        cacheKey = 'personal-projects-' + str(self.id)
        rawTasks = Task.objects.filter(assignees=self)
        #if anything has changed since the last cache, recreate the cache
        recentTasks = rawTasks.filter(edited__gte=datetime.datetime.fromtimestamp(cache.get_stamp(cacheKey, cacheTime)))
        if not recentTasks:
            openProjects = cache.get(cacheKey)
            if openProjects:
                return openProjects
        …
        #cache both the time of this cache, and the open projects
        cache.set(cacheKey, openProjects, cacheTime)
        return openProjects
It seems to work fine. However, this seems like pretty basic functionality. Whenever I see a major project, like Django, missing what I consider to be basic functionality, I know there’s a good chance I’m missing something. So I posted the proposed addition to CacheClass to the Django developers newsgroup. The first workaround proposed (by Thomas Adamcik) was a very Pythonesque solution: use tuples. Instead of caching the data, cache a tuple of the current timestamp and the data.
“Simply using cache.set('your_cache_key', (datetime.now(), your_value)) (or time() instead of datetime.now()) should allow you to keep track of this in a way which doesn’t require modifying any core Django functionality :)”
The problem with modifying core functionality is that you have to do it every time you upgrade, unless (and until) your modifications make it into the distributed code. So I try to avoid this whenever possible. Using tuples instead of returning the modified expiration time means that not only do I not have to modify Django’s code, I can also store the real cache time instead of guessing it from the expiration time:
    from django.core.cache import cache
    import datetime

    def personalProjects(self):
        #cache for a long time (one day)--if there's new stuff, we'll recache anyway
        cacheTime = 60*60*24
        cacheKey = 'personal-project-' + str(self.id)
        (cacheStamp, cachedProjects) = cache.get(cacheKey, (None, None))
        rawTasks = Task.objects.filter(assignees=self)
        if cachedProjects and cacheStamp:
            #if nothing has changed since the last cache, return the last cache
            recentTasks = rawTasks.filter(edited__gte=cacheStamp)
            if not recentTasks:
                return cachedProjects
        …
        #cache both the time of this cache, and the open projects
        cache.set(cacheKey, (datetime.datetime.now(), openProjects), cacheTime)
        return openProjects
Note that I changed the cache key from personal-projects-id to personal-project-id. If the cache key remained the same, values cached under the old scheme would fail to unpack when retrieved, since they aren’t tuples of a timestamp and the data.
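The tuple pattern can also be pulled into a small helper. The following is a sketch, not Django code: DictCache is a hypothetical stand-in that mimics only the get(key, default) and set(key, value, timeout) call shapes of Django’s cache object (it ignores the timeout), and set_stamped and get_if_fresh are made-up names. The helper treats anything that is not a (stamp, value) pair as a miss, which is one possible alternative to renaming the key when older, non-tuple values might still be cached.

```python
import datetime


class DictCache:
    """Hypothetical in-memory stand-in for django.core.cache.cache."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        # A real backend would also record an expiration; omitted here.
        self._data[key] = value


def set_stamped(cache, key, value, timeout, now=None):
    """Cache (creation time, value) so the creation time can be read back."""
    stamp = now or datetime.datetime.now()
    cache.set(key, (stamp, value), timeout)


def get_if_fresh(cache, key, last_modified):
    """Return the cached value unless the data changed at or after caching.

    last_modified is when the underlying data was last edited, or None if
    it has never been edited. Returns None on a miss, on a stale entry, or
    on a legacy value that is not a (stamp, value) pair.
    """
    entry = cache.get(key)
    if not (isinstance(entry, tuple) and len(entry) == 2):
        return None
    stamp, value = entry
    if last_modified is not None and last_modified >= stamp:
        return None
    return value
```

In a method like personalProjects above, last_modified would presumably be the newest edited value among the employee’s tasks; the comparison uses >= to match the edited__gte filter.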
- get_stamp for CacheClass?: Jerry Stratton
- “As I’m working with caches, I’ve found myself wanting to know when the cache was created, to compare against the last time some data was updated.”