Django Twitter tag and RSS object

Jerry Stratton, June 13, 2009

Python’s minidom makes it easy to parse RSS feeds, since RSS feeds are themselves just very simple XML. I wanted to parse my Twitter RSS feed into a context usable by Django templates.
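As a rough illustration of how little minidom needs, here is a minimal sketch. It is written for Python 3, while the code in this post is Python 2. The feed string is made up rather than taken from a real Twitter feed, and the helper name text_of is hypothetical.

```python
import xml.dom.minidom

# A made-up, minimal RSS document standing in for a real feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>A hypothetical feed</description>
    <item>
      <title>example: first post</title>
      <link>http://example.com/1</link>
      <description>first post</description>
      <pubDate>Mon, 16 Mar 2009 13:02:19 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def text_of(node, tag):
    """Return the stripped text of the first <tag> under node, or None."""
    matches = node.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return None
    return matches[0].firstChild.data.strip()


document = xml.dom.minidom.parseString(SAMPLE_RSS)
channel = document.getElementsByTagName("channel")[0]
items = channel.getElementsByTagName("item")
item_titles = [text_of(item, "title") for item in items]
```

Note that getElementsByTagName searches all descendants, so asking the channel for "title" only returns the channel's own title first because it appears before any item in the document.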

I broke the feed down into the Feed, a Channel, and individual Items. Channels and Items are both XML nodes, so I made them inherit from a Node class that understands what is available in RSS.

#!/usr/bin/python
#provide an RSS feed object for use in Django or Mako templates
import datetime, os.path, time, urllib, xml.dom.minidom
from xml.parsers.expat import ExpatError

class Node(object):
    def __init__(self, node):
        self.node = node
        self.title = self.getValue('title')
        self.link = self.getValue('link')
        self.description = self.getValue('description')

    def __str__(self):
        return self.title

    def getValue(self, tag):
        node = self.node.getElementsByTagName(tag)[0].firstChild
        data = None
        if node:
            data = self.node.getElementsByTagName(tag)[0].firstChild.data
            data = data.strip()
        return data

class Channel(Node):
    def items(self, displayCount):
        items = self.node.getElementsByTagName("item")
        if displayCount:
            items = items[:displayCount]
        feedItems = []
        for item in items:
            feedItem = Item(item)
            feedItems.append(feedItem)
        return feedItems

class Item(Node):
    def __init__(self, item):
        super(Item, self).__init__(item)
        self.pubDate = self.getValue('pubDate')

    #provide a datetime for use by Django's date filters
    def stamp(self):
        #Mon, 16 Mar 2009 13:02:19 +0000
        return datetime.datetime.strptime(self.pubDate, '%a, %d %b %Y %H:%M:%S +0000')

class Feed(object):
    def __init__(self, feedURL, cache=None):
        self.feedURL = feedURL
        if cache:
            self.cache = '/tmp/' + cache + '.rss'
        else:
            self.cache = None

    #is the cache fresh enough to use?
    def freshCache(self):
        if self.cache and os.path.exists(self.cache):
            #use cache if it is less than sixty minutes old
            freshTime = time.time() - 60*60
            if os.path.getmtime(self.cache) > freshTime:
                return True
        return False

    def readCache(self, forceRead=False):
        feed = None
        if forceRead or self.freshCache():
            try:
                feed = xml.dom.minidom.parse(open(self.cache))
            except:
                feed = None
        return feed

    def reCache(self):
        try:
            feed = xml.dom.minidom.parse(urllib.urlopen(self.feedURL))
        except ExpatError, message:
            print "ExpatError opening URL:", message
            feed = None
        except IOError, message:
            print "IOError opening URL:", message
            feed = None
        if self.cache:
            if not feed:
                feed = self.readCache(forceRead=True)
            if feed:
                xmlString = feed.toprettyxml(encoding="utf-8")
                #if last created by a different user, remove it first
                if os.path.exists(self.cache) and not os.access(self.cache, os.W_OK):
                    os.remove(self.cache)
                cacheFile = open(self.cache, 'w')
                cacheFile.write(xmlString)
                cacheFile.close()
        return feed

    def context(self, displayCount=None):
        context = {}
        feed = self.readCache()
        if not feed:
            feed = self.reCache()
        if feed:
            channel = Channel(feed.getElementsByTagName("channel")[0])
            feedItems = channel.items(displayCount)
            context['items'] = feedItems
            context['title'] = channel.title
            context['feedURL'] = self.feedURL
        return context

The Feed class caches, if possible, the output of the RSS feed, and tries not to make a request more often than once an hour.
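The once-an-hour throttle comes down to comparing the cache file's modification time against the clock. Here is a minimal standalone sketch of that check in Python 3. The function name is_fresh and its max_age_seconds parameter are assumptions for illustration, not part of the Feed class, and the demonstration uses a throwaway temporary file rather than a real feed cache.

```python
import os
import tempfile
import time


def is_fresh(path, max_age_seconds=60 * 60, now=None):
    """True if path exists and was modified less than max_age_seconds ago."""
    if not path or not os.path.exists(path):
        return False
    if now is None:
        now = time.time()
    return os.path.getmtime(path) > now - max_age_seconds


# Demonstration with a throwaway file standing in for the cache.
with tempfile.NamedTemporaryFile(suffix=".rss", delete=False) as handle:
    demo_path = handle.name
fresh_now = is_fresh(demo_path)

# Pretend the file was written two hours ago.
two_hours_ago = time.time() - 2 * 60 * 60
os.utime(demo_path, (two_hours_ago, two_hours_ago))
fresh_later = is_fresh(demo_path)
os.remove(demo_path)
```

Making the maximum age a parameter is one way to give different feeds different refresh intervals. The Feed class above simply hard-codes sixty minutes.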

I saved this file in an app I have called “resources”. Then I added a “tweet” tag to my templatetags:

import resources.rss
from django import template

register = template.Library()

def tweet(count=1):
    feed = resources.rss.Feed('http://twitter.com/statuses/user_timeline/20020901.rss', "twitter")
    context = feed.context(displayCount=count)
    context['webURL'] = 'http://twitter.com/hoboes'
    return template.loader.render_to_string("parts/tweet.html", context)

register.simple_tag(tweet)

This uses a dedicated Django template snippet to render the tweets:

<ul class="twitter">
    {% for tweet in items %}
        <li><a href="{{ tweet.link }}">{{ tweet.title|stripLeadingText:"hoboes:" }}</a></li>
    {% endfor %}
</ul>

There’s a filter in there called “stripLeadingText” that I use to remove my Twitter name from the title:

def stripLeadingText(text, toStrip):
    text = text.strip()
    if text.startswith(toStrip):
        text = text[len(toStrip):]
    text = text.strip()
    return text

register.filter('stripLeadingText', stripLeadingText)

I can then use a “tweet” template tag to display one or more tweets:

{% tweet %}
{% tweet 2 %}
{% tweet 10 %}

You can also, of course, provide the tweets directly to any template via your views, or turn this into a tweet/endtweet loop for custom tweet HTML on every page.

A couple of caveats:

You don’t want to use /tmp for your cache files. I just used it so that the example will most likely work.
Remember to compare dates as UTC time, using, for example, datetime.datetime.utcnow().
I’ve also seen pubDate formats which use GMT instead of +0000. You may need to account for that if you use multiple feeds and each uses a different format.
If to-the-second timeliness is important, you’ll want to pay attention to the ETags that Twitter sends as well as throttling based on the cache’s timestamp. I make either one or two requests per day, and I’m not sure ETags matter on Twitter anyway.
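For the pubDate caveat, one possible approach is to let the standard library's RFC 822 date parser handle the time zone instead of hard-coding +0000 in a strptime format. This is a sketch in Python 3, not the stamp method above. The function name parse_pub_date is hypothetical, and it returns a naive datetime in UTC so that it can be compared against datetime.datetime.utcnow().

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_pub_date(pub_date):
    """Parse an RFC 822 style pubDate into a naive datetime in UTC."""
    stamp = parsedate_to_datetime(pub_date.strip())
    if stamp.tzinfo is not None:
        # Normalize to UTC, then drop the tzinfo so the result is naive.
        stamp = stamp.astimezone(timezone.utc).replace(tzinfo=None)
    return stamp


expected = datetime(2009, 3, 16, 13, 2, 19)
numeric = parse_pub_date("Mon, 16 Mar 2009 13:02:19 +0000")
named = parse_pub_date("Mon, 16 Mar 2009 13:02:19 GMT")
offset = parse_pub_date("Mon, 16 Mar 2009 08:02:19 -0500")
```

All three strings describe the same instant, so all three calls should produce the same naive UTC datetime.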

August 14, 2009: Using ETag and If-Modified-Since

In Django Twitter tag and RSS object I wrote “If to-the-second timeliness is important, you’ll want to pay attention to the ETags that Twitter sends as well as throttling based on the cache’s timestamp.”

I ended up needing that for another project. The main difference is that you have to manage HTTP headers, and to manage HTTP headers you have to use urllib2 instead of urllib.

This change will require the addition of two methods, and modifying the reCache method. Also, change the import at the top of the file from urllib to urllib2.

Here’s the new reCache:

def reCache(self):
    feedStream = self.openFeed()
    feed = None
    if feedStream:
        try:
            feed = xml.dom.minidom.parse(feedStream)
        except ExpatError, message:
            print "ExpatError opening URL:", message, self.feedURL
            feed = None
        except IOError, message:
            print "IOError opening URL:", message, self.feedURL
            feed = None
    if self.cache:
        if not feed:
            feed = self.readCache(forceRead=True)
        if feed:
            xmlString = feed.toprettyxml(encoding="utf-8")
            #if last created by a different user, remove it first
            self.ensureWritability(self.cache)
            cacheFile = open(self.cache, 'w')
            cacheFile.write(xmlString)
            cacheFile.close()
    return feed

This uses two new functions. One is easy: ensureWritability is the same code as before to make sure that the process that’s running this code can write to the cache file. I’ve moved it off into a separate method, because now we’re going to have to cache an ETag also.

def ensureWritability(self, filepath):
    if os.path.exists(filepath) and not os.access(filepath, os.W_OK):
        os.remove(filepath)

The bulk of the new work is done in the openFeed method. urllib2 requires a lot more fiddling than urllib does, and managing ETags requires some fiddling of its own. You need to keep track of the feed’s ETag if one is provided and then, on later requests, ask for the feed only “if the new feed doesn’t match this old feed”.
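The openFeed method itself is not shown in this excerpt, so the following is only a sketch of the general technique, not that method. It is written for Python 3, where urllib2's request machinery lives in urllib.request. The function names build_conditional_request and fetch_if_changed are hypothetical, and the example URL and ETag value are made up.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request that asks for the feed only if it has changed."""
    headers = {}
    if etag:
        # Sent back verbatim, including the quotes the server used.
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None when unchanged."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Not modified: keep using the cached copy and its validators.
            return None, etag, last_modified
        raise
    with response:
        body = response.read()
        new_etag = response.headers.get("ETag")
        new_modified = response.headers.get("Last-Modified")
    return body, new_etag, new_modified


# Building the request does not touch the network.
demo = build_conditional_request("http://example.com/feed.rss", etag='"abc123"')
```

The ETag and Last-Modified values returned by one fetch would need to be stored next to the cached feed and passed back in on the next fetch. A 304 response then means the cached copy is still current.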

Dive Into Python: Handling Last-Modified and ETag: Using urllib2 to add special headers to request the page only if the page has changed.
xml.dom.minidom: “xml.dom.minidom is a light-weight implementation of the Document Object Model interface. It is intended to be simpler than the full DOM and also significantly smaller.”

Contents of Negative Space™ as a whole Copyright © 1994-2025 Jerry Stratton. Individual copyrights remain held by their respective authors unless they specify otherwise. Site titles, such as Negative Space, Strange Bedfellows, Biblyon Broadsheet, Highland Games, and FireBlade Coffeehouse are trademarks of Jerry Stratton.

Code and code snippets, to the extent that they are copyrightable, may be re-distributed under the terms of the GNU General Public License 3.

Django Twitter tag and RSS object last modified August 15th, 2009.
