Django Twitter tag and RSS object
Python’s minidom makes it easy to parse RSS feeds, since RSS feeds are themselves just very simple XML. I wanted to parse my Twitter RSS feed into a context usable by Django templates.
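As a minimal sketch of how little is involved, the snippet below parses a hand-written RSS string with minidom and pulls out each item's title. The sample feed and the `item_titles` helper are made up for illustration, and the snippet is written for Python 3 rather than the Python 2 used in the rest of this post.

```python
import xml.dom.minidom

# A tiny hand-written feed for illustration; not real Twitter output.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Two sample items</description>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title> Second post </title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def item_titles(rss_text):
    """Return the stripped title text of every <item> in an RSS string."""
    document = xml.dom.minidom.parseString(rss_text)
    titles = []
    for item in document.getElementsByTagName("item"):
        text_node = item.getElementsByTagName("title")[0].firstChild
        # firstChild is None when the <title> element is empty
        titles.append(text_node.data.strip() if text_node else None)
    return titles


print(item_titles(SAMPLE_RSS))
```

Only items are searched for titles here, so the channel's own title is not included in the result.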
I broke the feed down into the Feed, a Channel, and individual Items. Channels and Items are both XML nodes, so I made them inherit from a Node class that understands what is available in RSS.
```python
#!/usr/bin/python
#provide an RSS feed object for use in Django or Mako templates
import datetime, os.path, time, urllib, xml.dom.minidom
from xml.parsers.expat import ExpatError

class Node(object):
    def __init__(self, node):
        self.node = node
        self.title = self.getValue('title')
        self.link = self.getValue('link')
        self.description = self.getValue('description')

    def __str__(self):
        return self.title

    def getValue(self, tag):
        node = self.node.getElementsByTagName(tag)[0].firstChild
        data = None
        if node:
            data = self.node.getElementsByTagName(tag)[0].firstChild.data
            data = data.strip()
        return data

class Channel(Node):
    def items(self, displayCount):
        items = self.node.getElementsByTagName("item")
        if displayCount:
            items = items[:displayCount]
        feedItems = []
        for item in items:
            feedItem = Item(item)
            feedItems.append(feedItem)
        return feedItems

class Item(Node):
    def __init__(self, item):
        super(Item, self).__init__(item)
        self.pubDate = self.getValue('pubDate')

    #provide a datetime for use by Django's date filters
    def stamp(self):
        #Mon, 16 Mar 2009 13:02:19 +0000
        return datetime.datetime.strptime(self.pubDate, '%a, %d %b %Y %H:%M:%S +0000')

class Feed(object):
    def __init__(self, feedURL, cache=None):
        self.feedURL = feedURL
        if cache:
            self.cache = '/tmp/' + cache + '.rss'
        else:
            self.cache = None

    #is the cache fresh enough to use?
    def freshCache(self):
        if self.cache and os.path.exists(self.cache):
            #use cache if it is less than sixty minutes old
            freshTime = time.time() - 60*60
            if os.path.getmtime(self.cache) > freshTime:
                return True
        return False

    def readCache(self, forceRead=False):
        feed = None
        if forceRead or self.freshCache():
            try:
                feed = xml.dom.minidom.parse(open(self.cache))
            except:
                feed = None
        return feed

    def reCache(self):
        try:
            feed = xml.dom.minidom.parse(urllib.urlopen(self.feedURL))
        except ExpatError, message:
            print "ExpatError opening URL:", message
            feed = None
        except IOError, message:
            print "IOError opening URL:", message
            feed = None

        if self.cache:
            if not feed:
                feed = self.readCache(forceRead=True)
            if feed:
                xmlString = feed.toprettyxml(encoding="utf-8")
                #if last created by a different user, remove it first
                if os.path.exists(self.cache) and not os.access(self.cache, os.W_OK):
                    os.remove(self.cache)
                cacheFile = open(self.cache, 'w')
                cacheFile.write(xmlString)
                cacheFile.close()
        return feed

    def context(self, displayCount=None):
        context = {}
        feed = self.readCache()
        if not feed:
            feed = self.reCache()
        if feed:
            channel = Channel(feed.getElementsByTagName("channel")[0])
            feedItems = channel.items(displayCount)
            context['items'] = feedItems
            context['title'] = channel.title
            context['feedURL'] = self.feedURL
        return context
```
The Feed class caches, if possible, the output of the RSS feed, and tries not to make a request more often than once an hour.
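The hourly throttle comes down to comparing the cache file's modification time against the clock. Here is a standalone sketch of that check, written for Python 3; the `is_fresh` name and its `max_age_seconds` and `now` parameters are my own illustration and are not part of the Feed class.

```python
import os
import time


def is_fresh(path, max_age_seconds=60 * 60, now=None):
    """Return True if path exists and was modified within max_age_seconds."""
    if not path or not os.path.exists(path):
        return False
    if now is None:
        now = time.time()
    # Fresh means the file's mtime is later than the cutoff time
    return os.path.getmtime(path) > now - max_age_seconds
```

Passing `now` explicitly is only there to make the check easy to try out without waiting an hour.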
I saved this file in an app I have called “resources”. Then I added a “tweet” tag to my templatetags:
```python
import resources.rss
from django import template
register = template.Library()

def tweet(count=1):
    feed = resources.rss.Feed('http://twitter.com/statuses/user_timeline/20020901.rss', "twitter")
    context = feed.context(displayCount=count)
    context['webURL'] = 'http://twitter.com/hoboes'
    return template.loader.render_to_string("parts/tweet.html", context)
register.simple_tag(tweet)
```
This uses a dedicated Django template snippet to render the tweets:
```html
<ul class="twitter">
    {% for tweet in items %}
        <li><a href="{{ tweet.link }}">{{ tweet.title|stripLeadingText:"hoboes:" }}</a></li>
    {% endfor %}
</ul>
```
There’s a filter in there called “stripLeadingText” that I use to remove my Twitter name from the title:
```python
def stripLeadingText(text, toStrip):
    text = text.strip()
    if text.startswith(toStrip):
        text = text[len(toStrip):]
        text = text.strip()
    return text
register.filter('stripLeadingText', stripLeadingText)
```
I can then use a “tweet” template tag to display one or more tweets:
- {% tweet %}
- {% tweet 2 %}
- {% tweet 10 %}
You can also, of course, provide the tweets directly to any template via your views, or turn this into a tweet/endtweet loop for custom tweet HTML on every page.
A couple of caveats:
- You don’t want to use /tmp for your cache files. I just used it so that the example will most likely work.
- Remember to compare dates as UTC time, using, for example, datetime.datetime.utcnow().
- I’ve also seen pubDate formats which use GMT instead of +0000. You may need to account for that if you use multiple feeds and each uses a different format.
- If to-the-second timeliness is important, you’ll want to pay attention to the ETags that Twitter sends as well as throttling based on the cache’s timestamp. I make either one or two requests per day, and I’m not sure ETags matter on Twitter anyway.
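For the two date caveats, one possible approach is to let the standard library's RFC 822 date parser handle the pubDate instead of a fixed strptime format. This is a Python 3 sketch and not what the code above does; `parse_pub_date` and `age_in_seconds` are hypothetical helper names, and it compares timezone-aware datetimes rather than the naive ones you get from datetime.datetime.utcnow().

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_pub_date(pub_date):
    """Parse an RFC 822 style pubDate into a timezone-aware UTC datetime."""
    stamp = parsedate_to_datetime(pub_date)
    if stamp.tzinfo is None:
        # No usable offset in the string; treating it as UTC is an assumption
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp.astimezone(timezone.utc)


def age_in_seconds(pub_date, now=None):
    """Return how many seconds old an item is, comparing both sides as UTC."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - parse_pub_date(pub_date)).total_seconds()
```

Both "Mon, 16 Mar 2009 13:02:19 +0000" and "Mon, 16 Mar 2009 13:02:19 GMT" parse to the same moment this way, so feeds that differ only in that suffix need no special casing.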
August 14, 2009: Using ETag and If-Modified-Since
In Django Twitter tag and RSS object I wrote “If to-the-second timeliness is important, you’ll want to pay attention to the ETags that Twitter sends as well as throttling based on the cache’s timestamp.”
I ended up needing that for another project. The main difference is that you have to manage HTTP headers, and to manage HTTP headers you have to use urllib2 instead of urllib.
This change will require the addition of two methods, and modifying the reCache method. Also, change the import at the top of the file from urllib to urllib2.
Here’s the new reCache:
```python
def reCache(self):
    feedStream = self.openFeed()
    feed = None
    if feedStream:
        try:
            feed = xml.dom.minidom.parse(feedStream)
        except ExpatError, message:
            print "ExpatError opening URL:", message, self.feedURL
            feed = None
        except IOError, message:
            print "IOError opening URL:", message, self.feedURL
            feed = None

    if self.cache:
        if not feed:
            feed = self.readCache(forceRead=True)
        if feed:
            xmlString = feed.toprettyxml(encoding="utf-8")
            #if last created by a different user, remove it first
            self.ensureWritability(self.cache)
            cacheFile = open(self.cache, 'w')
            cacheFile.write(xmlString)
            cacheFile.close()
    return feed
```
This uses two new functions. One is easy: ensureWritability is the same code as before to make sure that the process that’s running this code can write to the cache file. I’ve moved it off into a separate method, because now we’re going to have to cache an ETag also.
```python
def ensureWritability(self, filepath):
    if os.path.exists(filepath) and not os.access(filepath, os.W_OK):
        os.remove(filepath)
```
The bulk of the new work is done in the openFeed method. The urllib2 module requires a lot more fiddling than urllib does, and managing ETags also requires some fiddling. You need to keep track of the feed’s ETag, if one is provided, and then on later requests ask for the feed only “if the new feed doesn’t match this old feed”.
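The general shape of such a conditional request can be sketched as follows. This is not the openFeed method itself; it is a Python 3 illustration (urllib2 became urllib.request there), and the names `build_conditional_request` and `fetch_if_changed` are hypothetical. The stored ETag goes back to the server in an If-None-Match header, the stored Last-Modified value in If-Modified-Since, and a 304 response means the cached copy is still good.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a request that asks for the feed only if it has changed."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None on HTTP 304."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Not modified: keep using the cached feed and old validators
            return None, etag, last_modified
        raise
    with response:
        body = response.read()
        return (body,
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"))
```

The returned ETag and Last-Modified values would need to be saved next to the cached feed so they can be sent with the next request.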
- Dive Into Python: Handling Last-Modified and ETag
- Using urllib2 to add special headers to request the page only if the page has changed.
- xml.dom.minidom
- “xml.dom.minidom is a light-weight implementation of the Document Object Model interface. It is intended to be simpler than the full DOM and also significantly smaller.”