Code formatting Django tag
Long ago I wrote a Perl script for representing programming code in HTML using lists. Recently I’ve been converting my pages from HTML to XHTML and have had to redo that script: I had mistakenly nested lists by placing sublists as direct children of the parent list. That isn’t right: sublists need to be children of a list item.
It occurred to me that since my new pages also use Django, I should be able to write a Django template tag that formats code on the fly; any change to the tag would then apply automatically to every code snippet on my site. After learning a bit about xml.dom.minidom for my project on excerpting partial XHTML, I realized it was perfect for the task.
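Two minidom features make it a good fit. Text added with createTextNode() is escaped when the tree is serialized, so code containing < or & still comes out as well-formed XHTML. A sublist can also be appended to an existing list item, which is the nesting XHTML requires. Here is a minimal stand-alone illustration using a made-up two-line snippet:

```python
from xml.dom import minidom

# Build <ul><li>line<ul><li>nested line</li></ul></li></ul> by hand.
document = minidom.Document()
outer = document.createElement("ul")

parent_item = document.createElement("li")
parent_item.appendChild(document.createTextNode("if a < b:"))
outer.appendChild(parent_item)

# The sublist goes inside the list item, not directly inside the outer <ul>.
sublist = document.createElement("ul")
child_item = document.createElement("li")
child_item.appendChild(document.createTextNode("print(a & b)"))
sublist.appendChild(child_item)
parent_item.appendChild(sublist)

print(outer.toxml())
# <ul><li>if a &lt; b:<ul><li>print(a &amp; b)</li></ul></li></ul>
```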
from django import template
import re
from xml.dom import minidom

#display code as a list
def do_code(parser, token):
    #collect everything up to {% endcode %}, then discard the endcode tag itself
    nodelist = parser.parse(('endcode',))
    parser.delete_first_token()
    return codeNode(nodelist)

class codeNode(template.Node):

    def __init__(self, nodelist):
        self.nodelist = nodelist

    def render(self, context):
        #formerly: lines = self.rawSource().split("\n"), which only works with DEBUG=True
        lines = self.nodelist.render(context).split("\n")
        document = minidom.Document()
        code = self.createList(document, lines)
        code.setAttribute('class', 'code')
        code = code.toprettyxml()
        return code

    #count the tabs at the start of a line
    def level(self, line):
        if line.startswith("\t"):
            lineLevel = len(re.findall('^\t+', line)[0])
        else:
            lineLevel = 0
        return lineLevel

    def clean(self, line, lineLevel):
        cleanLine = line.strip()
        #only reset line level for lines with something in them
        if cleanLine:
            lineLevel = self.level(line)
        return cleanLine, lineLevel

    def createList(self, document, lines, listLevel=0):
        ul = document.createElement("ul")
        #no sections from empty lines at the start of the code
        #but afterwards, empty lines mean that the next element needs a section class
        started = False
        markSection = False
        currentLine = 0
        maxLine = len(lines)
        lineLevel = listLevel
        while currentLine < maxLine:
            line = lines[currentLine]
            cleanLine, lineLevel = self.clean(line, lineLevel)
            #if the indentation has grown, send the sublines out to make a new list
            if lineLevel > listLevel:
                subLines = []
                while currentLine < maxLine and lineLevel > listLevel:
                    subLines.append(line)
                    currentLine = currentLine + 1
                    if currentLine < maxLine:
                        line = lines[currentLine]
                        cleanLine, lineLevel = self.clean(line, lineLevel)
                    else:
                        cleanLine = ''
                        lineLevel = 0
                markSection, subUL = self.createList(document, subLines, listLevel+1)
                #a sublist must live inside an LI; make an empty one if there is none yet
                if not started:
                    li = document.createElement("li")
                    ul.appendChild(li)
                ul.childNodes[-1].appendChild(subUL)
            #what's left is text; create text node and put it in an LI
            if cleanLine:
                li = document.createElement("li")
                li.appendChild(document.createTextNode(cleanLine))
                #after blank lines, give the list item a special class
                if markSection:
                    li.setAttribute('class', 'section')
                    markSection = False
                ul.appendChild(li)
                started = True
            elif started:
                markSection = True
            currentLine = currentLine + 1
        #sublists need to return both the list and whether or not there were blank lines left over
        if listLevel > 0:
            return markSection, ul
        else:
            return ul

register = template.Library()
register.tag('code', do_code)
Originally I did one thing a bit differently: rather than rendering the nodelist that comes back from parser.parse(), I pulled the raw source out of it (probably incorrectly, but I couldn’t find any documentation on it). That way I didn’t have to worry about Django tags inside my code; they could be displayed, too. The rawSource() method of a RawNode class handled this: it looped through each node in the nodelist to find where the code started and ended in the raw template source, and extracted that portion. This seemed useful enough that I made RawNode a parent class that other template tags could use. Unfortunately, the raw source is only available if DEBUG=True is set in settings.py, which is not a viable option, so I’ve modified the code above to render the nodelist instead. For historical reasons, here is the old class for retrieving the raw source:
from django import template

#this node is for template tags that need to be able to return their raw source
#node.source only exists when debugging is on, so this fails with DEBUG=False
class RawNode(template.Node):
    #get the raw code: everything from the end of the opening tag
    #to the end of the last node inside the block
    def rawSource(self):
        code, codeRange = self.source
        start = codeRange[1]
        end = start
        for node in self.nodelist:
            if hasattr(node, "source"):
                stringNode, codeRange = node.source
                end = codeRange[1]
        return code.source[start:end]
Instead, you’ll need to use the “templatetag” template tag to display Django’s template delimiters in code snippets:
<p>Here’s an example of a Django tag {% templatetag openblock %} link whitehouse {% templatetag closeblock %}.</p>
Code to be displayed goes between {% code %} and {% endcode %} tags. Because the contents are rendered as a template before they are formatted, the only code that can’t be displayed as-is is Django template syntax itself ({% %} tags and {{ }} variables), which has to be escaped with templatetag as above.
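Escaping longer snippets by hand gets tedious. The following is a hypothetical helper, not part of Django or of the code tag above, that converts raw template syntax into templatetag calls before a snippet is pasted into a page. It assumes only the two-character delimiters need escaping, and it does the work in a single regular-expression pass so that the {% and %} it inserts are never rewritten themselves. The argument names are the ones Django’s templatetag tag accepts.

```python
import re

# Django's templatetag arguments, keyed by the delimiter each one outputs.
TEMPLATETAG_NAMES = {
    "{%": "openblock",
    "%}": "closeblock",
    "{{": "openvariable",
    "}}": "closevariable",
    "{#": "opencomment",
    "#}": "closecomment",
}

# Two-character delimiters only; single braces are left alone.
DELIMITER = re.compile("|".join(re.escape(d) for d in TEMPLATETAG_NAMES))


def escape_template_syntax(code):
    """Hypothetical helper: replace each Django delimiter with a templatetag call."""
    # re.sub scans the original string once, so the "{%" and "%}" that
    # this substitution inserts are never matched again.
    return DELIMITER.sub(
        lambda match: "{%% templatetag %s %%}" % TEMPLATETAG_NAMES[match.group(0)],
        code,
    )


print(escape_template_syntax("{% link whitehouse %}"))
# {% templatetag openblock %} link whitehouse {% templatetag closeblock %}
```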
Otherwise, this is pretty straightforward. The “createList” method loops through every line; if a line is indented further than expected, createList calls itself with all of the sub-lines and increments the expected indentation level. Sub-levels are added as childNodes to the most recent element (which should always be an <LI>). Blank lines mean that the next list item is marked with the class “section”, which can be styled in the CSS file to put extra space in front of it.
When the overall list returns to the render method, render adds the “code” class to the top-level element (in this case, a <UL>) and converts the whole thing from XML nodes to a string of XHTML with toprettyxml().
The tag can start mid-block as well: if it encounters immediate indentation, createList recurses until it gets to content, and then makes up empty list items to hold the lists.
    ...
    markSection, subUL = self.createList(document, subLines, listLevel+1)
    if not started:
        li = document.createElement("li")
        ul.appendChild(li)
    ul.childNodes[-1].appendChild(subUL)
#what's left is text; create text node and put it in an LI
if cleanLine:
    ...
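For comparison, here is a sketch of the same nesting built without recursion, by keeping a stack of open lists, one per indentation level. This is a different technique from createList, not the tag’s actual code, and it runs without Django; the names code_to_list and tab_level are hypothetical. It follows the rules described above: only leading tabs count as indentation, the first item after a blank line gets the “section” class, and code that starts mid-block gets empty placeholder list items.

```python
from xml.dom import minidom


def tab_level(line):
    """Count leading tabs, as codeNode.level() does."""
    return len(line) - len(line.lstrip("\t"))


def code_to_list(text):
    """Hypothetical stack-based alternative to codeNode.createList()."""
    document = minidom.Document()
    root = document.createElement("ul")
    root.setAttribute("class", "code")
    stack = [root]  # stack[n] is the open <ul> for indentation level n
    started = False
    mark_section = False

    for line in text.split("\n"):
        content = line.strip()
        if not content:
            # Blank lines only start a section once some code has appeared.
            mark_section = mark_section or started
            continue

        level = tab_level(line)
        # Dedent: close any lists deeper than this line.
        del stack[level + 1:]
        # Indent: open one <ul> per missing level, inside the newest <li>.
        while len(stack) <= level:
            parent = stack[-1]
            if not parent.childNodes:
                # Code that starts mid-block gets an empty placeholder item.
                parent.appendChild(document.createElement("li"))
            sublist = document.createElement("ul")
            parent.childNodes[-1].appendChild(sublist)
            stack.append(sublist)

        item = document.createElement("li")
        item.appendChild(document.createTextNode(content))
        if mark_section:
            item.setAttribute("class", "section")
            mark_section = False
        stack[-1].appendChild(item)
        started = True

    # Compact toxml() output, rather than toprettyxml(), keeps results easy to compare.
    return root.toxml()


print(code_to_list("def f(a, b):\n\tif a < b:\n\t\treturn a\n\n\treturn b"))
# <ul class="code"><li>def f(a, b):<ul><li>if a &lt; b:<ul><li>return a</li></ul></li><li class="section">return b</li></ul></li></ul>
```

The stack version trades recursion for explicit bookkeeping: dedenting is just truncating the stack, so there is no need to collect sub-lines or pass a leftover markSection flag back up from each recursive call.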
If this works, I’ll have to take a look at some of the syntax highlighters available for Python, such as Pygments.
- Django using too much memory? Turn off DEBUG=True!
  DEBUG=False can save hundreds of megabytes in Django command-line scripts, and probably in Django web processes.
- Excerpting partial XHTML using minidom
  You can use xml.dom.minidom to parse partial XHTML as long as you use a few tricks and don’t mind that getElementById doesn’t work.
- Pygments
  “Pygments is a generic syntax highlighter for general use in all kinds of software such as forum systems, wikis or other applications that need to prettify source code.”
- Representing code in HTML
  A minor epiphany that may not be new to others on how to display programming and HTML code in HTML.
June 13, 2009: I’ve modified the rawSource method to more reliably (I hope) get the full raw source of the tag.
June 14, 2009: It turns out that the raw source is only available when DEBUG=True is set. That’s too bad.