Automatically distributing images within XHTML
The ability to safely and surely parse XHTML makes it easy to automate some boring tasks. For example, in my movie reviews I usually provide a handful of stills from the movie I’m reviewing. I don’t really care where they go on the page, just that they should be relatively evenly distributed.
When I first started including images in my reviews back in 2001, I was just using soupy HTML. I automated image distribution by counting up the number of “paragraphs” and hoping that the image didn’t fall into a sidebar or table. If the image did, then I’d either change the review so that the image-unsafe code section moved, or I’d switch the review to manual mode.
Now that I’m using XHTML, I don’t have to worry: I can parse the XML and loop through the top-level elements.
As in Excerpting partial XHTML using minidom, loose XHTML has to be wrapped in a single element (I’m using a div) and its ampersands have to be encoded before it can be parsed. Since I’m obviously going to be doing this for more than one purpose, it needs to be a function:
    from xml.dom import minidom

    def parseLooseXHTML(content):
        #wrap the fragment in a single root element
        content = '<div>' + content + '</div>'
        #encode the ampersands so that the parser accepts them
        content = content.replace('&', '&amp;')
        content = content.encode("utf-8")
        xhtml = minidom.parseString(content).childNodes[0]
        return xhtml
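Here's a minimal sketch of what the function hands back, assuming Python 3 and a made-up review snippet. The function is restated so the block runs on its own; the sample text and the helper name `top_level_tags` are illustrative assumptions, not part of the original code.

```python
from xml.dom import minidom

def parseLooseXHTML(content):
    # Wrap the fragment in a single root element and escape bare ampersands
    content = '<div>' + content + '</div>'
    content = content.replace('&', '&amp;')
    return minidom.parseString(content.encode('utf-8')).childNodes[0]

def top_level_tags(content):
    # Hypothetical helper: names of the elements directly under the wrapper div
    xhtml = parseLooseXHTML(content)
    return [node.tagName for node in xhtml.childNodes
            if node.nodeType == node.ELEMENT_NODE]

# Made-up sample: a paragraph, a table, and another paragraph
sample = ('<p>Abbott & Costello</p>\n'
          '<table><tr><td>1948</td></tr></table>\n'
          '<p>Fin.</p>')
tags = top_level_tags(sample)
```

Only the three top-level elements show up in `tags`; the row and cell inside the table are never visited, which is why an image can no longer land in the middle of a table.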
After that, it’s a simple process of taking some XHTML content and a list of media and looping:
    from django.template.loader import render_to_string

    #insert automatic media between top-level HTML
    def simplemedia(content, media):
        mediaCount = len(media)
        if not mediaCount:
            return content

        currentMedia = 0
        characterCount = len(content)
        currentCharacter = 0
        xhtml = parseLooseXHTML(content)
        htmlParts = []
        for tag in xhtml.childNodes:
            tagText = getElementText(tag)
            #place the next item once the text so far has reached its share
            if currentMedia < mediaCount:
                if currentCharacter >= characterCount*currentMedia/mediaCount:
                    #alternate the pull class from one item to the next
                    if currentMedia % 2:
                        mediaClass = ["pulleven"]
                    else:
                        mediaClass = ["pullodd"]
                    mediaHolder = media[currentMedia]
                    if mediaHolder.style:
                        mediaClass.append(mediaHolder.style.className)
                    mediaClass = ' '.join(mediaClass)
                    imageContext = {'link': mediaHolder.linkHTML(embed=True), 'style': mediaClass, 'caption': mediaHolder.caption}
                    htmlParts.append(render_to_string("parts/image_pull.html", imageContext))
                    currentMedia = currentMedia + 1
            #every element counts toward the running total and is kept
            currentCharacter = currentCharacter + len(tagText)
            htmlParts.append(tagText)
        content = "\n".join(htmlParts)
        return content
Each item in the list of media is an object that knows how to create its display HTML (method: linkHTML), and that contains properties for various parts of the media, such as the caption, any custom style, the title, and the URL.
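For trying the function out without a real media library, a stand-in along these lines would do. This is a sketch under assumptions: the class names `StillImage` and `MediaStyle`, the sample values, and what the `embed` flag does are all hypothetical. Only the attribute names `linkHTML`, `caption`, `style`, and `className` come from the code above.

```python
from html import escape

class MediaStyle:
    # Hypothetical holder for a custom CSS class name
    def __init__(self, className):
        self.className = className

class StillImage:
    # Hypothetical media item exposing the attributes simplemedia reads
    def __init__(self, url, title, caption='', style=None):
        self.url = url
        self.title = title
        self.caption = caption
        self.style = style

    def linkHTML(self, embed=False):
        # Assumption: embed=True means an inline image, otherwise a plain link
        if embed:
            return '<img src="%s" alt="%s" />' % (escape(self.url), escape(self.title))
        return '<a href="%s">%s</a>' % (escape(self.url), escape(self.title))

# Made-up sample item
still = StillImage('/stills/example.jpg', 'Example still',
                   caption='A sample caption', style=MediaStyle('wide'))
```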
I’m using Django, so I can use render_to_string to render a template using a dict of items. The template looks like this:
    <div class="imagepull {{ style }}">
        {{ link }}
        {% if caption %}
            <p class="caption">{{ caption }}</p>
        {% endif %}
    </div>
You could do the same thing with Mako or other templating systems.
And I’m using the same getElementText that I used in Excerpting partial XHTML using minidom:
    #clean an XHTML snippet and return its useful text
    def getElementText(element):
        #undo the ampersand encoding that was applied before parsing
        return element.toxml().strip().replace('&amp;', '&')
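The encoding before parsing and the decoding here are meant to cancel each other out. A minimal round-trip sketch, assuming Python 3 and a made-up paragraph that mixes a bare ampersand with existing entities:

```python
from xml.dom import minidom

def getElementText(element):
    # Serialize the node, then undo the ampersand encoding added before parsing
    return element.toxml().strip().replace('&amp;', '&')

# Made-up sample containing two existing entities and one bare ampersand
source = '<p>Fish &amp; chips &lt; caviar & toast</p>'

# Same preparation as parseLooseXHTML: wrap in a div and encode ampersands
wrapped = '<div>' + source.replace('&', '&amp;') + '</div>'
paragraph = minidom.parseString(wrapped.encode('utf-8')).childNodes[0].childNodes[0]

restored = getElementText(paragraph)
```

In this sample, `restored` comes back identical to `source`, entities and bare ampersand included.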
The simplemedia function keeps track of the size of each element as it loops, so that larger elements count for more than smaller elements when distributing the images or other media. And I get nicely spaced graphics interspersed throughout my reviews, or any other page that uses images that don’t need to be precisely placed.
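To see where the images end up, the placement rule can be pulled out on its own. This is a sketch, not the function above: `media_positions` is a hypothetical name, and it uses the sum of the element lengths as the total, where simplemedia uses the length of the raw content.

```python
def media_positions(element_lengths, media_count):
    # For each placed media item, the index of the element it goes in front of
    total = sum(element_lengths)
    positions = []
    seen = 0      # characters passed so far
    current = 0   # next media item to place
    for index, length in enumerate(element_lengths):
        # Item number `current` is due once `seen` reaches its share of the text
        if current < media_count and seen >= total * current / media_count:
            positions.append(index)
            current += 1
        seen += length
    return positions

# Made-up element sizes: one long element between several short ones
spread = media_positions([100, 900, 100, 400, 500], 3)

# More media than elements: at most one item is placed per element
crowded = media_positions([500, 500], 3)
```

With three images, the second one waits until the long 900-character element has passed instead of landing after the first short paragraph. One consequence of placing at most one item per element is that a page with fewer top-level elements than images won't show them all.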
- Django
- “Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.” Oh, the sweet smell of pragmatism.
- Excerpting partial XHTML using minidom
- You can use xml.dom.minidom to parse partial XHTML as long as you use a few tricks and don’t mind that getElementById doesn’t work.
- Mako
- “Mako is an embedded Python language, which refines the familiar ideas of componentized layout and inheritance to produce one of the most straightforward and flexible models available, while also maintaining close ties to Python calling and scoping semantics.”
- Movie and DVD Reviews
- The best and not-so-best movies available on DVD, and whatever else catches my eye.