Automatically distributing images within XHTML

Jerry Stratton, August 20, 2009

The ability to safely and surely parse XHTML makes it easy to automate some boring tasks. For example, in my movie reviews I usually provide a handful of stills from the movie I’m reviewing. I don’t really care where they go on the page, just that they should be relatively evenly distributed.

When I first started including images in my reviews back in 2001, I was just using soupy HTML. I automated image distribution by counting up the number of “paragraphs” and hoping that the image didn’t fall into a sidebar or table. If the image did, then I’d either change the review so that the image-unsafe code section moved, or I’d switch the review to manual mode.

Now that I’m using XHTML, I don’t have to worry: I can parse the XML and loop through the top-level elements.

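As I did in Excerpting partial XHTML using minidom, I need two tricks to parse loose XHTML. It has to be surrounded with a single element (I'm using a div), because an XML document can have only one root element. And its ampersands have to be encoded, because XML won't accept a bare ampersand and doesn't know about HTML entities such as &eacute;. Since I'm obviously going to be doing this for more than one purpose, it needs to be a function: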

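```python
from xml.dom import minidom

#wrap loose XHTML in a single root element and parse it
def parseLooseXHTML(content):
    content = '<div>' + content + '</div>'
    #encode every ampersand so that entities such as &eacute; survive parsing;
    #getElementText undoes this
    content = content.replace('&', '&amp;')
    content = content.encode("utf-8")
    xhtml = minidom.parseString(content).childNodes[0]
    return xhtml
```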
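To see why both tricks are needed, here is a small standalone sketch (Python 3, standard library only) using a hypothetical parse_loose_xhtml that does the same wrapping and escaping. Escaping without wrapping fails on the second top-level element; wrapping without escaping fails on the undefined &eacute; entity:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def parse_loose_xhtml(content):
    # Hypothetical Python 3 equivalent of parseLooseXHTML: escape every
    # ampersand, wrap in one root element, and return that root element
    wrapped = '<div>' + content.replace('&', '&amp;') + '</div>'
    return minidom.parseString(wrapped).documentElement

snippet = '<p>Caf&eacute; society</p>\n<p>Second paragraph</p>'

# Escaped but not wrapped: two top-level elements. Wrapped but not
# escaped: &eacute; is not one of XML's predefined entities.
for attempt in (snippet.replace('&', '&amp;'), '<div>' + snippet + '</div>'):
    try:
        minidom.parseString(attempt)
    except ExpatError as error:
        print('rejected:', error)

root = parse_loose_xhtml(snippet)
print([node.nodeName for node in root.childNodes])  # ['p', '#text', 'p']
```

The #text node is the newline between the two paragraphs; getElementText strips it down to an empty string.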

After that, it’s a simple process of taking some XHTML content and a list of media and looping:

```python
from django.template.loader import render_to_string

#insert automatic media between top-level HTML
#(parseLooseXHTML and getElementText are defined in this article)
def simplemedia(content, media):
    mediaCount = len(media)
    if not mediaCount:
        return content
    currentMedia = 0
    characterCount = len(content)
    currentCharacter = 0
    xhtml = parseLooseXHTML(content)
    htmlParts = []
    for tag in xhtml.childNodes:
        tagText = getElementText(tag)
        if currentMedia < mediaCount:
            #place the next item once we're far enough through the text
            if currentCharacter >= characterCount*currentMedia/mediaCount:
                #the first item (index 0) is odd, the second even, and so on
                if currentMedia % 2:
                    mediaClass = ["pulleven"]
                else:
                    mediaClass = ["pullodd"]
                mediaHolder = media[currentMedia]
                if mediaHolder.style:
                    mediaClass.append(mediaHolder.style.className)
                mediaClass = ' '.join(mediaClass)
                imageContext = {
                    'link': mediaHolder.linkHTML(embed=True),
                    'style': mediaClass,
                    'caption': mediaHolder.caption,
                }
                htmlParts.append(render_to_string("parts/image_pull.html", imageContext))
                currentMedia = currentMedia + 1
            currentCharacter = currentCharacter + len(tagText)
        htmlParts.append(tagText)
    content = "\n".join(htmlParts)
    return content
```

Each item in the list of media is an object that knows how to create its own display HTML (through its linkHTML method) and that has properties for the various parts of the media, such as the caption, any custom style, the title, and the URL.
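simplemedia only relies on three things from each media object: a linkHTML method that accepts embed, a caption, and an optional style with a className. A minimal stand-in for testing might look like this; the Media and MediaStyle classes, and the markup linkHTML returns, are hypothetical, not the site's actual media objects:

```python
from dataclasses import dataclass
from html import escape
from typing import Optional

# Hypothetical stand-ins for the media objects simplemedia expects; it only
# uses linkHTML(embed=...), caption, and style.className.
@dataclass
class MediaStyle:
    className: str

@dataclass
class Media:
    url: str
    title: str
    caption: str = ''
    style: Optional[MediaStyle] = None

    def linkHTML(self, embed=False):
        # Assumption: embedded media render as an <img>, otherwise as a link
        if embed:
            return '<img src="%s" alt="%s" />' % (escape(self.url), escape(self.title))
        return '<a href="%s">%s</a>' % (escape(self.url), escape(self.title))

still = Media('/images/still1.jpg', 'Opening scene',
              caption='The opening scene', style=MediaStyle('wide'))
print(still.linkHTML(embed=True))  # <img src="/images/still1.jpg" alt="Opening scene" />
```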

I’m using Django, so I can use render_to_string to render a template using a dict of items. The template looks like this:

```html
<div class="imagepull {{ style }}">
    {{ link }}
    {% if caption %}
        <p class="caption">{{ caption }}</p>
    {% endif %}
</div>
```

You could do the same thing with Mako or other templating systems.
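For instance, a plain-Python sketch of the same fragment, with no template engine at all, might look like the following; image_pull is a hypothetical name. The caption is escaped, while link goes in as-is because it is already markup. Note that Django autoescapes {{ link }} as well, so in the real template linkHTML's result has to be marked safe (with mark_safe, or the |safe filter in the template) to come out as HTML.

```python
from html import escape

def image_pull(link, style, caption=''):
    # Hypothetical stand-in for parts/image_pull.html: link is trusted
    # markup and goes in as-is; style and caption are text, so escape them
    parts = ['<div class="imagepull %s">' % escape(style), link]
    if caption:
        parts.append('<p class="caption">%s</p>' % escape(caption))
    parts.append('</div>')
    return '\n'.join(parts)

print(image_pull('<img src="still.jpg" alt="" />', 'pullodd wide', 'Tom & Jerry'))
```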

And I’m using the same getElementText that I used in Excerpting XHTML:

```python
#clean an XHTML snippet and return its useful text
def getElementText(element):
    #toxml() escapes each & as &amp; again; undo the encoding parseLooseXHTML added
    return element.toxml().strip().replace('&amp;', '&')
```
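The two replace calls are meant to cancel out: parseLooseXHTML turns every & into &amp;, toxml() escapes it again on the way out, and replacing &amp; with & then restores the original markup, entities included. A quick standalone check of the round trip, using hypothetical Python 3 versions of both functions:

```python
from xml.dom import minidom

def parse_loose_xhtml(content):
    # Same wrapping and ampersand encoding as parseLooseXHTML
    wrapped = '<div>' + content.replace('&', '&amp;') + '</div>'
    return minidom.parseString(wrapped).documentElement

def get_element_text(element):
    # toxml() re-escapes each &; undo that to get the original markup back
    return element.toxml().strip().replace('&amp;', '&')

snippet = '<p>Caf&eacute; &amp; bar</p>\n<p>Second</p>'
parts = [get_element_text(node) for node in parse_loose_xhtml(snippet).childNodes]
print(parts)  # ['<p>Caf&eacute; &amp; bar</p>', '', '<p>Second</p>']
```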

The simplemedia function keeps a running count of the characters in each top-level element as it loops, and places the next item once that count reaches the next evenly spaced fraction of the content's total length. That way, larger elements count for more than smaller elements when distributing the images or other media. And I get nicely spaced graphics interspersed throughout my reviews, or any other page with images that don't need to be precisely placed.
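The placement rule on its own is easy to check without Django or minidom. In this sketch, the hypothetical media_slots takes the lengths of the top-level elements and returns the index of the element each media item goes in front of. It follows simplemedia's rule, except that it uses the sum of the element lengths as the total rather than len(content):

```python
def media_slots(lengths, media_count):
    """Return the index of the element each media item is placed before.

    Same rule as simplemedia: item i goes before the first element at which
    the running character count has reached total * i / media_count. At most
    one item goes before any element, so surplus items are dropped.
    """
    total = sum(lengths)
    slots = []
    seen = 0  # characters in the elements already passed
    for index, length in enumerate(lengths):
        if len(slots) < media_count and seen >= total * len(slots) / media_count:
            slots.append(index)
        seen += length
    return slots

print(media_slots([100, 400, 100, 100, 300], 3))  # [0, 2, 4]
print(media_slots([500, 100, 100, 100, 200], 2))  # [0, 1]
```

In the second call, the 500-character opening paragraph is half of the text, so the second image lands right after it rather than further down the page.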

Django: “Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.” Oh, the sweet smell of pragmatism.
Excerpting partial XHTML using minidom: You can use xml.dom.minidom to parse partial XHTML as long as you use a few tricks and don’t mind that getElementById doesn’t work.
Mako: “Mako is an embedded Python language, which refines the familiar ideas of componentized layout and inheritance to produce one of the most straightforward and flexible models available, while also maintaining close ties to Python calling and scoping semantics.”
Movie and DVD Reviews: The best and not-so-best movies available on DVD, and whatever else catches my eye.



Contents of Negative Space™ as a whole Copyright © 1994-2025 Jerry Stratton. Individual copyrights remain held by their respective authors unless they specify otherwise. Site titles, such as Negative Space, Strange Bedfellows, Biblyon Broadsheet, Highland Games, and FireBlade Coffeehouse are trademarks of Jerry Stratton.

Code and code snippets, to the extent that they are copyrightable, may be re-distributed under the terms of the GNU General Public License 3.

Automatically distributing images within XHTML last modified August 14th, 2009.
