Multiple column PDF generation in Python
My first step was to generate a PDF in Snakelets. It wasn’t a particularly useful PDF, but it showed that PDF generation was reasonably easy using the ReportLab toolkit.
The second step was to embed Python in Django by combining Mako and Django templates.
The final step is to generate a multi-column PDF with common content on each page. I’m going to step back into my Snakelets testbed to get this working, and then use what I learn to create the Django version.
The process of creating multiple column layouts in Platypus is conceptually quite simple. First, you create the page type (such as letter size, or landscape-oriented A4). Second, you place frames on the page. And finally, you pour some text into the page. Platypus will flow it into each frame in order, creating new pages with the same layout when the current page fills up.
Along the way, we can give the page more than one layout and control which one is used; and we can tell the page to let us know whenever a new page is created, so that we can add common elements to each page, such as a page number.
Multiple columns
Multiple columns turned out to be easier than I expected. Instead of using the SimpleDocTemplate as in the previous examples, we need to use BaseDocTemplate and create our own frames using Platypus. Platypus will flow the text into each frame as necessary.
I completely rewrote the makePDF method on the PDF class of my Snakelets blog. The first thing, of course, is to import the appropriate libraries.

    import reportlab.platypus as platypus
    import reportlab.lib
The PDF class is now very similar to the previous one, except that it uses BaseDocTemplate instead of SimpleDocTemplate.

    #creates a PDF version of the site
    class PDF(Blog):
        def serve(self, request, response):
            response.setContentType("application/pdf")
            #display it normally in the browser if possible, but if they save it give it a useful name
            #the filename is quoted because it contains a space
            response.setHeader("Content-Disposition", 'inline; filename="Second Chances.pdf"')
            out=response.getOutput()
            self.makePDF(out)

I’ve added a content-disposition header. Instead of attachment, as you normally put into content-disposition, I’ve specified that the file should be displayed normally (which on some browsers will still be as an attachment if they don’t have the capability to inline-view PDF files). But by giving it a filename, if they do save the PDF they should get that as the default filename for the file.
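
As a minimal sketch of how that header value could be assembled (Python 3, standard library only), the hypothetical helper `content_disposition` below wraps the filename in a quoted string, which is the safe form whenever the name contains spaces. The helper name and its default are my own illustration and not part of Snakelets.

```python
def content_disposition(filename, disposition="inline"):
    """Build a Content-Disposition value with the filename as a quoted string.

    Hypothetical helper for illustration; it handles plain ASCII names only.
    """
    # Backslashes and double quotes must be escaped inside a quoted string.
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return '{0}; filename="{1}"'.format(disposition, escaped)


# "inline" asks the browser to display the PDF; the filename is only a
# suggestion that is used if the reader decides to save the file.
header_value = content_disposition("Second Chances.pdf")
print(header_value)
```

Passing "attachment" as the second argument would ask the browser to download the file instead of displaying it.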

        def makePDF(self, destination):
            posts = []
            #let's set up some styles
            inch = reportlab.lib.units.inch
            style = reportlab.lib.styles.getSampleStyleSheet()
            #article styles
            headlineStyle = style["Heading2"]
            paraStyle = style["Normal"]
            paraStyle.spaceAfter = inch*.04
            paraStyle.alignment=reportlab.lib.enums.TA_JUSTIFY
            #go through each blog post and store its title and content
            for postKey in self.app.posts.posts:
                #get the post
                post = self.app.posts.posts[postKey]
                body = post.body
                title = post.title
                #the parts of each post will need to be kept together
                items = []
                headline = platypus.Paragraph(title, headlineStyle)
                items.append(headline)
                for paragraph in body.split("\n"):
                    #I have never understood encode/decode
                    #and have no idea why this is the correct decoding on my Mac
                    para = paragraph.decode("cp1252")
                    para = platypus.Paragraph(para, paraStyle)
                    items.append(para)
                item = platypus.KeepTogether(items)
                posts.append(item)

All that the above code does is loop through each post and add it to the list of items that need to be on the page. Each post is wrapped in a KeepTogether flowable, so that a post will not break across page (or, in this case, frame) boundaries.

            #create the basic page and frames
            document = platypus.BaseDocTemplate(destination, pagesize=reportlab.lib.pagesizes.letter)
            frameCount = 2
            frameWidth = document.width/frameCount
            frameHeight = document.height-.05*inch
            frames = []
            #construct the column frames
            for frame in range(frameCount):
                leftMargin = document.leftMargin + frame*frameWidth
                column = platypus.Frame(leftMargin, document.bottomMargin, frameWidth, frameHeight)
                frames.append(column)
            template = platypus.PageTemplate(frames=frames)
            document.addPageTemplates(template)
            document.build(posts)
            return

I’ve made the number of frames flexible so that I can also create three or four columns by changing one variable. This will be useful when I switch to the restaurant listing. However many frames we want, the code loops through and sets the left margin to march across the page; the bottom margin, width, and height are the same for each frame.
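
To make the arithmetic concrete, here is a small sketch (Python 3, no ReportLab required) that computes the same rectangles the loop hands to Frame. The function name `column_frames` and its parameters are hypothetical, and it assumes equal margins on all four sides; each tuple is (x, y, width, height) in points, measured from the bottom-left corner of the page.

```python
def column_frames(page_width, page_height, margin, count, header_gap=0.0):
    """Return (x, y, width, height) for `count` equal columns.

    Hypothetical helper: assumes the same margin on every side and
    coordinates measured from the bottom-left corner of the page.
    """
    usable_width = page_width - 2 * margin
    usable_height = page_height - 2 * margin - header_gap
    frame_width = usable_width / count
    return [
        # Only x changes from column to column; y, width and height are shared.
        (margin + index * frame_width, margin, frame_width, usable_height)
        for index in range(count)
    ]


# US letter is 612 x 792 points; 72 points is one inch.
for rect in column_frames(612, 792, 72, 3):
    print(rect)
```

Changing the count argument is the equivalent of changing frameCount in the method above.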
The frames need to be combined into a template using PageTemplate, and the template then needs to be added to the document using addPageTemplates. Finally, the code builds the document as normal. Here’s what that code creates:
A different first page
You’ll notice that the function to add a template to a document is plural. We can add as many templates as we want, and choose the appropriate one for the next page.
When we have more than one template, we need to give each template an ID so that we can refer to it elsewhere.
For the blog, wouldn’t it be nice to have a title page that contains the title and subtitle of the blog, with only short columns below the title?
First we’ll need to set up some styles for the title page. Above the “article styles” section, add:

    #title page styles
    titleStyle = style["Title"]
    titleStyle.fontSize=40
    titleStyle.leading = titleStyle.fontSize*1.1
    #need to copy the object or style changes will apply to any incarnation of "Normal"
    subTitleStyle = copy.copy(style["Normal"])
    subTitleStyle.alignment=reportlab.lib.enums.TA_CENTER
    subTitleStyle.fontName="Times-Italic"

Also, import the copy module:

    import copy

We need to copy the Normal style, because these styles are all objects. In Python, when you use “=” with an object on the right, you’re getting a reference to that object. But we’re already using the Normal style for our standard paragraphs. Without copying the object, any changes we make to subTitleStyle will also apply to paraStyle, and vice versa.
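
The difference between a reference and a copy can be seen without ReportLab at all. In this sketch the Style class is a hypothetical stand-in for a paragraph style, not the real ReportLab class.

```python
import copy


class Style(object):
    """Hypothetical stand-in for a paragraph style object."""

    def __init__(self, font_name, alignment):
        self.font_name = font_name
        self.alignment = alignment


normal = Style("Times-Roman", "left")

# Plain assignment gives a second name for the same object,
# so a change made through `alias` shows up in `normal` too.
alias = normal
alias.alignment = "justify"

# copy.copy makes an independent (shallow) copy,
# so `normal` keeps its own font name.
subtitle = copy.copy(normal)
subtitle.font_name = "Times-Italic"

print(normal.alignment, normal.font_name, subtitle.font_name)
```

Note that copy.copy is shallow: attributes that are themselves mutable objects would still be shared between the original and the copy.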
Now we need to add the title and subtitle to the list of items that need to appear in our document. Above the for loop that loops through each post, add:

    #now create the title page
    title = self.getTitle()
    description = self.getDescription()
    posts.append(platypus.Paragraph(title, titleStyle))
    posts.append(platypus.Paragraph(description, subTitleStyle))
    #done with the title info, move to the next frame and queue up the later page template
    posts.append(platypus.FrameBreak())
    posts.append(platypus.NextPageTemplate("laterPages"))

This creates the title and subtitle paragraphs using the styles we’ve set up and appends them to the “posts” list. The trick here is that we’re going to have two different templates. One template is for the first page (the title page), and the other template will be for all of the rest of the pages. We’ll call that second template “laterPages”.
The first page template will have an extra frame at the top for the headline. The FrameBreak line here adds a frame break immediately after the subtitle, so that only the title and subtitle go into that first frame. The NextPageTemplate tells Platypus that, the next time it breaks the page, it should use the template called “laterPages”.
We set up the first page template right along with the standard template. Above the “construct the column frames” for loop, add:

    #title page frames
    firstPageHeight = 3.5*inch
    #height left over for the columns below the title
    firstPageBottom = frameHeight-firstPageHeight
    framesFirstPage = []
    #the title frame starts where the first-page columns end
    titleFrame = platypus.Frame(document.leftMargin, document.bottomMargin+firstPageBottom, document.width, firstPageHeight)
    framesFirstPage.append(titleFrame)

This calculates how much space the title frame will use, and how much space is left for the columns, on the title page. Inside the “construct the column frames” for loop, add:

    #columns for the first page
    firstPageColumn = platypus.Frame(leftMargin, document.bottomMargin, frameWidth, firstPageBottom)
    framesFirstPage.append(firstPageColumn)

So now, in addition to creating a set of frames in the “frames” list that span from the top to the bottom of the page, we’re also creating a set of frames in the “framesFirstPage” list, which span only from the bottom of the title to the bottom of the page.
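
A rough sketch of the title-page geometry, again in plain Python 3 so it can be checked without ReportLab. The function `first_page_layout` and its parameters are hypothetical; it assumes equal margins on every side, leaves out the small header allowance, and assumes the title frame should sit directly on top of the short columns.

```python
def first_page_layout(page_width, page_height, margin, count, title_height):
    """Return the title rectangle and the short column rectangles.

    Hypothetical helper: rectangles are (x, y, width, height) in points,
    with the origin at the bottom-left corner of the page.
    """
    usable_width = page_width - 2 * margin
    usable_height = page_height - 2 * margin
    # Whatever the title does not use is left for the columns.
    column_height = usable_height - title_height
    frame_width = usable_width / count
    # The title frame starts exactly where the columns end.
    title = (margin, margin + column_height, usable_width, title_height)
    columns = [
        (margin + index * frame_width, margin, frame_width, column_height)
        for index in range(count)
    ]
    return title, columns


# Letter page, one-inch margins, two columns, a 3.5 inch (252 point) title.
title_rect, column_rects = first_page_layout(612, 792, 72, 2, 252)
print(title_rect)
print(column_rects)
```

Checking that the top of each column equals the bottom of the title rectangle is a quick way to confirm that the frames neither overlap nor leave a gap.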
Finally, we’re now using more than one template, so we have to set up that list. Replace the “template=” line and the addPageTemplates line with:

    templates = []
    templates.append(platypus.PageTemplate(frames=framesFirstPage, id="firstPage"))
    templates.append(platypus.PageTemplate(frames=frames, id="laterPages"))
    document.addPageTemplates(templates)

Instead of giving addPageTemplates a single template, we’re giving it a list of templates. The first template is made of the frames in the framesFirstPage list, and the second is the same template we made previously. Each template has an ID.
This should produce the following PDF:
It’s beginning to look like a perfectly good PDF document.
A common header
Most long documents will have page numbers, or some kind of header or footer. This is not something that can be done in the template itself, however, because the content tends to change from page to page. What we can do is add a marker to each template for which we need special treatment, telling Platypus to call a special function every time a new page is created using that template.
To the laterPages template, add “onPage=self.addHeader”:

    templates.append(platypus.PageTemplate(frames=frames, id="laterPages", onPage=self.addHeader))

This tells Platypus to call the addHeader method whenever a new page is created from the laterPages template. Then add the addHeader method to the PDF class:

        #display the title of the blog and the current page
        def addHeader(self, canvas, document):
            canvas.saveState()
            title = self.getTitle()
            fontsize = 12
            fontname = 'Times-Roman'
            headerBottom = document.bottomMargin+document.height+document.topMargin/2
            bottomLine = headerBottom - fontsize/4
            topLine = headerBottom + fontsize
            lineLength = document.width+document.leftMargin
            canvas.setFont(fontname,fontsize)
            if document.page % 2:
                #odd page: put the page number on the right and align right
                title += "-" + str(document.page)
                canvas.drawRightString(lineLength, headerBottom, title)
            else:
                #even page: put the page number on the left and align left
                title = str(document.page) + "-" + title
                canvas.drawString(document.leftMargin, headerBottom, title)
            #draw some lines to make it look cool
            canvas.setLineWidth(1)
            canvas.line(document.leftMargin, bottomLine, lineLength, bottomLine)
            canvas.line(document.leftMargin, topLine, lineLength, topLine)
            canvas.restoreState()

When Platypus calls this method, it will send us the canvas for the page, and the document that it’s writing to the page. Judging from the sample code out there, you’ll want to always begin your method by saving the canvas’s state, and end it by restoring the canvas’s state. I think that this ensures that if some other function is still using the canvas, we don’t end up inadvertently changing the font name, font size, or other settings under its nose.
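
One way to make sure the restore always happens, even if the drawing code raises an exception, is to wrap the pair in a context manager. This is only a sketch in Python 3: `saved_state` is a hypothetical helper of my own, and RecordingCanvas is a stand-in that records calls so the example runs without ReportLab. The only canvas methods it relies on are saveState and restoreState.

```python
from contextlib import contextmanager


@contextmanager
def saved_state(canvas):
    """Save the canvas state on entry and always restore it on exit."""
    canvas.saveState()
    try:
        yield canvas
    finally:
        canvas.restoreState()


class RecordingCanvas(object):
    """Hypothetical stand-in for a canvas; it only records what was called."""

    def __init__(self):
        self.calls = []

    def saveState(self):
        self.calls.append("saveState")

    def setFont(self, name, size):
        self.calls.append("setFont")

    def restoreState(self):
        self.calls.append("restoreState")


canvas = RecordingCanvas()
with saved_state(canvas) as c:
    # Font changes made here are undone when the block ends.
    c.setFont("Times-Roman", 12)
print(canvas.calls)
```

Inside a page callback, the body of the with block would hold the drawing calls that currently sit between saveState and restoreState.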
This method works differently from the page itself. On the page we used what ReportLab calls flowables, to dynamically let the text flow where it needs to be. Here, we’re specifying exactly where the text will display.
On odd pages, we use the drawRightString method to align the header to the right. On even pages, we use drawString to align to the left.
In both cases we get the current page number from document.page.
On all pages, we draw a line across the top of the header and across the bottom of the header, just because it looks cool (and because I needed to know I could do it for my restaurants listing). You can also draw rectangles, circles, and curves. The user guide on the ReportLab Toolkit site explains how they work.
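
The odd/even decision can be pulled out into a tiny function that is easy to test on its own. This is a sketch in Python 3; `header_text` is a hypothetical name, and the alignment it returns is just a label that the caller would translate into drawRightString or drawString.

```python
def header_text(title, page):
    """Return the header string and its alignment for a page number."""
    if page % 2:
        # Odd page: page number after the title, aligned right.
        return title + "-" + str(page), "right"
    # Even page: page number before the title, aligned left.
    return str(page) + "-" + title, "left"


for page in (1, 2, 3):
    print(header_text("Second Chances", page))
```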
This is what the final PDF, with headers, looks like:
The After-midnight version
Using this in Django for my restaurant database was simple enough. The only problem I ran into is that, in my Django databases, I convert every special character into its HTML entity for flexibility. Platypus appears to understand only a very few entities, and it throws out any entity it doesn’t understand. The only fix I could come up with was to hand-edit either reportlab/platypus/paraparser.py, adding the entities I needed to the greek dict, or, as I ended up doing, reportlab/lib/xmllib.py, adding them to ENTITYDEFS.
For example, currently restaurant names contain right single quotes and accented e’s. Until I edited xmllib.py, those characters disappeared from the final PDF.

    ENTITYDEFS = {
        'lt': '<',
        'gt': '>',
        'amp': '&',
        'quot': '"',
        'apos': '\'',
        'rsquo': '\xe2\x80\x99',
        'eacute': '\xc3\xa9',
    }

There’s probably a better way of getting Platypus to recognize these entities, but I couldn’t find any mention of this in the documentation.
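
One possible alternative to editing the library, offered only as a sketch and not what I actually did, is to expand the named entities into real characters before the text ever reaches Paragraph. The example below is Python 3 and uses only the standard library; `expand_named_entities` is a hypothetical helper. It deliberately leaves the five XML entities alone so the markup stays valid. Whether the Paragraph parser in a given ReportLab version accepts the resulting characters is an assumption you would need to check.

```python
import re
from html.entities import name2codepoint

# These must stay escaped, or the paragraph markup would be corrupted.
XML_ENTITIES = {"lt", "gt", "amp", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_named_entities(text):
    """Replace HTML named entities (except the XML five) with characters."""

    def replace(match):
        name = match.group(1)
        if name in XML_ENTITIES or name not in name2codepoint:
            # Leave XML entities and unknown names exactly as they are.
            return match.group(0)
        return chr(name2codepoint[name])

    return ENTITY.sub(replace, text)


example = expand_named_entities("Joe&rsquo;s Caf&eacute; &amp; Grill")
# ascii() keeps the printed form readable on any terminal encoding.
print(ascii(example))
```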
Here is the full template for the restaurant listing:

    {% mako %}
    <%
        import reportlab.platypus as platypus
        import reportlab.lib
        import StringIO
        destination = StringIO.StringIO()
        parts = []

        #downsize the default styles
        def adjustStyles(style):
            reduction = .7
            style.fontSize=style.fontSize*reduction
            style.leading = style.leading*reduction
            style.spaceAfter = style.spaceAfter*reduction
            style.spaceBefore = style.spaceBefore*reduction
            style.leftIndent = style.leftIndent*reduction

        #display the title and hostname of the site
        def addHeader(canvas, document):
            canvas.saveState()
            title = "San Diego After Midnight"
            hostname = "DiningAfterMidnight.com"
            fontsize = 12
            fontname = 'Times-Roman'
            headerBottom = document.bottomMargin+document.height-document.topMargin
            bottomLine = headerBottom - fontsize/4
            topLine = headerBottom + fontsize
            lineLength = document.width+document.leftMargin
            canvas.setFont(fontname,fontsize)
            canvas.drawString(document.leftMargin, headerBottom, title)
            canvas.drawRightString(lineLength, headerBottom, hostname)
            #draw some lines to make it look cool
            canvas.line(document.leftMargin, bottomLine, lineLength, bottomLine)
            canvas.line(document.leftMargin, topLine, lineLength, topLine)
            canvas.restoreState()

        #let's set up some styles
        inch = reportlab.lib.units.inch
        style = reportlab.lib.styles.getSampleStyleSheet()
        #styles
        areaStyle = style["Heading1"]
        areaStyle.spaceAfter = 0
        areaStyle.spaceBefore = 4
        adjustStyles(areaStyle)
        nameStyle = style["Heading2"]
        nameStyle.fontName="Times-Italic"
        nameStyle.spaceAfter = 0
        nameStyle.spaceBefore = 2
        nameStyle.leftIndent=.13*inch
        adjustStyles(nameStyle)
        infoStyle = style["Normal"]
        infoStyle.leftIndent=.25*inch
        infoStyle.leading = infoStyle.leading*.85
        adjustStyles(infoStyle)

        #go through each restaurant and store its info
        area = ""
        for restaurant in restaurants:
            #don't break a restaurant apart
            items = []
            #the area should be accompanied by at least one restaurant in its column
            if restaurant.area.name != area:
                area = restaurant.area.name
                items.append(platypus.Paragraph(area, areaStyle))
            items.append(platypus.Paragraph(restaurant.name, nameStyle))
            items.append(platypus.Paragraph(restaurant.address(), infoStyle))
            items.append(platypus.Paragraph(restaurant.phone, infoStyle))
            items.append(platypus.Paragraph(restaurant.typeline(), infoStyle))
            items.append(platypus.Paragraph(restaurant.hoursline(), infoStyle))
            item = platypus.KeepTogether(items)
            parts.append(item)

        #create the basic page
        pagesize = reportlab.lib.pagesizes.landscape(reportlab.lib.pagesizes.letter)
        leftMargin = .25*inch
        rightMargin = .25*inch
        topMargin = .25*inch
        bottomMargin = .25*inch
        document = platypus.BaseDocTemplate(destination, pagesize=pagesize, leftMargin=leftMargin, rightMargin=rightMargin, topMargin=topMargin, bottomMargin=bottomMargin)

        #create the frames
        frameCount = 3
        frameWidth = document.width/frameCount
        #leave space for the header
        frameHeight = document.height-.25*inch
        frames = []
        #construct the column frames
        for frame in range(frameCount):
            leftMargin = document.leftMargin + frame*frameWidth
            column = platypus.Frame(leftMargin, document.bottomMargin, frameWidth, frameHeight)
            frames.append(column)
        document.addPageTemplates(platypus.PageTemplate(frames=frames, onPage=addHeader))
        document.build(parts)
        pdfpage = destination.getvalue().decode('utf8', 'ignore')
    %>
    ${pdfpage}
    {% endmako %}

This uses the “mako” template tag I described in Embedding Mako into Django to embed Python directly into a Django template. You can see the result of this script at the Dining After Midnight web site.
The only new thing here is the adjustStyles function. I added that so that I could use the basic sizes and distances in the sample template but easily adjust them if the number of restaurants grows. I should really try to tie it to the number of columns. I could then check whether the finished document has exceeded two pages, and if so, run through again with a smaller font size and/or an extra column.
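
That retry idea might look something like the following sketch (Python 3). Everything here is hypothetical: `choose_layout`, its parameters, and especially `count_pages`, which stands for a callback that would build the PDF with the given column count and reduction factor and report how many pages it produced. The stand-in counter below only pretends to do that so the loop can be run on its own.

```python
def choose_layout(count_pages, max_pages=2, columns=3, reduction=0.7,
                  step=0.05, min_reduction=0.5, max_columns=5):
    """Shrink the text first, then add columns, until the listing fits.

    Returns (columns, reduction, pages). If nothing fits, the last
    attempt is returned so the caller can decide what to do.
    """
    while True:
        pages = count_pages(columns, reduction)
        if pages <= max_pages:
            return columns, reduction, pages
        if reduction - step >= min_reduction - 1e-9:
            # Try a smaller font before changing the page design.
            reduction = round(reduction - step, 2)
        elif columns < max_columns:
            columns += 1
        else:
            return columns, reduction, pages


def pretend_count(columns, reduction):
    """Stand-in for a real build: fits on two pages once text is small enough."""
    return 3 if reduction > 0.6 else 2


print(choose_layout(pretend_count))
```

In a real version the callback would have to rebuild the whole document on each attempt, so the number of attempts is worth keeping small.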
- Python PDF generation with Snakelets
- One of the things I need to do to move my current web site over to Django is be able to automatically generate PDF documents. Step is to learn how to generate PDF using Python.
- ReportLab Toolkit
- “The ReportLab Open Source PDF library is a proven industry-strength PDF generating solution, that you can use for meeting your requirements and deadlines in enterprise reporting systems.”
- San Diego After Midnight
- There is nothing quite like the hunger you get at three in the morning when everyone else has gone to sleep. If you’re hanging with the late crowd in San Diego, come and see where you can time out for a bite after midnight!
- Snakelets
- “Snakelets is a very simple-to-use Python web application server. It provides a threaded web server, Ypages (Python HTML template language) and Snakelets: code-centric page request handlers. Snakelet’s focus is to make the creation of dynamic web sites as quick and easy as possible.”
- Django
- “Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.” Oh, the sweet smell of pragmatism.