Adding links to PDF in Python
In Multi-column PDFs I went over how to add frames to a reportlab toolkit PDF document using Python. I did this for my San Diego Dining After Midnight web page. While it’s meant for printing, PDFs can contain hyperlinks, and with the growth of PDAs with quality screens, PDFs will probably become even more useful on the computer than on paper. So why not add links to the restaurants and to the map to get to the restaurant?
Adding a linked area to a PDF
There are two ways to make a link in reportlab. The first is by placing it on a rectangle on the page. For example, at the top of each page I have a header that contains the hostname for the Dining After Midnight web site. I create it using a function that draws the hostname in the upper right corner of the page.
    def addHeader(canvas, document):
        canvas.saveState()
        hostname = "DiningAfterMidnight.com"
        # the URL that the header text will link to
        hostlink = "http://www." + hostname + "/"
        fontsize = 12
        fontname = 'Times-Roman'
        headerBottom = document.bottomMargin+document.height-document.topMargin
        # extend the link area a little below the baseline and up to the top of the text
        bottomLine = headerBottom - fontsize/4
        topLine = headerBottom + fontsize
        lineLength = document.width+document.leftMargin
        canvas.setFont(fontname,fontsize)
        canvas.drawRightString(lineLength, headerBottom, hostname)
        # the link rectangle runs from the right edge of the text back across its width
        hostnamewidth = canvas.stringWidth(hostname)
        linkRect = (lineLength, bottomLine, lineLength-hostnamewidth, topLine)
        canvas.linkURL(hostlink, linkRect)
        # balance the saveState() call at the top
        canvas.restoreState()
The lines that set hostlink, hostnamewidth, and linkRect, along with the call to linkURL, are the new code that creates a link over the hostname text. If you’re familiar with HTML, this is not like HTML at all. In HTML, we specify what text the link “belongs to”. In PDF, we specify the rectangular area of the page that the link occupies. If we want that rectangle to correspond to some specific text, we need to determine where that text is and how big it is.
Since the hostname is drawn on the page using drawRightString, we already know the lower right hand corner of the text. The width is available from the stringWidth method.
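As a rough sketch of that arithmetic, here is the rectangle calculation pulled out into a plain function. The function name and the sample numbers are made up for illustration; in the header code the width would come from canvas.stringWidth, and the quarter-of-the-font-size allowance below the baseline is just the same padding the header function uses.

```python
def right_aligned_link_rect(right_x, baseline_y, text_width, font_size):
    """Return (x1, y1, x2, y2) covering text drawn right-aligned at (right_x, baseline_y).

    right_x and baseline_y are the coordinates given to drawRightString;
    text_width is the width of the text in the current font.
    """
    left_x = right_x - text_width
    bottom_y = baseline_y - font_size / 4  # a little room below the baseline
    top_y = baseline_y + font_size         # up to the top of the text
    return (left_x, bottom_y, right_x, top_y)


# Example numbers only: 12-point text that is 130 points wide,
# right-aligned at x=540 on a baseline of y=750.
rect = right_aligned_link_rect(540, 750, 130, 12)
print(rect)

# In the header function this rectangle would then be passed along as:
#     canvas.linkURL(hostlink, rect)
```

Here the corners are ordered left-to-right and bottom-to-top, which makes the rectangle a little easier to read when debugging.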
Linking text in a PDF
The platypus extension to reportlab makes link creation sort of easier. Sort of, because there’s a bit of a gotcha if some of your linked text contains other markup.
The way to mark some text in platypus as being linked is to surround it with the link tag. If you are familiar with HTML, the link tag is very similar to the a tag in HTML:
    <link href="http://www.hoboes.com/Mimsy/">Mimsy Were the Borogoves</link>
This, for example, modifies the Dining After Midnight PDF code to generate linked restaurant names and linked street addresses:
    name = restaurant.name
    name = name.replace('’', '<unichar name="RIGHT SINGLE QUOTATION MARK"/>')
    if restaurant.url:
        link = restaurant.url
        if link.live:
            name = '<link href="' + link.get_absolute_url() + '">' + name + '</link>'
    address = restaurant.address()
    address = address.replace('’', '<unichar name="RIGHT SINGLE QUOTATION MARK"/>')
    address = '<link href="' + restaurant.mapref() + '">' + address + '</link>'
    items.append(platypus.Paragraph(name, nameStyle))
    items.append(platypus.Paragraph(address, infoStyle))
It’s simple enough. All it is is text manipulation. The “restaurant” object contains its name, its address, and a link object. And it knows how to generate a map link (mapref) to its address.
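Since it is only text manipulation, the tag-building can be moved into a small helper. This is a sketch, not the code the site uses: the function name, the restaurant name, and the URL are hypothetical, and it assumes that paragraph markup is XML-like enough that ampersands and angle brackets in the text or the URL should be escaped before they go into the tag.

```python
from xml.sax.saxutils import escape, quoteattr

APOSTROPHE = '<unichar name="RIGHT SINGLE QUOTATION MARK"/>'


def link_markup(text, url):
    """Wrap text in a platypus-style <link> tag that points at url."""
    # Escape first, then add the unichar markup, so the markup itself is not escaped.
    safe_text = escape(text).replace('’', APOSTROPHE)
    # quoteattr escapes the URL and supplies the surrounding quotes.
    return '<link href=' + quoteattr(url) + '>' + safe_text + '</link>'


# Hypothetical restaurant name and URL, for illustration only.
markup = link_markup("Joe’s Bar & Grill", "http://www.example.com/?id=1&map=yes")
print(markup)
```

The resulting string would be handed to platypus.Paragraph in the same way as the name and address strings above.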
The problem is that if the address or title already contains any markup or special characters, those linked texts, and only those, will be underlined. To the reader, some links appear underlined and some don’t, which makes it look as if some text isn’t linked when it really is.
The reason is that platypus automatically underlines a link if other markup appears in the link text, but not otherwise. This includes Unicode characters added using platypus’s unichar markup. As far as I can tell, this doesn’t look like a bug. I have no idea why it’s there, though. I can’t find any mention in the documentation of why it happens, or even that it happens.
I “fixed” it by commenting out the line “tx._canvas.line(t_off+x1, y, t_off+x2, y)” on line 464 of reportlab/platypus/paragraph.py:
    for x1,x2,link in xs.links:
        #tx._canvas.line(t_off+x1, y, t_off+x2, y)
        _doLink(tx, link, (t_off+x1, y, t_off+x2, yl))
    xs.links = []
This may end up disabling some underlining capabilities in platypus, but since I don’t use underlining it’s not a problem for me. Your mileage may vary.
February 6, 2016: Test classes and objects in Python
One of the critical advances in computer programming since I began programming in the eighties is the object. An object is a thing that can include both functions and variables. In Python, an object is an instance of a class. For example, Django relies heavily on classes for its models. In Adding links to PDF in Python the main class used is for a restaurant. Each object is an instance of the restaurant class.
But one of the great things about object-oriented programming is that the things that access your objects don’t care what class they belong to. They care only whether each object has the appropriate functions and variables. When they are on an object, a function is called a method and a variable is called a property.
Any object can masquerade as another object by providing the same methods and properties. This means that you can easily make test classes that let you create objects for use in the PDF example.
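A small sketch of that masquerading, using two made-up classes that share nothing except a name property and an address method. The FoodTruck class, the listing function, and the sample names are hypothetical; the point is only that the function never checks which class it was given.

```python
class Restaurant:
    """Stand-in for a database-backed model."""

    def __init__(self, name):
        self.name = name

    def address(self):
        return "1060 West Addison, Chicago, IL"


class FoodTruck:
    """An unrelated class that offers the same property and method."""

    def __init__(self, name):
        self.name = name

    def address(self):
        return "Wherever it parked today"


def listing(place):
    # No isinstance check: any object with .name and .address() will do.
    return place.name + ": " + place.address()


for place in (Restaurant("The Green Goblin"), FoodTruck("Hypothetical Tacos")):
    print(listing(place))
```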
In Adding links to PDF in Python, I had a Django model for the Restaurants and a Django model for the Links that were each restaurant’s web page. But because Django models are nothing more than (very useful) classes, you can make a fake Restaurant and fake Link to impersonate what the code snippet expects.
    # in real life, the Link class would probably pull information from a database of links
    # and live would be whether it is currently a valid link,
    # and get_absolute_url would be the actual URL for that link
    class Link():
        def __init__(self, title):
            self.title = title

        # a property, because the PDF code reads link.live without parentheses
        @property
        def live(self):
            return True

        def get_absolute_url(self):
            return "http://www.example.com/" + self.title.replace(" ", "_")

    # in real life, the Restaurant class would probably be a table of restaurants
    # and would store the name of each restaurant, an id for the restaurant's web site
    # and the restaurant's address
    class Restaurant():
        # name and url are properties, because the PDF code reads them without parentheses
        @property
        def name(self):
            return "The Green Goblin"

        @property
        def url(self):
            myURL = Link("The Green Goblin")
            return myURL

        def address(self):
            return "1060 West Addison, Chicago, IL"

        def mapref(self):
            return "https://www.google.com/maps/place/" + self.address().replace(" ", "+")
Save that as restaurant.py.
Objects are created from classes using:
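In Python, that means calling the class as if it were a function. What follows is a minimal sketch rather than the original continuation of this post: it repeats the stand-in classes in shortened form so that it runs on its own, and it assumes name, url, and live are properties so they can be read without parentheses, the way the PDF snippet reads them. The platypus calls are left out, so it only prints the markup that would go into each Paragraph.

```python
class Link:
    def __init__(self, title):
        self.title = title

    @property
    def live(self):
        return True

    def get_absolute_url(self):
        return "http://www.example.com/" + self.title.replace(" ", "_")


class Restaurant:
    @property
    def name(self):
        return "The Green Goblin"

    @property
    def url(self):
        return Link(self.name)

    def address(self):
        return "1060 West Addison, Chicago, IL"

    def mapref(self):
        return "https://www.google.com/maps/place/" + self.address().replace(" ", "+")


# Calling the class creates an object (an instance of the class).
restaurant = Restaurant()

# The same steps as the PDF snippet, minus the platypus calls.
name = restaurant.name
if restaurant.url:
    link = restaurant.url
    if link.live:
        name = '<link href="' + link.get_absolute_url() + '">' + name + '</link>'
address = '<link href="' + restaurant.mapref() + '">' + restaurant.address() + '</link>'

print(name)
print(address)
```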
- Multiple column PDF generation in Python
- You can use ReportLab’s Platypus to generate multi-column PDFs in Snakelets, Django, or any Python app.
- San Diego After Midnight
- There is nothing quite like the hunger you get at three in the morning when everyone else has gone to sleep. If you’re hanging with the late crowd in San Diego, come and see where you can time out for a bite after midnight!
- ReportLab Toolkit
- “The ReportLab Open Source PDF library is a proven industry-strength PDF generating solution, that you can use for meeting your requirements and deadlines in enterprise reporting systems.”