Untying URLs from the filesystem in Apache
Often, your web pages (or what your visitors think of as your web pages) don’t exist on the filesystem. They exist in a database somewhere, or as feeds from other sources. The easy way to handle these kinds of pages is to use the query string; the common form is an id parameter holding a number, such as http://www.example.com/articles/?id=677.
The problem with such URLs, though, is that URLs are not invisible to your visitors: they show up in browser histories and search results, and ?id=677 doesn’t provide any context. You can get around that by using a slug instead of an id number, something like http://www.example.com/articles/?slug=untying-urls-filesystem-apache. This works, and is often the best solution by virtue of its simplicity on your end.
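For the query-string approach, the script simply reads the value from $_GET. Below is a minimal sketch, assuming PHP 7.1 or later. The function names articleIdFromQuery() and articleSlugFromQuery() are hypothetical. The rule that a slug is lowercase letters and digits joined by single hyphens is an illustrative assumption, not something the approach requires. The is_string() checks guard against requests like ?id[]=1, where PHP hands you an array instead of a string.

```php
<?php
// Sketch: reading ?id=677 or ?slug=... from the query string.
// Hypothetical helper names; in a real script you would pass $_GET.

// Returns the id as a positive integer, or null if it is missing or invalid.
function articleIdFromQuery(array $query): ?int
{
    // $_GET values can be arrays (?id[]=1), so insist on a string.
    if (!isset($query['id']) || !is_string($query['id'])) {
        return null;
    }
    $id = filter_var($query['id'], FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $id === false ? null : $id;
}

// Returns the slug, or null. ASSUMPTION: slugs are lowercase letters and
// digits separated by single hyphens, like "untying-urls-filesystem-apache".
function articleSlugFromQuery(array $query): ?string
{
    if (!isset($query['slug']) || !is_string($query['slug'])) {
        return null;
    }
    $slug = $query['slug'];
    return preg_match('/^[a-z0-9]+(?:-[a-z0-9]+)*$/', $slug) === 1 ? $slug : null;
}
```

In a page script this would be called as $id = articleIdFromQuery($_GET); and a null result is your cue to show a “not found” page.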
But sometimes you don’t want to expose that query string. You want these pages to look like any other URL. In the above example, you want it to be http://www.example.com/articles/untying-urls-filesystem-apache.
By default, Apache expects the above URL to match a file on the filesystem called “untying-urls-filesystem-apache” in the folder “articles” under the site’s document root. There are ways around this, but not, as far as I’m aware, without a scripting language’s file extension somewhere in the path. For example, if you don’t mind “articles” not looking like a folder, you can use URLs like http://www.example.com/articles.php/untying-urls-filesystem-apache. Apache will recognize that articles.php refers to a file capable of scripting, and will call it.[1]
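In the articles.php form, PHP usually receives the part of the URL after the script name as $_SERVER['PATH_INFO']. The sketch below assumes your server passes it along that way. slugFromPathInfo() is a hypothetical helper, and it does not check the slug’s format.

```php
<?php
// Sketch for articles.php: pull the slug out of the extra path that follows
// the script name, e.g. /articles.php/untying-urls-filesystem-apache.
// ASSUMPTION: the server exposes that extra path as PATH_INFO.
// slugFromPathInfo() is a hypothetical helper name.
function slugFromPathInfo(array $server): ?string
{
    $pathInfo = $server['PATH_INFO'] ?? '';
    if (!is_string($pathInfo)) {
        return null;
    }
    // "/untying-urls-filesystem-apache/" -> "untying-urls-filesystem-apache"
    $slug = trim($pathInfo, '/');
    return $slug === '' ? null : $slug;
}
```

Inside articles.php you would call it as $slug = slugFromPathInfo($_SERVER); and look the article up by that slug.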
But just putting an index.php file inside a folder called “articles” doesn’t trigger this behavior.[2]
This is where Apache’s Rewrite Engine comes in. We can place a .htaccess file in the articles folder that tells Apache to use index.php for any request that appears to be under this folder, no matter how deep.
```apache
# Turn on mod_rewrite for this directory
RewriteEngine On
# URL prefix for relative substitutions in this .htaccess
# (the rule below uses an absolute path, so this is only a safeguard)
RewriteBase /
# Don't rewrite requests for index.php itself, or the rule would loop
RewriteCond %{REQUEST_URI} !^/articles/index\.php
# Wherever they are, rewrite them to the real folder (and so to its index.php)
RewriteRule . /articles/ [L]
```
Now, you can go to http://www.example.com/articles/untying-urls-filesystem-apache and if you put phpinfo() in the index.php file, you’ll see that the server variable REDIRECT_URL exists and it contains “/articles/untying-urls-filesystem-apache”.
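REDIRECT_URL is only there when a rewrite actually happened. If index.php might also be reached directly (for example, a request for /articles/ itself), it may be absent. A minimal sketch, assuming you want to fall back to the path part of REQUEST_URI, which holds the URL from the original request line. requestedPath() is a hypothetical helper name.

```php
<?php
// Sketch: find the path the visitor actually asked for inside index.php.
// Prefer REDIRECT_URL (set after the rewrite); otherwise fall back to the
// path portion of REQUEST_URI, dropping any query string.
// requestedPath() is a hypothetical helper name.
function requestedPath(array $server): string
{
    if (isset($server['REDIRECT_URL']) && is_string($server['REDIRECT_URL'])) {
        return $server['REDIRECT_URL'];
    }
    $path = parse_url($server['REQUEST_URI'] ?? '/', PHP_URL_PATH);
    return is_string($path) ? $path : '/';
}
```

In index.php this would be $path = requestedPath($_SERVER); in place of reading $_SERVER['REDIRECT_URL'] directly.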
From here, if you only need the one slug following /articles/, you could just strip the first 10 characters (the length of “/articles/”) from the start of the string:
```php
$path = $_SERVER['REDIRECT_URL'];
// Drop the leading "/articles/" (10 characters), leaving just the slug
$path = substr($path, 10);
```
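Hard-coding 10 works as long as the folder is always called “articles”. A slightly more defensive sketch computes the length from the prefix, checks that the path really starts with it, and ignores a trailing slash. slugAfterPrefix() is a hypothetical helper name.

```php
<?php
// Sketch: strip a known folder prefix from the requested path.
// Returns null if the path is not under the prefix or has no slug after it.
// slugAfterPrefix() is a hypothetical helper name.
function slugAfterPrefix(string $path, string $prefix = '/articles/'): ?string
{
    // Compare only the first strlen($prefix) bytes: "does $path start with $prefix?"
    if (strncmp($path, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    // The (string) cast covers older PHP versions where substr() can return false.
    $slug = rtrim((string) substr($path, strlen($prefix)), '/');
    return $slug === '' ? null : $slug;
}
```

A call would look like $slug = slugAfterPrefix($_SERVER['REDIRECT_URL']); and changing the folder name then means changing only the prefix argument.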
Or, if you might have more than one part to the path, you can use explode:
```php
$path = $_SERVER['REDIRECT_URL'];
// Split e.g. "/articles/paintings/matisse/" into its slash-separated parts
$parts = explode('/', $path);
```
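If the visitor asks for http://www.example.com/articles/paintings/matisse/, REDIRECT_URL will contain “/articles/paintings/matisse/”. Exploding that on “/” will return an array that has ‘paintings’ in index 2 and ‘matisse’ in index 3. Index 0 holds an empty string from the leading slash, index 1 holds ‘articles’, and the trailing slash leaves another empty string in index 4.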
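If you would rather not count around those empty strings, you can drop them before using the parts. A minimal sketch with a hypothetical pathSegments() helper:

```php
<?php
// Sketch: split a path into its non-empty parts.
// explode() yields an empty string for the leading slash (and for a trailing
// slash); array_filter() removes them and array_values() renumbers from 0.
// pathSegments() is a hypothetical helper name.
function pathSegments(string $path): array
{
    $parts = array_filter(explode('/', $path), function (string $part): bool {
        return $part !== '';
    });
    return array_values($parts);
}
```

With pathSegments('/articles/paintings/matisse/'), ‘articles’ is at index 0, ‘paintings’ at index 1, and ‘matisse’ at index 2, whether or not the visitor typed the trailing slash.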
Preserving filesystem pages
But what if you also have some real files in that folder, and you want them to display? Add these conditions after the one about not rewriting index.php:
```apache
# If the file or directory that the visitor is asking for actually exists, then don't do any rewriting.
# That is, only rewrite if the requested filename does not exist.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
```
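This skips the rewrite if REQUEST_FILENAME exists as either a regular file (-f) or a directory (-d). Consecutive RewriteCond lines must all match for the rule to apply, so the rewrite happens only when the requested path is neither.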
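Apache evaluates these conditions itself, so nothing needs to be written in PHP here. Still, restating the logic as a function makes the AND between the two lines explicit. This is a rough, illustrative sketch only. shouldRewrite() is hypothetical and is not how Apache implements the check.

```php
<?php
// Illustration only: the two RewriteCond lines restated in PHP.
// Conditions are ANDed, so the rule fires only when the requested filename
// is neither a regular file (!-f) nor a directory (!-d).
// shouldRewrite() is a hypothetical helper name.
function shouldRewrite(string $requestFilename): bool
{
    return !is_file($requestFilename) && !is_dir($requestFilename);
}
```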
[1] This feature makes relative URLs of the form “../images/image.png” all but useless on Apache web servers. Browsers still treat the slashes after the script name as directory levels and use them to work out the URL of the image, so http://www.example.com/articles.php/slug and http://www.example.com/articles.php/slug/ resolve the same relative URL to different places. All it takes is one typo by someone anywhere linking to your page, and search engines will pick up this “alternative” URL to your page.
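To make the relative-URL problem concrete, here is a simplified sketch of the resolution a browser performs. It loosely follows RFC 3986 section 5.2: merge the reference with the base path, then remove “.” and “..” segments. resolveRelativePath() is a hypothetical helper, and it deliberately ignores scheme, host, query strings, and fragments.

```php
<?php
// Simplified sketch of relative path resolution (paths only), loosely
// following RFC 3986 section 5.2. Hypothetical helper name.
function resolveRelativePath(string $basePath, string $relative): string
{
    if ($relative !== '' && $relative[0] === '/') {
        $merged = $relative; // already an absolute path
    } else {
        // Everything up to and including the last slash is the "directory".
        $slash = strrpos($basePath, '/');
        $directory = $slash === false ? '/' : substr($basePath, 0, $slash + 1);
        $merged = $directory . $relative;
    }

    $segments = explode('/', $merged);
    $output = [];
    foreach ($segments as $segment) {
        if ($segment === '..') {
            // Never climb above the root (the leading empty segment).
            if (count($output) > 1) {
                array_pop($output);
            }
        } elseif ($segment !== '.') {
            $output[] = $segment;
        }
    }
    // A trailing "." or ".." still names a directory, so keep the final slash.
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $output[] = '';
    }
    return implode('/', $output);
}
```

From /articles.php/untying-urls-filesystem-apache, “../images/image.png” resolves to /images/image.png. From /articles.php/untying-urls-filesystem-apache/ (one extra slash), it resolves to /articles.php/images/image.png. Under Apache, that second URL is typically handled by articles.php again, so the “image” request gets the page back. The rewrite approach has the same sensitivity to a trailing slash.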
[2] This is why you sometimes see URLs of the form http://www.example.com/articles/index.php/slug. Including index.php in the URL triggers Apache’s scripting behavior.
This seems to me to take the worst of each of these solutions. Not only are you tying yourself to having .php in your script name, you’re tying yourself to using “index” as your index name, and it’s an ugly URL to boot.
More RewriteEngine
- mod_rewrite cheat sheet: Apache’s mod_rewrite is very useful, and very poorly documented. The documentation’s there, it just isn’t easy to read. Dave Child has written a very useful cheat sheet for mod_rewrite.
Handy tips.
Possibly worth noting that if you have permission to change the Apache config files, the above can be accomplished there in a <Directory> section. Server performance is improved if .htaccess is turned off, as the server does not have to check the directory hierarchy for .htaccess files on each request.
Jerry Seeger at 7:55 p.m. August 22nd, 2010
Definitely worth noting. My current hosting service doesn’t give access to do this, but I think my next one will.
capvideo at 6:04 a.m. August 23rd, 2010