Make Prettier URLs with Apache's Mod Rewrite | WebReference

By Sukrit Dhandhania

In a previous article we discussed how to use the Apache module mod_rewrite to rewrite URLs. I showed you how to set up URL rewriting with Apache and how to use it to forward a user from one web location to another. That was a pretty straightforward exercise. Now it's time to try something a little fancier: let's look at how to use mod_rewrite to make prettier URLs for your web applications.

Many websites today make use of dynamic URLs. It's quite likely that you have come across a web link that carries a query string, which is the section of the URL after the question mark. This is where the web application passes on information gathered earlier, quite likely using a form of some type. If you have a web application or a content management system that churns out URLs like this, you can use Apache's ability to rewrite URLs to present something a lot easier on the eyes, such as a short path made of plain words and numbers. Other than being better to look at, these cleaner URLs are also pretty useful for search engine optimization.
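As a purely illustrative pair, borrowing the biology/4856 path used later in this article: the host name, the index.php script, and the subject and id parameter names are assumptions made up for this sketch, not taken from a real application.

```apache
# Dynamic URL as the application generates it (hypothetical names):
#   http://www.example.com/index.php?subject=biology&id=4856
#
# Prettier URL that visitors would see and type instead:
#   http://www.example.com/biology/4856
```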

We'll use a .htaccess file to set this rule. I'm assuming that you have the mod_rewrite module installed and loaded in your Apache configuration. Having the module loaded is not enough on its own, though: to let mod_rewrite do its magic you also need to switch it on for the directory in question. Create or edit a file named .htaccess in the web server directory where you want to enable prettier URLs, and add the two directives described below.
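A minimal sketch of that .htaccess file, assuming it contains nothing else yet and that the server's main configuration allows these directives to be set from .htaccess:

```apache
# Allow the server to follow symbolic links in this directory.
Options +FollowSymLinks

# Switch on the rewrite engine for this directory.
RewriteEngine On
```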

The "RewriteEngine On" directive switches on the mod_rewrite engine. The "Options +FollowSymLinks" directive is needed because mod_rewrite will not apply rules from a .htaccess file unless following symbolic links is enabled for that directory. It is quite likely that one or both of these directives have already been set up in your web server's main configuration file, httpd.conf. In that case, you can skip the previous step.

Here is a quick recap of what a URL rewriting directive is composed of. There are four parts to it.

You first write RewriteRule, the fixed directive name that starts every rule. Then you enter the source URL, a regex pattern that is matched against the address the user types in her web browser. After this comes the destination URL, which is the page the web server will actually serve. Last is the optional flags section, in square brackets, where you can define the nature of the URL forwarding.
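Put together, the four parts line up as in the sketch below. The page names old-page and new-page.html are hypothetical and only serve to show the shape of a rule.

```apache
# Shape: RewriteRule <source pattern> <destination> [flags]
# Here a request for old-page is answered with new-page.html.
RewriteRule ^old-page$ new-page.html [L]
```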

As I mentioned in my previous article about URL rewriting with Apache, a good knowledge of regular expressions, or regex, will make using mod_rewrite a breeze. If you are not too familiar with regex, not to worry, I will do some handholding in this exercise. Two very important regex characters we'll be using here are ^ and $. Put simply, ^ matches the beginning of the string, and $ matches the end of the string. Used together in a rewrite directive, they force the pattern to match the whole requested path rather than just some part of it.
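To see what the anchors buy you, consider this small sketch. The about and about.html names are hypothetical.

```apache
# Anchored on both ends: matches only the exact path "about".
# Without ^ and $, the pattern "about" would also match paths
# such as "about-us" or "team/about".
RewriteRule ^about$ about.html [L]
```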

A safe way to begin defining a rewrite rule is to use the (.*) regex combination. It matches any sequence of characters, including an empty one, and captures whatever it matched. Although this is quite clearly a very loose rule, it is a good starting point because it will capture all possible entries. We will test our rule with it, and then come back and tighten it so that only what we want is allowed to go through. Combining the two sets of regex characters discussed above, we can construct the first part of our rewrite rule, the part where we define the regex: ^ at the start, two (.*) groups separated by a slash, and $ at the end.
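Written out as a rule, this loose first draft might look like the following. The pattern follows from the description in the text; the destination (index.php with subject and id parameters) is a hypothetical placeholder for whatever your application expects.

```apache
# Loose first draft: anything, a slash, then anything.
# Each (.*) group captures one side of the slash.
RewriteRule ^(.*)/(.*)$ index.php?subject=$1&id=$2 [L]
```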

This regex takes the requested path, splits it into two parts, and saves both into "atoms" (also called capture groups), which are variables that allow you to reuse these strings in your rewrite rule. So if the path this regex receives is biology/4856 (in a .htaccess file, mod_rewrite strips the leading slash before matching), it will split the string at the slash, and the strings "biology" and "4856" will be saved separately for future use.

As I mentioned, the regex used above will allow just about anything to go through. We need to tighten it. We know that the first entry can be formed of a combination of letters and numbers. The second part will always be numbers. Therefore, we can create a rule that only allows these. Regex has a great way to set this up: you can tell it to match only a range of characters. For example, if you want to let through only digits you can use the syntax [0-9], and for just lowercase letters of the alphabet, [a-z]. You can combine both, and allow the underscore as well, like this: [a-z0-9_]. Adding a + after the brackets means "one or more of these characters".

So now our regex will look something like this: ^([a-z0-9_]+)/([0-9_]+)$. This should cover most of the cases we would come across, although that of course depends on your application. You should refine the rule further if your application allows you to. Let's now set up the URL rewriting rule.
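A sketch of the complete rule, using the regex above. The destination is an assumption: index.php, subject, and id are hypothetical names standing in for the script and query parameters your own application uses.

```apache
# biology/4856  ->  index.php?subject=biology&id=4856
RewriteRule ^([a-z0-9_]+)/([0-9_]+)$ index.php?subject=$1&id=$2 [L]
```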

As you can see, we have used the regex we prepared earlier. The second half of this mod_rewrite directive is the URL that your application requires, with the question mark and the ampersand. The two strings captured by the regex are available as $1 and $2 and are inserted into that URL. The [L] at the end is a flag that marks this as the last rule: if it matches, mod_rewrite stops processing and does not apply any rules that follow it.
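The effect of [L] is easier to see with more than one rule in the file. In this sketch the second rule, along with the index.php script and its parameter names, is hypothetical.

```apache
RewriteEngine On

# Two-part paths such as biology/4856. When this rule matches,
# [L] stops processing, so the rule below is not tried.
RewriteRule ^([a-z0-9_]+)/([0-9_]+)$ index.php?subject=$1&id=$2 [L]

# One-part paths such as biology. Only reached when the rule
# above did not match.
RewriteRule ^([a-z0-9_]+)$ index.php?subject=$1 [L]
```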

Save this rewrite rule in your .htaccess file and test it out. Once it works, you can update the links in your application to use the new, neater URLs.

Original: June 17, 2009