15 Jun 2007
posted in thoughts
The trailing slash on URLs referencing directories has always bugged me because it doesn't look good. I am a bit of a URL fetishist: I like URLs to be simple and clear, without a trailing slash. But that is not the correct way of doing things.
Back in the early days, a website was essentially a collection of directories, each holding an
index.html file with a number of other
.html files around (who remembers the difference between
index.html?). The directory's name appeared plainly in the URL, and that wasn't a problem as long as it was followed by a trailing slash (or a filename).
If the slash is missing, the server coughs, because if you say
/somedir/foo instead of
/somedir/foo/ the server searches for a file named
foo and, because foo is actually a directory, it complains and tries to fix the request by itself, typically by redirecting the browser to the URL with the slash. A List Apart has a good article on trailing slashes.
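The behaviour just described can be sketched in a few lines of Python. This is only an illustration, not how any particular server implements it: the function name resolve, the 301 status and the index.html default are assumptions made for the example.

```python
import os
import tempfile


def resolve(doc_root: str, url_path: str):
    """Return a (status, target) pair for a requested URL path.

    Hypothetical helper: it mimics how a typical web server treats a
    directory that is requested without its trailing slash.
    """
    fs_path = os.path.join(doc_root, url_path.lstrip("/"))
    if os.path.isdir(fs_path):
        if not url_path.endswith("/"):
            # Directory requested without a slash: redirect to the slashed URL.
            return 301, url_path + "/"
        # Directory requested with a slash: assume its index file is served
        # (the sketch does not check that index.html actually exists).
        return 200, url_path + "index.html"
    if os.path.isfile(fs_path):
        return 200, url_path
    return 404, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "somedir", "foo"))
        print(resolve(root, "/somedir/foo"))   # redirect to /somedir/foo/
        print(resolve(root, "/somedir/foo/"))  # served directly
```

The redirect in the first case is the extra round trip the visitor pays for when the slash is left out.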
Notwithstanding that it generates an extra disk access and slows the whole process down for your user, it usually works out, except in a few cases: typically when the last portion of the URL is actually a directory on the server's filesystem and you're rewriting the URL to remove the trailing slash for cosmetic reasons. Take
/weblog where 'weblog' is an actual directory but the URL is rewritten like this:
    RewriteEngine on
    RewriteRule ^weblog/$ /weblog [R]
    RewriteRule ^weblog$ /sub_index.php?div=blo&sec=hom
Why would you need to do this you might be wondering? Well, in this particular case there is a CMS that drives the weblog of a specific site, and generates content include files in the 'weblog' directory which are displayed by a PHP script located at the root of the site (or in any other directory for that matter). The same script displays different sections of the site, including the weblog's main page.
Now, initially, I wanted the URL without a trailing slash, and used a rewrite rule that mapped it onto the PHP script. Unfortunately, that doesn't work. If you omit the trailing slash in the URL, you'll see
/weblog/?div=blo&sec=hom show up in your browser.
The correct rewrite is to add the slash if missing:
    RewriteEngine on
    RewriteRule ^weblog$ /weblog/ [R]
    RewriteRule ^weblog/$ /sub_index.php?div=blo&sec=hom
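To see what the corrected rules do to each form of the URL, here is a small Python sketch that applies the same two patterns in order. It is a toy model, not mod_rewrite itself: the RULES table and the apply_rules function are hypothetical names, and it assumes the rules sit in a per-directory context where the leading slash is removed before matching.

```python
import re

# Hypothetical stand-ins for the two rules: (pattern, target, is_redirect).
RULES = [
    (re.compile(r"^weblog$"), "/weblog/", True),
    (re.compile(r"^weblog/$"), "/sub_index.php?div=blo&sec=hom", False),
]


def apply_rules(url_path: str):
    """Return ("redirect" | "internal" | "pass", target) for a URL path."""
    # Assumption: per-directory matching, so drop the single leading slash.
    candidate = url_path[1:] if url_path.startswith("/") else url_path
    for pattern, target, is_redirect in RULES:
        if pattern.search(candidate):
            # The first matching rule wins in this simplified model.
            return ("redirect" if is_redirect else "internal"), target
    return "pass", url_path


if __name__ == "__main__":
    for path in ("/weblog", "/weblog/", "/about"):
        print(path, "->", apply_rules(path))
```

The slashless form is bounced to the slashed one, and only the slashed form is mapped onto the script, so the visitor never sees the query string.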
Use trailing slashes for directories because that is the correct way of doing things.
A direct consequence for me of all this was to have directory and file names that don't appear in the URL per se. This approach turned out to have several advantages:
- The URL doesn't give away anything about the file system organisation (security);
- The URLs can be permanent in time and independent from the server's file system and technology (expandability);
- It forces you to use a directory/file naming scheme that prevents accidental overwriting of files.
Who hasn't at least once accidentally overwritten the wrong file while uploading an amended copy with the same name (e.g. index.html) to the server?
I like to organise the sections of a website in directories, each with its own three-letter code:
sec_pro for the products section,
sec_con for the contacts section, and so on. In each directory, the index file is prefixed with the three-letter code:
con_index.html, etc. This way you ensure that no two files on the server bear the same name.
This scheme can be pushed further to subsection organisation:
The content is in separate files:
con_sales_content_extra_inc.html, etc., located in their own separate directory:
con_content. The file system looks like this:
    /sec_con/
    /sec_con/con_index.html
    /sec_con/con_sales.html
    /sec_con/con_support.html
    /sec_con/con_corporate.html
    /sec_con/con_content/
    /sec_con/con_content/con_sales_content_intro_inc
    /sec_con/con_content/con_sales_content_main_inc
    /sec_con/con_content/con_sales_content_extra_inc
    [..]
This might seem a little overkill at first, but it proves to be extremely flexible and efficient in the long run. The content files can be CMS-driven or not, depending on the context, the users and their level of expertise.
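The claim that no two files share a name is easy to check mechanically. Below is a minimal Python sketch, assuming the layout is available as a list of paths. The function name find_duplicate_names is made up for the example, and the paths are the ones from the listing above.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_names(paths):
    """Map each file name used more than once to the paths that use it."""
    by_name = defaultdict(list)
    for path in paths:
        if path.endswith("/"):
            continue  # directory entries are not files, skip them
        by_name[PurePosixPath(path).name].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}


layout = [
    "/sec_con/",
    "/sec_con/con_index.html",
    "/sec_con/con_sales.html",
    "/sec_con/con_support.html",
    "/sec_con/con_corporate.html",
    "/sec_con/con_content/",
    "/sec_con/con_content/con_sales_content_intro_inc",
    "/sec_con/con_content/con_sales_content_main_inc",
    "/sec_con/con_content/con_sales_content_extra_inc",
]

if __name__ == "__main__":
    # An empty dict means every file name in the layout is unique.
    print(find_duplicate_names(layout))
```

Run against a conventional layout with an index.html in every directory, the same function would report each of those files as a clash.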
Call me a file system maniac if you like, but keeping to a neat, concise, and structured directory layout benefits everyone, and ensures that your site will not break when management decides to install the latest content management system developed in yet another emerging web-based development environment.