I've recently been noticing a trend in how other websites are handling Clean URLs, and it isn't good! Most websites don't appear to have the luxury we have with Drupal: a concrete URL alias system provided by the Path module. Although URL aliases can sometimes be a bit of a burden on larger sites, as the table can easily grow to tens or even hundreds of thousands of entries, the system provides (if used correctly) a very effective 1:1 relationship between an alias and its source path.
The Old Way
Most sites implement a cheat clean URL system where the path contains the ID or a 'source path' (in Drupal Lingo) very early on in the path. The rest of the path tends to simply be keyword stuffed. This is achieved using URL Rewrites which, put simply, will ignore everything after a certain pattern has been matched. There are many examples of this technique... Here are a few (with slight variants):
- http://www.beerintheevening.com/pubs/s/13/1316/Wenlock_Arms/Hoxton
- http://cgi.ebay.co.uk/Drupal-by-David-Mercer-2006_W0QQi.....l1247QQcmdZViewItem
- http://www.seoseonews.com/articles/5582/1/-Gre........d-Information-and-Ideas.html
The above links share the same thing - the unique ID is towards the beginning of the URL and everything after a specific point is either ignored completely or is considered unnecessary.
The BeerInTheEvening site only requires the 'pubs/s/13/1316' part - everything else is simply ignored.
eBay does it slightly differently. Everything appears to sit in one long path with no slashes, but 'QQ' is used to separate arguments and 'Z' to separate each key from its value - how sneaky! Therefore the pattern eBay expects is (at minimum) '{some_title}QQitemZ{some_id}'. Some testing shows that the {some_title} part appears unchangeable, whereas the rest of the URL is very changeable. Maybe the title is part of the ID?
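To make the 'QQ' and 'Z' convention concrete, here is a minimal Python sketch of how such a path could be taken apart. It is only my guess at the scheme from poking at the URL, not eBay's actual code; the function name, the example path and the item ID are all made up for illustration.

```python
def parse_qq_path(path: str) -> tuple[str, dict[str, str]]:
    """Split a 'QQ'-delimited path into a title and its key/value arguments."""
    # Assumption: the first segment is the title, the rest are arguments.
    title, *segments = path.split("QQ")
    args = {}
    for segment in segments:
        # Assumption: the first 'Z' in a segment separates key from value.
        key, sep, value = segment.partition("Z")
        if sep:  # ignore segments that contain no 'Z' at all
            args[key] = value
    return title, args


# Hypothetical path: the item ID is invented for this example.
title, args = parse_qq_path("Some-Title_W0QQitemZ12345QQcmdZViewItem")
print(title)
print(args)
```

A real implementation would need some rule for keys or titles that themselves contain 'QQ' or 'Z'; this sketch simply ignores that problem.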
The article at seoseonews.com is very similar to BeerInTheEvening. It appears to simply tag the keywords onto the end of the URL in the hope that Google will index that instead of the "true" URL, which doesn't actually need anything on the end at all!
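The "match the ID, ignore the rest" trick can be sketched in a few lines of Python. This is a rough imitation of what a rewrite rule like this might do, assuming the two numbers in the BeerInTheEvening path are all the site needs; the pattern and function name are my own and not taken from any of these sites.

```python
import re

# Assumed rule: only the leading '/pubs/s/<number>/<number>' part matters,
# and anything after it is accepted without being checked.
PUB_PATTERN = re.compile(r"^/pubs/s/(\d+)/(\d+)(?:/.*)?$")


def resolve_pub(path):
    """Return the two numeric parts of the path, or None if it does not match."""
    match = PUB_PATTERN.match(path)
    if match is None:
        return None
    return match.group(1), match.group(2)


# All three paths resolve to the same page: that is the duplicate content.
print(resolve_pub("/pubs/s/13/1316/Wenlock_Arms/Hoxton"))
print(resolve_pub("/pubs/s/13/1316"))
print(resolve_pub("/pubs/s/13/1316/any/old/keywords"))
```

Because the trailing part is never validated, an unlimited number of URLs end up pointing at the same content.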
The New Way
Recently I've noticed that some sites are trying to be slightly more cunning... Amazon for example...
http://www.amazon.co.uk/Adobe-Flash-Pro-CS3-Mac/dp/B000O17CGU/ref=sr_1_3
I was looking for Adobe Flash CS3 for the Mac and I looked at the URL and decided to have a play with it to find out what was necessary to get the page to load. First thing that came off was what looked like the referring source - that was an easy one. I then started to work backwards from there and found that not many arguments could be removed at all! It was at that point I wondered if they'd been sneaky... Did they need the first part of the URL? Answer: No!
http://www.amazon.co.uk/dp/B000O17CGU
Since finding this out I have found a number of other sites which are actually doing exactly the same thing (eg, DeviantArt.com). Instead of keyword stuffing the end of the path - they are pre-stuffing the URL and putting the ID on the end.
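The pre-stuffed variant is just the same trick turned around. Here is a hedged Python sketch, assuming the ID always follows a '/dp/' segment and that whatever comes before or after it is ignored; the pattern is my own reading of the two Amazon URLs above, not Amazon's real routing.

```python
import re

# Assumed rule: an optional keyword segment, then '/dp/<ID>', then anything.
PRODUCT_PATTERN = re.compile(r"^(?:/[^/]+)?/dp/([A-Z0-9]+)(?:/.*)?$")


def extract_product_id(path):
    """Return the product ID from the path, or None if there is no '/dp/<ID>'."""
    match = PRODUCT_PATTERN.match(path)
    return match.group(1) if match else None


# The keyword prefix and the trailing 'ref' part make no difference.
print(extract_product_id("/Adobe-Flash-Pro-CS3-Mac/dp/B000O17CGU/ref=sr_1_3"))
print(extract_product_id("/dp/B000O17CGU"))
print(extract_product_id("/Totally-Different-Keywords/dp/B000O17CGU"))
```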
Why do this? Well the keywords important to the page appear at the beginning of the URL rather than the end. As Tesco say; Every Little Helps! ©
Why is this bad?
Well, one thing you learn in SEO-101 is that Google & Co. do NOT like it when sites present the same content on multiple URLs, commonly known as Duplicate Content. Google has a tendency, when it finds these sites, to take a "hammer to kill a fly" approach and blacklist the site without warning or explanation.
The Drupal Way
The way Drupal handles URLs is pretty neat - especially the clean ones. You simply define the path you'd like to map to the source path and Drupal handles it all internally. If the alias path doesn't match anything in the database then you don't get a matching source path.
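The difference from the pattern-matching approaches is that the lookup is an exact match. Here is a minimal Python sketch of the idea, using a plain dictionary as a stand-in for the alias table; this is not Drupal's code, and the table contents and function name are assumptions for illustration.

```python
# Hypothetical stand-in for the alias table: alias -> source path.
ALIASES = {
    "my-nice-alias.html": "node/12",
}


def get_source_path(alias):
    """Return the source path for an alias, or None if there is no exact match."""
    return ALIASES.get(alias)


print(get_source_path("my-nice-alias.html"))
# Extra keywords on the end do not match anything, so no source path comes back.
print(get_source_path("my-nice-alias.html/extra-keywords"))
```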
Of course, as soon as you enable the Path module and create your first alias, you have created Duplicate Content. Why? Well, you can still access the node on its source path (eg, http://example.com/node/12) as well as on its nice new alias (eg, http://example.com/my-nice-alias.html). Currently Drupal does not handle this internally, which is why I have developed the GlobalRedirect module. This module simply checks whether the currently accessed path has an alias associated with it. If it does, the module issues a permanent redirect (301) to the alias.
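The redirect check itself is a small piece of logic. The Python sketch below shows the idea only and is not the GlobalRedirect module's actual code; the function name and the source-to-alias dictionary are assumptions.

```python
def check_redirect(requested_path, source_to_alias):
    """Return (301, location) if the requested path has an alias, else None."""
    alias = source_to_alias.get(requested_path)
    if alias is not None and alias != requested_path:
        # Permanent redirect, so search engines settle on a single URL.
        return 301, "/" + alias
    return None


# Hypothetical lookup table: source path -> alias.
aliases = {"node/12": "my-nice-alias.html"}

print(check_redirect("node/12", aliases))             # source path: redirect
print(check_redirect("my-nice-alias.html", aliases))  # already the alias
```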
What if you don't want to have to keep filling out the alias field every time? What if you don't want to hand site structure control over to the masses on your nice new & funky Web 2.0 site? Enter PathAuto (along with Tokens). When these two pair up, Drupal will automatically generate paths for you based on whatever template you configure for your node type (or taxonomy term... or user...)!
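To give a feel for template-driven aliases, here is a rough Python sketch of token replacement. It only illustrates the concept: the '[type]' and '[title]' token names, the clean-up rules and the function names are all invented for this example and are not PathAuto's or Token's real API.

```python
import re


def slugify(text):
    """Lowercase the text and collapse anything non-alphanumeric into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_alias(template, values):
    """Replace each [token] in the template with the slugified matching value."""
    def replace(match):
        # Raises KeyError if the template uses a token with no value supplied.
        return slugify(str(values[match.group(1)]))
    return re.sub(r"\[([a-z_]+)\]", replace, template)


# Hypothetical template and node values.
alias = build_alias("[type]/[title]", {
    "type": "Blog",
    "title": "Clean URLs & SEO: The Drupal Way!",
})
print(alias)
```

The point is that the editor never types the alias: it falls out of the template and the node's own data.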
"What about other SEO stuff like Page Titles and Meta Data?" I hear you cry! Page titles are taken care of by another module I maintain (along with John Albin) called PageTitle. Version 1.x only allows basic control of the page title template, however PageTitle 2 is currently in beta testing and has some VERY funky features! This new release is another Token-powered module which works in a similar way to PathAuto. You provide a token template on a per node-type basis and can (optionally) specify a separate page title for the node (ie, you get a NODE TITLE for links, H1s, etc, and a HEAD TITLE for the search engines). The module takes care of the rest.
MetaTags? Easy - check out the fantastic NodeWords module. Unfortunately this isn't token powered (yet - hint hint!) so you can't set up token template descriptions, however it will automatically take the node teaser as the description and the node's taxonomy as keywords for you! You can also set up some site-wide defaults.
Summary
All in all - I think Drupal is pretty damn SEO friendly - especially compared to some other CMSs (and ESPECIALLY when you look at how easy it is to turn your website into an SEO dream!). What else does the community use for SEO with Drupal?
References
I've mentioned a few modules above - here are their links: