The Importance of URLs and Structure in a CMS

When selecting a CMS, there are always issues that are more important than others. So, the trick is finding a CMS where the developers have the same preferences as you. That's a bit hard though. An issue that is very important to me is URL's...

I could say extremely important, even. If the system doesn't support "nice URL's" then it doesn't matter much if the code is W3C compliant and all - the basic building blocks are arranged all wrong, and it's no easy task to get them right.

Most other things can be tweaked, but the basic site architecture has to be right from the outset, just like with a building. You just don't put the chimney in the basement when it's supposed to be on the roof, and the toilet is not supposed to be in the same room as the refrigerator - and so on.

If a CMS developer should see this by accident: URL's are important!

In the "Quiet out there" thread Nick stated that Drupal had a problem with URL's, so that you could not lay them out like, say:

example.com /news /articleX.htm
example.com /sports /articleX.htm
example.com /sports /golf /articleX.htm
example.com /sports /football /articleX.htm

At least that was how i interpreted it. I've seen it quite a few times with CMS'es: Getting URL's all wrong (in many different ways). And... (bonus info for CMS dev's) i'm trying to find a good CMS - and i've got friends and customers to advise on this as well. Some of those names would look pretty good on your lists, as these are pro's: Publishers. No, not bloggers. Real publishers, like newspapers, magazines, tv, radio and so on. The type of people that buy rather than donate. See?

I work with information architecture. Defining the URL structure is the first thing i do when planning a new site: Which sections should it have, which sections should have sub-sections, and so on. It's all in the URLs, as URLs are the building blocks of the WWW.
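
A minimal sketch of planning a site from the URL out: the sections and sub-sections are written down first, and every URL follows from that plan. The plan contents, the function names and the ".htm" suffix are assumptions for illustration, not taken from any particular CMS.

```python
# Hypothetical site plan: each key is a section, each value its sub-sections.
SITE_PLAN = {
    "news": {},
    "sports": {"golf": {}, "football": {}},
}


def section_paths(plan, prefix=""):
    """Return every section path in the plan, parents before children."""
    paths = []
    for name, children in plan.items():
        path = f"{prefix}/{name}"
        paths.append(path)
        paths.extend(section_paths(children, path))
    return paths


def article_url(section_path, slug, host="example.com"):
    """Build an article URL inside a section of the plan."""
    return f"{host}{section_path}/{slug}.htm"


print(section_paths(SITE_PLAN))
# ['/news', '/sports', '/sports/golf', '/sports/football']
print(article_url("/sports/golf", "articleX"))
# example.com/sports/golf/articleX.htm
```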

This is not the first time i have seen database developers thinking that URLs are of secondary importance (as, to them, they're just fields in the database). From a usability viewpoint, a logical URL structure is essential, however.

This hit me instantly as a major flaw, so i dug around a little on the Drupal site and found some debate over "forum taxonomies" in the Release candidate 4.6 thread. However, i was not sure if it's the same issue. I also found modules like:

And... as you can see i stumbled right into another thing that makes selection of a CMS hard - they use terms such as "path", "block", "node", "taxonomy", "category" and so on - all of these seem to have one meaning related to their product and another meaning related to other products.

Anyway, i think i found it, at least relating to Drupal: URL Aliasing. The description is not very informative though, for one new to the system, and it's also very wrong in the assumptions it's based on:

You can create a news section for example aliasing nodes and taxonomy overview pages falling under a 'news' vocabulary, thus having "news/15" and "news/sections/3" instead of "node/15" and "taxonomy/term/3". You need extensive knowledge of Drupal's inner workings and regular expressions though to make such advanced aliases.

Advanced? That's not advanced! URL's are the basic building blocks of any well laid out site, and indeed of the whole WWW - no less than that.

Absolute URL control is what you start out with "in kindergarten", i.e. when creating a static site in HTML - even when using a beginner-level editor like FrontPage you have it. You simply create your folders and put your files into them - it's not in any way an advanced topic. It's so basic - and yet it gets overlooked by database people.

So, does anyone know of a CMS that has absolute URL control built in?

Any thoughts on all this?

Comments

 

Quote:
Advanced? That's not advanced! URL's are the basic building blocks of any well laid out site, and indeed of the whole WWW - no less than that.

They're talking about aliasing, not urls in general claus.

Quote:
So, does anyone know of a CMS that has absolute URL control built in?

Drupal

I think you need to test it a bit more, it looks like you're not quite 'getting it' :)

 

Quote:
Nick stated that Drupal had a problem with URL's, so that you could not lay them out like, say:

No, i said you could do that...

So ...

What's the answer, Nick? Can you manipulate the URLs in Drupal without "extensive knowledge of Drupal's inner workings and regular expressions though to make such advanced aliases"?

Yes

I do it all the time..

For example

http://www.threadwatch.org/about/editors

when i wrote the page, i just filled in a separate field for the 'path' as 'about/editors'

It's ever so simple once you've done it a couple of times...
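
The 'path' field described above amounts to a two-way lookup between the system's internal address and the address the visitor sees. A rough sketch of that idea follows; it is not Drupal's code, and the class name, method names and the "node/42" id are made up for illustration.

```python
class AliasTable:
    """Two-way map between internal paths and public aliases (illustrative)."""

    def __init__(self):
        self._alias_to_internal = {}
        self._internal_to_alias = {}

    def set_alias(self, internal, alias):
        """Give an internal path such as 'node/42' a public alias."""
        alias = alias.strip("/")
        owner = self._alias_to_internal.get(alias)
        if owner is not None and owner != internal:
            raise ValueError(f"alias {alias!r} already points to {owner!r}")
        # An internal path has one alias at a time, so drop any earlier one.
        old = self._internal_to_alias.pop(internal, None)
        if old is not None:
            del self._alias_to_internal[old]
        self._alias_to_internal[alias] = internal
        self._internal_to_alias[internal] = alias

    def resolve(self, request_path):
        """Map a requested path to the internal one; unaliased paths pass through."""
        path = request_path.strip("/")
        return self._alias_to_internal.get(path, path)

    def public_path(self, internal):
        """Return the path to print in links: the alias if one exists."""
        return "/" + self._internal_to_alias.get(internal, internal)


aliases = AliasTable()
aliases.set_alias("node/42", "about/editors")  # "node/42" is a hypothetical id
print(aliases.resolve("/about/editors"))  # node/42
print(aliases.public_path("node/42"))     # /about/editors
```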

Aha

That's good news. Was this an extra plugin/whatever, or did Drupal come with it?

And (I have to ask) I'd like to know why you chose Drupal over, say, some blogging software? (Though it does look like it's got "extra" features, I'd like to hear what you found preferable, as you know the system.)

 

Well, its primary function is community based, there are a number of 'citizen journalism' projects being run with it and a big political campaign was done with it some time back also - for community stuff it's great..

you can dumb it right down though to use just as a simple site builder. Part of the appeal is the advanced permissions, and all the modules available.

The path module is built in, you just have to enable it...

Excellent

Excellent, Nick. Thanks for the data. :-)

I'd add this: though claus may not have caught this particular feature in Drupal, it's the problem with a number of CMSes, shopping carts and the like: great attention paid to functionality, but not so much attention paid to how things have to work in the real world.

uhm

i feel a bit silly about the above, having used Drupal as an example when Drupal really can do this. Sorry about the misunderstanding, i should have kept the post totally general.

So, does anyone know other CMS'es (apart from Drupal) that have nice URLs?

a logical URL structure

The issue seems to be "logical URL structure". Logical from the point of view of the application serving the content, or from how the information is perceived to be categorized?

It's not that developers don't care about urls, it's that when you build an application there is a logic that should follow there too...

http://www.allinthehead.com/retro/245/designing-uris

That link provides a nice write up, and what I'm getting at can be seen at the "Where was I?" heading. Nested categories can throw a wrench in the works of the type of logic described there.

Anyway, I wound up writing my own cms because of this very issue...

Now, how about tags? ;-)

And that's ...

That's precisely the point, mipapage.

This kind of tunnel vision can happen when you've got only the programmers working together and they're ignoring all other issues. Same can be said of only the graphics department, or only marketing, etc.

Thing is, a properly set up website needs more than functionality. It needs more than graphics. It needs more than content. It needs ... *everything* set up correctly and working with everything else.

app logic

My take on application logic is that it is a good thing and that it should be found there - at the application level.

So, the site organization might not be logical from a database viewpoint, but the database just has to take that into account. Users rule on the user side, not programmers. Users should not care about database calls and docID's - users should see pages.

Actually there's a simple test: Does the URL compute IRL?

Can you receive the URL in an email and immediately see what the page is about? If you put it in a letter (on paper) or on a post-it note, will you be able to identify the topic of the page when you see the URL? Is it "plain-text-readable", and can you actually pronounce it easily over the phone?

And then there's the advanced version: Is it intuitive?

Can i just type in "example.com/products/123456" for information about product 123456? or "example.com/weather/spain/today" for information on the weather in Spain today? Is there a help system if i spell wrong, are URL's case-sensitive (better not be), and where do i go if i omit parts of a URL (ie. can i omit "/today" and see all weather for Spain)?
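
The test questions above can be turned into a small resolver, sketched here under assumptions: the page list is hypothetical, matching is case-insensitive, and a URL that does not exist falls back to the nearest existing parent. Spelling help is left out.

```python
# Hypothetical set of pages that exist on the site.
KNOWN_PAGES = {
    "/weather",
    "/weather/spain",
    "/weather/spain/today",
    "/products/123456",
}


def split_path(path):
    """Lowercase the path and drop empty segments (extra or trailing slashes)."""
    return [part for part in path.lower().split("/") if part]


def resolve(path, pages=KNOWN_PAGES):
    """Return (page, exact): the closest existing page at or above the path."""
    parts = split_path(path)
    wanted = len(parts)
    while parts:
        candidate = "/" + "/".join(parts)
        if candidate in pages:
            return candidate, len(parts) == wanted
        parts.pop()  # walk one level up and try again
    return "/", wanted == 0


print(resolve("/Weather/Spain/"))          # ('/weather/spain', True)
print(resolve("/weather/spain/tomorow"))   # ('/weather/spain', False)
```

A caller could serve the page directly when the second value is true and redirect (or show a "did you mean" page) when it is false.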

YMMV, of course.

 

I agree, claus; at least, I think I do.

Bottom line is that a CMS/database that does not provide what the site owner needs is simply incomplete for the site owner's purpose.

We all know this. Otherwise, we wouldn't be having this discussion at all; we'd just post a list of CMSes and be done with it.

 

  • "*everything* set up correctly and working with everything else."
  • "application logic at the application level... Users rule on the user side..."

Nice.

Nice URLs...

...are the primary reason I go for a DIY solution 95% of the time...

I was, however, very pleasantly surprised by WordPress 1.5 (blogging software written in PHP), which also sports 'static pages' as of this release. To support 'nice URLs', you have to have mod_rewrite enabled (an Apache module) and the ability to use .htaccess files (Apache per-directory settings files)... WordPress automatically generates the .htaccess file based on your instructions, and you also have the ability to manually set 'page slugs' (the part of the URL not mandated by the URL structure)

I will now shamelessly advertise my not at all interesting blog as a fine example of WP1.5 :)
it's located at http://luka.kladaric.net/

there's also a static page there about me at http://luka.kladaric.net/about/ and a page describing all the Firefox extensions I use located at http://luka.kladaric.net/firefox/extensions/ (as a sub-page of the main 'Firefox' page, because I have Firefox pages on extensions, configuration, search plugins, themes and bookmarklets)

a number of URL 'tags' are available, as described in the WordPress wiki at http://codex.wordpress.org/Using_Permalinks
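
To illustrate how such URL tags work, here is a hedged sketch that expands a permalink structure into a path. It is not WordPress's implementation; the function names and the slug rule are assumptions, and only a handful of the tags from the linked page are handled.

```python
import re
from datetime import date


def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def expand_permalink(structure, title, published, post_id):
    """Replace each supported %tag% in the structure with the post's value."""
    values = {
        "%year%": f"{published.year:04d}",
        "%monthnum%": f"{published.month:02d}",
        "%day%": f"{published.day:02d}",
        "%postname%": slugify(title),
        "%post_id%": str(post_id),
    }
    for tag, value in values.items():
        structure = structure.replace(tag, value)
    return structure


print(expand_permalink("/%year%/%monthnum%/%postname%/",
                       "Nice URLs, at last!", date(2005, 4, 11), 7))
# /2005/04/nice-urls-at-last/
```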

hope this was helpful to all y'all ;)

Interesting

Interesting, allixsenos. I've been using WordPress 1.2.2 on a number of blogs (for which I had to hand-create the static pages), and will be testing 1.5 shortly.

However, that said, sometimes a blog isn't what's needed. If one needs a regular CMS, or a shopping cart, then blogware isn't the best choice -- and then all of the above still apply.

re: Interesting

I know, that's why I still had to create a custom CMS solution for a car-portal I was creating for a client... But WP1.5 is currently THE BEST solution for a personal website that I know of :)

an example

Here's an example from a CMS a publisher i work with uses:

example.com/article/20050411/ENVIRONMENT/104150015/-1/FRONTPAGE

This is really bad URL design. What the URL says, from the application viewpoint is:

An article, published today, in the "ENVIRONMENT" section, with an ID of "104150015" served after a click on a link from the front page. Okay, that sounds reasonable... What's wrong with it:

  • "article" is redundant - the user does not need this information, it's just 7 more characters to type and spell wrong.
  • The date - no need for that either. This is not a personal diary, and the user can see the date on the page itself. An indication of the article subject would be more appropriate, as very few days have only one thing you connect them with. But, most of all, it's placed really badly:
  • The section - All caps? Why? And how do i see all articles for that section? I don't because sections are sub-folders of the date! This also means that any traffic analysis systems have to be set up with regular expressions to strip the date field - it's simple if you know how to, and you have a stats system that can work with regexps, but, well...
  • Article ID: Well, it has to be there i guess - that nine-digit number is the only really critical piece of information in that long string. A few words would have been a bit better i'd say. Perhaps even the title.
  • The last part: Would you imagine that this is referrer tracking for this CMS? It is: Leave out everything after the article ID and you see the same page. Now, why should the user need to be told that s/he has just clicked a link from the front page?

Was that all? Oh, it was everything there is? Wouldn't you think that this would be easier for users, somehow (although not a lot more informative):

example.com/environment/104150015

Nine digits might perhaps even be okay instead of a descriptive title, as it's a newspaper for engineers. It's a standard "in-a-box" publishing system though - they're not the only publisher who uses it.
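
The clean-up argued for above can be done mechanically. A sketch, assuming the legacy URLs always follow the /article/date/SECTION/id pattern from the example; the pattern and the function name are illustrative and not part of the publishing system in question.

```python
import re

# /article/<8-digit date>/<SECTION>/<article id>[/<tracking parts>...]
LEGACY = re.compile(
    r"^/article/(?P<date>\d{8})/(?P<section>[A-Za-z]+)/(?P<id>\d+)(?:/.*)?$"
)


def simplify(path):
    """Map a legacy article path to /section/id, or None if it doesn't match."""
    match = LEGACY.match(path)
    if match is None:
        return None
    # Drop "article", the date and the referrer tracking; lowercase the section.
    return f"/{match['section'].lower()}/{match['id']}"


print(simplify("/article/20050411/ENVIRONMENT/104150015/-1/FRONTPAGE"))
# /environment/104150015
```

The same expression would also serve the traffic analysis case: grouping hits by the simplified path strips the date field and the tracking suffix in one go.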

it's application specific

Claus,

You're right on with your requirement, but isn't it a transient thing? If we built a CMS to satisfy you, then perhaps in 12 months your requirements would be different. Sure, today it's about semantics in the URL, and UI design-by-URL, which you can "adjust" if you can control URL labels, but it is possible that the technical requirements or perhaps standards would shift and such a CMS would again be "wrong" for you despite that ability. I don't imagine you would enjoy hand-managing all those URLs once they were not semantically tied to view content.

The core issue is the website's dependence on a filesystem. That is the inconsistency that needs to be eliminated.

We have gotten stuck in an MVC architecture (model, view, controller) where the "display page" is selected for use with the "content to show" for a given "specific instance". A CMS is a highly-evolved MVC architecture, and I agree it is worthy of optimizing. But the participatory web emerging now goes beyond MVC and isn't well-served by a CMS. It is not clear yet what SEO will come out of it (tags and track/ping backs and collaborative authorship and group-wise indexing etc) but it, too, will be problematic if websites continue to be based on file systems. I agree with others who have highlighted how a CMS provides *some* of the views, but not all. It's a similar concept to WordPress offering "pages" that are database driven, yet not CMS-dependent in the "traditional" way. It's a logical step for any CMS to add "static" pages. However, WordPress' approach is a re-write hack as well, and integration of "pages" with WordPress requires rather careful management of WordPress looping and plug-ins that frequently step on each other.

Apache re-write is a hack for file system dependency and hierarchical layout dependency. I don't think it works as a long-term solution due to its own hierarchical dependencies. I THINK THE URL SCHEMA MUST BE APPLICATION SPECIFIC and the coders need to abstract the file system, providing us DEPLOYERS with an interface to URL management. That works.

I, too, design from the URL out. If you really have a URL-based design, you can manage the URL system outside of Apache. I have generalized my solution and anticipate it can accommodate whatever evolves, provided I have control over the DNS and web server. If you want to keep some of the flexibility of the CMS you can do so with careful use of page level tagging (search engine bot management).

I am much happier if I find a system is consistent and reliable, even with really ugly URLs, than I am if I find it using re-write conditions (because the coders' assumptions are usually wrong for my application) or re-write rules (because they add to the inflexible hierarchy and file-system dependence).

Good coding is scalable. It is my view that the problem everyone cites about features getting attention at the expense of code (due to popular demand) is also the problem that leads to unnecessary file system dependency. If the popular request is for "search engine friendly URLs" we are likely to get more of the same non-scalable file-system-dependent URL rewriting.

The request should be for consistent and reliable URLs, and we should have an interface for managing that at the domain level. Quite a stretch from where we are today, I admit, but not outside the reach of serious clients with decent coders.

good points

>> a transient thing.

Exactly. That's why i used the words "absolute URL control" in the first post - it's not only important to be able to designate an appropriate value to them, but also to be able to change those values as needs change. A cool URI is not one that doesn't change, it's one that has change management behind it.
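
One way to read "change management" is that every address a page has ever had keeps working and points to the current one. A small sketch under that assumption; the class, its method names and the example paths are hypothetical.

```python
class UrlRegistry:
    """Tracks the current URL of each piece of content plus its old URLs."""

    def __init__(self):
        self._current = {}  # content id -> current path
        self._live = {}     # current path -> content id
        self._retired = {}  # old path -> content id

    def publish(self, content_id, path):
        """Assign (or change) the public path of a piece of content."""
        owner = self._live.get(path)
        if owner is not None and owner != content_id:
            raise ValueError(f"{path!r} is already used by {owner!r}")
        old = self._current.get(content_id)
        if old is not None and old != path:
            # Keep the old address around so it can redirect later.
            del self._live[old]
            self._retired[old] = content_id
        self._retired.pop(path, None)  # a reused path is live again
        self._live[path] = content_id
        self._current[content_id] = path

    def lookup(self, path):
        """Return (status, location) the way a web server might answer."""
        if path in self._live:
            return 200, path
        if path in self._retired:
            # Old paths store the content id, so they always redirect
            # straight to the newest address, with no redirect chains.
            return 301, self._current[self._retired[path]]
        return 404, None


registry = UrlRegistry()
registry.publish("a1", "/news/old-title")
registry.publish("a1", "/environment/new-title")
print(registry.lookup("/news/old-title"))  # (301, '/environment/new-title')
```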

---

>> The core issue is the website's dependence on a filesystem.
>> That is the inconsistency that needs to be eliminated.

I would have put "sometimes" in the second sentence. Sometimes "file system" and "display system" can be the same, and it all makes sense. Sometimes it's not so. The more "advanced" the site, the less likely is the similarity. Still, even on 100% static sites i run, the file system does not always have the same structure as the URL structure. Visitors see what is best from their viewpoint and files are organized on the server according to what's the easiest thing to manage. Then there are some rules glueing the two sides together.
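
The rules gluing the two sides together can be as simple as an ordered list of prefixes. A sketch, with made-up directory names, where the first matching rule wins:

```python
import posixpath

# Hypothetical rules: (public URL prefix, directory on the server).
# Order matters: more specific prefixes come first.
RULES = [
    ("/sports/golf/", "content/2005/golf/"),
    ("/sports/", "content/sports-desk/"),
    ("/", "content/misc/"),
]


def file_for(url_path, rules=RULES):
    """Translate a public URL path into a file path using the first matching rule."""
    # Normalising against the root collapses "." and ".." segments,
    # so a request cannot climb out of the mapped directories.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    for prefix, directory in rules:
        if (clean + "/").startswith(prefix):
            rest = clean[len(prefix):]
            return posixpath.join(directory, rest or "index.html")
    return None


print(file_for("/sports/golf/articleX.htm"))  # content/2005/golf/articleX.htm
print(file_for("/sports/golf"))               # content/2005/golf/index.html
```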

---

>> file systems, views, static pages

There's enough meat in these three concepts for an entire thread. Let's say that "file system" is one possible "view" then we have eliminated one third of it :-)

A "view" is a very good way of thinking about all this, as that incorporates the idea that the same data can be "displayed" in widely different formats and states; HTML, XML, RSS, WAP, WebTV, SMS, email, screenreader, PDF, PDA, pager, voicemail, etc. We're not even close to that now, i know.

Static pages are something else in my world. Either they're hand built, or else they're built by other means, but they are basically flat files with no (or very little) computing being done at request time. You can have static pages built by an advanced back end - the web site does not need to be static because the pages are. The backend produces them, the server serves them, and hence they are very efficient for high traffic sites. All that is needed is a "publish" button that generates them as flat files. These pages can of course also have dynamic plug-ins, so it's a bit blurred.
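
A "publish" button of that kind can be sketched in a few lines, assuming the back end hands over finished HTML keyed by URL path. The function name and the index.html convention for section fronts are assumptions.

```python
from pathlib import Path


def publish(pages, out_dir):
    """Write each page as a flat file; a path ending in '/' becomes index.html."""
    written = []
    for url_path, html in pages.items():
        relative = url_path.strip("/")
        if url_path.endswith("/") or relative == "":
            relative = f"{relative}/index.html".lstrip("/")
        target = Path(out_dir) / relative
        # Create the section folders on demand, then write the flat file.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written


# Example call (writes into a local "public" folder):
# publish({"/": "<h1>Home</h1>",
#          "/sports/golf/articleX.htm": "<h1>Article X</h1>"}, "public")
```

After that the web server only has to serve files from the output folder; nothing is computed at request time.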

Anyway, i digress - i have edited a long section out of this post, as i could probably write a web site instead.

---

>> Application specific

In the case of Flash the URL's are hidden from the user and as such they are not available. So, here they can be whatever is easiest. The same goes for other embedded applications, eg. the Ajax variety that is starting to get momentum. For web sites, the URL is a navigational tool as well as a promotional tool, so at least just as much thought must go into URL planning as with ordinary navigation, breadcrumbs, and the like.

If you see "application" as equal to "web site" then the URLs should be site specific. At least to some degree, as there are also benefits you can get by adopting conventions used widely, eg. having "contact" as part of a contact page URL, or "remove file name to nearest slash" in order to go to the section front page.

When i personally think about the word "application" it is not the web site as a whole - it is only the programming that controls the web site, ie. the backend, or CMS. It can also be elements on pages, eg. a flash movie or a game. So, as long as URL's are seen by the user it is a tool for the user, and as such it is my opinion that URLs should be carefully crafted with said user in mind. Or, at least significant parts of the URLs.

---

>> Apache re-write is a hack for file system dependency and hierarchical layout dependency.

Very few things are not hierarchical. Still, it's a hack that is only necessary because the application does not do its job properly. URL management should be built in.

---

>> The request should be for consistent and reliable URLs, and we should have an interface to
>> managing that at the domain level.

I totally agree.

 

thanks for the commentary, Claus. I think we're in agreement.

I have yet to accomplish full URL management to my own satisfaction within a re-usable framework, but I do believe it can be done well. Currently the bulk of my manual labor is semantic work, content development, and URL management.
