Charl van Niekerk » Blog

CMSs & Frameworks

Thus far, the concepts of the CMS and the development framework have been mixed together in most of today's common CMSs.

Plone/Zope is actually the first system I know of that separates these two concepts. Plone is the CMS and Zope is the development framework.

So, what is the difference, you might ask? Well, the easiest explanation is to say this: A CMS is for the end user, while the development framework is for the developer to build a CMS on. Which one comes first? The chicken or the egg? I don't know, but in this case the development framework comes first and then the CMS.

So, what exactly is a development framework? Well, it's all kinds of things combined. Nobody likes to start with nothing; that's why we're using high-level programming/scripting languages for building normal stuff instead of Assembler (unless of course your name is Steve Gibson, with all due respect to him).

Anyway, for the layer fanatics, the development framework would be the layer right in between your programming/scripting language (PHP, Python, Ruby, etc.) and your CMS. It contains things like database abstraction layers (not fancy, high-level ones, but something like PEAR DB), templating systems, session & user management, general tools for common tasks, etc.

Database abstraction? Do you mean a class with methods like getAllListings? Not really, more like PEAR DB (not that that kind of thing is always too useful, but that's another post). Getting your result sets back in two-dimensional arrays with getAll was never that efficient in PHP, but it's still quite handy every now and then. The getRow and getOne methods are very handy though (they reduce your code by a few lines and make it more readable in certain cases where you only return one row from the database).
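To make the getAll/getRow/getOne idea a bit more concrete, here is a minimal sketch in Python on top of sqlite3. The class name TinyDB, the method names and the listings table are all made up for illustration; this mirrors the idea only and is not PEAR DB's actual API.

```python
import sqlite3


class TinyDB:
    """Hypothetical thin wrapper; names mirror the idea, not PEAR DB's API."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def get_all(self, sql, params=()):
        # Whole result set as a list of row tuples.
        return self.conn.execute(sql, params).fetchall()

    def get_row(self, sql, params=()):
        # First row only, or None when nothing matched.
        return self.conn.execute(sql, params).fetchone()

    def get_one(self, sql, params=()):
        # First column of the first row, or None when nothing matched.
        row = self.get_row(sql, params)
        return None if row is None else row[0]


# Example data (an assumed "listings" table, purely for the demo).
db = TinyDB()
db.conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, title TEXT)")
db.conn.executemany("INSERT INTO listings (title) VALUES (?)", [("Flat",), ("House",)])

count = db.get_one("SELECT COUNT(*) FROM listings")
first = db.get_row("SELECT id, title FROM listings ORDER BY id")
```

The single-value and single-row helpers are where the saved lines come from: the caller never has to loop over a result set just to read one field.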

Or maybe your thing is more like Django with a higher-level abstraction that replaces the need for writing SQL. But whatever you want, that's part of it.

Templating systems... What, you haven't been using them yet? You're still doing print "<h1>$heading1</h1>\n";??? Get outta here! :P
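For the sake of illustration, this is roughly the smallest thing that could be called a template in Python, using string.Template from the standard library. The PAGE template and the render helper are my own assumptions for the example, not a recommendation of any particular templating system.

```python
from html import escape
from string import Template

# The markup lives in one place, separate from the code that fills it in.
PAGE = Template("<h1>$heading1</h1>\n<p>$intro</p>\n")


def render(template, **values):
    # Escape every value so user-supplied text cannot inject markup.
    safe = {name: escape(str(value)) for name, value in values.items()}
    return template.substitute(safe)


html = render(PAGE, heading1="CMSs & Frameworks", intro="Markup lives in the template.")
```

Even something this small already gives you what the print approach doesn't: the markup can change without touching the code, and escaping happens in exactly one spot.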

Well, actually I still can't manage to find a templating system I actually like, but that's also another post.

Well, that's pretty much the concept of a development framework. As I said earlier, a CMS is what the end user works with. In other words, basic manipulation (add, edit, delete) of items like blog articles, products, etc.

The whole point of separating these two concepts is that most users want their own high-level functionality. In other words, different kinds of items, different properties (database fields), etc. (and by database I include all kinds of data storage types, including XML). A development framework should help the developer set up such things easily, because no matter how much functionality you add, users will always want more. Or else you get a SAP system that virtually nobody can admin.

Anyway, since most CMSs are built on top of their own development frameworks, there is a lack of basic standardisation between the various CMS implementations. Moving from one CMS to the next is a pain for developers because they often need to rewrite many of their components/modules or otherwise spend a long time adjusting them.

This is also one of the reasons I like the PHP PEAR since that also helps you standardise some things. I know those packages aren't always perfect, but at the end of the day it isn't perfection that counts but getting somewhere.

Also, when developing your own CMS, consider using the things already there, or otherwise improve them. That's the real spirit of open source, not the everyone-does-their-own-little-thing spirit that is so prevalent in many open source communities world-wide currently. People do the most powerful things when they work together, not when they just work on their own.

Actually, much of the open source spirit goes hand-in-hand with standards. And we're not only talking standardised protocols and formats, but also standardised coding practices. That's actually very important, especially when you work along with other programmers.

If each programmer does their own unique little thing, then at the end of the day, when another programmer needs to work with their code, it's difficult for that programmer to adjust to the original programmer's coding style and code structure if the original programmer didn't follow certain norms. The more standards and norms one follows, the easier it usually gets. You also see advantages when it comes to code reusability and integrating existing systems into each other.

Yes, of course I'm talking about things like object orientation, but everybody follows that already, don't they? (What? You don't? Shame on you!) But when people also start to use things like PEAR, they not only save themselves development time; other programmers who already know those packages also learn to develop with platforms built on them a bit faster.

But what do you do if the existing systems are pretty shitty? Well, first of all you offer to contribute to the existing package. If that doesn't work, then I guess you'll have to go custom but preferably make your work available too under a separate package in the PEAR so that others don't have to do it all over again.

Of course, I'm talking open source here. If this offends you, sorry but this site is dedicated to open source gangsterism. ;)

The point is, and I want to stress this again, if you have an open source project don't make it exclusive; allow the public to test, provide feedback, and allow patches and improvements from other developers that submit them (of course checking them first to see that they're not trying to insert malicious code into your project if you don't trust the developers).

The second (more specific) point is that CMSs are typically too specific to users' needs. Keep an open mind and make sure you're careful not to make a CMS too specific; if it gets too specific then rather split up the concept into a development framework and a CMS running on top of it. At the end of the day, use your common sense. That's what matters. :)

Jabber

This blog is mostly about web standards and open source. I believe most readers of my blog are interested in at least one of these two areas, therefore I think it's about time I make a post like this.

One of the things I don't really understand from the web standards community at large is that they insist on supporting web standards but (seemingly) couldn't give a rat's ass for standards in any other medium of the Internet. For example, instant messaging.

The problems with proprietary instant messaging are similar to the problems with proprietary web technologies and browser-specific coding.

We currently have quite a few large instant messaging protocols & networks. Most of these networks are closed (each belongs to one particular company) and use some custom proprietary protocol.

This is what I like to call the "centralised approach". It is quite common (I'll blog about more examples one day) and is typically very bad. It usually means that some company controls the network, which isn't always bad (unless the typical monopoly effects start to set in), and that the network doesn't talk to any of the other networks, which is always bad in my opinion.

For example, think about MSN, ICQ and Yahoo. All three operate their own instant messaging services. Different people join different networks for various reasons. But if I join MSN I can't talk to a person on ICQ. Now that's bad. Ok, so I join ICQ too. Now I've got another friend who's using Yahoo. And another one using Skype. Now I need to join those networks too. Four different clients all using my computer's memory and my bandwidth. Sheesh.

I want to set an away message? Ha, you have to do each client separately. Cool. Or not...

Mmm, did I forget to mention Google's new creation? These are the times when swearing is allowed.

Luckily, very few people still seem to be using ICQ and Yahoo. So it's not really that bad. I just hope Google Talk doesn't become popular.

Then luckily we got multi-protocol clients like Gaim, so that we don't need as many client programs running on our computers. But bandwidth is still liberally wasted.

MSN Messenger seems to be the de facto standard for many at the moment. Let's face it, MSN is not bad, but it's still proprietary, so on principle it's just as bad as the others. And it still isn't the only network around (there are many people using the other networks); thank goodness, otherwise we would have another IE (monopoly) on our hands.

Jabber, to my knowledge, is the first large instant messaging system that runs an open network and is built on top of the open XMPP protocol.

The advantages are obvious - it's decentralised, it's extensible (XMPP is built on top of XML), it's open, and it's cool (because of the aforementioned reasons).

Theoretically it is possible for all the other closed networks to migrate to Jabber, but I don't see this happening soon. People, and especially companies, with a bad mindset are unlikely to change.

Google Talk is also built on top of XMPP, which is quite cool. Apparently you can also connect to Google's servers through a normal Jabber client. The only problem is that they still made a closed network around it instead of joining the open public Jabber network. So back to square one.

One of the complaints I often get about Jabber is that the "service is slow and unreliable". Well, then you're obviously not connecting to the right servers. I'm connected to a local server in South Africa, jabber.obsidian.co.za, which is very reliable and extremely fast (relative to the other networks) when accessing it here in South Africa. Well, at least through my connection. Don't just join the default server (jabber.org); visit the site and select the best server for you.

And if you're an ISP, why not run your own Jabber server and let it join the public Jabber network? The advantages are similar to mirroring - reduce external bandwidth usage while providing your clients with very fast, reliable access to the resource as a value-added service.

I don't want to go into Jabber extensively and do a feature comparison; however to prevent you from thinking Jabber is an all-in-one I'm going to tell you that you will find there are some features (particularly VoIP) which Jabber can't cope with on its own (there are other open standards like SIP etc. for that kind of stuff; if that's too complicated for you just go proprietary with Skype, at least it's cross-platform unlike some...)

I believe Jabber is superior for text-based messaging, but some other IM services have brought a whole new meaning to IM by incorporating other services like games etc. into your "IM experience". Don't expect these things in Jabber. I'm sure there are other programs out there that cater for those needs, but Jabber most likely includes neither the kitchen sink nor the bathroom basin.

Remember that Jabber, being built on top of XMPP, can be extended to do all kinds of things (including some of the features mentioned above). It's actually so multi-purpose it's ridiculous. Of course you can already use it to send SMSs using third-party (typically charged-for) services, but it can really accommodate any kind of (fairly) low-bandwidth node-to-node traffic through the network. For high-bandwidth stuff think SIP; then you first establish one or more TCP connections between the two (or more) nodes directly, saving the XMPP network's bandwidth. Well, this is exactly what proprietary systems do too, it seems (just probably not through an open standard like SIP), so there.
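Since the extensibility comes from XML namespaces, a rough sketch may help. The snippet below builds a simplified chat message with one extra namespaced child element. The namespace urn:example:mood, the mood element and the address friend@example.org are all invented for this example, and a real stanza would also live inside an XMPP stream with its own default namespace, which is left out here.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace, invented for this example.
EXT_NS = "urn:example:mood"


def build_message(to, body, mood):
    # A plain chat message stanza (simplified: no stream namespace)...
    message = ET.Element("message", {"to": to, "type": "chat"})
    ET.SubElement(message, "body").text = body
    # ...plus a namespaced child that unaware clients can simply ignore.
    ext = ET.SubElement(message, "{%s}mood" % EXT_NS)
    ext.text = mood
    return message


stanza = build_message("friend@example.org", "Hello!", "happy")
xml_text = ET.tostring(stanza, encoding="unicode")
```

A client that doesn't know the extra namespace still finds the body where it expects it, which is the whole trick behind extending the protocol without breaking anybody.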

So maybe, if you want a specific feature your current instant messaging client provides, you could find that feature in a more specialist SIP/XMPP application or otherwise you're always welcome to develop one yourself. ;)

Also remember that there is no "Jabber client". There are a huge number of clients (personally I use Gaim) available, many of them open source and some proprietary. But I don't really have that much against proprietary products, as long as there are open source alternatives available and as long as they all follow the right standards well.

Jabber is best taken from a fun perspective; I think it's an especially nice system to hack with (although I haven't really been doing that much hacking on it yet myself). But it's also nice for everyday use once you've got the hang of it, and I'm sure you'll find that it compares nicely when you don't want much more than pure plain text. And if you do, well, get hacking!

Naturally, I don't mean "hack" as in the media sense of the term (read: breaking into); I purely mean hack in its true sense as in "explore" (in this specific case). Of course, "breaking into" is probably "exploring" but "exploring" isn't necessarily "breaking into". Remember that. ;)

Mambo & Standards

One of the things that make me quite angry is when people talk absolute rubbish; ok, I admit to talking a lot of nonsense quite frequently myself, but that's beside the point.

Many people like to rave about Mambo being a standards compliant CMS. Users looking for a standards compliant solution are often directed to it by others. Although Mambo has probably come a long way already, it still can't be classified as a standards compliant CMS in my book.

Firstly, let's give credit where credit is due. The developers of Mambo, in comparison to other CMS developers, have definitely made a large step towards standards support by outputting mostly valid XHTML 1.0 Transitional by default (or at least, according to the W3C Validator). Note, however, that the character encoding isn't properly specified.

However, now to the "bad" part of things.

Firstly, markup standards define two things:

  1. The structure of the document (validity).
  2. The semantics of the various parts of the document.

A computer can typically check the first thing, but not the second (at least not until we have fairly sophisticated AI).
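To show how mechanical the first kind of check is, here is a minimal sketch. Note the assumption: it only tests XML well-formedness (tags nest and close properly), which is weaker than validity against a DTD, and it will reject named entities like &nbsp; that the parser doesn't know about. The function name is my own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml):
    # Well-formedness only: tags nest and close properly.
    # Full validity needs the DTD, which this sketch does not check.
    try:
        ET.fromstring(xhtml)
    except ET.ParseError:
        return False
    return True


# Structurally fine, although a heading in a p element is poor semantics.
good = "<div><p>Heading</p></div>"
# The p element is never closed, so the parser rejects it.
bad = "<div><p>Heading</div>"
```

The first snippet passes even though a heading marked up as a paragraph is semantically wrong, and that is exactly the part a machine can't judge for you.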

Although the markup output by Mambo by default seems to be valid, the semantics are another issue entirely. I would say a standards compliant CMS needs to do the following two things (lots of lists today, as you can see):

  1. The core CMS must ensure that output is both valid and semantic as far as possible.
  2. The default templates provided with the CMS should follow W3C Recommendations for validity, semantics and accessibility.

The markup of both the core CMS and the default templates seems to be valid, but the semantics of both suck. Ok, so it could have been worse, but divs are often used where more semantic elements are appropriate and you'll also find too many tables for layout purposes (that in my opinion can't be justified).

Accessibility? Stress-testing by inserting invalid markup? Let's rather not even go there.

Insert: Note though that the current WYSIWYG editor seems to automatically correct many of the mistakes users will make in the markup structure when editing the source (which you have to do through the WYSIWYG editor), although no obvious error is displayed to the user when this happens. Turning JavaScript off bypasses the WYSIWYG editor, but then suddenly you can't save the file. So it seems like it does try to steer you into creating valid markup, although being JavaScript-dependent like this is not a good thing for me personally. Also note that only XHTML 1.0 Transitional is catered for; Strict requirements aren't enforced.

I think all-in-all Mambo is not a bad CMS; it's fairly user friendly and is probably one of the best currently available on the market. But a standards compliant CMS it is not (well, at least not yet).

(I used the latest beta I could download without using CVS, 4.5.3, to gather these facts; apologies if you thought this was going to be a thorough all-round review; I regret not having the time to complete one currently).

We are Borg; F*** Your Freedom!

First of all, apologies for the swearing in the title, but it has been too long since I had a good rant.

It's always fun to know that large corporations care about you, the user, instead of just feeding their own selfish hunger for yet more power. It's a pity we don't have the pleasure of knowing that more often.

As if you haven't read the quote enough times by now, here it comes:

"We won the desktop. We won the server. We will win the Web. We will move fast, we will get there. We will win the Web."

This is so low, yet sadly typical, of Microsoft that I probably shouldn't even respond. The reason I am responding is that this goes to show the following: Power corrupts; absolute power corrupts absolutely.

This kind of attitude isn't just present at Microsoft, but also at many other companies as well as governments (even worse) in the world.

One of the reasons I turned into an open source gangster is because I'm getting sick and tired of high-technology companies (Microsoft and others) trying to monopolise and control my computing life.

Especially the fact (well, I don't really know if it's a "fact" but it definitely looks like it) that Microsoft dismantled their IE development team after they "killed" Netscape shows me that it's not about software quality, but all about the monopoly.

Money is a powerful incentive. And although it can be used well it can also cause some people to do some incredibly evil things.

Let's face it; Bill Gates sucks when it comes to coding. I don't know if he even can code. All that he ever did was use/steal/borrow/buy other people's ideas and sell them. And frankly he did a pretty good job of that; maybe too good.

Don't get me wrong, I'm not a communist. Communist governments (more often than not) like to take away the freedom of their people instead of giving them more by telling them which religions they may or may not believe in, etc. Capitalism, on its own, can't rule either because money will eventually destroy this entire planet if somebody isn't going to put a cap on it sometime.

I think the ideal socio-economic system would be a hybrid of capitalism and communism, or maybe an entirely new system should be developed. But anyway, I'm not going to say much more about that. At the end of the day, capitalism works as long as people don't get greedy. Which they all too often do, unfortunately.

But the most important thing is freedom; I believe any person should have freedom to do anything they please unless that somehow hurts or takes away the freedom of another person (or people). And by "people" I also mean future generations.

Why make human rights complex? Keep it simple.

Well, now I'm going to be exercising my democratic right not to use Microsoft software.

I really feel sorry for ethically-challenged people like Ballmer. The future can't, and shouldn't, be led by people with that kind of thinking.

Just to end with a short message to Ballmer, not to offend but rather to exercise my freedom: "F*** you, you poor imbecilic bag of shit!" (How's that for freedom of speech?)

Free Opera?

Ok, so you must have heard about this already: Opera is now free. If you didn't, then you probably need a feed reader. :)

Anyway, I think this is a really positive move. One of the few reasons I haven't been using Opera much so far is because of those irritating banners; now they're gone so I think I'll be using Opera a lot more now!

I actually found some okes that liked the Google Adsense option because it found them documents relevant to the one they were currently viewing. I agree that it's a handy option to have, and I think one could make an extension (both for Opera and for Firefox) that does that. However, personally I thought the Adsense was rather ugly and took up too much of my precious screen real estate.

The only thing that still really bothers me about Opera is that it's not open source. Ok, I don't want to sound like an open source nazi here, although I actually am, but because of "certain" abusers (read whatever you want) I'm starting to really get somewhat paranoid about any project that isn't open and free (in all senses of the word, particularly "freedom").

I really hope that, especially now that Opera is "free" as in "no-cost", it will soon become "free" as in "freedom" (read: open source) too. Who knows. :)

But anyway, on the positive side Opera is cross-platform, which does a lot for me since I use two different operating systems regularly: Linux and Windows. Opera running on both is really cool.

Anyway, congrats to all the people at Opera and thanks for such a fantastic browser. I'm not going to get into the whole Firefox-versus-Opera debate since I think both have their advantages and their disadvantages and both are absolutely superb pieces of software. I'm also not going to decide (yet) which one will become my primary browser, because I like the feelings of variety and choice, and I'm really being spoiled right now! :)

Keep It Open!

Some people like to argue that software should be closed source, proprietary, and/or that you need to pay for licenses.

Let's think for a second. First Sun released the open version of StarOffice, called OpenOffice. So, which one became the biggest? OpenOffice. Red Hat released Fedora Core. Then, which one became the biggest? Fedora Core.

The only people that apparently still care about StarOffice and Red Hat are some businesses, which makes sense, really.

Novell started the OpenSuSE project to make SuSE more open. Now they're working on SuSE 10, which is all based upon work in the OpenSuSE community.

The open development model works. Tried & tested. Period.

Novell's own Linux is quite (make that very) expensive, but I like the concept of it because it's all aimed at the business environment. Rather have them running expensive Linux than have them running Windows. And they sell support with it, so again it makes sense.

In the meantime, Novell uses some of their profit to sponsor cool projects like OpenSuSE and Mono.

I think this is the kind of thing we're going to be seeing in the future. Home users will keep using open Linux distros while companies start using commercial ones. And both groups will benefit from each other and work together. Yay for the future!

Link Elements

I see this quite often:

<link rel="section" title="Products" href="/products">
<link rel="section" title="Services" href="/services">
...
<ul id="sections">
 <li><a href="/products">Products</a></li>
 <li><a href="/services">Services</a></li>
</ul>

Then I wonder why this shouldn't be good enough:

<ul id="sections">
 <li><a rel="section" href="/products">Products</a></li>
 <li><a rel="section" href="/services">Services</a></li>
</ul>

Remember that the rel attribute is perfectly valid on the a element as per the HTML 4.01 specification (and is therefore also valid in XHTML 1.x).

People have actually grown so used to using link elements for accessibility that relatively few people think about page size. I agree that, in most cases, it won't make that much of a difference, but still.

Also, UA support for the latter is a bit worrying. Although some browsers have an extra "navigation bar" exposing many of the link elements, they seem to like to ignore a elements with rel (and what about rev?) attributes.

I think the ideal would be this: If you're going to link to a resource inside the body anyway, don't add yet another link element. This should make it easier to maintain (or to write scripts to maintain) the navigation system in most cases while saving some bytes in the process and keeping your code cleaner and more minimalist. But, needless to say, implementations need to be fixed.

Although I've never used Lynx before, apparently it displays the link elements before the actual page. What's the use in seeing the same navigation twice?

Some might argue that one should rather put all the navigation in link elements and forget about putting anything in the body. Although I'm not going to argue for or against this (personally I would most likely lean towards against, but that's beside the point), this is not going to be practical soon because most common browsers don't have a built-in mechanism enabled by default to display these.

The best implementations? For a navigation bar in a GUI-browser, display both the applicable link elements and the a elements with a rel or rev attribute there. For text browsers that "force" the user to look at them before getting to the actual page, rather just display the applicable (what would otherwise be hidden) link elements there and forget about the a elements (they'll probably be displayed later anyway). But for other purposes where you call the navigation on-demand somehow, make sure to use everything.
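As a rough sketch of what "use everything" could mean for a UA or a script, the following Python collects navigation from both link elements and a elements that carry a rel or rev attribute. The NavCollector class and the sample page are hypothetical, and a real implementation would at least filter out things like stylesheet links.

```python
from html.parser import HTMLParser


class NavCollector(HTMLParser):
    """Collects navigation entries from link and a elements with rel/rev."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._open_anchor = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        relation = attrs.get("rel") or attrs.get("rev")
        if tag not in ("link", "a") or not relation or "href" not in attrs:
            return
        entry = {"tag": tag, "rel": relation, "href": attrs["href"],
                 "label": attrs.get("title", "")}
        self.found.append(entry)
        if tag == "a":
            self._open_anchor = entry

    def handle_data(self, data):
        # Use the anchor text as the label when no title attribute is set.
        if self._open_anchor is not None and not self._open_anchor["label"]:
            self._open_anchor["label"] = data.strip()

    def handle_endtag(self, tag):
        if tag == "a":
            self._open_anchor = None


page = """
<link rel="section" title="Products" href="/products">
<ul id="sections">
 <li><a rel="section" href="/services">Services</a></li>
 <li><a href="/contact">Contact</a></li>
</ul>
"""
collector = NavCollector()
collector.feed(page)
nav = [(e["tag"], e["href"], e["label"]) for e in collector.found]
```

Here the Products entry comes from a link element and the Services entry from an a element, while the plain Contact link is left alone because it carries no relationship.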

Languages in (X)HTML Documents

As many of you would most likely have seen already, there was an interesting post yesterday on The Web Standards Project:

There's a lot of misinformation about how, when and where to declare a language - or multiple languages - within HTML and XHTML documents. Fortunately, the GEO group at the W3C provides us with details as to how to do this. Here are some guidelines to help:

  • Always declare the default text processing language of the page, using the html tag, unless there are more than one primary languages.
  • Use the lang and/or xml:lang attributes around text to indicate any changes in language.
  • Do not use Content-Language to declare the default text processing language, and do not use language attributes to declare the primary language metadata.
  • Do not declare the language of a document in the body tag.
  • For HTML use the lang attribute only, for XHTML 1.0 served as text/html use the lang and xml:lang attributes, and for XHTML served as XML use the xml:lang attribute only.
  • If the text in attribute values and element content is in different languages, consider using a russian doll approach.
  • For documents with multiple primary languages, decide whether you want to declare a single text processing language in the html tag, or leave it undefined.

For those that don't speak specification, this basically means (well, as far as I can decipher it) that you need to set both the Content-Language and the lang attribute (and/or xml:lang as applicable) on the root element if you have a document that has a main language. The Content-Language in the HTTP response header would then be used for metadata purposes (for example, to help search engines index your page under a particular language and/or for site directories), while the attributes would be for the UAs (particularly aural UAs) to help them determine how to read out specific pieces of online text.
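The rule about which attribute goes with which kind of document is easy to mix up, so here is a small sketch of it. The flavour labels ("html", "xhtml-as-html", "xhtml-as-xml") and both function names are my own invention, and the xmlns attribute a real XHTML root element needs is left out to keep it short.

```python
def root_language_attributes(lang, flavour):
    # flavour is a hypothetical label: "html", "xhtml-as-html" or "xhtml-as-xml".
    if flavour == "html":
        return {"lang": lang}
    if flavour == "xhtml-as-html":
        return {"lang": lang, "xml:lang": lang}
    if flavour == "xhtml-as-xml":
        return {"xml:lang": lang}
    raise ValueError("unknown flavour: %r" % flavour)


def html_open_tag(lang, flavour):
    # Render the opening html tag with the language attributes only.
    attrs = root_language_attributes(lang, flavour)
    rendered = "".join(' %s="%s"' % item for item in attrs.items())
    return "<html%s>" % rendered


tag = html_open_tag("af", "xhtml-as-html")
```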

Also, if I build a CMS that automatically visits the resources I link to in order to retrieve the language/content type for that link (to possibly be used in attributes on outgoing links later) as well as to check if those links aren't broken, it only needs to do an HTTP HEAD request instead of an HTTP GET request (thus only retrieving HTTP Response Header info instead of the entire page) thereby saving bandwidth.
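A minimal sketch of such a link check with Python's urllib follows. The function names and the shape of the returned dictionary are assumptions of mine, and keep in mind that some servers answer HEAD badly or not at all, so a real checker would probably want a GET fallback.

```python
import urllib.error
import urllib.request


def summarise_headers(status, headers):
    # Pure helper: pick out what a CMS would store for an outgoing link.
    content_type = headers.get("Content-Type", "")
    return {
        "ok": 200 <= status < 400,
        "content_type": content_type.split(";")[0].strip() or None,
        "language": headers.get("Content-Language"),
    }


def check_link(url, timeout=10):
    # HEAD asks for the response headers only; no body is transferred.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return summarise_headers(response.status, response.headers)
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return summarise_headers(error.code, error.headers)
    except urllib.error.URLError:
        # DNS failure, refused connection, and so on: treat as broken.
        return summarise_headers(0, {})
```

Calling check_link on an outgoing link would then give the CMS a language and content type to put on the link, plus a flag for whether the link is broken.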

IRIs & Content Negotiation - Part 2

Ok, so why do we need IRIs in multiple languages? I already presented some thoughts in my post Multilingual URIs (which definitely should have been titled "Multilingual IRIs" now that I think about it) a while ago; at the end of the day it comes down to these issues:

How many times do you type a web address into your UA every day? Of course, you probably use bookmarks/links/search engines most of the time, but if you're like me you often like to remember web addresses and type them in directly.

For example, you might be walking down the street and somebody might ask you for the IRI of a specific website.

"Just Google," many might say (and rightfully so), but as any regular searcher probably knows, although Google is very good it doesn't always find you exactly what you want.

So, how do you store an IRI in your head and retrieve it again? The human brain is designed to remember things by forming patterns. Things like normal words, logical thoughts, etc. play an important (largely subconscious) role in this.

For example, let's say I want to find the homepage of Ubuntu Linux. Mmm, let's see... It's a distro of Linux and it's a non-profit project... Therefore the IRI is most likely ubuntulinux.org.

Yep, I did that just out of my head.

Now, let's say I want to go to their community page. ubuntulinux.org/community?

Why should I care about this? Why not just go to the homepage and click the "Community" link? Well, if I'm a regular Ubuntu user, why should I? Isn't it faster to go to the wiki directly?

Also, simple, standard and logical IRIs look so much cleaner in a browser address bar than:

http://ubuntulinux.org/index.php?p=49&auth=95225&golf=not_today&tennis=tomorrow&sex=yes-please&coffee=decaf

Now, think about the international audience. Assuming you don't read Afrikaans, would this make sense to you? Would you be able to remember it?

http://ubuntulinux.org/ondersteuning

Probably not. Now think if you only read Afrikaans and couldn't understand English, would /support make sense to you?

That's why you want the user to be able to go to either /support or /ondersteuning and get to the same info, in their own language.

Now be careful: One might be tempted to use the IRI to decide on which language to serve. In other words, /support gives you English and /ondersteuning would give you Afrikaans. But what if an Englishman gave me the /support address and I would rather get it in Afrikaans?

Just use the IRI to decide on which information, not on which language. And use standard content negotiation for the rest.
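In code, the split between "which information" and "which language" could look roughly like this. The SLUGS and PAGES tables are hypothetical stand-ins for a real router and content store, and the Accept-Language parsing is simplified (it handles q-values and a primary-tag fallback, nothing more).

```python
# Hypothetical routing table: several IRIs, one piece of information.
SLUGS = {"/support": "support", "/ondersteuning": "support"}

# Hypothetical content store keyed by (resource, language).
PAGES = {
    ("support", "en"): "Support",
    ("support", "af"): "Ondersteuning",
}


def parse_accept_language(header):
    # Returns language tags ordered by descending q-value.
    weighted = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        weighted.append((quality, tag.strip().lower()))
    # sorted() is stable, so equal q-values keep their header order.
    ordered = sorted(weighted, key=lambda item: item[0], reverse=True)
    return [tag for quality, tag in ordered if quality > 0]


def serve(path, accept_language, default="en"):
    resource = SLUGS[path]  # the IRI picks the information
    for tag in parse_accept_language(accept_language):
        primary = tag.split("-")[0]  # "af-ZA" falls back to "af"
        if (resource, primary) in PAGES:  # the header picks the language
            return PAGES[(resource, primary)]
    return PAGES[(resource, default)]
```

So a reader whose UA prefers Afrikaans gets the Afrikaans page at /support, and an English reader gets the English page at /ondersteuning.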

However, there are still some cases where you might want to override content negotiation and get particular information in a particular language. Of course this issue is going to become quite an interesting one when looking into future technologies, but I'll have to make another post sometime about that. For now, we'll just stick to what's available and practical currently.

Since it's pesky to have to go into your UA settings to change which header is being sent, the best (only?) option currently is to incorporate your language choice somehow into your IRI. There are now two options left (taking into account that I already eliminated the others in previous posts):

  1. http://example.com/about?lang=af
  2. http://example.com/about.af

Now we start to really get into the nitty-gritty of things.

Querystrings, as in the first option, were traditionally used for, well, queries. The kind of action where you want to retrieve information typically based on search criteria from the server without making any permanent changes on the server. That's also why it is called HTTP GET, since, well, you get stuff.

HTTP POST, in comparison, is for when you really want to upload info to the server. This will most likely make permanent changes on the server; that's why your browser warns you about re-sending the information when you reload a page you POSTed to. Actually, because of that default behaviour, the POST method is typically used any time you issue a command that causes anything to change on the server, whether you're uploading or not.

The second option, http://example.com/about.af, is basically seen as a unique document, different from http://example.com/about or http://example.com/about.en, and bears no relationship to either of them (we might think the relationship is obvious, but machines do not). Of course, if it's an (X)HTML document, one could use the rel attribute, and one could also possibly use RDF, to specify these relationships.

However, since you're not really doing a query in this case, it is quite obvious (to me) that the latter is the better choice.
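A sketch of how a server might treat the suffix as an override on top of content negotiation is given below. The AVAILABLE set and both function names are assumptions for the example; the only real requirement is that unknown suffixes (like .pdf) are left alone.

```python
AVAILABLE = {"en", "af"}  # hypothetical set of languages the site offers


def split_language_suffix(path, available=AVAILABLE):
    # "/about.af" -> ("/about", "af"); "/about" -> ("/about", None)
    base, dot, suffix = path.rpartition(".")
    if dot and "/" not in suffix and suffix in available:
        return base, suffix
    return path, None


def choose_language(path, negotiated):
    # An explicit suffix overrides whatever content negotiation picked.
    base, forced = split_language_suffix(path)
    return base, forced or negotiated
```

With this, /about keeps following the UA's language preference, while /about.af always gives you Afrikaans no matter what header was sent.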

Also, think from a searchbot's perspective. Searchbots should not really look at anything beyond the ?. In other words, let's say you have the following three links:

  1. http://example.com/potatoes
  2. http://example.com/potatoes?sort_table=title_desc
  3. http://example.com/potatoes?sort_table=size_asc

Search engines should only index the first page, and see the latter two links as synonyms for the first. This is not what Google is doing, but what some of the others are doing.
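
As a minimal sketch of that behaviour (my own illustration, not how any actual search engine works), a bot could reduce every link to an index key by dropping everything from the ? onwards. The name `index_key` is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def index_key(iri):
    """Drop the query string and fragment so sorted views collapse to one page."""
    parts = urlsplit(iri)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


links = [
    "http://example.com/potatoes",
    "http://example.com/potatoes?sort_table=title_desc",
    "http://example.com/potatoes?sort_table=size_asc",
]

# All three links reduce to the same key, so the page is indexed only once.
unique_pages = sorted({index_key(link) for link in links})
```

Note that under this rule http://example.com/about?lang=af would collapse into http://example.com/about as well, while http://example.com/about.af would stay a separate entry.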

Why? Well, again think about both the semantic differences and the practical implications. Do you want to index based on specific queries? No, you should index according to directories of links (not as in "folders" necessarily but think navigation). Also, do you want one search engine to index another's result pages? Or even its own (if it was idiots who wrote it)? Do you want to index the same content multiple times because it's being sorted in a different way (see above example)? Of course not!

However, for now the searchbot should index all the various language versions of the same document for keyword gathering purposes in all the available languages. One day, searchbots might become so intelligent that they will automatically "understand" your query and search for synonyms of your keywords, even across languages. Then this might become unnecessary, but for now it makes sense.

This has two problems though:

  1. Search engines need to find links to all the available languages.
  2. Search engines will most likely point you towards the language-specific IRI of a document instead of the "content negotiation" neutral one, because of the way they index and "understand" individual IRIs.

The first problem can easily be overcome, especially in (X)HTML documents, but the second problem is slightly more complex.

I frequently search in English because the search engine doesn't "understand" Afrikaans (yet). If I searched in Afrikaans, I would only be searching through Afrikaans documents, while searching in English typically gives me so much more to search through. Of course, there are exceptions, but this is what I do normally.

However, when I land on a document that's also available in Afrikaans while still searching in English, I want content negotiation to take over. Now you should see my problem.

The core of this problem really sits inside HTTP. Although it provides mechanisms for content negotiation, it doesn't provide any direct method for grouping documents in various languages/formats together (of course, much of what I state here about languages also counts for format). The only way to overcome this is by using metadata. In (X)HTML, it can be embedded, but in other formats we'll most likely have to rely on an external metadata format such as RDF, etc to give the search engines this kind of information.
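
For the (X)HTML case, the embedded metadata could be as simple as one link element per translation, using the rel and hreflang attributes. Here is a small Python sketch that generates them, assuming the .af / .en suffix scheme from earlier; the function name `alternate_links` is just something I made up.

```python
from xml.sax.saxutils import quoteattr


def alternate_links(neutral_iri, languages):
    """Build one <link rel="alternate"> element per available translation."""
    lines = []
    for lang in languages:
        # Assumes the language-specific IRI is the neutral IRI plus ".<lang>".
        href = f"{neutral_iri}.{lang}"
        lines.append(
            f'<link rel="alternate" hreflang={quoteattr(lang)} href={quoteattr(href)} />'
        )
    return "\n".join(lines)


print(alternate_links("http://example.com/about", ["af", "en"]))
```

This only solves the problem for formats that can carry such elements; for anything else the relationships would still have to live in an external metadata file.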

Of course, HTTP could also be extended to provide additional info in the HTTP Response header. For example, available-languages with a list of all the language codes of the available translations of the current document. And a language-neutral IRI for linking to when a language-specific IRI is requested. And the list goes on.
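
Just to show what I have in mind, here is a sketch of the headers a server could send when a language-specific IRI is requested. Content-Language is a real HTTP header. Available-Languages and Language-Neutral-Location are NOT real; they are hypothetical names for the extensions I'm wishing for above, and the function `variant_headers` is invented for this example.

```python
def variant_headers(neutral_iri, served_lang, available):
    """Headers for a response to a language-specific IRI (partly hypothetical)."""
    return {
        # Real HTTP header: the language of the document actually served.
        "Content-Language": served_lang,
        # Hypothetical header: all translations of this document.
        "Available-Languages": ", ".join(available),
        # Hypothetical header: the IRI to link to so content negotiation applies.
        "Language-Neutral-Location": neutral_iri,
    }


headers = variant_headers("http://example.com/about", "af", ["af", "en"])
for name, value in headers.items():
    print(f"{name}: {value}")
```

A searchbot that understood these headers could index http://example.com/about.af for its keywords while still sending people to http://example.com/about.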

Of course, similar things should also be available for formats.

And formats... If you were a search engine, would you want to index exactly the same info in both PDF and HTML form? I don't think so. You want to index one of them and just be aware of the others so that, when a user does a type-specific search, you know about all the available versions. Once search engines start "understanding" search queries, the same might become applicable when it comes to languages.

One last issue: What do you want to see in your browser's address bar? If I view the about page of example.com in Afrikaans (shown in my browser's address bar as http://example.com/aangaande) and I copy-and-paste the address to you, my English friend, over instant messaging (for example), would you not like to see http://example.com/about in your browser's address bar when you're viewing the English version? This could be achieved currently by a redirect, but that creates extra lag. Content-Location? Not like that's implemented... And where in the HTTP response header will you then send the language/type-specific IRI? And what about separating your language-specific IRI, your type-specific IRI and your language- & type-specific IRI? All for linking, maybe, or other reasons? Ok, following standard conventions, a fairly technical-minded person with enough experience would be able to figure those out by him/herself... but what about computers and maybe even newbies? Just thoughts.

This issue is rather complex, and some might say I'm going overboard, but (although I believe it's achievable using current, non-perfect technologies) we're probably quite a few years away from having computers do this kind of thinking for us and being able to browse the web truly "ideally". But I freely blog away, because thinking about these things is cool.

ASP.NET 2.0 - Improving!

As reported by the Web Standards Project today (sorry, no permalink, they stuffed that up it seems) ASP.NET will now default to XHTML 1.0 Transitional instead of XHTML 1.1. Although some people might disagree, I think this is definitely a step in the right direction.

ASP.NET still doesn't make any real attempt to send XHTML under its own MIME type, and XHTML 1.0 Transitional can at least be sent as HTML while keeping the W3C happy. I might not be totally happy with that, but anyway, that's irrelevant.

The reason I'm quite excited about this move is that this shows that Microsoft is indeed looking at standards and slowly moving towards fairly good standards support. It's far from perfect, but we'll just have to take it one step at a time.

For now, I want to give Microsoft the "thumbs up" and hope they will continue in this trend with the ASP.NET project (and hopefully other projects too).

You can find more info on Microsoft's site.

Copyright © 2004-2008 Charl van Niekerk. All articles are released under the Creative Commons Attribution 2.5 South Africa licence, unless where otherwise stated.