Charl van Niekerk » Blog

PostgreSQL

Oh my goodness! You won't believe what I've been up to... I actually tried PostgreSQL today. Faruk is actually the one to blame for all of this, because if it wasn't for his fet I would never have tried it (well, at least not yet).

Check my webcam for my reaction to it after I got my first simple database with one one-to-many relationship going:

"What on earth?" "This can't be possible..." "This is too good to be true! Just what I've been waiting for!" "Thumbs up to Postgre, literally!"

Yep, that summarises my feelings pretty well.

My first impression of Postgre is that, well, it totally rox. You set up all relationships beforehand in the database, so (as far as I can tell so far) no more repeating the same joins in every SQL statement. Especially when you start to work with large systems you later get tons of joins in one statement - I had 5 the other day, and that was only for a fairly simplistic gallery! It just becomes too much of a hassle...
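For the curious, here is a minimal sketch of what "setting up the relationship in the database" can look like. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server (an assumption; this is not PostgreSQL itself), and the gallery and photo tables and the photo_with_gallery view are made-up names. Note that the join does not really disappear: the foreign key makes the database enforce the relationship, and a view lets you write the join once instead of everywhere.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked to

conn.executescript("""
    CREATE TABLE gallery (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One gallery has many photos: the relationship lives in the schema.
    CREATE TABLE photo (
        id         INTEGER PRIMARY KEY,
        gallery_id INTEGER NOT NULL REFERENCES gallery(id),
        title      TEXT NOT NULL
    );
    -- The join is written once here, not in every query.
    CREATE VIEW photo_with_gallery AS
        SELECT photo.title AS title, gallery.name AS gallery
        FROM photo JOIN gallery ON gallery.id = photo.gallery_id;
""")

conn.execute("INSERT INTO gallery (id, name) VALUES (1, 'Holiday')")
conn.execute("INSERT INTO photo (gallery_id, title) VALUES (1, 'Beach')")


def insert_orphan_photo(connection):
    """Try to add a photo for a gallery that does not exist."""
    try:
        connection.execute(
            "INSERT INTO photo (gallery_id, title) VALUES (99, 'Orphan')"
        )
    except sqlite3.IntegrityError:
        return False  # the database refused the broken relationship
    return True


rows = conn.execute("SELECT title, gallery FROM photo_with_gallery").fetchall()
```

Querying the view reads like a single-table select, and the database itself rejects a photo that points at a missing gallery.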

Postgre is just so much more powerful, so much cooler, and so much faster to get stuff done with. Well, I've actually only tried a few simple things so far and I'm already impressed; apparently the real strength of Postgre lies in more complex tasks, so I'm sure there will be much more fun for me to come.

Postgre lets you set up all kinds of other interesting things like Triggers, Rules, Privileges and that kind of stuff. The upcoming MySQL 5 promises some of these things, but the stable version has yet to be released.

Mind you, this is only Postgre 7.3.9. I can't wait to start playing with 8.

New Blog

I just launched a new blog on Blogger. I'm actually going to launch another blog soon too, which is not going to be on Blogger, but I still need to add a commenting system. Patience, dear Watson, patience...

Anyway, this new blog is going to be 100% in Afrikaans. I originally started this one more than a year ago in English, just because I knew most of my readers would not "speak the lingo" and therefore I would have a much larger reader base.

It doesn't make sense to me to keep a blog that nobody reads anyway; I know some people do this just for the hell of it, but personally I need a little more motivation.

It seems like, a year down the line, I'm getting a bit more popular inside of the international Afrikaans community (if there's any community truly international, it's this one), and since Afrikaans is just one super-cool language I couldn't resist. Therefore, I'm very proud to say that Pure Charligheid is now officially open for, ehm, blogging!

For those who "don't know the lingo", "Pure Charligheid" translates to something along the lines of "Pure Charl-ness"; I'm actually playing with words because "Charligheid" is pronounced in Afrikaans much like "saligheid", which is a word slightly difficult for me to translate, but amounts to something along the lines of "serenity" and "pure happiness". Maybe that will make some sense, although I doubt it.

Also, just to show off my creative abilities, the tagline "Die joernaal van 'n Charl" translates to "The journal of a Charl". It doesn't sound cool in English, because it doesn't rhyme like it does in Afrikaans. If you're Dutch, it won't rhyme for you either, because the Dutch pronunciation of my name is different from the Afrikaans one. Not that I mind if you pronounce my name like that, since I'm used to it. At least you're pronouncing it better than most English speakers do (who apparently think my name is similar to "Charlie" or "Charles" or something, haha!) :P

Of course, this blog is not going to close down or anything, I'm still going to be posting here when I have time. And naturally it's still going to be in English.

The other blog still needs some serious work on the template, just like this one right now, but I'm in the process of doing a serious redesign so please have some patience while I get it all sorted out with my increasingly confusing schedule. Great things in the pipeline!

Photos

Warning: For those that aren't specifically interested in wasting their time, do not go further than this line!

Give a Charl a webcam, and this is what you get:

"WTF is going on here?" "Huh?" "How weird, this thing is taking photos of me!" "I can't believe this shit!" "Mmm, is there something useful I can do with this camera? Nope, probably not." "Cheers!" "Thirsty..." "What a waste of time!" "I can't believe I'm doing this." "Ok, the fun and games are over. Back to the madhouse..."

Yeah, just another day in the life of Charl. I know many will say I look drop-dead gorgeous in these, but thankfully I look much better in real life!

If you're a girl: I'm single, btw. Just saying so that you know...

Stealth

This wasn't what I was planning on posting about, but anyway, here goes.

Yesterday was Women's Day, a national holiday in South Africa (I don't know about other places in the world). Usually I hate public holidays because they just come and mess up one's schedule, but I have to say, this one was pretty welcome.

Since I had some extra time, I thought I might as well go see a movie since I was planning on doing that anyway but didn't get around to it. So I went to see Stealth.

It's a bit of an unusual movie; there wasn't much of a story in it but the action and special effects made up for it. I can't remember when last I saw a movie with so many things happening constantly. I think it was better than War of the Worlds but not nearly as good as Star Wars III. Of course, all Star Wars episodes are fantastic, but the last was certainly the best by far. The special effects were just so much better. One can see how much the technology has progressed after all these years.

I mean, just check out the lightsabers. In episodes 4-6 they were very unnatural (typical of editing in the old days); now they zoom in right to the tip of the saber and it's perfect.

Of course, you won't appreciate Star Wars fully unless you watch the entire series. If you know what happens in episodes 4-6 and also in 1 & 2, you really enjoy 3 because that fills the gap between the others perfectly.

I would still like to watch all of them all over again, this time in the correct order (starting at 1 and ending at 6) but I fear that, after 3, the special effects of the older movies will be disappointing.

But anyway, we're getting completely off track now. Back to Stealth.

One thing which really made me happy was that they chose to use Linux (well, actually I don't know which kind of unix, but one of them) in the movie. More specifically, it was at that scene where Dr. Orbit (ha ha, genius people are funny) was "fixing" Eddie's brain. I didn't see much since they only showed his laptop screen for a second or two, but I can remember seeing something along the lines of su'ing in as root and then some other stuff at the Linux (or whatever) console. Cool! I wish I could have paused to take another look, but unluckily this was in the movie theatre so I couldn't.

Good choice by the technical crew. Well done; you made my day!

In a real-world situation, they would choose Linux/BSD because it's highly scalable, relatively secure and stable (all three things being important when it comes to high-tech fully-automated military weapons). You can't have the plane crash as soon as the operating system crashes. But since this was just a movie, they just used it because Linux is cool. :)

IRIs & Content Negotiation

Sorry, I wanted to publish this post last week, but got caught up in looking at pictures of pretty girls. I actually got lucky, because (believe it or not) these actually had clothes on!

Anyway, here is a follow-up on my last post. There wasn't really supposed to be a follow-up, but now there is. So there you have it then.

Some people ask, "Why are these bad?"

Well, in order to explain why they are bad, we'll have to go right to the fundamentals. A logical approach is always beneficial, because that makes everything a lot easier for intelligent people to learn/use. Stupid people will have problems, since they typically have problems when it comes to thinking logically; however, you must remember that no matter what we do, they'll have problems anyway.

Make something idiot proof, and somebody will invent a better idiot. Common sense is not so common.

Ok, so here we go. First of all, you have information. Since we have a huge amount of it, we need to organise it so that we can access it.

In order to identify any particular piece of information, we need a unique identifier like an IRI.

Now you have two problems though:

  1. Information can often be represented in various different formats.
  2. Different groups of people often speak different human languages.

Ok, let's first cover problem #1. Different formats serve different purposes. Three of the main reasons we have different formats are as follows:

  1. Backwards Compatibility
  2. Proprietary Formats
  3. Multiple Purposes

Actually, #1 and #2 should be grouped together as one item, namely "Compatibility".

Backwards compatibility is important for obvious reasons, although I think one shouldn't overdo it. Sometimes people will just have to upgrade; I think one should use the not-so-common common sense here (even if your only motivation is to not be classified as stupid).

There is absolutely no excuse anymore for proprietary formats. One should standardise everything as much as possible; this is the whole point behind web standards (just to mention one example; there are many different forms of standards). There are various advantages to going the standards route; why re-invent the wheel? However, competition should exist in the various implementations, just to give the various software vendors (both open source and proprietary) a reason to work hard during the days (and maybe even nights) in order to create better products.

Multiple purposes... You can always try to take an "all-in-one" approach, but sometimes you then end up with one format which isn't really well suited to any of the various particular uses. Don't entertain multiple formats without good motivation; rather standardise on one solid format which will serve a particular set of purposes; but again, keep it practical and use your common sense.

Ok, now to get back to languages. There are many different languages around the world (but we all know this, don't we? - oh wait, there might be Americans reading this blog too). Something that people in mainly English-speaking countries often forget (or like to forget) is that language is not merely a method of communication; it's typically an integral part of the culture and heritage of most of the native speakers of that respective language.

Having many different languages from all over the world is super-cool since it's a rich part of cultural diversity. As far as I'm concerned, having to wake up in a mono-language society would be extremely dull and boring. Afrikaans, in my particular case, is the language I was born with. It's part of my culture, I am very proud of it, and would like to have it stay with me for the rest of my life. And I will use and advocate it freely within my own society and all other interested parties.

So yes, English has become the universal cross-culture communication medium, but the entire world should not just "forget about everything else" and start speaking English as their only language.

Now that I've stressed the importance of multilingualism, let's go back to the accessing of information.

The directory-based navigation structure is really popular. These days, many systems are going in the direction of tag-based navigation, which actually makes sense in many cases where a particular item can belong to multiple categories. In such cases, the categories can't be directories because an item can't be contained in more than one directory. Well, actually it can, but then you get duplication. Of course, this does not mean duplication on the server; there are a number of ways in which you can overcome that. However, it is fundamentally evil to have a particular piece of information available at multiple locations unless there is a really good reason.

If you struggle to understand why, just think about this for a moment. Why should search engines index the same thing more than once? Why waste storage space? The list goes on.

Not that mirroring is bad. We probably still need a way to deal with that; only the main site needs to be indexed by search engines, so that you can automatically be redirected to your nearest mirror from there (think php.net).

Ok, getting back to directory-based navigation. Here, KISS rules. Let the directories be completely consistent with the actual navigation menus. Make it possible for the UA to "guess" which page is above the current one just by looking at the directory structure (this is already being done by Opera and Firefox (with extensions)).
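To make the "guessing" idea a bit more concrete, here is a rough sketch of how a UA might work out the page one level up from nothing but the URL. The function name parent_url is made up for illustration, and real browsers may well do this differently.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def parent_url(url):
    """Return the URL one directory level above `url`, or None at the root."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path:
        return None  # already at the site root, nothing above it
    parent_path = posixpath.dirname(path)
    if not parent_path.endswith("/"):
        parent_path += "/"
    # Query string and fragment are dropped: they belong to the current page.
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))
```

This only works if the directories really do mirror the navigation, which is exactly the point of keeping the two consistent.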

A particular piece of information can be available in multiple languages and in multiple formats. It doesn't matter in which format and language you get it, it's still the same information. Notice the semantics here. As I said earlier, any particular piece of information should have a unique address.

As I also said previously, different people need/want different languages & formats. Why should they be forced to manually select what they want? Let the computer do it. Yes, I'm talking about content negotiation now.

Remember: it's still the same information, just in a different format/language. And if everything is as it should be, it doesn't matter which format/language you get it in, as long as that format/language is good for your use and/or your computer's use.
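As a rough illustration of letting the computer do the choosing, the sketch below picks a language from an Accept-Language request header. It is a simplified take on the idea (no wildcard handling, and only a crude fallback from "en-GB" to "en"), and the function names are my own, not taken from any particular server.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (language, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        q = 1.0  # a language without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((lang.strip().lower(), q))
    # sorted() is stable, so languages with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_language(header, available, default):
    """Pick the best available language for the request, else `default`."""
    for lang, q in parse_accept_language(header):
        if q <= 0:
            continue
        if lang in available:
            return lang
        primary = lang.split("-")[0]  # e.g. "en-gb" falls back to "en"
        if primary in available:
            return primary
    return default
```

For example, a header of "af, en;q=0.8" on a site that has both Afrikaans and English picks Afrikaans, while a reader who only asks for French gets whatever the site's default is. The same approach could be applied to formats via the Accept header.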

Now, separate the two concepts. First, you have information. You could have that organised in directories. Then you have multiple languages/formats for each piece of information.

Ok, now sit back for a second and think about it. Everything should be coming into place now. Let's look back at the original question: "Why are these bad?"

Well, would you have these?:

I certainly hope not!

Then essentially, what about this even:

Yuck!

I hope now you understand where I'm coming from. Therefore, all of these are not so good (according to me). The only thing you, well at least "I", want is this:

http://example.com/about

Well, not entirely. Sometimes you still might want to link to a specific type/language; the next article will be about that and also more on getting the actual IRIs multilingual.

Multilingual URIs

One of the conventions used quite often for URIs in multilingual information spaces (the spec term for "websites"; oh wow, don't I sound intelligent right now (sarcasm intended)) where a page is translated into more than one language is what I like to call the Multiviews convention, because one of the most common implementations of it is Apache Multiviews.

In other words, let's say I've got an about page available both in English and in Afrikaans. I might use about.en.html and about.af.html for the different versions respectively.

This naming convention is quite obviously not bad. In a normal alphabetically-sorted directory listing, these would appear together making management a bit easier. Writing implementations to use this is also relatively simple.
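Just to show how little code the convention needs, here is a small sketch that groups a directory listing by base name. The helper name group_by_base and the sample file names beyond about.en.html and about.af.html are made up for illustration.

```python
from collections import defaultdict


def group_by_base(filenames):
    """Group files such as about.en.html and about.af.html under 'about'."""
    groups = defaultdict(list)
    for name in sorted(filenames):  # alphabetical order keeps variants together
        base = name.split(".", 1)[0]  # everything before the first dot
        groups[base].append(name)
    return dict(groups)
```

Given about.en.html, about.af.html and contact.en.html, this yields one group for about with both translations and one for contact with a single file.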

There is, however, one small problem: You still have a "base" name (in this case about) which must be in some language or the other. Well, actually it doesn't need to be in a specific language, it can be anything like fjdsk356 for example, as long as it's unique in that directory. This is probably what the inventors of this convention thought; but as anybody knows this is far from clean and human readable. Users will also not have the added benefit of search engines placing extra weight on the keywords in the URI (Google seems to do this, for those that didn't know).

There are of course also other nice methods. As has been suggested in the past, URIs should be treated just like the text in the documents: translated into the various languages.

To use the previous example, you might have about.html for the English version and aangaande.html for the Afrikaans version ("aangaande" is Afrikaans for "about", for those that didn't catch the obvious).

However, now you run into a lot more trouble. The biggest issue here is management, and your implementations also need to be a bit more competent. The only way to keep order in this kind of chaos is when you have a database or something to link the different translations together and their various names in the different languages. A fancy CMS could do that, but things are complicated a bit when you have multiple directories also. Like, for example, you need to know that /pillows/soft/furry links to the same document as /kussings/sagte/wollerig.
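The kind of bookkeeping this needs can be sketched as a simple lookup table. This is only an illustration of the idea, not how any particular CMS does it; the TRANSLATIONS list and the function names are hypothetical, and a real system would presumably keep this in a database.

```python
# Each entry is one document, with its path in every language it exists in.
TRANSLATIONS = [
    {"en": "/pillows/soft/furry", "af": "/kussings/sagte/wollerig"},
    {"en": "/about", "af": "/aangaande"},
]


def build_index(translations):
    """Map every known path to (document number, language)."""
    index = {}
    for doc_id, paths in enumerate(translations):
        for lang, path in paths.items():
            index[path] = (doc_id, lang)
    return index


def translate_path(path, target_lang, translations=TRANSLATIONS):
    """Return the path of the same document in `target_lang`, or None."""
    index = build_index(translations)
    if path not in index:
        return None
    doc_id, _ = index[path]
    return translations[doc_id].get(target_lang)
```

Every path has to be listed in full here, which hints at why things get complicated once whole directory trees are translated.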

Another big problem is when you try to do content negotiation. Let's say I request http://example.com/kussings/sagte/wollerig but with English as my preferred language according to my Accept-Language request header. Am I going to be redirected to http://example.com/pillows/soft/furry? Or is Content-Location going to be used? Well, with current common UA behaviour, the latter won't work as one would like in this case. The former will waste some bandwidth but, more importantly, add some latency.
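The two options can be put side by side in a small sketch. It only builds the status code and headers a server might send; the PATHS table, the "furry-pillows" document id and the choice of a 302 status are assumptions made for illustration.

```python
# path -> (document id, language of that path); hypothetical data
PATHS = {
    "/pillows/soft/furry": ("furry-pillows", "en"),
    "/kussings/sagte/wollerig": ("furry-pillows", "af"),
}


def path_for(doc_id, lang):
    """Find the path of a document in a given language, if there is one."""
    for path, (other_id, other_lang) in PATHS.items():
        if other_id == doc_id and other_lang == lang:
            return path
    return None


def respond(requested_path, preferred_lang, use_redirect):
    """Return (status, headers) for a request, using one of the two options."""
    doc_id, lang = PATHS[requested_path]
    target = path_for(doc_id, preferred_lang)
    if target is None or lang == preferred_lang:
        # Nothing better to offer: serve what was asked for.
        return 200, {"Content-Language": lang, "Vary": "Accept-Language"}
    if use_redirect:
        # Option 1: an extra round trip, but the address bar ends up correct.
        return 302, {"Location": target, "Vary": "Accept-Language"}
    # Option 2: serve the preferred translation right away and only
    # mention its own address in a header.
    return 200, {
        "Content-Language": preferred_lang,
        "Content-Location": target,
        "Vary": "Accept-Language",
    }
```

With the redirect, the reader pays for a second request; with Content-Location, the English text arrives under the Afrikaans address unless the UA acts on the header.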

Ok, then there's one last issue I want to throw at you. Let's say I have a site aimed at campers. As two of my subjects, I might be talking about the problems the slope of a steep mountain can cause when putting up a tent. I might also talk about using special pillow covers in order to keep the insects away from your head while you sleep. (I don't know if this is an actual option since I don't sleep in tents much myself and I'm not too clued up about the latest in camping gear, but this is only an example.) Ok, so I might have the following two URIs: http://example.com/pillowcovers and http://example.com/slope. This is fine for an English-only site, but now I want to translate the site to Afrikaans. Now I want to translate the pillowcovers page, and for the URI I want to use slope because slope is the Afrikaans for "pillow covers".

Oh but wait, I already used that... Oh crap!

Ok, this isn't the biggest of issues. In most cases one can work around it by being a little more "creative" when translating, and it is fairly unlikely that you would encounter many such cases. I'm just writing down some of the issues I encountered while trying to figure out a solid URI structure, which seems to be one of the toughest things to do on a site. Wait, hasn't this been said before? ;)
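If you do go the translated-URI route, a clash like the one with slope is at least easy to detect up front. Here is a small sketch; the find_slug_collisions helper and the page ids are made up, and "helling" as the Afrikaans name for the slope page is my own assumption for the sake of the example.

```python
from collections import defaultdict


def find_slug_collisions(pages):
    """Return slugs that are claimed by more than one page.

    `pages` maps a page id to a dict of {language: slug}.
    """
    users = defaultdict(set)
    for page_id, slugs in pages.items():
        for lang, slug in slugs.items():
            users[slug].add((page_id, lang))
    return {
        slug: sorted(claims)
        for slug, claims in users.items()
        if len({page_id for page_id, _ in claims}) > 1
    }


pages = {
    "pillowcovers": {"en": "pillowcovers", "af": "slope"},
    "slope": {"en": "slope", "af": "helling"},
}
collisions = find_slug_collisions(pages)
```

Here collisions reports that slope is wanted by both the Afrikaans pillow covers page and the English slope page.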

Update: Of course, there is a third option. What about http://example.com/en/about and http://example.com/af/about? As far as I'm concerned, this just sucks. I don't know why; it just looks ugly. I'd rather take Multiviews then.

Multiviews

Although I think Apache Multiviews is quite cool, I have one issue.

For example, let's say I have a page, about, in the root of example.com, which is available in two languages namely Afrikaans (af) and English (en) in two formats, HTML (.html) and PDF (.pdf).

Now I would like to request the page; theoretically, I should have four possibilities when writing the URI:

  1. Request the page in a specific type and language (e.g. http://example.com/about.af.html or http://example.com/about.html.af)
  2. Request the page in a specific type and leave the language up to content negotiation. (e.g. http://example.com/about.html)
  3. Request the page in a specific language and leave the type up to content negotiation. (e.g. http://example.com/about.af)
  4. Request the page and leave both the type and the language up to content negotiation. (http://example.com/about)

I could name these files on the server as follows:

If I do this, possibilities 1 (with about.af.html, not about.html.af), 3 & 4 will work, but 2 won't. Instead, I get a nice 404.

I could also name the files like this:

If I do this, possibilities 1 (with about.html.af, not about.af.html now), 2 & 4 will work, but 3 won't.

Of course one could try silly things like duplicating all files, but otherwise I guess it's time for some scripting...
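As a starting point for that scripting, here is a rough sketch of a lookup that ignores the order of the extensions, so that all four possibilities find their files. It is not how Apache does it; the LANGUAGES and TYPES sets and the function names are assumptions for this example, and choosing between several matches would still be left to content negotiation.

```python
LANGUAGES = {"af", "en"}   # language extensions the site uses (assumption)
TYPES = {"html", "pdf"}    # type extensions the site uses (assumption)


def split_name(name):
    """Split 'about.af.html' or 'about.html.af' into (base, language, type)."""
    base, *extensions = name.split(".")
    lang = next((ext for ext in extensions if ext in LANGUAGES), None)
    ftype = next((ext for ext in extensions if ext in TYPES), None)
    return base, lang, ftype


def candidates(request, files):
    """Return the files that could satisfy `request`, in any extension order."""
    base, lang, ftype = split_name(request)
    matches = []
    for name in files:
        file_base, file_lang, file_type = split_name(name)
        if file_base != base:
            continue
        if lang is not None and file_lang != lang:
            continue  # a specific language was requested
        if ftype is not None and file_type != ftype:
            continue  # a specific type was requested
        matches.append(name)
    return sorted(matches)
```

With the four files about.af.html, about.en.html, about.af.pdf and about.en.pdf on disk, a request for about.html matches both HTML files, about.af matches both Afrikaans files, and about.html.af matches about.af.html.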

Copyright © 2004-2008 Charl van Niekerk. All articles are released under the Creative Commons Attribution 2.5 South Africa licence, unless where otherwise stated.