Google's Country Search Analyzed
I reported earlier that Google's country-based search uses the ccTLD to determine which country a particular site is relevant to. However, after doing some more in-depth research it seems like it actually determines this by the IP address of the server the domain is hosted on.
In other words, if a server is physically hosted in South Africa all domains on that server will be classified as South African irrespective of the ccTLD.
This seems wrong to me for two reasons:
- A website relevant to South Africa can be hosted overseas in another country such as America.
- A website hosted in South Africa might not be specifically relevant to South Africa at all; it could be aimed mainly at international audiences and provide international content, or it could be more relevant to another country.
The first point is far from unlikely. Because of the unduly high bandwidth charges in South Africa, hosting locally is ridiculously expensive, so many people host overseas. Take Host4Africa, for example: they host their servers in the USA because of the lower bandwidth costs. If you compare their hosting charges with typical local hosting in South Africa, you can see that this isn't a load of baloney.
The second point is also far from invalid. Because the state of Internet connectivity in the rest of Africa is typically far worse than in South Africa, it is far from unlikely that a company in, let's say Kenya, would much rather host in South Africa than in its own country. That doesn't give the site any relevancy to South Africa at all, it's just physically hosted in South Africa.
This could be better for them, since international bandwidth to Kenya is most likely far worse than international bandwidth to South Africa. They might also prefer to host in South Africa rather than in America because most of their clients could be located in South Africa and they want to optimize their site's bandwidth for those clients. (This is the same reason most South Africans still host their sites locally.)
The two problems stated above wouldn't have existed in a perfect world. But, as we all know, this isn't a perfect world.
I think Google is doing this because there is too much ccTLD abuse going on already. This is really sad, and I think that we should work towards making ccTLD-based country detection practically viable.
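To make the difference between the two detection methods concrete, here is a minimal Python sketch. It is not how Google does it; the network table uses documentation-only address ranges as stand-ins for real hosting networks, and the site itself is hypothetical.

```python
import ipaddress

# Hypothetical lookup table: documentation-only address ranges (RFC 5737)
# standing in for real hosting networks.
NETWORK_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "ZA",
    ipaddress.ip_network("198.51.100.0/24"): "US",
}

def country_by_ip(server_ip):
    """Return the country of the network that contains server_ip, if known."""
    address = ipaddress.ip_address(server_ip)
    for network, country in NETWORK_COUNTRY.items():
        if address in network:
            return country
    return None

def country_by_cctld(hostname):
    """Return the upper-cased two-letter TLD, or None for longer (generic) TLDs."""
    tld = hostname.rstrip(".").rsplit(".", 1)[-1]
    return tld.upper() if len(tld) == 2 else None

# A hypothetical South African site hosted on a server in the USA.
site = {"hostname": "example.co.za", "server_ip": "198.51.100.7"}
print(country_by_ip(site["server_ip"]))    # US
print(country_by_cctld(site["hostname"]))  # ZA
```

The same site ends up in two different countries depending on which signal you trust, which is exactly the problem described above.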
Weblog Awards
As some of you might know already, I am very honored to have been chosen as one of the 150 panelists for the 2005 Weblog Awards. The site has been down for a short while, but now it's up again. The finalists are out, so go and vote for your favourite weblogs! I already did. :-)
There is, however, some unhappiness among many South Africans that Africa had to share its category with the Middle East. This puts Africans at an unfair disadvantage, since the Middle East is a very prominent area at the moment, especially for bloggers.
So, the South African Weblog Awards was born. I can't wait to see a list of finalists, since I would really like to read some more South African weblogs, especially those related to technical material.
Online Advertising
Let's face it: The days of large animated graphical (or even Flash) ads squeezed into every nook and cranny of a web page are over.
Back in the late nineties when I surfed to www.google.com for the first time, I couldn't believe how minimalist it was compared to other search engines. Soon after that, I started to read about this new "revolution" in some books and saw presentations about the difference in design between the various search engines. Google was always noted as having a unique approach with its minimal design.
Google quickly became popular. I didn't like it too much at first (can you believe that?), but soon everybody was telling me about Google, so I figured it couldn't be that bad. After that, I soon realised that the superior speed and quality of Google's search results were more than worth switching for. Even some South African search engines, which should have been faster because of the lack of international bandwidth (at that time), couldn't compete with Google at all when it came to speed.
These days many sites follow the same route as Google, making Google's approach far from "unique". Take a look at MSN Search for example. In general, the past few years have seen dramatic changes in the way individuals and companies think about design and clutter.
For the last couple of years, more is less and less is more. Clean, simple designs that focus on the user's needs rule. And the same goes for advertising.
Everybody remembers the web in the '90s, right? Large animated graphical banner ads graced the top of every page, and more animated buttons could be found everywhere on the page, without 1 cm² to spare. This taught users to subconsciously ignore them, and when advertisers realised that their ad revenue was falling, they were quick to jump in and make the ads harder to ignore, more obtrusive and more irritating than ever (think popups and Flash).
I can still remember the articles stating that if you want a really successful advertisement, tell users that the ad is only for testing purposes and that they should not click on it. It wouldn't have taken the users long to start to figure that one out too... :-D
Google's AdSense is quite a revolution, and a turn in the right direction IMHO. It's unobtrusive, requires minimal bandwidth, and because of directed advertising it can often even be helpful to find similar resources that you might be interested in.
I like it when it's used in the sidebar where it's visible, yet not in the way. Some weblog authors try to make it more "in your face" by first giving you a paragraph of a post to get you interested, and then spam you with a few ads first before continuing with the post. This is really irritating and is a practice that must die.
I had an interesting conversation the other day with Bojhan about this. I soon came to the conclusion that the only advertisement that's still effective is the one that appears not to be an advertisement in the first place. This is because most people are trained to spot and ignore advertisements subconsciously, and this even applies to text ads.
An aesthetically pleasing and therefore effective technique is to style the text advertisements to fit in with the rest of your site. The better it fits in, and the less it looks like an ad, the better the success. Take a look at Tweakers.net as an example.
Media-specific @import
If you are a hard-core CSS coder like myself, I bet you couldn't resist catering for the different media types out there with separate stylesheets. However, this can get a bit tricky to manage.
One technique is to use @media. This is quite handy, but for your average site it is far better to separate the different media types into different files so that the UA only needs to download what it can actually use.
And then the media attribute comes to the rescue:
<link rel="stylesheet" media="aural" href="/stylesheets/speech">
<link rel="stylesheet" media="handheld" href="/stylesheets/handheld">
<link rel="stylesheet" media="presentation" href="/stylesheets/presentation">
<link rel="stylesheet" media="print" href="/stylesheets/print">
<link rel="stylesheet" media="screen" href="/stylesheets/screen">
Face it, you can't get much worse. This is very ugly, and if you add support for a new media type, it means that you'll have to go edit the markup. Note that most sites will have the same stylesheet rules for screen and presentation, so those two can probably be combined. But you get the point. ;-)
So, what to do? Simple. Link to one general stylesheet:
<link rel="stylesheet" href="/stylesheets/style">
And then put all the mess in that stylesheet:
@import url(speech) aural;
@import url(handheld) handheld;
@import url(presentation) presentation;
@import url(print) print;
@import url(screen) screen;
If you need to change anything in this, you can do it all in a central place. Personally, I feel that semantically this is better done in a stylesheet than in the markup anyway.
Very few people actually seem to use media-specific @imports at the moment, most likely because IE doesn't support them. Needless to say, that has never stopped me, but if your site needs to stay compatible with IE for some silly reason, then you can't use this nice technique.
Mozilla and Opera have supported this since I started using it (about a year ago). I don't know about other browsers (Safari, anyone?). Since I don't have a "real" host at the moment, I can't set up proper test cases, but it should be easy enough to copy-and-paste some of the code from this site.
Anyway, I should really get my own Mac. :-)
Delicious and PEAR
How do you include your latest Delicious entries into your template to create a linklog? If you want to use the PEAR, it's easy to do this server-side. Let me show you how.
For this article, I will assume that you are already accustomed to the PEAR and that you can use it on your server.
First of all, you must have the Services_Delicious package installed along with all its dependencies.
Then you can start editing your template. Note that the code below is PHP, so you'll have to make sure that you send this part of your template through a PHP parser obviously.
First of all, make sure that you have the Delicious class imported:
require_once("Services/Delicious.php");
Next, you must create an object and log on to Delicious with your username and password:
$delicious = new Services_Delicious("yourusername", "yourpassword");
Of course, you must replace yourusername with your Delicious account username and yourpassword with your password.
Next, retrieve the latest links from your account:
$links = $delicious->getRecentPosts();
$links will now be a two-dimensional array. The first dimension represents the individual links and the second dimension represents the various pieces of info associated with each link. Here is a bit of sample code to print the links into a definition list (dl element opening and closing not included):
foreach ($links as $link)
{
    // ENT_COMPAT also escapes double quotes, which matters inside the href attribute value.
    $href = htmlspecialchars($link["href"], ENT_COMPAT);
    // Element content only needs &, < and > escaped, so quotes are left alone.
    $description = htmlspecialchars($link["description"], ENT_NOQUOTES);
    $extended = htmlspecialchars($link["extended"], ENT_NOQUOTES);
    print " <dt><a href=\"$href\">$description</a></dt>\n";
    print " <dd>$extended</dd>\n";
}
Of course, you should make sure that all of your special characters, such as greater-than and less-than signs (and quotes, in the case of the href attribute's value), are correctly encoded. This can very easily be done with PHP's htmlspecialchars function.
An interesting thing to note is that it seems like the returned link info is UTF-8, so you probably should send the page you're printing this to as UTF-8 appropriately or otherwise you might run into some trouble. But you probably should be sending all your output as UTF-8 anyway. :-)
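For readers who don't run PHP, the escaping step can be sketched in Python too. This is only an illustration of the same idea, not part of the Services_Delicious package; the sample link is made up, and html.escape stands in for htmlspecialchars.

```python
import html

def render_links(links):
    """Render link dictionaries as dt/dd pairs, escaping every field."""
    lines = []
    for link in links:
        # quote=True also escapes quotes, needed inside the href attribute value.
        href = html.escape(link["href"], quote=True)
        # Element content only needs &, < and > escaped.
        description = html.escape(link["description"], quote=False)
        extended = html.escape(link["extended"], quote=False)
        lines.append(f' <dt><a href="{href}">{description}</a></dt>')
        lines.append(f" <dd>{extended}</dd>")
    return "\n".join(lines)

# Hypothetical sample data shaped like the array described above.
sample = [
    {
        "href": "http://example.com/?a=1&b=2",
        "description": "Tips & <tricks>",
        "extended": 'A "quoted" note',
    }
]
print(render_links(sample))
```

Note how the ampersand in the URL becomes &amp;amp; in the attribute, while the quotes in the extended text are left alone because they sit in element content.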
ccTLD Abuse
I made a post a while ago about domain extensions.
Recently, something else has appeared on my radar: many medical and medicine-related companies have started using .md as their TLD.
.md is the ccTLD for Moldova. Therefore, semantically, any subdomain of that ccTLD must be directly related to that country.
Although .md might seem like a cool option if you're a medical company, it actually sucks from a semantic point of view (unless, of course, you're actually based in Moldova or the site is specifically aimed at that country).
Ok, so now you might be thinking: "You're just going overboard you stubborn, stuck-up redneck son-of-a-bitch semantic purist! There's no practical implication of this 'minor' misuse of semantics!"
First of all, to me it's actually quite a major misuse. But leaving that there, there are actually some serious practical implications.
Many search engines (including Google) now provide you with the option to only search inside your own country. So, how does a search engine know which country a site belongs to? Not by meta tags; not by RDF; not by IP address. By its ccTLD of course! And if it doesn't have a ccTLD but instead a gTLD, it can be classified as "international".
Therefore, let's say I'm staying in South Africa (incidentally I am, but anyway...) and I want to search for a local pharmacy to buy some medicine from. I obviously want to restrict my search to South Africa then. However, now some uninformed webmaster of one of the pharmacies decided to register a .md domain. It might be cool for offline advertisement, but that would mean that when I search like that it wouldn't pop up in the results. So is it really so cool?
Ok, I hope this makes a strong point. It's not only .md that's often being abused, but also many other ccTLDs. I honestly hope that this practice will cease soon. It might look cool, but actually it's a lame practice.
And along with that, I also hope that governments will soon start to act and disallow their ccTLDs to be abused like this.
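The pharmacy example can be sketched in a few lines of Python. This assumes a search engine that classifies purely by TLD, as described in this post; the hostnames are hypothetical and the rule "two letters means ccTLD" is a simplification.

```python
def site_country(hostname):
    """Classify a site by its TLD: a two-letter TLD is treated as a ccTLD."""
    tld = hostname.rstrip(".").lower().rsplit(".", 1)[-1]
    return tld if len(tld) == 2 else "international"

def search_within_country(hostnames, country):
    """Keep only the sites whose ccTLD matches the requested country."""
    return [name for name in hostnames if site_country(name) == country]

# Hypothetical pharmacies, all of them physically in South Africa.
pharmacies = [
    "pharmacy-one.example.co.za",
    "pharmacy-two.example.md",
    "pharmacy-three.example.com",
]
print(search_within_country(pharmacies, "za"))
# ['pharmacy-one.example.co.za']
```

The .md pharmacy silently disappears from the country-restricted results, and the .com one is only ever "international".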
RSS & Atom
Please Note: In the quotations in this article, the text has not been modified directly but the markup has for semantic and accessibility reasons.
<rant>
Recently, RSS 1.1 has been in the news a lot, despite apparently still being only a draft (but then, Atom 0.3 is only labelled as a pre-draft). It is basically an update to RSS 1.0.
The new version looks a lot better than the old version. I like the fact that it is nicely integrated with RDF. However, it is still far inferior when it comes to Atom IMHO.
One of the things that irritate me about the whole RSS-thing is that you actually get two different streams of RSS: "RSS" as in "RDF Site Summary" and "RSS" as in "Really Simple Syndication". RSS 1.0 (and therefore 1.1) belongs to the former, while RSS 0.91, 0.92 and 2.0 all belong to the latter.
Personally, I vote for RDF Site Summary above Really Simple Syndication, since the latter is actually quite pathetic. Actually, it lives up to its name: It's really simple. Too simple. Simplicity is good, but not always the key. Personally, I always find this: Aim for simplicity, but power normally should come first.
Anyway, one of the official aims for RDF is to let it facilitate syndication. This is a mistake IMHO; I personally believe that RDF should rather just stick to metadata. Some people will argue that syndication falls under metadata, but I don't agree. For me, syndication calls for an alternate format of a document in any applicable XML-based language. This XML-based language could define metadata just like (X)HTML does, but I don't see why it should be classified completely as metadata.
RDF does provide some functionality that can be used very nicely for syndication; therefore it's not totally illogical to want to use it for that. However, I still don't completely see the semantic logic in it. Or maybe I'm just going overboard; who knows...
Moving on, here is a quote from the RSS 1.1 draft:
In its other guises, e.g. RSS 0.92, RSS 2.0, and Atom, content syndication of this kind has been enormously successful, and requires no justification of its general potential. The addition of yet another version, however, to the current proliferation of RSS variants necessitates its rationale being documented.
Many sites understand the benefits of syndication, and have provided RSS feeds to achieve a variety of goals, from increased readership to providing instant updates on content. A number of formats have emerged, each of them offering aspects that others do not. RSS 1.0, due to its RDF based nature, offers a variety of benefits. Thus far, however, uptake for RSS 1.0 has been relatively limited, due to the difficulty in creating conforming documents in comparison to other syndication formats. Duplication of data, as well as a generally confusing specification, have left much to be desired from the developer perspective, leading to a less-than-impressive number of RDF-based RSS documents in the wild.
Why indeed? The reasons given in the second quoted paragraph still say nothing to me!
As far as I'm concerned, syndication still calls for an XML-based markup language separate from RDF, like Atom. Atom is extremely powerful, widely used, and simple enough (if you ask me)! Why these people are still wasting their time (ok, I'm sorry to have to put it like this, but IMHO they are) upgrading a different (and IMHO far inferior) syndication format is beyond me!
Actually, they do mention Atom in two places:
This specification is therefore made available by users of the RSS 1.0 format who wanted to update the specification to make use of the latest features of RDF in order to reduce the redundancy in the format, and the ambiguity in the specification, while at the same time implementing a series of bugfixes from the lessons learned in developing the other descendent of RSS 1.0, Atom.
RSS 1.1 is hence to be considered a bugfix and streamline release of RSS 1.0 for users of RSS 1.0 who do not want to migrate to Atom.
Why not just switch to Atom and get it over with? Ok, Atom isn't perfect, but RSS 1.1 still doesn't provide equivalent functionality in many respects. Ok, so this is still a draft, but why create yet another specification?
Then, as a last argument, I think RDF Site Summary should change its name. Some people (like me) want to read entire posts through their feed readers. Why does the name make it sound like only summaries are being syndicated? Technically speaking, a site could syndicate anything (like links in a linklog) through this format. Ok, maybe this is going a little overboard, but I still think that either the name is misleading, or these guys should totally rethink the purpose of the specification.
</rant>
More Reading:
- inessential.com: Weblog: Comments for ‘RSS 1.1’
- miscoranda: RSS 1.1
- Rough Guide to RSS 1.1
- Technical Ramblings » RSS 1.1
go_open Episode 7
Episode 7 of go_open aired on 2004-01-15.
VoIP is becoming a major thing in South Africa because of the relatively high local phone call charges. And now with the deregulation of the industry, we're looking at a fairly bright future in this regard. It's been in the computer magazines recently, and everybody in the telecommunications/internet industry is talking about it. [ Online Article | FCC's Report | Molo Afrika | Google Search: VoIP ]
Solly Masinga has a very interesting success story thanks to the HP i-Community Centre in Limpopo. [ Online Article ]
They interviewed ESR, a very high-profile Internet hacker and open source activist. [ Online Article & Transcript | Eric's Random Writings | OSI | Slashdot ]
Ubuntu Linux was featured. It focuses on translation, accessibility, being regularly updated and well maintained. Most importantly, it will always remain free of charge. [ Online Article | My Review ]
Featured Sites:
- TheOpenCD
- OSS for the Windows platform.
- Slashdot: News for nerds, stuff that matters
- No description necessary. :-)
- Freshmeat.net
- Massive directory of OSS.
- ICDL Foundation Africa
- Get qualified. Course material of Open Office and Linux available.
GAIM is a multi-protocol instant messaging (IM) client for Linux, BSD, MacOS X, and Windows. It is compatible with AIM and ICQ (Oscar protocol), MSN Messenger, Yahoo!, IRC, Jabber, Gadu-Gadu, SILC, GroupWise Messenger, and Zephyr networks.
Gaim users can log on to multiple accounts on multiple IM networks simultaneously. This means that you can be chatting with friends on AOL Instant Messenger, talking to a friend on Yahoo Messenger, and sitting in an IRC channel all at the same time.
Gaim supports many features of the various networks, such as file transfer, away messages, typing notification, and MSN window closing notification. It also goes beyond that and provides many unique features. A few popular features are Buddy Pounces, which give the ability to notify you, send a message, play a sound, or run a program when a specific buddy goes away, signs online, or returns from idle; and plug-ins, consisting of text replacement, a buddy ticker, extended message notification, iconify on away, spell checking, tabbed conversations, and more.
The above has been quoted directly from the online article about Gaim.
Go Open Episode 6
A week late, but here is the report on go_open Episode 6 that aired on 2004-01-08 (the first episode in a while - they stopped broadcasting temporarily over the peak of the holidays):
The lead story was about distributed computing and its various applications. Apparently it is mostly powered by OSS. They featured distributed.net as one of the major role players in this area. [More]
Tiger Brands is one of the major South African food companies. Apparently their whole company runs on Oracle, and they decided to migrate from proprietary Unix to open source Linux. [More]
They interviewed Bruce Perens (yes, him!). I really like the header on his own site. Follow the "more" link for a transcript of the interview. [More]
TWiki is one of the top collaboration tools for virtual teams. They have some really nice success stories of some high-profile companies on their site!
If you struggle to understand a technical term, try the FOLDOC.
And why not get a formal certificate in OSS from ICDL? (International link, anyone?)
Firefox on South African TV!!!
Update: With some help from Lachlan regarding the English, I made a post on SFX on my weblog there about this too.
Finally, I spotted Firefox on South African TV! I knew I would; it was only a question of when.
More details: It was on the program go_open (the usual suspect...) on SABC 2 on 2004-01-15. Unfortunately, they didn't mention its name, but they used it to browse to some websites and demonstrated a few of its functions (including the "Extensions" panel) on screen while talking about OSS in general in the background.
Also note: Thunderbird was mentioned previously on the same program.
Linking to Feeds
So, how do you link to a feed from an (X)HTML document?
What We Want
The idea is that when you try to open a feed in your web browser, your feed reader must jump up and take it from there.
How Today's Browsers Work
Currently most browsers do the following: when a browser encounters a familiar protocol and receives a MIME type it can't handle on its own, it either downloads the file and passes it to another application as configured or, if no application is configured to open that MIME type automatically, prompts the user to save the file.
When a protocol is encountered that the web browser doesn't understand, it attempts to pass that URI to another application that can handle the protocol.
Protocols in URIs
HTTP, FTP, POP3, SMTP, IMAP, NNTP and IRC are all common protocols.
A protocol is normally specified right at the start of a URI before ://. Here is a list of examples for some of the protocols mentioned above:
http://www.google.com
ftp://ftp.gardenroute.com
news://news.saix.net
irc://irc.mozilla.org
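If you want to see the protocol part pulled out programmatically, here is a small Python sketch using the standard library's urlsplit. The scheme_of helper is just a name I made up for illustration.

```python
from urllib.parse import urlsplit

def scheme_of(uri):
    """Return the protocol (scheme) that appears before :// in a URI."""
    return urlsplit(uri).scheme

uris = [
    "http://www.google.com",
    "ftp://ftp.gardenroute.com",
    "news://news.saix.net",
    "irc://irc.mozilla.org",
]
for uri in uris:
    # Print the scheme next to the host it applies to.
    print(scheme_of(uri), urlsplit(uri).hostname)
```

A browser makes essentially this split first, and only then decides whether it can speak that protocol itself.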
The Ugly Solution
Unfortunately it seems like we can't really configure most of today's browsers to pass the URI to an external application according to MIME type. However, you can configure them to do just that by inventing a "new protocol". So, many propose to link to feeds this way:
feed://charlvn.blogspot.com/atom.xml
When applicable, the URI will be passed to the user's favourite feed reader for syndication.
However, this is semantically wrong. In reality, feed:// doesn't have any unique protocol associated with it. All feeds I know of are served through HTTP. From a semantic point of view, the feed URI should be this:
http://charlvn.blogspot.com/atom.xml
Therefore, I am totally not in favor of this method of linking.
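In practice, a feed reader that receives a feed:// URI has to undo the trick before it can fetch anything. A minimal Python sketch of that step, assuming the feed really is served over plain HTTP (feed_to_http is a hypothetical helper, not part of any reader I know of):

```python
from urllib.parse import urlsplit, urlunsplit

def feed_to_http(uri):
    """Map a feed:// URI back to the http:// URI it actually stands for."""
    parts = urlsplit(uri)
    if parts.scheme == "feed":
        # feed:// has no protocol of its own, so swap in the real one.
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)

print(feed_to_http("feed://charlvn.blogspot.com/atom.xml"))
# http://charlvn.blogspot.com/atom.xml
```

That the very first thing a client must do is rewrite the URI says a lot about how artificial the "protocol" is.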
The Proper Solution
Actually, it's simple. Or maybe not.
The user's system must be configured to handle a URI this way: As soon as a web browser (and this principle can probably be applied universally to all UAs) receives a file with a content type it doesn't understand, it must find an application that does. Then it must read a setting that tells it either to download the file, save it locally and pass only the path of that file to the application that can open it, or to pass the original URI to that application directly. This setting can then be set on a per-application basis so that each application can be given what it can handle best.
However, this option will require some modifications to the web browser, as most of them don't support this atm AFAIK. Therefore, it's not an immediate solution.
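The per-application setting described above might look something like this in Python. It is only a sketch of the idea: the registry, the program names and the fake_download helper are all hypothetical, and no browser I know of exposes anything like it.

```python
from dataclasses import dataclass

@dataclass
class Handler:
    command: str     # hypothetical program name
    wants_uri: bool  # True: pass the original URI; False: pass a downloaded local path

# Hypothetical registry mapping MIME types to applications.
HANDLERS = {
    "application/atom+xml": Handler("feedreader", wants_uri=True),
    "application/pdf": Handler("pdfviewer", wants_uri=False),
}

def plan_handoff(uri, content_type, download):
    """Return (command, argument) for a type the browser can't display, or None."""
    handler = HANDLERS.get(content_type)
    if handler is None:
        return None  # nothing registered: the browser would offer to save the file
    argument = uri if handler.wants_uri else download(uri)
    return handler.command, argument

def fake_download(uri):
    # Stand-in for a real download; returns a made-up local path.
    return "/tmp/" + uri.rsplit("/", 1)[-1]

print(plan_handoff("http://charlvn.blogspot.com/atom.xml", "application/atom+xml", fake_download))
print(plan_handoff("http://example.com/report.pdf", "application/pdf", fake_download))
```

The feed reader gets the URI so it can keep polling it, while the PDF viewer only ever sees a local file.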
In an ideal world, every applicable application should be able to be fed any URI that points to a resource either on an external server or locally, and there should be a central list of MIME types with their associated applications that can be accessed from any application. They should also make use of a universal API that is integrated with the OS to load (or download) that particular resource. Then, if any application tries to open up a resource with a MIME type it doesn't support, it can just call another application directly and feed them the URI regardless.
But, this isn't an ideal world, and we're probably a few years away from that. We're calling for solid integration here after all. Still nice to philosophise, though! That's right guys, work towards it! Just imagine how cool such functionality would be! However, security is still an issue. You can't integrate Microsoft Word or IE or the likes into such a system, or you'll have a virus very soon!
However, there are one or two things to keep in mind when implementing such a solution:
You need to make sure the MIME type of your feed is correct. And not only correct, but as specific as possible. Of course, you shouldn't send it as text/plain or as text/html, but everyone knows (or should know) that. Technically speaking, sending an Atom feed as application/xml is correct. However, for reliable and easy integration as described above, it will have to be sent as application/atom+xml so that that particular MIME type can be associated with the feed reader.
And opening up an external application must take precedence in an XML UA over trying to handle the document itself because of the +xml bit at the end!
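The precedence rule can be sketched as follows in Python. The function and the handler tables are hypothetical; the point is only the order of the checks, with the registered external application winning over the generic +xml fallback.

```python
def choose_handler(content_type, external_handlers, builtin_types):
    """Decide what an XML-capable UA should do with a response."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # 1. An external application registered for this exact type wins.
    if media_type in external_handlers:
        return "external:" + external_handlers[media_type]
    # 2. Only then may the UA fall back to its own (generic XML) handling.
    if media_type in builtin_types or media_type.endswith("+xml"):
        return "builtin"
    # 3. Otherwise offer to save the file.
    return "save"

external = {"application/atom+xml": "feedreader"}  # hypothetical registration
builtin = {"text/html", "application/xml"}

print(choose_handler("application/atom+xml; charset=utf-8", external, builtin))
print(choose_handler("application/xml", external, builtin))
```

If the two checks were swapped, the browser would render every Atom feed as raw XML and the feed reader would never get a look in.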
Link Relations
Another option that has been proposed is rel attributes on the links pointing to a feed. For example, I could link to my Atom feed like this from any (X)HTML document:
<a href="/atom.xml" rel="feed">My Atom Feed</a>
This could also be used by the web browser to know that this URI must be passed on to the feed reader. A much simpler solution for now. Not a bad option, but let's say I copy-and-paste a URI. No rel then! Anyway, I think in the light of what I mentioned previously in this post, we should rather strive towards the "ideal" solution.
<rant>
One thing that I also seriously want to moan about:
On Blogger, and on many other weblogging services/systems too, a link with rel="alternate" that points to the most recent posts feed is added to every page in the weblog. The most recent posts page (typically the home page on most weblogs), the archive pages, and even the item pages, all contain this link. Not semantic!
rel="alternate" means the resource you're linking to essentially contains the same content as the document you're linking from, but in a different format or language. When you link to a feed like this from an item page, you're basically saying that the feed currently contains, and will continue to contain, the same content as that item page. This is ok if the feed is something like a comment feed, but if the feed is a "latest posts" feed (as in most cases), it doesn't contain the same content as the item page. Therefore it should be linked to with rel="feed", not rel="alternate".
So, on the most recent posts page (the home page or whatever), provided that it contains and will continue to contain precisely the same posts that you're syndicating through the feed, you can link to the feed using rel="alternate feed", while on all other pages, you can link to the same feed using rel="feed" only.
</rant>
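For template authors, the rule from the rant above boils down to a one-line decision. Here is a Python sketch; the page kinds ("recent", "archive", "item") are labels I made up, and rel="feed" is the proposed value discussed in this post, not something every UA understands.

```python
def feed_link_rel(page_kind):
    """Pick the rel value for a link to the "latest posts" feed."""
    # Only the most recent posts page carries the same content as the feed.
    return "alternate feed" if page_kind == "recent" else "feed"

def feed_link(page_kind, href="/atom.xml"):
    """Build the anchor markup for the feed link on a given kind of page."""
    return f'<a href="{href}" rel="{feed_link_rel(page_kind)}">My Atom Feed</a>'

for kind in ("recent", "archive", "item"):
    print(kind, feed_link(kind))
```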
UTF-8 in ASP.NET
Update: Apologies, I made a mistake in this post by writing ill-formed XML. Unforgivable, I know. Funny that I didn't notice it until now. Anyway, corrected.
ASP.NET has super UTF-8 support!
ASP.NET distinguishes between two different character encodings: The File character encoding of your source code/document, and the Response encoding of the resultant document that will be sent to the UA.
.NET mostly uses Unicode internally. This includes the String class.
The fileEncoding (source code/document character encoding) directive can be set in the web.config file to specify the default character encoding like this:
<configuration>
  <system.web>
    <globalization fileEncoding="UTF-8"/>
  </system.web>
</configuration>
This will set the default file encoding to UTF-8.
If you don't set the default like this, I have no clue as to what it will be. The documentation doesn't say. However, it does say this:
Unicode and UTF-8 files saved with the byte order mark prefix will be automatically recognized regardless of the value of fileEncoding.
(I didn't edit the text in the quote above, only the markup.)
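I don't know how ASP.NET implements this internally, but byte order mark detection in general can be sketched in a few lines of Python. The detect_encoding helper and its default argument are my own invention for illustration, and UTF-32 is deliberately left out to keep it short.

```python
import codecs

# Byte order marks to look for, with the encoding each one implies.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_encoding(raw, default):
    """Return (encoding, body without BOM); fall back to the configured default."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name, raw[len(bom):]
    return default, raw

# A file saved as UTF-8 with a BOM is recognized despite the ISO-8859-1 default.
encoding, body = detect_encoding(codecs.BOM_UTF8 + "café".encode("utf-8"), default="iso-8859-1")
print(encoding, body.decode(encoding))  # utf-8 café
```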
The responseEncoding is, by default, UTF-8. In most cases, it will be better to leave it like that. However, you can override it either with the responseEncoding directive in web.config or in the individual source code/document files.
In the web.config:
<configuration>
  <system.web>
    <globalization responseEncoding="ISO-8859-1"/>
  </system.web>
</configuration>
Or in the individual source code/document file (takes precedence over the above):
<%@Page ResponseEncoding="ISO-8859-1"%>
Both of these will result in ISO-8859-1 as output, and the HTTP header will be adjusted accordingly automatically!
There is also the Request Encoding, but I'll leave that up to you to read up about yourself if you want [more].
It's best to set both your file encoding and your response encoding to UTF-8 in most cases. This will probably be the most efficient and flexible solution. However, if you're suffering with an editor that doesn't have proper UTF-8 support, you might want to use a different file encoding (or simply switch to another editor).
If the file encoding and the response encoding differ from each other, ASP.NET should automatically handle the conversions transparently.
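Conceptually, that conversion is just "decode with the file encoding, encode with the response encoding". Here is a Python sketch of the idea, not of ASP.NET's actual implementation; the transcode helper is hypothetical, and replacing unrepresentable characters with numeric character references is my own assumption about what is sensible for (X)HTML output.

```python
def transcode(raw, file_encoding, response_encoding):
    """Convert source bytes from the file encoding to the response encoding."""
    text = raw.decode(file_encoding)  # bytes -> Unicode text
    # Characters the target encoding lacks become numeric character references.
    return text.encode(response_encoding, errors="xmlcharrefreplace")

source = "naïve café".encode("utf-8")  # file saved as UTF-8
response = transcode(source, "utf-8", "iso-8859-1")
print(len(source), len(response))  # 12 10
```

The two accented characters take two bytes each in UTF-8 but only one in ISO-8859-1, hence the shorter response.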
This is something PHP is still seriously lacking. It does have all of the necessary functions like utf8_encode, but that's far from transparent. And you can have built-in support for UTF-8 too, but that needs special compiling. But AFAIR (what a cool acronym...), that might change soon.
If you want to learn more about UTF-8 and Unicode in general, here is a list of my personal top resources:
- Unicode Home Page
- Quick guide to UTF-8 <Anne's Weblog about Markup & Style>
- ‘Karaktercodering is nog een ondergeschoven kindje’ (UTF-8)
- Lachy's Log: Guide to Unicode, Part 1
- Lachy's Log: Guide to Unicode, Part 2
- Lachy's Log: Guide to Unicode, Part 3
ASP.NET and Standards
ASP.NET is often regarded as a definite no-no for any standards-compliant developer. However, be careful not to blow things out of proportion. In this post I'll try to explain the concept of ASP.NET and some of the problems so that we can clear things up a little.
Note: A "form" in .NET means something totally different than in (X)HTML. Therefore, when I mention "forms" I refer to "ASP.NET Forms", and when I write "<form/>s" I'm talking about "(X)HTML Forms".
There are two different ways you can code server-side scripts in ASP.NET: The old way and the new way.
The Old Way: Code ASP.NET like you are used to coding PHP. You write standard, straightforward markup. No funny business. When you code a <form/>, you do that just like you have always done it. You validate the input by coding your JavaScript separately (if you want), and then you (re)validate it all on the server side as you retrieve it by good old proven methods.
The New Way: Use ASP.NET forms and controls to really show just how lazy and braindead you are. Ever written a desktop application in good old VB or in the new .NET? Remember how you set up a "form" (window) by dragging and dropping, and then add some functionality by using events? Now you do almost precisely the same on a web page.
The advantage of the new way is that an application developer experiences a minimal learning curve when creating web applications. The disadvantage is that Microsoft doesn't care about standards (no news there), and the code created by the ASP.NET forms is no less than tag **** (not "soup", another 4-letter word also starting with the letter "s").
First, let's take a look at ASP.NET forms a little more closely. As hinted above, a form (in Microsoft terms) means a "window" in desktop application development. The idea was to bring this entire concept over to web development - essentially treating a web page like a window. VB started the concept of dragging and dropping "controls" such as buttons, drop-down boxes, text boxes, radio buttons, etc on these "forms" to easily create an application. Now with .NET, you can do the same in VB.NET, C#, J# (Microsoft's equivalent to Java), etc. All of these languages are merely just that - languages. They all interact with a standard API - the Microsoft.NET Framework API.
So, now they're taking this concept over to the web. ASP.NET is mostly coded in one of two languages: C# and VB.NET (C# obviously being the most loved alternative, being the "next big thing" and all - well, it's a lot better than VB, so...). Again, you interact with the same API as you would with desktop applications. For example, the same XML parser you use in any desktop application can now also be used in your ASP.NET web application.
I have to admit it: These guys at Redmond didn't get so rich by being stupid. The concept behind it is truly excellent. No matter what programming language you prefer, or if you're writing desktop or web applications, you can always interact with the same API and use the same basic programming language (although ASP.NET development is in practice mostly limited to C# and VB.NET rather than languages like J#, but no real loss there). Just think about all of the various implementations of certain simple functions currently in existence. Why not rather centralise it all by placing all of these classes and functions (aka methods) in one central repository so that they can be used anytime, anywhere? Brilliant.
However, now we get back to the nasty part. Since there are still some fundamental differences between a desktop application and a web application, Microsoft created the ASP.NET forms. Basically, this is a set of controls that can be used specifically for web development. They dynamically generate markup and JavaScript on-the-fly as the application is executed server-side. And, as to be expected from Microsoft, the generated markup of the built-in controls sucks from a standards point of view.
Let's say I want to create a simple feedback form. I create the form itself with all the fields (from name, from e-mail, subject, body) and a submit button (also a field actually). Then I tie an event to the submit button that reads the text in the fields. I can do error checking and handling, display error messages by setting text in a label (the web equivalent of the "labels" you use in desktop applications, not to be confused with an (X)HTML <label/>) and virtually anything else completely transparently, just as if the page were a window and no round trips to the server ever had to be made.
In reality, this is what gets sent to the browser: Forget about a standard <input type="submit"/>; you actually get an <input type="button"/> with an additional onclick attribute that calls some JavaScript to make the form submit. Eventually the form gets submitted just like any other form unless you have JavaScript disabled. Just wait until Microsoft gets XMLHttpRequest into their minds...
One of the things that makes me really angry is that some of the built-in controls act like IE is the only "smart" browser and send it more advanced markup while sending very primitive markup to other browsers. Microsoft cleanly explains this as IE being "technologically advanced" while other browsers are "less advanced". Apart from the fact that IE being technologically advanced is an oxymoron, other browsers are much more... agh why even waste my time by typing more about this?
So, essentially if you want to create a standards-compliant website, you'll have to stick to the old way: Coding markup manually, processing forms the old way (no events and stuff), etc. The good thing is that it seems like ASP.NET doesn't really interfere with good old hard-coded (X)HTML. So if you stick to that, you can do your ASP.NET coding in peace without much worrying. And you can still use the powerful Microsoft.NET Framework APIs. :-)
So, the purpose of this post is just to say: Don't fear ASP.NET; it's not as terrible as it might sound. Actually, it's quite cool (except for the lack of speed at development time because everything needs to be precompiled). But stay away from those built-in controls! Sure, you can create your own controls (very cool actually), and then you can get them to output whatever you like. Just be careful, because for some reason the whole control-based setup just doesn't look very stable to me as far as standards and accessibility are concerned. Well-structured, semantic markup is just better done by hand! :D
PEAR DB Package Basic Tutorial
As promised yesterday, here is a basic introduction to the PEAR DB package, a fairly powerful standardised database abstraction class.
For this article, I will assume that you already have at least basic knowledge and experience with database connectivity, SQL, OOP and PHP.
Remember that this is a basic tutorial on the DB package for people that are already used to database connectivity in PHP, therefore I will assume that most of the code shown here is self-explanatory. I'm not going to go into all of the details.
In order to use any of the PEAR packages, you must first have PEAR installed on your server [instructions]. You must also have the package you want to use installed separately either in the common includes directory or in some other directory where you can include the DB.php file into your script. You can download the DB package itself here.
Now, create a new PHP script and let's start hacking. :-)
First of all, you need to include the DB class:
require_once('DB.php');
Now, to connect up to the database you need to formulate a DSN. Let's assume you are trying to connect to a MySQL server on host hostname with username user and password password while trying to open up database database:
$dsn = 'mysql://user:password@hostname/database';
Now, issue this command to connect up:
$db = DB::connect($dsn);
This creates a new DB object and connects to the database using the DSN as discussed earlier.
We should be connected to the database now. If everything doesn't run that smoothly, you could add some debugging code as shown in the manual to help you solve the problem. When writing professional applications, it is best to have proper debugging along with applicable error messages anyway.
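As a rough sketch of what such debugging code could look like: on failure, DB::connect returns an error object instead of a connection, so you can test for that before carrying on. The DSN is the same placeholder as above, and the wording of the message is my own assumption, not something the package prescribes.

```php
<?php
require_once 'DB.php';

// Placeholder credentials from the example above; replace with your own.
$dsn = 'mysql://user:password@hostname/database';

$db = DB::connect($dsn);

// On failure $db holds an error object rather than a usable connection.
if (PEAR::isError($db)) {
    die('Connection failed: ' . $db->getMessage());
}
```

The same check works on the return value of query and the other methods shown below, since they also hand back an error object when something goes wrong.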
Next, let's say I want to count the rows in table articles without retrieving all of them from the server first. Here is what I could do:
$count = $db->getOne('SELECT COUNT(*) FROM articles');
As you can see, everything we do from now on concerning the database is done through the DB object we created earlier referenced through the $db variable.
The getOne method executes any SQL statement and returns the first column of the first row from the result. Therefore, the value of the count we performed is stored directly into $count. Normally, we would have had to write all of this to do precisely the same thing:
$result = mysql_query('SELECT COUNT(*) FROM articles');
$row = mysql_fetch_row($result);
$count = $row[0];
You see how easy the DB class makes certain things? Far more minimalistic, neater code.
Now, on to a normal query:
$result = $db->query('SELECT * FROM articles');
while ($row = $result->fetchRow(DB_FETCHMODE_ASSOC))
{
    print $row['title'];
    // etc, etc...
}
You can see that the result set is also an object now. The row we're retrieving though is still like it always was. Also note DB_FETCHMODE_ASSOC; this makes the fetchRow method return a keyed array and replaces mysql_fetch_array. You can replace that with DB_FETCHMODE_ORDERED and that will then return a standard ordered array just like mysql_fetch_row does.
If you leave out that parameter entirely, it will revert back to the default. The default is normally DB_FETCHMODE_ORDERED, but you can change that by doing the following:
$db->setFetchMode(DB_FETCHMODE_ASSOC);
Any queries performed after this will return a keyed array.
And when you're finished, don't forget to close the connection!
$db->disconnect();
Hope you found this article useful. This was the first of many articles about various PEAR packages.
PHP PEAR
The PHP PEAR is an extremely cool resource for all PHP developers. There are many packages that can automate a lot of tedious tasks and add some interesting functionality, all pre-written for you.
What's even better is that it attempts to standardise these packages for universal use. This makes it much easier to start up, and makes it easier for others to understand your code (since you're using some of the same classes as them). It can even help you to get different applications to work together.
For example, take the ever-prevalent database abstraction class. Most good programmers use those, since if they ever need to migrate their application to a different database system they can do so much quicker. The first system I ever saw that used this is the all-famous phpNuke, although I don't really know where the concept originated.
The PEAR offers you the DB Package. It does everything your standard database abstraction class does, plus more. It's completely OO, and what's better, it's standardised, free and open source. I'll try to give you a basic tutorial on how to use this package tomorrow.
At first I thought I would quickly make a list of some of the most interesting packages in the repository; however, there are so many (and the number of them is growing by the day) that I don't even feel like starting! I'll cover some of them one by one over the next few months. Stay tuned...
Redesign (Finally) Launched!
Update: As Lachlan pointed out to me in a private conversation, I wasn't too clear about what I actually redesigned. It's this weblog. The design has happened gradually over the last 4 months, so many wouldn't really have noticed it much.
Ok, this must have been one of the longest redesigns in history, but trying to fit all of this into an already busy schedule is not always easy.
The template is based on Son of Moto by Jeffrey Zeldman. I kept much of it, but I completely changed the look of the sidebar and I changed the main content area from having a fixed width to being liquid. At the moment, the sidebar is fixed but I'll make that liquid too one day.
I also changed the font, because I wasn't too keen on the original one. I decided on Verdana for its readability. I decided on a font size of 0.8em for most of my content because it is clean and neat (1em looks too bulky), but yet still not too small and very readable. For me, anything smaller than that is too small to scan through. I often have to manually adjust my font size in Bloglines since its default font is irritatingly small for that kind of application, and it really puts unnecessary strain on the eyes.
The sidebar was getting a bit full (especially with my large blogroll), and it was becoming rather difficult to browse around in it. It started as a standard expand/collapse menu, but soon ideas started to form and now it's an XML look-alike. I guess it's really a bit of a mock, but taking the topic of this site into account, I think it's more than appropriate.
It took some JavaScript (completely DOM-based of course) to get it working, but if you have JavaScript disabled it is still completely accessible. It's a good example of how JavaScript can be used to aid the user and to improve the user experience without sacrificing accessibility at all. There you have it, JavaScript doesn't necessarily have to be evil. :-)
Thanks also go to Ben Ward for his suggestions regarding the menu.
The main theme of the header stayed the same as in the original template (except for the fact that it is also liquid now). I did find some need to spruce it up a little, so with a lot of help from Graphic I made a plan there too. I now boast a Japanese logo with the text 開 放 規 範 ("kaihou kihan") on the right. This means something in the direction of "open standards". Quite fitting. :-)
The "tagline" is a bit of an odd one. The theme of this site doesn't really fit with gangsterism, and I don't even really know why I used that in the first place. As far as I can recall, I was listening to some track from PhonoPhunk. That probably explains it then. Spanspek was added in later. It is Afrikaans for "melon", and yes, I know that totally doesn't make sense at all, but it just sounds cool. :-)
Thanks also go to Shunuk for helping me sort out the problem with the Japanese logo in Safari. For some reason I made the very stupid mistake of having the type as image/jpg instead of image/jpeg. It is quite a surprise though that Firefox rendered the image; I'm actually sort-of disappointed. This is a very good example of why standards are important in 2004/2005: If Firefox didn't support that other (incorrect) MIME type then I would have caught that issue much earlier. Down with quirks mode!
Lastly, I just want to brag about the fact that I'm one of the relatively small number of users of aural stylesheets. I know support for it still sucks, but as you should know by now, that won't stop me! ;-)
I do admit that there are one or two small issues that still need to be sorted out, but I'll get around to those as I have time. In the meantime, I'm officially "launching" the new design. Those of you who have watched the design of this weblog over the past few months would know how the design slowly evolved, and finally I think it's at a state where I can make an official launch. It will still evolve even further, since there's much work that needs to be done. Stay tuned. :-)
LinkLog
Finally, I have something that every weblog should have these days: a LinkLog, thanks to del.icio.us. You can also subscribe to the feed [bloglines]. One day I'll find a way to import the latest links into my template, but until I figure out how, you can just use the feed if you like. :-)
Firefox 5 Minute Challenge
As reported, the Firefox 5 Minute Challenge has just been released. I helped Lachlan a little by creating the stylesheet. I also translated the page into Afrikaans (the screenshots are not of the localised version since that is still work in progress).
So, if you haven't converted all of your friends yet then now is the time!
Retaining Heading Structure Integrity
I find that headings are heavily underused in weblog posts atm, but that is also probably since weblog posts are normally not very large. However, when you write long posts from time to time like me, it's always very handy to be able to use headings to break things up into more manageable parts.
This helps your readers scan your content. Scanning has always been a very important usability issue on the web (I can recall Jakob Nielsen writing about this in Designing Web Usability), and with this information overload we're in right now combined with feed reader technology, scanning is becoming more important than ever.
However, there are a number of practical difficulties (not impossibilities) with using proper hx elements directly in posts. It can often become quite a task to retain structural integrity (that sounds really fancy, doesn't it - Star Trek comes to mind).
This is what got me thinking: When I started this weblog, in the default template the posts got structured in the document first according to date and then according to post title. So I had one h1 element that contained my weblog title, h2 elements containing dates, and h3 elements containing post titles.
If I wanted to use sub-headings inside my posts, then I would logically have had to start with h4 elements in such a setup.
I don't like the way the h1 elements are being used by Blogger, although I haven't had the time to do something about that yet. I also didn't really fancy the date heading thing (for my purposes), so I removed it. Therefore I needed to move up the post headings from h3 to h2 elements. Now, if I had used h4 elements in my posts I would have had to change all of those to h3 elements. Quite pesky, as you can imagine!
In reality, what happened was (and what is still happening is) that I didn't/don't want to go through many of my posts every time I change something in my document structure. Therefore, I shamefully tried to work around not using headings at all by simply leaving them out. And when I couldn't, I used description lists and even <p><strong>My heading here</strong></p> which is nothing short of being totally unforgivable!
But wait, before I start crying out loud for the world to hear my anger and frustration, let me first give you another example: Let's say you want to display a single archive page with all of the posts for one month. So you have one h1 element containing an appropriate heading like "Archive for December 2004". Then you have the various posts for that month with each post title being contained in an h2 element. But now in the item pages for those posts you want the post title to be contained in an h1 element. See the problem?
There are many other cases I can state where hard-coding the level numbers of the document structure with hx elements can be a real pain in the neck. Typically, your only option would be using something like XSLT to transform the elements in question accordingly. However, be careful:
One last boring example. Let's say you want to syndicate a piece of content (containing markup and headings) from somebody else's site (think Bloglines). You can't just insert it directly into your site, since that might break your heading structure. However, you also can't just run it through a simple XSLT stylesheet since you can't assume that you know what the top-level heading of that specific piece of content is. It could be h1, h2, h3 etc and that will depend on how you will need to transform. You can do some cool things with XSLT, so maybe there's a reasonable workaround for this. Otherwise, you'll have to first parse the content by some other means and sniff the heading structure of that piece of content.
As you can see, you could probably make a plan with anything they throw at you. Just don't expect it to be simple!
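Here is a rough sketch of the "sniff the heading structure first" approach, done with PHP's DOM extension instead of XSLT. The function name shiftHeadings is hypothetical, and the sketch assumes the content is a well-formed XML fragment without HTML named entities such as &nbsp;.

```php
<?php
// Hypothetical helper: re-level the h1-h6 headings in an (X)HTML fragment so
// that its highest-level heading becomes <h$targetTop>.
function shiftHeadings(string $fragment, int $targetTop): string
{
    $doc = new DOMDocument();
    // Wrap the fragment in a root element so that it parses as one document.
    $doc->loadXML('<div>' . $fragment . '</div>');

    // Collect the headings first and sniff the top level while doing so.
    $headings = [];
    $top = 6;
    foreach ($doc->getElementsByTagName('*') as $el) {
        if (preg_match('/^h([1-6])$/', $el->nodeName, $m)) {
            $headings[] = [$el, (int) $m[1]];
            $top = min($top, (int) $m[1]);
        }
    }

    $offset = $targetTop - $top;
    foreach ($headings as [$old, $level]) {
        // Clamp to the h1-h6 range that (X)HTML offers.
        $newLevel = max(1, min(6, $level + $offset));
        $new = $doc->createElement('h' . $newLevel);
        foreach ($old->attributes as $attr) {
            $new->setAttribute($attr->nodeName, $attr->nodeValue);
        }
        while ($old->firstChild) {
            $new->appendChild($old->firstChild); // moves the child across
        }
        $old->parentNode->replaceChild($new, $old);
    }

    // Serialise the children of the wrapper, dropping the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

echo shiftHeadings('<h1>Fruit</h1><h2>Apples</h2><p>Apples are delicious fruit!</p>', 3), "\n";
```

The catch is the clamping: if the syndicated content runs deeper than h6 allows after the shift, the lower levels collapse into each other, so even this is not a complete solution.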
XHTML 2.0 tries to solve this problem by using h elements which can be used in combination with section elements.
Take the following simple example:
<h2>Fruit</h2>
<h3>Apples</h3>
<p>Apples are delicious fruit!</p>
<h3>Pears</h3>
<p>Pears are nice, too!</p>
In XHTML 2.0, that document structure could be marked up like this:
<section>
  <h>Fruit</h>
  <section>
    <h>Apples</h>
    <p>Apples are delicious fruit!</p>
  </section>
  <section>
    <h>Pears</h>
    <p>Pears are nice, too!</p>
  </section>
</section>
At first glance, the markup might seem a little more complex, but this actually does solve many problems. Since you're not hard-coding the numbering of the headings, you can insert the above into the body element or any other section element and the document structure will sort itself out automatically! Much easier. For some reason, I also like this because it just feels more logical to me. However, as we all know XHTML 2.0 is not a recommendation yet, and even when it becomes a recommendation we'll probably still have to wait a while in order for it to become well supported. (Note that I use "a while" and not "forever" since I'm not counting in IE.)
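To show how the level follows from the nesting alone, here is a small sketch that turns section/h markup back into numbered headings by counting section ancestors. It uses PHP's DOM extension; the function name numberHeadings and the $baseLevel parameter are made up for this illustration and are not part of any specification.

```php
<?php
// Hypothetical helper: replace each <h> with h1-h6 according to how many
// <section> ancestors it has. An <h> directly inside a single <section>
// gets level $baseLevel.
function numberHeadings(string $xml, int $baseLevel = 1): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    // Copy the node list into an array before replacing any nodes.
    $hs = [];
    foreach ($doc->getElementsByTagName('h') as $h) {
        $hs[] = $h;
    }

    foreach ($hs as $h) {
        // Count the enclosing section elements.
        $depth = 0;
        for ($p = $h->parentNode; $p instanceof DOMElement; $p = $p->parentNode) {
            if ($p->nodeName === 'section') {
                $depth++;
            }
        }
        $level = max(1, min(6, $baseLevel + $depth - 1));
        $new = $doc->createElement('h' . $level);
        while ($h->firstChild) {
            $new->appendChild($h->firstChild); // moves the child across
        }
        $h->parentNode->replaceChild($new, $h);
    }
    return $doc->saveXML($doc->documentElement);
}

echo numberHeadings('<section><h>Fruit</h><section><h>Apples</h></section></section>', 2), "\n";
```

Change $baseLevel and the very same fragment slots into a different place in a page without anyone having to touch the markup itself.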
Comments in Feeds
Most people that read any particular post would like to follow its comments too (I know I do), especially when they are participating in the discussion. But that can become quite a task (or even impossible) when you have a large blogroll.
When you have more than 100 feeds like me (I know, I know!), it's much easier to simply browse and read through all the new posts directly in your feed reader. So, what about comments?
I was very happy when Anne recently announced his new comment feed [bloglines], since that makes life just so much easier. This is fast becoming a necessity (every weblog should have one - come on, Blogger!).
There are only three things that irritate me somewhat about such a comment feed:
- The comments are in order of date, not in order of their respective post. This makes it a little harder to follow a particular discussion, especially on weblogs that post frequently and have separate discussions going on in various posts.
- When you read a post, you need to open up a separate feed to read the comments to that post unless you want to open up that post on the author's site (something which some of us don't want to do). You can't just read the post and immediately read the comments on that post easily.
- You need to subscribe to two separate feeds, which isn't the end of the world but it does make one's blogroll a little more cluttered.
We all know that Atom is very cool, and it seems to be getting cooler by the day. Since Atom is so powerful I couldn't help but think about the possibilities...
What about integrated comment syndication? The comments could be distributed along with the posts in a single feed. For example, you could have an entry element (an item in RSS terms) and inside that you could have multiple comment elements. A smart aggregator could then find a nice way to display these in relation to each other, while preserving the spirit of aggregation by only showing you new comments which you haven't read yet. Wouldn't that be much easier?
Ok, so maybe I'm going a little overboard here. I should probably get out into the real world, smell the cappuccino, eat a fresh blueberry muffin with two strawberry-and-cream pancakes (do I have you hungry yet?), and climb out of my feed reader. However, since many items will have associated comments, and since we even have specialised linkblog markup, what's the harm of adding some more supercool functionality? Just dreaming on...
Copyright © 2004-2008 Charl van Niekerk. All articles are released under the Creative Commons Attribution 2.5 South Africa licence, unless where otherwise stated.


