Charl van Niekerk » Blog

W3C Validator on CentOS 5.2

About a week ago I decided to set up my own copies of the W3C Validator and Validator.nu. The latter is mainly for HTML 5 validation. The W3C Validator can now validate HTML 5 through Validator.nu's REST API, so to validate HTML 5 entirely locally you need to run your own copy of Validator.nu as well.

I set both of these up on a CentOS 5.2 box with Perl v5.8.8 built for i386-linux-thread-multi (as reported by perl -v). Due to the limited number of yum packages available, I had to install various Perl modules manually to get the W3C Validator working.

At the end of the day, I had to hack the check script anyway because of an issue with my XML/DOM library: the getElementsByTagName lookup did not work for me, so I replaced it with an XPath query.

cvs diff: Diffing .
Index: check
===================================================================
RCS file: /sources/public/validator/httpd/cgi-bin/check,v
retrieving revision 1.606
diff -u -r1.606 check
--- check 14 Nov 2008 16:22:51 -0000 1.606
+++ check 22 Nov 2008 06:41:43 -0000
@@ -872,8 +872,8 @@
       $File->{Templates}->{Error}->param(fatal_missing_checker  => "HTML5 Validator");   
       return $File;
     }     
-    my @nodelist = $xmlDOM->getElementsByTagName("messages");
-    my $messages_node = $nodelist[0];
+    my $nodelist = $xmlDOM->findnodes('/*');
+    my $messages_node = $nodelist->get_node(1);
     my @message_nodes =  $messages_node->childNodes;  
     foreach my $message_node (@message_nodes) {
       my $message_type = $message_node->localname;
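
One plausible explanation for why the wildcard path '/*' finds the root element when a lookup by name does not is XML namespaces: in XPath 1.0 an unprefixed name such as messages only matches elements in no namespace, whereas * matches any element. The sketch below is a rough illustration under that assumption, not part of the check script. The namespace URI and the sample XML are assumed for the example, and XML::LibXML::XPathContext is used to register a prefix (m, chosen arbitrarily) for the namespace.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Sample response; the default namespace on the root element is an
# assumption made for this sketch.
my $xml = <<'XML';
<messages xmlns="http://n.validator.nu/messages/">
  <error><message>Example error</message></error>
  <info><message>Example info</message></info>
</messages>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# An unprefixed name only matches elements in no namespace,
# so this list is empty.
my @by_name = $doc->findnodes('/messages');

# The wildcard matches the root element regardless of its namespace.
my @wildcard = $doc->findnodes('/*');

# Register a prefix for the namespace and select the message
# elements directly, without hard-coding "first element in the document".
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('m', 'http://n.validator.nu/messages/');
my @message_nodes = $xpc->findnodes('/m:messages/*');

printf "by name: %d, wildcard: %d, namespaced children: %d\n",
    scalar @by_name, scalar @wildcard, scalar @message_nodes;
print $_->localname, "\n" for @message_nodes;
```

Unlike childNodes, the path /m:messages/* returns only element children, so whitespace text nodes between the messages are skipped.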

Eventually, I got it running though. Validator.nu was also a bit tricky. I had some issues with Java but eventually got that running too. (Please note that those two sites are purely temporary and can be deactivated at any time.)

6 Comments

Comment by OpenID olivier on Monday, December 01, 2008 1:57:00 PM

Ah, very interesting. I think the issue is that $xmlDOM->getElementsByTagName() is only supported by versions of libxml2 more recent than what is bundled with a lot of Linux distributions.

Your patch is nice, but it does assume that the output format of the HTML5 validator will not change, which I think is a bit dangerous. But for now, it does work.

Maybe you could follow up on the "installation notes" of http://lists.w3.org/Archives/Public/www-validator/2008Nov/0025.html ?

Comment by Blogger Charl van Niekerk on Tuesday, December 02, 2008 2:53:00 AM

Hi Olivier,

Many thanks for your comment. Although at the time I was mainly focusing on just getting it working, it should be possible to modify that XPath to be the equivalent of your code. I'll take a look at it and post another patch if I can.

Best Regards,
Charl

Comment by Blogger Charl van Niekerk on Tuesday, December 02, 2008 3:05:00 AM

Ok, that's really strange...

"/*" works, but neither "/" nor "/messages" does. Perhaps my XPath knowledge is somewhat lacking though.

Comment by OpenID olivier on Monday, December 22, 2008 9:04:00 PM

Hi Charl,

I wonder if we could skip the “find the child nodes of the node we just selected” step, since the XPath selection method you found would allow us to simply select all the message nodes, something like:

- my @nodelist = $xmlDOM->getElementsByTagName("messages");
- my $messages_node = $nodelist[0];
- my @message_nodes = $messages_node->childNodes;
+ my @message_nodes = $xmlDOM->findnodes('messages');

Not tested, and I'm not much of an XPath guru myself… but it could probably work. Want to give it a go?

Comment by Blogger Charl van Niekerk on Saturday, January 03, 2009 10:59:00 PM

Tested, but unfortunately that doesn't seem to work either. Perl doesn't throw a fit or anything, but if there's an error it isn't reported and the validator says the document is valid.

Comment by Blogger Charl van Niekerk on Saturday, January 03, 2009 11:00:00 PM

Just tried a few other things, but it seems like the XPath implementation is severely incomplete or broken.


Copyright © 2004-2009 Charl van Niekerk. All articles are released under the Creative Commons Attribution 2.5 South Africa licence, unless where otherwise stated.