Charl van Niekerk » Blog

PHP CamelCase Explode 2.0

As an update to yesterday's post, PHP CamelCase Explode, here is a different implementation that uses regular expressions:

<?php

/**
 * Splits up a string into an array similar to the explode() function but according to CamelCase.
 * Uppercase characters are treated as the separator but returned as part of the respective array elements.
 * @author Charl van Niekerk <charlvn@charlvn.za.net>
 * @param string $string The original string
 * @param bool $lower Should the uppercase characters be converted to lowercase in the resulting array?
 * @return array The given string split up into an array according to the case of the individual characters.
 */
function explodeCase($string, $lower = true)
{
  // Split up the string into an array according to the uppercase characters
  $array = preg_split('/([A-Z][^A-Z]*)/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
  
  // Convert all the array elements to lowercase if desired
  if ($lower) {
    $array = array_map('strtolower', $array);
  }
  
  // Return the resulting array
  return $array;
}

?>

This should behave the same as the other algorithm, just with less code.
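
As a quick illustration of what the function returns, here is a minimal usage sketch. The function is repeated so the snippet runs on its own, and the input strings are made up for the example rather than taken from the post.

```php
<?php

// Copy of explodeCase() so this snippet is self-contained
function explodeCase($string, $lower = true)
{
  $array = preg_split('/([A-Z][^A-Z]*)/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
  return $lower ? array_map('strtolower', $array) : $array;
}

// Illustrative inputs (not from the original post)
$lowered = explodeCase('helloWorldFooBar');        // ['hello', 'world', 'foo', 'bar']
$kept    = explodeCase('helloWorldFooBar', false); // ['hello', 'World', 'Foo', 'Bar']
$acronym = explodeCase('AOL', false);              // ['A', 'O', 'L']

print_r($lowered);
```

Note that a leading lowercase segment such as "hello" is kept as its own element, because only the uppercase-led pieces act as delimiters.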

Don't you just "love" the title? 2.0 is the new green.

Oh yes, and when I mention yesterday, I mean before I went to bed. Technically speaking it's still the same day in both South African and UTC terms. There are only two hours of difference anyway.

7 Comments

Comment by Blogger Gerg on Sunday, November 11, 2007 10:53:00 PM

I think the regex should be /([A-Z][^A-Z]+)/, not /([A-Z][^A-Z]*)/. Otherwise it splits acronyms like "AOL".

Comment by Blogger Charl van Niekerk on Monday, November 12, 2007 3:21:00 PM

Hey man, good one, thanks for pointing that out! My intention was actually that acronyms such as AOL should be split up into 3 elements (that's the way I tested). However, I'm sure lots of people would like to use your solution instead.

Comment by Blogger Charl van Niekerk on Monday, November 12, 2007 3:25:00 PM

Oh yes, the reason I didn't design it like that: if two adjacent segments are both all-caps and get joined together, there is no way to know where they should be separated.
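
To make the trade-off discussed in the comments above concrete, the following sketch runs both patterns over the same inputs. The helper name splitWith() and the sample strings are hypothetical, chosen only for illustration.

```php
<?php

// Hypothetical helper: split $string with the given pattern, keeping the matched pieces
function splitWith($pattern, $string)
{
  return preg_split($pattern, $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
}

$star = '/([A-Z][^A-Z]*)/'; // pattern from the post
$plus = '/([A-Z][^A-Z]+)/'; // pattern suggested by Gerg

// A single acronym inside a name: the "+" variant keeps it together
$a = splitWith($star, 'MyAOLAccount'); // ['My', 'A', 'O', 'L', 'Account']
$b = splitWith($plus, 'MyAOLAccount'); // ['My', 'AOL', 'Account']

// Two adjacent acronyms: the "+" variant cannot tell where one ends
$c = splitWith($plus, 'AOLIBM');       // ['AOLIBM']
$d = splitWith($star, 'AOLIBM');       // ['A', 'O', 'L', 'I', 'B', 'M']

echo implode(' ', $b), "\n";
```

Which behaviour is preferable depends on the input: the "*" pattern is predictable letter by letter, while the "+" pattern reads better when acronyms never sit next to each other.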

Comment by Blogger Andreas H. on Monday, June 16, 2008 4:14:00 AM

Here is a more "advanced" version, with the following features:
- give "ABCd" or "AB Cd" or "J E T Word" as the $example_string parameter to tell the function how it should split acronyms and MIXedWords (performance issues are avoided via caching in static variables)
- an additional $glue parameter to implode the array afterwards.

See http://www.bevolunteer.org/trac/wiki/CamelCaseExplode

Example use:

echo "\n".camelCaseExplode('MyXMLParsingEngine', true, 'ABCd', ' ');
echo "\n".camelCaseExplode('MyXMLParsingEngine', true, 'AB Cd', ' ');
echo "\n".camelCaseExplode('MyXMLParsingEngine', true, 'A B Cd', ' ');

gives

My XMLParsing Engine
My XML Parsing Engine
My X M L Parsing Engine

---------------

And here's the code...
(sorry, blogger doesn't like the < pre > tag in comments)

function camelCaseExplode($string, $lowercase = true, $example_string = 'AA Bc', $glue = false)
{
  static $regexp_available = array(
    '/([A-Z][^A-Z]*)/',
    '/([A-Z][^A-Z]+)/',
    '/([A-Z]+[^A-Z]*)/',
  );
  static $regexp_by_example = array();
  if (!isset($regexp_by_example[$example_string])) {
    $example_array = explode(' ', $example_string);
    foreach ($regexp_available as $regexp) {
      if (implode(' ', preg_split(
        $regexp,
        str_replace(' ', '', $example_string),
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
      )) == $example_string) {
        break;
      }
    }
    $regexp_by_example[$example_string] = $regexp;
  }
  $array = preg_split(
    $regexp_by_example[$example_string],
    $string,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
  );
  if ($lowercase) array_map('strtolower', $array);
  return is_string($glue) ? implode($glue, $array) : $array;
}
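
To see how the $example_string lookup in the function above picks a pattern, the selection loop can be pulled out on its own. This is only a sketch: the name pickPattern() is hypothetical, and the caching in static variables is left out for brevity.

```php
<?php

// Hypothetical helper mirroring the selection loop in camelCaseExplode()
function pickPattern($example_string)
{
  $candidates = array(
    '/([A-Z][^A-Z]*)/',  // every uppercase letter starts a new piece
    '/([A-Z][^A-Z]+)/',  // an acronym stays whole, separate from the next word
    '/([A-Z]+[^A-Z]*)/', // an uppercase run is glued to the word that follows it
  );
  $joined = str_replace(' ', '', $example_string);
  foreach ($candidates as $regexp) {
    $pieces = preg_split($regexp, $joined, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    // The first pattern that reproduces the spacing of the example wins
    if (implode(' ', $pieces) == $example_string) {
      return $regexp;
    }
  }
  // Like the original loop, fall back to the last candidate if nothing matches
  return $regexp;
}

echo pickPattern('A B Cd'), "\n"; // /([A-Z][^A-Z]*)/
echo pickPattern('AB Cd'), "\n";  // /([A-Z][^A-Z]+)/
echo pickPattern('ABCd'), "\n";   // /([A-Z]+[^A-Z]*)/
```

The example string therefore works as a small test case: each candidate pattern splits it, and the first one whose result matches the spacing you wrote is used for the real input.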

Comment by Blogger Charl van Niekerk on Monday, June 16, 2008 10:18:00 AM

Cool thanks Andreas!!

Comment by Blogger Stephen on Friday, July 11, 2008 1:05:00 PM

Thanks for the code, this was just what I was looking for.

One thing I noticed...

if ($lowercase) array_map('strtolower', $array);

needs to be:

if ($lowercase) $array = array_map('strtolower', $array);
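
The assignment matters because array_map() returns a new array and leaves its input untouched. A minimal sketch, with made-up sample data:

```php
<?php

$array = array('My', 'XML', 'Parsing', 'Engine'); // sample data for illustration

// Return value discarded: $array is not modified
array_map('strtolower', $array);
$before = implode(' ', $array); // 'My XML Parsing Engine'

// Return value assigned back: $array now holds the lowercased copy
$array = array_map('strtolower', $array);
$after = implode(' ', $array);  // 'my xml parsing engine'

echo $before, "\n", $after, "\n";
```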

Comment by Blogger Charl van Niekerk on Saturday, July 12, 2008 12:46:00 AM

Well spotted Stephen!

Copyright © 2004-2009 Charl van Niekerk. All articles are released under the Creative Commons Attribution 2.5 South Africa licence, unless where otherwise stated.