On verbosity of programming languages

My primary task at work for the last few weeks has been building an open source plugin for IntelliJ IDEA that adds tooling support for Android applications that need to talk to Azure Mobile Services, Azure Notification Hubs and various Office 365 services. Along the way I needed a small string-processing routine. Specifically, given a string, it had to do the following:

Replace all instances of . and _ with a single white space.
Title case each white space delimited word.
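
To pin the requirement down before comparing languages, here is a minimal Python sketch of the two steps as separate functions. The function names and the sample input are made up for illustration, and it assumes that "title case" means upper-casing the first character of each word and leaving the rest alone; some library routines used later in this post also lower-case the remaining characters.

```python
def replace_separators(name):
    """Step 1: turn every '.' and '_' into a single space."""
    return name.replace(".", " ").replace("_", " ")


def title_case_words(text):
    """Step 2: upper-case the first character of each space-delimited word.

    Assumption: the rest of each word is left untouched.
    """
    # word[:1] is an empty string for an empty word, so nothing can fail here.
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


def scrub(name):
    return title_case_words(replace_separators(name))


print(scrub("azure_mobile.services"))  # Azure Mobile Services
```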

Simple enough. I figured it might be an interesting exercise to implement this in the various programming languages I have varying levels of familiarity with. Here goes.

Java

For various reasons we needed to support Java 6 and up for the plugin. I am fairly new to the Java world, so at first it seemed I would have to implement this by hand, until I discovered the immensely useful Google Guava library. With Guava this turns out to be a function that looks like this:

// imports required at the top of the enclosing source file
import com.google.common.base.CharMatcher;
import com.google.common.base.Function;
import com.google.common.base.Joiner;
import com.google.common.base.Splitter;
import com.google.common.collect.Iterables;

private String scrubString(String name) {
  // replace all instances of . and _ with white space
  CharMatcher matcher = CharMatcher.anyOf("._");
  name = matcher.replaceFrom(name, ' ');

  // split the string into a sequence delimited by white space
  Iterable<String> tokens = Splitter.on(' ').split(name);

  // this function upper-cases the first character of a string;
  // empty tokens (e.g. from "..") are returned unchanged
  Function<String, String> makeTitleCase =
      new Function<String, String>() {
        @Override
        public String apply(String str) {
          if (str.isEmpty()) {
            return str;
          }
          return Character.toUpperCase(str.charAt(0)) +
              str.substring(1);
        }
      };

  // transform the tokens into their title-cased counterparts
  Iterable<String> titleCasedTokens = Iterables.transform(
      tokens, makeTitleCase);

  // re-join the title-cased tokens using white space
  return Joiner.on(' ').join(titleCasedTokens);
}

That's, well, verbose. If I wanted a terser version of this, I could do this:

private String scrubString(String name) {
  return Joiner.on(' ').
      join(Iterables.transform(
          Splitter.on(' ').split(
              CharMatcher.anyOf("._").
                  replaceFrom(name, ' ')),
          new Function<String, String>() {
            @Override
            public String apply(String str) {
              // empty tokens (e.g. from "..") pass through unchanged
              return str.isEmpty() ? str :
                  Character.toUpperCase(str.charAt(0)) + str.substring(1);
            }
          }));
}

But that is, of course, far less readable. With Java 8's lambda syntax, however, this can be simplified somewhat:

private String scrubString(String name) {
  return Joiner.on(' ').
      join(Iterables.transform(
          Splitter.on(' ').split(
              CharMatcher.anyOf("._").
                  replaceFrom(name, ' ')),
          str -> str.isEmpty() ? str :
              Character.toUpperCase(str.charAt(0)) + str.substring(1)));
}

Although the only code that changed is the callback that converts each string to its title-cased counterpart, it declutters the code a fair bit.
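
An edge case worth keeping in mind for any split-then-capitalize approach like the ones above: a leading, trailing or doubled separator leaves an empty token after the split, and taking the first character of an empty string fails unless it is guarded (in Java, charAt(0) on an empty string throws StringIndexOutOfBoundsException). The sketch below illustrates the same situation in Python; the helper names are hypothetical and are not part of the plugin.

```python
def title_case_unguarded(word):
    # Mirrors charAt(0) + substring(1): indexing fails on an empty token.
    return word[0].upper() + word[1:]


def title_case_guarded(word):
    # Slicing never raises, so an empty token passes through unchanged.
    return word[:1].upper() + word[1:]


# A doubled separator leaves an empty token in the middle.
tokens = "foo..bar".replace(".", " ").split(" ")
print(tokens)  # ['foo', '', 'bar']

print(" ".join(title_case_guarded(t) for t in tokens))  # Foo  Bar

try:
    title_case_unguarded("")
except IndexError as error:
    print("empty token:", error)
```

Whether such inputs can actually occur depends on where the strings come from, so treat the guard as cheap insurance rather than a required change.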

C#

With C#'s support for LINQ this turns out to be far terser.

private string ScrubString(string str)
{
  return String.Join(" ",
    from p in new Regex (@"[._]").Replace(str, " ").Split(' ')
    select Thread.CurrentThread.
            CurrentCulture.TextInfo.ToTitleCase(p));
}

I wrote that first and then realized that, since we have the ToTitleCase method, splitting and joining the string is overkill. Here's a simpler version:

private string ScrubString (string str)
{
  return Thread.
         CurrentThread.
         CurrentCulture.
         TextInfo.
         ToTitleCase (new Regex (@"[._]").Replace (str, " "));
}

Python

With Python's support for list comprehensions this ends up being even terser than C#.

import re

def string_scrub(name):
  # split() with no arguments splits on runs of white space
  return ' '.join([s.title() for s in
      re.sub('[._]', ' ', name).split()])
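
The same simplification that applied to the C# version seems to apply here as well: str.title() works on a whole string, so the split and join can be dropped. The sketch below shows that variant next to one built on string.capwords() from the standard library. The function names are made up for illustration. Note that both routines lower-case the rest of each word, and that they differ in how they treat runs of white space and apostrophes, so they are not drop-in replacements for one another.

```python
import re
import string


def scrub_title(name):
    # str.title() capitalizes every word in the string, so no split/join
    # is needed; runs of spaces are preserved as they are.
    return re.sub(r"[._]", " ", name).title()


def scrub_capwords(name):
    # string.capwords() splits on white space, capitalizes each word and
    # re-joins with single spaces, collapsing runs of white space.
    return string.capwords(re.sub(r"[._]", " ", name))


print(scrub_title("azure_mobile.services"))     # Azure Mobile Services
print(scrub_capwords("azure_mobile.services"))  # Azure Mobile Services

# The two differ once apostrophes are involved.
print(scrub_title("it's_here"))     # It'S Here
print(scrub_capwords("it's_here"))  # It's Here
```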

C++11

Here's my take on this using C++11 features:

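#include <algorithm>
#include <cctype>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

// forward declarations for the helpers defined below
string title_case(const string& str);
vector<string>& split(const string& str, char delimiter,
    vector<string>& tokens);
string join(const vector<string>& tokens, char delimiter);

string scrub(const string& input) {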
  regex re { "[._]" };
  string str = regex_replace(input, re, " ");

  vector<string> tokens;
  split(str, ' ', tokens);

  transform(tokens.begin(), tokens.end(), tokens.begin(),
      [](const string& s) {
        return title_case(s);
      });

  return join(tokens, ' ');
}

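string title_case(const string& str) {
  // empty tokens (e.g. from "..") have no first character to upper-case
  if (str.empty()) return str;
  char first = static_cast<char>(
      toupper(static_cast<unsigned char>(str[0])));
  return string(1, first) + str.substr(1);
}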

vector<string>& split(
    const string& str,
    char delimiter,
    vector<string>& tokens) {
  string item;
  stringstream ss(str);
  while(getline(ss, item, delimiter))
    tokens.push_back(item);
  return tokens;
}

string join(const vector<string>& tokens, char delimiter ) {
  ostringstream ss;
  bool first = true;
  for_each(tokens.begin(),
      tokens.end(),
      [&ss, &first, &delimiter](const string& s) {
        if(first) {
          first = false;
        } else {
          ss << delimiter;
        }
        ss << s;
      });

  return ss.str();
}

JavaScript (of course!)

Here's the JavaScript version (using ES2015 syntax).

function scrubString(str) {
  return str.
    replace(/[._]/g, ' ').
    split(' ').
    map(s => `${s.charAt(0).toUpperCase()}${s.slice(1)}`).
    join(' ');
}

I really like the nice fluent manner in which we are able to translate the requirements into an implementation in JS.

Common Lisp

It's been a while since I dabbled in Common Lisp, but after some fervent searching here's what I came up with. Note that this uses CL-PPCRE, a regular expression library that is not part of standard Common Lisp but appears to be fairly popular.

(load "~/quicklisp/setup.lisp")

(ql:quickload :cl-ppcre)

(defun scrub-string (str)
  (string-capitalize (cl-ppcre:regex-replace-all "[._]" str " ")))

There. If I had to pick a favorite, I'd say I like the JavaScript version best. The Common Lisp and C# versions aren't too bad either. What do you think? Sound off in the comments!