The problem of ordering Unicode strings in Ruby

Because our company has a substantial presence in Colombia, we often have to handle content written in Spanish in our software products.

A typical problem is sorting a list of people's names when some of them are spelled with characters outside plain ASCII, such as á, ó or ñ. For example:

  • Hernán
  • María
  • Álvaro
  • Andy

For a Spanish speaker, the proper result of sorting something like this:

persons = ['álvaro', 'alberto', 'andy']

is:

['alberto', 'álvaro', 'andy']

But Ruby's standard persons.sort call will return:

['alberto', 'andy', 'álvaro']
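A quick way to see where this ordering comes from is to look at the code point of each name's first character. The snippet below is only an illustrative sketch using core Ruby; the variable names are made up for this example.

```ruby
persons = ['álvaro', 'alberto', 'andy']

# Code point of the first character of each name.
# Plain 'a' is U+0061 (97), while 'á' is U+00E1 (225).
first_code_points = persons.map { |name| [name, name[0].ord] }

# The default sort compares the raw encoded strings, so any name
# starting with 'á' ends up after every name starting with a-z.
default_sorted = persons.sort

first_code_points.each { |name, code| puts "#{name}: #{code}" }
puts default_sorted.inspect
```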

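That happens because of the á character: Ruby's sort compares strings byte by byte, and á (U+00E1) is encoded in UTF-8 with bytes that rank above every unaccented ASCII letter, so 'álvaro' lands after 'andy'. Fortunately, there are several solutions available on GitHub for this problem. My solution is to use the 'sort_alphabetical' gem. It's pretty simple: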

In your Gemfile include:

gem 'sort_alphabetical'

And instead of calling sort, use sort_alphabetical:

persons.sort_alphabetical

And you will have the correct answer:

['alberto', 'álvaro', 'andy']
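If adding a gem is not an option, a rough approximation is possible with the standard library alone (Ruby 2.2 or later). To be clear, this is not how sort_alphabetical is implemented. It is just a sketch that strips accents before comparing, and accent_insensitive_key is a hypothetical helper name. It also treats ñ like n, which is not strictly correct for Spanish, where ñ is a separate letter that comes after n.

```ruby
# Hypothetical helper (not part of the sort_alphabetical gem).
# Builds a sort key that ignores accents, with the original string
# as a tie-breaker so the result is deterministic.
def accent_insensitive_key(name)
  decomposed = name.unicode_normalize(:nfd)          # 'á' becomes 'a' + U+0301
  stripped = decomposed.gsub(/\p{Mn}/, '').downcase  # drop the combining marks
  [stripped, name]
end

persons = ['álvaro', 'alberto', 'andy']
sorted_persons = persons.sort_by { |name| accent_insensitive_key(name) }

puts sorted_persons.inspect
```

For anything beyond simple lists of names, a locale-aware collation library is likely a better fit than this kind of accent stripping.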

Hope this helps :)

Written on August 25, 2014