home

also visit: Theatre of Noise | Soundings

about: this site | me

subscribe: RSS

08 August 2007

Friendly Readable ID Strings

One often needs unique ID strings to tag processes, threads, files or anything else that might be created in quantity. Traditionally these are created based on PIDs or similar system values. But it is not easy to visually recognise these strings, which makes debugging more difficult than it need be. This recipe creates readable and pronounceable ID strings that are much easier to work with.

I have submitted this to the Python Cookbook but thought I would write it up here as well. The code is freely available under the Python license.

The inspiration for this recipe is the fact that I hate debugging systems by trying to track IDs that look like "kSbT73oQ". It's much easier on the brain to scan strings like "levokosi". These are also pronounceable, so one can communicate with other members on the development team. "Hey dude, what's up with file sadofabi -- it appears to be locked."

I many cases one does not need billions of IDs, so restricting the strings to alternating consonants and vowels, and a reasonable length, gives us plenty of scope. The function GetFriendlyID() generates these of length 8, using the built-in Python pseudo-random generator. If this is not good enough, increase the length or use a more random module. But before you do so, realise that the algorithm here allows over 40 million unique strings.

(Of course it is sometimes useful to have ID strings based on PIDs or whatever, so they can be tied back to the originating process. In those cases this recipe is not applicable.)

The GetUniqueFriendlyID() function sketches out a typical implementation, where we want to ensure the generated IDs are, in fact, unique.

The module-level code does a simple test so we can inspect some resulting strings. Baby name generator, anyone?

from random import choice

def GetFriendlyID():
"""
Create an ID string we can recognise.
(Think Italian or Japanese or Native American.)
"""
v = 'aeiou'
c = 'bdfghklmnprstvw'

return ''.join([choice(v if i%2 else c) for i in range(8)])

def GetUniqueFriendlyID(used_ids):
"""
Return an ID that is not in our list of used IDs.
"""
# trying infinitely is a bad idea
LIMIT = 1000

count = 0
while count < LIMIT:
id = GetFriendlyID()
if id not in used_ids:
break
count += 1
id = ''
return id

if __name__ == '__main__':
from sets import Set

print 'some sample unique IDs:'
used_ids = Set()
for i in xrange(50):
id = GetUniqueFriendlyID(used_ids)
if not id:
print 'something broke'
break
used_ids.add(id)
print id


Edit 10 August: Fixed resetting ID string in GetUniqueFriendlyID(). Fixed import. Changed implementation example to use a set to store IDs.

Edit 8 August: I've simplified the algorithm into what is basically a one-liner. This version requires Python 2.5 as it uses conditional expressions.

16 comment(s) follow:

  Anonymous Anonymous wrote at 08 August, 2007 17:00...

Hey, nice idea.
It seems easier without the even() function though:

for i in xrange(1, length+1):
    if i % 2:
        char = random.choice(consonant)
    else:
        char = random.choice(vowel)

  Blogger robin wrote at 08 August, 2007 17:34...

Or even this one-liner:

from random import choice

return ''.join([choice('aeiou' if i%2 else 'bcdfghklmnprstvw') for i in range(8)])

  Anonymous Anonymous wrote at 08 August, 2007 20:36...

Dude, that is a terrifically creative idea.

  Blogger Chuck wrote at 09 August, 2007 20:17...

I've seen this long ago with password generators, and it's for the same reason: it's much easier to remember something that looks like a word. The one I remember from LambdaMOO also took care to avoid some of the obvious combinations that looked obscene, though it's impossible to get all of them. Imagine one of your users saying "process sukapenis is hung"? (huh huh..huh)

  Blogger robin wrote at 09 August, 2007 22:03...

Ah yes, now I think about it there are of course password generators that use a dictionary to grab word combinations. I remember taking great delight in reading all the "free" AOL disks that arrived in my mail for just this reason... found poetry.

I don't think one need worry too much about obscene combinations since the chance of getting them would be small. But then again: monkeys, typewriters, Hamlet, etc.

  Anonymous Anonymous wrote at 09 August, 2007 22:33...

Instead of keeping a list of generated names I would append a unique number to the end of the generated word. Even a thread safe accumulator would guarentee uniqueness across the life time of the process.

  Anonymous Antti Rasinen wrote at 10 August, 2007 05:31...

You can also precompute a list of character combinations:

alternating = [consonants, wovels]

And then use map:

"".join(map(choice, alternating))

Of course, with this method you can also build several variants of acceptable consonant-wovel combinations, such as vccvcvcc (abbarocks) or allow capital letters (at least for the first letter) and so on.

You can also add these in a list and then pick one randomly from those.

I would not use a list for storing the used IDs however. If the list is large and the search fails, that's a LIMIT * len(list) operations. A set would be more applicable. Or perhaps even a binary search tree =)

  Anonymous Peter wrote at 10 August, 2007 06:18...

original:
return ''.join([choice(v if i%2 else c) for i in range(8)])


fixed reference:
return ''.join([random.choice(v if i%2 else c) for i in range(8)])


alternatively change import random:
from random import choice

  Anonymous Denis wrote at 10 August, 2007 08:46...

Its great, but useless without smart word filter -- some people dont want to say "My case ID is sadomazo, no, not sadomaso but with 'Z' instead of last 'S'. Yes, sadomaZZZo" by phone on their workplace.
;)

  Blogger Martijn Pieters wrote at 10 August, 2007 09:38...

py 2.4 version:

return ''.join([choice(i%2 and v or c) for i in range(8)])

  Anonymous Ralph Corderoy wrote at 10 August, 2007 13:51...

The unique version returns a non-unique id after LIMIT attempts; shouldn't it do something else? Returning a non-unique id may cause huge problems later.

Perhaps a better way would be to use the numbers from a PRNG and map those onto pronouncable words, discarding unpronouncable ones. That way, you know the period of the PRNG is large, that you're discarding only 66% of them, and you only have to remember the current seed to resume where you left off, not a list of all previously returned ones.

Cheers, Ralph.

  Blogger robin wrote at 10 August, 2007 15:04...

As a couple of readers have suggested, a set is more appropriate to store larger lists of IDs. This example was not really designed for that.

Peter: thanks, I'd forgotten to change the import when editing.

Denis: I hope this recipe is not "useless without smart word filter". Your example will not be a problem since the letter "z" is not allowed. Likewise any other redundant pronunciation problems are hopefully avoided. And without needing the overhead of a dictionary of words.

The biggest problem with pronouncing words is the vowels. In some English accents vowels essentially have the same sound. You will have to judge how important this is for your application.

Ralph: there was a line missing from the code that resets the ID to a null string. Otherwise the check in the parent wouldn't do much good. Nice catch!

  Blogger robin wrote at 10 August, 2007 15:21...

Have changed the implementation to use a set, since this is more appropriate as several readers pointed out.

This is recipe 526619 in the online Cookbook.

  Blogger David wrote at 15 August, 2007 17:47...

Very nice even if one of my first generated values was diketude! Try explaining that one to the QA staff ;-)

  Blogger Ants Aasma wrote at 16 August, 2007 21:52...

You might want to check out Yould. Maybe a bit too overkill for given purposes, but I'm using it as a great way to automatically create strong easily memorizable passwords. Had to extend it with arbitrary length markov chains to support my own language - we routinely have three consonants in a row and the default chain length generated a lot of unpronouncable words with four or more consonants in a row.

  OpenID eswald wrote at 13 July, 2009 21:06...

Sounds similar to the systems used for artifacts and secret web pages in To Boldly Go, in which a number is deterministically transformed into such a set of syllables. In at least one case, the number equivalence is important; it seems to be used as a bitfield. In other cases, it would make storage cheaper. If you already have the number, it would also eliminate the need to store previously-used words.

Post a Comment or go back to home