A Faster Base62 Encoding and Decoding Library in Ruby
Base 62 is a positional numeral system that uses sixty-two distinct characters as its digits. Most implementations use the digits 0-9 to represent values from zero to nine, and the letters a-z and A-Z to represent values from ten to sixty-one.
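As a quick illustration of that mapping, here is a minimal sketch that builds the alphabet and looks up a few values. The constant name `ALPHABET` is only a placeholder for this post, and the digit, lowercase, uppercase ordering is assumed from the description above; other libraries may order the characters differently.

```ruby
# Hypothetical constant: digits first, then lowercase, then uppercase.
ALPHABET = [*('0'..'9'), *('a'..'z'), *('A'..'Z')].join.freeze

ALPHABET.length      # => 62
ALPHABET[9]          # => "9"  (value nine)
ALPHABET[10]         # => "a"  (value ten)
ALPHABET[61]         # => "Z"  (value sixty-one)
ALPHABET.index('K')  # => 46   (reverse lookup: character to value)
```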
The encode method of base62 transforms a decimal number (Base 10) to Base 62, while the decode method reverses the process. For example, the decimal number 3781504209452600 can be encoded to hjNv8tS3K, and hjNv8tS3K can be decoded back to 3781504209452600. Clearly, this is a one-to-one mapping between the two systems.
Base62 encoding schemes are commonly used by URL shortening services. You can treat the decimal number 3781504209452600 as the unique ID of a link or record in your database; encoding it to Base62 yields a URL like http://yourdomain.com/hjNv8tS3K, a shortened link for retrieving the record with that integer ID.
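To make the URL shortening idea concrete, here is a rough sketch of going from a record ID to a short link and back. The helper names `short_url` and `id_from_short_url` are made up for this example, and the conversion uses `Integer#digits` (Ruby 2.4+) as a compact shortcut; it is not necessarily how any particular library does it.

```ruby
require 'uri'

ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.freeze

# Hypothetical helper: turn a non-negative integer ID into a short link.
def short_url(id, host: 'http://yourdomain.com')
  # Integer#digits returns the least significant digit first, so reverse it.
  code = id.digits(ALPHABET.length).reverse.map { |d| ALPHABET[d] }.join
  "#{host}/#{code}"
end

# Hypothetical helper: recover the integer ID from a short link.
def id_from_short_url(url)
  code = URI.parse(url).path.sub(%r{\A/}, '')
  code.each_char.reduce(0) do |acc, char|
    value = ALPHABET.index(char)
    raise ArgumentError, "invalid Base62 character: #{char.inspect}" if value.nil?
    acc * ALPHABET.length + value
  end
end

short_url(3781504209452600)
# => "http://yourdomain.com/hjNv8tS3K"
id_from_short_url('http://yourdomain.com/hjNv8tS3K')
# => 3781504209452600
```

In a real application the decoded ID would then be used to look up the record in the database.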
I’ve written a base62 implementation in Ruby, which you can find on GitHub: https://github.com/steventen/base62-rb. The encoding and decoding algorithms are not hard; I believe you can easily understand them by reading the source code.
There are already several base62 libraries written in Ruby, and all of them in fact use the same mathematical algorithm. However, depending on how the algorithm is implemented and which Ruby language features are used, I found that speed can differ considerably. As it turns out, my implementation can be up to 3 times faster than the others.
I have included each implementation and compared them in the benchmark.
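If you want to run this kind of comparison yourself, Ruby's standard `Benchmark` module is enough. The sketch below compares two illustrative encoding variants that differ only in how the string is built. They are made up for this example and are not the implementations from the actual benchmark; the method names, `SAMPLE_ID`, and `ITERATIONS` are placeholders, and the timings will depend on your machine and Ruby version.

```ruby
require 'benchmark'

ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.freeze

# Variant A: append to one mutable string, then reverse once at the end.
def encode_append(num)
  return '0' if num.zero?
  out = +''
  while num > 0
    num, rem = num.divmod(62)
    out << ALPHABET[rem]
  end
  out.reverse
end

# Variant B: allocate a new string on every iteration by prepending.
def encode_prepend(num)
  return '0' if num.zero?
  out = ''
  while num > 0
    num, rem = num.divmod(62)
    out = ALPHABET[rem] + out
  end
  out
end

SAMPLE_ID = 3781504209452600
ITERATIONS = 50_000

# Prints user, system, and real time for each variant.
Benchmark.bm(8) do |x|
  x.report('append')  { ITERATIONS.times { encode_append(SAMPLE_ID) } }
  x.report('prepend') { ITERATIONS.times { encode_prepend(SAMPLE_ID) } }
end
```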
Encoding
The encoding part mainly consists of a loop, string concatenation, and division. Benchmarks of the different implementations are described here.
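Those three ingredients can be sketched as follows. This is a minimal illustration rather than the library's actual source: the names `ALPHABET`, `BASE`, and `base62_encode` are placeholders, and the alphabet ordering (digits, lowercase, uppercase) is assumed from the introduction.

```ruby
ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.freeze
BASE = ALPHABET.length # 62

# Hypothetical method name; accepts non-negative integers only.
def base62_encode(num)
  unless num.is_a?(Integer) && num >= 0
    raise ArgumentError, 'expected a non-negative Integer'
  end
  return ALPHABET[0] if num.zero?

  encoded = +''
  while num > 0
    # Division: the remainder is the next (least significant) Base62 digit.
    num, remainder = num.divmod(BASE)
    # Concatenation: digits are collected in reverse order.
    encoded << ALPHABET[remainder]
  end
  encoded.reverse
end

base62_encode(3781504209452600) # => "hjNv8tS3K"
base62_encode(0)                # => "0"
```

Appending and reversing once at the end is just one way to build the result; prepending each character inside the loop yields the same string.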
Decoding
The decoding method consists of a loop, exponentiation, index lookup, and addition. Benchmarks of the various implementations are shown here.
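A straightforward way to combine those operations is sketched below. Again, this is an illustration under assumptions, not the library's code: `ALPHABET`, `BASE`, and `base62_decode` are placeholder names, and the alphabet ordering is the digits, lowercase, uppercase order from the introduction.

```ruby
ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.freeze
BASE = ALPHABET.length # 62

# Hypothetical method name; raises on characters outside the alphabet.
def base62_decode(str)
  total = 0
  # Walk from the least significant character, so the position is the power.
  str.reverse.each_char.with_index do |char, power|
    value = ALPHABET.index(char) # index lookup: character to digit value
    raise ArgumentError, "invalid Base62 character: #{char.inspect}" if value.nil?
    total += value * (BASE**power) # exponentiation and addition
  end
  total
end

base62_decode('hjNv8tS3K') # => 3781504209452600
base62_decode('0')         # => 0
```

`String#index` scans the alphabet on every lookup; a precomputed Hash from character to value is one possible alternative, which is the kind of implementation detail a benchmark can help evaluate.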
Lesson learned: as a software engineer, understanding and mastering the tools you use (e.g. the programming language) is really important.