On one of our e-commerce web sites, we needed a unique transaction ID to pass to a third-party reporting tool on the checkout pages. We already had a GUID on the page for internal use. And you know how much we love GUIDs!
"ASCII values 0-255" eh? Repeat after me: ASCII is a 7-bit encoding. ASCII is a 7-bit encoding.
Indeed, the ASCII spec only defines printable characters for values 32-126 - these 95 values are the only printable ASCII characters. Everything else in the 0-127 range is a control character, and anything above 127 isn't ASCII.
However, given 20 printable ASCII characters, that is still about 131.397 bits of information. So you've still got over 3 bits to spare after your 128-bit GUID! Just enough space to store a fairly small number.
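The 131.397-bit figure is easy to check. Here is a minimal Python sketch, assuming the alphabet is the 95 printable ASCII characters and the GUID is 128 bits; the variable names are just for illustration.

```python
import math

ALPHABET_SIZE = 95   # printable ASCII, values 32-126
CHARS = 20           # characters available for the ID
GUID_BITS = 128

# Each character carries log2(95) bits of information.
total_bits = CHARS * math.log2(ALPHABET_SIZE)
spare_bits = total_bits - GUID_BITS

# Exact integer check: how many complete GUID spaces fit in 95**20?
extra_values = ALPHABET_SIZE ** CHARS // 2 ** GUID_BITS

print(f"{total_bits:.3f} bits total, {spare_bits:.3f} bits to spare")
print(f"room for {extra_values} distinct extra values alongside the GUID")
```

So the "fairly small number" you could store next to the GUID really is small: roughly ten distinct values.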
Ian, thanks for the clarification as always. Just when I thought I was a computer "scientist"...
Jon, I'm waiting for the inevitable BASE95 or ASCII95. There must be some good reason that Adobe chose to use just 85 of the 95 possible printable ASCII characters, but I can't think of what that could be right now...
Jon, I'm waiting for the inevitable BASE95 or ASCII95. There must be
some good reason that Adobe chose to use just 85 of the 95 possible
printable ASCII characters, but I can't think of what that could be
right now...
Seems to me that the need for certain special case characters would preclude them from using the full 95 characters. I know from reading the wiki entry you linked that at least >, <, and z are generally off limits.
Am I off base?
P.S. This is my first comment here. I love your blog though - truly a pleasure to read. It has been a staple of my Google homepage for quite some time.
Base64 may be less efficient in terms of space, but it's far more efficient in terms of speed. Division is a slow operation (it becomes noticeable when done frequently), and Ascii85 needs it where Base64 gets by with shifts and masks. Obviously, when transmitting data over the network, the extra space could be relevant... until you remember that if you have such a restriction, it means either you're using XML (which is both slow and verbose) or you're dealing with an old, old legacy system (which is slow, and you have no say in its verbosity).
I work with handheld development... and both space and speed are frequent issues. However, in such a situation, I'd stick to base64 or maybe even use hex encoding.
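To put numbers on the space side of that trade-off, here is a rough sketch using Python's standard `base64` and `uuid` modules. The GUID value is an arbitrary example, and this only compares encoded lengths; it says nothing about speed.

```python
import base64
import uuid

# Arbitrary example GUID, not a value from the original post.
guid = uuid.UUID("12345678-1234-5678-1234-567812345678")
raw = guid.bytes  # 16 bytes = 128 bits

hex_text = guid.hex                                  # 4 bits per character
b64_text = base64.b64encode(raw).decode("ascii")     # 6 bits per character, '=' padded
a85_text = base64.a85encode(raw).decode("ascii")     # 4 bytes -> 5 characters

for name, text in (("hex", hex_text), ("base64", b64_text), ("ascii85", a85_text)):
    print(f"{name:8} {len(text):2} chars  {text}")

# Round trip to confirm nothing is lost in the shortest form.
decoded = base64.a85decode(a85_text)
```

For this GUID the three encodings come out at 32, 24 and 20 characters, so Ascii85 is the only one of them that fits a 20-character field.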
A very simple solution to this problem is to use a base 32 rather than base 16 representation, which cuts the length down from 32 characters to 26 characters. Hexadecimal encoding uses chars [0-9A-F] for an alphabet size of 16 to represent 4-bit runs of a binary stream.
You can represent 5-bit runs of the same binary stream using a 32-character alphabet such as [A-Z2-7]. This shortens the ASCII representation by about a fifth. It would be fairly straightforward to implement such an encoding: use the same logic you would to convert to hexadecimal, but substitute the larger alphabet and cut the stream into 5-bit rather than 4-bit chunks.
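As a sanity check on the lengths involved, here is a small sketch using the base 32 encoder in Python's standard `base64` module (the RFC 4648 alphabet, A-Z plus 2-7). The GUID is an arbitrary example, and stripping the `=` padding is an assumption about what the receiving system would accept.

```python
import base64
import uuid

# Arbitrary example GUID, used only for illustration.
guid = uuid.UUID("12345678-1234-5678-1234-567812345678")

hex_text = guid.hex.upper()  # base 16: 4 bits per character
b32_text = base64.b32encode(guid.bytes).decode("ascii").rstrip("=")  # base 32: 5 bits per character

print(len(hex_text), hex_text)
print(len(b32_text), b32_text)

# Decoding needs the padding restored to a multiple of 8 characters.
padded = b32_text + "=" * (-len(b32_text) % 8)
restored = uuid.UUID(bytes=base64.b32decode(padded))
```

A 128-bit GUID takes 32 hexadecimal characters and 26 base 32 characters, since 128 / 5 rounds up to 26.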
The reason there's no ascii95 is that 85^5 (5 characters of encoded string) is only a little bigger than 256^4 (4 bytes of unencoded string), which makes it a particularly effective encoding for blocks of such a small size. In fact, you have to get up to 9 bytes before ascii95 would be an improvement on ascii85 (9 unencoded bytes -> 11 ascii95, but 12 ascii85). By that point, the numbers are too large to deal with in a single machine word - we're barely at 64-bit computers, let alone 72 bits!
Since ascii95 wouldn't really be a feasible improvement on ascii85, we may as well just use 85 and then have those 10 characters for whatever we want, like "z" and "~".
(Plus we don't want to use space, so it would really only be ascii94. That doesn't change the math, though.)
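The block-size arithmetic in this comment can be checked with exact integers. The sketch below treats each block as one big number and asks how many characters of a given alphabet size are needed to cover it; `chars_needed` is a hypothetical helper, not part of any Ascii85 library.

```python
def chars_needed(n_bytes: int, base: int) -> int:
    """Smallest k such that base**k >= 256**n_bytes, using exact integer math."""
    limit = 256 ** n_bytes
    k, capacity = 0, 1
    while capacity < limit:
        capacity *= base
        k += 1
    return k

# 85**5 only just covers 256**4, which is why 4 bytes -> 5 characters works so well.
headroom = 85 ** 5 / 256 ** 4

print(f"85^5 / 256^4 = {headroom:.4f}")
print("bytes  base85  base94  base95")
for n in (4, 9, 21):
    print(f"{n:5}  {chars_needed(n, 85):6}  {chars_needed(n, 94):6}  {chars_needed(n, 95):6}")
```

Dropping the space character (base 94) gives the same character counts as base 95 for these block sizes, which matches the remark that it doesn't change the math.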