Geek: the evolution of character codes...

For an aging geek like me, this paper was a trip down memory lane. In the '70s, while writing software for Univac computers that were part of the U.S. Navy's NTDS system, I often wrote programs to convert textual data from one character encoding to another. This was a common problem, as there was no “one standard to rule them all” the way there is today with Unicode. Instead, we used a mix of different character encodings, and if we wanted one system or program to communicate with another, we had to write a conversion program to do it.
The character encodings that I worked with included several widely used ones: Baudot, FIELDATA, ASCII, and EBCDIC, all discussed in the linked paper. We also used some special-purpose, typically application-specific encodings that were basically primitive compression schemes – these were especially common in what we'd call log data today. For instance, one system I worked on kept a log (on magnetic tape!) of all the targets we had identified and tracked. Space on that tape was at a premium, so we used many simple tricks to conserve characters. One that I recall: in an ASCII character stream (7 bits per character), we had a special “numbers-only” mode that was initiated by a control character. Once in that mode, codes from 0x00 to 0x63 represented decimal digit pairs (00 to 99), and 0x64 dropped us out of that mode. This was useful because a high percentage of a log consisted of numbers – so why “waste” an entire ASCII character for just one digit? If you had a number with 8 or more sequential digits (and we had many of these), this encoding would save bits.
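Something like the following Python sketch captures the idea, for the curious. I won't swear to the actual control code we used to enter number mode, so the 0x0E below is just a placeholder, and zero-padding odd-length digit runs is simply one plausible way of handling them:

```python
# Sketch of the digit-pair "numbers-only" mode described above.
# NUM_MODE is a made-up placeholder; the real control code isn't recorded here.
NUM_MODE = 0x0E   # hypothetical "enter numbers-only mode" control character
NUM_END  = 0x64   # exits numbers-only mode

def pack_digits(digits: str) -> list[int]:
    """Pack a run of decimal digits into digit-pair codes (0x00-0x63)."""
    if len(digits) % 2:                   # pad odd-length runs with a leading zero
        digits = "0" + digits
    out = [NUM_MODE]
    for i in range(0, len(digits), 2):
        out.append(int(digits[i:i+2]))    # e.g. "47" -> 47 (0x2F)
    out.append(NUM_END)
    return out

def unpack_digits(codes: list[int]) -> str:
    """Reverse the packing, ignoring the mode-shift codes."""
    return "".join(f"{c:02d}" for c in codes if c not in (NUM_MODE, NUM_END))

if __name__ == "__main__":
    run = "19721103"                      # 8 digits
    packed = pack_digits(run)
    # Plain ASCII: 8 characters. Packed: 4 pair codes + 2 mode shifts = 6 characters.
    print(packed, unpack_digits(packed))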
What a different world with Unicode today!