Unicode as an Illustration of the Meaninglessness of Raw Data

One of the best illustrations of what data means in terms of computation is the character standard maintained by the Unicode Consortium. While one might think of a string of characters as data, in computation this conceptualization is already too abstract. The ideal of the letter “K” has no inherent digital form; it can only be represented by data, a term here meaning a string of bits and bytes that have no intrinsic meaning of their own. The concept of a “K” can be encoded in many different ways. Before Unicode, if I were programming a computer, I could decide that any combination of bits and bytes encoded a representation of a “K.”
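To make that concrete, here is a minimal Python sketch (assuming a standard Python 3 interpreter) that encodes the very same “K” under three real encodings, including an EBCDIC variant, and gets three different byte strings back:

```python
# The same character "K" maps to different bytes depending on the
# encoding chosen -- the bytes themselves carry no inherent "K-ness".
for codec in ("utf-8", "utf-16-be", "cp500"):  # cp500 is an EBCDIC variant
    print(codec, "K".encode(codec))

# utf-8     b'K'       (0x4B)
# utf-16-be b'\x00K'   (0x00 0x4B)
# cp500     b'\xd2'    (EBCDIC puts "K" at an entirely different byte)
```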

The problem with this Wild West approach to encoding is that, without an agreed-upon structure for how data should encode text, computers interacting with each other might decode the same data to mean two separate things. For example, one computer could encode 01001010 as “K,” but another computer might have decided that 01001010 means “🦵🏼,” which could lead to some interesting mixups when the computers send data to each other to be interpreted. It’s a bit uncomfortable to think that the data and the concept it stores are different things, but that’s the beauty of a general-purpose computer storing data.
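The emoji mixup is a hypothetical, but the same thing happens with real encodings. Here is a small Python sketch of the idea: the sender encodes “café” as UTF-8, and a receiver who wrongly assumes Latin-1 decodes the very same bytes into something else.

```python
# Identical bytes, two different interpretations -- classic mojibake.
data = "café".encode("utf-8")      # b'caf\xc3\xa9'

print(data.decode("utf-8"))        # café   -- what the sender meant
print(data.decode("latin-1"))      # cafÃ©  -- what a mismatched receiver sees
```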

Enter Unicode. Instead of different programs and layers using different bit representations for characters that might get jumbled or lost in translation, the Unicode standard assigns every character a single, consistent code point, which encodings such as UTF-8 then map to fixed byte sequences. Unicode includes symbols from many writing systems as well as emoji. Altogether, Unicode currently codes 137,439 different characters.
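In Python terms (again, just a sketch), every character has exactly one code point, and an encoding like UTF-8 turns that code point into a reproducible sequence of bytes that any Unicode-aware computer will read back the same way:

```python
# One code point per character; UTF-8 then fixes the bytes.
for ch in ("K", "字", "🦵"):
    print(ch, f"U+{ord(ch):04X}", ch.encode("utf-8"))

# K  U+004B   b'K'
# 字 U+5B57   b'\xe5\xad\x97'
# 🦵 U+1F9B5  b'\xf0\x9f\xa6\xb5'
```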


Thus, Unicode represents a microcosm of the challenges and solutions presented within data storage. Concepts that are familiar to humans through semiotic knowledge, such as the number 4 or the letter “K,” can be encoded with different data combinations because data inherently has no set meaning, and things get confusing when those different encodings clash. Thankfully, the Unicode Consortium convenes to maintain a universal code of character representation. Now, if only they could make the emoji identical across platforms, all issues of encoding and decoding meaning digitally would be solved. 🤪

References:

Irvine, “Introduction to Data Concepts and Database Systems.”

Tasker, P. (2018, July 17). How Unicode Works: What every developer needs to know about strings and 🦄. Retrieved February 20, 2019, from https://deliciousbrains.com/how-unicode-works/

Unicode and You – BetterExplained. (n.d.). Retrieved February 20, 2019, from https://betterexplained.com/articles/unicode/

Unicode Emoji. (n.d.). Retrieved February 20, 2019, from https://www.unicode.org/emoji/