Data, as a layer above information, is something given. To be interpretable, “data” must be something that can be named, classified, sorted, and given logical predicates or labels (attributes, qualities, properties, and relations) (Irvine, 2021). And since data must be representable in a form that works across many platforms and devices, it has to be transformable from a string of raw bits into a comprehensible human symbolic sign. The problem arose decades ago, when scientists first wanted to share data with one another. Put simply, the architects pre-scripted the encoding scheme while building each computer, so the device could “work on its own” to no small degree to process the data; the scientists simply needed to decode the results based on their codebook. However, because each computer was designed around a different codebook, sharing and exchanging data became a problem. It required technologists to master many incompatible data formats and blocked connections between devices (Instructions & Programs: Crash Course Computer Science #8, 2017, 03:15–05:21).
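The codebook mismatch can be sketched in Python (my own illustrative example, not drawn from the sources): the very same two bytes decode to entirely different characters depending on which legacy encoding, i.e., which “codebook,” a machine assumes.

```python
# The same raw bytes mean different things under different "codebooks".
# (Illustrative sketch; the byte values are chosen for this example.)
raw = b"\xd6\xd0"

# A machine assuming the Chinese GBK encoding reads one Chinese character:
print(raw.decode("gbk"))      # 中

# A machine assuming Latin-1 reads two unrelated Latin letters:
print(raw.decode("latin-1"))  # ÖÐ
```

Before Unicode, exchanging text between two such machines required knowing, and converting between, both codebooks.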
Unicode was designed to solve these problems. As Professor Irvine writes: “Unicode is the data ‘glue’ for representing the written characters (data type: ‘string’) of any language by specifying a code range for a language family and standard bytecode definitions for each character in the language” (Irvine, 2021). In my understanding, the text we see is not actually stored word by word; each character is a picture composed of smaller pictures. Take a Chinese character as an example:
尛 = 小 + 小 + 小
Each part of the square is built up from a sequence of binary code (the character’s code point), rendered in the selected font. After processing by hardware and software, we see the final combination of all three component images generating one complete character, pixel by pixel. Unicode provides interpretable data that is accessible to any electronic device, and that is why we need it.
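This mapping from character to code point to bytes can be inspected directly in Python (a small hedged sketch; the values printed are simply what Python’s built-in `ord` and UTF-8 encoder report):

```python
# Show the Unicode code point and UTF-8 byte sequence for each character.
for ch in "尛小":
    code_point = ord(ch)             # the character's Unicode code point
    utf8_bytes = ch.encode("utf-8")  # its byte representation in UTF-8
    print(f"{ch}  U+{code_point:04X}  bytes: {utf8_bytes.hex(' ')}")
```

Both characters have distinct code points and three-byte UTF-8 sequences; the font, not Unicode, decides how the glyph is drawn pixel by pixel.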
Many devices had the problem that they could only process a single character-encoding system, typically one built around Latin characters rather than multiple language systems, simply because Latin scripts share a common basic character set. To solve this problem, developers use Unicode strings, which allow devices to encode and decode any language in the same standard way and thus communicate with each other. Using Unicode (in encodings such as UTF-8) also saves storage and makes it easier for software to process text locally.
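The storage point can be made concrete (again a hedged sketch of my own): UTF-8 is a variable-length encoding, so Latin characters take one byte each while most Chinese characters take three, and a text only occupies the bytes it actually needs.

```python
# Compare character counts with UTF-8 byte counts.
latin = "hello"
chinese = "你好"
print(len(latin), len(latin.encode("utf-8")))      # 5 characters, 5 bytes
print(len(chinese), len(chinese.encode("utf-8")))  # 2 characters, 6 bytes
```

This is one reason UTF-8 became the dominant Unicode encoding: mostly-Latin text costs no more than it did under the old one-byte codebooks, while every other script remains representable.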
Irvine, Martin. (2021). “Introduction to Data Concepts and Database Systems.”
Instructions & Programs: Crash Course Computer Science #8. (2017, April 12). [Video]. YouTube. https://www.youtube.com/watch?v=zltgXvg6r3k&list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo&index=9