In digital computing discourse, the term “data” carries a different meaning than it does in everyday contexts, just as “information” does. According to Professor Irvine, there is “no data without representations.” Whatever its type, data must be interpretable at some software layer, processable by the computing system, and storable as files in memory. That is, “any form of data representation must be computable; anything computable must be represented as a type of data.” This concept is also connected to information theory. As we learned last week, information in the digital computing context is a physical concept: it is encoded in electronic signals that can be transmitted through a channel but cannot be observed directly. In this sense, “data” is closer to meta-information (that is, information in the generic-information sense).
This helps explain why “data” sits one level above “information.” E-information, at its own level, structures “the code for data at next level up and code for operations, interpretations, and transformation of, or over, the representations.” In other words, information plays a functional role at the level of the physical computer system, as strings of binary code, so that data at the next level up can be interpreted by humans (as representations) and computed by the machine.
All formats, including text (like TXT) and images (like JPEG), are the same in this respect. They are “long lists of numbers, stored as binary on a storage device,” and they are encoded as “digital data.” In text formats, characters are assigned Unicode code points, which are then stored using a character encoding, so that words (or visual symbols) in different languages can be represented on the computer. Take emojis as an example. In the emoji system, a basic emoji has its own code point, and it can be combined with another code point to form a new emoji. For example, the code point of “👶” (baby) is U+1F476 (it is converted into binary so that the computer can process it). It can be combined with a skin tone modifier such as U+1F3FB, which gives a baby with light skin tone: “👶🏻” (U+1F476 followed by U+1F3FB).
Other formats, like images, work in a similar way. Images are made of pixels, and the color of each pixel is a combination of three colors: red, green, and blue. “An image format starts with metadata (key values for image), such as image width, image height, and image color.” The color of each pixel is divided into three parts, red, green, and blue, and each part is typically stored in 8 bits (1 byte), so its value ranges from 0 to 255. For example, (0, 0, 0) is black, because it has zero intensity of red, green, and blue, while (255, 255, 255), the largest value on each color, is white. Given the values for every pixel, together with the width and height from the metadata, the computer can reconstruct the whole image. In the process, the values for each pixel are also stored as binary so that the computer can process them.
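To make the two examples above more concrete, here is a minimal Python sketch. It is only an illustration under some assumptions: the helper names `code_points`, `to_bits`, and `pixel_to_bits` are made up for this post, UTF-8 is assumed as the character encoding (other encodings would give different bytes), and each color channel is assumed to take exactly one byte.

```python
# Text: emoji are code points, and code points become bytes.
BABY = chr(0x1F476)        # code point U+1F476 (baby)
LIGHT_SKIN = chr(0x1F3FB)  # code point U+1F3FB (light skin tone modifier)
baby_light = BABY + LIGHT_SKIN  # two code points, shown as one emoji


def code_points(text):
    """Return the code point of each character as an uppercase hex string."""
    return [f"{ord(ch):X}" for ch in text]


def to_bits(data):
    """Render a bytes object as groups of 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in data)


def pixel_to_bits(red, green, blue):
    """Render one RGB pixel (each channel 0-255) as three 8-bit groups."""
    return to_bits(bytes((red, green, blue)))


print(code_points(baby_light))       # ['1F476', '1F3FB']
print(to_bits(BABY.encode("utf-8")))  # 11110000 10011111 10010001 10110110

# Images: each pixel is three numbers between 0 and 255.
print(pixel_to_bits(0, 0, 0))        # black: 00000000 00000000 00000000
print(pixel_to_bits(255, 255, 255))  # white: 11111111 11111111 11111111
```

In both cases the computer ends up with the same kind of thing, a list of bytes. What makes those bytes an emoji or a pixel is the interpretation that the software layer applies to them.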
References
Irvine, M. Introduction to Computer System Design, 2020.
Kelleher, J. D., and B. Tierney. Data Science. MIT Press, 2018.
Irvine, M. Introduction to Data Concepts and Database Systems, 2021.
“Unicode.” In Wikipedia, February 21, 2021. https://en.wikipedia.org/w/index.php?title=Unicode&oldid=1008164095.
CrashCourse. Files & File Systems: Crash Course Computer Science #20, 2017. https://www.youtube.com/watch?v=KN8YgJnShPM&list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo&index=21.
Question:
Can “data” be understood as roughly the same thing as meta-information in the digital computing sense?