The Basics of Human Communication – Digitally

Put simply, the term "data" is inseparable from the concept of representation. In the contexts of computing and information, data is always a humanly imposed structure that must be represented through some kind of interpretable unit.

All digital text works with reference to an international standard for representing characters: Unicode. In effect, Unicode assigns each character a number (a code point), which software interprets as an instance of a character data type; the rest of the software stack then projects character shapes onto pixel patterns on a device's screen as its output. Unlike some of the other types of communication we have discussed, such as images, GIFs, or MP3 files, Unicode provides a defined set of software-interpretable numbers for representing every representable character, whereas a binary media file (like those just mentioned) has no predefined form or size in memory. I find a personal example of using Unicode to be funny: in a web design computer science course I took, I was taught to include a line declaring UTF-8 as part of setting up a website. Not until reading the Wikipedia page did I realize that that line names one of the most commonly used Unicode encodings.
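To make "text as numbers" concrete for myself, here is a tiny Python sketch (my own illustration, not taken from the readings) showing how each character maps to a Unicode code point and how UTF-8 then turns that number into bytes:

```python
# Each character has a Unicode code point (an abstract number);
# an encoding such as UTF-8 turns that number into concrete bytes.
for ch in "Hé!":
    code_point = ord(ch)              # the character's Unicode number
    utf8_bytes = ch.encode("utf-8")   # its byte representation in UTF-8
    print(ch, hex(code_point), list(utf8_bytes))

# Prints:
# H 0x48 [72]
# é 0xe9 [195, 169]
# ! 0x21 [33]
```

Notice that "é" is a single code point but two bytes in UTF-8, which is exactly the kind of detail the encoding layer handles for us.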

The second way in which we define a concept of data is through database management systems (DBMS). This relies on a client/server relationship: the client side consists of the software interfaces for creating, managing, and querying the database on a user's or manager's local PC, while the server side is the database management system installed and running on a computer in a data center or on a cloud array of servers. A common example is the relational database model, which uses "Structured Query Language to create a database instance, input data, manage updates, and output data-query results to a client interface," with which the client can "'query' (ask questions or search) data in the system." As an aside to this definition of a DBMS as a concept of data, something that has helped me deblackbox this idea is the database course I am currently taking. We have not even started to learn SQL; instead, we are given a problem and hand-draw the given data and its relations to the other data. This signifies to me just how much of a human-centered process database design is; it is not just magical Oracle taking care of everything: "a well-designed database is a partial map of human logic and pattern recognition for a defined domain." I think the understanding I have gained can be summed up in Kelleher's Data Science: "One of the biggest myths is the belief that data science is an autonomous process that we can let loose on our data to find the answers to our problems. In reality, data science requires skilled human oversight throughout the different stages of the process. Human analysts are needed to frame the problem, to design and prepare the data, to select which ML algorithms are most appropriate, to critically interpret the results of the analysis, and to plan the appropriate action to take based on the insight(s) the analysis has revealed."
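As a small sketch of that create / input / query cycle, here is a Python example using the built-in sqlite3 module as a stand-in for a full client/server DBMS (the table and data are made up for illustration):

```python
import sqlite3

# An in-memory database instance; the schema is the human-designed "map."
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a database instance.
cur.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, course TEXT)"
)

# Input data.
cur.executemany(
    "INSERT INTO students (name, course) VALUES (?, ?)",
    [("Ada", "Databases"), ("Grace", "Web Design")],
)

# Query: the client "asks a question" and gets structured results back.
cur.execute("SELECT name FROM students WHERE course = ?", ("Databases",))
print(cur.fetchall())   # [('Ada',)]

conn.close()
```

Even in this toy version, the part that required the most thought was deciding what the table should look like in the first place, which matches my experience of hand-drawing the data relations before writing any SQL.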

Interestingly enough, digital images seem to be defined by data in a way that combines how we use a DBMS and how we use Unicode (for text). "Digital cameras store photographs in digital memory, much like you save a Word document or a database to your computer's hard drive. Storage begins in your camera and continues to your personal computer." To get into the 'nitty-gritty,' an image is stored as a huge array of numbers: in digital photography, each of the three colors (red, green, and blue) can have any of 256 shades, with 0 as black and 255 as the purest rendition of that color. The colors on your computer monitor, or on the little LCD screen on the back of your camera, are grouped in threes to form tiny full-color pixels, millions of them. By varying the values of the adjoining colors within a pixel, you suddenly have nearly 17 million colors at your disposal for each pixel. Essentially, we express colors as numbers, and these pixelated colors form an image. This is similar to Unicode's expression of each character we encode as a number, not a glyph.
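A minimal Python sketch of "colors as numbers" (again my own toy example): each pixel is three values from 0 to 255, and an image is just a grid of such triples.

```python
# A 2x2 "image": each pixel is an (R, G, B) triple of numbers 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],   # row 1: a red pixel, a green pixel
    [(0, 0, 255), (0, 0, 0)],     # row 2: a blue pixel, a black pixel
]

# 256 shades per channel, three channels per pixel:
print(256 ** 3)   # 16777216 -- the "17 million colors" figure, rounded up
```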

Kelleher, J. D., and B. Tierney. Data Science. MIT Press, 2018. 

 

“Unicode.” In Wikipedia, February 21, 2021. https://en.wikipedia.org/w/index.php?title=Unicode&oldid=1008164095.

White, R., and T. E. Downs. How Digital Photography Works. Que, 2005. 
 
 
 
Question:
 
If we say all data must be able to be represented in order to be considered data, why is there a separate definition of "data" as representable logical structures and as conceptual structures?