Technical Document for Data Science, Coding, and Use in Database, Computing and AI – Heba Khashogji

The many contexts and uses of the terms “information” and “data” make these terms perplexing and confusing outside an understood context. Using the method of thinking in levels and our contexts for defining data concepts, outline for yourself the concept of “data” and its meaning in two of the data systems we review this week. One “system” is the encoding of text data in Unicode for all applications in which text “data” is used; others are database management systems.

What is Data Science?

Data science encompasses a set of principles, problem definitions, algorithms, and processes for extracting non-obvious and useful patterns from large data sets. Many elements of data science were developed in related fields, such as machine learning and data mining. In fact, the terms data science, machine learning, and data mining are often used interchangeably. What these disciplines share is a focus on improving decision making through the analysis of data. However, although data science borrows from these other fields, it is broader in scope. Machine learning (ML) focuses on the design and evaluation of algorithms for extracting patterns from data. Data mining typically deals with the analysis of structured data and often implies a focus on commercial applications.

A Brief History of Data Science.   

The term data science can be traced back to the 1990s. However, the fields that it draws on have a much longer history. One thread in this longer history is the history of data collection; another is the history of data analysis. In this section, we review the main developments in these threads and describe how and why they converged into the field of data science. Of necessity, this review introduces new terminology as we define and name the important technical innovations as they arose. For each new term, we provide a brief explanation of its meaning; we return to many of these terms later in the book and give a more detailed description of them. We begin with a history of data collection, then provide a history of data analysis, and, finally, cover the development of data science. (Kelleher and Tierney, 2018).

Document and Evidence

The word information commonly refers to bits, bytes, books, and other signifying objects, and it is convenient to refer to this class of objects as documents, using a broad sense of that word. Documents are essential because they are considered evidence. 

The Rise of Data Sets.

Academic research projects typically generate data sets, but in practice, it is generally impractical for anyone else to attempt to make further use of these data, even though significant research funders now mandate that researchers have a data management plan to preserve generated data sets and make them accessible.

Naming

Finding operations depend heavily on the names assigned to document descriptions and the named categories to which documents are assigned. Naming is a language activity and so inherently a cultural activity. For that reason, we give a brief overview of the issues, tensions, and compromises involved in describing collected documents. The notation can be codes or ordinary words. Linguistic expressions are necessarily culturally grounded and therefore unstable, which puts them in conflict with the need for stable, unambiguous marks if systems are to perform efficiently.

The First Purpose of Metadata: Description

The primary and original use of metadata is to describe documents. There are various types of descriptive metadata, including technical metadata (describing the format, encoding standards, etc.) and administrative metadata. These descriptions help in understanding a document’s character and in deciding whether to make use of it. Description can be useful even if nonstandard terminology is used.

The Second Use of Metadata: Search

Thinking of metadata as a way to describe individual documents reflects only one of the two roles of metadata. The second use of metadata is different: it emerges when you start with a query or with the description rather than the document (with the metadata rather than the data), as when searching in an index. This second use of metadata is for finding, search, and discovery. (Buckland, 2017).
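The two uses of metadata can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalog records, the subject terms, and the helper name build_index are all hypothetical and do not come from Buckland. The sketch shows that the same descriptions serve both directions: read from the document to its description, or invert them into an index and go from a term to the documents.

```python
from collections import defaultdict

# Hypothetical catalog: document id -> descriptive metadata.
catalog = {
    "doc1": {"title": "Data Science", "subjects": ["data science", "machine learning"]},
    "doc2": {"title": "Information and Society", "subjects": ["information", "metadata"]},
    "doc3": {"title": "Database Concepts", "subjects": ["databases", "metadata"]},
}


def build_index(catalog):
    """Invert the catalog: subject term -> sorted list of document ids."""
    index = defaultdict(list)
    for doc_id, record in catalog.items():
        for term in record["subjects"]:
            index[term].append(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


index = build_index(catalog)

# First use (description): start from the document, read its metadata.
print(catalog["doc2"]["subjects"])

# Second use (search): start from a term, find the documents.
print(index.get("metadata", []))
```

Note that the index is only as good as the names chosen for the subject terms, which is the naming problem described earlier.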

Both “information” and “data” are used in general and undifferentiated ways in ordinary and popular discourse. Still, to advance in our learning for AI and all the data science topics that we will study, we all need to be clear on these terms and concepts’ specific meanings. The term “data” in ordinary language is a vague, ambiguous term. We must also untangle and differentiate the uses and contexts for “data,” a key term in everything computational, AI, and ML.

No Data without Representation.

In whatever context and application, “data” is inseparable from the concept of representation. A good slogan should be “no data without representation” (which can be said of computation in general). By “representation”, we mean a computable structure, usually of “tokens” (instances of something representable) corresponding to “types” (categories or classes of representation, roughly corresponding to a symbolic class like text character, text string, number type, matrix/array of number values, etc.). (Irvine, 2021).
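Unicode text is a convenient place to see "no data without representation" at work. The short Python sketch below is one possible reading of the token/type distinction, not Irvine's own example: it treats each character in a string as a token, the Unicode code point as its type, and the UTF-8 bytes as the computable structure actually stored. The sample string is an assumption chosen for illustration.

```python
# Sample text (an assumption for illustration): "A" followed by "é",
# written with an escape so the second character is exactly U+00E9.
text = "A\u00e9"

# Each character in the string is an instance of an abstract code point.
code_points = [f"U+{ord(ch):04X}" for ch in text]

# The same code points serialized as bytes under the UTF-8 encoding.
utf8_bytes = text.encode("utf-8")

print(code_points)        # ['U+0041', 'U+00E9']
print(list(utf8_bytes))   # [65, 195, 169]

# Decoding with the same encoding recovers the original text.
print(utf8_bytes.decode("utf-8") == text)  # True
```

Two characters become three bytes here because UTF-8 uses a variable number of bytes per code point. The bytes only mean "Aé" when they are interpreted under the right representation.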

Knowledge of database technology increases in importance every day. Databases are used everywhere: they are fundamental components of e-commerce and other Web-based applications. They lie at the core of an organization’s operational and decision support applications. Databases are also used by thousands of workgroups and millions of individuals. It is estimated that there are more than 10 million active databases in the world today.

This book aims to teach the essential relational database concepts, technology, and techniques that you need to start a career as a database developer. It does not teach everything that matters in relational database technology. Still, it will give you enough background to create your own databases and to participate as a member of a team developing a larger, more complex database. (Kroenke et al., 2017).

The data type of an attribute (numeric, ordinal, nominal) affects the methods we can use to analyze and understand the data. This applies both to the basic statistics we use to describe the distribution of values that an attribute takes and to the more complex algorithms we use to identify patterns of relationships between attributes. At the most basic level of analysis, numeric attributes allow arithmetic operations. The typical statistical analysis applied to numeric attributes is to measure the central tendency (using the mean value of the attribute) and the dispersion of the attribute’s values (using the variance or standard deviation statistics).
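As a minimal sketch of these summary statistics, the following uses Python's standard statistics module. The attribute values are made up for illustration. The nominal example at the end is an addition of mine rather than something taken from the source: arithmetic is not meaningful for nominal values, so a frequency-based summary such as the mode is used instead.

```python
import statistics

# Hypothetical numeric attribute (for example, ages in years).
ages = [23, 35, 31, 42, 29]

mean = statistics.mean(ages)          # central tendency
variance = statistics.variance(ages)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(ages)      # square root of the sample variance

print(mean)      # 32
print(variance)  # 50
print(round(std_dev, 3))  # 7.071

# Hypothetical nominal attribute: no arithmetic applies, so report the
# most frequent value instead of a mean.
colors = ["red", "blue", "red", "green"]
print(statistics.mode(colors))  # red
```

The module also provides pvariance and pstdev for the case where the values are a whole population rather than a sample.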

Machine Learning 101

The primary tasks for a data scientist are defining the problem, designing the data set, preparing the data, deciding on the type of data analysis to apply, and evaluating and interpreting the results of the analysis. What the computer brings to this partnership is the ability to process data and search for patterns in the data. Machine learning is the field of study that develops the algorithms that computers follow to identify and extract patterns from data. ML algorithms and techniques are applied primarily during the modelling stage of CRISP-DM. ML involves a two-step process.

First, an ML algorithm is applied to a data set to identify useful patterns in the data; the result is a model that encodes those patterns. Second, once the model has been created, it is used for analysis. (Kelleher and Tierney, 2018).
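The two steps can be illustrated with a very small example. The sketch below uses ordinary least squares to fit a straight line, which is my choice of technique for illustration and not one prescribed by the source. The function names fit_line and predict and the training data are hypothetical.

```python
def fit_line(xs, ys):
    """Step 1: learn a model (slope, intercept) from data by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Step 2: apply the learned model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data that follows y = 2x + 1 exactly.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

model = fit_line(xs, ys)
print(model)               # (2.0, 1.0)
print(predict(model, 10))  # 21.0
```

Here the "pattern" is the linear relationship between the two attributes, and the model is just the pair of numbers that captures it. Real data is noisy, so the fitted line would only approximate the values it was trained on.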

References:

  1. Kelleher, J., & Tierney, B. (2018). Data Science. The MIT Press: London.
  2. Buckland, M. (2017). Information and Society. The MIT Press: London.
  3. Irvine, M. (2021). Universes of Data: Distinguishing Kinds and Uses of “Data” in Computing and AI Applications.
  4. Kroenke, D., Auer, D., Vandenberg, S. L., & Yoder, R. C. (2017). Database Concepts. Pearson: NY.
