Data and representation: Types and Tokens
Digital representation, the formatting of data into types, underlies the whole computing/information architecture, which is designed for tokenization (producing representation instances) and re-tokenization ("copies," further instances, and interpretations of tokens output as additional tokens).
"Data types" are techniques for bundling, labeling, or encoding bit and byte units (i.e., with meta-information) as tokens of defined types for computability. (We must assign different kinds of computable processes to text strings in defined languages, to types of number representations, and to the matrices, or value arrays, used in digital images.) Each type can have unlimited token instances across the computing architecture at several levels: all forms of memory, the representations in processing units (CPUs, GPUs, codecs, and transducers), data units such as Internet packets, etc.
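A minimal sketch of this point: the very same byte sequence counts as a token of different types depending on how it is labeled for interpretation. The bytes below are arbitrary illustrative values, not from the text.

```python
import struct

# The same four bytes, read as tokens of three different "types."
raw = b"\x41\x42\x43\x44"

as_text = raw.decode("ascii")            # a text-string token: "ABCD"
as_int = int.from_bytes(raw, "big")      # an unsigned-integer token
(as_float,) = struct.unpack(">f", raw)   # an IEEE-754 float token

print(as_text, as_int, as_float)
```

Nothing in the bytes themselves dictates the reading; the type assignment (the meta-information) determines which computable processes apply to the token.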
The discourses of “Neural Nets” and “Deep Learning”
Neural nets are mathematical graphs designed to perform fast statistical calculations through many layers of sorting and weighting toward a designed goal. The whole point is pattern recognition and pattern matching, modeled on the human perceptual inferences and pattern-recognition capacities that are part of human symbolic cognition.
We can design algorithms as pattern recognizers over data because the patterns are human-generated patterns represented in the computable tokens. Pattern-recognizing AI/ML and NLP models are thus projections from human symbolic capabilities, which give us the abilities of multi-leveled abstraction, generalization, and combining types of representations (sign and symbol systems).
“Deep Learning” is not “deep” as in ordinary language “deep/depth of knowledge,” or cumulative history of knowledge and learning. “Deep Learning” means adding many more graph layers with recursive (recurrent) pattern analysis.
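The "graph of weighted layers" idea can be sketched concretely. The example below is a hand-built toy, not a trained model: the weights are set by hand (learning would normally adjust them from data), and it merely shows how stacked layers of weighted sums plus a nonlinearity let a network match a pattern (here, XOR) that no single layer could.

```python
# A minimal two-layer feedforward net: each node is a weighted sum of
# the previous layer's outputs, passed through a nonlinearity (ReLU).
# Weights are hand-set for illustration, not learned.

def relu(vec):
    return [max(0.0, v) for v in vec]

def layer(weights, biases, inputs):
    # one graph layer: weighted sum of inputs plus a bias, per node
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

W1, b1 = [[1, 1], [1, 1]], [0, -1]   # hidden layer (two nodes)
W2, b2 = [[1, -2]], [0]              # output layer (one node)

def net(x):
    hidden = relu(layer(W1, b1, x))
    return layer(W2, b2, hidden)[0]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, net(x))   # outputs the XOR pattern: 0, 1, 1, 0
```

"Deep" models simply stack many more such layers; the mechanism at each layer remains weighted sums and nonlinearities.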
Confused and confusing philosophy in ML
/197/ To conclude this journey, we would like to say a few words about cognitive issues. The most active researchers in the field of machine translation generally avoid addressing cognitive issues and make few parallels with the way humans perform a translation. The artificial intelligence domain has suffered from spectacular and inflated claims too much in the past, and in relation to systems that had nothing to do with the way humans think or reason. It may thus seem reasonable to focus on technological issues and leave any parallel with human behavior aside, especially because we do not, in fact, know much about the way the human brain works.

However, it may be interesting in this conclusion to have a look at cognitive issues despite what has just been said, because the evolution of the field of machine translation is arguably highly relevant from this point of view. The first systems were based on dictionaries and rules and on the assumption that it was necessary to encode all kinds of knowledge in the source and target languages in order to produce a relevant translation. This approach largely failed because information is often partial and sometimes contradictory, and knowledge is contextual and fuzzy. Moreover, no one really knows what knowledge is, or /198/ where it begins and where it ends. In other words, developing an efficient system of rules for machine translation cannot be carried out efficiently by humans, since the task is potentially infinite and it is not clear what should be encoded in practice.

Statistical systems then seemed like a good solution, since these systems are able to efficiently calculate complex contextual representations for thousands of words and expressions. This is something the brain probably does in a very different way, but nevertheless very efficiently: we have seen in chapter 2 that any language is full of ambiguities (cf. “the bank of a river” vs. “the bank that lends money”). Humans are not bothered at all by these ambiguities: most of the time we choose the right meaning in context without even considering the other meanings. In “I went to the bank to negotiate a mortgage,” it is clear that the word “bank” refers to the lending institution, and the fact that there is another meaning for “bank” is simply ignored by most humans. A computer still has to consider all options, but at least statistical systems offer interesting and efficient ways to model word senses based on the context of use.
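The quoted idea of modeling word senses "based on the context of use" can be sketched in miniature. The example below is an invented toy, not any real system's method: a tiny hand-labeled corpus, a crude additive co-occurrence score with add-one smoothing (a real statistical system would use a proper probabilistic model over far more data), and the passage's own "bank" example as the test sentence.

```python
from collections import Counter

# Toy word-sense disambiguation for "bank": score each sense by how
# often the surrounding words co-occur with it in a tiny hand-labeled
# corpus. All data here is invented, purely for illustration.

corpus = {
    "finance": ["negotiate a mortgage at the bank",
                "the bank approved the loan",
                "deposit money in the bank"],
    "river":   ["fishing on the bank of the river",
                "the river bank was muddy",
                "grass along the bank of the stream"],
}

# per-sense counts of context words
counts = {sense: Counter(w for s in sents for w in s.split())
          for sense, sents in corpus.items()}

def disambiguate(sentence):
    words = sentence.split()
    # crude additive score with add-one smoothing, per sense
    scores = {sense: sum(c[w] + 1 for w in words)
              for sense, c in counts.items()}
    return max(scores, key=scores.get)

print(disambiguate("I went to the bank to negotiate a mortgage"))
```

The point is only structural: the "meaning" chosen is whichever sense the context statistics favor, which is how statistical systems sidestep the need to hand-encode knowledge about banks, rivers, and mortgages.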