Author Archives: Eric Cruet

Data, Representation and Visualization

By Eric Cruet

I. Introduction

“Use a picture. It’s worth a thousand words.” — Arthur Brisbane, 1911

What constitutes an awesome visualization? Some think of flashy graphics, while others look for busy charts and colorful graphs. A better general definition is one that provides a clear visual frame for the data, in a way that allows the observer to “see” a trend, outline, pattern, outlier, or other significant information that would otherwise be imperceptible from the source data alone.

Mediology is based on the differentiation of transmission and communication.  According to Régis Debray, to communicate means to transport information in a space within one and the same space-time-sphere.  To transmit means to transport information in time between different space-time-spheres. Communication is a moment in a longer process and a fragment of a larger whole that we call transmission [1]. Based on Debray’s definition, visualization is a medium as opposed to a specific tool.  A tool generates bar charts and graphs.  A medium has the ability to communicate emotion, curiosity, activity, energy, and granularity.  For instance, the pictorial representations to scale of human anatomical components in Gray’s Anatomy have withstood the test of time, communicating the same information across generations of medical students globally.

Data is the basis of any visualization and, as such, an abstraction of information and facts. A data set is a collection of snapshots of the desired data at one point in time and usually serves as the basis for the visualization.  Statistics are used to manipulate and analyze the set, since collectively, the data points in the set generate means, medians, and standard deviations.  But what is most important is the context associated with the data and the results; in other words, what they represent.  They translate into descriptions of people, places, and things that allow the comparison and contrast of specific items.  When you drill down on the data, you obtain individual details about members and objects of the population.  All of the above can be used to tell visual stories, and make data that usually looks like columns and rows of numbers human and relatable.
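Those summary measures are simple to compute; a sketch using Python’s standard library and made-up fatality counts, purely for illustration:

```python
import statistics

# Hypothetical yearly traffic-fatality counts (illustrative values only)
fatalities = [43510, 42708, 41259, 37423, 33883, 32999]

mean = statistics.mean(fatalities)      # arithmetic average of the set
median = statistics.median(fatalities)  # middle value, robust to extremes
stdev = statistics.stdev(fatalities)    # spread around the mean

print(f"mean={mean:.1f} median={median} stdev={stdev:.1f}")
```

The numbers themselves carry little meaning until they are tied back to context — what years, what population, what they represent.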

II. Data

When you ask most people what data is, they reply with a vague description, usually related to a file, an application, or numbers.  Some might mention spreadsheets or databases. These are all containers and formats that data comes in, but they provide very little context.  That’s where representation and visualization come in.

William Cleveland and Robert McGill are often cited for their work on perception and accuracy in statistics [2].  Elements like the emphasis on position and scale (as in scatterplots), followed by length, angle, and then slope, can be attributed to their work.  Edward Tufte is also credited with identifying some of the first basic rules of design.  But his most important rule was that “most principles of design should be greeted with some skepticism” [3].

Data is undergoing a paradigm shift.  There is more to the term “big data” than the quantity. Most of our institutions were established under the assumption that decisions would be made with information that was scarce, exact, and causal in nature.  This situation is changing rapidly now that amounts of data are huge, can be quickly processed, and some degree of uncertainty is acceptable [4].  In certain operating scenarios, correlations are more important than causality.  Most importantly, many times we are interested in data streams, as opposed to data snapshots.

So the type, source, and volume of data influence the way information is represented, communicated, and visualized.  The following example contains statistical data for traffic fatalities [5] in the US in a chart format:

This is the basic table containing the source data.  In order for it to tell us more than just counts of fatalities, a process of representation and visualization needs to take place.  This entails the application of computational methods, good design principles, some basic rules about art layout, color, and the use of templates, and decisions about the level of granularity for the type of information you wish to relay.

 

III. Representation

There is value in looking at data beyond the mean, median, or total because these measurements only tell part of the story.  Many times, aggregates or values around the middle of the distribution hide the interesting details that really need focus for decision making or illustrative purposes.
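One simple way to look beyond the middle of the distribution is the interquartile-range rule for flagging unusual values. A standard-library sketch with made-up monthly counts:

```python
import statistics

# Hypothetical monthly counts with one anomalous spike (illustrative only)
values = [102, 98, 105, 99, 101, 240, 97, 103, 100, 96]

q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr         # Tukey's fences

outliers = [v for v in values if v < lo or v > hi]
print(outliers)  # the spike that the mean alone would smear out
```

The mean of these values sits near 114 and hides the spike entirely; the fences isolate it immediately.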

Outliers, which stand outside of the centrally situated values, may also need attention. Changes over time sometimes indicate that something positive (or negative) is happening (or about to happen) in the system under observation.  Regular occurrences or patterns can help you anticipate future events, and granularity can be adjusted depending on variability. The graph below [10] is an example of a creative representation of the data table in the previous section:

Although these are snapshots, they provide a different perspective on the same data by communicating alternative information.  One glance at the chart tells you that traffic fatalities have decreased substantially over time.  Key milestones are listed by year of significance.  It’s a different take on what could be a boring line chart.

Finally, a poster [10] drilling down into a comparison of traffic vs. total fatalities data for 2008 – 2009:


The poster format is well suited for this type of information.  It utilizes a variety of graphs and charts to represent data.  It summarizes the information well and has a good level of detail.  The use of color is appropriate and the shapes and sizes complement each other.

When you look at these representations, they look much better than columns and rows of statistics one after the other.

IV. Visualization

Visualization has been around for centuries, but it is relatively new as a field of study.  Even the experts in the field have not settled on what exactly comprises it.  One of the topics of debate is: when and where does visualization become art?

The answers to these questions vary depending on whom you ask.  But rather than thinking of the field as composed of disparate categories that work independently of one another, it is better thought of as a continuous spectrum that stretches from statistics to data art [9].  Although you can find examples at each extreme, most of what you commonly see is a mixture of both.  Where statistics, design, and aesthetics are in balance is where you will most likely find the best examples of visualization work.

My post in Week 1 deals with mapping large scales of change.  Much of mankind’s preoccupation has been with changes in the sciences, technology, sociology and economics.  More recently, the concern has shifted to variations in climate, global financial states, the effect of technology on society, and the increasing use of unlawful violence intended to coerce or to intimidate governments or societies, i.e., terrorism.

Traditionally, network, graph, and cluster analysis are the mathematical tools used to understand specific instances of the data generated by these scenarios at a given point in time. But without methods to distinguish between real patterns and statistical error, which can be significant in large data sets, these approaches may not be ideal for studying change.  Also, patterns and trends can be better ascertained by observing behaviour over time, as opposed to at a specific point in time. By looking at a time series and assigning weights to individual networks, we can determine meaningful structural differences vs. random fluctuations [2].
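One way to tell meaningful differences from random fluctuation is bootstrap resampling. The sketch below is a generic illustration (not Rosvall and Bergstrom’s actual significance-clustering method), using hypothetical link weights from two snapshots of the same network:

```python
import random

random.seed(0)

# Hypothetical link weights for the same network observed at two times
t1 = [0.8, 1.1, 0.9, 1.0, 1.2, 0.7, 1.0, 0.9]
t2 = [1.5, 1.7, 1.4, 1.6, 1.8, 1.3, 1.6, 1.5]

observed = sum(t2) / len(t2) - sum(t1) / len(t1)

# Bootstrap: resample each snapshot with replacement and record the
# difference of means, building a distribution of plausible differences.
diffs = []
for _ in range(2000):
    r1 = [random.choice(t1) for _ in t1]
    r2 = [random.choice(t2) for _ in t2]
    diffs.append(sum(r2) / len(r2) - sum(r1) / len(r1))

diffs.sort()
lo, hi = diffs[50], diffs[1949]          # roughly a 95% interval
significant = not (lo <= 0.0 <= hi)      # does the interval exclude "no change"?
print(observed, significant)
```

If zero falls outside the resampled interval, the observed shift is unlikely to be a mere sampling fluctuation.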

In the follow-up post in Week 2, the unique, clever use of alluvial diagrams [2] by M. Rosvall and C. T. Bergstrom in their research entitled “Mapping change in large networks” is a good example of how accurate statistics, good design, and simple artwork can reveal interesting, otherwise hidden patterns in the data.  Using bibliometrics, which applies quantitative analysis and statistics to find patterns of publication within a given field or body of literature, they tracked citation patterns among scientific journals.  This allowed them to map idea flows and how the flow of ideas influenced changes in the science disciplines over time.  The resulting diagram and link to the research can be found below:

 

Just at a glance, what is evident from the “picture” is that from 2000 – 2010, the neurosciences emerged as a new “discipline” from the fields of neurology and molecular and cell biology.  In this case the visualization served as the data analysis tool, revealing the changes that the research hypothesis was trying to uncover.

Along the lines of using visualization as a data analysis tool, the collaborative team of Fernanda Viégas and Martin Wattenberg (at http://hint.fm) has invented artistic, creative ways of using visualization to express data.  Although they have a suite of impressive examples at their site, history flow is a tool that allows you to explore the history of any Wikipedia entry over time.

As shown below, the visual looks like an inverted stacked area chart where each layer represents a body of text.  As time passes, new layers are added or removed, and you can see the change in overall size via the total vertical height of the full stack:

The image above is the diagram for the wiki article on abortion. The black gashes show points where the article has been deleted and replaced with offensive comments. This type of vandalism turns out to be common on controversial articles.  The authors performed statistical analysis in 2003 to investigate the issue of online vandalism [6], and discovered that the median lifetime of certain types of vandalism is measured in minutes.  This is an alternate use of the alluvial diagram shown previously, but in a different context.  Whereas in the previous case it was mapping changes in science, in this instance it is mapping changes to the bodies of text in Wikipedia articles.
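The layer bookkeeping behind a history-flow chart can be sketched as plain data: each revision maps contributors to the size of their surviving text, and the total stack height at each revision is the article’s overall size. The names and word counts below are made up for illustration:

```python
# Each revision maps contributor -> size of their surviving text
# (hypothetical word counts for an imaginary article)
revisions = [
    {"alice": 120},
    {"alice": 110, "bob": 80},
    {"alice": 105, "bob": 95, "carol": 40},
    {"bob": 95, "carol": 160},           # alice's text removed here
]

# Total stack height per revision = overall article size at that time;
# a layer vanishing (alice) shows as a band that ends mid-chart.
heights = [sum(layers.values()) for layers in revisions]
print(heights)
```

A sudden collapse of all layers to near zero, followed by recovery, is exactly the “black gash” signature of a delete-and-revert vandalism event.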

Another great example of using visualization as a tool is this wind map, which provides a living portrait of the wind currents over the U.S.  Clicking on the map will take you to the real time instance.  Check it out:

Finally, researchers in cognitive science are using Diffusion Tensor Imaging (DTI), an MRI-based neuroimaging technique which makes it possible to visualize the location, orientation, and anisotropy of the brain’s white matter tracts.  Once they have a suitable group of sample volunteers, they test for neuropsychological factors such as general cognition, memory, and information processing speed.  In addition, metrics such as fiber counts, length, diffusion rate, and diffusion anisotropy are statistically correlated to support the data.  The statistical relationship to age is usually modeled using a linear regression.  The fiber’s direction is indicated by the tensor’s main eigenvector. This vector can be color-coded, yielding a cartography of the tracts’ position, direction (red for right-left, blue for foot-head, green for anterior-posterior), and anisotropy (as indicated by the tract’s brightness).  In the following study, the researchers provide a visual assessment of the white matter maturation for 80 subjects of distinct ages [7]:

 

This image illustrates the significant age related differences between tract-based bundles in the brain.  Red and blue indicate negative and positive correlation respectively.


This diagram shows the significant age related effects in connectivity based bundles.  Red and blue indicate negative and positive correlation, respectively.  Light gray connections had no significant effects, and the higher the saturation in the color, the more significant the age related effect in the result.  The population average bundle volume (sum of fiber lengths) is mapped to cord thickness.  Total bundle volume of each grey matter region is mapped proportionately to arc length.

Key: L=Left, R=Right, F=Frontal, T=Temporal, P=Parietal, O=Occipital, S=Subcortical
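The anisotropy and direction color coding described above can be computed directly from the diffusion tensor’s eigenvalues and principal eigenvector. A minimal sketch using the standard fractional anisotropy formula and hypothetical eigenvalues (an illustration, not the authors’ pipeline):

```python
import math

def fractional_anisotropy(l1, l2, l3):
    """FA from the three eigenvalues of the diffusion tensor:
    0 for isotropic diffusion, approaching 1 for strongly directional."""
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    return math.sqrt(0.5 * num / den)

def direction_color(v):
    """Map the principal eigenvector to RGB using the usual convention:
    |x| -> red (right-left), |y| -> green (anterior-posterior),
    |z| -> blue (foot-head)."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(abs(c) / n for c in v)

# Hypothetical eigenvalues (units of 1e-3 mm^2/s) for a white-matter voxel
print(fractional_anisotropy(1.7, 0.3, 0.2))  # strongly anisotropic, near 1
print(direction_color((1.0, 0.0, 0.0)))      # right-left fiber -> pure red
```

Brightness in the tract maps is then simply the FA value scaling the color.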

In closing, these DTI scans can also derive neural tract directional information from the data using 3D or multidimensional vector algorithms based on six or more gradient directions, sufficient to compute the diffusion tensor. The diffusion tensor is a rather simple model of the diffusion process, assuming homogeneity and linearity of the diffusion within each image voxel. From the diffusion tensor, diffusion anisotropy measures, such as the fractional anisotropy (FA), can be computed. Moreover, the principal direction of the diffusion tensor can be used to infer the white-matter connectivity of the brain (i.e., tractography: determining which part of the brain is connected to which other part).  Here’s a video clip on 3D DTI:

 

V. Interpretations

The intention of visualization is to communicate results to a wider audience.  Imagine you are a tour guide.  Put yourself in the tourist’s position.  You’re on a tour of a city where historic events have occurred over centuries.  What would the tourist want out of the tour?  He wants to know about when and where key events happened, who the main characters were, and why the buildings have particular shapes or colors.  All tour guides have their own personality, but they should stay on course and on the subject that the tourist paid to hear about.  Above all else, the tourist wants the guide to be factual and truthful in his account of events.  If he doesn’t know the answer to a question, he should be honest and say so.

When leading a tour of data through the use of visualization, presenters (or representers) should assume similar responsibilities.  It’s your duty to point out key highlights and background info, stay focused, and eliminate confusion.  Always aim your content at your target audience, and remember: speak the truth and nothing but the truth.

“The naked truth is always better than the best dressed lie”

Ann Landers (1918 – 2002) 

Appendix A: 3D Visualizations – Follows References Section

References:

[1] Debray, Régis “Qu’est-ce que la médiologie?” Trans. Martin Irvine. Le Monde Diplomatique, August 1999, p32.

[2] Cleveland, W. S., & McGill, R. (1984). Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387), 531-554.

[3] Tufte, E. R., & Graves-Morris, P. R. (1983). The visual display of quantitative information (Vol. 2). Cheshire, CT: Graphics press.

[4] Mayer-Schönberger, V., & Cukier, K. (2013). Big Data: A Revolution that Will Transform how We Live, Work, and Think. Eamon Dolan/Houghton Mifflin Harcourt.

[5]http://www.census.gov/compendia/statab/cats/transportation/motor_vehicle_accidents_and_fatalities.html

[5] Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0008694

[6] Viégas, F. B., Wattenberg, M., & Dave, K. (2004, April). Studying cooperation and conflict between authors with history flow visualizations. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 575-582). ACM.

[7] Cabeen, R. P., Bastin, M. E., & Laidlaw, D. H. (2013). A Diffusion MRI Resource of 80 Age-varied Subjects with Neuropsychological and Demographic Measures. ISMRM.

[8] http://www.technologyreview.com/photoessay/411056/the-brain-unveiled/

[9] Yau, N. (2013). Data Points: Visualization That Means Something. John Wiley & Sons.

[10] http://www.caranddriver.com/features/safety-in-numbers-charting-traffic-safety-and-fatality-data

Appendix A: 3D Visualizations 

Diffusion spectrum imaging [8], developed by neuroscientist Van Wedeen at Massachusetts General Hospital, analyzes magnetic resonance imaging (MRI) data in new ways, letting scientists map the nerve fibers that carry information between cells. The first image shows a reconstruction of the entire brain.

This image, generated from a living human brain, shows a subset of fibers. The red fibers in the middle and lower left are part of the corpus callosum, which connects the two halves of the brain.

Mapping Diffusion

Neural fibers in the brain are too tiny to image directly, so scientists map them by measuring the diffusion of water molecules along their length. The scientists first break the MRI image into “voxels,” or three-dimensional pixels, and calculate the speed at which water is moving through each voxel in every direction. Those data are represented here as peanut-shaped blobs. From each shape, the researchers can infer the most likely path of the various nerve fibers (red and blue lines) passing through that spot.

This image is the isolated optic tract, which relays visual signals from the eyes to the visual cortex, from the brain of an owl monkey. The blue lines at lower right represent nerve fibers connecting the eyes to the lateral geniculate nucleus (marked by the white ball), a pea-size ball of neurons that acts as a relay station for visual information. Those signals are then sent to the visual cortex, at the back of the head, via the blue and purple fibers that arc across the brain.

 

Edited on Microsoft Surface RT

Approaches to Cognitive Science

By Eric Cruet

The genesis of cognitive science as a collaborative endeavor of psychology, computer science, neuroscience, linguistics, and related fields began in the 1950s; however, its first major institutions (a journal and society) were established in the late 1970s.

A key contributor to the emergence of cognitive science, psychologist George Miller, dates its birth to September 11, 1956, the second day of a Symposium on Information Theory at MIT. Computer scientists Allen Newell and Herbert Simon, linguist Noam Chomsky, and Miller himself presented work that would point each of their fields in a more cognitive direction.

In the late 1970s, human experimental psychology, theoretical linguistics, and the computer simulation of cognitive processes saw progressive elaboration and coordination. Earlier, in the mid-1950s, John McCarthy and Marvin Minsky had developed a broad-based agenda for the field they named artificial intelligence (AI), and the convergence of all the above led to the establishment of the multi-disciplinary field we recognize today.

Today, the inclusion of network theory, complexity science, advances in imaging modalities and visualization, and the ability to process entire data sets as opposed to small samples promise to significantly change the way in which the organization and dynamics of cognitive and behavioral processes are understood. Below, we describe a mix of classic and current approaches to cognitive science.

Distributed Cognition

Distributed cognition is a branch of cognitive science that proposes that cognition and knowledge are not confined to an individual; rather, they are distributed across objects, individuals, artefacts, and tools in the environment.  Early work in distributed cognition was motivated by the fact that cognition is not only a socially (also, materially and temporally) distributed phenomenon, but one that is essentially situated in real practices [1].  The theory does not posit some new kind of cognitive process.  Rather, it represents the claim that cognitive processes generally are best understood as situated in and distributed across concrete socio-technical contexts.

Traditional cognitive science theory emphasizes an internalism that marginalizes (some would argue ignores) the role of external representation and problem solving in cooperative contexts.  Traditional approaches to description and design in human-computer interaction have similarly focused on users’ internal models of the technologies with which they interact.  In this case the theoretical focus is on how cognition is distributed across people and artifacts, and on how it depends on both internal and external representations.

The Cognitive Niche

Humans have the ability to pursue abstract intellectual feats such as science, mathematics, philosophy, and law.  This is surprising, given that opportunities to exercise these talents did not exist in the hunter-gatherer societies where humans evolved.

The “cognitive niche” theory of cognition states that humans evolved to fill a mode of survival based on manipulating the environment through causal reasoning and social cooperation. In addition, the psychological faculties that evolved to prosper in the cognitive niche can be co-opted to abstract domains by processes of metaphorical abstraction and productive combination like the ones found in the use of human language [2].

 

This theory claims several advantages as an explanation of the evolution of the human mind. It incorporates facts about the cognitive, affective, and linguistic mechanisms discovered by modern scientific psychology rather than appealing to vague, prescientific black boxes like “symbolic behavior”.  On this account, the cognitive adaptations comprise the “intuitive theories” of physics, biology, and psychology; the adaptations for cooperation comprise the moral emotions and mechanisms for remembering individuals and their actions; and the linguistic adaptations comprise the combinatorial apparatus for grammar and the syntactic and phonological units that it manipulates [3].

 

Connectionism

Connectionism is an alternate computational paradigm to that provided by the von Neumann architecture that has inspired classical cognitive science [4]. Originally taking its inspiration from the biological neuron and neurological organization, it emphasizes collections of simple processing elements in place of the centrally-controlled manipulation of symbols by rules that is typical in classical cognitive science. The simple processing elements in connectionism are typically only capable of rudimentary calculations (such as summation).

A connectionist network is a particular organization of processing units into a whole network. In most connectionist networks, the systems are trained using a learning rule to adjust the weights of all connections between processors in order to obtain a network that performs some desired input-output mapping.
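A minimal sketch of such training: a single threshold unit whose weights are adjusted by the delta (error-correction) rule until it reproduces a desired input-output mapping, here logical OR, chosen purely for illustration:

```python
# Train one unit (weighted sum + threshold) to compute logical OR
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = [0.0, 0.0]
bias = 0.0
rate = 0.1

for _ in range(25):                      # repeated passes over the data
    for (x1, x2), target in data:
        out = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
        err = target - out               # delta rule: nudge toward target
        w[0] += rate * err * x1
        w[1] += rate * err * x2
        bias += rate * err

results = [1 if w[0] * a + w[1] * b + bias > 0 else 0 for (a, b), _ in data]
print(results)
```

The network is never given a rule for OR; the mapping emerges entirely from weight adjustments driven by errors.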

Connectionist networks offer many advantages as models in cognitive science [5]. However, in spite of the fact that connectionism arose as a reaction against the assumptions of classical cognitive science, the two approaches have many similarities when examined from the perspective of Marr’s tri-level hypothesis [6].

There are many forms of connectionism, but the most common forms use neural network models.

Though there are a large variety of neural network models, they almost always follow two basic principles regarding the mind:

  1. Any mental state can be described as an (N)-dimensional vector of numeric activation values over neural units in a network.
  2. Memory is created by modifying the strength of the connections between neural units. The connection strengths, or “weights”, are generally represented as an (N×N)-dimensional matrix.
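Both principles can be made concrete in a few lines: a Hopfield-style sketch in which a stored state vector lives in an N × N weight matrix built by a Hebbian outer-product rule, and a corrupted cue is cleaned up by one update step. Illustrative only:

```python
# Mental states as N-dimensional vectors; memory as an N x N weight matrix
# built by Hebbian learning (strengthen weights between co-active units).
N = 4
pattern = [1, -1, 1, -1]                 # a stored "mental state"

# Outer-product learning rule: W[i][j] = x[i] * x[j], no self-connections
W = [[pattern[i] * pattern[j] if i != j else 0 for j in range(N)]
     for i in range(N)]

# Recall from a corrupted cue by one update step (sign of weighted sums)
cue = [1, -1, 1, 1]                      # last unit flipped
recalled = [1 if sum(W[i][j] * cue[j] for j in range(N)) >= 0 else -1
            for i in range(N)]
print(recalled)
```

The memory is nowhere stored as a symbol; it is reconstructed from the pattern of connection strengths.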

Connectionists generally agree that recurrent neural networks (networks whose connections can form a directed cycle) are a better model of the brain than feedforward neural networks (networks with no directed cycles). Many recurrent connectionist models also incorporate dynamical systems theory. Many researchers, such as the connectionist Paul Smolensky, have argued that connectionist models will evolve toward fully continuous, high-dimensional, non-linear dynamical systems approaches.

Theoretical Neuroscience

Theoretical neuroscience is the attempt to develop mathematical and computational theories and models of the structures and processes of the brains of humans and other animals. It differs from connectionism in trying to be more biologically accurate by modeling the behavior of large numbers of realistic neurons organized into functionally significant brain areas. In recent years, computational models of the brain have become biologically richer, both with respect to employing more realistic neurons such as ones that spike and have chemical pathways, and with respect to simulating the interactions among different areas of the brain such as the hippocampus and the cortex. These models are not strictly an alternative to computational accounts in terms of logic, rules, concepts, analogies, images, and connections, but should complement other models to illustrate how mental functions can be translated and performed at the neural level.

Learning is arguably the central problem in theoretical neuroscience. Other problems, such as the understanding of representations, network dynamics, and circuit function, may become tractable once we know the details of the learning process that, together with the action of the genome, produces these phenomena.

Another tremendous challenge is “the invariance problem”.  Our mental experience suggests that the brain encodes and manipulates ‘objects’ and their relationships, but there is no neural theory of how this is done. We recognize, for example, a cup regardless of its location, orientation, size, or other variations such as lighting and partial occlusion. How do brain networks recognize a cup despite these complicated variations in the image data? How is the invariant part (‘cup-ness’) encoded separately from the variant part?

This is the ‘holy grail’ problem of the computer vision community, and researchers aim to tackle it by fortifying learning algorithms with insights from the mathematics surrounding the concept of invariance. Invariance may also be seen in motor scenarios, cups being a class of things that we can drink from (what J. J. Gibson called an affordance).
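A toy illustration of the invariance problem (not a solution to it): scanning a template over every possible shift and keeping the maximum response gives the same score wherever the “shape” sits in the signal, separating the invariant part from the variant part:

```python
# Recognize a 1-D "shape" template anywhere in a signal by taking the
# maximum match score over all translations (a crude invariance mechanism).
template = [1, 2, 1]

def response(signal, template):
    """Best correlation score over every possible position (shift)."""
    m, n = len(template), len(signal)
    return max(sum(signal[i + k] * template[k] for k in range(m))
               for i in range(n - m + 1))

a = [0, 0, 1, 2, 1, 0]      # shape near the middle
b = [1, 2, 1, 0, 0, 0]      # same shape shifted to the start

print(response(a, template), response(b, template))  # equal scores
```

The score encodes “shape-ness” while discarding position; real brains must do something vastly richer, but the separation of variant from invariant structure is the same in spirit.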

 

References:

[1] Wilson, R. A., & Keil, F. C. (Eds.). (1999). The MIT Encyclopedia of the Cognitive Sciences. Cambridge, MA: MIT Press.

[2] Pinker, S. (2010). The cognitive niche: Coevolution of intelligence, sociality, and language. Proceedings of the National Academy of Sciences, 107(Supplement 2), 8993-8999.

[3] Whiten, A., & Erdal, D. (2012). The human socio-cognitive niche and its evolutionary origins. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1599), 2119-2129.

[4] Bechtel, W., & Abrahamsen, A. A. (2002). Connectionism And The Mind : Parallel Processing, Dynamics, And Evolution In Networks (2nd ed.). Malden, MA: Blackwell.

[5] Dawson, M. R. W. (1998). Understanding Cognitive Science. Oxford, UK: Blackwell.

[6] Dawson, M. R. W. (2004). Minds And Machines : Connectionism And Psychological Modeling. Malden, MA: Blackwell Pub.

 

Abstraction and Simulation

By Eric Cruet

 

Complex computational models typically require large amounts of processing power and produce highly detailed output that is difficult for users to understand. Building abstracted simulation and visualization systems that simplify both computation and output can help overcome this barrier.

Furthermore, the output of such simulations, which often consists of an agonizingly detailed trace of system events, can be difficult to understand at a global or intuitive level. These two considerations, economy of resources (time, cycles) and intelligibility, argue for the development of abstracted simulation systems of reduced complexity that ignore certain interactions or collapse over some dimensions. Abstracting a detailed simulation can simplify both computation and output, providing an accurate picture of events and efficient utilization of resources.

Since W. S. Gosset, a brewer at Guinness who published under the pseudonym “Student”, used simulation to check his derivation of the t-statistic [2], simulation and visualization in scientific research have been driven by interaction between the following:

1. Inspiration, which may be motivated by sheer curiosity as well as specific theoretical or practical problems

2. Intuition, which may guide the search for a problem solution or lead to new discoveries when reasoning alone is insufficient to ensure continued progress

3. Abstraction, which encompasses the modeling and analysis techniques required to build a simulation model, design experiments using that model, and draw appropriate conclusions from the observed results

4. Experimentation, which is computer based and thus differs fundamentally from other empirical scientific work because of the efficiency improvements that are achievable using Monte Carlo methods
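The classic toy demonstration of Monte Carlo experimentation is estimating pi by random sampling:

```python
import random

random.seed(42)

# Estimate pi by sampling points in the unit square and counting
# how many land inside the quarter circle of radius 1.
trials = 100_000
inside = sum(1 for _ in range(trials)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)

estimate = 4 * inside / trials
print(estimate)
```

The error shrinks roughly as the inverse square root of the number of trials, which is why variance-reduction techniques matter so much for efficiency.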

Henri Poincaré was a polymath, known in mathematics as The Last Universalist, since he excelled in all fields of the discipline as it existed during his lifetime.  In his text Mathematical Discovery [1], he wrote on inspiration and detailed verification (emphasis added): “I have spoken of the feeling of absolute certainty which accompanies the inspiration; in the cases quoted this feeling was not deceptive, and more often than not this will be the case. But we must beware of thinking that this is a rule without exceptions. Often the feeling deceives us without being any less distinct on that account, and we only detect it when we attempt to establish the demonstration.”

From the opposite perspective, abstraction encompasses both simulation modeling and the simulation analysis required to do the following:

•Build a model

•Design experiments using that model

•Draw appropriate conclusions from the observed results.

Simulation-based experimentation differs fundamentally from all other types of empirical scientific work by the large potential efficiency improvements that are achievable because we have complete control of the experimental conditions under which each alternative scenario is simulated.
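That control over experimental conditions is what the common-random-numbers technique exploits: run both scenarios from the same seed, so the difference in outcomes reflects the change in the scenario rather than sampling noise. A toy sketch with a hypothetical service-time model:

```python
import random

def simulate(service_rate, seed):
    """Toy model: total service time for 100 jobs with exponential
    service times; the seed fixes the random stream exactly."""
    rng = random.Random(seed)            # same seed -> same random draws
    return sum(rng.expovariate(service_rate) for _ in range(100))

# Common random numbers: both scenarios see identical conditions,
# so the comparison isolates the effect of the faster service rate.
slow = simulate(service_rate=1.0, seed=7)
fast = simulate(service_rate=1.25, seed=7)
print(slow > fast)
```

With independent seeds, the two totals would differ partly by luck; with a shared seed, the faster scenario is better on every single draw, so far fewer replications are needed to detect the improvement.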

Examples:

NEURON

NEURON is a simulation environment for modeling individual neurons and networks of neurons. As of version 7.3, NEURON is capable of handling reaction-diffusion models and integrating diffusion functions into models of synapses and cellular networks.

NEURON [3] models individual neurons via the use of sections, which are subdivided into individual compartments by the program instead of requiring the user to create the compartments manually. The primary scripting language used to interact with it is hoc, but a Python interface is also available. Programs can be written interactively in a shell or loaded from a file. NEURON supports parallelization via the MPI protocol, and starting with NEURON 7.0, parallelization is also possible via internal multithreaded routines for use on computers with multiple cores.

Currently, NEURON is used as the basis for instruction in computational neuroscience in many courses and laboratories around the world.
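NEURON itself integrates full compartmental cable equations, but the flavor of time-stepped membrane simulation can be conveyed with a generic leaky integrate-and-fire neuron under Euler integration. The parameters below are nominal and this is not NEURON’s API:

```python
# A leaky integrate-and-fire neuron stepped with Euler integration:
# a miniature stand-in for the per-compartment membrane equation that
# simulators like NEURON solve.
dt, tau, v_rest, v_thresh, v_reset = 0.1, 10.0, -65.0, -50.0, -65.0

v = v_rest
spikes = []
for step in range(1000):                     # 100 ms of simulated time
    i_inj = 2.0 if 200 <= step < 800 else 0.0    # injected current pulse
    dv = (-(v - v_rest) + i_inj * 10.0) / tau    # leak + input drive
    v += dt * dv
    if v >= v_thresh:                        # threshold crossing = spike
        spikes.append(step * dt)             # record spike time in ms
        v = v_reset

print(len(spikes))
```

The neuron is silent before and after the current pulse and fires regularly during it; a compartmental simulator couples thousands of such equations across the dendritic tree.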

GENESIS

Generative E-Social Science for Socio-Spatial Simulation (GENESIS) [6]

Generative social science is widely regarded as one of the grand challenges of the social sciences. The term was popularised by Epstein and Axtell of the Brookings Institution in their 1996 book Growing Artificial Societies: Social Science from the Bottom Up, who define it as simulation that “… allows us to grow social structures in silico demonstrating that certain sets of micro-specifications are sufficient to generate the macro-phenomena of interest”. It is consistent with the development of the complexity sciences, with the development of decentralised and distributed agent-based simulation, and with ideas about social and spatial emergence. It requires large-scale databases for its execution as well as powerful techniques of visualisation for its understanding and dissemination. It provides experimental conditions under which key policy initiatives can be tested on large-scale populations simulated at individual level. It is entirely coincident with the development of e-social science, which provides the infrastructure on which such modelling must take place.
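The “grow it in silico” idea can be shown at miniature scale. The deterministic sketch below (a stand-in for a real agent-based model, not the GENESIS software) puts two agent types on a ring under a local majority rule; contiguous clusters, a macro-pattern, emerge from micro-specifications alone:

```python
# Start from a fixed mixed arrangement of two agent types on a ring and
# apply a local majority rule; segregated clusters emerge bottom-up.
cells = list("ABABBABAABABBBABAABA")

def step(cells):
    n = len(cells)
    out = []
    for i in range(n):
        # each cell adopts the majority type of itself and its neighbors
        trio = [cells[i - 1], cells[i], cells[(i + 1) % n]]
        out.append("A" if trio.count("A") >= 2 else "B")
    return out

# Count same-type adjacent pairs (on the ring) as a clustering measure
before = sum(cells[k] == cells[k - 1] for k in range(len(cells)))
for _ in range(5):
    cells = step(cells)
after = sum(cells[k] == cells[k - 1] for k in range(len(cells)))
print("".join(cells), before, after)
```

No agent knows anything about global structure, yet the run reaches a stable, visibly clustered configuration: a micro-specification sufficient to generate the macro-phenomenon.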

In closing, advances in simulation are driven by the continuous interplay of the following:

  • Our sources of inspiration—both internal and external—for the discovery of solutions to practical problems, as well as the theory and methodology required to attack those problems;
  • The intuition we acquire from careful experimentation with well-designed simulation models, from intense scrutiny of the results, and from allowing the unconscious to work on the results;
  • The conscious follow-up work in which the emerging flashes of insight into the problem at hand are expressed precisely, verified completely, and connected to other simulation work.

http://youtu.be/JqMpGrM5ECo

References:

[1] Poincaré, Henri. (1914) 1952. “Mathematical Discovery.” In Science and Method, 46–63. Translated by Francis Maitland, with a preface by Bertrand Russell. London: Thomas Nelson and Sons. Reprint, New York: Dover Publications.

[2] http://www.lib.ncsu.edu/specialcollections/simulation/collections.php

[3] Brette R., Rudolph M., Carnevale T., Hines M., Beeman D., Bower J., et al. (2007). Simulation of networks of spiking neurons: a review of tools and strategies. J. Comput. Neurosci. 23, 349–398. doi: 10.1007/s10827-007-0038-6

[4] Drewes R. (2005). Brainlab: a Toolkit to Aid in the Design, Simulation, and Analysis of Spiking Neural Networks with the NCS Environment. Master’s thesis, University of Nevada, Reno.

[5] Drewes R., Zou Q., Goodman P. (2009). Brainlab: a python toolkit to aid in the design, simulation, and analysis of spiking neural networks with the neocortical simulator. Front. Neuroinform. 3:16. doi: 10.3389/neuro.11.016.2009

[6] http://www.genesis.ucl.ac.uk/

An Overview of Google Analytics

By Eric Cruet

Google Analytics is a free service offered by Google that generates detailed statistics about a website’s visitors and traffic. It shows you who is visiting your website, where they came from, and what they searched for to find you [1].

In addition to traffic breakdown, interpretation of Google Analytics can also show you how visitors are engaging with your site by reporting on key areas such as:

  • A measure of your best content, indicated by the most popular pages on your site
  • Visitor traffic over specified periods of time, giving you a feel for how “sticky” your site is and how many visitors come back for future visits
  • The length of time visitors spend on your site

  • Visitors: characteristics such as browser, new vs. returning user, and originating location
  • Traffic: origins such as keywords, referrers, and pages
  • Content: effectiveness measures such as bounce rate, paths, and navigation summary

To get started all you need is a Google account. Once you go to the site, you sign up and add the name of the website you want to track, and it will generate a tracking ID and the tracking code you will need to add to each page that you want to generate statistics on [2].

This is what the tracking code looks like in my attempt to track visits to my CCTP 903 blog:

<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-41934475-1', 'georgetown.edu');
ga('send', 'pageview');

</script>

This code needs to be pasted into the HTML header of each page you want to track. There is a specific procedure for performing this task within WordPress. Unfortunately, when I attempted to do this, I did not have administrator privileges to copy it to the correct location (I copied it in the wrong place and was not able to get any statistics). Consult the systems administrator of the website you need to monitor.

Once the code is installed, there are numerous variables that can be set up in the dashboard section of the program. Below you will find links to all the measurements available through the Core Reporting API [3]. Use this reference to explore all the dimensions and metrics available. Besides the ability to call these programmatically, the majority can also be set up for real-time monitoring using the dashboard. The categories, each with its own dimensions and metrics, are:

Visitor, Session, Traffic Sources, AdWords, Goal Conversions, Platform / Device, Geo / Network, System, Social Activities, Page Tracking, Internal Search, Site Speed, App Tracking, Event Tracking, Ecommerce, Social Interactions, User Timings, Exception Tracking, Experiments, Custom Variables, and Time.

Google Analytics offers a host of compelling features and benefits for everyone from senior executives and professionals in marketing, advertising and politics, to social media site and content developers. It’s free and easy to get started. If you want to see what Google Analytics can do first-hand, take the (short) tour.

References:
[1] http://www.google.com/analytics/features/index.html
[2] https://www.udemy.com/getting-started-with-google-analytics/
[3] Ledford, J. L., Teixeira, J., & Tyler, M. E. (2010). Google analytics. Wiley.
 

Modeling Neuron Electrokinetics using Markov Models

By Eric Cruet

Ion Channel Kinetics

In continuation of the previous post: the function of neurons in the brain is detection. They receive thousands of different input signals from other neurons and are tuned to detect patterns specific to their function. A simplistic analogy is the thermostat in an oven. When you set the oven to preheat to 350 degrees, the sensor samples the temperature until it reaches the specified threshold, then “fires” a signal and the alarm goes off. In the same fashion, the neuron has a threshold and “fires” a signal to adjacent neurons only if it detects input significant enough to cross that threshold. This signal is known as the action potential or spike; in the diagram above, it is represented by the excitability arrows.

Synapses are the connection points between neurons, dendrites are the “branches” that integrate all the inputs to the neuron, and the part of the axon closest to the output end of the neuron (the axon hillock) is where the threshold activity takes place. The farthest end of the axon branches out and provides inputs to other neurons, completing the next link in the chain of communication. See the image below.

Neuron Cell Structure

The bottom line is the neuron’s fundamental functionality as a detection mechanism: it receives and integrates inputs, determines whether its threshold has been exceeded, and if so triggers an output signal.

Now let’s briefly cover some basic biochemistry, since Markov models are used to simplify and simulate ion channel kinetics. Ion channels are where some of the vital functions involved in triggering the signal occur.

There are three major sources of input signals to the neuron [4]:

    1. Excitatory inputs: these are the more common, prevalent type of input from other neurons (approximately 85% of all inputs). Their effect excites the receiving neuron, making it more likely to exceed its threshold and “fire”, or trigger a signal. These are signalled via a synaptic channel called AMPA, opened by the neurotransmitter glutamate. AMPA receptors are non-selective cationic channels allowing the passage of Na+ and K+, and therefore have an electrical equilibrium potential near 0 mV (millivolts).
    2. Inhibitory inputs: comprising the other 15% of inputs, they have the opposite effect of the excitatory inputs. They make the neuron less likely to fire, which makes the integration of inputs much more robust (by keeping excitation under control). Specialized neurons called inhibitory interneurons accomplish this function in the brain. These inputs are signalled via GABA (gamma-Aminobutyric Acid) synaptic channels, via the GABA neurotransmitter. GABA causes the opening of ion channels that allow the flow of either negatively charged Cl− (chloride) ions into the cell or positively charged K+ (potassium) ions out of the cell.
    3. Leak inputs: technically not considered inputs, since they are always active. However, they are similar to inhibitory inputs in that they counteract excitation and keep the neuron in balance. Their signalling occurs via K+ (potassium) channels.

The interaction between these elements in a cell creates what is known as the membrane potential. Membrane potential (also transmembrane potential or membrane voltage) is the difference in electrical potential between the interior and the exterior of a biological cell. Typical values of resting membrane potential range from –40 mV to –80 mV. These result from differences in the concentration of ions (Na+/K+/Cl−) on opposite sides of the cellular membrane. Please refer to the following picture:

 

So we’ve covered the process by which the neuron “detects” various inputs based on chemistry. These chemical processes generate a difference in potential (measured in millivolts) across the cell membrane. In simplified terms, the rate, direction, and amount of change in this potential is what determines whether a neuron will exceed its threshold. A brief overview of mathematical models for neuron ion channel kinetics follows.
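
The threshold-detection behavior described so far can be caricatured with a leaky integrate-and-fire model, a deliberate simplification that ignores the channel chemistry; the parameter values below are illustrative, not physiological measurements:

```python
# Leaky integrate-and-fire: a deliberately minimal sketch of the
# threshold-detection idea. Parameter values are illustrative only.

def simulate_lif(i_input, t_steps=2000, dt=0.1, v_rest=-70.0,
                 v_thresh=-55.0, v_reset=-75.0, tau=10.0, r_m=10.0):
    """Euler-integrate dV/dt = (-(V - v_rest) + R*I) / tau.

    Returns the times (ms) at which V crossed threshold."""
    v = v_rest
    spikes = []
    for step in range(t_steps):
        v += dt * (-(v - v_rest) + r_m * i_input) / tau
        if v >= v_thresh:         # threshold exceeded: the neuron "fires"
            spikes.append(step * dt)
            v = v_reset           # reset after the action potential
    return spikes

# Input too weak to reach threshold: the neuron stays silent.
weak = simulate_lif(i_input=1.0)    # settles 10 mV above rest, below threshold
# Stronger input: the membrane repeatedly crosses threshold and fires.
strong = simulate_lif(i_input=2.0)  # would settle 20 mV above rest
```

The integrate-then-compare-to-threshold step is the whole point: everything the chemistry accomplishes is compressed into one differential equation and one comparison.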

Hodgkin-Huxley

The first and most widely used neuron model based on Markov-style kinetics was developed from Hodgkin and Huxley’s 1952 work [2] on data from the squid giant axon. We note as before our voltage-current relationship, this time generalized to include multiple voltage-dependent currents:

C_\mathrm{m} \frac{d V(t)}{d t} = -\sum_i I_i (t, V).

Each current is given by Ohm’s law (derived from the basic I = \frac{V}{R}, where I = current, V = voltage, and R = resistance, or 1/g with g = conductance):

I(t,V) = g(t,V)\cdot(V-V_\mathrm{eq})

where g(t,V) is the conductance over time, or inverse resistance, which can be expanded in terms of its constant maximal value \bar{g} and the activation and inactivation fractions m and h, respectively, that determine how many ions can flow through available membrane channels. This expansion is given by

g(t,V)=\bar{g}\cdot m(t,V)^p \cdot h(t,V)^q

and our fractions follow the first-order kinetics

\frac{d m(t,V)}{d t} = \frac{m_\infty(V)-m(t,V)}{\tau_\mathrm{m} (V)} = \alpha_\mathrm{m} (V)\cdot(1-m) - \beta_\mathrm{m} (V)\cdot m

with similar dynamics for h, where we can use either \tau_\mathrm{m} and m_\infty or \alpha and \beta to define our gate fractions.

With such a form, all that remains is to individually investigate each current one wants to include. Typically, these include inward Ca2+ and Na+ input currents and several varieties of K+ outward currents, including a “leak” current. The end result can be, at the small end, some 20 parameters which one must estimate or measure for an accurate model [1]. At the time, such a system could not be computed, and it became the starting point for a subsequent series of studies, all attempting to simplify the neuron model.
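
As a concrete illustration, the original four-equation system can be integrated numerically. The sketch below uses the classic 1952 squid-axon parameter values (with V measured as depolarization from rest) and a simple Euler scheme; the stimulus current and step size are illustrative choices:

```python
import math

# Euler integration of the classic Hodgkin-Huxley (1952) equations.
# V is measured in mV as depolarization from rest; the stimulus current
# and step size below are illustrative choices, not fitted values.

def a_m(V): return 0.1 * (25 - V) / (math.exp((25 - V) / 10) - 1)
def b_m(V): return 4.0 * math.exp(-V / 18)
def a_h(V): return 0.07 * math.exp(-V / 20)
def b_h(V): return 1.0 / (math.exp((30 - V) / 10) + 1)
def a_n(V): return 0.01 * (10 - V) / (math.exp((10 - V) / 10) - 1)
def b_n(V): return 0.125 * math.exp(-V / 80)

def simulate_hh(i_stim=10.0, t_max=50.0, dt=0.01):
    """Integrate the four HH equations; return the voltage trace (mV)."""
    g_na, g_k, g_l = 120.0, 36.0, 0.3    # maximal conductances (mS/cm^2)
    e_na, e_k, e_l = 115.0, -12.0, 10.6  # reversal potentials (mV)
    c_m = 1.0                            # membrane capacitance (uF/cm^2)
    V = 0.0
    # gates start at their resting steady-state values alpha/(alpha+beta)
    m = a_m(V) / (a_m(V) + b_m(V))
    h = a_h(V) / (a_h(V) + b_h(V))
    n = a_n(V) / (a_n(V) + b_n(V))
    trace = []
    for _ in range(int(round(t_max / dt))):
        i_na = g_na * m ** 3 * h * (V - e_na)  # g_bar * m^p * h^q * (V - V_eq)
        i_k = g_k * n ** 4 * (V - e_k)
        i_l = g_l * (V - e_l)
        V += dt * (i_stim - i_na - i_k - i_l) / c_m
        m += dt * (a_m(V) * (1 - m) - b_m(V) * m)
        h += dt * (a_h(V) * (1 - h) - b_h(V) * h)
        n += dt * (a_n(V) * (1 - n) - b_n(V) * n)
        trace.append(V)
    return trace

trace = simulate_hh()  # a sustained 10 uA/cm^2 stimulus elicits spiking
```

Even this, the "small" four-equation version, already requires about a dozen fitted constants; the 20-plus-parameter figure cited above comes from adding further currents.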

In 2008, James P. Keener, a mathematics researcher at the University of Utah, published a paper entitled “Invariant Manifold Reductions for Markovian Ion Channel Dynamics.” He proposed using Markov jump processes to model the transitions between ion channel states [3]. Such Markov models had previously been used in conductance-based models to study the dynamics of electrical activity in nerve cells, cardiac cells and muscle cells.

In summary, what Dr. Keener proved is that the classical Hodgkin-Huxley formulations of potassium and sodium channel conductance are exact solutions of Markov models, something there had been no means of verifying computationally in 1952. This means that the solutions of the Hodgkin-Huxley equations and the solutions of a full Markov model of neuron electrokinetic activity, with an 8-state sodium channel and a 5-state potassium channel (after several milliseconds during which initial transients decay), are exactly the same, even though the former is a system of four differential equations and the latter is a system of 13 differential equations.
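
The potassium half of this equivalence is easy to check numerically: a five-state Markov chain that counts how many of the four gating subunits are open reproduces the Hodgkin-Huxley n^4 conductance fraction. The sketch below holds the voltage fixed (a "voltage clamp"); the rate constants are illustrative:

```python
# Voltage-clamped check of the Markov/Hodgkin-Huxley equivalence for the
# potassium channel: a 5-state chain counts how many of the 4 gating
# subunits are open. Rate constants are illustrative (voltage held fixed).
alpha, beta = 0.1, 0.125
dt, t_max = 0.001, 50.0

n = 0.0                        # HH activation variable, fully closed at t=0
p = [1.0, 0.0, 0.0, 0.0, 0.0]  # occupancy of states with 0..4 open subunits

for _ in range(int(round(t_max / dt))):
    # scalar HH kinetics: dn/dt = alpha*(1 - n) - beta*n
    n += dt * (alpha * (1 - n) - beta * n)
    # master equation of the 5-state chain: with j subunits open, each of
    # the (4 - j) closed ones opens at rate alpha, each open one closes
    # at rate beta
    new_p = p[:]
    for j in range(5):
        new_p[j] -= dt * ((4 - j) * alpha + j * beta) * p[j]
        if j < 4:
            new_p[j + 1] += dt * (4 - j) * alpha * p[j]
        if j > 0:
            new_p[j - 1] += dt * j * beta * p[j]
    p = new_p

# The fully-open state occupancy p[4] tracks the HH fraction n^4.
```

Because the four subunits gate independently, the chain's occupancies stay binomially distributed in n, which is exactly why the one-variable HH description loses nothing.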

There are a lot of pieces to the cognitive neuroscience puzzle, and this is one of many theoretical frameworks for approaching the complex subject of brain function. One drawback of a computational cognitive approach is that it essentially designs the very functionality it tries to explain; it is still very useful, but limited in what it can ultimately explain. It has also been one of the most successful approaches to cognitive function, precisely because it models a system that uses mathematics and logic with those same tools, at a higher level. The ultimate goal is to pique the curiosity of those unfamiliar with the subject and to share what I’ve learned with those who have acquired an interest.

“The larger the island of knowledge, the longer the shoreline of wonder.”

Ralph Washington Sockman (1889 – 1970)

 

References:

[1] Goldwyn, J. H., & Shea-Brown, E. (2011). The what and where of adding channel noise to the Hodgkin-Huxley equations. PLoS Computational Biology, 7(11), e1002247.
[2] Hodgkin, A. L., & Huxley, A. F. (1952). Propagation of electrical signals along giant nerve fibres. Proceedings of the Royal Society of London. Series B, Biological Sciences, 140(899), 177-183.
[3] Keener, J. P. (2009). Invariant manifold reductions for Markovian ion channel dynamics. Journal of Mathematical Biology, 58(3), 447-457.
[4] O'Reilly, R. C., Munakata, Y., Frank, M. J., & Hazy, T. E. (2012). Computational Cognitive Neuroscience. Wiki Book.

Andrey Markov and Stochastic Processes

By Eric Cruet

Many years ago the mathematician Andrey Markov introduced us to a new branch of probability theory by applying mathematics to poetry. Analyzing the text of Alexander Pushkin’s novel in verse Eugene Onegin, Markov spent hours crawling through patterns of vowels and consonants. In 1913, he summarized his findings in an address to the Imperial Academy of Sciences in St. Petersburg. His analysis explained the models he developed—now known as Markov chains—and extended the theory of probability in a new direction. His approach broke new ground because it could be applied to chains of linked events, where the probability of each event depends on the event before it. Often the goal is to infer hidden or earlier events from the probabilities of observed events that might seem unrelated.
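
Markov's vowel-consonant analysis can be reproduced in miniature: classify each letter as vowel or consonant and estimate a two-state transition matrix from the counts. The sample sentence below (from an English translation of Onegin's opening line) stands in for the 20,000 letters Markov actually tallied:

```python
# Estimate a two-state (vowel/consonant) Markov transition matrix from
# text, in the spirit of Markov's 1913 analysis of Eugene Onegin.
# The short sample below is a stand-in for Markov's 20,000-letter count.

def transition_matrix(text):
    letters = [c.lower() for c in text if c.isalpha()]
    states = ['V' if c in 'aeiou' else 'C' for c in letters]
    counts = {'V': {'V': 0, 'C': 0}, 'C': {'V': 0, 'C': 0}}
    for prev, cur in zip(states, states[1:]):
        counts[prev][cur] += 1
    # normalize each row into conditional probabilities P(next | current)
    probs = {}
    for s, row in counts.items():
        total = sum(row.values())
        probs[s] = {t: c / total for t, c in row.items()} if total else row
    return probs

sample = "My uncle, man of firm convictions, by falling gravely ill he's won"
P = transition_matrix(sample)
# P['V']['C'] is the estimated probability that a vowel is followed by
# a consonant; each row of the matrix sums to 1.
```

Markov's point was precisely that these conditional probabilities differ from the unconditional vowel frequency: the next letter depends on the current one.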

Specifically, a hidden Markov model is a hybrid technique consisting of:

  • a machine learning model
  • a discrete hill climbing technique

In computer science, hill climbing is a mathematical optimization technique which belongs to the family of local search. It is an iterative algorithm that starts with an arbitrary solution to a problem, then attempts to find a better solution by incrementally changing a single element of the solution.

A Markov chain (the process underlying a hidden Markov model) is characterized by:

  • A number of states s1, s2, . . . , sN
  • Time proceeding in discrete steps: t = 1, 2, 3, . . .
  • At each time step, the process is in exactly one state
  • Transitions that depend on the current state (a Markov chain of order 0, 1, or 2) and on transition probability matrices
  • The “memoryless” property: the probability of the next state depends only on the current state, carrying no information (i.e. “memory”) about earlier states (a property characteristic of the exponential and geometric distributions)

THE THREE TRADITIONAL PROBLEMS

Problem 1: Given a model λ = (A,B,π) and an observation sequence O, find P(O|λ)

That is, we can score an observation sequence to see how well it fits a given model

Problem 2: Given λ = (A,B,π) and O, find an optimal state sequence or uncover hidden part

Problem 3: Given O, N, and M, find the model λ that maximizes probability of O

That is, train a model to fit observations

Hidden Markov Models in Practice:

Problem 1: Score an observation sequence versus a given model

SOLUTION 1: BRUTE FORCE ALGORITHM: COSTLY!

  • Instead of brute force, use the forward algorithm, or “alpha pass”
  • For t = 0,1,…,T-1 and i = 0,1,…,N-1, let αt(i) = P(O0,O1,…,Ot, xt=qi|λ)
  • αt(i) is the probability of the partial observation sequence up to step t, with the Markov process in state qi at step t
  • This looks complicated, but it can be computed recursively and efficiently
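
The alpha pass can be sketched in a few lines and checked against brute-force enumeration over every hidden state sequence; the toy model below (two hidden states, two observation symbols, invented probabilities) is purely illustrative:

```python
from itertools import product

# Forward algorithm ("alpha pass") for P(O | lambda) on a toy HMM.
# The parameters A, B, pi are invented for illustration.
A  = [[0.7, 0.3], [0.4, 0.6]]   # state transition probabilities
B  = [[0.9, 0.1], [0.2, 0.8]]   # observation probabilities per state
pi = [0.6, 0.4]                 # initial state distribution
N  = 2

def forward(obs):
    """alpha_t(i) = P(O_0..O_t, x_t = q_i | lambda), built recursively."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(N)) * B[i][o]
                 for i in range(N)]
    return sum(alpha)           # P(O | lambda)

def brute_force(obs):
    """Sum P(O, X) over every possible hidden state sequence X."""
    total = 0.0
    for states in product(range(N), repeat=len(obs)):
        prob = pi[states[0]] * B[states[0]][obs[0]]
        for t in range(1, len(obs)):
            prob *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
        total += prob
    return total

obs = [0, 1, 1, 0]
score = forward(obs)
```

The recursion costs O(N²T) operations instead of the O(Nᵀ) of enumeration, which is exactly why the brute-force solution is labeled costly above.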

Problem 2: Given a model, “uncover” hidden part

  • Solution 2: Given a model, find “most likely” hidden states: Given λ = (A,B,π) and O, find an optimal state sequence
  • Recall that optimal means “maximize expected number of correct states”
  • A better way: backward algorithm – Or “beta pass”

Problem 3 (find N by trial and error)

  • Given an observation sequence;
  • Assume a (hidden) Markov process exists;
  • Train a model based on observations;
  • Then given a sequence of observations, score it versus the model
  • This is machine learning mode, the most efficient approach: an algorithm trains the learner, which is then put to use!

 

Next week: An example!

If you’re really interested: Google uses HMMs (Hidden Markov Models) to fine-tune its PageRank algorithm:

 

References:

A revealing introduction to hidden Markov models, by M. Stamp. http://www.cs.sjsu.edu/faculty/stamp/RUA/HMM.pdf

A tutorial on hidden Markov models and selected applications in speech recognition, by L.R. Rabiner. http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf

Hunting for metamorphic engines, W. Wong and M. Stamp Journal in Computer Virology, Vol. 2, No. 3, December 2006, pp. 211-229

Hunting for undetectable metamorphic viruses, D. Lin and M. Stamp Journal in Computer Virology, Vol. 7, No. 3, August 2011, pp. 201-214 

——————

Wow, really fascinating, and also behind Kurzweil’s NLP and text-to-speech processing.

Now, can Markov models be used to model semiosis and semantics? Hard problem with too many variables? Variabilities in observers’ knowledge, competence, encyclopedic access? Yet the vast majority of meanings that people assert or use for any culturally meaningful expression will be similar and overlapping, not random (following principles of intersubjectivity in symbolic cognition). Meaning generation is not random, but also not specifically predictable; there are rules and constraints (which make it generative) but also unbounded (infinite productivity from finite means). Can our cognitive processes that produce meaning over some time dimension be modelled on Markov chains?

–MI

What exactly is data mining?

by Eric Cruet

One of my pet peeves has always been to never use a technological acronym (or, for that matter, any acronym) without knowing what the letters mean. As of late, phrases such as big data and data mining have become part of the technology lingo. But what do these terms really mean?

BIG Data

In a 2001 research report [1] and related lectures, the META Group (now Gartner) analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional.  IBM [2] later modified the definition to include a fourth dimension, veracity.  The “4Vs” definition states that:

Big data is high-volume, high-velocity, high-variety information assets, of sometimes questionable veracity, that demand cost-effective, innovative forms of information processing for enhanced insight, trust, data assurance and decision making. Some examples:

High Volume: As of 2012, limits on the size of data sets that are feasible to process in a reasonable amount of time were on the order of exabytes. Applications in the areas of meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research, as well as Internet search, finance and business informatics, easily amass terabytes—even petabytes—of information. The huge volumes of data present limits because the datasets cannot be processed in a reasonable amount of time. For instance:

  • Turning 12 terabytes of Tweets created each day into improved product sentiment analysis
  • Converting 350 billion annual meter readings to better predict power consumption

High Velocity: Big data refers not only to huge datasets, but also to voluminous amounts of data in streams.

The difference between a stream and a database is that the data in a stream is lost if you do not do something about it immediately.

For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to process it efficiently.  Real scenarios include:

  • Scrutinize 5 million trade events created each day to identify potential fraud
  • Analyze 500 million daily call detail records in real-time to predict customer churn faster
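
One standard trick for acting on a stream you cannot store is reservoir sampling: maintain a fixed-size uniform sample of everything seen so far, touching each item exactly once. A sketch using the classic Algorithm R, with an invented stream of "trade events" standing in for real data:

```python
import random

# Reservoir sampling (Algorithm R): keep a uniform random sample of k
# items from a stream of unknown length, seeing each item exactly once.
# The "trade events" stream below is an invented stand-in.

def reservoir_sample(stream, k, rng):
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir first
        else:
            j = rng.randint(0, i)    # item survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

rng = random.Random(42)              # seeded for reproducibility
events = (f"trade-{i}" for i in range(100_000))
sample = reservoir_sample(events, 1000, rng)
# `sample` is a uniform random subset of 1,000 of the 100,000 events,
# obtained without ever holding the full stream in memory.
```

The same one-pass discipline underlies the fraud-detection and churn-prediction scenarios above: whatever statistic you need must be computable as the data flows past.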

High Variety: Big data is any type of data – structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. Interesting and unexpected patterns are found when analyzing these data types together:

  • Monitor hundreds of live video feeds from surveillance cameras to target points of interest
  • Exploit the 80% data growth in images, video and documents to improve customer satisfaction

Questionable Veracity: With security threats on the rise, governments, scientists and businesses have trust issues in the information they use to make decisions. How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grows.

Data Mining 

When you mention data mining, images of digging deep into some endless pit of data come to some people’s minds. In practice, data mining focuses on amounts of data larger than what can fit into the system’s main memory. Programmatically, the treatment of this information requires specific algorithms to meet processing requirements of CPU, time, memory, and resource allocation.

Depending on who you ask, you will get a different definition of data mining.  I prefer to take an algorithmic point of view: data mining is about applying algorithms to data.  But a widely accepted definition is that it is the process of discovery for “models” that fit the data. 

Statisticians were the first to use the term “data mining” [3]. Originally, “data mining” or “data dredging” was a derogatory term referring to attempts to extract information that was not supported by the data. Today, however, “data mining” has taken on a positive meaning. Statisticians now view data mining as the construction of a statistical model, that is, an underlying distribution from which the visible data is drawn.
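
In the statisticians' sense, "constructing a model" can be as simple as estimating the parameters of an assumed underlying distribution. A sketch with a Gaussian model, using synthetic data so the fit can be checked against a known truth:

```python
import math
import random

# "Finding a model that fits": maximum-likelihood estimates for an
# assumed Gaussian distribution. The data set is synthetic, so the
# recovered parameters can be compared with the known true values.

def fit_gaussian(data):
    """Return the MLE mean and (biased) standard deviation."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma

rng = random.Random(0)                            # seeded for reproducibility
data = [rng.gauss(5.0, 2.0) for _ in range(10_000)]
mu, sigma = fit_gaussian(data)
# mu and sigma should land close to the true parameters 5.0 and 2.0
```

The "model" the data miner reports is just the fitted pair (mu, sigma), a drastic compression of the ten thousand observations it summarizes.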

In closing, data mining is more about the method of finding a model that “fits” the data than about digging through data in the hopes of finding something you’re not sure you’re looking for. It is not synonymous with machine learning. Some data mining appropriately uses algorithms from machine learning, which makes particularly good sense when we are not sure what we are looking for in the data. In the next post we will cover the specifics of a well-established statistical model.

References:

[1] Laney, Douglas (2001). “3D Data Management: Controlling Data Volume, Velocity and Variety”. META Group (now Gartner), 6 February 2001.

[2] Zikopoulos, P. (2013). IBM: The Big Data Platform (pp. 34-58). McGraw Hill.

[3] Rajaraman, A., & Ullman, J. D. (2011). Mining of massive datasets. Cambridge University Press.

Hierarchy of Patterns and Biological Hierarchies

By Eric Cruet

In William Gibson’s seventh novel, “Pattern Recognition”, the main character (Cayce Pollard) is a legend in the field of market research.  She is paid handsomely to recognize cultural and social patterns that corporations can turn into cash.  The truth, according to her friends, is that her sensitivity is closer to allergy, a morbid and sometimes violent reactivity to the symbols of the marketplace. Hired by Blue Ant, the world’s hippest ad agency, for the sort of high-corporate re-branding she’s known for, a more intriguing project emerges when the head of the firm asks her to determine who’s producing a mysterious series of video fragments that have gripped the imaginations of people around the world. The source of this footage, carefully concealed, has so far proven untraceable.  But what if the sense of purpose and meaning that she and others perceive in the footage is only an illusion — in other words, faulty pattern recognition? 

In Ray Kurzweil’s new book “How to Create a Mind: The Secret of Human Thought Revealed”, Kurzweil argues that the underlying principles and neural networks that are responsible for higher-order thinking are actually relatively simple, consisting of hierarchies of pattern recognition modules which make up the neocortex.  He states that many machines running current AI (Artificial Intelligence) software perform these same functions using similar principles and imitating the same neuro-structures that are present in the human brain.

Recent discoveries in neuroscience seem to confirm a subset of his Pattern Recognition Theory of Mind, or PRTM. Operating on pattern-matching principles, the neocortex is hierarchical in its processing of a particular input, such as the letters in a word. It is also redundant and massively parallel, mapping inputs onto a hierarchy of concepts: when it processes the letters in the word “apple”, it handles differences in writing styles, the spoken word “apple”, variations in accents, different perspectives (on a tree, on a teacher’s desk), shadings, shapes, and varieties.

The human neocortex is capable of a very wide range of very complex abilities, yet the underlying structures and principles that are responsible for these abilities are very simple and straightforward, according to Dr. Kurzweil.  For example, he describes the architecture of the pattern recognition module and its operation.  Each module stores a “weight” for each input dendrite indicating how important that input is to the recognition.  But a good question would be: by what mechanism are these weights assigned?  In addition, he compares the successful recognition of a pattern by its corresponding module to the way NLP (Natural Language Processing) software encodes characteristics of time and space to recognize words with same letters but different pronunciation.  How does this work for the successful recognition of levels of attractiveness, joy, embarrassment (and the resulting blushing reflex)?
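
Kurzweil's module can be caricatured as a weighted threshold unit, and one textbook answer to the weight-assignment question is an error-driven update rule. The perceptron rule below is a standard illustration of that idea, not the mechanism Kurzweil actually describes:

```python
# A toy weighted-threshold "recognizer" with perceptron-style learning.
# This is a standard textbook rule used purely as an illustration; it
# is not Kurzweil's proposed mechanism for assigning dendrite weights.

def recognize(weights, bias, inputs):
    """Fire (return 1) when the weighted input sum crosses the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def train(samples, labels, epochs=20, lr=0.1):
    """Nudge each weight in the direction that reduces recognition errors."""
    n = len(samples[0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - recognize(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Learn to fire only when both input features are present (logical AND).
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels  = [0, 0, 0, 1]
weights, bias = train(samples, labels)
```

The point of the sketch is the question it begs: the weights here are set by supervised error feedback, and whether anything analogous supplies the "target" signal in cortex is exactly what the paragraph above is asking.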

Although the book legitimately addresses brain processing functions that are computationally similar to state-of-the-art AI mathematical modeling and learning, neuroscientists and cognitive scientists need to think about the control and interface mechanisms between the neocortex and the other major brain components (thalamus, brainstem). In closing, one key postulate from the text is the mapping between the functional hierarchy of patterns processed by the pattern recognition modules and the biological hierarchy of cortical columns in the neocortex.

From a systems perspective, the task at hand is a large multivariate problem that probabilistically challenges whether a complete brain could ever be created to operate the same way. But it’s a hell of a start…

The fundamental uniformity of the neocortex (see above) was reconfirmed in a recent study using the latest in brain scanning technology (loc. 1199). The lead scientist in this study, Harvard neuroscientist and physicist Van J. Wedeen, explains the findings thus: “Using magnetic resonance imaging… what we found was that rather than being haphazardly arranged or independent pathways, we find that all of the pathways of the brain taken together fit together in a single exceedingly simple structure. They basically look like a cube. They basically run in three perpendicular directions, and in each one of those three directions the pathways are highly parallel to each other and arranged in arrays. So, instead of independent spaghettis, we see that the connectivity of the brain is, in a sense, a single coherent structure” (loc. 1212).

 

References:
Grinvald, A., & Hildesheim, R. (2004). VSDI: a new era in functional imaging of cortical dynamics. Nature Reviews Neuroscience, 5(11), 874-885.
Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford University Press, USA.
Joseph, R. (2011). Neuroscience: Neuropsychology, Neuropsychiatry, Behavioral Neurology, Brain & Mind: Primer.
Kurzweil, R. (2012). How to Create a Mind: The Secret of Human Thought Revealed. Viking Adult.

Culturomics – Think Outside the Box

by Eric Cruet

In 2011, a group of scientists — mostly in mathematics and evolutionary psychology — published an article in Science titled “Quantitative Analysis of Culture Using Millions of Digitized Books”. The authors’ technique, called “culturomics,” would “extend the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.” The authors employed a “corpus” of more than 5 million books — 500 billion words — that have been scanned by Google as part of the Google Books project. These books, the authors assert, represent about 4 percent of all the books ever published, and will allow the kind of statistically significant analysis common to many sciences.

Their main method of analysis is to count the number of times a particular word or phrase (referred to as an n-gram) occurs over time in the corpus (try your own hand at n-grams here). A ‘one-gram’ plots the frequency of a single word such as “chided” over time; a ‘two-gram’ shows the frequency of a contiguous phrase, such as ‘touch base’ (see ‘Think outside the box’).

Their full data set includes over 2 billion such “culturomic trajectories”. One example the authors give is tracing the usage of the year “1865”. They note that “1865” was not discussed much before the actual year 1865, that it appeared a lot in 1865, and that its usage dropped off afterward. They call this evidence of collective memory. Below is another example.
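
The counting behind such a trajectory is straightforward: for each year's slice of the corpus, count how often the n-gram occurs, normalized by the slice's size. A toy sketch with an invented per-year corpus (the real computation runs over billions of words):

```python
# Frequency-per-year counting behind an n-gram "trajectory".
# The tiny per-year corpus below is invented purely for illustration.
corpus = {
    1864: "the war went on and the year passed slowly",
    1865: "in 1865 the war ended and 1865 was remembered in 1865",
    1866: "after 1865 the country rebuilt and moved on",
    1867: "the rebuilding continued and people looked ahead",
}

def trajectory(ngram, corpus):
    """Relative frequency of a one-word n-gram in each year's text."""
    result = {}
    for year, text in sorted(corpus.items()):
        words = text.split()
        result[year] = words.count(ngram) / len(words)
    return result

traj = trajectory("1865", corpus)
# Usage of "1865" peaks in the year 1865 itself and decays afterward,
# the collective-memory pattern described above.
```

Scaled up to the 500-billion-word corpus, this per-year relative frequency is essentially what the n-grams viewer plots.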

Google unveiled the tool on 16 December 2010. One of the first notable discoveries was made by two Harvard postdocs, Erez Lieberman Aiden and Jean-Baptiste Michel, also members of the team that published the original paper in Science. When comparing German and English texts from the first half of the twentieth century, they discovered that the Nazi regime suppressed mention of the Jewish artist Marc Chagall, and that the n-grams tool could be used to identify artists, writers or activists whose suppression had hitherto been unknown. They called their approach culturomics, a reference to the genomics-like scale of the literary corpus. The term has evolved into a new scientific discipline of the digital humanities — the use of computer algorithms to search for meaning in large databases of text and media.

In the first 24 hours after its launch, the n-grams viewer (ngrams.googlelabs.com) received more than one million hits.  Dan Cohen, director of the Roy Rosenzweig Center for History and New Media at George Mason University in Fairfax, Virginia, calls the tool a “gateway drug” for the digital humanities, a field that has been gaining pace and funding in the past few years (see ‘A discipline goes digital’).  The name is an umbrella term for approaches that include not just the assembly of large-scale databases of media and other cultural data, but also the willingness of humanities scholars to develop the algorithms to engage with them.  

However, some scholars have deep reservations about the digital humanities movement as a whole — especially if it comes at the expense of traditional approaches. Humanities researchers from traditional camps also complain that their field can never be encapsulated by the frequency charts of words and phrases produced by an n-grams tool. Comparing the contribution of books in the context of the cultural encyclopedia to the corresponding DNA strands of human experience is a dangerous proposition… or just a culturally posthumanist one?

Culturomics 2.0 at TEDx

 

References:
Michel, J. B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., Pickett, J. P., … & Aiden, E. L. (2011). Quantitative analysis of culture using millions of digitized books. Science, 331(6014), 176-182.
Lin, Y., Michel, J. B., Aiden, E. L., Orwant, J., Brockman, W., & Petrov, S. (2012, July). Syntactic annotations for the Google Books Ngram corpus. In Proceedings of the ACL 2012 System Demonstrations (pp. 169-174). Association for Computational Linguistics.
Aiden, L. (2011). Google Books, Wikipedia, and the future of culturomics.

 

Advances in Building the Human Brain

By Eric Cruet

“The making of a synthetic brain requires now little more than time and labour….Such a machine might be used in the distant future…..to explore regions of intellectual subtlety and complexity at present beyond the human powers…..How will it end?  I suggest that the simple way to find out is to make the thing and see.”

Ross Ashby, Design for a Brain (1948, 382-83)

The human brain is exceedingly complex, and studying it means gathering information across a range of levels, from molecular processes to behavior. The sheer breadth of this undertaking has perhaps led to an increased specialization of brain research.  One area of specialization that has gathered steam recently is the modeling of the brain in silicon.  However, even allowing for computing’s exponential growth in processing power, today’s machines are still unimpressive compared with the “specifications” of the human brain.

The average human brain packs a hundred billion or so neurons − connected by a quadrillion (10^15) constantly changing synapses − into a space the size of a honeydew melon.  It consumes a measly 20 watts, about what one compact fluorescent light bulb (CFL) uses.  Replicating this awesome wetware with traditional digital circuits would require a supercomputer 1000 times more powerful than those currently available.  It would also require a nuclear power plant to run it.

Fortunately, the types of circuits needed to model the brain are not necessarily digital.  Several projects around the world are currently building brain models that use specialized analog circuits.  Unlike traditional digital circuits in today’s computers, which can take weeks or even months to model a single second of brain operation, these analog circuits can duplicate brain activity as fast as or even faster than it really occurs, while consuming a fraction of the power.  The drawback of analog chips is that they aren’t very programmable.  This makes it difficult to change the model, which is a requirement, since initially it is not known what level of biological detail is needed to simulate brain behavior.

In the race to build the first low power, large scale, digital model of the brain, the leading research effort is dubbed SpiNNaker (Spiking Neural Network Architecture), a project collaboration between the following universities and industrial partners:

  • University of Manchester
  • University of Southampton
  • University of Cambridge
  • University of Sheffield
  • ARM Ltd
  • Silistix Ltd
  • Thales

The design of this machine looks a lot like a conventional parallel processor, but it significantly changes the way the chips intercommunicate.  Traditional CMOS (digital) chips were not invented with parallel computing in mind, which is how our brains operate.  The logic gates in silicon chips usually connect to relatively few other devices, whereas neurons in the brain receive signals from hundreds of thousands of other neurons.  In addition, neurons are always in a “ready” state and respond immediately upon receiving a signal, while silicon chips rely on clocking to advance computation in discrete time steps, which consumes a lot of power.  Also, the connections between CMOS-based processors are fixed, whereas the synapses that connect neurons are always in flux.

One way to speed things up is custom analog circuits that directly replicate brain operation.  Some of the chips under development can run 10,000 times faster than their corresponding part of the brain while being energy efficient.  But as we mentioned previously, as speedy and efficient as they can be, they are not very flexible.
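Whether realized in analog silicon or on digital cores, the dynamics being replicated can be sketched with a minimal leaky integrate-and-fire neuron. The threshold and leak values below are illustrative assumptions, not parameters from SpiNNaker or any of the analog projects:

```python
def lif_spikes(currents, v_thresh=1.0, leak=0.9):
    """Discrete-time leaky integrate-and-fire neuron.
    Each step, the membrane potential decays by `leak`, integrates the
    input current, and emits a spike (then resets to zero) whenever it
    crosses `v_thresh`."""
    v, spikes = 0.0, []
    for t, i in enumerate(currents):
        v = leak * v + i
        if v >= v_thresh:
            spikes.append(t)
            v = 0.0
    return spikes

# A constant sub-threshold input still spikes once enough charge integrates
spike_times = lif_spikes([0.5] * 5)
```

An analog circuit computes this integration continuously in device physics; a digital simulator must step through it, which is where the speed and power gap discussed above comes from.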

The basic building block of the SpiNNaker machine is a multicore System-on-Chip (see below). The chip is a Globally Asynchronous Locally Synchronous (GALS) system with 18 ARM968 processor nodes residing in synchronous islands, surrounded by a lightweight, packet-switched asynchronous communications infrastructure.  Further details are available on the SpiNNaker Project website.

The figure below shows that each SpiNNaker chip contains two silicon dies: the SpiNNaker die itself and a 128 MByte SDRAM (Synchronous Dynamic Random Access Memory) die, which is physically mounted on top of the SpiNNaker die and stitch-bonded to it.

The micro-architecture assumes that processors are ‘free’: the real cost of computing is energy. This is why the design uses energy-efficient ARM9 embedded processors and Mobile DDR (Double Data Rate) SDRAM, in both cases sacrificing some performance for greatly enhanced power efficiency.  These are the same types of chips found in today’s mobile electronics.

It is obvious that although great strides are being made in developing a “digital” brain, simply “building” a brain from the bottom up by replicating its parts, connections, and organization fails to capture its essential function—complex behavior. Instead, just as engineers can only construct cars and computers because they know how they work, we will only be able to construct a brain if we know how it works—that is, if we understand the biological and computational details that are carried out in individual brain areas, and how these details are implemented on the level of neural networks.

 

References:

http://apt.cs.man.ac.uk/projects/SpiNNaker/project/
Eliasmith, C., Stewart, T. C., Choo, X., Bekolay, T., DeWolf, T., Tang, C., & Rasmussen, D. (2012). A large-scale model of the functioning brain. Science, 338(6111), 1202-1205.
Pickering, A. (2010). The cybernetic brain: sketches of another future. University of Chicago Press.
Price, D., Jarman, A. P., Mason, J. O., & Kind, P. C. (2011). Building brains: An introduction to neural development. Wiley.

 

The Quantitative Mapping of Change in Science

By Eric Cruet

As a follow up to last week’s post (re-posted below), we will consider an application where the quantitative method described [1] will be used to map changes in the sciences.

The volume of scientific research in this century has become vast and complex.  Its ever-increasing size and specialized nature make it difficult for any group of experts to fully and fairly evaluate the bewildering array of material, both accomplished and proposed.

Therefore, a library faced with collection decisions, a foundation making funding choices, or a government office weighing national research needs must rely on expert analysis of scientific research performance. 

One approach is bibliometrics, which uses quantitative analysis and statistics to find patterns of publication within a given field or body of literature.

Through cumulative cycles of modeling and experimentation, scientific research undergoes constant change: scientists self-organize into fields that grow and shrink, merge and split. Citation patterns among scientific journals allow us to track this flow of ideas and how the flow of ideas changes over time [2].

For the purposes of this simplified example [3], the citation data is mined from Thomson Reuters’ Journal Citation Reports for 1997–2007, which aggregate, at the journal level, approximately 35,000,000 citations from more than 7,000 journals over the past decade.  Citations are included from articles published in a given year referencing articles published in the previous two years [7].

Method

  1. We first cluster the networks with the information-theoretic clustering method presented in the previous post [1], which can reveal regularities of information flow across directed and weighted networks.  The method will be applied to the pre-mined citation data.
  2. With appropriate modifications, the described method of bootstrap resampling accompanied by significance clustering is general and works for any type of network and any clustering algorithm. 
  3. To assess the accuracy of a clustering, we resample a large number (n > 1000) of bootstrap networks from the original network [7].  For the directed and weighted citation network of science, in which journals correspond to nodes and citations to directed and weighted links, we treat the citations as independent events and resample the weight of each link from a Poisson distribution with the link weight in the original network as mean. This parametric resampling of citations approximates a non-parametric resampling of articles, which makes no assumption about the underlying distribution.  For scalar summary statistics, it is straightforward to assign a 95% bootstrap confidence interval as spanning the 2.5th and 97.5th percentiles of the bootstrap distribution [4], but different data sets and clusters may require a different approach [5].
  4. To identify the journals that are significantly associated with the clusters to which they are assigned, we use simulated annealing to search for the largest subset of journals within each cluster of the original network that are clustered together in at least 95% of all bootstrap networks. To identify the clusters that are significantly distinct from all other clusters, we search for clusters whose significant subset is clustered with no other cluster’s significant subset in at least 95% of all bootstrap networks [7].  Figure 1 below shows this technique applied to a network at two different time points:
  5. Once we have a significance cluster for the network at each time, we want to reveal the trends in the data by simplifying and highlighting the structural changes between clusters. The bottom of Figure 1 shows how to construct an alluvial diagram of the example networks that highlights and summarizes the structural differences between the time 1 and time 2 significance clusters. Each cluster in the network is represented by an equivalently colored block in the alluvial diagram. Darker colors represent nodes that have statistical significance, while lighter colors represent non-significant assignments. Changes in the clustering structure from one time period to the next are represented by the mergers and divergences that occur in the ribbons linking the blocks at time 1 and time 2.

    Diagram from: [7] Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0008694

  6. The resulting alluvial diagram for the actual data (above) illustrates, for example, how over the years 2001–2005, urology gradually splits off from oncology and how the field of infectious diseases becomes a unique discipline, instead of a subset of medicine, in 2003. But these changes are just two of many over this period. In the same diagram, we also highlight the biggest structural change in scientific citation patterns over the past decade: the transformation of neuroscience from interdisciplinary specialty to a mature and stand-alone discipline, comparable to physics or chemistry, economics or law, molecular biology or medicine [7].
  7. In their citation behavior, neuroscientists have finally cleaved from their traditional disciplines and united to form what is now the fifth largest field in the sciences (after molecular and cell biology, physics, chemistry, and medicine). Although this interdisciplinary integration has been ongoing since the 1950s [6], only in the last decade has this change come to dominate the citation structure of the field and overwhelm the intellectual ties along traditional departmental lines.
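The parametric resampling in step 3 can be sketched in a few lines of Python. The journal names and link weights are invented for illustration, and the Poisson draw is a simple Knuth-style sampler rather than anything from the authors' toolchain:

```python
import math
import random

def resample_network(weights, rng):
    """Parametric bootstrap of a weighted network: redraw each link
    weight from a Poisson distribution whose mean is the observed
    weight, approximating a non-parametric resampling of citations."""
    def poisson(lam):
        # Knuth's multiplication method; adequate for modest link weights
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            k += 1
            p *= rng.random()
            if p <= limit:
                return k - 1
    return {link: poisson(w) for link, w in weights.items()}

# Hypothetical journal-to-journal citation counts
original = {("jrnl_A", "jrnl_B"): 30.0, ("jrnl_B", "jrnl_C"): 5.0}
boot = resample_network(original, random.Random(42))
```

Repeating this draw n > 1000 times yields the ensemble of bootstrap networks that the clustering and significance steps then operate on.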

References: 

Credit for this research belongs to the work performed in [7] Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694.

[1] https://blogs.commons.georgetown.edu/cctp-903-summer2013/2013/05/23/quantitative-mapping-of-change/
[2] de Solla Price DJ (1965) Networks of scientific papers. Science 149: 510–515. doi:10.1126/science.149.3683.510.
[3] Heimeriks, G., Hoerlesberger, M., & Van den Besselaar, P. (2003). Mapping communication and collaboration in heterogeneous research networks. Scientometrics, 58(2), 391-413.
[4] Costenbader E, Valente T (2003) The stability of centrality measures when networks are sampled. Soc Networks 25: 283–307. doi: 10.1016/S0378-8733(03)00012-1.
[5] Hastie, T., Tibshirani, R., & Friedman, J. H. (2001). The elements of statistical learning (Vol. 1). New York: Springer.
[6] Gfeller, D., Chappelier, J. C., & De Los Rios, P. (2005). Finding instabilities in the community structure of complex networks. Physical Review E, 72(5), 056135.

[7] Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0008694

A method for the quantitative mapping of change

By Eric Cruet

The problem of change [1].  Much of mankind’s preoccupation has been with change in the sciences, technology, sociology, and economics.  More recently, we seem concerned with variations in climate, the state of global finance, the effect of technology on society, and the increasing use of unlawful violence intended to coerce or intimidate governments or societies, i.e., terrorism.

Traditionally, network, graph, and cluster analysis are the mathematical tools used to understand specific instances of the data generated by these scenarios at a given point in time. But without methods to distinguish real patterns from statistical error, which can be significant in large data sets, these approaches may not be ideal for studying change.  By resampling the link weights of individual networks, we can distinguish meaningful structural differences from random fluctuations [3].

Alternatively, a bootstrap technique [2] can be used when there are multiple networks, arriving at an accurate estimate by resampling the empirical distribution of observations for each network.  In the case of a single network, resampling can be accomplished by using a parametric model to fit the link weights without undermining the individual characteristics of the nodes.  Using this technique, we can determine cluster significance and also estimate the accuracy of the summary statistics (μ, σ, ρ) based on the proportion of bootstrap networks that support the observation.

Statistically:

 

Diagrammatically:

The standard procedure for clustering a network is to minimize an objective function over possible partitions (left side of diagram).  By resampling the weighted links of the original network, a bootstrap world of resampled networks is created.  These are then clustered and compared to the clustering of the original network (2nd row, right side).  This provides an estimate of the probability that a node belongs to a specific cluster.  The result is a “significance clustering” [3].  For example, in the diagram above, the darker nodes (bottom of the diagram) are clustered together in at least 95% of the 1000 bootstrap networks.  Several public-domain algorithms exist to automate the majority of these tasks.
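The core of the significance check can be sketched without the full search machinery: given a collection of bootstrap partitions, count how often a node co-clusters with its original cluster-mates. The node names and the three toy partitions below are illustrative only:

```python
def cocluster_rate(node, cluster_mates, bootstrap_partitions):
    """Fraction of bootstrap partitions in which `node` lands in the
    same cluster as every one of its original cluster-mates."""
    hits = 0
    for part in bootstrap_partitions:
        label = part[node]
        if all(part[mate] == label for mate in cluster_mates):
            hits += 1
    return hits / len(bootstrap_partitions)

# Three bootstrap clusterings (node -> cluster label), invented for illustration
parts = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 1},
]
rate_ab = cocluster_rate("a", ["b"], parts)
```

Nodes whose rate meets the 95% threshold across the bootstrap ensemble would be the "darker" significant nodes in the diagram; the published method additionally uses simulated annealing to find the largest such subset per cluster.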

Finally, once a significance cluster has been generated for the network at each point in time, an alluvial diagram is used to reveal the trends in the data. An alluvial diagram (bottom of the picture) orders the clusters by size and reveals changes in network structure over time [3]. Please refer to the diagram below:

As you can see from the alluvial diagram, between time 1 and time 2 the cluster represented by ORANGE merged with the cluster represented by PINK.  This merger was the result of some underlying change that was not obvious at time 1.  The bootstrap and cluster analysis thus allowed the change to be mapped quantitatively.

The model can be used in a variety of scenarios: to map changes in global weather patterns, US migration flows from state to state based on various factors (employment, housing prices, education, income per capita), or variations in the federal funds market in response to major events [3], and to track global targets of terrorist activity.

But my main area of interest is illustrating the method by applying it to map change in the structure of science [4].  Stay tuned.  I conclude with a rather lengthy but appropriate and relevant quote.

From Michel Foucault’s “The Order of Things”

The problem of change.  It has been said that this work denies the very possibility of change. And yet my main concern has been with changes. In fact, two things in particular struck me: the suddenness and thoroughness with which certain sciences were sometimes reorganized; and the fact that at the same time similar changes occurred in apparently very different disciplines. Within a few years (around 1800), the tradition of general grammar was replaced by an essentially historical philology; natural classifications were ordered according to the analyses of comparative anatomy; and a political economy was founded whose main themes were labour and production. Confronted by such a curious combination of phenomena, it occurred to me that these changes should be examined more closely, without being reduced, in the name of continuity, in either abruptness or scope. It seemed to me at the outset that different kinds of change were taking place in scientific discourse – changes that did not occur at the same level, proceed at the same pace, or obey the same laws; the way in which, within a particular science, new propositions were produced, new facts isolated, or new concepts built up (the events that make up the everyday life of a science) did not, in all probability, follow the same model as the appearance of new fields of study (and the frequently corresponding disappearance of old ones); but the appearance of new fields of study must not, in turn, be confused with those overall redistributions that alter not only the general form of a science, but also its relations with other areas of knowledge.
It seemed to me, therefore, that all these changes should not be treated at the same level, or be made to culminate at a single point, as is sometimes done, or be attributed to the genius of an individual, or a new collective spirit, or even to the fecundity of a single discovery; that it would be better to respect such differences, and even to try to grasp them in their specificity. In this way I tried to describe the combination of corresponding transformations that characterized the appearance of biology, political economy, philology, a number of human sciences, and a new type of philosophy, at the threshold of the nineteenth century.

References: 

[1] Foucault, M. (2002). The order of things. Routledge.

[2] Palla, G., Derényi, I., Farkas, I., & Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature,435(7043), 814-818.

[3] Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PloS one5(1), e8694.

[4] de Solla Price DJ (1965) Networks of scientific papers. Science 149: 510–515. doi:10.1126/science.149.3683.510.

Note: Dragon Dictate is used as a speech to text transcriber for a portion of this document.  Although I make every effort to proofread the postings, any unusual syntax, lexicon or semantic error in language is attributed to my lack of attention and the immaturity of this technology.

Sentiment Analysis/Appraisal Theory

By Eric Cruet

Opinions are like elbows; everyone has two of them for every topic.  Scientifically, however, opinions are very difficult to examine.  Of late, the computational linguistics community has recognized the value of extracting, mining, and analyzing opinions from the bulk text found on social media sites.  Sentiment analysis is the task of having computers use machine learning algorithms to perform such tasks automatically, attempting to classify the opinions into “emotions”.

Computational approaches to sentiment analysis focus on extracting the affective content of text by detecting expressions of sentiment.  Each expression is assigned a value representing a positive, negative, or neutral sentiment towards a specific issue.  For example, using information retrieval, text mining, and computational linguistics, one can classify opinions using the Support Vector Machines algorithm with a “bag of sentiment words”.  This technique was very popular for movie review classification.  In a bag-of-words technique, the classifier identifies single-word opinion clues and weights them according to how well they help classify reviews as positive or negative.  So the word “sucked” (as in “the movie sucked”) would carry a stronger weight than the word “ok” (as in “the movie was ok”).
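In the spirit of that bag-of-words approach (though as a simple linear scoring rule rather than a trained SVM), a toy classifier might look like the following; the words and weights are invented, standing in for what training would produce:

```python
def score_review(text, weights):
    """Sum per-word sentiment weights over a bag-of-words representation.
    A positive total labels the review positive; ties and negative
    totals fall to negative."""
    total = sum(weights.get(word, 0.0) for word in text.lower().split())
    return "positive" if total > 0 else "negative"

# Illustrative weights a trained classifier might assign
weights = {"sucked": -2.0, "ok": 0.5, "great": 2.0, "boring": -1.5}
label = score_review("The movie sucked", weights)
```

A real pipeline would learn the weights from labeled reviews, but the decision rule (weighted word counts, order ignored) is exactly the bag-of-words assumption described above.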

 

It is obvious that there are many opinion scenarios that this classification technique will not address.  For instance, it cannot account for the effect of the word “not”, which turns a review of “good” into “not good”, thereby reversing a positive sentiment into a negative one.  It also cannot account for more complicated sentiments, e.g., “I wish the movie was in 3D.”  To incorporate such cases, the approach must be further structured to capture these elements.
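One common workaround for the “not good” problem is to fold negation into the tokens themselves before scoring, so that “good” and “not_good” can carry opposite weights. A minimal sketch follows; the “not_” prefixing convention is one of several used in the literature, not a fixed standard:

```python
def mark_negation(tokens):
    """Prefix the token following 'not' with 'not_' so a bag-of-words
    classifier can learn separate weights for negated words."""
    out, negate = [], False
    for tok in tokens:
        if tok == "not":
            negate = True
            continue
        out.append("not_" + tok if negate else tok)
        negate = False
    return out
```

After this pass, the weight table simply gains entries like `"not_good": -1.0` alongside `"good": 1.0`.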

The tasks described above were part of sentiment classification. In order to incorporate more complicated sentiment tasks, a more appropriate technique is structured opinion extraction.

The goal of structured opinion extraction is not only to extract individual opinions from text, but also to break those opinions into parts so that the subcomponents can be used by sentiment analysis applications.  In product reviews, for example, this means identifying product features and the opinions expressed about those features.

One way to accomplish this is with an appraisal expression: a basic grammatical structure expressing a single evaluation, based on linguistic analysis of evaluative language, that can capture the full complexity of opinion expressions.  Most existing work and corpora in sentiment analysis have considered only three parts of an appraisal expression: evaluator, target, and attitude.  However, Hunston and Sinclair’s [2] local grammar of evaluation demonstrated the existence of other parts of an appraisal expression that can also provide useful information about the opinion when they are identified. These parts include superordinates, aspects, processes, and expressors.

[Evaluator I] [Attitude love it] [Target when she walks to me and smiles].

 

[Target He] is [Attitude one mean] [Superordinate bastard], [Evaluator said the employee].
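Structurally, each example can be captured in a small record type. The field set follows the parts listed above; the slot values for the second example are my own reading of it, not annotated gold data:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AppraisalExpression:
    """One evaluation, decomposed into the evaluator/target/attitude
    core plus the extra slots from Hunston and Sinclair's local grammar."""
    evaluator: str
    attitude: str
    target: str
    superordinate: Optional[str] = None
    aspect: Optional[str] = None

ex = AppraisalExpression(
    evaluator="said the employee",
    attitude="one mean",
    target="He",
    superordinate="bastard",
)
```

Downstream applications can then aggregate by target or evaluator instead of treating the whole sentence as one undifferentiated opinion.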

 

Extracting appraisal expressions is an essential subtask in sentiment analysis because it provides sentiment words that can help define the features used by many higher-level applications.  As stated in cognitive appraisal theory, we decide what to feel after interpreting or explaining what has just happened. Two things are important here: whether we interpret the event as positive or negative, and what we believe caused the event.  The resulting classification of the appraisal expression allows for a finer granularity in the application of quantitative methods, so the results more closely represent what is being measured.


Google Prediction API

 

[1] Asher, N., Benamara, F., & Mathieu, Y. Y. (2009). Appraisal of opinion expressions in discourse. Lingvisticæ Investigationes, 32(2), 279-292.

[2] Hunston, S., & Sinclair, J. (2000). A local grammar of evaluation. Evaluation in Text: Authorial stance and the construction of discourse, 74-101.

[3] Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford University Press, USA.

[4] Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2), 267-307.