The Quantitative Mapping of Change in Science

By Eric Cruet

As a follow-up to last week's post (re-posted below), we consider an application in which the quantitative method described in [1] is used to map changes in the sciences.

In this century, the body of scientific research has grown vast and complex. Its ever-increasing size and specialized nature make it difficult for any group of experts to fully and fairly evaluate the bewildering array of material, both accomplished and proposed.

Therefore, a library faced with collection decisions, a foundation making funding choices, or a government office weighing national research needs must supplement expert judgment with quantitative analysis of scientific research performance.

One such approach is bibliometrics, which uses quantitative analysis and statistics to find patterns of publication within a given field or body of literature.

Through cumulative cycles of modeling and experimentation, scientific research undergoes constant change: scientists self-organize into fields that grow and shrink, merge and split. Citation patterns among scientific journals allow us to track this flow of ideas and how it changes over time [2].

For the purposes of this simplified example [3], the citation data is mined from Thomson Reuters' Journal Citation Reports for 1997–2007, which aggregate, at the journal level, approximately 35,000,000 citations from more than 7,000 journals over the past decade. Citations are included from articles published in a given year referencing articles published in the previous two years [7].
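As a rough illustration of this data-preparation step, a sketch along the following lines (in Python) could aggregate such records into a weighted, directed journal-level network. The file name and column names are hypothetical stand-ins, not the actual JCR format.

```python
# A minimal sketch, assuming a hypothetical CSV of pre-aggregated citation
# records with columns: citing_journal, cited_journal, citing_year,
# cited_year, count. The real JCR data format differs.
import csv
from collections import defaultdict

def build_network(path, year):
    """Weighted directed links: citations made in `year` to articles
    published in the previous two years."""
    weights = defaultdict(int)  # (citing_journal, cited_journal) -> count
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if (int(row["citing_year"]) == year
                    and year - 2 <= int(row["cited_year"]) <= year - 1):
                key = (row["citing_journal"], row["cited_journal"])
                weights[key] += int(row["count"])
    return weights

network_2005 = build_network("jcr_citations.csv", 2005)
```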

Method

  1. We first cluster the networks with the information-theoretic clustering method presented in the previous post [1], which can reveal regularities of information flow across directed and weighted networks. The method is applied to the pre-mined citation data described above.
  2. With appropriate modifications, the described method of bootstrap resampling accompanied by significance clustering is general and works for any type of network and any clustering algorithm. 
  3. To assess the accuracy of a clustering, we resample a large number (n > 1000) of bootstrap networks from the original network [7]. For the directed and weighted citation network of science, in which journals correspond to nodes and citations to directed and weighted links, we treat the citations as independent events and resample the weight of each link from a Poisson distribution with the link weight in the original network as its mean (a minimal sketch of this resampling appears after this list). This parametric resampling of citations approximates a non-parametric resampling of articles, which makes no assumption about the underlying distribution. For scalar summary statistics, it is straightforward to assign a 95% bootstrap confidence interval spanning the 2.5th and 97.5th percentiles of the bootstrap distribution [4], but different data sets and clusters may require a different approach [5].
  4. To identify the journals that are significantly associated with the clusters to which they are assigned, we use simulated annealing to search for the largest subset of journals within each cluster of the original network that are clustered together in at least 95% of all bootstrap networks (a simplified sketch of this search appears after this list). To identify the clusters that are significantly distinct from all other clusters, we search for clusters whose significant subset is clustered with no other cluster's significant subset in at least 95% of all bootstrap networks [7]. Figure 1 below shows this technique applied to a network at two different time points:
  5. Once we have a significance clustering for the network at each time, we want to reveal the trends in the data by simplifying and highlighting the structural changes between clusterings. The bottom of Figure 1 shows how to construct an alluvial diagram of the example networks that highlights and summarizes the structural differences between the time 1 and time 2 significance clusterings. Each cluster in the network is represented by an equivalently colored block in the alluvial diagram. Darker colors represent nodes whose cluster assignments are statistically significant, while lighter colors represent non-significant assignments. Changes in the clustering structure from one time period to the next are represented by the mergers and divergences of the ribbons linking the blocks at time 1 and time 2 (a sketch of this flow counting appears after this list).

    Diagram from [7]: Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0008694

  6. The resulting alluvial diagram for the actual data (above) illustrates, for example, how over the years 2001–2005 urology gradually splits off from oncology, and how the field of infectious diseases becomes a unique discipline, instead of a subset of medicine, in 2003. But these changes are just two of many over this period. In the same diagram, we also highlight the biggest structural change in scientific citation patterns over the past decade: the transformation of neuroscience from an interdisciplinary specialty to a mature and stand-alone discipline, comparable to physics or chemistry, economics or law, molecular biology or medicine [7].
  7. In their citation behavior, neuroscientists have finally cleaved from their traditional disciplines and united to form what is now the fifth largest field in the sciences (after molecular and cell biology, physics, chemistry, and medicine). Although this interdisciplinary integration has been ongoing since the 1950s [6], only in the last decade has this change come to dominate the citation structure of the field and overwhelm the intellectual ties along traditional departmental lines.
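The sketches below (in Python) illustrate the mechanics of steps 3–5 under simplifying assumptions; none of them is the authors' implementation. First, the parametric bootstrap of step 3: a network is represented as a dictionary mapping each directed link (citing journal, cited journal) to its citation weight, every bootstrap replicate draws each link weight from a Poisson distribution with the observed weight as its mean, and a 95% percentile interval is computed for a scalar summary statistic. The network representation and the example statistic are illustrative assumptions.

```python
# A minimal sketch of step 3's parametric bootstrap, assuming a network is a
# dict of (citing_journal, cited_journal) -> citation weight.
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_network(weights):
    """Resample every link weight from Poisson(observed weight)."""
    return {link: rng.poisson(w) for link, w in weights.items()}

def bootstrap_ci(weights, statistic, n_boot=1000):
    """95% percentile bootstrap confidence interval for a scalar statistic."""
    stats = [statistic(bootstrap_network(weights)) for _ in range(n_boot)]
    return np.percentile(stats, [2.5, 97.5])

# Example: interval for the total citation volume of a toy two-link network.
observed = {("J Neurosci", "Nature"): 120, ("Nature", "Science"): 340}
low, high = bootstrap_ci(observed, lambda net: sum(net.values()))
```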
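Step 4 is sketched next, with one deliberate substitution: a greedy pruning heuristic stands in for the authors' simulated annealing, so it finds a large (not provably the largest) subset. For a single cluster of the original network, it seeks journals that are assigned to the same module in at least 95% of the bootstrap partitions; each partition is assumed to be a dict mapping journal to cluster label.

```python
# A greedy stand-in (not simulated annealing) for step 4's significant-subset
# search; partitions are dicts of journal -> cluster label.
def together_fraction(subset, partitions):
    """Fraction of bootstrap partitions in which `subset` shares one cluster."""
    hits = sum(1 for p in partitions if len({p[j] for j in subset}) == 1)
    return hits / len(partitions)

def significant_subset(cluster, partitions, level=0.95):
    """Greedily prune journals until the remainder co-clusters at `level`."""
    subset = set(cluster)
    while len(subset) > 1 and together_fraction(subset, partitions) < level:
        # Drop the journal whose removal most improves cohesion.
        subset.remove(max(subset,
                          key=lambda j: together_fraction(subset - {j}, partitions)))
    return subset

# Example: journal "C" is unstable across three bootstrap partitions.
parts = [{"A": 0, "B": 0, "C": 0}, {"A": 1, "B": 1, "C": 2}, {"A": 0, "B": 0, "C": 1}]
significant_subset({"A", "B", "C"}, parts)  # -> {"A", "B"}
```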
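Finally, the bookkeeping behind step 5's alluvial diagram reduces to a contingency count: the ribbon between a time-1 block and a time-2 block is sized by the number of journals (or, alternatively, their citation weight) flowing between the two clusters. Plotting is omitted in this sketch.

```python
# A minimal sketch of the flows that size the ribbons in an alluvial diagram.
from collections import Counter

def ribbon_flows(partition_t1, partition_t2):
    """Count journals moving from each time-1 cluster to each time-2 cluster."""
    return Counter((partition_t1[j], partition_t2[j])
                   for j in partition_t1 if j in partition_t2)

# Example: cluster 0 at time 1 splits into clusters 0 and 1 at time 2.
ribbon_flows({"A": 0, "B": 0, "C": 0}, {"A": 0, "B": 0, "C": 1})
# -> Counter({(0, 0): 2, (0, 1): 1})
```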

References: 

Credit for this research belongs to the work in [7]: Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694.

[1] https://blogs.commons.georgetown.edu/cctp-903-summer2013/2013/05/23/quantitative-mapping-of-change/
[2] de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515. doi:10.1126/science.149.3683.510.
[3] Heimeriks, G., Hoerlesberger, M., & Van den Besselaar, P. (2003). Mapping communication and collaboration in heterogeneous research networks. Scientometrics, 58(2), 391–413.
[4] Costenbader, E., & Valente, T. W. (2003). The stability of centrality measures when networks are sampled. Social Networks, 25, 283–307. doi:10.1016/S0378-8733(03)00012-1.
[5] Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning. New York: Springer.
[6] Gfeller, D., Chappelier, J. C., & De Los Rios, P. (2005). Finding instabilities in the community structure of complex networks. Physical Review E, 72(5), 056135.

[7] Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0008694