**By Eric Cruet**

As a follow-up to last week’s post (re-posted below), we will consider an application in which the quantitative method described in [1] is used to map changes in the sciences.

In this century, the volume of scientific research has become vast and complex. Its ever-increasing size and specialized nature make it difficult for any group of experts to fully and fairly evaluate the bewildering array of material, both accomplished and proposed.

Therefore, a library faced with collection decisions, a foundation making funding choices, or a government office weighing national research needs must rely on expert analysis of scientific research performance.

One approach is bibliometrics, which uses quantitative analysis and statistics to find patterns of publication within a given field or body of literature.

Through cumulative cycles of modeling and experimentation, scientific research undergoes constant change: scientists self-organize into fields that grow and shrink, merge and split. Citation patterns among scientific journals allow us to track this flow of ideas and how it changes over time [2].

For the purposes of this simplified example [3] the citation data is mined from Thomson-Reuters’ Journal Citation Reports circa 1997–2007, which aggregate, at the journal level, approximately 35,000,000 citations from more than 7000 journals over the past decade. Citations are included from articles published in a given year referencing articles published in the previous two years [7].
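To make the two-year citation window concrete, here is a minimal sketch of how article-level citation records could be aggregated into journal-level link weights. The record layout `(citing_journal, citing_year, cited_journal, cited_year)`, the function name `build_citation_network`, and the choice to point links from the citing journal to the cited journal are all assumptions made for illustration; Journal Citation Reports already delivers counts aggregated at the journal level.

```python
from collections import defaultdict


def build_citation_network(citations, year, window=2):
    """Aggregate citation records into directed, weighted journal links.

    citations: iterable of (citing_journal, citing_year, cited_journal, cited_year)
               tuples (a hypothetical layout chosen for this sketch).
    year:      only citations made by articles published in this year count.
    window:    the cited article must be 1..window years older than the citing one.
    """
    weights = defaultdict(int)
    for citing_journal, citing_year, cited_journal, cited_year in citations:
        if citing_year != year:
            continue
        if 1 <= citing_year - cited_year <= window:
            # Link direction (citing -> cited) is an assumption of this sketch.
            weights[(citing_journal, cited_journal)] += 1
    return dict(weights)


# Made-up records for two hypothetical journals "A" and "B".
records = [
    ("A", 2007, "B", 2006),  # counted: one year back
    ("A", 2007, "B", 2005),  # counted: two years back
    ("A", 2007, "B", 2004),  # ignored: outside the two-year window
    ("B", 2007, "A", 2007),  # ignored: same-year citation
    ("B", 2006, "A", 2005),  # ignored: citing article is not from 2007
]
network_2007 = build_citation_network(records, year=2007)
print(network_2007)
```

Repeating this for each year would give one weighted, directed network per time point, which is the input the method below works on.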

## Method

- We first cluster the networks with the information-theoretic clustering method presented in the previous post [1], which can reveal regularities of information flow across directed and weighted networks. The method will be applied to the pre-mined citation data.
- With appropriate modifications, the described method of bootstrap resampling accompanied by significance clustering is general and works for any type of network and any clustering algorithm.
- To assess the accuracy of a clustering, we resample a large number (n > 1000) of bootstrap networks from the original network [7]. For the directed and weighted citation network of science, in which journals correspond to nodes and citations to directed and weighted links, we treat the citations as independent events and resample the weight of each link from a Poisson distribution with the link weight in the original network as mean. This *parametric* resampling of citations approximates a *non-parametric* resampling of articles, which makes no assumption about the underlying distribution. For scalar summary statistics, it is straightforward to assign a 95% bootstrap confidence interval as spanning the 2.5th and 97.5th percentiles of the bootstrap distribution [4], but different data sets and clusters may require a different approach [5].
- To identify the journals that are significantly associated with the clusters to which they are assigned, we use simulated annealing to search for the largest subset of journals within each cluster of the original network that are clustered together in at least 95% of all bootstrap networks. To identify the clusters that are significantly distinct from all other clusters, we search for clusters whose significant subset is clustered with no other cluster’s significant subset in at least 95% of all bootstrap networks [7]. Figure 1 below shows this technique applied to a network at two different time points.
- Once we have a significance cluster for the network at each time, we want to reveal the trends in the data by simplifying and highlighting the structural changes between clusters. The bottom of Figure 1 shows how to construct an *alluvial diagram* of the example networks that highlights and summarizes the structural differences between the time 1 and time 2 significance clusters. Each cluster in the network is represented by an equivalently colored block in the alluvial diagram. **Darker** colors represent nodes that have statistical significance, while *lighter* colors represent non-significant assignments. Changes in the clustering structure from one time period to the next are represented by the mergers and divergences that occur in the ribbons linking the blocks at time 1 and time 2.
- The resulting alluvial diagram for the actual data (above) illustrates, for example, how over the years 2001–2005, urology gradually splits off from oncology and how the field of infectious diseases becomes a unique discipline, instead of a subset of medicine, in 2003. But these changes are just two of many over this period. In the same diagram, we also highlight the biggest structural change in scientific citation patterns over the past decade: the transformation of neuroscience from interdisciplinary specialty to a mature and stand-alone discipline, comparable to physics or chemistry, economics or law, molecular biology or medicine [7].
- In their citation behavior, neuroscientists have finally cleaved from their traditional disciplines and united to form what is now the fifth largest field in the sciences (after molecular and cell biology, physics, chemistry, and medicine). Although this interdisciplinary integration has been ongoing since the 1950s [6], only in the last decade has this change come to dominate the citation structure of the field and overwhelm the intellectual ties along traditional departmental lines.
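The resampling and significance steps in the list above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation used in [7]: the function names, the toy weight matrix, and the array layout of `boot_labels` are hypothetical. The clustering of each bootstrap network and the simulated-annealing search are left out; `coclustered_fraction` only evaluates the 95% criterion for a candidate subset, and `alluvial_flows` only counts the node flows that the ribbons of an alluvial diagram would depict.

```python
import numpy as np
from collections import Counter


def poisson_bootstrap(weights, n_boot, rng):
    """Draw n_boot resampled copies of a link-weight matrix.

    Each entry is drawn independently from a Poisson distribution whose
    mean is the original link weight, so absent links (weight 0) stay absent.
    """
    weights = np.asarray(weights, dtype=float)
    return rng.poisson(lam=weights, size=(n_boot,) + weights.shape)


def percentile_interval(samples, level=0.95):
    """Percentile bootstrap interval for a scalar summary statistic."""
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(samples, [tail, 100.0 - tail])
    return float(lower), float(upper)


def coclustered_fraction(subset, boot_labels):
    """Fraction of bootstrap partitions that keep all of `subset` together.

    boot_labels has shape (n_boot, n_nodes); row b holds the cluster label
    of every node in bootstrap network b. Labels only need to be consistent
    within a row, not across rows.
    """
    members = np.asarray(boot_labels)[:, list(subset)]
    together = np.all(members == members[:, :1], axis=1)
    return float(together.mean())


def alluvial_flows(labels_t1, labels_t2):
    """Count nodes moving from each time-1 cluster to each time-2 cluster."""
    return Counter(zip(labels_t1, labels_t2))


rng = np.random.default_rng(0)
toy_weights = np.array([[0, 12, 3], [9, 0, 1], [2, 4, 0]])  # made-up citation counts
boot = poisson_bootstrap(toy_weights, n_boot=1000, rng=rng)
total_citations = boot.sum(axis=(1, 2))  # one scalar statistic per bootstrap network
print(percentile_interval(total_citations))
```

Under these assumptions, a candidate subset of journals would count as significant when `coclustered_fraction(subset, boot_labels) >= 0.95`, where `boot_labels` comes from running the clustering algorithm of choice on every bootstrap network.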

References:

Credit for this research belongs to the work performed in [7].

*Scientometrics*, *58*(2), 391-413.

[4] Costenbader, E., & Valente, T. (2003). The stability of centrality measures when networks are sampled. *Social Networks*, *25*, 283–307. doi: 10.1016/S0378-8733(03)00012-1.

[5] Hastie, T., Tibshirani, R., & Friedman, J. H. (2001). *The elements of statistical learning* (Vol. 1). New York: Springer.

[6] Gfeller, D., Chappelier, J. C., & De Los Rios, P. (2005). Finding instabilities in the community structure of complex networks. *Physical Review E*, *72*(5), 056135.

[7] Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. *PloS one*, *5*(1), e8694. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0008694