Uncovering historical patterns in scientific publications

Using probabilistic topic modeling, researchers have developed a system called Bookworm-arXiv that can search thousands of scientific manuscripts posted on arXiv, giving historians of science a powerful way to analyze the archive's contents. The same team helped develop Google’s Ngram Viewer, which offers similar full-text searches across Google’s collection of digitized books.

The Cultural Observatory will soon launch a browser that searches for such changes in language use across arXiv (pronounced like “archive”), a large online repository of scientific papers.


Users will be able to type one or two words into the site and immediately see a graph of how the phrase’s use in the archive has risen and fallen over time.
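As a rough illustration only, and not Bookworm-arXiv’s actual implementation, the Python sketch below computes the kind of time series such a graph plots. It assumes a hypothetical input of (year, text) records, counts exact occurrences of the phrase in each year’s papers, and divides by that year’s total word count (reported per million words) so that years with more papers are not automatically favored. The function names, the tokenizer, and the per-million normalization are our own assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_frequency_by_year(papers, phrase):
    """Return {year: occurrences of `phrase` per million words}.

    `papers` is an iterable of (year, text) pairs (a hypothetical input format).
    """
    target = tokenize(phrase)
    if not target:
        raise ValueError("phrase must contain at least one word")
    n = len(target)

    hits = Counter()    # year -> number of phrase occurrences
    totals = Counter()  # year -> total number of words
    for year, text in papers:
        tokens = tokenize(text)
        totals[year] += len(tokens)
        # Slide an n-word window over the paper and count exact matches.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                hits[year] += 1

    # Normalize raw counts by each year's total word count (per million words).
    return {
        year: 1_000_000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }


papers = [
    (2005, "Dark energy and dark matter."),
    (2005, "String theory"),
    (2006, "dark energy survey"),
]
print(phrase_frequency_by_year(papers, "dark energy"))
```

Plotting the returned dictionary (years on the x-axis, frequency on the y-axis) yields the ups and downs described above; a production system would precompute these counts rather than rescanning every paper per query.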


The system will enable researchers to understand the history of scientific concepts and the diffusion of knowledge through the scientific community.

Users can then click on the graph and drill down to read the original papers in which the terms appear, tracing ideas back toward their roots, or to spots where scientific ideas spread from one field to another.
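Drilling down from a point on the graph (a phrase and a year) back to the underlying papers is essentially a lookup problem. One common way to support it is an inverted index that maps each word to the papers containing it. The sketch below is a minimal, in-memory version built under our own assumptions, not Bookworm-arXiv’s documented design; the `PaperIndex` class, its methods, and the paper identifiers are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PaperIndex:
    """Hypothetical inverted index for drilling down from (phrase, year) to papers."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of paper ids
        self.papers = {}                  # paper id -> (year, tokens)

    def add(self, paper_id, year, text):
        tokens = tokenize(text)
        self.papers[paper_id] = (year, tokens)
        for token in tokens:
            self.postings[token].add(paper_id)

    @staticmethod
    def _contains(tokens, target):
        """True if `target` appears as a consecutive run inside `tokens`."""
        n = len(target)
        return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

    def drill_down(self, phrase, year):
        """Return sorted ids of papers from `year` that contain `phrase`."""
        target = tokenize(phrase)
        if not target:
            return []
        # Candidates must contain every word of the phrase somewhere...
        candidates = set.intersection(
            *(self.postings.get(word, set()) for word in target)
        )
        # ...then keep only those from the right year where the words are adjacent.
        return sorted(
            pid for pid in candidates
            if self.papers[pid][0] == year
            and self._contains(self.papers[pid][1], target)
        )


idx = PaperIndex()
idx.add("paper-a", 2000, "Evidence for dark energy")
idx.add("paper-b", 2000, "Energy of dark halos")
idx.add("paper-c", 2001, "Dark energy from strings")
print(idx.drill_down("dark energy", 2000))
```

Intersecting the word lists first keeps the expensive phrase check limited to a small set of candidates; note that "paper-b" contains both words but is correctly excluded because they are not adjacent.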


