Uncovering historical patterns in scientific publications

Using probabilistic topic modeling, researchers have developed a system called Bookworm-arXiv that can sift through thousands of scientific manuscripts posted on arXiv, giving historians of science a powerful way to explore and analyze that body of text. The same team helped develop Google's Ngram Viewer, which offers similar word-frequency searches across Google's collection of digitized books.

…the Cultural Observatory, will soon inaugurate a browser that searches for such language changes in a large online repository of scientific papers known as arXiv (pronounced like “archive”).


arXiv under load due to Perelman's Fields Medal (Photo credit: ktheory)

Users will be able to type in one or two words at the site, called Bookworm-arXiv, and immediately see a graph showing the ups and downs of the phrase’s use in the archive.


The system will enable researchers to understand the history of scientific concepts and the diffusion of knowledge through the scientific community.

Users can then click on the graph and drill down to read the original papers in which the terms appear, tracing ideas back toward their roots, or to spots where scientific ideas spread from one field to another.
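The graph-and-drill-down workflow quoted above rests on a simple idea: count how often a phrase appears in each year's papers, normalize by the total number of words published that year, and remember which papers contributed the hits. The Python below is a minimal sketch under stated assumptions, not Bookworm-arXiv's actual implementation. The `(paper_id, year, text)` record layout, the naive regex tokenizer, the per-million-words normalization, and the names `tokenize` and `phrase_timeline` are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric runs (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_timeline(papers, phrase):
    """Return ({year: occurrences per million words}, {year: [paper ids]}).

    `papers` is an iterable of (paper_id, year, text) tuples. This record
    layout is a hypothetical stand-in, not Bookworm-arXiv's data model.
    """
    target = tokenize(phrase)
    if not target:
        raise ValueError("phrase must contain at least one word")
    n = len(target)
    hits = defaultdict(int)      # phrase occurrences per year
    totals = defaultdict(int)    # total words per year (the normalizer)
    sources = defaultdict(list)  # ids of papers containing the phrase, per year
    for paper_id, year, text in papers:
        words = tokenize(text)
        totals[year] += len(words)
        # Slide a window of n words over the paper and count exact matches.
        count = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
        if count:
            hits[year] += count
            sources[year].append(paper_id)
    # Relative frequency per year, so busy years don't dominate the graph.
    rates = {y: 1e6 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}
    return rates, dict(sources)


if __name__ == "__main__":
    papers = [
        ("a1", 2001, "Dark matter halos and dark energy"),
        ("a2", 2001, "Galaxy rotation curves"),
        ("b1", 2002, "Dark energy dominates; dark energy is puzzling"),
    ]
    rates, sources = phrase_timeline(papers, "dark energy")
    for year, rate in rates.items():
        # The rate is the point on the graph; the id list is the "drill down".
        print(year, round(rate, 1), sources.get(year, []))
```

Normalizing by yearly word counts matters because the number of papers on arXiv grows over time; raw counts would make almost any term look like it was rising. A real system would also need proper tokenization of LaTeX source and an index rather than a linear scan.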


via Words by the Millions, Sorted by Software – NYTimes.com.

