Why you should link to my blog and I should link to yours

A forthcoming paper in Management Science investigates the theoretical reasons behind blog links. Linking one’s blog to another blog by way of static or dynamic links is common in the blogosphere.

In a small random sample of blogs, we found that 61% of blogs contained at least one link to another site in the last 10 posts, with approximately 72% of links going to other blogs, 13% to newspaper sites, and the rest to other sites.

However, at face value the practice of linking to another (potentially rival) blog seems counter-intuitive, as it might lead to some complications.

First, a reader who follows the outgoing link may not return to the original site in the short run. Second, a link implies that the linked blog has interesting content, which can improve the reader’s perception of a competing site.

Using a game-theoretic model, the authors of this paper show that bloggers do not link to other blogs merely in expectation of reciprocity (i.e. link to me because I linked to you), which happens to be the most common reasoning given by blog gurus.

Instead, bloggers link because they want to signal the quality of their own blog. By linking to another blog, bloggers not only showcase their own ability to find breaking news, but also signal the ability of the linked blog to break news. On the whole, this kind of behavior improves the overall experience of readers.

Bloggers link because doing so improves the reader’s inference about the blog’s quality and ultimately increases the readership to their site

An important implication of these findings is that a blog's total number of links can be a proxy for blog success or quality.

… we find that both the number of incoming and outgoing links may serve as a metric of blog quality.

So what are you waiting for? Let the links begin!


Image by xkcd


Mayzlin, D. and Yoganarasimhan, H. “Link to Success: How Blogs Build an Audience by Promoting Rivals”, Management Science.

mnsc.1110.1510; published online before print July 30, 2012, doi:10.1287/mnsc.1110.1510


What makes a tweet go viral?

Many recent research studies investigate what factors influence the popularity of memes on social networks. Much of this research analyzes Twitter posts and has identified many reasons why certain tweets go viral. These include factors related to the tweet itself (e.g. how controversial the tweet is) and factors related to the tweeters (e.g. number of followers, influence, and frequency of posting). New work, however, says that ‘going viral’ is a random process.

Visualizations of meme diffusion networks for different topics.

This new study uses an agent-based model to study this phenomenon. This model simulates message sharing on a social network and incorporates two key characteristics of such a context: users have limited attention spans and can only view a portion of all tweets.

The predictions of our model are consistent with empirical data from Twitter, a popular microblogging platform. Surprisingly, we can explain the massive heterogeneity in the popularity and persistence of memes as deriving from a combination of the competition for our limited attention and the structure of the social network, without the need to assume different intrinsic values among ideas.

Or in other words, the pattern of Twitter memes can be replicated in the absence of tweet-based or tweeter-based factors. This raises interesting questions regarding the direction of causality: do tweets go viral because of certain factors, or is the popularity of posts on social networks a random process in which we find mere correlations in our bid to find explanations? While some say that this correlation versus causation conundrum can be solved only empirically, others say that a controlled, experimental approach is the way to go.

Suppose, Menczer says, that in his study he randomly assigned different colours to each tweet. If red tweets ended up being the most popular, one could argue that colour was a predictive factor for success when in reality the popularity of red-coloured tweets was coincidental.
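To make the idea concrete, here is a minimal toy sketch of a limited-attention model in Python. It is not the authors' code or their exact model: the random follower graph, the parameter values, and the function name simulate_memes are all assumptions made for illustration. Every meme is intrinsically identical, each agent only remembers the last few memes it has seen, and each meme is tagged with a random colour as in Menczer's thought experiment.

```python
import random
from collections import Counter, deque


def simulate_memes(n_agents=200, n_friends=5, memory=10, p_new=0.3,
                   steps=20000, seed=1):
    """Toy limited-attention model; all memes are intrinsically identical."""
    rng = random.Random(seed)
    # Assumed network: each agent broadcasts to n_friends random followers.
    followers = [rng.sample([j for j in range(n_agents) if j != i], n_friends)
                 for i in range(n_agents)]
    # Limited attention: each agent only keeps the most recent `memory` memes.
    screens = [deque(maxlen=memory) for _ in range(n_agents)]
    popularity = Counter()   # meme id -> number of times it was posted
    colours = {}             # meme id -> colour assigned purely at random
    next_id = 0
    for _ in range(steps):
        agent = rng.randrange(n_agents)
        if not screens[agent] or rng.random() < p_new:
            # Post a brand-new meme and give it an arbitrary colour.
            meme = next_id
            next_id += 1
            colours[meme] = rng.choice(["red", "green", "blue"])
        else:
            # Repost something the agent has recently seen, chosen uniformly.
            meme = rng.choice(list(screens[agent]))
        popularity[meme] += 1
        for follower in followers[agent]:
            screens[follower].append(meme)
    return popularity, colours


if __name__ == "__main__":
    popularity, colours = simulate_memes()
    top_meme, top_count = popularity.most_common(1)[0]
    counts = sorted(popularity.values())
    print("memes created:", len(counts))
    print("median posts per meme:", counts[len(counts) // 2])
    print("most popular meme:", top_count, "posts, colour:", colours[top_meme])
```

In a run like this you would expect a handful of memes to be reposted far more often than the typical meme, even though nothing distinguishes them. Whatever colour the top meme happens to carry would look like a "predictor" of success, although it was assigned by a coin flip.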

Read more at  Going viral on Twitter is a random act – tech – 13 April 2012 – New Scientist and Competition among memes in a world with limited attention : Scientific Reports : Nature Publishing Group.


Weng, L., Flammini, A., Vespignani, A. & Menczer, F., Competition among memes in a world with limited attention, Scientific Reports 2, Article number: 335 doi:10.1038/srep00335

Stock market prediction using Twitter

A recent paper presented at the ACM International Conference on Web Search & Data Mining uses a simulation to show that a trading strategy based on Twitter conversations can outperform the Dow Jones and basic trading strategies. Using Twitter and stock market data over a four-month period, the study also finds that the number of distinct Twitter conversations about a stock is a strong predictor of stock trading volume. A higher number of distinct conversations is also positively related to higher stock prices, while a lower number of conversations can predict a lower stock price.


The main trading room of the Tokyo Stock Excha...

Image via Wikipedia




While past research has looked at the sentiment, positive or negative, of tweets to predict stock price, little research has focused on the volume of tweets and the ways that tweets are linked to other tweets, topics or users. Further, past work has mostly studied the overall stock market indexes, and not individual stocks.


A key limitation of the study is that the market lost value during the examined time period. Thus the Twitter-based strategy outperformed other strategies only in the sense that it lost the least.


For the study, the researchers simulated a series of investments between March 1, 2010 and June 30, 2010 and analyzed performance using several investment strategies.


Read more here and here. See the paper here.




Uncovering historical patterns in scientific publications

Using probabilistic topic modeling, researchers have developed a system, called Bookworm-arXiv, that can parse thousands of scientific manuscripts located on arXiv, thereby providing immense data manipulation capabilities to science historians. The same team helped to develop Google’s n-gram viewer, which provides similar in-text search capabilities for Google’s collection of books.

the Cultural Observatory, will soon inaugurate a browser that searches for such language changes in a large online repository of scientific papers known as arXiv (pronounced like “archive”)

arXiv under load due to Perelman's Fields Medal (Photo credit: ktheory)

Users will be able to type in one or two words at the site, called Bookworm-arXiv, and immediately see a graph showing the ups and downs of the phrase’s use in the archive


The system will enable researchers to understand the history of scientific concepts and the diffusion of knowledge through the scientific community.

Users can then click on the graph and drill down to read the original papers in which the terms appear, tracing ideas back toward their roots, or to spots where scientific ideas spread from one field to another.


via Words by the Millions, Sorted by Software – NYTimes.com.

Correlation versus Causation (as per xkcd)

While Businessweek had an interesting infographic on the difference between correlation and causation, xkcd provides a different take on the logical fallacy of cum hoc ergo propter hoc (correlation proves causation).

Correlation versus Causation (cum hoc ergo propter hoc)

Image by xkcd

In other words, correlation enables prediction whereas causation enables explanation. Correlation denotes the strength of a relationship between two variables, while causation requires correlation of cause and effect, temporal precedence and rejection of alternative hypotheses.
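The need to reject alternative hypotheses can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variable names, the hidden common cause z, and the function confounded_correlations are made up for illustration. Here x and y are both driven by z, so they correlate strongly, yet x has no effect on y. Once x is set independently, as in a controlled experiment, the correlation should vanish.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def confounded_correlations(n=5000, seed=0):
    """Return (observational r, experimental r) for a confounded toy system."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]        # hidden common cause
    x = [zi + rng.gauss(0, 1) for zi in z]         # x depends only on z
    y = [zi + rng.gauss(0, 1) for zi in z]         # y depends only on z, never on x
    x_forced = [rng.gauss(0, 1) for _ in range(n)]  # x set by the experimenter
    return pearson(x, y), pearson(x_forced, y)


if __name__ == "__main__":
    r_observed, r_experiment = confounded_correlations()
    print("observed correlation:", round(r_observed, 3))
    print("correlation when x is assigned at random:", round(r_experiment, 3))
```

In this setup the observed correlation is useful for prediction (knowing x tells you something about y), but it offers no explanation, because changing x would leave y untouched.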

via Correlation versus Causation (cum hoc ergo propter hoc) « A little bit of this, a little bit of that.


Edit: Don’t forget the mouse-over text.

Implementing heteroskedasticity-consistent standard errors in SPSS (and SAS)

Homoskedasticity (also spelled homoscedasticity), or constant variance of regression error terms, is a key assumption of ordinary least squares (OLS) regression. When this assumption is violated, i.e. the errors are heteroskedastic (or heteroscedastic), the OLS coefficient estimator remains unbiased and consistent. However, it is no longer efficient, and the conventional standard error estimates are biased, which can lead to Type I error inflation or reduced statistical power for coefficient hypothesis tests.

Thus correcting for heteroskedasticity is necessary while conducting OLS. Several methods exist for this, including transforming the data, weighted least squares (WLS) regression and generalized least squares (GLS) estimation. Another alternative is to use heteroskedasticity-consistent standard error (HCSE) estimators of OLS parameter estimates (White, 1980).

Comparison of residuals between first order Heteroskedastic and Homoskedastic disturbances (Photo credit: Wikipedia)

HCSE are of four types. Standard errors from HC0 (the most common implementation) are best used for large sample sizes as these estimators are downward biased for small sample sizes. HC1, HC2, and HC3 estimators are better used for smaller samples.
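As a rough illustration of how the four variants differ, the sketch below computes HC0 to HC3 standard errors for a simple regression with one predictor and an intercept, using only the Python standard library. It is not the Hayes and Cai macro; the function name hc_standard_errors and the example data are assumptions for illustration. It relies on the standard sandwich form (X'X)^-1 [sum of w_i x_i x_i'] (X'X)^-1, where the weight w_i is the squared residual e_i^2 for HC0, e_i^2 scaled by n/(n-k) for HC1, e_i^2/(1-h_i) for HC2, and e_i^2/(1-h_i)^2 for HC3, with h_i the leverage of observation i.

```python
import math


def hc_standard_errors(x, y, kind="HC3"):
    """OLS fit of y = b0 + b1*x with HC0-HC3 standard errors (sketch)."""
    n, k = len(x), 2  # k = number of estimated coefficients
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # (X'X)^-1 for a design matrix whose rows are (1, x_i).
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy

    # "Meat" of the sandwich: sum of w_i * x_i x_i'.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(x, y):
        row = (1.0, xi)
        e = yi - (b0 + b1 * xi)                      # residual
        h = sum(row[a] * inv[a][b] * row[b]          # leverage h_i
                for a in range(2) for b in range(2))
        w = e * e                                    # HC0 weight
        if kind == "HC1":
            w *= n / (n - k)
        elif kind == "HC2":
            w /= (1.0 - h)
        elif kind == "HC3":
            w /= (1.0 - h) ** 2
        elif kind != "HC0":
            raise ValueError("kind must be HC0, HC1, HC2 or HC3")
        for a in range(2):
            for b in range(2):
                meat[a][b] += w * row[a] * row[b]

    def matmul(p, q):
        return [[sum(p[i][t] * q[t][j] for t in range(2)) for j in range(2)]
                for i in range(2)]

    cov = matmul(matmul(inv, meat), inv)
    return (b0, b1), (math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [1.1, 1.9, 3.2, 3.8, 5.5, 5.4, 7.9, 7.0]
    for kind in ("HC0", "HC1", "HC2", "HC3"):
        coefs, ses = hc_standard_errors(x, y, kind)
        print(kind, "slope SE:", round(ses[1], 4))
```

Because the HC1, HC2 and HC3 weights are never smaller than the HC0 weight, their standard errors are at least as large as those from HC0, which is what offsets the downward bias in small samples.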

Many researchers conduct their statistical analysis in Stata, which has built-in procedures for estimating standard errors using all of the HC methods. However, others use SPSS for its pair-wise deletion capability (versus list-wise deletion in Stata) and suffer from its lack of heteroskedasticity correction capabilities. This wonderful paper by Hayes and Cai provides a macro (in the Appendix) that can implement HCSE estimators in SPSS. They also provide a similar macro for SAS.

Note that the macro has no error-handling procedures, hence pre-screening of the data is required. Also, missing data is handled by list-wise deletion (which might defeat the purpose of using SPSS for some users).

Another link to the paper is here.


Hayes, A.F. and Cai, L. (2007), “Using heteroskedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation”, Behavior Research Methods, 39(4), 709-722, DOI: 10.3758/BF03192961.

White, H. (1980), “A Heteroscedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroscedasticity,” Econometrica, 48, 817-838.

Is press coverage good for science?

Over the past few months, the press has reported many stories about grand scientific discoveries. One example is the discovery of the lair of the Kraken, a prehistoric monster that ate dinosaurs for fun (see Lair of Ancient ‘Kraken’ Sea Monster Possibly Discovered – Yahoo! News and The Giant, Prehistoric Squid That Ate Common Sense | Wired Science | Wired.com). There was also extensive coverage of a study that said that the speed of light can be broken and hence Einstein’s theory of special relativity was flawed (read more at Speed of light may have been broken – Q&A – Telegraph).

Pen and wash drawing by malacologist Pierre Dé...

Image via Wikipedia

In hindsight, it seems that the reporting of these scientific discoveries was a little premature. In the zest to get the next big story, non-reviewed working papers get cited, data gets misrepresented, findings get misquoted, and scientific ethics get ignored. These problems are more prevalent in some parts of the world.

Everyone has an example of the scientific ignorance of the press, but researchers in Britain probably have more than most. With stories ranging from ludicrous (wind turbine attacked by aliens) to downright irresponsible (promoting the link between childhood vaccinations and autism), the fourth estate in the United Kingdom has hardly covered itself in glory when it comes to science and scientific issues. Other countries have similar grievances, of course — particularly the United States, where right-wing talk radio and cable television regularly air anti-science views on everything from global warming to creationism. Stem-cell scientists in Germany and transgenic-crop researchers in France have also been assailed by journalism out of step with the scientific evidence that it claims to examine.

In Britain, an inquiry into the standards and ethics of the press has asked the scientific community to provide support for the thesis that press coverage that does not apply the scientific method is harmful to science. While this is an interesting debate in itself, it prompts a larger question – is press coverage good for science? It also raises the question of whether scientists should be responsible for and trained in scientific journalism, or whether journalists should be trained in the scientific method. While several points can be made for and against each position, many proponents on both sides may agree that good, responsible press coverage is critical for good science. If not, then why do many researchers cite ‘media coverage’ on their resumes?

Read more at The press under pressure : Nature : Nature Publishing Group.