What makes a tweet go viral?

Many recent studies investigate what factors influence the popularity of memes on social networks. Much of this research analyzes Twitter posts and has identified many reasons why certain tweets go viral. These include factors related to the tweet itself (e.g. how controversial the tweet is) and factors related to the tweeters (e.g. number of followers, influence, and frequency of posting). New work, however, suggests that ‘going viral’ is largely a random process.

Visualizations of meme diffusion networks for different topics.

The new study uses an agent-based model to examine this phenomenon. The model simulates message sharing on a social network and incorporates two key characteristics of such a context: users have limited attention spans and can only view a portion of all tweets.

The predictions of our model are consistent with empirical data from Twitter, a popular microblogging platform. Surprisingly, we can explain the massive heterogeneity in the popularity and persistence of memes as deriving from a combination of the competition for our limited attention and the structure of the social network, without the need to assume different intrinsic values among ideas.

In other words, the pattern of Twitter memes can be replicated in the absence of tweet-based or tweeter-based factors. This raises interesting questions regarding the direction of causality: do tweets go viral because of certain factors, or is the popularity of posts on social networks a random process in which we find mere correlations in our bid to find explanations? While some say that this correlation versus causation conundrum can be solved only empirically, others say that a controlled, experimental approach is the way to go.

Suppose, Menczer says, that in his study he randomly assigned different colours to each tweet. If red tweets ended up being the most popular, one could argue that colour was a predictive factor for success when in reality the popularity of red-coloured tweets was coincidental.
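The thought experiment can be made concrete with a toy simulation. The sketch below is not the model from Weng et al.; it is a heavily simplified illustration that assumes a random follower network, a fixed-length feed standing in for limited attention, and memes that are all equally likely to be reshared. Every name and parameter value (`simulate_memes`, `attention`, `p_new`, the three colours) is invented for this example.

```python
import random
from collections import Counter


def simulate_memes(n_users=200, n_follows=5, steps=20000,
                   attention=10, p_new=0.3, seed=1):
    """Toy limited-attention model in which no meme is intrinsically better."""
    rng = random.Random(seed)
    # followers[u] lists the users who see whatever user u posts (assumed random).
    followers = [rng.sample([v for v in range(n_users) if v != u], n_follows)
                 for u in range(n_users)]
    feeds = [[] for _ in range(n_users)]  # each feed keeps at most `attention` memes
    shares = Counter()                    # meme id -> number of times it was posted
    colour = {}                           # meme id -> arbitrary label, never used in sharing
    next_id = 0
    for _ in range(steps):
        u = rng.randrange(n_users)
        if not feeds[u] or rng.random() < p_new:
            # The user invents a new meme and it gets a random colour.
            meme = next_id
            next_id += 1
            colour[meme] = rng.choice(["red", "green", "blue"])
        else:
            # The user reshares something from the feed, chosen uniformly at random.
            meme = rng.choice(feeds[u])
        shares[meme] += 1
        for v in followers[u]:
            feeds[v].append(meme)
            if len(feeds[v]) > attention:
                feeds[v].pop(0)           # the oldest meme drops out of view
    return shares, colour


shares, colour = simulate_memes()
# Colours of the five most-shared memes in this run.
top_colours = [colour[meme] for meme, _ in shares.most_common(5)]
# Total shares per colour across all memes.
shares_by_colour = Counter()
for meme, count in shares.items():
    shares_by_colour[colour[meme]] += count
```

Because colours are assigned by chance and never influence sharing, any colour that dominates the most-shared memes in a given run does so by coincidence, which is the point of Menczer's example.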

Read more at Going viral on Twitter is a random act – tech – 13 April 2012 – New Scientist and Competition among memes in a world with limited attention : Scientific Reports : Nature Publishing Group.


Weng, L., Flammini, A., Vespignani, A. & Menczer, F. (2012), Competition among memes in a world with limited attention, Scientific Reports 2, Article number: 335, doi:10.1038/srep00335

Stock market prediction using Twitter

A recent paper presented at the ACM International Conference on Web Search & Data Mining uses a simulation to show that a trading strategy based on Twitter conversations can outperform the Dow Jones and basic trading strategies. Using Twitter and stock market data over a four-month period, the study also finds that the number of distinct Twitter conversations about a stock is a strong predictor of stock trading volume. A higher number of distinct conversations is also positively related to higher stock prices, while a lower number of conversations can predict a lower stock price.






While past research has looked at the sentiment, positive or negative, of tweets to predict stock price, little research has focused on the volume of tweets and the ways that tweets are linked to other tweets, topics or users. Further, past work has mostly studied overall stock market indexes, and not individual stocks.


A key limitation of the study is that the market lost value during the examined time period. Thus the Twitter-based strategy outperformed other strategies only in the sense that it lost the least.


For the study, the researchers simulated a series of investments between March 1, 2010 and June 30, 2010 and analyzed performance using several investment strategies.


Read more here and here. See the paper here.




Games and being in the ‘Flow’

Some games become ‘boring’ and ‘uninteresting’ after a few hours of game play, while others (like Diablo) are eminently replayable and enjoyable. This is because good games enable players to get into the ‘Flow’. The concept of flow becomes even more important as gamification trends catch on. This article presents an introduction to flow and provides recommendations for game developers (and others who wish to incorporate gaming concepts into their products and services). It provides a scientific explanation of why text-based games and adventure games are increasingly relics of the past (a simpler view by xkcd is below).


Image by xkcd


Correlation versus Causation (as per xkcd)

While Businessweek had an interesting infographic on the difference between correlation and causation, xkcd provides a different take on the logical fallacy of cum hoc ergo propter hoc (correlation proves causation).

Correlation versus Causation (cum hoc ergo propter hoc)

Image by xkcd

In other words, correlation enables prediction whereas causation enables explanation. Correlation denotes the strength of a relationship between two variables, while causation requires correlation of cause and effect, temporal precedence and rejection of alternative hypotheses.

via Correlation versus Causation (cum hoc ergo propter hoc) « A little bit of this, a little bit of that.


Edit: Don’t forget the mouse-over text.

Implementing heteroskedasticity-consistent standard errors in SPSS (and SAS)

Homoskedasticity (also spelled homoscedasticity), or constant variance of the regression error terms, is a key assumption of ordinary least squares (OLS) regression. When this assumption is violated, i.e. the errors are heteroskedastic (or heteroscedastic), the OLS coefficient estimator remains unbiased and consistent. However, it is no longer efficient, and the conventional standard error estimates become biased, which leads to Type I error inflation or reduced statistical power for coefficient hypothesis tests.

Thus it is important to account for heteroskedasticity when conducting OLS regression. Methods for doing so include transforming the data, weighted least squares (WLS) regression and generalized least squares (GLS) estimation. Another alternative is to use heteroskedasticity-consistent standard error (HCSE) estimators of OLS parameter estimates (White, 1980).


Comparison of residuals between first order Heteroskedastic and Homoskedastic disturbances (Photo credit: Wikipedia)

The commonly used HCSE estimators are HC0, HC1, HC2, and HC3. Standard errors from HC0 (the most common implementation) are best used for large sample sizes, as these estimators are downward biased for small sample sizes. HC1, HC2, and HC3 are better suited to smaller samples.
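As a rough illustration of how these variants differ, the sketch below computes HC0 through HC3 standard errors for a simple regression with an intercept and one predictor. It is not the Hayes and Cai macro; the function name `hc_standard_errors` and the sample data are made up for this example. It assumes the standard textbook weights: squared residuals for HC0, the same rescaled by n/(n-k) for HC1, and squared residuals divided by (1 - h) or (1 - h)^2 for HC2 and HC3, where h is the leverage of each observation.

```python
import math


def hc_standard_errors(x, y):
    """OLS of y on an intercept and one predictor x, with HC0-HC3 standard errors."""
    n, k = len(x), 2                      # k = number of coefficients
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    # (X'X)^-1 = [[a, b], [b, d]] for the design matrix X = [1, x].
    a, b, d = sxx / det, -sx / det, n / det
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b0 = a * sy + b * sxy                 # intercept
    b1 = b * sy + d * sxy                 # slope
    resid2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    # Leverage h_i = x_i (X'X)^-1 x_i' for each row x_i = (1, xi).
    lev = [a + 2 * b * xi + d * xi * xi for xi in x]

    def sandwich_se(weights):
        # Diagonal of (X'X)^-1 X' diag(weights) X (X'X)^-1, then square roots.
        v00 = sum(w * (a + b * xi) ** 2 for w, xi in zip(weights, x))
        v11 = sum(w * (b + d * xi) ** 2 for w, xi in zip(weights, x))
        return math.sqrt(v00), math.sqrt(v11)

    return {
        "coef": (b0, b1),
        "HC0": sandwich_se(resid2),
        "HC1": sandwich_se([e * n / (n - k) for e in resid2]),
        "HC2": sandwich_se([e / (1 - h) for e, h in zip(resid2, lev)]),
        "HC3": sandwich_se([e / (1 - h) ** 2 for e, h in zip(resid2, lev)]),
    }


# Hypothetical data in which the spread of y grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.8, 5.4, 5.7, 7.9, 7.2]
result = hc_standard_errors(x, y)
for name in ("HC0", "HC1", "HC2", "HC3"):
    print(name, "slope SE:", round(result[name][1], 4))
```

Since each leverage value lies between 0 and 1, the HC2 and HC3 weights are never smaller than the HC0 weights, so their standard errors are at least as large. That adjustment is what offsets the small-sample downward bias of HC0.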

Many researchers conduct their statistical analysis in Stata, which has built-in procedures for estimating standard errors using all of the HC methods. However, others use SPSS for its pair-wise deletion capability (versus list-wise deletion in Stata) and suffer from its lack of heteroskedasticity correction capabilities. This wonderful paper by Hayes and Cai (2007) provides a macro (in the Appendix) that implements HCSE estimators in SPSS. They also provide a similar macro for SAS.

Note that the macro has no error-handling procedures, hence pre-screening of the data is required. Also, missing data is handled by list-wise deletion (which might defeat the purpose of using SPSS for some users).

Another link to the paper is here.


Hayes, A.F. and Cai, L. (2007), “Using heteroskedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation”, Behavior Research Methods, 39(4), 709-722, DOI: 10.3758/BF03192961.

White, H. (1980), “A Heteroscedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroscedasticity,” Econometrica, 48, 817-838.

Is press coverage good for science?

Over the past few months, the press has reported many stories about grand scientific discoveries. One example is the reported discovery of the lair of the Kraken, a prehistoric monster that ate dinosaurs for fun (see Lair of Ancient ‘Kraken’ Sea Monster Possibly Discovered – Yahoo! News and The Giant, Prehistoric Squid That Ate Common Sense | Wired Science | Wired.com). There was also extensive coverage of a study suggesting that the speed of light can be broken and hence that Einstein’s theory of special relativity is flawed (read more at Speed of light may have been broken – Q&A – Telegraph).


In hindsight, it seems that the reporting of these scientific discoveries was a little premature. In the zeal to get the next big story, non-reviewed working papers get cited, data gets misrepresented, findings get misquoted, and scientific ethics get ignored. These problems are more prevalent in some parts of the world.

Everyone has an example of the scientific ignorance of the press, but researchers in Britain probably have more than most. With stories ranging from ludicrous (wind turbine attacked by aliens) to downright irresponsible (promoting the link between childhood vaccinations and autism), the fourth estate in the United Kingdom has hardly covered itself in glory when it comes to science and scientific issues. Other countries have similar grievances, of course — particularly the United States, where right-wing talk radio and cable television regularly air anti-science views on everything from global warming to creationism. Stem-cell scientists in Germany and transgenic-crop researchers in France have also been assailed by journalism out of step with the scientific evidence that it claims to examine.


In Britain, an inquiry into the standards and ethics of the press has asked the scientific community to provide support for the thesis that press coverage that does not apply the scientific method is harmful to science. While this is an interesting debate in itself, it prompts a larger question: is press coverage good for science? It also raises the question of whether scientists should be responsible for, and trained in, science journalism, or whether journalists should be trained in the scientific method. While points can be made on either side of this discussion, many on both sides may agree that good, responsible press coverage is critical for good science. If not, then why do many researchers cite ‘media coverage’ on their resumes?

Read more at The press under pressure : Nature : Nature Publishing Group.

Correlation versus Causation (cum hoc ergo propter hoc)

A picture is worth a thousand words. This wonderful image by Businessweek conveys the difference between correlation and causation far more accessibly than standard research texts.

Image via BusinessWeek


In other words, correlation enables prediction whereas causation enables explanation. Correlation denotes the strength of a relationship between two variables, while causation requires correlation of cause and effect, temporal precedence and rejection of alternative hypotheses.


Thus the claim that correlation proves causation, or cum hoc ergo propter hoc, is a logical fallacy.
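A small simulation shows how two variables can be correlated without either causing the other. The sketch below is only an illustration under made-up assumptions: a hidden common cause `z` drives both `x` and `y`, and neither `x` nor `y` has any effect on the other. All variable names and the sample size are invented for this example.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]      # hidden common cause
x = [zi + rng.gauss(0, 1) for zi in z]       # x depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]       # y depends on z only

r_xy = pearson(x, y)                         # theoretical value is 0.5

# Remove the known contribution of z from both variables.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = pearson(x_rest, y_rest)             # theoretical value is 0

print("correlation of x and y:", round(r_xy, 3))
print("after removing z:", round(r_rest, 3))
```

Knowing `x` helps predict `y`, so the correlation is useful for prediction. Once the common cause is accounted for, the association should largely vanish, which is why an alternative hypothesis such as a confounder has to be rejected before claiming causation.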


See the original at Correlation or Causation? – Businessweek.

The ironic effects of packaging


A paper published in advance in the Journal of Marketing Research finds that signalling effectiveness is a double-edged sword. Terming this the ironic effects of packaging, the experimental study found that while attractive packaging increases initial product sales, it also leads to less post-purchase use of the product.

“Consumers become so convinced of the power of a boldly packaged product that they judge they can use less of it,” lead researcher Meng Zhu says. “Conversely, they tend to use more of a product when the packaging lacks strong cues of effectiveness.”


Meng Zhu, Darron M. Billeter, and J. Jeffrey Inman (2011). The Double-Edged Sword of Signaling Effectiveness: When Salient Cues Curb Postpurchase Consumption. Journal of Marketing Research. Ahead of Print.

Read more at Marketing Press | The Double-Edged Sword of Signaling Effectiveness: When Salient Cues Curb Postpurchase Consumption and Futurity.org – Package irony: Buy quickly, use slowly.

World’s most innovative

Good Business presents an infographic by Thomson Reuters that highlights the world’s most innovative countries and industries. Rather than relying on patent counts alone, the measure of innovation combines patent success, reach, influence and volume, which produces a few surprises. The top 100 innovative companies in the world hail from just 9 countries: the US leads the pack with 40% of the companies, while Liechtenstein is 9th with 1 company on the list.


A GOOD.is Transparency

Image via Good Business

Read more and see the original (large) image at Infographic: The World’s Leading Innovators – Business – GOOD.

Twitter fuels a bank run

Swedish banks operating in Latvia recently fell victim to a social media-fueled bank run. Analysis by Orgnet.com shows the social graph of Twitter users who tweeted and retweeted rumors about the bank. The image highlights Swedbank’s central role in tweeting denials and trying to control the spread of rumors. One can clearly see two sub-networks at play: the first is a highly connected network of rumor-mongers, the second a highly centralized network of denials.
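The difference between a ‘highly connected’ and a ‘highly centralized’ network can be expressed with two simple graph measures. The sketch below is not the Orgnet.com analysis and uses no data from it. It compares two toy graphs, a clique and a star, using edge density and Freeman's degree centralization. The function names and the six-node examples are assumptions chosen for illustration.

```python
def density(n, edges):
    """Share of all possible undirected edges that are present."""
    return 2 * len(edges) / (n * (n - 1))


def degree_centralization(n, edges):
    """Freeman degree centralization: 0 for equal degrees, 1 for a star."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    top = max(degree)
    return sum(top - d for d in degree) / ((n - 1) * (n - 2))


n = 6
# Everyone talks to everyone: a stand-in for a tightly knit rumor network.
clique = [(u, v) for u in range(n) for v in range(u + 1, n)]
# One account talks to all the others: a stand-in for a single source of denials.
star = [(0, v) for v in range(1, n)]

print(density(n, clique), degree_centralization(n, clique))  # 1.0 0.0
print(round(density(n, star), 3), degree_centralization(n, star))  # 0.333 1.0
```

The clique has maximal density and no centralization, while the star has low density but maximal centralization because every tie runs through one node.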



Image via ReadWriteWeb


Download the original PDF here: Twitter Bank Run.

Read more at Did A Twitter-Fueled Latvian Bank Run Start With One Account? [UPDATED].