World Cup, Twitter sentiment and equity prices…any correlation?

I heard a news story on the radio today about stock markets going quiet during World Cup events, especially when the home country is on the field. This made me think about how live activities affect the major markets. My colleague Bo Borland at Pentaho posed an interesting question on this topic just yesterday at MongoDB World in New York: “Do real-time tweets have an effect on the stock markets?” Working for a Big Data integration and analytics company, Bo of course used Pentaho tools to see if there was indeed a correlation. A cool idea, but what resulted was even cooler than I’d imagined….

Using Pentaho Data Integration, Bo easily pulled minute-by-minute stock tick data, which is highly structured, and blended it with unstructured Twitter data. Next, he pushed the blended data into a MongoDB collection to take advantage of its flexibility. (Note: Bo is also the author of Pentaho Analytics for MongoDB.) Taking the integration and analysis a step further, he scored the tweet sentiment by including a Weka predictive algorithm as part of the data ingestion process from Twitter. Once the data was in place, he used one of the cool new features in Pentaho 5.1 to “slice and dice” the data stored in MongoDB.

It’s worth pointing out that the ability to analyze data directly in MongoDB with no coding is a first-to-market feature. Pentaho has designed and delivered native integration with MongoDB's Aggregation Framework, allowing business users and analysts to immediately access, analyze and visualize MongoDB data for superior insight and governance.
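
If you haven't worked with the Aggregation Framework, a slice-and-dice question such as "average tweet sentiment per symbol per minute" comes down to a pipeline of stages. Here is a rough, hand-written sketch of one in Python. It is not what Pentaho generates, and the field names (`symbol`, `minute`, `sentiment`), the `XYZ` symbol and the database and collection names in the comment are assumptions made up for illustration.

```python
def sentiment_by_symbol_and_minute(symbol):
    """Build an aggregation pipeline as a list of stage documents.

    Field names here are hypothetical; adjust them to the real collection.
    """
    return [
        # Keep only documents for the symbol of interest.
        {"$match": {"symbol": symbol}},
        # Group by symbol and minute, averaging sentiment and counting tweets.
        {"$group": {
            "_id": {"symbol": "$symbol", "minute": "$minute"},
            "avg_sentiment": {"$avg": "$sentiment"},
            "tweet_count": {"$sum": 1},
        }},
        # Return the groups in chronological order.
        {"$sort": {"_id.minute": 1}},
    ]


pipeline = sentiment_by_symbol_and_minute("XYZ")

# With a running MongoDB server and the pymongo driver installed, the
# pipeline could be executed along these lines (names are placeholders):
#   from pymongo import MongoClient
#   rows = MongoClient()["demo"]["tweets"].aggregate(pipeline)
```

The point of the no-coding feature described above is that business users get this kind of result without having to write the pipeline by hand.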

Here’s Bo’s process simplified:

Pentaho Data Integration

  • Ingest data from external data source (TickData) into MongoDB
  • Ingest data from Twitter using public API into MongoDB
  • Execute a Weka Scoring step during the ingestion process to score the incoming tweets and calculate the sentiment

Connect Pentaho Analytics to the Mongo Collection(s)

  • Start analyzing data
  • Slice and dice large amounts of data quickly
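
For the curious, the blending idea behind the steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Bo's Pentaho transformation: the records are toy values invented for the example, the field names and the `XYZ` symbol are hypothetical, and the sentiment scores are assumed to have been produced already by a scoring step. It averages tweet sentiment per minute, joins it to the tick data on the minute, and computes a Pearson correlation.

```python
from collections import defaultdict
from math import sqrt

# Toy records invented for illustration only (not real market or Twitter data).
ticks = [
    {"symbol": "XYZ", "minute": "09:30", "price": 100.0},
    {"symbol": "XYZ", "minute": "09:31", "price": 100.5},
    {"symbol": "XYZ", "minute": "09:32", "price": 100.2},
    {"symbol": "XYZ", "minute": "09:33", "price": 101.0},
]
tweets = [
    {"minute": "09:30", "sentiment": 0.1},
    {"minute": "09:30", "sentiment": 0.3},
    {"minute": "09:31", "sentiment": 0.6},
    {"minute": "09:32", "sentiment": 0.2},
    {"minute": "09:32", "sentiment": 0.4},
    {"minute": "09:34", "sentiment": 0.9},
]


def average_sentiment_by_minute(tweets):
    """Average the pre-scored sentiment of all tweets in each minute."""
    totals = defaultdict(lambda: [0.0, 0])  # minute -> [sum, count]
    for tweet in tweets:
        totals[tweet["minute"]][0] += tweet["sentiment"]
        totals[tweet["minute"]][1] += 1
    return {minute: s / n for minute, (s, n) in totals.items()}


def blend(ticks, tweets):
    """Join ticks to per-minute sentiment; minutes without tweets are dropped."""
    sentiment = average_sentiment_by_minute(tweets)
    return [
        dict(tick, sentiment=sentiment[tick["minute"]])
        for tick in ticks
        if tick["minute"] in sentiment
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


blended = blend(ticks, tweets)
r = pearson(
    [row["price"] for row in blended],
    [row["sentiment"] for row in blended],
)
print(f"{len(blended)} blended rows, r = {r:.3f}")
```

A handful of toy rows proves nothing about markets, of course. The sketch just shows the shape of the blended record, one row per minute carrying both a price and a sentiment score, which is what makes the slicing and dicing possible.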

Here’s what the process looks like:

[Diagram: MongoDB process]

If you want to see this slicing and dicing directly on data in MongoDB, check out this video.

Bo presented this demo live yesterday to a standing-room-only crowd at MongoDB World, using Tesla data. You can access his slides here: [slideshare id=36297211&sc=no]

So the question still remains, “Does Twitter sentiment correlate with equity prices?” I’ll let you take a look and decide, but I’ve got some stocks to research….

Chuck Yarbrough
Director, Big Data Product Marketing
Pentaho