Facebook and Pentaho Data Integration

Social Networking Data

Recently, I have been asked how Pentaho's products interact with social network providers such as Twitter and Facebook. The data stored within these "social graphs" can provide its owners with critical metrics around their content. By analyzing trends in user growth and demographics, as well as in the consumption and creation of content, owners and developers are better equipped to improve their business with Facebook and Twitter. Social networking data can already be viewed and analyzed with existing tools such as FB Insights, or with third-party software packages built specifically for this purpose. Pentaho Data Integration, in its traditional sense, is an ETL (Extract, Transform, Load) tool. It can be used to extract data from these services and merge or consolidate it with other relevant company data. However, it can also be used to automatically push information about a company's product or service to the social network platforms. You have seen this in action if you have ever used Facebook and "Liked" something a company had to offer: at regular intervals, unsolicited product offers and advertisements from that company show up on your wall. It is a great and cost-effective way to advertise to the masses.

Application Programming Interface

Interacting with these systems is made possible because they provide an API (Application Programming Interface). To keep it simple, a developer can write a program in "some language" to run on one machine, which communicates with the social networking system on another machine. The API can be consumed from a 3GL such as Java or JavaScript or, even simpler, through RESTful services. At times, software developers and vendors will write connectors against the native API that can be distributed and used in many software applications. These connectors can offer a quicker and easier approach than writing code alone. It is possible that the next release of Pentaho Data Integration will include an out-of-the-box Facebook and/or Twitter transformation step, but until then the RESTful APIs work just fine with the simple HTTP POST step. Using Pentaho Data Integration with this out-of-the-box component allows quick access to social network graph data. It can also push content to applications such as Facebook and Twitter without writing any code or purchasing a separate connector.

The Facebook Graph API

Both Facebook and Twitter provide a number of APIs. One worth mentioning is the Facebook Graph API (don't worry, Twitter, I'll get back to you in my next blog entry).

The Graph API is a RESTful service that returns a JSON response. Simply stated, an HTTP request can initiate a connection with the FB systems and publish or return data, which can then be parsed with a programming language or, better yet, without programming, using Pentaho Data Integration and its JSON Input step.
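For readers who want to see what "parsing the JSON response" amounts to, here is a minimal Python sketch of the idea. It is not how Pentaho Data Integration implements its JSON input step. The sample payload is hand-written for illustration (the "data" wrapper and the field names are assumptions, not output captured from Facebook), and json_to_rows is a hypothetical helper name.

```python
import json

# Hypothetical sample payload, written by hand for illustration only.
# It assumes a list of objects under a "data" key; a real response may differ.
SAMPLE_RESPONSE = """
{
  "data": [
    {"id": "1001_1", "message": "New product launch",
     "created_time": "2011-03-01T12:00:00+0000"},
    {"id": "1001_2", "message": "Spring sale starts today",
     "created_time": "2011-03-02T09:30:00+0000"}
  ]
}
"""


def json_to_rows(payload, fields):
    """Flatten the objects under "data" into one tuple per object.

    Fields missing from an object come back as None, so every row
    has the same number of columns.
    """
    document = json.loads(payload)
    rows = []
    for item in document.get("data", []):
        rows.append(tuple(item.get(field) for field in fields))
    return rows


rows = json_to_rows(SAMPLE_RESPONSE, ["id", "message", "created_time"])
for row in rows:
    print(row)
```

Each tuple plays the role of one row in a data stream, which is roughly the shape an ETL tool needs before it can merge the data with other company data.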

Since the FB Graph API provides both data access and publishing capabilities across a number of objects (photos, events, statuses, people, pages) supported in the FB social graph, one can leverage both automated push and pull capabilities.
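The difference between pull and push can be sketched as two HTTP requests: a GET that reads from an object, and a POST that publishes to it. The Python below only builds the requests and does not send them. Treat the details as assumptions to check against Facebook's current documentation: the base URL, the "feed" connection, and the "message" and "access_token" parameter names. PAGE_ID and ACCESS_TOKEN are placeholders, and both function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL for the Graph API; verify against current documentation.
GRAPH_BASE = "https://graph.facebook.com"


def build_pull_request(object_id, connection, access_token):
    """Build (but do not send) a GET request that reads a connection."""
    query = urlencode({"access_token": access_token})
    url = f"{GRAPH_BASE}/{object_id}/{connection}?{query}"
    return Request(url, method="GET")


def build_push_request(object_id, connection, access_token, message):
    """Build (but do not send) a form-encoded POST that publishes a message."""
    body = urlencode({"message": message, "access_token": access_token})
    url = f"{GRAPH_BASE}/{object_id}/{connection}"
    return Request(url, data=body.encode("utf-8"), method="POST")


# Placeholder values; a real call needs a valid object id and token.
pull = build_pull_request("PAGE_ID", "feed", "ACCESS_TOKEN")
push = build_push_request("PAGE_ID", "feed", "ACCESS_TOKEN", "Hello from PDI")

print(pull.get_method(), pull.full_url)
print(push.get_method(), push.full_url, push.data)
```

Passing either object to urllib.request.urlopen would actually send it. In Pentaho Data Integration the same URL and form fields would be configured on an HTTP step rather than written as code.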

If you are interested in giving this a try or seeing this in action, take a look at this tutorial available on the Pentaho Evaluation Sandbox.

Michael Tarallo Director of Enterprise Solutions Pentaho