Pentaho for Storm

Closing the Gap Between Big Data Batch and Real Time
Pentaho continues to lead the way in big data ETL by providing developers with a visual environment to design, test, and execute transformations that leverage the power of MapReduce.

Storm is a free and open source distributed real-time computation system that simplifies reliable processing of unbounded streams of data, doing for real-time processing what Hadoop did for batch processing.

Developers can now be immediately productive with one of the most popular distributed stream processing systems available today. Existing Pentaho transformations, including those used in Pentaho MapReduce, can be executed as real-time processes via Storm. This powerful combination brings data to business users immediately, without the delay of batch processing or the overhead of designing additional transformations.

Process Data as it Arrives

With Pentaho, data processing begins as soon as data arrives from the source, and valuable data sets are delivered immediately. Up-to-the-second insights are available for key business metrics, delivering real-time dashboards, reports, or intermediate data sets for use by existing applications.

Leverage Existing Data Transformations

Many customers have long-running batch Pentaho jobs that run within Hadoop via MapReduce. Pentaho for Storm complements these, allowing developers to reuse existing transformations to process data immediately. Both batch and real-time workflows are powered by Pentaho. Existing developers can build upon years of knowledge to learn the most from their data, instantly.

Pentaho for Storm allows developers to reuse their knowledge and components to process data differently. Deliver data when it’s needed - all in a familiar environment.

Next Steps

Today, Pentaho for Storm can process many existing transformations, but it is still an innovation in incubation. To learn more, visit: http://wiki.pentaho.com/display/BAD/Kettle+Execution+on+Storm