Big Data Innovations That Drive Greater Insights

Pentaho Labs drives innovation in big data integration and analytics by incubating new breakthrough technologies. Innovations include advanced visualization, big data templates, and real-time and predictive analytics.

Visit often to see what is cooking in the labs next.

Pentaho with Spark

Apache Spark™ - Engine for Large-Scale Data Processing

Pentaho leads the way in big data by enabling developers to process data at scale. The labs team has been investigating and prototyping a number of use cases in which Spark can be leveraged within a customer environment, and the results so far have been positive. As with Spark, the labs team continues to future-proof big data investments by incubating new breakthrough technologies.

Process Data in Memory

With Pentaho and Spark, customers can process data in memory up to 100 times faster than with Hadoop MapReduce. This means that they can more easily meet their SLAs by delivering insights faster than ever before.

Current Use Cases being Prototyped:

Queries directly on Spark SQL

  • Enable access to Spark SQL in Pentaho Data Integration through JDBC
  • Use Spark as a data source for pixel perfect reports
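
As a rough illustration of the JDBC use case above, the sketch below assembles the kind of connection settings a JDBC client would need to reach Spark SQL. It assumes the connection goes through the Spark Thrift JDBC/ODBC server, which speaks the HiveServer2 protocol; the page does not say how the prototype actually connects. The class name `SparkSqlJdbcSettings` and the host name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparkSqlJdbcSettings:
    """Hypothetical holder for the values a JDBC client would be given."""

    host: str
    # 10000 is the default port of the Spark Thrift JDBC/ODBC server.
    port: int = 10000
    database: str = "default"
    # The Thrift server is HiveServer2-compatible, so clients commonly
    # use the Hive JDBC driver (an assumption about this setup).
    driver_class: str = "org.apache.hive.jdbc.HiveDriver"

    def url(self) -> str:
        """Build a HiveServer2-style JDBC URL."""
        return f"jdbc:hive2://{self.host}:{self.port}/{self.database}"


# Example host name is a placeholder, not a real endpoint.
settings = SparkSqlJdbcSettings(host="spark-gateway.example.com")
print(settings.driver_class)
print(settings.url())
```

The URL and driver class are the two values a generic JDBC connection dialog would typically ask for.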

Orchestrate Spark Parallel Execution

  • Enable the execution of Spark Applications from Pentaho Data Integration

Visual Spark ETL

  • Enable Pentaho Data Integration transformations to be executed at scale inside of Spark

Interactive Spark

  • Operationalize Spark Scala Scripts

Spark Streaming

  • Enable real-time feeds for:
    • Complex Event Processing
    • Alerting
    • Monitoring
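
To make the alerting use case concrete, here is a minimal pure-Python sketch of the micro-batch idea behind Spark Streaming: incoming events are grouped into small batches, and each batch is checked against a rule. It does not use Spark or Pentaho. The function names, the fixed batch size, and the mean-over-threshold rule are all assumptions for illustration; Spark Streaming itself forms batches by time interval rather than by count.

```python
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Group a stream of readings into fixed-size batches (simplified)."""
    batch: List[float] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def alerts(events: Iterable[float], batch_size: int, threshold: float) -> List[str]:
    """Return one alert message per batch whose mean exceeds the threshold."""
    messages: List[str] = []
    for index, batch in enumerate(micro_batches(events, batch_size)):
        mean = sum(batch) / len(batch)
        if mean > threshold:
            messages.append(f"batch {index}: mean {mean:.1f} exceeds {threshold}")
    return messages


if __name__ == "__main__":
    readings = [1, 2, 3, 10, 20, 30, 5]
    for message in alerts(readings, batch_size=3, threshold=5.0):
        print(message)
```

The same loop shape covers monitoring as well: replace the threshold rule with whatever per-batch metric needs to be published.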


Pentaho with Storm on YARN

Pentaho with Storm and YARN for Real Time Big Data Analytics

Pentaho continues to lead the way in big data by enabling developers to run big data analytics in real time. Pentaho Data Integration (PDI) with Storm on YARN speeds critical decisions that depend on time-sensitive data.

The YARN-based architecture of Hadoop provides a more general processing platform not constrained to MapReduce. Pentaho Labs continues to future-proof big data investments with the incubation of new breakthrough technologies such as YARN and Storm.

Developers can now be immediately productive with one of the most popular distributed stream processing systems available today. Existing Pentaho transformations, including those used in Pentaho MapReduce, can be executed as real-time processes via Storm. This powerful combination brings data to business users immediately, without the delay or overhead of designing additional transformations.

Process Data as it Arrives

With Pentaho, processing begins as soon as data arrives from the source, so valuable data sets are delivered immediately. Up-to-the-second insights into key business metrics are available through real-time dashboards, reports, or intermediate data sets that existing applications can use.

Leverage Existing Data Transformations

Many customers have long-running batch Pentaho jobs that run within Hadoop via MapReduce. Pentaho for Storm complements these jobs, allowing developers to reuse existing transformations to process data immediately. Both batch and real-time workflows are powered by Pentaho. Existing developers can build upon years of knowledge to learn the most from their data, instantly.
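
The reuse idea can be sketched in a few lines of plain Python: one transformation is defined once and then driven either over a complete data set (batch) or one record at a time as records arrive (streaming). This is a conceptual sketch only. It does not use Pentaho, Storm, or MapReduce APIs, and the `transform` logic and field names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def transform(row: Row) -> Row:
    """Hypothetical stand-in for an existing transformation:
    normalise the customer name and derive an order total."""
    return {
        "customer": str(row["customer"]).strip().upper(),
        "total": float(row["price"]) * int(row["qty"]),
    }


def run_batch(rows: Iterable[Row]) -> List[Row]:
    """Batch mode: process the whole data set and return all results."""
    return [transform(row) for row in rows]


def run_stream(rows: Iterable[Row]) -> Iterator[Row]:
    """Streaming mode: emit each result as soon as its input arrives."""
    for row in rows:
        yield transform(row)


if __name__ == "__main__":
    orders = [
        {"customer": " acme ", "price": "2.50", "qty": "4"},
        {"customer": "globex", "price": "10", "qty": "1"},
    ]
    print(run_batch(orders))
    for result in run_stream(iter(orders)):
        print(result)
```

Because both runners call the same `transform`, batch and real-time results stay consistent without maintaining two copies of the logic.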

Pentaho with Storm allows developers to reuse their knowledge and components to process data differently. Deliver data when it’s needed - all in a familiar environment.

Next Steps

Today, Pentaho with Storm can process many existing transformations, but this is still an innovation in incubation. To learn more visit:


Adaptive Big Data Layer

With Pentaho, you can now plug into popular big data stores through an adaptive big data layer that gives companies greater flexibility, insulation from change, and increased competitive advantage.

The Pentaho adaptive big data layer includes plug-ins for Hadoop distributions from Cloudera, Hortonworks, MapR and Intel, as well as popular NoSQL databases Cassandra and MongoDB, and introduces support for Splunk. Download plug-ins.

Adaptive Big Data Layer Overview (video)