Big Data Innovations That Drive Greater Insights
Pentaho Labs drives innovation in big data integration and analytics through the incubation of breakthrough technologies. Innovations include advanced visualization, big data templates, and real-time and predictive analytics.
Visit often to see what is cooking in the labs next.
Pentaho with Spark
Apache Spark™ - Engine for Large-Scale Data Processing
Pentaho leads the way in big data by enabling developers to process data at scale. Recent work investigating and prototyping use cases where Spark can be leveraged in customer environments has shown positive results. As with other breakthrough technologies, the labs team is incubating Spark to future-proof customers' big data investments.
Process Data in Memory
With Pentaho and Spark, customers can process data upwards of 100 times faster than with Hadoop MapReduce. This means customers can more easily meet SLAs by delivering insights faster than ever before.
Current Use Cases Being Prototyped:
Queries directly on Spark SQL
- Enable access to Spark SQL in Pentaho Data Integration through JDBC
- Use Spark as a data source for pixel-perfect reports
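As a sketch of this first use case: Spark SQL exposes a Thrift server that speaks the HiveServer2 protocol, so a reporting tool or PDI step can reach it through a standard Hive-style JDBC URL. The host, table, and query below are hypothetical, and the Hive JDBC driver jar would need to be on the classpath; the live connection is only attempted when a host is passed in, since this sketch has no cluster to talk to.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkSqlJdbcSketch {

    /** Build the HiveServer2-style JDBC URL used by the Spark SQL Thrift server. */
    static String sparkJdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; the Spark Thrift server listens on port 10000 by default.
        String url = sparkJdbcUrl("spark-master.example.com", 10000, "default");
        System.out.println("Would connect to " + url);

        // Attempt a live query only when a real host is supplied on the command line.
        if (args.length > 0) {
            try (Connection conn = DriverManager.getConnection(
                         sparkJdbcUrl(args[0], 10000, "default"), "", "");
                 Statement stmt = conn.createStatement();
                 // Hypothetical table for illustration.
                 ResultSet rs = stmt.executeQuery(
                         "SELECT region, SUM(amount) FROM sales GROUP BY region")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}
```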
Orchestrate Spark Parallel Execution
- Enable the execution of Spark Applications from Pentaho Data Integration
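One way a job entry could orchestrate a Spark application is by invoking spark-submit as an external process. The sketch below only assembles the command line; the master URL, jar path, and class name are invented for illustration, and the actual launch is guarded so the sketch runs without a Spark installation.

```java
import java.util.ArrayList;
import java.util.List;

public class SparkSubmitSketch {

    /** Assemble a spark-submit command line for a self-contained application jar. */
    static List<String> buildCommand(String master, String mainClass,
                                     String appJar, String... appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--master");
        cmd.add(master);
        cmd.add("--class");
        cmd.add(mainClass);
        cmd.add(appJar);
        for (String a : appArgs) {
            cmd.add(a);
        }
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical application; on a YARN cluster the master would be "yarn".
        List<String> cmd = buildCommand("spark://spark-master.example.com:7077",
                "com.example.SalesRollup", "/opt/jobs/sales-rollup.jar", "2014-01-01");
        System.out.println(String.join(" ", cmd));

        // Launch only when explicitly requested, since spark-submit may not be installed.
        if (args.length > 0 && args[0].equals("--run")) {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            System.exit(p.waitFor());
        }
    }
}
```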
Visual Spark ETL
- Enable Pentaho Data Integration transformations to be executed at scale inside of Spark
- Operationalize Spark Scala Scripts
- Enable real-time feeds for:
- Complex Event Processing
Pentaho with Storm on YARN
Pentaho with Storm and YARN for Real-Time Big Data Analytics
Pentaho continues to lead the way in big data, enabling developers to process big data analytics in real time and speeding critical decisions based on time-sensitive data, using Pentaho Data Integration (PDI) with Storm on YARN.
The YARN-based architecture of Hadoop provides a more general processing platform that is not constrained to MapReduce. Pentaho Labs continues to future-proof big data investments with the incubation of breakthrough technologies such as YARN and Storm.
Developers can now be immediately productive with one of the most popular distributed stream processing systems available today. Existing Pentaho transformations can be executed as real-time processes via Storm - including those used in Pentaho MapReduce. This powerful combination brings data to business users immediately, without the delay or overhead of designing additional transformations.
Process Data as it Arrives
With Pentaho, processing begins as soon as data arrives from the source, delivering valuable data sets immediately. Up-to-the-second insights into key business metrics power real-time dashboards, reports, or intermediate data sets consumed by existing applications.
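The model described above is tuple-at-a-time rather than batch: each record is handled the moment it arrives, so downstream metrics are always current. The sketch below imitates that model in plain Java (it does not use the actual Storm API); the running-total "bolt" and the sample feed are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class StreamingSketch {

    /** A stand-in for a Storm bolt: keeps an up-to-the-second metric per key. */
    static class RunningTotalBolt {
        private final Map<String, Long> totals = new HashMap<>();

        /** Process one tuple the moment it arrives; no batch window, no delay. */
        void execute(String key, long amount) {
            totals.merge(key, amount, Long::sum);
        }

        long totalFor(String key) {
            return totals.getOrDefault(key, 0L);
        }
    }

    public static void main(String[] args) {
        RunningTotalBolt bolt = new RunningTotalBolt();
        // Simulated real-time feed; in Storm these tuples would come from a spout.
        String[][] feed = {{"east", "100"}, {"west", "250"}, {"east", "50"}};
        for (String[] tuple : feed) {
            bolt.execute(tuple[0], Long.parseLong(tuple[1]));
            // A dashboard could read the current metric after every single event:
            System.out.println(tuple[0] + " running total = " + bolt.totalFor(tuple[0]));
        }
    }
}
```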
Leverage Existing Data Transformations
Many customers have long-running batch Pentaho jobs that run within Hadoop via MapReduce. Pentaho for Storm complements these, allowing developers to reuse existing transformations to process data immediately. Both batch and real-time workflows are powered by Pentaho, so developers can build on years of knowledge to get the most from their data, instantly.
Pentaho with Storm allows developers to reuse their knowledge and components to process data differently. Deliver data when it’s needed - all in a familiar environment.
Today, Pentaho with Storm can process many existing transformations, but it is still an innovation in incubation. To learn more, visit: http://wiki.pentaho.com
Adaptive Big Data Layer
With Pentaho you can now plug into popular big data stores through an adaptive big data layer that brings greater flexibility, insulation from change, and increased competitive advantage to companies.
The Pentaho adaptive big data layer includes plug-ins for Hadoop distributions from Cloudera, Hortonworks, MapR and Intel, as well as popular NoSQL databases Cassandra and MongoDB, and introduces support for Splunk. Download plug-ins.
Adaptive Big Data Layer Overview: