Streamlined Data Refinery

Blend, enrich, and refine any data source into secure, on-demand analytic data sets with a Streamlined Data Refinery. Using Hadoop as a big data processing hub, Pentaho Data Integration processes and refines specific data sets. With a single click, data sets are automatically modeled, published, and delivered to users for immediate visual analytics.

Deliver Governed, Analytic Data Sets

With Pentaho’s data integration and analytics platform, Hadoop becomes a high-performance, multi-source business information hub where you can stream data, blend it, and then automatically publish refined data sets into one of the popular analytic databases (such as Amazon Redshift or HPE Vertica). For the end user, a rich set of data discovery, reporting, dashboard, and visualization capabilities is immediately available for high-performance analytics.
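As a rough illustration of the flow described above, the sketch below uses plain Python with in-memory lists standing in for both the Hadoop hub and the analytic database. The function and field names are hypothetical and are not part of Pentaho's API; in a real deployment, Pentaho Data Integration would run these steps in-cluster and the published data set would land in Redshift or Vertica.

```python
# Hypothetical sketch of the refinery flow: blend raw sources,
# refine them, then model and "publish" the result as a data set.

def blend(*sources):
    """Merge records from several raw sources on a shared key."""
    blended = {}
    for source in sources:
        for record in source:
            blended.setdefault(record["id"], {}).update(record)
    return list(blended.values())

def refine(records, required_fields):
    """Keep only complete records (a stand-in for cleansing rules)."""
    return [r for r in records if all(f in r for f in required_fields)]

def publish(records):
    """Model and 'publish' the data set: here, just a sorted table."""
    return sorted(records, key=lambda r: r["id"])

# Two raw sources sharing a customer id.
stream_a = [{"id": 1, "region": "EMEA"}, {"id": 2, "region": "APAC"}]
stream_b = [{"id": 1, "spend": 120.0}]

data_set = publish(refine(blend(stream_a, stream_b), ["region", "spend"]))
# Only customer 1 has both required fields, so one refined row is published.
```

The point of the sketch is the ordering: blending happens before refinement, and only refined, complete records are modeled and published for analytics.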

Analytics-ready blended data sets at scale

  • A pragmatic approach for delivering analytic data sets at scale for immediate high-performance analytics
  • A self-service data integration process for blending and enriching vast volumes of highly diverse data
  • An agile data integration process that includes data transformation steps and tools for simplified in-cluster data processing in Hadoop
  • A self-service analytics experience, delivered via an automated process, that enables high-speed queries and visualizations

Example of how a streamlined data refinery may look within an IT landscape:

  • An electronic marketing firm has created a refinery architecture for delivering personalized offers
  • Online campaign, enrollment, and transaction data is ingested into Hadoop, processed via Pentaho Data Integration, modeled automatically, and delivered to an analytic database
  • Users trigger generation of analytic data sets on demand
  • A business analytics front-end includes reporting and ad hoc analysis for business users
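The example pipeline above can be sketched in the same spirit. Here campaign, enrollment, and transaction records (in-memory stand-ins for data ingested into Hadoop) are blended per customer, and an on-demand call builds the refined data set a personalized-offer front end would query. All names below are illustrative assumptions, not Pentaho APIs.

```python
# Hypothetical stand-ins for the three ingested sources.
campaigns = [{"customer": "c1", "campaign": "spring-sale"},
             {"customer": "c2", "campaign": "spring-sale"}]
enrollments = [{"customer": "c1", "tier": "gold"}]
transactions = [{"customer": "c1", "amount": 40.0},
                {"customer": "c1", "amount": 60.0}]

def build_offer_data_set():
    """On-demand blend: one refined row per enrolled campaign customer."""
    # Aggregate transaction spend per customer.
    spend = {}
    for t in transactions:
        spend[t["customer"]] = spend.get(t["customer"], 0.0) + t["amount"]
    # Index enrollment tier per customer.
    tiers = {e["customer"]: e["tier"] for e in enrollments}
    # Blend the three sources; unenrolled customers are filtered out.
    return [{"customer": c["customer"],
             "campaign": c["campaign"],
             "tier": tiers[c["customer"]],
             "total_spend": spend.get(c["customer"], 0.0)}
            for c in campaigns if c["customer"] in tiers]

offers = build_offer_data_set()
# c2 is not enrolled, so only c1 appears, with total spend 100.0.
```

Because the data set is built by a function call rather than a standing batch job, users can regenerate it on demand, which mirrors the "users drive execution" step in the architecture above.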

How it Works

The Results:

  • Business users have access to reliable, highly governed data generated from diverse, high-volume sources with limited support from IT
  • ETL and data management costs fall by applying the right technology to each task
  • New data sets are engineered for predictive analytics more quickly, thanks to rapid ingestion and powerful processing
  • Governed data sets are automatically modeled and published for immediate visualization