Pentaho and NoSQL Databases

Deep native support for emerging NoSQL databases

Pentaho Business Analytics provides easy-to-use visual development tools and big data analytics that empower users to prepare, model, visualize and explore data sets stored in NoSQL databases such as MongoDB, Cassandra and HBase. Pentaho simplifies the end-to-end NoSQL data life cycle by providing a complete platform from data preparation to predictive analytics.

Visual Development for NoSQL Data Prep and Modeling

Pentaho’s visual development tools can make designing, developing and deploying NoSQL analytics solutions as much as 15x faster than traditional custom coding and ETL approaches.

  • A powerful visual user interface for ingesting and manipulating data within NoSQL databases, and for enriching NoSQL data with reference data from other sources.
  • Easy access to NoSQL data, either directly or through rapid visual extraction into data marts/warehouses optimized for expressive aggregate queries.
  • A visual tool for defining business metadata models that helps developers prepare their data for analytics.

Unparalleled Support for NoSQL Databases

Pentaho's native support for NoSQL databases includes the technologies listed below. Learn more about our partnerships with DataStax/Apache Cassandra, MongoDB, and HPCC Systems.

Visual Interface and Drag & Drop Orchestration

Pentaho provides a powerful library of graphical job steps for orchestrating execution of jobs for NoSQL databases and other large data warehouses. These include conditional checking steps, event waiting steps, execution steps and notification steps. Together these steps enable easy visual assembly of powerful job flow logic, across multiple jobs and data sources.
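The way these step types combine into job flow logic can be sketched in a few lines of Python. This is only an illustration of the control flow, not Pentaho's API: the function `run_job` and its `check`, `execute` and `notify` parameters are hypothetical names standing in for a conditional checking step, an execution step and a notification step.

```python
def run_job(check, execute, notify):
    """Run one job: test a precondition, execute the work, then notify.

    All three arguments are plain callables (hypothetical stand-ins for
    graphical job steps). Returns "skipped", "failed" or "succeeded".
    """
    # Conditional checking step: only proceed if the precondition holds.
    if not check():
        notify("skipped: precondition not met")
        return "skipped"
    # Execution step: run the actual work and capture any failure.
    try:
        execute()
    except Exception as exc:
        notify(f"failed: {exc}")
        return "failed"
    # Notification step: report success.
    notify("succeeded")
    return "succeeded"


def failing_load():
    """Hypothetical execution step that always fails."""
    raise ValueError("source unavailable")


messages = []
ok_status = run_job(lambda: True, lambda: None, messages.append)
skip_status = run_job(lambda: False, lambda: None, messages.append)
fail_status = run_job(lambda: True, failing_load, messages.append)
```

An event waiting step would fit the same pattern as a `check` callable that blocks or polls until its condition becomes true.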

An end-to-end analytical platform, Pentaho Business Analytics provides visual development tools for IT developers and analysts to immediately integrate and orchestrate NoSQL data with relational data warehouses and marts, enterprise applications and data stored in cloud applications. Pentaho also provides complete business analytics for NoSQL databases, including direct-connect reporting, visualization, dashboards, interactive analysis and advanced statistical and predictive analytics.

Complete Big Data Analytics

Either through direct-connect interactive reporting and visualization, or by simplifying the process of extracting data from a NoSQL database into a relational database for interactive data exploration, Pentaho provides the ability to immediately deploy powerful analytics for data in NoSQL databases.

The tightly coupled data integration and business analytics platform enables IT and business users to easily explore data in NoSQL databases through:

  • Rich visualization – Interactive web-based interfaces for ad hoc reporting, charting and dashboards.
  • Flexible exploration – Views of data across dimensions such as time, product and geography and across measures such as revenue and quantity.
  • Predictive analysis – Powerful predictive analytics capabilities using advanced statistical algorithms such as classification, regression, clustering and association rules.

Instant and Interactive NoSQL Analytics for Data Analysts

Pentaho Instaview takes data analysts from data to visualization in minutes with interactive self-service access and analytics for Hadoop. Preparation of Hadoop data for analysis is greatly simplified and automated, enabling users to accelerate the big data analytics cycle from days and weeks to minutes and hours. Learn more: Pentaho Instaview

Extending NoSQL Query Languages

Many NoSQL database access methods are missing analytic query operations such as grouping and sorting of data. Pentaho effectively extends the capabilities of these NoSQL databases by post-processing query results. A library of operations is available and can be applied to prepare data for analytics.
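As a minimal sketch of what post-processing query results means, the Python below groups and sorts records after they have been fetched. It does not use Pentaho's operation library; the function `group_and_sort` and the sample `rows` are assumptions made for illustration, with each record represented as a plain dictionary.

```python
from collections import defaultdict


def group_and_sort(records, key, measure):
    """Sum `measure` per distinct `key`, then sort groups by total, descending."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record[measure]
    # Sorting happens client-side, after the NoSQL store has returned raw rows.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows, as they might come back from a store with no GROUP BY.
rows = [
    {"region": "EMEA", "revenue": 120.0},
    {"region": "APAC", "revenue": 75.5},
    {"region": "EMEA", "revenue": 30.0},
    {"region": "AMER", "revenue": 200.0},
]

result = group_and_sort(rows, "region", "revenue")
```

The store only has to return raw rows; the aggregation and ordering are applied afterwards, which is the general idea behind extending a query language that lacks these operations.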

Scalability for Even the Most Complex Organizations

Pentaho's Java-based engine is multi-threaded. Each step in a job executes on its own thread, leveraging the multi-core processors on each node of the cluster. As a result, Pentaho's ETL jobs for NoSQL databases often execute many times faster than equivalent hand-coded jobs.
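The thread-per-step model can be illustrated with a small Python sketch in which each step runs on its own thread and passes rows to the next step through a queue. This is not Pentaho's engine (which is Java-based); `run_step`, `run_pipeline` and the `_DONE` marker are hypothetical names, and the sketch shows only the structure, not the performance characteristics.

```python
import queue
import threading

_DONE = object()  # marker telling a step that no more rows will arrive


def run_step(transform, inbox, outbox):
    """Read rows from inbox, apply transform, and pass results to outbox."""
    while True:
        row = inbox.get()
        if row is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next step
            return
        out = transform(row)
        if out is not None:  # a step may drop a row by returning None
            outbox.put(out)


def run_pipeline(rows, transforms):
    """Run each transform on its own thread, chained by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    threads = [
        threading.Thread(target=run_step, args=(t, queues[i], queues[i + 1]))
        for i, t in enumerate(transforms)
    ]
    for thread in threads:
        thread.start()
    for row in rows:
        queues[0].put(row)
    queues[0].put(_DONE)
    results = []
    while True:
        row = queues[-1].get()
        if row is _DONE:
            break
        results.append(row)
    for thread in threads:
        thread.join()
    return results


doubled_and_filtered = run_pipeline(
    [1, 2, 3, 4],
    [lambda x: x * 2, lambda x: x if x > 2 else None],
)
```

Because every step has exactly one thread and the queues are first-in, first-out, row order is preserved, and a downstream step can start working on early rows while upstream steps are still producing later ones.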