With Pentaho, managing the enormous volumes and increasing variety and velocity of data entering organizations is greatly simplified, regardless of data type or number of data sources. Pentaho’s complete data integration platform delivers “analytics-ready” data to end users up to 15x faster, with visual tools that reduce time and complexity. Instead of hand-coding SQL or writing MapReduce functions in Java, organizations can use an easy-to-use graphical designer to gain value from their data immediately, including data from multiple sources such as Hadoop, NoSQL and relational data stores.

With Pentaho’s Data Integration Platform, organizations can:
- Turn big data into actionable analytics: Pentaho makes it easy to access, explore and organize all data sources, including Hadoop, NoSQL and relational databases, to produce consumable and actionable analytics.
- Deliver data to a wide variety of applications: Pentaho’s out-of-the-box data standardization, enrichment and quality capabilities deliver information to third-party applications in the shape and form best suited to those applications.
- Integrate big data sources with existing enterprise data: With broad connectivity to any data type and a high-performance in-Hadoop execution option, Pentaho makes it easier and faster to integrate existing databases with new sources of data.
Simple Visual Designer for the Fastest Path to Big Data Value
Pentaho Data Integration's intuitive, rich graphical designer lets you accomplish what the most skilled developers can, in a fraction of the time and without writing code by hand.
Pentaho Data Integration's graphical designer includes:
- Intuitive drag-and-drop designer
- Rich library of pre-built components
- Powerful data transformation mappings
- Dynamic transformations, using variables to determine field mappings, validation and enrichment rules
- Integrated debugger for testing and tuning job execution
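The idea behind dynamic transformations, where variables decide which source field feeds each target field, can be illustrated with a small Java sketch. It is not Pentaho's API: the class `VariableMappingSketch`, its methods, the `${NAME}` placeholder style and the field names are all hypothetical, chosen only to show variables being resolved at run time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not the Pentaho Data Integration API.
final class VariableMappingSketch {
    // Placeholders look like ${NAME}; this syntax is an assumption for illustration.
    private static final Pattern VAR = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    // Replace every ${NAME} in the template with its value; fail fast on unknown names.
    static String resolve(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined variable: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Build an output row: each target field reads the source field named by its resolved template.
    static Map<String, Object> mapRow(Map<String, Object> row,
                                      Map<String, String> mapping,
                                      Map<String, String> vars) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            out.put(e.getKey(), row.get(resolve(e.getValue(), vars)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Changing SRC (e.g. to "erp") re-points the mapping without editing the mapping itself.
        Map<String, String> vars = Map.of("SRC", "crm");
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("customer_id", "${SRC}_cust_id");
        mapping.put("email", "${SRC}_email");
        Map<String, Object> row = Map.of("crm_cust_id", 42, "crm_email", "a@example.com");
        System.out.println(mapRow(row, mapping, vars)); // {customer_id=42, email=a@example.com}
    }
}
```

Keeping the mapping as templates and the variable values separate is the design point: the same mapping definition can be reused against differently named sources by changing only the variables.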
Visual Designer for Data Preparation, Transformation and Orchestration
Big Data Integration and High-Volume Data Processing
Pentaho makes it easier and faster to integrate with Hadoop, NoSQL and high-performance analytic databases. Pentaho’s intuitive graphical designer provides:
- Native connectivity to leading Hadoop, NoSQL and analytic databases
- Visual designer for MapReduce jobs to reduce development cycles by as much as 15x
- Data preparation, modeling and exploration of unstructured data sets
Faster Design with Visual MapReduce; Faster Execution with In-Hadoop Deployment
Pentaho’s powerful data integration engine provides:
- Multi-threaded engine for fast execution
- Cluster support, enabling distributed processing of jobs across multiple nodes
- Unique in-Hadoop execution for extremely fast performance
Broad Connectivity and Data Delivery
Pentaho Data Integration offers broad connectivity to diverse data, including popular structured, semi-structured and unstructured data sources. Some examples include:
- Standard relational databases (e.g. Oracle, DB2, MySQL, SQL Server)
- Hadoop (e.g. Apache Hadoop, Cloudera, Hortonworks, MapR)
- NoSQL databases (e.g. MongoDB, Cassandra, HBase)
- Analytic databases (e.g. Vertica, Greenplum, Teradata)
- Packaged enterprise applications (e.g. SAP)
- Cloud-based and SaaS applications (e.g. Salesforce, Amazon Web Services)
- Files (e.g. XML, Excel, flat file) and web service APIs
To increase the performance of data extraction, loading and delivery processes, Pentaho offers the following capabilities:
- Native connectivity and bulk loading for the most common data sources
- Data delivery in a multi-dimensional format for analytic applications
- Data delivery through real-time data services for operational third-party applications
Teamwork and Collaboration for Developers
Pentaho Data Integration is built on a centralized repository where all stakeholders in a data integration project can share and collaborate on developing data flows. Pentaho provides:
- Shared repository for collaboration among data analysts, job developers and data stewards
- Content management, versioning and locking, so jobs can be versioned and rolled back to prior versions
Powerful Administration and Management
Pentaho Data Integration provides out-of-the-box capabilities for managing the operations side of data integration projects. These capabilities include:
- Managing security privileges for users and roles
- Integrating with existing security definitions in LDAP and Active Directory
- Setting permissions to control user actions (e.g. read, execute or create)
- Scheduling data integration flows
- Monitoring and analyzing the performance of data integration processes
Data Profiling and Data Quality
Pentaho provides basic data profiling capabilities such as row counts, mathematical functions and identification of null values. Pentaho also provides data quality operators such as string manipulators, mapping functions, filtering and sorting. For name and address verification capabilities, Pentaho integrates with leading data quality vendors, such as Human Inference and Melissa Data.
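The basic profiling described here (row counts, simple mathematical functions, null identification) can be sketched in plain Java. This is a hypothetical illustration, not Pentaho's profiling code: `ProfileSketch`, `ColumnStats` and the sample columns are invented for the example, and rows are assumed to arrive as `Object[]` arrays aligned with a column list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of basic column profiling; not Pentaho's implementation.
final class ProfileSketch {
    // Statistics gathered for one column.
    static final class ColumnStats {
        long nulls;
        long numericCount;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;

        double mean() {
            return numericCount == 0 ? Double.NaN : sum / numericCount;
        }

        @Override
        public String toString() {
            return "nulls=" + nulls + (numericCount == 0 ? ""
                    : ", min=" + min + ", max=" + max + ", mean=" + mean());
        }
    }

    // One pass over the rows: count nulls and track min/max/sum of numeric values per column.
    static Map<String, ColumnStats> profile(String[] columns, List<Object[]> rows) {
        Map<String, ColumnStats> stats = new LinkedHashMap<>();
        for (String c : columns) {
            stats.put(c, new ColumnStats());
        }
        for (Object[] row : rows) {
            for (int i = 0; i < columns.length; i++) {
                ColumnStats s = stats.get(columns[i]);
                Object v = row[i];
                if (v == null) {
                    s.nulls++;
                } else if (v instanceof Number) {
                    double d = ((Number) v).doubleValue();
                    s.numericCount++;
                    s.sum += d;
                    s.min = Math.min(s.min, d);
                    s.max = Math.max(s.max, d);
                }
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        String[] columns = {"id", "amount", "email"};
        List<Object[]> rows = List.of(
                new Object[] {1, 10.0, "a@example.com"},
                new Object[] {2, null, "b@example.com"},
                new Object[] {3, 30.0, null});
        System.out.println("rows=" + rows.size());
        profile(columns, rows).forEach((col, s) -> System.out.println(col + ": " + s));
    }
}
```

A single pass keeps memory constant per column, which matters when profiling large inputs; non-numeric columns simply report their null count.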
Pentaho data profiling and data quality capabilities help data stewards:
- Identify data that fails to comply with business rules and standards
- De-duplicate and cleanse inconsistent and redundant data
- Validate, standardize and correct name, address, email and telephone data
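The three data-steward tasks above can be illustrated with a minimal Java sketch under stated assumptions: records are simple name/email/phone contacts, the business rule is a deliberately loose email pattern, and duplicates are records that share a standardized email address. `CleanseSketch` and `Contact` are hypothetical names; the sketch does not reproduce Pentaho's data quality operators or the vendor verification integrations mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical cleansing sketch; real name/address verification would use a vendor service.
final class CleanseSketch {
    static final class Contact {
        final String name;
        final String email;
        final String phone;

        Contact(String name, String email, String phone) {
            this.name = name;
            this.email = email;
            this.phone = phone;
        }

        @Override
        public String toString() {
            return name + " <" + email + "> " + phone;
        }
    }

    // Deliberately simple business rule: something@something.tld with no spaces.
    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    // Standardize: collapse whitespace in names, lowercase emails, keep only digits in phones.
    static Contact standardize(Contact c) {
        String name = c.name == null ? "" : c.name.trim().replaceAll("\\s+", " ");
        String email = c.email == null ? "" : c.email.trim().toLowerCase(Locale.ROOT);
        String phone = c.phone == null ? "" : c.phone.replaceAll("\\D", "");
        return new Contact(name, email, phone);
    }

    // Reject records that break the email rule, then keep the first record per email.
    static List<Contact> cleanse(List<Contact> input, List<Contact> rejected) {
        Map<String, Contact> byEmail = new LinkedHashMap<>();
        for (Contact raw : input) {
            Contact c = standardize(raw);
            if (!EMAIL.matcher(c.email).matches()) {
                rejected.add(c);
                continue;
            }
            byEmail.putIfAbsent(c.email, c);
        }
        return new ArrayList<>(byEmail.values());
    }

    public static void main(String[] args) {
        List<Contact> input = List.of(
                new Contact("  Ada   Lovelace ", "Ada@Example.com ", "(555) 010-1234"),
                new Contact("Ada Lovelace", "ada@example.com", "555.010.1234"),
                new Contact("Bob", "not-an-email", "555 0100"));
        List<Contact> rejected = new ArrayList<>();
        System.out.println("kept: " + cleanse(input, rejected)); // kept: [Ada Lovelace <ada@example.com> 5550101234]
        System.out.println("rejected: " + rejected); // rejected: [Bob <not-an-email> 5550100]
    }
}
```

Standardizing before deduplicating is what lets the two differently formatted Ada records collapse into one; keying on the raw values would have kept both.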
Why Pentaho Data Integration?
- Power of big data orchestration and integration: Integration of all data (Hadoop, NoSQL and relational) in one platform; In-Hadoop and clustered execution of data processing for maximum scalability
- Ease of use: Simple setup; Intuitive graphical designer; No extra code generation; Over 100 out-of-the-box mapping objects, including a visual MapReduce designer for Hadoop
- Modern and extensible: 100% Java for cross-platform deployment; Pluggable architecture for adding connectors, transformations and user-defined expressions
- High value, low cost: No upfront fees; No maintenance fees; No developer/user license fees