Pentaho Data Mining
Explore and Learn
Once you've got analysis, reporting, and dashboards deployed, it's time to take your business intelligence (BI) to the next level by adding data mining and advanced analytics. Many organizations never reach this level of BI excellence. However, the importance of pushing ahead with advanced capabilities should not be underestimated: they can provide a sustainable competitive advantage and help your organization maximize both its efficiency and effectiveness.

Data Mining is the process of running data through sophisticated algorithms to uncover meaningful patterns and correlations that may otherwise be hidden. These can be used to help you understand the business better and also exploited to improve future performance through predictive analytics. For example, data mining can warn you there’s a high probability a specific customer won’t pay on time based on an analysis of customers with similar characteristics.

To help you fully utilize data mining for organizational advantage, the Pentaho BI Project team has worked in conjunction with the development and business communities to integrate mainstream BI capabilities with advanced data mining. Pentaho Data Mining is differentiated by its open, standards-compliant nature, use of Weka data mining technology, and tight integration with core business intelligence capabilities including reporting, analysis and dashboards. Other data mining offerings lack this level of sophistication and integration.

In this document we cover the business benefits of integrating data mining into your business intelligence deployment, together with the hows and whys of data mining, to give you a solid understanding of the topic.

Pentaho Data Mining can be deployed as:

  • An out-of-the-box solution for immediate deployment to analysts. As far as end-users are concerned, data mining operates entirely in the background: users see results and recommendations through e-mail or web pages, which can include Pentaho Dashboards.
  • A set of components that enable Java™ developers to quickly create custom reporting solutions using Java objects or JavaServer Pages (JSPs). These can be tightly integrated with other applications or portals.
  • An integrated part of the overall Pentaho BI Suite, alongside its other components

Features and Benefits

Provides insight into hidden patterns and relationships in your data

  • A classic example of data mining is a retailer who uncovers a relationship between sales of diapers and beer on Sunday afternoons – two items you wouldn’t normally consider as linked. The explanation is that husbands who are sent out to pick up a fresh supply of diapers are also likely to pick up some beer while they happen to be in the store – something that hadn’t been recognized as a significant sales driver before data mining uncovered it.
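
To make the diapers-and-beer relationship concrete, the sketch below computes the three measures commonly used to score such an association: support, confidence, and lift. It is a minimal illustration in plain Java, not the Pentaho or Weka API; the class name, method names, and basket data are all made up for this example.

```java
import java.util.List;
import java.util.Set;

// Illustrative only: plain Java, not the Pentaho or Weka API.
final class MarketBasket {

    // Fraction of baskets that contain every item in `items`.
    static double support(List<Set<String>> baskets, Set<String> items) {
        if (baskets.isEmpty()) {
            return 0.0;
        }
        long hits = baskets.stream().filter(b -> b.containsAll(items)).count();
        return (double) hits / baskets.size();
    }

    // Of the baskets containing `antecedent`, the fraction that also contain
    // `consequent`. The two items must be different.
    static double confidence(List<Set<String>> baskets, String antecedent, String consequent) {
        double base = support(baskets, Set.of(antecedent));
        return base == 0.0 ? 0.0 : support(baskets, Set.of(antecedent, consequent)) / base;
    }

    // Confidence divided by the consequent's overall support. A value above 1
    // means the items appear together more often than independence would predict.
    static double lift(List<Set<String>> baskets, String antecedent, String consequent) {
        double base = support(baskets, Set.of(consequent));
        return base == 0.0 ? 0.0 : confidence(baskets, antecedent, consequent) / base;
    }

    public static void main(String[] args) {
        // Made-up shopping baskets for illustration.
        List<Set<String>> baskets = List.of(
            Set.of("diapers", "beer"),
            Set.of("diapers", "beer", "chips"),
            Set.of("diapers", "milk"),
            Set.of("milk", "bread"),
            Set.of("beer"),
            Set.of("bread"),
            Set.of("milk"),
            Set.of("chips"));
        System.out.printf("support=%.2f confidence=%.2f lift=%.2f%n",
            support(baskets, Set.of("diapers", "beer")),
            confidence(baskets, "diapers", "beer"),
            lift(baskets, "diapers", "beer"));
    }
}
```

With these made-up baskets the lift comes out above 1, which is the kind of signal that would flag the pair as worth a closer look.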

Enables you to exploit these correlations to improve organizational performance

  • Continuing the example above, retailers very often act on the relationships they discover by using tactics such as placing linked items together on end-of-aisle displays to spur additional purchases. All organizations can benefit from acting in a similar way, using newly discovered patterns and correlations as the basis for taking action to improve their efficiency and effectiveness.

Provides indicators of future performance

  • “Those who cannot remember the past are condemned to repeat it” is a famous quote from philosopher George Santayana. In data mining, being able to predict outcomes based on historic data can dramatically improve the quality of decisions made in the present. As a simple example, if the best indicator of whether a customer will pay on time turns out to be a combination of their market segment and whether they have paid previous bills on time, then you can use this information when making current credit decisions.
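
As a rough sketch of the credit example above, the code below groups historic invoices by market segment and prior payment behavior, then reports the observed late-payment rate for each group. It is a simple frequency count in plain Java rather than a real data mining algorithm, and the `Invoice` class, its fields, and the sample data are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a frequency count, not the Pentaho or Weka API.
final class LatePaymentRates {

    // Hypothetical record of one past invoice; the field names are assumptions.
    static final class Invoice {
        final String segment;
        final boolean paidPreviousOnTime;
        final boolean paidLate;

        Invoice(String segment, boolean paidPreviousOnTime, boolean paidLate) {
            this.segment = segment;
            this.paidPreviousOnTime = paidPreviousOnTime;
            this.paidLate = paidLate;
        }
    }

    // Label for one (segment, payment history) group.
    static String key(String segment, boolean paidPreviousOnTime) {
        return segment + "|" + (paidPreviousOnTime ? "previously on time" : "previously late");
    }

    // Observed share of late payments in each group.
    static Map<String, Double> lateRates(List<Invoice> history) {
        Map<String, int[]> counts = new HashMap<>(); // value = {late, total}
        for (Invoice inv : history) {
            int[] c = counts.computeIfAbsent(
                key(inv.segment, inv.paidPreviousOnTime), k -> new int[2]);
            if (inv.paidLate) {
                c[0]++;
            }
            c[1]++;
        }
        Map<String, Double> rates = new HashMap<>();
        counts.forEach((group, c) -> rates.put(group, (double) c[0] / c[1]));
        return rates;
    }

    public static void main(String[] args) {
        // Made-up history for illustration.
        List<Invoice> history = List.of(
            new Invoice("retail", false, true),
            new Invoice("retail", false, true),
            new Invoice("retail", true, false),
            new Invoice("wholesale", true, false),
            new Invoice("wholesale", false, false));
        lateRates(history).forEach((group, rate) ->
            System.out.printf("%s -> %.0f%% late%n", group, rate * 100));
    }
}
```

A real model would also need enough history in each group for the rate to be meaningful; this sketch does no such check.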

Enables embedding of recommendations in your applications

  • You can use the data mining results to display a simple summary statement and recommendations within operational applications. For example, on a credit screen you could add: “Based on this new account profile there is an 85% chance this customer will pay late. It is therefore recommended you require a 50% prepayment on this order.” Reporting on aggregate results such as Days Sales Outstanding (DSO) lets you measure business improvement by comparing cases where recommendations were followed with cases where they weren’t, so that you can fine-tune your model and recommendations over time for optimal effect.
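
One way an application might turn a predicted probability into the kind of on-screen statement described above is sketched here. The class, method, threshold, and message wording are assumptions made for illustration; the probability itself would come from whatever model you have deployed.

```java
import java.util.Locale;

// Illustrative only: turns a model's probability into a short recommendation.
final class CreditRecommendation {

    // lateProbability: model output between 0 and 1.
    // threshold: probability at or above which a prepayment is recommended.
    // prepaymentPercent: prepayment to request when the threshold is reached.
    static String recommend(double lateProbability, double threshold, int prepaymentPercent) {
        if (!(lateProbability >= 0.0 && lateProbability <= 1.0)) {
            throw new IllegalArgumentException("probability must be between 0 and 1");
        }
        long percent = Math.round(lateProbability * 100);
        String summary = String.format(Locale.ROOT,
            "Estimated chance this customer will pay late: %d%%.", percent);
        if (lateProbability < threshold) {
            return summary + " No prepayment is recommended.";
        }
        return summary + String.format(Locale.ROOT,
            " Recommended: require a %d%% prepayment on this order.", prepaymentPercent);
    }

    public static void main(String[] args) {
        System.out.println(recommend(0.85, 0.5, 50));
    }
}
```

Keeping the threshold and prepayment level as parameters makes it easier to adjust them as DSO reporting shows which recommendations are working.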

Enables you to take full advantage of a range of data mining algorithms

  • No algorithm is likely to be optimal in all situations. For this reason it’s important that you’re able to try out a range to find the algorithm that fits a particular set of data the best.
  • If you find several data mining algorithms that fit well, you can use all of them. For example: “Based on analysis of 3 predictive models, the chances this customer will pay late are: Model A: 95% (96% correct), Model B: 89% (92% correct), Model C: 76% (97% correct)”.
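
The multi-model statement above could be assembled along the following lines. The `Prediction` class and its fields are hypothetical. The accuracy-weighted average is one possible way to condense several models into a single figure; it is an assumption added here and is not described in the text.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Illustrative only: reports and combines the outputs of several models.
final class ModelComparison {

    // Hypothetical holder for one model's prediction and its measured accuracy.
    static final class Prediction {
        final String model;
        final double lateProbability; // between 0 and 1
        final double accuracy;        // share of past cases predicted correctly

        Prediction(String model, double lateProbability, double accuracy) {
            this.model = model;
            this.lateProbability = lateProbability;
            this.accuracy = accuracy;
        }
    }

    // Lists every model's prediction with its accuracy, separated by commas.
    static String summarize(List<Prediction> predictions) {
        return predictions.stream()
            .map(p -> String.format(Locale.ROOT, "%s: %d%% (%d%% correct)",
                p.model,
                Math.round(p.lateProbability * 100),
                Math.round(p.accuracy * 100)))
            .collect(Collectors.joining(", "));
    }

    // Assumption: weights each prediction by the model's accuracy.
    static double weightedAverage(List<Prediction> predictions) {
        double weighted = 0.0;
        double weights = 0.0;
        for (Prediction p : predictions) {
            weighted += p.lateProbability * p.accuracy;
            weights += p.accuracy;
        }
        if (weights == 0.0) {
            throw new IllegalArgumentException("no usable predictions");
        }
        return weighted / weights;
    }

    public static void main(String[] args) {
        List<Prediction> predictions = List.of(
            new Prediction("Model A", 0.95, 0.96),
            new Prediction("Model B", 0.89, 0.92),
            new Prediction("Model C", 0.76, 0.97));
        System.out.println(summarize(predictions));
        System.out.printf("Accuracy-weighted average: %.1f%%%n",
            weightedAverage(predictions) * 100);
    }
}
```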

Technology

Powerful Data Mining Engine

  • Provides a comprehensive set of machine learning algorithms from the Weka project including clustering, segmentation, decision trees, random forests, neural networks, and principal component analysis.
  • Pentaho has added integration with Pentaho Data Integration and automated the process of transforming data into the format the data mining engine needs.
  • Algorithms can either be applied directly to a dataset or called from Java code.
  • Output can be viewed graphically, interacted with programmatically, or used as a data source for reports, further analysis, and other processes.
  • Filters are provided for discretization, normalization, re-sampling, attribute selection, and transforming and combining attributes.
  • Classifiers provide models for predicting nominal or numeric quantities. Learning schemes include decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, and other advanced techniques.
  • The data mining engine is also well-suited for developing new machine learning schemes, enabling customers to incorporate their own models.
  • Inputs and outputs can be controlled programmatically, enabling developers to create completely custom solutions using the components provided.
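
To give a feel for what the normalization and discretization filters listed above do to a numeric attribute, here is a minimal plain-Java sketch of min-max normalization and equal-width binning. It does not use the filters shipped with the data mining engine; the class and method names are invented for this illustration.

```java
import java.util.Arrays;

// Illustrative only: hand-written versions of two common preprocessing steps.
final class SimpleFilters {

    // Min-max normalization: rescales values into the range [0, 1].
    // If all values are equal, every result is 0.
    static double[] normalize(double[] values) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    // Equal-width discretization: maps each value to a bin index from 0 to bins - 1.
    static int[] discretize(double[] values, int bins) {
        if (bins < 1) {
            throw new IllegalArgumentException("bins must be at least 1");
        }
        double[] normalized = normalize(values);
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // The maximum value would land in bin `bins`, so clamp it to the last bin.
            out[i] = Math.min(bins - 1, (int) (normalized[i] * bins));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] orderValues = {10, 20, 30}; // made-up attribute values
        System.out.println(Arrays.toString(normalize(orderValues)));
        System.out.println(Arrays.toString(discretize(orderValues, 3)));
    }
}
```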

Graphical Design Tools

  • Graphical user interfaces are provided for data pre-processing, classification, regression, clustering, association rules, and visualization.

Pentaho Data Mining Enterprise Edition
Pentaho Data Mining Enterprise Edition extends Pentaho’s best-in-class open source business intelligence (BI) capabilities with additional software and services designed to help you and your organization:
  • Achieve BI success
  • Save time, resources, and money
  • Mitigate risk
For more information on the features and benefits of Pentaho Data Mining Enterprise Edition, please see the Pentaho Data Mining data sheet.

 
© 2008, Pentaho Corporation