A Toolkit to Tackle Big Data Challenges
Data preparation often consumes 60-80% of the total time spent on big data analytics projects. Reducing that time frees valuable expertise for analyzing data and applying advanced algorithms to it. The Pentaho Data Science Pack is designed to simplify the preparation, cleansing, and orchestration of analytic data sets.
By operationalizing two commonly used technologies, R and Weka, Pentaho offloads the burden of the data flow process. Leveraging familiar tools and common predictive models results in a broader view of customer behavior.
Data Science Pack Plugins
The pack offers practical advanced analytics capabilities to tackle big data integration and blending challenges. It includes:
- R Script Executor for PDI: This step allows an R script to be run as part of a Pentaho Data Integration (PDI) transformation, removing the burden of data preparation.
- Weka Scoring for PDI: This tool allows the user to “score” data as part of a PDI transformation by applying classification, clustering, and regression models constructed in Weka.
- Weka Forecasting for PDI: This tool uses forecasting models created in Weka’s time series analysis and forecasting environment to generate future predictions from incoming data within a PDI transformation.
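
To make the idea of "scoring" concrete, the following is a minimal conceptual sketch in Python. It does not use the Weka or PDI APIs. The linear model, its weights, the field names (visits, basket_size, predicted), and the helper functions are all hypothetical, chosen only to illustrate how a model built ahead of time can be applied to each incoming row, with the prediction appended as a new field.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, float]


def make_linear_model(weights: Dict[str, float], intercept: float) -> Callable[[Row], float]:
    """Return a scoring function for a hypothetical pre-trained linear model."""
    def predict(row: Row) -> float:
        # Weighted sum of the input fields plus the intercept.
        return intercept + sum(w * row[name] for name, w in weights.items())
    return predict


def score_rows(rows: Iterable[Row], model: Callable[[Row], float],
               output_field: str = "predicted") -> Iterator[Row]:
    """Yield a copy of each incoming row with the model's prediction appended."""
    for row in rows:
        scored = dict(row)  # copy, so the incoming row is left unchanged
        scored[output_field] = model(row)
        yield scored


# Hypothetical model and data, for illustration only.
model = make_linear_model({"visits": 2.0, "basket_size": 0.5}, intercept=1.0)
incoming = [
    {"visits": 3.0, "basket_size": 4.0},
    {"visits": 0.0, "basket_size": 10.0},
]
scored = list(score_rows(incoming, model))
```

In this sketch the model is built once and reused for every row, which mirrors the separation described above: the model is constructed elsewhere, and the transformation only applies it to the data flowing through.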