Streamlining the Machine Learning Workflow
Pentaho’s machine learning orchestration streamlines the process of building and deploying advanced analytics models. Most enterprises struggle to put models to work because data professionals often operate in silos and the workflow, from data preparation to model updates, creates bottlenecks.
Pentaho’s platform enables collaboration and removes bottlenecks in four key areas:
1. Prepare Data and Engineer New Features
Pentaho helps data scientists and engineers easily prepare and blend traditional sources like ERP and EAM with big data sources like sensors and social media. Pentaho also accelerates the notoriously difficult and costly task of feature engineering by automating data onboarding, data transformation and data validation in an easy-to-use drag-and-drop environment.
2. Train, Tune, and Test Models
Data scientists often apply trial and error to strike the right balance of complexity, performance and accuracy in their models. With integrations for languages like R and Python, and for machine learning packages like Spark MLlib and Weka, Pentaho allows data scientists to seamlessly train, tune, build and test models faster.
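The trial-and-error tuning described above can be illustrated with a minimal sketch in plain Python. It is not Pentaho's API: the synthetic data, the tiny nearest-neighbor model and the helper names (make_data, knn_predict, tune_k) are all assumptions made for illustration. The loop tries several values of one setting and keeps whichever scores best on held-out validation data.

```python
import random


def make_data(n, rng):
    """Synthetic 1-D points: label is 1 when x > 0.5, with about 10% of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < 0.1:
            label = 1 - label  # inject label noise
        data.append((x, label))
    return data


def knn_predict(train, x, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, holdout, k):
    """Fraction of held-out points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in holdout)
    return hits / len(holdout)


def tune_k(train, validation, candidates):
    """Score every candidate k on the validation set and return the best one."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    # Ties go to the smaller k, which is an arbitrary choice for this sketch.
    best_k = max(scores, key=lambda k: (scores[k], -k))
    return best_k, scores


if __name__ == "__main__":
    rng = random.Random(0)
    train = make_data(200, rng)
    validation = make_data(100, rng)
    test = make_data(100, rng)
    best_k, scores = tune_k(train, validation, [1, 3, 5, 7, 9])
    print("validation scores:", scores)
    print("chosen k:", best_k, "test accuracy:", accuracy(train, test, best_k))
```

Keeping a separate test set, untouched during tuning, gives a fairer estimate of how the chosen setting performs on new data.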
3. Deploy and Operationalize Models
A fully trained, tuned and tested machine learning model still needs to be deployed. Pentaho allows data professionals to easily embed models developed by data scientists directly in a data workflow. They can reuse existing data preparation and feature engineering work, significantly reducing time-to-deployment. With embeddable APIs, organizations can also include the full power of Pentaho within existing applications.
4. Update Models Regularly
With Pentaho, data engineers and scientists can re-train existing models with new data sets or make feature updates using custom execution steps for R, Python, Spark MLlib and Weka. Pre-built workflows can automatically update models and archive existing ones.
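As a rough illustration of the update-and-archive pattern, the sketch below retrains a model on new data and moves the previous version into an archive folder. It uses only the Python standard library and does not represent a Pentaho workflow or API: the file layout (model.json, archive/), the stand-in "model" (a simple mean) and the function names are hypothetical.

```python
import json
import shutil
import tempfile
import time
from pathlib import Path


def train_mean_model(rows):
    """Stand-in for real training: the 'model' is just the mean of the values."""
    return {"mean": sum(rows) / len(rows), "n_rows": len(rows)}


def update_model(model_dir, rows, train_fn=train_mean_model):
    """Archive the current model file (if any), then train and save a new one."""
    model_dir = Path(model_dir)
    archive_dir = model_dir / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)

    current = model_dir / "model.json"
    if current.exists():
        # Timestamp plus a running count keeps archived file names unique.
        stamp = time.strftime("%Y%m%dT%H%M%S")
        count = len(list(archive_dir.iterdir()))
        shutil.move(str(current), str(archive_dir / f"model-{stamp}-{count}.json"))

    model = train_fn(rows)
    current.write_text(json.dumps(model))
    return model


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        update_model(workdir, [1.0, 2.0, 3.0])      # initial model
        latest = update_model(workdir, [10.0, 20.0])  # retrain on new data
        print("current model:", latest)
        print("archived files:", [p.name for p in (Path(workdir) / "archive").iterdir()])
```

Archiving instead of overwriting keeps earlier versions available for comparison or rollback if a retrained model performs worse.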