Training - Course Description

Course:
Pentaho Data Integration I

Course Number: PDI2000
Audience: This course is intended for technical users who integrate disparate data sources (including big data sources), build/maintain data models for analysis, and manage BI data/metadata, including: Database Developers, Power Users, Technical Business Analysts, BI Solution Architects, Systems Integrators, and Data Scientists
Level: Introductory – This course is intended for students with database development or administration experience who are new to Pentaho Data Integration
Delivery Method: Public classroom, instructor-led online, private on-site (please contact us for on-site pricing)
Duration: 4 Days
Public Training Cost: USD $2,600 (4 credits)
Course Placement: This course is the first course in the Database Developer path. Students with prior database development or administration experience who are new to Pentaho Data Integration should take this course.

Course Schedule:

In the schedule below, click the course title for detailed information on the class, click the provider link for information on the authorized training provider, or click the Register Now link to enroll.

Location Language Provider Date/Time Availability
Geneva, Switzerland English Register Now
Tampa, FL English Register Now
Munich, Germany German Register Now
Washington, DC English Register Now
Chicago, IL English Register Now
Online English Register Now
Birmingham, UK English Register Now
Milano, Italy Italian Register Now
Washington, DC English Register Now
Online English Register Now
London, UK English Register Now
Zurich, Switzerland English Register Now
Cologne, Germany German Register Now
Paris, France French Register Now
Massa, Italy Italian Register Now
Madrid, Spain Spanish Register Now
Online English Register Now
Birmingham, UK English Register Now

Note: All courses provided by Pentaho-EMEA are offered in the Central European time zone. All US courses (provided by Pentaho) are offered in the Eastern time zone. Beginning in July 2013, all US courses will be offered in the Central time zone.

Course Overview:

As data grows in volume, variety, and velocity, organizations need fast and easy ways to harness it and gain insight from it. However, one of the biggest challenges facing IT organizations today is providing a consistent, single version of the truth across all sources of information in an analytics-ready format. With powerful extract, transform, and load (ETL) capabilities, an intuitive and rich graphical design environment, and an open, standards-based architecture, Pentaho Data Integration is increasingly chosen over proprietary and homegrown data integration tools.

Pentaho Data Integration provides a full ETL solution, including:

  • Rich graphical designer to empower ETL developers
  • Broad connectivity to any type of data, including diverse and big data
  • Enterprise scalability and performance, including in-memory caching
  • Big data integration, analytics and reporting, including Hadoop, NoSQL, traditional OLTP & analytic databases
  • Modern, open, standards-based architecture

Through a series of lectures and hands-on exercises covering theory, best practices, and design patterns, Pentaho Data Integration for Database Developers gives students the skills they need to maximize the value of data to the organization. This course helps prepare you for the Pentaho Data Integration Certification Exam.

Course Benefits:

  • Improve productivity by giving your data integration team the skills they need to succeed with Pentaho Data Integration
  • Gain real value from data immediately, including data from multiple sources such as Hadoop, NoSQL, and relational data stores, by using an easy-to-use graphical designer instead of coding in SQL or writing MapReduce functions in Java
  • Learn to deliver data to a wide variety of applications using Pentaho's out-of-the-box data standardization, enrichment and quality capabilities
  • Interactive, hands-on training materials significantly improve skill development and maximize retention

Skills Achieved:

At the completion of this course, you should be able to:

  • Install Pentaho Data Integration
  • Create, preview, and run basic transformations containing steps and hops
  • View transformation results in the Step Metrics view and the Log view
  • Create a database connection and use Database Explorer to interact with a data source
  • Create more complex transformations that involve configuring the following steps: Table input, Table output, Text file output, CSV file input, Insert/Update, Add constants, Filter, Value Mapper, Stream lookup, Join rows, Merge join, Sort rows, Row normalizer, JavaScript, Dimension lookup/update, Database Lookup, Get Data from XML, Set Environment Variables, and Analytic query
  • Create transformations that use parameterized values
  • Map the structure of an online transaction processing database to the structure of an online analytical processing database
  • Load data from and write data to different data sources
  • Use ETL design patterns to populate a data warehouse
  • Create a transformation that handles slowly changing dimensions
  • Create Pentaho Data Integration jobs that: run multiple transformations, use variables, contain sub-jobs, provide built-in error notification, load and process multiple text files, and convert files into Microsoft Excel format
  • Configure logging for transformation steps and for job entries and examine the logged data
  • Configure error handling for transformation steps
  • Configure the Pentaho Enterprise Repository, including basic security
  • Use the Pentaho Enterprise Repository to: create folders; store transformations and jobs; move, lock, revise, delete, and restore artifacts
  • Schedule and monitor the execution of a transformation in Pentaho Data Integration and in the Pentaho Enterprise Console
  • Create and drop indexes using a transformation
  • Create a transformation that contains steps configured to run in a cluster, run the transformation in the cluster, examine the results, and monitor the transformation
  • Create a transformation that uses a partition schema to partition data to slave servers in the cluster

Course Requirements:

Students attending classroom courses in the United States are provided with a PC to use during class. Students attending courses outside the US should contact the Authorized Training Provider regarding PC requirements for Pentaho courses.

You can verify your system against the Compatibility Matrix: List of Supported Products topic in the Pentaho InfoCenter. In general, if your training provider requires you to bring a PC to class, it must meet the following requirements:

  • Windows XP or Windows 7 desktop operating system (for Macintosh support, please contact your Customer Success Manager)
  • RAM: at least 4GB
  • Hard drive space: at least 2GB for the software, and more for solution and content files
  • Processor: dual-core AMD64 or Intel EM64T
  • DVD drive

Online courses require a broadband Internet connection, the use of a modern Web browser (such as Microsoft Internet Explorer or Mozilla Firefox), and the ability to connect to the WebEx Training Center. For more information on WebEx Training Center requirements, see http://www.webex.com. Online courses use Pentaho’s cloud-based exercise environment. Students are provided access to a virtual machine used to complete the exercises.

For online courses, students are provided with a secured, electronic course manual. Printed manuals are not provided for online courses. When an electronic manual is provided, students are encouraged to print the exercise book before class begins, though this is not required.

Students attending this course on-site should contact their Customer Success Manager for hardware and software requirements. You can also email us at training@pentaho.com for more information regarding on-site training requirements.

Course Agenda:

Day 1

Course Introduction

Module 1: Pentaho Data Integration Overview

Exercise 1: Introducing Pentaho Data Integration

Module 2: Inputs and Outputs

Module 3: Introduction to the Training Data

Exercise 2: Inputs and Outputs

Module 4: Data Warehouse Steps

Exercise 3: Data Warehouse Steps

Day 2

Module 5: Lookups

Module 6: Field Transformations, Part 1

Exercise 4: Lookups and Field Transformations

Module 7: Set Transformations

Exercise 5: Set Transformations

Module 8: Pivot Transformations

Exercise 6: Pivot Transformations

Module 9: Field Transformations, Part 2

Module 10: Loading the Time Dimension and the Fact Table

Exercise 7: Loading a Fact Table

Day 3

Module 11: Introduction to Jobs

Exercise 8: Creating a Job

Module 12: Advanced Job Concepts

Exercise 9: Advanced Job Concepts

Module 13: Common Scripting Uses

Exercise 10: Using JavaScript

Module 14: Dynamic Transformations

Module 15: Using XML in Pentaho Data Integration

Exercise 11: Using XML

Module 16: Portable Transformations and Jobs

Exercise 12: Portable Transformations and Jobs

Day 4

Module 17: Logging

Exercise 13: Configuring Logging

Module 18: Error Handling in Transformations

Exercise 14: Error Handling in Transformations

Module 19: ETL Patterns

Exercise 15: Calculating Time Between Orders

(Optional) Module 20: Pentaho Enterprise Repository

(Optional) Exercise 16: Pentaho Enterprise Repository

Module 21: Scheduling and Monitoring

Exercise 17: Scheduling and Monitoring

Module 22: Pre- and Post-Processing

Exercise 18: Constraint and Index Management

(Optional) Module 23: Tuning and Administration Topics

Module 24: Interpreting Runtime Data

Module 25: Clustering and Partitioning

Exercise 19: Clustering and Partitioning

(Optional) Module 26: Operational Patterns