Big Data Fundamentals
With growing volumes and varieties of data flowing at increasing speed, organizations need a fast and easy way to harness and gain insight from their big data sources. Pentaho accelerates the realization of value from big data with the most complete solution for big data analytics.
Pentaho provides the right set of tools to each user, all within a tightly coupled data integration and analytics platform that supports the entire big data lifecycle. For IT and developers, Pentaho provides a complete, visual design environment to simplify and accelerate data preparation and modeling. For business users, Pentaho provides visualization and exploration of data. And for data analysts and scientists, Pentaho provides full data discovery, exploration and predictive analytics.
Using a combination of instructor-led presentations and hands-on exercises, this course provides an overview of big data technologies and an overview of the Pentaho tools for both working with big data and for visualizing it. This course helps prepare you for the Pentaho Data Integration Certification Exam.
This course is a stand-alone course in the Data Analyst learning path. Students who need a comprehensive overview of big data tools and technologies should take this course instead of DI1100 Pentaho Big Data Integration.
This DI2000 Big Data Fundamentals course covers all of the content in DI1100 and adds further material on big data concepts, tools, and technologies.
If you complete the DI2000 Big Data Fundamentals course, you should not register for DI1100 Pentaho Big Data Integration.
| Format | Language | Provider | Date & Time |
|--------|----------|----------|-------------|
| Online | English | Pentaho | March 18, 2014 - 10:00 AM EDT |
| Online | English | Pentaho | May 6, 2014 - 10:00 AM EDT |
At the completion of this course, you should be able to:
- Identify the purpose and value of various big data technologies: Hadoop, HDFS, Hive, MapReduce, NoSQL databases, etc.
- Read and write data using HDFS
- Orchestrate big data jobs in Pentaho Data Integration
- Use Pentaho Data Integration (and Pentaho MapReduce) to manipulate big data
- Read and write data using a NoSQL data source
- Visualize big data using Pentaho InstaView
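To illustrate the MapReduce pattern named in the objectives above, here is a minimal word-count sketch in pure Python. It is not course material or Pentaho code; the function names and sample data are illustrative only, and the shuffle step stands in for the grouping a real Hadoop cluster performs between phases.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the input."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

def word_count(document):
    return reduce_phase(shuffle(map_phase(document)))

print(word_count("big data big insight"))
# {'big': 2, 'data': 1, 'insight': 1}
```

In a real job, the map and reduce functions run in parallel across many nodes over data stored in HDFS; the course covers how Pentaho MapReduce lets you design these phases visually instead of coding them by hand.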
Before taking this class, students should complete course DI1000: Pentaho Data Integration or have equivalent field experience with Pentaho Data Integration. Big data knowledge is helpful but not required. Some basic knowledge of the Linux operating system (CentOS) is required.
Though not a requirement, attendees would benefit from taking Business Analytics User Console (BA1000) prior to taking this class to gain an overview of the Pentaho Business Analytics interface.
Online courses require a broadband Internet connection, the use of a modern Web browser (such as Microsoft Internet Explorer or Mozilla Firefox), and the ability to connect to the WebEx Training Center. For more information on WebEx Training Center requirements, see www.webex.com. Online courses use Pentaho’s cloud-based exercise environment. Students are provided access to a virtual machine used to complete the exercises.
For online courses, students are provided with a secured, electronic course manual; printed manuals are not provided. Students are encouraged to print the exercise book from the electronic manual before class begins, though this is not required.
Students attending this course on-site should contact their Customer Success Manager for hardware and software requirements. You can also email us at firstname.lastname@example.org for more information regarding on-site training requirements.
Day 1 Agenda
Pentaho and Big Data
Big Data Overview and Architecture
Hadoop, HDFS and Flume
Writing Data to HDFS using Flume
Working with Structured Data
Working with MapReduce
Working with Pentaho MapReduce
Day 2 Agenda
Working with Hive
Working with Pentaho InstaView
Reporting on Big Data
Working with NoSQL Databases
Job Orchestration
Oozie, Pig and Sqoop
Transforming Data using Pig