ProSiebenSat.1 Digital

About This Customer

ProSiebenSat.1 Digital
One of Europe’s leading media corporations

Creating a hybrid architecture that integrates relational databases and Hadoop ensures that we are future-proof. Pentaho Data Integration ties both worlds together seamlessly and makes it fast and easy for users to gain insights from our entire body of data.
Jürgen Popp, Director of Business Intelligence, ProSiebenSat.1 Digital

Use Case Overview

Business Challenges

ProSiebenSat.1 Digital needed to create a centralized data warehouse (DWH) that was:

  • Future-proof; capable of integrating any new, non-relational data sources that arise
  • Capable of processing a broad variety of data formats
  • Capable of processing fast-growing data volumes
  • Competitive in its price/performance ratio

Pentaho Solution

Pentaho’s partner Inovex GmbH realized that a traditional DWH would quickly reach its limits and suggested going beyond the original brief by implementing a hybrid data architecture composed of:

  • A relational DWH based on PostgreSQL to host current data
  • A cluster of eight Apache Hadoop nodes to host historical data
  • Pentaho Data Integration (PDI) for transmitting data between the DWH and the Hadoop cluster and for importing data into Hadoop
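As a rough illustration of the hybrid layout above, the following Python sketch routes records to a "current" or "historical" tier by age. The 90-day cutoff, the record fields, and the function names are assumptions made purely for illustration: the source does not state how current and historical data are separated, and in the actual deployment PDI, not hand-written Python, moves data between PostgreSQL and Hadoop.

```python
from datetime import date, timedelta

# Hypothetical cutoff: the source does not say how "current" is defined.
CURRENT_WINDOW = timedelta(days=90)


def choose_tier(event_date, today):
    """Return 'dwh' for recent records and 'hadoop' for older ones."""
    return "dwh" if today - event_date <= CURRENT_WINDOW else "hadoop"


def partition(records, today):
    """Split records into the two tiers of the hybrid architecture."""
    tiers = {"dwh": [], "hadoop": []}
    for rec in records:
        tiers[choose_tier(rec["event_date"], today)].append(rec)
    return tiers
```

Calling `partition(records, date.today())` on a list of dictionaries that each carry an `event_date` returns the records grouped by target tier, which is the decision an ETL job would then act on.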

Value Added

The hybrid data architecture with PDI has been live since March 2013 and is continuously expanding. Using PDI provides ProSiebenSat.1 Digital GmbH with:

  • Integration: seamless interaction of big data and relational data
  • An easy-to-use ETL environment for big data: PDI eliminates the need for specialist skills in executing MapReduce jobs and scripting in Java or other languages
  • Data blending: PDI allows current data from the DWH to be blended with historical data stored in Hadoop and makes these blends readily available to any reporting tool
  • Powerful data delivery: PDI’s multi-threaded data integration engine enables fast processing and delivery of data from the Hadoop cluster
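To make the idea of data blending concrete, here is a minimal Python sketch that combines rows from a current source and a historical source into one result per key. The row shape (a key and a numeric value) and the function name `blend` are hypothetical; the sketch shows the concept only and is not how PDI implements blending.

```python
from collections import defaultdict


def blend(current_rows, historical_rows):
    """Sum a metric per key across both sources.

    Each row is a (key, value) pair; both inputs are assumed to use
    the same keys and units.
    """
    totals = defaultdict(int)
    for source in (historical_rows, current_rows):
        for key, value in source:
            totals[key] += value
    return dict(totals)
```

A reporting tool would then query the blended result instead of addressing the DWH and the Hadoop cluster separately.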

Why Pentaho

  • Straightforward integration of relational and big data sources
  • Capable of coping with different data sources and high data volume
  • Easy to use in a complex Hadoop environment; no specialist skills required
  • Excellent price/performance ratio