
What is Data Integration?


The wide range of information a business gathers is rarely stored in a single database or format. Analytics software, however, is expected to provide a holistic view of a company’s operations based on this diverse data. Data integration is the process by which information from multiple databases is consolidated for use in a single application. Put in layman’s terms, data integration combines parts that don’t normally fit together.

Business analytics software depends upon sound data integration to build dashboards, visualizations and reports that reflect accurate, consistent information. If data is not cleaned and reconciled first, queries return useless apples-to-oranges comparisons. The volume and diversity challenges created by big data make effective integration even more important.

Homegrown applications can be developed to translate data across sources into a common format. However, as the volume and diversity of information increase, the data management demands on an application can become overwhelming. Complex coding and substantial hardware investments may be required to keep it all working.
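To make the idea of a common format concrete, here is a minimal sketch of what a small homegrown translation layer might look like. The two source systems, their field names and the sample records are all hypothetical, invented purely for illustration.

```python
from datetime import datetime

# Hypothetical records from two systems describing the same kind of sale.
# Field names, date formats and units differ between the sources.
crm_rows = [
    {"CustomerName": "Acme Corp ", "SaleDate": "03/15/2024", "Amount": "1,200.50"},
]
billing_rows = [
    {"customer": "acme corp", "date": "2024-03-16", "total_cents": 99900},
]


def from_crm(row):
    """Translate a (hypothetical) CRM record into the common format."""
    return {
        "customer": row["CustomerName"].strip().lower(),
        "date": datetime.strptime(row["SaleDate"], "%m/%d/%Y").date().isoformat(),
        "amount": float(row["Amount"].replace(",", "")),
    }


def from_billing(row):
    """Translate a (hypothetical) billing record into the common format."""
    return {
        "customer": row["customer"].strip().lower(),
        "date": row["date"],                 # already an ISO date
        "amount": row["total_cents"] / 100,  # cents to currency units
    }


def integrate(crm, billing):
    """Combine both sources into one list of uniformly shaped records."""
    return [from_crm(r) for r in crm] + [from_billing(r) for r in billing]


unified = integrate(crm_rows, billing_rows)
```

Each new source needs its own translation function, which hints at why this approach becomes hard to maintain as the number and variety of sources grow.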

A data warehouse gets around these obstacles by serving as an information repository that other programs can draw upon – essentially a database of databases. An ETL (extract, transform and load) process ingests and cleans up data on a scheduled basis, then delivers the ready-to-use information to the data warehouse. Programs that access the data warehouse find the information prepared for analysis.
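The extract, transform and load steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV content, the `sales` table and the function names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse. In practice the process would run on a schedule rather than once.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system: inconsistent casing,
# stray whitespace and one row with a missing amount.
RAW_CSV = """order_id,region,amount
1, east ,100.00
2,WEST,250.50
3,east,
4,West,75.25
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize region names and drop rows without an amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["region"].strip().lower(), float(amount))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text):
    load(conn, transform(extract(text)))


# An in-memory SQLite database stands in for the data warehouse.
warehouse = sqlite3.connect(":memory:")
run_etl(warehouse, RAW_CSV)

# A downstream program finds the data already prepared for analysis.
totals = dict(
    warehouse.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
```

Because the cleanup happens before loading, the reporting query at the end can group by region without worrying about casing or missing values.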

There are advantages and disadvantages to any data integration effort. The most effective solution will depend upon the resources and requirements of your company.


The importance of data integration is apparent to anyone who’s spent time fetching information from multiple systems for a basic report. When a business grows, so do the demands on a company’s information systems. Additional locations, new revenue streams and changing priorities will affect the form your data takes. What’s more, many businesses rely on legacy systems to provide historical data from sources that may not even exist anymore.

By automating the work of gathering and reconciling data, data integration frees employees to concentrate on analysis and forecasting – tasks that require a human touch. It also greatly reduces the chance of errors being introduced during the data translation process.

Simply put, data integration is the essential link between information and insight. Businesses that ensure their various databases can “talk” to one another are able to take advantage of the details they’re already collecting.

Pentaho Data Integration is a powerful ETL application that is part of the Pentaho Business Analytics platform. Thanks to the visual MapReduce interface, you can extract information from any data source for preparation and delivery to a data warehouse or Hadoop cluster – all without writing a single line of code. Increase developer productivity while enjoying the benefits of the premier data integration technology.

Helpful Resources

Choosing the best business analytics solution can be complicated. Check out our library of helpful content including case studies, whitepapers, webinars and demos. 


Related Topics

What is Big Data?

What is Hadoop?

What is Governed Data Delivery?