August 22, 2016
I speak to a lot of customers that are all facing the same issue – they have a limited IT staff, they have a shrinking budget, they are using legacy tools to manage their growing data needs and they just don’t have enough time to accomplish it all. We have all heard the statistic from Ventana Research that organizations spend 46% of their time preparing the data and a whopping 52% of their time checking for data quality and consistency. That means IT groups responsible...
August 19, 2016
A blueprint for big data success What is the “Filling the Data Lake” blueprint? The blueprint for filling the data lake refers to a modern data onboarding process for ingesting big data into Hadoop data lakes that is flexible, scalable, and repeatable. It streamlines data ingestion from a wide variety of source data and business users, reduces dependence on hard-coded data movement procedures, and simplifies regular data movement at scale into the data lake. The “Filling the Data Lake” blueprint provides developers with...
August 16, 2016
The time has come again to recognize the outstanding achievements of our customers using Pentaho solutions for big data analysis and integration. That’s right, the 2016 Pentaho Excellence Awards are upon us once again, and we just sent this year’s submissions to our judges for review. By the time our nomination process closed, we had received more than double the number of submissions we received last year. The entries come from around the world and represent a wide variety of vertical markets, including government, healthcare and retail...
August 12, 2016
Modern Data Preparation Roadblocks by Kevin Haas - Partner, Inquidia Consulting Now that we’re in the summer heat of election season, political rhetoric filling the air, all are focused on the potential results of the democratic process. I’m not writing, however, to discuss the power of the people to drive candidates, but rather the power struggle happening inside businesses around using technology to solve problems. In particular, I am referring to a growing trend wherein business analysts have more control than ever to prepare and...
August 11, 2016
Disclaimer: This article will include a very light-touch treatment of mathematics, Scala, and Spark. In short, there’s something for everyone to dislike. Please send complaints to WELRIFAI at PENTAHO dot COM. Predictive Maintenance, Anomaly Detection & Spark Spark is a very hot topic right now in Big Data circles - and with good reason. It offers a great framework for execution of complex recursive algorithms typical of machine learning techniques and provides a robust capability to integrate with Hadoop’s distributed file system (HDFS) for highly...
August 5, 2016
It’s official: Dresner thinks Pentaho is excellent. In case you missed it, last month Dresner Advisory Services announced the results of its 2016 Industry Excellence Awards for business intelligence. Pentaho has been named a “Trust Leader for Business Intelligence” due to our Leadership Status in the annual Wisdom of Crowds® Business Intelligence (BI) Research. The 2016 Industry Excellence Awards were presented to 19 vendors across five categories: Overall Leader, Customer Experience Leader, Technology Leader, Credibility Leader, and Trust Leader. As you may know, Dresner Advisory...
July 27, 2016
Summer doesn’t have to mean bad novels on the beach. If you’re looking for something to keep you on point and expand your perspective, check out our list of top 5 summer reads. These are some of our favorite ebooks and papers, and will provide some food for thought while you enjoy your holiday. Best Practices for Data Prep Success If data prep keeps you up at night – or at least takes a big part of your day - consider the TDWI Best Practices Report:...
July 22, 2016
Here’s a quick summary of four of our favorite blogs from our Chief Geek, also known as the Lord of the Ones and Zeroes, James Dixon. 1. Apache Spark Integration: Why it Matters The integration we launched last year enables Spark jobs to be orchestrated using Pentaho Data Integration so that Spark can be coordinated with the rest of your data architecture. Like Hadoop, Spark has come a long way since it was created as a scalable in-memory solution for one data scientist...
July 6, 2016
You’ve probably heard a bunch of pitches from startups in the ever-changing landscape of Hadoop along with other cutting-edge technologies that set an exciting vision; a chance to be one of the first, and work with the “next big thing,” as they rocket to their IPO. They’ve got great slides and convincing evangelists. So how do you know the real data integration solutions among the paper tigers in the Hadoop marketplace? Here’s what you should consider before signing on the dotted line: 1. NUMBER OF...
June 22, 2016
The Problem of Data Variety If you ask organizations what data problems they face, the most common answer isn’t “big” data problems, or “real-time” data problems: it’s data variety. Data variety often comes from diverse data types, formats, and sources. When data is varied, it is also often siloed away from other data. To meet the needs of IT, business teams, and analytics groups, only a complete approach to data preparation can solve this problem. What is a Data Silo and Why Does It Matter?...
