Developing and providing supercomputing capabilities to domestic and international researchers
- In 2013, the computing time for the user lab alone was 459,333,565 CPU hours. As CSCS had no centralized data warehouse or reporting tool, gathering data for reports was a cumbersome manual task that took two to three weeks and required many interactions with various staff members
- To reduce reporting time, optimize supercomputer usage and manage CPU utilization efficiently for each project, CSCS needed a centralized solution that could cope with large (12 million rows) and growing data volumes and automate the data collection process
- The solution also needed an interface to the SAP system’s financial data to merge two silos: supercomputing machine data and financial project data. This would enable both customer projects and user lab projects to be managed more efficiently.
- CSCS used Pentaho Consulting Services for deployment assurance, which includes server installation and configuration, architecture review and best practices
- CSCS uses Pentaho Data Integration (PDI) to automatically extract CPU data from the supercomputers and financial data from the SAP system and blend them for reporting
- Pentaho Reporting is used to generate reports for the management team and paying customers and for financial controlling
- PDI has eliminated the need for manual CPU data gathering, which used to take two to three weeks
- CSCS now fully understands the actual CPU usage for each project and can allocate CPU time more effectively. It can also use PDI to monitor CPU performance and calculate the CPU value per machine, which enables teams to find machines that are under- or over-utilized
- The simplicity and speed of gathering machine data enable CSCS to give its paying customers monthly reports that help them understand their actual CPU requirements
- The financial team can now easily set up reports as needed
- In the future, CSCS plans to integrate further metrics and monitoring, including disk storage utilization and data from internal IT ticketing systems.
- Data integration capabilities connect two formerly separate data silos: machine data and financial data
- PDI automates data gathering and job scheduling
- Easy-to-use reporting
- Internal knowledge and positive prior experience
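
The blending and utilization checks described above can be illustrated with a minimal sketch in plain Python. This is not PDI or its API (PDI builds such flows as graphical transformations), and it is not the CSCS implementation: the record shapes `UsageRow` and `ProjectFinance`, the function names, and the `low`/`high` thresholds are all hypothetical, chosen only to show the idea of joining machine data to financial project data and flagging under- or over-utilized machines.

```python
from dataclasses import dataclass


# Hypothetical record shapes; the actual CSCS and SAP schemas are not given in the text.
@dataclass
class UsageRow:
    machine: str
    project_id: str
    cpu_hours: float


@dataclass
class ProjectFinance:
    project_id: str
    customer: str
    allocated_cpu_hours: float


def blend(usage, finance):
    """Join per-project CPU usage to financial project records on project_id."""
    finance_by_id = {f.project_id: f for f in finance}

    # Sum CPU hours per project across all machines.
    used = {}
    for row in usage:
        used[row.project_id] = used.get(row.project_id, 0.0) + row.cpu_hours

    report = []
    for project_id, hours in sorted(used.items()):
        fin = finance_by_id.get(project_id)
        if fin is None or fin.allocated_cpu_hours <= 0:
            continue  # no usable financial record; a real pipeline would log this
        report.append({
            "project_id": project_id,
            "customer": fin.customer,
            "used_cpu_hours": hours,
            "allocated_cpu_hours": fin.allocated_cpu_hours,
            "share_used": hours / fin.allocated_cpu_hours,
        })
    return report


def machine_utilization(usage, capacity_hours, low=0.5, high=0.95):
    """Classify each machine as 'under', 'ok' or 'over' utilized.

    capacity_hours maps machine name to available CPU hours in the period.
    The low/high thresholds are illustrative assumptions, not CSCS policy.
    """
    totals = {}
    for row in usage:
        totals[row.machine] = totals.get(row.machine, 0.0) + row.cpu_hours

    result = {}
    for machine, capacity in capacity_hours.items():
        ratio = totals.get(machine, 0.0) / capacity
        if ratio < low:
            status = "under"
        elif ratio > high:
            status = "over"
        else:
            status = "ok"
        result[machine] = (ratio, status)
    return result
```

Calling `blend(usage, finance)` yields one row per project that combines usage with its financial record, which is the kind of merged view a monthly customer report could be built from; `machine_utilization(usage, capacity_hours)` gives a per-machine ratio and status.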