Interview with Julita Corbalán, CTO at EAR
Why energy optimisation has become critical
In this interview, Julita Corbalán explains how the EAR software optimises energy consumption in data centres, aiming for compatibility with nearly 99% of current CPUs and GPUs. She also describes how the ODISSEE project is extending EAR's capabilities to meet future energy challenges, in collaboration with research infrastructures such as CERN and SKAO.
With climate change and energy prices at stake, energy consumption is becoming increasingly critical in all sectors. How are your technologies helping HPC players address this issue?
It’s clear that today, power usage by CPUs, and especially GPUs, is reaching levels that can no longer be ignored. Energy optimization and power management are no longer optional – they are essential for future architectures and for both public and private institutions.
To that end, our EAR software has been designed for energy monitoring, management and optimization in HPC and AI data centres. Today, we aim to extend the range of hardware we support, to provide greater energy efficiency for new architectures. By the end of the ODISSEE project, EAR will support almost all – say 99% – of the CPU and GPU models currently found in production environments. We already support Intel and AMD, and we also support Grace from NVIDIA. By the end of the project, we expect to include Rhea processors from SiPearl and Maverick from NextSilicon. With that, at the CPU level, the software should support all or nearly all architectures used in both private companies and public data centres. This would allow us to offer optimized energy efficiency across the full spectrum of modern CPUs.
The ODISSEE project is a collaborative EU-funded project. How will the technologies you are developing within the project benefit European industry and organisations?
Considering the upcoming power restrictions – which are expected in the short term – one key feature that EAR implements is power capping. This means that beyond dynamic optimization, we also support power caps at the application or workflow level. These caps – whether for a workflow, a cluster, or even an entire system – will depend on the institution’s specific constraints. EAR ensures that power consumption remains within the required limits.
So, on one hand, we guarantee that the system consumes the power best suited to the specific characteristics of a given application or workflow. On the other hand – and just as importantly – we provide the power management features needed to stay under institutional or infrastructural constraints such as maximum power or temperature thresholds.
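The capping behaviour described above can be pictured as a simple feedback loop: measure power, compare it against the cap, and throttle until the limit is respected. The sketch below is a minimal illustration of that idea, assuming a hypothetical power model and function names; it is not EAR's actual interface, which applies far more sophisticated dynamic optimization.

```python
# Illustrative power-capping control loop. The power model and all names
# here are hypothetical assumptions for illustration, not EAR's real API.

def read_power(freq_ghz: float) -> float:
    """Simulated node power draw (watts) as a simple function of frequency."""
    return 50.0 + 120.0 * freq_ghz ** 2  # hypothetical model: base + dynamic power

def enforce_cap(cap_watts: float, freq_ghz: float, step: float = 0.1,
                min_freq: float = 1.0) -> float:
    """Lower the CPU frequency until the measured power fits under the cap."""
    while read_power(freq_ghz) > cap_watts and freq_ghz - step >= min_freq:
        freq_ghz -= step  # one dynamic voltage/frequency scaling (DVFS) step down
    return round(freq_ghz, 2)

if __name__ == "__main__":
    # Start at 3.0 GHz and cap the node at 400 W.
    print(enforce_cap(400.0, 3.0))  # → 1.7
```

In a real system the power reading would come from hardware counters (for example RAPL on Intel CPUs) and the frequency change from the DVFS interface, but the control structure is the same.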
In my opinion, software such as EAR is already essential. With the power consumption we’re currently observing in today’s data centres, and even more so in the coming years, energy and power management software will no longer be optional – it will be a must for all actors.
How does EAR contribute to the success of the project?
The role of the EAR software in the ODISSEE project is to guarantee that the data-centre workloads we execute today, and those we will run in the near future, are executed with maximum energy efficiency. EAR is currently able to monitor and automatically optimize the execution of individual applications, and to control the power consumption of the data centre.
The target in ODISSEE is to extend the features and capabilities of our software to a more global scope – going from individual applications to workloads. Additionally, EAR will be extended to take into account new architectures that will be tested and later deployed in the project – such as Maverick from NextSilicon, which is a kind of modular architecture.
And how would you approach the main technical challenges of ODISSEE?
We have a very exciting challenge ahead of us: to create new energy models for emerging systems, particularly modular architectures with innovative designs we have never worked with before. We also want to try some new ideas on these architectures – new to us, if not necessarily new in other contexts – such as applying different machine learning techniques.
There’s a second main technical challenge. EAR operates in a fully automatic and dynamic way in traditional HPC scenarios, where the user is a person, and it is designed to require no prior knowledge to use. But in workflow contexts – whether in data centres, data science, or other domains – the ‘user’ is often a workflow manager, essentially a tool. So we’ve started to shift our mindset and accept that this ‘user’ can also be a system component guiding or driving EAR’s decision-making.
Could you describe how you are integrating your technologies within the project?
The new energy models we will develop for these architectures will be integrated transparently, because they will be applied automatically whenever an application, or the individual components of a workflow, is executed.
Then, in the context of workflows and workflow managers, there are basically two types. Some workflow managers operate at a coarse granularity, creating and managing workflows at the job-scheduling level – submitting multiple jobs and handling their dependencies. In those cases, the integration can also be coarse-grained, for example through files or well-defined APIs such as REST APIs. This also makes it possible for EAR to support scenarios where the workflow is distributed across multiple sites or data centres. The second type of workflow manager is exemplified by PyCOMPSs from the Barcelona Supercomputing Center: PyCOMPSs typically creates workflows by allocating a large job in a data centre and then managing the internal resources within that allocation.
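To make the coarse-grained path concrete, here is a minimal sketch of the kind of exchange a job-scheduling-level integration might involve: a site-wide power budget split across jobs, with each per-job cap serialised as a REST-style JSON payload. All names, fields, and the allocation rule are hypothetical illustrations, not EAR's real API.

```python
# Hypothetical illustration of coarse-grained workflow-manager integration:
# split a site power budget across jobs, then serialise cap requests that a
# manager could POST to a REST endpoint. Not EAR's actual API.
import json

def allocate_caps(site_budget_watts: int, jobs: dict[str, int]) -> dict[str, int]:
    """Split a site-wide power budget across jobs, proportional to each job's
    requested watts (integer division, so the total never exceeds the budget)."""
    total = sum(jobs.values())
    return {job: site_budget_watts * w // total for job, w in jobs.items()}

def build_cap_request(job_id: str, cap_watts: int, site: str) -> str:
    """Serialise one per-job cap as a JSON body for a hypothetical endpoint."""
    return json.dumps({"job_id": job_id, "power_cap_watts": cap_watts,
                       "site": site}, sort_keys=True)

if __name__ == "__main__":
    caps = allocate_caps(900, {"preprocess": 100, "train": 200})
    for job, cap in caps.items():
        print(build_cap_request(job, cap, "site-A"))
```

A file-based integration would exchange the same information through a shared directory instead of an HTTP endpoint; the coarse granularity is what makes either mechanism workable across sites.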
The third topic is the use of the data-centre monitoring metrics collected by EAR. One of the actions in the project is to use these metrics for tasks such as predictive maintenance.
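As one deliberately simplified example of how monitoring metrics could feed predictive maintenance, consider flagging nodes whose temperature deviates strongly from the fleet average. The z-score rule and all names below are assumptions for illustration, not the method the project will actually use.

```python
# Illustrative anomaly check on node temperatures: flag nodes whose reading
# sits well above the fleet average. A hypothetical sketch, not ODISSEE's method.
from statistics import mean, stdev

def flag_anomalous_nodes(temps: dict[str, float], threshold: float = 1.5) -> list[str]:
    """Return nodes whose temperature z-score exceeds the threshold."""
    values = list(temps.values())
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all nodes identical: nothing stands out
    return sorted(n for n, t in temps.items() if (t - mu) / sigma > threshold)

if __name__ == "__main__":
    readings = {"node01": 61.0, "node02": 63.5, "node03": 62.0,
                "node04": 88.0, "node05": 60.5}  # node04 runs hot
    print(flag_anomalous_nodes(readings))  # → ['node04']
```

A production system would use richer features (fan speeds, error counters, power anomalies) and trained models rather than a fixed threshold, but the pipeline shape is the same: collect metrics, score them, and surface nodes that need attention before they fail.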
The project includes two major research infrastructures: CERN and SKAO. How does collaborating with them drive technological innovation?
They bring us new challenging use cases that let us test and run our software in direct collaboration with researchers. Their workflows come with very specific requirements, which are different from the scenarios we typically work with.
These new use cases help us improve the tool’s quality, particularly in areas like deployment, production booting, security, and adapting to varying requirements across different data centres, which is extremely valuable.