A South East Asian agency focused on natural disaster detection and prevention has deployed a multi-region, GPU-accelerated HPC cluster based on the LMX AI reference architecture as a national AI cloud for cloud-native workloads. The state agency was tasked with mitigating a nationwide crisis (COVID-19) and faced major operational challenges in responding to the needs of the public and internal services.
 
THE BRIEF

Because the COVID-19 crisis was developing rapidly, the customer's requirement was urgent. The agency needed a flexible cloud platform, primarily for data analytics, that was also agile, scalable and able to support modern containerisation, all while being affordable and delivering high performance. Affordability and high performance rarely go hand in hand, which is why the agency looked to open-source solutions.

THE REQUIREMENTS
 
  • Scalable and agile
  • Support containerised applications 
  • Improve communication channels 
  • Infrastructure must be built quickly to respond to emergencies 
  • Enhance big-data collection, synchronisation and analytics 
  • Centralised command centre
  • Affordable and high performance 
  • Optimise remote workforce collaboration and communication 
  • Manage scalable workloads 
  • Standardisation and unification across rural and urban districts, countrywide

LIMITATIONS OF A LEGACY SYSTEM

The customer was running existing applications on a legacy platform that was nearing capacity. Its infrastructure was fragmented, with conflicting software applications and ill-equipped hardware resources. The customer required rapid migration, deployment and provisioning on a new platform that supported both mobile and remote workforces, as well as field operations and collaborative data analytics.

“With LMX Cloud Software, you benefit from lower entry and operating costs, no lock-in or ongoing licensing fees. Everything we do is based on Open methodology and open architecture and fully disaggregated hardware!” 

Choo Yuh Joo, Managing Director, Starview International 

THE CHALLENGES 

The unique landscape of South East Asia poses immense challenges for managing a pandemic.

Travel restrictions also proved challenging, but thanks to our global integration and deployment partners across the UK, Ireland, Israel, Taiwan, Singapore and South East Asia, the solution was deployed seamlessly, on time and under budget.

Fig 1. The deployment serves as a 7-region, national-scale cloud, used primarily for disaster detection and prevention.

THE SOLUTION 

The solution included 2 private cloud environments for high availability and resilience, hosted in the customer’s data centre, combined with 5 edge cloud environments, for a total of 7 locations. The new cloud platform featured integrated GPUs for AI and machine learning workloads, with Lightbits NVMe-over-Fabrics software addressing the high-performance storage requirement.

The hardware backbone comprised high-core-count, AMD-based systems (2U, 64-core servers), with Kubernetes container orchestration delivered by SUSE Rancher.

The deployment serves as a 7-region, national-scale cloud, used primarily for disaster detection and prevention. It features a resilient core for AI training (including NVIDIA GPUs and NVMe storage), with all workloads orchestrated via Kubernetes (k8s) using SUSE Rancher. Each of the 7 cloud regions operates a full LMX cloud environment with secure multi-tenancy.

Download the case study PDF to find out more about LMX's additional capabilities.

AI Reference Architecture
If you are interested in simplified IT management, scalable HPC resources, accelerated workloads and faster time to insight, talk to us about LMX today.