A South East Asian agency focused on natural disaster detection and prevention has deployed a multi-region, GPU-accelerated HPC cluster based on the LMX AI reference architecture as a national AI cloud for cloud-native workloads. The state agency was tasked with mitigating a nationwide crisis (COVID-19) and faced major operational challenges in responding to the needs of the public and internal services.
Due to the rapidly developing COVID-19 crisis, this was an urgent customer requirement. The agency needed a flexible cloud platform, used primarily for data analytics, that was agile, scalable and able to support modern containerisation, all while being affordable and delivering high performance. Unfortunately, affordability and high performance rarely go hand in hand, which is why the agency looked to open source solutions.
Scalable and agile
Support containerised applications
Improve communication channels
Infrastructure must be built quickly to respond to emergencies
Enhance big-data collection, synchronisation and analytics
Centralised command centre
Affordable and high performance
Optimise remote workforce collaboration and communication
Manage scalable workloads
Standardisation and unification across rural and urban districts, countrywide
LIMITATIONS OF A LEGACY SYSTEM
The customer was running existing applications on a legacy platform that was nearing capacity. Their infrastructure was fragmented, with conflicting software applications and ill-equipped hardware resources. The customer required rapid migration to, and rapid deployment and provisioning on, a new platform that supported mobile and remote workforces, field operations and collaborative data analytics.
“With LMX Cloud Software, you benefit from lower entry and operating costs, no lock-in or ongoing licensing fees. Everything we do is based on Open methodology and open architecture and fully disaggregated hardware!”
Choo Yuh Joo, Managing Director, Starview International
The unique landscape of South East Asia poses immense challenges for managing a pandemic.
Travel restrictions also proved challenging, but thanks to our global integration and deployment partners across the UK, Ireland, Israel, Taiwan, Singapore and South East Asia, the solution was deployed seamlessly, on time and under budget.
Fig 1. The deployment serves as a 7-region, national-scale cloud, used primarily for disaster detection and prevention.
The solution included 2 private cloud environments for high availability and resilience, hosted in the customer’s data centre, combined with 5 edge cloud environments, for a total of 7 locations. The new cloud platform featured integrated GPUs for AI and machine learning workloads, with Lightbits NVMe over Fabrics software addressing the high-performance storage requirement.
The hardware backbone comprised high-core-count, AMD-based systems (2U, 64-core servers), with Kubernetes container orchestration delivered by SUSE Rancher.
The deployment features a resilient core for AI training (including NVIDIA GPUs and NVMe storage), with all workloads orchestrated via Kubernetes (k8s) using SUSE Rancher. Each of the 7 cloud regions operates a full LMX cloud environment with secure multi-tenancy.