AI, deep learning and HPC workloads require highly parallel processing capability - only truly delivered by GPU-accelerated systems featuring one or more NVIDIA GPUs, depending on the stage you are at in the pipeline: development, training or inferencing. The power and type of GPUs required for these three phases differ greatly, so the Scan AI team has curated a portfolio of GPU-accelerated systems tailored to the unique demands of each stage.
Furthermore, as Scan is an NVIDIA Elite Solution Provider, you can be sure all our configurations and solutions are tried, tested and in many cases certified by NVIDIA to ensure we deliver the latest technologies, offering the best performance whilst remaining cost-effective.
Development Systems
Development is the stage of deep learning where you work to establish models that can be taken through to full-scale training. Whether working with deep learning libraries, frameworks or applications, you would typically use small sets of data repeatedly, making minor tweaks and changes to see whether the results match the outcomes you want. This type of workload is usually not overly GPU-intensive, so a workstation is sufficient - in many cases with just a single GPU. Scan AI has designed a range of DEVELOPMENT BOXES with up to six NVIDIA GPUs to address both cost efficiency and performance, as using multiple GPUs allows for faster discovery by developing several models alongside each other.
Because AI development projects consist of smaller, repeated workloads, they may also benefit from utilising underused GPU resource across a number of systems. This is made possible by RUN:AI ATLAS SOFTWARE, which pools GPUs from multiple systems to give you a virtual centralised resource that can be allocated and segregated amongst multiple tasks or users - dynamically as demand changes and projects evolve.
Alternatively, development workloads can be run on one of our CLOUD AI & HPC virtual instances. Although development requires GPU-accelerated hardware, it doesn’t necessarily have to be a physical system at your desk. This public cloud service delivers the power of GPUs to any device, anywhere, to make your development phase as flexible and scalable as possible, offering multiple profiles that can be easily scaled up or down as your workloads change.
If you’re unsure what hardware requirements may best suit your development plans, then the Scan AI team is always on hand to advise and help. You can explore our free proof of concept trials or managed hosting offerings by clicking the links below - or simply GET IN TOUCH.
Training Systems
Training is the phase of deep learning where you have identified, using the small datasets, a suitable model worth investigating further. You would now expand the dataset to a far greater size to undertake the repeated cycles of training so that the model can learn and become more accurate with each iteration. This type of intensive work on larger datasets requires much more GPU resource, so server systems offering up to eight GPUs are commonplace for these tasks. NVIDIA-certified 3XS EGX AND HGX SERVERS are the starting point for training, offering professional-grade NVIDIA GPUs tailored to specific workloads, and are fully customisable with a choice of Intel Xeon or AMD EPYC CPUs and a range of memory capacities. It is also possible to choose either Ethernet or InfiniBand networking at various throughput speeds.
For the highest-demand workloads we recommend the NVIDIA range of DGX appliances - the DGX H100, featuring the latest Hopper-based GPUs. At the very top of the scale, multiple DGX units can be combined in a POD ARCHITECTURE to deliver huge performance supported by AI-optimised NVMe all-flash storage. Each DGX appliance has a range of COMPREHENSIVE SUPPORT CONTRACTS covering both software updates and hardware components, coupled with a choice of media retention packages to further protect any sensitive data within the memory or storage. To further ensure optimal GPU utilisation across your training infrastructure, multiple GPU systems can also be pooled into a single virtualised parallel compute resource that can be allocated and segregated dynamically using RUN:AI ATLAS SOFTWARE - regardless of the type of systems and GPUs involved.
Alternatively, training workloads can be run on one of our SCAN CLOUD AI & HPC virtual instances. Although training requires multi-GPU hardware, it doesn’t necessarily have to be a physical system in your server room or datacentre. This service delivers the power of multi-GPU systems to any device, anywhere, to make your training phase as flexible and scalable as possible, offering multiple profiles that can be easily scaled up or down as your workloads change.
If you’re unsure what hardware requirements may best suit your training plans, then the Scan AI team is always on hand to advise and help. You can explore our free proof of concept trials or managed hosting offerings by clicking the links below - or simply GET IN TOUCH.
Inferencing Systems
When it comes to inferencing, the type of GPU resource required may be quite different from development or training, in that a fully trained model ready for deployment in the real world doesn’t need significant power to carry out its task - whether that be on video surveillance footage, image recognition or data collection via sensors. Additionally, the inferencing device may need to be placed in a remote location where access is limited, so low power, zero maintenance and 4G / 5G connectivity are necessary. For this reason embedded GPU systems are often chosen, as they meet these criteria and can be highly customised for specific needs such as harsh environments or extreme temperatures.
Occasionally, a model may need retraining to ensure accuracy remains at the levels required, without needing the full power of a datacentre - for these cases we have a range of ruggedised NVIDIA EGX retraining servers.
If you’re unsure what hardware requirements may best suit your inferencing deployment, then the Scan AI team is always on hand to advise and help - don’t hesitate to get in touch.
Free Proof of Concept Trial
Any of our AI hardware systems can be tested in a secure datacentre environment, guided by our team of AI experts to ensure you get the maximum benefit and insight from your trial.
As an alternative to hosting your AI hardware within your premises, the Scan AI team can arrange for your servers and other infrastructure to be hosted in a number of UK and European datacentres.