Reference Architecture RA-004

AI Research Cluster

High-density GPU cluster for deep learning research, model training, and scientific computing. This reference architecture delivers 40+ GPUs with petabyte-scale high-performance storage and a 400 Gbps InfiniBand fabric, and is designed for universities, research labs, and AI centers of excellence.

Research Institutions · Universities · AI Centers of Excellence · National Labs
40+ GPUs Total
3.2 TB GPU Memory
1 PB Storage Capacity
400 Gbps InfiniBand Fabric

Executive Summary

Production-grade AI infrastructure for large-scale model training and research workloads.

Use Case

Large Language Model training, computer vision research, scientific simulation, drug discovery, and multi-GPU distributed training workloads.

Challenges Addressed

GPU utilization optimization, storage I/O bottlenecks, multi-node synchronization, power density, and cooling for high-wattage GPUs.

Key Outcomes

Linear scaling for distributed training, 90%+ GPU utilization, shared research datasets, and job scheduling for multi-tenant access.
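
The 90%+ GPU utilization outcome is something a site can measure directly. As a minimal sketch (not part of the reference architecture itself), the following sampler uses the NVIDIA Management Library bindings (pynvml, packaged as nvidia-ml-py); the sampling interval and the 90% threshold are illustrative values.

    # Minimal GPU-utilization sampler via NVML (pip install nvidia-ml-py).
    # The 10 s interval and 90% target are illustrative, not prescriptive.
    import time
    import pynvml

    pynvml.nvmlInit()
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
               for i in range(pynvml.nvmlDeviceGetCount())]
    try:
        while True:
            utils = [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles]
            avg = sum(utils) / len(utils)
            note = "" if avg >= 90 else "  <- below 90% target"
            print(f"avg GPU util {avg:5.1f}%  per-GPU {utils}{note}")
            time.sleep(10)
    finally:
        pynvml.nvmlShutdown()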

Make in India Compliant
NVIDIA DGX-Ready
CUDA 12.x Certified
InfiniBand Verified

Architecture Overview

Total GPU Compute Summary

40× NVIDIA H100 80GB GPUs
3.2 TB Total GPU HBM3 Memory
~130 PF FP8 Tensor Performance
5× 8-GPU Server Nodes
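
These totals follow directly from the per-node configuration in the bill of quantities below; a quick arithmetic sanity check is shown here. The FP8 aggregate is left as quoted, since it depends on whether dense or sparsity-accelerated per-GPU ratings are assumed.

    # Sanity check of the cluster-level totals from the per-node configuration.
    nodes = 5                 # 8-GPU server nodes
    gpus_per_node = 8         # NVIDIA H100 80GB SXM5 per node
    hbm_per_gpu_gb = 80       # HBM3 per GPU

    total_gpus = nodes * gpus_per_node                   # 40 GPUs
    total_hbm_tb = total_gpus * hbm_per_gpu_gb / 1000    # 3.2 TB HBM3
    print(total_gpus, total_hbm_tb)                      # -> 40 3.2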

Detailed Bill of Quantities

Complete hardware specification for 40-GPU research cluster.

GPU Compute Nodes

5 Units
8-GPU Training Server, RDP-GPU-8U-H100 (Qty: 5)
  • 8U GPU Server Chassis (liquid-ready)
  • Dual Intel Xeon Platinum 8480+ (56C/112T each)
  • 2TB DDR5-4800 ECC RAM (32× 64GB)
  • 8× NVIDIA H100 80GB SXM5 GPUs
  • NVLink 4.0 (900 GB/s GPU-to-GPU)
  • 8× 3.84TB NVMe U.2 SSD (Local Scratch)
  • 8× 400Gb/s QSFP112 (InfiniBand NDR)
  • 2× 100GbE QSFP28 (Ethernet Management)
  • 10.2 kW TDP (Air/Liquid Cooling)
  • Redundant 3+1 PSU Configuration
Purpose: Distributed training nodes (40 GPUs total); a multi-node training sketch follows below.
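
As a sketch of how these nodes are consumed in practice, the snippet below initializes PyTorch distributed data parallel training with the NCCL backend, which uses NVLink within a node and the InfiniBand fabric between nodes. It assumes a launcher such as torchrun has set RANK, WORLD_SIZE, and LOCAL_RANK; the model is a placeholder.

    # Minimal multi-node DDP setup, assuming one process per GPU launched
    # via torchrun across the five 8-GPU nodes.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # NCCL rides NVLink intra-node and InfiniBand inter-node.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = torch.nn.Linear(4096, 4096).cuda(local_rank)  # placeholder model
        model = DDP(model, device_ids=[local_rank])

        # ... training loop goes here ...

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()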

High-Performance Storage

1 PB Cluster
All-Flash Scratch Storage, RDP-PFS-NVMe-200T
  • 2U All-NVMe Storage Node
  • Intel Xeon Gold 6338 (32C/64T)
  • 512GB DDR4 ECC RAM
  • 24× 15.36TB NVMe SSD
  • 4× 200Gb/s QSFP56 (InfiniBand HDR)
  • ~200TB usable per node
  • Lustre/GPFS/BeeGFS Client
  • 80+ GB/s sequential throughput
Purpose: Training scratch and checkpoints (~400TB flash tier); see the checkpointing sketch after this section.

Capacity Storage Array, RDP-NAS-4U-60B
  • 4U 60-Bay JBOD Enclosure
  • 60× 22TB Enterprise SATA HDD
  • SAS Expander Architecture
  • Dual I/O Modules (Redundant)
  • ~900TB raw per enclosure
  • Erasure coding for efficiency
Purpose: Dataset storage and model archive (~1PB capacity tier)

Parallel File System Metadata, RDP-MDS-2U-NVMe
  • 2U Metadata Server
  • Intel Xeon Gold 6342 (24C/48T)
  • 512GB DDR4 ECC RAM
  • 8× 3.84TB NVMe SSD (RAID 10)
  • 4× 100GbE QSFP28
  • Lustre MDS / GPFS NSD
Purpose: HA metadata servers for the parallel file system
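
To illustrate how the flash scratch and capacity tiers divide the work, here is a hedged checkpointing sketch: frequent checkpoints land on the NVMe scratch file system, and selected ones are copied to the capacity tier. The /scratch and /archive mount points are hypothetical placeholders for whatever paths the parallel file system exposes.

    # Checkpoint to the flash scratch tier; archive selected checkpoints to
    # the capacity tier. /scratch and /archive are hypothetical mount points.
    import shutil
    from pathlib import Path
    import torch

    SCRATCH = Path("/scratch/llm-run-01")   # all-flash scratch (~400TB tier)
    ARCHIVE = Path("/archive/llm-run-01")   # HDD capacity tier (~1PB)

    def save_checkpoint(model, optimizer, step, keep_long_term=False):
        SCRATCH.mkdir(parents=True, exist_ok=True)
        path = SCRATCH / f"ckpt_{step:08d}.pt"
        torch.save({"step": step,
                    "model": model.state_dict(),
                    "optimizer": optimizer.state_dict()}, path)
        if keep_long_term:                  # e.g. every Nth checkpoint
            ARCHIVE.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, ARCHIVE / path.name)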

Data Preprocessing Servers

4 Units
CPU Compute Server, RDP-SRV-2U-HPC (Qty: 4)
  • 2U Dual-Socket Server
  • Dual Intel Xeon Platinum 8380 (40C/80T each)
  • 1TB DDR4-3200 ECC RAM
  • 4× 7.68TB NVMe SSD
  • 2× 200Gb/s QSFP56 (InfiniBand HDR)
  • 2× 100GbE QSFP28 (Ethernet)
  • Dual 1600W Titanium PSU
Purpose: Data preprocessing, ETL, and feature engineering; a preprocessing sketch follows below.
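
The preprocessing sketch referenced above: a minimal example of sharding raw text files across the CPU cores of one of these servers with Python's multiprocessing and writing processed shards to shared storage. The paths and the preprocess body are placeholders for an actual ETL pipeline.

    # Parallel preprocessing sketch: fan raw files out to CPU workers and
    # write processed shards back to shared storage. Paths are placeholders.
    import json
    from multiprocessing import Pool
    from pathlib import Path

    RAW = Path("/datasets/raw")          # capacity tier (hypothetical mount)
    OUT = Path("/datasets/processed")

    def preprocess(path):
        out_path = OUT / (path.stem + ".jsonl")
        with path.open(encoding="utf-8") as src, out_path.open("w", encoding="utf-8") as dst:
            for line in src:
                dst.write(json.dumps({"text": line.strip()}) + "\n")
        return out_path.name

    if __name__ == "__main__":
        OUT.mkdir(parents=True, exist_ok=True)
        with Pool(processes=64) as pool:   # leave headroom on the 80-core node
            for name in pool.imap_unordered(preprocess, sorted(RAW.glob("*.txt"))):
                print("wrote", name)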

High-Speed Interconnect

InfiniBand NDR
InfiniBand NDR Switch, IB-SW-NDR-64
  • 64× 400Gb/s QSFP112 Ports
  • 51.2 Tbps Switching Capacity
  • Non-blocking Fat-tree Topology
  • Native InfiniBand RDMA Transport
  • Adaptive Routing, Congestion Control
  • SHARP In-network Computing
Purpose: GPU fabric spine switches

Ethernet Management Switch, SW-MGMT-48X100G
  • 48× 100GbE QSFP28 Ports
  • 6× 400GbE QSFP-DD Uplinks
  • Layer 3 Routing
  • VXLAN, BGP-EVPN
  • Management & Storage Network
Purpose: Management and storage Ethernet

Optical Cabling Kit, OPTICS-NDR-KIT (Qty: 1 kit)
  • 64× 400G QSFP112 SR4 Transceivers
  • 48× 100G QSFP28 SR4 Transceivers
  • OM5 Multi-mode Fiber Cables
  • MPO/MTP Connectivity
  • Fiber Patch Panels
Purpose: Cluster interconnect cabling
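
On the software side, this fabric is typically consumed through NCCL. A hedged example of the environment variables commonly used to steer collectives onto the InfiniBand HCAs rather than the management Ethernet is shown below; the HCA and interface names are site-specific placeholders.

    # Typical NCCL settings for keeping collective traffic on the InfiniBand
    # fabric. "mlx5" and "eno1" are site-specific placeholder names.
    import os

    os.environ.setdefault("NCCL_IB_HCA", "mlx5")          # use the InfiniBand HCAs
    os.environ.setdefault("NCCL_SOCKET_IFNAME", "eno1")   # bootstrap over mgmt Ethernet
    os.environ.setdefault("NCCL_DEBUG", "WARN")           # raise to INFO when diagnosing

    # torch.distributed / NCCL reads these when the process group is created.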

Power & Cooling Requirements

Critical infrastructure planning for high-density GPU deployment.

Power Budget

Component (Qty): Max Power / Typical
  • GPU Servers, 8U (5): 51 kW / 45 kW
  • Storage Nodes (6): 6 kW / 4.5 kW
  • CPU Servers (4): 3.2 kW / 2.4 kW
  • Network Switches (4): 2.4 kW / 1.8 kW
  • Total: ~63 kW max / ~54 kW typical
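
The cooling capacity in the next section follows from this power budget (1 kW is roughly 3,412 BTU/hr); a quick cross-check:

    # Cross-check: total IT load versus the quoted cooling capacity.
    max_kw = 5 * 10.2 + 6.0 + 3.2 + 2.4   # GPU + storage + CPU + network = 62.6 kW
    btu_per_hr = max_kw * 3412            # ~213,600 BTU/hr, within the ~220,000 BTU/hr budget
    print(round(max_kw, 1), round(btu_per_hr))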

Cooling Requirements

  • Cooling Capacity: ~220,000 BTU/hr
  • Recommended Cooling: Direct Liquid Cooling (DLC)
  • Alternative: Rear-door Heat Exchanger
  • Inlet Temperature: 18°C – 27°C
  • Airflow: ~15,000 CFM
  • Rack Density: 30-40 kW per rack

Electrical Infrastructure

  • Input Power: 3-Phase 415V AC
  • Circuit Capacity: 2× 100A 3-Phase
  • UPS Capacity: 80 kVA (minimum)
  • Generator Backup: Required (100 kVA+)
  • PDU per Rack: 2× 60A 3-Phase

Physical Space

  • Rack Count: 2-3 Full Racks
  • Floor Space: ~6 m² (rack footprint)
  • Weight per Rack: ~1,200 kg
  • Floor Load: Reinforced flooring required
  • Ceiling Height: 3m+ recommended

Software Stack

Operating System

  • Ubuntu 22.04 LTS (HPC optimized)
  • RHEL 9.x / Rocky Linux 9
  • NVIDIA GPU Driver 535+
  • CUDA 12.x Toolkit
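
A quick way to confirm the driver and CUDA items above are in place on a node, assuming PyTorch from the framework list is installed:

    # Node-level sanity check for driver, CUDA runtime, and GPU visibility.
    import torch

    print("CUDA available :", torch.cuda.is_available())
    print("CUDA runtime   :", torch.version.cuda)          # expect 12.x
    print("Visible GPUs   :", torch.cuda.device_count())   # expect 8 on a GPU node
    for i in range(torch.cuda.device_count()):
        print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")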

AI/ML Frameworks

  • PyTorch 2.x + DeepSpeed
  • TensorFlow 2.x + Horovod
  • NVIDIA NeMo, Megatron-LM
  • JAX, Hugging Face Transformers

Cluster Management

  • Slurm Workload Manager
  • Kubernetes + NVIDIA GPU Operator
  • Lustre / GPFS / BeeGFS
  • Prometheus + Grafana Monitoring
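
To show how Slurm ties multi-tenant scheduling to the GPU nodes, here is a hedged sketch that renders and submits a batch job for a two-node, 16-GPU training run. The partition name, resource numbers, and train.py are hypothetical placeholders; real jobs would follow the site's scheduling policy.

    # Render and submit a Slurm batch job for a 2-node, 16-GPU run.
    # Partition, time limit, and script names are hypothetical placeholders.
    import subprocess
    import textwrap

    job = textwrap.dedent("""\
        #!/bin/bash
        #SBATCH --job-name=llm-train
        #SBATCH --partition=gpu
        #SBATCH --nodes=2
        #SBATCH --ntasks-per-node=1
        #SBATCH --gres=gpu:8
        #SBATCH --cpus-per-task=96
        #SBATCH --time=24:00:00
        MASTER=$(scontrol show hostnames "$SLURM_NODELIST" | head -n1)
        srun torchrun --nnodes=2 --nproc_per_node=8 --rdzv_backend=c10d --rdzv_endpoint="$MASTER:29500" train.py
        """)

    with open("submit_train.sh", "w") as f:
        f.write(job)
    subprocess.run(["sbatch", "submit_train.sh"], check=True)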

Reference Architecture Disclaimer

This reference architecture is provided for planning and discussion purposes. GPU availability is subject to NVIDIA allocation and lead times. Actual configurations may vary based on specific research workloads, facility capabilities, and budget. Liquid cooling infrastructure may require additional site preparation. The final BOQ will be prepared after detailed requirements analysis and a site assessment.

RDP Hardware Portfolio

Make in India certified infrastructure manufactured at our Hyderabad facility.

Ready to Build Your AI Research Infrastructure?

Get a customized BOQ based on your research workloads, GPU requirements, and facility capabilities.