GTC 2022 Session Highlights - HPC: Supercomputing
Apr 17, 2022
About 1,000 sessions in total were presented at this year's GTC 2022. This series summarizes the highlights by major topic; this third installment looks at HPC: Supercomputing.
1. LLM (Large Language Model) Training
2. Data Center: Networking/Virtualization/Cloud
3. HPC: Supercomputing
1. Supercomputer Performance, Meet Cloud Versatility
[AI at Microsoft]
[Azure Cognitive Services]
[Azure Machine Learning]
[Innovation without boundaries]
2. Azure AI Supercomputing VMs: AI at Any Scale
[Azure HPC & AI breakthroughs]
[GPU Products in Azure]
[Azure Accelerated Visualization]
[NV v5 — Future Visualization Platform]
[GPU-P: Right vGPU profile for a user persona]
[Powerful Infrastructure Options for Every Workload]
[NDm A100 v4 — Distributed AI training Platform]
[NC A100 v4 - Future mid-range compute platform]
[NC A10 v4 - Future inference/light compute platform]
[Purpose and GPU type for each Azure instance]
3. Cloud Native Supercomputing Next Phase: Multi-tenant Performance Isolation
[NVIDIA Quantum-2]
- 400G NDR InfiniBand Cloud-Native Supercomputing
[In-Network Computing Accelerated Supercomputing]
[SHARP (Scalable Hierarchical Aggregation & Reduction Protocol)] (an All-Reduce sketch follows the list below)
- In-network Tree based Aggregation Mechanism
- Multiple Simultaneous Outstanding Operations
- Small Message and Large Message Reduction
- Barrier, Reduce, All-Reduce, Broadcast and More
- Sum, Min, Max, Min-loc, Max-loc, OR, XOR, AND
- Integer and Floating-Point, 16/32/64 bits
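To make the list above concrete, here is a minimal MPI All-Reduce in C, the kind of collective that SHARP can offload to the Quantum-2 switch tree. This is an illustrative sketch, not code from the session: the MPI calls are standard, and whether the reduction actually runs in-network depends on the communication stack in use (for example HPC-X/HCOLL or NCCL's SHARP plugin), which is not shown here.

```c
/* Minimal All-Reduce sketch (illustrative, not from the session).
 * Standard MPI code; whether the reduction is offloaded to the
 * InfiniBand switch tree via SHARP is decided by the communication
 * stack configuration, not by anything in this program. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes one FP64 value (one of the data types listed above). */
    double local = (double)(rank + 1);
    double global = 0.0;

    /* Sum reduction across all ranks: with SHARP enabled, the partial sums
     * are aggregated in-network instead of being staged through the hosts. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %.1f\n", size, global);

    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and launched with mpirun or srun as usual; the same All-Reduce pattern is what dominates gradient synchronization in distributed training, which is why in-network aggregation matters at scale.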
[Microsoft Azure VM Family]
[Azure HPC/AI VM Series]
[Azure HBv3]
- Ideal for traditional HPC/MPI workloads
[Azure NDv4]
- Ideal for large-scale AI/DL training workloads
[Azure NCv4 (Preview)]
- Ideal for AI training/inference workloads
[InfiniBand Features in Azure]
[Adaptive Routing]
References
[1] Supercomputer Performance, Meet Cloud Versatility
[2] Azure AI Supercomputing VMs: AI at Any Scale
[3] Cloud Native Supercomputing Next Phase: Multi-tenant Performance Isolation