At the SC20 supercomputing conference, NVIDIA announced the NVIDIA® Mellanox® 400G InfiniBand product family, the world's first end-to-end networking solution to reach 400Gb/s. It delivers the fastest network interconnect performance available to AI and HPC users worldwide while integrating computing, programmability, and software definition, making it the industry's leading software-defined, hardware-accelerated programmable network and giving researchers and engineers new options for designing next-generation computing systems and improving application performance.

NVIDIA Mellanox InfiniBand NDR is the seventh generation of InfiniBand. Using 100Gb/s PAM4 SerDes technology, it achieves 400Gb/s of bandwidth per port, twice that of the previous generation, while new and more powerful acceleration engines provide greater computing and communication capability.

The NVIDIA Mellanox NDR 400G InfiniBand product line

"Speed of light" performance is the first feature of NDR InfiniBand. Doubled bandwidth and a faster packet-processing rate further improve the performance of applications built on advanced communication technologies such as RDMA, GPUDirect RDMA, and GPUDirect Storage. InfiniBand is a natural software-defined network (SDN): users can choose the topology best suited to their applications, such as Fat-Tree, DragonFly+, or various Torus layouts, to achieve the best performance. With a DragonFly+ topology, for example, one million nodes can communicate within four switch hops, a scale far beyond what exascale machines, or even 10-exascale machines, require. The same SDN nature also makes dynamic routing and network congestion control easier to implement.
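The one-million-node claim can be illustrated with back-of-the-envelope arithmetic. The sketch below assumes a DragonFly+ layout built from radix-64 switches in which each switch splits its ports evenly between the levels below and above it; these design rules are illustrative assumptions, not vendor-published parameters.

```python
# Rough scale of a DragonFly+ topology built from radix-64 switches.
# Assumptions (illustrative only):
#   - each leaf switch: half its ports to hosts, half to spines
#   - each spine switch: half its ports to leaves, half to global links
RADIX = 64
half = RADIX // 2  # 32

leaves_per_group = half                            # one leaf uplink per spine
spines_per_group = half
hosts_per_group = leaves_per_group * half          # 32 * 32 = 1024
global_links_per_group = spines_per_group * half   # 32 * 32 = 1024

# Fully connected groups: one global link to every other group.
max_groups = global_links_per_group + 1            # 1025
max_hosts = max_groups * hosts_per_group           # just over one million

# Worst-case path: host -> leaf -> spine -> (global link) -> spine -> leaf -> host
switch_hops = 4

print(f"hosts per group: {hosts_per_group}")
print(f"max groups:      {max_groups}")
print(f"max hosts:       {max_hosts:,}")
print(f"switch hops:     {switch_hops}")
```

Under these assumptions a fully connected set of groups reaches roughly 1.05 million hosts while any two hosts remain at most four switch hops apart, consistent with the figures quoted above.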
InfiniBand dynamic routing is widely used across network topologies and has become a key means of optimizing communication performance. For example, the Summit supercomputer at Oak Ridge National Laboratory in the United States raised its communication efficiency from 60% to 96% through dynamic routing.

An NDR InfiniBand switch supports 64 ports of 400Gb/s or 128 ports of 200Gb/s in 1U, three times the port density of the previous generation. The modular switch system's aggregate bidirectional throughput grows five-fold to 1.64 petabits per second, making it the switch with the highest port count and largest switching capacity in the world.

Hardware acceleration is the defining feature of InfiniBand. As more acceleration engines are added to InfiniBand hardware, its lead over other network technologies widens. For example, NDR InfiniBand offloads All-to-All and Allreduce communication, long a headache in the industry, to hardware, improving MPI communication performance by up to four times, and its hardware offload of MPI tag matching improves MPI communication performance by 1.8 times. NDR InfiniBand can also fully offload NVMe over Fabrics (NVMe-oF): target-side offload lets a storage system reach millions of IOPS with almost no CPU consumption on the target, while NVMe SNAP offloads the initiator side of NVMe-oF.
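The tag-matching work that NDR moves from the host CPU into the adapter can be sketched in software. This is a minimal toy model of MPI's matching semantics (posted-receive queue, unexpected-message queue, wildcards); the names `PostedQueue` and `ANY` are illustrative, not an NVIDIA or MPI API.

```python
# Toy model of MPI tag matching: the bookkeeping that NDR InfiniBand
# offloads to hardware. Messages and receives match on (source, tag),
# with ANY standing in for MPI_ANY_SOURCE / MPI_ANY_TAG wildcards.
from collections import deque

ANY = -1

class PostedQueue:
    def __init__(self):
        self.posted = deque()      # receives posted by the application
        self.unexpected = deque()  # messages that arrived before any match

    def post_recv(self, source, tag):
        """Post a receive; match an already-arrived message if possible."""
        for i, (src, t, payload) in enumerate(self.unexpected):
            if source in (src, ANY) and tag in (t, ANY):
                del self.unexpected[i]
                return payload               # matched an unexpected message
        self.posted.append((source, tag))
        return None                          # will match a future arrival

    def on_arrival(self, src, t, payload):
        """Matching performed on message arrival (in-order scan)."""
        for i, (source, tag) in enumerate(self.posted):
            if source in (src, ANY) and tag in (t, ANY):
                del self.posted[i]
                return payload               # delivered straight to its buffer
        self.unexpected.append((src, t, payload))
        return None

q = PostedQueue()
q.post_recv(source=ANY, tag=7)
print(q.on_arrival(src=3, t=7, payload=b"hello"))  # matches the posted recv
```

In software, these queue scans consume CPU cycles on every message; performing the same matching in adapter hardware is what yields the speedup quoted above.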
At the same time, InfiniBand can present an emulated NVMe disk to the host CPU, which solves the problem that many operating systems currently lack NVMe-oF initiator support and extends NVMe-oF to any OS, whether virtualized or bare metal. InfiniBand FIO SNAP can likewise emulate local file storage, so any OS can enjoy the performance advantages of the most advanced distributed file storage systems.

InfiniBand SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) eliminates the incast burst caused by many-to-one communication in the Allreduce operations of MPI or NCCL. While keeping all ports at full line rate, with aggregate data input of 12.8Tb/s or 25.6Tb/s, the switch itself performs AllReduce, Barrier, Reduce, and Broadcast computations, and in-network compute performance on the NDR switch is 32 times that of the previous generation.

InfiniBand SHIELD self-healing technology lets the network repair link faults on its own, without waiting for management software to intervene, recovering more than 1,000 times faster than traditional software-based fault recovery, so applications are no longer disrupted by link failures and application performance improves.

InfiniBand security offload targets cloud-native scenarios. InfiniBand is supported in official OpenStack software, and with built-in hardware capabilities such as IPsec, TLS, AES, and Root of Trust, data can be encrypted and decrypted at line-rate performance as it flows through the network or is written to storage, providing security in virtualized and containerized environments.

Software programmability further extends InfiniBand's application scenarios.
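The incast problem that SHARP-style in-network reduction avoids can be shown with a small counting model: in a naive many-to-one reduce, the root receives one full vector from every other node, while in a reduction tree each switch forwards only a single partial sum upward. This is a hypothetical toy model for intuition, not the actual SHARP protocol.

```python
# Toy comparison of many-to-one reduce vs. in-network tree reduction.

def vectors_at_root_naive(n_nodes):
    """Naive many-to-one reduce: the root receives n-1 full vectors (incast)."""
    return n_nodes - 1

def upward_vectors_tree(n_nodes, radix):
    """Tree of switches, each summing up to `radix` inputs and sending
    one partial result upward; returns total reduced vectors sent up."""
    total = 0
    level = n_nodes
    while level > 1:
        level = -(-level // radix)  # ceil division: switches one level up
        total += level              # one partial sum leaves each switch
    return total

n, radix = 1024, 32
print("vectors converging on the root (naive):", vectors_at_root_naive(n))
print("partial sums sent upward in total (tree):", upward_vectors_tree(n, radix))
```

For 1,024 nodes and radix-32 switches, the naive scheme funnels 1,023 full vectors into one endpoint, while the tree sends only 33 partial sums in total and no single switch ever sees more than 32 inputs, which is why the burst at the root disappears.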
Programmable NDR InfiniBand lets users manipulate not only packet headers but also the data path itself. Users can define custom rules that operate on the data path, or preprocess data directly in the network instead of sending it to the CPU first. They can also extract the communication characteristics of application data and train AI models on them to learn the typical communication patterns of different applications, proactively alerting the administrator when anomalous communication is detected.

NVIDIA Mellanox NDR 400G InfiniBand highlights

With its excellent performance and flexible usage scenarios, NDR InfiniBand has attracted many partners to its ecosystem, including server vendors such as Atos, Dell Technologies, Fujitsu, Inspur, Lenovo, and Supermicro, and storage vendors such as DDN and IBM Storage, which have begun developing next-generation products that support NDR InfiniBand. Top users worldwide, including the Microsoft Azure public cloud, Los Alamos National Laboratory in the United States, and the Jülich Supercomputing Centre in Europe, have said they look forward to bringing NDR InfiniBand into their operations as soon as possible and enjoying its technological advantages.

Gilad Shainer, senior vice president of networking at NVIDIA, said: "The most important job for our AI customers is to process increasingly complex applications, which requires a faster, smarter and more scalable network. The massive throughput and intelligent acceleration engines of NVIDIA Mellanox 400G InfiniBand help HPC, AI and hyperscale cloud infrastructures achieve unmatched performance at lower cost and complexity."

The era of exascale AI and HPC has arrived, bringing new challenges. The software-defined, hardware-accelerated, in-network-computing programmable NDR InfiniBand products will begin sampling in the second quarter of 2021.
The arrival of NDR products will greatly improve the performance and efficiency of exascale AI and HPC systems, simplify system management and operation, and reduce system TCO, protecting data-center investments.