This post discusses the Up/Down InfiniBand routing algorithm. The material is fairly basic, but readers should have a good understanding of networking and be familiar with InfiniBand concepts.

Multiple InfiniBand routing engines can be configured on the network, such as Min Hop, Up Down, Down Up, Fat Tree, etc. (see opensm). In Clos/fat tree networks, the most commonly used InfiniBand routing algorithms are Up/Down (UpDn) and Fat Tree.

Note: This includes tree structures built with director switches and 1U switches. Two levels of physical switch enclosures represent three levels of switch ASICs, because each director switch contains two ASIC levels.

1. Up/Down (UpDn) routing algorithm

Like most InfiniBand routing algorithms, UpDn uses the shortest available path between any two endpoints. It can route any set of InfiniBand switches and HCAs. Unlike MinHop, UpDn guarantees that there are no credit loops in the fabric.

UpDn starts with a list of "root" switch ASICs, which form the top level of the fabric. This list is set with the subnet manager (SM) flag --root_guid_file. This is a simple text file with one line per root ASIC, each line holding that ASIC's globally unique ID (GUID). Although UpDn has the option of automatically discovering the root ASICs, it is strongly recommended to provide a root_guid_file. If a root switch ASIC is replaced or the topology is extended, the root_guid_file must be updated, and every SM must have an identical copy of the GUID file.

The UpDn algorithm starts with the root switch ASICs, which are assigned distance 0. The algorithm then finds every switch ASIC that is one link away from a root switch. These ASICs are at distance 1 because they are one hop away from a root. Next, the algorithm finds all switch ASICs two hops away from a root switch, which are at distance 2. This process continues until every switch ASIC has been assigned a distance from the roots. The following figure shows an example of a 3-layer fabric with assigned distances.

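The distance assignment described above can be sketched as a breadth-first search that starts from all roots at once. This is only an illustration, not the opensm implementation: the function name `assign_distances`, the switch names, and the small 3-layer fabric are all hypothetical.

```python
from collections import deque

def assign_distances(links, roots):
    """Multi-source BFS: hop count from each switch ASIC to its nearest root."""
    distance = {root: 0 for root in roots}   # roots are at distance 0
    queue = deque(roots)
    while queue:
        switch = queue.popleft()
        for neighbor in links.get(switch, ()):
            if neighbor not in distance:      # first visit is the shortest one
                distance[neighbor] = distance[switch] + 1
                queue.append(neighbor)
    return distance

# Hypothetical 3-layer fabric: two roots, two middle switches, two leaves.
fabric = {
    "spine1": ["mid1", "mid2"],
    "spine2": ["mid1", "mid2"],
    "mid1": ["spine1", "spine2", "leaf1", "leaf2"],
    "mid2": ["spine1", "spine2", "leaf1", "leaf2"],
    "leaf1": ["mid1", "mid2"],
    "leaf2": ["mid1", "mid2"],
}

distances = assign_distances(fabric, roots=["spine1", "spine2"])
print(distances)
```

In this toy fabric the spines end up at distance 0, the middle switches at distance 1, and the leaves at distance 2.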
This process generates a breadth-first spanning tree (BFSP), which is similar to the Spanning Tree Protocol (STP) method used in Ethernet. Unlike STP, UpDn allows multiple roots and tries to provide as many paths as possible between each pair of end nodes.

The UpDn algorithm then finds all possible shortest paths between each pair of end nodes. Next, it discards any path that hops from a distance-N ASIC to a distance-N+1 ASIC and then back to an ASIC at distance N. In other words, it discards any path that goes "down" (away from the roots) and then "up" (towards the roots). Legal paths can go up, go down, go up and then down, or remain at the same level, but they cannot go down and then up. By discarding these paths and not configuring them in the switches, UpDn ensures that the routes contain no logical loops or credit loops that could cause traffic to stall.

The following figure shows examples of allowed and disallowed paths:

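The "no down and then up" rule can be expressed as a short check over a path's switch sequence. The sketch below is a simplified illustration with hypothetical names (`is_legal_updn_path`, the switch labels). It treats hops between switches at the same distance as neutral, following the description above, which is an assumption about how such hops are handled.

```python
def is_legal_updn_path(path, distance):
    """Return False if the path ever goes up after it has gone down."""
    gone_down = False
    for here, there in zip(path, path[1:]):
        if distance[there] > distance[here]:
            gone_down = True                  # moving away from the roots
        elif distance[there] < distance[here] and gone_down:
            return False                      # up after down: DnUp segment
    return True

# Hypothetical distances from the roots, as produced by the ranking step.
distance = {"spine1": 0, "mid1": 1, "mid2": 1, "leaf1": 2, "leaf2": 2}

up_then_down = ["leaf1", "mid1", "spine1", "mid2", "leaf2"]
down_then_up = ["mid1", "leaf1", "mid2"]

print(is_legal_updn_path(up_then_down, distance))  # True
print(is_legal_updn_path(down_then_up, distance))  # False
```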
Note: The two potential paths between nodes E and F have the same length (the same number of hops), but only one of them obeys the UpDn rules. The path that is not allowed contains a DnUp segment.

The credit-loop-free property of UpDn (and Fat Tree) routing is very important for reliable network operation. However, because some potential paths are discarded, a pair of end nodes may be left without a route and unable to communicate with each other. Setting the calculate_missing_routes option to TRUE (the default value) in the opensm configuration file makes UpDn and Fat Tree routing compute the missing routes in a credit-loop-free way, ensuring connectivity between all end nodes in the fabric.

For example, consider a different fabric in which some nodes (G, H and J) are connected to switches above the leaf level. Nodes connected to L1 switches (A, B, C, etc.) have legal UpDn paths to nodes G, H and J. There is a legal UpDn path between nodes G and H. However, there is no legal path between G and J, so these nodes cannot communicate with each other. Setting calculate_missing_routes to TRUE provides all end nodes with credit-loop-free routes.
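To see how strict UpDn rules can leave pairs of end nodes without a route, the sketch below searches for legal paths in a small hypothetical fabric. The topology, node placement, and function names are assumptions chosen for illustration and are not taken from the figure. The search tests whether any legal path exists, not only shortest ones.

```python
from collections import deque
from itertools import combinations

def assign_distances(links, roots):
    """Multi-source BFS: hop count from each switch to its nearest root."""
    distance = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        switch = queue.popleft()
        for neighbor in links[switch]:
            if neighbor not in distance:
                distance[neighbor] = distance[switch] + 1
                queue.append(neighbor)
    return distance

def has_legal_path(links, distance, src, dst):
    """BFS over (switch, gone_down) states; a down hop forbids later up hops."""
    start = (src, False)
    seen = {start}
    queue = deque([start])
    while queue:
        switch, gone_down = queue.popleft()
        if switch == dst:
            return True
        for neighbor in links[switch]:
            going_up = distance[neighbor] < distance[switch]
            going_down = distance[neighbor] > distance[switch]
            if going_up and gone_down:
                continue                      # would create a DnUp segment
            state = (neighbor, gone_down or going_down)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False

# Hypothetical fabric: two pods (A, B), two planes, one root per plane.
edges = [("S1", "MA1"), ("S1", "MB1"), ("S2", "MA2"), ("S2", "MB2"),
         ("MA1", "LA1"), ("MA2", "LA1"), ("MB1", "LB1"), ("MB2", "LB1")]
links = {}
for a, b in edges:
    links.setdefault(a, []).append(b)
    links.setdefault(b, []).append(a)

# End nodes and the switch each one is attached to (G, H, J sit above the leaves).
attached = {"A": "LA1", "B": "LB1", "G": "MA1", "H": "MB1", "J": "MA2"}

distance = assign_distances(links, roots=["S1", "S2"])
missing = [(x, y) for x, y in combinations(sorted(attached), 2)
           if not has_legal_path(links, distance, attached[x], attached[y])]
print(missing)  # [('G', 'J'), ('H', 'J')]
```

In this toy fabric every pair that involves a leaf-attached node has a legal route, while G and J (and also H and J) can only reach each other through a down-then-up segment. Pairs like these are what the calculate_missing_routes option is described as covering.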
There may be cases where some nodes do not need to communicate with each other (for example, storage nodes that never talk to one another). However, this situation is rare. For a Clos-5 three-layer fabric, the best practice is not to connect nodes to L2 switches.

Note: The figure above applies to two different situations: a fabric built from three layers of 1U switches, and a fabric that uses two director switches with 1U switches below them. In the latter case, nodes E, F and G represent nodes connected to the leaf modules of the director switches.

2. Scatter ports

When assigning logical paths to physical links, the UpDn algorithm tries to map the same number of paths to each link to maximize the utilization of available bandwidth. This balancing is done statically, with no knowledge of the actual workload or traffic patterns. The path balancing decision is made locally on each switch, and nothing is assumed about the physical topology. The resulting path allocation may not be optimal for a typical Clos/Fat Tree workload.

Both the MinHop and UpDn routing engines provide a routing option called "scatter-ports". It instructs the routing algorithm to randomize the local assignment of paths to links, which usually leads to better link utilization. The scatter-ports option takes an integer parameter that is used as the seed for the random number generator. It is suggested to use a prime number as the seed; a seed of zero turns randomization off.

Note: The scatter-ports configuration is only available when the SM runs on a host (or UFM); it is not supported when the SM runs on a switch.
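The idea behind scatter-ports can be illustrated with a toy model of one switch choosing among equal-cost ports. This is a sketch under stated assumptions, not opensm's actual code: the function `assign_paths`, the port numbers, and the tie-breaking policy are hypothetical. Only the seed semantics (non-zero seed randomizes, zero disables) follow the description above.

```python
import random
from collections import Counter

def assign_paths(destinations, candidate_ports, seed=0):
    """Map each destination to one equal-cost port, keeping port loads even."""
    load = {port: 0 for port in candidate_ports}
    rng = random.Random(seed) if seed else None   # seed 0: no randomization
    table = {}
    for dest in destinations:
        least = min(load.values())
        best = [p for p in candidate_ports if load[p] == least]
        # Without a seed, always take the first least-loaded port.
        port = rng.choice(best) if rng else best[0]
        table[dest] = port
        load[port] += 1
    return table

destinations = list(range(8))     # hypothetical destination identifiers
ports = [1, 2, 3, 4]              # hypothetical equal-cost uplink ports

fixed = assign_paths(destinations, ports, seed=0)
scattered = assign_paths(destinations, ports, seed=7)

print(fixed)                        # destinations cycle through ports 1..4 in order
print(Counter(scattered.values()))  # every port still carries two destinations
```

Both tables give each port the same number of paths. The seeded variant only changes which destinations share a port, so a regular traffic pattern is less likely to line up with one link on every switch.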