RTGWG Working Group                                              R. Wang
Internet Draft                                              China Mobile
Intended status: Informational                                    C. Lin
Expires: September 3, 2024                          New H3C Technologies
                                                                 W. Wang
                                                            China Mobile
                                                                W. Cheng
                                                            China Mobile
                                                           March 3, 2024


                Routing mechanism in Dragonfly Networks
           Gap Analysis, Problem Statement, and Requirements
             draft-wang-rtgwg-dragonfly-routing-problem-01

Abstract

   This document provides a gap analysis of existing routing mechanisms
   in dragonfly networks, describes the fundamental problems, and
   defines the requirements for technical improvements.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 3, 2024.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Wang, et al.           Expires September 3, 2024                [Page 1]

Internet-Draft    Dragonfly Routing Problem Statement         March 2024

Table of Contents

   1. Introduction
      1.1. Requirements Language
      1.2. Terminology
   2. Existing Mechanisms
      2.1. Basic Topology
      2.2. Routing mechanisms in Dragonfly network
   3. Gap Analysis
      3.1. Load Imbalance
      3.2. Adaptive Routing Notifications
   4. Problem Statement
   5. Requirements for Dragonfly network Mechanisms
   6. Security Considerations
   7. IANA Considerations
   8. References
      8.1. Normative References
      8.2. Informative References
   Authors' Addresses

1. Introduction

   A dragonfly network is a high-performance interconnection network
   architecture that is commonly used in large-scale computing
   environments. It consists of a collection of interconnected groups,
   with each group containing several computing resources such as
   processors, storage devices, and nodes. The nodes within each group
   communicate with each other using a high-speed local network, while
   the groups themselves are connected through a global network.
   Dragonfly networks are designed to provide high-bandwidth and
   low-latency communication, making them well suited to applications
   that require large-scale data processing and intensive computing.
   Overall, dragonfly networks offer a scalable, efficient, and
   flexible solution for connecting hundreds or even thousands of
   computing resources in a parallel computing environment.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

1.2. Terminology

   Group: A set of nodes organized into a physical topology and
   interconnected by a high-speed network.

   Inter-group link: A link connecting different groups.

   Routing: The path or strategy by which data packets are transmitted
   through the network.

   Topology: The physical and logical layout of the network. The
   dragonfly network is a type of topology.

   Routing algorithm: The algorithm that determines the path or
   strategy by which data packets are transmitted through the network.

   Congestion control: Adjusting the transmission rate, routing method,
   and similar parameters when there is too much traffic in the
   network, in order to avoid network congestion.

   MR: Minimal Routing

   NMR: Non-Minimal Routing

   AR: Adaptive Routing

   VLB: Valiant Load-Balanced Routing

2. Existing Mechanisms

2.1. Basic Topology

     N N N N N N    N N N N N N     N N N N N N
     | | | | | |    | | | | | |     | | | | | |
    ++-+-+-+-+-++  ++-+-+-+-+-++   ++-+-+-+-+-++
    |    G1     |  |    G2     |...|    G8     |
    +-+---+----++  ++----+----++   ++---+----+-+
      |   |    |    |    |    |     |   |    |
      |   |    +----+    |    +-----+   |    |
      |   +--------------)--------------+    |
    +-+------------------+-------------------+-+
    |   +------------------------------+       |
    |   |                              |  G0   |
    | +-+-+          +---+           +-+-+     |
    | |R0 +----------+R1 +-----------+ R2|     |
    | ++-++          ++-++           ++-++     |
    |  | |            | |             | |      |
    +--)-)------------)-)-------------)-)------+
       | |            | |             | |
       N N            N N             N N

                Figure 1: Dragonfly network diagram

   The Dragonfly network shown in Figure 1 has a total of 9 groups (G),
   each consisting of 3 routers (R). Each router is connected to 2
   nodes (N). Groups are connected to each other through inter-group
   links. Within each group, routers are connected to each other and
   to their nodes through high-speed intra-group links.

   For data communication within a group, it is typically sufficient to
   forward traffic only through the links within the group. For data
   communication between groups, traffic needs to be forwarded through
   both the links within each group and the inter-group links.

   The specific path selection in the Dragonfly network is typically
   determined by the routing protocol used in the network. The routing
   protocol is responsible for dynamically determining the best path
   for data packets to travel from the source to the destination.

   Various topologies can be used to form the intra-group connectivity.
   A typical intra-group topology is a fully connected graph where all
   switches are directly connected to each other. An example of such an
   intra-group topology is shown in the G0 group in Figure 1. The
   intra-group connectivity in the Cascade architecture is a
   2-dimensional all-to-all mesh.

2.2. Routing mechanisms in Dragonfly network

   This section briefly introduces the existing routing mechanisms in
   dragonfly networks. Several mechanisms are in common use, each with
   its own advantages and disadvantages:

   o  Minimal Routing is the simplest and most commonly used routing
      mechanism in Dragonfly networks. It uses the path with the fewest
      channels to deliver data quickly to the destination node. The
      advantage of MR is that it is easy to implement and has low
      latency. However, since MR considers only the path with the
      fewest channels, the risk of load imbalance is relatively high.

   o  Non-Minimal Routing is a routing mechanism that avoids load
      imbalance by choosing a path other than the one with the fewest
      channels. The advantage of NMR is its flexibility: it can balance
      the load of network communication and can reduce latency when
      the minimal path is congested. However, NMR is more complex,
      requiring more computational resources and communication
      overhead.

   o  Adaptive Routing is a mechanism that dynamically adjusts the
      routing path according to the observed network congestion
      status. AR's strength lies in its adaptability, which allows it
      to control traffic in high-load situations and prevent
      congestion. The disadvantage is that its implementation is
      complex and requires more sophisticated algorithms and
      computational resources.

   o  Valiant Load-Balanced Routing uses the classic Valiant algorithm,
      which routes traffic through a randomly selected intermediate
      point, to select paths across the global network and then uses
      load-balancing routing algorithms between groups. The advantage
      of VLB is that it can achieve load balancing across the whole
      network. The disadvantage is that it is complex, requiring more
      resources and computational cost.

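   As a rough illustration of the difference between the minimal and
   Valiant mechanisms listed above, the following Python sketch builds
   group-level paths for the 9-group network of Figure 1. It is a
   simplification under stated assumptions, not a mechanism defined by
   this document: groups are numbered 0 to 8, every pair of groups is
   assumed to share a direct inter-group link, intra-group hops are
   ignored, and the names minimal_group_path, valiant_group_path, and
   global_hops are hypothetical.

```python
import random

NUM_GROUPS = 9  # assumption: groups G0..G8 as drawn in Figure 1


def minimal_group_path(src, dst):
    """Minimal routing: use the direct inter-group link, if one is needed."""
    return [src] if src == dst else [src, dst]


def valiant_group_path(src, dst, num_groups=NUM_GROUPS, rng=random):
    """Valiant routing: detour through a randomly chosen intermediate group."""
    if src == dst:
        return [src]
    candidates = [g for g in range(num_groups) if g not in (src, dst)]
    if not candidates:  # only two groups exist, so no detour is possible
        return [src, dst]
    return [src, rng.choice(candidates), dst]


def global_hops(path):
    """Number of inter-group links traversed by a group-level path."""
    return len(path) - 1


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    direct = minimal_group_path(1, 8)
    detour = valiant_group_path(1, 8, rng=rng)
    print(direct, global_hops(direct))
    print(detour, global_hops(detour))
```

   In this sketch the minimal path crosses one inter-group link and the
   Valiant path crosses two, which is the extra cost paid for spreading
   traffic over more links.
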
   Overall, the choice of routing mechanism in dragonfly networks
   requires a balance between performance, cost, and other factors and
   depends on specific application scenarios and requirements.

3. Gap Analysis

3.1. Load Imbalance

   When a Dragonfly network uses the minimal routing mechanism, load
   imbalance occurs easily because the routing path is fixed and the
   communication volume between different groups may differ. When load
   imbalance occurs, the groups with larger communication volume may
   become congested, affecting the overall performance of the network.

   A load balancing mechanism is needed that can distribute the load
   between optimal and non-optimal links to avoid congestion on the
   main link. There are several load balancing mechanisms that can
   achieve this goal. One common approach is to use a combination of
   Equal-Cost Multi-Path (ECMP) routing and Link Aggregation. ECMP
   distributes traffic across multiple paths of equal cost, while Link
   Aggregation combines multiple physical links into a single logical
   link to increase bandwidth and provide redundancy.

   However, the non-minimal routing mechanism must calculate the
   distance of all possible paths, which requires more communication
   and computational resources, and it cannot completely avoid load
   imbalance.

   The adaptive routing mechanism can dynamically adjust the routing
   path according to network congestion, making the network more
   adaptable to different traffic patterns. However, the computational
   cost of this mechanism is high, and it consumes some of the network
   bandwidth, which may affect application performance.

   The Valiant load-balancing routing algorithm can achieve load
   balancing across the entire network, but its design and
   implementation are complex and require more computing and
   communication resources.
   Although it can improve routing reliability and fault tolerance, it
   may not be necessary to adopt this mechanism in small-scale
   networks.

   Because of the random network topology used in the Dragonfly
   network, the distance between nodes within the network is random,
   which may cause unnecessary redundancy in routing and reduce routing
   efficiency.

3.2. Adaptive Routing Notifications

   Dynamic adjustment of flow paths based on load is a traffic
   scheduling approach that chooses the optimal flow path for data
   packets based on the load of nodes (or links) in the network. It
   usually relies on two techniques: one is based on traffic
   measurement, and the other is based on protocol exchange between
   routers or switches.

   The traffic measurement-based technique measures the traffic in the
   network using network analysis tools or dedicated hardware embedded
   in routers or switches. Once nodes or links with high load are
   detected, the flow path can be adjusted automatically to relieve the
   load. The protocol exchange-based technique, on the other hand,
   relies on protocol communication between routers or switches to
   obtain the current network topology and node load data, and flow
   paths are adjusted based on this information.

   In this way, network administrators can ensure that the best data
   flow path is available at any time, thereby maximizing network
   performance, reducing latency, and avoiding network congestion. At
   the same time, dynamic adjustment of flow paths also makes the
   network more robust, enabling it to adapt automatically to adverse
   events such as changes in network topology and node failures.

   Whether based on traffic measurement or on protocol exchange between
   routers or switches, devices need to be able to communicate the
   current network performance in a quantitative manner.

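   The load-based path adjustment described above can be sketched as
   follows. This is only an illustration: this document does not define
   a selection rule, so the sketch borrows a UGAL-style heuristic from
   the dragonfly literature, which weights the queue occupancy observed
   for each candidate path by that path's hop count and prefers the
   minimal path on ties. The CandidatePath type, its fields, and the
   functions path_cost and choose_path are hypothetical names, and the
   queue depths are made-up inputs standing in for measured load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePath:
    name: str         # label for the path, e.g. "minimal"
    hops: int         # number of links on the path
    queue_depth: int  # measured occupancy of the first-hop output queue


def path_cost(path):
    """Estimated delay: queue occupancy weighted by path length."""
    return path.queue_depth * path.hops


def choose_path(minimal, non_minimal):
    """Pick the cheaper path; ties go to the minimal path."""
    if path_cost(minimal) <= path_cost(non_minimal):
        return minimal
    return non_minimal


if __name__ == "__main__":
    # Light load: both queues are short, so the shorter path wins.
    idle = choose_path(CandidatePath("minimal", 3, 2),
                       CandidatePath("non-minimal", 5, 2))
    # Congested minimal path: the longer detour has the lower estimate.
    busy = choose_path(CandidatePath("minimal", 3, 20),
                       CandidatePath("non-minimal", 5, 4))
    print(idle.name)  # minimal: 3*2 = 6 <= 5*2 = 10
    print(busy.name)  # non-minimal: 3*20 = 60 > 5*4 = 20
```

   A rule of this kind only works if every device reports load in the
   same units, which is why a quantitative, commonly understood load
   metric matters.
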
   Traffic measurement requires the use of network analysis tools or
   dedicated hardware embedded in routers or switches to measure
   traffic in the network and obtain information about node load. This
   node load data needs to be translated into digital data and sent to
   the control plane through protocols or interfaces such as SNMP
   (Simple Network Management Protocol), NetFlow, etc.

   Protocol exchange uses protocol communication between routers or
   switches, such as OpenFlow, IS-IS, etc., to obtain the current
   network topology and node load information through the control
   plane. This information is often encoded into digital formats and
   transmitted to the operation plane through network transmission
   protocols.

   Adaptive routing notifications are a communication mechanism used to
   relay routing information and network load within a network. These
   notifications can be messages between nodes or between switches and
   routers.

   In the Dragonfly network, adaptive routing notifications are used to
   implement adaptive routing mechanisms. When the network load reaches
   a certain level, nodes and switches use notifications to choose
   routing paths dynamically. For example, during network congestion,
   switches and routers send notifications to prompt nodes to redirect
   traffic to different ports or nodes. These notifications can also
   carry other information about network congestion and load balancing,
   such as bandwidth usage, device load and performance, and traffic
   rates.

   The benefit of using adaptive routing notifications in the Dragonfly
   network is that they enable real-time adjustment of routing paths
   for nodes and switches, avoiding congestion and improving network
   performance.
   Additionally, adaptive routing notifications help network
   administrators identify and resolve network issues more easily, for
   example by pinpointing congestion points and routing bottlenecks.

   In summary, adaptive routing notifications play a significant role
   in the Dragonfly network and are a crucial component in implementing
   adaptive routing mechanisms.

   Regardless of the approach, communication between devices needs to
   be standardized and routinized to achieve self-adaptation and
   interoperability across devices. This is critical to building
   adaptive networks.

4. Problem Statement

   The current problem with the Dragonfly network is the lack of a
   concise and effective routing protocol for load balancing between
   optimal and non-optimal links.

   Another problem is that dynamic load balancing requires a
   standardized way to quantify network performance and to communicate
   it between devices.

5. Requirements for Dragonfly network Mechanisms

   In the Dragonfly architecture, the routing protocol is a crucial
   component that guides packet transmission and route selection. The
   routing protocol in the Dragonfly architecture has the following
   requirements:

   *  Low latency: Low latency is essential in the Dragonfly
      architecture. Therefore, the routing protocol must be fast and
      efficient to ensure that packets are transmitted to the
      destination node promptly.

   *  Load balancing: Load balancing is important in the Dragonfly
      architecture, and the routing protocol needs to support multiple
      available paths for load balancing. The routing protocol should
      dynamically select among multiple available paths to ensure fast
      packet transmission and distribute the load across network
      connections.

   *  Scalability: The Dragonfly architecture is typically deployed at
      large scale with a large number of nodes communicating with each
      other. Hence, the routing protocol needs to be scalable and
      capable of supporting route selection and packet transmission
      among a large number of nodes.

   *  Adaptability: The network topology in the Dragonfly architecture
      can change over time. The routing protocol needs to be adaptive
      and capable of re-computing optimal paths when the network
      topology changes, ensuring the selection of the best path for
      packet transmission.

   *  Reliability: The routing protocol in the Dragonfly architecture
      needs to ensure packet reliability. It should support link
      failure detection and recovery to ensure that packets can be
      correctly transmitted to the destination node in the event of
      link failures.

   In summary, the routing protocol is a critical component in the
   Dragonfly architecture, requiring support for low latency, load
   balancing, scalability, adaptability, and reliability. Only when
   these requirements are fulfilled can the routing protocol operate
   reliably in the Dragonfly architecture and provide efficient support
   for network communication.

6. Security Considerations

   TBD.

7. IANA Considerations

   This document does not request any IANA allocations.

8. References

8.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
             2119 Key Words", BCP 14, RFC 8174,
             DOI 10.17487/RFC8174, May 2017.

8.2. Informative References

   TBD

Authors' Addresses

   Ruixue Wang
   China Mobile
   China

   Email: wangruixue@chinamobile.com


   Changwang Lin
   New H3C Technologies
   China

   Email: linchangwang.04414@h3c.com


   Wenxuan Wang
   China Mobile
   China

   Email: wangwenxuan@chinamobile.com


   Weiqiang Cheng
   China Mobile
   China

   Email: chengweiqiang@chinamobile.com