| rfc9623.original | rfc9623.txt | |||
|---|---|---|---|---|
| TAPS Working Group A. Brunstrom, Ed. | Internet Engineering Task Force (IETF) A. Brunstrom, Ed. | |||
| Internet-Draft Karlstad University | Request for Comments: 9623 Karlstad University | |||
| Intended status: Informational T. Pauly, Ed. | Category: Informational T. Pauly, Ed. | |||
| Expires: 16 June 2024 Apple Inc. | ISSN: 2070-1721 Apple Inc. | |||
| R. Enghardt | R. Enghardt | |||
| Netflix | Netflix | |||
| P. Tiesel | P.S. Tiesel | |||
| SAP SE | SAP SE | |||
| M. Welzl | M. Welzl | |||
| University of Oslo | University of Oslo | |||
| 14 December 2023 | December 2024 | |||
| Implementing Interfaces to Transport Services | Implementing Interfaces to Transport Services | |||
| draft-ietf-taps-impl-18 | ||||
| Abstract | Abstract | |||
| The Transport Services system enables applications to use transport | The Transport Services system enables applications to use transport | |||
| protocols flexibly for network communication and defines a protocol- | protocols flexibly for network communication and defines a protocol- | |||
| independent Transport Services Application Programming Interface | independent Transport Services Application Programming Interface | |||
| (API) that is based on an asynchronous, event-driven interaction | (API) that is based on an asynchronous, event-driven interaction | |||
| pattern. This document serves as a guide to implementing such a | pattern. This document serves as a guide to implementing such a | |||
| system. | system. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
| provisions of BCP 78 and BCP 79. | published for informational purposes. | |||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
| and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
| time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
| material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Not all documents | |||
| approved by the IESG are candidates for any level of Internet | ||||
| Standard; see Section 2 of RFC 7841. | ||||
| This Internet-Draft will expire on 16 June 2024. | Information about the current status of this document, any errata, | |||
| and how to provide feedback on it may be obtained at | ||||
| https://www.rfc-editor.org/info/rfc9623. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2023 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
| provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
| in the Revised BSD License. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction | 1. Introduction | |||
| 2. Implementing Connection Objects | 2. Implementing Connection Objects | |||
| 3. Implementing Pre-Establishment | 3. Implementing Preestablishment | |||
| 3.1. Configuration-time errors | 3.1. Configuration-Time Errors | |||
| 3.2. Role of system policy | 3.2. Role of System Policy | |||
| 4. Implementing Connection Establishment | 4. Implementing Connection Establishment | |||
| 4.1. Structuring Candidates as a Tree | 4.1. Structuring Candidates as a Tree | |||
| 4.1.1. Branch Types | 4.1.1. Branch Types | |||
| 4.1.2. Branching Order-of-Operations | 4.1.2. Branching Order-of-Operations | |||
| 4.1.3. Sorting Branches | 4.1.3. Sorting Branches | |||
| 4.2. Candidate Gathering | 4.2. Candidate Gathering | |||
| 4.2.1. Gathering Endpoint Candidates | 4.2.1. Gathering Endpoint Candidates | |||
| 4.3. Candidate Racing | 4.3. Candidate Racing | |||
| 4.3.1. Simultaneous | 4.3.1. Simultaneous | |||
| 4.3.2. Staggered | 4.3.2. Staggered | |||
| 4.3.3. Failover | 4.3.3. Failover | |||
| 4.4. Completing Establishment | 4.4. Completing Establishment | |||
| 4.4.1. Determining Successful Establishment | 4.4.1. Determining Successful Establishment | |||
| 4.5. Establishing multiplexed connections | 4.5. Establishing Multiplexed Connections | |||
| 4.6. Handling connectionless protocols | 4.6. Handling Connectionless Protocols | |||
| 4.7. Implementing Listeners | 4.7. Implementing Listeners | |||
| 4.7.1. Implementing Listeners for Connected Protocols | 4.7.1. Implementing Listeners for Connected Protocols | |||
| 4.7.2. Implementing Listeners for Connectionless | 4.7.2. Implementing Listeners for Connectionless Protocols | |||
| Protocols | 4.7.3. Implementing Listeners for Multiplexed Protocols | |||
| 4.7.3. Implementing Listeners for Multiplexed Protocols | 5. Implementing Sending and Receiving Data | |||
| 5. Implementing Sending and Receiving Data | 5.1. Sending Messages | |||
| 5.1. Sending Messages | 5.1.1. Message Properties | |||
| 5.1.1. Message Properties | 5.1.2. Send Completion | |||
| 5.1.2. Send Completion | 5.1.3. Batching Sends | |||
| 5.1.3. Batching Sends | 5.2. Receiving Messages | |||
| 5.2. Receiving Messages | 5.3. Handling of Data for Fast-Open Protocols | |||
| 5.3. Handling of data for fast-open protocols | 6. Implementing Message Framers | |||
| 6. Implementing Message Framers | 6.1. Defining Message Framers | |||
| 6.1. Defining Message Framers | 6.2. Sender-Side Message Framing | |||
| 6.2. Sender-side Message Framing | 6.3. Receiver-Side Message Framing | |||
| 6.3. Receiver-side Message Framing | 7. Implementing Connection Management | |||
| 7. Implementing Connection Management | 7.1. Pooled Connection | |||
| 7.1. Pooled Connection | 7.2. Handling Path Changes | |||
| 7.2. Handling Path Changes | 8. Implementing Connection Termination | |||
| 8. Implementing Connection Termination | 9. Cached State | |||
| 9. Cached State | 9.1. Protocol State Caches | |||
| 9.1. Protocol state caches | 9.2. Performance Caches | |||
| 9.2. Performance caches | 10. Specific Transport Protocol Considerations | |||
| 10. Specific Transport Protocol Considerations | 10.1. TCP | |||
| 10.1. TCP | 10.2. MPTCP | |||
| 10.2. MPTCP | 10.3. UDP | |||
| 10.3. UDP | 10.4. UDP-Lite | |||
| 10.4. UDP-Lite | 10.5. UDP Multicast Receive | |||
| 10.5. UDP Multicast Receive | 10.6. SCTP | |||
| 10.6. SCTP | 11. IANA Considerations | |||
| 11. IANA Considerations | 12. Security Considerations | |||
| 12. Security Considerations | 12.1. Considerations for Candidate Gathering | |||
| 12.1. Considerations for Candidate Gathering | 12.2. Considerations for Candidate Racing | |||
| 12.2. Considerations for Candidate Racing | 13. References | |||
| 13. Acknowledgements | 13.1. Normative References | |||
| 14. References | 13.2. Informative References | |||
| 14.1. Normative References | Appendix A. API Mapping Template | |||
| 14.2. Informative References | Appendix B. Reasons for Errors | |||
| Appendix A. API Mapping Template | Appendix C. Existing Implementations | |||
| Appendix B. Reasons for errors | Acknowledgements | |||
| Appendix C. Existing Implementations | Authors' Addresses | |||
| Authors' Addresses | ||||
| 1. Introduction | 1. Introduction | |||
| The Transport Services architecture [I-D.ietf-taps-arch] defines a | The Transport Services architecture [RFC9621] defines a system that | |||
| system that allows applications to flexibly use transport networking | allows applications to flexibly use transport networking protocols. | |||
| protocols. The API that such a system exposes to applications is | The API that such a system exposes to applications is defined as the | |||
| defined as the Transport Services API [I-D.ietf-taps-interface]. | Transport Services API [RFC9622]. This API is designed to be generic | |||
| This API is designed to be generic across multiple transport | across multiple transport protocols and sets of protocol features. | |||
| protocols and sets of protocol features. | ||||
| This document serves as a guide to implementing a system that | This document serves as a guide to implementing a system that | |||
| provides a Transport Services API. This guide offers suggestions to | provides a Transport Services API. This guide offers suggestions to | |||
| developers, but it is not prescriptive: implementations are free to | developers, but it is not prescriptive: implementations are free to | |||
| take any desired form as long as the API specification in | take any desired form as long as the API specification defined in | |||
| [I-D.ietf-taps-interface] is honored. It is the job of an | [RFC9622] is honored. It is the job of an implementation of a | |||
| implementation of a Transport Services system to turn the requests of | Transport Services system to turn the requests of an application into | |||
| an application into decisions on how to establish connections, and | decisions on how to establish connections and how to transfer data | |||
| how to transfer data over those connections once established. The | over those connections once established. The terminology used in | |||
| terminology used in this document is based on the Transport Services | this document is based on the terminology defined in the Transport | |||
| architecture [I-D.ietf-taps-arch]. | Services architecture [RFC9621]. | |||
| 2. Implementing Connection Objects | 2. Implementing Connection Objects | |||
| The connection objects that are exposed to applications for Transport | The Connection objects that are exposed to applications for Transport | |||
| Services are: | Services are: | |||
| * the Preconnection, the bundle of properties that describes the | * the Preconnection, the bundle of properties that describes the | |||
| application constraints on, and preferences for, the transport; | application constraints on, and preferences for, the transport; | |||
| * the Connection, the basic object that represents a flow of data as | * the Connection, the basic object that represents a flow of data as | |||
| Messages in either direction between the Local and Remote | Messages in either direction between the Local and Remote | |||
| Endpoints; | Endpoints; | |||
| * and the Listener, a passive waiting object that delivers new | * and the Listener, a passive waiting object that delivers new | |||
| Connections. | Connections. | |||
| Preconnection objects should be implemented as bundles of properties | Preconnection objects should be implemented as bundles of properties | |||
| that an application can both read and write. A Preconnection object | that an application can both read and write. A Preconnection object | |||
| influences a Connection only at one point in time: when the | influences a Connection only at one point in time: when the | |||
| Connection is created. Connection objects represent the interface | Connection is created. Connection objects represent the interface | |||
| between the application and the implementation to manage transport | between the application and the implementation to manage transport | |||
| state, and conduct data transfer. During the process of | state and conduct data transfer. During the process of establishment | |||
| establishment (Section 4), the Connection will not necessarily be | (Section 4), the Connection will not necessarily be immediately bound | |||
| immediately bound to a transport protocol instance, since multiple | to a transport protocol instance, since multiple candidate Protocol | |||
| candidate Protocol Stacks might be raced. | Stacks might be raced. | |||
| Once a Preconnection has been used to create an outbound Connection | Once a Preconnection has been used to create an outbound Connection | |||
| or a Listener, the implementation should ensure that the copy of the | or a Listener, the implementation should ensure that the copy of the | |||
| properties held by the Connection or Listener cannot be mutated by | properties held by the Connection or Listener cannot be mutated by | |||
| the application making changes to the original Preconnection object. | the application making changes to the original Preconnection object. | |||
| This may involve the implementation performing a deep-copy, copying | This may involve the implementation performing a deep-copy, copying | |||
| the object with all the objects that it references. | the object with all the objects that it references. | |||
| Once the Connection is established, the Transport Services | Once the Connection is established, the Transport Services | |||
| Implementation maps actions and events to the details of the chosen | Implementation maps actions and events to the details of the chosen | |||
| Protocol Stack. For example, the same Connection object may | Protocol Stack. For example, the same Connection object may | |||
| ultimately represent a single transport protocol instance (e.g., a | ultimately represent a single transport protocol instance (e.g., a | |||
| TCP connection, a TLS session over TCP, a UDP flow with fully- | TCP connection, a TLS session over TCP, a UDP flow with fully | |||
| specified Local and Remote Endpoint Identifiers, a DTLS session, a | specified Local and Remote Endpoint Identifiers, a DTLS session, a | |||
| SCTP stream, a QUIC stream, or an HTTP/2 stream). The Connection | Stream Control Transmission Protocol (SCTP) stream, a QUIC stream, or | |||
| Properties held by a Connection or Listener are independent of other | an HTTP/2 stream). The Connection Properties held by a Connection or | |||
| Connections that are not part of the same Connection Group. | Listener are independent of other Connections that are not part of | |||
| the same Connection Group. | ||||
| Connection establishment is only a local operation for a | Connection establishment is only a local operation for connectionless | |||
| connectionless protocols, which serves to simplify the local send/ | protocols, which serves to simplify the local send/receive functions | |||
| receive functions and to filter the traffic for the specified | and to filter the traffic for the specified addresses and ports | |||
| addresses and ports [RFC8085] (for example using UDP or UDP-Lite | [RFC8085] (for example, using UDP or UDP-Lite transport without a | |||
| transport without a connection handshake procedure). | connection handshake procedure). | |||
| Once Initiate has been called, the Selection Properties and Endpoint | Once Initiate has been called, the Selection Properties and Endpoint | |||
| information of the created Connection are immutable (i.e, an | information of the created Connection are immutable (i.e., an | |||
| application is not able to later modify the properties of a | application is not able to later modify the properties of a | |||
| Connection by manipulating the original Preconnection object). | Connection by manipulating the original Preconnection object). | |||
| Listener objects are created with a Preconnection, at which point | Listener objects are created with a Preconnection, at which point | |||
| their configuration should be considered immutable by the | their configuration should be considered immutable by the | |||
| implementation. The process of listening is described in | implementation. The process of listening is described in | |||
| Section 4.7. | Section 4.7. | |||
| 3. Implementing Pre-Establishment | 3. Implementing Preestablishment | |||
| The pre-establishment phase allows applications to specify properties | The preestablishment phase allows applications to specify properties | |||
| for the Connections that they are about to make, or to query the API | for the Connections that they are about to make or to query the API | |||
| about potential Connections they could make. | about potential Connections they could make. | |||
| During pre-establishment the application specifies one or more | During preestablishment, the application specifies one or more | |||
| Endpoints to be used for communication as well as protocol | Endpoints to be used for communication as well as protocol | |||
| preferences and constraints via Selection Properties and, if desired, | preferences and constraints via Selection Properties and, if desired, | |||
| also Connection Properties. Section 4 of [I-D.ietf-taps-interface] | also Connection Properties. Section 4 of [RFC9622] states that | |||
| states that Connection Properties should preferably be configured | Connection Properties should preferably be configured during | |||
| during pre-establishment, because they can serve as input to | preestablishment because they can serve as input to decisions that | |||
| decisions that are made by the implementation (e.g., the capacity | are made by the implementation (e.g., the capacity profile can guide | |||
| profile can guide usage of a protocol offering scavenger-type | usage of a protocol offering scavenger-type congestion control). | |||
| congestion control). | ||||
| The implementation stores these properties as a part of the | The implementation stores these properties as a part of the | |||
| Preconnection object for use during connection establishment. For | Preconnection object for use during connection establishment. For | |||
| Selection Properties that are not provided by the application, the | Selection Properties that are not provided by the application, the | |||
| implementation uses the default values specified in the Transport | implementation uses the default values specified in the Transport | |||
| Services API ([I-D.ietf-taps-interface]). | Services API ([RFC9622]). | |||
| 3.1. Configuration-time errors | 3.1. Configuration-Time Errors | |||
| The Transport Services system should have a list of supported | The Transport Services system should have a list of supported | |||
| protocols available, which each have transport features reflecting | protocols available, each of which has transport features reflecting | |||
| the capabilities of the protocol. Once an application specifies its | the capabilities of the protocol. Once an application specifies its | |||
| Transport Properties, the Transport Services system matches the | Transport Properties, the Transport Services system matches the | |||
| required and prohibited properties against the transport features of | required and prohibited properties against the transport features of | |||
| the available protocols (see Section 6.2 of [I-D.ietf-taps-interface] | the available protocols (see Section 6.2 of [RFC9622] for the | |||
| for the definition of property preferences). | definition of property preferences). | |||
| In the following cases, failure should be detected during pre- | In the following cases, failure should be detected during | |||
| establishment: | preestablishment: | |||
| * A request by an application for properties that cannot be | * A request by an application for properties that cannot be | |||
| satisfied by any of the available protocols. For example, if an | satisfied by any of the available protocols. For example, if an | |||
| application requires perMsgReliability, but no such feature is | application requires perMsgReliability, but no such feature is | |||
| available in any protocol on the host running the Transport | available in any protocol on the host running the Transport | |||
| Services system this should result in an error. | Services system, this should result in an error. | |||
| * A request by an application for properties that are in conflict | * A request by an application for properties that are in conflict | |||
| with each other, such as specifying required and prohibited | with each other, such as specifying required and prohibited | |||
| properties that cannot be satisfied by any protocol. For example, | properties that cannot be satisfied by any protocol. For example, | |||
| if an application prohibits reliability but then requires | if an application prohibits reliability but then requires | |||
| perMsgReliability, this mismatch should result in an error. | perMsgReliability, this mismatch should result in an error. | |||
| To avoid allocating resources that are not finally needed, it is | To avoid allocating resources that are not needed, it is important | |||
| important that configuration-time errors fail as early as possible. | that configuration-time errors fail as early as possible. | |||
| 3.2. Role of system policy | 3.2. Role of System Policy | |||
| The properties specified during pre-establishment have a close | The properties specified during preestablishment have a close | |||
| relationship to system policy. The implementation is responsible for | relationship to system policy. The implementation is responsible for | |||
| combining and reconciling several different sources of preferences | combining and reconciling several different sources of preferences | |||
| when establishing Connections. These include, but are not limited | when establishing Connections. These include, but are not limited | |||
| to: | to: | |||
| 1. Application preferences, i.e., preferences specified during the | 1. Application preferences, i.e., preferences specified during | |||
| pre-establishment via Selection Properties. | preestablishment via Selection Properties. | |||
| 2. Dynamic system policy, i.e., policy compiled from internally and | 2. Dynamic system policy, i.e., policy compiled from internally and | |||
| externally acquired information about available network | externally acquired information about available network | |||
| interfaces, supported transport protocols, and current/previous | interfaces, supported transport protocols, and current/previous | |||
| Connections. Examples of ways to externally retrieve policy- | Connections. Examples of ways to externally retrieve policy- | |||
| support information are through OS-specific statistics/ | support information are through OS-specific statistics/ | |||
| measurement tools and tools that reside on middleboxes and | measurement tools and tools that reside on middleboxes and | |||
| routers. | routers. | |||
| 3. Default implementation policy, i.e., predefined policy by OS or | 3. Default implementation policy, i.e., predefined policy by the OS | |||
| application. | or application. | |||
| In general, any protocol or path used for a Connection must conform | In general, any protocol or path used for a Connection must conform | |||
| to all three sources of constraints. A violation that occurs at any | to all three sources of constraints. A violation that occurs at any | |||
| of the policy layers should cause a protocol or path to be considered | of the policy layers should cause a protocol or path to be considered | |||
| ineligible for use. If such a violation prevents a Connection from | ineligible for use. If such a violation prevents a Connection from | |||
| being established, this should be communicated to the application, | being established, this should be communicated to the application, | |||
| e.g. via the EstablishmentError event. For an example of application | e.g., via the EstablishmentError event. For an example of | |||
| preferences leading to constraints, an application may prohibit the | application preferences leading to constraints, an application may | |||
| use of metered network interfaces for a given Connection to avoid | prohibit the use of metered network interfaces for a given Connection | |||
| user cost. Similarly, the system policy at a given time may prohibit | to avoid user cost. Similarly, the system policy at a given time may | |||
| the use of such a metered network interface from the application's | prohibit the use of such a metered network interface from the | |||
| process. Lastly, the implementation itself may default to | application's process. Lastly, the implementation itself may default | |||
| disallowing certain network interfaces unless explicitly requested by | to disallowing certain network interfaces unless explicitly requested | |||
| the application. | by the application. | |||
| It is expected that the database of system policies and the method of | It is expected that the database of system policies and the method of | |||
| looking up these policies will vary across various platforms. An | looking up these policies will vary across various platforms. An | |||
| implementation should attempt to look up the relevant policies for | implementation should attempt to look up the relevant policies for | |||
| the system in a dynamic way to make sure it is reflecting an accurate | the system in a dynamic way to make sure it reflects an accurate | |||
| version of the system policy, since the system's policy regarding the | version of the system policy, since the system's policy regarding the | |||
| application's traffic may change over time due to user or | application's traffic may change over time due to user or | |||
| administrative changes. | administrative changes. | |||
| 4. Implementing Connection Establishment | 4. Implementing Connection Establishment | |||
| The process of establishing a network connection begins when an | The process of establishing a network connection begins when an | |||
| application expresses intent to communicate with a Remote Endpoint by | application expresses intent to communicate with a Remote Endpoint by | |||
| calling Initiate, at which point the Preconnection object contains | calling Initiate, at which point the Preconnection object contains | |||
| all constraints or requirements the application has configured. The | all constraints or requirements the application has configured. The | |||
| establishment process can be considered complete once there is at | establishment process can be considered complete once there is at | |||
| least one Protocol Stack that has completed any required setup to the | least one Protocol Stack that has completed any required setup to the | |||
| point that it can transmit and receive the application's data. | point that it can transmit and receive the application's data. | |||
| Connection establishment is divided into two top-level steps: | Connection establishment is divided into two top-level steps: | |||
| Candidate Gathering (defined in Section 4.2.1 of | ||||
| [I-D.ietf-taps-arch]), to identify the paths, protocols, and | * Candidate Gathering (defined in Section 4.2.1 of [RFC9621]) to | |||
| endpoints to use (see Section 4.2); and Candidate Racing (defined in | identify the paths, protocols, and endpoints to use (see | |||
| Section 4.2.2 of [I-D.ietf-taps-arch]), in which the necessary | Section 4.2) and | |||
| protocol handshakes are conducted so that the Transport Services | ||||
| system can select which set to use (see Section 4.3). Candidate | * Candidate Racing (defined in Section 4.2.2 of [RFC9621]), in which | |||
| Racing involves attempting multiple options for connection | the necessary protocol handshakes are conducted so that the | |||
| establishment, and choosing the first option to succeed as the | Transport Services system can select which set to use (see | |||
| Section 4.3). | ||||
| Candidate Racing involves attempting multiple options for connection | ||||
| establishment and choosing the first option to succeed as the | ||||
| Protocol Stack to use for the connection. These attempts are usually | Protocol Stack to use for the connection. These attempts are usually | |||
| staggered, starting each next option after a delay, but they can also | staggered, with each next option starting after a delay; however, | |||
| be performed in parallel or only after waiting for failures. | they can also be performed in parallel or after failures occur. | |||
| For ease of illustration, this document structures the candidates for | For ease of illustration, this document structures the candidates for | |||
| racing as a tree (see Section 4.1). This is not meant to restrict | racing as a tree (see Section 4.1). This is not meant to restrict | |||
| implementations from structuring racing candidates differently. | implementations from structuring racing candidates differently. | |||
| The most simple example of this process might involve identifying the | The simplest example of this process might involve identifying the | |||
| single IP address to which the implementation wishes to connect, | single IP address to which the implementation wishes to connect, | |||
| using the system's current default path (i.e., using the default | using the system's current default path (i.e., using the default | |||
| interface), and starting a TCP handshake to establish a stream to the | interface), and starting a TCP handshake to establish a stream to the | |||
| specified IP address. However, each step may also differ depending | specified IP address. However, each step may also differ depending | |||
| on the requirements of the connection: if the Endpoint Identifier is | on the requirements of the connection: | |||
| a hostname and port, then there may be multiple resolved addresses | ||||
| that are available; there may also be multiple paths available, (in | * if the Endpoint Identifier is a hostname and port, then there may | |||
| this case using an interface other than the default system | be multiple resolved addresses that are available; | |||
| interface); and some protocols may not need any transport handshake | ||||
| to be considered "established" (such as UDP), while other connections | * there may also be multiple paths available (in this case using an | |||
| may utilize layered protocol handshakes, such as TLS over TCP. | interface other than the default system interface); and | |||
| * some protocols may not need any transport handshake to be | ||||
| considered "established" (such as UDP), while other connections | ||||
| may utilize layered protocol handshakes, such as TLS over TCP. | ||||
| Whenever an implementation has multiple options for connection | Whenever an implementation has multiple options for connection | |||
| establishment, it can view the set of all individual connection | establishment, it can view the set of all individual connection | |||
| establishment options as a single, aggregate connection | establishment options as a single aggregate connection establishment. | |||
| establishment. The aggregate set conceptually includes every valid | The aggregate set conceptually includes every valid combination of | |||
| combination of endpoints, paths, and protocols. As an example, | endpoints, paths, and protocols. As an example, consider an | |||
| consider an implementation that initiates a TCP connection to a | implementation that initiates a TCP connection to a hostname + port | |||
| hostname + port Endpoint Identifier, and has two valid interfaces | Endpoint Identifier and that has two valid interfaces available (Wi- | |||
| available (Wi-Fi and LTE). The hostname resolves to a single IPv4 | Fi and LTE). The hostname resolves to a single IPv4 address on the | |||
| address on the Wi-Fi network, and resolves to the same IPv4 address | Wi-Fi network, to the same IPv4 address on the LTE network, and to a | |||
| on the LTE network, as well as a single IPv6 address. The aggregate | single IPv6 address. The aggregate set of connection establishment | |||
| set of connection establishment options can be viewed as follows: | options can be viewed as follows, with the Endpoint Identifier | |||
| abbreviated as “EId”: | ||||
| Aggregate [Endpoint Identifier: www.example.com:443] [Interface: Any] [Protocol: TCP] | Aggregate [EId: example.com:443] [Interface: Any] [Protocol: TCP] | |||
| |-> [Endpoint Identifier: [2001:db8:23::1]:443] [Interface: Wi-Fi] [Protocol: TCP] | |-> [EId: [3fff:23::1]:443] [Interface: Wi-Fi] [Protocol: TCP] | |||
| |-> [Endpoint Identifier: 192.0.2.1:443] [Interface: LTE] [Protocol: TCP] | |-> [EId: 192.0.2.1:443] [Interface: LTE] [Protocol: TCP] | |||
| |-> [Endpoint Identifier: [2001:db8:42::1]:443] [Interface: LTE] [Protocol: TCP] | |-> [EId: [3fff:42::1]:443] [Interface: LTE] [Protocol: TCP] | |||
| Any one of these sub-entries on the aggregate connection attempt | Any one of these subentries on the aggregate connection attempt would | |||
| would satisfy the original application intent. The concern of this | satisfy the original application intent. The concern of this section | |||
| section is the algorithm defining which of these options to try, | is the algorithm defining which of these options to try, when to try | |||
| when, and in what order. | them, and in what order. | |||
| During Candidate Gathering (Section 4.2), an implementation prunes | During Candidate Gathering (Section 4.2), an implementation prunes | |||
| and sorts branches according to the Selection Property preferences | and sorts branches according to the Selection Property preferences | |||
| (Section 6.2 of [I-D.ietf-taps-interface]. It first excludes all | (Section 6.2 of [RFC9622]). First, it excludes all protocols and | |||
| protocols and paths that match a Prohibit property or do not match | paths that match a Prohibit property or do not match all Require | |||
| all Require properties. Then it will sort branches according to | properties. Then, it will sort branches according to Preferred | |||
| Preferred properties, Avoided properties, and possibly other | properties, Avoided properties, and, possibly, other criteria. | |||
| criteria. | ||||
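The prune-then-sort step described above can be sketched as follows. This is a minimal illustration, not the algorithm defined by the Transport Services API: the `Candidate` type, the property names, and the scoring rule (count of matched Prefer properties, then count of matched Avoid properties) are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Preference(Enum):
    REQUIRE = "require"
    PREFER = "prefer"
    AVOID = "avoid"
    PROHIBIT = "prohibit"


@dataclass
class Candidate:
    name: str
    # Hypothetical set of property names that this branch satisfies.
    features: set = field(default_factory=set)


def prune(candidates, prefs):
    """Drop candidates that match a Prohibit or miss a Require property."""
    kept = []
    for c in candidates:
        prohibited = any(p is Preference.PROHIBIT and name in c.features
                         for name, p in prefs.items())
        unmet = any(p is Preference.REQUIRE and name not in c.features
                    for name, p in prefs.items())
        if not (prohibited or unmet):
            kept.append(c)
    return kept


def rank(candidates, prefs):
    """Sort by most Prefer matches, then fewest Avoid matches (assumed rule)."""
    def score(c):
        preferred = sum(1 for name, p in prefs.items()
                        if p is Preference.PREFER and name in c.features)
        avoided = sum(1 for name, p in prefs.items()
                      if p is Preference.AVOID and name in c.features)
        return (-preferred, avoided)
    # sorted() is stable, so equally ranked candidates keep their input order.
    return sorted(candidates, key=score)
```

An implementation would likely fold cached performance estimates and system policy into the sort key as well; they are left out here to keep the sketch short.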
| 4.1. Structuring Candidates as a Tree | 4.1. Structuring Candidates as a Tree | |||
| As noted above, the consideration of multiple candidates in a | As noted above, the consideration of multiple candidates in a | |||
| gathering and racing process can be conceptually structured as a | gathering and racing process can be conceptually structured as a | |||
| tree; this terminological convention is used throughout this | tree; this terminological convention is used throughout this | |||
| document. | document. | |||
| Each leaf node of the tree represents a single, coherent connection | Each leaf node of the tree represents a single coherent connection | |||
| attempt, with an endpoint, a network path, and a set of protocols | attempt with an endpoint, a network path, and a set of protocols that | |||
| that can directly negotiate and send data on the network. Each node | can directly negotiate and send data on the network. Each node in | |||
| in the tree that is not a leaf represents a connection attempt that | the tree that is not a leaf represents a connection attempt that is | |||
| is either underspecified, or else includes multiple distinct options. | either underspecified or includes multiple distinct options. For | |||
| For example, when connecting on an IP network, a connection attempt | example, when connecting on an IP network, a connection attempt to a | |||
| to a hostname and port is underspecified, because the connection | hostname and port is underspecified because the connection attempt | |||
| attempt requires a resolved IP address as its Remote Endpoint | requires a resolved IP address as its Remote Endpoint Identifier. In | |||
| Identifier. In this case, the node represented by the connection | this case, the node represented by the connection attempt to the | |||
| attempt to the hostname is a parent node, with child nodes for each | hostname is a parent node with child nodes for each IP address. | |||
| IP address. Similarly, an implementation that is allowed to connect | Similarly, an implementation that is allowed to connect using | |||
| using multiple interfaces will have a parent node of the tree for the | multiple interfaces will have a parent node of the tree for the | |||
| decision between the network paths, with a branch for each interface. | decision between the network paths with a branch for each interface. | |||
| The example aggregate connection attempt above can be drawn as a tree | The example aggregate connection attempt above can be drawn as a tree | |||
| by grouping the addresses resolved on the same interface into | by grouping the addresses resolved on the same interface into | |||
| branches: | branches: | |||
| || | || | |||
| +==============================+ | +============================+ | |||
| | www.example.com:443/any path | | www.example.com:443/any path | |||
| +==============================+ | +============================+ | |||
| // \\ | // \\ | |||
| +===========================+ +===========================+ | +=========================+ +=======================+ | |||
| | www.example.com:443/Wi-Fi | | www.example.com:443/LTE | | www.example.com:443/Wi-Fi www.example.com:443/LTE | |||
| +===========================+ +===========================+ | +=========================+ +=======================+ | |||
| || // \\ | || // \\ | |||
| +============================+ +=====================+ +==========================+ | +======================+ +=================+ +====================+ | |||
| | [2001:db8:23::1]:443/Wi-Fi | | 192.0.2.1:443/LTE | | [2001:db8:42::1]:443/LTE | | [3fff:23::1]:443/Wi-Fi 192.0.2.1:443/LTE [3fff:42::1]:443/LTE | |||
| +============================+ +=====================+ +==========================+ | +======================+ +=================+ +====================+ | |||
| The rest of this section will use a notation scheme to represent this | The rest of this section will use a notation scheme to represent this | |||
| tree. The root node (or parent node) of the tree will be represented | tree. The root node (or parent node) of the tree will be represented | |||
| by a single integer, such as "1". ("1" is used assuming that this is | by a single integer, such as "1". ("1" is used assuming that this is | |||
| the first connection made by the system; future connections created | the first connection made by the system; future connections created | |||
| by the application would allocate numbers in an increasing manner.) | by the application would allocate numbers in an increasing manner.) | |||
| Each child of that node will have an integer that identifies it, from | Each child of that node will have an integer that identifies it, from | |||
| 1 to the number of children. That child node will be uniquely | 1 to the number of children. That child node will be uniquely | |||
| identified by concatenating its integer to its parent's identifier | identified by concatenating its integer to its parent's identifier | |||
| with a dot in between, such as "1.1" and "1.2". Each node will be | with a dot character (".") in between, such as "1.1" and "1.2". Each | |||
| summarized by a tuple of three elements: endpoint, path (labeled here | node will be summarized by a tuple of three elements: endpoint, path | |||
| by interface), and protocol. In Protocol Stacks, the layers are | (labeled here by interface), and protocol. In Protocol Stacks, the | |||
| separated by '/' and ordered with the protocol closest to the | layers are separated by a slash character ("/") and ordered with the | |||
| application first. The above example can now be written more | protocol closest to the application first. The above example can now | |||
| succinctly as: | be written more succinctly as: | |||
| 1 [www.example.com:443, any path, TCP] | 1 [www.example.com:443, any path, TCP] | |||
| 1.1 [www.example.com:443, Wi-Fi, TCP] | 1.1 [www.example.com:443, Wi-Fi, TCP] | |||
| 1.1.1 [[2001:db8:23::1]:443, Wi-Fi, TCP] | 1.1.1 [[2001:db8:23::1]:443, Wi-Fi, TCP] | |||
| 1.2 [www.example.com:443, LTE, TCP] | 1.2 [www.example.com:443, LTE, TCP] | |||
| 1.2.1 [192.0.2.1:443, LTE, TCP] | 1.2.1 [192.0.2.1:443, LTE, TCP] | |||
| 1.2.2 [[2001:db8:42::1]:443, LTE, TCP] | 1.2.2 [[2001:db8:42::1]:443, LTE, TCP] | |||
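As an illustration of the notation scheme, the following sketch walks a candidate tree and produces the dotted identifiers and three-element tuples used in the listing above. The `Node` class and `render` function are names invented for this example; the root identifier is passed in so that later connections can use a higher number.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    endpoint: str
    path: str
    protocol: str
    children: list = field(default_factory=list)


def render(node, ident="1"):
    """Return one line per node, labeling children ident.1, ident.2, ..."""
    lines = [f"{ident} [{node.endpoint}, {node.path}, {node.protocol}]"]
    for index, child in enumerate(node.children, start=1):
        lines.extend(render(child, f"{ident}.{index}"))
    return lines


# The example tree from this section.
tree = Node("www.example.com:443", "any path", "TCP", [
    Node("www.example.com:443", "Wi-Fi", "TCP", [
        Node("[2001:db8:23::1]:443", "Wi-Fi", "TCP"),
    ]),
    Node("www.example.com:443", "LTE", "TCP", [
        Node("192.0.2.1:443", "LTE", "TCP"),
        Node("[2001:db8:42::1]:443", "LTE", "TCP"),
    ]),
])
```

Calling `render(tree)` yields the six lines of the succinct form, from "1" down to "1.2.2".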
| When an implementation is asked to establish a single connection, | When an implementation is asked to establish a single connection, | |||
| only one of the leaf nodes in the candidate set is needed to transfer | only one of the leaf nodes in the candidate set is needed to transfer | |||
| data. Thus, once a single leaf node becomes ready to use, then the | data. Thus, once a single leaf node becomes ready to use, the | |||
| connection establishment tree is considered ready. One way to | connection establishment tree is considered ready. One way to | |||
| implement this is by having every leaf node update the state of its | implement this is by having every leaf node update the state of its | |||
| parent node when it becomes ready, until the root node of the tree is | parent node when it becomes ready until the root node of the tree is | |||
| ready, which then notifies the application that the Connection as a | ready, which then notifies the application that the Connection as a | |||
| whole is ready to use. | whole is ready to use. | |||
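One possible shape for this upward propagation is sketched below. The `AttemptNode` class and its `on_ready` callback are hypothetical; the sketch assumes that the first leaf to become ready wins and that later leaves do not notify the application again.

```python
class AttemptNode:
    def __init__(self, label, parent=None, on_ready=None):
        self.label = label
        self.parent = parent
        self.on_ready = on_ready   # only consulted on the root node
        self.ready = False
        self.ready_leaf = None     # the leaf that made this node ready

    def mark_ready(self, leaf=None):
        """Mark this node ready and propagate toward the root."""
        leaf = leaf or self
        if self.ready:
            return                 # an earlier leaf already won on this branch
        self.ready = True
        self.ready_leaf = leaf
        if self.parent is not None:
            self.parent.mark_ready(leaf)
        elif self.on_ready is not None:
            self.on_ready(leaf)    # root reached: notify the application once
```

A real implementation would also cancel or park the remaining attempts once the root is ready; that part is omitted here.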
| A connection establishment tree may consist of only a single node, | A connection establishment tree may consist of only a single node, | |||
| such as a connection attempt to an IP address over a single interface | such as a connection attempt to an IP address over a single interface | |||
| with a single protocol. | with a single protocol. | |||
| 1 [[2001:db8:23::1]:443, Wi-Fi, TCP] | 1 [[2001:db8:23::1]:443, Wi-Fi, TCP] | |||
| A root node may also only have one child (or leaf) node, such as | A root node may also only have one child (or leaf) node, such as | |||
| when a hostname resolves to only a single IP address. | when a hostname resolves to only a single IP address. | |||
| 1 [www.example.com:443, Wi-Fi, TCP] | 1 [www.example.com:443, Wi-Fi, TCP] | |||
| 1.1 [[2001:db8:23::1]:443, Wi-Fi, TCP] | 1.1 [[2001:db8:23::1]:443, Wi-Fi, TCP] | |||
| 4.1.1. Branch Types | 4.1.1. Branch Types | |||
| There are three types of branching from a parent node into one or | There are three types of branching from a parent node into one or | |||
| more child nodes. Any parent node of the tree must only use one type | more child nodes: Derived Endpoints, Network Paths, and Protocol | |||
| of branching. | Options. Any parent node of the tree must use only one type of | |||
| branching. | ||||
| 4.1.1.1. Derived Endpoints | 4.1.1.1. Derived Endpoints | |||
| If a connection originally targets a single Endpoint Identifer, there | If a connection originally targets a single Endpoint Identifier, | |||
| may be multiple endpoint candidates of different types that can be | there may be multiple endpoint candidates of different types that can | |||
| derived from the original. This creates an ordered list of the | be derived from the original. This creates an ordered list of the | |||
| derived endpoint candidates according to application preference, | derived endpoint candidates according to application preference, | |||
| system policy and expected performance. | system policy, and expected performance. | |||
| DNS hostname-to-address resolution is the most common method of | DNS hostname-to-address resolution is the most common method of | |||
| endpoint derivation. When trying to connect to a hostname Endpoint | endpoint derivation. When trying to connect to a hostname Endpoint | |||
| Identifer on a traditional IP network, the implementation should send | Identifier on an IP network, the implementation should send all | |||
| all applicable DNS queries. Commonly, this will include both A | applicable DNS queries. Commonly, this will include both A (IPv4) | |||
| (IPv4) and AAAA (IPv6) records if both address families are supported | and AAAA (IPv6) records if both address families are supported on the | |||
| on the local interface. This can also include SRV records [RFC2782], | local interface. This can also include SRV records [RFC2782], SVCB | |||
| SVCB and HTTPS records [I-D.ietf-dnsop-svcb-https], or other future | and HTTPS records [RFC9460], or other future record types. The | |||
| record types. The algorithm for ordering and racing these addresses | algorithm for ordering and racing these addresses should follow the | |||
| should follow the recommendations in Happy Eyeballs [RFC8305]. | recommendations in Happy Eyeballs [RFC8305]. | |||
| 1 [www.example.com:443, Wi-Fi, TCP] | 1 [www.example.com:443, Wi-Fi, TCP] | |||
| 1.1 [[2001:db8::1]:443, Wi-Fi, TCP] | 1.1 [[2001:db8::1]:443, Wi-Fi, TCP] | |||
| 1.2 [192.0.2.1:443, Wi-Fi, TCP] | 1.2 [192.0.2.1:443, Wi-Fi, TCP] | |||
| 1.3 [[2001:db8::2]:443, Wi-Fi, TCP] | 1.3 [[2001:db8::2]:443, Wi-Fi, TCP] | |||
| 1.4 [[2001:db8::3]:443, Wi-Fi, TCP] | 1.4 [[2001:db8::3]:443, Wi-Fi, TCP] | |||
| DNS-Based Service Discovery [RFC6763] can also provide an endpoint | DNS-Based Service Discovery [RFC6763] can also provide an endpoint | |||
| derivation step. When trying to connect to a named service, the | derivation step. When trying to connect to a named service, the | |||
| client may discover one or more hostname and port pairs on the local | client may discover one or more hostname and port pairs on the local | |||
| skipping to change at page 11, line 37 ¶ | skipping to change at line 496 ¶ | |||
| addresses, which would create multiple layers of branching. | addresses, which would create multiple layers of branching. | |||
| 1 [term-printer._ipp._tcp.meeting.example.com, Wi-Fi, TCP] | 1 [term-printer._ipp._tcp.meeting.example.com, Wi-Fi, TCP] | |||
| 1.1 [term-printer.meeting.example.com:631, Wi-Fi, TCP] | 1.1 [term-printer.meeting.example.com:631, Wi-Fi, TCP] | |||
| 1.1.1 [31.133.160.18:631, Wi-Fi, TCP] | 1.1.1 [31.133.160.18:631, Wi-Fi, TCP] | |||
| Applications can influence which derived Endpoints are allowed and | Applications can influence which derived Endpoints are allowed and | |||
| preferred via Selection Properties set on the Preconnection. For | preferred via Selection Properties set on the Preconnection. For | |||
| example, setting a preference for useTemporaryLocalAddress would | example, setting a preference for useTemporaryLocalAddress would | |||
| prefer the use of IPv6 over IPv4, and requiring | prefer the use of IPv6 over IPv4, and requiring | |||
| useTemporaryLocalAddress would eliminate IPv4 options, since IPv4 | useTemporaryLocalAddress would eliminate IPv4 options since IPv4 does | |||
| does not support temporary addresses. | not support temporary addresses. | |||
| 4.1.1.2. Network Paths | 4.1.1.2. Network Paths | |||
| If a client has multiple network paths available to it, e.g., a | If a client has multiple network paths available to it, e.g., a | |||
| mobile client with interfaces for both Wi-Fi and Cellular | mobile client with interfaces for both Wi-Fi and Cellular | |||
| connectivity, it can attempt a connection over any of the paths. | connectivity, it can attempt a connection over any of the paths. | |||
| This represents a branch point in the connection establishment. | This represents a branch point in the connection establishment. | |||
| Similar to a derived endpoint, the paths should be ranked based on | Similar to a derived endpoint, the paths should be ranked based on | |||
| preference, system policy, and performance. Attempts should be | preference, system policy, and performance. Attempts should be | |||
| started on one path (e.g., a specific interface), and then | started on one path (e.g., a specific interface) and then | |||
| successively on other paths (or interfaces) after delays based on the | successively on other paths (or interfaces) after delays based on the | |||
| expected path round-trip-time or other available metrics. | expected path RTT or other available metrics. | |||
| 1 [192.0.2.1:443, any path, TCP] | 1 [192.0.2.1:443, any path, TCP] | |||
| 1.1 [192.0.2.1:443, Wi-Fi, TCP] | 1.1 [192.0.2.1:443, Wi-Fi, TCP] | |||
| 1.2 [192.0.2.1:443, LTE, TCP] | 1.2 [192.0.2.1:443, LTE, TCP] | |||
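The staggered start across ranked paths could be computed along the following lines. This is only a sketch: the function name, the choice of twice the expected RTT as the delay, and the clamping bounds are illustrative assumptions, not values taken from this document.

```python
def stagger_schedule(paths, multiplier=2.0, min_delay=0.1, max_delay=2.0):
    """Compute start offsets for connection attempts on ranked paths.

    paths: list of (name, expected_rtt_seconds), most preferred first.
    Returns a list of (name, start_offset_seconds).
    """
    schedule = []
    offset = 0.0
    for name, expected_rtt in paths:
        schedule.append((name, offset))
        # The wait before the next attempt scales with this path's
        # expected RTT, clamped to the assumed bounds.
        delay = min(max(expected_rtt * multiplier, min_delay), max_delay)
        offset += delay
    return schedule
```

For example, with Wi-Fi at an expected RTT of 20 ms and LTE at 80 ms, the Wi-Fi attempt starts immediately and the LTE attempt starts after the 100 ms minimum delay.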
| The same approach applies to any situation in which the client is | The same approach applies to any situation in which the client is | |||
| aware of multiple links or views of the network. A single interface | aware of multiple links or views of the network. A single interface | |||
| may be shared by multiple network paths, each with a coherent set of | may be shared by multiple network paths, each with a coherent set of | |||
| addresses, routes, DNS servers, and more. A path may also represent a | addresses, routes, DNS servers, and more. A path may also represent a | |||
| virtual interface service such as a Virtual Private Network (VPN). | virtual interface service such as a Virtual Private Network (VPN). | |||
| The list of available paths should be constrained by any requirements | The list of available paths should be constrained by any requirements | |||
| the application sets, as well as by the system policy. | the application sets as well as by the system policy. | |||
| 4.1.1.3. Protocol Options | 4.1.1.3. Protocol Options | |||
| Differences in possible protocol compositions and options can also | Differences in possible protocol compositions and options can also | |||
| provide a branching point in connection establishment. This allows | provide a branching point in connection establishment. This allows | |||
| clients to be resilient to situations in which a certain protocol is | clients to be resilient to situations in which a certain protocol is | |||
| not functioning on a server or network. | not functioning on a server or network. | |||
| This approach is commonly used for connections with optional proxy | This approach is commonly used for connections with optional proxy | |||
| server configurations. A single connection might have several | server configurations. A single connection might have several | |||
| options available: an HTTP-based proxy, a SOCKS-based proxy, or no | options available: an HTTP-based proxy, a SOCKS-based proxy, or no | |||
| proxy. As above, these options should be ranked based on preference, | proxy. As above, these options should be ranked based on preference, | |||
| system policy, and performance and attempted in succession. | system policy, and performance, and should be attempted in | |||
| succession. | ||||
| 1 [www.example.com:443, any path, HTTP/TCP] | 1 [www.example.com:443, any path, HTTP/TCP] | |||
| 1.1 [192.0.2.8:443, any path, HTTP/HTTP Proxy/TCP] | 1.1 [192.0.2.8:443, any path, HTTP/HTTP Proxy/TCP] | |||
| 1.2 [192.0.2.7:10234, any path, HTTP/SOCKS/TCP] | 1.2 [192.0.2.7:10234, any path, HTTP/SOCKS/TCP] | |||
| 1.3 [www.example.com:443, any path, HTTP/TCP] | 1.3 [www.example.com:443, any path, HTTP/TCP] | |||
| 1.3.1 [192.0.2.1:443, any path, HTTP/TCP] | 1.3.1 [192.0.2.1:443, any path, HTTP/TCP] | |||
| This approach also allows a client to attempt different sets of | This approach also allows a client to attempt different sets of | |||
| application and transport protocols that, when available, could | application and transport protocols that, when available, could | |||
| provide preferable features. For example, the protocol options could | provide preferable features. For example, the protocol options could | |||
| involve QUIC [RFC9000] over UDP on one branch, and HTTP/2 [RFC7540] | involve QUIC [RFC9000] over UDP on one branch and HTTP/2 [RFC9113] | |||
| over TLS over TCP on the other: | over TLS over TCP on the other: | |||
| 1 [www.example.com:443, any path, HTTP] | 1 [www.example.com:443, any path, HTTP] | |||
| 1.1 [www.example.com:443, any path, HTTP3/QUIC/UDP] | 1.1 [www.example.com:443, any path, HTTP3/QUIC/UDP] | |||
| 1.1.1 [192.0.2.1:443, any path, HTTP3/QUIC/UDP] | 1.1.1 [192.0.2.1:443, any path, HTTP3/QUIC/UDP] | |||
| 1.2 [www.example.com:443, any path, HTTP2/TLS/TCP] | 1.2 [www.example.com:443, any path, HTTP2/TLS/TCP] | |||
| 1.2.1 [192.0.2.1:443, any path, HTTP2/TLS/TCP] | 1.2.1 [192.0.2.1:443, any path, HTTP2/TLS/TCP] | |||
| Another example is racing SCTP with TCP: | Another example is racing SCTP with TCP: | |||
| 1 [www.example.com:4740, any path, reliable-inorder-stream] | 1 [www.example.com:4740, any path, reliable-inorder-stream] | |||
| 1.1 [www.example.com:4740, any path, SCTP] | 1.1 [www.example.com:4740, any path, SCTP] | |||
| 1.1.1 [192.0.2.1:4740, any path, SCTP] | 1.1.1 [192.0.2.1:4740, any path, SCTP] | |||
| 1.2 [www.example.com:4740, any path, TCP] | 1.2 [www.example.com:4740, any path, TCP] | |||
| 1.2.1 [192.0.2.1:4740, any path, TCP] | 1.2.1 [192.0.2.1:4740, any path, TCP] | |||
| Implementations that support racing protocols and protocol options | Implementations that support racing protocols and protocol options | |||
| should maintain a history of which protocols and protocol options | should maintain a history of which protocols and protocol options | |||
| were successfully established, on a per-network and per-endpoint | were successfully established on a per-network and per-endpoint basis | |||
| basis (see Section 9.2). This information can influence future | (see Section 9.2). This information can influence future racing | |||
| racing decisions to prioritize or prune branches. | decisions to prioritize or prune branches. | |||
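A small cache of racing outcomes, keyed per network and per endpoint, might look like the sketch below. The `RacingHistory` class, its one-hour freshness window, and the policy of dropping recently failed stacks outright are assumptions for illustration; an implementation could instead merely demote such branches.

```python
import time


class RacingHistory:
    """Remembers racing outcomes per (network, endpoint, protocol stack)."""

    def __init__(self, max_age=3600.0, clock=time.monotonic):
        self._max_age = max_age   # assumed freshness window, in seconds
        self._clock = clock       # injectable so the sketch can be tested
        self._entries = {}        # key -> (succeeded, timestamp)

    def record(self, network, endpoint, stack, succeeded):
        self._entries[(network, endpoint, stack)] = (succeeded, self._clock())

    def lookup(self, network, endpoint, stack):
        """Return True or False for a fresh entry, None if unknown or stale."""
        entry = self._entries.get((network, endpoint, stack))
        if entry is None:
            return None
        succeeded, stamp = entry
        if self._clock() - stamp > self._max_age:
            return None
        return succeeded

    def order(self, network, endpoint, stacks):
        """Prune stacks with a fresh failure; put fresh successes first."""
        usable = [s for s in stacks
                  if self.lookup(network, endpoint, s) is not False]
        # Stable sort: known successes first, everything else in given order.
        return sorted(
            usable,
            key=lambda s: self.lookup(network, endpoint, s) is not True)
```

Keying on the network as well as the endpoint matters because a protocol that is blocked on one path may work on another.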
| 4.1.2. Branching Order-of-Operations | 4.1.2. Branching Order-of-Operations | |||
| Branch types ought to occur in a specific order relative to one | Branch types ought to occur in a specific order relative to one | |||
| another to avoid creating leaf nodes with invalid or incompatible | another to avoid creating leaf nodes with invalid or incompatible | |||
| settings. In the example above, it would be invalid to branch for | settings. In the example above, it would be invalid to branch for | |||
| derived endpoints (the DNS results for www.example.com) before | derived endpoints (the DNS results for www.example.com) before | |||
| branching between interface paths, since there are situations when | branching between interface paths since there are situations when the | |||
| the results will be different across networks due to private names or | results will be different across networks due to private names or | |||
| different supported IP versions. Implementations need to be careful | different supported IP versions. Implementations need to be careful | |||
| to branch in a consistent order that results in usable leaf nodes | to branch in a consistent order that results in usable leaf nodes | |||
| whenever there are multiple branch types that could be used from a | whenever there are multiple branch types that could be used from a | |||
| single node. | single node. | |||
| This document recommends the following order of operations for | This document recommends the following order of operations for | |||
| branching: | branching: | |||
| 1. Network Paths | 1. Network Paths | |||
| 2. Protocol Options | 2. Protocol Options | |||
| 3. Derived Endpoints | 3. Derived Endpoints | |||
| where a lower number indicates higher precedence and therefore higher | where a lower number indicates higher precedence and, therefore, | |||
| placement in the tree. Branching between paths is the first in the | higher placement in the tree. Branching between paths is the first | |||
| list because results across multiple interfaces are likely not | in the list because results across multiple interfaces are likely not | |||
| related to one another: endpoint resolution may return different | related to one another: endpoint resolution may return different | |||
| results, especially when using locally resolved host and service | results, especially when using locally resolved host and service | |||
| names, and which protocols are supported and preferred may differ | names and the protocols that are supported and preferred may differ | |||
| across interfaces. Thus, if multiple paths are attempted, the | across interfaces. Thus, if multiple paths are attempted, the | |||
| overall connection establishment process can be seen as a race | overall connection establishment process can be seen as a race | |||
| between the available paths or interfaces. | between the available paths or interfaces. | |||
| Protocol options are next checked in order. Whether or not a set of | Protocol options are next checked in order. Whether or not a set of | |||
| protocols, or protocol-specific options, can successfully connect is | protocols, or protocol-specific options, can successfully connect is | |||
| generally not dependent on which specific IP address is used. | generally not dependent on which specific IP address is used. | |||
| Furthermore, the Protocol Stacks being attempted may influence or | Furthermore, the Protocol Stacks being attempted may influence or | |||
| altogether change the Endpoint Identifers being used. Adding a proxy | altogether change the Endpoint Identifiers being used. Adding a | |||
| to a connection's branch will change the Endpoint Identifer to the | proxy to a connection's branch will change the Endpoint Identifier to | |||
| proxy's IP address or hostname. Choosing an alternate protocol may | the proxy's IP address or hostname. Choosing an alternate protocol | |||
| also modify the ports that should be selected. | may also modify the ports that should be selected. | |||
| Branching for derived endpoints is the final step, and may have | Branching for derived endpoints is the final step and may have | |||
| multiple layers of derivation or resolution, such as DNS service | multiple layers of derivation or resolution, such as DNS service | |||
| resolution and DNS hostname resolution. | resolution and DNS hostname resolution. | |||
| For example, if the application has indicated both a preference for | For example, if the application has indicated both a preference for | |||
| WiFi over LTE and for a feature only available in SCTP, branches will | Wi-Fi over LTE and for a feature only available in SCTP, branches | |||
| be first sorted accord to path selection, with WiFi attempted first. | will first be sorted according to path selection, with Wi-Fi | |||
| Then, branches with SCTP will be attempted first within their subtree | attempted as the first path. Then, branches with SCTP will be | |||
| according to the properties influencing protocol selection. However, | attempted within their subtree according to the properties | |||
| if the implementation has current cache information that SCTP is not | influencing protocol selection. However, if the implementation has | |||
| available on the path over WiFi, there would be no SCTP node in the | current cache information that SCTP is not available on the path over | |||
| WiFi subtree. Here, the path over WiFi will be attempted first, and, | Wi-Fi, there would be no SCTP node in the Wi-Fi subtree. Here, the | |||
| if connection establishment succeeds, TCP will be used. Thus, the | path over Wi-Fi will be attempted first, and, if connection | |||
| Selection Property preferring WiFi takes precedence over the Property | establishment succeeds, TCP will be used. Thus, the Selection | |||
| that led to a preference for SCTP. | Property preferring Wi-Fi takes precedence over the Property that led | |||
| to a preference for SCTP. | ||||
| 1 [www.example.com:80, any path, reliable-inorder-stream] | 1 [www.example.com:80, any path, reliable-inorder-stream] | |||
| 1.1 [192.0.2.1:443, Wi-Fi, reliable-inorder-stream] | 1.1 [192.0.2.1:443, Wi-Fi, reliable-inorder-stream] | |||
| 1.1.1 [192.0.2.1:443, Wi-Fi, TCP] | 1.1.1 [192.0.2.1:443, Wi-Fi, TCP] | |||
| 1.2 [192.0.3.1:443, LTE, reliable-inorder-stream] | 1.2 [192.0.3.1:443, LTE, reliable-inorder-stream] | |||
| 1.2.1 [192.0.3.1:443, LTE, SCTP] | 1.2.1 [192.0.3.1:443, LTE, SCTP] | |||
| 1.2.2 [192.0.3.1:443, LTE, TCP] | 1.2.2 [192.0.3.1:443, LTE, TCP] | |||
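The recommended branching order can be sketched as three nested loops, with paths outermost and derived endpoints innermost. The `build_tree` and `leaves` functions, the dictionary node layout, and the `stacks_for` and `resolve` callbacks are hypothetical names for this example; both callbacks take the path as an argument because protocol availability and resolution results may differ per path.

```python
def build_tree(hostname, port, paths, stacks_for, resolve):
    """Branch in the order: Network Paths, Protocol Options, Derived Endpoints.

    paths:      ranked list of path names
    stacks_for: callable(path) -> ranked Protocol Stacks usable on that path
    resolve:    callable(hostname, path) -> ranked addresses seen on that path
    """
    root = {"label": f"[{hostname}:{port}, any path, any]", "children": []}
    for path in paths:                                  # 1. Network Paths
        path_node = {"label": f"[{hostname}:{port}, {path}, any]",
                     "children": []}
        for stack in stacks_for(path):                  # 2. Protocol Options
            stack_node = {"label": f"[{hostname}:{port}, {path}, {stack}]",
                          "children": []}
            for address in resolve(hostname, path):     # 3. Derived Endpoints
                stack_node["children"].append(
                    {"label": f"[{address}:{port}, {path}, {stack}]",
                     "children": []})
            path_node["children"].append(stack_node)
        root["children"].append(path_node)
    return root


def leaves(node):
    """Return labels of nodes without children, in racing order."""
    if not node["children"]:
        return [node["label"]]
    return [label for child in node["children"] for label in leaves(child)]
```

If `stacks_for("Wi-Fi")` omits SCTP because of cached information, the Wi-Fi subtree contains only a TCP branch and is still attempted before any LTE branch, which mirrors the precedence described in the text.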
| 4.1.3. Sorting Branches | 4.1.3. Sorting Branches | |||
| Implementations should sort the branches of the tree of connection | Implementations should sort the branches of the tree of connection | |||
| options in order of their preference rank, from most preferred to | options in order of their preference rank from most preferred to | |||
| least preferred as specified by Selection Properties | least preferred as specified by Selection Properties [RFC9622]. Leaf | |||
| [I-D.ietf-taps-interface]. Leaf nodes on branches with higher | nodes on branches with higher rankings represent connection attempts | |||
| rankings represent connection attempts that will be raced first. | that will be raced first. | |||
| In addition to the properties provided by the application, an | In addition to the properties provided by the application, an | |||
| implementation may include additional criteria such as cached | implementation may include additional criteria such as cached | |||
| performance estimates, see Section 9.2, or system policy, see | performance estimates (see Section 9.2) or system policy (see | |||
| Section 3.2, in the ranking. Two examples of how Selection and | Section 3.2) in the ranking. Two examples of how Selection and | |||
| Connection Properties may be used to sort branches are provided | Connection Properties may be used to sort branches are provided | |||
| below: | below: | |||
| * "Interface Instance or Type" (property name interface): If the | "Interface Instance or Type" (property name interface): | |||
| application specifies an interface type to be preferred or | If the application specifies an interface type to be preferred or | |||
| avoided, implementations should accordingly rank the paths. If | avoided, implementations should accordingly rank the paths. If | |||
| the application specifies an interface type to be required or | the application specifies an interface type to be required or | |||
| prohibited, an implementation is expected to exclude the non- | prohibited, an implementation is expected to exclude the | |||
| conforming paths. | nonconforming paths. | |||
| * "Capacity Profile" (property name connCapacityProfile): An | "Capacity Profile" (property name connCapacityProfile): | |||
| implementation can use the capacity profile to prefer paths that | An implementation can use the capacity profile to prefer paths | |||
| match an application's expected traffic profile. This match will | that match an application's expected traffic profile. This match | |||
| use cached performance estimates, see Section 9.2. Some examples | will use cached performance estimates; see Section 9.2. Some | |||
| of path preferences based on capacity profiles include: | examples of path preferences based on capacity profiles include: | |||
| - Low Latency/Interactive: Prefer paths with the lowest expected | Low Latency/Interactive: Prefer paths with the lowest expected | |||
| Round Trip Time, based on observed Round Trip Time estimates; | Round-Trip Time (RTT), based on observed RTT estimates; | |||
| - Low Latency/Non-Interactive: Prefer paths with a low expected | Low Latency/Non-Interactive: Prefer paths with a low expected | |||
| Round Trip Time, but can tolerate delay variation; | Round-Trip Time (RTT) and possible delay variation; | |||
| - Constant-Rate Streaming: Prefer paths that are expected to | Constant-Rate Streaming: Prefer paths that are expected to | |||
| satisfy the requested stream send or receive bitrate, based on | satisfy the requested stream send or receive bitrate based on | |||
| the observed maximum throughput; | the observed maximum throughput; | |||
| - Capacity-Seeking: Prefer adapting to paths to determine the | Capacity-Seeking: Prefer adapting to paths to determine the | |||
| highest available capacity, based on the observed maximum | highest available capacity based on the observed maximum | |||
| throughput. | throughput. | |||
| As another example, branch sorting can also be influenced by bounds | As another example, branch sorting can also be influenced by bounds | |||
| on the send or receive rate (Selection Properties minSendRate / | on the send or receive rate (Selection Properties minSendRate / | |||
| minRecvRate / maxSendRate / maxRecvRate): if the application | minRecvRate / maxSendRate / maxRecvRate): if the application | |||
| indicates a bound on the expected send or receive bitrate, an | indicates a bound on the expected send or receive bitrate, an | |||
| implementation may prefer a path that can likely provide the desired | implementation may prefer a path that can likely provide the desired | |||
| bandwidth, based on cached maximum throughput, see Section 9.2. The | bandwidth, based on cached maximum throughput (see Section 9.2). The | |||
| application may know the send or receive bitrate from metadata in | application may know the send or receive bitrate from metadata in | |||
| adaptive HTTP streaming, such as MPEG-DASH. | adaptive HTTP streaming, such as MPEG-DASH. | |||
| Implementations process the Properties (Section 6.2 of | Implementations process the Properties (Section 6.2 of [RFC9622]) in | |||
| [I-D.ietf-taps-interface]) in the following order: Prohibit, Require, | the following order: Prohibit, Require, Prefer, Avoid. If Selection | |||
| Prefer, Avoid. If Selection Properties contain any prohibited | Properties contain any prohibited properties, the implementation | |||
| properties, the implementation should first purge branches containing | should first purge branches containing nodes with these properties. | |||
| nodes with these properties. For required properties, it should only | For required properties, it should only keep branches that satisfy | |||
| keep branches that satisfy these requirements. Finally, it should | these requirements. Finally, it should order the branches according | |||
| order the branches according to the preferred properties, and finally | to the preferred properties and use any avoided properties as a | |||
| use any avoided properties as a tiebreaker. When ordering branches, | tiebreaker. When ordering branches, an implementation can give more | |||
| an implementation can give more weight to properties that the | weight to properties that the application has explicitly set rather | |||
| application has explicitly set, than to the properties that are | than to the properties that are set by default. | |||
| default. | ||||
| The available protocols and paths on a specific system and in a | The available protocols and paths on a specific system and in a | |||
| specific context can change; therefore, the result of sorting and the | specific context can change; therefore, the result of sorting and the | |||
| outcome of racing may vary, even when using the same Selection and | outcome of racing may vary, even when using the same Selection and | |||
| Connection Properties. However, an implementation ought to provide a | Connection Properties. However, an implementation ought to provide a | |||
| consistent outcome to applications, e.g., by preferring protocols and | consistent outcome to applications, e.g., by preferring protocols and | |||
| paths that are already used by existing Connections that specified | paths that are already used by existing Connections that specified | |||
| similar Properties. | similar Properties. | |||
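The processing order described in this section (Prohibit, Require, Prefer, Avoid) can be sketched as follows. This is a minimal illustration, not the method of any particular implementation: the `Branch` type, the `sort_branches` function, and the feature names such as "wifi" or "sctp" are hypothetical, and a real system would also weigh cached performance estimates and system policy.

```python
from dataclasses import dataclass, field


# Hypothetical model: each branch advertises a set of feature names.
@dataclass
class Branch:
    name: str
    features: set = field(default_factory=set)


def sort_branches(branches, prohibit=(), require=(), prefer=(), avoid=()):
    """Apply Selection Properties in the order Prohibit, Require, Prefer, Avoid."""
    prohibit, require = set(prohibit), set(require)
    prefer, avoid = set(prefer), set(avoid)
    # 1. Purge branches that contain any prohibited feature.
    kept = [b for b in branches if not (b.features & prohibit)]
    # 2. Keep only branches that satisfy every required feature.
    kept = [b for b in kept if require <= b.features]
    # 3. Order by the number of preferred features matched (more first);
    # 4. use the number of avoided features (fewer first) as a tiebreaker.
    return sorted(
        kept,
        key=lambda b: (-len(b.features & prefer), len(b.features & avoid)),
    )


branches = [
    Branch("lte-sctp", {"lte", "sctp"}),
    Branch("lte-tcp", {"lte", "tcp"}),
    Branch("wifi-tcp", {"wifi", "tcp"}),
    Branch("wifi-udp", {"wifi", "udp"}),
]
ranked = sort_branches(
    branches, prohibit={"udp"}, prefer={"wifi"}, avoid={"sctp"}
)
ranked_names = [b.name for b in ranked]
```

In this sketch, the UDP branch is purged first, the Wi-Fi branch is ranked ahead of both LTE branches, and the avoided feature only decides the order between the two LTE branches.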
| 4.2. Candidate Gathering | 4.2. Candidate Gathering | |||
| The step of gathering candidates involves identifying which paths, | The step of gathering candidates involves identifying which paths, | |||
| protocols, and endpoints may be used for a given Connection. This | protocols, and endpoints may be used for a given Connection. This | |||
| list is determined by the requirements, prohibitions, and preferences | list is determined by the requirements, prohibitions, preferences, | |||
| of the application as specified in the Selection Properties. | and avoidances of the application as specified in the Selection | |||
| Properties. | ||||
| 4.2.1. Gathering Endpoint Candidates | 4.2.1. Gathering Endpoint Candidates | |||
| Both Local and Remote Endpoint Candidates must be discovered during | Both Local and Remote Endpoint Candidates must be discovered during | |||
| connection establishment. To support Interactive Connectivity | connection establishment. To support Interactive Connectivity | |||
| Establishment (ICE) [RFC8445], or similar protocols that involve out- | Establishment (ICE) [RFC8445], or similar protocols that involve out- | |||
| of-band indirect signalling to exchange candidates with the Remote | of-band indirect signaling to exchange candidates with the Remote | |||
| Endpoint, it is important to query the set of candidate Local | Endpoint, it is important to query the set of candidate Local | |||
| Endpoints, and provide the Protocol Stack with a set of candidate | Endpoints and provide the Protocol Stack with a set of candidate | |||
| Remote Endpoints, before the Local Endpoint attempts to establish | Remote Endpoints before the Local Endpoint attempts to establish | |||
| connections. | connections. | |||
| 4.2.1.1. Local Endpoint candidates | 4.2.1.1. Local Endpoint Candidates | |||
| The set of possible Local Endpoints is gathered. In a simple case, | The set of possible Local Endpoints is gathered. In a simple case, | |||
| this merely enumerates the local interfaces and protocols, and | this merely enumerates the local interfaces and protocols and | |||
| allocates ephemeral source ports. For example, a system that has | allocates ephemeral source ports. For example, a system that has Wi- | |||
| WiFi and Ethernet and supports IPv4 and IPv6 might gather four | Fi and Ethernet and supports IPv4 and IPv6 might gather four | |||
| candidate Local Endpoints (IPv4 on Ethernet, IPv6 on Ethernet, IPv4 | candidate Local Endpoints (IPv4 on Ethernet, IPv6 on Ethernet, IPv4 | |||
| on WiFi, and IPv6 on WiFi) that can form the source for a transient. | on Wi-Fi, and IPv6 on Wi-Fi) that can form the source for a | |||
| transient. | ||||
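The simple case above can be sketched as a cross product of interfaces and IP versions. The interface names are taken from the example in the text; the string representation of a candidate is an assumption made only for illustration, and port allocation is omitted.

```python
from itertools import product

# Interfaces and IP versions from the example in the text.
interfaces = ["Ethernet", "Wi-Fi"]
ip_versions = ["IPv4", "IPv6"]

# Each (interface, IP version) pair yields one candidate Local Endpoint.
candidates = [f"{ip} on {iface}" for iface, ip in product(interfaces, ip_versions)]
```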
| If NAT traversal is required, the process of gathering Local | If NAT traversal is required, the process of gathering Local | |||
| Endpoints becomes broadly equivalent to the ICE Candidate Gathering | Endpoints becomes broadly equivalent to the ICE Candidate Gathering | |||
| phase (see Section 5.1.1 of [RFC8445]). The endpoint determines its | phase (see Section 5.1.1 of [RFC8445]). The endpoint determines its | |||
| server reflexive Local Endpoints (i.e., the translated address of a | server-reflexive Local Endpoints (i.e., the translated address of a | |||
| Local Endpoint, on the other side of a NAT, e.g via a STUN sever | Local Endpoint, on the other side of a NAT, e.g., via a STUN server | |||
| [RFC5389]) and relayed Local Endpoints (e.g., via a TURN server | [RFC8489]) and relayed Local Endpoints (e.g., via a TURN server | |||
| [RFC5766] or other relay), for each interface and network protocol. | [RFC8656] or other relay) for each interface and network protocol. | |||
| These are added to the set of candidate Local Endpoint Identifers for | These are added to the set of candidate Local Endpoint Identifiers | |||
| this connection. | for this connection. | |||
| Gathering Local Endpoints is primarily a local operation, although it | Gathering Local Endpoints is primarily a local operation, although it | |||
| might involve exchanges with a STUN server to derive server reflexive | might involve exchanges with a STUN server to derive server-reflexive | |||
| Local Endpoints, or with a TURN server or other relay to derive | Local Endpoints or with a TURN server or other relay to derive | |||
| relayed Local Endpoints. However, it does not involve communication | relayed Local Endpoints. However, it does not involve communication | |||
| with the Remote Endpoint. | with the Remote Endpoint. | |||
| 4.2.1.2. Remote Endpoint Candidates | 4.2.1.2. Remote Endpoint Candidates | |||
| The Remote Endpoint Identifer is typically a name that needs to be | The Remote Endpoint Identifier is typically a name that needs to be | |||
| resolved into a set of possible addresses that can be used for | resolved into a set of possible addresses that can be used for | |||
| communication. Resolving the Remote Endpoint is the process of | communication. Resolving the Remote Endpoint is the process of | |||
| recursively performing such name lookups, until fully resolved, to | recursively performing such name lookups, until fully resolved, to | |||
| return the set of candidates for the Remote Endpoint of this | return the set of candidates for the Remote Endpoint of this | |||
| Connection. | Connection. | |||
| How this resolution is done will depend on the type of the Remote | How this resolution is done will depend on the type of the Remote | |||
| Endpoint, and can also be specific to each Local Endpoint. A common | Endpoint and can also be specific to each Local Endpoint. A common | |||
| case is when the Remote Endpoint Identifer is a DNS name, in which | case is when the Remote Endpoint Identifier is a DNS name, in which | |||
| case it is resolved to give a set of IPv4 and IPv6 addresses | case, it is resolved to give a set of IPv4 and IPv6 addresses | |||
| representing that name. Some types of Remote Endpoint Identifers | representing that name. Some types of Remote Endpoint Identifiers | |||
| might require more complex resolution. Resolving the Remote Endpoint | might require more complex resolution. Resolving the Remote Endpoint | |||
| for a peer-to-peer connection might involve communication with a | for a peer-to-peer connection might involve communication with a | |||
| rendezvous server, which in turn contacts the peer to gain consent to | rendezvous server. The server, in turn, contacts the peer to gain | |||
| communicate and retrieve its set of candidate Local Endpoints, which | consent to communicate and retrieve its set of candidate Local | |||
| are returned and form the candidate remote addresses for contacting | Endpoints. These Endpoints are returned and form the candidate | |||
| that peer. | remote addresses for contacting that peer. | |||
| Resolving the Remote Endpoint is not a local operation. It will | Resolving the Remote Endpoint is not a local operation. It will | |||
| involve a directory service, and can require communication with the | involve a directory service and can require communication between the | |||
| Remote Endpoint to rendezvous and exchange peer addresses. This can | Remote Endpoint and a rendezvous server as well as the exchange of | |||
| expose some or all of the candidate Local Endpoints to the Remote | peer addresses. This can expose some or all of the candidate Local | |||
| Endpoint. | Endpoints to the Remote Endpoint. | |||
| 4.3. Candidate Racing | 4.3. Candidate Racing | |||
| The primary goal of the Candidate Racing process is to successfully | The primary goal of the Candidate Racing process is to successfully | |||
| negotiate a Protocol Stack to an endpoint over an interface to | negotiate a Protocol Stack to an endpoint over an interface to | |||
| connect a single leaf node of the tree with as little delay and as | connect a single leaf node of the tree with as little delay and as | |||
| few unnecessary connection attempts as possible. Optimizing these | few unnecessary connection attempts as possible. Optimizing these | |||
| two factors improves the user experience, while minimizing network | two factors improves the user experience, while minimizing network | |||
| load. | load. | |||
| This section covers the dynamic aspect of connection establishment. | This section covers the dynamic aspect of connection establishment. | |||
| The tree described above is a useful conceptual and architectural | The tree described above is a useful conceptual and architectural | |||
| model. However, an implementation is unable to know all of the nodes | model. However, an implementation is unable to know all of the nodes | |||
| that will be used until steps like name resolution have occurred, and | that will be used until steps like name resolution have occurred; | |||
| many of the possible branches ultimately might not be attempted. | many of the possible branches ultimately might not be attempted. | |||
| There are three different approaches to racing the attempts for | There are three different approaches to racing the attempts for | |||
| different nodes of the connection establishment tree: | different nodes of the connection establishment tree: | |||
| 1. Simultaneous | 1. Simultaneous | |||
| 2. Staggered | 2. Staggered | |||
| 3. Failover | 3. Failover | |||
| Each approach is appropriate in different use-cases and branch types. | Each approach is appropriate in different use cases and branch types. | |||
| However, to avoid consuming unnecessary network resources, | However, to avoid consuming unnecessary network resources, | |||
| implementations should not use simultaneous racing as a default | implementations should not use simultaneous racing as a default | |||
| approach. | approach. | |||
| The timing algorithms for racing should remain independent across | The timing algorithms for racing should remain independent across | |||
| branches of the tree. Any timer or racing logic is isolated to a | branches of the tree. Any timer or racing logic is isolated to a | |||
| given parent node, and is not ordered precisely with regards to | given parent node and is not ordered precisely with regard to | |||
| children of other nodes. | children of other nodes. | |||
| 4.3.1. Simultaneous | 4.3.1. Simultaneous | |||
| Simultaneous racing is when multiple alternate branches are started | Simultaneous racing is when multiple alternate branches are started | |||
| without waiting for any one branch to make progress before starting | without waiting for any one branch to make progress before starting | |||
| the next alternative. This means the attempts are effectively | the next alternative. This means the attempts are effectively | |||
| simultaneous. Simultaneous racing should be avoided by | simultaneous. Simultaneous racing should be avoided by | |||
| implementations, since it consumes extra network resources and | implementations since it consumes extra network resources and | |||
| establishes state that might not be used. | establishes state that might not be used. | |||
| 4.3.2. Staggered | 4.3.2. Staggered | |||
| Staggered racing can be used whenever a single node of the tree has | Staggered racing can be used whenever a single node of the tree has | |||
| multiple child nodes. Based on the order determined when building | multiple child nodes. Based on the order determined when building | |||
| the tree, the first child node will be initiated immediately, | the tree, the first child node will be initiated immediately, | |||
| followed by the next child node after some delay. Once that second | followed by the next child node after some delay. Once that second | |||
| child node is initiated, the third child node (if present) will begin | child node is initiated, the third child node (if present) will begin | |||
| after another delay, and so on until all child nodes have been | after another delay, and so on until all child nodes have been | |||
| initiated, or one of the child nodes successfully completes its | initiated or one of the child nodes successfully completes its | |||
| negotiation. | negotiation. | |||
| Staggered racing attempts can proceed in parallel. Implementations | Staggered racing attempts can proceed in parallel. Implementations | |||
| should not terminate an earlier child connection attempt upon | should not terminate an earlier child connection attempt upon | |||
| starting a secondary child. | starting a secondary child. | |||
| If a child node fails to establish connectivity (as in Section 4.4.1) | If a child node fails to establish connectivity (as in Section 4.4.1) | |||
| before the delay time has expired for the next child, the next child | before the delay time has expired for the next child, the next child | |||
| should be started immediately. | should be started immediately. | |||
| Staggered racing between IP addresses for a generic Connection should | Staggered racing between IP addresses for a generic Connection should | |||
| follow the Happy Eyeballs algorithm described in [RFC8305]. | follow the Happy Eyeballs algorithm described in [RFC8305]. Guidance | |||
| [RFC8421] provides guidance for racing when performing Interactive | for racing when performing ICE can be found in [RFC8421]. | |||
| Connectivity Establishment (ICE). | ||||
| Generally, the delay before starting a given child node ought to be | Generally, the delay before starting a given child node ought to be | |||
| based on the length of time the previously started child node is | based on the length of time the previously started child node is | |||
| expected to take before it succeeds or makes progress in connection | expected to take before it succeeds or makes progress in connection | |||
| establishment. Algorithms like Happy Eyeballs choose a delay based | establishment. Algorithms like Happy Eyeballs choose a delay based | |||
| on how long the transport connection handshake is expected to take. | on how long the transport connection handshake is expected to take. | |||
| When performing staggered races in multiple branch types (such as | When performing staggered races in multiple branch types (such as | |||
| racing between network interfaces, and then racing between IP | racing between network interfaces and then racing between IP | |||
| addresses), a longer delay may be chosen for some branch types. For | addresses), a longer delay may be chosen for some branch types. For | |||
| example, when racing between network interfaces, the delay should | example, when racing between network interfaces, the delay should | |||
| also take into account the amount of time it takes to prepare the | also take into account the amount of time it takes to prepare the | |||
| network interface (such as radio association) and name resolution | network interface (such as radio association) and name resolution | |||
| over that interface, in addition to the delay that would be added for | over that interface in addition to the delay that would be added for | |||
| a single transport connection handshake. | a single transport connection handshake. | |||
| Since the staggered delay can be chosen based on dynamic information, | Since the staggered delay can be chosen based on dynamic information, | |||
| such as predicted Round Trip Time, implementations should define | such as predicted RTT, implementations should define upper and lower | |||
| upper and lower bounds for delay times. These bounds are | bounds for delay times. These bounds are implementation specific and | |||
| implementation-specific, and may differ based on which branch type is | may differ based on which branch type is being used. | |||
| being used. | ||||
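The staggered behavior described in this section can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, the numeric bounds and default are placeholders chosen in the spirit of Happy Eyeballs rather than values taken from this document, and the schedule ignores the case where a child succeeds and ends the race.

```python
def stagger_delay(predicted_rtt, lower=0.1, upper=2.0, default=0.25):
    """Return the delay (seconds) before starting the next child.

    The dynamic estimate is clamped to implementation-chosen bounds;
    the bounds and default used here are assumptions for illustration.
    """
    candidate = default if predicted_rtt is None else predicted_rtt
    return max(lower, min(upper, candidate))


def start_times(failures, delay):
    """Compute when each child starts, relative to the start of the race.

    `failures` holds, per child, the time after its own start at which it
    fails, or None if it does not fail early. A child that fails before
    the delay expires causes the next child to start immediately.
    """
    starts = []
    now = 0.0
    for fail_after in failures:
        starts.append(now)
        if fail_after is not None and fail_after < delay:
            now += fail_after  # early failure: start the next child right away
        else:
            now += delay       # otherwise wait for the full staggered delay
    return starts


delay = stagger_delay(None)                       # falls back to the default
schedule = start_times([None, 0.125, None], delay)
```

Earlier attempts are not cancelled when a later child starts; the schedule only determines start times, so the attempts proceed in parallel.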
| 4.3.3. Failover | 4.3.3. Failover | |||
| If an implementation or application has a strong preference for one | If an implementation or application has a strong preference for one | |||
| branch over another, the branching node may choose to wait until one | branch over another, the branching node may choose to wait until one | |||
| child has failed before starting the next. Failure of a leaf node is | child has failed before starting the next. Failure of a leaf node is | |||
| determined by its protocol negotiation failing or timing out; failure | determined by its protocol negotiation failing or timing out; failure | |||
| of a parent branching node is determined by all of its children | of a parent branching node is determined by all of its children | |||
| failing. | failing. | |||
| An example in which failover is recommended is a race between a | An example in which failover is recommended is a race between a | |||
| preferred Protocol Stack that uses a proxy and an alternate Protocol | preferred Protocol Stack that uses a proxy and an alternate Protocol | |||
| Stack that bypasses the proxy. Failover is useful in case the proxy | Stack that bypasses the proxy. Failover is useful if the proxy is | |||
| is down or misconfigured, but any more aggressive type of racing may | down or misconfigured, but any more aggressive type of racing may end | |||
| end up unnecessarily avoiding a proxy that was preferred by policy. | up unnecessarily avoiding a proxy that was preferred by policy. | |||
| 4.4. Completing Establishment | 4.4. Completing Establishment | |||
| The process of connection establishment completes when one leaf node | The process of connection establishment completes when one leaf node | |||
| of the tree has successfully completed negotiation with the Remote | of the tree has successfully completed negotiation with the Remote | |||
| Endpoint, or else all nodes of the tree have failed to connect. The | Endpoint or when all nodes of the tree have failed to connect. The | |||
| first leaf node to complete its connection is then used by the | first leaf node to complete its connection is then used by the | |||
| application to send and receive data. This is signalled to the | application to send and receive data. This is signaled to the | |||
| application using the Ready event in the API (Section 7.1 of | application using the Ready event in the API (Section 7.1 of | |||
| [I-D.ietf-taps-interface]). | [RFC9622]). | |||
| Successes and failures of a given attempt should be reported up to | Successes and failures of a given attempt should be reported up to | |||
| parent nodes (towards the root of the tree). For example, in the | parent nodes (toward the root of the tree). For example, in the | |||
| following case, if 1.1.1 fails to connect, it reports the failure to | following case, if 1.1.1 fails to connect, it reports the failure to | |||
| 1.1. Since 1.1 has no other child nodes, it also has failed and | 1.1. Since 1.1 has no other child nodes, it also has failed and | |||
| reports that failure to 1. Because 1.2 has not yet failed, 1 is not | reports that failure to 1. Because 1.2 has not yet failed, 1 is not | |||
| considered to have failed. Since 1.2 has not yet started, it is | considered to have failed. Since 1.2 has not yet started, it is | |||
| started and the process continues. Similarly, if 1.1.1 successfully | started and the process continues. Similarly, if 1.1.1 successfully | |||
| connects, then it marks 1.1 as connected, which propagates to the | connects, then it marks 1.1 as connected, which propagates to the | |||
| root node 1. At this point, the Connection as a whole is considered | root node 1. At this point, the Connection as a whole is considered | |||
| to be successfully connected and ready to process application data. | to be successfully connected and ready to process application data. | |||
| 1 [www.example.com:443, Any, TCP] | 1 [www.example.com:443, Any, TCP] | |||
| 1.1 [www.example.com:443, Wi-Fi, TCP] | 1.1 [www.example.com:443, Wi-Fi, TCP] | |||
| 1.1.1 [192.0.2.1:443, Wi-Fi, TCP] | 1.1.1 [192.0.2.1:443, Wi-Fi, TCP] | |||
| 1.2 [www.example.com:443, LTE, TCP] | 1.2 [www.example.com:443, LTE, TCP] | |||
| ... | ... | |||
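The reporting of successes and failures toward the root can be sketched as follows. The `Node` class and its method names are hypothetical, and node 1.2 is treated as a leaf here for brevity, although the tree above leaves its children unspecified.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        self.state = "pending"  # "pending", "failed", or "connected"
        for child in self.children:
            child.parent = self

    def report_failure(self):
        self.state = "failed"
        parent = self.parent
        # A parent has failed only once all of its children have failed.
        if parent is not None and all(c.state == "failed" for c in parent.children):
            parent.report_failure()

    def report_success(self):
        self.state = "connected"
        # Success propagates up to the root node.
        if self.parent is not None:
            self.parent.report_success()


leaf = Node("1.1.1")
wifi = Node("1.1", [leaf])
lte = Node("1.2")
root = Node("1", [wifi, lte])

# 1.1.1 fails: 1.1 fails as well, but 1 does not, because 1.2 has not failed.
leaf.report_failure()
after_failure = (leaf.state, wifi.state, lte.state, root.state)

# 1.2 is then started and connects: the Connection as a whole is connected.
lte.report_success()
after_success = (lte.state, root.state)
```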
| If a leaf node has successfully completed its connection, all other | If a leaf node has successfully completed its connection, all other | |||
| attempts should be made ineligible for use by the application for the | attempts should be made ineligible for use by the application for the | |||
| original request. New connection attempts that involve transmitting | original request. New connection attempts that involve transmitting | |||
| data on the network ought not to be started after another leaf node | data on the network ought not to be started after another leaf node | |||
| has already successfully completed, because the Connection as a whole | has already successfully completed because the Connection as a whole | |||
| has now been established. An implementation could choose to let | has now been established. An implementation could choose to let | |||
| certain handshakes and negotiations complete to gather metrics that | certain handshakes and negotiations complete to gather metrics that | |||
| influence future connections. Keeping additional connections is | influence future connections. Keeping additional connections is | |||
| generally not recommended, because those attempts were slower to | generally not recommended because those attempts were slower to | |||
| connect and may exhibit less desirable properties. | connect and may exhibit less desirable properties. | |||
| 4.4.1. Determining Successful Establishment | 4.4.1. Determining Successful Establishment | |||
| On a per-protocol basis, implementations may select different | On a per-protocol basis, implementations may select different | |||
| criteria by which a leaf node is considered to be successfully | criteria by which a leaf node is considered to be successfully | |||
| connected. If the only protocol being used is a transport protocol | connected. If the only protocol being used is a transport protocol | |||
| with a clear handshake, like TCP, then the obvious choice is to | with a clear handshake, like TCP, then the obvious choice is to | |||
| declare that node "connected" when the three-way handshake has been | declare that node "connected" when the three-way handshake completes. | |||
| completed. If the only protocol being used is an connectionless | If the only protocol being used is a connectionless protocol, like | |||
| protocol, like UDP, the implementation may consider the node fully | UDP, the implementation may consider the node fully "connected" the | |||
| "connected" the moment it determines a route is present, before | moment it determines a route is present, before sending any packets | |||
| sending any packets on the network, see further Section 4.6. | on the network, see further in Section 4.6. | |||
| When the Initiate action is called without any Messages being sent at | Depending on the protocols involved, there is no guarantee that the | |||
| the same time, depending on the protocols involved, it is not | Remote Endpoint will be notified when the Initiate action is called | |||
| guaranteed that the Remote Endpoint will be notified of this, and | without any Messages being sent at the same time. Therefore, a | |||
| hence a passive endpoint's application may not receive a | passive endpoint's application may not receive a ConnectionReceived | |||
| ConnectionReceived event until it receives the first Message on the | event until it receives the first Message on the new Connection. | |||
| new Connection. | ||||
| For Protocol Stacks with multiple handshakes, the decision becomes | For Protocol Stacks with multiple handshakes, the decision becomes | |||
| more nuanced. If the Protocol Stack involves both TLS and TCP, an | more nuanced. If the Protocol Stack involves both TLS and TCP, an | |||
| implementation could determine that a leaf node is connected after | implementation could determine that a leaf node is connected after | |||
| the TCP handshake is complete, or it can wait for the TLS handshake | the TCP handshake is complete, or it can wait for the TLS handshake | |||
| to complete as well. The benefit of declaring completion when the | to complete as well. The benefit of declaring completion when the | |||
| TCP handshake finishes, and thus stopping the race for other branches | TCP handshake finishes, and thus stopping the race for other branches | |||
| of the tree, is reduced burden on the network and Remote Endpoints | of the tree, is reduced burden on the network and Remote Endpoints | |||
| from further connection attempts that are likely to be abandoned. On | from further connection attempts that are likely to be abandoned. On | |||
| the other hand, by waiting until the TLS handshake is complete, an | the other hand, by waiting until the TLS handshake is complete, an | |||
| implementation avoids the scenario in which a TCP handshake completes | implementation avoids the scenario in which a TCP handshake completes | |||
| quickly, but TLS negotiation is either very slow or fails altogether | quickly, but TLS negotiation is either very slow or fails altogether | |||
| in particular network conditions or to a particular endpoint. To | in particular network conditions or to a particular endpoint. To | |||
| avoid the issue of TLS possibly failing, the implementation should | avoid the issue of TLS possibly failing, the implementation should | |||
| not generate a Ready event for the Connection until the TLS handshake | not generate a Ready event for the Connection until the TLS handshake | |||
| is complete. | is complete. | |||
| If all of the leaf nodes fail to connect during racing, i.e. none of | If all of the leaf nodes fail to connect during racing, i.e., none of | |||
| the configurations that satisfy all requirements given in the | the configurations that satisfy all requirements given in the | |||
| Transport Properties actually work over the available paths, then the | Transport Properties actually work over the available paths, then the | |||
| Transport Services system should report an EstablishmentError to the | Transport Services system should report an EstablishmentError to the | |||
| application. An EstablishmentError event should also be generated in | application. An EstablishmentError event should also be generated if | |||
| case the Transport Services system finds no usable candidates to | the Transport Services system finds no usable candidates to race. | |||
| race. | ||||
| 4.5. Establishing multiplexed connections | 4.5. Establishing Multiplexed Connections | |||
| Multiplexing several Connections over a single underlying transport | Multiplexing several Connections over a single underlying transport | |||
| connection requires that the Connections to be multiplexed belong to | connection requires that the multiplexed Connections belong to the | |||
| the same Connection Group (as is indicated by the application using | same Connection Group (as is indicated by the application using the | |||
| the Clone action). When the underlying transport connection supports | Clone action). When the underlying transport connection supports | |||
| multi-streaming, the Transport Services System can map each | multistreaming, the Transport Services System can map each Connection | |||
| Connection in the Connection Group to a different stream of this | in the Connection Group to a different stream of this connection. | |||
| connection. | ||||
| For such streams, there is often no explicit connection establishment | For such streams, there is often no explicit connection establishment | |||
| procedure for the new stream prior to sending data on it (e.g., with | procedure for the new stream prior to sending data on it (e.g., with | |||
| SCTP). In this case, the same considerations apply to determining | SCTP). In this case, the same considerations apply to determining | |||
| stream establishment as apply to establishing a UDP connection, as | stream establishment as apply to establishing a UDP connection, as | |||
| discussed in Section 4.4.1. This means that there might not be any | discussed in Section 4.4.1. This means that there might not be any | |||
| "establishment" message (like a TCP SYN). | "establishment" message (like a TCP SYN). | |||
| 4.6. Handling connectionless protocols | 4.6. Handling Connectionless Protocols | |||
| While protocols that use an explicit handshake to validate a | While protocols that use an explicit handshake to validate a | |||
| connection to a peer can be used for racing multiple establishment | connection to a peer can be used for racing multiple establishment | |||
| attempts in parallel, connectionless protocols such as raw UDP do not | attempts in parallel, connectionless protocols such as raw UDP do not | |||
| offer a way to validate the presence of a peer or the usability of a | offer a way to validate the presence of a peer or the usability of a | |||
| Connection without application feedback. An implementation should | Connection without application feedback. An implementation should | |||
| consider such a Protocol Stack to be established as soon as the | consider such a Protocol Stack to be established as soon as the | |||
| Transport Services system has selected a path on which to send data. | Transport Services system has selected a path on which to send data. | |||
| However, this can cause a problem if a specific peer is not reachable | However, this can cause a problem if a specific peer is not reachable | |||
| over the network using the connectionless protocol, or data cannot be | over the network using the connectionless protocol or data cannot be | |||
| exchanged with the peer for any other reason. To handle the lack of | exchanged with the peer for any other reason. To handle the lack of | |||
| an explicit handshake in the underlying protocol, an application can | an explicit handshake in the underlying protocol, an application can | |||
| use a Message Framer (Section 6) on top of a connectionless protocol | use a Message Framer (Section 6) on top of a connectionless protocol | |||
| to only mark a specific connection attempt as ready when some data | to only mark a specific connection attempt as ready when some data | |||
| has been received, or after some application-level handshake has been | has been received or after some application-level handshake has been | |||
| performed by the Message Framer. | performed by the Message Framer. | |||
| 4.7. Implementing Listeners | 4.7. Implementing Listeners | |||
| When an implementation is asked to Listen, it registers with the | When an implementation is asked to Listen, it registers with the | |||
| system to wait for incoming traffic to the Local Endpoint. If no | system to wait for incoming traffic to the Local Endpoint. If no | |||
| Local Endpoint Identifer is specified, the implementation should use | Local Endpoint Identifier is specified, the implementation should use | |||
| an ephemeral port. | an ephemeral port. | |||
| If the Selection Properties do not require a single network interface | If the Selection Properties do not require a single network interface | |||
| or path, but allow the use of multiple paths, the Listener object | or path but allow the use of multiple paths, the Listener object | |||
| should register for incoming traffic on all of the network interfaces | should register for incoming traffic on all of the network interfaces | |||
| or paths that conform to the Properties. The set of available paths | or paths that conform to the Properties. The set of available paths | |||
| can change over time, so the implementation should monitor network | can change over time, so the implementation should monitor network | |||
| path changes, and change the registration of the Listener across all | path changes and change the registration of the Listener across all | |||
| usable paths as appropriate. When using multiple paths, the Listener | usable paths as appropriate. When using multiple paths, the Listener | |||
| is generally expected to use the same port for listening on each. | is generally expected to use the same port for listening on each. | |||
| If the Selection Properties allow multiple protocols to be used for | If the Selection Properties allow multiple protocols to be used for | |||
| listening, and the implementation supports it, the Listener object | listening and the implementation supports it, the Listener object | |||
| should support receiving inbound connections for each eligible | should support receiving inbound connections for each eligible | |||
| protocol on each eligible path. | protocol on each eligible path. | |||
| 4.7.1. Implementing Listeners for Connected Protocols | 4.7.1. Implementing Listeners for Connected Protocols | |||
| Connected protocols such as TCP and TLS-over-TCP have a strong | Connected protocols such as TCP and TLS-over-TCP have a strong | |||
| mapping between the Local and Remote Endpoint Identifers (four-tuple) | mapping between the Local and Remote Endpoint Identifiers (four- | |||
| and their protocol connection state. These map into Connection | tuple) and their protocol connection state. These map to Connection | |||
| objects. Whenever a new inbound handshake is being started, the | objects. Whenever a new inbound handshake is being started, the | |||
| Listener should generate a new Connection object and pass it to the | Listener should generate a new Connection object and pass it to the | |||
| application. | application. | |||
| 4.7.2. Implementing Listeners for Connectionless Protocols | 4.7.2. Implementing Listeners for Connectionless Protocols | |||
| Connectionless protocols such as UDP and UDP-lite generally do not | Connectionless protocols such as UDP and UDP-Lite generally do not | |||
| provide the same mechanisms that connected protocols do to offer | provide the same mechanisms that Connected protocols do to offer | |||
| Connection objects. Implementations should wait for incoming packets | Connection objects. Implementations should wait for incoming packets | |||
| for connectionless protocols on a listening port and should perform | for connectionless protocols on a listening port and should perform | |||
| four-tuple matching of packets to existing Connection objects if | four-tuple matching of packets to existing Connection objects if | |||
| possible. If a matching Connection object does not exist, an | possible. If a matching Connection object does not exist, an | |||
| incoming packet from a connectionless protocol should cause a new | incoming packet from a connectionless protocol should cause a new | |||
| Connection object to be created. | Connection object to be created. | |||
| 4.7.3. Implementing Listeners for Multiplexed Protocols | 4.7.3. Implementing Listeners for Multiplexed Protocols | |||
| Protocols that provide multiplexing of streams can listen for | Protocols that provide multiplexing of streams can listen for | |||
| entirely new connections as well as for new sub-connections (streams | entirely new connections as well as for new subconnections (streams | |||
| of an already existing connection). A new stream arrival on an | of an already-existing connection). A new stream arrival on an | |||
| existing connection is presented to the application as a new | existing connection is presented to the application as a new | |||
| Connection. This new Connection is grouped with all other | Connection. This new Connection is grouped with all other | |||
| Connections that are multiplexed via the same protocol. | Connections that are multiplexed via the same protocol. | |||
| 5. Implementing Sending and Receiving Data | 5. Implementing Sending and Receiving Data | |||
| The most basic mapping for sending a Message is an abstraction of | The most basic mapping for sending a Message is an abstraction of | |||
| datagrams, in which the transport protocol naturally deals in | datagrams, in which the transport protocol naturally deals in | |||
| discrete packets (such as UDP). Each Message here corresponds to a | discrete packets (such as UDP). Each Message here corresponds to a | |||
| single datagram. | single datagram. | |||
| skipping to change at page 23, line 42 | skipping to change at line 1058 | |||
| For protocols that expose byte-streams (such as TCP), the only | For protocols that expose byte-streams (such as TCP), the only | |||
| delineation provided by the protocol is the end of the stream in a | delineation provided by the protocol is the end of the stream in a | |||
| given direction. Each Message in this case corresponds to the entire | given direction. Each Message in this case corresponds to the entire | |||
| stream of bytes in a direction. These Messages may be quite long, in | stream of bytes in a direction. These Messages may be quite long, in | |||
| which case they can be sent in multiple parts. | which case they can be sent in multiple parts. | |||
| Protocols that provide framing (such as length-value protocols, or | Protocols that provide framing (such as length-value protocols, or | |||
| protocols that use delimiters like HTTP/1.1) may support Message | protocols that use delimiters like HTTP/1.1) may support Message | |||
| sizes that do not fit within a single datagram. Each Message for | sizes that do not fit within a single datagram. Each Message for | |||
| framing protocols corresponds to a single frame, which may be sent | framing protocols corresponds to a single frame, which may be sent | |||
| either as a complete Message in the underlying protocol, or in | either as a complete Message in the underlying protocol or in | |||
| multiple parts. | multiple parts. | |||
| Messages themselves generally consist of bytes passed in the | Messages themselves generally consist of bytes passed in the | |||
| messageData parameter intended to be processed at an application | messageData parameter intended to be processed at an application | |||
| layer. However, Message objects presented through the API can carry | layer. However, Message objects presented through the API can carry | |||
| associated Message Properties passed through the messageContext | associated Message Properties passed through the messageContext | |||
| parameter. When these are Protocol Specific Properties, they can | parameter. When these are Protocol-specific Properties, they can | |||
| include metadata that exists separately from a byte encoding. For | include metadata that exists separately from a byte encoding. For | |||
| example, these Properties can include name-value pairs of | example, these Properties can include name-value pairs of | |||
| information, like HTTP header fields. In such cases, Messages might | information, like HTTP header fields. In such cases, Messages might | |||
| be "empty", insofar as they contain zero bytes in the messageData | be "empty" insofar as they contain zero bytes in the messageData | |||
| parameter, but can still include data in the messageContext that is | parameter, but they can still include data in the messageContext that | |||
| interpreted by the Protocol Stack. | is interpreted by the Protocol Stack. | |||
| 5.1. Sending Messages | 5.1. Sending Messages | |||
| The effect of the application sending a Message is determined by the | The effect of the application sending a Message is determined by the | |||
| top-level protocol in the established Protocol Stack. That is, if | top-level protocol in the established Protocol Stack. That is, if | |||
| the top-level protocol provides an abstraction of framed Messages | the top-level protocol provides an abstraction of framed Messages | |||
| over a connection, the receiving application will be able to obtain | over a connection, the receiving application will be able to obtain | |||
| multiple Messages on that connection, even if the framing protocol is | multiple Messages on that connection, even if the framing protocol is | |||
| built on a byte-stream protocol like TCP. | built on a byte-stream protocol like TCP. | |||
| 5.1.1. Message Properties | 5.1.1. Message Properties | |||
| The API allows various properties to be associated with each Message, | The API allows various properties to be associated with each Message, | |||
| which should be implemented as discussed below. | which should be implemented as discussed below. | |||
| * msgLifetime: this should be implemented by removing the Message | msgLifetime: This should be implemented by removing the Message from | |||
| from the queue of pending Messages after the Lifetime has expired. | the queue of pending Messages after the Lifetime has expired. A | |||
| A queue of pending Messages within the Transport Services | queue of pending Messages within the Transport Services | |||
| Implementation that have yet to be handed to the Protocol Stack | Implementation that have yet to be handed to the Protocol Stack | |||
| can always support this property, but once a Message has been sent | can always support this property, but once a Message has been sent | |||
| into the send buffer of a protocol, only certain protocols may | into the send buffer of a protocol, only certain protocols may | |||
| support removing it from their send buffer. For example, a | support removing it from their send buffer. For example, a | |||
| Transport Services Implementation cannot remove bytes from a TCP | Transport Services Implementation cannot remove bytes from a TCP | |||
| send buffer, while it can remove data from a SCTP send buffer | send buffer, while it can remove data from an SCTP send buffer | |||
| using the partial reliability extension [RFC8303]. When there is | using the partial reliability extension [RFC8303]. When there is | |||
| no standing queue of Messages within the system, and the Protocol | no standing queue of Messages within the system, and the Protocol | |||
| Stack does not support the removal of a Message from the stack's | Stack does not support the removal of a Message from the stack's | |||
| send buffer, this property may be ignored. | send buffer, this property may be ignored. | |||
| * msgPriority: this represents the ability to prioritize a Message | msgPriority: This represents the ability to prioritize a Message | |||
| over other Messages. This can be implemented by the Transport | over other Messages. This can be implemented by the Transport | |||
| Services system by re-ordering Messages that have yet to be handed | Services system by reordering Messages that have yet to be handed | |||
| to the Protocol Stack, or by giving relative priority hints to | to the Protocol Stack or by giving relative priority hints to | |||
| protocols that support priorities per Message. For example, an | protocols that support priorities per Message. For example, an | |||
| implementation of HTTP/2 could choose to send Messages of | implementation of HTTP/2 could choose to send Messages of | |||
| different priority on streams of different priority. | different priority on streams of different priority. | |||
| * msgOrdered: when this is false, this disables the requirement of | msgOrdered: When this is false, it disables the requirement of in- | |||
| in-order-delivery for protocols that support configurable | order delivery for protocols that support configurable ordering. | |||
| ordering. When the Protocol Stack does not support configurable | When the Protocol Stack does not support configurable ordering, | |||
| ordering, this property may be ignored. | this property may be ignored. | |||
| * safelyReplayable: when this is true, this means that the Message | safelyReplayable: When this is true, it means that the Message can | |||
| can be used by a transport mechanism that might deliver it | be used by a transport mechanism that might deliver it multiple | |||
| multiple times -- e.g., as a result of racing multiple transports | times -- e.g., as a result of racing multiple transports or as | |||
| or as part of TCP Fast Open. Also, protocols that do not protect | part of TCP Fast Open (TFO). Also, protocols that do not protect | |||
| against duplicated Messages, such as UDP (when used directly, | against duplicated Messages, such as UDP (when used directly, | |||
| without a protocol layered atop), can only be used with Messages | without a protocol layered atop), can only be used with Messages | |||
| that are Safely Replayable. When a Transport Services system is | that are Safely Replayable. When a Transport Services system is | |||
| permitted to replay Messages, replay protection could be provided | permitted to replay Messages, replay protection could be provided | |||
| by the application. | by the application. | |||
| * final: when this is true, this means that the sender will not send | final: When this is true, it means that the sender will not send any | |||
| any further Messages. The Connection need not be closed (in case | further Messages. The Connection need not be closed (if the | |||
| the Protocol Stack supports half-close operation, like TCP). Any | Protocol Stack supports half-closed operations, like TCP). Any | |||
| Messages sent after a Message marked final will result in a | Messages sent after a Message marked Final will result in a | |||
| SendError. | SendError. | |||
| * msgChecksumLen: when this is set to any value other than Full | msgChecksumLen: When this is set to any value other than Full | |||
| Coverage, it sets the minimum protection in protocols that allow | Coverage, it sets the minimum protection in protocols that allow | |||
| limiting the checksum length (e.g. UDP-Lite). If the Protocol | limiting the checksum length (e.g., UDP-Lite). If the Protocol | |||
| Stack does not support checksum length limitation, this property | Stack does not support checksum length limitation, this property | |||
| may be ignored. | may be ignored. | |||
| * msgReliable: When true, the property specifies that the Message | msgReliable: When true, the property specifies that the Message must | |||
| must be reliably transmitted. When false, and if unreliable | be reliably transmitted. When false, and if unreliable | |||
| transmission is supported by the underlying protocol, then the | transmission is supported by the underlying protocol, then the | |||
| Message should be unreliably transmitted. If the underlying | Message should be unreliably transmitted. If the underlying | |||
| protocol does not support unreliable transmission, the Message | protocol does not support unreliable transmission, the Message | |||
| should be reliably transmitted. | should be reliably transmitted. | |||
| * msgCapacityProfile: When true, this expresses a wish to override | msgCapacityProfile: When true, this expresses a wish to override the | |||
| the Generic Connection Property connCapacityProfile for this | Generic Connection Property connCapacityProfile for this Message. | |||
| Message. Depending on the value, this can, for example, be | Depending on the value, this can, for example, be implemented by | |||
| implemented by changing the DSCP value of the associated packet | changing the Differentiated Services Code Point (DSCP) value of | |||
| (note that the guidelines in Section 6 of [RFC7657] apply; e.g., | the associated packet (note that the guidelines in Section 6 of | |||
| the DSCP value should not be changed for different packets within | [RFC7657] apply; for example, the DSCP value should not be changed | |||
| a reliable transport protocol session or DCCP connection). | for different packets within a reliable transport protocol session | |||
| or DCCP connection). | ||||
| * noFragmentation: Setting this avoids network-layer fragmentation. | noFragmentation: Setting this avoids network-layer fragmentation. | |||
| Messages exceeding the transport’s current estimate of its maximum | Messages exceeding the transport's current estimate of its maximum | |||
| packet size (the singularTransmissionMsgMaxLen Connection | packet size (the singularTransmissionMsgMaxLen Connection | |||
| Property) can result in transport segmentation when permitted, or | Property) can result in transport segmentation when permitted or | |||
| generate an error. When used with transports running over IP | generate an error. When used with transports running over IPv4, | |||
| version 4, the Don't Fragment bit should be set to avoid on-path | the Don't Fragment (DF) bit should be set to avoid on-path IP | |||
| IP fragmentation ([RFC8304]). | fragmentation [RFC8304]. | |||
| * noSegmentation: When set, this property limits the Message size to | noSegmentation: When set, this property limits the Message size to | |||
| the transport’s current estimate of its maximum packet size (the | the transport's current estimate of its maximum packet size (the | |||
| singularTransmissionMsgMaxLen Connection Property). Messages | singularTransmissionMsgMaxLen Connection Property). Messages | |||
| larger than this size generate an error. Setting this avoids | larger than this size generate an error. Setting this avoids | |||
| transport-layer segmentation and network-layer fragmentation. | transport-layer segmentation and network-layer fragmentation. | |||
| When used with transports running over IP version 4, the Don't | When used with transports running over IPv4, the DF bit should be | |||
| Fragment bit should be set to avoid on-path IP fragmentation | set to avoid on-path IP fragmentation ([RFC8304]). | |||
| ([RFC8304]). | ||||
| 5.1.2. Send Completion | 5.1.2. Send Completion | |||
| The application should be notified (using a Sent, Expired or | The application should be notified (using a Sent, Expired, or | |||
| SendError event) whenever a Message or partial Message has been | SendError event) whenever a Message or partial Message has been | |||
| consumed by the Protocol Stack, or has failed to send. The time at | consumed by the Protocol Stack or has failed to send. The time at | |||
| which a Message is considered to have been consumed by the Protocol | which a Message is considered to have been consumed by the Protocol | |||
| Stack may vary depending on the protocol. For example, for a basic | Stack may vary depending on the protocol. For example, for a basic | |||
| datagram protocol like UDP, this may correspond to the time when the | datagram protocol like UDP, this may correspond to the time when the | |||
| packet is sent into the interface driver. For a protocol that | packet is sent into the interface driver. For a protocol that | |||
| buffers data in queues, like TCP, this may correspond to when the | buffers data in queues, like TCP, this may correspond to when the | |||
| data has entered the send buffer. The time at which a Message failed | data has entered the send buffer. The time at which a Message failed | |||
| to send is when the Transport Services Implementation (including the | to send is when the Transport Services Implementation (including the | |||
| Protocol Stack) has experienced a failure related to sending; this | Protocol Stack) has experienced a failure related to sending; this | |||
| can depend on protocol-specific timeouts. | can depend on protocol-specific timeouts. | |||
| skipping to change at page 26, line 43 | skipping to change at line 1197 | |||
| switch between the application and the Transport Services System). | switch between the application and the Transport Services System). | |||
| To avoid this, the application can indicate a batch of Send actions | To avoid this, the application can indicate a batch of Send actions | |||
| through the API. When this is used, the implementation can defer the | through the API. When this is used, the implementation can defer the | |||
| processing of Messages until the batch is complete. | processing of Messages until the batch is complete. | |||
| 5.2. Receiving Messages | 5.2. Receiving Messages | |||
| Similar to sending, receiving a Message is determined by the top- | Similar to sending, receiving a Message is determined by the top- | |||
| level protocol in the established Protocol Stack. The main | level protocol in the established Protocol Stack. The main | |||
| difference with receiving is that the size and boundaries of the | difference with receiving is that the size and boundaries of the | |||
| Message are not known beforehand. The application can communicate in | Message are not known beforehand. The application can communicate | |||
| its Receive action the parameters for the Message, which can help the | the parameters for the Message in its Receive action, which can help | |||
| Transport Services Implementation know how much data to deliver and | the Transport Services Implementation know how much data to deliver | |||
| when. For example, if the application only wants to receive a | and when. For example, if the application only wants to receive a | |||
| complete Message, the implementation should wait until an entire | complete Message, the implementation should wait until an entire | |||
| Message (datagram, stream, or frame) is read before delivering any | Message (datagram, stream, or frame) is read before delivering any | |||
| Message content to the application. This requires the implementation | Message content to the application. This requires the implementation | |||
| to understand where Messages end, either via a supplied Message | to understand where Messages end, either via a supplied Message | |||
| Framer or because the top-level protocol in the established Protocol | Framer or because the top-level protocol in the established Protocol | |||
| Stack preserves message boundaries. The application can also control | Stack preserves message boundaries. The application can also control | |||
| the flow of received data by specifying the minimum and maximum | the flow of received data by specifying the minimum and maximum | |||
| number of bytes of Message content it wants to receive at one time. | number of bytes of Message content it wants to receive at one time. | |||
| If a Connection finishes before a requested Receive action can be | If a Connection finishes before a requested Receive action can be | |||
| satisfied, the Transport Services system should deliver any partial | satisfied, the Transport Services system should deliver any | |||
| Message content outstanding, or if none is available, an indication | outstanding partial Message content; if none is available, the system | |||
| that there will be no more received Messages. | should indicate that there will be no additional received Messages. | |||
| 5.3. Handling of data for fast-open protocols | 5.3. Handling of Data for Fast-Open Protocols | |||
| Several protocols allow sending higher-level protocol or application | Several protocols allow sending higher-level protocol or application | |||
| data during their protocol establishment, such as TCP Fast Open | data during their protocol establishment, such as TFO [RFC7413] and | |||
| [RFC7413] and TLS 1.3 [RFC8446]. This approach is referred to as | TLS 1.3 [RFC8446]. This approach is referred to as sending Zero-RTT | |||
| sending Zero-RTT (0-RTT) data. This is a desirable feature, but | (0-RTT) data. This is a desirable feature, but it poses challenges | |||
| poses challenges to an implementation that uses racing during | to an implementation that uses racing during connection | |||
| connection establishment. | establishment. | |||
| The application can express its preference for sending messagess as | The application can express its preference for sending messages as | |||
| 0-RTT data by using the zeroRttMsg Selection Property on the | 0-RTT data by using the zeroRttMsg Selection Property on the | |||
| Preconnection. Then, the application can provide the message to send | Preconnection. Then, the application can provide the message to send | |||
| as 0-RTT data via the InitiateWithSend action. In order to be sent | as 0-RTT data via the InitiateWithSend action. In order to be sent | |||
| as 0-RTT data, the message needs to be marked with the | as 0-RTT data, the message needs to be marked with the | |||
| safelyReplayable send paramteter. In general, 0-RTT data may be | safelyReplayable send parameter. In general, 0-RTT data may be | |||
| replayed (for example, if a TCP SYN contains data, and the SYN is | replayed (for example, if a TCP SYN contains data, and the SYN is | |||
| retransmitted, the data will be retransmitted as well but may be | retransmitted, the data will be retransmitted as well but may be | |||
| considered as a new connection instead of a retransmission). When | considered a new connection instead of a retransmission). When | |||
| racing connections, different leaf nodes have the opportunity to send | racing connections, different leaf nodes have the opportunity to send | |||
| the same data independently. If data is truly safely replayable, | the same data independently. If data is truly safely replayable, | |||
| this is permissible. | this is permissible. | |||
| Once the application has provided its 0-RTT data, a Transport | Once the application has provided its 0-RTT data, a Transport | |||
| Services Implementation should keep a copy of this data and provide | Services Implementation should keep a copy of this data and provide | |||
| it to each new leaf node that is started and for which a protocol | it to each new leaf node that is started and for which a protocol | |||
| instance supporting 0-RTT is being used. Note that the amount of | instance supporting 0-RTT is being used. Note that the amount of | |||
| data that can actually be sent as 0-RTT data varies by protocol, so | data that can actually be sent as 0-RTT data varies by protocol, so | |||
| any given Protocol Stack might only consume part of the saved data | any given Protocol Stack might only consume part of the saved data | |||
| prior to becoming established. The implementation needs to keep | prior to becoming established. The implementation needs to keep | |||
| track of how much data a particular Protocol Stack has consumed, and | track of how much data a particular Protocol Stack has consumed and | |||
| ensure that any pending 0-RTT-eligible data from the application is | ensure that any pending 0-RTT-eligible data from the application is | |||
| handled before subsequent Messages. | handled before subsequent Messages. | |||
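The bookkeeping described above can be sketched as follows. This is a minimal illustration, not part of the specification: the class name ZeroRttStore, its methods, and the leaf identifiers are all hypothetical. It keeps one copy of the application's 0-RTT data, records how much each leaf node's Protocol Stack has consumed, and reports what the established leaf still has to send before any subsequent Message.

```python
class ZeroRttStore:
    """Hypothetical helper: one saved copy of 0-RTT data shared by all leaves."""

    def __init__(self, data):
        self._data = bytes(data)   # copy kept for every leaf node that is started
        self._consumed = {}        # leaf identifier -> bytes already sent as 0-RTT

    def take(self, leaf, limit):
        """Hand a leaf up to `limit` bytes that it has not consumed yet."""
        start = self._consumed.get(leaf, 0)
        chunk = self._data[start:start + limit]
        self._consumed[leaf] = start + len(chunk)
        return chunk

    def remainder(self, leaf):
        """Bytes the given leaf must still send before later Messages."""
        return self._data[self._consumed.get(leaf, 0):]


store = ZeroRttStore(b"GET / HTTP/1.1")
first = store.take("leaf-a", 5)      # a protocol with a small 0-RTT budget
second = store.take("leaf-b", 100)   # a protocol that can carry everything
```

If "leaf-a" becomes established, store.remainder("leaf-a") yields the bytes that have to be handled ahead of any Message sent afterwards.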
| It is also possible for Protocol Stacks within a particular leaf node | It is also possible for Protocol Stacks within a particular leaf node | |||
| to use a 0-RTT handshakes in a lower-level protocol without any | to use a 0-RTT handshake in a lower-level protocol without any safely | |||
| safely replayable application data if a higher-level protocol in the | replayable application data if a higher-level protocol in the stack | |||
| stack has idempotent handshake data to send. For example, TCP Fast | has idempotent handshake data to send. For example, TFO could use a | |||
| Open could use a Client Hello from TLS as its 0-RTT data, without any | Client Hello from TLS as its 0-RTT data without any data being | |||
| data being provided by the application. | provided by the application. | |||
| 0-RTT handshakes often rely on previous state, such as TCP Fast Open | 0-RTT handshakes often rely on previous state, such as TFO cookies, | |||
| cookies, previously established TLS tickets, or out-of-band | previously established TLS tickets, or out-of-band distributed pre- | |||
| distributed pre-shared keys (PSKs). Implementations should be aware | shared keys (PSKs). Implementations should be aware of security | |||
| of security concerns around using these tokens across multiple | concerns around using these tokens across multiple addresses or paths | |||
| addresses or paths when racing. In the case of TLS, any given ticket | when racing. In the case of TLS, any given ticket or PSK should only | |||
| or PSK should only be used on one leaf node, since servers will | be used on one leaf node, since servers will likely reject duplicate | |||
| likely reject duplicate tickets in order to prevent replays (see | tickets in order to prevent replays (see Section 8.1 of [RFC8446]). | |||
| Section 8.1 of [RFC8446]). If implementations have multiple tickets | If implementations have multiple tickets available from a previous | |||
| available from a previous connection, each leaf node attempt can use | connection, each leaf node attempt can use a different ticket. In | |||
| a different ticket. In effect, each leaf node will send the same | effect, each leaf node will send the same early application data, but | |||
| early application data, yet encoded (encrypted) differently on the | the data will be encoded (encrypted) differently on the wire. | |||
| wire. | ||||
| 6. Implementing Message Framers | 6. Implementing Message Framers | |||
| Message Framers are functions that define simple transformations | Message Framers are functions that define simple transformations | |||
| between application Message data and raw transport protocol data. | between application Message data and raw transport protocol data. | |||
| Generally, a Message Framer implements a simple application protocol | Generally, a Message Framer implements a simple application protocol | |||
| that can either be provided by the Transport Services implementation | that can be provided either by the Transport Services implementation | |||
| or by the application. It is optional for Transport Services system | or by the application. It is optional for Transport Services system | |||
| implementations to provide Message Framers: the specification | implementations to provide Message Framers: the API specification | |||
| [I-D.ietf-taps-interface] does not prescribe any particular Message | [RFC9622] does not prescribe any particular Message Framers to be | |||
| Framers to be implemented. A Framer can encapsulate or encode | implemented. A Framer can encapsulate or encode outbound Messages, | |||
| outbound Messages, decapsulate or decode inbound data into Messages, | decapsulate or decode inbound data into Messages, and implement parts | |||
| and implement parts of protocols that do not directly map to | of protocols that do not directly map to application Messages (such | |||
| application Messages (such as protocol handshakes or preludes before | as protocol handshakes or preludes before Message exchange). | |||
| Message exchange). | ||||
| While many protocols can be represented as Message Framers, for the | While many protocols can be represented as Message Framers, for the | |||
| purposes of the Transport Services API, these are ways for | purposes of the Transport Services API, these are ways for | |||
| applications or application frameworks to define their own Message | applications or application frameworks to define their own Message | |||
| parsing to be included within a Connection's Protocol Stack. As an | parsing to be included within a Connection's Protocol Stack. As an | |||
| example, TLS is a protocol that is by default built into the | example, TLS is a protocol that is by default built into the | |||
| Transport Services API, even though it could also serve the purpose | Transport Services API, even though it could also serve the purpose | |||
| of framing data over TCP. | of framing data over TCP. | |||
| Most Message Framers fall into one of two categories: | Most Message Framers fall into one of two categories: | |||
| * Header-prefixed record formats, such as a basic Type-Length-Value | * Header-prefixed record formats, such as a basic Type-Length-Value | |||
| (TLV) structure | (TLV) structure | |||
| * Delimiter-separated formats, such as HTTP/1.1 | * Delimiter-separated formats, such as HTTP/1.1 | |||
| Common Message Framers can be provided by a Transport Services | Common Message Framers can be provided by a Transport Services | |||
| Implementation, but an implementation ought to allow custom Message | Implementation, but an implementation ought to allow custom Message | |||
| Framers to be defined by the application or some other piece of | Framers to be defined by the application or some other piece of | |||
| software. This section describes one possible API for defining | software. This section describes one possible API for defining | |||
| Message Framers, as an example. | Message Framers as an example. | |||
| 6.1. Defining Message Framers | 6.1. Defining Message Framers | |||
| A Message Framer is primarily defined by the code that handles events | A Message Framer is primarily defined by the code that handles events | |||
| for a framer implementation, specifically how it handles inbound and | for a framer implementation, specifically how it handles inbound and | |||
| outbound data parsing. The function that implements custom framing | outbound data parsing. The function that implements custom framing | |||
| logic will be referred to as the "framer implementation", which may | logic will be referred to as the "framer implementation", which may | |||
| be provided by a Transport Services implementation or the application | be provided by a Transport Services implementation or the application | |||
| itself. The Message Framer refers to the object or function within | itself. The Message Framer holds a reference to the object or | |||
| the main Connection implementation that delivers events to the custom | function within the main Connection implementation that delivers | |||
| framer implementation whenever data is ready to be parsed or framed. | events to the custom framer implementation whenever data is ready to | |||
| be parsed or framed. | ||||
| The API examples in this section use the notation conventions for the | The API examples in this section use the notation conventions for the | |||
| Transport Services API defined in Section 1.1 of | Transport Services API defined in Section 1.1 of [RFC9622]. | |||
| [I-D.ietf-taps-interface]. | ||||
| The Transport Services Implementation needs to ensure that all of the | The Transport Services Implementation needs to ensure that all of the | |||
| events and actions taken on a Message Framer are synchronized to | events and actions taken on a Message Framer are synchronized to | |||
| ensure consistent behavior. For example, some of the actions defined | ensure consistent behavior. For example, some of the actions defined | |||
| below (such as PrependFramer and StartPassthrough) modify how data | below (such as PrependFramer and StartPassthrough) modify how data | |||
| flows in a protocol stack, and require synchronization with sending | flows in a Protocol Stack and require synchronization with sending | |||
| and parsing data in the Message Framer. | and parsing data in the Message Framer. | |||
| When a Connection establishment attempt begins, an event can be | When a Connection establishment attempt begins, an event can be | |||
| delivered to notify the framer implementation that a new Connection | delivered to notify the framer implementation that a new Connection | |||
| is being created. Similarly, a stop event can be delivered when a | is being created. Similarly, a Stop event can be delivered when a | |||
| Connection is being torn down. The framer implementation can use the | Connection is being torn down. The framer implementation can use the | |||
| Connection object to look up specific properties of the Connection or | Connection object to look up specific properties of the Connection or | |||
| the network being used that may influence how to frame Messages. | the network being used that may influence how to frame Messages. | |||
| MessageFramer -> Start<connection> | MessageFramer -> Start<connection> | |||
| MessageFramer -> Stop<connection> | MessageFramer -> Stop<connection> | |||
| When a Message Framer generates a Start event, the framer | When a Message Framer generates a Start event, the framer | |||
| implementation has the opportunity to start writing some data prior | implementation has the opportunity to start writing some data prior | |||
| to the Connection delivering its Ready event. This allows the | to the Connection delivering its Ready event. This allows the | |||
| implementation to communicate control data to the Remote Endpoint | implementation to communicate control data to the Remote Endpoint | |||
| that can be used to parse Messages. | that can be used to parse Messages. | |||
| Once the framer implementation has completed its setup or handshake, | Once the framer implementation has completed its setup or handshake, | |||
| it can indicate to the application that it is ready for handling data | it can indicate to the application that it is ready for handling data | |||
| with this call. | with this call. | |||
| MessageFramer.MakeConnectionReady(connection) | MessageFramer.MakeConnectionReady(connection) | |||
| Similarly, when a Message Framer generates a Stop event, the framer | Similarly, when a Message Framer generates a Stop event, the framer | |||
| implementation has the opportunity to write some final data or clear | implementation has the opportunity to write some final data or clear | |||
| up its local state before the Closed event is delivered to the | up its local state before the Closed event is delivered to the | |||
| Application. The framer implementation can indicate that it has | application. The framer implementation can indicate that it has | |||
| finished with this call. | finished with this call. | |||
| MessageFramer.MakeConnectionClosed(connection) | MessageFramer.MakeConnectionClosed(connection) | |||
| At any time if the implementation encounters a fatal error, it can | If the implementation encounters a fatal error at any time, it can | |||
| also cause the Connection to fail and provide an error. | also cause the Connection to fail and provide an error. | |||
| MessageFramer.FailConnection(connection, error) | MessageFramer.FailConnection(connection, error) | |||
| Should the framer implementation deem the candidate selected during | Should the framer implementation deem the candidate selected during | |||
| racing unsuitable, it can signal this to the Transport Services API | racing unsuitable, it can signal this to the Transport Services API | |||
| by failing the Connection prior to marking it as ready. If there are | by failing the Connection prior to marking it as ready. If there are | |||
| no other candidates available, the Connection will fail. Otherwise, | no other candidates available, the Connection will fail. Otherwise, | |||
| the Connection will select a different candidate and the Message | the Connection will select a different candidate and the Message | |||
| Framer will generate a new Start event. | Framer will generate a new Start event. | |||
| skipping to change at page 30, line 35 ¶ | skipping to change at line 1377 ¶ | |||
| dynamically add a protocol or framer above it in the stack. This | dynamically add a protocol or framer above it in the stack. This | |||
| allows protocols that need to add TLS conditionally, like STARTTLS | allows protocols that need to add TLS conditionally, like STARTTLS | |||
| [RFC3207], to modify the Protocol Stack based on a handshake result. | [RFC3207], to modify the Protocol Stack based on a handshake result. | |||
| otherFramer := NewMessageFramer() | otherFramer := NewMessageFramer() | |||
| MessageFramer.PrependFramer(connection, otherFramer) | MessageFramer.PrependFramer(connection, otherFramer) | |||
| A Message Framer might also choose to go into a passthrough mode once | A Message Framer might also choose to go into a passthrough mode once | |||
| an initial exchange or handshake has been completed, such as the | an initial exchange or handshake has been completed, such as the | |||
| STARTTLS case mentioned above. This can also be useful for proxy | STARTTLS case mentioned above. This can also be useful for proxy | |||
| protocols like SOCKS [RFC1928] or HTTP CONNECT [RFC7230]. In such | protocols like SOCKS [RFC1928] or HTTP CONNECT [RFC9110]. In such | |||
| cases, a Message Framer implementation can intercept sending and | cases, a Message Framer implementation can initially intercept | |||
| receiving of Messages at first, but then indicate that no more | Messages being sent and received and subsequently indicate that no | |||
| processing is needed. | further processing is needed. | |||
| MessageFramer.StartPassthrough() | MessageFramer.StartPassthrough() | |||
| 6.2. Sender-side Message Framing | 6.2. Sender-Side Message Framing | |||
| Message Framers generate an event whenever a Connection sends a new | Message Framers generate an event whenever a Connection sends a new | |||
| Message. The parameters to the event align with the Send action in | Message. The parameters to the event align with the Send action in | |||
| the API (Section 9.2 of [I-D.ietf-taps-interface]). | the API (Section 9.2 of [RFC9622]). | |||
| MessageFramer | MessageFramer | |||
| | | | | |||
| V | V | |||
| NewSentMessage<connection, messageData, messageContext, endOfMessage> | NewSentMessage<connection, messageData, messageContext, endOfMessage> | |||
| Upon receiving this event, a framer implementation is responsible for | Upon receiving this event, a framer implementation is responsible for | |||
| performing any necessary transformations and sending the resulting | performing any necessary transformations and sending the resulting | |||
| data back to the Message Framer, which will in turn send it to the | data back to the Message Framer, which, in turn, will send it to the | |||
| next protocol. To improve performance, implementations should ensure | next protocol. To improve performance, implementations should ensure | |||
| that there is a way to pass the original data through without | that there is a way to pass the original data through without | |||
| copying. | copying. | |||
| MessageFramer.Send(connection, messageData) | MessageFramer.Send(connection, messageData) | |||
| To provide an example, a simple protocol that adds the length of the | To provide an example, a simple protocol that adds the length of the | |||
| Message data as a header would receive the NewSentMessage event, | Message data as a header would receive the NewSentMessage event, | |||
| create a data representation of the length of the Message data, and | create a data representation of the length of the Message data, and | |||
| then send a block of data that is the concatenation of the length | then send a block of data that is the concatenation of the length | |||
| header and the original Message data. | header and the original Message data. | |||
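The length-header example above could look roughly like the following sketch. The class LengthPrefixFramer, the 4-byte big-endian header, and the send callback (standing in for MessageFramer.Send) are assumptions made for illustration; the specification does not prescribe any of them.

```python
import struct

HEADER = struct.Struct("!I")  # assumed header: 4-byte big-endian length


class LengthPrefixFramer:
    """Hypothetical framer implementation for the sender side."""

    def __init__(self, send):
        # `send` stands in for MessageFramer.Send(connection, messageData).
        self._send = send

    def new_sent_message(self, connection, message_data,
                         message_context=None, end_of_message=True):
        # Handler for the NewSentMessage event: build the length header and
        # pass header plus original Message data on to the next protocol.
        self._send(connection, HEADER.pack(len(message_data)) + message_data)


sent = []
framer = LengthPrefixFramer(lambda connection, data: sent.append(data))
framer.new_sent_message("conn-1", b"hello")
```

Concatenating the header and the body copies the Message data. An implementation that wants to avoid the copy could pass the header and the original buffer to the next protocol separately.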
| 6.3. Receiver-side Message Framing | 6.3. Receiver-Side Message Framing | |||
| In order to parse a received flow of data into Messages, the Message | In order to parse a received flow of data into Messages, the Message | |||
| Framer notifies the framer implementation whenever new data is | Framer notifies the framer implementation whenever new data is | |||
| available to parse. | available to parse. | |||
| The parameters to the events and calls for receiving data with a | The parameters to the events and calls for receiving data with a | |||
| framer align with the Receive action in the API (Section 9.3 of | framer align with the Receive action in the API (Section 9.3 of | |||
| [I-D.ietf-taps-interface]). | [RFC9622]). | |||
| MessageFramer -> HandleReceivedData<connection> | MessageFramer -> HandleReceivedData<connection> | |||
| Upon receiving this event, the framer implementation can inspect the | Upon receiving this event, the framer implementation can inspect the | |||
| inbound data. The data is parsed from a particular cursor | inbound data. The data is parsed from a particular cursor | |||
| representing the unprocessed data. The application requests a | representing the unprocessed data. The application requests a | |||
| specific amount of data it needs to have available in order to parse. | specific amount of data it needs to have available in order to parse. | |||
| If the data is not available, the parse fails. | If the data is not available, the parse fails. | |||
| MessageFramer.Parse(connection, minimumIncompleteLength, maximumLength) | MessageFramer.Parse(connection, minimumIncompleteLength, maximumLength) | |||
| | | | | |||
| V | V | |||
| (messageData, messageContext, endOfMessage) | (messageData, messageContext, endOfMessage) | |||
| The framer implementation can directly advance the receive cursor | The framer implementation can directly advance the receive cursor | |||
| once it has parsed data to effectively discard data (for example, | once it has parsed data to effectively discard data (for example, | |||
| discard a header once the content has been parsed). | discard a header once the content has been parsed). | |||
| To deliver a Message to the application, the framer implementation | To deliver a Message to the application, the framer implementation | |||
| can either directly deliver data that it has allocated, or deliver a | can either directly deliver data that it has allocated or deliver a | |||
| range of data directly from the underlying transport and | range of data directly from the underlying transport and | |||
| simultaneously advance the receive cursor. | simultaneously advance the receive cursor. | |||
| MessageFramer.AdvanceReceiveCursor(connection, length) | MessageFramer.AdvanceReceiveCursor(connection, length) | |||
| MessageFramer.DeliverAndAdvanceReceiveCursor(connection, messageContext, length, endOfMessage) | MessageFramer.DeliverAndAdvanceReceiveCursor(connection, messageContext, | |||
| MessageFramer.Deliver(connection, messageContext, messageData, endOfMessage) | length, endOfMessage) | |||
| MessageFramer.Deliver(connection, messageContext, messageData, | ||||
| endOfMessage) | ||||
| Note that MessageFramer.DeliverAndAdvanceReceiveCursor allows the | Note that MessageFramer.DeliverAndAdvanceReceiveCursor allows the | |||
| framer implementation to earmark bytes as part of a Message even | framer implementation to earmark bytes as part of a Message even | |||
| before they are received by the transport. This allows the delivery | before they are received by the transport. This allows the delivery | |||
| of very large Messages without requiring the implementation to | of very large Messages without requiring the implementation to | |||
| directly inspect all of the bytes. | directly inspect all of the bytes. | |||
| To provide an example, a simple protocol that parses the length of | To provide an example, a simple protocol that parses the length of | |||
| the Message data as a header value would receive the | the Message data as a header value would receive the | |||
| HandleReceivedData event, and call Parse with a minimum and maximum | HandleReceivedData event and call Parse with a minimum and maximum | |||
| set to the length of the header field. Once the parse succeeded, it | set to the length of the header field. Once the parse succeeded, it | |||
| would call AdvanceReceiveCursor with the length of the header field, | would call AdvanceReceiveCursor with the length of the header field | |||
| and then call DeliverAndAdvanceReceiveCursor with the length of the | and then call DeliverAndAdvanceReceiveCursor with the length of the | |||
| body that was parsed from the header, marking the new Message as | body that was parsed from the header, marking the new Message as | |||
| complete. | complete. | |||
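The receive-side steps above can be sketched with a toy model of the receive cursor. ToyMessageFramer and handle_received_data are hypothetical names, and the 4-byte big-endian header is an assumption. Unlike the DeliverAndAdvanceReceiveCursor call described in this section, this toy cannot earmark bytes that have not arrived yet, so the handler waits until the whole body is buffered.

```python
import struct

HEADER = struct.Struct("!I")  # assumed header: 4-byte big-endian length


class ToyMessageFramer:
    """Stand-in for the Message Framer; only the receive cursor is modeled."""

    def __init__(self):
        self.buffer = bytearray()  # unprocessed inbound bytes, starting at the cursor
        self.delivered = []        # (messageData, endOfMessage) handed to the application

    def parse(self, minimum_incomplete_length, maximum_length):
        # The parse fails (returns None) if too little data is available.
        if len(self.buffer) < minimum_incomplete_length:
            return None
        return bytes(self.buffer[:maximum_length])

    def advance_receive_cursor(self, length):
        del self.buffer[:length]

    def deliver_and_advance_receive_cursor(self, length, end_of_message):
        self.delivered.append((bytes(self.buffer[:length]), end_of_message))
        del self.buffer[:length]


def handle_received_data(framer):
    """Handler for the HandleReceivedData event."""
    while True:
        header = framer.parse(HEADER.size, HEADER.size)
        if header is None:
            return  # header incomplete; wait for the next event
        (body_length,) = HEADER.unpack(header)
        if len(framer.buffer) < HEADER.size + body_length:
            return  # toy simplification: wait for the complete body
        framer.advance_receive_cursor(HEADER.size)  # discard the parsed header
        framer.deliver_and_advance_receive_cursor(body_length, end_of_message=True)


framer = ToyMessageFramer()
framer.buffer += b"\x00\x00\x00\x02hi\x00\x00\x00\x03ab"
handle_received_data(framer)   # delivers b"hi"; the second Message is incomplete
framer.buffer += b"c"
handle_received_data(framer)   # delivers b"abc"
```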
| 7. Implementing Connection Management | 7. Implementing Connection Management | |||
| Once a Connection is established, the Transport Services API allows | Once a Connection is established, the Transport Services API allows | |||
| applications to interact with the Connection by modifying or | applications to interact with the Connection by modifying or | |||
| inspecting Connection Properties. A Connection can also generate | inspecting Connection Properties. A Connection can also generate | |||
| error events in the form of SoftError events. | error events in the form of SoftError events. | |||
| The set of Connection Properties that are supported for setting and | The set of Connection Properties that are supported for setting and | |||
| getting on a Connection are described in [I-D.ietf-taps-interface]. | getting on a Connection are described in [RFC9622]. For any | |||
| For any properties that are generic, and thus could apply to all | properties that are generic and, thus, could apply to all protocols | |||
| protocols being used by a Connection, the Transport Services | being used by a Connection, the Transport Services Implementation | |||
| Implementation should store the properties in storage common to all | should store the properties in storage common to all protocols and | |||
| protocols, and notify the Protocol Stack as a whole whenever the | notify the Protocol Stack as a whole whenever the properties have | |||
| properties have been modified by the application. [RFC8303] and | been modified by the application. [RFC8303] and [RFC8304] offer | |||
| [RFC8304] offer guidance on how to do this for TCP, MPTCP, SCTP, UDP | guidance on how to do this for TCP, Multipath TCP (MPTCP), SCTP, UDP, | |||
| and UDP-Lite; see Section 10 for a description of a back-tracking | and UDP-Lite; see Section 10 for a description of a backtracking | |||
| method to find the relevant protocol primitives using these | method to find the relevant protocol primitives using these | |||
| documents. For Protocol-specific Properties, such as the User | documents. For Protocol-specific Properties, such as the User | |||
| Timeout that applies to TCP, the Transport Services Implementation | Timeout that applies to TCP, the Transport Services Implementation | |||
| only needs to update the relevant protocol instance. | only needs to update the relevant protocol instance. | |||
| Some Connection Properties might apply to multiple protocols within a | Some Connection Properties might apply to multiple protocols within a | |||
| Protocol Stack. Depending on the specific property, it might be | Protocol Stack. Depending on the specific property, it might be | |||
| appropriate to apply the property across multiple protocols | appropriate to apply the property across multiple protocols | |||
| simultaneously, or else only apply it to one protocol. In general, | simultaneously or only apply it to one protocol. In general, the | |||
| the Transport Services Implementation should allow the protocol | Transport Services Implementation should allow the protocol closest | |||
| closest to the application to interpret Connection Properties, and | to the application to interpret Connection Properties and, | |||
| potentially modify the set of Connection Properties passed down to | potentially, modify the set of Connection Properties passed down to | |||
| the next protocol in the stack. For example, if the application has | the next protocol in the stack. For example, if the application has | |||
| requested to use keepalives with the keepAlive property, and the | requested to use keep-alives with the keepAlive property, and the | |||
| Protocol Stack contains both HTTP/2 and TCP, the HTTP/2 protocol can | Protocol Stack contains both HTTP/2 and TCP, the HTTP/2 protocol can | |||
| choose to enable its own keepalives to satisfy the application | choose to enable its own keep-alives to satisfy the application | |||
| request, and disable TCP-level keepalives. For cases where the | request and disable TCP-level keep-alives. For cases where the | |||
| application needs to have fine-grained per-protocol control, the | application needs to have fine-grained per-protocol control, the | |||
| Transport Services Implementation can expose Protocol-specific | Transport Services Implementation can expose Protocol-specific | |||
| Properties. | Properties. | |||
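One way to picture the top-down interpretation of a property is the sketch below. The function apply_keep_alive, the protocol labels, and the set of keep-alive-capable protocols are assumptions made for this illustration. Each protocol, starting with the one closest to the application, may act on the property and then modify what is passed further down.

```python
KEEP_ALIVE_CAPABLE = {"http2", "tcp"}  # assumption for this sketch


def apply_keep_alive(stack, properties):
    """Decide, top-down, which protocol handles the keepAlive property.

    `stack` lists protocol labels from the one closest to the application
    down to the lowest layer.
    """
    remaining = dict(properties)
    decisions = {}
    for protocol in stack:
        wanted = bool(remaining.get("keepAlive"))
        handles = wanted and protocol in KEEP_ALIVE_CAPABLE
        decisions[protocol] = handles
        if handles:
            # The request is satisfied here, so lower layers see it disabled.
            remaining["keepAlive"] = False
    return decisions


with_http2 = apply_keep_alive(["http2", "tcp"], {"keepAlive": True})
without_http2 = apply_keep_alive(["tls", "tcp"], {"keepAlive": True})
```

With HTTP/2 in the stack, the HTTP/2 layer takes over and TCP-level keep-alives stay off. Without it, the request falls through to TCP.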
| If an error is encountered in setting a property (for example, if the | If an error is encountered in setting a property (for example, if the | |||
| application tries to set a TCP-specific property on a Connection that | application tries to set a TCP-specific property on a Connection that | |||
| is not using TCP), the action must fail gracefully. The application | is not using TCP), the action must fail gracefully. The application | |||
| must be informed of the error, but the Connection itself must not be | must be informed of the error but the Connection itself must not be | |||
| terminated. | terminated. | |||
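A minimal sketch of this graceful failure follows. The Connection class, the PropertyError exception, and the convention that a "protocol." prefix marks a Protocol-specific Property are all assumptions made for the example, and the property names are illustrative. The point it shows is that the error is reported to the caller while the Connection stays usable.

```python
class PropertyError(Exception):
    """Hypothetical error reported to the application."""


class Connection:
    """Toy Connection that only models property storage."""

    def __init__(self, protocols):
        self.protocols = set(protocols)
        self.properties = {}
        self.open = True

    def set_property(self, name, value):
        # Assumed convention: "tcp.something" is specific to the "tcp" protocol.
        protocol, dot, _ = name.partition(".")
        if dot and protocol not in self.protocols:
            # Report the problem, but leave the Connection untouched.
            raise PropertyError(f"{name} does not apply to this Connection")
        self.properties[name] = value


conn = Connection({"udp"})
conn.set_property("connPriority", 50)
try:
    conn.set_property("tcp.userTimeoutValue", 30)
    error = None
except PropertyError as exc:
    error = exc
```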
| When protocol instances in the Protocol Stack report generic or | When protocol instances in the Protocol Stack report generic or | |||
| protocol-specific errors, the API will deliver them to the | protocol-specific errors, the API will deliver them to the | |||
| application as SoftError events. These allow the application to be | application as SoftError events. These allow the application to be | |||
| informed of ICMP errors, and other similar events. | informed of ICMP errors and other similar events. | |||
| 7.1. Pooled Connection | 7.1. Pooled Connection | |||
| For applications that do not need in-order delivery of Messages, the | For applications that do not need in-order delivery of Messages, the | |||
| Transport Services Implementation may distribute Messages of a single | Transport Services Implementation may distribute Messages of a single | |||
| Connection across several underlying transport connections or | Connection across several underlying transport connections or | |||
| multiple streams of multi-streaming connections between endpoints, as | multiple streams of multistreaming connections between endpoints, as | |||
| long as all of these satisfy the Selection Properties. The Transport | long as all of these satisfy the Selection Properties. The Transport | |||
| Services Implementation will then hide this connection management and | Services Implementation will then hide this connection management and | |||
| only expose a single Connection object, which we here call a "Pooled | only expose a single Connection object, which we call a "Pooled | |||
| Connection". This is in contrast to Connection Groups, which | Connection". This is in contrast to Connection Groups, which | |||
| explicitly expose combined treatment of Connections, giving the | explicitly expose combined treatment of Connections, giving the | |||
| application control over multiplexing, for example. | application control over multiplexing, for example. | |||
| Pooled Connections can be useful when the application using the | Pooled Connections can be useful when the application using the | |||
| Transport Services system implements a protocol such as HTTP, which | Transport Services system implements a protocol such as HTTP, which | |||
| employs request/response pairs and does not require in-order delivery | employs request/response pairs and does not require in-order delivery | |||
| of responses. This enables implementations of Transport Services | of responses. This enables implementations of Transport Services | |||
| systems to realize transparent connection coalescing, connection | systems to realize transparent connection coalescing and connection | |||
| migration, and to perform per-message endpoint and path selection by | migration and to perform per-message endpoint and path selection by | |||
| choosing among multiple underlying connections. | choosing among multiple underlying connections. | |||
| 7.2. Handling Path Changes | 7.2. Handling Path Changes | |||
| When a path change occurs, e.g., when the IP address of an interface | When a path change occurs, e.g., when the IP address of an interface | |||
| changes or a new interface becomes available, the Transport Services | changes or a new interface becomes available, the Transport Services | |||
| Implementation is responsible for notifying the Protocol Instance of | Implementation is responsible for notifying the Protocol Instance of | |||
| the change. The path change may interrupt connectivity on a path for | the change. The path change may interrupt connectivity on a path for | |||
| an active Connection or provide an opportunity for a transport that | an active Connection or provide an opportunity for a transport that | |||
| supports multipath or migration to adapt to the new paths. Note | supports multipath or migration to adapt to the new paths. Note | |||
| that, in the model of the Transport Services API, migration is | that, in the model of the Transport Services API, migration is | |||
| considered a part of multipath connectivity; it is just a limiting | considered a part of multipath connectivity; it is just a limiting | |||
| policy on multipath usage. If the multipath Selection Property is | policy on multipath usage. If the multipath Selection Property is | |||
| set to Disabled, migration is disallowed. | set to Disabled, migration is disallowed. | |||
| For protocols that do not support multipath or migration, the | For protocols that do not support multipath or migration, the | |||
| Protocol Instances should be informed of the path change, but should | Protocol Instances should be informed of the path change but should | |||
| not be forcibly disconnected if the previously used path becomes | not be forcibly disconnected if the previously used path becomes | |||
| unavailable. There are many common usage scenarios that can lead to | unavailable. There are many common usage scenarios that can lead to | |||
| a path becoming temporarily unavailable, and then recovering before | a path becoming temporarily unavailable and then recovering before | |||
| the transport protocol reaches a timeout error. These are | the transport protocol reaches a timeout error. These are | |||
| particularly common using mobile devices. Examples include: an | particularly common using mobile devices. Examples include: | |||
| Ethernet cable becoming unplugged and then plugged back in; a device | ||||
| losing a Wi-Fi signal while a user is in an elevator, and reattaching | * an Ethernet cable becoming unplugged and then plugged back in; | |||
| when the user leaves the elevator; and a user losing the radio signal | ||||
| while riding a train through a tunnel. If the device is able to | * a device losing a Wi-Fi signal while a user is in an elevator and | |||
| rejoin a network with the same IP address, a stateful transport | reattaching when the user leaves the elevator; and | |||
| connection can generally resume. Thus, while it is useful for a | ||||
| Protocol Instance to be aware of a temporary loss of connectivity, | * a user losing the radio signal while riding a train through a | |||
| the Transport Services Implementation should not aggressively close | tunnel. | |||
| Connections in these scenarios. | ||||
| If the device is able to rejoin a network with the same IP address, a | ||||
| stateful transport connection can generally resume. Thus, while it | ||||
| is useful for a Protocol Instance to be aware of a temporary loss of | ||||
| connectivity, the Transport Services Implementation should not | ||||
| aggressively close Connections in these scenarios. | ||||
| If the Protocol Stack includes a transport protocol that supports | If the Protocol Stack includes a transport protocol that supports | |||
| multipath connectivity, the Transport Services Implementation should | multipath connectivity, the Transport Services Implementation should | |||
| also inform the Protocol Instance about potentially new paths that | also inform the Protocol Instance about potentially new paths that | |||
| become permissible based on the multipath Selection Property and the | become permissible based on the multipath Selection Property and the | |||
| multipathPolicy Connection Property choices made by the application. | multipathPolicy Connection Property choices made by the application. | |||
| A protocol can then establish new subflows over new paths while an | A protocol can then establish new subflows over new paths while an | |||
| active path is still available or, if migration is supported, also | active path is still available or after a break has been detected, | |||
| after a break has been detected, and should attempt to tear down | and it should attempt to tear down subflows over paths that are no | |||
| subflows over paths that are no longer used. The Connection Property | longer used. The Connection Property multipathPolicy of the | |||
| multipathPolicy of the Transport Services API allows an application | Transport Services API allows an application to indicate when and how | |||
| to indicate when and how different paths should be used. However, | different paths should be used. However, detailed handling of these | |||
| detailed handling of these policies is implementation-specific. For | policies is implementation specific. For example, if the multipath | |||
| example, if the multipath Selection Property is set to active, the | Selection Property is set to Active, the decision about when to | |||
| decision about when to create a new path or to announce a new path or | create a new path or to announce a new path or set of paths to the | |||
| set of paths to the Remote Endpoint, e.g., in the form of additional | Remote Endpoint, e.g., in the form of additional IP addresses, is | |||
| IP addresses, is implementation-specific. If the Protocol Stack | implementation specific. If the Protocol Stack includes a transport | |||
| includes a transport protocol that does not support multipath, but | protocol that does not support multipath but does support migrating | |||
| does support migrating between paths, the update to the set of | between paths, the update to the set of available paths can trigger | |||
| available paths can trigger the connection to be migrated. | the connection to be migrated. | |||
| In the case of a Pooled Connection Section 7.1, the Transport | In the case of a Pooled Connection (Section 7.1), the Transport | |||
| Services Implementation may add connections over new paths to the | Services Implementation may add connections over new paths to the | |||
| pool if permissible based on the multipath policy and Selection | pool if permissible based on the multipathPolicy and Selection | |||
| Properties. In the case that a previously used path becomes | Properties. If a previously used path becomes unavailable, the | |||
| unavailable, the Transport Services system may disconnect all | Transport Services system may disconnect all connections that require | |||
| connections that require this path, but should not disconnect the | this path, but it should not disconnect the pooled Connection object | |||
| pooled Connection object exposed to the application. The strategy to | exposed to the application. The strategy to do so is implementation | |||
| do so is implementation-specific, but should be consistent with the | specific, but it should be consistent with the behavior of multipath | |||
| behavior of multipath transports. | transports. | |||
| 8. Implementing Connection Termination | 8. Implementing Connection Termination | |||
| For Close (which leads to a Closed event) and Abort (which leads to a | For Close (which leads to a Closed event) and Abort (which leads to a | |||
| ConnectionError event), the application might find it useful to be | ConnectionError event), the application might find it useful to be | |||
| informed when a peer closes or aborts a Connection. Whether this is | informed when a peer closes or aborts a Connection. Whether this is | |||
| possible depends on the underlying protocol, and no guarantees can be | possible depends on the underlying protocol, and no guarantees can be | |||
| given. When an underlying transport connection supports multi- | given. When an underlying transport connection supports | |||
| streaming (such as SCTP), the Transport Services system can use a | multistreaming (such as SCTP), the Transport Services system can use | |||
| stream reset procedure to cause a Finish event upon a Close action | a stream reset procedure to cause a Finish event upon a Close action | |||
| from the peer [NEAT-flow-mapping]. | from the peer [NEAT-flow-mapping]. | |||
| 9. Cached State | 9. Cached State | |||
| Beyond a single Connection's lifetime, it is useful for an | Beyond a single Connection's lifetime, it is useful for an | |||
| implementation to keep state and history. This cached state can help | implementation to keep state and history. This cached state can help | |||
| improve future Connection establishment due to re-using results and | improve future Connection establishment due to reusing results and | |||
| credentials, and favoring paths and protocols that performed well in | credentials and favoring paths and protocols that performed well in | |||
| the past. | the past. | |||
| Cached state may be associated with different endpoints for the same | Cached state may be associated with different endpoints for the same | |||
| Connection, depending on the protocol generating the cached content. | Connection, depending on the protocol generating the cached content. | |||
| For example, session tickets for TLS are associated with specific | For example, session tickets for TLS are associated with specific | |||
| endpoints, and thus should be cached based on a connection's hostname | endpoints; thus, they should be cached based on a connection's | |||
| Endpoint Identifer (if applicable). However, performance | hostname Endpoint Identifier (if applicable). However, performance | |||
| characteristics of a path are more likely tied to the IP address and | characteristics of a path are more likely tied to the IP address and | |||
| subnet being used. | subnet being used. | |||
| 9.1. Protocol state caches | 9.1. Protocol State Caches | |||
| Some protocols will have long-term state to be cached in association | Some protocols will have long-term state to be cached in association | |||
| with endpoints. This state often has some time after which it is | with endpoints. This state often has some time after which it is | |||
| expired, so the implementation should allow each protocol to specify | expired, so the implementation should allow each protocol to specify | |||
| an expiration for cached content. | an expiration for cached content. | |||
| Examples of cached protocol state include: | Examples of cached protocol state include: | |||
| * The DNS protocol can cache resolved addresses (such as those | * The DNS protocol can cache resolved addresses (such as those | |||
| retrieved from A and AAAA queries), associated with a Time To Live | retrieved from A and AAAA queries) associated with a Time To Live | |||
| (TTL) to be used for future hostname resolutions without requiring | (TTL) to be used for future hostname resolutions without requiring | |||
| asking the DNS resolver again. | asking the DNS resolver again. | |||
| * TLS caches session state and tickets based on a hostname, which | * TLS caches session state and tickets based on a hostname, which | |||
| can be used for resuming sessions with a server. | can be used for resuming sessions with a server. | |||
| * TCP can cache cookies for use in TCP Fast Open. | * TCP can cache cookies for use in TFO. | |||
| Cached protocol state is primarily used during Connection | Cached protocol state is primarily used during Connection | |||
| establishment for a single Protocol Stack, but may be used to | establishment for a single Protocol Stack, but it may be used to | |||
| influence an implementation's preference between several candidate | influence an implementation's preference between several candidate | |||
| Protocol Stacks. For example, if two IP address Endpoint Identifers | Protocol Stacks. For example, if two IP address Endpoint Identifiers | |||
| are otherwise equally preferred, an implementation may choose to | are otherwise equally preferred, an implementation may choose to | |||
| attempt a connection to an address for which it has a TCP Fast Open | attempt a connection to an address for which it has a TFO cookie. | |||
| cookie. | ||||
| Applications can use the Transport Services API to request that a | Applications can use the Transport Services API to request that a | |||
| Connection Group maintain a separate cache for protocol state. | Connection Group maintain a separate cache for protocol state. | |||
| Connections in the group will not use cached state from Connections | Connections in the group will not use cached state from Connections | |||
| outside the group, and Connections outside the group will not use | outside the group, and Connections outside the group will not use | |||
| state cached from Connections inside the group. This may be | state cached from Connections inside the group. This may be | |||
| necessary, for example, if application-layer identifiers rotate and | necessary, for example, if application-layer identifiers rotate and | |||
| clients wish to avoid linkability via trackable TLS tickets or TFO | clients wish to avoid linkability via trackable TLS tickets or TFO | |||
| cookies. | cookies. | |||
| 9.2. Performance caches | 9.2. Performance Caches | |||
| In addition to protocol state, Protocol Instances should provide data | In addition to protocol state, Protocol Instances should provide data | |||
| into a performance-oriented cache to help guide future protocol and | into a performance-oriented cache to help guide future protocol and | |||
| path selection. Some performance information can be gathered | path selection. Some performance information can be gathered | |||
| generically across several protocols to allow predictive comparisons | generically across several protocols to allow predictive comparisons | |||
| between protocols on given paths: | between protocols on given paths: | |||
| * Observed Round Trip Time | * Observed RTT | |||
| * Connection establishment latency | * Connection establishment latency | |||
| * Connection establishment success rate | * Connection establishment success rate | |||
| These items can be cached on a per-address and per-subnet | These items can be cached on a per-address and per-subnet granularity | |||
| granularity, and averaged between different values. The information | and averaged between different values. The information should be | |||
| should be cached on a per-network basis, since it is expected that | cached on a per-network basis since it is expected that different | |||
| different network attachments will have different performance | network attachments will have different performance characteristics. | |||
| characteristics. Besides Protocol Instances, other system entities | Besides Protocol Instances, other system entities may also provide | |||
| may also provide data into performance-oriented caches. This could | data into performance-oriented caches. This could for instance be | |||
| for instance be signal strength information reported by radio modems | signal strength information reported by radio modems like Wi-Fi and | |||
| like Wi-Fi and mobile broadband or information about the battery- | mobile broadband or information about the battery level of the | |||
| level of the device. Furthermore, the system may cache the observed | device. Furthermore, the system may cache the observed maximum | |||
| maximum throughput on a path as an estimate of the available | throughput on a path as an estimate of the available bandwidth. | |||
| bandwidth. | ||||
| An implementation should use this information, when possible, to | An implementation should use this information, when possible, to | |||
| influence preference between candidate paths, endpoints, and protocol | influence preference between candidate paths, endpoints, and protocol | |||
| options. Eligible options that historically had significantly better | options. Eligible options that historically had significantly better | |||
| performance than others should be selected first when gathering | performance than others should be selected first when gathering | |||
| candidates (see Section 4.2) to ensure better performance for the | candidates (see Section 4.2) to ensure better performance for the | |||
| application. | application. | |||
| The reasonable lifetime for cached performance values will vary | The reasonable lifetime for cached performance values will vary | |||
| depending on the nature of the value. Certain information, like the | depending on the nature of the value. Certain information, like the | |||
| connection establishment success rate to a Remote Endpoint using a | connection establishment success rate to a Remote Endpoint using a | |||
| given Protocol Stack, can be stored for a long period of time (hours | given Protocol Stack, can be stored for a long period of time (hours | |||
| or longer), since it is expected that the capabilities of the Remote | or longer) since it is expected that the capabilities of the Remote | |||
| Endpoint are not changing very quickly. On the other hand, the Round | Endpoint are not changing very quickly. On the other hand, the RTT | |||
| Trip Time observed by TCP over a particular network path may vary | observed by TCP over a particular network path may vary over a | |||
| over a relatively short time interval. For such values, the | relatively short time interval. For such values, the implementation | |||
| implementation should remove them from the cache more quickly, or | should remove them from the cache more quickly or treat older values | |||
| treat older values with less confidence/weight. | with less confidence/weight. | |||
| [RFC9040] provides guidance about sharing of TCP Control Block | [RFC9040] provides guidance about sharing of TCP Control Block | |||
| information between connections on initialization. | information between connections on initialization. | |||
| 10. Specific Transport Protocol Considerations | 10. Specific Transport Protocol Considerations | |||
| Each protocol that is supported by a Transport Services | Each protocol that is supported by a Transport Services | |||
| Implementation should have a well-defined API mapping. API mappings | Implementation should have a well-defined API mapping. API mappings | |||
| for a protocol are important for Connections in which a given | for a protocol are important for Connections in which a given | |||
| protocol is the "top" of the Protocol Stack. For example, the | protocol is the "top" of the Protocol Stack. For example, the | |||
| mapping of the Send function for TCP applies to Connections in which | mapping of the Send function for TCP applies to Connections in which | |||
| the application directly sends over TCP. | the application directly sends over TCP. | |||
| Each protocol has a notion of Connectedness. Possible definitions of | Each protocol has a notion of "Connectedness". Possible definitions | |||
| Connectedness for various types of protocols are: | of Connectedness for various types of protocols are: | |||
| * Connectionless. Connectionless protocols do not establish | Connectionless: Connectionless protocols do not establish explicit | |||
| explicit state between endpoints, and do not perform a handshake | state between endpoints and do not perform a handshake during | |||
| during Connection establishment. | Connection establishment. | |||
| * Connected. Connected (also called "connection-oriented") | Connected: Connected (also called "connection-oriented") protocols | |||
| protocols establish state between endpoints, and perform a | establish state between endpoints and perform a handshake during | |||
| handshake during connection establishment. The handshake may be | connection establishment. The handshake may be 0-RTT to send data | |||
| 0-RTT to send data or resume a session, but bidirectional traffic | or resume a session, but bidirectional traffic is required to | |||
| is required to confirm connectedness. | confirm Connectedness. | |||
| * Multiplexing Connected. Multiplexing Connected protocols share | Multiplexing Connected: Multiplexing Connected protocols share | |||
| properties with Connected protocols, but also explictly support | properties with Connected protocols but also explicitly support | |||
| opening multiple application-level flows. This means that they | opening multiple application-level flows. This means that they | |||
| can support cloning new Connection objects without a new explicit | can support cloning new Connection objects without a new explicit | |||
| handshake. | handshake. | |||
| Protocols also have a notion of Data Unit. Possible values for Data | Protocols also have a notion of "Data Unit". Possible values for | |||
| Unit are: | Data Unit are: | |||
| * Byte-stream. Byte-stream protocols do not define any message | Byte-stream: Byte-stream protocols do not define any message | |||
| boundaries of their own apart from the end of a stream in each | boundaries of their own apart from the end of a stream in each | |||
| direction. | direction. | |||
| * Datagram. Datagram protocols define message boundaries at the | Datagram: Datagram protocols define message boundaries at the same | |||
| same level of transmission, such that only complete (not partial) | level of transmission, such that only complete (not partial) | |||
| messages are supported. | messages are supported. | |||
| * Message. Message protocols support message boundaries that can be | Message: Message protocols support message boundaries that can be | |||
| sent and received either as complete or partial messages. Maximum | sent and received either as complete or partial messages. Maximum | |||
| message lengths can be defined, and messages can be partially | message lengths can be defined, and messages can be partially | |||
| reliable. | reliable. | |||
| Below, terms in capitals with a dot (e.g., "CONNECT.SCTP") refer to | Below, terms in capitals with a dot character (".") (e.g., | |||
| the primitives with the same name in Section 4 of [RFC8303]. For | "CONNECT.SCTP") refer to the primitives with the same name in | |||
| further implementation details, the description of these primitives | Section 4 of [RFC8303]. For further implementation details, the | |||
| in [RFC8303] points to Section 3 of [RFC8303] and Section 3 of | description of these primitives in [RFC8303] points to Section 3 of | |||
| [RFC8304], which refers back to the relevant specifications for each | [RFC8303] and Section 3 of [RFC8304], which refers back to the | |||
| protocol. This back-tracking method applies to all elements of | relevant specifications for each protocol. This applies to all | |||
| [RFC8923] (see appendix D of [I-D.ietf-taps-interface]): they are | elements of [RFC8923] (see Appendix C of [RFC9622]): they are listed | |||
| listed in appendix A of [RFC8923] with an implementation hint in the | in Appendix A of [RFC8923] with an implementation hint in the same | |||
| same style, pointing back to Section 4 of [RFC8303]. | style, pointing back to Section 4 of [RFC8303]. | |||
| This document presents the protocol mappings defined in [RFC8923]. | This document presents the protocol mappings defined in [RFC8923]. | |||
| Other protocol mappings can be provided as separate documents, | Other protocol mappings can be provided as separate documents, | |||
| following the mapping template in Appendix A. | following the mapping template in Appendix A. | |||
| 10.1. TCP | 10.1. TCP | |||
| Connectedness: Connected | Connectedness: Connected | |||
| Data Unit: Byte-stream | Data Unit: Byte-stream | |||
| Connection Object: TCP connections between two hosts map directly to | Connection Object: TCP connections between two hosts map directly to | |||
| Connection objects. | Connection objects. | |||
| Initiate: CONNECT.TCP. Calling Initiate on a TCP Connection causes | Initiate: CONNECT.TCP. Calling Initiate on a TCP Connection causes | |||
| it to reserve a local port, and send a SYN to the Remote Endpoint. | it to reserve a local port and send a SYN to the Remote Endpoint. | |||
| InitiateWithSend: CONNECT.TCP with parameter user message. Early | InitiateWithSend: CONNECT.TCP with parameter user message. Early | |||
| safely replayable data is sent on a TCP Connection in the SYN, as | safely replayable data is sent on a TCP Connection in the SYN, as | |||
| TCP Fast Open data. | TFO data. | |||
| Ready: A TCP Connection is ready once the three-way handshake is | Ready: A TCP Connection is ready once the three-way handshake is | |||
| complete. | complete. | |||
| EstablishmentError: Failure of CONNECT.TCP. TCP can throw various | EstablishmentError: Failure of CONNECT.TCP. TCP can throw various | |||
| errors during connection setup. Specifically, it is important to | errors during connection setup. Specifically, it is important to | |||
| handle a RST being sent by the peer during the handshake. | handle a RST being sent by the peer during the handshake. | |||
| ConnectionError: Once established, TCP throws errors whenever the | ConnectionError: Once established, TCP throws errors whenever the | |||
| connection is disconnected, such as due to receiving a RST from | connection is disconnected, such as due to receiving a RST from | |||
| skipping to change at page 39, line 24 | skipping to change at line 1803 | |||
| ConnectionReceived: TCP Listeners will deliver new connections once | ConnectionReceived: TCP Listeners will deliver new connections once | |||
| they have replied to an inbound SYN with a SYN-ACK. | they have replied to an inbound SYN with a SYN-ACK. | |||
| Clone: Calling Clone on a TCP Connection creates a new Connection | Clone: Calling Clone on a TCP Connection creates a new Connection | |||
| with equivalent parameters. These Connections, and Connections | with equivalent parameters. These Connections, and Connections | |||
| generated via later calls to Clone on an Established Connection, | generated via later calls to Clone on an Established Connection, | |||
| form a Connection Group. To realize entanglement for these | form a Connection Group. To realize entanglement for these | |||
| Connections, with the exception of connPriority, changing a | Connections, with the exception of connPriority, changing a | |||
| Connection Property on one of them must affect the Connection | Connection Property on one of them must affect the Connection | |||
| Properties of the others too. No guarantees of honoring the | Properties of the others too. No guarantees of honoring the | |||
| Connection Property connPriority are given, and thus it is safe | Connection Property connPriority are given; thus, it is safe for | |||
| for an implementation of a Transport Services system to ignore | an implementation of a Transport Services system to ignore this | |||
| this property. When it is reasonable to assume that Connections | property. When it is reasonable to assume that Connections | |||
| traverse the same path (e.g., when they share the same | traverse the same path (e.g., when they share the same | |||
| encapsulation), support for it can also experimentally be | encapsulation), support for it can also experimentally be | |||
| implemented using a congestion control coupling mechanism (see for | implemented using a congestion control coupling mechanism (for | |||
| example [TCP-COUPLING] or [RFC3124]). | example, see [TCP-COUPLING] or [RFC3124]). | |||
| Send: SEND.TCP. TCP does not on its own preserve message | Send: SEND.TCP. On its own, TCP does not preserve message | |||
| boundaries. Calling Send on a TCP connection lays out the bytes | boundaries. Calling Send on a TCP connection lays out the bytes | |||
| on the TCP send stream without any other delineation. Any Message | on the TCP send stream without any other delineation. Any Message | |||
| marked as Final will cause TCP to send a FIN once the Message has | marked as Final will cause TCP to send a FIN once the Message has | |||
| been completely written, by calling CLOSE.TCP immediately upon | been completely written, by calling CLOSE.TCP immediately upon | |||
| successful termination of SEND.TCP. Note that transmitting a | successful termination of SEND.TCP. Note that transmitting a | |||
| Message marked as Final should not cause the Closed event to be | Message marked as Final should not cause the Closed event to be | |||
| delivered to the application, as it will still be possible to | delivered to the application as it will still be possible to | |||
| receive data until the peer closes or aborts the TCP connection. | receive data until the peer closes or aborts the TCP connection. | |||
| Receive: With RECEIVE.TCP, TCP delivers a stream of bytes without | Receive: With RECEIVE.TCP, TCP delivers a stream of bytes without | |||
| any Message delineation. All data delivered in the Received or | any Message delineation. All data delivered in the Received or | |||
| ReceivedPartial event will be part of a single stream-wide Message | ReceivedPartial event will be part of a single stream-wide Message | |||
| that is marked Final (unless a Message Framer is used). | that is marked Final (unless a Message Framer is used). | |||
| EndOfMessage will be delivered when the TCP Connection has | EndOfMessage will be delivered when the TCP Connection has | |||
| received a FIN (CLOSE-EVENT.TCP) from the peer. Note that | received a FIN (CLOSE-EVENT.TCP) from the peer. Note that | |||
| reception of a FIN should not cause the Closed event to be | reception of a FIN should not cause the Closed event to be | |||
| delivered to the application, as it will still be possible for the | delivered to the application, as it will still be possible for the | |||
| skipping to change at page 40, line 25 | skipping to change at line 1851 | |||
| CloseGroup: Calling CloseGroup on a TCP Connection (CLOSE.TCP) is | CloseGroup: Calling CloseGroup on a TCP Connection (CLOSE.TCP) is | |||
| identical to calling Close on this Connection and on all | identical to calling Close on this Connection and on all | |||
| Connections in the same ConnectionGroup. | Connections in the same ConnectionGroup. | |||
| AbortGroup: Calling AbortGroup on a TCP Connection (ABORT.TCP) is | AbortGroup: Calling AbortGroup on a TCP Connection (ABORT.TCP) is | |||
| identical to calling Abort on this Connection and on all | identical to calling Abort on this Connection and on all | |||
| Connections in the same ConnectionGroup. | Connections in the same ConnectionGroup. | |||
| 10.2. MPTCP | 10.2. MPTCP | |||
| Connectedness: Connected | Connectedness: Connected | |||
| Data Unit: Byte-stream | Data Unit: Byte-stream | |||
| The Transport Services API mappings for MPTCP are identical to TCP. | The Transport Services API mappings for MPTCP are identical to TCP. | |||
| MPTCP adds support for multipath properties, such as multipath and | MPTCP adds support for multipath properties, such as multipath and | |||
| multipathPolicy, and actions for managing paths, such as AddRemote | multipathPolicy, and actions for managing paths, such as AddRemote | |||
| and RemoveRemote. | and RemoveRemote. | |||
| 10.3. UDP | 10.3. UDP | |||
| Connectedness: Connectionless | Connectedness: Connectionless | |||
| Data Unit: Datagram | Data Unit: Datagram | |||
| Connection Object: UDP Connections represent a pair of specific IP | Connection Object: UDP Connections represent a pair of specific IP | |||
| addresses and ports on two hosts. | addresses and ports on two hosts. | |||
| Initiate: CONNECT.UDP. Calling Initiate on a UDP Connection causes | Initiate: CONNECT.UDP. Calling Initiate on a UDP Connection causes | |||
| it to reserve a local port, but does not generate any traffic. | it to reserve a local port but does not generate any traffic. | |||
| InitiateWithSend: Early data on a UDP Connection does not have any | InitiateWithSend: Early data on a UDP Connection does not have any | |||
| special meaning. The data is sent whenever the Connection is | special meaning. The data is sent whenever the Connection is | |||
| Ready. | Ready. | |||
| Ready: A UDP Connection is ready once the system has reserved a | Ready: A UDP Connection is ready once the system has reserved a | |||
| local port and has a path to send to the Remote Endpoint. | local port and has a path to send to the Remote Endpoint. | |||
| EstablishmentError: UDP Connections can only generate errors on | EstablishmentError: UDP Connections can only generate errors on | |||
| initiation due to port conflicts on the local system. | initiation due to port conflicts on the local system. | |||
| skipping to change at page 41, line 27 | skipping to change at line 1901 | |||
| they have received traffic from a new Remote Endpoint. | they have received traffic from a new Remote Endpoint. | |||
| Clone: Calling Clone on a UDP Connection creates a new Connection | Clone: Calling Clone on a UDP Connection creates a new Connection | |||
| with equivalent parameters. The two Connections are otherwise | with equivalent parameters. The two Connections are otherwise | |||
| independent. | independent. | |||
| Send: SEND.UDP. Calling Send on a UDP connection sends the data as | Send: SEND.UDP. Calling Send on a UDP connection sends the data as | |||
| the payload of a complete UDP datagram. Marking Messages as Final | the payload of a complete UDP datagram. Marking Messages as Final | |||
| does not change anything in the datagram's contents. Upon sending | does not change anything in the datagram's contents. Upon sending | |||
| a UDP datagram, some relevant fields and flags in the IP header | a UDP datagram, some relevant fields and flags in the IP header | |||
| can be controlled: DSCP (SET_DSCP.UDP), DF in IPv4 (SET_DF.UDP) | can be controlled: DSCP (SET_DSCP.UDP), DF in IPv4 (SET_DF.UDP), | |||
| and ECN flag (SET_ECN.UDP). | and ECN flag (SET_ECN.UDP). | |||
| Receive: RECEIVE.UDP. UDP only delivers complete Messages to | Receive: RECEIVE.UDP. UDP only delivers complete Messages to | |||
| Received, each of which represents a single datagram received in a | Received, each of which represents a single datagram received in a | |||
| UDP packet. Upon receiving a UDP datagram, the ECN flag from the | UDP packet. Upon receiving a UDP datagram, the ECN flag from the | |||
| IP header can be obtained (GET_ECN.UDP). | IP header can be obtained (GET_ECN.UDP). | |||
| Close: Calling Close on a UDP Connection (ABORT.UDP) releases the | Close: Calling Close on a UDP Connection (ABORT.UDP) releases the | |||
| local port reservation. The Connection then issues a Closed | local port reservation. The Connection then issues a Closed | |||
| event. | event. | |||
| Abort: Calling Abort on a UDP Connection (ABORT.UDP) is identical to | Abort: Calling Abort on a UDP Connection (ABORT.UDP) is identical to | |||
| calling Close, except that the Connection will send a | calling Close except that the Connection will send a | |||
| ConnectionError event rather than a Closed event. | ConnectionError event rather than a Closed event. | |||
| CloseGroup: Calling CloseGroup on a UDP Connection (ABORT.UDP) is | CloseGroup: Calling CloseGroup on a UDP Connection (ABORT.UDP) is | |||
| identical to calling Close on this Connection and on all | identical to calling Close on this Connection and on all | |||
| Connections in the same ConnectionGroup. | Connections in the same ConnectionGroup. | |||
| AbortGroup: Calling AbortGroup on a UDP Connection (ABORT.UDP) is | AbortGroup: Calling AbortGroup on a UDP Connection (ABORT.UDP) is | |||
| identical to calling Close on this Connection and on all | identical to calling Close on this Connection and on all | |||
| Connections in the same ConnectionGroup. | Connections in the same ConnectionGroup. | |||
| 10.4. UDP-Lite | 10.4. UDP-Lite | |||
| Connectedness: Connectionless | Connectedness: Connectionless | |||
| Data Unit: Datagram | Data Unit: Datagram | |||
| The Transport Services API mappings for UDP-Lite are identical to | The Transport Services API mappings for UDP-Lite are identical to | |||
| UDP. In addition, UDP-Lite supports the msgChecksumLen and | UDP. In addition, UDP-Lite supports the msgChecksumLen and | |||
| recvChecksumLen Properties that allow an application to specify the | recvChecksumLen Properties that allow an application to specify the | |||
| minimum number of bytes in a Message that need to be covered by a | minimum number of bytes in a Message that need to be covered by a | |||
| checksum. | checksum. | |||
| This includes: CONNECT.UDP-Lite; LISTEN.UDP-Lite; SEND.UDP-Lite; | This includes: CONNECT.UDP-Lite; LISTEN.UDP-Lite; SEND.UDP-Lite; | |||
| RECEIVE.UDP-Lite; ABORT.UDP-Lite; ERROR.UDP-Lite; SET_DSCP.UDP-Lite; | RECEIVE.UDP-Lite; ABORT.UDP-Lite; ERROR.UDP-Lite; SET_DSCP.UDP-Lite; | |||
| SET_DF.UDP-Lite; SET_ECN.UDP-Lite; GET_ECN.UDP-Lite. | SET_DF.UDP-Lite; SET_ECN.UDP-Lite; GET_ECN.UDP-Lite. | |||
| 10.5. UDP Multicast Receive | 10.5. UDP Multicast Receive | |||
| Connectedness: Connectionless | Connectedness: Connectionless | |||
| Data Unit: Datagram | Data Unit: Datagram | |||
| Connection Object: Established UDP Multicast Receive connections | Connection Object: Established UDP Multicast Receive connections | |||
| represent a pair of specific IP addresses and ports. The | represent a pair of specific IP addresses and ports. The | |||
| direction Selection Property must be set to unidirectional | direction Selection Property must be set to Unidirectional | |||
| receive, and the Local Endpoint must be configured with a group IP | receive, and the Local Endpoint must be configured with a group IP | |||
| address and a port. | address and a port. | |||
| Initiate: Calling Initiate on a UDP Multicast Receive Connection | Initiate: Calling Initiate on a UDP Multicast Receive Connection | |||
| causes an immediate EstablishmentError. This is an unsupported | causes an immediate EstablishmentError. This is an unsupported | |||
| operation. | operation. | |||
| InitiateWithSend: Calling InitiateWithSend on a UDP Multicast | InitiateWithSend: Calling InitiateWithSend on a UDP Multicast | |||
| Receive Connection causes an immediate EstablishmentError. This | Receive Connection causes an immediate EstablishmentError. This | |||
| is an unsupported operation. | is an unsupported operation. | |||
| skipping to change at page 43, line 6 | skipping to change at line 1974 | |||
| EstablishmentError: UDP Multicast Receive Connections generate an | EstablishmentError: UDP Multicast Receive Connections generate an | |||
| EstablishmentError indicating that joining a multicast group | EstablishmentError indicating that joining a multicast group | |||
| failed if Initiate is called. | failed if Initiate is called. | |||
| ConnectionError: The only ConnectionError generated by a UDP | ConnectionError: The only ConnectionError generated by a UDP | |||
| Multicast Receive Connection is in response to an Abort call. | Multicast Receive Connection is in response to an Abort call. | |||
| Listen: LISTEN.UDP. Calling Listen for UDP Multicast Receive binds | Listen: LISTEN.UDP. Calling Listen for UDP Multicast Receive binds | |||
| a local port, prepares it to receive inbound UDP datagrams from | a local port, prepares it to receive inbound UDP datagrams from | |||
| peers, and issues a multicast host join. If a Remote Endpoint | peers, and issues a multicast host join. If a Remote Endpoint | |||
| Identifer with an address is supplied, the join is Source-specific | Identifier with an address is supplied, the join is Source- | |||
| Multicast, and the path selection is based on the route to the | Specific Multicast, and the path selection is based on the route | |||
| Remote Endpoint. If a Remote Endpoint Identifer is not supplied, | to the Remote Endpoint. If a Remote Endpoint Identifier is not | |||
| the join is Any-source Multicast, and the path selection is based | supplied, the join is Any-Source Multicast, and the path selection | |||
| on the outbound route to the group supplied in the Local Endpoint. | is based on the outbound route to the group supplied in the Local | |||
| | Endpoint. | |||
| There are cases where it is required to open multiple connections for | There are cases where it is required to open multiple connections for | |||
| the same address(es). For example, one Connection might be opened | the same address(es). For example, one Connection might be opened | |||
| for a multicast group to for a multicast control bus, and another | for a multicast group used for a shared control bus, and another | |||
| application later opens a separate Connection to the same group to | application later opens a separate Connection to the same group to | |||
| send signals to and/or receive signals from the common bus. In such | send signals to and/or receive signals from the common bus. In such | |||
| cases, the Transport Services system needs to explicitly enable re- | cases, the Transport Services system needs to explicitly enable reuse | |||
| use of the same set of addresses (equivalent to setting SO_REUSEADDR | of the same set of addresses (equivalent to setting SO_REUSEADDR in | |||
| in the socket API). | the Socket API). | |||
| ConnectionReceived: UDP Multicast Receive Listeners will deliver new | ConnectionReceived: UDP Multicast Receive Listeners will deliver new | |||
| Connections once they have received traffic from a new Remote | Connections once they have received traffic from a new Remote | |||
| Endpoint. | Endpoint. | |||
| Clone: Calling Clone on a UDP Multicast Receive Connection creates a | Clone: Calling Clone on a UDP Multicast Receive Connection creates a | |||
| new Connection with equivalent parameters. The two Connections | new Connection with equivalent parameters. The two Connections | |||
| are otherwise independent. | are otherwise independent. | |||
| Send: SEND.UDP. Calling Send on a UDP Multicast Receive connection | Send: SEND.UDP. Calling Send on a UDP Multicast Receive Connection | |||
| causes an immediate SendError. This is an unsupported operation. | causes an immediate SendError. This is an unsupported operation. | |||
| Receive: RECEIVE.UDP. The Receive operation in a UDP Multicast | Receive: RECEIVE.UDP. The Receive operation in a UDP Multicast | |||
| Receive connection only delivers complete Messages to Received, | Receive Connection only delivers complete Messages to Received, | |||
| each of which represents a single datagram received in a UDP | each of which represents a single datagram received in a UDP | |||
| packet. Upon receiving a UDP datagram, the ECN flag from the IP | packet. Upon receiving a UDP datagram, the ECN flag from the IP | |||
| header can be obtained (GET_ECN.UDP). | header can be obtained (GET_ECN.UDP). | |||
| Close: Calling Close on a UDP Multicast Receive Connection | Close: Calling Close on a UDP Multicast Receive Connection | |||
| (ABORT.UDP) releases the local port reservation and leaves the | (ABORT.UDP) releases the local port reservation and leaves the | |||
| group. The Connection then issues a Closed event. | group. The Connection then issues a Closed event. | |||
| Abort: Calling Abort on a UDP Multicast Receive Connection | Abort: Calling Abort on a UDP Multicast Receive Connection | |||
| (ABORT.UDP) is identical to calling Close, except that the | (ABORT.UDP) is identical to calling Close except that the | |||
| Connection will send a ConnectionError event rather than a Closed | Connection will send a ConnectionError event rather than a Closed | |||
| event. | event. | |||
| CloseGroup: Calling CloseGroup on a UDP Multicast Receive Connection | CloseGroup: Calling CloseGroup on a UDP Multicast Receive Connection | |||
| (ABORT.UDP) is identical to calling Close on this Connection and | (ABORT.UDP) is identical to calling Close on this Connection and | |||
| on all Connections in the same ConnectionGroup. | on all Connections in the same ConnectionGroup. | |||
| AbortGroup: Calling AbortGroup on a UDP Multicast Receive Connection | AbortGroup: Calling AbortGroup on a UDP Multicast Receive Connection | |||
| (ABORT.UDP) is identical to calling Close on this Connection and | (ABORT.UDP) is identical to calling Close on this Connection and | |||
| on all Connections in the same ConnectionGroup. | on all Connections in the same ConnectionGroup. | |||
| 10.6. SCTP | 10.6. SCTP | |||
| Connectedness: Connected | Connectedness: Connected | |||
| Data Unit: Message | Data Unit: Message | |||
| Connection Object: Connection objects can be mapped to an SCTP | Connection Object: Connection objects can be mapped to an SCTP | |||
| association or a stream in an SCTP association. Mapping | association or a stream in an SCTP association. Mapping | |||
| Connection objects to SCTP streams is called "stream mapping" and | Connection objects to SCTP streams is called "stream mapping" and | |||
| has additional requirements as follows. The following explanation | has additional requirements as follows. The following explanation | |||
| assumes a client-server communication model. | assumes a client-server communication model. | |||
| Stream mapping requires an association to already be in place between | Stream mapping requires an association to already be in place | |||
| the client and the server, and it requires the server to understand | between the client and the server, and it requires the server to | |||
| that a new incoming stream should be represented as a new Connection | understand that a new incoming stream should be represented as a | |||
| object by the Transport Services system. A new SCTP stream is | new Connection object by the Transport Services system. A new | |||
| created by sending an SCTP message with a new stream id. Thus, to | SCTP stream is created by sending an SCTP message with a new | |||
| implement stream mapping, the Transport Services API must provide a | stream id. Thus, to implement stream mapping, the Transport | |||
| newly created Connection object to the application upon the reception | Services API must provide a newly created Connection object to the | |||
| of such a message. The necessary semantics to implement a Transport | application upon the reception of such a message. The necessary | |||
| Services system's Close and Abort primitives are provided by the | semantics to implement a Transport Services system's Close and | |||
| stream reconfiguration (reset) procedure described in [RFC6525]. | Abort primitives are provided by the stream reconfiguration | |||
| This also allows to re-use a stream id after resetting ("closing") | (reset) procedure described in [RFC6525]. This also allows a | |||
| the stream. To implement this functionality, SCTP stream | stream id to be reused after resetting ("closing") the stream. To | |||
| reconfiguration [RFC6525] must be supported by both the client and | implement this functionality, SCTP stream reconfiguration | |||
| the server side. | [RFC6525] must be supported by both the client and the server | |||
| | side. | |||
| To avoid head-of-line blocking, stream mapping should only be | To avoid head-of-line blocking, stream mapping should only be | |||
| implemented when both sides support message interleaving [RFC8260]. | implemented when both sides support message interleaving | |||
| This allows a sender to schedule transmissions between multiple | [RFC8260]. This allows a sender to schedule transmissions between | |||
| streams without risking that transmission of a large message on one | multiple streams without risking that transmission of a large | |||
| stream might block transmissions on other streams for a long time. | message on one stream will block transmissions on other streams | |||
| | for a long time. | |||
| To avoid conflicts between stream ids, the following procedure is | To avoid conflicts between stream ids, the following procedure is | |||
| recommended: the first Connection, for which the SCTP association has | recommended: the first Connection, for which the SCTP association | |||
| been created, must always use stream id zero. All additional | has been created, must always use stream id zero. All additional | |||
| Connections are assigned to unused stream ids in growing order. To | Connections are assigned to unused stream ids in ascending order. | |||
| avoid a conflict when both endpoints map new Connections | To avoid a conflict when both endpoints map new Connections | |||
| simultaneously, the peer which initiated association must use even | simultaneously, the peer that initiated association must use even | |||
| stream ids whereas the remote side must map its Connections to odd | stream ids whereas the remote side must map its Connections to odd | |||
| stream ids. Both sides maintain a status map of the assigned stream | stream ids. Both sides maintain a status map of the assigned | |||
| ids. Generally, new streams should consume the lowest available | stream ids. Generally, new streams should consume the lowest | |||
| (even or odd, depending on the side) stream id; this rule is relevant | available (even or odd, depending on the side) stream id; this | |||
| when lower ids become available because Connection objects associated | rule is relevant when lower ids become available because | |||
| with the streams are closed. | Connection objects associated with the streams are closed. | |||
| SCTP stream mapping as described here has been implemented in a | SCTP stream mapping as described here has been implemented in a | |||
| research prototype; a desription of this implementation is given in | research prototype; a description of this implementation is given | |||
| [NEAT-flow-mapping]. | in [NEAT-flow-mapping]. | |||
| Initiate: If this is the only Connection object that is assigned to | Initiate: If this is the only Connection object that is assigned to | |||
| the SCTP Association or stream mapping is not used, CONNECT.SCTP | the SCTP Association or stream mapping is not used, CONNECT.SCTP | |||
| is called. Else, unless the Selection Property | is called. Else, unless the Selection Property | |||
| activeReadBeforeSend is Preferred or Required, a new stream is | activeReadBeforeSend is Preferred or Required, a new stream is | |||
| used: if there are enough streams available, Initiate is a local | used: if there are enough streams available, Initiate is a local | |||
| operation that assigns a new stream id to the Connection object. | operation that assigns a new stream id to the Connection object. | |||
| The number of streams is negotiated as a parameter of the prior | The number of streams is negotiated as a parameter of the prior | |||
| CONNECT.SCTP call, and it represents a trade-off between local | CONNECT.SCTP call, and it represents a trade-off between local | |||
| resource usage and the number of Connection objects that can be | resource usage and the number of Connection objects that can be | |||
| mapped without requiring a reconfiguration signal. When running | mapped without requiring a reconfiguration signal. When running | |||
| out of streams, ADD_STREAM.SCTP must be called. | out of streams, ADD_STREAM.SCTP must be called. | |||
| InitiateWithSend: If this is the only Connection object that is | InitiateWithSend: If this is the only Connection object that is | |||
| assigned to the SCTP association or stream mapping is not used, | assigned to the SCTP association or stream mapping is not used, | |||
| CONNECT.SCTP is called with the "user message" parameter. Else, a | CONNECT.SCTP is called with the user message parameter. Else, a | |||
| new stream is used (see Initiate for how to handle running out of | new stream is used (see Initiate for how to handle running out of | |||
| streams), and this just sends the first message on a new stream. | streams), and this just sends the first message on a new stream. | |||
| Ready: Initiate or InitiateWithSend returns without an error, i.e. | Ready: Initiate or InitiateWithSend returns without an error, i.e., | |||
| SCTP's four-way handshake has completed. If an association with | SCTP's four-way handshake has completed. If an association with | |||
| the peer already exists, stream mapping is used and enough streams | the peer already exists, stream mapping is used, and enough | |||
| are available, a Connection object instantly becomes Ready after | streams are available, a Connection object instantly becomes Ready | |||
| calling Initiate or InitiateWithSend. | after calling Initiate or InitiateWithSend. | |||
| EstablishmentError: Failure of CONNECT.SCTP. | EstablishmentError: Failure of CONNECT.SCTP. | |||
| ConnectionError: TIMEOUT.SCTP or ABORT-EVENT.SCTP. | ConnectionError: TIMEOUT.SCTP or ABORT-EVENT.SCTP. | |||
| Listen: LISTEN.SCTP. If an association with the peer already exists | Listen: LISTEN.SCTP. If an association with the peer already exists | |||
| and stream mapping is used, Listen just expects to receive a new | and stream mapping is used, Listen just expects to receive a new | |||
| message with a new stream id (chosen in accordance with the stream | message with a new stream id (chosen in accordance with the stream | |||
| id assignment procedure described above). | id assignment procedure described above). | |||
| ConnectionReceived: LISTEN.SCTP returns without an error (a result | ConnectionReceived: LISTEN.SCTP returns without an error (a result | |||
| of successful CONNECT.SCTP from the peer), or, in case of stream | of successful CONNECT.SCTP from the peer) or, in the case of | |||
| mapping, the first message has arrived on a new stream (in this | stream mapping, the first message has arrived on a new stream (in | |||
| case, Receive is also invoked). | this case, Receive is also invoked). | |||
| Clone: Calling Clone on an SCTP association creates a new Connection | Clone: Calling Clone on an SCTP association creates a new Connection | |||
| object and assigns it a new stream id in accordance with the | object and assigns it a new stream id in accordance with the | |||
| stream id assignment procedure described above. If there are not | stream id assignment procedure described above. If there are not | |||
| enough streams available, ADD_STREAM.SCTP must be called. | enough streams available, ADD_STREAM.SCTP must be called. | |||
| Send: SEND.SCTP. Message Properties such as msgLifetime and | Send: SEND.SCTP. Message Properties such as msgLifetime and | |||
| msgOrdered map to parameters of this primitive. | msgOrdered map to parameters of this primitive. | |||
| Receive: RECEIVE.SCTP. The "partial flag" of RECEIVE.SCTP invokes a | Receive: RECEIVE.SCTP. The "partial flag" of RECEIVE.SCTP invokes a | |||
| ReceivedPartial event. | ReceivedPartial event. | |||
| Close: If this is the only Connection object that is assigned to the | Close: If this is the only Connection object that is assigned to the | |||
| SCTP association, CLOSE.SCTP is called, and the Closed event will be | SCTP association, CLOSE.SCTP is called and the Closed event will | |||
| delivered to the application upon the ensuing CLOSE-EVENT.SCTP. | be delivered to the application upon the ensuing CLOSE-EVENT.SCTP. | |||
| Else, the Connection object is one out of several Connection objects | Else, the Connection object is one out of several Connection | |||
| that are assigned to the same SCTP assocation, and RESET_STREAM.SCTP | objects that are assigned to the same SCTP association, and | |||
| must be called, which informs the peer that the stream will no longer | RESET_STREAM.SCTP must be called, which informs the peer that the | |||
| be used for mapping and can be used by future Initiate, | stream will no longer be used for mapping and can be used by | |||
| InitiateWithSend or Listen calls. At the peer, the event | future Initiate, InitiateWithSend, or Listen calls. At the peer, | |||
| RESET_STREAM-EVENT.SCTP will fire, which the peer must answer by | the event RESET_STREAM-EVENT.SCTP will be initiated, which the | |||
| issuing RESET_STREAM.SCTP too. The resulting local RESET_STREAM- | peer must answer by issuing RESET_STREAM.SCTP too. The resulting | |||
| EVENT.SCTP informs the Transport Services system that the stream id | local RESET_STREAM-EVENT.SCTP informs the Transport Services | |||
| can now be re-used by the next Initiate, InitiateWithSend or Listen | system that the stream id can now be reused by the next Initiate, | |||
| calls, and invokes a Closed event towards the application. | InitiateWithSend, or Listen calls, and invokes a Closed event | |||
| | toward the application. | |||
| Abort: If this is the only Connection object that is assigned to the | Abort: If this is the only Connection object that is assigned to the | |||
| SCTP association, ABORT.SCTP is called. Else, the Connection object | SCTP association, ABORT.SCTP is called. Else, the Connection | |||
| is one out of several Connection objects that are assigned to the | object is one out of several Connection objects that are assigned | |||
| same SCTP assocation, and shutdown proceeds as described under Close. | to the same SCTP association, and shutdown proceeds as described | |||
| | under Close. | |||
| CloseGroup: Calling CloseGroup calls CLOSE.SCTP, closing all | CloseGroup: Calling CloseGroup calls CLOSE.SCTP, which closes all | |||
| Connections in the SCTP association. | Connections in the SCTP association. | |||
| AbortGroup: Calling AbortGroup calls ABORT.SCTP, immediately closing | AbortGroup: Calling AbortGroup calls ABORT.SCTP, which immediately | |||
| all Connections in the SCTP association. | closes all Connections in the SCTP association. | |||
| In addition to the API mappings described above, when there are | In addition to the API mappings described above, when there are | |||
| multiple Connection objects assigned to the same SCTP association, | multiple Connection objects assigned to the same SCTP association, | |||
| SCTP can support Connection properties such as connPriority and | SCTP can support Connection properties such as connPriority and | |||
| connScheduler where CONFIGURE_STREAM_SCHEDULER.SCTP can be called to | connScheduler where CONFIGURE_STREAM_SCHEDULER.SCTP can be called to | |||
| adjust the priorities of streams in the SCTP association. | adjust the priorities of streams in the SCTP association. | |||
| 11. IANA Considerations | 11. IANA Considerations | |||
| This document has no actions for IANA. | This document has no IANA actions. | |||
| 12. Security Considerations | 12. Security Considerations | |||
| [I-D.ietf-taps-arch] outlines general security consideration and | [RFC9621] outlines general security considerations and requirements | |||
| requirements for any system that implements the Transport Services | for any system that implements the Transport Services architecture. | |||
| archtecture. [I-D.ietf-taps-interface] provides further discussion | [RFC9622] provides further discussion on security and privacy | |||
| on security and privacy implications of the Transport Services API. | implications of the Transport Services API. This document provides | |||
| This document provides additional guidance on implementation | additional guidance on implementation specifics for the Transport | |||
| specifics for the Transport Services API and as such the security | Services API; as such, the security considerations in both of these | |||
| considerations in both of these documents apply. The next two | documents apply. The next two subsections discuss further | |||
| subsections discuss further considerations that are specific to | considerations that are specific to mechanisms specified in this | |||
| mechanisms specified in this document. | document. | |||
| 12.1. Considerations for Candidate Gathering | 12.1. Considerations for Candidate Gathering | |||
| The Security Considerations of the Transport Services Architecture | As discussed in Sections 3 and 6 of [RFC9621], gathering and racing | |||
| [I-D.ietf-taps-arch] forbids gathering and racing with Protocol | with Protocol Stacks that do not have equivalent security properties | |||
| Stacks that do not have equivalent security properties. Therefore, | ought not be attempted. Therefore, implementations need to avoid | |||
| implementations need to avoid downgrade attacks that allow network | downgrade attacks that allow network interference to cause the | |||
| interference to cause the implementation to select less secure, or | implementation to select less secure, or entirely insecure, | |||
| entirely insecure, combinations of paths and protocols. | combinations of paths and protocols. | |||
| 12.2. Considerations for Candidate Racing | 12.2. Considerations for Candidate Racing | |||
| See Section 5.3 for security considerations around racing with 0-RTT | See Section 5.3 for security considerations around racing with 0-RTT | |||
| data. | data. | |||
| An attacker that knows a particular device is racing several options | An attacker that knows a particular device is racing several options | |||
| during connection establishment may be able to block packets for the | during connection establishment may be able to block packets for the | |||
| first connection attempt, thus inducing the device to fall back to a | first connection attempt, thus inducing the device to fall back to a | |||
| secondary attempt. This is a problem if the secondary attempts have | secondary attempt. This is a problem if the secondary attempts have | |||
| skipping to change at page 47, line 38 | skipping to change at line 2204 | |||
| security properties to avoid incentivizing attacks. | security properties to avoid incentivizing attacks. | |||
| Since results from the network can determine how a connection attempt | Since results from the network can determine how a connection attempt | |||
| tree is built, such as when DNS returns a list of resolved endpoints, | tree is built, such as when DNS returns a list of resolved endpoints, | |||
| it is possible for the network to cause an implementation to consume | it is possible for the network to cause an implementation to consume | |||
| significant on-device resources. Implementations should limit the | significant on-device resources. Implementations should limit the | |||
| maximum amount of state allowed for any given node, including the | maximum amount of state allowed for any given node, including the | |||
| number of child nodes, especially when the state is based on results | number of child nodes, especially when the state is based on results | |||
| from the network. | from the network. | |||
| 13. Acknowledgements | 13. References | |||
| This work has received funding from the European Union's Horizon 2020 | ||||
| research and innovation programme under grant agreement No. 644334 | ||||
| (NEAT) and No. 815178 (5GENESIS). | ||||
| This work has been supported by Leibniz Prize project funds of DFG - | ||||
| German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ | ||||
| FE 570/4-1). | ||||
| This work has been supported by the UK Engineering and Physical | ||||
| Sciences Research Council under grant EP/R04144X/1. | ||||
| This work has been supported by the Research Council of Norway under | ||||
| its "Toppforsk" programme through the "OCARINA" project. | ||||
| Thanks to Colin Perkins, Tom Jones, Karl-Johan Grinnemo, Gorry | ||||
| Fairhurst, for their contributions to the design of this | ||||
| specification. Thanks also to Stuart Cheshire, Josh Graessley, David | ||||
| Schinazi, and Eric Kinnear for their implementation and design | ||||
| efforts, including Happy Eyeballs, that heavily influenced this work. | ||||
| 14. References | ||||
| 14.1. Normative References | ||||
| [I-D.ietf-taps-arch] | ||||
| Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G., and | ||||
| C. Perkins, "Architecture and Requirements for Transport | ||||
| Services", Work in Progress, Internet-Draft, draft-ietf- | ||||
| taps-arch-19, 9 November 2023, | ||||
| <https://datatracker.ietf.org/doc/html/draft-ietf-taps- | ||||
| arch-19>. | ||||
| [I-D.ietf-taps-interface] | 13.1. Normative References | |||
| Trammell, B., Welzl, M., Enghardt, R., Fairhurst, G., | ||||
| Kühlewind, M., Perkins, C., Tiesel, P. S., and T. Pauly, | ||||
| "An Abstract Application Layer Interface to Transport | ||||
| Services", Work in Progress, Internet-Draft, draft-ietf- | ||||
| taps-interface-23, 14 November 2023, | ||||
| <https://datatracker.ietf.org/doc/html/draft-ietf-taps- | ||||
| interface-23>. | ||||
| [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP | [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP | |||
| Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, | Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, | |||
| <https://www.rfc-editor.org/rfc/rfc7413>. | <https://www.rfc-editor.org/info/rfc7413>. | |||
| [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext | ||||
| Transfer Protocol Version 2 (HTTP/2)", RFC 7540, | ||||
| DOI 10.17487/RFC7540, May 2015, | ||||
| <https://www.rfc-editor.org/rfc/rfc7540>. | ||||
| [RFC8303] Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of | [RFC8303] Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of | |||
| Transport Features Provided by IETF Transport Protocols", | Transport Features Provided by IETF Transport Protocols", | |||
| RFC 8303, DOI 10.17487/RFC8303, February 2018, | RFC 8303, DOI 10.17487/RFC8303, February 2018, | |||
| <https://www.rfc-editor.org/rfc/rfc8303>. | <https://www.rfc-editor.org/info/rfc8303>. | |||
| [RFC8304] Fairhurst, G. and T. Jones, "Transport Features of the | [RFC8304] Fairhurst, G. and T. Jones, "Transport Features of the | |||
| User Datagram Protocol (UDP) and Lightweight UDP (UDP- | User Datagram Protocol (UDP) and Lightweight UDP (UDP- | |||
| Lite)", RFC 8304, DOI 10.17487/RFC8304, February 2018, | Lite)", RFC 8304, DOI 10.17487/RFC8304, February 2018, | |||
| <https://www.rfc-editor.org/rfc/rfc8304>. | <https://www.rfc-editor.org/info/rfc8304>. | |||
| [RFC8305] Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2: | [RFC8305] Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2: | |||
| Better Connectivity Using Concurrency", RFC 8305, | Better Connectivity Using Concurrency", RFC 8305, | |||
| DOI 10.17487/RFC8305, December 2017, | DOI 10.17487/RFC8305, December 2017, | |||
| <https://www.rfc-editor.org/rfc/rfc8305>. | <https://www.rfc-editor.org/info/rfc8305>. | |||
| [RFC8421] Martinsen, P., Reddy, T., and P. Patil, "Guidelines for | [RFC8421] Martinsen, P., Reddy, T., and P. Patil, "Guidelines for | |||
| Multihomed and IPv4/IPv6 Dual-Stack Interactive | Multihomed and IPv4/IPv6 Dual-Stack Interactive | |||
| Connectivity Establishment (ICE)", BCP 217, RFC 8421, | Connectivity Establishment (ICE)", BCP 217, RFC 8421, | |||
| DOI 10.17487/RFC8421, July 2018, | DOI 10.17487/RFC8421, July 2018, | |||
| <https://www.rfc-editor.org/rfc/rfc8421>. | <https://www.rfc-editor.org/info/rfc8421>. | |||
| [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol | [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol | |||
| Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, | Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, | |||
| <https://www.rfc-editor.org/rfc/rfc8446>. | <https://www.rfc-editor.org/info/rfc8446>. | |||
| [RFC8923] Welzl, M. and S. Gjessing, "A Minimal Set of Transport | [RFC8923] Welzl, M. and S. Gjessing, "A Minimal Set of Transport | |||
| Services for End Systems", RFC 8923, DOI 10.17487/RFC8923, | Services for End Systems", RFC 8923, DOI 10.17487/RFC8923, | |||
| October 2020, <https://www.rfc-editor.org/rfc/rfc8923>. | October 2020, <https://www.rfc-editor.org/info/rfc8923>. | |||
| 14.2. Informative References | [RFC9113] Thomson, M., Ed. and C. Benfield, Ed., "HTTP/2", RFC 9113, | |||
| | DOI 10.17487/RFC9113, June 2022, | |||
| | <https://www.rfc-editor.org/info/rfc9113>. | |||
| [I-D.ietf-dnsop-svcb-https] | [RFC9621] Pauly, T., Ed., Trammell, B., Ed., Brunstrom, A., | |||
| Schwartz, B. M., Bishop, M., and E. Nygren, "Service | Fairhurst, G., and C. S. Perkins, "Architecture and | |||
| Binding and Parameter Specification via the DNS (SVCB and | Requirements for Transport Services", RFC 9621, | |||
| HTTPS Resource Records)", Work in Progress, Internet- | DOI 10.17487/RFC9621, December 2024, | |||
| Draft, draft-ietf-dnsop-svcb-https-12, 11 March 2023, | <https://www.rfc-editor.org/info/rfc9621>. | |||
| <https://datatracker.ietf.org/doc/html/draft-ietf-dnsop- | ||||
| svcb-https-12>. | [RFC9622] Trammell, B., Ed., Welzl, M., Ed., Enghardt, R., | |||
| | Fairhurst, G., Kühlewind, M., Perkins, C. S., Tiesel, P. | |||
| | S., and T. Pauly, "An Abstract Application Programming | |||
| | Interface (API) for Transport Services", RFC 9622, | |||
| | DOI 10.17487/RFC9622, December 2024, | |||
| | <https://www.rfc-editor.org/info/rfc9622>. | |||
| | 13.2. Informative References | |||
| [NEAT-flow-mapping] | [NEAT-flow-mapping] | |||
| "Transparent Flow Mapping for NEAT", IFIP NETWORKING 2017 | Weinrank, F. and M. Tuxen, "Transparent flow mapping for | |||
| Workshop on Future of Internet Transport (FIT 2017) , | NEAT", 2017 IFIP Networking Conference (IFIP Networking) | |||
| 2017. | and Workshops, DOI 10.23919/IFIPNetworking.2017.8264876, | |||
| | June 2017, <https://ieeexplore.ieee.org/document/8264876>. | |||
| [RFC1928] Leech, M., Ganis, M., Lee, Y., Kuris, R., Koblas, D., and | [RFC1928] Leech, M., Ganis, M., Lee, Y., Kuris, R., Koblas, D., and | |||
| L. Jones, "SOCKS Protocol Version 5", RFC 1928, | L. Jones, "SOCKS Protocol Version 5", RFC 1928, | |||
| DOI 10.17487/RFC1928, March 1996, | DOI 10.17487/RFC1928, March 1996, | |||
| <https://www.rfc-editor.org/rfc/rfc1928>. | <https://www.rfc-editor.org/info/rfc1928>. | |||
| [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for | [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for | |||
| specifying the location of services (DNS SRV)", RFC 2782, | specifying the location of services (DNS SRV)", RFC 2782, | |||
| DOI 10.17487/RFC2782, February 2000, | DOI 10.17487/RFC2782, February 2000, | |||
| <https://www.rfc-editor.org/rfc/rfc2782>. | <https://www.rfc-editor.org/info/rfc2782>. | |||
| [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", | [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", | |||
| RFC 3124, DOI 10.17487/RFC3124, June 2001, | RFC 3124, DOI 10.17487/RFC3124, June 2001, | |||
| <https://www.rfc-editor.org/rfc/rfc3124>. | <https://www.rfc-editor.org/info/rfc3124>. | |||
| [RFC3207] Hoffman, P., "SMTP Service Extension for Secure SMTP over | [RFC3207] Hoffman, P., "SMTP Service Extension for Secure SMTP over | |||
| Transport Layer Security", RFC 3207, DOI 10.17487/RFC3207, | Transport Layer Security", RFC 3207, DOI 10.17487/RFC3207, | |||
| February 2002, <https://www.rfc-editor.org/rfc/rfc3207>. | February 2002, <https://www.rfc-editor.org/info/rfc3207>. | |||
| [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, | ||||
| "Session Traversal Utilities for NAT (STUN)", RFC 5389, | ||||
| DOI 10.17487/RFC5389, October 2008, | ||||
| <https://www.rfc-editor.org/rfc/rfc5389>. | ||||
| [RFC5766] Mahy, R., Matthews, P., and J. Rosenberg, "Traversal Using | ||||
| Relays around NAT (TURN): Relay Extensions to Session | ||||
| Traversal Utilities for NAT (STUN)", RFC 5766, | ||||
| DOI 10.17487/RFC5766, April 2010, | ||||
| <https://www.rfc-editor.org/rfc/rfc5766>. | ||||
| [RFC6525] Stewart, R., Tuexen, M., and P. Lei, "Stream Control | [RFC6525] Stewart, R., Tuexen, M., and P. Lei, "Stream Control | |||
| Transmission Protocol (SCTP) Stream Reconfiguration", | Transmission Protocol (SCTP) Stream Reconfiguration", | |||
| RFC 6525, DOI 10.17487/RFC6525, February 2012, | RFC 6525, DOI 10.17487/RFC6525, February 2012, | |||
| <https://www.rfc-editor.org/rfc/rfc6525>. | <https://www.rfc-editor.org/info/rfc6525>. | |||
| [RFC6762] Cheshire, S. and M. Krochmal, "Multicast DNS", RFC 6762, | [RFC6762] Cheshire, S. and M. Krochmal, "Multicast DNS", RFC 6762, | |||
| DOI 10.17487/RFC6762, February 2013, | DOI 10.17487/RFC6762, February 2013, | |||
| <https://www.rfc-editor.org/rfc/rfc6762>. | <https://www.rfc-editor.org/info/rfc6762>. | |||
| [RFC6763] Cheshire, S. and M. Krochmal, "DNS-Based Service | [RFC6763] Cheshire, S. and M. Krochmal, "DNS-Based Service | |||
| Discovery", RFC 6763, DOI 10.17487/RFC6763, February 2013, | Discovery", RFC 6763, DOI 10.17487/RFC6763, February 2013, | |||
| <https://www.rfc-editor.org/rfc/rfc6763>. | <https://www.rfc-editor.org/info/rfc6763>. | |||
| [RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer | ||||
| Protocol (HTTP/1.1): Message Syntax and Routing", | ||||
| RFC 7230, DOI 10.17487/RFC7230, June 2014, | ||||
| <https://www.rfc-editor.org/rfc/rfc7230>. | ||||
| [RFC7657] Black, D., Ed. and P. Jones, "Differentiated Services | [RFC7657] Black, D., Ed. and P. Jones, "Differentiated Services | |||
| (Diffserv) and Real-Time Communication", RFC 7657, | (Diffserv) and Real-Time Communication", RFC 7657, | |||
| DOI 10.17487/RFC7657, November 2015, | DOI 10.17487/RFC7657, November 2015, | |||
| <https://www.rfc-editor.org/rfc/rfc7657>. | <https://www.rfc-editor.org/info/rfc7657>. | |||
| [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage | [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage | |||
| Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, | Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, | |||
| March 2017, <https://www.rfc-editor.org/rfc/rfc8085>. | March 2017, <https://www.rfc-editor.org/info/rfc8085>. | |||
| [RFC8260] Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann, | [RFC8260] Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann, | |||
| "Stream Schedulers and User Message Interleaving for the | "Stream Schedulers and User Message Interleaving for the | |||
| Stream Control Transmission Protocol", RFC 8260, | Stream Control Transmission Protocol", RFC 8260, | |||
| DOI 10.17487/RFC8260, November 2017, | DOI 10.17487/RFC8260, November 2017, | |||
| <https://www.rfc-editor.org/rfc/rfc8260>. | <https://www.rfc-editor.org/info/rfc8260>. | |||
| [RFC8445] Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive | [RFC8445] Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive | |||
| Connectivity Establishment (ICE): A Protocol for Network | Connectivity Establishment (ICE): A Protocol for Network | |||
| Address Translator (NAT) Traversal", RFC 8445, | Address Translator (NAT) Traversal", RFC 8445, | |||
| DOI 10.17487/RFC8445, July 2018, | DOI 10.17487/RFC8445, July 2018, | |||
| <https://www.rfc-editor.org/rfc/rfc8445>. | <https://www.rfc-editor.org/info/rfc8445>. | |||
| | [RFC8489] Petit-Huguenin, M., Salgueiro, G., Rosenberg, J., Wing, | |||
| | D., Mahy, R., and P. Matthews, "Session Traversal | |||
| | Utilities for NAT (STUN)", RFC 8489, DOI 10.17487/RFC8489, | |||
| | February 2020, <https://www.rfc-editor.org/info/rfc8489>. | |||
| | [RFC8656] Reddy, T., Ed., Johnston, A., Ed., Matthews, P., and J. | |||
| | Rosenberg, "Traversal Using Relays around NAT (TURN): | |||
| | Relay Extensions to Session Traversal Utilities for NAT | |||
| | (STUN)", RFC 8656, DOI 10.17487/RFC8656, February 2020, | |||
| | <https://www.rfc-editor.org/info/rfc8656>. | |||
| [RFC9000] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | [RFC9000] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | |||
| Multiplexed and Secure Transport", RFC 9000, | Multiplexed and Secure Transport", RFC 9000, | |||
| DOI 10.17487/RFC9000, May 2021, | DOI 10.17487/RFC9000, May 2021, | |||
| <https://www.rfc-editor.org/rfc/rfc9000>. | <https://www.rfc-editor.org/info/rfc9000>. | |||
| [RFC9040] Touch, J., Welzl, M., and S. Islam, "TCP Control Block | [RFC9040] Touch, J., Welzl, M., and S. Islam, "TCP Control Block | |||
| Interdependence", RFC 9040, DOI 10.17487/RFC9040, July | Interdependence", RFC 9040, DOI 10.17487/RFC9040, July | |||
| 2021, <https://www.rfc-editor.org/rfc/rfc9040>. | 2021, <https://www.rfc-editor.org/info/rfc9040>. | |||
| | [RFC9110] Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke, | |||
| | Ed., "HTTP Semantics", STD 97, RFC 9110, | |||
| | DOI 10.17487/RFC9110, June 2022, | |||
| | <https://www.rfc-editor.org/info/rfc9110>. | |||
| | [RFC9460] Schwartz, B., Bishop, M., and E. Nygren, "Service Binding | |||
| | and Parameter Specification via the DNS (SVCB and HTTPS | |||
| | Resource Records)", RFC 9460, DOI 10.17487/RFC9460, | |||
| | November 2023, <https://www.rfc-editor.org/info/rfc9460>. | |||
| [TCP-COUPLING] | [TCP-COUPLING] | |||
| "ctrlTCP: Reducing Latency through Coupled, Heterogeneous | Islam, S., Welzl, M., Hiorth, K., Hayes, D., Armitage, G., | |||
| Multi-Flow TCP Congestion Control", IEEE INFOCOM Global | and S. Gjessing, "ctrlTCP: Reducing latency through | |||
| Internet Symposium (GI) workshop (GI 2018) , n.d.. | coupled, heterogeneous multi-flow TCP congestion control", | |||
| | IEEE INFOCOM 2018 - IEEE Conference on Computer | |||
| | Communications Workshops (INFOCOM WKSHPS), | |||
| | DOI 10.1109/INFCOMW.2018.8406887, 2018, | |||
| | <https://ieeexplore.ieee.org/document/8406887>. | |||
| Appendix A. API Mapping Template | Appendix A. API Mapping Template | |||
| Any protocol mapping for the Transport Services API should follow a | Any protocol mapping for the Transport Services API should follow a | |||
| common template. | common template. | |||
| Connectedness: (Connectionless/Connected/Multiplexing Connected) | Connectedness: (Connectionless/Connected/Multiplexing Connected) | |||
| Data Unit: (Byte-stream/Datagram/Message) | Data Unit: (Byte-stream/Datagram/Message) | |||
| skipping to change at page 52, line 15 | skipping to change at line 2396 | |||
| Receive: | Receive: | |||
| Close: | Close: | |||
| Abort: | Abort: | |||
| CloseGroup: | CloseGroup: | |||
| AbortGroup: | AbortGroup: | |||
| Appendix B. Reasons for errors | Appendix B. Reasons for Errors | |||
| The Transport Services API [I-D.ietf-taps-interface] allows for the | The Transport Services API [RFC9622] allows for several generic error | |||
| several generic error types to specify a more detailed reason about | types to specify a more detailed reason about why an error occurred. | |||
| why an error occurred. This appendix lists some of the possible | This appendix lists some of the possible reasons. | |||
| reasons. | ||||
| * InvalidConfiguration: The transport properties and Endpoint | InvalidConfiguration: The transport properties and Endpoint | |||
| Identifers provided by the application are either contradictory or | Identifiers provided by the application are either contradictory | |||
| incomplete. Examples include the lack of a Remote Endpoint | or incomplete. Examples include the lack of a Remote Endpoint | |||
| Identifer on an active open or using a multicast group address | Identifier on an active open or using a multicast group address | |||
| while not requesting a unidirectional receive. | while not requesting a Unidirectional receive. | |||
| * NoCandidates: The configuration is valid, but none of the | NoCandidates: The configuration is valid, but none of the available | |||
| available transport protocols can satisfy the transport properties | transport protocols can satisfy the transport properties provided | |||
| provided by the application. | by the application. | |||
| * ResolutionFailed: The remote or local specifier provided by the | ResolutionFailed: The remote or local specifier provided by the | |||
| application can not be resolved. | application cannot be resolved. | |||
| * EstablishmentFailed: The Transport Services system was unable to | EstablishmentFailed: The Transport Services system was unable to | |||
| establish a transport-layer connection to the Remote Endpoint | establish a transport-layer connection to the Remote Endpoint | |||
| specified by the application. | specified by the application. | |||
| * PolicyProhibited: The system policy prevents the Transport | PolicyProhibited: The system policy prevents the Transport Services | |||
| Services system from performing the action requested by the | system from performing the action requested by the application. | |||
| application. | ||||
| * NotCloneable: The Protocol Stack is not capable of being cloned. | NotCloneable: The Protocol Stack is not capable of being cloned. | |||
| * MessageTooLarge: The Message is too big for the Transport Services | MessageTooLarge: The Message is too big for the Transport Services | |||
| system to handle. | system to handle. | |||
| * ProtocolFailed: The underlying Protocol Stack failed. | ProtocolFailed: The underlying Protocol Stack failed. | |||
| * InvalidMessageProperties: The Message Properties either contradict | InvalidMessageProperties: The Message Properties either contradict | |||
| the Transport Properties or they can not be satisfied by the | the Transport Properties or cannot be satisfied by the Transport | |||
| Transport Services system. | Services system. | |||
| * DeframingFailed: The data that was received by the underlying | DeframingFailed: The data that was received by the underlying | |||
| Protocol Stack could not be processed by the Message Framer. | Protocol Stack could not be processed by the Message Framer. | |||
| * ConnectionAborted: The connection was aborted by the peer. | ConnectionAborted: The connection was aborted by the peer. | |||
| * Timeout: Delivery of a Message was not possible after a timeout. | Timeout: Delivery of a Message was not possible after a timeout. | |||
| Appendix C. Existing Implementations | Appendix C. Existing Implementations | |||
| This appendix gives an overview of existing implementations, at the | This appendix gives an overview of existing implementations, at the | |||
| time of writing, of Transport Services systems that are (to some | time of writing, of Transport Services systems that are (to some | |||
| degree) in line with this document. | degree) in line with this document. | |||
| * Apple's Network.framework: | * Apple's Network.framework: | |||
| - Network.framework is a transport-level API built for C, | - Network.framework is a transport-level API built for C, | |||
| Objective-C, and Swift. It a connect-by-name API that supports | Objective-C, and Swift. It is a connect-by-name API that | |||
| transport security protocols. It provides userspace | supports transport security protocols. It provides user-space | |||
| implementations of TCP, UDP, TLS, DTLS, proxy protocols, and | implementations of TCP, UDP, TLS, DTLS, and proxy protocols, | |||
| allows extension via custom framers. | and it allows extension via custom framers. | |||
| - Documentation: https://developer.apple.com/documentation/ | - Documentation: https://developer.apple.com/documentation/ | |||
| network (https://developer.apple.com/documentation/network) | network | |||
| * NEAT and NEATPy: | * NEAT and NEATPy: | |||
| - NEAT is the output of the European H2020 research project | - NEAT is the output of the European H2020 research project | |||
| "NEAT"; it is a user-space library for protocol-independent | "NEAT"; it is a user-space library for protocol-independent | |||
| communication on top of TCP, UDP and SCTP, with many more | communication on top of TCP, UDP, and SCTP, with many more | |||
| features, such as a policy manager. | features, such as a policy manager. | |||
| - Code: https://github.com/NEAT-project/neat (https://github.com/ | - Code: https://github.com/NEAT-project/neat | |||
| NEAT-project/neat) | ||||
| - Code at the Software Heritage Archive: | - Code at the Software Heritage Archive: | |||
| https://archive.softwareheritage.org/swh:1:dir:737820840f83c4ec | https://archive.softwareheritage.org/swh:1:dir:737820840f83c4ec | |||
| 9493a8c0cc89b3159e2e1a57;origin=https://github.com/NEAT- | 9493a8c0cc89b3159e2e1a57;origin=https://github.com/NEAT- | |||
| project/neat;visit=swh:1:snp:bbb611b04e355439d47e426e8ad5d07cdb | project/neat;visit=swh:1:snp:bbb611b04e355439d47e426e8ad5d07cdb | |||
| f647e0;anchor=swh:1:rev:652ee991043ce3560a6e5715fa2a5c211139d15 | f647e0;anchor=swh:1:rev:652ee991043ce3560a6e5715fa2a5c211139d15 | |||
| c (https://archive.softwareheritage.org/swh:1:dir:737820840f83c | c | |||
| 4ec9493a8c0cc89b3159e2e1a57;origin=https://github.com/NEAT- | ||||
| project/neat;visit=swh:1:snp:bbb611b04e355439d47e426e8ad5d07cdb | ||||
| f647e0;anchor=swh:1:rev:652ee991043ce3560a6e5715fa2a5c211139d15 | ||||
| c) | ||||
| - NEAT project: https://www.neat-project.org (https://www.neat- | ||||
| project.org) | ||||
| - NEATPy is a Python shim over NEAT which updates the NEAT API to | - NEATPy is a Python shim over NEAT that updates the NEAT API to | |||
| be in line with version 6 of the Transport Services API draft. | be in line with version 6 of the Transport Services API | |||
| | [RFC9622]. | |||
| - Code: https://github.com/theagilepadawan/NEATPy | - Code: https://github.com/theagilepadawan/NEATPy | |||
| (https://github.com/theagilepadawan/NEATPy) | ||||
| - Code at the Software Heritage Archive: | - Code at the Software Heritage Archive: | |||
| https://archive.softwareheritage.org/swh:1:dir:295ccd148cf918cc | https://archive.softwareheritage.org/swh:1:dir:295ccd148cf918cc | |||
| b9ed7ad14b5ae968a8d2c370;origin=https://github.com/ | b9ed7ad14b5ae968a8d2c370;origin=https://github.com/ | |||
| theagilepadawan/NEATPy;visit=swh:1:snp:6e1a3a9dd4c532ba6c0f52c8 | theagilepadawan/NEATPy;visit=swh:1:snp:6e1a3a9dd4c532ba6c0f52c8 | |||
| f734c1256a06cedc;anchor=swh:1:rev:cd0788d7f7f34a0e9b8654516da7c | f734c1256a06cedc;anchor=swh:1:rev:cd0788d7f7f34a0e9b8654516da7c | |||
| 002c44d2e95 (https://archive.softwareheritage.org/swh:1:dir:295 | 002c44d2e95 | |||
| ccd148cf918ccb9ed7ad14b5ae968a8d2c370;origin=https://github.com | ||||
| /theagilepadawan/NEATPy;visit=swh:1:snp:6e1a3a9dd4c532ba6c0f52c | ||||
| 8f734c1256a06cedc;anchor=swh:1:rev:cd0788d7f7f34a0e9b8654516da7 | ||||
| c002c44d2e95) | ||||
| * PyTAPS: | * PyTAPS: | |||
| - A TAPS implementation based on Python asyncio, offering | - A Transport Services (TAPS) implementation based on Python | |||
| protocol-independent communication to applications on top of | asyncio, offering protocol-independent communication to | |||
| TCP, UDP and TLS, with support for multicast. | applications on top of TCP, UDP, and TLS, with support for | |||
| multicast. | ||||
| - Code: https://github.com/fg-inet/python-asyncio-taps | - Code: https://github.com/fg-inet/python-asyncio-taps | |||
| (https://github.com/fg-inet/python-asyncio-taps) | ||||
| - Code at the Software Heritage Archive: | - Code at the Software Heritage Archive: | |||
| https://archive.softwareheritage.org/swh:1:dir:a7151096d91352b4 | https://archive.softwareheritage.org/swh:1:dir:a7151096d91352b4 | |||
| 39b092ef116d04f38e52e556;origin=https://github.com/fg-inet/ | 39b092ef116d04f38e52e556;origin=https://github.com/fg-inet/ | |||
| python-asyncio-taps;visit=swh:1:snp:4841e59b53b28bb385726e7d3a5 | python-asyncio-taps;visit=swh:1:snp:4841e59b53b28bb385726e7d3a5 | |||
| 69bee0fea7fc4;anchor=swh:1:rev:63571fd7545da25142bc1a6371b8f130 | 69bee0fea7fc4;anchor=swh:1:rev:63571fd7545da25142bc1a6371b8f130 | |||
| 97cba38e (https://archive.softwareheritage.org/swh:1:dir:a71510 | 97cba38e | |||
| 96d91352b439b092ef116d04f38e52e556;origin=https://github.com/ | ||||
| fg-inet/python-asyncio-taps;visit=swh:1:snp:4841e59b53b28bb3857 | Acknowledgements | |||
| 26e7d3a569bee0fea7fc4;anchor=swh:1:rev:63571fd7545da25142bc1a63 | ||||
| 71b8f13097cba38e) | This work has received funding from the European Union's Horizon 2020 | |||
| | research and innovation programme under grant agreement No. 644334 | |||
| | (NEAT) and No. 815178 (5GENESIS). | |||
| | This work has been supported by: | |||
| | * Leibniz Prize project funds from the DFG - German Research | |||
| | Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ FE 570/4-1). | |||
| | * the UK Engineering and Physical Sciences Research Council under | |||
| | grant EP/R04144X/1. | |||
| | * the Research Council of Norway under its "Toppforsk" programme | |||
| | through the "OCARINA" project. | |||
| | Thanks to Colin S. Perkins, Tom Jones, Karl-Johan Grinnemo, and Gorry | |||
| | Fairhurst for their contributions to the design of this | |||
| | specification. Thanks also to Stuart Cheshire, Josh Graessley, David | |||
| | Schinazi, and Eric Kinnear for their implementation and design | |||
| | efforts, including Happy Eyeballs, that heavily influenced this work. | |||
| Authors' Addresses | Authors' Addresses | |||
| Anna Brunstrom (editor) | Anna Brunstrom (editor) | |||
| Karlstad University | Karlstad University | |||
| Universitetsgatan 2 | Universitetsgatan 2 | |||
| 651 88 Karlstad | 651 88 Karlstad | |||
| Sweden | Sweden | |||
| Email: anna.brunstrom@kau.se | Email: anna.brunstrom@kau.se | |||
| Tommy Pauly (editor) | Tommy Pauly (editor) | |||
| Apple Inc. | Apple Inc. | |||
| One Apple Park Way | One Apple Park Way | |||
| Cupertino, California 95014, | Cupertino, CA 95014 | |||
| United States of America | United States of America | |||
| Email: tpauly@apple.com | Email: tpauly@apple.com | |||
| Reese Enghardt | Reese Enghardt | |||
| Netflix | Netflix | |||
| 121 Albright Way | 121 Albright Way | |||
| Los Gatos, CA 95032, | Los Gatos, CA 95032 | |||
| United States of America | United States of America | |||
| Email: ietf@tenghardt.net | Email: ietf@tenghardt.net | |||
| Philipp S. Tiesel | Philipp S. Tiesel | |||
| SAP SE | SAP SE | |||
| George-Stephenson-Straße 7-13 | George-Stephenson-Str. 7-13 | |||
| 10557 Berlin | 10557 Berlin | |||
| Germany | Germany | |||
| Email: philipp@tiesel.net | Email: philipp@tiesel.net | |||
| Michael Welzl | Michael Welzl | |||
| University of Oslo | University of Oslo | |||
| PO Box 1080 Blindern | PO Box 1080 Blindern | |||
| 0316 Oslo | 0316 Oslo | |||
| Norway | Norway | |||
| Email: michawe@ifi.uio.no | Email: michawe@ifi.uio.no | |||
| End of changes. 307 change blocks. | ||||
| 899 lines changed or deleted | 893 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||