| rfc9626.original | rfc9626.txt | |||
|---|---|---|---|---|
| Network Working Group M. Zanaty | Internet Engineering Task Force (IETF) M. Zanaty | |||
| Internet-Draft E. Berger | Request for Comments: 9626 E. Berger | |||
| Intended status: Experimental S. Nandakumar | Category: Experimental S. Nandakumar | |||
| Expires: 5 September 2024 Cisco Systems | ISSN: 2070-1721 Cisco Systems | |||
| 4 March 2024 | August 2024 | |||
| Video Frame Marking RTP Header Extension | Video Frame Marking RTP Header Extension | |||
| draft-ietf-avtext-framemarking-16 | ||||
| Abstract | Abstract | |||
| This document describes a Video Frame Marking RTP header extension | This document describes a Video Frame Marking RTP header extension | |||
| used to convey information about video frames that is critical for | used to convey information about video frames that is critical for | |||
| error recovery and packet forwarding in RTP middleboxes or network | error recovery and packet forwarding in RTP middleboxes or network | |||
| nodes. It is most useful when media is encrypted, and essential when | nodes. It is most useful when media is encrypted and essential when | |||
| the middlebox or node has no access to the media decryption keys. It | the middlebox or node has no access to the media decryption keys. It | |||
| is also useful for codec-agnostic processing of encrypted or | is also useful for codec-agnostic processing of encrypted or | |||
| unencrypted media, while it also supports extensions for codec- | unencrypted media, while it also supports extensions for codec- | |||
| specific information. | specific information. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
| provisions of BCP 78 and BCP 79. | published for examination, experimental implementation, and | |||
| evaluation. | ||||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document defines an Experimental Protocol for the Internet | |||
| and may be updated, replaced, or obsoleted by other documents at any | community. This document is a product of the Internet Engineering | |||
| time. It is inappropriate to use Internet-Drafts as reference | Task Force (IETF). It represents the consensus of the IETF | |||
| material or to cite them other than as "work in progress." | community. It has received public review and has been approved for | |||
| publication by the Internet Engineering Steering Group (IESG). Not | ||||
| all documents approved by the IESG are candidates for any level of | ||||
| Internet Standard; see Section 2 of RFC 7841. | ||||
| This Internet-Draft will expire on 5 September 2024. | Information about the current status of this document, any errata, | |||
| and how to provide feedback on it may be obtained at | ||||
| https://www.rfc-editor.org/info/rfc9626. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
| provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
| in the Revised BSD License. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction | |||
| 2. Key Words for Normative Requirements . . . . . . . . . . . . 4 | 2. Requirements Language | |||
| 3. Frame Marking RTP Header Extension . . . . . . . . . . . . . 4 | 3. Frame Marking RTP Header Extension | |||
| 3.1. Long Extension for Scalable Streams . . . . . . . . . . . 5 | 3.1. Long Extension for Scalable Streams | |||
| 3.2. Short Extension for Non-Scalable Streams . . . . . . . . 7 | 3.2. Short Extension for Non-scalable Streams | |||
| 3.3. Layer ID Mappings for Scalable Streams . . . . . . . . . 7 | 3.3. LID Mappings for Scalable Streams | |||
| 3.3.1. VP9 LID Mapping . . . . . . . . . . . . . . . . . . . 8 | 3.3.1. VP9 LID Mapping | |||
| 3.3.2. H265 LID Mapping . . . . . . . . . . . . . . . . . . 8 | 3.3.2. H265 LID Mapping | |||
| 3.3.3. H264-SVC LID Mapping . . . . . . . . . . . . . . . . 9 | 3.3.3. H264 Scalable Video Coding (SVC) LID Mapping | |||
| 3.3.4. H264 (AVC) LID Mapping . . . . . . . . . . . . . . . 10 | 3.3.4. H264 Advanced Video Coding (AVC) LID Mapping | |||
| 3.3.5. VP8 LID Mapping . . . . . . . . . . . . . . . . . . . 10 | 3.3.5. VP8 LID Mapping | |||
| 3.3.6. Future Codec LID Mapping . . . . . . . . . . . . . . 11 | 3.3.6. Future Codec LID Mapping | |||
| 3.4. Signaling Information . . . . . . . . . . . . . . . . . . 11 | 3.4. Signaling Information | |||
| 3.5. Usage Considerations . . . . . . . . . . . . . . . . . . 11 | 3.5. Usage Considerations | |||
| 3.5.1. Relation to Layer Refresh Request (LRR) . . . . . . . 12 | 3.5.1. Relation to Layer Refresh Request (LRR) | |||
| 3.5.2. Scalability Structures . . . . . . . . . . . . . . . 12 | 3.5.2. Scalability Structures | |||
| 4. Security Considerations and Privacy Considerations . . . . . 12 | 4. Security and Privacy Considerations | |||
| 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 13 | 5. IANA Considerations | |||
| 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 | 6. References | |||
| 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 14 | 6.1. Normative References | |||
| 7.1. Normative References . . . . . . . . . . . . . . . . . . 14 | 6.2. Informative References | |||
| 7.2. Informative References . . . . . . . . . . . . . . . . . 14 | Acknowledgements | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 16 | Authors' Addresses | |||
| 1. Introduction | 1. Introduction | |||
| Many widely deployed RTP [RFC3550] topologies [RFC7667] used in | Many widely deployed RTP [RFC3550] topologies [RFC7667] used in | |||
| modern voice and video conferencing systems include a centralized | modern voice and video conferencing systems include a centralized | |||
| component that acts as an RTP switch. It receives voice and video | component that acts as an RTP switch. It receives voice and video | |||
| streams from each participant, which may be encrypted using SRTP | streams from each participant, which may be encrypted using Secure | |||
| [RFC3711], or extensions that provide participants with private media | Real-time Transport Protocol (SRTP) [RFC3711] or extensions that | |||
| [RFC8871] via end-to-end encryption where the switch has no access to | provide participants with private media [RFC8871] via end-to-end | |||
| media decryption keys. The goal is to provide a set of streams back | encryption where the switch has no access to media decryption keys. | |||
| to the participants which enable them to render the right media | The goal is to provide a set of streams back to the participants, | |||
| content. In a simple video configuration, for example, the goal will | which enable them to render the right media content. For example, in | |||
| be that each participant sees and hears just the active speaker. In | a simple video configuration, the goal will be that each participant | |||
| that case, the goal of the switch is to receive the voice and video | sees and hears just the active speaker. In that case, the goal of | |||
| streams from each participant, determine the active speaker based on | the switch is to receive the voice and video streams from each | |||
| energy in the voice packets, possibly using the client-to-mixer audio | participant, determine the active speaker based on energy in the | |||
| level RTP header extension [RFC6464], and select the corresponding | voice packets, possibly using the client-to-mixer audio level RTP | |||
| video stream for transmission to participants; see Figure 1. | header extension [RFC6464], and select the corresponding video stream | |||
| for transmission to participants; see Figure 1. | ||||
| In this document, an "RTP switch" is used as a common short term for | In this document, an "RTP switch" is used as shorthand for the terms | |||
| the terms "switching RTP mixer", "source projecting middlebox", | "switching RTP mixer", "source projecting middlebox", "source | |||
| "source forwarding unit/middlebox" and "video switching MCU" as | forwarding unit/middlebox" and "video switching Multipoint Control | |||
| discussed in [RFC7667]. | Unit (MCU)", as discussed in [RFC7667]. | |||
| +---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
| | A |<---->| |<---->| B | | | A |<---->| |<---->| B | | |||
| +---+ | | +---+ | +---+ | | +---+ | |||
| | RTP | | | RTP | | |||
| +---+ | Switch | +---+ | +---+ | Switch | +---+ | |||
| | C |<---->| |<---->| D | | | C |<---->| |<---->| D | | |||
| +---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
| Figure 1: RTP switch | Figure 1: RTP Switch | |||
| In order to properly support switching of video streams, the RTP | In order to properly support the switching of video streams, the RTP | |||
| switch typically needs some critical information about video frames | switch typically needs some critical information about video frames | |||
| in order to start and stop forwarding streams. | in order to start and stop forwarding streams. | |||
| * Because of inter-frame dependencies, it should ideally switch | * Because of inter-frame dependencies, it should ideally switch | |||
| video streams at a point where the first frame from the new | video streams at a point where the first frame from the new | |||
| speaker can be decoded by recipients without prior frames, e.g | speaker can be decoded by recipients without prior frames, e.g., | |||
| switch on an intra-frame. | switch on an intra-frame. | |||
| * In many cases, the switch may need to drop frames in order to | * In many cases, the switch may need to drop frames in order to | |||
| realize congestion control techniques, and needs to know which | realize congestion control techniques, and it needs to know which | |||
| frames can be dropped with minimal impact to video quality. | frames can be dropped with minimal impact to video quality. | |||
| * For scalable streams with dependent layers, the switch may need to | * For scalable streams with dependent layers, the switch may need to | |||
| selectively forward specific layers to specific recipients due to | selectively forward specific layers to specific recipients due to | |||
| recipient bandwidth or decoder limits. | recipient bandwidth or decoder limits. | |||
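
The first two needs above can be illustrated with a small forwarding sketch. It assumes each packet already carries the S, I, and D bits defined in Section 3.1. The `Packet` and `VideoSwitch` names and the single `congested` flag are hypothetical. Everything else a real RTP switch does (RTCP handling, rate estimation, per-receiver layer selection) is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    ssrc: int
    start: bool        # S bit: first packet of a frame
    independent: bool  # I bit: decodable without temporally prior frames
    discardable: bool  # D bit: sender marked the frame as droppable


class VideoSwitch:
    """Hypothetical single-output switch that forwards one source at a time."""

    def __init__(self, ssrc: int) -> None:
        self.current = ssrc                 # source being forwarded now
        self.pending: Optional[int] = None  # source we want to switch to
        self.congested = False              # set by some congestion controller

    def select(self, ssrc: int) -> None:
        """Request a switch, e.g., after an active-speaker change."""
        self.pending = None if ssrc == self.current else ssrc

    def forward(self, pkt: Packet) -> bool:
        """Return True if the packet should be sent on to receivers."""
        # Switch only on the first packet of an independent frame from the new
        # source, so receivers can decode it without prior frames from that source.
        if pkt.ssrc == self.pending and pkt.start and pkt.independent:
            self.current, self.pending = pkt.ssrc, None
        if pkt.ssrc != self.current:
            return False
        # Under congestion, drop frames the sender marked as discardable.
        return not (self.congested and pkt.discardable)
```

The old source keeps flowing until the new source reaches an independent frame. Waiting this way can delay the switch by up to one intra-frame interval. A switch could shorten the delay by requesting an intra-frame from the new sender, for example with a Full Intra Request, but that is outside this sketch.

| rfc9626.original | rfc9626.txt | |||
|---|---|---|---|---|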
| Furthermore, it is highly desirable to do this in a payload format- | Furthermore, it is highly desirable to do this in a payload format- | |||
| agnostic way which is not specific to each different video codec. | agnostic way that is not specific to each different video codec. | |||
| Most modern video codecs share common concepts around frame types and | Most modern video codecs share common concepts around frame types and | |||
| other critical information to make this codec-agnostic handling | other critical information to make this codec-agnostic handling | |||
| possible. | possible. | |||
| It is also desirable to be able to do this for SRTP without requiring | It is also desirable to be able to do this for SRTP without requiring | |||
| the video switch to decrypt the packets. SRTP will encrypt the RTP | the video switch to decrypt the packets. SRTP will encrypt the RTP | |||
| payload format contents and consequently this data is not usable for | payload format contents; consequently, this data is not usable for | |||
| the switching function without decryption, which may not even be | the switching function without decryption, which may not even be | |||
| possible in the case of end-to-end encryption of private media | possible in the case of end-to-end encryption of private media | |||
| [RFC8871]. | [RFC8871]. | |||
| By providing meta-information about the RTP streams outside the | By providing meta-information about the RTP streams outside the | |||
| encrypted media payload, an RTP switch can do codec-agnostic | encrypted media payload, an RTP switch can do codec-agnostic | |||
| selective forwarding without decrypting the payload. This document | selective forwarding without decrypting the payload. This document | |||
| specifies the necessary meta-information in an RTP header extension. | specifies the necessary meta-information in an RTP header extension. | |||
| 2. Key Words for Normative Requirements | 2. Requirements Language | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| 3. Frame Marking RTP Header Extension | 3. Frame Marking RTP Header Extension | |||
| This specification uses RTP header extensions as defined in | This specification uses RTP header extensions as defined in | |||
| [RFC8285]. A subset of meta-information from the video stream is | [RFC8285]. A subset of meta-information from the video stream is | |||
| provided as an RTP header extension to allow an RTP switch to do | provided as an RTP header extension to allow an RTP switch to do | |||
| generic selective forwarding of video streams encoded with | generic selective forwarding of video streams encoded with | |||
| potentially different video codecs. | potentially different video codecs. | |||
| The Frame Marking RTP header extension is encoded using the one-byte | The Frame Marking RTP header extension is encoded using the one-byte | |||
| header or two-byte header as described in [RFC8285]. The one-byte | header or two-byte header as described in [RFC8285]. The one-byte | |||
| header format is used for examples in this memo. The two-byte header | header format is used for examples in this document. The two-byte | |||
| format is used when other two-byte header extensions are present in | header format is used when other two-byte header extensions are | |||
| the same RTP packet, since mixing one-byte and two-byte extensions is | present in the same RTP packet since mixing one-byte and two-byte | |||
| not possible in the same RTP packet. | extensions is not possible in the same RTP packet. | |||
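
The following sketch illustrates the one-byte/two-byte constraint above. It wraps Frame Marking data in either RFC 8285 element form. If any element in the packet does not fit the one-byte form, it uses the two-byte form for every element. The sketch assumes the header-extension block header, padding, and extmap ID negotiation are handled elsewhere. The function names and the ID values 3 and 20 are hypothetical.

```python
def one_byte_element(ext_id: int, data: bytes) -> bytes:
    """RFC 8285 one-byte element: 4-bit ID, 4-bit length field = len(data) - 1."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte form allows IDs 1..14")
    if not 1 <= len(data) <= 16:
        raise ValueError("one-byte form carries 1..16 octets")
    return bytes([(ext_id << 4) | (len(data) - 1)]) + data


def two_byte_element(ext_id: int, data: bytes) -> bytes:
    """RFC 8285 two-byte element: 8-bit ID, 8-bit length field = len(data)."""
    if not 1 <= ext_id <= 255:
        raise ValueError("two-byte form allows IDs 1..255")
    if len(data) > 255:
        raise ValueError("two-byte form carries at most 255 octets")
    return bytes([ext_id, len(data)]) + data


def encode_elements(elements: list[tuple[int, bytes]]) -> tuple[bool, bytes]:
    """Encode every element in one form, since the forms cannot be mixed."""
    two_byte = any(i > 14 or not 1 <= len(d) <= 16 for i, d in elements)
    wrap = two_byte_element if two_byte else one_byte_element
    return two_byte, b"".join(wrap(i, d) for i, d in elements)


# Hypothetical Frame Marking data: S=1, I=1, TID=0, LID=0, TL0PICIDX=5.
frame_marking = bytes([0xA0, 0x00, 0x05])
```

With only the Frame Marking element at ID 3, the one-byte form yields `32a00005`. Its length nibble of 2 corresponds to L=2 in the figures of Section 3.1. Adding an element with ID 20 forces the two-byte form for both elements. In that form the length byte holds the actual octet count, so it reads 3.

| rfc9626.original | rfc9626.txt | |||
|---|---|---|---|---|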
| This extension is only specified for Source (not Redundancy) RTP | This extension is only specified for Source (not Redundancy) RTP | |||
| Streams [RFC7656] that carry video payloads. It is not specified for | Streams [RFC7656] that carry video payloads. It is not specified for | |||
| audio payloads, nor is it specified for Redundancy RTP Streams. The | audio payloads, nor is it specified for Redundancy RTP Streams. The | |||
| (separate) specifications for Redundancy RTP Streams often include | (separate) specifications for Redundancy RTP Streams often include | |||
| provisions for recovering any header extensions that were part of the | provisions for recovering any header extensions that were part of the | |||
| original source packet. Such provisions can be followed to recover | original source packet. Such provisions can be followed to recover | |||
| the Frame Marking RTP header extension of the original source packet. | the Frame Marking RTP header extension of the original source packet. | |||
| Source packet frame markings may be useful when generating Redundancy | Source packet frame markings may be useful when generating Redundancy | |||
| RTP Streams; for example, the I (Independent Frame) and D | RTP Streams; for example, the I (Independent Frame) and D | |||
| (Discardable Frame) bits, defined in Section 3.1, can be used to | (Discardable Frame) bits, defined in Section 3.1, can be used to | |||
| generate extra or no redundancy, respectively, and redundancy schemes | generate extra or no redundancy, respectively, and redundancy schemes | |||
| with source blocks can align source block boundaries with independent | with source blocks can align source block boundaries with independent | |||
| frame boundaries as marked by the I bit. | frame boundaries as marked by the I bit. | |||
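
The redundancy uses just described can be sketched in two parts: a per-packet protection level, and source blocks that split at independent-frame starts. The policy is an illustrative assumption, not behavior defined by any Redundancy RTP Stream specification: one extra repair level for I frames, none for D frames, with D taking precedence. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MarkedPacket:
    seq: int
    start: bool        # S bit
    independent: bool  # I bit
    discardable: bool  # D bit


def protection_level(pkt: MarkedPacket, normal: int = 1) -> int:
    """Hypothetical policy: no repair for D frames, one extra level for I frames."""
    if pkt.discardable:
        return 0
    return normal + 1 if pkt.independent else normal


def source_blocks(packets: list[MarkedPacket]) -> list[list[int]]:
    """Group sequence numbers into source blocks that begin at independent frames."""
    blocks: list[list[int]] = []
    for pkt in packets:
        # Start a new block at the first packet of each independent frame.
        if not blocks or (pkt.start and pkt.independent):
            blocks.append([])
        blocks[-1].append(pkt.seq)
    return blocks
```

Real block-based schemes also bound the block size, which this sketch does not do.

| rfc9626.original | rfc9626.txt | |||
|---|---|---|---|---|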
| A frame, in the context of this specification, is the set of RTP | A frame, in the context of this specification, is the set of RTP | |||
| packets with the same RTP timestamp from a specific RTP | packets with the same RTP timestamp from a specific RTP | |||
| synchronization source (SSRC). A frame within a layer is the set of | Synchronization Source (SSRC). A frame within a layer is the set of | |||
| RTP packets with the same RTP timestamp, SSRC, Temporal ID (TID), and | RTP packets with the same RTP timestamp, SSRC, Temporal ID (TID), and | |||
| Layer ID (LID). | Layer ID (LID). | |||
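
These two definitions translate directly into grouping keys. The following is a minimal sketch, assuming packets have already been parsed into a hypothetical `Pkt` tuple. It ignores RTP timestamp wraparound. Because grouping is by key rather than by arrival order, reordered packets still land in the right frame.

```python
from collections import defaultdict
from typing import Callable, Hashable, NamedTuple


class Pkt(NamedTuple):
    seq: int
    ssrc: int
    timestamp: int  # RTP timestamp
    tid: int = 0
    lid: int = 0


def _group(packets, key: Callable[[Pkt], Hashable]) -> dict:
    groups = defaultdict(list)
    for p in packets:
        groups[key(p)].append(p.seq)
    return dict(groups)


def frames(packets):
    """A frame: packets with the same RTP timestamp from one SSRC."""
    return _group(packets, lambda p: (p.ssrc, p.timestamp))


def frames_within_layers(packets):
    """A frame within a layer: same RTP timestamp, SSRC, TID, and LID."""
    return _group(packets, lambda p: (p.ssrc, p.timestamp, p.tid, p.lid))
```

| rfc9626.original | rfc9626.txt | |||
|---|---|---|---|---|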
| 3.1. Long Extension for Scalable Streams | 3.1. Long Extension for Scalable Streams | |||
| The following RTP header extension is RECOMMENDED for scalable | The following RTP header extension is RECOMMENDED for scalable | |||
| streams. It MAY also be used for non-scalable streams, in which case | streams. It MAY also be used for non-scalable streams, in which case | |||
| TID, LID and TL0PICIDX MUST be 0 or omitted. The ID is assigned per | the TID, LID, and TL0PICIDX MUST be 0 or omitted. The ID is assigned | |||
| [RFC8285], and the length is encoded as L=2 which indicates 3 octets | per [RFC8285]. The length is encoded as follows: | |||
| of data when nothing is omitted, or L=1 for 2 octets when TL0PICIDX | ||||
| is omitted, or L=0 for 1 octet when both LID and TL0PICIDX are | * L=2 to indicate 3 octets of data when nothing is omitted, | |||
| omitted. | ||||
| * L=1 for 2 octets when TL0PICIDX is omitted, or | ||||
| * L=0 for 1 octet when both the LID and TL0PICIDX are omitted. | ||||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | ID=? | L=2 |S|E|I|D|B| TID | LID | TL0PICIDX | | | ID=? | L=2 |S|E|I|D|B| TID | LID | TL0PICIDX | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| or | or | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | ID=? | L=1 |S|E|I|D|B| TID | LID | (TL0PICIDX omitted) | | ID=? | L=1 |S|E|I|D|B| TID | LID | (TL0PICIDX omitted) | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| or | or | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | ID=? | L=0 |S|E|I|D|B| TID | (LID and TL0PICIDX omitted) | | ID=? | L=0 |S|E|I|D|B| TID | (LID and TL0PICIDX omitted) | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
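
A receiver can tell the three long forms apart by the element's data length alone, which is L+1 octets in the one-byte form. The following decoder is a sketch. It assumes the element data has already been extracted from the RTP header extension block. The `FrameMarking` type and its field names are hypothetical. An omitted LID decodes as 0, and an omitted TL0PICIDX decodes as `None`, which keeps it distinct from the valid index 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FrameMarking:
    start: bool               # S: first packet of a frame within a layer
    end: bool                 # E: last packet of a frame within a layer
    independent: bool         # I: decodable without temporally prior frames
    discardable: bool         # D: can be dropped, stream stays decodable
    base_sync: bool           # B: depends only on the base temporal layer
    tid: int                  # TID: 3-bit temporal layer
    lid: int                  # LID: 0 when omitted
    tl0picidx: Optional[int]  # None when omitted


def parse_frame_marking(data: bytes) -> FrameMarking:
    """Decode 1, 2, or 3 octets of element data (L=0, 1, or 2 in one-byte form)."""
    if not 1 <= len(data) <= 3:
        raise ValueError("Frame Marking data must be 1 to 3 octets")
    b0 = data[0]
    return FrameMarking(
        start=bool(b0 & 0x80),
        end=bool(b0 & 0x40),
        independent=bool(b0 & 0x20),
        discardable=bool(b0 & 0x10),
        base_sync=bool(b0 & 0x08),
        tid=b0 & 0x07,
        lid=data[1] if len(data) >= 2 else 0,
        tl0picidx=data[2] if len(data) == 3 else None,
    )
```

| rfc9626.original | rfc9626.txt | |||
|---|---|---|---|---|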
| The following information are extracted from the media payload and | The following information is extracted from the media payload and | |||
| sent in the Frame Marking RTP header extension. | sent in the Frame Marking RTP header extension. | |||
| * S: Start of Frame (1 bit) - MUST be 1 in the first packet in a | S: Start of Frame (1 bit) | |||
| frame within a layer; otherwise MUST be 0. | MUST be 1 in the first packet in a frame within a layer; | |||
| * E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame | otherwise, MUST be 0. | |||
| within a layer; otherwise MUST be 0. Note that the RTP header | ||||
| marker bit MAY be used to infer the last packet of the highest | ||||
| enhancement layer, in payload formats with such semantics. | ||||
| * I: Independent Frame (1 bit) - MUST be 1 for a frame within a | ||||
| layer that can be decoded independent of temporally prior frames, | ||||
| e.g. intra-frame, VPX keyframe, H.264 IDR [RFC6184], H.265 | ||||
| IDR/CRA/BLA/RAP [RFC7798]; otherwise MUST be 0. Note that this | ||||
| bit only signals temporal independence, so it can be 1 in spatial | ||||
| or quality enhancement layers that depend on temporally co-located | ||||
| layers but not temporally prior frames. | ||||
| * D: Discardable Frame (1 bit) - MUST be 1 for a frame within a | ||||
| layer the sender knows can be discarded, and still provide a | ||||
| decodable media stream; otherwise MUST be 0. | ||||
| * B: Base Layer Sync (1 bit) - When TID is not 0, this MUST be 1 if | ||||
| the sender knows this frame within a layer only depends on the | ||||
| base temporal layer; otherwise MUST be 0. When TID is 0 or if no | ||||
| scalability is used, this MUST be 0. | ||||
| * TID: Temporal ID (3 bits) - Identifies the temporal layer/sub- | E: End of Frame (1 bit) | |||
| layer encoded, starting with 0 for the base layer, and increasing | MUST be 1 in the last packet in a frame within a layer; otherwise, | |||
| with higher temporal fidelity. If no scalability is used, this | MUST be 0. Note that the RTP header marker bit MAY be used to | |||
| MUST be 0. It is implicitly 0 in the short extension format. | infer the last packet of the highest enhancement layer in payload | |||
| * LID: Layer ID (8 bits) - Identifies the spatial and quality layer | formats with such semantics. | |||
| encoded, starting with 0 for the base layer, and increasing with | ||||
| higher fidelity. If no scalability is used, this MUST be 0 or | ||||
| omitted to reduce length. When omitted, TL0PICIDX MUST also be | ||||
| omitted. It is implicitly 0 in the short extension format or when | ||||
| omitted in the long extension format. | ||||
| * TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) - When TID is 0 | ||||
| and LID is 0, this is a cyclic counter labeling base layer frames. | ||||
| When TID is not 0 or LID is not 0, this indicates a dependency on | ||||
| the given index, such that this frame within this layer depends on | ||||
| the frame with this label in the layer with TID 0 and LID 0. If | ||||
| no scalability is used, or the cyclic counter is unknown, this | ||||
| MUST be omitted to reduce length. Note that 0 is a valid index | ||||
| value for TL0PICIDX. | ||||
| The layer information contained in TID and LID convey useful aspects | I: Independent Frame (1 bit) | |||
| of the layer structure that can be utilized in selective forwarding. | MUST be 1 for a frame within a layer that can be decoded | |||
| independently of temporally prior frames, e.g., intra-frame, VPX | ||||
| keyframe, H.264 Instantaneous Decoding Refresh (IDR) [RFC6184], or | ||||
| H.265 IDR / Clean Random Access (CRA) / Broken Link Access (BLA) / | ||||
| Random Access Point (RAP) [RFC7798]; otherwise, MUST be 0. Note | ||||
| that this bit only signals temporal independence, so it can be 1 | ||||
| in spatial or quality enhancement layers that depend on temporally | ||||
| co-located layers but not temporally prior frames. | ||||
| D: Discardable Frame (1 bit) | ||||
| MUST be 1 for a frame within a layer the sender knows can be | ||||
| discarded and still provide a decodable media stream; otherwise, | ||||
| MUST be 0. | ||||
| B: Base Layer Sync (1 bit) | ||||
| When the TID is not 0, this MUST be 1 if the sender knows this | ||||
| frame within a layer only depends on the base temporal layer; | ||||
| otherwise, MUST be 0. When the TID is 0 or if no scalability is | ||||
| used, this MUST be 0. | ||||
| TID: Temporal ID (3 bits) | ||||
| Identifies the temporal layer/sub-layer encoded, starting with 0 | ||||
| for the base layer and increasing with higher temporal fidelity. | ||||
| If no scalability is used, this MUST be 0. It is implicitly 0 in | ||||
| the short extension format. | ||||
| LID: Layer ID (8 bits) | ||||
| Identifies the spatial and quality layer encoded, starting with 0 | ||||
| for the base layer and increasing with higher fidelity. If no | ||||
| scalability is used, this MUST be 0 or omitted to reduce length. | ||||
| When the LID is omitted, TL0PICIDX MUST also be omitted. It is | ||||
| implicitly 0 in the short extension format or when omitted in the | ||||
| long extension format. | ||||
| TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) | ||||
| When the TID is 0 and the LID is 0, this is a cyclic counter | ||||
| labeling base layer frames. When the TID is not 0 or the LID is | ||||
| not 0, this indicates a dependency on the given index, such that | ||||
| this frame within this layer depends on the frame with this label | ||||
| in the layer with TID 0 and LID 0. If no scalability | ||||
| is used, or the cyclic counter is unknown, TL0PICIDX MUST be | ||||
| omitted to reduce length. Note that 0 is a valid index value for | ||||
| TL0PICIDX. | ||||
| The layer information contained in the TID and LID conveys useful | ||||
| aspects of the layer structure that can be utilized in selective | ||||
| forwarding. | ||||
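
An encoder can capture the sender-side rules above and pick the shortest permitted form. The rules are:

* B is set only with a nonzero TID.
* The LID is omitted only together with TL0PICIDX.
* TL0PICIDX is omitted when the counter is unused or unknown.

This is a sketch. The function and parameter names are hypothetical, and passing `tl0picidx=None` stands for an unused or unknown counter.

```python
from typing import Optional


def encode_frame_marking(s: bool, e: bool, i: bool, d: bool, b: bool,
                         tid: int = 0, lid: int = 0,
                         tl0picidx: Optional[int] = None) -> bytes:
    """Build long-form element data, omitting trailing fields when allowed."""
    if not 0 <= tid <= 7 or not 0 <= lid <= 255:
        raise ValueError("TID is 3 bits and LID is 8 bits")
    if tl0picidx is not None and not 0 <= tl0picidx <= 255:
        raise ValueError("TL0PICIDX is 8 bits")
    if tid == 0 and b:
        raise ValueError("B MUST be 0 when TID is 0")
    first = (int(s) << 7) | (int(e) << 6) | (int(i) << 5) | (int(d) << 4) | (int(b) << 3) | tid
    if tl0picidx is not None:
        return bytes([first, lid, tl0picidx])  # L=2: nothing omitted
    if lid != 0:
        return bytes([first, lid])             # L=1: TL0PICIDX omitted
    return bytes([first])                      # L=0: LID and TL0PICIDX omitted
```

An LID of 0 is dropped only when TL0PICIDX is also absent, because an omitted LID is implicitly 0 on reception.

| rfc9626.original | rfc9626.txt | |||
|---|---|---|---|---|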
| Without further information about the layer structure, these TID/LID | Without further information about the layer structure, these TID/LID | |||
| identifiers can only be used for relative priority of layers and | identifiers can only be used for relative priority of layers and | |||
| implicit dependencies between layers. They convey a layer hierarchy | implicit dependencies between layers. They convey a layer hierarchy | |||
| with TID=0 and LID=0 identifying the base layer. Higher values of | with TID = 0 and LID = 0 identifying the base layer. Higher values | |||
| TID identify higher temporal layers with higher frame rates. Higher | of TID identify higher temporal layers with higher frame rates. | |||
| values of LID identify higher spatial and/or quality layers with | Higher values of LID identify higher spatial and/or quality layers | |||
| higher resolutions and/or bitrates. Implicit dependencies between | with higher resolutions and/or bitrates. Implicit dependencies | |||
| layers assume that a layer with a given TID/LID MAY depend on | between layers assume that a layer with a given TID/LID MAY depend on | |||
| layer(s) with the same or lower TID/LID, but MUST NOT depend on | a layer or layers with the same or lower TID/LID, but they MUST NOT | |||
| layer(s) with higher TID/LID. | depend on a layer or layers with higher TID/LID. | |||
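
Under these implicit dependency rules, a switch can serve a constrained receiver by forwarding only layers at or below a TID limit and an LID limit. The following is a sketch. It assumes packets are hypothetical `(seq, tid, lid)` tuples and reads "same or lower TID/LID" as applying to both identifiers at once.

```python
def forward_layers(packets: list[tuple[int, int, int]],
                   max_tid: int, max_lid: int) -> list[int]:
    """Return sequence numbers of packets whose layer is within both limits.

    No forwarded layer may depend on a layer with a higher TID or LID, so
    dropping everything above the limits should not strand a forwarded frame.
    """
    return [seq for seq, tid, lid in packets if tid <= max_tid and lid <= max_lid]
```

How the switch chooses the limits themselves, for example from receiver bandwidth or decoder capability, is left open here, as it is in the text.

| rfc9626.original | rfc9626.txt | |||
|---|---|---|---|---|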
| With further information, for example, possible future RTCP SDES | With further information, for example, possible future RTCP source | |||
| items that convey full layer structure information, it may be | description (SDES) items that convey full layer structure | |||
| possible to map these TIDs and LIDs to specific absolute frame rates, | information, it may be possible to map these TIDs and LIDs to | |||
| resolutions and bitrates, as well as explicit dependencies between | specific absolute frame rates, resolutions, bitrates, and explicit | |||
| layers. Such additional layer information may be useful for | dependencies between layers. Such additional layer information may | |||
| forwarding decisions in the RTP switch, but is beyond the scope of | be useful for forwarding decisions in the RTP switch but is beyond | |||
| this memo. The relative layer information is still useful for many | the scope of this memo. The relative layer information is still | |||
| selective forwarding decisions even without such additional layer | useful for many selective forwarding decisions, even without such | |||
| information. | additional layer information. | |||
| 3.2. Short Extension for Non-Scalable Streams | 3.2. Short Extension for Non-scalable Streams | |||
| The following RTP header extension is RECOMMENDED for non-scalable | The following RTP header extension is RECOMMENDED for non-scalable | |||
| streams. It is identical to the shortest form of the extension for | streams. It is identical to the shortest form of the extension for | |||
| scalable streams, except the last four bits (B and TID) are replaced | scalable streams, except the last four bits (B and TID) are replaced | |||
| with zeros. It MAY also be used for scalable streams if the sender | with zeros. It MAY also be used for scalable streams if the sender | |||
| has limited or no information about stream scalability. The ID is | has limited or no information about stream scalability. The ID is | |||
| assigned per [RFC8285], and the length is encoded as L=0 which | assigned per [RFC8285]; the length is encoded as L=0, which indicates | |||
| indicates 1 octet of data. | 1 octet of data. | |||
| 0 1 | 0 1 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | ID=? | L=0 |S|E|I|D|0 0 0 0| | | ID=? | L=0 |S|E|I|D|0 0 0 0| | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
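
Since the short form is a single octet, encoding and decoding reduce to bit masks. The following is a sketch with hypothetical names. The encoder always sends the four reserved bits as 0. The decoder masks them off rather than rejecting nonzero values, which matches the transmit and receive rules for those bits.

```python
from typing import NamedTuple


class ShortMarking(NamedTuple):
    start: bool        # S
    end: bool          # E
    independent: bool  # I
    discardable: bool  # D


def encode_short(m: ShortMarking) -> bytes:
    """One octet: S, E, I, D in the high nibble; reserved low nibble set to 0."""
    return bytes([(int(m.start) << 7) | (int(m.end) << 6)
                  | (int(m.independent) << 5) | (int(m.discardable) << 4)])


def decode_short(data: bytes) -> ShortMarking:
    """Read S, E, I, D and ignore the reserved low four bits."""
    if len(data) != 1:
        raise ValueError("short Frame Marking data is exactly 1 octet (L=0)")
    b = data[0]
    return ShortMarking(bool(b & 0x80), bool(b & 0x40), bool(b & 0x20), bool(b & 0x10))
```

| rfc9626.original | rfc9626.txt | |||
|---|---|---|---|---|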
| The following information are extracted from the media payload and | The following information is extracted from the media payload and | |||
| sent in the Frame Marking RTP header extension. | sent in the Frame Marking RTP header extension. | |||
| * S: Start of Frame (1 bit) - MUST be 1 in the first packet in a | S: Start of Frame (1 bit) | |||
| frame; otherwise MUST be 0. | MUST be 1 in the first packet in a frame; otherwise, MUST be 0. | |||
| * E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame; | ||||
| otherwise MUST be 0. SHOULD match the RTP header marker bit in | ||||
| payload formats with such semantics for marking end of frame. | ||||
| * I: Independent Frame (1 bit) - MUST be 1 for frames that can be | ||||
| decoded independent of temporally prior frames, e.g. intra-frame, | ||||
| VPX keyframe, H.264 IDR [RFC6184], H.265 IDR/CRA/BLA/IRAP | ||||
| [RFC7798]; otherwise MUST be 0. | ||||
| * D: Discardable Frame (1 bit) - MUST be 1 for frames the sender | ||||
| knows can be discarded, and still provide a decodable media | ||||
| stream; otherwise MUST be 0. | ||||
| * The remaining (4 bits) - are reserved/fixed values and not used | ||||
| for non-scalable streams; they MUST be set to 0 upon transmission | ||||
| and ignored upon reception. | ||||
| 3.3. Layer ID Mappings for Scalable Streams | E: End of Frame (1 bit) | |||
| MUST be 1 in the last packet in a frame; otherwise, MUST be 0. | ||||
| SHOULD match the RTP header marker bit in payload formats with | ||||
| such semantics for marking end of frame. | ||||
| This section maps the specific Layer ID information contained in | I: Independent Frame (1 bit) | |||
| specific scalable codecs to the generic LID and TID fields. | MUST be 1 for frames that can be decoded independent of temporally | |||
| prior frames, e.g., intra-frame, VPX keyframe, H.264 IDR | ||||
| [RFC6184], or H.265 IDR/CRA/BLA/IRAP [RFC7798]; otherwise, MUST be | ||||
| 0. | ||||
| Note that non-scalable streams have no Layer ID information and thus | D: Discardable Frame (1 bit) | |||
| no mappings. | MUST be 1 for frames the sender knows can be discarded and still | |||
| provide a decodable media stream; otherwise, MUST be 0. | ||||
| The remaining (4 bits) | ||||
| These are reserved/fixed values and not used for non-scalable | ||||
| streams; they MUST be set to 0 upon transmission and ignored upon | ||||
| reception. | ||||
| 3.3. LID Mappings for Scalable Streams | ||||
| This section maps the specific Layer ID (LID) information contained | ||||
| in specific scalable codecs to the generic LID and TID fields. | ||||
| Note that non-scalable streams have no LID information; thus, they | ||||
| have no mappings. | ||||
| 3.3.1. VP9 LID Mapping | 3.3.1. VP9 LID Mapping | |||
| The VP9 [I-D.ietf-payload-vp9] Spatial Layer ID (SID, 3 bits) and | The VP9 [RFC9628] Spatial Layer ID (SID, 3 bits) and Temporal Layer | |||
| Temporal Layer ID (TID, 3 bits) in the VP9 payload descriptor are | ID (TID, 3 bits) in the VP9 payload descriptor are mapped to the | |||
| mapped to the generic LID and TID fields in the header extension as | generic LID and TID fields in the header extension as shown in the | |||
| shown in the following figure. | following figure. | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0| SID | TL0PICIDX | | | ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0| SID | TL0PICIDX | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| The S bit MUST match the B bit in the VP9 payload descriptor. | The S bit MUST match the B bit in the VP9 payload descriptor. | |||
| The E bit MUST match the E bit in the VP9 payload descriptor. | The E bit MUST match the E bit in the VP9 payload descriptor. | |||
| The I bit MUST match the inverse of the P bit in the VP9 payload | The I bit MUST match the inverse of the P bit in the VP9 payload | |||
| descriptor. | descriptor. | |||
| The D bit MUST be 1 if the refresh_frame_flags in the VP9 payload | The D bit MUST be 1 if the refresh_frame_flags in the VP9 payload | |||
| uncompressed header are all 0, otherwise it MUST be 0. | uncompressed header are all 0; otherwise, it MUST be 0. | |||
| The B bit MUST be 0 if TID is 0; otherwise, if TID is not 0, it MUST | The B bit MUST be 0 if the TID is 0; if the TID is not 0, it MUST | |||
| match the U bit in the VP9 payload descriptor. Note: When using | match the U bit in the VP9 payload descriptor. Note: when using | |||
| temporally nested scalability structures as recommended in | temporally nested scalability structures as recommended in | |||
| Section 3.5.2, the B bit and VP9 U bit will always be 1 if TID is not | Section 3.5.2, the B bit and VP9 U bit will always be 1 if the TID is | |||
| 0, since it is always possible to switch up to a higher temporal | not 0 since it is always possible to switch up to a higher temporal | |||
| layer in such nested structures. | layer in such nested structures. | |||
| TID, SID and TL0PICIDX MUST match the correspondingly named fields in | The TID, SID, and TL0PICIDX MUST match the correspondingly named | |||
| the VP9 payload descriptor, with SID aligned in the least significant | fields in the VP9 payload descriptor, with SID aligned in the least | |||
| 3 bits of the 8-bit LID field and zeros in the most significant 5 | significant 3 bits of the 8-bit LID field and zeros in the most | |||
| bits. | significant 5 bits. | |||
| 3.3.2. H265 LID Mapping | 3.3.2. H265 LID Mapping | |||
| The H265 [RFC7798] LayerID (6 bits) and TID (3 bits) from the NAL | The H265 [RFC7798] LayerID (6 bits), and TID (3 bits) from the | |||
| unit header are mapped to the generic LID and TID fields in the | Network Abstraction Layer (NAL) unit header are mapped to the generic | |||
| header extension as shown in the following figure. | LID and TID fields in the header extension as shown in the following | |||
| figure. | ||||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | ID=? | L=2 |S|E|I|D|B| TID |0|0| LayerID | TL0PICIDX | | | ID=? | L=2 |S|E|I|D|B| TID |0|0| LayerID | TL0PICIDX | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| The S and E bits MUST match the correspondingly named bits in | The S and E bits MUST match the correspondingly named bits in | |||
| PACI:PHES:TSCI payload structures. | PACI:PHES:TSCI payload structures. | |||
| The I bit MUST be 1 when the NAL unit type is 16-23 (inclusive) or | The I bit MUST be 1 when the NAL unit type is 16-23 (inclusive) or | |||
| 32-34 (inclusive), or an aggregation packet or fragmentation unit | 32-34 (inclusive), or an aggregation packet or fragmentation unit | |||
| encapsulating any of these types, otherwise it MUST be 0. These | encapsulating any of these types; otherwise, it MUST be 0. These | |||
| ranges cover intra (IRAP) frames as well as critical parameter sets | ranges cover intra (IRAP) frames as well as critical parameter sets | |||
| (VPS, SPS, PPS). | (Video Parameter Set (VPS), Sequence Parameter Set (SPS), Picture | |||
| Parameter Set (PPS)). | ||||
| The D bit MUST be 1 when the NAL unit type is 0, 2, 4, 6, 8, 10, 12, | The D bit MUST be 1 when the NAL unit type is 0, 2, 4, 6, 8, 10, 12, | |||
| 14, or 38, or an aggregation packet or fragmentation unit | 14, 38, or an aggregation packet or fragmentation unit encapsulating | |||
| encapsulating only these types, otherwise it MUST be 0. These ranges | only these types; otherwise, it MUST be 0. These ranges cover non- | |||
| cover non-reference frames as well as filler data. | reference frames as well as filler data. | |||
| The B bit can not be determined reliably from simple inspection of | The B bit cannot be determined reliably from simple inspection of | |||
| payload headers, and therefore is determined by implementation- | payload headers; therefore, it is determined by implementation- | |||
| specific means. For example, internal codec interfaces may provide | specific means. For example, internal codec interfaces may provide | |||
| information to set this reliably. | information to set this reliably. | |||
| TID and LayerID MUST match the correspondingly named fields in the | The TID and LayerID MUST match the correspondingly named fields in | |||
| H265 NAL unit header, with LayerID aligned in the least significant 6 | the H265 NAL unit header, with LayerID aligned in the least | |||
| bits of the 8-bit LID field and zeros in the most significant 2 bits. | significant 6 bits of the 8-bit LID field and zeros in the most | |||
| significant 2 bits. | ||||
| 3.3.3. H264-SVC LID Mapping | 3.3.3. H264 Scalable Video Coding (SVC) LID Mapping | |||
| The following shows H264-SVC [RFC6190] Layer encoding information (3 | The following shows H264-SVC [RFC6190] Layer encoding information (3 | |||
| bits for spatial/dependency layer, 4 bits for quality layer and 3 | bits for spatial/dependency layer, 4 bits for quality layer, and 3 | |||
| bits for temporal layer) mapped to the generic LID and TID fields. | bits for temporal layer) mapped to the generic LID and TID fields. | |||
| The S, E, I and D bits MUST match the correspondingly named bits in | The S, E, I, and D bits MUST match the correspondingly named bits in | |||
| PACSI payload structures. | Payload Content Scalability Information (PACSI) payload structures. | |||
| The I bit MUST be 1 when the NAL unit type is 5, 7, 8, 13, or 15, or | The I bit MUST be 1 when the NAL unit type is 5, 7, 8, 13, 15, or an | |||
| an aggregation packet or fragmentation unit encapsulating any of | aggregation packet or fragmentation unit encapsulating any of these | |||
| these types, otherwise it MUST be 0. These ranges cover intra (IDR) | types; otherwise, it MUST be 0. These ranges cover intra (IDR) | |||
| frames as well as critical parameter sets (SPS/PPS variants). | frames as well as critical parameter sets (SPS/PPS variants). | |||
| The D bit MUST be 1 when the NAL unit header NRI field is 0, or an | The D bit MUST be 1 when the NAL unit header Network Remote | |||
| aggregation packet or fragmentation unit encapsulating only NAL units | Identification (NRI) field is 0, or an aggregation packet or | |||
| with NRI=0, otherwise it MUST be 0. The NRI=0 condition signals non- | fragmentation unit encapsulating only NAL units with NRI=0; | |||
| reference frames. | otherwise, it MUST be 0. The NRI=0 condition signals non-reference | |||
| frames. | ||||
| The B bit can not be determined reliably from simple inspection of | The B bit cannot be determined reliably from simple inspection of | |||
| payload headers, and therefore is determined by implementation- | payload headers; therefore, it is determined by implementation- | |||
| specific means. For example, internal codec interfaces may provide | specific means. For example, internal codec interfaces may provide | |||
| information to set this reliably. | information to set this reliably. | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | ID=? | L=2 |S|E|I|D|B| TID |0| DID | QID | TL0PICIDX | | | ID=? | L=2 |S|E|I|D|B| TID |0| DID | QID | TL0PICIDX | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| 3.3.4. H264 (AVC) LID Mapping | 3.3.4. H264 Advanced Video Coding (AVC) LID Mapping | |||
| The following shows the header extension for H264 (AVC) [RFC6184] | The following shows the header extension for H264 (AVC) [RFC6184] | |||
| that contains only temporal layer information. | that contains only temporal layer information. | |||
| The S bit MUST be 1 when the timestamp in the RTP header differs from | The S bit MUST be 1 when the timestamp in the RTP header differs from | |||
| the timestamp in the prior RTP sequence number from the same SSRC, | the timestamp in the prior RTP sequence number from the same SSRC; | |||
| otherwise it MUST be 0. | otherwise, it MUST be 0. | |||
| The E bit MUST match the M bit in the RTP header. | The E bit MUST match the M bit in the RTP header. | |||
| The I bit MUST be 1 when the NAL unit type is 5, 7, or 8, or an | The I bit MUST be 1 when the NAL unit type is 5, 7, or 8, or an | |||
| aggregation packet or fragmentation unit encapsulating any of these | aggregation packet or fragmentation unit encapsulating any of these | |||
| types, otherwise it MUST be 0. These ranges cover intra (IDR) frames | types; otherwise, it MUST be 0. These ranges cover intra (IDR) | |||
| as well as critical parameter sets (SPS/PPS). | frames as well as critical parameter sets (SPS/PPS). | |||
| The D bit MUST be 1 when the NAL unit header NRI field is 0, or an | The D bit MUST be 1 when the NAL unit header NRI field is 0, or an | |||
| aggregation packet or fragmentation unit encapsulating only NAL units | aggregation packet or fragmentation unit encapsulating only NAL units | |||
| with NRI=0, otherwise it MUST be 0. The NRI=0 condition signals non- | with NRI=0; otherwise, it MUST be 0. The NRI=0 condition signals | |||
| reference frames. | non-reference frames. | |||
| The B bit can not be determined reliably from simple inspection of | The B bit cannot be determined reliably from simple inspection of | |||
| payload headers, and therefore is determined by implementation- | payload headers; therefore, it is determined by implementation- | |||
| specific means. For example, internal codec interfaces may provide | specific means. For example, internal codec interfaces may provide | |||
| information to set this reliably. | information to set this reliably. | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0|0|0|0| TL0PICIDX | | | ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0|0|0|0| TL0PICIDX | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| 3.3.5. VP8 LID Mapping | 3.3.5. VP8 LID Mapping | |||
| The following shows the header extension for VP8 [RFC7741] that | The following shows the header extension for VP8 [RFC7741] that | |||
| contains only temporal layer information. | contains only temporal layer information. | |||
| The S bit MUST match the correspondingly named bit in the VP8 payload | The S bit MUST match the correspondingly named bit in the VP8 payload | |||
| descriptor when PID=0, otherwise it MUST be 0. | descriptor when PID=0; otherwise, it MUST be 0. | |||
| The E bit MUST match the M bit in the RTP header. | The E bit MUST match the M bit in the RTP header. | |||
| The I bit MUST match the inverse of the P bit in the VP8 payload | The I bit MUST match the inverse of the P bit in the VP8 payload | |||
| header. | header. | |||
| The D bit MUST match the N bit in the VP8 payload descriptor. | The D bit MUST match the N bit in the VP8 payload descriptor. | |||
| The B bit MUST match the Y bit in the VP8 payload descriptor. Note: | The B bit MUST match the Y bit in the VP8 payload descriptor. Note: | |||
| When using temporally nested scalability structures as recommended in | when using temporally nested scalability structures as recommended in | |||
| Section 3.5.2, the B bit and VP8 Y bit will always be 1 if TID is not | Section 3.5.2, the B bit and VP8 Y bit will always be 1 if the TID is | |||
| 0, since it is always possible to switch up to a higher temporal | not 0 since it is always possible to switch up to a higher temporal | |||
| layer in such nested structures. | layer in such nested structures. | |||
| TID and TL0PICIDX MUST match the correspondingly named fields in the | The TID and TL0PICIDX MUST match the correspondingly named fields in | |||
| VP8 payload descriptor. | the VP8 payload descriptor. | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0|0|0|0| TL0PICIDX | | | ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0|0|0|0| TL0PICIDX | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| 3.3.6. Future Codec LID Mapping | 3.3.6. Future Codec LID Mapping | |||
| The RTP payload format specification for future video codecs SHOULD | The RTP payload format specification for future video codecs SHOULD | |||
| skipping to change at page 11, line 51 | skipping to change at line 544 | |||
| 3.5. Usage Considerations | 3.5. Usage Considerations | |||
| The header extension values MUST represent what is already in the RTP | The header extension values MUST represent what is already in the RTP | |||
| payload. | payload. | |||
| When an RTP switch needs to discard a received video frame due to | When an RTP switch needs to discard a received video frame due to | |||
| congestion control considerations, it is RECOMMENDED that it | congestion control considerations, it is RECOMMENDED that it | |||
| preferably drop frames marked with the D (Discardable) bit set, or | preferably drop frames marked with the D (Discardable) bit set, or | |||
| the highest values of TID and LID, which indicate the highest | the highest values of TID and LID, which indicate the highest | |||
| temporal and spatial/quality enhancement layers, since those | temporal and spatial/quality enhancement layers, since those | |||
| typically have fewer dependenices on them than lower layers. | typically have fewer dependencies on them than lower layers. | |||
| When an RTP switch wants to forward a new video stream to a receiver, | When an RTP switch wants to forward a new video stream to a receiver, | |||
| it is RECOMMENDED to select the new video stream from the first | it is RECOMMENDED to select the new video stream from the first | |||
| switching point with the I (Independent) bit set in all spatial | switching point with the I (Independent) bit set in all spatial | |||
| layers and forward the same. An RTP switch can request a media | layers and forward the same. An RTP switch can request that a media | |||
| source to generate a switching point by sending Full Intra Request | source generate a switching point by sending Full Intra Request (RTCP | |||
| (RTCP FIR) as defined in [RFC5104], for example. | FIR) as defined in [RFC5104], for example. | |||
| 3.5.1. Relation to Layer Refresh Request (LRR) | 3.5.1. Relation to Layer Refresh Request (LRR) | |||
| Receivers can use the Layer Refresh Request (LRR) | Receivers can use the Layer Refresh Request (LRR) [RFC9627] RTCP | |||
| [I-D.ietf-avtext-lrr] RTCP feedback message to upgrade to a higher | feedback message to upgrade to a higher layer in scalable encodings. | |||
| layer in scalable encodings. The TID/LID values and formats used in | The TID/LID values and formats used in LRR messages MUST correspond | |||
| LRR messages MUST correspond to the same values and formats specified | to the same values and formats specified in Section 3.1. | |||
| in Section 3.1. | ||||
| Because frame marking can only be used with temporally-nested | Because frame marking can only be used with temporally nested | |||
| streams, temporal-layer LRR refreshes are unnecessary for frame- | streams, temporal-layer LRR refreshes are unnecessary for frame- | |||
| marked streams. Other refreshes can be detected based on the I bit | marked streams. Other refreshes can be detected based on the I bit | |||
| being set for the specific spatial layers. | being set for the specific spatial layers. | |||
| 3.5.2. Scalability Structures | 3.5.2. Scalability Structures | |||
| The LID and TID information is most useful for fixed scalability | The LID and TID information is most useful for fixed scalability | |||
| structures, such as nested hierarchical temporal layering structures, | structures, such as nested hierarchical temporal layering structures, | |||
| where each temporal layer only references lower temporal layers or | where each temporal layer only references lower temporal layers or | |||
| the base temporal layer. The LID and TID information is less useful, | the base temporal layer. The LID and TID information is less useful, | |||
| or even not useful at all, for complex, irregular scalability | or even not useful at all, for complex, irregular scalability | |||
| structures that do not conform to common, fixed patterns of inter- | structures that do not conform to common, fixed patterns of inter- | |||
| layer dependencies and referencing structures. Therefore it is | layer dependencies and referencing structures. Therefore, it is | |||
| RECOMMENDED to use LID and TID information for RTP switch forwarding | RECOMMENDED to use LID and TID information for RTP switch forwarding | |||
| decisions only in the case of temporally nested scalability | decisions only in the case of temporally nested scalability | |||
| structures, and it is NOT RECOMMENDED for other (more complex or | structures, and it is NOT RECOMMENDED for other (more complex or | |||
| irregular) scalability structures. | irregular) scalability structures. | |||
| 4. Security Considerations and Privacy Considerations | 4. Security and Privacy Considerations | |||
| In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP | In "The Secure Real-time Transport Protocol (SRTP)" [RFC3711], RTP | |||
| header extensions are authenticated and optionally encrypted | header extensions are authenticated and optionally encrypted | |||
| [RFC9335]. When unencrypted header extensions are used, some | [RFC9335]. When unencrypted header extensions are used, some | |||
| metadata is exposed and visible to middle boxes on the network path, | metadata is exposed and visible to middleboxes on the network path, | |||
| while encrypted media data and metadata in encrypted header | while encrypted media data and metadata in encrypted header | |||
| extensions are not exposed. | extensions are not exposed. | |||
| The primary utility of this specification is for RTP switches to make | The primary utility of this specification is for RTP switches to make | |||
| proper media forwarding decisions. RTP switches are the SRTP peers | proper media forwarding decisions. RTP switches are the SRTP peers | |||
| of endpoints, so they can access encrypted header extensions, but not | of endpoints, so they can access encrypted header extensions, but not | |||
| end-to-end encrypted private media payloads. Other middle boxes on | end-to-end encrypted private media payloads. Other middleboxes on | |||
| the network path can only access unencrypted header extensions, since | the network path can only access unencrypted header extensions since | |||
| they are not SRTP peers. | they are not SRTP peers. | |||
| RTP endpoints which negotiate this extension should consider whether | RTP endpoints that negotiate this extension should consider whether: | |||
| this video frame marking metadata needs to be exposed to the SRTP | ||||
| peer only, in which case the header extension can be encrypted; or | ||||
| whether other middle boxes on the network path also need this | ||||
| metadata, for example, to optimize packet drop decisions that | ||||
| minimize media quality impacts, in which case the header extension | ||||
| can be unencrypted, if the endpoint accepts the potential privacy | ||||
| leakage of this metadata. For example, it would be possible to | ||||
| determine keyframes and their frequency in unencrypted header | ||||
| extensions. This information can often be obtained via statistical | ||||
| analysis of encrypted data. For example, keyframes are usually much | ||||
| larger than other frames, so frame size alone can leak this in the | ||||
| absence of any unencrypted metadata. However, unencrypted metadata | ||||
| provides a reliable signal rather than a statistical probability; so | ||||
| endpoints should take that into consideration to balance the privacy | ||||
| leakage risk against the potential benefit of optimized media | ||||
| delivery when deciding whether to negotiate and encrypt this header | ||||
| extension. | ||||
| 5. Acknowledgements | * this video frame marking metadata needs to be exposed to the SRTP | |||
| peer only, in which case the header extension can be encrypted; or | ||||
| Many thanks to Bernard Aboba, Jonathan Lennox, Stephan Wenger, Dale | * other middleboxes on the network path also need this metadata, for | |||
| Worley, and Magnus Westerlund for their inputs. | example, to optimize packet drop decisions that minimize media | |||
| quality impacts, in which case the header extension can be | ||||
| unencrypted, if the endpoint accepts the potential privacy leakage | ||||
| of this metadata. | ||||
| 6. IANA Considerations | For example, it would be possible to determine keyframes and their | |||
| frequency in unencrypted header extensions. This information can | ||||
| often be obtained via statistical analysis of encrypted data. For | ||||
| example, keyframes are usually much larger than other frames, so | ||||
| frame size alone can leak this in the absence of any unencrypted | ||||
| metadata. However, unencrypted metadata provides a reliable signal | ||||
| rather than a statistical probability; so endpoints should take that | ||||
| into consideration to balance the privacy leakage risk against the | ||||
| potential benefit of optimized media delivery when deciding whether | ||||
| to negotiate and encrypt this header extension. | ||||
| This document defines a new extension URI to the RTP Compact | 5. IANA Considerations | |||
| HeaderExtensions sub-registry of the Real-Time Transport Protocol | ||||
| (RTP) Parameters registry, according to the following data: | This document defines a new extension URI listed in the "RTP Compact | |||
| Header Extensions" subregistry of the "Real-Time Transport Protocol | ||||
| (RTP) Parameters" registry, according to the following data: | ||||
| Extension URI: urn:ietf:params:rtp-hdrext:framemarkinginfo | Extension URI: urn:ietf:params:rtp-hdrext:framemarkinginfo | |||
| Description: Frame marking information for video streams | Description: Frame marking information for video streams | |||
| Contact: mzanaty@cisco.com | Contact: mzanaty@cisco.com | |||
| Reference: RFC XXXX | Reference: RFC 9626 | |||
| Note to RFC Editor: please replace RFC XXXX with the number of this | ||||
| RFC. | ||||
| 7. References | 6. References | |||
| 7.1. Normative References | 6.1. Normative References | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| skipping to change at page 14, line 43 | skipping to change at line 669 | |||
| [RFC7741] Westin, P., Lundin, H., Glover, M., Uberti, J., and F. | [RFC7741] Westin, P., Lundin, H., Glover, M., Uberti, J., and F. | |||
| Galligan, "RTP Payload Format for VP8 Video", RFC 7741, | Galligan, "RTP Payload Format for VP8 Video", RFC 7741, | |||
| DOI 10.17487/RFC7741, March 2016, | DOI 10.17487/RFC7741, March 2016, | |||
| <https://www.rfc-editor.org/info/rfc7741>. | <https://www.rfc-editor.org/info/rfc7741>. | |||
| [RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M. | [RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M. | |||
| M. Hannuksela, "RTP Payload Format for High Efficiency | M. Hannuksela, "RTP Payload Format for High Efficiency | |||
| Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, | Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, | |||
| March 2016, <https://www.rfc-editor.org/info/rfc7798>. | March 2016, <https://www.rfc-editor.org/info/rfc7798>. | |||
| 7.2. Informative References | 6.2. Informative References | |||
| [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and | [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and | |||
| B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms | B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms | |||
| for Real-Time Transport Protocol (RTP) Sources", RFC 7656, | for Real-Time Transport Protocol (RTP) Sources", RFC 7656, | |||
| DOI 10.17487/RFC7656, November 2015, | DOI 10.17487/RFC7656, November 2015, | |||
| <https://www.rfc-editor.org/info/rfc7656>. | <https://www.rfc-editor.org/info/rfc7656>. | |||
| [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, | [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, | |||
| DOI 10.17487/RFC7667, November 2015, | DOI 10.17487/RFC7667, November 2015, | |||
| <https://www.rfc-editor.org/info/rfc7667>. | <https://www.rfc-editor.org/info/rfc7667>. | |||
| skipping to change at page 15, line 40 | skipping to change at line 712 | |||
| [RFC8871] Jones, P., Benham, D., and C. Groves, "A Solution | [RFC8871] Jones, P., Benham, D., and C. Groves, "A Solution | |||
| Framework for Private Media in Privacy-Enhanced RTP | Framework for Private Media in Privacy-Enhanced RTP | |||
| Conferencing (PERC)", RFC 8871, DOI 10.17487/RFC8871, | Conferencing (PERC)", RFC 8871, DOI 10.17487/RFC8871, | |||
| January 2021, <https://www.rfc-editor.org/info/rfc8871>. | January 2021, <https://www.rfc-editor.org/info/rfc8871>. | |||
| [RFC9335] Uberti, J., Jennings, C., and S. Murillo, "Completely | [RFC9335] Uberti, J., Jennings, C., and S. Murillo, "Completely | |||
| Encrypting RTP Header Extensions and Contributing | Encrypting RTP Header Extensions and Contributing | |||
| Sources", RFC 9335, DOI 10.17487/RFC9335, January 2023, | Sources", RFC 9335, DOI 10.17487/RFC9335, January 2023, | |||
| <https://www.rfc-editor.org/info/rfc9335>. | <https://www.rfc-editor.org/info/rfc9335>. | |||
| [I-D.ietf-avtext-lrr] | [RFC9627] Lennox, J., Hong, D., Uberti, J., Holmer, S., and M. | |||
| Lennox, J., Hong, D., Uberti, J., Holmer, S., and M. | ||||
| Flodman, "The Layer Refresh Request (LRR) RTCP Feedback | Flodman, "The Layer Refresh Request (LRR) RTCP Feedback | |||
| Message", Work in Progress, Internet-Draft, draft-ietf- | Message", RFC 9627, DOI 10.17487/RFC9627, August 2024, | |||
| avtext-lrr-07, 2 July 2017, | <https://www.rfc-editor.org/info/rfc9627>. | |||
| <https://datatracker.ietf.org/doc/html/draft-ietf-avtext- | ||||
| lrr-07>. | ||||
| [I-D.ietf-payload-vp9] | [RFC9628] Uberti, J., Holmer, S., Flodman, M., Hong, D., and J. | |||
| Uberti, J., Holmer, S., Flodman, M., Hong, D., and J. | Lennox, "RTP Payload Format for VP9 Video", RFC 9628, | |||
| Lennox, "RTP Payload Format for VP9 Video", Work in | DOI 10.17487/RFC9628, August 2024, | |||
| Progress, Internet-Draft, draft-ietf-payload-vp9-16, 10 | <https://www.rfc-editor.org/info/rfc9628>. | |||
| June 2021, <https://datatracker.ietf.org/doc/html/draft- | ||||
| ietf-payload-vp9-16>. | Acknowledgements | |||
| Many thanks to Bernard Aboba, Jonathan Lennox, Stephan Wenger, Dale | ||||
| Worley, and Magnus Westerlund for their inputs. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Mo Zanaty | Mo Zanaty | |||
| Cisco Systems | Cisco Systems | |||
| 170 West Tasman Drive | 170 West Tasman Drive | |||
| San Jose, CA 95134 | San Jose, CA 95134 | |||
| United States of America | United States of America | |||
| Email: mzanaty@cisco.com | Email: mzanaty@cisco.com | |||
| End of changes. 84 change blocks. | ||||
| 281 lines changed or deleted | 310 lines changed or added | |||
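
The following minimal Python sketch is not part of RFC 9626. It shows how a receiver or RTP switch might decode and re-encode Frame Marking extension data in the two layouts drawn in the figures of this section:

* the 1-octet non-scalable form (L=0);
* the 3-octet scalable form (L=2), which carries S, E, I, D, B, TID, LID, and TL0PICIDX.

The sketch assumes that a separate RFC 8285 parser has already removed the one-byte ID/L header, and it rejects any other data length. The names `FrameMarking`, `parse_frame_marking`, and `build_frame_marking` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FrameMarking:
    """Decoded Frame Marking data; field names follow the bit diagrams."""
    start: bool                       # S: first packet of a frame
    end: bool                         # E: last packet of a frame
    independent: bool                 # I: decodable without temporally prior frames
    discardable: bool                 # D: can be dropped, stream stays decodable
    base_sync: Optional[bool] = None  # B: scalable (3-octet) form only
    tid: Optional[int] = None         # TID: temporal layer ID, 3 bits
    lid: Optional[int] = None         # LID: 8 bits, packing is codec-specific
    tl0picidx: Optional[int] = None   # TL0PICIDX: 8 bits


def parse_frame_marking(data: bytes) -> FrameMarking:
    """Decode extension data; the RFC 8285 ID/L byte must already be stripped."""
    if len(data) not in (1, 3):
        raise ValueError(f"unsupported Frame Marking length: {len(data)} octets")
    b0 = data[0]
    flags = dict(
        start=bool(b0 & 0x80),
        end=bool(b0 & 0x40),
        independent=bool(b0 & 0x20),
        discardable=bool(b0 & 0x10),
    )
    if len(data) == 1:
        # Non-scalable form: the low 4 bits are reserved and ignored on receipt.
        return FrameMarking(**flags)
    return FrameMarking(
        **flags,
        base_sync=bool(b0 & 0x08),
        tid=b0 & 0x07,
        lid=data[1],
        tl0picidx=data[2],
    )


def build_frame_marking(fm: FrameMarking) -> bytes:
    """Encode extension data; uses the 3-octet form whenever TID is set."""
    b0 = (fm.start << 7) | (fm.end << 6) | (fm.independent << 5) | (fm.discardable << 4)
    if fm.tid is None:
        return bytes([b0])  # reserved low 4 bits are transmitted as 0
    if fm.lid is None or fm.tl0picidx is None:
        raise ValueError("scalable form needs LID and TL0PICIDX")
    if not 0 <= fm.tid <= 7:
        raise ValueError("TID must fit in 3 bits")
    b0 |= (bool(fm.base_sync) << 3) | fm.tid
    # bytes() itself rejects LID or TL0PICIDX values outside 0..255.
    return bytes([b0, fm.lid, fm.tl0picidx])
```

How to read the LID octet depends on the codec. For a VP9 stream, SID sits in the low 3 bits, so `parse_frame_marking(data).lid & 0x07` recovers it.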
This html diff was produced by rfcdiff 1.48. | ||||
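
The forwarding guidance in Section 3.5 can also be sketched in Python. This is an illustration under stated assumptions, not normative behavior:

* `MarkedFrame`, `pick_frame_to_drop`, and `is_switching_point` are hypothetical names.
* The drop order (any D-marked frame first, then the highest TID, then the highest LID) is one reading of the RECOMMENDED preference.
* LID values 0..N-1 are assumed to identify spatial layers, as with the VP9 SID mapping.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MarkedFrame:
    """Hypothetical per-frame summary built from Frame Marking fields."""
    timestamp: int      # RTP timestamp shared by all layers of one picture
    tid: int            # temporal layer ID
    lid: int            # assumed here to be the spatial layer index
    independent: bool   # I bit
    discardable: bool   # D bit


def pick_frame_to_drop(frames: List[MarkedFrame]) -> Optional[MarkedFrame]:
    """Choose the frame a congested switch would drop first (None if empty).

    Assumed order: D-marked frames before others, then highest TID, then
    highest LID, since higher layers usually have fewer dependents.
    """
    if not frames:
        return None
    return max(frames, key=lambda f: (f.discardable, f.tid, f.lid))


def is_switching_point(picture: List[MarkedFrame], spatial_layers: int) -> bool:
    """True if one picture has the I bit set in every spatial layer 0..N-1."""
    if len({f.timestamp for f in picture}) > 1:
        raise ValueError("frames must belong to one picture (same RTP timestamp)")
    independent: Dict[int, bool] = {}
    for f in picture:
        # If a layer appears more than once, require I on every copy.
        independent[f.lid] = independent.get(f.lid, True) and f.independent
    # A missing spatial layer means this picture is not a full switching point.
    return all(independent.get(lid, False) for lid in range(spatial_layers))
```

Under these assumptions, a switch could hold back a new stream until `is_switching_point` returns True for some picture. As the text notes, the switch can also prompt the source for a switching point by sending an RTCP FIR [RFC5104].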