Packed CBOR: Table permutations

Austria christian@amsuess.com

Packed CBOR is a compression mechanism for Concise Binary Object Representation (CBOR) that can be used without a decompression step. This document introduces a means for altering configured tables to optimize the use of efficient encoding points by shuffling around entries in the packing tables.

About This Document

The latest revision of this draft can be found at . Status information for this document may be found at . Discussion of this document takes place on the CBOR Working Group mailing list (), which is archived at . Subscribe at . Source for this draft and an issue tracker can be found at .

Introduction

[ See abstract. ]

This document uses the "CPA" convention of . If this document reaches the RFC editing stage, this note is to be removed once IANA processes the IANA Considerations, and all occurrences of "CPA" are to be replaced with the allocated numbers.

Setting up the tables by reference

CBOR tag CPA115 describes a permutation of the CBOR packing tables around a rump. Unlike tag CPA113 and most other table setup tags, it does not add items; instead, it modifies the tables such that the positions with the shortest encodings can be used for items that were not originally assigned to those positions. The most efficient positions are the first 16 items of the shared table, the first argument item in straight use, and the first 8 items in inverted use.

Without excluding others, two groups of use cases are envisioned:

Rearranging preconfigured tables. If a media type comes with a preconfigured table, and/or when one or more external tables are referenced (e.g., by using ), a single tag CPA115 can be used around the main rump of the document to pull the most frequently used entries into the prominent positions.
Local optimization. If some table entries are frequent in a subtree of the document, using tag CPA115 around that subtree can improve overall compactness even when the table was set up locally (using tag CPA113) with an ordering suitable for the whole document.

```cddl
([ shuffle-shared, ?shuffle-argument, rump])
rump = any
shuffle-shared = shuffle
shuffle-argument = shuffle
shuffle = [+(offset, ?length)]
offset = uint
length = nint
cpa115 = 115 ; preliminary value, see IANA considerations
```
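To make the structure above concrete, the following sketch checks tag content that a generic CBOR decoder has already turned into Python values against the array shape. The names check_shuffle and parse_cpa115 are hypothetical, decoding the tag itself is out of scope, and a missing shuffle-argument is returned as None to stand for the default identity permutation.

```python
from typing import Any, List, Optional, Tuple


def check_shuffle(shuffle: Any) -> List[int]:
    """Check shuffle = [+(offset, ?length)] with offset = uint, length = nint."""
    if not isinstance(shuffle, list) or not shuffle:
        raise ValueError("shuffle must be a non-empty array")
    previous_was_offset = False
    for entry in shuffle:
        # bool is a subclass of int in Python, but CBOR true/false are not integers
        if not isinstance(entry, int) or isinstance(entry, bool):
            raise ValueError("shuffle entries must be integers")
        if entry < 0 and not previous_was_offset:
            raise ValueError("a length must directly follow an offset")
        previous_was_offset = entry >= 0
    return shuffle


def parse_cpa115(content: Any) -> Tuple[List[int], Optional[List[int]], Any]:
    """Split tag CPA115 content into (shuffle-shared, shuffle-argument, rump)."""
    if not isinstance(content, list) or len(content) not in (2, 3):
        raise ValueError("expected [shuffle-shared, ?shuffle-argument, rump]")
    shared = check_shuffle(content[0])
    # With three elements, the middle one is shuffle-argument; otherwise it is absent.
    argument = check_shuffle(content[1]) if len(content) == 3 else None
    return shared, argument, content[-1]
```

The array length alone decides whether shuffle-argument is present, because the rump is always the last element.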

Permutation semantics

Inside the rump, each table has a permutation applied relative to the outer table: the shuffle-shared value describes it for the shared table, and the shuffle-argument value describes it for the argument table (which defaults to the identity permutation).

The applied permutations are described in inverse, so that a table index used in the rump gets mapped to a table index outside this tag. The shuffle item consecutively lists the lowest items of the inner table by indicating their positions in the outer table. A negative number (length) indicates that a group of (1 - length) items in total is included consecutively, starting with the item given by the offset immediately before the negative number. This creates a kind of run-length encoding. After the items listed in the shuffle array, all items that were not referenced are appended in their original sequence from the outer table. For example, the shuffle [5, -2, 0] makes inner indices 0 to 3 refer to outer items 5, 6, 7, and 0, followed by outer items 1 to 4 and then 8 onward.

An inner index can be calculated in a way suitable for constrained systems by applying the algorithm in .
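The semantics above can be illustrated by materializing the inner table from a Python list that stands in for the outer table. This is a minimal sketch, assuming the whole outer table is available in memory and that a valid shuffle never names the same item twice (an assumption; the text does not state it). The name apply_shuffle is hypothetical, and a constrained implementation would avoid building the list.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def apply_shuffle(outer_table: Sequence[T], shuffle: Sequence[int]) -> List[T]:
    """Return the table as seen inside the rump of tag CPA115 (sketch)."""
    order: List[int] = []  # outer indices, in inner-table order
    last_offset: Optional[int] = None
    for entry in shuffle:
        if entry >= 0:
            order.append(entry)
            last_offset = entry
        else:
            if last_offset is None:
                raise ValueError("a length must directly follow an offset")
            # The group has 1 - entry items in total; its first item is last_offset.
            order.extend(range(last_offset + 1, last_offset + 1 - entry))
            last_offset = None
    # Assumption: every listed index exists in the outer table and appears once.
    if len(set(order)) != len(order) or any(i >= len(outer_table) for i in order):
        raise ValueError("shuffle lists a missing or repeated item")
    listed = set(order)
    # Unreferenced items follow in their original outer-table sequence.
    order += [i for i in range(len(outer_table)) if i not in listed]
    return [outer_table[i] for i in order]
```

With an eight-item outer table a to h, apply_shuffle(list("abcdefgh"), [5, -2, 0]) yields f, g, h, a, b, c, d, e, matching the example in the text.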

Examples

Single permutation illustrated

Using permutations to make the 6() straight argument available. Note that in the tick/tock example, using 6() over 224() saves just 1 byte, whereas the setup around tag CPA115 costs 6 bytes. This particular use would break even at 6 uses of tag 6.
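The break-even arithmetic of this example can be reproduced with a few lines. head_length follows the CBOR rule for how many bytes an initial byte plus its argument occupies (RFC 8949, Section 3); the 6-byte setup cost is taken from the text above rather than derived, and both function names are hypothetical.

```python
import math


def head_length(argument: int) -> int:
    """Bytes taken by a CBOR head (initial byte plus argument), per RFC 8949."""
    if argument < 24:
        return 1
    if argument < 0x100:
        return 2
    if argument < 0x10000:
        return 3
    if argument < 0x100000000:
        return 5
    return 9


def break_even_uses(setup_cost: int, saving_per_use: int) -> int:
    """Smallest number of uses whose savings pay for the setup."""
    return math.ceil(setup_cost / saving_per_use)


saving = head_length(224) - head_length(6)  # tag 224 needs 2 bytes, tag 6 only 1
print(saving, break_even_uses(6, saving))  # prints: 1 6
```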

Security Considerations

[ I don't think there's anything to add? ]

The security considerations of apply.

IANA Considerations

CBOR Tags Registry

In the registry "CBOR Tags", IANA is requested to allocate one tag:

Tag: CPA115
Data item: Array [shuffle-shared, ?shuffle-argument, rump]
Semantics: "Packed CBOR: table permutation"
Reference: This document

References

Normative References

"Packed CBOR", Universität Bremen TZI; TUD Dresden University of Technology (version -12).

"Managing CBOR numbers in Internet-Drafts", Universität Bremen TZI.

Informative References

"Packed CBOR: Table set up by reference".

Example implementation

The following algorithm illustrates how table lookup can be implemented without the need for variable-size storage per compression table. An implementation in fixed memory is expected to accommodate up to a fixed number of nested table setup tags. When parsing the shuffle item, it may calculate early_item_count and store it for later use, along with a reference to the position of the shuffle table, through which it iterates again for every item to be unpacked.
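A Python sketch of the approach described above: early_item_count is computed once when the shuffle item is parsed, and every lookup iterates the shuffle array again instead of materializing the permuted table. Apart from early_item_count, all names are hypothetical, bounds checks against the size of the outer table are omitted, and Python is used for readability rather than as a model of a constrained implementation.

```python
from typing import Iterator, Sequence, Tuple


def runs(shuffle: Sequence[int]) -> Iterator[Tuple[int, int]]:
    """Yield (first outer index, item count) per group, without buffering."""
    i = 0
    while i < len(shuffle):
        start = shuffle[i]
        if start < 0:
            raise ValueError("a length must directly follow an offset")
        count = 1
        if i + 1 < len(shuffle) and shuffle[i + 1] < 0:
            count = 1 - shuffle[i + 1]  # group of (1 - length) items in total
            i += 1
        i += 1
        yield start, count


def early_item_count(shuffle: Sequence[int]) -> int:
    """Number of inner items listed explicitly; computed once per tag."""
    return sum(count for _, count in runs(shuffle))


def is_listed(shuffle: Sequence[int], outer: int) -> bool:
    return any(start <= outer < start + count for start, count in runs(shuffle))


def inner_to_outer(shuffle: Sequence[int], early_items: int, inner: int) -> int:
    """Map an index used in the rump to an index of the outer table."""
    if inner < early_items:
        position = inner
        for start, count in runs(shuffle):
            if position < count:
                return start + position
            position -= count
    # Past the listed items: find the n-th outer index not listed in shuffle.
    remaining = inner - early_items
    outer = 0
    while True:
        if not is_listed(shuffle, outer):
            if remaining == 0:
                return outer
            remaining -= 1
        outer += 1
```

For the shuffle [3, 0, -1], early_item_count is 3, and inner indices 0 to 5 map to outer indices 3, 0, 1, 2, 4, and 5. The trade-off is the one the text describes: constant memory per table, paid for with a pass over the shuffle array on every lookup.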

Open issues

For this to progress it will need some real use cases, not just synthetic examples.
An unoptimized version of this can be achieved with tag CPA113 from , without the numeric encoding of items and the special encoding for consecutive items. Using CPA113 this way is relatively easy for shared items; for argument items, it requires referencing them from the shared table or performing a no-op concatenation with an empty item.
Is skipping of used values worth it? If we accepted that this tag produced not a permutation but an extended copy of the outer table, we could do without the double iteration / pre-computation of the number of items. This would be a simplification, and more similar to using CPA113. The advantage of the permutation construction is that the non-optimized entries are not pushed back more than necessary, saving a bit (in some cases, literally) in the encoded size of those points. Is there maybe a different expression of the permutation that has the same nice properties without the added complexity? (Stating the number of items in front of the shuffle item saves the double processing, but adds redundant information and does not help with the branching in the implementation.)
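The "not pushed back more than necessary" argument in the last item can be illustrated by comparing where an outer item that was not pulled forward ends up in the inner table. The sketch assumes that the extended-copy alternative means prepending the listed items to an unchanged copy of the outer table; both function names are hypothetical.

```python
from typing import Sequence


def position_in_permutation(listed: Sequence[int], outer: int) -> int:
    """Inner index of an unlisted outer item under the permutation semantics."""
    if outer in listed:
        raise ValueError("item was pulled forward")
    # Listed items come first; unlisted items keep their relative order.
    return len(listed) + sum(1 for j in range(outer) if j not in listed)


def position_in_extended_copy(listed: Sequence[int], outer: int) -> int:
    """Inner index of the same item if listed items were merely prepended."""
    return len(listed) + outer
```

Pulling outer item 3 forward leaves outer item 15 at inner index 15 under the permutation, still within the first 16 shared items, while the extended copy moves it to index 16.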

Whether this document progresses beyond the initial version should depend on whether good use cases are identified and whether it outperforms alternative solutions by a sufficient margin.