<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- generated by https://github.com/cabo/kramdown-rfc version  (Ruby 3.1.2) -->
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-amsuess-cbor-packed-shuffle-00" category="std" consensus="true" tocInclude="true" sortRefs="true" symRefs="true" version="3">
  <!-- xml2rfc v2v3 conversion 3.22.0 -->
  <front>
    <title>Packed CBOR: Table permutations</title>
    <seriesInfo name="Internet-Draft" value="draft-amsuess-cbor-packed-shuffle-00"/>
    <author initials="C." surname="Amsüss" fullname="Christian Amsüss">
      <organization/>
      <address>
        <postal>
          <country>Austria</country>
        </postal>
        <email>christian@amsuess.com</email>
      </address>
    </author>
    <date year="2024" month="July" day="29"/>
    <workgroup>cbor</workgroup>
    <keyword>cbor</keyword>
    <keyword>packed</keyword>
    <keyword>uri</keyword>
    <keyword>cri</keyword>
    <abstract>
      <t>Packed CBOR is a compression mechanism for Concise Binary Object Representation (CBOR)
that can be used without a decompression step.
This document introduces a means for altering configured tables
to optimize the use of efficient encoding points
by shuffling around entries in the packing tables.</t>
    </abstract>
    <note removeInRFC="true">
      <name>About This Document</name>
      <t>
        The latest revision of this draft can be found at <eref target="https://chrysn.codeberg.page/packed-shuffle/draft-amsuess-cbor-packed-by-reference.html"/>.
        Status information for this document may be found at <eref target="https://datatracker.ietf.org/doc/draft-amsuess-cbor-packed-shuffle/"/>.
      </t>
      <t>
        Discussion of this document takes place on the
        CBOR Working Group mailing list (<eref target="mailto:cbor@ietf.org"/>),
        which is archived at <eref target="https://mailarchive.ietf.org/arch/browse/cbor/"/>.
        Subscribe at <eref target="https://www.ietf.org/mailman/listinfo/cbor/"/>.
      </t>
      <t>Source for this draft and an issue tracker can be found at
        <eref target="https://codeberg.org/chrysn/packed-shuffle"/>.</t>
    </note>
  </front>
  <middle>
    <section anchor="introduction">
      <name>Introduction</name>
      <t>[ See abstract. ]</t>
      <t>This document uses the "CPA" convention of <xref target="I-D.bormann-cbor-draft-numbers"/>.
If it reaches the RFC editing stage, as IANA processes the IANA Considerations,
this note is to be removed, and all occurrences of "CPA" will have been replaced with allocated numbers.</t>
    </section>
    <section anchor="setting-up-the-tables-by-reference">
      <name>Setting up the tables by reference</name>
      <t>CBOR tag CPA115 describes a permutation of the CBOR packing tables
around a rump.
Unlike tag CPA113 and most other table setup tags,
it does not add items.
Instead, it rearranges the tables so that the positions with very short encodings
(in particular the first 16 items of the shared table,
the first item of the argument table in straight use,
and the first 8 items of the argument table in inverted use)
can hold items that were not originally assigned to those positions.</t>
      <t>Two groups of use cases are envisioned, without excluding others:</t>
      <ul spacing="normal">
        <li>
          <t>Rearranging preconfigured tables.</t>
          <t>
If a media type comes with a preconfigured table
and/or one or more external tables are referenced (e.g., by using <xref target="I-D.amsuess-cbor-packed-by-reference"/>),
a single tag CPA115 around the main rump of the document
can pull the most frequently used entries into the prominent positions.</t>
        </li>
        <li>
          <t>Local optimization.</t>
          <t>
If some table entries are frequent in a subtree of the document,
using tag CPA115 around that subtree can improve overall compactness,
even when the table was set up locally (using tag CPA113) with an ordering suitable for the whole document.</t>
        </li>
      </ul>
      <sourcecode type="cddl"><![CDATA[
Packed-Shuffle = #6.<cpa115>([
    shuffle-shared,
    ?shuffle-argument,
    rump])
rump = any
shuffle-shared = shuffle
shuffle-argument = shuffle
shuffle = [*(offset, ?length)]
offset = uint
length = nint

cpa115 = 115   ; preliminary value, see IANA considerations
]]></sourcecode>
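      <t>As an illustration only, the following Python sketch (in the same language as the example implementation in <xref target="python"/>) checks the decoded content of tag CPA115 against the CDDL above and converts each flat shuffle array into the (offset, length) pairs that the example implementation consumes. It assumes that a generic CBOR decoder has already produced Python lists and integers; the function names are hypothetical, and treating an absent <tt>shuffle-argument</tt> as an empty shuffle relies on an empty shuffle leaving the table unchanged.</t>
      <sourcecode type="python"><![CDATA[
def parse_shuffle(items):
    """Convert a flat shuffle array such as [1, 4, -2] into
    (offset, length) pairs such as [(1, None), (4, -2)]."""
    if not isinstance(items, list):
        raise ValueError("a shuffle must be an array")
    pairs = []
    for item in items:
        # CBOR true/false decode to bool, which Python treats as int.
        if isinstance(item, bool) or not isinstance(item, int):
            raise ValueError("shuffle entries must be integers")
        if item >= 0:
            pairs.append((item, None))        # offset (uint)
        elif pairs and pairs[-1][1] is None:
            pairs[-1] = (pairs[-1][0], item)  # length (nint) after an offset
        else:
            raise ValueError("a length must directly follow an offset")
    return pairs


def split_tag_content(content):
    """Split the decoded content of tag CPA115 into
    (shared pairs, argument pairs, rump)."""
    if not isinstance(content, list) or len(content) not in (2, 3):
        raise ValueError("expected [shuffle-shared, ?shuffle-argument, rump]")
    if len(content) == 2:
        shared, rump = content
        argument = []  # absent: the argument table is left unchanged
    else:
        shared, argument, rump = content
    return parse_shuffle(shared), parse_shuffle(argument), rump
]]></sourcecode>
      <t>For example, the content <tt>[[], [1], rump]</tt> of the tick/tock example below yields an empty shared shuffle and the argument pairs <tt>[(1, None)]</tt>.</t>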
      <section anchor="permutation-semantics">
        <name>Permutation semantics</name>
        <t>Inside the rump, each table is replaced by a permutation of the corresponding outer table,
described by the <tt>shuffle-shared</tt> value for the shared table
and by the <tt>shuffle-argument</tt> value for the argument table (which defaults to the identity permutation when absent).</t>
        <t>The permutations are described in inverse
(so that a table index used in the rump is mapped to a table index outside this tag):
the <tt>shuffle</tt> array lists, in order, the first items of the inner table
by indicating their positions in the outer table.
A negative number <tt>length</tt> following a position extends that entry
to a group of (1 - length) consecutive items in total,
starting with the item at that position.
This creates a kind of run-length encoding.
After the items listed in the <tt>shuffle</tt> array,
all items that were not referenced
are appended in their original sequence
from the outer table.</t>
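        <t>Where memory is not constrained (for example when checking examples), the inner table can simply be built in full. The following Python sketch does that for a flat shuffle array as it appears in CBOR. The function name is hypothetical, and the sketch assumes that no outer position is listed twice, a case this description does not cover.</t>
        <sourcecode type="python"><![CDATA[
def inner_table(outer, shuffle):
    """Build the inner table seen inside the rump from the outer table
    and a flat shuffle array such as [1, 4, -2]."""
    listed = []         # outer indices, in the order the shuffle lists them
    last_offset = None  # offset that a directly following length may extend
    for item in shuffle:
        if item >= 0:
            listed.append(item)
            last_offset = item
        elif last_offset is not None:
            # A length n < 0 makes a group of (1 - n) consecutive items
            # starting at the preceding offset, which is already listed.
            listed.extend(range(last_offset + 1, last_offset + 1 - item))
            last_offset = None
        else:
            raise ValueError("a length must directly follow an offset")
    listed_set = set(listed)
    # All outer items that were not listed follow in their original order.
    rest = [i for i in range(len(outer)) if i not in listed_set]
    return [outer[i] for i in listed + rest]
]]></sourcecode>
        <t>Applied to the single permutation example below, <tt>inner_table(list("ABCDEFGH"), [1, 4, -2])</tt> returns the items B, E, F, G, A, C, D, H. Building the table needs memory proportional to its size, which the lookup in <xref target="python"/> avoids.</t>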
        <t>The item at a given inner index can be looked up in a way suitable for constrained systems
by applying the algorithm in <xref target="python"/>.</t>
      </section>
      <section anchor="examples">
        <name>Examples</name>
        <figure>
          <name>Single permutation illustrated</name>
          <artwork><![CDATA[
Outer table:
 A B C D E F G H

Permutation CBOR item:
 [1 / pick the "B" /, 4 / pick the "E" /, -2 / "3 items" /]

Inner table:
 B E F G A C D H
]]></artwork>
        </figure>
        <figure>
          <name>Using a permutation to make tag 6 refer to a different argument item</name>
          <sourcecode type="cbor-diag"><![CDATA[
1113([
  ["tick.", "tock.", "tickety."],
  ["The clock goes ", "And it goes "],
  [
    [
      6(simple(0)), / "The clock goes tick." /
      6(simple(1)), / "The clock goes tock." /
      224(simple(2)), / "And it goes tickety." /
      6(simple(1)), / "The clock goes tock." /
    ],
    115([[], [1], [
      / The argument table is now:
        ["And it goes ", "The clock goes "]
      /
      6(simple(0)), / "And it goes tick." /
      6(simple(1)), / "And it goes tock." /
      6(simple(2)), / "And it goes tickety." /
      6(simple(1)), / "And it goes tock." /
    ]]),
    [
      6(simple(1)), / "The clock goes tock." /
    ]
  ]
])
]]></sourcecode>
        </figure>
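        <t>Note that in the tick/tock example, using <tt>6()</tt> instead of <tt>224()</tt>
saves just 1 byte per reference,
whereas the setup around tag CPA115 costs 6 bytes.
This particular use would only break even at 6 uses of tag 6;
with the four uses shown, the result is 2 bytes larger than without the permutation.</t>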
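        <t>These byte counts can be reproduced with a short Python sketch based on the CBOR head sizes of RFC 8949 (an initial byte followed by 0, 1, 2, 4, or 8 bytes of argument). The helper name is hypothetical, and the sketch assumes the preliminary tag number 115; if CPA115 were allocated a number below 24, its tag head, and thus the setup cost, would shrink by one byte.</t>
        <sourcecode type="python"><![CDATA[
def head_size(argument):
    """Bytes taken by a CBOR initial byte plus its argument."""
    if argument < 24:
        return 1
    if argument < 0x100:
        return 2
    if argument < 0x10000:
        return 3
    if argument < 0x100000000:
        return 5
    return 9


# Overhead of wrapping the rump as 115([[], [1], rump]):
setup_cost = (
    head_size(115)                 # tag 115 (preliminary CPA115)
    + head_size(3)                 # array of three items
    + head_size(0)                 # empty shared shuffle []
    + head_size(1) + head_size(1)  # [1]: array head, then the uint 1
)

# Each reference written as tag 6 instead of tag 224 saves this much:
saving = head_size(224) - head_size(6)

break_even = -(-setup_cost // saving)        # ceiling division
extra_with_four_uses = setup_cost - 4 * saving
]]></sourcecode>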
      </section>
    </section>
    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>[ I don't think there's anything to add? ]</t>
      <t>The security considerations of <xref target="I-D.ietf-cbor-packed"/> apply.</t>
    </section>
    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <section anchor="cbor-tags-registry">
        <name>CBOR Tags Registry</name>
        <t>In the registry "CBOR Tags", IANA is requested to allocate one tag:</t>
        <ul spacing="normal">
          <li>
            <t>Tag: CPA115</t>
          </li>
          <li>
            <t>Data item: Array <tt>[shuffle-shared, ?shuffle-argument, rump]</tt></t>
          </li>
          <li>
            <t>Semantics: "Packed CBOR: table permutation"</t>
          </li>
          <li>
            <t>Reference: This document</t>
          </li>
        </ul>
      </section>
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <reference anchor="I-D.ietf-cbor-packed">
          <front>
            <title>Packed CBOR</title>
            <author fullname="Carsten Bormann" initials="C." surname="Bormann">
              <organization>Universität Bremen TZI</organization>
            </author>
            <author fullname="Mikolai Gütschow" initials="M." surname="Gütschow">
              <organization>TUD Dresden University of Technology</organization>
            </author>
            <date day="2" month="March" year="2024"/>
            <abstract>
              <t>   The Concise Binary Object Representation (CBOR, RFC 8949 == STD 94)
   is a data format whose design goals include the possibility of
   extremely small code size, fairly small message size, and
   extensibility without the need for version negotiation.

   CBOR does not provide any forms of data compression.  CBOR data
   items, in particular when generated from legacy data models, often
   allow considerable gains in compactness when applying data
   compression.  While traditional data compression techniques such as
   DEFLATE (RFC 1951) can work well for CBOR encoded data items, their
   disadvantage is that the receiver needs to decompress the compressed
   form to make use of the data.

   This specification describes Packed CBOR, a simple transformation of
   a CBOR data item into another CBOR data item that is almost as easy
   to consume as the original CBOR data item.  A separate decompression
   step is therefore often not required at the receiver.


   // The present version (-12) updates the IANA "Values for Tag
   // Numbers" table, sorting it and cleaning up the "Data Item" column.

              </t>
            </abstract>
          </front>
          <seriesInfo name="Internet-Draft" value="draft-ietf-cbor-packed-12"/>
        </reference>
        <reference anchor="I-D.bormann-cbor-draft-numbers">
          <front>
            <title>Managing CBOR numbers in Internet-Drafts</title>
            <author fullname="Carsten Bormann" initials="C." surname="Bormann">
              <organization>Universität Bremen TZI</organization>
            </author>
            <date day="29" month="February" year="2024"/>
            <abstract>
              <t>   CBOR-based protocols often make use of numbers allocated in a
   registry.  During development of the protocols, those numbers may not
   yet be available.  This impedes the generation of data models and
   examples that actually can be used by tools.

   This short draft proposes a common way to handle these situations,
   without any changes to existing tools.  Also, in conjunction with the
   application-oriented EDN literal "e", a further reduction in
   editorial processing of CBOR examples around the time of approval can
   be achieved.

              </t>
            </abstract>
          </front>
          <seriesInfo name="Internet-Draft" value="draft-bormann-cbor-draft-numbers-03"/>
        </reference>
      </references>
      <references>
        <name>Informative References</name>
        <reference anchor="I-D.amsuess-cbor-packed-by-reference">
          <front>
            <title>Packed CBOR: Table set up by reference</title>
            <author fullname="Christian Amsüss" initials="C." surname="Amsüss">
         </author>
            <date day="4" month="March" year="2024"/>
            <abstract>
              <t>   Packed CBOR is a compression mechanism for Concise Binary Object
   Representation (CBOR) that can be used without a decompression step.
   This document introduces a means for setting up its tables by means
   of dereferencable identifiers, and introduces a pattern of using it
   without sending long identifiers.

              </t>
            </abstract>
          </front>
          <seriesInfo name="Internet-Draft" value="draft-amsuess-cbor-packed-by-reference-02"/>
        </reference>
      </references>
    </references>
    <section anchor="python">
      <name>Example implementation</name>
      <t>The following algorithm illustrates how table lookup can be implemented
without the need for variable-size storage per compression table.</t>
      <t>An implementation in fixed memory is expected to support only a fixed maximum nesting depth of table setup tags.
When parsing the shuffle item,
it may calculate <tt>early_item_count</tt> and store it for later use,
along with a reference to the position of the shuffle table,
through which it iterates again for every item to be unpacked.</t>
      <sourcecode type="python"><![CDATA[
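def group_length(length):
    """Number of items in one (offset, length) group of a shuffle."""
    # A missing length stands for a single item; a negative length n
    # stands for (1 - n) consecutive items.
    return 1 if length is None else 1 - length

def early_item_count(shuffle):
    """Number of inner-table items listed explicitly by the shuffle.

    `shuffle` is the shuffle item decoded into (offset, length) pairs,
    where length is None or a negative integer."""
    return sum(group_length(length) for (_offset, length) in shuffle)

def listed_up_to(outer_index, shuffle):
    """Number of outer indices <= outer_index that the shuffle lists."""
    count = 0
    for (offset, length) in shuffle:
        end = offset + group_length(length)  # exclusive end of the group
        count += max(0, min(end, outer_index + 1) - offset)
    return count

def lookup(index, shuffle, outertable):
    shuffle_early_items = early_item_count(shuffle)
    if index < shuffle_early_items:
        # Walk the listed groups until reaching the one holding `index`.
        for (offset, length) in shuffle:
            length = group_length(length)
            if index < length:
                return outertable[offset + index]
            index -= length
    else:
        # Find the n-th outer item (counting from 0) not listed in the
        # shuffle. Iterating to the least fixed point of
        # outer_index = n + listed_up_to(outer_index) works for groups
        # in any order and needs no additional storage.
        n = index - shuffle_early_items
        outer_index = n
        while True:
            candidate = n + listed_up_to(outer_index, shuffle)
            if candidate == outer_index:
                return outertable[outer_index]
            outer_index = candidate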
]]></sourcecode>
    </section>
    <section anchor="open-issues">
      <name>Open issues</name>
      <ul spacing="normal">
        <li>
          <t>For this to progress it will need some real use cases,
not just synthetic examples.</t>
        </li>
        <li>
          <t>An unoptimized version of this can be achieved with tag CPA113 from <xref target="I-D.ietf-cbor-packed"/>,
without the numeric encoding of items
and the special encoding for consecutive items.
Using CPA113 this way is relatively easy for shared items;
for argument items, it requires referencing them from the shared table
or performing a no-op concatenation with an empty item.</t>
        </li>
        <li>
          <t>Is skipping of used values worth it?
If we accepted that this tag produced not a permutation but an extended copy of the outer table,
we could do without the double iteration / pre-computation of the number of items.
This would be a simplification, and more similar to using CPA113.
The advantage of the permutation construction is that the non-optimized entries are not pushed back more than necessary,
saving a bit (in some cases, literally) in the encoded size of those points.  </t>
          <t>
Is there a different expression of the permutation that has the same nice properties without the added complexity?
(Stating the number of items in front of the shuffle item saves the double processing,
but adds redundant information and does not help with the branching in the implementation).</t>
        </li>
      </ul>
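      <t>To illustrate the simplification discussed in the open issue on skipping used values, the following Python sketch shows a lookup for the alternative in which the tag would produce an extended copy of the outer table instead of a permutation. This is not the behavior specified above; the function name is hypothetical, and the (offset, length) pair representation follows <xref target="python"/>.</t>
      <sourcecode type="python"><![CDATA[
def lookup_extended_copy(index, shuffle, outertable):
    """Hypothetical variant: the inner table is the listed items followed
    by the complete outer table, so listed items also keep their old place.

    `shuffle` holds (offset, length) pairs, length being None or negative."""
    for (offset, length) in shuffle:
        count = 1 if length is None else 1 - length
        if index < count:
            return outertable[offset + index]
        index -= count
    # Past the listed items, the remaining index maps directly into the
    # outer table, so no item count has to be computed up front.
    return outertable[index]
]]></sourcecode>
      <t>With the shuffle <tt>[1, 4, -2]</tt> over the outer table A to H, this variant reaches H only at index 11, whereas the permutation places it at index 7.</t>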
      <t>Whether this document progresses beyond the initial version
should depend on whether good use cases are identified
and whether it outperforms alternative solutions by a sufficient margin.</t>
    </section>
  </back>
  <!-- ##markdown-source:
H4sIAAAAAAAAA71ZX3PbNhJ/56fAKQ+VWkmOk57nTm0uddL26pmbppOk0wdX
E0MkJKEmARYALes87ie7t/ti99sFSJGyc9fpw73IJggs9v/+djmbzbKbhXie
ZUGHUi3E6AeZX6tCvH715u1CvJerUolauaoJMmhr/CgrbG5kha2Fk+swk5Vv
lPezfGXdrObDM79t1utSzZ4+zXIZ1Ma6/UL4UGSZrt1CBNf48Ozp078+fZbt
rLveONvUC0EUsmu1x1KxyISYxRX6J9Llfxun4zv8zW6UaRTtTSSIazxVUpeR
3ldahfXcug1WnartQmxDqP3i5CS3hVopt6GXJ/nW7b05GbKPIyW496F3iPfN
u7O13KijUycfV8tqP3NqrZwyuZpvQ1VmmWzC1jpIMMNtQkTFvt467YOWRpxX
/t//8p7f5bYxgRR5Du05LXlRJVHbE1+le8FilWXGugpmu2EVXcy+npM2+iwt
YBGzPuzKstlsJuQKF8g8ZFnPGYT2QoKJqnagD1cQlcq30mhfCVAQr63JtVfi
lTbS7cWb1S8qD+Ktou3KRO8RY6I0ycJWBpFDvpUSjccFOw01NAEXFKp/hQ+q
nmfvt7gbbtdUICQ0tGCLJlfET6Wk8Xy/LINy2mzAolnrTeNANpD7+ixYYeug
K/1PJcKWrxR2LdR6rXNNJGEQW9DZ2oK6z1Z7Ea1JaxKuZQrsgdJxpzZMg/RH
b+MV86i4ShcF/CZ7Ii4SkyR1lv18Kd4p1el1Ln5eZkdSgSfPhEevfzgfkRDw
bdYZOL27+xNZb0WGMiYaMLqZaSo4or+/n2cXa6EDnFzm20Tq7bevhSp0ID59
gKtOhfTi4vz7c1E7CwW2d/ISDOh1oVyM8ymMBP6MDYosDxXCVk5V9kYVIAOF
yLIUNs8bx/7sic/I+07jzVbeKBxRhsKulHkyMp2ylBMKkVifk7reqcBcNjXz
E5UqYIYuXrKMnRBSCFxyevpneIpHDlixG/QyFPFBNHj70EpZMqUUrqngVz+a
Ul/TbRsfiT5nwSrrg7Cg4eI54VUgxrBtmkHFhVWsGCGLAipXlf8CUQRXldAM
3lfwpbVOqk0Umnwr2Ot5f1TFjXLkZ9YlB4ROSmU2YeuzMfNwcFDpoueutcP6
6Vkik0T1W9l5+zQjEQ57aWO7T7pNCiE+rSm+nNSbLbvf8cm/DC/RcEhHduPV
SdYPXwVfFbut4tDYix1MxgqyTm+QDspyD8fzemOIS4tNFhFYW6/Z1eABP6X4
V7d52XAksv79lAg6FeXf2ZjmmSWK4VySAytzoylZcDb7FBlHOifNhsPZqQfZ
ALchF645dxRairCvFaU1lawi6dTs+NhU4BT0c4JMw4LiPoinYGz8qFvkHsjZ
Oi6x23luIcZqvpmTNzeeuLq7e0nR/L8qxP39ZEqXCjpUqr7r93WffJpMhFpg
2LVbm7XZBWSg9rpBXPI+cq61U782eFfukw27/MYWUpQiKm3IXfqW+lT8A/Fb
tgmVQ65VqYcWk8O31EgT7U3kcJCmWc2CU+qYR5I16qcnaCccIgcH+VwSfqWM
ovCgNEQJHt5J/1L1QIo1iqvmwDETazvkQEQ05RpKReSc46OLn0+SL8DOrohl
xTc6nqdiQ9R2W1se2IcOfvvtN5GjAKSqOXsXAYF4IZ6czb/MawmJ/ja+5Lrd
IqQYulNee9kutnEal8mgy0nGdn0BlvbZ8DAWW8ByTODhK6xcfja26zU0MBUv
Y76ZLLO4grcN7J/FZTwZesoi63ikXyG+oAgpYX0u9DeybBAeXqUykg/KCCkF
Cf6J+KGXoT1QC2pb7rPsgjezPknAqaDylQy1lcepXdZ1qSExG9m1yQSu1IQ2
WU+zti4Ubam+8n11XUWOOzM+TJ7dsSNtHp/stBzZHe+2GrwXai2bMviWN4hH
siJkepJM5lT/VSdQH2FzzAyE4NTrVTb2NoaCTFdqU6jbGL6JafaSjcL1FWhH
DQ13Q1dJ5VTW5WayYE64TrQiX7EVVd4QKkR8lACXsZyVdqd8OK4MptU+ISdc
A3G5nOOtdpQ+omCJx5615tm5MGrD6DMBgvY810+WlZM+XUbnbYAmx6doAJLv
Jl5IadpQ9UCfMOQe0McxOxzToRV2pdY2ldUjFhLizIGlAsMLYIiCGHCNmaXY
aCEjJAhMQ8Utfc+5ElSK9vAqJKbIJovU1cdDjciIf7KYKTpjQnVtAUXIUAoF
DFojKz/UYnZukhmikVOOhNflTcloi3PvTu4PeQyeQXoiBEB12e89cUgWJK/c
J/sBsG2Ii21FJO7u6j0qtSG8SVH9za2sakJWHOdvDiyh4TgXr8Rr8bX4Rnwr
/i6+QzPRi+TYUeBCbLw8FSei1vl1xL+vRuJkKj4frH3Da7NnWBw9j6rEypLy
h+ld+Spdds4Xf8dM3aHfpNb2xehdrKP9hAKkSs0UaWgk7lmKCK613GSnKASc
ri9HCODr+WgqRsG2/2BFhf18tJzyDoqhHPXkWmwIHtKWc8olIT0vYzK/7P0K
cTb2mvQ3fjqZTEm2IyrxWnFyvP30I9vtYPuzZ5+3B56lA32WOgn+GP0l/6Ii
jC8vl1MYkX4SoRPx/mF+5G5it0hbSGcDBU0fXDdatuQ+pq5jcf6bLIO99vG9
f1BNHyW9XE6mjxv892p4maH0D934R8Yqg4qBUK7kdcxlZ+PJAdZ3JpA3Upec
K9jPv6e2LvYjCRpByhO6HViWQ3qawNgV6F0xuhJX5FCTq8yjtfPil4Y6EaDa
gLK5I5QuY4mI3VKL3Hqg1VIROeMTPqXYmjIzpSjHgH5nm7IQK5C6jtANDJ7F
1phSK2idpXYRbacO+6OelRvtC0Ay8wnlZG2uY/vwiSfYRAsbrodF8TJ14MRt
IjXELb2u+3hmcn8fEyRz8kjnzImR89t7aizfqg2qp9tTroolOi2gW243wfmZ
EDTCWNmHVLlTs8zdBsTnBgcHFkmnePpaBhkTqTinYiOuLo+w5SO4MmLKKxx/
10Kx48lfOJ78jbi3SvVqIQbTizj9WOE86SQVBcGeXnWzn7snqXZExa8thNvx
gIUrTKAK06VjL7Z2l5gorb2GR6Wa1lFF0WzHRrGGg3eCZjfS6dh007zHB+vk
hiUZDK/6hXPIJyJirW9BrFJo7vZkFHVbq7y1SQ4yaPHJLDQVIHwV9/N98BuT
7Hc0O5hnP1EjApf3bW1tQTnZb5rpNaJ4fyja4gqtbLn/QG8/8PTviucTJBKd
YWlpo0vte2lbkCMP0KLFoW0TdxgaxLsT8A1bBOxmKyKI1QzxoiHkhtpKukvx
xILBU5wHNSaGRGx+onmBv9fimPFxum2yOIwy0U885Sci3bUkHaozLYeHitG1
JacCukpPMM/3FB6qRAI54MLuULzrsxf9dQejOBNfZcxw9LExo6dpe/M0gizW
UOI8vflwENCDn4+Ky2f0OqGyLx87fhDvd+vhD+viiJv4dki3p56D8JepPfws
Hl0OCTK12UDBxMCBLhP6EPe9aPc/pov/vyqSYF++6DP5UCN9CYau9BF9HfYv
Uwss3gDcgy0a+lAa/5a7yDharZ3dUGKiuOPpKSczHqegFpaHYRehCWocuPj6
vUEgI3m3NTvOZ5DRGtPOuwuaMvou7KmriWkUbbZGPKep7GHyIbjBuLt7vPLR
/YOki/TviIF2gE5dWrJkO030yJ0aQnR7yLi9Di0emONERDaJD2aWGhYuiqVM
zRxQxp4ppL49zV+jywwHnNM4D/+10dBtlxBT7q1E10n1JwAgBDooFvRRhIsT
9D2zNTFMldjEGtFOh1RVh5gRWfUXXvhrXddJE9yY88wAklhHXhlexlHZjiyQ
q5oLBeGwth0nX6CPG0WcMA+6lRV9ITE8beQ2Mbf1vk3ngxGIIPo5g6nCDgxW
2GYVC05EK9Rl8bizqo/m56kbbw1K9uGanzCa4sEkvE6vudu3ZpqG544qb6UJ
08G1m55NIw2cLG4AOqgqp7v6QsaeNH454S8PUTvUMJvZwa37A0bSVN34LZYJ
g0QecM4gjugrh3R70glwa7ToCn4xpmxCARbjSpSskrLcT1pI3M7k27LeTq3p
C1GcePo0nUbVZn0Ues0+RnPsDmg8IiKLtG3hsgQTRuc8cMWuwFIVRRpzleoW
6JS8ZvwudFOVY+swaHEWNx9Vdy7WEbD3zJ++/oAY6YXdqigoRApgdsmT2vRR
kIZupjh88tiqsj7MUVZOmpxhddLZEErRlAuYJ35JGXzt8lv2oi7vWXMIR22A
U5AvUuLK0t5C1Txi4Xkuk9xYW/S+A/AEiIdta00DFWxud8LeCIAU1T5+KjRx
4uNt2UTET4MPQaPC9M2lQjbRZp79B9/O/okjHwAA

-->

</rfc>
