<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>

<!-- generated by https://github.com/cabo/kramdown-rfc2629 version 1.5.12 -->

<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-ietf-core-senml-data-ct-07" number="9193" obsoletes="" updates="" submissionType="IETF" category="std" consensus="true" xml:lang="en" tocInclude="true" tocDepth="3" symRefs="true" sortRefs="true" version="3">
  
  <!-- xml2rfc v2v3 conversion 3.9.1 -->
  <front>
    <title abbrev="SenML Data Content-Format Indication">Sensor Measurement Lists (SenML) Fields for Indicating Data Value Content-Format</title>
    <seriesInfo name="RFC" value="9193"/>
    <author initials="A." surname="Keränen" fullname="Ari Keränen">
      <organization>Ericsson</organization>
      <address>
        <postal>
          <city>Jorvas</city>
          <code>02420</code>
          <country>Finland</country>
        </postal>
        <email>ari.keranen@ericsson.com</email>
      </address>
    </author>
    <author initials="C." surname="Bormann" fullname="Carsten Bormann">
      <organization>Universität Bremen TZI</organization>
      <address>
        <postal>
          <street>Postfach 330440</street>
          <city>Bremen</city>
          <code>D-28359</code>
          <country>Germany</country>
        </postal>
        <phone>+49-421-218-63921</phone>
        <email>cabo@tzi.org</email>
      </address>
    </author>
    <date year="2022" month="June"/>
    <keyword>Internet of Things (IoT)</keyword>
    <keyword>Internet of Things</keyword>
    <keyword>IOT</keyword>
    <keyword>data model</keyword>
    <keyword>media type</keyword>

    <abstract>
      <t>The Sensor Measurement Lists (SenML) media types support multiple types
of values, from numbers to text strings and arbitrary binary Data Values.
To facilitate processing of binary Data Values, this document
specifies a pair of new SenML fields for indicating the
content format of those binary Data Values, i.e., their Internet media
type (including parameters) as well as any content codings applied.</t>
    </abstract>
  </front>
  <middle>
    <section anchor="intro" numbered="true" toc="default">
      <name>Introduction</name>
      <t>The Sensor Measurement Lists (SenML) media types <xref target="RFC8428" format="default"/> can be used
to send various kinds of data.  In the example given in
<xref target="ex-1" format="default"/>, a temperature value, an indication of whether a lock is open, and
a Data Value (with SenML field "vd") read from a Near Field Communication (NFC) reader are sent in a
single SenML Pack.
The example is given in SenML JSON representation, so the "vd" (Data
Value) field is encoded as a base64url string (without
padding), as per <xref section="5" sectionFormat="of" target="RFC8428" format="default"/>.</t>
      <figure anchor="ex-1">
        <name>SenML Pack with Unidentified Binary Data</name>
        <sourcecode type="senml-json"><![CDATA[
[
  {"bn":"urn:dev:ow:10e2073a01080063:","n":"temp","u":"Cel","v":7.1},
  {"n":"open","vb":false},
  {"n":"nfc-reader","vd":"aGkgCg"}
]
]]></sourcecode>
      </figure>
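      <t>As a minimal sketch (not part of this specification), the following Python
function decodes a "vd" value taken from SenML JSON. The function name
<tt>decode_vd</tt> is hypothetical; the sketch only assumes that the value is
base64url without padding, as described above, and restores the padding that
Python's <tt>base64.urlsafe_b64decode</tt> requires.</t>
      <sourcecode type="python"><![CDATA[
import base64


def decode_vd(vd: str) -> bytes:
    """Decode a SenML JSON "vd" value (base64url without padding)."""
    # urlsafe_b64decode() insists on padding, so append "=" characters
    # up to the next multiple of four before decoding.
    padding = "=" * (-len(vd) % 4)
    return base64.urlsafe_b64decode(vd + padding)


print(decode_vd("aGkgCg"))  # b'hi \n'
]]></sourcecode>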
      <t>The receiver is expected to know how to interpret the data in the "vd"
field based on the context, e.g., the name of the data source and out-of-band
knowledge of the application. However, this context may not always be
easily available to entities processing the SenML Pack, especially if
the Pack is propagated over time and via multiple entities. To facilitate
automatic interpretation, it is useful to be able to indicate an Internet
media type and, optionally, content codings right in the SenML Record.</t>
      <t>The Constrained Application Protocol (CoAP)
Content-Format (<xref section="12.3" sectionFormat="of" target="RFC7252" format="default"/>) provides this
information in the form of a single unsigned integer. For instance, <xref target="RFC8949" format="default"/> defines the Content-Format number 60 for
Content-Type application/cbor. Enclosing this Content-Format number in the Record is illustrated in <xref target="ex-2" format="default"/>. All registered CoAP Content-Format numbers are listed
in the "<xref section="CoAP Content-Formats" relative="#content-formats" sectionFormat="bare" target="IANA.core-parameters" format="default"/>" registry <xref target="IANA.core-parameters" format="default"/>, as specified by
<xref section="12.3" sectionFormat="of" target="RFC7252" format="default"/>.
Note that, at the time of writing, the structure of this registry only
provides for zero or one content coding; nothing in the present
document needs to change if the registry is extended to allow
sequences of content codings.</t>
      <figure anchor="ex-2">
        <name>SenML Record with Binary Data Identified as CBOR</name>
        <sourcecode type="json"><![CDATA[
{"n":"nfc-reader", "vd":"gmNmb28YKg", "ct":"60"}
]]></sourcecode>
      </figure>
      <t>In this example SenML Record, the Data Value contains the string "foo" and the
number 42 encoded in a Concise Binary Object Representation (CBOR) <xref target="RFC8949" format="default"/> array. Since the example above
uses the JSON format of SenML, the Data Value containing the binary CBOR
value is base64url encoded (<xref section="5" sectionFormat="of" target="RFC4648" format="default"/>).
The Data Value after base64url decoding is shown in <xref target="ex-2-cbor" format="default"/>
as annotated hexadecimal, with CBOR diagnostic notation in the comments; in
diagnostic notation, the whole value is <tt>["foo", 42]</tt>.</t>
      <figure anchor="ex-2-cbor">
        <name>Example Data Value in CBOR Diagnostic Notation</name>
        <sourcecode type="cbor-pretty"><![CDATA[
82           # array(2)
   63        # text(3)
      666F6F # "foo"
   18 2A     # unsigned(42)
]]></sourcecode>
      </figure>
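      <t>To make the byte layout above concrete, the following Python sketch decodes the
Data Value of <xref target="ex-2" format="default"/>. It is deliberately minimal and
hypothetical (the function name <tt>decode_cbor_item</tt> is invented for this
illustration): it handles only the definite-length unsigned integers, text
strings, and arrays that occur in this example. A real application would use a
complete CBOR implementation.</t>
      <sourcecode type="python"><![CDATA[
import base64


def decode_cbor_item(data: bytes, pos: int = 0):
    """Decode one CBOR data item at data[pos]; return (value, next_pos).

    Minimal sketch: supports only major types 0 (unsigned integer),
    3 (text string), and 4 (array), all with definite lengths.
    """
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    # The argument is inline for values below 24; 24..27 announce
    # 1, 2, 4, or 8 following bytes in network byte order.
    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError(f"unsupported additional information {info}")
    if major == 0:
        return arg, pos
    if major == 3:
        return data[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:
        items = []
        for _ in range(arg):
            item, pos = decode_cbor_item(data, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unsupported major type {major}")


# "vd" value from the example Record, with base64url padding restored
raw = base64.urlsafe_b64decode("gmNmb28YKg" + "==")
value, end = decode_cbor_item(raw)
print(value)  # ['foo', 42]
]]></sourcecode>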
      <section anchor="evolution" numbered="true" toc="default">
        <name>Evolution</name>
        <t>As with SenML in general, there is no expectation that the creator of
a SenML Pack knows (or has negotiated with) each consumer of that Pack,
which may be very remote in space and particularly in time.
This means that the SenML creator in general has no way to know
whether the consumer knows:</t>
        <ul spacing="normal">
          <li>each specific Media-Type-Name used,</li>
          <li>each parameter and each parameter value used,</li>
          <li>each content coding in use, and</li>
          <li>each Content-Format number in use for a combination of these.</li>
        </ul>
        <t>What SenML, as well as the new fields defined here, guarantees is that
a recipient implementation <em>knows</em> when it needs to be updated to
understand these field values and the values controlled by them;
registries are used to evolve these name spaces in a controlled way.
SenML Packs can be processed by a consumer while not understanding all
the information in them, and information can generally be preserved in
this processing such that it is useful for further consumers.</t>
      </section>
    </section>
    <section anchor="terminology" numbered="true" toc="default">
      <name>Terminology</name>
      <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL</bcp14>
NOT", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
"<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
described in BCP&nbsp;14 <xref target="RFC2119" format="default"/> <xref target="RFC8174" format="default"/> when, and only when, they
appear in all capitals, as shown here.</t>
      <dl newline="false" spacing="normal">
        <dt>Media type:  </dt>
        <dd>A registered label for representations (byte strings) prepared for
interchange <xref target="RFC1590" format="default"/>
<xref target="RFC6838" format="default"/>, identified by a Media-Type-Name.</dd>
        <dt>Media-Type-Name:  </dt>
        <dd>A combination of a type-name and a subtype-name registered in
<xref target="IANA.media-types" format="default"/>, as per <xref target="RFC6838" format="default"/>, conventionally
identified by the two names separated by a slash.</dd>
        <dt>Content-Type:  </dt>
        <dd>A Media-Type-Name, optionally associated with parameters
(<xref section="5" sectionFormat="of" target="RFC2045" format="default"/>, separated from
the Media-Type-Name and from each other by a semicolon).
In HTTP and many other protocols, it is used in a <tt>Content-Type</tt> header field.</dd>
        <dt>Content coding:  </dt>
        <dd>A name registered in the "<xref section="HTTP Content Coding Registry" relative="#content-coding" sectionFormat="bare" target="IANA.http-parameters" format="default"/>" <xref target="IANA.http-parameters" format="default"/>, as specified by
Sections <xref target="RFC9110" section="16.6.1" sectionFormat="bare" format="default"/> and <xref target="RFC9110" section="18.6" sectionFormat="bare" format="default"/> of <xref target="RFC9110" format="default"/>, indicating an encoding
transformation with semantics further specified in <xref section="8.4.1" sectionFormat="of" target="RFC9110" format="default"/>.
Confusingly, in HTTP, content coding values are found in a header field
called "Content-Encoding"; however, "content coding" is the correct
term for the process and the registered values.</dd>
        <dt>Content format:  </dt>
        <dd>The combination of a Content-Type and zero or more content codings, identified
by (1) a numeric identifier defined in the "<xref section="CoAP Content-Formats" relative="#content-formats" sectionFormat="bare" target="IANA.core-parameters" format="default"/>" registry <xref target="IANA.core-parameters" format="default"/>,
as per <xref section="12.3" sectionFormat="of" target="RFC7252" format="default"/> (referred to as Content-Format
number), or (2) a Content-Format-String.</dd>
        <dt>Content-Format-String:</dt>
        <dd>The string representation of the combination of a Content-Type and
	zero or more content codings.</dd>
        <dt>Content-Format-Spec:</dt>
        <dd>The string representation of a content format; either a
	Content-Format-String or the (decimal) string representation of a
	Content-Format number.</dd>
      </dl>
      <t>Readers should also be familiar with the terms and concepts discussed in
<xref target="RFC8428" format="default"/>.</t>
    </section>
    <section anchor="senml-content-format-ct-field" numbered="true" toc="default">
      <name>SenML Content-Format ("ct") Field</name>
      <t>When a SenML Record contains a Data Value field ("vd"), the Record <bcp14>MAY</bcp14>
also include a Content-Format indication field, using label "ct".  The
value of this field is a Content-Format-Spec, i.e., one of the following:</t>
      <ul spacing="normal">
        <li>a CoAP Content-Format number in decimal form with no leading
zeros (except for the value "0" itself). This value represents an
unsigned integer in the range of 0-65535, similar to the "ct"
attribute defined in <xref section="7.2.1" sectionFormat="of" target="RFC7252" format="default"/> for CoRE Link
Format <xref target="RFC6690" format="default"/>.</li>
        <li>a Content-Format-String containing a Content-Type and
zero or more content codings (see below).</li>
      </ul>
      <t>The syntax of this field is formally defined in <xref target="abnf" format="default"/>.</t>
      <t>The CoAP Content-Format number provides a simple and efficient way
to indicate the type of the data.  Since some Internet media types and
their content coding and parameter alternatives do not have assigned
CoAP Content-Format numbers, using Content-Type and zero or more
content codings
is also allowed. Both methods use a string value in the "ct" field to
keep its data type consistent across uses.  When the "ct" field
contains only digits, it is interpreted as a CoAP Content-Format
number.</t>
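      <t>The following Python sketch shows how a receiver might tell the two forms
apart. The function name <tt>classify_ct</tt> is hypothetical. It applies the
"only digits" rule above, rejects leading zeros and values outside 0-65535 for
the number form, and returns a Content-Format-String unchanged without checking
its syntax.</t>
      <sourcecode type="python"><![CDATA[
import re


def classify_ct(ct: str):
    """Interpret a "ct" or "bct" value (a Content-Format-Spec).

    Returns ("number", int) for a CoAP Content-Format number and
    ("string", str) for a Content-Format-String (syntax not checked).
    """
    if re.fullmatch(r"[0-9]+", ct):
        # Only ASCII digits: a CoAP Content-Format number.
        if len(ct) > 1 and ct.startswith("0"):
            raise ValueError(f"leading zero in Content-Format number {ct!r}")
        number = int(ct)
        if number > 65535:
            raise ValueError(f"Content-Format number {number} out of range")
        return ("number", number)
    return ("string", ct)


print(classify_ct("60"))         # ('number', 60)
print(classify_ct("image/png"))  # ('string', 'image/png')
]]></sourcecode>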
      <t>To indicate that one or more content codings are used with a Content-Type,
each of the content coding values is appended to the Content-Type value (media
type and parameters, if any), separated by an "@" sign, in the order in which
the content codings were applied (the same order as in <xref section="8.4" sectionFormat="of" target="RFC9110" format="default"/>).
For example (using a content coding value of "deflate", as defined in
<xref section="8.4.1.2" sectionFormat="of" target="RFC9110" format="default"/>):</t>
      <artwork name="" type="" align="left" alt=""><![CDATA[
text/plain; charset=utf-8@deflate
]]></artwork>
      <t>If no "@" sign is present after the media type and parameters,
then no content coding has been specified, and the "identity"
content coding is used -- no encoding transformation is employed.</t>
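      <t>Because the ABNF in <xref target="abnf" format="default"/> allows an "@" inside a
quoted-string parameter value, simply splitting on every "@" is not sufficient.
The following Python sketch (the function name <tt>split_content_codings</tt> is
hypothetical) tracks double quotes and backslash escapes while scanning.</t>
      <sourcecode type="python"><![CDATA[
def split_content_codings(cfs: str):
    """Split a Content-Format-String into (Content-Type, [content codings]).

    "@" separates content codings, but it may also occur inside a
    quoted-string parameter value, so double quotes (and backslash
    escapes within them) are tracked while scanning.
    """
    parts, current = [], []
    in_quotes = escaped = False
    for ch in cfs:
        if escaped:
            escaped = False          # character after "\" is taken literally
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "@" and not in_quotes:
            parts.append("".join(current))
            current = []
            continue                 # the separator itself is dropped
        current.append(ch)
    parts.append("".join(current))
    # Content codings are listed in the order in which they were applied.
    return parts[0], parts[1:]


print(split_content_codings("text/plain; charset=utf-8@deflate"))
# ('text/plain; charset=utf-8', ['deflate'])
]]></sourcecode>
      <t>A receiver that wants to undo the codings would process this list in
reverse order, starting with the coding that was applied last.</t>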
    </section>
    <section anchor="senml-base-content-format-bct-field" numbered="true" toc="default">
      <name>SenML Base Content-Format ("bct") Field</name>
      <t>The Base Content-Format field, label "bct", provides a default value for
the Content-Format field (label "ct") within its range.  The range of the
base field includes the Record containing it, up to (but not including)
the next Record containing a "bct" field, if any, or up to the end of the
Pack otherwise.  The process of resolving (<xref section="4.6" sectionFormat="of" target="RFC8428" format="default"/>) this base
field is performed by adding its value with the label "ct" to all Records
in this range that carry a "vd" field but do not already contain a
Content-Format ("ct") field.</t>
      <t><xref target="ex-bct" format="default"/> shows a variation of <xref target="ex-2" format="default"/> with multiple Records, with the
"nfc-reader" Records resolving to the base field value "60" and the
"iris-photo" Record overriding this with the "image/png" media type
(actual data left out for brevity).</t>
      <figure anchor="ex-bct">
        <name>SenML Pack with the bct Field</name>
        <sourcecode type="senml-json"><![CDATA[
[
  {"n":"nfc-reader", "vd":"gmNmb28YKg",
   "bct":"60", "bt":1627430700},
  {"n":"nfc-reader", "vd":"gmNiYXIYKw", "t":10},
  {"n":"iris-photo", "vd":".....", "ct":"image/png", "t":10},
  {"n":"nfc-reader", "vd":"gmNiYXoYLA", "t":20}
]
]]></sourcecode>
      </figure>
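      <t>The resolution rule above can be sketched in Python as follows, applied to the
Pack of <xref target="ex-bct" format="default"/> after JSON parsing into a list of
dictionaries. The function name <tt>resolve_bct</tt> is hypothetical, and the
sketch handles only "bct": other base fields such as "bn" and "bt" are left in
place, so its output is not yet fully resolved in the sense of
<xref section="4.6" sectionFormat="of" target="RFC8428" format="default"/>. Since
resolved Records do not carry base fields, "bct" is removed once applied.</t>
      <sourcecode type="python"><![CDATA[
def resolve_bct(pack):
    """Apply "bct" resolution to a SenML Pack (a list of Record dicts).

    Only "bct" is handled; other base fields are left untouched.
    Returns new Record dicts; the input Pack is not modified.
    """
    resolved = []
    base_ct = None
    for record in pack:
        record = dict(record)  # shallow copy
        if "bct" in record:
            # A new "bct" ends the range of the previous one.
            base_ct = record.pop("bct")
        if base_ct is not None and "vd" in record and "ct" not in record:
            record["ct"] = base_ct
        resolved.append(record)
    return resolved


pack = [
    {"n": "nfc-reader", "vd": "gmNmb28YKg", "bct": "60", "bt": 1627430700},
    {"n": "nfc-reader", "vd": "gmNiYXIYKw", "t": 10},
    {"n": "iris-photo", "vd": ".....", "ct": "image/png", "t": 10},
    {"n": "nfc-reader", "vd": "gmNiYXoYLA", "t": 20},
]
print([r["ct"] for r in resolve_bct(pack)])  # ['60', '60', 'image/png', '60']
]]></sourcecode>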
    </section>
    <section anchor="examples" numbered="true" toc="default">
      <name>Examples</name>
      <t>The following examples are valid values for the "ct" and "bct" fields
(explanation/comments in parentheses):</t>
      <ul spacing="normal">
        <li>"60" (CoAP Content-Format number for "application/cbor")</li>
        <li>"0" (CoAP Content-Format number for "text/plain" with parameter
"charset=utf-8")</li>
        <li>"application/json" (JSON Content-Type -- equivalent to "50" CoAP
Content-Format number)</li>
        <li>"application/json@deflate" (JSON Content-Type with "deflate" as
content coding -- equivalent to "11050" CoAP Content-Format number)</li>
        <li>"application/json@deflate@aes128gcm" (JSON Content-Type with
"deflate" followed by "aes128gcm" as content codings)</li>
        <li>"text/csv" (Comma-Separated Values (CSV) <xref target="RFC4180" format="default"/> Content-Type)</li>
        <li>"text/csv;header=present@gzip" (CSV with header row, using "gzip" as
content coding)</li>
      </ul>
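      <t>The equivalences noted above can be checked by mapping a Content-Format number
to its Content-Format-String. The following Python sketch assumes a hypothetical
local table that contains only the numbers used in these examples; a real
implementation would consult the full "CoAP Content-Formats" registry.</t>
      <sourcecode type="python"><![CDATA[
import re

# Hypothetical excerpt of the CoAP Content-Formats registry, limited to
# the numbers used in the examples above.
COAP_CONTENT_FORMATS = {
    0: "text/plain; charset=utf-8",
    50: "application/json",
    60: "application/cbor",
    11050: "application/json@deflate",
}


def to_content_format_string(spec: str) -> str:
    """Map a Content-Format-Spec to an equivalent Content-Format-String.

    Raises KeyError for numbers that are missing from the local table.
    """
    if re.fullmatch(r"0|[1-9][0-9]*", spec):
        return COAP_CONTENT_FORMATS[int(spec)]
    return spec  # already a Content-Format-String


print(to_content_format_string("11050") == "application/json@deflate")  # True
]]></sourcecode>
      <t>Comparing the resulting strings is only a rough equivalence test: media type
names are case-insensitive <xref target="RFC6838" format="default"/>, and the syntax
permits optional spaces around ";", so a more careful implementation would
normalize both sides before comparing them.</t>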
    </section>
    <section anchor="abnf" numbered="true" toc="default">
      <name>ABNF</name>
      <t>This specification provides a formal definition of the syntax of
Content-Format-Spec strings using ABNF notation <xref target="RFC5234" format="default"/>. The
definition contains three new rules and a number of rules collected and adapted
from various RFCs <xref target="RFC9110" format="default"/> <xref target="RFC6838" format="default"/> <xref target="RFC5234" format="default"/> <xref target="RFC8866" format="default"/>.</t>
      <figure anchor="content-format-spec">
        <name>ABNF Syntax of Content-Format-Spec</name>
        <sourcecode type="abnf"><![CDATA[
; New in this document

Content-Format-Spec = Content-Format-Number / Content-Format-String

Content-Format-Number = "0" / (POS-DIGIT *DIGIT)
Content-Format-String   = Content-Type *("@" Content-Coding)

; Cleaned up from RFC 9110,
; leaving only SP as blank space,
; removing legacy 8-bit characters, and
; leaving the parameter as mandatory with each semicolon:

Content-Type   = Media-Type-Name *( *SP ";" *SP parameter )
parameter      = token "=" ( token / quoted-string )

token          = 1*tchar
tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
               / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
               / DIGIT / ALPHA
quoted-string  = %x22 *( qdtext / quoted-pair ) %x22
qdtext         = SP / %x21 / %x23-5B / %x5D-7E
quoted-pair    = "\" ( SP / VCHAR )

; Adapted from Section 8.4.1 of RFC 9110

Content-Coding   = token

; Adapted from various specs

Media-Type-Name = type-name "/" subtype-name

; From RFC 6838

type-name = restricted-name
subtype-name = restricted-name

restricted-name = restricted-name-first *126restricted-name-chars
restricted-name-first  = ALPHA / DIGIT
restricted-name-chars  = ALPHA / DIGIT / "!" / "#" /
                         "$" / "&" / "-" / "^" / "_"
restricted-name-chars =/ "." ; Characters before first dot always
                             ; specify a facet name
restricted-name-chars =/ "+" ; Characters after last plus always
                             ; specify a structured syntax suffix


; Boilerplate from RFC 5234 and RFC 8866

DIGIT     =  %x30-39           ; 0 - 9
POS-DIGIT =  %x31-39           ; 1 - 9
ALPHA     =  %x41-5A / %x61-7A ; A - Z / a - z
SP        =  %x20
VCHAR     =  %x21-7E           ; printable ASCII (no SP)

]]></sourcecode>
      </figure>
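      <t>For implementers who prefer regular expressions, the following Python sketch
transcribes the ABNF above into one. The helper name
<tt>is_content_format_spec</tt> is hypothetical. Like the ABNF, it checks syntax
only: it does not enforce the 0-65535 range of Content-Format numbers, nor does
it check that media types or content codings are registered.</t>
      <sourcecode type="python"><![CDATA[
import re

# Character classes transcribed from the ABNF above.
RESTRICTED_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&\-^_.+]{0,126}"
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# qdtext is SP / %x21 / %x23-5B / %x5D-7E; quoted-pair is "\" (SP / VCHAR)
QUOTED_STRING = r'"(?:[ !#-\[\]-~]|\\[ -~])*"'

MEDIA_TYPE_NAME = RESTRICTED_NAME + "/" + RESTRICTED_NAME
PARAMETER = TOKEN + "=(?:" + TOKEN + "|" + QUOTED_STRING + ")"
CONTENT_TYPE = MEDIA_TYPE_NAME + "(?: *; *" + PARAMETER + ")*"
CONTENT_FORMAT_STRING = CONTENT_TYPE + "(?:@" + TOKEN + ")*"
CONTENT_FORMAT_NUMBER = r"0|[1-9][0-9]*"
CONTENT_FORMAT_SPEC = re.compile(
    "(?:" + CONTENT_FORMAT_NUMBER + ")|(?:" + CONTENT_FORMAT_STRING + ")"
)


def is_content_format_spec(value: str) -> bool:
    """Return True if value matches the Content-Format-Spec rule."""
    return CONTENT_FORMAT_SPEC.fullmatch(value) is not None


print(is_content_format_spec("text/csv;header=present@gzip"))  # True
print(is_content_format_spec("060"))                           # False
]]></sourcecode>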
    </section>
    <section anchor="seccons" numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>The indication of a media type in the data does not exempt a consuming
application from properly checking its inputs.
Also, an attacker who can supply crafted SenML data can choose the media
types and content codings indicated in it and can thereby reach any
vulnerabilities in the handlers for them.
This includes "decompression bombs": compressed data that is crafted
to decompress to extremely large data items.</t>
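      <t>As one illustration of such input checking, the following Python sketch undoes
the "deflate" content coding (<xref section="8.4.1.2" sectionFormat="of" target="RFC9110" format="default"/>,
i.e., the zlib data format) while capping the decompressed size. The function
name <tt>bounded_inflate</tt> and its <tt>limit</tt> parameter are hypothetical;
a suitable limit depends on the application, and other content codings would
need equivalent safeguards.</t>
      <sourcecode type="python"><![CDATA[
import zlib


def bounded_inflate(data: bytes, limit: int) -> bytes:
    """Undo the "deflate" content coding, refusing more than limit bytes.

    HTTP's "deflate" coding is the zlib format, which is what a default
    zlib.decompressobj() expects.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    decompressor = zlib.decompressobj()
    # Request at most limit + 1 bytes so that oversized output is
    # detected without materializing the full decompressed data.
    out = decompressor.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed data exceeds limit")
    if not decompressor.eof:
        raise ValueError("truncated deflate data")
    return out
]]></sourcecode>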
    </section>
    <section anchor="iana" numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>IANA has assigned the following new labels in the 
"<xref section="SenML Labels" sectionFormat="bare" target="IANA.senml" relative="#senml-labels" format="default"/>" subregistry
of the "Sensor Measurement Lists (SenML)" registry <xref target="IANA.senml" format="default"/> (as defined in <xref section="12.2" sectionFormat="of" target="RFC8428" format="default"/>) for the
Content-Format indication, as per <xref target="tbl-senml-reg" format="default"/>:</t>
      <table anchor="tbl-senml-reg" align="center">
        <name>IANA Registration for New SenML Labels</name>
        <thead>
          <tr>
            <th align="right">Name</th>
            <th align="left">Label</th>
            <th align="left">JSON Type</th>
            <th align="left">XML Type</th>
            <th align="left">Reference</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="right">Base Content-Format</td>
            <td align="left">bct</td>
            <td align="left">String</td>
            <td align="left">string</td>
            <td align="left">RFC 9193</td>
          </tr>
          <tr>
            <td align="right">Content-Format</td>
            <td align="left">ct</td>
            <td align="left">String</td>
            <td align="left">string</td>
            <td align="left">RFC 9193</td>
          </tr>
        </tbody>
      </table>
      <t>Note that, per <xref section="12.2" sectionFormat="of" target="RFC8428" format="default"/>, neither CBOR labels nor Efficient XML Interchange (EXI)
schemaId values (EXI ID column) are supplied.</t>
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>

<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2045.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8428.xml"/>

<reference anchor="IANA.senml" target="https://www.iana.org/assignments/senml">
          <front>
            <title>Sensor Measurement Lists (SenML)</title>
            <author>
              <organization>IANA</organization>
            </author>
          </front>
        </reference>

<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7252.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5234.xml"/>

        <reference anchor="RFC9110">
          <front>
            <title>HTTP Semantics</title>
            <author initials="R." surname="Fielding" fullname="Roy Fielding">
              <organization/>
            </author>
            <author initials="M." surname="Nottingham" fullname="Mark Nottingham">
              <organization/>
            </author>
            <author initials="J." surname="Reschke" fullname="Julian Reschke">
              <organization/>
            </author>
            <date year="2022" month="February"/>
          </front>
          <seriesInfo name="STD" value="97"/>
          <seriesInfo name="RFC" value="9110"/>
          <seriesInfo name="DOI" value="10.17487/RFC9110"/>
        </reference>

        <reference anchor="IANA.media-types" target="https://www.iana.org/assignments/media-types">
          <front>
            <title>Media Types</title>
            <author>
              <organization>IANA</organization>
            </author>
          </front>
        </reference>

        <reference anchor="IANA.core-parameters" target="https://www.iana.org/assignments/core-parameters">
          <front>
            <title>Constrained RESTful Environments (CoRE) Parameters</title>
            <author>
              <organization>IANA</organization>
            </author>
          </front>
        </reference>

        <reference anchor="IANA.http-parameters" target="https://www.iana.org/assignments/http-parameters">
          <front>
            <title>Hypertext Transfer Protocol (HTTP) Parameters</title>
            <author>
              <organization>IANA</organization>
            </author>
          </front>
        </reference>

<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>

      </references>
      <references>
        <name>Informative References</name>

<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4648.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8949.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6838.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.1590.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6690.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4180.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8866.xml"/>

      </references>
    </references>
    <section numbered="false" anchor="acks" toc="default">
      <name>Acknowledgments</name>
      <t>The authors would like to thank <contact fullname="Sérgio Abreu"/> for the discussions leading
to the design of this extension and <contact fullname="Isaac Rivera"/> for reviews and
feedback.
<contact fullname="Klaus Hartke"/> suggested not burdening this document with a separate
mandatory-to-implement version of the fields.
<contact fullname="Alexey Melnikov"/>, <contact fullname="Jim Schaad"/>, and <contact fullname="Thomas Fossati"/> provided helpful
comments at Working Group Last Call.
<contact fullname="Marco Tiloca"/> asked for the term Content-Format-Spec to be clarified and used.</t>
    </section>
  </back>
</rfc>
