<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- generated by https://github.com/cabo/kramdown-rfc version 1.7.17 (Ruby 3.3.1) -->
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-rankin-ccsv-01" category="info" consensus="true" submissionType="IETF" tocInclude="true" sortRefs="true" symRefs="true" version="3">
  <!-- xml2rfc v2v3 conversion 3.21.0 -->
  <front>
    <title abbrev="CCSV">Common Format and Media Type for Control-Character-Separated Values (CCSV) Files</title>
    <seriesInfo name="Internet-Draft" value="draft-rankin-ccsv-01"/>
    <author fullname="Mike Rankin">
      <organization>ICF International, Inc.</organization>
      <address>
        <email>mrankin@oldgrognard.pub</email>
      </address>
    </author>
    <date year="2024" month="June" day="25"/>
    <abstract>
      <?line 37?>

<t>This document establishes the format used for Control-Character-Separated Values (CCSV) files and registers the associated media type "text/ccsv".</t>
    </abstract>
    <note removeInRFC="true">
      <name>About This Document</name>
      <t>
        The latest revision of this draft can be found at <eref target="https://oldgrognard.github.io/ccsv-id/draft-rankin-ccsv.html"/>.
        Status information for this document may be found at <eref target="https://datatracker.ietf.org/doc/draft-rankin-ccsv/"/>.
      </t>
      <t>Source for this draft and an issue tracker can be found at
        <eref target="https://github.com/oldgrognard/ccsv-id"/>.</t>
    </note>
  </front>
  <middle>
    <?line 42?>

<section anchor="introduction">
      <name>Introduction</name>
      <t>Control-Character-Separated Values (CCSV) is a file format for moving data between spreadsheets, statistical analysis programs, databases, and any other program that works with rectangular data. It is very similar to Comma-Separated Values (CSV) files <xref target="RFC4180"/>, Tab-Separated Values (TSV) files, and their derivatives. Unlike those formats, CCSV reduces parsing ambiguity by using non-printable control characters as delimiters. Because the two delimiter characters cannot appear within field data, there is no need to escape or quote fields, or to add extra delimiters around fields that contain characters such as commas, tabs, quotation marks, or line breaks. This document defines the format of CCSV files and registers the "text/ccsv" media type in accordance with <xref target="RFC6838"/>.</t>
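      <t>As an illustration of the escaping difference, the following Python sketch serializes one hypothetical record whose fields contain a comma, quotation marks, and a line feed. It uses Python's standard <tt>csv</tt> module with its default dialect as a stand-in for CSV producers in general; real CSV tools vary in their quoting behavior. The CCSV form simply joins the unmodified fields with US (U+001F).</t>
      <sourcecode type="python"><![CDATA[
import csv
import io

US = "\x1f"  # CCSV Unit Separator

# A hypothetical record with characters that are special in CSV.
row = ["Smith, John", 'said "hi"', "line 1\nline 2"]

# CSV: the default dialect quotes these fields and doubles embedded quotes.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
csv_line = buffer.getvalue()
print(repr(csv_line))
# '"Smith, John","said ""hi""","line 1\nline 2"\r\n'

# CCSV: the fields are copied verbatim and joined with US.
ccsv_record = US.join(row)
print(repr(ccsv_record))
# 'Smith, John\x1fsaid "hi"\x1fline 1\nline 2'
]]></sourcecode>
      <t>Splitting the CCSV record on US recovers the original fields without any unescaping step.</t>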
    </section>
    <section anchor="conventions-and-definitions">
      <name>Conventions and Definitions</name>
      <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
"<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
appear in all capitals, as shown here.</t>
      <?line -18?>

<section anchor="definition-of-the-ccsv-format">
        <name>Definition of the CCSV format</name>
        <t>A CCSV file <bcp14>MUST</bcp14> adhere to the following formatting rules:</t>
        <section anchor="formatting-rules">
          <name>Formatting Rules</name>
          <ol spacing="normal" type="1"><li>
              <t>The file <bcp14>MUST</bcp14> use UTF-8 encoding. Because US-ASCII is a subset of UTF-8, programs may create CCSV files that contain only US-ASCII characters. A consuming program that supports only US-ASCII may be unable to interpret all characters; consumers <bcp14>SHOULD</bcp14> support UTF-8 if the source of a file is unknown.</t>
            </li>
            <li>
              <t>The file <bcp14>MUST NOT</bcp14> begin with a Byte Order Mark (U+FEFF).</t>
            </li>
            <li>
              <t>A CCSV file <bcp14>MUST</bcp14> begin with a header. The header consists of the column names, separated by US (U+001F) characters.</t>
            </li>
            <li>
              <t>The Unit Separator, US (U+001F), separates adjacent fields within the header and within each record. Carriage return and line feed characters are not delimiters; they are valid characters within a field.</t>
            </li>
            <li>
              <t>The Record Separator, RS (U+001E), terminates the header and separates consecutive records; the ABNF below also permits an RS after the final record.</t>
            </li>
            <li>
              <t>The header and each record, if any, <bcp14>MUST</bcp14> contain the same number of US (U+001F) characters. That is, every record <bcp14>MUST</bcp14> have the same number of fields as the header.</t>
            </li>
            <li>
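              <t>An empty field is represented by adjacent delimiters, for example two consecutive US characters, or a US character immediately followed by an RS character.</t>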
            </li>
            <li>
              <t>The US (U+001F) and RS (U+001E) characters <bcp14>MUST NOT</bcp14> appear within a field.</t>
            </li>
          </ol>
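          <t>The rules above can be applied when writing a file, as in the following minimal Python sketch. The function name <tt>write_ccsv</tt> is hypothetical and not part of this specification. The sketch rejects fields that contain a delimiter (rule 8) and records whose field count differs from the header (rule 6), encodes the result as UTF-8 without a byte order mark (rules 1 and 2), and writes an RS after every record, including the last, which is one of the two forms the ABNF permits.</t>
          <sourcecode type="python"><![CDATA[
US = "\x1f"  # Unit Separator: separates fields
RS = "\x1e"  # Record Separator: follows the header and each record


def write_ccsv(header: list[str], records: list[list[str]]) -> bytes:
    """Serialize a header and records as CCSV (illustrative sketch)."""
    # Rule 2: the output must not start with U+FEFF.
    if header and header[0].startswith("\ufeff"):
        raise ValueError("file must not begin with U+FEFF")
    width = len(header)
    out = []
    for row in [header, *records]:
        # Rule 6: every record has as many fields as the header.
        if len(row) != width:
            raise ValueError(f"expected {width} fields, got {len(row)}")
        # Rule 8: US and RS must not appear inside a field.
        for field in row:
            if US in field or RS in field:
                raise ValueError(f"field contains a delimiter: {field!r}")
        out.append(US.join(row) + RS)
    # Rule 1: UTF-8 ("utf-8" writes no BOM, unlike "utf-8-sig").
    return "".join(out).encode("utf-8")


data = write_ccsv(["id", "note"], [["1", 'a, "b"\nc'], ["2", ""]])
]]></sourcecode>
          <t>In this example, the second record ends with an empty field, so its US is immediately followed by the record's RS.</t>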
          <t>The following ABNF <xref target="STD68"/> defines the CCSV format. Terminal values denote Unicode code points, which are encoded in the file as UTF-8.</t>
          <artwork><![CDATA[
file = header RS *(record RS) [record]
header = name *( US name )
record = field *( US field )
name = field
field = *field-char
field-char = %x00-1D / %x20-D7FF / %xE000-10FFFF
           ; any code point except RS, US, and surrogates
RS = %x1E ; record separator
US = %x1F ; unit separator
]]></artwork>
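          <t>A reader can follow the grammar above directly. The Python sketch below is one possible, non-normative approach; the name <tt>parse_ccsv</tt> is hypothetical, and the input is assumed to be text already decoded from UTF-8. Because <tt>field</tt> may be empty, a trailing RS is ambiguous in the grammar: it may end the last record or precede an empty final record. The sketch assumes the former and does not treat the empty string after a final RS as a record.</t>
          <sourcecode type="python"><![CDATA[
US = "\x1f"  # Unit Separator
RS = "\x1e"  # Record Separator


def parse_ccsv(text: str) -> tuple[list[str], list[list[str]]]:
    """Split decoded CCSV text into a header and records (illustrative sketch)."""
    # file = header RS *(record RS) [record]: at least one RS is required.
    if RS not in text:
        raise ValueError("missing RS after the header")
    chunks = text.split(RS)
    # Assumption: an RS after the final record terminates it, so the empty
    # string that split() yields after a trailing RS is not a record.
    if chunks[-1] == "":
        chunks.pop()
    header = chunks[0].split(US)
    records = [chunk.split(US) for chunk in chunks[1:]]
    # Every record must have as many fields as the header.
    for number, record in enumerate(records, start=1):
        if len(record) != len(header):
            raise ValueError(
                f"record {number} has {len(record)} fields; header has {len(header)}"
            )
    return header, records
]]></sourcecode>
          <t>No unescaping is needed: each field is exactly the text between two delimiters.</t>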
        </section>
      </section>
    </section>
    <section anchor="encoding-considerations">
      <name>Encoding Considerations</name>
      <t>CCSV files <bcp14>MUST</bcp14> be encoded using UTF-8 <xref target="RFC3629"/>.</t>
      <t>Implementations <bcp14>MUST NOT</bcp14> add a byte order mark (U+FEFF) to the beginning of a CCSV file or of CCSV text transmitted over a network.</t>
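      <t>Both requirements can be checked at the byte level before any field parsing. The following Python sketch is illustrative only; the function names are hypothetical. It relies on Python's strict "utf-8" codec, which rejects malformed sequences and encoded surrogates when decoding and never writes a byte order mark when encoding (unlike the "utf-8-sig" codec).</t>
      <sourcecode type="python"><![CDATA[
BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def decode_ccsv_bytes(data: bytes) -> str:
    """Check the encoding rules and return the decoded text (sketch)."""
    if data.startswith(BOM):
        raise ValueError("CCSV data must not begin with a byte order mark")
    # Strict decoding raises UnicodeDecodeError (a ValueError) on invalid UTF-8.
    return data.decode("utf-8", errors="strict")


def encode_ccsv_text(text: str) -> bytes:
    """Encode CCSV text as UTF-8 without a byte order mark (sketch)."""
    if text.startswith("\ufeff"):
        raise ValueError("CCSV text must not begin with U+FEFF")
    # Lone surrogates cannot be encoded and raise UnicodeEncodeError.
    return text.encode("utf-8")
]]></sourcecode>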
      <section anchor="why-utf-8">
        <name>Why UTF-8?</name>
        <section anchor="compatibility">
          <name>Compatibility</name>
          <t>UTF-8 is widely supported across different platforms, operating systems, and languages.  This ensures that CCSV files can be opened and read correctly regardless of the environment they are used in.</t>
        </section>
        <section anchor="internationalization">
          <name>Internationalization</name>
          <t>UTF-8 supports a vast range of characters from various languages, including those that use non-Latin scripts. This is crucial for data that might include international names, addresses, or other text in multiple languages, ensuring that all characters are preserved and displayed correctly.</t>
        </section>
        <section anchor="efficiency">
          <name>Efficiency</name>
          <t>UTF-8 is a variable-width encoding that uses one to four bytes per character. It is compact for text that is mostly US-ASCII, such as English text, because each US-ASCII character (including the US and RS delimiters) occupies a single byte, while characters from other languages remain representable when needed.</t>
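          <t>The per-character sizes can be checked directly. The short Python sketch below is illustrative, and its sample characters are arbitrary. It also shows a related property of UTF-8 <xref target="RFC3629"/>: every byte of a multi-byte sequence has the value 0x80 or greater, so in a conforming CCSV file the single bytes 0x1E (RS) and 0x1F (US) occur only as delimiters. This suggests that a reader could locate delimiters by scanning bytes before decoding.</t>
          <sourcecode type="python"><![CDATA[
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for one character."""
    return len(ch.encode("utf-8"))


# One sample per UTF-8 sequence length, plus the two CCSV delimiters.
samples = {
    "A": 1,           # U+0041, US-ASCII letter
    "\u00e9": 2,      # U+00E9, Latin small letter e with acute
    "\u20ac": 3,      # U+20AC, euro sign
    "\U0001f600": 4,  # U+1F600, outside the Basic Multilingual Plane
    "\x1e": 1,        # RS
    "\x1f": 1,        # US
}
for ch, size in samples.items():
    assert utf8_length(ch) == size, f"U+{ord(ch):04X}"

# Bytes inside multi-byte sequences are all >= 0x80, so they never
# collide with the delimiter bytes 0x1E and 0x1F.
multibyte = "\u00e9\u20ac\U0001f600".encode("utf-8")
assert all(byte >= 0x80 for byte in multibyte)
]]></sourcecode>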
        </section>
        <section anchor="standardization">
          <name>Standardization</name>
          <t>By requiring UTF-8, the CCSV format ensures a standard way of encoding text, which simplifies processing, parsing, and exchanging files. It helps in avoiding the complexities and potential errors that can arise from dealing with multiple encodings.</t>
        </section>
        <section anchor="future-proofing">
          <name>Future-Proofing</name>
          <t>As the Internet and related technologies continue to evolve, UTF-8 remains a robust, forward-compatible choice, ensuring that CCSV files remain accessible and usable in the long term.</t>
        </section>
      </section>
    </section>
    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>CCSV files are considered relatively harmless on their own, because this specification prescribes no processing beyond parsing. However, a recipient may parse the file and process its contents further. A receiving application that executes system-level commands constructed from strings in a CCSV file may be at risk.</t>
    </section>
    <section anchor="interoperability-considerations">
      <name>Interoperability Considerations</name>
      <t>Adherence to the formatting rules in <xref target="formatting-rules"/> and the encoding requirements in <xref target="encoding-considerations"/> ensures a high level of interoperability.</t>
    </section>
    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>This section provides the media type registration template, as defined in <xref target="RFC6838"/>.</t>
      <t>Type name: text</t>
      <t>Subtype name: ccsv</t>
      <t>Required parameters: N/A</t>
      <t>Optional parameters: N/A</t>
      <t>Encoding considerations: See <xref target="encoding-considerations"/></t>
      <t>Security considerations: See <xref target="security-considerations"/></t>
      <t>Interoperability considerations: See <xref target="interoperability-considerations"/></t>
      <t>Published specification: TBD</t>
      <t>Applications that use this media type: Databases, spreadsheets, statistical programs, and data conversion utilities.</t>
      <t>Fragment identifier considerations: N/A</t>
      <t>Additional information:</t>
      <artwork><![CDATA[
    Deprecated alias names for this type: N/A
    Magic number(s): N/A
    File extension(s): CCSV
    Macintosh file type code(s): TEXT
]]></artwork>
      <t>Person &amp; email address to contact for further information:</t>
      <artwork><![CDATA[
    Mike Rankin
    ICF International, Inc.
    1902 Reston Metro Plaza
    Reston, VA  20190
    USA

    mrankin@icf.com
]]></artwork>
      <t>Intended usage: COMMON</t>
      <t>Restrictions on usage: N/A</t>
      <t>Author/Change controller: Mike Rankin</t>
      <t>Provisional registration?</t>
    </section>
  </middle>
  <back>
    <references anchor="sec-normative-references">
      <name>Normative References</name>
      <reference anchor="RFC4180">
        <front>
          <title>Common Format and MIME Type for Comma-Separated Values (CSV) Files</title>
          <author fullname="Y. Shafranovich" initials="Y." surname="Shafranovich"/>
          <date month="October" year="2005"/>
          <abstract>
            <t>This RFC documents the format used for Comma-Separated Values (CSV) files and registers the associated MIME type "text/csv". This memo provides information for the Internet community.</t>
          </abstract>
        </front>
        <seriesInfo name="RFC" value="4180"/>
        <seriesInfo name="DOI" value="10.17487/RFC4180"/>
      </reference>
      <reference anchor="RFC6838">
        <front>
          <title>Media Type Specifications and Registration Procedures</title>
          <author fullname="N. Freed" initials="N." surname="Freed"/>
          <author fullname="J. Klensin" initials="J." surname="Klensin"/>
          <author fullname="T. Hansen" initials="T." surname="Hansen"/>
          <date month="January" year="2013"/>
          <abstract>
            <t>This document defines procedures for the specification and registration of media types for use in HTTP, MIME, and other Internet protocols. This memo documents an Internet Best Current Practice.</t>
          </abstract>
        </front>
        <seriesInfo name="BCP" value="13"/>
        <seriesInfo name="RFC" value="6838"/>
        <seriesInfo name="DOI" value="10.17487/RFC6838"/>
      </reference>
      <reference anchor="RFC2119">
        <front>
          <title>Key words for use in RFCs to Indicate Requirement Levels</title>
          <author fullname="S. Bradner" initials="S." surname="Bradner"/>
          <date month="March" year="1997"/>
          <abstract>
            <t>In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
          </abstract>
        </front>
        <seriesInfo name="BCP" value="14"/>
        <seriesInfo name="RFC" value="2119"/>
        <seriesInfo name="DOI" value="10.17487/RFC2119"/>
      </reference>
      <reference anchor="RFC8174">
        <front>
          <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
          <author fullname="B. Leiba" initials="B." surname="Leiba"/>
          <date month="May" year="2017"/>
          <abstract>
            <t>RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.</t>
          </abstract>
        </front>
        <seriesInfo name="BCP" value="14"/>
        <seriesInfo name="RFC" value="8174"/>
        <seriesInfo name="DOI" value="10.17487/RFC8174"/>
      </reference>
      <referencegroup anchor="STD68" target="https://www.rfc-editor.org/info/std68">
        <reference anchor="RFC5234" target="https://www.rfc-editor.org/info/rfc5234">
          <front>
            <title>Augmented BNF for Syntax Specifications: ABNF</title>
            <author fullname="D. Crocker" initials="D." role="editor" surname="Crocker"/>
            <author fullname="P. Overell" initials="P." surname="Overell"/>
            <date month="January" year="2008"/>
            <abstract>
              <t>Internet technical specifications often need to define a formal syntax. Over the years, a modified version of Backus-Naur Form (BNF), called Augmented BNF (ABNF), has been popular among many Internet specifications. The current specification documents ABNF. It balances compactness and simplicity with reasonable representational power. The differences between standard BNF and ABNF involve naming rules, repetition, alternatives, order-independence, and value ranges. This specification also supplies additional rule definitions and encoding for a core lexical analyzer of the type common to several Internet specifications. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="STD" value="68"/>
          <seriesInfo name="RFC" value="5234"/>
          <seriesInfo name="DOI" value="10.17487/RFC5234"/>
        </reference>
      </referencegroup>
      <reference anchor="RFC3629">
        <front>
          <title>UTF-8, a transformation format of ISO 10646</title>
          <author fullname="F. Yergeau" initials="F." surname="Yergeau"/>
          <date month="November" year="2003"/>
          <abstract>
            <t>ISO/IEC 10646-1 defines a large character set called the Universal Character Set (UCS) which encompasses most of the world's writing systems. The originally proposed encodings of the UCS, however, were not compatible with many current applications and protocols, and this has led to the development of UTF-8, the object of this memo. UTF-8 has the characteristic of preserving the full US-ASCII range, providing compatibility with file systems, parsers and other software that rely on US-ASCII values but are transparent to other values. This memo obsoletes and replaces RFC 2279.</t>
          </abstract>
        </front>
        <seriesInfo name="STD" value="63"/>
        <seriesInfo name="RFC" value="3629"/>
        <seriesInfo name="DOI" value="10.17487/RFC3629"/>
      </reference>
    </references>
    <?line 185?>

<section numbered="false" anchor="acknowledgments">
      <name>Acknowledgments</name>
      <t>TODO acknowledge.</t>
    </section>
  </back>

</rfc>
