      7 Network Working Group                                            E. Hall
      8 Request for Comments: 4155                                September 2005
      9 Category: Informational
     10 
     11 
     12                     The application/mbox Media Type
     13 
     14 Status of This Memo
     15 
     16    This memo provides information for the Internet community.  It does
     17    not specify an Internet standard of any kind.  Distribution of this
     18    memo is unlimited.
     19 
     20 Copyright Notice
     21 
     22    Copyright (C) The Internet Society (2005).
     23 
     24 Abstract
     25 
     26    This memo requests that the application/mbox media type be authorized
     27    for allocation by the IESG, according to the terms specified in RFC
     28    2048.  This memo also defines a default format for the mbox database,
     29    which must be supported by all conformant implementations.
     30 
     31 1.  Background and Overview
     32 
     33    UNIX-like operating systems have historically made widespread use of
     34    "mbox" database files for a variety of local email purposes.  In the
     35    common case, mbox files store linear sequences of one or more
     36    electronic mail messages, with local email clients treating the
     37    database as a logical folder of email messages.  mbox databases are
     38    also used by a variety of other messaging tools, such as mailing list
     39    management programs, archiving and filtering utilities, messaging
     40    servers, and other related applications.  In recent years, mbox
     41    databases have also become common on a large number of non-UNIX
     42    computing platforms, for similar kinds of purposes.
     43 
     44    The increased pervasiveness of these files has led to an increased
     45    demand for a standardized, network-wide interchange of these files as
     46    discrete database objects.  In turn, this dictates a need for a
     47    general media type definition for mbox files, which is the subject
     48    and purpose of this memo.
     49 
     50 
     58 Hall                         Informational                      [Page 1]
     59 
     60 RFC 4155            The application/mbox Media Type       September 2005
     61 
     62 
     63 2.  About the mbox Database
     64 
     65    The mbox database format is not documented in an authoritative
     66    specification, but instead exists as a well-known output format that
     67    is anecdotally documented, or which is only authoritatively
     68    documented for a specific platform or tool.
     69 
     70    mbox databases typically contain a linear sequence of electronic mail
     71    messages.  Each message begins with a separator line that identifies
     72    the message sender, and also identifies the date and time at which
     73    the message was received by the final recipient (either the last-hop
     74    system in the transfer path, or the system which serves as the
     75    recipient's mailstore).  Each message is typically terminated by an
     76    empty line.  The end of the database is usually recognized by either
     77    the absence of any additional data, or by the presence of an explicit
     78    end-of-file marker.
     79 
     80    The structure of the separator lines varies across implementations,
     81    but usually contains the exact character sequence of "From", followed
     82    by a single Space character (0x20), an email address of some kind,
     83    another Space character, a timestamp sequence of some kind, and an
     84    end-of-line marker.  However, due to the lack of any authoritative
     85    specification, each of these attributes is known to vary widely
     86    across implementations.  For example, the email address can reflect
     87    any addressing syntax that has ever been used on any messaging system
     88    in all of history (specifically including address forms that are not
     89    compatible with Internet messages, as defined by RFC 2822 [RFC2822]).
     90    Similarly, the timestamp sequences can also vary according to system
     91    output, while the end-of-line sequences will often reflect platform-
     92    specific requirements.  Different data formats can even appear within
     93    a single database as a result of multiple mbox files being
     94    concatenated together, or because a single file was accessed by
     95    multiple messaging clients, each of which has used its own syntax for
     96    the separator line.
     97 
     98    Message data within mbox databases often reflects site-specific
     99    peculiarities.  For example, it is entirely possible for the message
    100    body or headers in an mbox database to contain untagged eight-bit
    101    character data that implicitly reflects a site-specific default
    102    language or locale, or that reflects local defaults for timestamps
    103    and email addresses; none of this data is widely portable beyond the
    104    local scope.  Similarly, message data can also contain unencoded
    105    eight-bit binary data, or can use encoding formats that represent a
    106    specific platform (e.g., BINHEX or UUENCODE sequences).
    107 
    119    Many implementations are also known to escape message body lines that
    120    begin with the character sequence of "From ", so as to prevent
    121    confusion with overly-liberal parsers that do not search for full
    122    separator lines.  In the common case, a leading Greater-Than symbol
    123    (0x3E) is used for this purpose (with "From " becoming ">From ").
    124    However, other implementations are known not to escape such lines
    125    unless they are immediately preceded by a blank line or if they also
    126    appear to contain an email address and a timestamp.  Other
    127    implementations are also known to perform secondary escapes against
    128    these lines if they are already escaped or quoted, while others
    129    ignore these mechanisms altogether.
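
The escaping variations described above can be illustrated with a minimal sketch.  It is not taken from any particular implementation; the function name and the "secondary" flag are hypothetical.  With the flag off, only lines that begin with "From " are escaped; with it on, lines that are already escaped receive one more Greater-Than symbol.

```python
import re

# Lines that begin with the exact sequence "From " (the common case).
_PLAIN_FROM = re.compile(r"^From ")
# Lines that begin with "From " after zero or more ">" characters
# (used for the "secondary escape" behaviour).
_QUOTED_FROM = re.compile(r"^>*From ")


def escape_body(body: str, secondary: bool = False) -> str:
    """Prefix matching body lines with ">" (hypothetical helper).

    body is assumed to use a single Line-Feed as its end-of-line marker.
    """
    pattern = _QUOTED_FROM if secondary else _PLAIN_FROM
    lines = body.split("\n")
    return "\n".join(">" + ln if pattern.match(ln) else ln for ln in lines)
```

Without the secondary escape, a reader cannot tell whether ">From " was present in the original body or was introduced by the writer, which is one reason the variants do not interoperate cleanly.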
    130 
    131    A comprehensive description of mbox database files on UNIX-like
    132    systems can be found at http://qmail.org./man/man5/mbox.html, which
    133    should be treated as mostly authoritative for those variations that
    134    are otherwise only documented in anecdotal form.  However, readers
    135    are advised that many other platforms and tools make use of mbox
    136    databases, and that there are many more potential variations that can
    137    be encountered in the wild.
    138 
    139    In order to mitigate errors that may arise from such vagaries, this
    140    specification defines a "format" parameter to the application/mbox
    141    media type declaration, which can be used to identify the specific
    142    kind of mbox database that is being transferred.  Furthermore, this
    143    specification defines a "default" database format which MUST be
    144    supported by implementations that claim to be compliant with this
    145    specification, and which is to be used as the implicit format for
    146    undeclared application/mbox data objects.  Additional format types
    147    are to be defined in subsequent specifications.  Messaging systems
    148    that receive an mbox database with an unknown format parameter value
    149    SHOULD treat the data as an opaque binary object, as if the data had
    150    been declared as application/octet-stream.
    151 
    152    Refer to Appendix A for a description of the default mbox format.
    153 
    154    Note that RFC 2046 [RFC2046] defines the multipart/digest media type
    155    for transferring platform-independent message files.  Because that
    156    specification defines a set of neutral and strict formatting rules,
    157    the multipart/digest media type already facilitates highly-
    158    predictable transfer and conversion operations; as such, implementers
    159    are strongly encouraged to support and use that media type where
    160    possible.
    161 
    175 3.  Prerequisites and Terminology
    176 
    177    Readers of this document are expected to be familiar with the
    178    specification for MIME [RFC2045] and MIME-type registrations
    179    [RFC2048].
    180 
    181    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    182    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
    183    document are to be interpreted as described in RFC 2119 [RFC2119].
    184 
    185 4.  The application/mbox Media Type Registration
    186 
    187    This section provides the media type registration application (as per
    188    [RFC2048]).
    189 
    190    MIME media type name: application
    191 
    192    MIME subtype name: mbox
    193 
    194    Required parameters: none
    195 
    196    Optional parameters: The "format" parameter identifies the format of
    197    the mbox database and the messages contained therein.  The default
    198    value for the "format" parameter is "default", and refers to the
    199    formatting rules defined in Appendix A of this memo.  mbox databases
    200    that do not have a "format" parameter SHOULD be interpreted as having
    201    the implicit "format" value of "default".  mbox databases that have
    202    an unknown value for the "format" parameter SHOULD be treated as
    203    opaque data objects, as if the media type had been specified as
    204    application/octet-stream.  Additional values for the format parameter
    205    are to be defined in subsequent specifications, and registered with
    206    IANA.
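
As an illustration of the "format" parameter rules, the following sketch resolves a Content-Type header value into the media type a receiver would act on.  It uses the Python standard library's email package; the function name and the KNOWN_FORMATS set are assumptions made for this example, and the comparison of parameter values is done exactly as written, since this memo does not say whether values are case-sensitive.

```python
from email.message import Message
from email.utils import collapse_rfc2231_value
from typing import Optional, Tuple

# Formats this hypothetical receiver understands; only "default" is
# defined by this memo.
KNOWN_FORMATS = {"default"}


def resolve_mbox_type(content_type: str) -> Tuple[str, Optional[str]]:
    """Return (media type to act on, mbox format or None)."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    if media_type != "application/mbox":
        return media_type, None
    # A missing parameter implies format=default.
    raw = msg.get_param("format", failobj="default")
    fmt = collapse_rfc2231_value(raw)
    if fmt in KNOWN_FORMATS:
        return media_type, fmt
    # Unknown format: treat the object as opaque data.
    return "application/octet-stream", None
```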
    207 
    208    Encoding considerations: If an email client receives an mbox database
    209    as a message attachment, and then stores that attachment within a
    210    local mbox database, the contents of the two database files may
    211    become irreversibly intermingled, such that both databases are
    212    rendered unrecognizable.  In order to avoid these collisions,
    213    messaging systems that support this specification MUST encode an mbox
    214    database (or at a minimum, the separator lines) with non-transparent
    215    transfer encoding (such as BASE64 or Quoted-Printable) whenever an
    216    application/mbox object is transferred via messaging protocols.
    217    Other transfer services are generally encouraged to adopt similar
    218    encoding strategies in order to allow for any subsequent
    219    retransmission that might occur, but this is not a requirement.
    220    Implementers should also be prepared to encode mbox data locally if
    221    non-compliant data is received.
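
One way to satisfy the encoding requirement is sketched below, using the Python standard library's email package.  The function name, subject, body text, and attachment filename are placeholders chosen for this example.  The mbox data is attached with BASE64 transfer encoding, so that no line of the enclosing message can be mistaken for a separator line of the attached database.

```python
from email.message import EmailMessage


def build_message_with_mbox(mbox_bytes: bytes) -> EmailMessage:
    """Wrap an mbox database as a BASE64-encoded attachment (sketch)."""
    msg = EmailMessage()
    msg["Subject"] = "mbox archive"          # placeholder subject
    msg.set_content("The archive is attached.")
    msg.add_attachment(
        mbox_bytes,
        maintype="application",
        subtype="mbox",
        cte="base64",                        # non-transparent encoding
        filename="archive.mbox",             # placeholder filename
        params={"format": "default"},
    )
    return msg
```

Quoted-Printable would also meet the requirement, provided the separator lines themselves are encoded; BASE64 is used here only because it encodes every line unconditionally.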
    222 
    231    Security considerations: mbox data is passive, and does not generally
    232    represent a unique or new security threat.  However, there is risk in
    233    sharing any kind of data, because unintentional information may be
    234    exposed, and this risk certainly applies to mbox data as well.
    235 
    236    Interoperability considerations: Due to the lack of a single
    237    authoritative specification for mbox databases, there are a large
    238    number of variations between database formats (refer to the
    239    introduction text for common examples), and it is expected that non-
    240    conformant data will be erroneously tagged or exchanged.  Although
    241    the "default" format specified in this memo does not allow for these
    242    kinds of vagaries, prior negotiation or agreement between humans may
    243    sometimes be needed.
    244 
    245    Published specification: see Appendix A.
    246 
    247    Applications that use this media type: hundreds of messaging products
    248    make use of the mbox database format, in one form or another.
    249 
    250    Magic number(s): mbox database files can be recognized by having a
    251    leading character sequence of "From", followed by a single Space
    252    character (0x20), followed by additional printable character data
    253    (refer to the description in Appendix A for details).  However,
    254    implementers are cautioned that not all such files will be compliant
    255    with all of the formatting rules; implementers should therefore treat
    256    these files with an appropriate amount of circumspection.
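
A cautious check for the magic number might look like the following sketch.  The function name is hypothetical, and "printable" is read here as the US-ASCII range 0x20 through 0x7E, which is an assumption of this example rather than a definition from this memo.  A positive result only suggests that the data may be an mbox database.

```python
def looks_like_mbox(data: bytes) -> bool:
    """Heuristic test for the leading "From " sequence (sketch)."""
    if not data.startswith(b"From "):
        return False
    # Examine only the remainder of the first line.
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r")
    rest = first_line[len(b"From "):]
    # Require at least one byte, all of them printable US-ASCII.
    return len(rest) > 0 and all(0x20 <= b <= 0x7E for b in rest)
```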
    257 
    258    File extension(s): mbox database files sometimes have an ".mbox"
    259    extension, but this is neither required nor expected.  As with magic
    260    numbers, implementers should avoid reflexive assumptions about the
    261    contents of such files.
    262 
    263    Macintosh File Type Code(s): None are known to be common.
    264 
    265    Person & email address to contact for further information: Eric A.
    266    Hall (ehall@ntrg.com)
    267 
    268    Intended usage: COMMON
    269 
    270 5.  Security Considerations
    271 
    272    See the discussion in section 4.
    273 
    287 6.  IANA Considerations
    288 
    289    The IANA has registered the application/mbox media type in the MIME
    290    registry, using the application provided in section 4 above.
    291 
    292    Furthermore, IANA has established and will maintain a registry of
    293    values for the "format" parameter as described in this memo.  The
    294    first registration is the "default" value, using the description
    295    provided in Appendix A.  Subsequent values for the "format" parameter
    296    MUST be accompanied by some form of recognizable, complete, and
    297    legitimate specification, such as an IESG-approved specification, or
    298    some kind of authoritative vendor documentation.
    299 
    300 7.  Normative References
    301 
    302    [RFC2045]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
    303                Extensions (MIME) Part One: Format of Internet Message
    304                Bodies", RFC 2045, November 1996.
    305 
    306    [RFC2046]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
    307                Extensions (MIME) Part Two: Media Types", RFC 2046,
    308                November 1996.
    309 
    310    [RFC2048]   Freed, N., Klensin, J., and J. Postel, "Multipurpose
    311                Internet Mail Extensions (MIME) Part Four: Registration
    312                Procedures", BCP 13, RFC 2048, November 1996.
    313 
    314    [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
    315                Requirement Levels", BCP 14, RFC 2119, March 1997.
    316 
    317    [RFC2822]   Resnick, P., "Internet Message Format", RFC 2822, April
    318                2001.
    319 
    343 Appendix A.  The "default" mbox Database Format
    344 
    345    In order to improve interoperability among messaging systems, this
    346    memo defines a "default" mbox database format, which MUST be
    347    supported by all implementations that claim to be compliant with this
    348    specification.
    349 
    350    The "default" mbox database format uses a linear sequence of Internet
    351    messages, with each message being immediately prefaced by a separator
    352    line, and being terminated by an empty line.  More specifically:
    353 
    354       o Each message within the database MUST follow the syntax and
    355         formatting rules defined in RFC 2822 [RFC2822] and its related
    356         specifications, with the exception that the canonical mbox
    357         database MUST use a single Line-Feed character (0x0A) as the
    358         end-of-line sequence, and MUST NOT use a Carriage-Return/Line-
    359         Feed pair (NB: this requirement only applies to the canonical
    360         mbox database as transferred, and does not override any other
    361         specifications).  This usage represents the most common
    362         historical representation of the mbox database format, and
    363         allows for the least amount of conversion.
    364 
    365       o Messages within the default mbox database MUST consist of
    366         seven-bit characters within an eight-bit stream.  Eight-bit data
    367         within the stream MUST be converted to a seven-bit form (using
    368         appropriate, standardized encoding) and appropriately tagged
    369         (with the correct header fields) before the database is
    370         transferred.
    371 
    372       o Message headers and data in the default mbox database MUST be
    373         fully-qualified, as per the relevant specification(s).  For
    374         example, email addresses in the various header fields MUST have
    375         legitimate domain names (as per RFC 2822), while extended
    376         characters and encodings MUST be specified in the appropriate
    377         location (as per the appropriate MIME specifications), and so
    378         forth.
    379 
    380       o Each message in the mbox database MUST be immediately preceded
    381         by a single separator line, which MUST conform to the following
    382         syntax:
    383 
    384            The exact character sequence of "From";
    385 
    386            a single Space character (0x20);
    387 
    388            the email address of the message sender (as obtained from the
    389            message envelope or other authoritative source), conformant
    390            with the "addr-spec" syntax from RFC 2822;
    391 
    399            a single Space character;
    400 
    401            a timestamp indicating the UTC date and time when the message
    402            was originally received, conformant with the syntax of the
    403            traditional UNIX 'ctime' output sans timezone (note that the
    404            use of UTC precludes the need for a timezone indicator);
    405 
    406            an end-of-line marker.
    407 
    408       o Each message in the database MUST be terminated by an empty
    409         line, containing a single end-of-line marker.
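
The rules above can be sketched as a small writer.  This is an illustration under stated assumptions, not a reference implementation: the function names and the (sender, received time, message text) tuple layout are invented for the example, the sender is assumed to be a valid addr-spec already, and eight-bit data is rejected rather than converted.

```python
import time
from typing import Iterable, Tuple


def separator_line(sender: str, received_epoch: float) -> str:
    """Build a separator line with a ctime-style UTC timestamp."""
    stamp = time.asctime(time.gmtime(received_epoch))
    return f"From {sender} {stamp}\n"


def serialize_default_mbox(entries: Iterable[Tuple[str, float, str]]) -> bytes:
    """Serialize (sender, received_epoch, message_text) entries."""
    chunks = []
    for sender, received_epoch, message_text in entries:
        # Use a single Line-Feed as the end-of-line sequence.
        text = message_text.replace("\r\n", "\n")
        if not text.endswith("\n"):
            text += "\n"
        # Separator line, the message, then the terminating empty line.
        chunks.append(separator_line(sender, received_epoch) + text + "\n")
    # Raises UnicodeEncodeError if any eight-bit data remains.
    return "".join(chunks).encode("ascii")
```

For example, separator_line("a@example.org", 0) yields the line "From a@example.org Thu Jan  1 00:00:00 1970" followed by a Line-Feed.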
    410 
    411    Note that the first message in an mbox database will only be prefaced
    412    by a separator line, while every other message will begin with two
    413    end-of-line sequences (one at the end of the message itself, and
    414    another to mark the end of the message within the mbox database file
    415    stream) and a separator line (marking the new message).  The end of
    416    the database is implicitly reached when no more message data or
    417    separator lines are found.
    418 
    419    Also note that this specification does not prescribe any escape
    420    syntax for message body lines that begin with the character sequence
    421    of "From ".  Recipient systems are expected to parse full separator
    422    lines as they are documented above.
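
A reader that parses full separator lines, rather than any line beginning with "From ", might be sketched as follows.  The names are hypothetical, and the regular expression is a simplification: it accepts any run of non-space characters containing "@" in place of the full addr-spec grammar, and it checks the shape of the ctime timestamp without validating the date.

```python
import re
from typing import List, Tuple

# "From", Space, address, Space, ctime-style timestamp without timezone.
SEPARATOR = re.compile(
    r"From (\S+@\S+) "
    r"((?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ 0-9][0-9] [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4})"
)


def split_default_mbox(data: str) -> List[Tuple[str, str, str]]:
    """Return (sender, timestamp, message text) for each message."""
    lines = data.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # artifact of the final end-of-line marker
    messages = []
    prev_blank = True  # the start of the data counts as a boundary
    for line in lines:
        # A separator is only recognized directly after an empty line.
        match = SEPARATOR.fullmatch(line) if prev_blank else None
        if match:
            messages.append([match.group(1), match.group(2), []])
        elif messages:
            messages[-1][2].append(line)
        prev_blank = line == ""
    # Joining consumes each message's terminating empty line.
    return [(s, t, "\n".join(body)) for s, t, body in messages]
```

A body line that follows an empty line and happens to match the full separator syntax would still be taken as the start of a new message; in the absence of an escape syntax, that ambiguity cannot be removed by the reader alone.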
    423 
    424 Author's Address
    425 
    426    Eric A. Hall
    427 
    428    EMail: ehall@ntrg.com
    429 
    455 Full Copyright Statement
    456 
    457    Copyright (C) The Internet Society (2005).
    458 
    459    This document is subject to the rights, licenses and restrictions
    460    contained in BCP 78, and except as set forth therein, the authors
    461    retain all their rights.
    462 
    463    This document and the information contained herein are provided on an
    464    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
    465    OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
    466    ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
    467    INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
    468    INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
    469    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
    470 
    471 Intellectual Property
    472 
    473    The IETF takes no position regarding the validity or scope of any
    474    Intellectual Property Rights or other rights that might be claimed to
    475    pertain to the implementation or use of the technology described in
    476    this document or the extent to which any license under such rights
    477    might or might not be available; nor does it represent that it has
    478    made any independent effort to identify any such rights.  Information
    479    on the procedures with respect to rights in RFC documents can be
    480    found in BCP 78 and BCP 79.
    481 
    482    Copies of IPR disclosures made to the IETF Secretariat and any
    483    assurances of licenses to be made available, or the result of an
    484    attempt made to obtain a general license or permission for the use of
    485    such proprietary rights by implementers or users of this
    486    specification can be obtained from the IETF on-line IPR repository at
    487    http://www.ietf.org/ipr.
    488 
    489    The IETF invites any interested party to bring to its attention any
    490    copyrights, patents or patent applications, or other proprietary
    491    rights that may cover technology that may be required to implement
    492    this standard.  Please address the information to the IETF at ietf-
    493    ipr@ietf.org.
    494 
    495 Acknowledgement
    496 
    497    Funding for the RFC Editor function is currently provided by the
    498    Internet Society.