Network Working Group                                            E. Hall
Request for Comments: 4155                                September 2005
Category: Informational


                    The application/mbox Media Type

Status of This Memo

   This memo provides information for the Internet community. It does
   not specify an Internet standard of any kind. Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This memo requests that the application/mbox media type be authorized
   for allocation by the IESG, according to the terms specified in RFC
   2048. This memo also defines a default format for the mbox database,
   which must be supported by all conformant implementations.

1. Background and Overview

   UNIX-like operating systems have historically made widespread use of
   "mbox" database files for a variety of local email purposes. In the
   common case, mbox files store linear sequences of one or more
   electronic mail messages, with local email clients treating the
   database as a logical folder of email messages. mbox databases are
   also used by a variety of other messaging tools, such as mailing list
   management programs, archiving and filtering utilities, messaging
   servers, and other related applications. In recent years, mbox
   databases have also become common on a large number of non-UNIX
   computing platforms, for similar kinds of purposes.

   The increased pervasiveness of these files has led to an increased
   demand for a standardized, network-wide interchange of these files as
   discrete database objects. In turn, this dictates a need for a
   general media type definition for mbox files, which is the subject
   and purpose of this memo.

Hall                         Informational                      [Page 1]

RFC 4155            The application/mbox Media Type       September 2005

2. About the mbox Database

   The mbox database format is not documented in an authoritative
   specification, but instead exists as a well-known output format that
   is anecdotally documented, or which is only authoritatively
   documented for a specific platform or tool.

   mbox databases typically contain a linear sequence of electronic mail
   messages. Each message begins with a separator line that identifies
   the message sender, and also identifies the date and time at which
   the message was received by the final recipient (either the last-hop
   system in the transfer path, or the system which serves as the
   recipient's mailstore). Each message is typically terminated by an
   empty line. The end of the database is usually recognized by either
   the absence of any additional data, or by the presence of an explicit
   end-of-file marker.

   The separator lines vary in structure across implementations, but
   usually contain the exact character sequence of "From", followed by a
   single Space character (0x20), an email address of some kind, another
   Space character, a timestamp sequence of some kind, and an
   end-of-line marker. However, due to the lack of any authoritative
   specification, each of these attributes is known to vary widely
   across implementations. For example, the email address can reflect
   any addressing syntax that has ever been used on any messaging system
   in all of history (specifically including address forms that are not
   compatible with Internet messages, as defined by RFC 2822 [RFC2822]).
   Similarly, the timestamp sequences can also vary according to system
   output, while the end-of-line sequences will often reflect
   platform-specific requirements. Different data formats can even
   appear within a single database as a result of multiple mbox files
   being concatenated together, or because a single file was accessed by
   multiple messaging clients, each of which has used its own syntax for
   the separator line.

   Message data within mbox databases often reflects site-specific
   peculiarities. For example, it is entirely possible for the message
   body or headers in an mbox database to contain untagged eight-bit
   character data that implicitly reflects a site-specific default
   language or locale, or that reflects local defaults for timestamps
   and email addresses; none of this data is widely portable beyond the
   local scope. Similarly, message data can also contain unencoded
   eight-bit binary data, or can use encoding formats that represent a
   specific platform (e.g., BINHEX or UUENCODE sequences).

   Many implementations are also known to escape message body lines that
   begin with the character sequence of "From ", so as to prevent
   confusion with overly-liberal parsers that do not search for full
   separator lines. In the common case, a leading Greater-Than symbol
   (0x3E) is used for this purpose (with "From " becoming ">From ").
   However, other implementations are known not to escape such lines
   unless they are immediately preceded by a blank line or if they also
   appear to contain an email address and a timestamp. Other
   implementations are also known to perform secondary escapes against
   these lines if they are already escaped or quoted, while others
   ignore these mechanisms altogether.
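
   The escaping variations described above can be illustrated with a
   minimal Python sketch. The function names are hypothetical and are
   not taken from this memo or from any particular mail tool; the
   sketch only contrasts the simple ">From " escape with the variant
   that also performs secondary escapes on already-quoted lines.

```python
import re

# Hypothetical helpers for illustration only; real tools differ in detail.
_FROM_LINE = re.compile(r"^From ", re.MULTILINE)
_QUOTED_FROM_LINE = re.compile(r"^(>*)From ", re.MULTILINE)


def escape_simple(body: str) -> str:
    """Prefix '>' to every body line that begins with 'From '."""
    return _FROM_LINE.sub(">From ", body)


def escape_nested(body: str) -> str:
    """Also re-escape already quoted lines ('>From ' becomes '>>From ')."""
    return _QUOTED_FROM_LINE.sub(r">\g<1>From ", body)


def unescape_nested(body: str) -> str:
    """Reverse escape_nested by removing one leading '>' from quoted lines."""
    return re.sub(r"^>(>*From )", r"\g<1>", body, flags=re.MULTILINE)
```

   Under these assumptions, only the nested variant is reversible: a
   body line that already read ">From " before escaping is left
   untouched by escape_simple, so a reader cannot tell whether the
   leading Greater-Than symbol was part of the original message.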

   A comprehensive description of mbox database files on UNIX-like
   systems can be found at http://qmail.org./man/man5/mbox.html, which
   should be treated as mostly authoritative for those variations that
   are otherwise only documented in anecdotal form. However, readers
   are advised that many other platforms and tools make use of mbox
   databases, and that there are many more potential variations that can
   be encountered in the wild.

   In order to mitigate errors that may arise from such vagaries, this
   specification defines a "format" parameter to the application/mbox
   media type declaration, which can be used to identify the specific
   kind of mbox database that is being transferred. Furthermore, this
   specification defines a "default" database format which MUST be
   supported by implementations that claim to be compliant with this
   specification, and which is to be used as the implicit format for
   undeclared application/mbox data objects. Additional format types
   are to be defined in subsequent specifications. Messaging systems
   that receive an mbox database with an unknown format parameter value
   SHOULD treat the data as an opaque binary object, as if the data had
   been declared as application/octet-stream.

   Refer to Appendix A for a description of the default mbox format.

   Note that RFC 2046 [RFC2046] defines the multipart/digest media type
   for transferring platform-independent message files. Because that
   specification defines a set of neutral and strict formatting rules,
   the multipart/digest media type already facilitates
   highly-predictable transfer and conversion operations; as such,
   implementers are strongly encouraged to support and use that media
   type where possible.

3. Prerequisites and Terminology

   Readers of this document are expected to be familiar with the
   specification for MIME [RFC2045] and MIME-type registrations
   [RFC2048].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

4. The application/mbox Media Type Registration

   This section provides the media type registration application (as per
   [RFC2048]).

   MIME media type name: application

   MIME subtype name: mbox

   Required parameters: none

   Optional parameters: The "format" parameter identifies the format of
   the mbox database and the messages contained therein. The default
   value for the "format" parameter is "default", and refers to the
   formatting rules defined in Appendix A of this memo. mbox databases
   that do not have a "format" parameter SHOULD be interpreted as having
   the implicit "format" value of "default". mbox databases that have
   an unknown value for the "format" parameter SHOULD be treated as
   opaque data objects, as if the media type had been specified as
   application/octet-stream. Additional values for the format parameter
   are to be defined in subsequent specifications, and registered with
   IANA.

   Encoding considerations: If an email client receives an mbox database
   as a message attachment, and then stores that attachment within a
   local mbox database, the contents of the two database files may
   become irreversibly intermingled, such that both databases are
   rendered unrecognizable. In order to avoid these collisions,
   messaging systems that support this specification MUST encode an mbox
   database (or at a minimum, the separator lines) with non-transparent
   transfer encoding (such as BASE64 or Quoted-Printable) whenever an
   application/mbox object is transferred via messaging protocols.
   Other transfer services are generally encouraged to adopt similar
   encoding strategies in order to allow for any subsequent
   retransmission that might occur, but this is not a requirement.
   Implementers should also be prepared to encode mbox data locally if
   non-compliant data is received.

   Security considerations: mbox data is passive, and does not generally
   represent a unique or new security threat. However, there is risk in
   sharing any kind of data, because unintentional information may be
   exposed, and this risk certainly applies to mbox data as well.

   Interoperability considerations: Due to the lack of a single
   authoritative specification for mbox databases, there are a large
   number of variations between database formats (refer to the
   introduction text for common examples), and it is expected that
   non-conformant data will be erroneously tagged or exchanged. Although
   the "default" format specified in this memo does not allow for these
   kinds of vagaries, prior negotiation or agreement between humans may
   sometimes be needed.

   Published specification: see Appendix A.

   Applications that use this media type: hundreds of messaging products
   make use of the mbox database format, in one form or another.

   Magic number(s): mbox database files can be recognized by having a
   leading character sequence of "From", followed by a single Space
   character (0x20), followed by additional printable character data
   (refer to the description in Appendix A for details). However,
   implementers are cautioned that not all such files will be compliant
   with all of the formatting rules; implementers should therefore treat
   these files with an appropriate amount of circumspection.

   File extension(s): mbox database files sometimes have an ".mbox"
   extension, but this is neither required nor expected. As with magic
   numbers, implementers should avoid reflexive assumptions about the
   contents of such files.

   Macintosh File Type Code(s): None are known to be common.

   Person & email address to contact for further information: Eric A.
   Hall (ehall@ntrg.com)

   Intended usage: COMMON

5. Security Considerations

   See the discussion in section 4.

6. IANA Considerations

   The IANA has registered the application/mbox media type in the MIME
   registry, using the application provided in section 4 above.

   Furthermore, IANA has established and will maintain a registry of
   values for the "format" parameter as described in this memo. The
   first registration is the "default" value, using the description
   provided in Appendix A. Subsequent values for the "format" parameter
   MUST be accompanied by some form of recognizable, complete, and
   legitimate specification, such as an IESG-approved specification, or
   some kind of authoritative vendor documentation.

7. Normative References

   [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
             Extensions (MIME) Part One: Format of Internet Message
             Bodies", RFC 2045, November 1996.

   [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
             Extensions (MIME) Part Two: Media Types", RFC 2046,
             November 1996.

   [RFC2048] Freed, N., Klensin, J., and J. Postel, "Multipurpose
             Internet Mail Extensions (MIME) Part Four: Registration
             Procedures", BCP 13, RFC 2048, November 1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2822] Resnick, P., "Internet Message Format", RFC 2822, April
             2001.

Appendix A. The "default" mbox Database Format

   In order to improve interoperability among messaging systems, this
   memo defines a "default" mbox database format, which MUST be
   supported by all implementations that claim to be compliant with this
   specification.

   The "default" mbox database format uses a linear sequence of Internet
   messages, with each message being immediately prefaced by a separator
   line, and being terminated by an empty line. More specifically:

      o  Each message within the database MUST follow the syntax and
         formatting rules defined in RFC 2822 [RFC2822] and its related
         specifications, with the exception that the canonical mbox
         database MUST use a single Line-Feed character (0x0A) as the
         end-of-line sequence, and MUST NOT use a
         Carriage-Return/Line-Feed pair (NB: this requirement only
         applies to the canonical mbox database as transferred, and does
         not override any other specifications). This usage represents
         the most common historical representation of the mbox database
         format, and allows for the least amount of conversion.

      o  Messages within the default mbox database MUST consist of
         seven-bit characters within an eight-bit stream. Eight-bit data
         within the stream MUST be converted to a seven-bit form (using
         appropriate, standardized encoding) and appropriately tagged
         (with the correct header fields) before the database is
         transferred.

      o  Message headers and data in the default mbox database MUST be
         fully-qualified, as per the relevant specification(s). For
         example, email addresses in the various header fields MUST have
         legitimate domain names (as per RFC 2822), while extended
         characters and encodings MUST be specified in the appropriate
         location (as per the appropriate MIME specifications), and so
         forth.

      o  Each message in the mbox database MUST be immediately preceded
         by a single separator line, which MUST conform to the following
         syntax:

            The exact character sequence of "From";

            a single Space character (0x20);

            the email address of the message sender (as obtained from the
            message envelope or other authoritative source), conformant
            with the "addr-spec" syntax from RFC 2822;

            a single Space character;

            a timestamp indicating the UTC date and time when the message
            was originally received, conformant with the syntax of the
            traditional UNIX 'ctime' output sans timezone (note that the
            use of UTC precludes the need for a timezone indicator);

            an end-of-line marker.

      o  Each message in the database MUST be terminated by an empty
         line, containing a single end-of-line marker.

   Note that the first message in an mbox database will only be prefaced
   by a separator line, while every other message will begin with two
   end-of-line sequences (one at the end of the message itself, and
   another to mark the end of the message within the mbox database file
   stream) and a separator line (marking the new message). The end of
   the database is implicitly reached when no more message data or
   separator lines are found.

   Also note that this specification does not prescribe any escape
   syntax for message body lines that begin with the character sequence
   of "From ". Recipient systems are expected to parse full separator
   lines as they are documented above.

Author's Address

   Eric A. Hall

   EMail: ehall@ntrg.com

Full Copyright Statement

   Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.
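
   As an illustration of the "default" format defined in Appendix A,
   the following Python sketch recognizes separator lines and splits a
   database into messages. It is not part of the registration. The
   names parse_separator and split_default_mbox are hypothetical, the
   address pattern is a simplified stand-in for the RFC 2822
   "addr-spec" syntax, and the sketch assumes that a separator line is
   only recognized at the start of the data or directly after an empty
   line.

```python
import re
from datetime import datetime, timezone

# Simplified stand-in for addr-spec (assumption: no quoted local parts
# and no domain literals), followed by a ctime-style stamp without zone.
_SEPARATOR = re.compile(
    r"^From (?P<addr>[^\s@]+@[^\s@]+) "
    r"(?P<stamp>[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})$"
)


def parse_separator(line):
    """Return (address, UTC datetime) for a full separator line, else None."""
    m = _SEPARATOR.match(line)
    if m is None:
        return None
    stamp = " ".join(m.group("stamp").split())  # collapse the padded day
    try:
        # %a and %b are locale-dependent; this assumes English names.
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None
    return m.group("addr"), when.replace(tzinfo=timezone.utc)


def _finish(entry):
    addr, when, lines = entry
    # Drop the terminating empty line(s), keep the message's own final LF.
    return addr, when, "\n".join(lines).rstrip("\n") + "\n"


def split_default_mbox(data):
    """Split LF-delimited mbox text into (address, datetime, message)."""
    messages = []
    current = None
    prev_blank = True  # the start of the data counts as a boundary
    for line in data.split("\n"):
        sep = parse_separator(line) if prev_blank else None
        if sep is not None:
            if current is not None:
                messages.append(_finish(current))
            current = (sep[0], sep[1], [])
        elif current is not None:
            current[2].append(line)
        prev_blank = line == ""
    if current is not None:
        messages.append(_finish(current))
    return messages
```

   Because the default format prescribes no escape syntax, a body line
   that happens to match the full separator syntax after an empty line
   would still be treated as a message boundary by this sketch.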