Network Working Group                                           N. Freed
Request for Comments: 2046                                      Innosoft
Obsoletes: 1521, 1522, 1590                                N. Borenstein
Category: Standards Track                                  First Virtual
                                                           November 1996


                 Multipurpose Internet Mail Extensions
                          (MIME) Part Two:
                             Media Types

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Abstract

   STD 11, RFC 822 defines a message representation protocol specifying
   considerable detail about US-ASCII message headers, but leaves the
   message content, or message body, as flat US-ASCII text. This set of
   documents, collectively called the Multipurpose Internet Mail
   Extensions, or MIME, redefines the format of messages to allow for

     (1)   textual message bodies in character sets other than
           US-ASCII,

     (2)   an extensible set of different formats for non-textual
           message bodies,

     (3)   multi-part message bodies, and

     (4)   textual header information in character sets other than
           US-ASCII.

   These documents are based on earlier work documented in RFC 934, STD
   11, and RFC 1049, but extend and revise them. Because RFC 822 said
   so little about message bodies, these documents are largely
   orthogonal to (rather than a revision of) RFC 822.

   The initial document in this set, RFC 2045, specifies the various
   headers used to describe the structure of MIME messages. This second
   document defines the general structure of the MIME media typing
   system and defines an initial set of media types.
   The third document, RFC 2047, describes extensions to RFC 822 to
   allow non-US-ASCII text data in Internet mail header fields. The
   fourth document, RFC 2048, specifies various IANA registration
   procedures for MIME-related facilities. The fifth and final document,
   RFC 2049, describes MIME conformance criteria as well as providing
   some illustrative examples of MIME message formats,
   acknowledgements, and the bibliography.

Freed & Borenstein          Standards Track                     [Page 1]

RFC 2046                      Media Types                  November 1996

   These documents are revisions of RFCs 1521 and 1522, which themselves
   were revisions of RFCs 1341 and 1342. An appendix in RFC 2049
   describes differences and changes from previous versions.

Table of Contents

   1. Introduction
   2. Definition of a Top-Level Media Type
   3. Overview Of The Initial Top-Level Media Types
   4. Discrete Media Type Values
   4.1 Text Media Type
   4.1.1 Representation of Line Breaks
   4.1.2 Charset Parameter
   4.1.3 Plain Subtype
   4.1.4 Unrecognized Subtypes
   4.2 Image Media Type
   4.3 Audio Media Type
   4.4 Video Media Type
   4.5 Application Media Type
   4.5.1 Octet-Stream Subtype
   4.5.2 PostScript Subtype
   4.5.3 Other Application Subtypes
   5. Composite Media Type Values
   5.1 Multipart Media Type
   5.1.1 Common Syntax
   5.1.2 Handling Nested Messages and Multiparts
   5.1.3 Mixed Subtype
   5.1.4 Alternative Subtype
   5.1.5 Digest Subtype
   5.1.6 Parallel Subtype
   5.1.7 Other Multipart Subtypes
   5.2 Message Media Type
   5.2.1 RFC822 Subtype
   5.2.2 Partial Subtype
   5.2.2.1 Message Fragmentation and Reassembly
   5.2.2.2 Fragmentation and Reassembly Example
   5.2.3 External-Body Subtype
   5.2.4 Other Message Subtypes
   6. Experimental Media Type Values
   7. Summary
   8. Security Considerations
   9. Authors' Addresses
   A. Collected Grammar

1.  Introduction

   The first document in this set, RFC 2045, defines a number of header
   fields, including Content-Type. The Content-Type field is used to
   specify the nature of the data in the body of a MIME entity, by
   giving media type and subtype identifiers, and by providing auxiliary
   information that may be required for certain media types. After the
   type and subtype names, the remainder of the header field is simply a
   set of parameters, specified in an attribute/value notation. The
   ordering of parameters is not significant.
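
   As a rough illustration of this structure, the following Python
   sketch splits a Content-Type value into its type, subtype, and
   parameters. It is a simplified, hypothetical helper
   (parse_content_type is not part of any standard API): it accepts
   token and quoted-string parameter values, but not comments, folded
   header lines, or malformed input. Because RFC 2045 states that type,
   subtype, and parameter names are not case sensitive, it lowercases
   those names and leaves parameter values untouched.

```python
import re

# Hypothetical, simplified matcher for '; name=value' pairs, where value
# is either a token or a quoted-string. Not the complete RFC 2045 grammar.
_PARAM = re.compile(r';\s*([^\s=;]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^\s;]+)')


def parse_content_type(value):
    """Split a Content-Type value into (type, subtype, params)."""
    main, _, rest = value.partition(";")
    mtype, _, subtype = main.strip().partition("/")
    params = {}
    for name, raw in _PARAM.findall(";" + rest):
        if raw.startswith('"'):
            # Drop the surrounding quotes and undo backslash escapes.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        # Names are lowercased so lookups are case-insensitive.
        params[name.lower()] = raw
    return mtype.strip().lower(), subtype.strip().lower(), params


if __name__ == "__main__":
    print(parse_content_type('Text/Plain; CHARSET=iso-8859-1; format="flowed"'))
```

   Since dictionary equality ignores insertion order, two headers that
   list the same parameters in a different order produce equal results,
   and a caller can simply skip any parameter names it does not
   recognize.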

   In general, the top-level media type is used to declare the general
   type of data, while the subtype specifies a specific format for that
   type of data. Thus, a media type of "image/xyz" is enough to tell a
   user agent that the data is an image, even if the user agent has no
   knowledge of the specific image format "xyz". Such information can
   be used, for example, to decide whether or not to show a user the raw
   data from an unrecognized subtype -- such an action might be
   reasonable for unrecognized subtypes of "text", but not for
   unrecognized subtypes of "image" or "audio". For this reason,
   registered subtypes of "text", "image", "audio", and "video" should
   not contain embedded information that is really of a different type.
   Such compound formats should be represented using the "multipart" or
   "application" types.

   Parameters are modifiers of the media subtype, and as such do not
   fundamentally affect the nature of the content. The set of
   meaningful parameters depends on the media type and subtype. Most
   parameters are associated with a single specific subtype. However, a
   given top-level media type may define parameters which are applicable
   to any subtype of that type. Parameters may be required by their
   defining media type or subtype or they may be optional. MIME
   implementations must also ignore any parameters whose names they do
   not recognize.

   MIME's Content-Type header field and media type mechanism have been
   carefully designed to be extensible, and it is expected that the set
   of media type/subtype pairs and their associated parameters will grow
   significantly over time. Several other MIME facilities, such as
   transfer encodings and "message/external-body" access types, are
   likely to have new values defined over time.
   In order to ensure that the set of such values is developed in an
   orderly, well-specified, and public manner, MIME sets up a
   registration process which uses the Internet Assigned Numbers
   Authority (IANA) as a central registry for MIME's various areas of
   extensibility. The registration process for these areas is described
   in a companion document, RFC 2048.

   The initial seven standard top-level media types are defined and
   described in the remainder of this document.

2.  Definition of a Top-Level Media Type

   The definition of a top-level media type consists of:

     (1)   a name and a description of the type, including
           criteria for whether a particular type would qualify
           under that type,

     (2)   the names and definitions of parameters, if any, which
           are defined for all subtypes of that type (including
           whether such parameters are required or optional),

     (3)   how a user agent and/or gateway should handle unknown
           subtypes of this type,

     (4)   general considerations on gatewaying entities of this
           top-level type, if any, and

     (5)   any restrictions on content-transfer-encodings for
           entities of this top-level type.

3.  Overview Of The Initial Top-Level Media Types

   The five discrete top-level media types are:

     (1)   text -- textual information. The subtype "plain" in
           particular indicates plain text containing no
           formatting commands or directives of any sort. Plain
           text is intended to be displayed "as-is". No special
           software is required to get the full meaning of the
           text, aside from support for the indicated character
           set.
           Other subtypes are to be used for enriched text in
           forms where application software may enhance the
           appearance of the text, but such software must not be
           required in order to get the general idea of the
           content. Possible subtypes of "text" thus include any
           word processor format that can be read without
           resorting to software that understands the format. In
           particular, formats that employ embedded binary
           formatting information are not considered directly
           readable. A very simple and portable subtype,
           "richtext", was defined in RFC 1341, with a further
           revision in RFC 1896 under the name "enriched".

     (2)   image -- image data. "Image" requires a display device
           (such as a graphical display, a graphics printer, or a
           FAX machine) to view the information. An initial
           subtype is defined for the widely-used image format
           JPEG.

     (3)   audio -- audio data. "Audio" requires an audio output
           device (such as a speaker or a telephone) to "display"
           the contents. An initial subtype "basic" is defined in
           this document.

     (4)   video -- video data. "Video" requires the capability
           to display moving images, typically including
           specialized hardware and software. An initial subtype
           "mpeg" is defined in this document.

     (5)   application -- some other kind of data, typically
           either uninterpreted binary data or information to be
           processed by an application. The subtype
           "octet-stream" is to be used in the case of
           uninterpreted binary data, in which case the simplest
           recommended action is to offer to write the information
           into a file for the user. The "PostScript" subtype is
           also defined for the transport of PostScript material.
           Other expected uses for "application" include
           spreadsheets, data for mail-based scheduling systems,
           languages for "active" (computational) messaging, and
           word processing formats that are not directly readable.
           Note that security considerations may exist for some
           types of application data, most notably
           "application/PostScript" and any form of active
           messaging. These issues are discussed later in this
           document.

   The two composite top-level media types are:

     (1)   multipart -- data consisting of multiple entities of
           independent data types. Four subtypes are initially
           defined, including the basic "mixed" subtype specifying
           a generic mixed set of parts, "alternative" for
           representing the same data in multiple formats,
           "parallel" for parts intended to be viewed
           simultaneously, and "digest" for multipart entities in
           which each part has a default type of "message/rfc822".

     (2)   message -- an encapsulated message. A body of media
           type "message" is itself all or a portion of some kind
           of message object. Such objects may or may not in turn
           contain other entities. The "rfc822" subtype is used
           when the encapsulated content is itself an RFC 822
           message. The "partial" subtype is defined for partial
           RFC 822 messages, to permit the fragmented transmission
           of bodies that are thought to be too large to be passed
           through transport facilities in one piece. Another
           subtype, "external-body", is defined for specifying
           large bodies by reference to an external data source.

   It should be noted that the list of media type values given here may
   be augmented in time, via the mechanisms described above, and that
   the set of subtypes is expected to grow substantially.

4.  Discrete Media Type Values

   Five of the seven initial media type values refer to discrete bodies.
   The content of these types must be handled by non-MIME mechanisms;
   they are opaque to MIME processors.

4.1.  Text Media Type

   The "text" media type is intended for sending material which is
   principally textual in form. A "charset" parameter may be used to
   indicate the character set of the body text for "text" subtypes,
   notably including the subtype "text/plain", which is a generic
   subtype for plain text. Plain text does not provide for or allow
   formatting commands, font attribute specifications, processing
   instructions, interpretation directives, or content markup. Plain
   text is seen simply as a linear sequence of characters, possibly
   interrupted by line breaks or page breaks. Plain text may allow the
   stacking of several characters in the same position in the text.
   Plain text in scripts like Arabic and Hebrew may also include
   facilities that allow the arbitrary mixing of text segments with
   opposite writing directions.

   Beyond plain text, there are many formats for representing what might
   be known as "rich text". An interesting characteristic of many such
   representations is that they are to some extent readable even without
   the software that interprets them. It is useful, then, to
   distinguish them, at the highest level, from such unreadable data as
   images, audio, or text represented in an unreadable form. In the
   absence of appropriate interpretation software, it is reasonable to
   show subtypes of "text" to the user, while it is not reasonable to do
   so with most nontextual data. Such formatted textual data should be
   represented using subtypes of "text".

4.1.1.  Representation of Line Breaks

   The canonical form of any MIME "text" subtype MUST always represent a
   line break as a CRLF sequence. Similarly, any occurrence of CRLF in
   MIME "text" MUST represent a line break. Use of CR and LF outside of
   line break sequences is also forbidden.

   This rule applies regardless of format or character set or sets
   involved.

   NOTE: The proper interpretation of line breaks when a body is
   displayed depends on the media type. In particular, while it is
   appropriate to treat a line break as a transition to a new line when
   displaying a "text/plain" body, this treatment is actually incorrect
   for other subtypes of "text" like "text/enriched" [RFC-1896].
   Similarly, whether or not line breaks should be added during display
   operations is also a function of the media type. It should not be
   necessary to add any line breaks to display "text/plain" correctly,
   whereas proper display of "text/enriched" requires the appropriate
   addition of line breaks.

   NOTE: Some protocols define a maximum line length. E.g. SMTP
   [RFC-821] allows a maximum of 998 octets before the next CRLF
   sequence. To be transported by such protocols, data which includes
   overly long segments without CRLF sequences must be encoded with a
   suitable content-transfer-encoding.

4.1.2.  Charset Parameter

   A critical parameter that may be specified in the Content-Type field
   for "text/plain" data is the character set. This is specified with a
   "charset" parameter, as in:

     Content-type: text/plain; charset=iso-8859-1

   Unlike some other parameter values, the values of the charset
   parameter are NOT case sensitive. The default character set, which
   must be assumed in the absence of a charset parameter, is US-ASCII.
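
   A minimal Python sketch of the rules above, assuming the body is
   already available as a decoded string: converting local line endings
   to canonical CRLF, checking that no bare CR or LF remains, and
   applying the case-insensitive charset default. The function names
   are hypothetical, and treating a lone CR as a local line break is an
   assumption about the sending system, not a requirement of this
   section.

```python
def to_canonical_crlf(text):
    """Rewrite local line breaks as CRLF, the canonical form for text/*."""
    # Assumption: CRLF, a bare LF, or a bare CR each mean "line break"
    # in the local representation. Normalize to LF first, then to CRLF.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n")


def is_canonical(text):
    """True if every CR and every LF in text is part of a CRLF pair."""
    remainder = text.replace("\r\n", "")
    return "\r" not in remainder and "\n" not in remainder


def effective_charset(params):
    """Charset of a text/* body; params maps lowercased names to values."""
    # Charset values are not case sensitive, and US-ASCII is the default.
    return params.get("charset", "us-ascii").lower()
```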

   The specification for any future subtypes of "text" must specify
   whether or not they will also utilize a "charset" parameter, and may
   possibly restrict its values as well. For subtypes of "text" other
   than "text/plain", the semantics of the "charset" parameter should be
   defined to be identical to those specified here for "text/plain",
   i.e., the body consists entirely of characters in the given charset.
   In particular, definers of future "text" subtypes should pay close
   attention to the implications of multioctet character sets for their
   subtype definitions.

   The charset parameter for subtypes of "text" gives a name of a
   character set, as "character set" is defined in RFC 2045. The rules
   regarding line breaks detailed in the previous section must also be
   observed -- a character set whose definition does not conform to
   these rules cannot be used in a MIME "text" subtype.

   An initial list of predefined character set names can be found at the
   end of this section. Additional character sets may be registered
   with IANA.

   Media types other than subtypes of "text" might choose to employ the
   charset parameter as defined here, but with the CRLF/line break
   restriction removed. Therefore, all character sets that conform to
   the general definition of "character set" in RFC 2045 can be
   registered for MIME use.

   Note that if the specified character set includes 8-bit characters
   and such characters are used in the body, a Content-Transfer-Encoding
   header field and a corresponding encoding on the data are required in
   order to transmit the body via some mail transfer protocols, such as
   SMTP [RFC-821].

   The default character set, US-ASCII, has been the subject of some
   confusion and ambiguity in the past.
   Not only were there some ambiguities in the definition, there have
   been wide variations in practice. In order to eliminate such
   ambiguity and variations in the future, it is strongly recommended
   that new user agents explicitly specify a character set as a media
   type parameter in the Content-Type header field. "US-ASCII" does not
   indicate an arbitrary 7-bit character set, but specifies that all
   octets in the body must be interpreted as characters according to
   the US-ASCII character set. National and application-oriented
   versions of ISO 646 [ISO-646] are usually NOT identical to US-ASCII,
   and in that case their use in Internet mail is explicitly
   discouraged. The omission of the ISO 646 character set from this
   document is deliberate in this regard. The character set name of
   "US-ASCII" explicitly refers to the character set defined in ANSI
   X3.4-1986 [US-ASCII]. The new international reference version (IRV)
   of the 1991 edition of ISO 646 is identical to US-ASCII. The
   character set name "ASCII" is reserved and must not be used for any
   purpose.

   NOTE: RFC 821 explicitly specifies "ASCII", and references an earlier
   version of the American Standard. Insofar as one of the purposes of
   specifying a media type and character set is to permit the receiver
   to unambiguously determine how the sender intended the coded message
   to be interpreted, assuming anything other than "strict ASCII" as the
   default would risk unintentional and incompatible changes to the
   semantics of messages now being transmitted.
   This also implies that
   messages containing characters coded according to versions of ISO
   646 other than US-ASCII and the 1991 IRV, or using code-switching
   procedures (e.g., those of ISO 2022), as well as 8bit or multiple
   octet character encodings, MUST use an appropriate character set
   specification to be consistent with MIME.

   The complete US-ASCII character set is listed in ANSI X3.4-1986.
   Note that the control characters including DEL (0-31, 127) have no
   defined meaning apart from the combination CRLF (US-ASCII values 13
   and 10) indicating a new line. Two of the characters have de facto
   meanings in wide use: FF (12) often means "start subsequent text on
   the beginning of a new page"; and TAB or HT (9) often (though not
   always) means "move the cursor to the next available column after
   the current position where the column number is a multiple of 8
   (counting the first column as column 0)." Aside from these
   conventions, any use of the control characters or DEL in a body must
   either occur

     (1)   because a subtype of text other than "plain"
           specifically assigns some additional meaning, or

     (2)   within the context of a private agreement between the
           sender and recipient. Such private agreements are
           discouraged and should be replaced by the other
           capabilities of this document.

   NOTE: An enormous proliferation of character sets exists beyond
   US-ASCII. A large number of partially or totally overlapping
   character sets is NOT a good thing. A SINGLE character set that can
   be used universally for representing all of the world's languages in
   Internet mail would be preferable. Unfortunately, existing practice
   in several communities seems to point to the continued use of
   multiple character sets in the near future.
   A small number of standard character sets are, therefore, defined
   for Internet use in this document.

   The defined charset values are:

     (1)   US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].

     (2)   ISO-8859-X -- where "X" is to be replaced, as
           necessary, for the parts of ISO-8859 [ISO-8859]. Note
           that the ISO 646 character sets have deliberately been
           omitted in favor of their 8859 replacements, which are
           the designated character sets for Internet mail. As of
           the publication of this document, the legitimate values
           for "X" are the numbers 1 through 10.

   Characters in the range 128-159 have no assigned meaning in
   ISO-8859-X. Characters with values below 128 in ISO-8859-X have the
   same assigned meaning as they do in US-ASCII.

   Part 6 of ISO 8859 (Latin/Arabic alphabet) and part 8 (Latin/Hebrew
   alphabet) include both characters for which the normal writing
   direction is right to left and characters for which it is left to
   right, but do not define a canonical ordering method for representing
   bi-directional text. The charset values "ISO-8859-6" and
   "ISO-8859-8", however, specify that the visual method is used
   [RFC-1556].

   All of these character sets are used as pure 7bit or 8bit sets
   without any shift or escape functions. The meaning of shift and
   escape sequences in these character sets is not defined.

   The character sets specified above are the ones that were relatively
   uncontroversial during the drafting of MIME. This document does not
   endorse the use of any particular character set other than US-ASCII,
   and recognizes that the future evolution of world character sets
   remains unclear.

   Note that the character set used, if anything other than US-ASCII,
   must always be explicitly specified in the Content-Type field.
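
   One way a composer might apply these rules is to try US-ASCII first,
   fall back to an ISO-8859-X part only when the body needs it, and
   always write the chosen charset explicitly. The Python sketch below
   assumes a string body; the candidate list, its order, and the
   function names are hypothetical local policy, not something this
   document specifies. The second helper reflects the earlier note that
   8-bit octets need a Content-Transfer-Encoding for transports such as
   SMTP.

```python
# Hypothetical preference order: US-ASCII first, then a few ISO-8859-X
# parts. Which parts to try, and in what order, is a local policy choice.
CANDIDATES = ("us-ascii", "iso-8859-1", "iso-8859-2", "iso-8859-5")


def label_text(body):
    """Return (Content-Type value, encoded bytes) for a str body."""
    for charset in CANDIDATES:
        try:
            data = body.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the body; try the next
        # The charset is stated explicitly, even when it is US-ASCII.
        return "text/plain; charset=" + charset, data
    raise ValueError("no candidate charset can represent this text")


def needs_transfer_encoding(data):
    """True if any octet is >= 128, so 7-bit transports need a CTE."""
    return any(octet >= 128 for octet in data)
```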

   No character set name other than those defined above may be used in
   Internet mail without the publication of a formal specification and
   its registration with IANA, or by private agreement, in which case
   the character set name must begin with "X-".

   Implementors are discouraged from defining new character sets unless
   absolutely necessary.

   The "charset" parameter has been defined primarily for the purpose of
   textual data, and is described in this section for that reason.
   However, it is conceivable that non-textual data might also wish to
   specify a charset value for some purpose, in which case the same
   syntax and values should be used.

   In general, composition software should always use the "lowest common
   denominator" character set possible. For example, if a body contains
   only US-ASCII characters, it SHOULD be marked as being in the
   US-ASCII character set, not ISO-8859-1, which, like all the ISO-8859
   family of character sets, is a superset of US-ASCII. More generally,
   if a widely-used character set is a subset of another character set,
   and a body contains only characters in the widely-used subset, it
   should be labelled as being in that subset. This will increase the
   chances that the recipient will be able to view the resulting entity
   correctly.

4.1.3.  Plain Subtype

   The simplest and most important subtype of "text" is "plain". This
   indicates plain text that does not contain any formatting commands or
   directives. Plain text is intended to be displayed "as-is", that is,
   no interpretation of embedded formatting commands, font attribute
   specifications, processing instructions, interpretation directives,
   or content markup should be necessary for proper display.
   The default media type of "text/plain; charset=us-ascii" for Internet
   mail describes existing Internet practice. That is, it is the type
   of body defined by RFC 822.

   No other "text" subtype is defined by this document.

4.1.4.  Unrecognized Subtypes

   Unrecognized subtypes of "text" should be treated as subtype "plain"
   as long as the MIME implementation knows how to handle the charset.
   Unrecognized subtypes which also specify an unrecognized charset
   should be treated as "application/octet-stream".

4.2.  Image Media Type

   A media type of "image" indicates that the body contains an image.
   The subtype names the specific image format. These names are not
   case sensitive. An initial subtype is "jpeg" for the JPEG format
   using JFIF encoding [JPEG].

   The list of "image" subtypes given here is neither exclusive nor
   exhaustive, and is expected to grow as more types are registered with
   IANA, as described in RFC 2048.

   Unrecognized subtypes of "image" should at a minimum be treated as
   "application/octet-stream". Implementations may optionally elect to
   pass subtypes of "image" that they do not specifically recognize to a
   secure and robust general-purpose image viewing application, if such
   an application is available.

   NOTE: Using a general-purpose image viewing application in this way
   inherits the security problems of the most dangerous type supported
   by the application.

4.3.  Audio Media Type

   A media type of "audio" indicates that the body contains audio data.
   Although there is not yet a consensus on an "ideal" audio format for
   use with computers, there is a pressing need for a format capable of
   providing interoperable behavior.
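
   The fallback rules given above for unrecognized "text" and "image"
   subtypes (which "audio" and "video" share, as described in their own
   sections) amount to a small dispatch step. The Python sketch below
   assumes the type, subtype, and lowercased parameter names have
   already been parsed; the SUPPORTED and KNOWN_CHARSETS tables are
   hypothetical stand-ins for one reader's capabilities. It takes only
   the minimum action for unknown non-text subtypes and does not hand
   them to an external viewer.

```python
# Hypothetical capabilities of one reader: the (type, subtype) pairs it
# can display natively, and the charsets it can decode.
SUPPORTED = {("text", "plain"), ("image", "jpeg"),
             ("audio", "basic"), ("video", "mpeg")}
KNOWN_CHARSETS = {"us-ascii", "iso-8859-1"}


def effective_type(mtype, subtype, params):
    """Map a discrete media type to the type this reader will treat it as."""
    mtype, subtype = mtype.lower(), subtype.lower()  # names are case-insensitive
    if (mtype, subtype) in SUPPORTED:
        return mtype + "/" + subtype
    if mtype == "text":
        # Unrecognized text subtype: show it as plain text if the charset
        # (US-ASCII when absent) is one this reader can decode.
        if params.get("charset", "us-ascii").lower() in KNOWN_CHARSETS:
            return "text/plain"
    # Anything else receives at least the "application/octet-stream"
    # treatment, such as offering to save the data to a file.
    return "application/octet-stream"
```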

   The initial subtype "basic" is specified to meet the need for an
   interoperable format by providing an absolutely minimal lowest common
   denominator audio format. It is expected that richer formats for
   higher quality and/or lower bandwidth audio will be defined by a
   later document.

   The content of the "audio/basic" subtype is single channel audio
   encoded using 8bit ISDN mu-law [PCM] at a sample rate of 8000 Hz.

   Unrecognized subtypes of "audio" should at a minimum be treated as
   "application/octet-stream". Implementations may optionally elect to
   pass subtypes of "audio" that they do not specifically recognize to a
   robust general-purpose audio playing application, if such an
   application is available.

4.4.  Video Media Type

   A media type of "video" indicates that the body contains a
   time-varying-picture image, possibly with color and coordinated
   sound. The term 'video' is used in its most generic sense, rather
   than with reference to any particular technology or format, and is
   not meant to preclude subtypes such as animated drawings encoded
   compactly. The subtype "mpeg" refers to video coded according to the
   MPEG standard [MPEG].

   Note that although in general this document strongly discourages the
   mixing of multiple media in a single body, it is recognized that many
   so-called video formats include a representation for synchronized
   audio, and this is explicitly permitted for subtypes of "video".

   Unrecognized subtypes of "video" should at a minimum be treated as
   "application/octet-stream". Implementations may optionally elect to
   pass subtypes of "video" that they do not specifically recognize to a
   robust general-purpose video display application, if such an
   application is available.

4.5.  Application Media Type

   The "application" media type is to be used for discrete data which do
   not fit in any of the other categories, and particularly for data to
   be processed by some type of application program. This is
   information which must be processed by an application before it is
   viewable or usable by a user. Expected uses for the "application"
   media type include file transfer, spreadsheets, data for mail-based
   scheduling systems, and languages for "active" (computational)
   material. (The latter, in particular, can pose security problems
   which must be understood by implementors, and are considered in
   detail in the discussion of the "application/PostScript" media type.)

   For example, a meeting scheduler might define a standard
   representation for information about proposed meeting dates. An
   intelligent user agent would use this information to conduct a dialog
   with the user, and might then send additional material based on that
   dialog. More generally, there have been several "active" messaging
   languages developed in which programs in a suitably specialized
   language are transported to a remote location and automatically run
   in the recipient's environment.

   Such applications may be defined as subtypes of the "application"
   media type. This document defines two subtypes:

     octet-stream, and PostScript.

   The subtype of "application" will often either be the name of the
   application for which the data are intended or include part of that
   name. This does not mean, however, that any application program name
   may be used freely as a subtype of "application".

4.5.1.  Octet-Stream Subtype

   The "octet-stream" subtype is used to indicate that a body contains
   arbitrary binary data.
   The set of currently defined parameters is:

   (1)  TYPE -- the general type or category of binary data.
        This is intended as information for the human recipient
        rather than for any automatic processing.

   (2)  PADDING -- the number of bits of padding that were
        appended to the bit-stream comprising the actual
        contents to produce the enclosed 8bit byte-oriented
        data. This is useful for enclosing a bit-stream in a
        body when the total number of bits is not a multiple of
        8.

   Both of these parameters are optional.

   An additional parameter, "CONVERSIONS", was defined in RFC 1341 but
   has since been removed. RFC 1341 also defined the use of a "NAME"
   parameter which gave a suggested file name to be used if the data
   were to be written to a file. This has been deprecated in
   anticipation of a separate Content-Disposition header field, to be
   defined in a subsequent RFC.

   The recommended action for an implementation that receives an
   "application/octet-stream" entity is to simply offer to put the data
   in a file, with any Content-Transfer-Encoding undone, or perhaps to
   use it as input to a user-specified process.

   To reduce the danger of transmitting rogue programs, it is strongly
   recommended that implementations NOT implement a path-search
   mechanism whereby an arbitrary program named in the Content-Type
   parameter (e.g., an "interpreter=" parameter) is found and executed
   using the message body as input.

4.5.2.  PostScript Subtype

   A media type of "application/postscript" indicates a PostScript
   program. Currently two variants of the PostScript language are
   allowed; the original level 1 variant is described in [POSTSCRIPT]
   and the more recent level 2 variant is described in [POSTSCRIPT2].
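
   As a minimal illustration of the level 1 / level 2 distinction, the
   following Python sketch guesses which language level a PostScript
   body targets without executing it. It assumes the body follows the
   document structuring conventions (DSC) discussed below, in particular
   the DSC header comment "%%LanguageLevel:", and treats a body that
   lacks that comment as level 1. The function name
   guess_language_level() is hypothetical, and the header scan is a
   deliberate simplification of the DSC rules.

```python
import re
from typing import Optional

# DSC header comment declaring the language level, e.g. "%%LanguageLevel: 2".
_LEVEL_RE = re.compile(rb"^%%LanguageLevel:\s*(\d+)\s*$")


def guess_language_level(body: bytes) -> Optional[int]:
    """Hypothetical helper: guess the PostScript level a body targets.

    Returns None if the body does not start with the "%!" magic, the
    declared level if a %%LanguageLevel: comment appears among the header
    comments, and 1 otherwise (the original level is assumed).
    """
    lines = body.splitlines()  # handles both CRLF and LF line endings
    if not lines or not lines[0].startswith(b"%!"):
        return None
    for line in lines[1:]:
        # Simplification: treat the header as ending at %%EndComments or
        # at the first line that is not a comment at all.
        if line.startswith(b"%%EndComments") or not line.startswith(b"%"):
            break
        match = _LEVEL_RE.match(line)
        if match:
            return int(match.group(1))
    return 1
```

   Such a check is advisory only: it reads comments, runs no PostScript,
   and says nothing about whether the program is safe to interpret.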

   PostScript is a registered trademark of Adobe Systems, Inc. Use of
   the MIME media type "application/postscript" implies recognition of
   that trademark and all the rights it entails.

   The PostScript language definition provides facilities for internal
   labelling of the specific language features a given program uses.
   This labelling, called the PostScript document structuring
   conventions, or DSC, is very general and provides substantially more
   information than just the language level. The use of document
   structuring conventions, while not required, is strongly recommended
   as an aid to interoperability. Documents which lack proper
   structuring conventions cannot be tested to see whether or not they
   will work in a given environment. As such, some systems may assume
   the worst and refuse to process unstructured documents.

   The execution of general-purpose PostScript interpreters entails
   serious security risks, and implementors are discouraged from simply
   sending PostScript bodies to "off-the-shelf" interpreters. While it
   is usually safe to send PostScript to a printer, where the potential
   for harm is greatly constrained by typical printer environments,
   implementors should consider all of the following before they add
   interactive display of PostScript bodies to their MIME readers.

   The remainder of this section outlines some, though probably not all,
   of the possible problems with the transport of PostScript entities.

   (1)  Dangerous operations in the PostScript language
        include, but may not be limited to, the PostScript
        operators "deletefile", "renamefile", "filenameforall",
        and "file". "File" is only dangerous when applied to
        something other than standard input or output.
        Implementations may also define additional nonstandard
        file operators; these may also pose a threat to
        security. "Filenameforall", the wildcard file search
        operator, may appear at first glance to be harmless.
        Note, however, that this operator has the potential to
        reveal information about what files the recipient has
        access to, and this information may itself be
        sensitive. Message senders should avoid the use of
        potentially dangerous file operators, since these
        operators are quite likely to be unavailable in secure
        PostScript implementations. Message receiving and
        displaying software should either completely disable
        all potentially dangerous file operators or take
        special care not to delegate any special authority to
        their operation. These operators should be viewed as
        being done by an outside agency when interpreting
        PostScript documents. Such disabling and/or checking
        should be done completely outside of the reach of the
        PostScript language itself; care should be taken to
        ensure that no method exists for re-enabling
        full-function versions of these operators.

   (2)  The PostScript language provides facilities for exiting
        the normal interpreter, or server, loop. Changes made
        in this "outer" environment are customarily retained
        across documents, and may in some cases be retained
        semipermanently in nonvolatile memory. The operators
        associated with exiting the interpreter loop have the
        potential to interfere with subsequent document
        processing. As such, their unrestrained use
        constitutes a threat of service denial. PostScript
        operators that exit the interpreter loop include, but
        may not be limited to, the "exitserver" and "startjob"
        operators.
        Message sending software should not
        generate PostScript that depends on exiting the
        interpreter loop to operate, since the ability to exit
        will probably be unavailable in secure PostScript
        implementations. Message receiving and displaying
        software should completely disable the ability to make
        retained changes to the PostScript environment by
        eliminating or disabling the "startjob" and
        "exitserver" operations. If these operations cannot be
        eliminated or completely disabled, the password
        associated with them should at least be set to a
        hard-to-guess value.

   (3)  PostScript provides operators for setting system-wide
        and device-specific parameters. These parameter
        settings may be retained across jobs and may
        potentially pose a threat to the correct operation of
        the interpreter. The PostScript operators that set
        system and device parameters include, but may not be
        limited to, the "setsystemparams" and "setdevparams"
        operators. Message sending software should not
        generate PostScript that depends on the setting of
        system or device parameters to operate correctly. The
        ability to set these parameters will probably be
        unavailable in secure PostScript implementations.
        Message receiving and displaying software should
        disable the ability to change system and device
        parameters. If these operators cannot be completely
        disabled, the password associated with them should at
        least be set to a hard-to-guess value.

   (4)  Some PostScript implementations provide nonstandard
        facilities for the direct loading and execution of
        machine code. Such facilities are quite obviously open
        to substantial abuse. Message sending software should
        not make use of such features.
        Besides being totally
        hardware-specific, they are also likely to be
        unavailable in secure implementations of PostScript.
        Message receiving and displaying software should not
        allow such operators to be used if they exist.

   (5)  PostScript is an extensible language, and many, if not
        most, implementations of it provide a number of their
        own extensions. This document does not deal with such
        extensions explicitly since they constitute an unknown
        factor. Message sending software should not make use
        of nonstandard extensions; they are likely to be
        missing from some implementations. Message receiving
        and displaying software should make sure that any
        nonstandard PostScript operators are secure and don't
        present any kind of threat.

   (6)  It is possible to write PostScript that consumes huge
        amounts of various system resources. It is also
        possible to write PostScript programs that loop
        indefinitely. Both types of programs have the
        potential to cause damage if sent to unsuspecting
        recipients. Message-sending software should avoid the
        construction and dissemination of such programs, which
        is antisocial. Message receiving and displaying
        software should provide appropriate mechanisms to abort
        processing after a reasonable amount of time has
        elapsed. In addition, PostScript interpreters should be
        limited to the consumption of only a reasonable amount
        of any given system resource.

   (7)  It is possible to include raw binary information inside
        PostScript in various forms. This is not recommended
        for use in Internet mail, both because it is not
        supported by all PostScript interpreters and because it
        significantly complicates the use of a MIME
        Content-Transfer-Encoding.
        (Without such binary, PostScript
        may typically be viewed as line-oriented data. The
        treatment of CRLF sequences becomes extremely
        problematic if binary and line-oriented data are mixed
        in a single PostScript data stream.)

   (8)  Finally, bugs may exist in some PostScript interpreters
        which could possibly be exploited to gain unauthorized
        access to a recipient's system. Apart from noting this
        possibility, there is no specific action to take to
        prevent this, other than the timely correction of such
        bugs if any are found.

4.5.3.  Other Application Subtypes

   It is expected that many other subtypes of "application" will be
   defined in the future. MIME implementations must at a minimum treat
   any unrecognized subtypes as being equivalent to
   "application/octet-stream".

5.  Composite Media Type Values

   The remaining two of the seven initial Content-Type values refer to
   composite entities. Composite entities are handled using MIME
   mechanisms -- a MIME processor typically handles the body directly.

5.1.  Multipart Media Type

   In the case of multipart entities, in which one or more different
   sets of data are combined in a single body, a "multipart" media type
   field must appear in the entity's header. The body must then contain
   one or more body parts, each preceded by a boundary delimiter line,
   and the last one followed by a closing boundary delimiter line.
   After its boundary delimiter line, each body part then consists of a
   header area, a blank line, and a body area. Thus a body part is
   similar to an RFC 822 message in syntax, but different in meaning.

   A body part is an entity and hence is NOT to be interpreted as
   actually being an RFC 822 message. To begin with, NO header fields
   are actually required in body parts.
   A body part that starts with a
   blank line, therefore, is allowed and is a body part for which all
   default values are to be assumed. In such a case, the absence of a
   Content-Type header usually indicates that the corresponding body has
   a content-type of "text/plain; charset=US-ASCII".

   The only header fields that have defined meaning for body parts are
   those the names of which begin with "Content-". All other header
   fields may be ignored in body parts. Although they should generally
   be retained if at all possible, they may be discarded by gateways if
   necessary. Such other fields are permitted to appear in body parts
   but must not be depended on. "X-" fields may be created for
   experimental or private purposes, with the recognition that the
   information they contain may be lost at some gateways.

   NOTE: The distinction between an RFC 822 message and a body part is
   subtle, but important. A gateway between Internet and X.400 mail,
   for example, must be able to tell the difference between a body part
   that contains an image and a body part that contains an encapsulated
   message, the body of which is a JPEG image. In order to represent
   the latter, the body part must have "Content-Type: message/rfc822",
   and its body (after the blank line) must be the encapsulated message,
   with its own "Content-Type: image/jpeg" header field. The use of
   similar syntax facilitates the conversion of messages to body parts,
   and vice versa, but the distinction between the two must be
   understood by implementors. (For the special case in which parts
   actually are messages, a "digest" subtype is also defined.)

   As stated previously, each body part is preceded by a boundary
   delimiter line that contains the boundary delimiter.
   The boundary
   delimiter MUST NOT appear inside any of the encapsulated parts, on a
   line by itself or as the prefix of any line. This implies that it is
   crucial that the composing agent be able to choose and specify a
   unique boundary parameter value that does not contain the boundary
   parameter value of an enclosing multipart as a prefix.

   All present and future subtypes of the "multipart" type must use an
   identical syntax. Subtypes may differ in their semantics, and may
   impose additional restrictions on syntax, but must conform to the
   required syntax for the "multipart" type. This requirement ensures
   that all conformant user agents will at least be able to recognize
   and separate the parts of any multipart entity, even those of an
   unrecognized subtype.

   As stated in the definition of the Content-Transfer-Encoding field
   [RFC 2045], no encoding other than "7bit", "8bit", or "binary" is
   permitted for entities of type "multipart". The "multipart" boundary
   delimiters and header fields are always represented as 7bit US-ASCII
   in any case (though the header fields may encode non-US-ASCII header
   text as per RFC 2047) and data within the body parts can be encoded
   on a part-by-part basis, with Content-Transfer-Encoding fields for
   each appropriate body part.

5.1.1.  Common Syntax

   This section defines a common syntax for subtypes of "multipart".
   All subtypes of "multipart" must use this syntax. A simple example
   of a multipart message also appears in this section. An example of a
   more complex multipart message is given in RFC 2049.

   The Content-Type field for multipart entities requires one parameter,
   "boundary".
   The boundary delimiter line is then defined as a line
   consisting entirely of two hyphen characters ("-", decimal value 45)
   followed by the boundary parameter value from the Content-Type header
   field, optional linear whitespace, and a terminating CRLF.

   NOTE: The hyphens are for rough compatibility with the earlier RFC
   934 method of message encapsulation, and for ease of searching for
   the boundaries in some implementations. However, it should be noted
   that multipart messages are NOT completely compatible with RFC 934
   encapsulations; in particular, they do not obey RFC 934 quoting
   conventions for embedded lines that begin with hyphens. This
   mechanism was chosen over the RFC 934 mechanism because the latter
   causes lines to grow with each level of quoting. The combination of
   this growth with the fact that SMTP implementations sometimes wrap
   long lines made the RFC 934 mechanism unsuitable for use in the event
   that deeply-nested multipart structuring is ever desired.

   WARNING TO IMPLEMENTORS: The grammar for parameters on the
   Content-type field is such that it is often necessary to enclose the
   boundary parameter values in quotes on the Content-type line. This is
   not always necessary, but never hurts. Implementors should be sure to
   study the grammar carefully in order to avoid producing invalid
   Content-type fields.
   Thus, a typical "multipart" Content-Type header
   field might look like this:

     Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p

   But the following is not valid:

     Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p

   (because of the colon) and must instead be represented as

     Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p"

   This Content-Type value indicates that the content consists of one or
   more parts, each with a structure that is syntactically identical to
   an RFC 822 message, except that the header area is allowed to be
   completely empty, and that the parts are each preceded by the line

     --gc0pJq0M:08jU534c0p

   The boundary delimiter MUST occur at the beginning of a line, i.e.,
   following a CRLF, and the initial CRLF is considered to be attached
   to the boundary delimiter line rather than part of the preceding
   part. The boundary may be followed by zero or more characters of
   linear whitespace. It is then terminated by either another CRLF and
   the header fields for the next part, or by two CRLFs, in which case
   there are no header fields for the next part. If no Content-Type
   field is present, it is assumed to be "message/rfc822" in a
   "multipart/digest" and "text/plain" otherwise.

   NOTE: The CRLF preceding the boundary delimiter line is conceptually
   attached to the boundary so that it is possible to have a part that
   does not end with a CRLF (line break). Body parts that must be
   considered to end with line breaks, therefore, must have two CRLFs
   preceding the boundary delimiter line, the first of which is part of
   the preceding body part, and the second of which is part of the
   encapsulation boundary.
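
   The delimiter rules above can be sketched in Python. The function
   split_multipart() below is hypothetical and is a simplified sketch,
   not a complete MIME parser: it follows the multipart-body grammar
   given later in this section, treats the CRLF before each boundary
   line as part of the delimiter (so a part may end without a line
   break), tolerates transport padding (spaces or tabs) after the
   boundary, and returns the preamble and epilogue separately so that a
   caller can ignore them. It does not parse part headers, recurse into
   nested multiparts, or apply the prefix-only boundary comparison
   described in the note to implementors; it expects a boundary line to
   hold only the boundary, an optional closing "--", and padding.

```python
import re
from typing import List, Tuple


def split_multipart(body: bytes, boundary: bytes) -> Tuple[bytes, List[bytes], bytes]:
    """Split a multipart body into (preamble, parts, epilogue).

    Hypothetical helper. Each part is returned raw: its (possibly empty)
    header area, the blank line, and its body. The CRLF that precedes a
    delimiter line belongs to the delimiter, not to the preceding part.
    """
    # (start of body | CRLF) "--" boundary ["--"] *(SP / HTAB) (CRLF | end)
    delimiter = re.compile(
        rb"(?:\A|\r\n)" + re.escape(b"--" + boundary) + rb"(--)?[ \t]*(?:\r\n|\Z)"
    )
    matches = list(delimiter.finditer(body))
    if not matches:
        raise ValueError("no boundary delimiter line found")

    preamble = body[: matches[0].start()]
    parts: List[bytes] = []
    epilogue = b""
    for i, match in enumerate(matches):
        if match.group(1):  # "--boundary--" is the close-delimiter
            epilogue = body[match.end():]
            break
        # A part runs up to the next delimiter or, if the entity was
        # truncated and has no close-delimiter, to the end of the body.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        parts.append(body[match.end():end])
    return preamble, parts, epilogue
```

   Applied to the "simple boundary" example later in this section (with
   CRLF line endings), this sketch yields two parts: the first begins
   with a CRLF (an empty header area) and does not end with a line
   break, while the second ends with one.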

   Boundary delimiters must not appear within the encapsulated material,
   and must be no longer than 70 characters, not counting the two
   leading hyphens.

   The boundary delimiter line following the last body part is a
   distinguished delimiter that indicates that no further body parts
   will follow. Such a delimiter line is identical to the previous
   delimiter lines, with the addition of two more hyphens after the
   boundary parameter value.

     --gc0pJq0M:08jU534c0p--

   NOTE TO IMPLEMENTORS: Boundary string comparisons must compare the
   boundary value with the beginning of each candidate line. An exact
   match of the entire candidate line is not required; it is sufficient
   that the boundary appear in its entirety following the CRLF.

   There appears to be room for additional information prior to the
   first boundary delimiter line and following the final boundary
   delimiter line. These areas should generally be left blank, and
   implementations must ignore anything that appears before the first
   boundary delimiter line or after the last one.

   NOTE: These "preamble" and "epilogue" areas are generally not used
   because of the lack of proper typing of these parts and the lack of
   clear semantics for handling these areas at gateways, particularly
   X.400 gateways. However, rather than leaving the preamble area
   blank, many MIME implementations have found this to be a convenient
   place to insert an explanatory note for recipients who read the
   message with pre-MIME software, since such notes will be ignored by
   MIME-compliant software.

   NOTE: Because boundary delimiters must not appear in the body parts
   being encapsulated, a user agent must exercise care to choose a
   unique boundary parameter value.
   The boundary parameter value in the
   example above could have been the result of an algorithm designed to
   produce boundary delimiters with a very low probability of already
   existing in the data to be encapsulated without having to prescan the
   data. Alternate algorithms might result in more "readable" boundary
   delimiters for a recipient with an old user agent, but would require
   more attention to the possibility that the boundary delimiter might
   appear at the beginning of some line in the encapsulated part. The
   simplest boundary delimiter line possible is something like "---",
   with a closing boundary delimiter line of "-----".

   As a very simple example, the following multipart message has two
   parts, both of them plain text, one of them explicitly typed and one
   of them implicitly typed:

     From: Nathaniel Borenstein <nsb@bellcore.com>
     To: Ned Freed <ned@innosoft.com>
     Date: Sun, 21 Mar 1993 23:56:48 -0800 (PST)
     Subject: Sample message
     MIME-Version: 1.0
     Content-type: multipart/mixed; boundary="simple boundary"

     This is the preamble. It is to be ignored, though it
     is a handy place for composition agents to include an
     explanatory note to non-MIME conformant readers.

     --simple boundary

     This is implicitly typed plain US-ASCII text.
     It does NOT end with a linebreak.
     --simple boundary
     Content-type: text/plain; charset=us-ascii

     This is explicitly typed plain US-ASCII text.
     It DOES end with a linebreak.

     --simple boundary--

     This is the epilogue. It is also to be ignored.

   The use of a media type of "multipart" in a body part within another
   "multipart" entity is explicitly allowed.
   In such cases, for obvious
   reasons, care must be taken to ensure that each nested "multipart"
   entity uses a different boundary delimiter. See RFC 2049 for an
   example of nested "multipart" entities.

   The use of the "multipart" media type with only a single body part
   may be useful in certain contexts, and is explicitly permitted.

   NOTE: Experience has shown that a "multipart" media type with a
   single body part is useful for sending non-text media types. It has
   the advantage of providing the preamble as a place to include
   decoding instructions. In addition, a number of SMTP gateways move
   or remove the MIME headers, and a clever MIME decoder can take a good
   guess at multipart boundaries even in the absence of the Content-Type
   header and thereby successfully decode the message.

   The only mandatory global parameter for the "multipart" media type is
   the boundary parameter, which consists of 1 to 70 characters from a
   set of characters known to be very robust through mail gateways, and
   NOT ending with white space. (If a boundary delimiter line appears to
   end with white space, the white space must be presumed to have been
   added by a gateway, and must be deleted.) It is formally specified
   by the following BNF:

     boundary := 0*69<bchars> bcharsnospace

     bchars := bcharsnospace / " "

     bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
                      "+" / "_" / "," / "-" / "." /
                      "/" / ":" / "=" / "?"

   Overall, the body of a "multipart" entity may be specified as
   follows:

     dash-boundary := "--" boundary
                      ; boundary taken from the value of
                      ; boundary parameter of the
                      ; Content-Type field.

     multipart-body := [preamble CRLF]
                       dash-boundary transport-padding CRLF
                       body-part *encapsulation
                       close-delimiter transport-padding
                       [CRLF epilogue]

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

     encapsulation := delimiter transport-padding
                      CRLF body-part

     delimiter := CRLF dash-boundary

     close-delimiter := delimiter "--"

     preamble := discard-text

     epilogue := discard-text

     discard-text := *(*text CRLF) *text
                     ; May be ignored or discarded.

     body-part := MIME-part-headers [CRLF *OCTET]
                  ; Lines in a body-part must not start
                  ; with the specified dash-boundary and
                  ; the delimiter must not appear anywhere
                  ; in the body part. Note that the
                  ; semantics of a body-part differ from
                  ; the semantics of a message, as
                  ; described in the text.

     OCTET := <any 0-255 octet value>

   IMPORTANT: The free insertion of linear-white-space and RFC 822
   comments between the elements shown in this BNF is NOT allowed since
   this BNF does not specify a structured header field.

   NOTE: In certain transport enclaves, RFC 822 restrictions such as
   the one that limits bodies to printable US-ASCII characters may not
   be in force. (That is, transport domains may exist that resemble
   standard Internet mail transport as specified in RFC 821 and assumed
   by RFC 822, but without certain restrictions.)
   The relaxation of
   these restrictions should be construed as locally extending the
   definition of bodies, for example to include octets outside of the
   US-ASCII range, as long as these extensions are supported by the
   transport and adequately documented in the Content-Transfer-Encoding
   header field. However, in no event are headers (either message
   headers or body part headers) allowed to contain anything other than
   US-ASCII characters.

   NOTE: Conspicuously missing from the "multipart" type is a notion of
   structured, related body parts. It is recommended that those wishing
   to provide more structured or integrated multipart messaging
   facilities define subtypes of multipart that are syntactically
   identical but define relationships between the various parts. For
   example, subtypes of multipart could be defined that include a
   distinguished part which in turn is used to specify the relationships
   between the other parts, probably referring to them by their
   Content-ID field. Old implementations will not recognize the new
   subtype if this approach is used, but will treat it as
   multipart/mixed and will thus be able to show the user the parts that
   are recognized.

5.1.2.  Handling Nested Messages and Multiparts

   The "message/rfc822" subtype defined in a subsequent section of this
   document has no terminating condition other than running out of data.
   Similarly, an improperly truncated "multipart" entity may not have
   any terminating boundary marker, and can turn up operationally due to
   mail system malfunctions.

   It is essential that such entities be handled correctly when they are
   themselves embedded inside of another "multipart" structure.
   MIME
   implementations are therefore required to recognize outer level
   boundary markers at ANY level of inner nesting. It is not sufficient
   to only check for the next expected marker or other terminating
   condition.

5.1.3.  Mixed Subtype

   The "mixed" subtype of "multipart" is intended for use when the body
   parts are independent and need to be bundled in a particular order.
   Any "multipart" subtypes that an implementation does not recognize
   must be treated as being of subtype "mixed".

5.1.4.  Alternative Subtype

   The "multipart/alternative" type is syntactically identical to
   "multipart/mixed", but the semantics are different. In particular,
   each of the body parts is an "alternative" version of the same
   information.

   Systems should recognize that the content of the various parts is
   interchangeable. Systems should choose the "best" type based on the
   local environment and preferences, in some cases even through user
   interaction. As with "multipart/mixed", the order of body parts is
   significant. In this case, the alternatives appear in an order of
   increasing faithfulness to the original content. In general, the
   best choice is the LAST part of a type supported by the recipient
   system's local environment.

   "Multipart/alternative" may be used, for example, to send a message
   in a fancy text format in such a way that it can easily be displayed
   anywhere:

     From: Nathaniel Borenstein <nsb@bellcore.com>
     To: Ned Freed <ned@innosoft.com>
     Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST)
     Subject: Formatted text mail
     MIME-Version: 1.0
     Content-Type: multipart/alternative; boundary=boundary42

     --boundary42
     Content-Type: text/plain; charset=us-ascii

     ... plain text version of message goes here ...

     --boundary42
     Content-Type: text/enriched

     ... RFC 1896 text/enriched version of same message
        goes here ...

     --boundary42
     Content-Type: application/x-whatever

     ... fanciest version of same message goes here ...

     --boundary42--

   In this example, users whose mail systems understood the
   "application/x-whatever" format would see only the fancy version,
   while other users would see only the enriched or plain text version,
   depending on the capabilities of their system.

   In general, user agents that compose "multipart/alternative" entities
   must place the body parts in increasing order of preference, that is,
   with the preferred format last. For fancy text, the sending user
   agent should put the plainest format first and the richest format
   last. Receiving user agents should pick and display the last format
   they are capable of displaying. In the case where one of the
   alternatives is itself of type "multipart" and contains unrecognized
   sub-parts, the user agent may choose either to show that alternative,
   an earlier alternative, or both.

   NOTE: From an implementor's perspective, it might seem more sensible
   to reverse this ordering, and have the plainest alternative last.
   However, placing the plainest alternative first is the friendliest
   possible option when "multipart/alternative" entities are viewed
   using a non-MIME-conformant viewer. While this approach does impose
   some burden on conformant MIME viewers, interoperability with older
   mail readers was deemed to be more important in this case.
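
   The receiving rule above (display the last alternative the local
   environment can handle) can be sketched in Python as follows. The
   function name choose_alternative() and the can_display callback,
   which stands in for whatever capability check a real user agent
   performs, are hypothetical. Parameters on each Content-Type value are
   ignored, a part with no Content-Type field is assumed to be passed in
   as "text/plain", and nested "multipart" alternatives are not examined.

```python
from typing import Callable, Optional, Sequence


def choose_alternative(content_types: Sequence[str],
                       can_display: Callable[[str], bool]) -> Optional[int]:
    """Return the index of the last displayable alternative, or None.

    content_types lists each part's Content-Type value in message order,
    so the plainest alternative comes first and the preferred one last.
    """
    # Walk backwards: the first supported type found is the LAST one.
    for index in range(len(content_types) - 1, -1, -1):
        # Compare only "type/subtype"; media type names are case-insensitive.
        media_type = content_types[index].split(";", 1)[0].strip().lower()
        if can_display(media_type):
            return index
    return None
```

   For the three parts of the example above, a reader that understands
   only "text/plain" and "text/enriched" would be given index 1, the
   text/enriched version.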

   It may be the case that some user agents, if they can recognize more
   than one of the formats, will prefer to offer the user the choice of
   which format to view. This makes sense, for example, if a message
   includes both a nicely-formatted image version and an easily-edited
   text version. What is most critical, however, is that the user not
   automatically be shown multiple versions of the same data. The user
   should either be shown the last recognized version or be given the
   choice.

   THE SEMANTICS OF CONTENT-ID IN MULTIPART/ALTERNATIVE: Each part of a
   "multipart/alternative" entity represents the same data, but the
   mappings between the two are not necessarily without information
   loss. For example, information is lost when translating ODA to
   PostScript or plain text. It is recommended that each part should
   have a different Content-ID value in the case where the information
   content of the two parts is not identical. And when the information
   content is identical -- for example, where several parts of type
   "message/external-body" specify alternate ways to access the
   identical data -- the same Content-ID field value should be used, to
   optimize any caching mechanisms that might be present on the
   recipient's end. However, the Content-ID values used by the parts
   should NOT be the same Content-ID value that describes the
   "multipart/alternative" as a whole, if there is any such Content-ID
   field. That is, one Content-ID value will refer to the
   "multipart/alternative" entity, while one or more other Content-ID
   values will refer to the parts inside it.

5.1.5.  Digest Subtype

   This document defines a "digest" subtype of the "multipart"
   Content-Type. This type is syntactically identical to
   "multipart/mixed", but the semantics are different.
In particular, in a digest, the default 1447 Content-Type value for a body part is changed from "text/plain" to 1448 "message/rfc822". This is done to allow a more readable digest 1449 format that is largely compatible (except for the quoting convention) 1450 with RFC 934. 1451 1452 Note: Though it is possible to specify a Content-Type value for a 1453 body part in a digest which is other than "message/rfc822", such as a 1454 "text/plain" part containing a description of the material in the 1455 1456 1457 1458 Freed & Borenstein Standards Track [Page 26] 1459 1460 RFC 2046 Media Types November 1996 1461 1462 1463 digest, actually doing so is undesireble. The "multipart/digest" 1464 Content-Type is intended to be used to send collections of messages. 1465 If a "text/plain" part is needed, it should be included as a seperate 1466 part of a "multipart/mixed" message. 1467 1468 A digest in this format might, then, look something like this: 1469 1470 From: Moderator-Address 1471 To: Recipient-List 1472 Date: Mon, 22 Mar 1994 13:34:51 +0000 1473 Subject: Internet Digest, volume 42 1474 MIME-Version: 1.0 1475 Content-Type: multipart/mixed; 1476 boundary="---- main boundary ----" 1477 1478 ------ main boundary ---- 1479 1480 ...Introductory text or table of contents... 1481 1482 ------ main boundary ---- 1483 Content-Type: multipart/digest; 1484 boundary="---- next message ----" 1485 1486 ------ next message ---- 1487 1488 From: someone-else 1489 Date: Fri, 26 Mar 1993 11:13:32 +0200 1490 Subject: my opinion 1491 1492 ...body goes here ... 1493 1494 ------ next message ---- 1495 1496 From: someone-else-again 1497 Date: Fri, 26 Mar 1993 10:07:13 -0500 1498 Subject: my different opinion 1499 1500 ... another body goes here ... 1501 1502 ------ next message ------ 1503 1504 ------ main boundary ------ 1505 1506 5.1.6. Parallel Subtype 1507 1508 This document defines a "parallel" subtype of the "multipart" 1509 Content-Type. 
This type is syntactically identical to 1510 "multipart/mixed", but the semantics are different. In particular, 1511 1512 1513 1514 Freed & Borenstein Standards Track [Page 27] 1515 1516 RFC 2046 Media Types November 1996 1517 1518 1519 in a parallel entity, the order of body parts is not significant. 1520 1521 A common presentation of this type is to display all of the parts 1522 simultaneously on hardware and software that are capable of doing so. 1523 However, composing agents should be aware that many mail readers will 1524 lack this capability and will show the parts serially in any event. 1525 1526 5.1.7. Other Multipart Subtypes 1527 1528 Other "multipart" subtypes are expected in the future. MIME 1529 implementations must in general treat unrecognized subtypes of 1530 "multipart" as being equivalent to "multipart/mixed". 1531 1532 5.2. Message Media Type 1533 1534 It is frequently desirable, in sending mail, to encapsulate another 1535 mail message. A special media type, "message", is defined to 1536 facilitate this. In particular, the "rfc822" subtype of "message" is 1537 used to encapsulate RFC 822 messages. 1538 1539 NOTE: It has been suggested that subtypes of "message" might be 1540 defined for forwarded or rejected messages. However, forwarded and 1541 rejected messages can be handled as multipart messages in which the 1542 first part contains any control or descriptive information, and a 1543 second part, of type "message/rfc822", is the forwarded or rejected 1544 message. Composing rejection and forwarding messages in this manner 1545 will preserve the type information on the original message and allow 1546 it to be correctly presented to the recipient, and hence is strongly 1547 encouraged. 1548 1549 Subtypes of "message" often impose restrictions on what encodings are 1550 allowed. These restrictions are described in conjunction with each 1551 specific subtype. 
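   The NOTE above recommends sending a forwarded message as a multipart
   whose second part is "message/rfc822". One way to build that shape
   with Python's standard email package is sketched below. forward is a
   hypothetical helper, the "Fwd: " subject prefix is an assumption of
   this example rather than anything this document specifies, and the
   choice of "message/rfc822" for a message object comes from the
   library's default content manager.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def forward(original_text: str, note: str) -> EmailMessage:
    """Build a forwarding message shaped as the NOTE in 5.2 recommends."""
    # Parse the message being forwarded; it is carried unchanged.
    original = message_from_string(original_text, policy=policy.default)

    wrapper = EmailMessage()
    # Hypothetical subject convention; nothing in MIME requires it.
    wrapper["Subject"] = "Fwd: " + str(original.get("Subject", ""))
    # First part: descriptive text (text/plain). set_content also adds
    # "MIME-Version: 1.0" to the outer message.
    wrapper.set_content(note)
    # add_attachment turns the message into multipart/mixed, moves the
    # text into the first part, and appends the original as message/rfc822.
    wrapper.add_attachment(original)
    return wrapper


if __name__ == "__main__":
    raw = (
        "From: someone-else\n"
        "Date: Fri, 26 Mar 1993 11:13:32 +0200\n"
        "Subject: my opinion\n"
        "\n"
        "...body goes here ...\n"
    )
    fwd = forward(raw, "The message below is forwarded unchanged.")
    print([part.get_content_type() for part in fwd.iter_parts()])
```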

   Mail gateways, relays, and other mail handling agents are commonly
   known to alter the top-level header of an RFC 822 message. In
   particular, they frequently add, remove, or reorder header fields.
   These operations are explicitly forbidden for the encapsulated
   headers embedded in the bodies of messages of type "message."

5.2.1. RFC822 Subtype

   A media type of "message/rfc822" indicates that the body contains an
   encapsulated message, with the syntax of an RFC 822 message.
   However, unlike top-level RFC 822 messages, the restriction that each
   "message/rfc822" body must include a "From", "Date", and at least one
   destination header is removed and replaced with the requirement that
   at least one of "From", "Subject", or "Date" must be present.

   It should be noted that, despite the use of the numbers "822", a
   "message/rfc822" entity isn't restricted to material in strict
   conformance to RFC822, nor are the semantics of "message/rfc822"
   objects restricted to the semantics defined in RFC822. More
   specifically, a "message/rfc822" message could well be a News article
   or a MIME message.

   No encoding other than "7bit", "8bit", or "binary" is permitted for
   the body of a "message/rfc822" entity. The message header fields are
   always US-ASCII in any case, and data within the body can still be
   encoded, in which case the Content-Transfer-Encoding header field in
   the encapsulated message will reflect this. Non-US-ASCII text in the
   headers of an encapsulated message can be specified using the
   mechanisms described in RFC 2047.

5.2.2. Partial Subtype

   The "partial" subtype is defined to allow large entities to be
   delivered as several separate pieces of mail and automatically
   reassembled by a receiving user agent. (The concept is similar to IP
   fragmentation and reassembly in the basic Internet Protocols.) This
   mechanism can be used when intermediate transport agents limit the
   size of individual messages that can be sent. The media type
   "message/partial" thus indicates that the body contains a fragment of
   a larger entity.

   Because data of type "message" may never be encoded in base64 or
   quoted-printable, a problem might arise if "message/partial" entities
   are constructed in an environment that supports binary or 8bit
   transport. The problem is that the binary data would be split into
   multiple "message/partial" messages, each of them requiring binary
   transport. If such messages were encountered at a gateway into a
   7bit transport environment, there would be no way to properly encode
   them for the 7bit world, aside from waiting for all of the fragments,
   reassembling the inner message, and then encoding the reassembled
   data in base64 or quoted-printable. Since it is possible that
   different fragments might go through different gateways, even this is
   not an acceptable solution. For this reason, it is specified that
   entities of type "message/partial" must always have a
   content-transfer-encoding of 7bit (the default). In particular, even
   in environments that support binary or 8bit transport, the use of a
   content-transfer-encoding of "8bit" or "binary" is explicitly
   prohibited for MIME entities of type "message/partial". This in turn
   implies that the inner message must not use "8bit" or "binary"
   encoding.
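   The encoding restrictions of 5.2.1 and 5.2.2 amount to a small lookup
   table. The Python sketch below uses hypothetical names (cte_permitted,
   ALLOWED_CTE). It treats a missing Content-Transfer-Encoding as "7bit",
   compares names case-insensitively, and deliberately accepts anything
   for types it does not list; that last choice is a simplification of
   the sketch, not a rule from this document.

```python
from typing import Optional

# Content-Transfer-Encoding values permitted for the body of message/*
# entities, as stated in 5.2.1 ("message/rfc822") and 5.2.2
# ("message/partial"). Other media types are not checked by this sketch.
ALLOWED_CTE = {
    "message/rfc822": frozenset({"7bit", "8bit", "binary"}),
    "message/partial": frozenset({"7bit"}),
}


def cte_permitted(media_type: str, cte: Optional[str] = None) -> bool:
    """Report whether `cte` may be used for an entity of `media_type`.

    A missing Content-Transfer-Encoding header means "7bit", the default.
    Media type and encoding names are compared case-insensitively.
    """
    encoding = (cte or "7bit").strip().lower()
    allowed = ALLOWED_CTE.get(media_type.strip().lower())
    if allowed is None:
        return True  # no rule modelled for this type in this sketch
    return encoding in allowed


if __name__ == "__main__":
    print(cte_permitted("Message/Partial", "8bit"))  # prints False
```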

   Because some message transfer agents may choose to automatically
   fragment large messages, and because such agents may use very
   different fragmentation thresholds, it is possible that the pieces of
   a partial message, upon reassembly, may prove themselves to comprise
   a partial message. This is explicitly permitted.

   Three parameters must be specified in the Content-Type field of type
   "message/partial": The first, "id", is a unique identifier, as close
   to a world-unique identifier as possible, to be used to match the
   fragments together. (In general, the identifier is essentially a
   message-id; if placed in double quotes, it can be ANY message-id, in
   accordance with the BNF for "parameter" given in RFC 2045.) The
   second, "number", an integer, is the fragment number, which indicates
   where this fragment fits into the sequence of fragments. The third,
   "total", another integer, is the total number of fragments. This
   third subfield is required on the final fragment, and is optional
   (though encouraged) on the earlier fragments. Note also that these
   parameters may be given in any order.

   Thus, the second piece of a 3-piece message may have either of the
   following header fields:

     Content-Type: Message/Partial; number=2; total=3;
                   id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

     Content-Type: Message/Partial;
                   id="oc=jpbe0M2Yt4s@thumper.bellcore.com";
                   number=2

   But the third piece MUST specify the total number of fragments:

     Content-Type: Message/Partial; number=3; total=3;
                   id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

   Note that fragment numbering begins with 1, not 0.
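   A receiver needs the "id", "number", and "total" parameters to group
   and order fragments. The Python sketch below reads them with the
   standard library's email.message.Message.get_param, which also strips
   the quotes around a value such as the "id" above. partial_params and
   all_fragments_present are hypothetical helpers, and storing fragments
   grouped by "id" is left out.

```python
from email.message import Message
from typing import Iterable, Optional, Tuple


def partial_params(content_type: str) -> Tuple[str, int, Optional[int]]:
    """Return (id, number, total) from a message/partial Content-Type value.

    total is None when the fragment omits it, which is allowed on every
    fragment except the final one.
    """
    holder = Message()
    holder["Content-Type"] = content_type
    # get_content_type() lower-cases the type, so "Message/Partial" matches.
    if holder.get_content_type() != "message/partial":
        raise ValueError("not a message/partial Content-Type")
    frag_id = holder.get_param("id")  # surrounding quotes are removed
    number = holder.get_param("number")
    total = holder.get_param("total")
    if frag_id is None or number is None:
        raise ValueError("id and number are required on every fragment")
    return frag_id, int(number), (int(total) if total is not None else None)


def all_fragments_present(fragments: Iterable[Tuple[int, Optional[int]]]) -> bool:
    """Given (number, total) pairs for one id, report whether all arrived."""
    seen = set()
    total = None
    for number, frag_total in fragments:
        if number < 1:
            raise ValueError("fragment numbering begins with 1, not 0")
        seen.add(number)
        if frag_total is not None:
            total = frag_total
    # Without a "total" (required on the last fragment) we cannot be done.
    return total is not None and seen == set(range(1, total + 1))
```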

   When the fragments of an entity broken up in this manner are put
   together, the result is always a complete MIME entity, which may have
   its own Content-Type header field, and thus may contain any other
   data type.

5.2.2.1. Message Fragmentation and Reassembly

   The semantics of a reassembled partial message must be those of the
   "inner" message, rather than of a message containing the inner
   message. This makes it possible, for example, to send a large audio
   message as several partial messages, and still have it appear to the
   recipient as a simple audio message rather than as an encapsulated
   message containing an audio message. That is, the encapsulation of
   the message is considered to be "transparent".

   When generating and reassembling the pieces of a "message/partial"
   message, the headers of the encapsulated message must be merged with
   the headers of the enclosing entities. In this process the following
   rules must be observed:

     (1)   Fragmentation agents must split messages at line
           boundaries only. This restriction is imposed because
           splits at points other than the ends of lines in turn
           depend on message transports being able to preserve
           the semantics of messages that don't end with a CRLF
           sequence. Many transports are incapable of preserving
           such semantics.

     (2)   All of the header fields from the initial enclosing
           message, except those that start with "Content-" and
           the specific header fields "Subject", "Message-ID",
           "Encrypted", and "MIME-Version", must be copied, in
           order, to the new message.

     (3)   The header fields in the enclosed message which start
           with "Content-", plus the "Subject", "Message-ID",
           "Encrypted", and "MIME-Version" fields, must be
           appended, in order, to the header fields of the new
           message. Any header fields in the enclosed message
           which do not start with "Content-" (except for the
           "Subject", "Message-ID", "Encrypted", and
           "MIME-Version" fields) will be ignored and dropped.

     (4)   All of the header fields from the second and any
           subsequent enclosing messages are discarded by the
           reassembly process.

5.2.2.2. Fragmentation and Reassembly Example

   If an audio message is broken into two pieces, the first piece might
   look something like this:

     X-Weird-Header-1: Foo
     From: Bill@host.com
     To: joe@otherhost.com
     Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
     Subject: Audio mail (part 1 of 2)
     Message-ID: <id1@host.com>
     MIME-Version: 1.0
     Content-type: message/partial; id="ABC@host.com";
                   number=1; total=2

     X-Weird-Header-1: Bar
     X-Weird-Header-2: Hello
     Message-ID: <anotherid@foo.com>
     Subject: Audio mail
     MIME-Version: 1.0
     Content-type: audio/basic
     Content-transfer-encoding: base64

     ... first half of encoded audio data goes here ...

   and the second half might look something like this:

     From: Bill@host.com
     To: joe@otherhost.com
     Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
     Subject: Audio mail (part 2 of 2)
     MIME-Version: 1.0
     Message-ID: <id2@host.com>
     Content-type: message/partial;
                   id="ABC@host.com"; number=2; total=2

     ... second half of encoded audio data goes here ...
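   Rules (2) through (4) can be expressed as a filter over ordered
   (name, value) pairs. In the Python sketch below, merge_headers and
   join_bodies are hypothetical helpers; the sketch assumes the enclosed
   message's header fields have already been parsed out of the
   reassembled body, which it does not do itself.

```python
from typing import Dict, List, Tuple

Header = Tuple[str, str]

# Field names that rules (2) and (3) take from the enclosed message,
# compared case-insensitively, in addition to every "Content-" field.
_INNER_NAMES = {"subject", "message-id", "encrypted", "mime-version"}


def _from_inner(name: str) -> bool:
    lowered = name.lower()
    return lowered.startswith("content-") or lowered in _INNER_NAMES


def merge_headers(first_outer: List[Header], inner: List[Header]) -> List[Header]:
    """Rule (2): copy, in order, the first fragment's outer fields that are
    not Content-*/Subject/Message-ID/Encrypted/MIME-Version.  Rule (3):
    append, in order, exactly those kinds of fields from the enclosed
    message and drop its other fields.  Rule (4): headers of later
    fragments are simply never passed in."""
    merged = [h for h in first_outer if not _from_inner(h[0])]
    merged.extend(h for h in inner if _from_inner(h[0]))
    return merged


def join_bodies(bodies: Dict[int, str]) -> str:
    """Concatenate fragment bodies in fragment-number order; this assumes,
    per rule (1), that every fragment was split at a line boundary."""
    return "".join(bodies[n] for n in sorted(bodies))
```

   Applied to the two pieces above, the merged header keeps
   X-Weird-Header-1: Foo, From, To, and Date from the first fragment and
   takes Message-ID, Subject, MIME-Version, and the Content- fields from
   the enclosed message. Because rule (3) appends fields in their
   original order, Message-ID precedes Subject in this sketch's output.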

   Then, when the fragmented message is reassembled, the resulting
   message to be displayed to the user should look something like this:

     X-Weird-Header-1: Foo
     From: Bill@host.com
     To: joe@otherhost.com
     Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
     Subject: Audio mail
     Message-ID: <anotherid@foo.com>
     MIME-Version: 1.0
     Content-type: audio/basic
     Content-transfer-encoding: base64

     ... first half of encoded audio data goes here ...
     ... second half of encoded audio data goes here ...

   The inclusion of a "References" field in the headers of the second
   and subsequent pieces of a fragmented message that references the
   Message-Id on the previous piece may be of benefit to mail readers
   that understand and track references. However, the generation of
   such "References" fields is entirely optional.

   Finally, it should be noted that the "Encrypted" header field has
   been made obsolete by Privacy Enhanced Messaging (PEM) [RFC-1421,
   RFC-1422, RFC-1423, RFC-1424], but the rules above are nevertheless
   believed to describe the correct way to treat it if it is encountered
   in the context of conversion to and from "message/partial" fragments.

5.2.3. External-Body Subtype

   The external-body subtype indicates that the actual body data are not
   included, but merely referenced. In this case, the parameters
   describe a mechanism for accessing the external data.

   When a MIME entity is of type "message/external-body", it consists of
   a header, two consecutive CRLFs, and the message header for the
   encapsulated message. If another pair of consecutive CRLFs appears,
   this of course ends the message header for the encapsulated message.
   However, since the encapsulated message's body is itself external, it
   does NOT appear in the area that follows. For example, consider the
   following message:

     Content-type: message/external-body;
                   access-type=local-file;
                   name="/u/nsb/Me.jpeg"

     Content-type: image/jpeg
     Content-ID: <id42@guppylake.bellcore.com>
     Content-Transfer-Encoding: binary

     THIS IS NOT REALLY THE BODY!

   The area at the end, which might be called the "phantom body", is
   ignored for most external-body messages. However, it may be used to
   contain auxiliary information for some such messages, as indeed it is
   when the access-type is "mail-server". The only access-type defined
   in this document that uses the phantom body is "mail-server", but
   other access-types may be defined in the future in other
   specifications that use this area.

   The encapsulated headers in ALL "message/external-body" entities MUST
   include a Content-ID header field to give a unique identifier by
   which to reference the data. This identifier may be used for caching
   mechanisms, and for recognizing the receipt of the data when the
   access-type is "mail-server".

   Note that, as specified here, the tokens that describe external-body
   data, such as file names and mail server commands, are required to be
   in the US-ASCII character set.

   If this proves problematic in practice, a new mechanism may be
   required as a future extension to MIME, either as newly defined
   access-types for "message/external-body" or by some other mechanism.

   As with "message/partial", MIME entities of type
   "message/external-body" MUST have a content-transfer-encoding of 7bit
   (the default). In particular, even in environments that support
   binary or 8bit transport, the use of a content-transfer-encoding of
   "8bit" or "binary" is explicitly prohibited for entities of type
   "message/external-body".

5.2.3.1. General External-Body Parameters

   The parameters that may be used with any "message/external-body"
   are:

     (1)   ACCESS-TYPE -- A word indicating the supported access
           mechanism by which the file or data may be obtained.
           This word is not case sensitive. Values include, but
           are not limited to, "FTP", "ANON-FTP", "TFTP",
           "LOCAL-FILE", and "MAIL-SERVER". Future values, except
           for experimental values beginning with "X-", must be
           registered with IANA, as described in RFC 2048.
           This parameter is unconditionally mandatory and MUST be
           present on EVERY "message/external-body".

     (2)   EXPIRATION -- The date (in the RFC 822 "date-time"
           syntax, as extended by RFC 1123 to permit 4 digits in
           the year field) after which the existence of the
           external data is not guaranteed. This parameter may be
           used with ANY access-type and is ALWAYS optional.

     (3)   SIZE -- The size (in octets) of the data. The intent
           of this parameter is to help the recipient decide
           whether or not to expend the necessary resources to
           retrieve the external data. Note that this describes
           the size of the data in its canonical form, that is,
           before any Content-Transfer-Encoding has been applied
           or after the data have been decoded. This parameter
           may be used with ANY access-type and is ALWAYS
           optional.

     (4)   PERMISSION -- A case-insensitive field that indicates
           whether or not it is expected that clients might also
           attempt to overwrite the data. By default, or if
           permission is "read", the assumption is that they are
           not, and that if the data is retrieved once, it is
           never needed again. If PERMISSION is "read-write",
If PERMISSION is "read-write", 1903 1904 1905 1906 Freed & Borenstein Standards Track [Page 34] 1907 1908 RFC 2046 Media Types November 1996 1909 1910 1911 this assumption is invalid, and any local copy must be 1912 considered no more than a cache. "Read" and "Read- 1913 write" are the only defined values of permission. This 1914 parameter may be used with ANY access-type and is 1915 ALWAYS optional. 1916 1917 The precise semantics of the access-types defined here are described 1918 in the sections that follow. 1919 1920 5.2.3.2. The 'ftp' and 'tftp' Access-Types 1921 1922 An access-type of FTP or TFTP indicates that the message body is 1923 accessible as a file using the FTP [RFC-959] or TFTP [RFC- 783] 1924 protocols, respectively. For these access-types, the following 1925 additional parameters are mandatory: 1926 1927 (1) NAME -- The name of the file that contains the actual 1928 body data. 1929 1930 (2) SITE -- A machine from which the file may be obtained, 1931 using the given protocol. This must be a fully 1932 qualified domain name, not a nickname. 1933 1934 (3) Before any data are retrieved, using FTP, the user will 1935 generally need to be asked to provide a login id and a 1936 password for the machine named by the site parameter. 1937 For security reasons, such an id and password are not 1938 specified as content-type parameters, but must be 1939 obtained from the user. 1940 1941 In addition, the following parameters are optional: 1942 1943 (1) DIRECTORY -- A directory from which the data named by 1944 NAME should be retrieved. 1945 1946 (2) MODE -- A case-insensitive string indicating the mode 1947 to be used when retrieving the information. The valid 1948 values for access-type "TFTP" are "NETASCII", "OCTET", 1949 and "MAIL", as specified by the TFTP protocol [RFC- 1950 783]. The valid values for access-type "FTP" are 1951 "ASCII", "EBCDIC", "IMAGE", and "LOCALn" where "n" is a 1952 decimal integer, typically 8. 
These correspond to the 1953 representation types "A" "E" "I" and "L n" as specified 1954 by the FTP protocol [RFC-959]. Note that "BINARY" and 1955 "TENEX" are not valid values for MODE and that "OCTET" 1956 or "IMAGE" or "LOCAL8" should be used instead. IF MODE 1957 is not specified, the default value is "NETASCII" for 1958 TFTP and "ASCII" otherwise. 1959 1960 1961 1962 Freed & Borenstein Standards Track [Page 35] 1963 1964 RFC 2046 Media Types November 1996 1965 1966 1967 5.2.3.3. The 'anon-ftp' Access-Type 1968 1969 The "anon-ftp" access-type is identical to the "ftp" access type, 1970 except that the user need not be asked to provide a name and password 1971 for the specified site. Instead, the ftp protocol will be used with 1972 login "anonymous" and a password that corresponds to the user's mail 1973 address. 1974 1975 5.2.3.4. The 'local-file' Access-Type 1976 1977 An access-type of "local-file" indicates that the actual body is 1978 accessible as a file on the local machine. Two additional parameters 1979 are defined for this access type: 1980 1981 (1) NAME -- The name of the file that contains the actual 1982 body data. This parameter is mandatory for the 1983 "local-file" access-type. 1984 1985 (2) SITE -- A domain specifier for a machine or set of 1986 machines that are known to have access to the data 1987 file. This optional parameter is used to describe the 1988 locality of reference for the data, that is, the site 1989 or sites at which the file is expected to be visible. 1990 Asterisks may be used for wildcard matching to a part 1991 of a domain name, such as "*.bellcore.com", to indicate 1992 a set of machines on which the data should be directly 1993 visible, while a single asterisk may be used to 1994 indicate a file that is expected to be universally 1995 available, e.g., via a global file system. 1996 1997 5.2.3.5. 
The 'mail-server' Access-Type 1998 1999 The "mail-server" access-type indicates that the actual body is 2000 available from a mail server. Two additional parameters are defined 2001 for this access-type: 2002 2003 (1) SERVER -- The addr-spec of the mail server from which 2004 the actual body data can be obtained. This parameter 2005 is mandatory for the "mail-server" access-type. 2006 2007 (2) SUBJECT -- The subject that is to be used in the mail 2008 that is sent to obtain the data. Note that keying mail 2009 servers on Subject lines is NOT recommended, but such 2010 mail servers are known to exist. This is an optional 2011 parameter. 2012 2013 2014 2015 2016 2017 2018 Freed & Borenstein Standards Track [Page 36] 2019 2020 RFC 2046 Media Types November 1996 2021 2022 2023 Because mail servers accept a variety of syntaxes, some of which is 2024 multiline, the full command to be sent to a mail server is not 2025 included as a parameter in the content-type header field. Instead, 2026 it is provided as the "phantom body" when the media type is 2027 "message/external-body" and the access-type is mail-server. 2028 2029 Note that MIME does not define a mail server syntax. Rather, it 2030 allows the inclusion of arbitrary mail server commands in the phantom 2031 body. Implementations must include the phantom body in the body of 2032 the message it sends to the mail server address to retrieve the 2033 relevant data. 2034 2035 Unlike other access-types, mail-server access is asynchronous and 2036 will happen at an unpredictable time in the future. For this reason, 2037 it is important that there be a mechanism by which the returned data 2038 can be matched up with the original "message/external-body" entity. 2039 MIME mail servers must use the same Content-ID field on the returned 2040 message that was used in the original "message/external-body" 2041 entities, to facilitate such matching. 2042 2043 5.2.3.6. 
External-Body Security Issues 2044 2045 "Message/external-body" entities give rise to two important security 2046 issues: 2047 2048 (1) Accessing data via a "message/external-body" reference 2049 effectively results in the message recipient performing 2050 an operation that was specified by the message 2051 originator. It is therefore possible for the message 2052 originator to trick a recipient into doing something 2053 they would not have done otherwise. For example, an 2054 originator could specify a action that attempts 2055 retrieval of material that the recipient is not 2056 authorized to obtain, causing the recipient to 2057 unwittingly violate some security policy. For this 2058 reason, user agents capable of resolving external 2059 references must always take steps to describe the 2060 action they are to take to the recipient and ask for 2061 explicit permisssion prior to performing it. 2062 2063 The 'mail-server' access-type is particularly 2064 vulnerable, in that it causes the recipient to send a 2065 new message whose contents are specified by the 2066 original message's originator. Given the potential for 2067 abuse, any such request messages that are constructed 2068 should contain a clear indication that they were 2069 generated automatically (e.g. in a Comments: header 2070 field) in an attempt to resolve a MIME 2071 2072 2073 2074 Freed & Borenstein Standards Track [Page 37] 2075 2076 RFC 2046 Media Types November 1996 2077 2078 2079 "message/external-body" reference. 2080 2081 (2) MIME will sometimes be used in environments that 2082 provide some guarantee of message integrity and 2083 authenticity. If present, such guarantees may apply 2084 only to the actual direct content of messages -- they 2085 may or may not apply to data accessed through MIME's 2086 "message/external-body" mechanism. In particular, it 2087 may be possible to subvert certain access mechanisms 2088 even when the messaging system itself is secure. 
2089 2090 It should be noted that this problem exists either with 2091 or without the availabilty of MIME mechanisms. A 2092 casual reference to an FTP site containing a document 2093 in the text of a secure message brings up similar 2094 issues -- the only difference is that MIME provides for 2095 automatic retrieval of such material, and users may 2096 place unwarranted trust is such automatic retrieval 2097 mechanisms. 2098 2099 5.2.3.7. Examples and Further Explanations 2100 2101 When the external-body mechanism is used in conjunction with the 2102 "multipart/alternative" media type it extends the functionality of 2103 "multipart/alternative" to include the case where the same entity is 2104 provided in the same format but via different accces mechanisms. 2105 When this is done the originator of the message must order the parts 2106 first in terms of preferred formats and then by preferred access 2107 mechanisms. The recipient's viewer should then evaluate the list 2108 both in terms of format and access mechanisms. 2109 2110 With the emerging possibility of very wide-area file systems, it 2111 becomes very hard to know in advance the set of machines where a file 2112 will and will not be accessible directly from the file system. 2113 Therefore it may make sense to provide both a file name, to be tried 2114 directly, and the name of one or more sites from which the file is 2115 known to be accessible. An implementation can try to retrieve remote 2116 files using FTP or any other protocol, using anonymous file retrieval 2117 or prompting the user for the necessary name and password. If an 2118 external body is accessible via multiple mechanisms, the sender may 2119 include multiple entities of type "message/external-body" within the 2120 body parts of an enclosing "multipart/alternative" entity. 2121 2122 However, the external-body mechanism is not intended to be limited to 2123 file retrieval, as shown by the mail-server access-type. 
Beyond 2124 this, one can imagine, for example, using a video server for external 2125 references to video clips. 2126 2127 2128 2129 2130 Freed & Borenstein Standards Track [Page 38] 2131 2132 RFC 2046 Media Types November 1996 2133 2134 2135 The embedded message header fields which appear in the body of the 2136 "message/external-body" data must be used to declare the media type 2137 of the external body if it is anything other than plain US-ASCII 2138 text, since the external body does not have a header section to 2139 declare its type. Similarly, any Content-transfer-encoding other 2140 than "7bit" must also be declared here. Thus a complete 2141 "message/external-body" message, referring to an object in PostScript 2142 format, might look like this: 2143 2144 From: Whomever 2145 To: Someone 2146 Date: Whenever 2147 Subject: whatever 2148 MIME-Version: 1.0 2149 Message-ID: <id1@host.com> 2150 Content-Type: multipart/alternative; boundary=42 2151 Content-ID: <id001@guppylake.bellcore.com> 2152 2153 --42 2154 Content-Type: message/external-body; name="BodyFormats.ps"; 2155 site="thumper.bellcore.com"; mode="image"; 2156 access-type=ANON-FTP; directory="pub"; 2157 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" 2158 2159 Content-type: application/postscript 2160 Content-ID: <id42@guppylake.bellcore.com> 2161 2162 --42 2163 Content-Type: message/external-body; access-type=local-file; 2164 name="/u/nsb/writing/rfcs/RFC-MIME.ps"; 2165 site="thumper.bellcore.com"; 2166 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" 2167 2168 Content-type: application/postscript 2169 Content-ID: <id42@guppylake.bellcore.com> 2170 2171 --42 2172 Content-Type: message/external-body; 2173 access-type=mail-server 2174 server="listserv@bogus.bitnet"; 2175 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" 2176 2177 Content-type: application/postscript 2178 Content-ID: <id42@guppylake.bellcore.com> 2179 2180 get RFC-MIME.DOC 2181 2182 --42-- 2183 2184 2185 2186 Freed & Borenstein 

   Note that in the above examples, the default
   Content-transfer-encoding of "7bit" is assumed for the external
   PostScript data.

   Like the "message/partial" type, the "message/external-body" media
   type is intended to be transparent, that is, to convey the data type
   in the external body rather than to convey a message with a body of
   that type.  Thus the headers on the outer and inner parts must be
   merged using the same rules as for "message/partial".  In particular,
   this means that the Content-type and Subject fields are overridden,
   but the From field is preserved.

   Note that since the external bodies are not transported along with
   the external body reference, they need not conform to transport
   limitations that apply to the reference itself.  In particular,
   Internet mail transports may impose 7bit and line length limits, but
   these do not automatically apply to binary external body references.
   Thus a Content-Transfer-Encoding is not generally necessary, though
   it is permitted.

   Note that the body of a message of type "message/external-body" is
   governed by the basic syntax for an RFC 822 message.  In particular,
   anything before the first consecutive pair of CRLFs is header
   information, while anything after it is body information, which is
   ignored for most access-types.

5.2.4. Other Message Subtypes

   MIME implementations must in general treat unrecognized subtypes of
   "message" as being equivalent to "application/octet-stream".

   Future subtypes of "message" intended for use with email should be
   restricted to "7bit" encoding.  A type other than "message" should be
   used if restriction to "7bit" is not possible.

6. Experimental Media Type Values

   A media type value beginning with the characters "X-" is a private
   value, to be used by consenting systems by mutual agreement.  Any
   format without a rigorous and public definition must be named with an
   "X-" prefix, and publicly specified values shall never begin with
   "X-".  (Older versions of the widely used Andrew system use the
   "X-BE2" name, so new systems should probably choose a different
   name.)

   In general, the use of "X-" top-level types is strongly discouraged.
   Implementors should invent subtypes of the existing types whenever
   possible.  In many cases, a subtype of "application" will be more
   appropriate than a new top-level type.

7. Summary

   The five discrete media types provide a standardized
   mechanism for tagging entities as "audio", "image", or several other
   kinds of data.  The composite "multipart" and "message" media types
   allow mixing and hierarchical structuring of entities of different
   types in a single message.  A distinguished parameter syntax allows
   further specification of data format details, particularly the
   specification of alternate character sets.  Additional optional
   header fields provide mechanisms for certain extensions deemed
   desirable by many implementors.  Finally, a number of useful media
   types are defined for general use by consenting user agents, notably
   "message/partial" and "message/external-body".

8. Security Considerations

   Security issues are discussed in the context of the
   "application/postscript" type, the "message/external-body" type, and
   in RFC 2048.
   Implementors should pay special attention to the
   security implications of any media types that can cause the remote
   execution of any actions in the recipient's environment.  In such
   cases, the discussion of the "application/postscript" type may serve
   as a model for considering other media types with remote execution
   capabilities.

9. Authors' Addresses

   For more information, the authors of this document are best contacted
   via Internet mail:

   Ned Freed
   Innosoft International, Inc.
   1050 East Garvey Avenue South
   West Covina, CA 91790
   USA

   Phone: +1 818 919 3600
   Fax: +1 818 919 3614
   EMail: ned@innosoft.com


   Nathaniel S. Borenstein
   First Virtual Holdings
   25 Washington Avenue
   Morristown, NJ 07960
   USA

   Phone: +1 201 540 8967
   Fax: +1 201 993 3032
   EMail: nsb@nsb.fv.com


   MIME is a result of the work of the Internet Engineering Task Force
   Working Group on RFC 822 Extensions.  The chairman of that group,
   Greg Vaudreuil, may be reached at:

   Gregory M. Vaudreuil
   Octel Network Services
   17080 Dallas Parkway
   Dallas, TX 75248-1905
   USA

   EMail: Greg.Vaudreuil@Octel.Com

Appendix A -- Collected Grammar

   This appendix contains the complete BNF grammar for all the syntax
   specified by this document.

   By itself, however, this grammar is incomplete.  It refers by name to
   several syntax rules that are defined by RFC 822.
   Rather than reproduce those definitions here, and risk unintentional
   differences between the two, this document simply refers the reader
   to RFC 822 for the remaining definitions.  Wherever a term is
   undefined, it refers to the RFC 822 definition.

     boundary := 0*69<bchars> bcharsnospace

     bchars := bcharsnospace / " "

     bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
                      "+" / "_" / "," / "-" / "." /
                      "/" / ":" / "=" / "?"

     body-part := <"message" as defined in RFC 822, with all
                   header fields optional, not starting with the
                   specified dash-boundary, and with the
                   delimiter not occurring anywhere in the
                   body part.  Note that the semantics of a
                   part differ from the semantics of a message,
                   as described in the text.>

     close-delimiter := delimiter "--"

     dash-boundary := "--" boundary
                      ; boundary taken from the value of
                      ; boundary parameter of the
                      ; Content-Type field.

     delimiter := CRLF dash-boundary

     discard-text := *(*text CRLF)
                     ; May be ignored or discarded.

     encapsulation := delimiter transport-padding
                      CRLF body-part

     epilogue := discard-text

     multipart-body := [preamble CRLF]
                       dash-boundary transport-padding CRLF
                       body-part *encapsulation
                       close-delimiter transport-padding
                       [CRLF epilogue]

     preamble := discard-text

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.
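
   The "boundary" and "multipart-body" rules above translate fairly
   directly into a receiver-side check.  The Python sketch below is
   illustrative only, not a conforming MIME parser: `is_valid_boundary`,
   `split_multipart` and `SAMPLE` are hypothetical names, the body is
   assumed to be a `str` with CRLF line endings, and body parts are
   returned raw rather than parsed.  Following "delimiter := CRLF
   dash-boundary", the CRLF before each delimiter is treated as part of
   the delimiter, not of the preceding body part, and trailing spaces or
   tabs on a delimiter line are accepted as transport-padding.

```python
import re

# bcharsnospace from the grammar above; bchars adds the space character.
_BCHARS_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
# boundary := 0*69<bchars> bcharsnospace  (1 to 70 characters in total)
_BOUNDARY_RE = re.compile(rf"[{_BCHARS_NOSPACE} ]{{0,69}}[{_BCHARS_NOSPACE}]")


def is_valid_boundary(boundary: str) -> bool:
    """Check a boundary parameter value against the boundary rule."""
    return _BOUNDARY_RE.fullmatch(boundary) is not None


def split_multipart(body: str, boundary: str) -> list[str]:
    """Return the raw body parts, discarding preamble and epilogue.

    A delimiter line is "--" boundary at the start of the body or after a
    CRLF, optionally followed by "--" (close-delimiter) and by
    transport-padding (spaces or tabs) before the next CRLF.
    """
    delimiter = re.compile(
        r"(?:\A|\r\n)--" + re.escape(boundary) + r"(--)?[ \t]*(?=\r\n|\Z)"
    )
    parts: list[str] = []
    start = None  # offset where the current body part begins
    for m in delimiter.finditer(body):
        if start is not None:
            parts.append(body[start:m.start()])
        if m.group(1):  # close-delimiter: the rest is the epilogue
            return parts
        start = m.end() + 2  # skip the CRLF that ends the delimiter line
    raise ValueError("no close-delimiter found")


SAMPLE = (
    "This is the preamble.\r\n"
    "--42\r\n"
    "Content-Type: text/plain\r\n\r\nfirst\r\n"
    "--42  \r\n"  # transport-padding added in transit
    "\r\nsecond\r\n"
    "--42--\r\n"
    "This is the epilogue."
)
```

   Here `split_multipart(SAMPLE, "42")` yields two parts; the second
   starts with CRLF because it has no header fields, which the body-part
   rule permits.  Since composers must not generate transport padding, a
   stricter variant could reject padded delimiter lines when checking
   locally produced output, while a receiver, as here, has to accept them.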