UNIX Unleashed, Internet Edition

- 16 -

MIME--Multipurpose Internet Mail Extensions

by Robin Burk

HTML underlies the World Wide Web, but it is only one of a number of standard data types whose definition makes the Web possible. In this chapter, we'll look at the broader set of data formats used by Web and Internet programs to bridge the gaps between diverse operating systems and hardware platforms.

The topics covered in this chapter include:

How MIME became an Internet standard
Common MIME data types
Web pages, Web servers and MIME

TIP: Understanding what MIME formats are and how they are approved can help you and your users solve day-to-day problems with interpreting e-mail attachments or choosing browser plug-in software.

How MIME Became an Internet Standard

MIME (Multipurpose Internet Mail Extensions) is one of the Internet protocol standards defined by the Internet Engineering Task Force (IETF). Once associated primarily with electronic mail, MIME has evolved to become an important element supporting multimedia applications on the Net. In order to understand MIME and how it operates, it's helpful to step back and see how it got to where it is today.

How Internet Standards Are Adopted

The IETF is the official body that proposes and adopts communications protocols, data formats, and similar conventions to be supported by the public Internet. For instance, all of the familiar Internet communications protocols, such as TCP, IP, PPP and SLIP, are formally defined by IETF documents called Requests for Comments (RFCs). The IETF also defines the Simple Mail Transfer Protocol (SMTP), the Network Time Protocol (NTP), and newer, multimedia protocols such as the Resource reSerVation Protocol (RSVP) and the Real-time Transport Protocol (RTP) that support interactive conferencing over the Net.

Not all RFCs adopted by the IETF become Internet standards. Those that are proposed for the standards track often begin as Internet Drafts submitted by one or more people from industry or academia. Internet Drafts must advance to RFC status within six months of publication or they are removed from consideration.

Once advanced to RFC status, a proposed protocol is open for comment and can be superseded by a revised version based on feedback from the technical community. Any interested party can participate in the discussion, either online or at face-to-face meetings. Each RFC is shepherded and debated within a specific Working Group of the IETF. The Working Groups meet from time to time to hammer out the details of proposed protocols.

Some RFCs are not intended for adoption as Internet standards. A few contain comments or information about a given technical scenario or about the standards process itself. Other informational RFCs do define protocols in detail, but are not proposed for adoption as standards because they were developed by a single company that chooses to retain control of their evolution. The RFCs that describe successive versions of Sun's Network File System fall into this category. By publishing the definition of the NFS protocol, Sun allows and encourages other vendors to support NFS in their own operating systems. In this way NFS has become a de facto, but not official, Internet standard.

Finally, some RFCs are designated as experimental (available for limited implementation to evaluate their effectiveness) or historical (once in use, now effectively replaced by an alternative protocol).

Official standards are not necessarily required to be adopted by all Internet server or client systems. A standard may fall into any of several categories:

Required--All systems must implement this protocol. The Internet Protocol (IP) and associated Internet Control Message Protocol (ICMP) are among the required standard protocols for all systems that are directly connected to the public Internet.
Recommended--All systems should implement this protocol unless there is strong reason to do otherwise. The Transmission Control Protocol (TCP), File Transfer Protocol (FTP), and Telnet are some of the recommended standard protocols for the Internet.
Elective--Any system that is going to implement something along these lines must do so in accordance with the RFC. For example, MIME is an elective standard protocol for the Internet.
Limited Use--Use of these protocols is limited to special circumstances due to the experimental, historic, or specialized nature of the protocol.
Not Recommended--General use of these protocols is not recommended due to their limited functionality, specialized nature, or experimental or historic state.

The InterNIC Web site contains links to online copies of the RFCs, Internet Drafts, and other Internet-related information. Point your browser to http://www.internic.net/ds/ for the main site and to http://ds.internic.net/ds/dspg0intdoc.html to search for specific topics in the RFC database.

NOTE: RFCs are never modified once they have been submitted and adopted. Instead, new RFCs are created when it is proposed that an existing protocol be modified. The IETF online index contains an entry for each RFC, which, among other information, states which RFCs it supersedes and which supersede it. You can follow this chain to view the evolution of a protocol over time. In most cases, the authors of newer RFCs will explicitly state in their documents why they're proposing changes to the older standards or drafts.

TIP: The RFC mechanism itself is used to document all of the RFCs that have been officially adopted as Internet standards at any given time. As of June 1997, the list of current standards was contained in RFC 2200. You can look up the index entry for this RFC to determine if any new standards have been adopted since that time.

This is an easy way to acquaint yourself with the current standards for the Internet without retracing the historical development of the Internet protocol suite.

TIP: The definition of a protocol in a Request For Comment can look pretty formidable at first reading. These documents are intended to constrain and direct software implementers, and are often quite formal and abstract in tone.
Most RFCs do have an introduction and rationale that are more accessible, because their purpose is to gain the support of the Internet technical community as a whole. Reading these initial pages of an RFC can help you understand the intent of a protocol and how it fits into the overall Internet architecture.

When reading an RFC, it also helps to know that the specific values of parameters, codes, and identifiers for a protocol are maintained in a separate document. The Internet Assigned Numbers Authority (IANA) coordinates the values assigned to parameters throughout the Internet protocols. At the time of writing, RFC 1700 defines the assigned number codes. The latest Standards list will always identify the associated Assigned Numbers RFC.

You can use the assigned numbers RFC, along with the message formats defined in the protocol RFCs, to completely decode network messages captured by software and hardware monitors. Usually the most common message formats and parameter values are documented by your software vendor; when troubleshooting a network problem, however, it may be necessary to identify and decode an uncommon message type. Knowing your way around the RFCs gives you another tool to use in troubleshooting network operations.

History of MIME

As its name suggests, MIME originally was associated with electronic mail transmission over the Internet.

The core standards for Internet e-mail are defined in RFC 821 "Simple Mail Transfer Protocol" and RFC 822 "Standard for the Format of ARPA Internet Text Messages". Together, these documents define a common format for e-mail encoded as U.S. ASCII characters.

Within the original ARPANET, a single, text-oriented e-mail standard was practical and appropriate. Over time, however, the ARPANET underwent several significant changes, among them a transition from its original home in the Department of Defense to become the public Internet, which in turn now supports the World Wide Web and attracts truly global use.

As the scope of the public internetwork expanded, it became useful to define ways for e-mail to be exchanged across the Net without requiring non-ASCII systems to convert all message character sets. Non-U.S. ASCII e-mail traveling over the Internet is analogous to letters written in French or Chinese being sent through the U.S. Postal Service. All that is required is that the letter be enclosed within an envelope that carries the standard addressing information in a form readable to the Postal Service's employees and scanning machines.

In addition, users often wanted to attach files of various formats and origins to their e-mail messages, much as the writer of a letter might include a newspaper clipping, photograph, or check in the letter's envelope. Potential e-mail attachments might be the output of standard applications such as word processors and spreadsheets, or might consist of binary executable files, graphical images, or even data files from custom applications.

MIME was intended to support both of these scenarios. At its most fundamental, MIME encodes e-mail messages into standard formats beyond the ASCII text format defined in the original ARPANET protocols.

By extending these formats to include multi-part messages, MIME allows e-mail messages to have attached files in a variety of formats. Prior to the adoption of the MIME protocols, users on diverse systems (and often on similar systems) could not easily pass non-text information along with their e-mail.

The MIME protocol provides both a list of currently-defined message types and a mechanism for adding new formats over time. This means that MIME can evolve to support new multimedia formats, application file types, languages, character sets, and other data types as they become widespread or otherwise useful within the Internet's technical environment. It is this breadth of scope, and its open-ended nature, that places MIME in the category of "elective" rather than "recommended" or "required" Internet protocols.

MIME data type definitions soon found uses beyond e-mail. When the founders of the World Wide Web created a hypertext capability, they found it easy to use the MIME framework to define a new hypertext data type to specify HTML scripts. And when the language rules for HTML were written, the authors found it easy to allow graphics to be embedded in Web pages because MIME had already centralized the definition of graphical image formats.

Today there are MIME formats for audio, video, ZIPped, and vendor-specific data types. MIME even provides a way to name a data type for which no official IANA recognition has yet occurred. This allows software vendors to create optimized or specialized formats that, if they achieve widespread adoption, are then likely to be added to the official list. Developers of browser clients and browser plug-ins have made extensive use of this capability. In this way, MIME plays a critical role in the rapid evolution of both the World Wide Web and of the wider use of multimedia in computing. All this from what started as humble extensions to ASCII e-mail messages!

The MIME Data Type Scheme

For many years, the core MIME documents were RFCs 1521 and 1522. In November 1996, however, a new series of MIME standards was proposed in RFCs 2045 through 2049. These documents reflect the great variety of data types that had evolved, especially for multimedia applications, since the original MIME definitions were established.

RFC 2046 outlines the media types that are supported by MIME. More accurately, this RFC outlines the categories into which such data types can be placed.

The first distinction to be made is between discrete media and composite media. Discrete media contain a single entity or data object. An entity consists of a MIME header and either the contents of a message or one of the parts of a multi-part message. MIME treats discrete media as opaque objects that are passed on to the receiving application without interpretation or other processing.

Composite media contain multiple entities, which can be of the same or different types. Composite media require MIME processing to correctly handle the various entities being transmitted together.

MIME defines top-level media types, which are used to specify the general type of data, and subtypes, which typically specify a particular format for that type of data. New top-level media types and lower-level subtypes may be added as needed. The definition of a top-level media type includes the following:

A name and description of the type, along with the criteria by which a particular media format would be known to fall under this type
Parameters associated with all formats (subtypes) of this type
How a user agent or a gateway should handle otherwise unknown subtypes of this type
Other issues and considerations regarding the handling of entities of this type
Restrictions on content-transfer-encodings for entities of this type to ensure that the information being transmitted is not inadvertently distorted
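The third item in this list, the handling of unknown subtypes, lends itself to a short illustration. The following is a minimal sketch rather than a definitive implementation: the fallback rules reflect my reading of RFC 2046 (unknown text subtypes are treated as text/plain, unknown multipart subtypes as multipart/mixed, and everything else as application/octet-stream), and the function name `effective_type` and the `KNOWN` set are hypothetical names chosen for this example.

```python
# Fallbacks for unrecognized subtypes, based on a reading of RFC 2046.
# The names below are hypothetical and exist only for this illustration.
FALLBACK_BY_TOP_LEVEL = {
    "text": "text/plain",
    "multipart": "multipart/mixed",
}
DEFAULT_FALLBACK = "application/octet-stream"


def effective_type(media_type, known_types):
    """Return the type a user agent would actually process.

    A recognized type is returned unchanged (lowercased). An unrecognized
    subtype falls back to a default chosen by its top-level type.
    """
    normalized = media_type.strip().lower()
    if normalized in known_types:
        return normalized
    top_level = normalized.partition("/")[0]
    return FALLBACK_BY_TOP_LEVEL.get(top_level, DEFAULT_FALLBACK)


# A deliberately tiny set of types this imaginary user agent understands.
KNOWN = {"text/plain", "text/html", "image/gif", "multipart/mixed"}

if __name__ == "__main__":
    for candidate in ("text/html", "text/x-unknown", "image/x-foo", "multipart/x-foo"):
        print(candidate, "->", effective_type(candidate, KNOWN))
```

The point of the design is that the top-level type alone carries enough information for a safe default, which is why the definition of each top-level type is required to say how unknown subtypes are handled.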

There are five discrete top-level media types initially defined in the new MIME scheme. These are:

Text--Readable text, including those word processor formats whose content is more or less readable when displayed on a screen or printer.
Image--Static graphical images that require a graphical display (monitor), a graphic printer, or a fax machine for the user to view the information.
Audio--Information requiring a speaker, telephone, or similar device to allow the user to hear the contents.
Video--Information requiring the capability to display moving images, typically with specialized hardware and software.
Application--Other kinds of data, either binary files (typically stored into a disk file for the user to manage) or information to be processed by an application program. The association of an appropriate application with a specific application data subtype is made at the client machine.

The two top-level composite media types are:

Multipart--Data consisting of multiple entities of independent data types. Subtypes include generic "mixed" entities; "alternative" formats of the same data; "parallel" entities that are intended to be viewed simultaneously (as with audio and video that go together); and "digest" for transmitting multiple mail messages in a single message.
Message--An encapsulated message. The "rfc822" subtype is used when the encapsulated message is itself an ASCII mail message as defined by RFC 822. The "partial" subtype allows large messages to be fragmented and later reassembled. The "external-body" subtype passes a reference to a large, external data source rather than the contents of the source.
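To make the distinction between discrete and composite media concrete, here is a small sketch that uses Python's standard `email` package to build a message with a text body and an image attachment, and then lists the entities it contains. The addresses, file name, and function names are hypothetical, and the six attachment bytes are only a stand-in for real image data.

```python
from email.message import EmailMessage


def build_report_message():
    """Build a message with a text body and one image attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"        # hypothetical address
    msg["To"] = "recipient@example.com"       # hypothetical address
    msg["Subject"] = "Chart attached"
    msg.set_content("The chart is attached.")  # becomes a text/plain entity
    # Adding an attachment converts the message into multipart/mixed.
    msg.add_attachment(b"GIF89a", maintype="image", subtype="gif",
                       filename="chart.gif")
    return msg


def describe_entities(msg):
    """Return (media type, 'composite' or 'discrete') for every entity."""
    return [
        (part.get_content_type(),
         "composite" if part.is_multipart() else "discrete")
        for part in msg.walk()
    ]


if __name__ == "__main__":
    for media_type, kind in describe_entities(build_report_message()):
        print(f"{media_type:20} {kind}")
```

The outer multipart/mixed entity is the only one that needs MIME processing to take apart; the two entities inside it are discrete and would be handed to a text viewer and an image viewer as opaque objects.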

MIME types that are not recognized by IANA are given names that start with "x-". For instance, the MPEG layer-2 format for audio information, which is associated with file extension .mp2, is mapped to the MIME type "audio/x-mpeg". Officially recognized MIME types are generally supported by the relevant server and client software, but private or experimental types may require explicit configuration at both the Internet server and the client workstation in order to be processed correctly.
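The naming scheme described in this section is regular enough to check mechanically. The sketch below splits a type name into its top-level type and subtype, classifies it as discrete or composite using the seven top-level types listed above, and flags names that use the "x-" convention. The function name and the shape of the returned dictionary are assumptions made for this example.

```python
DISCRETE_TYPES = {"text", "image", "audio", "video", "application"}
COMPOSITE_TYPES = {"multipart", "message"}


def describe_media_type(media_type):
    """Split 'type/subtype' and report its category and private status."""
    top_level, slash, subtype = media_type.strip().lower().partition("/")
    if not (top_level and slash and subtype):
        raise ValueError(f"not a type/subtype pair: {media_type!r}")

    if top_level in DISCRETE_TYPES:
        category = "discrete"
    elif top_level in COMPOSITE_TYPES:
        category = "composite"
    else:
        category = "unrecognized"

    return {
        "top_level": top_level,
        "subtype": subtype,
        "category": category,
        # Names beginning with "x-" are private or experimental.
        "private": top_level.startswith("x-") or subtype.startswith("x-"),
    }


if __name__ == "__main__":
    for name in ("text/html", "audio/x-mpeg", "multipart/mixed"):
        print(name, describe_media_type(name))
```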

Common MIME Data Types

Although the top-level MIME media types correspond to basic concepts that all users would understand, not all subtypes fall under the obvious media category. Those that are associated with specific application software, for instance, may be classified as application types rather than text, image, or audio, despite being widely available over the Internet. Often these data types require a browser plug-in before their contents will be correctly processed when visiting a Web site, or the client browser might ask you to specify which application is associated with that subtype or file extension.

Because the official status of data types is changing rapidly, especially with the rapid expansion of multimedia applications, I've grouped these descriptions by the intuitive categories to which they belong rather than their official status. Each data format description that follows includes the common format name and current MIME name, the file extension(s) associated with the media, and a brief description.

Text Types

Table 16.1 lists the most common MIME text types.

Table 16.1. Text types commonly found on the Internet.

MIME Type	File Extensions	Common Format Name	Description
text/plain	.txt	Text	US ASCII text with no format tags
text/html	.html, .htm	HyperText Markup Language	Defines World Wide Web pages
application/rtf	.rtf	Rich Text Format	Vendor-independent word processing file type with some formatting capabilities
application/postscript	.ps, .ai, .eps	PostScript	Print and display format
application/pdf	.pdf	Portable Document Format (PDF)	Adobe format used by Acrobat for platform-independent display and printing

Image Types

Table 16.2 lists the most common MIME image types.

Table 16.2. Image types commonly found on the Internet.

MIME Type	File Extensions	Common Format Name	Description
image/gif	.gif	Graphics Interchange Format	Common format for static images on the Web. 8-bit color and lossless compression; very good for drawings. Its LZW compression is patented by Unisys.
image/jpeg	.jpeg, .jpe, .jpg	Joint Photographic Experts Group (JPEG)	24-bit color with lossy compression. Often used for photos and high-detail drawings on the Web.
image/png	.png	Portable Network Graphics	New format proposed by the IETF as a non-patented replacement for GIF and some uses of TIFF
image/tiff	.tiff	Tagged Image File Format	Developed by Aldus Corp. and adopted for experimental use in remote printing over the Internet

Audio Types

Table 16.3 lists the most common MIME audio types.

Table 16.3. Audio types commonly found on the Internet.

MIME Type	File Extensions	Common Format Name	Description
audio/basic	.au, .snd	µ-law (mu-law)	Low fidelity, very common on the Web. First introduced by Sun Microsystems and NeXT Computer.
audio/mpeg	.mp2	Moving Picture Experts Group (MPEG)	MPEG-1 audio format with Layer II compression. Most systems have drivers for this format, which is also used by some recording and broadcasting companies.
audio/x-aiff	.aif, .aiff, .aifc	Audio Interchange File Format (AIFF)	Apple Macintosh and Silicon Graphics format for conversion between audio types
audio/x-voc	.voc	Creative Voice	Used by Creative Labs' Sound Blaster and Sound Blaster Pro audio cards
audio/x-wav	.wav	Resource Interchange File Format (RIFF) Waveform Audio Format	Adaptive Pulse Code Modulation (APCM) format native to Microsoft Windows environments
audio/x-xdma	.xdm	RealAudio	Streaming audio format used for real-time audio transmission over the Internet
audio/x-midi	.mid, .midi	Musical Instrument Digital Interface (MIDI)	Format used to describe how synthesizers and samplers should reproduce sounds; also used for electronic music composition

Video Types

Table 16.4 lists the most common MIME video types.

Table 16.4. Video types commonly found on the Internet.

MIME Type	File Extensions	Common Format Name	Description
video/mpeg	.mpeg, .mpg, .mpe	Moving Picture Experts Group (MPEG)	Video portion of MPEG-2 standard; sometimes combined with MPEG-1 Layer II audio
video/quicktime	.mov, .moov, .qt	QuickTime	Proprietary to Apple Computer, combining data and resource forks that are processed in parallel
video/x-msvideo	.avi	Video for Windows	Microsoft's video format, native to the Windows environment; many translators to QuickTime exist
application/x-vrml	.wrl	Virtual Reality Modeling Language	Non-proprietary format for 3-dimensional world models

Application Types

Table 16.5 lists the most common MIME application types.

Table 16.5. Application types commonly found on the Internet.

MIME Type	File Extensions	Common Format Name	Description
application/x-gzip	.gz	GNU zip	Freeware compression for the UNIX environment
application/x-compress	.Z	Compress	Another common UNIX compression utility
application/x-zip	.zip	ZIP	Multiplatform compression; widely used
application/x-tar	.tar	Tape archive	Standard UNIX archive format
application/x-stuffit	.sit	Macintosh archive	Used for many image and video libraries

Note that there are many other application types that can be sent over the Internet as file attachments to e-mail. Spreadsheet and word processor files are the most common, along with the output of presentation software. E-mail clients, browsers, and similar software that receives such formats will simply store the data in a disk file unless configured to map the file extension or private MIME type to a specific executable for processing.
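The behavior described here, mapping a file extension to a MIME type and then to an action, with "store in a disk file" as the default, can be sketched with Python's standard `mimetypes` module. The `HELPERS` table and the `action_for` function are hypothetical; a real mail client or browser keeps this mapping in its own preferences.

```python
import mimetypes

# Hypothetical helper table of the kind a mail client or browser maintains.
HELPERS = {
    "text/html": "display in browser",
    "image/gif": "display in browser",
    "application/pdf": "launch Acrobat",
}

# A private instance uses only the module's built-in extension table,
# so the result does not depend on files installed on the local system.
_TYPE_DB = mimetypes.MimeTypes()


def action_for(filename, helpers=HELPERS):
    """Return (media type, action) for a received file name."""
    media_type, _encoding = _TYPE_DB.guess_type(filename)
    if media_type is None:
        # Unknown extension: treat the data as an opaque stream of bytes.
        media_type = "application/octet-stream"
    # Anything without a configured helper is simply saved to disk.
    return media_type, helpers.get(media_type, "save to disk")


if __name__ == "__main__":
    for name in ("index.html", "logo.gif", "budget.zzzunknown"):
        print(name, "->", action_for(name))
```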

One important subcategory of the application media type is the variety of compression schemes applied to general files. (Note that many audio, image, and video formats include standard compression/decompression that is automatically applied when the data is processed.) Table 16.6 lists the most common compression types and their public or private MIME names.

Table 16.6. Compression types commonly found on the Internet.


Multipart and Message Types

These MIME formats are primarily used for e-mail messages with multiple parts and are manipulated by e-mail server and client software.

Listing 16.1 shows a compound e-mail message, which includes the text of a message received earlier, the sender's response, and an attached file. Each element of this message has its own MIME format and is a separate entity within the compound message.

Listing 16.1. MIME supports compound e-mail messages.

X-POP3-Rcpt: robink@wizard.net
Return-Path: robink@wizard.net
From: robink@wizard.net
Date: Mon, 2 Jun 1997 10:29:20 -0500
Subject: example of forwarding a compound email message
To: rburk@digicon.com
Content-Description: cc:Mail note part
Here's my reply, which quotes the original message in full.
-----Original Message-----
From:   rburk@digicon.com
Sent:   Friday, 30 May 1997  11:40:00
To:     robink@wizard.net
Subject:        here's an original message with attachments
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Description: cc:Mail note part
Attached are two files in different formats.
<<File: unx.ini>>S<<File: global95.dot>>u<<File: global97.dot>>b
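For comparison, the sketch below parses a small, hand-written multipart message and lists the entities it contains. The message text is a simplified, hypothetical example and not a reconstruction of Listing 16.1; in particular the boundary string, the addresses, and the attachment contents are invented. It uses only the standard `email` package.

```python
import email

# A hypothetical compound message: one text entity and one attached file.
RAW_MESSAGE = """\
From: sender@example.com
To: recipient@example.com
Subject: example with an attachment
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Attached is a file.
--XYZ
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="unx.ini"

W3NlY3Rpb25dCg==
--XYZ--
"""


def list_entities(raw_message):
    """Return (media type, attachment file name or None) for each entity."""
    message = email.message_from_string(raw_message)
    return [(part.get_content_type(), part.get_filename())
            for part in message.walk()]


if __name__ == "__main__":
    for media_type, filename in list_entities(RAW_MESSAGE):
        print(media_type, filename)
```

Each entity carries its own Content-Type header, which is what lets the receiving software treat the text and the attachment differently even though they arrive in a single message.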

Web Pages, Web Servers, and MIME

A Web server is the software that runs on a system providing file, Internet, and World Wide Web access to client workstations.

A variety of commercial and shareware Web servers are available. In most cases, the operating system of choice for server systems is one or more flavors of UNIX.

The primary job of a Web server is to transmit the HTML scripts that make up a World Wide Web page. The client's browser software then interprets the HTML script and displays the Web page contents on the client system's monitor.

Along with the text whose presentation is specified by the HTML script, a Web page may contain images or other multimedia content stored in separate files on the server machine. The client browser will issue requests to the server each time it finds a tag referring to such a file. The Web server software must find the file, encode it appropriately (using standard MIME schemes) so that the integrity of the transmitted information can be verified, and send the file off to the client machine. At the client, the browser then decodes the information and displays it, plays it over the speaker, or otherwise presents it as part of the Web page.
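One visible place where MIME appears in this exchange is the Content-Type header that the server places in front of the file it returns, which tells the browser how to present the data. The following is a minimal sketch of that step only, assuming a hypothetical three-entry extension table; a real server reads a much larger table from its configuration and handles errors, caching, and much else.

```python
import os.path

# Hypothetical extension table; a real server loads this from configuration.
CONTENT_TYPES = {
    ".html": "text/html",
    ".gif": "image/gif",
    ".au": "audio/basic",
}


def build_response(path, body):
    """Return the raw bytes of a simple HTTP response carrying `body`."""
    extension = os.path.splitext(path)[1].lower()
    content_type = CONTENT_TYPES.get(extension, "application/octet-stream")
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"                      # blank line separates headers from body
    )
    return head.encode("ascii") + body


if __name__ == "__main__":
    print(build_response("/images/logo.gif", b"GIF89a"))
```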

Web pages may also make use of Common Gateway Interface (CGI) calls. CGI provides a way for HTML scripts to exchange information with other applications running on the server system. These most commonly are database applications accessed by HTML forms; however, the Web page may also contain server-side HTML logic, which causes the server itself to take different actions depending on what has come before. A Web page form may ask the user to specify whether or not his browser can support frames, for instance. If the user says it cannot, the server will then present a non-framed version of the Web page to the user at his workstation. The Web server software is responsible for processing server-side HTML logic.

In each of these cases, MIME data types are at work. HTML itself is a MIME text type, and the common image, audio, and video formats used for Web page multimedia content are MIME types as well. Even application and private data types must be encoded properly to protect against transmission errors, and MIME defines appropriate encoding schemes for this purpose.
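The encoding schemes in question are the content transfer encodings that MIME defines for mail, such as base64 and quoted-printable. As a rough illustration, the sketch below applies them with Python's standard `base64` and `quopri` modules. The function `encode_for_transfer` is a hypothetical wrapper written for this example, and it covers only three of the encodings MIME defines.

```python
import base64
import quopri


def encode_for_transfer(data, encoding):
    """Encode bytes with one of three MIME content transfer encodings."""
    if encoding == "base64":
        # Suits arbitrary binary data such as images or archives.
        return base64.encodebytes(data)
    if encoding == "quoted-printable":
        # Suits text that is mostly ASCII with a few 8-bit characters.
        return quopri.encodestring(data)
    if encoding == "7bit":
        # No transformation; the data must already be plain ASCII.
        data.decode("ascii")   # raises UnicodeDecodeError if it is not
        return data
    raise ValueError(f"unsupported encoding: {encoding}")


if __name__ == "__main__":
    print(encode_for_transfer(b"GIF89a", "base64"))
    print(encode_for_transfer(b"caf\xe9", "quoted-printable"))
    print(encode_for_transfer(b"plain text", "7bit"))
```

Both transformations turn arbitrary bytes into 7-bit ASCII that can pass through text-only mail systems, and both are reversible with the matching `base64.decodebytes` and `quopri.decodestring` calls.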

Many servers and browsers come pre-configured to recognize the standard MIME data types. Some standard types, and all private types, must be defined to the server and browser software before they can be correctly processed.

To configure a MIME data type in the Netscape Navigator browser (version 3.01), for instance, select Options, General Preferences, and Helpers from the menu tree. Figure 16.1 shows how the Helpers screen allows you to create associations between MIME types, file extensions, and the actions to be taken when such a data object is received.

Figure 16.1
Adding MIME types to the client browser.

Each Web server has its own way of configuring MIME types. Typically, this is done by means of a configuration file read when the server process is created. The Apache Web server, included on the CD-ROM for this book, looks for its configuration files in /usr/local/httpd/conf unless told that the configuration files are located elsewhere. The basic server configuration file httpd.conf and the server resource map srm.conf tell the server which MIME data types are legal and how to process the various data contents. For more information, see the Apache documentation on the CD-ROM or online at http://www.apache.org/docs/.
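In Apache, one way such a mapping is expressed is the `AddType` directive, which pairs a MIME type with one or more file extensions. The sketch below reads directives of that form into a lookup table, roughly as a server might at startup. The configuration fragment is a hypothetical example written for this illustration, not the contents of the files on the CD-ROM, and the parser ignores every other directive.

```python
# Hypothetical configuration fragment in the style of Apache's srm.conf.
SAMPLE_CONFIG = """\
# MIME types added for this site
AddType application/x-gzip .gz
AddType application/x-compress .Z
AddType audio/x-midi .mid .midi
"""


def parse_addtype(config_text):
    """Return a dict mapping file extension (without dot) to MIME type."""
    mapping = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.split()
        if fields[0].lower() != "addtype" or len(fields) < 3:
            continue                      # not an AddType directive
        media_type = fields[1].lower()
        for extension in fields[2:]:
            # Extension case is preserved: .Z and .z are different files.
            mapping[extension.lstrip(".")] = media_type
    return mapping


if __name__ == "__main__":
    for extension, media_type in parse_addtype(SAMPLE_CONFIG).items():
        print(f".{extension:5} {media_type}")
```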

Conclusion

In this chapter we've taken a brief look at the data format standards that allow diverse hardware and software platforms to exchange data across the Internet and the Web. Understanding how the MIME standard was established, what data formats it covers, and how it is used by Web pages and Web servers can help you correctly configure e-mail and browser software for yourself and your system's users.

An extensible Internet standard, MIME is a fundamental enabling technology for Internet e-mail, the World Wide Web, and most networked multimedia applications.
