Alphabet Soup: The Internationalization of Linux, Part 2
A message conforming to the MIME standard must have a version header of the form
Some mailers are sufficiently picky as to refuse to do MIME processing on mail lacking a valid MIME-Version header. This would be amusing, except for the fact that many mailers either do not implement the MIME functions correctly, produce an illegal MIME-Version header, or fail to insert the MIME-Version header at all.
The only version of the MIME header formats is 1.0. The MIME standard has undergone several revisions and expansions, but the basic format has remained unchanged at version 1.0. These revisions and standards have added new values for some of the parameters and specified interpretations for some ambiguous areas, but the syntax is unchanged. Case is irrelevant, in both the header tags and the values. The style of capitalization used below is more or less conventional, but not required.
One way to protect the content, or at least check that it has not been truncated, is to provide a Content-Length header. This is allowed by the MIME standard. The general type of encoding of the body is stated in the content-transfer-encoding header. The default is
Other allowed values are “quoted-printable” and “base64” (both implicitly 7-bit) and “8-bit”.
Next, the content type of the body is specified. In most messages it will be plain text, specified as
Other text types commonly found in mail these days are text/rich and text/html. A forwarded message (with no prefatory comments) may have content type specified as message/rfc-822. Messages can also be multipart. This is commonly used to add multimedia attachments, but can also be used to break up the body into components in different languages.
The MIME standard specifies that the character set is ASCII unless otherwise noted. RFC-822 requires that all headers be ASCII, so the MIME character set specification applies only to the body of the message. This specification is done using the charset parameter of the content type header. The default could be explicitly specified as
Note that the optional parameters are specified in keyword=value form. The correct way to specify ASCII is “us-ascii”, because that is the preferred form as registered with the IANA. A list of valid character sets for MIME is at http://www.hunnysoft.com/mime/. Europeans will commonly use
Content-Transfer-Encoding: 8-bit Content-Type: text/plain;charset=iso-8859-1
The Japanese standard for electronic messages is a version of ISO 2022 called ISO-2022-JP. In fact, this encoding needs to be extended only slightly. It can be used for Chinese and Korean as well and even as a multilingual encoding. The extended version is known as ISO-2022-JP-2 or ISO-2022-INT.
The MIME standard also provides a mechanism for putting non-ASCII text in headers. RFC-822 makes this illegal, so use of this mechanism will result in gibberish being displayed by mail programs that do not implement MIME. However, most mail programs today are MIME-aware, so this should not present any problems. If your correspondents complain, tell them to get a MIME-aware mailer.
The mechanism is simple. Non-ASCII text is encoded using either quoted-printable encoding or BASE64 encoding according to convenience, and bundled up into an encoded word. The reason it must be bundled into an encoded word is that the Content-Type header applies to the body, and if the body is multipart, there will be no charset parameter. Using a special header to control the format of headers seems silly, so the encoded word itself will contain the necessary character set information.
The format of an encoded word begins with the characters =?, continues with the name of the encoding used, the character ?, either the letter Q (for “quoted printable”) or the letter B (for BASE64), the character ? again, the encoded text, and finally the characters ?=. For example, the French word “voil<\#226>” is encoded =?ISO-8859-1?Q?V=F3il=E0?=. Incredibly inefficient, of course, but these will be used only a few times per message. Note that one extra restriction is put on quoted printable encoding, not present in the basic encoding: any question marks in the encoded text must be encoded. Otherwise, the sequence <question mark><encoded octet> would be interpreted as the end of the encoded word.
Fast/Flexible Linux OS Recovery
On Demand Now
In this live one-hour webinar, learn how to enhance your existing backup strategies for complete disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible full-system recovery solution for UNIX and Linux systems.
Join Linux Journal's Shawn Powers and David Huffman, President/CEO, Storix, Inc.
Free to Linux Journal readers.Register Now!
- ServersCheck's Thermal Imaging Camera Sensor
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- The Italian Army Switches to LibreOffice
- Linux Mint 18
- Petros Koutoupis' RapidDisk
- Oracle vs. Google: Round 2
- The FBI and the Mozilla Foundation Lock Horns over Known Security Hole
- Ubuntu Online Summit
- Privacy and the New Math
Until recently, IBM’s Power Platform was looked upon as being the system that hosted IBM’s flavor of UNIX and proprietary operating system called IBM i. These servers often are found in medium-size businesses running ERP, CRM and financials for on-premise customers. By enabling the Power platform to run the Linux OS, IBM now has positioned Power to be the platform of choice for those already running Linux that are facing scalability issues, especially customers looking at analytics, big data or cloud computing.
￼Running Linux on IBM’s Power hardware offers some obvious benefits, including improved processing speed and memory bandwidth, inherent security, and simpler deployment and management. But if you look beyond the impressive architecture, you’ll also find an open ecosystem that has given rise to a strong, innovative community, as well as an inventory of system and network management applications that really help leverage the benefits offered by running Linux on Power.Get the Guide