Internet Draft                                             Maynard Kang
draft-ietf-idn-mua-00.txt                                   i-EMAIL.net
February 5, 2001                                
Expires on August 5, 2001                               

          Internationalizing Domain Names in Mail User Agents
 
Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."


     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.



Abstract

This document describes a way in which domain names used in Internet
e-mail can be internationalized by making changes only to end-user Mail
User Agents, thereby avoiding damage to other applications which handle
Internet e-mail, such as Message Transfer Agents and Delivery Agents.

1. Introduction

One of the proposed solutions for internationalized domain names (IDN)
involves updating only the user applications, with no changes required
to the DNS protocol, servers or resolvers [IDNA]. Other solutions, by
contrast, require changes to be made to the protocol, servers,
resolvers and applications.

The underlying principle of [IDNA] may be similarly applied to the
Internet e-mail system today - by effecting changes to only the Mail
User Agent (MUA) component of the e-mail system. Thus, existing
Message Transfer Agents, Delivery Agents and other applications which 
handle e-mail do not have to be changed at all.

1.1 Definitions and Conventions

Terms related to the character encoding model are used as defined in
Unicode Technical Report 17 [UTR17].

The terms "international character", "non-ASCII character" and 
"multilingual character", which are used interchangeably, are taken 
to mean any abstract character which is not included in the range 
specified by [US-ASCII].

1.2 Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
and "MAY" in this document are to be interpreted as described in RFC 
2119 [RFC2119].

1.3. Design Philosophy

As the Internet e-mail system is a diverse, distributed and 
heterogeneous system with many vendors deploying a vast number of 
applications, it is of utmost importance that interoperability amongst 
these various components is maintained. Thus, the ideal solution would 
be one which does not compromise or damage the operation of any of these 
existing components once internationalized domain names are encountered.

Also, solutions which call for changes to be made to many or even all
components of the Internet e-mail system would require far too much
time and effort to deploy, given that Internet e-mail has such a huge
installed base.

This solution adheres to both of the above principles, in that
interoperability is preserved and implementation is cheap and quick.
All that the user has to do to use IDNs in e-mail is update his or her
MUA.

1.4. IDN Summary

This solution specifies an IDN architecture of arch-3 (just send ACE)
and a transition strategy of trans-1 (always do current plus new
architecture) as described in [IDNCOMP]. The choice of ACE format is not 
defined in this document, but MUST be the same as that specified in 
[IDNA] in order to maintain uniqueness and consistency.

1.5. E-mail Internationalization Summary

As many Internet e-mail standards such as the SMTP protocol [RFC821]
and the e-mail message format [RFC822] only specify usage of the 7-bit
ASCII character set [US-ASCII], international characters which use octet-
based character encoding schemes (CES) cannot be used in e-mail 
transmission, headers and bodies.

Although this issue has been addressed in [RFC2045] for message bodies
and [RFC2047] for message headers through the use of a Transfer Encoding
Syntax (TES) such as Quoted-Printable or Base64, there is no similar 
solution which extends the functionality of [RFC821] to include usage of
international characters, except for [RFC1652] which allows transmission 
of 8-bit data passed by the DATA command in an SMTP session.

[RFC1652] however, does not fully address the problem of using IDNs in
an SMTP session - the IDN may be used in areas within the SMTP session 
other than the DATA command, such as the MAIL FROM and RCPT TO commands, 
where an IDN may be part of the e-mail address(es) specified there.

Hence, this would be a major stumbling block to deploying
"just-send-8bit" IDNs for use in Internet e-mail, as [RFC821]
restrictions would prevent these IDNs from being used in SMTP e-mail
transmissions.

2. Architectural Overview

The end-user MUA may encounter IDNs in the scenarios below:

(i)   When specifying the transmission server (i.e. SMTP server)
(ii)  When specifying the retrieval server (i.e. POP3/IMAP4/any other
      retrieval mechanism)
(iii) When specifying e-mail addresses during composition of a message
(iv)  When reading messages with e-mail addresses in them

As with [IDNA], the MUA is updated to process IDNs which are input by
users and IDNs which are displayed to users, in all of the scenarios
above.

For (i) and (ii), the IDN MUST be handled in the same manner as 
specified in [IDNA]. The method of handling an IDN for (iii) and (iv) is
described below in 2.1.

2.1 Interfaces between E-mail components when composing/reading a mail

The interfaces between e-mail components can be pictorially represented 
as shown below.

The example assumes the setup of a POP3/IMAP4 retrieval client and 
server, but the exact nature of end-to-end e-mail transmission may vary
accordingly (e.g. elm or pine would read directly from the mail store). 
However, these variations do not impact an accurate description of this 
solution to a large extent as no changes are required at these levels.

        +------+                                       +------+
        | User |                                       | User |
        +------+                                       +---^--+
          | User Input:          User Display: Characters/ |
          | Keyboard/Pen/etc        Glyphs on CRT or other |
    +-----v---------------+    Representation (e.g. sound) |
    | Input Method Editor |                   +------------|-----+
    +---------------------+                   | Rendering Engine |
        | Input: Any localized/               +---------^--------+
        | internationalized      Output: Any localized/ |
        | charset                     internationalized |
   +----v-----------------+                     charset |
   | +------------------+ |                  +----------|-------------+
   | | Mail Composition | |                  | +--------------+       |
   | | Interface        | | Sender's         | | Mail Reading |       |
   | +------------------+ | MUA              | | Interface    |       |
   |    |                 |                  | +--------^-----+       |
   |    | Nameprepped ACE |       Receiver's |          | Nameprepped |
   |    v                 |              MUA |          | ACE         |
   | +-------------+      |                  | +-------------------+  |
   | | SMTP Client |      |                  | | POP3/IMAP4 Client |  |
   | +-------------+      |                  | +-------------------+  |
   +----|-----------------+                  +----------^-------------+
        | Nameprepped                                   | Nameprepped
        v ACE         Nameprepped       Nameprepped     | ACE
     +-------------+  ACE   +------------+  ACE   +-------------------+
     | SMTP Server | -----> | Mail Store | -----> | POP3/IMAP4 Server |
     +-------------+        +------------+        +-------------------+

2.1.1 Interface between User and Input Method Editor

For ASCII characters, input is straightforward: the user types on the 
keyboard and whichever character is pressed is sent to the 
application.

However, for international characters, the end-user has to use a
script-specific Input Method Editor (IME), which may or may not be
built into the OS, to interpret what the user communicates to the
system and thereafter send the respective international characters to
the application.

For example, for input of Chinese characters, some users use IMEs
which support the "Pinyin" input method. When a user types "zhongguo" 
(in ASCII characters) on the keyboard and selects the characters which
represent "China" (in Chinese) from a list, the IME sends the 
international characters to the application in a user-determined 
charset (e.g. GB2312).

2.1.2 Interface between Input Method Editor and MUA Composition 
      Interface

The MUA mail composition interface (i.e. the "Compose Message"
function of the MUA) SHOULD be able to accept IDNs using 8-bit character 
encoding schemes, including those represented in any localized (e.g. 
GB2312) or internationalized (e.g. UTF-8) charsets.
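
As a minimal sketch of what accepting input in any charset might
involve, the Python fragment below decodes the octets delivered by the
IME into abstract characters, so that later steps need not care which
charset was used. The function name and the sample octets are
illustrative assumptions and are not part of this memo.

```python
def idn_from_input(octets: bytes, charset: str) -> str:
    """Decode IME output in the given charset into abstract characters."""
    return octets.decode(charset)


# "China" in Chinese (U+4E2D U+56FD), as delivered in two different charsets.
as_gb2312 = b"\xd6\xd0\xb9\xfa"
as_utf8 = b"\xe4\xb8\xad\xe5\x9b\xbd"

# Both inputs denote the same two abstract characters.
same = idn_from_input(as_gb2312, "gb2312") == idn_from_input(as_utf8, "utf-8")
```

Working on abstract characters from this point on means that the
nameprep and ACE steps see the same input whichever charset the user
selected.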

This input typically takes place where e-mail addresses are entered
such as the "From", "To", "Cc", "Bcc" fields, amongst others, as IDNs 
may be used at the right-hand-side of the "@" sign in an e-mail address
(domain-parts).

The mail composition interface MAY allow ACE input for the same
reasons as specified in [IDNA], but this is not recommended as ACE is
opaque and ugly.

2.1.3 Interface between MUA Composition Interface and SMTP Client

The MUA composition interface communicates with the SMTP client in the
MUA typically through internal function calls within the software itself
or through an API. It is at this level where ACE conversion of any IDN
encountered by the MUA composition interface takes place.

Before converting the name parts of the IDN into ACE, the MUA MUST
prepare each name part as specified in [NAMEPREP]. Thereafter, the MUA 
MUST convert the name parts into ACE before passing any data to the SMTP
client.
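
The two steps above might be sketched in Python as follows. This memo
does not define the ACE format, so the sketch assumes Python's built-in
"idna" codec as a stand-in: it applies nameprep and then Punycode with
an "xn--" prefix to each name part, which is a later ACE than the ones
under discussion here. The function name is hypothetical.

```python
def to_ace_address(address: str) -> str:
    """Nameprep and ACE-encode the domain-part; leave the local-part alone."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no '@' sign")
    # Stand-in ACE: the "idna" codec handles each name part separately.
    ace_domain = domain.encode("idna").decode("ascii")
    return local + "@" + ace_domain


converted = to_ace_address("user@b\u00fccher.example")
unchanged = to_ace_address("user@example.com")
```

A domain that is already plain ASCII passes through unchanged, so the
conversion can be applied to every address without first checking
whether it contains an IDN.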

The SMTP client then prepares the e-mail for transmission using the
SMTP protocol [RFC821], and thereafter establishes an SMTP connection 
with the user-specified SMTP server to transmit the e-mail.

It is important to note that an IDN specified in the parameters of any
SMTP command MUST be represented in nameprepped ACE at this point. This
includes SMTP commands which require domain parameters (such as the
HELO and EHLO commands) and commands where e-mail addresses are
specified (such as the MAIL FROM, RCPT TO, VRFY, EXPN, SEND, SOML and
SAML commands).
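
For illustration only, the fragment below builds the envelope commands
of an SMTP session with every domain in ACE. It is a sketch under the
same assumption as elsewhere in such examples, namely that Python's
"idna" codec (nameprep plus Punycode) stands in for the ACE chosen by
[IDNA]. The function names are hypothetical, and every address is
assumed to contain an "@" sign.

```python
def ace_domain(domain: str) -> str:
    """Nameprep and ACE-encode every name part (stand-in ACE: Punycode)."""
    return domain.encode("idna").decode("ascii")


def envelope_commands(client_host, sender, recipients):
    """Return the SMTP command lines that carry domains or addresses."""
    def path(address):
        # Only the domain-part is converted; the local-part is untouched.
        local, _, domain = address.rpartition("@")
        return "<" + local + "@" + ace_domain(domain) + ">"

    lines = ["HELO " + ace_domain(client_host), "MAIL FROM:" + path(sender)]
    lines += ["RCPT TO:" + path(rcpt) for rcpt in recipients]
    return lines


commands = envelope_commands(
    "b\u00fccher.example", "a@b\u00fccher.example", ["c@example.org"]
)
```

Every resulting line is pure ASCII, which is what an unmodified
[RFC821] server expects to receive.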

As for data passed by the DATA command, ACE conversion MUST be
performed when the "domain" portion of an "addr-spec" or when a "domain" 
itself, within the context of [RFC822], is encountered. This is 
necessary as an updated MUA may originate a message which is read by a 
non-updated MUA. If this happens, the non-updated MUA may face 
operational problems dealing with IDNs that appear in the "addr-spec" 
which are not in ACE.
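
A rough sketch of this header rewriting, using the address parser in
Python's standard library, is shown below. The function name is
hypothetical, the "idna" codec is again only a stand-in for the ACE
chosen by [IDNA], and headers that do not carry addresses are outside
the scope of the sketch.

```python
from email.utils import formataddr, getaddresses


def ace_address_header(value: str) -> str:
    """Rewrite the domain of every addr-spec in an address header value."""
    rewritten = []
    for name, addr in getaddresses([value]):
        local, sep, domain = addr.rpartition("@")
        if sep:
            # Convert only the "domain" portion of the "addr-spec".
            addr = local + "@" + domain.encode("idna").decode("ascii")
        rewritten.append(formataddr((name, addr)))
    return ", ".join(rewritten)


header = ace_address_header(
    "Maynard Kang <maynard@b\u00fccher.example>, x@example.com"
)
```

Because only the "addr-spec" is touched, the display name is passed
through as it was parsed, and a non-updated MUA that later reads the
message sees ordinary ASCII addresses.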

Any transfer encoding syntax to be applied to the mail headers as
specified in [RFC2047] SHOULD be performed before nameprepped ACE 
conversion. This is to reduce confusion between IDNs within "addr-spec" 
and "domain" portions, in the context of [RFC822], and IDNs which appear 
as arbitrary data in mail headers and bodies.

2.1.4. Interface between POP3/IMAP4 client (or local mail store) and 
       Mail Reading Interface

The MUA mail reading interface (i.e. "Read mail" function of an MUA)
typically displays e-mail data retrieved from either a POP3/IMAP4
client or from a local mail store through internal function calls within 
the MUA software or through an API.

When e-mail containing an ACE-represented IDN is to be displayed, the
MUA SHOULD convert the ACE-represented IDN contained within the
"addr-spec" or "domain" portion specified in [RFC822] back into any 
localized or internationalized charset of the user's choice, whenever 
possible. In the event that it is impossible to achieve conversion back 
into the selected localized charset (for example, conversion of RACE-
represented Hangeul characters into ISO-8859-1 is impossible), the MUA 
should prompt the user with an error message.
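
The reverse conversion, including the failure case, might look like the
following sketch. The function name is hypothetical, and Python's
"idna" codec is assumed as a stand-in for the ACE chosen by [IDNA]. A
real MUA would show the error to the user rather than raise an
exception.

```python
def displayable_domain(ace_domain: str, charset: str) -> str:
    """Decode an ACE domain and check it is representable in the charset."""
    unicode_domain = ace_domain.encode("ascii").decode("idna")
    try:
        unicode_domain.encode(charset)
    except UnicodeEncodeError as exc:
        # This is the point at which the MUA would prompt the user.
        raise ValueError(
            "cannot display " + ace_domain + " in charset " + charset
        ) from exc
    return unicode_domain


shown = displayable_domain("xn--bcher-kva.example", "iso-8859-1")
```

With this check, a domain in Hangeul characters can be displayed when
the selected charset is EUC-KR but produces an error when it is
ISO-8859-1, which matches the example given above.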

It may be possible to save and retrieve information about the original
charset of the ACE-converted IDN through the use of additional
[RFC822] mail headers, but that is not (yet) addressed by this memo.

Although it is possible to render ACE into properly decoded glyphs and
display the actual abstract characters without any conversion to other
charsets, the MUA SHOULD NOT do this as it is not the primary function
of an MUA to render characters. This should be left to a rendering 
engine which is separate from the MUA and typically embedded into the 
OS. It is sufficient for the MUA to pass the appropriate charset to the
rendering engine for proper display.

3. ACE Length Considerations

As [RFC821] in Section 4.5.3 restricts the maximum total length of a
domain name to 64 characters, representation of IDNs using ACE may
pose a potential problem. Most ACEs typically require 3-4 ASCII 
characters to represent one international character (especially in the 
case of CJK characters, where compression is less effective).

That would leave room for only about 16-21 international characters in
the whole IDN, including all name parts and dots. This is highly
undesirable, as words in some languages such as Arabic cannot be
abbreviated and the domain names may therefore need to be longer than
[RFC821] allows.
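
An MUA could check this limit before handing an address to the SMTP
client. The sketch below is one way to do so; the function name is
hypothetical, and the lengths it measures are those of Punycode as
produced by Python's "idna" codec, which is only a stand-in for the ACE
chosen by [IDNA] and may differ in length from it.

```python
RFC821_MAX_DOMAIN = 64  # limit from [RFC821] Section 4.5.3


def fits_rfc821(domain: str) -> bool:
    """True if the ACE form of the domain is within the 64-character limit."""
    try:
        ace = domain.encode("idna")
    except UnicodeError:
        # The codec itself rejects empty or over-long name parts.
        return False
    return len(ace) <= RFC821_MAX_DOMAIN


short_ok = fits_rfc821("b\u00fccher.example")
# Ten two-character Chinese name parts: short on screen, long in ACE.
long_ok = fits_rfc821(".".join(["\u4e2d\u56fd"] * 10))
```

The second domain is only twenty international characters plus dots,
yet its ACE form exceeds the limit, which illustrates how quickly the
budget is used up.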

To further complicate matters, several mailing list software packages
such as ezmlm embed domain names into the local-part of an e-mail
address during management of subscriptions, together with randomly
generated subscription information. This would leave an even smaller
maximum ACE length if interoperability with these packages were to be
maintained, given that there is also a 64-character restriction on
local-parts.

4. Security Considerations

As this memo is based on [IDNA], its security considerations are
similar to those of [IDNA]. These include the security considerations
of [NAMEPREP] as well.

5. Other Considerations

Although this document addresses end-user MUAs (e.g. elm, mutt, pine,
Eudora, Outlook Express, etc) to a large extent, the definition of an
MUA could be extended to include web-based e-mail server software and
automated programs such as mailing list management software.

End-user MUAs may also include additional functionality where IDNs may
be encountered, such as calendaring/scheduling, directory services and
digital certificate storage. This is not (yet) addressed in this memo.

6. Future Extensions

It is possible to achieve internationalization of the entire e-mail
address by representing international characters in the local-parts
of an "addr-spec" using nameprepped ACE conversion, in a fashion
similar to that described in this memo.

However, this is a different problem altogether and is currently beyond
the scope of this memo.

7. References

[IDNA] Paul Hoffman & Patrik Faltstrom, "Internationalizing Host Names
in Applications (IDNA)", draft-ietf-idn-idna.

[UTR17] K. Whistler & M. Davis, Unicode Consortium, "Character Encoding
Model", Unicode Technical Report #17, 
http://www.unicode.org/unicode/reports/tr17/

[US-ASCII] United States of America Standards Institute, "USA Code for 
Information Interchange", X3.4, 1968.

[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.

[IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare.

[RFC821] Jonathan B. Postel, "Simple Mail Transfer Protocol", August 
1982, RFC 821.

[RFC822] David H. Crocker, "Standard for the Format of ARPA Internet 
Text Messages", August 1982, RFC 822.

[RFC2045] N. Freed & N. Borenstein, "Multipurpose Internet Mail 
Extensions (MIME) Part One: Format of Internet Message Bodies", 
November 1996, RFC 2045.

[RFC2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions) 
Part Three: Message Header Extensions for Non-ASCII Text", November 
1996, RFC 2047.

[RFC1652] J. Klensin et al., "SMTP Service Extension for 8bit-
MIMEtransport", July 1994, RFC 1652.


[NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of
Internationalized Host Names", draft-ietf-idn-nameprep.

A. Author's Address

Maynard Kang
i-EMAIL.net Pte Ltd
1 Kim Seng Promenade #12-07
Great World City West Tower
Singapore 237994
E-mail: maynard@i-email.net