MIME::Charset(3pm)    User Contributed Perl Documentation   MIME::Charset(3pm)

NAME
       MIME::Charset - Charset Information for MIME

SYNOPSIS
           use MIME::Charset;

           $charset = MIME::Charset->new("euc-jp");

       Getting charset information:

           $benc = $charset->body_encoding; # e.g. "Q"
           $cset = $charset->as_string; # e.g. "US-ASCII"
           $henc = $charset->header_encoding; # e.g. "S"
           $cset = $charset->output_charset; # e.g. "ISO-2022-JP"

       Translating text data:

           ($text, $cset, $encoding) =
               $charset->header_encode(
                  "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa".
                  "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef");
           # ...returns e.g. (<converted>, "ISO-2022-JP", "B").

           $latin1 = MIME::Charset->new("latin1");
           ($text, $cset, $encoding) =
               $latin1->body_encode(
                   "Collectioneur path\xe9tiquement ".
                   "\xe9clectique de d\xe9chets");
           # ...returns e.g. (<original>, "ISO-8859-1", "QUOTED-PRINTABLE").

           $utf8 = MIME::Charset->new("utf-8");
           $len = $utf8->encoded_header_len(
               "Perl\xe8\xa8\x80\xe8\xaa\x9e", "B");
           # ...returns e.g. 28.

       Manipulating module defaults:

           MIME::Charset::alias("csEUCKR", "euc-kr");
           MIME::Charset::default("iso-8859-1");
           MIME::Charset::fallback("us-ascii");

       Non-OO functions (may be deprecated in the near future):

           use MIME::Charset qw(:info);

           $benc = body_encoding("iso-8859-2"); # "Q"
           $cset = canonical_charset("ANSI X3.4-1968"); # "US-ASCII"
           $henc = header_encoding("utf-8"); # "S"
           $cset = output_charset("shift_jis"); # "ISO-2022-JP"

           use MIME::Charset qw(:trans);

           ($text, $charset, $encoding) =
               header_encode(
                  "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa".
                  "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef",
                  "euc-jp");
           # ...returns (<converted>, "ISO-2022-JP", "B");

           ($text, $charset, $encoding) =
               body_encode(
                   "Collectioneur path\xe9tiquement ".
                   "\xe9clectique de d\xe9chets",
                   "latin1");
           # ...returns (<original>, "ISO-8859-1", "QUOTED-PRINTABLE");

           $len = encoded_header_len(
               "Perl\xe8\xa8\x80\xe8\xaa\x9e", "b", "utf-8"); # 28

DESCRIPTION
       MIME::Charset provides information about character sets used for MIME
       messages on the Internet.

   Definitions
       A charset, or ``character set'' as the term is used in MIME, refers
       to a method of converting a sequence of octets into a sequence of
       characters.  It covers both the ISO/IEC concepts of ``coded character
       set'' (CCS) and ``character encoding scheme'' (CES).

       An encoding, as the term is used in MIME, is a method of representing
       a body part or a header body as one or more sequences of printable
       US-ASCII characters.
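       As a minimal illustration of the two terms, the sketch below uses only
       the core Encode and MIME::Base64 modules (it does not call
       MIME::Charset): the UTF-8 charset turns five octets into four
       characters, while the BASE64 encoding represents the same octets as
       printable US-ASCII.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(encode_base64);

# Five octets: "caf" followed by the UTF-8 encoding of U+00E9.
my $octets = "caf\xc3\xa9";

# Charset: a method of converting octets into characters.
my $chars = decode('UTF-8', $octets);

# Encoding: a method of representing the octets as printable US-ASCII.
# The empty second argument suppresses the trailing newline.
my $printable = encode_base64($octets, '');

printf "%d octets, %d characters, encoded as %s\n",
    length $octets, length $chars, $printable;
```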

   Constructor
       $charset = MIME::Charset->new([CHARSET [, OPTS]])
           Create a charset object.

           OPTS may contain the following key-value pair.  NOTE: When
           Unicode/multibyte support is disabled (see "USE_ENCODE"), no
           conversion is performed, so this option has no effect.

           Mapping => MAPTYPE
               Whether to use mappings extended to cover charset names as
               they are actually used.  "EXTENDED" uses the extended
               mappings; "STANDARD" uses the standardized, strict mappings.
               Default is "EXTENDED".

   Getting Information about Charsets
       $charset->body_encoding
       body_encoding CHARSET
           Get the recommended transfer-encoding of CHARSET for message
           bodies.

           The returned value is one of "B" (BASE64), "Q" (QUOTED-PRINTABLE),
           "S" (whichever of the two is shorter) or "undef" (need not be
           transfer-encoded; either 7BIT or 8BIT).  This may differ from the
           encoding for message headers.
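           The "S" value can be read as "whichever of BASE64 and
           QUOTED-PRINTABLE produces less output".  The following sketch makes
           that comparison with the core MIME::Base64 and MIME::QuotedPrint
           modules.  The helper shorter_body_encoding is hypothetical and not
           part of MIME::Charset, which may apply its own rules.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);
use MIME::QuotedPrint qw(encode_qp);

# Hypothetical helper: resolve "S" by encoding the body both ways and
# returning the name of the shorter result (ties go to BASE64).
sub shorter_body_encoding {
    my ($octets) = @_;
    my $b_len = length encode_base64($octets);
    my $q_len = length encode_qp($octets);
    return $q_len < $b_len ? 'QUOTED-PRINTABLE' : 'BASE64';
}

# Mostly ASCII text stays short under quoted-printable...
print shorter_body_encoding("Hello, world\n"), "\n";
# ...while data made only of 8-bit bytes is shorter in base64.
print shorter_body_encoding("\xe8\xa8\x80\xe8\xaa\x9e"), "\n";
```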

       $charset->as_string
       canonical_charset CHARSET
           Get the canonical name of the charset.

       $charset->decoder
           Get an "Encode::Encoding" object that decodes strings in the
           charset to Unicode.  If the charset is not specified or is unknown
           to this module, undef is returned.

       $charset->dup
           Get a copy of the charset object.

       $charset->encoder([CHARSET])
           Get an "Encode::Encoding" object that encodes Unicode strings into
           the compatible charset recommended for messages on the Internet.

           If the optional CHARSET is specified, the encoder (and output
           charset name) of the $charset object are replaced with those of
           CHARSET, so that the $charset object becomes a converter from the
           original charset to CHARSET.
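           Since the decoder and encoder are "Encode::Encoding" objects, a
           converter amounts to a decode from the input charset followed by an
           encode into the output charset.  The sketch below obtains such
           objects directly with Encode::find_encoding rather than through
           MIME::Charset; the UTF-8 to ISO-2022-JP pair is only an example.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Encode::Encoding objects for the input and the output charsets.
my $decoder = find_encoding('UTF-8')       or die "unknown charset";
my $encoder = find_encoding('iso-2022-jp') or die "unknown charset";

# "Perl" followed by two Japanese characters, as UTF-8 octets.
my $input = "Perl\xe8\xa8\x80\xe8\xaa\x9e";

# Convert: input octets -> Unicode characters -> output octets.
my $unicode = $decoder->decode($input);
my $output  = $encoder->encode($unicode);

# ISO-2022-JP uses escape sequences, so the result is pure 7-bit data.
printf "%d characters; output is %s\n",
    length $unicode, ($output =~ /[^\x00-\x7f]/ ? '8-bit' : '7-bit');
```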

       $charset->header_encoding
       header_encoding CHARSET
           Get the recommended encoding scheme of CHARSET for message headers.

           The returned value is one of "B", "Q", "S" (whichever of the two
           is shorter) or "undef" (need not be encoded).  This may differ
           from the encoding for message bodies.

       $charset->output_charset
       output_charset CHARSET
           Get a charset that is compatible with the given CHARSET and is
           recommended for MIME messages on the Internet (if this module
           knows one).

           When Unicode/multibyte support is disabled (see "USE_ENCODE"), this
           function will simply return the result of "canonical_charset".

   Translating Text Data
       $charset->body_encode(STRING [, OPTS])
       body_encode STRING, CHARSET [, OPTS]
           Get the data of STRING, converted if needed, and the recommended
           transfer-encoding of that data for a message body.  CHARSET is the
           charset in which STRING is encoded.

           OPTS may contain the following key-value pairs.  NOTE: When
           Unicode/multibyte support is disabled (see "USE_ENCODE"), no
           conversion is performed, so these options have no effect.

           Detect7bit => YESNO
               Try to auto-detect a 7-bit charset when CHARSET is not given.
               Default is "YES".

           Replacement => REPLACEMENT
               Specifies error handling scheme.  See "Error Handling".

           A 3-item list of (converted string, output charset,
           transfer-encoding) is returned.  The transfer-encoding is one of
           "BASE64", "QUOTED-PRINTABLE", "7BIT" or "8BIT".  If the output
           charset could not be determined and the converted string contains
           non-ASCII byte(s), the output charset is "undef" and the
           transfer-encoding is "BASE64".  The output charset is "US-ASCII"
           if and only if the string does not contain any non-ASCII bytes.

       $charset->decode(STRING [,CHECK])
           Decode STRING to Unicode.

           Note: When Unicode/multibyte support is disabled (see
           "USE_ENCODE"), this function will die.

       detect_7bit_charset STRING
           Guess the 7-bit charset in which STRING may be encoded.  If STRING
           contains any 8-bit bytes, "undef" is returned.  Otherwise, if the
           charset cannot be identified, the default charset (see "default")
           is returned.
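           A rough sketch of this behavior, assuming detection by the
           ISO-2022-JP escape sequences ESC $ @ and ESC $ B only.  The
           function detect_7bit_charset_sketch is hypothetical; the real
           detect_7bit_charset may recognize other 7-bit charsets and use
           different rules.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented contract of detect_7bit_charset.
# ASSUMPTION: only the ISO-2022-JP designations ESC $ @ and ESC $ B are
# recognized; any other 7-bit string falls back to the default charset.
sub detect_7bit_charset_sketch {
    my ($string, $default) = @_;
    $default //= 'US-ASCII';    # documented default of "default"

    return undef if $string =~ /[^\x00-\x7f]/;    # any 8-bit byte
    return 'ISO-2022-JP' if $string =~ /\e\$[\@B]/;
    return $default;
}

for my $s ("plain text", "Perl\e\$B8\@8l\e(B", "caf\xe9") {
    my $cs = detect_7bit_charset_sketch($s);
    print defined $cs ? $cs : 'undef', "\n";
}
```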

       $charset->encode(STRING [, CHECK])
           Encode STRING (Unicode or non-Unicode) using the compatible charset
           recommended for messages on the Internet (if this module knows
           one).  Note that the string is decoded to Unicode and then
           encoded, even if the compatible charset is the same as the
           original charset.

           Note: When Unicode/multibyte support is disabled (see
           "USE_ENCODE"), this function will die.

       $charset->encoded_header_len(STRING [, ENCODING])
       encoded_header_len STRING, ENCODING, CHARSET
           Get the length of STRING when encoded for a message header
           (without folding).

           ENCODING may be one of "B", "Q" or "S" (whichever of "B" and "Q"
           gives the shorter result).
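           For a single encoded-word, the unfolded length is the length of
           "=?CHARSET?ENCODING?encoded-text?=".  The sketch below builds such a
           word with MIME::Base64 for "B" and a small hand-written encoder for
           "Q", restricted per RFC 2047 section 5 (3) as noted under
           "Incompatible Changes".  The helpers q_encode, encoded_word and
           encoded_word_len are hypothetical, and the output charset name is
           passed in rather than looked up.  For the SYNOPSIS string with
           UTF-8 and "B" it yields 28, matching the value shown there.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);
use List::Util qw(min);

# Q encoding restricted per RFC 2047 section 5 (3): letters, digits and
# "!*+-/" stay literal, space becomes "_", all other bytes become =XX.
sub q_encode {
    my ($octets) = @_;
    (my $q = $octets) =~ s{([^A-Za-z0-9!*+\-/ ])}{sprintf '=%02X', ord $1}ge;
    $q =~ tr/ /_/;
    return $q;
}

# Hypothetical helper: build one encoded-word (no folding).
sub encoded_word {
    my ($octets, $charset, $enc) = @_;
    my $text = uc($enc) eq 'B' ? encode_base64($octets, '') : q_encode($octets);
    return sprintf '=?%s?%s?%s?=', $charset, uc $enc, $text;
}

# Hypothetical helper: length of the encoded-word; "S" takes the shorter.
sub encoded_word_len {
    my ($octets, $charset, $enc) = @_;
    return min(map { length encoded_word($octets, $charset, $_) } qw(B Q))
        if uc($enc) eq 'S';
    return length encoded_word($octets, $charset, $enc);
}

my $s = "Perl\xe8\xa8\x80\xe8\xaa\x9e";
for my $enc (qw(B Q S)) {
    printf "%s: %d\n", $enc, encoded_word_len($s, 'UTF-8', $enc);
}
# B: 28, Q: 34, S: 28
```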

       $charset->header_encode(STRING [, OPTS])
       header_encode STRING, CHARSET [, OPTS]
           Get the data of STRING, converted if needed, and the recommended
           encoding scheme of that data for message headers.  CHARSET is the
           charset in which STRING is encoded.

           OPTS may contain the following key-value pairs.  NOTE: When
           Unicode/multibyte support is disabled (see "USE_ENCODE"), no
           conversion is performed, so these options have no effect.

           Detect7bit => YESNO
               Try to auto-detect a 7-bit charset when CHARSET is not given.
               Default is "YES".

           Replacement => REPLACEMENT
               Specifies error handling scheme.  See "Error Handling".

           A 3-item list of (converted string, output charset, encoding
           scheme) is returned.  The encoding scheme is one of "B", "Q" or
           "undef" (need not be encoded).  If the output charset could not be
           determined and the converted string contains non-ASCII byte(s),
           the output charset is "8BIT" (not a charset name but a special
           value representing unencodable data) and the encoding scheme is
           "undef" (should not be encoded).  The output charset is
           "US-ASCII" if and only if the string does not contain any
           non-ASCII bytes.

       $charset->undecode(STRING [,CHECK])
           Encode the Unicode string STRING into a byte string using the
           input charset of $charset.  This is equivalent to
           "$charset->decoder->encode()".

           Note: When Unicode/multibyte support is disabled (see
           "USE_ENCODE"), this function will die.

   Manipulating Module Defaults
       alias ALIAS [, CHARSET]
           Get/set charset aliases for the canonical names determined by
           "canonical_charset".

           If CHARSET is given and isn't false, ALIAS is assigned as an alias
           of CHARSET.  Otherwise, the alias is not changed.  In both cases,
           the charset name to which ALIAS is currently assigned is returned.

       default [CHARSET]
           Get/set default charset.

           The default charset is used by this module when the charset
           context is unknown.  Modules using this module are recommended to
           use this charset when the charset context is unknown or an
           implicit default is expected.  By default, it is "US-ASCII".

           If CHARSET is given and isn't false, it becomes the default
           charset.  Otherwise, the default charset is not changed.  In both
           cases, the current default charset is returned.

           NOTE: The default charset should not be changed.

       fallback [CHARSET]
           Get/set fallback charset.

           The fallback charset is used by this module when conversion with
           the given charset fails and the "FALLBACK" error handling scheme is
           specified.  Modules using this module may use this charset as the
           last-resort charset for conversion.  By default, it is "UTF-8".

           If CHARSET is "NONE", the fallback charset becomes undefined.
           Otherwise, if CHARSET is given and isn't false, it becomes the
           fallback charset.  If CHARSET is omitted or false, the fallback
           charset is not changed.  In all cases, the current fallback
           charset is returned.

           NOTE: Specifying "US-ASCII" as the fallback charset can be useful,
           since the result of conversion is then readable without charset
           information.
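           The get/set contract of "fallback" can be sketched as follows.
           fallback_sketch is a hypothetical stand-in that keeps its value in
           a private variable; it does not touch MIME::Charset's own setting,
           and it stores names as given without canonicalizing them.

```perl
use strict;
use warnings;

{
    my $fallback = 'UTF-8';    # documented initial value

    # Hypothetical stand-in for MIME::Charset::fallback.
    sub fallback_sketch {
        my ($charset) = @_;
        if ($charset) {                      # omitted or false: no change
            $fallback = $charset eq 'NONE' ? undef : $charset;
        }
        return $fallback;                    # always the current value
    }
}

print fallback_sketch() // '(undef)', "\n";             # UTF-8
print fallback_sketch('US-ASCII') // '(undef)', "\n";   # US-ASCII
print fallback_sketch('NONE') // '(undef)', "\n";       # (undef)
```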

       recommended CHARSET [, HEADERENC, BODYENC [, ENCCHARSET]]
           Get/set charset profiles.

           If optional arguments are given and any of them is not false, the
           profiles for CHARSET are set from those arguments.  Otherwise, the
           profiles are not changed.  In both cases, the current profiles for
           CHARSET are returned as a 3-item list of (HEADERENC, BODYENC,
           ENCCHARSET).

           HEADERENC is the recommended encoding scheme for message headers.
           It may be one of "B", "Q", "S" (whichever of the two is shorter) or
           "undef" (need not be encoded).

           BODYENC is the recommended transfer-encoding for message bodies.
           It may be one of "B", "Q", "S" (whichever of the two is shorter) or
           "undef" (need not be transfer-encoded).

           ENCCHARSET is a charset that is compatible with the given CHARSET
           and is recommended for MIME messages on the Internet.  If
           conversion is not needed (or this module doesn't know an
           appropriate charset), ENCCHARSET is "undef".

           NOTE: In future releases this function may accept more optional
           arguments (for example, properties to handle character widths,
           line folding behavior, ...), so the format of the returned value
           may change.  Use "header_encoding", "body_encoding" or
           "output_charset" to get a particular profile.

   Constants
       USE_ENCODE
           Unicode/multibyte support flag.  It is set to a non-empty string
           when Unicode and multibyte support is enabled.  Currently, this
           flag is non-empty on Perl 5.7.3 or later and an empty string on
           earlier versions of Perl.

   Error Handling
       "body_encode" and "header_encode" accept the following "Replacement"
       options:

       "DEFAULT"
           Put a substitution character in place of a malformed character.
           For UCM-based encodings, <subchar> will be used.

       "FALLBACK"
           Try the "DEFAULT" scheme using the fallback charset (see
           "fallback").  When the fallback charset is undefined and
           conversion causes an error, the code dies with an error message.

       "CROAK"
           The code dies immediately on error with an error message.
           Therefore, you should trap the fatal error with eval{} unless you
           really want it to die.  A synonym is "STRICT".

       "PERLQQ"
       "HTMLCREF"
       "XMLCREF"
           Use "FB_PERLQQ", "FB_HTMLCREF" or "FB_XMLCREF" scheme defined by
           Encode module.

       numeric values
           Numeric values are also allowed.  For more details see "Handling
           Malformed Data" in Encode.

       If no error handling scheme is specified, or an unknown scheme is
       specified, "DEFAULT" is assumed.
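       The named schemes correspond to Encode's fallback constants, so their
       effect can be previewed with Encode alone.  The sketch below encodes a
       string that has no US-ASCII representation under each scheme; it calls
       Encode::encode directly rather than "body_encode" or "header_encode",
       so the output of those functions may differ in detail.

```perl
use strict;
use warnings;
use Encode qw(encode FB_DEFAULT FB_CROAK FB_PERLQQ FB_HTMLCREF FB_XMLCREF);

my $text = "caf\x{e9}";    # U+00E9 has no US-ASCII equivalent

my %scheme = (
    DEFAULT  => FB_DEFAULT,
    PERLQQ   => FB_PERLQQ,
    HTMLCREF => FB_HTMLCREF,
    XMLCREF  => FB_XMLCREF,
);

my %result;
for my $name (sort keys %scheme) {
    my $copy = $text;    # encode() may modify its input when CHECK is set
    $result{$name} = encode('US-ASCII', $copy, $scheme{$name});
    print "$name: $result{$name}\n";
}

# CROAK (synonym STRICT) dies on the first error, so trap it with eval.
my $copy = $text;
my $ok = eval { encode('US-ASCII', $copy, FB_CROAK); 1 };
print 'CROAK: ', ($ok ? "no error\n" : "died: $@");
```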

   Configuration File
       Built-in defaults for option parameters can be overridden by the
       configuration file MIME/Charset/Defaults.pm.  For more details, read
       MIME/Charset/Defaults.pm.sample.

VERSION
       Consult the $VERSION variable.

       Development versions of this module may be found at
       <http://hatuka.nezumi.nu/repos/MIME-Charset/>.

   Incompatible Changes
       Release 1.001
           •   The new() method returns an object when the CHARSET argument
               is not specified.

       Release 1.005
           •   Characters in encoded-words are restricted according to RFC
               2047 section 5 (3).  This also affects the return value of the
               encoded_header_len() method.

       Release 1.008.2
           •   The body_encoding() method may also return "S".

           •   The return value of the body_encode() method for UTF-8 may
               include the "QUOTED-PRINTABLE" encoding item, which in earlier
               versions was always "BASE64".

SEE ALSO
       Multipurpose Internet Mail Extensions (MIME).

AUTHOR
       Hatuka*nezumi - IKEDA Soji <hatuka(at)nezumi.nu>

COPYRIGHT
       Copyright (C) 2006-2017 Hatuka*nezumi - IKEDA Soji.  This program is
       free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.

perl v5.34.0                      2022-10-13                MIME::Charset(3pm)
