Unicode::MapUTF8(3pm) User Contributed Perl DocumentationUnicode::MapUTF8(3pm)

NAME
       Unicode::MapUTF8 - Conversions to and from arbitrary character sets and
       UTF8

SYNOPSIS
        use Unicode::MapUTF8 qw(to_utf8 from_utf8
                                utf8_supported_charset utf8_charset_alias);

        # Convert a string in 'ISO-8859-1' to 'UTF8'
        my $output = to_utf8({ -string => 'An example', -charset => 'ISO-8859-1' });

        # Convert a string in 'UTF8' encoding to encoding 'ISO-8859-1'
        my $other  = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });

        # List available character set encodings
        my @character_sets = utf8_supported_charset;

        # Add a character set alias
        utf8_charset_alias({ 'ms-japanese' => 'sjis' });

        # Convert between two arbitrary (but largely compatible) charset encodings
        # (SJIS to EUC-JP)
        my $utf8_string   = to_utf8({ -string => $sjis_string, -charset => 'sjis' });
        my $euc_jp_string = from_utf8({ -string => $utf8_string, -charset => 'euc-jp' });

        # Verify that a specific character set is supported
        if (utf8_supported_charset('ISO-8859-1')) {
            # Yes
        }

DESCRIPTION
       Provides an adapter layer over existing routines for converting between
       UTF8 and other encodings. In essence, it gives several existing Unicode
       modules a single common interface, so you do not need to know the
       underlying implementations to do simple conversions between UTF8 and
       other character set encodings. To that end, it wraps the
       Unicode::String, Unicode::Map8, Unicode::Map and Jcode modules in a
       standardized and simple API.

       It also supports general character set conversion by using UTF8 as a
       pivot: any two compatible, supported character sets can be converted
       by chaining two steps, to_utf8 from the source charset followed by
       from_utf8 to the target charset.
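       On Perl 5.8 and later, where this page recommends the core Encode
       module, the same two-step pivot can be sketched without any of the
       backend modules. This is an illustration with Encode, not how
       Unicode::MapUTF8 itself works: decode() turns the source bytes into a
       Perl character string (standing in for the UTF8 result of to_utf8) and
       encode() produces the target bytes (standing in for from_utf8). The
       sample text and the Encode names 'shiftjis' and 'euc-jp' are choices
       made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sample text chosen for this sketch: the two kanji for "Japan"
# (U+65E5 U+672C), held as a Perl character string.
my $text = "\x{65E5}\x{672C}";

# Pretend this byte string arrived in Shift_JIS.
my $sjis_bytes = encode('shiftjis', $text);

# Step 1: source charset bytes -> Unicode characters
# (the role to_utf8 plays in Unicode::MapUTF8).
my $unicode = decode('shiftjis', $sjis_bytes);

# Step 2: Unicode characters -> target charset bytes
# (the role from_utf8 plays in Unicode::MapUTF8).
my $euc_bytes = encode('euc-jp', $unicode);

# Show both byte sequences in hex.
printf "SJIS:   %s\n", unpack('H*', $sjis_bytes);
printf "EUC-JP: %s\n", unpack('H*', $euc_bytes);
```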

       As with most things in Perl, throughput is much higher if you pass a
       few large chunks of text rather than many small ones, because the
       fixed cost of each call is paid fewer times.

       By design, it can easily be extended to support new character set
       conversion modules as they become available.

       This module is intended to provide good Unicode support to versions of
       Perl prior to 5.8. If you are using Perl 5.8.0 or later, you probably
       want to be using the Encode module instead. This module does work with
       Perl 5.8, but Encode is the preferred method in that environment.
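       For readers moving to Encode, the following sketch shows rough
       counterparts of the support check and alias functions documented
       under FUNCTIONS. It assumes Perl 5.8 or later with the core Encode and
       Encode::Alias modules: find_encoding() returning undef plays the part
       of a false utf8_supported_charset(), and define_alias() plays the part
       of utf8_charset_alias(). The alias name 'ms-japanese' is taken from
       the SYNOPSIS; Encode's own name for Shift_JIS is 'shiftjis'.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(find_encoding);
use Encode::Alias;    # exports define_alias()

# Rough counterpart of utf8_supported_charset('ISO-8859-1'):
# find_encoding() returns an encoding object, or undef if unknown.
my $latin1 = find_encoding('ISO-8859-1');
print 'ISO-8859-1 canonical name: ', $latin1->name, "\n" if $latin1;

# Rough counterpart of utf8_charset_alias({ 'ms-japanese' => 'sjis' }).
define_alias('ms-japanese' => 'shiftjis');
my $sjis = find_encoding('ms-japanese');
print 'ms-japanese resolves to: ', $sjis->name, "\n" if $sjis;

# Rough counterpart of the list form of utf8_supported_charset.
# ':all' also includes encodings whose modules are not loaded yet.
my @all = Encode->encodings(':all');
print scalar(@all), " encodings available\n";
```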

CHANGES
       1.14 2020.09.27   Fixing POD breakage in EUC-JP version of POD

       1.13 2020.09.27   Fixing MANIFEST.SKIP error

       1.12 2020.09.27   Build tool updates. Maintainer updates. POD error
                         fixes. Relicensed under MIT license.

       1.11 2005.10.10   Documentation changes. Addition of Build.PL support.
                         Added various build tests, LICENSE,
                         Artistic_License.txt, GPL_License.txt. Split
                         documentation into separate .pod file. Added
                         Japanese translation of POD.

       1.10 2005.05.22   Fixed bug in conversion of ISO-2022-JP to UTF-8.
                         Problem and fix found by Masahiro HONMA
                         <masahiro.honma@tsutaya.co.jp>.

                         Similar bugs in conversions of shift_jis and euc-jp
                         to UTF-8 corrected as well.

       1.09 2001.08.22   Fixed multiple typo occurrences of 'uft' where
                         'utf' was meant in code. Problem affected utf16
                         and utf7 encodings. Problem found by devon smith
                         <devon@taller.PSCL.cwru.edu>.

       1.08 2000.11.06   Added 'utf8_charset_alias' function to allow for
                         runtime setting of character set aliases. Added
                         several alternate names for 'sjis' (shiftjis,
                         shift-jis, shift_jis, s-jis, and s_jis).

                         Corrected 'croak' messages for 'from_utf8'
                         functions to appropriate function name.

                         Corrected fatal problem in jcode-unicode internals.
                         Problem and fix found by Brian Wisti
                         <wbrian2@uswest.net>.

       1.07 2000.11.01   Added 'croak' to use Carp declaration to fix error
                         messages. Problem and fix found by
                         <wbrian2@uswest.net>.

       1.06 2000.10.30   Fix to handle change in stringification of
                         overloaded objects between Perl 5.005 and 5.6.
                         Problem noticed by Brian Wisti <wbrian2@uswest.net>.

       1.05 2000.10.23   Error in conversions from UTF8 to multibyte
                         encodings corrected.

       1.04 2000.10.23   Additional diagnostic error messages added for
                         internal errors.

       1.03 2000.10.22   Bug fix for load time Unicode::Map encoding
                         detection.

       1.02 2000.10.22   Bug fix to 'from_utf8' method and load time
                         detection of Unicode::Map8 supported character
                         set encodings.

       1.01 2000.10.02   Initial public release.

FUNCTIONS
       utf8_charset_alias({ $alias => $charset });
           Used for runtime assignment of character set aliases.

           Called with no parameters, returns a reference to a hash that maps
           each defined alias to the character set it stands for.

           Example:

             my $aliases     = utf8_charset_alias;
             my @alias_names = keys %$aliases;

           If called with ONE parameter, returns the name of the 'real'
           charset if the alias is defined. Returns undef if it is not found
           in the aliases.

           Example:

               if (! utf8_charset_alias('VISCII')) {
                   # No alias for this
               }

           If called with a hash reference of 'alias' => 'charset' pairs,
           defines those aliases for use.

           Example:

               utf8_charset_alias({ 'japanese' => 'sjis', 'japan' => 'sjis' });

           Note: it will croak if a pair does not map to one of the
           predefined character set encodings. An alias may NOT point to
           another alias.

           Multiple character set aliases can be set with a single call.

           To clear an alias, pass a character set mapping of undef.

           Example:

               utf8_charset_alias({ 'japanese' => undef });

           While an alias is set, the 'utf8_supported_charset' function will
           return the alias as if it were a predefined charset.

           Defining an alias with the same name as a predefined character
           encoding overrides that encoding and prints a warning to STDERR.

       utf8_supported_charset($charset_name);
           Returns true if the named charset is supported (including user
           defined aliases).

           Returns false if it is not.

           Example:

               if (! utf8_supported_charset('VISCII')) {
                   # No support yet
               }

           If called in a list context with no parameters, it will return a
           list of all supported character set names (including user defined
           aliases).

           Example:

               my @charsets = utf8_supported_charset;

       to_utf8({ -string => $string, -charset => $source_charset });
           Returns the string converted to UTF8 from the specified source
           charset.

       from_utf8({ -string => $string, -charset => $target_charset});
           Returns the string converted from UTF8 to the specified target
           charset.
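           The calling convention can be illustrated with two hypothetical
           Encode-backed stand-ins, sketch_to_utf8 and sketch_from_utf8,
           which take the same { -string, -charset } hash reference. They
           are not the module's implementation; they assume, as the
           descriptions above suggest, that both functions take and return
           byte strings, and croaking on an unknown charset name is an
           assumed error policy for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Carp qw(croak);
use Encode qw(encode decode find_encoding);

# HYPOTHETICAL stand-ins, named to avoid clashing with the real
# Unicode::MapUTF8 exports. Like to_utf8/from_utf8 they take a hashref
# with -string and -charset keys and work on byte strings.
sub sketch_to_utf8 {
    my ($args)  = @_;
    my $charset = $args->{-charset};
    croak "unsupported charset '$charset'" unless find_encoding($charset);
    # bytes in $charset -> Perl characters -> UTF-8 bytes
    return encode('UTF-8', decode($charset, $args->{-string}));
}

sub sketch_from_utf8 {
    my ($args)  = @_;
    my $charset = $args->{-charset};
    croak "unsupported charset '$charset'" unless find_encoding($charset);
    # UTF-8 bytes -> Perl characters -> bytes in $charset
    return encode($charset, decode('UTF-8', $args->{-string}));
}

my $latin1 = "caf\xE9";    # "cafe" with e-acute, as ISO-8859-1 bytes
my $utf8   = sketch_to_utf8({ -string => $latin1, -charset => 'ISO-8859-1' });
my $back   = sketch_from_utf8({ -string => $utf8, -charset => 'ISO-8859-1' });

# prints "636166e9 -> 636166c3a9"
printf "%s -> %s\n", unpack('H*', $latin1), unpack('H*', $utf8);
```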

VERSION
       1.14 2020.09.27

TODO
       Regression tests for Jcode, 2-byte encodings and encoding aliases

SEE ALSO
       Unicode::String, Unicode::Map8, Unicode::Map, Jcode, Encode

COPYRIGHT
       Copyright 2000-2020, Jerilyn Franz. All rights reserved.

AUTHOR
       Jerilyn Franz <cpan@jerilyn.info>

LICENSE
       MIT License

       Copyright (c) 2020 Jerilyn Franz

       Permission is hereby granted, free of charge, to any person obtaining a
       copy of this software and associated documentation files (the
       "Software"), to deal in the Software without restriction, including
       without limitation the rights to use, copy, modify, merge, publish,
       distribute, sublicense, and/or sell copies of the Software, and to
       permit persons to whom the Software is furnished to do so, subject to
       the following conditions:

       The above copyright notice and this permission notice shall be included
       in all copies or substantial portions of the Software.

       THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
       OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
       MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
       IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
       CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
       TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
       SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

perl v5.30.3                      2020-09-29             Unicode::MapUTF8(3pm)
