Class _codecs

java.lang.Object
org.python.modules._codecs

public class _codecs extends Object
This class corresponds to the Python _codecs module, which in turn lends its functions to the codecs module (in Lib/codecs.py). It exposes the implementing functions of several codec families called out in the Python codecs library Lib/encodings/*.py, where they are usually described as bound "as C functions". In this context, "C" is best read as "compiled" rather than as a dependence on a particular implementation language. The actual transcoding methods often come from the related codecs class.
  • Constructor Details

    • _codecs

      public _codecs()
  • Method Details

    • register

      public static void register(PyObject search_function)
    • lookup

      public static PyTuple lookup(PyString encoding)
    • lookup_error

      public static PyObject lookup_error(PyString handlerName)
    • register_error

      public static void register_error(String name, PyObject errorHandler)
    • decode

      public static PyObject decode(PyString bytes)
      Decode bytes using the system default encoding (see codecs.getDefaultEncoding()). Decoding errors raise a ValueError.
      Parameters:
      bytes - to be decoded
      Returns:
      Unicode string decoded from bytes
    • decode

      public static PyObject decode(PyString bytes, PyString encoding)
      Decode bytes using the codec registered for the encoding. The encoding defaults to the system default encoding (see codecs.getDefaultEncoding()). Decoding errors raise a ValueError.
      Parameters:
      bytes - to be decoded
      encoding - name of encoding (to look up in codec registry)
      Returns:
      Unicode string decoded from bytes
    • decode

      public static PyObject decode(PyString bytes, PyString encoding, PyString errors)
      Decode bytes using the codec registered for the encoding. The encoding defaults to the system default encoding (see codecs.getDefaultEncoding()). The string errors may name a different error handling policy (built-in or registered with register_error(String, PyObject) ). The default error policy is 'strict' meaning that decoding errors raise a ValueError.
      Parameters:
      bytes - to be decoded
      encoding - name of encoding (to look up in codec registry)
      errors - error policy name (e.g. "ignore")
      Returns:
      Unicode string decoded from bytes
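      The effect of the errors argument can be illustrated outside Jython with the JDK's own charset API. The following is a minimal sketch, not the implementation behind this method: the class ErrorPolicySketch and its decodeUtf8 helper are hypothetical, UTF-8 stands in for whichever codec is registered, and IllegalArgumentException stands in for Python's ValueError.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; it does not call org.python.modules._codecs.
final class ErrorPolicySketch {

    /** Decode UTF-8 bytes under a policy named like Python's "strict", "ignore" or "replace". */
    static String decodeUtf8(byte[] bytes, String errors) {
        String policy = (errors == null) ? "strict" : errors; // null is treated as "strict"
        CodingErrorAction action;
        switch (policy) {
            case "ignore":
                action = CodingErrorAction.IGNORE;   // drop the offending bytes
                break;
            case "replace":
                action = CodingErrorAction.REPLACE;  // substitute U+FFFD
                break;
            case "strict":
                action = CodingErrorAction.REPORT;   // fail on the first error
                break;
            default:
                throw new IllegalArgumentException("unknown error policy: " + policy);
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Assumption: an unchecked exception plays the role of ValueError here.
            throw new IllegalArgumentException("cannot decode input", e);
        }
    }

    public static void main(String[] args) {
        byte[] bad = {'a', (byte) 0xFF, 'b'}; // 0xFF can never occur in valid UTF-8
        System.out.println(decodeUtf8(bad, "ignore"));
        System.out.println(decodeUtf8(bad, "replace"));
        try {
            decodeUtf8(bad, "strict");
        } catch (IllegalArgumentException e) {
            System.out.println("strict policy rejected the input");
        }
    }
}
```

      Handlers registered through register_error(String, PyObject) have no counterpart in this sketch; only the three built-in policy names shown are modelled.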
    • encode

      public static PyString encode(PyUnicode unicode)
      Encode unicode using the system default encoding (see codecs.getDefaultEncoding()). Encoding errors raise a ValueError.
      Parameters:
      unicode - string to be encoded
      Returns:
      bytes object encoding unicode
    • encode

      public static PyString encode(PyUnicode unicode, PyString encoding)
      Encode unicode using the codec registered for the encoding. The encoding defaults to the system default encoding (see codecs.getDefaultEncoding()). Encoding errors raise a ValueError.
      Parameters:
      unicode - string to be encoded
      encoding - name of encoding (to look up in codec registry)
      Returns:
      bytes object encoding unicode
    • encode

      public static PyString encode(PyUnicode unicode, PyString encoding, PyString errors)
      Encode unicode using the codec registered for the encoding. The encoding defaults to the system default encoding (see codecs.getDefaultEncoding()). The string errors may name a different error handling policy (built-in or registered with register_error(String, PyObject) ). The default error policy is 'strict' meaning that encoding errors raise a ValueError.
      Parameters:
      unicode - string to be encoded
      encoding - name of encoding (to look up in codec registry)
      errors - error policy name (e.g. "ignore")
      Returns:
      bytes object encoding unicode
    • charmap_build

      public static PyObject charmap_build(PyUnicode map)
    • utf_8_decode

      public static PyTuple utf_8_decode(String str)
    • utf_8_decode

      public static PyTuple utf_8_decode(String str, String errors)
    • utf_8_decode

      public static PyTuple utf_8_decode(String str, String errors, PyObject final_)
    • utf_8_decode

      public static PyTuple utf_8_decode(String str, String errors, boolean final_)
    • utf_8_encode

      public static PyTuple utf_8_encode(String str)
    • utf_8_encode

      public static PyTuple utf_8_encode(String str, String errors)
    • utf_7_decode

      public static PyTuple utf_7_decode(String bytes)
    • utf_7_decode

      public static PyTuple utf_7_decode(String bytes, String errors)
    • utf_7_decode

      public static PyTuple utf_7_decode(String bytes, String errors, boolean finalFlag)
    • utf_7_encode

      public static PyTuple utf_7_encode(String str)
    • utf_7_encode

      public static PyTuple utf_7_encode(String str, String errors)
    • escape_decode

      public static PyTuple escape_decode(String str)
    • escape_decode

      public static PyTuple escape_decode(String str, String errors)
    • escape_encode

      public static PyTuple escape_encode(String str)
    • escape_encode

      public static PyTuple escape_encode(String str, String errors)
    • charmap_decode

      public static PyTuple charmap_decode(String bytes)
      Equivalent to charmap_decode(bytes, null, null). This method is here so the error and mapping arguments can be optional at the Python level.
      Parameters:
      bytes - sequence of bytes to decode
      Returns:
      decoded string and number of bytes consumed
    • charmap_decode

      public static PyTuple charmap_decode(String bytes, String errors)
      Equivalent to charmap_decode(bytes, errors, null). This method is here so the mapping argument can be optional at the Python level.
      Parameters:
      bytes - sequence of bytes to decode
      errors - error policy
      Returns:
      decoded string and number of bytes consumed
    • charmap_decode

      public static PyTuple charmap_decode(String bytes, String errors, PyObject mapping)
      Decode a sequence of bytes into Unicode characters via a mapping supplied as a container to be indexed by the byte values (as unsigned integers). If the mapping is null or None, decode with latin-1 (essentially treating bytes as character codes directly).
      Parameters:
      bytes - sequence of bytes to decode
      errors - error policy
      mapping - to convert bytes to characters
      Returns:
      decoded string and number of bytes consumed
    • charmap_decode

      public static PyTuple charmap_decode(String bytes, String errors, PyObject mapping, boolean ignoreUnmapped)
      Decode a sequence of bytes into Unicode characters via a mapping supplied as a container to be indexed by the byte values (as unsigned integers).
      Parameters:
      bytes - sequence of bytes to decode
      errors - error policy
      mapping - to convert bytes to characters
      ignoreUnmapped - if true, pass unmapped byte values as character codes [0..256)
      Returns:
      decoded string and number of bytes consumed
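      The idea of decoding through a mapping indexed by byte value can be sketched in plain Java. This is an illustration under stated assumptions, not the Jython code: CharmapDecodeSketch, its Result holder and sampleMapping() are hypothetical names, a java.util.Map stands in for the PyObject mapping, and an unmapped byte simply raises an exception instead of consulting an error policy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; it does not call org.python.modules._codecs.
final class CharmapDecodeSketch {

    /** Stand-in for the (decoded string, bytes consumed) tuple. */
    static final class Result {
        final String text;
        final int consumed;

        Result(String text, int consumed) {
            this.text = text;
            this.consumed = consumed;
        }
    }

    /** A one-entry mapping used by the example: byte 0x80 decodes to the euro sign. */
    static Map<Integer, String> sampleMapping() {
        Map<Integer, String> mapping = new HashMap<>();
        mapping.put(0x80, "\u20AC");
        return mapping;
    }

    /**
     * Decode a byte string (one char per byte, values 0..255) through the mapping.
     * A null mapping behaves like latin-1: each byte value becomes the character code.
     */
    static Result charmapDecode(String bytes, Map<Integer, String> mapping, boolean ignoreUnmapped) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bytes.length(); i++) {
            int b = bytes.charAt(i) & 0xFF;              // byte value as an unsigned integer
            String mapped = (mapping == null) ? null : mapping.get(b);
            if (mapped != null) {
                out.append(mapped);
            } else if (mapping == null || ignoreUnmapped) {
                out.append((char) b);                    // pass the byte through as a character code
            } else {
                // Simplification: behaves like a "strict" policy only.
                throw new IllegalArgumentException(
                        "byte 0x" + Integer.toHexString(b) + " at index " + i + " is unmapped");
            }
        }
        return new Result(out.toString(), bytes.length());
    }

    public static void main(String[] args) {
        Result r = charmapDecode("A\u0080", sampleMapping(), true);
        System.out.println(r.text + " (" + r.consumed + " bytes consumed)");
    }
}
```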
    • translateCharmap

      public static PyObject translateCharmap(PyUnicode str, String errors, PyObject mapping)
    • charmap_encode

      public static PyTuple charmap_encode(String str)
      Equivalent to charmap_encode(str, null, null). This method is here so the error and mapping arguments can be optional at the Python level.
      Parameters:
      str - to be encoded
      Returns:
      (encoded data, size(str)) as a pair
    • charmap_encode

      public static PyTuple charmap_encode(String str, String errors)
      Equivalent to charmap_encode(str, errors, null). This method is here so the mapping can be optional at the Python level.
      Parameters:
      str - to be encoded
      errors - error policy name (e.g. "ignore")
      Returns:
      (encoded data, size(str)) as a pair
    • charmap_encode

      public static PyTuple charmap_encode(String str, String errors, PyObject mapping)
      Encoder based on an optional character mapping. This mapping is either an EncodingMap of 256 entries, or an arbitrary container indexable with integers using __finditem__ and yielding byte strings. If the mapping is null, latin-1 (effectively a mapping of character code to the numerically-equal byte) is used.
      Parameters:
      str - to be encoded
      errors - error policy name (e.g. "ignore")
      mapping - from character code to output byte (or string)
      Returns:
      (encoded data, size(str)) as a pair
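      The reverse direction, from character code to output bytes, can be sketched the same way. Again this is only an illustration: CharmapEncodeSketch and sampleMapping() are hypothetical, a java.util.Map stands in for the mapping object, only the encoded data (not the size) is returned, and characters without a mapping raise an exception rather than going through an error policy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; it does not call org.python.modules._codecs.
final class CharmapEncodeSketch {

    /** Example mapping from character code to a byte string (one char per byte). */
    static Map<Integer, String> sampleMapping() {
        Map<Integer, String> mapping = new HashMap<>();
        mapping.put(0x41, "A");         // 'A' encodes to the byte 0x41
        mapping.put(0x20AC, "\u0080");  // euro sign encodes to the byte 0x80
        return mapping;
    }

    /**
     * Encode str through the mapping, returning a byte string (one char per byte).
     * A null mapping behaves like latin-1: the character code is the byte value.
     */
    static String charmapEncode(String str, Map<Integer, String> mapping) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < str.length()) {
            int codePoint = str.codePointAt(i);
            i += Character.charCount(codePoint);
            if (mapping == null) {
                if (codePoint > 0xFF) {
                    throw new IllegalArgumentException("not encodable in latin-1: U+"
                            + Integer.toHexString(codePoint));
                }
                out.append((char) codePoint);
            } else {
                String encoded = mapping.get(codePoint);
                if (encoded == null) {
                    // Simplification: behaves like a "strict" policy only.
                    throw new IllegalArgumentException("no mapping for U+"
                            + Integer.toHexString(codePoint));
                }
                out.append(encoded);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = charmapEncode("A\u20AC", sampleMapping());
        System.out.println(encoded.length() + " bytes produced");
    }
}
```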
    • ascii_decode

      public static PyTuple ascii_decode(String str)
    • ascii_decode

      public static PyTuple ascii_decode(String str, String errors)
    • ascii_encode

      public static PyTuple ascii_encode(String str)
    • ascii_encode

      public static PyTuple ascii_encode(String str, String errors)
    • latin_1_decode

      public static PyTuple latin_1_decode(String str)
    • latin_1_decode

      public static PyTuple latin_1_decode(String str, String errors)
    • latin_1_encode

      public static PyTuple latin_1_encode(String str)
    • latin_1_encode

      public static PyTuple latin_1_encode(String str, String errors)
    • utf_16_encode

      public static PyTuple utf_16_encode(String str)
    • utf_16_encode

      public static PyTuple utf_16_encode(String str, String errors)
    • utf_16_encode

      public static PyTuple utf_16_encode(String str, String errors, int byteorder)
    • utf_16_le_encode

      public static PyTuple utf_16_le_encode(String str)
    • utf_16_le_encode

      public static PyTuple utf_16_le_encode(String str, String errors)
    • utf_16_be_encode

      public static PyTuple utf_16_be_encode(String str)
    • utf_16_be_encode

      public static PyTuple utf_16_be_encode(String str, String errors)
    • encode_UTF16

      public static String encode_UTF16(String str, String errors, int byteorder)
    • utf_16_decode

      public static PyTuple utf_16_decode(String str)
    • utf_16_decode

      public static PyTuple utf_16_decode(String str, String errors)
    • utf_16_decode

      public static PyTuple utf_16_decode(String str, String errors, boolean final_)
    • utf_16_le_decode

      public static PyTuple utf_16_le_decode(String str)
    • utf_16_le_decode

      public static PyTuple utf_16_le_decode(String str, String errors)
    • utf_16_le_decode

      public static PyTuple utf_16_le_decode(String str, String errors, boolean final_)
    • utf_16_be_decode

      public static PyTuple utf_16_be_decode(String str)
    • utf_16_be_decode

      public static PyTuple utf_16_be_decode(String str, String errors)
    • utf_16_be_decode

      public static PyTuple utf_16_be_decode(String str, String errors, boolean final_)
    • utf_16_ex_decode

      public static PyTuple utf_16_ex_decode(String str)
    • utf_16_ex_decode

      public static PyTuple utf_16_ex_decode(String str, String errors)
    • utf_16_ex_decode

      public static PyTuple utf_16_ex_decode(String str, String errors, int byteorder)
    • utf_16_ex_decode

      public static PyTuple utf_16_ex_decode(String str, String errors, int byteorder, boolean final_)
    • utf_32_encode

      public static PyTuple utf_32_encode(String unicode)
      Encode a Unicode Java String as UTF-32 with byte order mark. (Encoding is in platform byte order, which is big-endian for Java.)
      Parameters:
      unicode - to be encoded
      Returns:
      tuple (encoded_bytes, unicode_consumed)
    • utf_32_encode

      public static PyTuple utf_32_encode(String unicode, String errors)
      Encode a Unicode Java String as UTF-32 with byte order mark. (Encoding is in platform byte order, which is big-endian for Java.)
      Parameters:
      unicode - to be encoded
      errors - error policy name or null meaning "strict"
      Returns:
      tuple (encoded_bytes, unicode_consumed)
    • utf_32_encode

      public static PyTuple utf_32_encode(String unicode, String errors, int byteorder)
      Encode a Unicode Java String as UTF-32 in specified byte order with byte order mark.
      Parameters:
      unicode - to be encoded
      errors - error policy name or null meaning "strict"
      byteorder - encoding "endianness" specified (in the Python -1, 0, +1 convention)
      Returns:
      tuple (encoded_bytes, unicode_consumed)
    • utf_32_le_encode

      public static PyTuple utf_32_le_encode(String unicode)
      Encode a Unicode Java String as UTF-32 with little-endian byte order. No byte-order mark is generated.
      Parameters:
      unicode - to be encoded
      Returns:
      tuple (encoded_bytes, unicode_consumed)
    • utf_32_le_encode

      public static PyTuple utf_32_le_encode(String unicode, String errors)
      Encode a Unicode Java String as UTF-32 with little-endian byte order. No byte-order mark is generated.
      Parameters:
      unicode - to be encoded
      errors - error policy name or null meaning "strict"
      Returns:
      tuple (encoded_bytes, unicode_consumed)
    • utf_32_be_encode

      public static PyTuple utf_32_be_encode(String unicode)
      Encode a Unicode Java String as UTF-32 with big-endian byte order. No byte-order mark is generated.
      Parameters:
      unicode - to be encoded
      Returns:
      tuple (encoded_bytes, unicode_consumed)
    • utf_32_be_encode

      public static PyTuple utf_32_be_encode(String unicode, String errors)
      Encode a Unicode Java String as UTF-32 with big-endian byte order. No byte-order mark is generated.
      Parameters:
      unicode - to be encoded
      errors - error policy name or null meaning "strict"
      Returns:
      tuple (encoded_bytes, unicode_consumed)
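      The byte layouts described for the UTF-32 encoders (byte-order mark plus big-endian data by default, and fixed little- or big-endian data without a mark) can be reproduced with java.nio. The sketch below is an assumption-laden illustration rather than the Jython implementation: Utf32EncodeSketch and its methods are hypothetical, results are raw byte arrays rather than PyTuple values, and no error policy is applied.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; it does not call org.python.modules._codecs.
final class Utf32EncodeSketch {

    /** Encode each code point of s as 4 bytes in the given order, optionally after a BOM. */
    static byte[] encode(String s, ByteOrder order, boolean withBom) {
        int count = s.codePointCount(0, s.length());
        ByteBuffer buf = ByteBuffer.allocate(4 * (count + (withBom ? 1 : 0))).order(order);
        if (withBom) {
            buf.putInt(0xFEFF); // byte-order mark, written in the chosen order
        }
        int i = 0;
        while (i < s.length()) {
            int codePoint = s.codePointAt(i); // combines a surrogate pair into one code point
            buf.putInt(codePoint);
            i += Character.charCount(codePoint);
        }
        return buf.array();
    }

    /** Default form as described above: BOM followed by big-endian data. */
    static byte[] encodeWithBom(String s) {
        return encode(s, ByteOrder.BIG_ENDIAN, true);
    }

    /** Little-endian, no BOM. */
    static byte[] encodeLE(String s) {
        return encode(s, ByteOrder.LITTLE_ENDIAN, false);
    }

    /** Big-endian, no BOM. */
    static byte[] encodeBE(String s) {
        return encode(s, ByteOrder.BIG_ENDIAN, false);
    }

    /** Render bytes as lowercase hex for inspection. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeWithBom("A"))); // 0000feff00000041
        System.out.println(hex(encodeLE("A")));      // 41000000
        System.out.println(hex(encodeBE("A")));      // 00000041
    }
}
```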
    • utf_32_decode

      public static PyTuple utf_32_decode(String bytes)
      Decode (perhaps partially) a sequence of bytes representing the UTF-32 encoded form of a Unicode string and return as a tuple the unicode text, and the amount of input consumed. The endianness used will have been deduced from a byte-order mark, if present, or will be big-endian (Java platform default). The unicode text is presented as a Java String (the UTF-16 representation used by PyUnicode). It is an error for the input bytes not to form a whole number of valid UTF-32 codes.
      Parameters:
      bytes - to be decoded (Jython PyString convention)
      Returns:
      tuple (unicode_result, bytes_consumed)
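      The "Jython PyString convention" for the bytes parameter means that a byte sequence travels in a Java String, one char per byte, each with a value in the range 0 to 255. A minimal sketch of moving between byte[] and that form follows; ByteStringSketch and its two helpers are hypothetical names, and the use of ISO-8859-1 is simply a convenient JDK way to map byte values to equal character codes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; it does not call org.python.modules._codecs.
final class ByteStringSketch {

    /** Pack raw bytes into a String whose chars each hold one byte value (0..255). */
    static String toByteString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Unpack such a String back into raw bytes. */
    static byte[] toBytes(String byteString) {
        return byteString.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x00, (byte) 0xFE, (byte) 0xFF}; // a big-endian UTF-32 BOM
        String packed = toByteString(raw);
        System.out.println(packed.length());      // 4 chars for 4 bytes
        System.out.println((int) packed.charAt(3)); // 255
    }
}
```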
    • utf_32_decode

      public static PyTuple utf_32_decode(String bytes, String errors)
      Decode a sequence of bytes representing the UTF-32 encoded form of a Unicode string and return as a tuple the unicode text, and the amount of input consumed. The endianness used will have been deduced from a byte-order mark, if present, or will be big-endian (Java platform default). The unicode text is presented as a Java String (the UTF-16 representation used by PyUnicode). It is an error for the input bytes not to form a whole number of valid UTF-32 codes.
      Parameters:
      bytes - to be decoded (Jython PyString convention)
      errors - error policy name (e.g. "ignore", "replace")
      Returns:
      tuple (unicode_result, bytes_consumed)
    • utf_32_decode

      public static PyTuple utf_32_decode(String bytes, String errors, boolean isFinal)
      Decode (perhaps partially) a sequence of bytes representing the UTF-32 encoded form of a Unicode string and return as a tuple the unicode text, and the amount of input consumed. The endianness used will have been deduced from a byte-order mark, if present, or will be big-endian (Java platform default). The unicode text is presented as a Java String (the UTF-16 representation used by PyUnicode).
      Parameters:
      bytes - to be decoded (Jython PyString convention)
      errors - error policy name (e.g. "ignore", "replace")
      isFinal - if a "final" call, meaning the input must all be consumed
      Returns:
      tuple (unicode_result, bytes_consumed)
    • utf_32_le_decode

      public static PyTuple utf_32_le_decode(String bytes)
      Decode a sequence of bytes representing the UTF-32 little-endian encoded form of a Unicode string and return as a tuple the unicode text, and the amount of input consumed. A (correctly-oriented) byte-order mark will pass as a zero-width non-breaking space. The unicode text is presented as a Java String (the UTF-16 representation used by PyUnicode). It is an error for the input bytes not to form a whole number of valid UTF-32 codes.
      Parameters:
      bytes - to be decoded (Jython PyString convention)
      Returns:
      tuple (unicode_result, bytes_consumed)
    • utf_32_le_decode

      public static PyTuple utf_32_le_decode(String bytes, String errors)
      Decode a sequence of bytes representing the UTF-32 little-endian encoded form of a Unicode string and return as a tuple the unicode text, and the amount of input consumed. A (correctly-oriented) byte-order mark will pass as a zero-width non-breaking space. The unicode text is presented as a Java String (the UTF-16 representation used by PyUnicode). It is an error for the input bytes not to form a whole number of valid UTF-32 codes.
      Parameters:
      bytes - to be decoded (Jython PyString convention)
      errors - error policy name (e.g. "ignore", "replace")
      Returns:
      tuple (unicode_result, bytes_consumed)
    • utf_32_le_decode

      public static PyTuple utf_32_le_decode(String bytes, String errors, boolean isFinal)
      Decode (perhaps partially) a sequence of bytes representing the UTF-32 little-endian encoded form of a Unicode string and return as a tuple the unicode text, and the amount of input consumed. A (correctly-oriented) byte-order mark will pass as a zero-width non-breaking space. The unicode text is presented as a Java String (the UTF-16 representation used by PyUnicode).
      Parameters:
      bytes - to be decoded (Jython PyString convention)
      errors - error policy name (e.g. "ignore", "replace")
      isFinal - if a "final" call, meaning the input must all be consumed
      Returns:
      tuple (unicode_result, bytes_consumed)
    • utf_32_be_decode

      public static PyTuple utf_32_be_decode(String bytes)
      Decode a sequence of bytes representing the UTF-32 big-endian encoded form of a Unicode string and return as a tuple the unicode text, and the amount of input consumed. A (correctly-oriented) byte-order mark will pass as a zero-width non-breaking space. The unicode text is presented as a Java String (the UTF-16 representation used by PyUnicode). It is an error for the input bytes not to form a whole number of valid UTF-32 codes.
      Parameters:
      bytes - to be decoded (Jython PyString convention)
      Returns:
      tuple (unicode_result, bytes_consumed)
    • utf_32_be_decode

      public static PyTuple utf_32_be_decode(String bytes, String errors)
      Decode a sequence of bytes representing the UTF-32 big-endian encoded form of a Unicode string and return as a tuple the unicode text, and the amount of input consumed. A (correctly-oriented) byte-order mark will pass as a zero-width non-breaking space. The unicode text is presented as a Java String (the UTF-16 representation used by PyUnicode). It is an error for the input bytes not to form a whole number of valid UTF-32 codes.
      Parameters:
      bytes - to be decoded (Jython PyString convention)
      errors - error policy name (e.g. "ignore", "replace")
      Returns:
      tuple (unicode_result, bytes_consumed)
    • utf_32_be_decode

      public static PyTuple utf_32_be_decode(String bytes, String errors, boolean isFinal)
      Decode (perhaps partially) a sequence of bytes representing the UTF-32 big-endian encoded form of a Unicode string and return as a tuple the unicode text, and the amount of input consumed. A (correctly-oriented) byte-order mark will pass as a zero-width non-breaking space. The unicode text is presented as a Java String (the UTF-16 representation used by PyUnicode).
      Parameters:
      bytes - to be decoded (Jython PyString convention)
      errors - error policy name (e.g. "ignore", "replace")
      isFinal - if a "final" call, meaning the input must all be consumed
      Returns:
      tuple (unicode_result, bytes_consumed)
    • utf_32_ex_decode

      public static PyTuple utf_32_ex_decode(String bytes, String errors, int byteorder)
      Decode a sequence of bytes representing the UTF-32 encoded form of a Unicode string and return as a tuple the unicode text, the amount of input consumed, and the decoding "endianness" used (in the Python -1, 0, +1 convention). The endianness, if unspecified (=0), will be deduced from a byte-order mark and returned. (This codec entrypoint is used in that way in the utf_32.py codec, but only until the byte order is known.) When not defined by a BOM, processing assumes big-endian coding (Java platform default), but returns "unspecified". (The utf_32.py codec treats this as an error, once more than 4 bytes have been processed.) The unicode text is presented as a Java String (the UTF-16 representation used by PyUnicode).
      Parameters:
      bytes - to be decoded (Jython PyString convention)
      errors - error policy name (e.g. "ignore", "replace")
      byteorder - decoding "endianness" specified (in the Python -1, 0, +1 convention)
      Returns:
      tuple (unicode_result, bytes_consumed, endianness)
    • utf_32_ex_decode

      public static PyTuple utf_32_ex_decode(String bytes, String errors, int byteorder, boolean isFinal)
      Decode (perhaps partially) a sequence of bytes representing the UTF-32 encoded form of a Unicode string and return as a tuple the unicode text, the amount of input consumed, and the decoding "endianness" used (in the Python -1, 0, +1 convention). The endianness will be that specified, will have been deduced from a byte-order mark, if present, or will be big-endian (Java platform default). Or it may still be undefined if fewer than 4 bytes are presented. (This codec entrypoint is used in the utf-32 codec only until the byte order is known.) The unicode text is presented as a Java String (the UTF-16 representation used by PyUnicode).
      Parameters:
      bytes - to be decoded (Jython PyString convention)
      errors - error policy name (e.g. "ignore", "replace")
      byteorder - decoding "endianness" specified (in the Python -1, 0, +1 convention)
      isFinal - if a "final" call, meaning the input must all be consumed
      Returns:
      tuple (unicode_result, bytes_consumed, endianness)
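      The byte-order deduction described for utf_32_ex_decode can be sketched with java.nio. This is a simplified illustration under stated assumptions, not the Jython code: Utf32ExDecodeSketch and its Result holder are hypothetical, input is a raw byte[] rather than a byte string, trailing bytes that do not make up a whole 4-byte unit are simply left unconsumed, and no error policy or isFinal handling is modelled.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; it does not call org.python.modules._codecs.
final class Utf32ExDecodeSketch {

    /** Stand-in for the (unicode_result, bytes_consumed, endianness) tuple. */
    static final class Result {
        final String text;
        final int consumed;
        final int byteorder; // -1 little-endian, 0 unspecified, +1 big-endian

        Result(String text, int consumed, int byteorder) {
            this.text = text;
            this.consumed = consumed;
            this.byteorder = byteorder;
        }
    }

    static Result decode(byte[] bytes, int byteorder) {
        int start = 0;
        if (byteorder == 0 && bytes.length >= 4) {
            // Read the first unit as big-endian and compare with both BOM orientations.
            int first = ByteBuffer.wrap(bytes, 0, 4).order(ByteOrder.BIG_ENDIAN).getInt();
            if (first == 0x0000FEFF) {
                byteorder = 1;   // BOM in big-endian orientation
                start = 4;
            } else if (first == 0xFFFE0000) {
                byteorder = -1;  // BOM in little-endian orientation
                start = 4;
            }
        }
        // Without a BOM or explicit order, assume big-endian but keep reporting 0.
        ByteOrder order = (byteorder < 0) ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(order);
        buf.position(start);
        StringBuilder text = new StringBuilder();
        while (buf.remaining() >= 4) {
            // appendCodePoint throws IllegalArgumentException for values above U+10FFFF.
            text.appendCodePoint(buf.getInt());
        }
        return new Result(text.toString(), buf.position(), byteorder);
    }

    public static void main(String[] args) {
        byte[] le = {(byte) 0xFF, (byte) 0xFE, 0, 0, 0x41, 0, 0, 0};
        Result r = decode(le, 0);
        System.out.println(r.text + " " + r.consumed + " " + r.byteorder); // A 8 -1
    }
}
```

      A caller following the pattern described above would keep passing the returned byteorder back in on later calls once it is non-zero.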
    • raw_unicode_escape_encode

      public static PyTuple raw_unicode_escape_encode(String str)
    • raw_unicode_escape_encode

      public static PyTuple raw_unicode_escape_encode(String str, String errors)
    • raw_unicode_escape_decode

      public static PyTuple raw_unicode_escape_decode(String str)
    • raw_unicode_escape_decode

      public static PyTuple raw_unicode_escape_decode(String str, String errors)
    • unicode_escape_encode

      public static PyTuple unicode_escape_encode(String str)
    • unicode_escape_encode

      public static PyTuple unicode_escape_encode(String str, String errors)
    • unicode_escape_decode

      public static PyTuple unicode_escape_decode(String str)
    • unicode_escape_decode

      public static PyTuple unicode_escape_decode(String str, String errors)
    • unicode_internal_encode

      @Deprecated public static PyTuple unicode_internal_encode(String unicode)
      Deprecated.
      Legacy method to encode given unicode in CPython wide-build internal format (equivalent UTF-32BE).
    • unicode_internal_encode

      @Deprecated public static PyTuple unicode_internal_encode(String unicode, String errors)
      Deprecated.
      Legacy method to encode given unicode in CPython wide-build internal format (equivalent UTF-32BE).
    • unicode_internal_decode

      @Deprecated public static PyTuple unicode_internal_decode(String bytes)
      Deprecated.
      Legacy method to decode given bytes as if CPython wide-build internal format (equivalent UTF-32BE). There must be a multiple of 4 bytes.
    • unicode_internal_decode

      @Deprecated public static PyTuple unicode_internal_decode(String bytes, String errors)
      Deprecated.
      Legacy method to decode given bytes as if CPython wide-build internal format (equivalent UTF-32BE). There must be a multiple of 4 bytes.