Class codecs

java.lang.Object
org.python.core.codecs

public class codecs extends Object
This class implements the codec registry and utility methods supporting codecs, such as those providing the standard replacement strategies ("ignore", "backslashreplace", etc.). The _codecs module relies heavily on apparatus implemented here, and therefore so does the Python codecs module (in Lib/codecs.py). It corresponds approximately to CPython's Python/codecs.c.

The class also contains the inner methods of the standard Unicode codecs, available for transcoding of text at the Java level. These also are exposed through the _codecs module. In CPython, the implementations are found in Objects/unicodeobject.c.

Since:
Jython 2.0
  • Field Details

  • Constructor Details

    • codecs

      public codecs()
  • Method Details

    • getDefaultEncoding

      public static String getDefaultEncoding()
    • setDefaultEncoding

      public static void setDefaultEncoding(String encoding)
    • lookup_error

      public static PyObject lookup_error(String handlerName)
    • register_error

      public static void register_error(String name, PyObject error)
    • register

      public static void register(PyObject search_function)
    • lookup

      public static PyTuple lookup(String encoding)
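The class overview describes a codec registry served by register and lookup. The sketch below models that idea very loosely. It follows CPython's general scheme: search functions are tried in registration order, and the first non-null answer is cached under the encoding name. Everything here is hypothetical, including the names RegistryDemo and "utf-8 codec info", the use of Object in place of the codec-info PyTuple, lower-casing as the only name normalisation, and IllegalArgumentException standing in for Python's LookupError.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class RegistryDemo {

    // Hypothetical model of a codec registry (not Jython's implementation).
    private final List<Function<String, Object>> searchFunctions = new ArrayList<>();
    private final Map<String, Object> cache = new HashMap<>();

    void register(Function<String, Object> searchFunction) {
        searchFunctions.add(searchFunction);
    }

    Object lookup(String encoding) {
        // Lower-casing is an assumed, simplified normalisation of the name.
        String key = encoding.toLowerCase(Locale.ROOT);
        Object cached = cache.get(key);
        if (cached != null) {
            return cached;                  // later lookups skip the search functions
        }
        for (Function<String, Object> fn : searchFunctions) {
            Object info = fn.apply(key);    // first non-null answer wins
            if (info != null) {
                cache.put(key, info);
                return info;
            }
        }
        // Python raises LookupError here; an unchecked Java exception stands in.
        throw new IllegalArgumentException("unknown encoding: " + encoding);
    }

    public static void main(String[] args) {
        RegistryDemo registry = new RegistryDemo();
        registry.register(name -> name.equals("utf-8") ? "utf-8 codec info" : null);
        System.out.println(registry.lookup("UTF-8"));   // utf-8 codec info
    }
}
```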
    • decode

      public static PyObject decode(PyString v, String encoding, String errors)
      Decode the bytes v using the codec registered for the encoding. The encoding defaults to the system default encoding (see getDefaultEncoding()). The string errors may name a different error-handling policy (built-in or registered with register_error(String, PyObject)). The default error policy is 'strict', meaning that decoding errors raise a ValueError. This method is exposed through the _codecs module as _codecs.decode(PyString, PyString, PyString).
      Parameters:
      v - bytes to be decoded
      encoding - name of encoding (to look up in codec registry)
      errors - error policy name (e.g. "ignore", "replace")
      Returns:
      Unicode string decoded from bytes
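The effect of the 'strict', 'ignore' and 'replace' policies on decoding can be illustrated with plain java.nio, without Jython. In the sketch below each policy name is mapped onto a CodingErrorAction. That mapping, the class name DecodePolicyDemo, and the use of IllegalArgumentException as a stand-in for Python's ValueError are assumptions made for illustration. This is not how the codecs class is implemented.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DecodePolicyDemo {

    // Map a Python error-policy name onto the closest java.nio action.
    static CodingErrorAction actionFor(String errors) {
        if (errors == null || errors.equals("strict")) {
            return CodingErrorAction.REPORT;    // fail on the first bad byte
        } else if (errors.equals("ignore")) {
            return CodingErrorAction.IGNORE;    // drop the bad bytes
        } else if (errors.equals("replace")) {
            return CodingErrorAction.REPLACE;   // substitute U+FFFD
        }
        throw new IllegalArgumentException("unsupported policy: " + errors);
    }

    // Decode UTF-8 bytes under the given policy; a reported error is rethrown
    // unchecked, loosely standing in for the ValueError raised under 'strict'.
    static String decodeUtf8(byte[] bytes, String errors) {
        CodingErrorAction action = actionFor(errors);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("cannot decode: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        byte[] input = {'a', (byte) 0xFF, 'b'};   // 0xFF never appears in valid UTF-8
        System.out.println(decodeUtf8(input, "ignore"));   // ab
        try {
            decodeUtf8(input, "strict");
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```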
    • encode

      public static String encode(PyString v, String encoding, String errors)
      Encode v using the codec registered for the encoding. The encoding defaults to the system default encoding (see getDefaultEncoding()). The string errors may name a different error-handling policy (built-in or registered with register_error(String, PyObject)). The default error policy is 'strict', meaning that encoding errors raise a ValueError.
      Parameters:
      v - unicode string to be encoded
      encoding - name of encoding (to look up in codec registry)
      errors - error policy name (e.g. "ignore")
      Returns:
      bytes object encoding v
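On the encoding side, the same three policies can be illustrated with a java.nio CharsetEncoder targeting ASCII. This is again only an analogy. The class name EncodePolicyDemo and the policy-to-CodingErrorAction mapping are assumptions. It relies on the encoder's default replacement byte, which is '?'.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodePolicyDemo {

    // Encode to ASCII under a Python-style policy name (illustrative mapping).
    static byte[] encodeAscii(String text, String errors) {
        CodingErrorAction action;
        if (errors == null || errors.equals("strict")) {
            action = CodingErrorAction.REPORT;
        } else if (errors.equals("ignore")) {
            action = CodingErrorAction.IGNORE;
        } else if (errors.equals("replace")) {
            action = CodingErrorAction.REPLACE;   // default replacement byte is '?'
        } else {
            throw new IllegalArgumentException("unsupported policy: " + errors);
        }
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
            byte[] result = new byte[out.remaining()];
            out.get(result);
            return result;
        } catch (CharacterCodingException e) {
            // 'strict': rethrow unchecked, loosely analogous to Python's ValueError
            throw new IllegalArgumentException("cannot encode: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";   // the last character is not ASCII
        for (String policy : new String[] {"ignore", "replace"}) {
            byte[] bytes = encodeAscii(text, policy);
            // prints "ignore: caf", then "replace: caf?"
            System.out.println(policy + ": " + new String(bytes, StandardCharsets.US_ASCII));
        }
    }
}
```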
    • strict_errors

      public static PyObject strict_errors(PyObject[] args, String[] kws)
    • ignore_errors

      public static PyObject ignore_errors(PyObject[] args, String[] kws)
    • replace_errors

      public static PyObject replace_errors(PyObject[] args, String[] kws)
    • xmlcharrefreplace_errors

      public static PyObject xmlcharrefreplace_errors(PyObject[] args, String[] kws)
    • xmlcharrefreplace

      public static StringBuilder xmlcharrefreplace(int start, int end, String toReplace)
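No description accompanies xmlcharrefreplace. The sketch below assumes it produces what Python's "xmlcharrefreplace" handler produces: each code point in the range becomes a decimal character reference &#N;. It further assumes that start and end are UTF-16 indices into toReplace, so a surrogate pair is joined into one code point. The name xmlCharRefReplace is hypothetical.

```java
class XmlCharRefDemo {

    // Hypothetical restatement: render each code point of toReplace[start, end)
    // as a decimal XML character reference.
    static StringBuilder xmlCharRefReplace(int start, int end, String toReplace) {
        StringBuilder out = new StringBuilder();
        int i = start;
        while (i < end) {
            int cp = toReplace.codePointAt(i);   // joins a surrogate pair into one code point
            out.append("&#").append(cp).append(';');
            i += Character.charCount(cp);        // step over 1 or 2 chars
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(xmlCharRefReplace(0, 2, "\u00e9\u20ac"));   // &#233;&#8364;
    }
}
```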
    • backslashreplace_errors

      public static PyObject backslashreplace_errors(PyObject[] args, String[] kws)
    • backslashreplace

      public static StringBuilder backslashreplace(int start, int end, String toReplace)
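backslashreplace is likewise undocumented here. The sketch below assumes it follows Python's "backslashreplace" handler, which uses three escape forms, each with lower-case hex digits: `\xhh` below U+0100, `\uhhhh` elsewhere in the BMP, and `\Uhhhhhhhh` above it. The name backslashReplace and the treatment of start and end as UTF-16 indices are assumptions.

```java
class BackslashReplaceDemo {

    // Hypothetical restatement: escape each code point of toReplace[start, end).
    // Below U+0100: backslash, x, 2 hex digits; BMP: backslash, u, 4 digits;
    // above the BMP: backslash, U, 8 digits.
    static StringBuilder backslashReplace(int start, int end, String toReplace) {
        StringBuilder out = new StringBuilder();
        int i = start;
        while (i < end) {
            int cp = toReplace.codePointAt(i);
            if (cp < 0x100) {
                out.append(String.format("\\x%02x", cp));
            } else if (cp < 0x10000) {
                out.append(String.format("\\u%04x", cp));
            } else {
                out.append(String.format("\\U%08x", cp));
            }
            i += Character.charCount(cp);   // step over 1 or 2 chars
        }
        return out;
    }

    public static void main(String[] args) {
        // prints the escapes for U+00E9 and U+20AC
        System.out.println(backslashReplace(0, 2, "\u00e9\u20ac"));
    }
}
```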
    • PyUnicode_DecodeUTF7Stateful

      public static String PyUnicode_DecodeUTF7Stateful(String bytes, String errors, int[] consumed)
      Decode (perhaps partially) a sequence of bytes representing the UTF-7 encoded form of a Unicode string, and return the Jython internal representation of the unicode object together with the amount of input consumed. The only state we preserve is our read position, i.e. how many bytes we have consumed. So if the input ends partway through a Base64 sequence, the input reported as consumed runs only up to, and not including, the Base64 start marker ('+'). Performance will be poor (quadratic cost) on runs of Base64 data long enough to exceed the input quantum in incremental decoding, since each call must re-read the unfinished run from its start marker. The returned Java String is a UTF-16 representation of the Unicode result, in line with Java conventions. Unicode characters above the BMP are represented as surrogate pairs.
      Parameters:
      bytes - input represented as String (Jython PyString convention)
      errors - error policy name (e.g. "ignore", "replace")
      consumed - if not null, element 0 receives the number of bytes consumed; pass null for a "final" call
      Returns:
      unicode result (as UTF-16 Java String)
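The rule for reporting consumed input can be made concrete without a real UTF-7 decoder, since the JDK ships none. The hypothetical helper safelyConsumed below applies only the rule stated above: an unfinished Base64 run is held back from its '+' start marker onward. It decodes nothing and is not the Jython implementation. The run is assumed to end at the first byte outside the modified Base64 alphabet, with '-' being the usual terminator.

```java
class Utf7ConsumedDemo {

    // Characters of the modified Base64 alphabet used inside a UTF-7 shift sequence.
    static boolean isBase64(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '+' || c == '/';
    }

    // Hypothetical helper: the count that would be reported as consumed if an
    // unfinished Base64 run is held back from its '+' start marker onward.
    static int safelyConsumed(String bytes) {
        boolean inBase64 = false;
        int shiftStart = 0;
        for (int i = 0; i < bytes.length(); i++) {
            char c = bytes.charAt(i);
            if (inBase64) {
                if (!isBase64(c)) {
                    inBase64 = false;   // '-' or any other non-Base64 byte ends the run
                }
            } else if (c == '+') {
                inBase64 = true;        // start marker of a Base64 run
                shiftStart = i;
            }
        }
        return inBase64 ? shiftStart : bytes.length();
    }

    public static void main(String[] args) {
        System.out.println(safelyConsumed("Hi +AGE"));     // 3: the run "+AGE" is unfinished
        System.out.println(safelyConsumed("Hi +AGE-!"));   // 9: the run was closed by '-'
    }
}
```

Because the unfinished run is not counted as consumed, the next incremental call starts again at the '+'. This is where the quadratic cost on long Base64 runs comes from.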
    • PyUnicode_DecodeUTF7

      public static String PyUnicode_DecodeUTF7(String bytes, String errors)
      Decode completely a sequence of bytes representing the UTF-7 encoded form of a Unicode string, and return the Jython internal representation of the unicode object. The returned Java String is a UTF-16 representation of the Unicode result, in line with Java conventions. Unicode characters above the BMP are represented as surrogate pairs.
      Parameters:
      bytes - input represented as String (Jython PyString convention)
      errors - error policy name (e.g. "ignore", "replace")
      Returns:
      unicode result (as UTF-16 Java String)
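Because the result is a UTF-16 Java String, a character above the BMP occupies two chars. Callers that need Unicode code points should iterate by code point rather than by char. In the sketch below, the string literal stands in for a decoder result. The class name Utf16ResultDemo is hypothetical, and String.codePoints() requires Java 8 or later.

```java
import java.util.Arrays;

class Utf16ResultDemo {

    // Convert a UTF-16 result (surrogate pairs above the BMP) into code points.
    static int[] codePoints(String utf16) {
        return utf16.codePoints().toArray();
    }

    public static void main(String[] args) {
        String decoded = "A\uD83D\uDE00";   // 'A' followed by U+1F600 as a surrogate pair
        System.out.println(decoded.length());                      // 3 (UTF-16 units)
        System.out.println(Arrays.toString(codePoints(decoded)));  // [65, 128512]
    }
}
```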
    • PyUnicode_EncodeUTF7

      public static String PyUnicode_EncodeUTF7(String unicode, boolean base64SetO, boolean base64WhiteSpace, String errors)
      Encode a UTF-16 Java String as UTF-7 bytes represented by the low bytes of the characters in a String. (The String representation for byte data is chosen so that it may immediately become a PyString.) This method differs from the CPython equivalent (in Objects/unicodeobject.c), which works with an array of Py_UNICODE units that are, in a wide build, Unicode code points.
      Parameters:
      unicode - to be encoded
      base64SetO - true if characters in "set O" should be translated to base64
      base64WhiteSpace - true if white-space characters should be translated to base64
      errors - error policy name (e.g. "ignore", "replace")
      Returns:
      bytes representing the encoded unicode string
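The convention of carrying bytes in "the low bytes of the characters in a String" amounts to one char per byte, with values 0 to 255. Here is a minimal sketch of packing and unpacking under that convention. The helper names are hypothetical. The sketch also checks that decoding the bytes as ISO-8859-1 gives the same String, since that charset maps byte values 0 to 255 onto the same char values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteStringDemo {

    // Pack bytes into a String, one char per byte (values 0-255).
    static String bytesToString(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));   // mask avoids sign extension
        }
        return sb.toString();
    }

    // Inverse: keep only the low byte of each char.
    static byte[] stringToBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);   // 0xC3 0xA9
        String packed = bytesToString(utf8);
        System.out.println(packed.equals(new String(utf8, StandardCharsets.ISO_8859_1)));  // true
        System.out.println(Arrays.equals(stringToBytes(packed), utf8));                     // true
    }
}
```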
    • PyUnicode_DecodeUTF8

      public static String PyUnicode_DecodeUTF8(String str, String errors)
    • PyUnicode_DecodeUTF8Stateful

      public static String PyUnicode_DecodeUTF8Stateful(String str, String errors, int[] consumed)
    • PyUnicode_EncodeUTF8

      public static String PyUnicode_EncodeUTF8(String str, String errors)
    • PyUnicode_DecodeASCII

      public static String PyUnicode_DecodeASCII(String str, int size, String errors)
    • PyUnicode_DecodeLatin1

      public static String PyUnicode_DecodeLatin1(String str, int size, String errors)
    • PyUnicode_EncodeASCII

      public static String PyUnicode_EncodeASCII(String str, int size, String errors)
    • PyUnicode_EncodeLatin1

      public static String PyUnicode_EncodeLatin1(String str, int size, String errors)
    • PyUnicode_EncodeRawUnicodeEscape

      public static String PyUnicode_EncodeRawUnicodeEscape(String str, String errors, boolean modifed)
    • PyUnicode_DecodeRawUnicodeEscape

      public static String PyUnicode_DecodeRawUnicodeEscape(String str, String errors)
    • PyUnicode_EncodePunycode

      public static String PyUnicode_EncodePunycode(PyUnicode input, String errors)
    • PyUnicode_DecodePunycode

      public static PyUnicode PyUnicode_DecodePunycode(String input, String errors)
    • PyUnicode_EncodeIDNA

      public static String PyUnicode_EncodeIDNA(PyUnicode input, String errors)
    • PyUnicode_DecodeIDNA

      public static PyUnicode PyUnicode_DecodeIDNA(String input, String errors)
    • encoding_error

      public static PyObject encoding_error(String errors, String encoding, String toEncode, int start, int end, String reason)
      Invoke a user-defined error-handling mechanism, for errors encountered during encoding, as registered through register_error(String, PyObject). The return value is the return from the error handler indicating the replacement codec input and the position at which to resume encoding. Invokes the mechanism described in PEP 293.
      Parameters:
      errors - name of the error policy (or null meaning "strict")
      encoding - name of encoding that encountered the error
      toEncode - unicode string being encoded
      start - index of first char it couldn't encode
      end - index+1 of last char it couldn't encode (usually becomes the resume point)
      reason - contribution to error message if any
      Returns:
      must be a tuple (replacement_unicode, resume_index)
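The PEP 293 contract described above works like this: the handler receives the offending range and answers with a replacement and a resume index. The toy ASCII encoder below shows that contract in plain Java. Everything in it is hypothetical, including ErrorHandlerDemo, EncodeErrorHandler, and Result, and none of it is the Jython API. The encoder gathers a whole run of unencodable chars before calling the handler. It also assumes the replacement is already ASCII, whereas in Python the replacement would itself be encoded.

```java
class ErrorHandlerDemo {

    // Hypothetical stand-in for a registered handler: (text, start, end) in,
    // (replacement, resume index) out, mirroring the tuple described above.
    interface EncodeErrorHandler {
        Result handle(String text, int start, int end);
    }

    static final class Result {
        final String replacement;
        final int resume;

        Result(String replacement, int resume) {
            this.replacement = replacement;
            this.resume = resume;
        }
    }

    // Toy ASCII encoder that defers each unencodable run to the handler.
    // A handler that never advances the resume index would loop forever.
    static String encodeAscii(String text, EncodeErrorHandler handler) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
                i++;
                continue;
            }
            int end = i;
            while (end < text.length() && text.charAt(end) >= 0x80) {
                end++;                       // extend over the whole bad run
            }
            Result r = handler.handle(text, i, end);
            out.append(r.replacement);       // assumed to be ASCII already
            i = r.resume;                    // continue where the handler says
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Replace each bad run with a single '?' and resume just after it.
        EncodeErrorHandler question = (text, start, end) -> new Result("?", end);
        System.out.println(encodeAscii("na\u00efve caf\u00e9", question));   // na?ve caf?
    }
}
```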
    • insertReplacementAndGetResume

      public static int insertReplacementAndGetResume(StringBuilder partialDecode, String errors, String encoding, String toDecode, int start, int end, String reason)
      Handler for errors encountered during decoding, adjusting the output buffer contents and returning the correct position to resume decoding (if the handler does not simply raise an exception).
      Parameters:
      partialDecode - output buffer of unicode (as UTF-16) that the codec is building
      errors - name of the error policy (or null meaning "strict")
      encoding - name of encoding that encountered the error
      toDecode - bytes being decoded
      start - index of first byte it couldn't decode
      end - index+1 of last byte it couldn't decode (usually becomes the resume point)
      reason - contribution to error message if any
      Returns:
      the resume position: index of next byte to decode
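The pattern described here can be sketched with a toy ASCII decoder. The codec keeps appending to its partial output buffer. On a bad byte it hands the buffer to a helper that either raises or appends the policy's replacement, and that helper then returns the index of the next byte to decode. Only the built-in 'strict', 'ignore' and 'replace' policies are modelled. The names DecodeResumeDemo and insertReplacement are hypothetical, IllegalArgumentException stands in for Python's ValueError, and the input uses the byte-per-char String convention.

```java
class DecodeResumeDemo {

    // Hypothetical helper in the style of insertReplacementAndGetResume:
    // adjust the partial output and return where decoding should resume.
    static int insertReplacement(StringBuilder partialDecode, String errors, int start, int end) {
        if (errors == null || errors.equals("strict")) {
            throw new IllegalArgumentException(
                    "cannot decode bytes in position " + start + "-" + (end - 1));
        } else if (errors.equals("replace")) {
            partialDecode.append('\uFFFD');   // one replacement character
        } else if (!errors.equals("ignore")) {
            throw new IllegalArgumentException("unknown error handler: " + errors);
        }
        return end;   // resume just after the bad byte(s)
    }

    // Toy ASCII decoder over bytes carried one per char.
    static String decodeAscii(String bytes, String errors) {
        StringBuilder out = new StringBuilder(bytes.length());
        int i = 0;
        while (i < bytes.length()) {
            char b = bytes.charAt(i);
            if (b < 0x80) {
                out.append(b);
                i++;
            } else {
                i = insertReplacement(out, errors, i, i + 1);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeAscii("a\u00ffb", "ignore"));   // ab
    }
}
```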
    • decoding_error

      public static PyObject decoding_error(String errors, String encoding, String toDecode, int start, int end, String reason)
      Invoke a user-defined error-handling mechanism, for errors encountered during decoding, as registered through register_error(String, PyObject). The return value is the return from the error handler indicating the replacement codec output and the position at which to resume decoding. Invokes the mechanism described in PEP 293.
      Parameters:
      errors - name of the error policy (or null meaning "strict")
      encoding - name of encoding that encountered the error
      toDecode - bytes being decoded
      start - index of first byte it couldn't decode
      end - index+1 of last byte it couldn't decode (usually becomes the resume point)
      reason - contribution to error message if any
      Returns:
      must be a tuple (replacement_unicode, resume_index)
    • calcNewPosition

      public static int calcNewPosition(int size, PyObject errorTuple)
      Given the return from some codec error handler (invoked while encoding or decoding), which specifies a resume position, and the length of the input being encoded or decoded, check and interpret the resume position. Negative indexes in the error handler return are interpreted as "from the end". If the result would be out of bounds in the input, an IndexError exception is raised.
      Parameters:
      size - length of the input being encoded or decoded
      errorTuple - returned from error handler
      Returns:
      absolute resume position.
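The resume-position rule can be stated in a few lines. Negative positions count from the end, and anything still outside the input is rejected. This sketch takes the resume index as a plain int rather than unpacking an error-handler tuple, which is a simplification. IndexOutOfBoundsException stands in for Python's IndexError, and the class name is hypothetical. A position equal to size is accepted, as in CPython, meaning "resume at the end".

```java
class ResumePositionDemo {

    // Interpret a handler's resume index against an input of the given size.
    static int calcNewPosition(int size, int resume) {
        int pos = resume < 0 ? size + resume : resume;   // negative: from the end
        if (pos < 0 || pos > size) {
            // stands in for the Python IndexError
            throw new IndexOutOfBoundsException(
                    "position " + resume + " from error handler out of bounds");
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(calcNewPosition(10, -2));   // 8
        System.out.println(calcNewPosition(10, 10));   // 10 (resume at end)
    }
}
```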