org.python.core.codecs

public class codecs extends Object

This class implements the codec registry and utility methods supporting codecs, such as those providing the standard replacement strategies ("ignore", "backslashreplace", etc.). The _codecs module relies heavily on apparatus implemented here, and therefore so does the Python codecs module (in Lib/codecs.py). It corresponds approximately to CPython's Python/codecs.c.

The class also contains the inner methods of the standard Unicode codecs, available for transcoding text at the Java level. These are also exposed through the _codecs module. In CPython, the implementations are found in Objects/unicodeobject.c.

Since: Jython 2.0

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static class

codecs.CodecState
Field Summary

Fields

Modifier and Type

Field

Description

static final String

BACKSLASHREPLACE

static final String

IGNORE

static final String

REPLACE

static final String

XMLCHARREFREPLACE
Constructor Summary

Constructors

Constructor

Description

codecs()
Method Summary

Modifier and Type

Method

Description

static StringBuilder

backslashreplace(int start, int end, String toReplace)

static PyObject

backslashreplace_errors(PyObject[] args, String[] kws)

static int

calcNewPosition(int size, PyObject errorTuple)

Given the return from some codec error handler (invoked while encoding or decoding), which specifies a resume position, and the length of the input being encoded or decoded, check and interpret the resume position.

static PyObject

decode(PyString v, String encoding, String errors)

Decode the bytes v using the codec registered for the encoding.

static PyObject

decoding_error(String errors, String encoding, String toDecode, int start, int end, String reason)

Invoke a user-defined error-handling mechanism, for errors encountered during decoding, as registered through register_error(String, PyObject).

static String

encode(PyString v, String encoding, String errors)

Encode v using the codec registered for the encoding.

static PyObject

encoding_error(String errors, String encoding, String toEncode, int start, int end, String reason)

Invoke a user-defined error-handling mechanism, for errors encountered during encoding, as registered through register_error(String, PyObject).

static String

getDefaultEncoding()

static PyObject

ignore_errors(PyObject[] args, String[] kws)

static int

insertReplacementAndGetResume(StringBuilder partialDecode, String errors, String encoding, String toDecode, int start, int end, String reason)

Handler for errors encountered during decoding, adjusting the output buffer contents and returning the correct position to resume decoding (if the handler does not simply raise an exception).

static PyTuple

lookup(String encoding)

static PyObject

lookup_error(String handlerName)

static String

PyUnicode_DecodeASCII(String str, int size, String errors)

static PyUnicode

PyUnicode_DecodeIDNA(String input, String errors)

static String

PyUnicode_DecodeLatin1(String str, int size, String errors)

static PyUnicode

PyUnicode_DecodePunycode(String input, String errors)

static String

PyUnicode_DecodeRawUnicodeEscape(String str, String errors)

static String

PyUnicode_DecodeUTF7(String bytes, String errors)

Decode completely a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return the (Jython internal representation of the) unicode object.

static String

PyUnicode_DecodeUTF7Stateful(String bytes, String errors, int[] consumed)

Decode (perhaps partially) a sequence of bytes representing the UTF-7 encoded form of a Unicode string, and return the (Jython internal representation of the) unicode object and the amount of input consumed.

static String

PyUnicode_DecodeUTF8(String str, String errors)

static String

PyUnicode_DecodeUTF8Stateful(String str, String errors, int[] consumed)

static String

PyUnicode_EncodeASCII(String str, int size, String errors)

static String

PyUnicode_EncodeIDNA(PyUnicode input, String errors)

static String

PyUnicode_EncodeLatin1(String str, int size, String errors)

static String

PyUnicode_EncodePunycode(PyUnicode input, String errors)

static String

PyUnicode_EncodeRawUnicodeEscape(String str, String errors, boolean modifed)

static String

PyUnicode_EncodeUTF7(String unicode, boolean base64SetO, boolean base64WhiteSpace, String errors)

Encode a UTF-16 Java String as UTF-7 bytes represented by the low bytes of the characters in a String.

static String

PyUnicode_EncodeUTF8(String str, String errors)

static void

register(PyObject search_function)

static void

register_error(String name, PyObject error)

static PyObject

replace_errors(PyObject[] args, String[] kws)

static void

setDefaultEncoding(String encoding)

static PyObject

strict_errors(PyObject[] args, String[] kws)

static StringBuilder

xmlcharrefreplace(int start, int end, String toReplace)

static PyObject

xmlcharrefreplace_errors(PyObject[] args, String[] kws)

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- BACKSLASHREPLACE
  
  public static final String BACKSLASHREPLACE
  See Also:
  
  Constant Field Values
- IGNORE
  
  public static final String IGNORE
  See Also:
  
  Constant Field Values
- REPLACE
  
  public static final String REPLACE
  See Also:
  
  Constant Field Values
- XMLCHARREFREPLACE
  
  public static final String XMLCHARREFREPLACE
  See Also:
  
  Constant Field Values
Constructor Details
- codecs
  
  public codecs()
Method Details
- getDefaultEncoding
  
  public static String getDefaultEncoding()
- setDefaultEncoding
  
  public static void setDefaultEncoding(String encoding)
- lookup_error
  
  public static PyObject lookup_error(String handlerName)
- register_error
  
  public static void register_error(String name, PyObject error)
- register
  
  public static void register(PyObject search_function)
- lookup
  
  public static PyTuple lookup(String encoding)
- decode
  
  public static PyObject decode(PyString v, String encoding, String errors)
  
  Decode the bytes v using the codec registered for the encoding. The encoding defaults to the system default encoding (see getDefaultEncoding()). The string errors may name a different error handling policy (built-in or registered with register_error(String, PyObject)). The default error policy is 'strict', meaning that decoding errors raise a ValueError. This method is exposed through the _codecs module as _codecs.decode(PyString, PyString, PyString).
  
  Parameters:
  
  v - bytes to be decoded
  
  encoding - name of encoding (to look up in codec registry)
  
  errors - error policy name (e.g. "ignore", "replace")
  
  Returns:
  
  Unicode string decoded from bytes
- encode
  
  public static String encode(PyString v, String encoding, String errors)
  
  Encode v using the codec registered for the encoding. The encoding defaults to the system default encoding (see getDefaultEncoding()). The string errors may name a different error handling policy (built-in or registered with register_error(String, PyObject)). The default error policy is 'strict', meaning that encoding errors raise a ValueError.
  
  Parameters:
  
  v - unicode string to be encoded
  
  encoding - name of encoding (to look up in codec registry)
  
  errors - error policy name (e.g. "ignore")
  
  Returns:
  
  bytes object encoding v
- strict_errors
  
  public static PyObject strict_errors(PyObject[] args, String[] kws)
- ignore_errors
  
  public static PyObject ignore_errors(PyObject[] args, String[] kws)
- replace_errors
  
  public static PyObject replace_errors(PyObject[] args, String[] kws)
- xmlcharrefreplace_errors
  
  public static PyObject xmlcharrefreplace_errors(PyObject[] args, String[] kws)
- xmlcharrefreplace
  
  public static StringBuilder xmlcharrefreplace(int start, int end, String toReplace)
- backslashreplace_errors
  
  public static PyObject backslashreplace_errors(PyObject[] args, String[] kws)
- backslashreplace
  
  public static StringBuilder backslashreplace(int start, int end, String toReplace)
- PyUnicode_DecodeUTF7Stateful
  
  public static String PyUnicode_DecodeUTF7Stateful(String bytes, String errors, int[] consumed)
  
  Decode (perhaps partially) a sequence of bytes representing the UTF-7 encoded form of a Unicode string, and return the (Jython internal representation of the) unicode object and the amount of input consumed. The only state preserved is the read position, i.e. how many bytes have been consumed. So if the input ends part way through a Base64 sequence, the data reported as consumed extends only up to, and does not include, the Base64 start marker ('+'). Performance will be poor (quadratic cost) on runs of Base64 data long enough to exceed the input quantum in incremental decoding. The returned Java String is a UTF-16 representation of the Unicode result, in line with Java conventions. Unicode characters above the BMP are represented as surrogate pairs.
  
  Parameters:
  
  bytes - input represented as String (Jython PyString convention)
  
  errors - error policy name (e.g. "ignore", "replace")
  
  consumed - returns number of bytes consumed in element 0, or is null if a "final" call
  
  Returns:
  
  unicode result (as UTF-16 Java String)
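
The read-position-only state described above implies that a caller must present any unconsumed tail again on the next call. The following is a minimal sketch of that calling pattern in plain Java. It does not call Jython: StatefulDecoder and TOY are hypothetical stand-ins with the same shape as the method (bytes in, text out, count written to consumed[0]), and the toy treats '-' as the only end marker of a '+' run, which is a simplification of real UTF-7.

```java
final class Utf7FeedSketch {
    /** Hypothetical stand-in with the same shape as a stateful decode call. */
    interface StatefulDecoder {
        String decode(String bytes, int[] consumed);
    }

    /**
     * Toy decoder (an assumption for illustration only): passes text through
     * unchanged, but refuses to consume an unterminated "+..." run, reporting
     * consumption only up to, and not including, the '+'.
     */
    static final StatefulDecoder TOY = (bytes, consumed) -> {
        int plus = bytes.lastIndexOf('+');
        boolean open = plus >= 0 && bytes.indexOf('-', plus) < 0;
        int n = open ? plus : bytes.length();
        consumed[0] = n;
        return bytes.substring(0, n);
    };

    /** Feeds chunks to the decoder, carrying unconsumed bytes into the next call. */
    static String feed(StatefulDecoder decoder, String... chunks) {
        StringBuilder out = new StringBuilder();
        String pending = "";
        int[] consumed = new int[1];
        for (String chunk : chunks) {
            String input = pending + chunk;          // re-present the unconsumed tail
            out.append(decoder.decode(input, consumed));
            pending = input.substring(consumed[0]);  // keep whatever was not consumed
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(feed(TOY, "ab+AG", "E-cd"));
    }
}
```

Because the pending tail is scanned again on every call, a long Base64 run split across many small chunks is re-read repeatedly, which is consistent with the quadratic cost noted in the description.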
- PyUnicode_DecodeUTF7
  
  public static String PyUnicode_DecodeUTF7(String bytes, String errors)
  
  Decode completely a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return the (Jython internal representation of the) unicode object. The returned Java String is a UTF-16 representation of the Unicode result, in line with Java conventions. Unicode characters above the BMP are represented as surrogate pairs.
  
  Parameters:
  
  bytes - input represented as String (Jython PyString convention)
  
  errors - error policy name (e.g. "ignore", "replace")
  
  Returns:
  
  unicode result (as UTF-16 Java String)
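
To illustrate the surrogate-pair convention mentioned above, here is a small sketch using only the Java standard library. The class and method names are illustrative and not part of Jython; the point is that one code point above the BMP occupies two char positions in the returned String.

```java
final class SurrogatePairSketch {
    /** Builds a Java String (UTF-16) holding a single Unicode code point. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F600);                  // a code point above the BMP
        System.out.println(s.length());                     // number of UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // number of code points
        System.out.println(Integer.toHexString(s.charAt(0))); // high surrogate
        System.out.println(Integer.toHexString(s.charAt(1))); // low surrogate
    }
}
```

Code that indexes the result should therefore use codePointAt and Character.charCount rather than assuming one char per character.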
- PyUnicode_EncodeUTF7
  
  public static String PyUnicode_EncodeUTF7(String unicode, boolean base64SetO, boolean base64WhiteSpace, String errors)
  
  Encode a UTF-16 Java String as UTF-7 bytes represented by the low bytes of the characters in a String. (String representation for byte data is chosen so that it may immediately become a PyString.) This method differs from the CPython equivalent (in Objects/unicodeobject.c), which works with an array of code points that are, in a wide build, Unicode code points.
  
  Parameters:
  
  unicode - to be encoded
  
  base64SetO - true if characters in "set O" should be translated to base64
  
  base64WhiteSpace - true if white-space characters should be translated to base64
  
  errors - error policy name (e.g. "ignore", "replace")
  
  Returns:
  
  bytes representing the encoded unicode string
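
The "bytes in the low bytes of the characters" convention can be modelled in plain Java as below. This is a sketch under the assumption that each char in such a String carries exactly one byte value (0 to 255); the helper names are illustrative and are not Jython API. ISO-8859-1 is used only because it maps byte values one-to-one onto the first 256 code points.

```java
import java.nio.charset.StandardCharsets;

final class ByteStringSketch {
    /** Packs raw bytes into a String, one byte per char (value 0..255). */
    static String toByteString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Recovers the raw bytes, rejecting any char that is not a byte value. */
    static byte[] toBytes(String byteString) {
        byte[] out = new byte[byteString.length()];
        for (int i = 0; i < out.length; i++) {
            char c = byteString.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("not a byte value at index " + i);
            }
            out[i] = (byte) c;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8); // two bytes: C3 A9
        String packed = toByteString(utf8);
        System.out.println(packed.length());
        System.out.println(Integer.toHexString(packed.charAt(0)));
    }
}
```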
- PyUnicode_DecodeUTF8
  
  public static String PyUnicode_DecodeUTF8(String str, String errors)
- PyUnicode_DecodeUTF8Stateful
  
  public static String PyUnicode_DecodeUTF8Stateful(String str, String errors, int[] consumed)
- PyUnicode_EncodeUTF8
  
  public static String PyUnicode_EncodeUTF8(String str, String errors)
- PyUnicode_DecodeASCII
  
  public static String PyUnicode_DecodeASCII(String str, int size, String errors)
- PyUnicode_DecodeLatin1
  
  public static String PyUnicode_DecodeLatin1(String str, int size, String errors)
- PyUnicode_EncodeASCII
  
  public static String PyUnicode_EncodeASCII(String str, int size, String errors)
- PyUnicode_EncodeLatin1
  
  public static String PyUnicode_EncodeLatin1(String str, int size, String errors)
- PyUnicode_EncodeRawUnicodeEscape
  
  public static String PyUnicode_EncodeRawUnicodeEscape(String str, String errors, boolean modifed)
- PyUnicode_DecodeRawUnicodeEscape
  
  public static String PyUnicode_DecodeRawUnicodeEscape(String str, String errors)
- PyUnicode_EncodePunycode
  
  public static String PyUnicode_EncodePunycode(PyUnicode input, String errors)
- PyUnicode_DecodePunycode
  
  public static PyUnicode PyUnicode_DecodePunycode(String input, String errors)
- PyUnicode_EncodeIDNA
  
  public static String PyUnicode_EncodeIDNA(PyUnicode input, String errors)
- PyUnicode_DecodeIDNA
  
  public static PyUnicode PyUnicode_DecodeIDNA(String input, String errors)
- encoding_error
  
  public static PyObject encoding_error(String errors, String encoding, String toEncode, int start, int end, String reason)
  
  Invoke a user-defined error-handling mechanism, for errors encountered during encoding, as registered through register_error(String, PyObject). The return value is the return from the error handler indicating the replacement codec input and the position at which to resume encoding. Invokes the mechanism described in PEP-293.
  
  Parameters:
  
  errors - name of the error policy (or null meaning "strict")
  
  encoding - name of encoding that encountered the error
  
  toEncode - unicode string being encoded
  
  start - index of first char it couldn't encode
  
  end - index+1 of last char it couldn't encode (usually becomes the resume point)
  
  reason - contribution to error message if any
  
  Returns:
  
  must be a tuple (replacement_unicode, resume_index)
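
The (replacement_unicode, resume_index) contract can be sketched in plain Java as follows. This models the PEP-293 style of callback only: Resume, ErrorHandler, XML_CHAR_REF and encodeAscii are hypothetical names, not Jython classes, and the sketch assumes the replacement text is itself encodable and that the handler always resumes beyond the failing span.

```java
final class EncodeErrorSketch {
    /** Models the handler's return: replacement text and where to resume. */
    static final class Resume {
        final String replacement;
        final int resumeIndex;

        Resume(String replacement, int resumeIndex) {
            this.replacement = replacement;
            this.resumeIndex = resumeIndex;
        }
    }

    /** Hypothetical callback: start is the first failing index, end is last failing index + 1. */
    interface ErrorHandler {
        Resume handle(String toEncode, int start, int end);
    }

    /** Handler in the style of "xmlcharrefreplace": numeric character references. */
    static final ErrorHandler XML_CHAR_REF = (text, start, end) -> {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; ) {
            int cp = text.codePointAt(i);
            sb.append("&#").append(cp).append(';');
            i += Character.charCount(cp);
        }
        return new Resume(sb.toString(), end);
    };

    /** Toy ASCII encoder that delegates each unencodable run to the handler. */
    static String encodeAscii(String text, ErrorHandler handler) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
                i++;
                continue;
            }
            int end = i + 1; // extend the span over the whole unencodable run
            while (end < text.length() && text.charAt(end) >= 0x80) {
                end++;
            }
            Resume r = handler.handle(text, i, end);
            out.append(r.replacement);
            i = r.resumeIndex;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeAscii("caf\u00e9 \u20ac", XML_CHAR_REF));
    }
}
```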
- insertReplacementAndGetResume
  
  public static int insertReplacementAndGetResume(StringBuilder partialDecode, String errors, String encoding, String toDecode, int start, int end, String reason)
  
  Handler for errors encountered during decoding, adjusting the output buffer contents and returning the correct position to resume decoding (if the handler does not simply raise an exception).
  
  Parameters:
  
  partialDecode - output buffer of unicode (as UTF-16) that the codec is building
  
  errors - name of the error policy (or null meaning "strict")
  
  encoding - name of encoding that encountered the error
  
  toDecode - bytes being decoded
  
  start - index of first byte it couldn't decode
  
  end - index+1 of last byte it couldn't decode (usually becomes the resume point)
  
  reason - contribution to error message if any
  
  Returns:
  
  the resume position: index of next byte to decode
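
A decoder loop built around this kind of helper might look like the sketch below, written in plain Java rather than against Jython. The names are illustrative, only "ignore" and "replace" are modelled, and IllegalArgumentException stands in for the Python exception that a strict policy would raise.

```java
final class DecodeResumeSketch {
    /**
     * Models the helper: adjusts the output buffer according to the policy
     * and returns the index at which decoding should resume.
     */
    static int insertReplacement(StringBuilder partialDecode, String errors,
                                 String toDecode, int start, int end) {
        if ("ignore".equals(errors)) {
            return end;                      // drop the bad bytes
        }
        if ("replace".equals(errors)) {
            partialDecode.append('\uFFFD');  // U+FFFD REPLACEMENT CHARACTER
            return end;
        }
        // Stand-in for the strict policy (errors == null or "strict").
        throw new IllegalArgumentException(
                "can't decode byte at position " + start);
    }

    /** Toy ASCII decoder over a byte-per-char input String. */
    static String decodeAscii(String bytes, String errors) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < bytes.length()) {
            char b = bytes.charAt(i);
            if (b < 0x80) {
                out.append(b);
                i++;
            } else {
                i = insertReplacement(out, errors, bytes, i, i + 1);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeAscii("a\u00ffb", "ignore"));
    }
}
```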
- decoding_error
  
  public static PyObject decoding_error(String errors, String encoding, String toDecode, int start, int end, String reason)
  
  Invoke a user-defined error-handling mechanism, for errors encountered during decoding, as registered through register_error(String, PyObject). The return value is the return from the error handler indicating the replacement codec output and the position at which to resume decoding. Invokes the mechanism described in PEP-293.
  
  Parameters:
  
  errors - name of the error policy (or null meaning "strict")
  
  encoding - name of encoding that encountered the error
  
  toDecode - bytes being decoded
  
  start - index of first byte it couldn't decode
  
  end - index+1 of last byte it couldn't decode (usually becomes the resume point)
  
  reason - contribution to error message if any
  
  Returns:
  
  must be a tuple (replacement_unicode, resume_index)
- calcNewPosition
  
  public static int calcNewPosition(int size, PyObject errorTuple)
  
  Given the return from some codec error handler (invoked while encoding or decoding), which specifies a resume position, and the length of the input being encoded or decoded, check and interpret the resume position. Negative indexes in the error handler return are interpreted as "from the end". If the result would be out of bounds in the input, an IndexError exception is raised.
  
  Parameters:
  
  size - of byte buffer being decoded
  
  errorTuple - returned from error handler
  
  Returns:
  
  absolute resume position.
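
The interpretation rule above can be sketched in plain Java as follows. The method takes the requested index directly instead of a handler tuple, IndexOutOfBoundsException stands in for Python's IndexError, and the sketch assumes that a position equal to size (resume at the very end) is accepted; none of these names are Jython API.

```java
final class ResumePositionSketch {
    /**
     * Resolves a handler-supplied resume index against an input of the given
     * size. Negative values count back from the end of the input.
     */
    static int resolve(int size, int requested) {
        int position = requested < 0 ? size + requested : requested;
        if (position < 0 || position > size) {
            throw new IndexOutOfBoundsException(
                    "position " + requested + " from error handler out of bounds");
        }
        return position;
    }

    public static void main(String[] args) {
        System.out.println(resolve(10, 3));
        System.out.println(resolve(10, -2));
    }
}
```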
