Package org.python.core
Class codecs
java.lang.Object
org.python.core.codecs
This class implements the codec registry and utility methods supporting codecs, such as those providing the standard replacement strategies ("ignore", "backslashreplace", etc.). The _codecs module relies heavily on apparatus implemented here, and therefore so does the Python codecs module (in Lib/codecs.py). It corresponds approximately to CPython's Python/codecs.c.
The class also contains the inner methods of the standard Unicode codecs, available for transcoding of text at the Java level. These are also exposed through the _codecs module. In CPython, the implementations are found in Objects/unicodeobject.c.
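To make the standard replacement strategies concrete, here is a minimal Java sketch of the text that "backslashreplace" and "xmlcharrefreplace" substitute for a span that cannot be encoded. ReplacementSketch is a hypothetical class, not Jython's implementation. It assumes that start and end are UTF-16 indices into toReplace, and that the escapes follow CPython's format: \xNN, \uNNNN or \UNNNNNNNN in lowercase hex, and &#decimal; references.

```java
// Hypothetical sketch of two replacement strategies; not Jython's own code.
// Assumes start/end are UTF-16 indices into toReplace; surrogate pairs are
// combined so that each code point yields exactly one escape.
final class ReplacementSketch {

    // "backslashreplace": backslash-x (2 hex digits) below 0x100,
    // backslash-u (4 hex digits) below 0x10000, else backslash-U (8 hex digits).
    static StringBuilder backslashreplace(int start, int end, String toReplace) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; ) {
            int cp = toReplace.codePointAt(i);
            if (cp < 0x100) {
                sb.append(String.format("\\x%02x", cp));
            } else if (cp < 0x10000) {
                sb.append(String.format("\\u%04x", cp));
            } else {
                sb.append(String.format("\\U%08x", cp));
            }
            i += Character.charCount(cp);
        }
        return sb;
    }

    // "xmlcharrefreplace": each code point becomes a decimal character reference.
    static StringBuilder xmlcharrefreplace(int start, int end, String toReplace) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; ) {
            int cp = toReplace.codePointAt(i);
            sb.append("&#").append(cp).append(';');
            i += Character.charCount(cp);
        }
        return sb;
    }
}
```

For example, applied to the euro sign in "a€b" (span 1 to 2), backslashreplace yields \u20ac and xmlcharrefreplace yields &#8364;.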
- Since:
- Jython 2.0
-
Nested Class Summary
Nested Classes -
Field Summary
Fields - BACKSLASHREPLACE, IGNORE, REPLACE, XMLCHARREFREPLACE
Constructor Summary
Constructors - codecs()
Method Summary
static StringBuilder backslashreplace(int start, int end, String toReplace)
static PyObject backslashreplace_errors(PyObject[] args, String[] kws)
static int calcNewPosition(int size, PyObject errorTuple)
    Given the return from some codec error handler (invoked while encoding or decoding), which specifies a resume position, and the length of the input being encoded or decoded, check and interpret the resume position.
static PyObject decode(v, encoding, errors)
    Decode the bytes v using the codec registered for the encoding.
static PyObject decoding_error(String errors, String encoding, String toDecode, int start, int end, String reason)
    Invoke a user-defined error-handling mechanism, for errors encountered during decoding, as registered through register_error(String, PyObject).
static String encode(v, encoding, errors)
    Encode v using the codec registered for the encoding.
static PyObject encoding_error(String errors, String encoding, String toEncode, int start, int end, String reason)
    Invoke a user-defined error-handling mechanism, for errors encountered during encoding, as registered through register_error(String, PyObject).
static String getDefaultEncoding()
static PyObject ignore_errors(PyObject[] args, String[] kws)
static int insertReplacementAndGetResume(StringBuilder partialDecode, String errors, String encoding, String toDecode, int start, int end, String reason)
    Handler for errors encountered during decoding, adjusting the output buffer contents and returning the correct position to resume decoding (if the handler does not simply raise an exception).
static PyTuple lookup(...)
static PyObject lookup_error(String handlerName)
static String PyUnicode_DecodeASCII(String str, int size, String errors)
static PyUnicode PyUnicode_DecodeIDNA(String input, String errors)
static String PyUnicode_DecodeLatin1(String str, int size, String errors)
static PyUnicode PyUnicode_DecodePunycode(String input, String errors)
static String PyUnicode_DecodeRawUnicodeEscape(String str, String errors)
static String PyUnicode_DecodeUTF7(String bytes, String errors)
    Decode completely a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return (the Jython internal representation of) the unicode object.
static String PyUnicode_DecodeUTF7Stateful(String bytes, String errors, int[] consumed)
    Decode (perhaps partially) a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return (the Jython internal representation of) the unicode object, and the amount of input consumed.
static String PyUnicode_DecodeUTF8(String str, String errors)
static String PyUnicode_DecodeUTF8Stateful(String str, String errors, int[] consumed)
static String PyUnicode_EncodeASCII(String str, int size, String errors)
static String PyUnicode_EncodeIDNA(PyUnicode input, String errors)
static String PyUnicode_EncodeLatin1(String str, int size, String errors)
static String PyUnicode_EncodePunycode(PyUnicode input, String errors)
static String PyUnicode_EncodeRawUnicodeEscape(String str, String errors, boolean modifed)
static String PyUnicode_EncodeUTF7(String unicode, boolean base64SetO, boolean base64WhiteSpace, String errors)
    Encode a UTF-16 Java String as UTF-7 bytes represented by the low bytes of the characters in a String.
static String PyUnicode_EncodeUTF8(String str, String errors)
static void register(...)
static void register_error(String name, PyObject error)
static PyObject replace_errors(PyObject[] args, String[] kws)
static void setDefaultEncoding(String encoding)
static PyObject strict_errors(PyObject[] args, String[] kws)
static StringBuilder xmlcharrefreplace(int start, int end, String toReplace)
static PyObject xmlcharrefreplace_errors(PyObject[] args, String[] kws)
-
Field Details
-
BACKSLASHREPLACE
-
IGNORE
-
REPLACE
-
XMLCHARREFREPLACE
-
-
Constructor Details
-
codecs
public codecs()
-
-
Method Details
-
getDefaultEncoding
-
setDefaultEncoding
-
lookup_error
-
register_error
-
register
-
lookup
-
decode
Decode the bytes v using the codec registered for the encoding. The encoding defaults to the system default encoding (see getDefaultEncoding()). The string errors may name a different error-handling policy (built-in or registered with register_error(String, PyObject)). The default error policy is 'strict', meaning that decoding errors raise a ValueError. This method is exposed through the _codecs module as _codecs.decode(PyString, PyString, PyString).
- Parameters:
v - bytes to be decoded
encoding - name of encoding (to look up in codec registry)
errors - error policy name (e.g. "ignore", "replace")
- Returns:
Unicode string decoded from bytes
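The effect of the errors argument can be approximated with the standard java.nio charset API. This is a sketch under the assumption that 'strict', 'ignore' and 'replace' correspond to CodingErrorAction.REPORT, IGNORE and REPLACE; DecodePolicySketch is hypothetical, does not use Jython's codec registry, and throws IllegalArgumentException where Python would raise a ValueError.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Analogy only: maps three built-in Python error policies onto java.nio actions.
// This is not how Jython dispatches policies (it consults its own registry).
final class DecodePolicySketch {

    static CodingErrorAction actionFor(String errors) {
        if (errors == null || errors.equals("strict")) return CodingErrorAction.REPORT;
        if (errors.equals("ignore")) return CodingErrorAction.IGNORE;
        if (errors.equals("replace")) return CodingErrorAction.REPLACE;   // inserts U+FFFD
        throw new IllegalArgumentException("no java.nio analogue for policy: " + errors);
    }

    // Decodes bytes v with the named charset, applying the chosen policy.
    // "strict" failures are rethrown unchecked, standing in for Python's ValueError.
    static String decode(byte[] v, String encoding, String errors) {
        CodingErrorAction action = actionFor(errors);
        try {
            return Charset.forName(encoding).newDecoder()
                    .onMalformedInput(action)
                    .onUnmappableCharacter(action)
                    .decode(ByteBuffer.wrap(v))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("decoding error: " + e, e);
        }
    }
}
```

With US-ASCII, the byte 0xFF is malformed: "ignore" drops it, "replace" substitutes U+FFFD, and "strict" throws.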
-
encode
Encode v using the codec registered for the encoding. The encoding defaults to the system default encoding (see getDefaultEncoding()). The string errors may name a different error-handling policy (built-in or registered with register_error(String, PyObject)). The default error policy is 'strict', meaning that encoding errors raise a ValueError.
- Parameters:
v - unicode string to be encoded
encoding - name of encoding (to look up in codec registry)
errors - error policy name (e.g. "ignore")
- Returns:
bytes object encoding v
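A matching sketch for the encoding direction, again using java.nio as a stand-in for the codec registry (an assumption, not Jython's mechanism). It also shows the byte representation used by methods such as PyUnicode_EncodeUTF7: the encoded bytes come back as a String whose characters each hold one byte value from 0 to 255. EncodePolicySketch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Analogy only: java.nio stands in for Jython's codec registry.
final class EncodePolicySketch {

    // Encodes v and returns the bytes as a String holding one char (0..255) per byte.
    static String encode(String v, String encoding, String errors) {
        CodingErrorAction action;
        if (errors == null || errors.equals("strict")) {
            action = CodingErrorAction.REPORT;
        } else if (errors.equals("ignore")) {
            action = CodingErrorAction.IGNORE;
        } else if (errors.equals("replace")) {
            action = CodingErrorAction.REPLACE;          // encoder default is usually '?'
        } else {
            throw new IllegalArgumentException("no java.nio analogue for policy: " + errors);
        }
        try {
            ByteBuffer bb = Charset.forName(encoding).newEncoder()
                    .onMalformedInput(action)
                    .onUnmappableCharacter(action)
                    .encode(CharBuffer.wrap(v));
            StringBuilder sb = new StringBuilder(bb.remaining());
            while (bb.hasRemaining()) {
                sb.append((char) (bb.get() & 0xFF));     // low byte only, as in a PyString
            }
            return sb.toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("encoding error: " + e, e);
        }
    }
}
```

Encoding "aéb" to US-ASCII with "replace" gives "a?b"; encoding "é" to UTF-8 gives the two characters U+00C3 U+00A9, one per byte.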
-
strict_errors
-
ignore_errors
-
replace_errors
-
xmlcharrefreplace_errors
-
xmlcharrefreplace
-
backslashreplace_errors
-
backslashreplace
-
PyUnicode_DecodeUTF7Stateful
Decode (perhaps partially) a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return (the Jython internal representation of) the unicode object, and the amount of input consumed. The only state we preserve is our read position, i.e. how many bytes we have consumed. So if the input ends partway through a Base64 sequence, the data reported as consumed extends only up to, and not including, the Base64 start marker ('+'). Because the next call must then re-read that Base64 run from its '+', performance will be poor (quadratic cost) on runs of Base64 data long enough to exceed the input quantum in incremental decoding. The returned Java String is a UTF-16 representation of the Unicode result, in line with Java conventions. Unicode characters above the BMP are represented as surrogate pairs.
- Parameters:
bytes - input represented as String (Jython PyString convention)
errors - error policy name (e.g. "ignore", "replace")
consumed - returns the number of bytes consumed in element 0, or is null if this is a "final" call
- Returns:
unicode result (as UTF-16 Java String)
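The consumed convention can be illustrated with a small UTF-7 decoder. Utf7DecodeSketch is hypothetical, not Jython's code: it handles direct characters, the "+-" escape and Base64 runs, but it omits the errors parameter and does no error handling. When consumed is non-null and the input ends inside a Base64 run, it rolls back to the '+' as described above.

```java
// Hypothetical sketch of stateful UTF-7 decoding; not Jython's implementation.
// No error handling: stray bits at the end of a Base64 run are simply dropped.
final class Utf7DecodeSketch {

    private static int base64Value(char c) {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    }

    // consumed == null means a "final" call; otherwise consumed[0] receives
    // the number of input bytes (chars) actually used.
    static String decode(String bytes, int[] consumed) {
        StringBuilder out = new StringBuilder();
        int n = bytes.length();
        int i = 0;
        while (i < n) {
            char c = bytes.charAt(i);
            if (c != '+') {                          // directly encoded character
                out.append(c);
                i++;
                continue;
            }
            int plusPos = i;                         // position of the Base64 start marker
            int outMark = out.length();              // output length before this run
            i++;
            if (i < n && bytes.charAt(i) == '-') {   // "+-" encodes a literal '+'
                out.append('+');
                i++;
                continue;
            }
            int bits = 0;
            int nBits = 0;
            while (i < n && base64Value(bytes.charAt(i)) >= 0) {
                bits = (bits << 6) | base64Value(bytes.charAt(i));
                nBits += 6;
                if (nBits >= 16) {                   // a full UTF-16 code unit is ready
                    nBits -= 16;
                    out.append((char) ((bits >>> nBits) & 0xFFFF));
                    bits &= (1 << nBits) - 1;
                }
                i++;
            }
            if (i == n && consumed != null) {
                // Input ended inside a Base64 run: report consumption only up to
                // (not including) the '+', and drop this run's partial output.
                out.setLength(outMark);
                consumed[0] = plusPos;
                return out.toString();
            }
            if (i < n && bytes.charAt(i) == '-') {
                i++;                                 // an explicit '-' terminator is absorbed
            }
        }
        if (consumed != null) {
            consumed[0] = n;
        }
        return out.toString();
    }
}
```

With consumed non-null, decoding "ab+Jj" returns "ab" and sets consumed[0] to 2, so the caller has to present the Base64 run again from its '+' on the next call.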
-
PyUnicode_DecodeUTF7
Decode completely a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return (the Jython internal representation of) the unicode object. The returned Java String is a UTF-16 representation of the Unicode result, in line with Java conventions. Unicode characters above the BMP are represented as surrogate pairs.
- Parameters:
bytes - input represented as String (Jython PyString convention)
errors - error policy name (e.g. "ignore", "replace")
- Returns:
unicode result (as UTF-16 Java String)
-
PyUnicode_EncodeUTF7
public static String PyUnicode_EncodeUTF7(String unicode, boolean base64SetO, boolean base64WhiteSpace, String errors)
Encode a UTF-16 Java String as UTF-7 bytes represented by the low bytes of the characters in a String. (The String representation for byte data is chosen so that it may immediately become a PyString.) This method differs from the CPython equivalent (in Objects/unicodeobject.c), which works with an array of code points that are, in a wide build, Unicode code points.
- Parameters:
unicode - string to be encoded
base64SetO - true if characters in "set O" should be translated to base64
base64WhiteSpace - true if white-space characters should be translated to base64
errors - error policy name (e.g. "ignore", "replace")
- Returns:
bytes representing the encoded unicode string
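The roles of base64SetO and base64WhiteSpace can be seen in this encoder sketch, which follows the character sets of RFC 2152. Utf7EncodeSketch is hypothetical, not Jython's code: it omits the errors parameter, writes UTF-16 code units (so a surrogate pair goes into Base64 as two units), and always closes a Base64 run with '-', which RFC 2152 permits even where it is optional.

```java
// Hypothetical sketch of UTF-7 encoding; not Jython's implementation. Each
// Base64 run is always closed with '-', which RFC 2152 permits.
final class Utf7EncodeSketch {

    private static final String B64 =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Set D (plus, optionally, set O and white space) may be written directly.
    static boolean isDirect(char c, boolean base64SetO, boolean base64WhiteSpace) {
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || "'(),-./:?".indexOf(c) >= 0) {
            return true;                                   // set D
        }
        if ("!\"#$%&*;<=>@[]^_`{|}".indexOf(c) >= 0) {
            return !base64SetO;                            // set O
        }
        if (" \t\r\n".indexOf(c) >= 0) {
            return !base64WhiteSpace;                      // white space
        }
        return false;
    }

    static String encode(String unicode, boolean base64SetO, boolean base64WhiteSpace) {
        StringBuilder out = new StringBuilder();
        int n = unicode.length();
        int i = 0;
        while (i < n) {
            char c = unicode.charAt(i);
            if (c == '+') {
                out.append("+-");                          // literal '+'
                i++;
            } else if (isDirect(c, base64SetO, base64WhiteSpace)) {
                out.append(c);
                i++;
            } else {
                out.append('+');                           // open a Base64 run
                int bits = 0;
                int nBits = 0;
                while (i < n && unicode.charAt(i) != '+'
                        && !isDirect(unicode.charAt(i), base64SetO, base64WhiteSpace)) {
                    bits = (bits << 16) | unicode.charAt(i);   // one UTF-16 code unit
                    nBits += 16;
                    while (nBits >= 6) {
                        nBits -= 6;
                        out.append(B64.charAt((bits >>> nBits) & 0x3F));
                    }
                    bits &= (1 << nBits) - 1;
                    i++;
                }
                if (nBits > 0) {                           // zero-pad the last sextet
                    out.append(B64.charAt((bits << (6 - nBits)) & 0x3F));
                }
                out.append('-');                           // close the run
            }
        }
        return out.toString();
    }
}
```

Encoding "Hi Mom -☺-!" with both flags false gives "Hi Mom -+Jjo--!", the example from RFC 2152; setting base64SetO to true moves '!' into Base64 as well.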
-
PyUnicode_DecodeUTF8
-
PyUnicode_DecodeUTF8Stateful
-
PyUnicode_EncodeUTF8
-
PyUnicode_DecodeASCII
-
PyUnicode_DecodeLatin1
-
PyUnicode_EncodeASCII
-
PyUnicode_EncodeLatin1
-
PyUnicode_EncodeRawUnicodeEscape
-
PyUnicode_DecodeRawUnicodeEscape
-
PyUnicode_EncodePunycode
-
PyUnicode_DecodePunycode
-
PyUnicode_EncodeIDNA
-
PyUnicode_DecodeIDNA
-
encoding_error
public static PyObject encoding_error(String errors, String encoding, String toEncode, int start, int end, String reason)
Invoke a user-defined error-handling mechanism, for errors encountered during encoding, as registered through register_error(String, PyObject). The return value is the return from the error handler, indicating the replacement codec input and the position at which to resume encoding. Invokes the mechanism described in PEP 293.
- Parameters:
errors - name of the error policy (or null meaning "strict")
encoding - name of encoding that encountered the error
toEncode - unicode string being encoded
start - index of first char it couldn't encode
end - index+1 of last char it couldn't encode (usually becomes the resume point)
reason - contribution to error message if any
- Returns:
must be a tuple (replacement_unicode, resume_index)
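The PEP 293 exchange can be modelled in plain Java. In this hypothetical sketch, EncodeError stands in for the exception object a handler receives, Resolution for the (replacement_unicode, resume_index) tuple, and encodeAscii for a codec that consults the handler whenever it meets characters it cannot encode. None of these names belong to Jython.

```java
import java.util.function.Function;

// Hypothetical model of the PEP 293 protocol used by encoding_error: the handler
// receives the failing span and answers (replacement, resume position).
final class HandlerProtocolSketch {

    // Stand-in for the exception object passed to a handler.
    static final class EncodeError {
        final String object;
        final int start;
        final int end;

        EncodeError(String object, int start, int end) {
            this.object = object;
            this.start = start;
            this.end = end;
        }
    }

    // Stand-in for the (replacement_unicode, resume_index) tuple.
    static final class Resolution {
        final String replacement;
        final int resume;

        Resolution(String replacement, int resume) {
            this.replacement = replacement;
            this.resume = resume;
        }
    }

    // ASCII encoder that hands each run of unencodable chars to the handler.
    // The replacement is assumed to be pure ASCII already (no re-encoding here).
    static String encodeAscii(String s, Function<EncodeError, Resolution> handler) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (c < 0x80) {
                out.append(c);
                pos++;
                continue;
            }
            int end = pos + 1;
            while (end < s.length() && s.charAt(end) >= 0x80) {
                end++;                               // extend over the whole bad run
            }
            Resolution r = handler.apply(new EncodeError(s, pos, end));
            out.append(r.replacement);
            pos = r.resume;                          // no bounds check in this sketch
        }
        return out.toString();
    }
}
```

A handler returning ("?", e.end) replaces each bad run with a single '?', while one that resumes at e.start + 1 is consulted once per character. The sketch does not validate the resume index; in the class above that is the job of calcNewPosition.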
-
insertReplacementAndGetResume
public static int insertReplacementAndGetResume(StringBuilder partialDecode, String errors, String encoding, String toDecode, int start, int end, String reason)
Handler for errors encountered during decoding, adjusting the output buffer contents and returning the correct position to resume decoding (if the handler does not simply raise an exception).
- Parameters:
partialDecode - output buffer of unicode (as UTF-16) that the codec is building
errors - name of the error policy (or null meaning "strict")
encoding - name of encoding that encountered the error
toDecode - bytes being decoded
start - index of first byte it couldn't decode
end - index+1 of last byte it couldn't decode (usually becomes the resume point)
reason - contribution to error message if any
- Returns:
the resume position: index of next byte to decode
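The decode-side pattern can be sketched as follows: the codec appends good output to partialDecode, and at each bad span it lets a helper adjust that buffer and say where to resume. replaceAndResume is a hypothetical simplification covering only the built-in 'strict', 'ignore' and 'replace' policies; it ignores the encoding and reason parameters and the register_error registry, and its strict message is modelled loosely on CPython's.

```java
// Hypothetical simplification of the role insertReplacementAndGetResume plays,
// restricted to the built-in policies and a fixed ASCII codec.
final class DecodeResumeSketch {

    // Adjusts partialDecode for the bad span [start, end) and returns the resume index.
    static int replaceAndResume(StringBuilder partialDecode, String errors,
                                String toDecode, int start, int end) {
        if (errors == null || errors.equals("strict")) {
            throw new IllegalArgumentException(String.format(
                    "'ascii' codec can't decode byte 0x%02x in position %d",
                    (int) toDecode.charAt(start), start));
        } else if (errors.equals("replace")) {
            partialDecode.append('\uFFFD');          // one U+FFFD per bad span
        } else if (!errors.equals("ignore")) {
            throw new IllegalArgumentException("unknown error handler name: " + errors);
        }
        return end;                                  // resume just past the bad bytes
    }

    // ASCII decoder over a byte String (one char per byte, values 0..255).
    static String decodeAscii(String toDecode, String errors) {
        StringBuilder partialDecode = new StringBuilder();
        int pos = 0;
        while (pos < toDecode.length()) {
            char b = toDecode.charAt(pos);
            if (b < 0x80) {
                partialDecode.append(b);
                pos++;
            } else {
                pos = replaceAndResume(partialDecode, errors, toDecode, pos, pos + 1);
            }
        }
        return partialDecode.toString();
    }
}
```

Because the helper both edits the shared buffer and returns the next index, the codec loop needs no knowledge of which policy is in force.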
-
decoding_error
public static PyObject decoding_error(String errors, String encoding, String toDecode, int start, int end, String reason)
Invoke a user-defined error-handling mechanism, for errors encountered during decoding, as registered through register_error(String, PyObject). The return value is the return from the error handler, indicating the replacement codec output and the position at which to resume decoding. Invokes the mechanism described in PEP 293.
- Parameters:
errors - name of the error policy (or null meaning "strict")
encoding - name of encoding that encountered the error
toDecode - bytes being decoded
start - index of first byte it couldn't decode
end - index+1 of last byte it couldn't decode (usually becomes the resume point)
reason - contribution to error message if any
- Returns:
must be a tuple (replacement_unicode, resume_index)
-
calcNewPosition
Given the return from some codec error handler (invoked while encoding or decoding), which specifies a resume position, and the length of the input being encoded or decoded, check and interpret the resume position. Negative indexes in the error handler return are interpreted as "from the end". If the result would be out of bounds in the input, an IndexError exception is raised.
- Parameters:
size - length of the input being encoded or decoded
errorTuple - returned from error handler
- Returns:
absolute resume position.
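The resume-position rule can be sketched in a few lines. This hypothetical version takes the resume index directly rather than the handler's result tuple and throws IndexOutOfBoundsException in place of Python's IndexError. As an assumption borrowed from CPython, it accepts size itself (resuming at the very end) as in bounds.

```java
// Hypothetical sketch of the resume-position rule described for calcNewPosition.
// It takes the resume index directly instead of the handler's result tuple.
final class ResumePositionSketch {

    static int calcNewPosition(int size, int resume) {
        int pos = resume < 0 ? size + resume : resume;   // negative: count from the end
        if (pos < 0 || pos > size) {                      // size itself means "at the end"
            throw new IndexOutOfBoundsException(
                    "position " + resume + " from error handler out of bounds");
        }
        return pos;
    }
}
```

For an input of length 10, a handler result of -3 resolves to 7, while 11 or -11 is rejected.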
-