Class CharUtilities

java.lang.Object
org.apache.fop.util.CharUtilities

public class CharUtilities extends Object
This class provides utilities to distinguish various kinds of Unicode whitespace and to get character widths in a given FontState.
  • Field Summary

    Fields
    Modifier and Type   Field                          Description
    static final char   CARRIAGE_RETURN                carriage return
    static final char   CODE_EOT                       Character code used to signal a character boundary in inline content, such as an inline with borders and padding or a nested block object.
    static final int    EOT                            Character class: Boundary between text runs
    static final char   IDEOGRAPHIC_SPACE              Ideographic space
    static final char   LINE_SEPARATOR                 line-separator
    static final int    LINEFEED                       Character class: Line feed
    static final char   LINEFEED_CHAR                  linefeed character
    static final char   LRE                            left-to-right embedding
    static final char   LRM                            left-to-right mark
    static final char   LRO                            left-to-right override
    static final char   MISSING_IDEOGRAPH              missing ideograph
    static final char   NBSPACE                        non-breaking space
    static final char   NEXT_LINE                      next line control character
    static final int    NONWHITESPACE                  Character class: non-whitespace
    static final char   NOT_A_CHARACTER                Unicode value indicating that the character is "not a character".
    static final char   NULL_CHAR                      null char
    static final char   OBJECT_REPLACEMENT_CHARACTER   Object replacement character
    static final char   PARAGRAPH_SEPARATOR            paragraph-separator
    static final char   PDF                            pop directional formatting
    static final char   RLE                            right-to-left embedding
    static final char   RLM                            right-to-left mark
    static final char   RLO                            right-to-left override
    static final char   SOFT_HYPHEN                    soft hyphen
    static final char   SPACE                          normal space
    static final char   TAB                            normal tab
    static final int    UCWHITESPACE                   Character class: Unicode white space
    static final char   WORD_JOINER                    word joiner
    static final int    XMLWHITESPACE                  Character class: XML whitespace
    static final char   ZERO_WIDTH_JOINER              zero-width joiner
    static final char   ZERO_WIDTH_NOBREAK_SPACE       zero-width no-break space (= byte order mark)
    static final char   ZERO_WIDTH_SPACE               zero-width space
  • Constructor Summary

    Constructors
    Modifier    Constructor       Description
    protected   CharUtilities()   Utility class: Constructor prevents instantiating when subclassed.
  • Method Summary

    Modifier and Type          Method
                               Description
    static String              charToNCRef(int c)
                               Convert a single Unicode scalar value to an XML numeric character reference.
    static int                 classOf(int c)
                               Return the character-class constant (UCWHITESPACE, LINEFEED, EOT, NONWHITESPACE or XMLWHITESPACE) for the type of the passed character.
    static Iterable<Integer>   codepointsIter(CharSequence s)
                               Creates an iterable over the code points of a CharSequence.
    static Iterable<Integer>   codepointsIter(CharSequence s, int beginIndex, int endIndex)
                               Creates an iterable over the code points of a sub-range of a CharSequence.
    static boolean             containsSurrogatePairAt(CharSequence chars, int index)
                               Tells whether there is a surrogate pair starting at the given index in the CharSequence.
    static String              format(int c)
                               Format a character for debugging output: the result is prefixed with "0x" and left-padded with '0' to a width of either 4 or 6 hex characters, depending on whether the character is in the BMP.
    static int                 incrementIfNonBMP(int codePoint)
                               Returns 1 if codePoint is not in the BMP, 0 otherwise.
    static boolean             isAdjustableSpace(int c)
                               Method to determine if the character is an adjustable space.
    static boolean             isAlphabetic(int c)
                               Indicates whether a character is classified as "Alphabetic" by the Unicode standard.
    static boolean             isAnySpace(int c)
                               Determines if the character represents any kind of space.
    static boolean             isBmpCodePoint(int codePoint)
                               Determine whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP).
    static boolean             isBreakableSpace(int c)
                               Helper method to determine if the character is a space with normal behavior.
    static boolean             isExplicitBreak(int c)
                               Indicates whether the given character is an explicit break character.
    static boolean             isFixedWidthSpace(int c)
                               Method to determine if the character is a (breakable) fixed-width space.
    static boolean             isNonBreakableSpace(int c)
                               Method to determine if the character is a non-breaking space.
    static boolean             isSameSequence(CharSequence cs1, CharSequence cs2)
                               Determine if two character sequences contain the same characters.
    static boolean             isSurrogatePair(char ch)
                               Determine if the given character is part of a surrogate pair.
    static boolean             isZeroWidthSpace(int c)
                               Method to determine if the character is a zero-width space.
    static String              padLeft(String s, int width, char pad)
                               Pad string s on the left, out to the given width, using the padding character pad.
    static String              toNCRefs(String s)
                               Convert a string to a sequence of ASCII or XML numeric character references.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • CODE_EOT

      public static final char CODE_EOT
      Character code used to signal a character boundary in inline content, such as an inline with borders and padding or a nested block object.
    • UCWHITESPACE

      public static final int UCWHITESPACE
      Character class: Unicode white space
    • LINEFEED

      public static final int LINEFEED
      Character class: Line feed
    • EOT

      public static final int EOT
      Character class: Boundary between text runs
    • NONWHITESPACE

      public static final int NONWHITESPACE
      Character class: non-whitespace
    • XMLWHITESPACE

      public static final int XMLWHITESPACE
      Character class: XML whitespace
    • NULL_CHAR

      public static final char NULL_CHAR
      null char
    • LINEFEED_CHAR

      public static final char LINEFEED_CHAR
      linefeed character
    • CARRIAGE_RETURN

      public static final char CARRIAGE_RETURN
      carriage return
    • TAB

      public static final char TAB
      normal tab
    • SPACE

      public static final char SPACE
      normal space
    • NBSPACE

      public static final char NBSPACE
      non-breaking space
    • NEXT_LINE

      public static final char NEXT_LINE
      next line control character
    • ZERO_WIDTH_SPACE

      public static final char ZERO_WIDTH_SPACE
      zero-width space
    • WORD_JOINER

      public static final char WORD_JOINER
      word joiner
    • ZERO_WIDTH_JOINER

      public static final char ZERO_WIDTH_JOINER
      zero-width joiner
    • LRM

      public static final char LRM
      left-to-right mark
    • RLM

      public static final char RLM
      right-to-left mark
    • LRE

      public static final char LRE
      left-to-right embedding
    • RLE

      public static final char RLE
      right-to-left embedding
    • PDF

      public static final char PDF
      pop directional formatting
    • LRO

      public static final char LRO
      left-to-right override
    • RLO

      public static final char RLO
      right-to-left override
    • ZERO_WIDTH_NOBREAK_SPACE

      public static final char ZERO_WIDTH_NOBREAK_SPACE
      zero-width no-break space (= byte order mark)
    • SOFT_HYPHEN

      public static final char SOFT_HYPHEN
      soft hyphen
    • LINE_SEPARATOR

      public static final char LINE_SEPARATOR
      line-separator
    • PARAGRAPH_SEPARATOR

      public static final char PARAGRAPH_SEPARATOR
      paragraph-separator
    • MISSING_IDEOGRAPH

      public static final char MISSING_IDEOGRAPH
      missing ideograph
    • IDEOGRAPHIC_SPACE

      public static final char IDEOGRAPHIC_SPACE
      Ideographic space
    • OBJECT_REPLACEMENT_CHARACTER

      public static final char OBJECT_REPLACEMENT_CHARACTER
      Object replacement character
    • NOT_A_CHARACTER

      public static final char NOT_A_CHARACTER
      Unicode value indicating that the character is "not a character".
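The page does not show the constants' values. Assuming the bidi fields hold the standard Unicode code points of the characters they are named after, the hypothetical BidiControlSketch class below (not part of FOP) lists them and prints how the JDK classifies each one through Character.getType and Character.getDirectionality.

```java
/**
 * Hypothetical class (not part of FOP) listing the bidi control characters.
 * The values are assumed to be the standard Unicode code points.
 */
class BidiControlSketch {
    static final char LRM = '\u200E'; // left-to-right mark
    static final char RLM = '\u200F'; // right-to-left mark
    static final char LRE = '\u202A'; // left-to-right embedding
    static final char RLE = '\u202B'; // right-to-left embedding
    static final char PDF = '\u202C'; // pop directional formatting
    static final char LRO = '\u202D'; // left-to-right override
    static final char RLO = '\u202E'; // right-to-left override

    static final char[] CONTROLS = {LRM, RLM, LRE, RLE, PDF, LRO, RLO};
    static final String[] NAMES = {"LRM", "RLM", "LRE", "RLE", "PDF", "LRO", "RLO"};

    public static void main(String[] args) {
        for (int i = 0; i < CONTROLS.length; i++) {
            char c = CONTROLS[i];
            // getType: Unicode general category (Character.FORMAT for all seven).
            // getDirectionality: Unicode bidi class as a Character.DIRECTIONALITY_* byte.
            System.out.printf("%s U+%04X type=%d bidi=%d%n",
                    NAMES[i], (int) c, Character.getType(c), Character.getDirectionality(c));
        }
    }
}
```

All seven are format characters (general category Cf): they are invisible and only steer the Unicode bidirectional algorithm.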
  • Constructor Details

    • CharUtilities

      protected CharUtilities()
      Utility class: Constructor prevents instantiating when subclassed.
  • Method Details

    • classOf

      public static int classOf(int c)
      Return the character-class constant (UCWHITESPACE, LINEFEED, EOT, NONWHITESPACE or XMLWHITESPACE) for the type of the passed character.
      Parameters:
      c - character to inspect
      Returns:
      the determined character class
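Here is a minimal sketch of a classOf-style classifier. Everything in it is an assumption made for illustration: the class name CharClassSketch, the numeric values 0 to 4 of the class constants, CODE_EOT being U+0000, and the mapping itself (space, carriage return and tab go to XMLWHITESPACE; other Unicode space separators go to UCWHITESPACE). FOP's own mapping may differ.

```java
/** Hypothetical classifier sketch; constant values and mapping are assumptions. */
class CharClassSketch {
    // Illustrative values; the names mirror the "Character class" fields.
    static final int UCWHITESPACE = 0;
    static final int LINEFEED = 1;
    static final int EOT = 2;
    static final int NONWHITESPACE = 3;
    static final int XMLWHITESPACE = 4;

    static final char CODE_EOT = 0; // assumed boundary marker between text runs

    static int classOf(int c) {
        if (c == CODE_EOT) {
            return EOT;
        }
        if (c == '\n') {
            return LINEFEED;
        }
        if (c == ' ' || c == '\r' || c == '\t') {
            return XMLWHITESPACE; // the remaining XML whitespace characters
        }
        // Any other Unicode space separator (category Zs) counts as Unicode whitespace here.
        if (Character.getType(c) == Character.SPACE_SEPARATOR) {
            return UCWHITESPACE;
        }
        return NONWHITESPACE;
    }
}
```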
    • isBreakableSpace

      public static boolean isBreakableSpace(int c)
      Helper method to determine if the character is a space with normal behavior. Normal behavior means that it's not non-breaking.
      Parameters:
      c - character to inspect
      Returns:
      True if the character is a normal space
    • isZeroWidthSpace

      public static boolean isZeroWidthSpace(int c)
      Method to determine if the character is a zero-width space.
      Parameters:
      c - the character to check
      Returns:
      true if the character is a zero-width space
    • isFixedWidthSpace

      public static boolean isFixedWidthSpace(int c)
      Method to determine if the character is a (breakable) fixed-width space.
      Parameters:
      c - the character to check
      Returns:
      true if the character has a fixed-width
    • isNonBreakableSpace

      public static boolean isNonBreakableSpace(int c)
      Method to determine if the character is a non-breaking space.
      Parameters:
      c - character to check
      Returns:
      true if the character is a non-breaking space
    • isAdjustableSpace

      public static boolean isAdjustableSpace(int c)
      Method to determine if the character is an adjustable space.
      Parameters:
      c - character to check
      Returns:
      True if the character is adjustable
    • isAnySpace

      public static boolean isAnySpace(int c)
      Determines if the character represents any kind of space.
      Parameters:
      c - character to check
      Returns:
      True if the character represents any kind of space
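The six space predicates above overlap in ways that are easier to see in code. The hypothetical SpaceSketch class below is not FOP's implementation. The membership of each set is an assumption chosen to illustrate the categories. For example, it treats U+2007 FIGURE SPACE as non-breaking, as Unicode line breaking does, and it reads "adjustable" as "stretchable during justification".

```java
/**
 * Hypothetical sketch (not FOP's source) of the space predicates.
 * The character sets are illustrative assumptions.
 */
class SpaceSketch {
    static final char SPACE = ' ';
    static final char NBSPACE = '\u00A0';
    static final char FIGURE_SPACE = '\u2007';    // non-breaking by Unicode line-break rules
    static final char ZERO_WIDTH_SPACE = '\u200B';
    static final char NARROW_NBSPACE = '\u202F';
    static final char WORD_JOINER = '\u2060';
    static final char ZERO_WIDTH_NOBREAK_SPACE = '\uFEFF';
    static final char IDEOGRAPHIC_SPACE = '\u3000';

    /** Zero-width: takes no room; ZWSP allows a break, WJ and ZWNBSP forbid one. */
    static boolean isZeroWidthSpace(int c) {
        return c == ZERO_WIDTH_SPACE || c == WORD_JOINER || c == ZERO_WIDTH_NOBREAK_SPACE;
    }

    /** Fixed-width and breakable: U+2000..U+200A except figure space, plus U+3000. */
    static boolean isFixedWidthSpace(int c) {
        return (c >= 0x2000 && c <= 0x200A && c != FIGURE_SPACE) || c == IDEOGRAPHIC_SPACE;
    }

    /** Breakable ("normal behavior"): a space that is not non-breaking. */
    static boolean isBreakableSpace(int c) {
        return c == SPACE || c == ZERO_WIDTH_SPACE || isFixedWidthSpace(c);
    }

    static boolean isNonBreakableSpace(int c) {
        return c == NBSPACE || c == FIGURE_SPACE || c == NARROW_NBSPACE
                || c == WORD_JOINER || c == ZERO_WIDTH_NOBREAK_SPACE;
    }

    /** Adjustable: assumed here to mean "may be stretched or shrunk when justifying". */
    static boolean isAdjustableSpace(int c) {
        return c == SPACE || c == NBSPACE;
    }

    static boolean isAnySpace(int c) {
        return isBreakableSpace(c) || isNonBreakableSpace(c);
    }
}
```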
    • isAlphabetic

      public static boolean isAlphabetic(int c)
      Indicates whether a character is classified as "Alphabetic" by the Unicode standard.
      Parameters:
      c - the character
      Returns:
      true if the character is "Alphabetic"
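The Unicode "Alphabetic" property is broader than "letter". It also covers letter-like numerals such as U+2160 ROMAN NUMERAL ONE. The JDK exposes the property as Character.isAlphabetic(int) (Java 7 and later); this page does not say whether FOP's method delegates to it. The hypothetical AlphabeticDemo class contrasts the property with Character.isLetter.

```java
/** Hypothetical demo: Character.isLetter versus the Unicode Alphabetic property. */
class AlphabeticDemo {
    static String describe(int c) {
        return String.format("U+%04X letter=%b alphabetic=%b",
                c, Character.isLetter(c), Character.isAlphabetic(c));
    }

    public static void main(String[] args) {
        // 'A' (Lu), '7' (Nd), U+4E2D CJK ideograph (Lo), U+2160 ROMAN NUMERAL ONE (Nl)
        for (int c : new int[] {'A', '7', 0x4E2D, 0x2160}) {
            System.out.println(describe(c));
        }
    }
}
```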
    • isExplicitBreak

      public static boolean isExplicitBreak(int c)
      Indicates whether the given character is an explicit break character.
      Parameters:
      c - the character to check
      Returns:
      true if the character represents an explicit break
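Here is a hypothetical sketch. It assumes the explicit break characters are exactly the line-ending constants listed among the fields (LINEFEED_CHAR, CARRIAGE_RETURN, NEXT_LINE, LINE_SEPARATOR and PARAGRAPH_SEPARATOR), each at its standard Unicode code point. The page does not state the actual set.

```java
/** Hypothetical sketch; the set of break characters is an assumption. */
class ExplicitBreakSketch {
    static boolean isExplicitBreak(int c) {
        return c == '\n'        // line feed
            || c == '\r'        // carriage return
            || c == 0x0085      // next line (NEL)
            || c == 0x2028      // line separator
            || c == 0x2029;     // paragraph separator
    }
}
```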
    • charToNCRef

      public static String charToNCRef(int c)
      Convert a single Unicode scalar value to an XML numeric character reference. If the value is in the BMP, four digits are used; otherwise six digits are used.
      Parameters:
      c - a Unicode scalar value
      Returns:
      a string representing a numeric character reference
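Here is a minimal sketch of the conversion. It assumes a hexadecimal reference (&#x...;) with uppercase digits, since the description above only fixes the digit count. NCRefSketch is a hypothetical name.

```java
/** Hypothetical sketch; assumes hexadecimal references with uppercase digits. */
class NCRefSketch {
    static String charToNCRef(int c) {
        // Four hex digits inside the BMP, six outside it, zero-padded on the left.
        return String.format(c > 0xFFFF ? "&#x%06X;" : "&#x%04X;", c);
    }
}
```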
    • toNCRefs

      public static String toNCRefs(String s)
      Convert a string to a sequence of ASCII or XML numeric character references.
      Parameters:
      s - a Java string (encoded in UTF-16)
      Returns:
      a string representing a sequence of numeric character references or ASCII characters
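Here is a sketch of toNCRefs built on the same idea. It assumes printable ASCII (U+0020 to U+007E) passes through unchanged. Every other code point becomes a hexadecimal reference, including a supplementary character stored as a surrogate pair. ToNCRefsSketch is hypothetical.

```java
/** Hypothetical sketch; the set of characters kept as-is is an assumption. */
class ToNCRefsSketch {
    static String toNCRefs(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);          // joins a surrogate pair into one code point
            if (cp >= 0x20 && cp < 0x7F) {
                sb.append((char) cp);           // printable ASCII is copied unchanged
            } else {
                sb.append(String.format(cp > 0xFFFF ? "&#x%06X;" : "&#x%04X;", cp));
            }
            i += Character.charCount(cp);       // advance by 1 or 2 chars
        }
        return sb.toString();
    }
}
```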
    • padLeft

      public static String padLeft(String s, int width, char pad)
      Pad string s on the left, out to the given width, using the padding character pad.
      Parameters:
      s - string to pad
      width - width of the field to pad out to
      pad - character to use for padding
      Returns:
      padded string
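Here is a minimal padLeft sketch. The hypothetical PadSketch class assumes that a string already at least width characters long is returned unchanged, with no truncation.

```java
/** Hypothetical sketch; assumes no truncation when s is already wide enough. */
class PadSketch {
    static String padLeft(String s, int width, char pad) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) {
            sb.append(pad);                 // one pad char per missing position
        }
        return sb.append(s).toString();
    }
}
```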
    • format

      public static String format(int c)
      Format a character for debugging output: the result is prefixed with "0x" and left-padded with '0' to a width of either 4 or 6 hex characters, depending on whether the character is in the BMP.
      Parameters:
      c - character code
      Returns:
      formatted character string
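Here is a sketch of the debugging format that assumes uppercase hex digits. FormatSketch is a hypothetical name.

```java
/** Hypothetical sketch; assumes uppercase hex digits. */
class FormatSketch {
    static String format(int c) {
        // 4 hex digits for BMP characters, 6 for supplementary ones.
        return "0x" + String.format(c > 0xFFFF ? "%06X" : "%04X", c);
    }
}
```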
    • isSameSequence

      public static boolean isSameSequence(CharSequence cs1, CharSequence cs2)
      Determine if two character sequences contain the same characters.
      Parameters:
      cs1 - first character sequence
      cs2 - second character sequence
      Returns:
      true if both sequences have same length and same character sequence
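Comparing character content matters because String.equals returns false for a StringBuilder even when the characters match. The hypothetical SameSequenceSketch class below assumes non-null arguments. In the JDK, String.contentEquals(CharSequence) and, from Java 11, CharSequence.compare(cs1, cs2) == 0 answer the same question.

```java
/** Hypothetical sketch; assumes both arguments are non-null. */
class SameSequenceSketch {
    static boolean isSameSequence(CharSequence cs1, CharSequence cs2) {
        if (cs1.length() != cs2.length()) {
            return false;                   // different lengths can never match
        }
        for (int i = 0; i < cs1.length(); i++) {
            if (cs1.charAt(i) != cs2.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```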
    • isBmpCodePoint

      public static boolean isBmpCodePoint(int codePoint)
      Determine whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP). Such code points can be represented using a single char.
      Parameters:
      codePoint - the character (Unicode code point) to be tested
      Returns:
      true if the specified code point is between Character#MIN_VALUE and Character#MAX_VALUE inclusive; false otherwise
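The BMP is exactly the range of values a single char can hold, so the check is a simple range test. Java 7 and later provide the same check as Character.isBmpCodePoint(int). BmpSketch is a hypothetical stand-in.

```java
/** Hypothetical sketch of the documented range check. */
class BmpSketch {
    static boolean isBmpCodePoint(int codePoint) {
        // Character.MIN_VALUE is U+0000 and Character.MAX_VALUE is U+FFFF.
        return codePoint >= Character.MIN_VALUE && codePoint <= Character.MAX_VALUE;
    }
}
```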
    • incrementIfNonBMP

      public static int incrementIfNonBMP(int codePoint)
      Returns 1 if codePoint is not in the BMP, 0 otherwise. This function is particularly useful in for loops over strings where, in the presence of surrogate pairs, the loop index must skip one extra position.
      Parameters:
      codePoint - the code point to test
      Returns:
      1 if codePoint > 0xFFFF, 0 otherwise
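The loop pattern described above can be sketched as follows, assuming incrementIfNonBMP behaves as documented. The hypothetical countCodePoints method walks a String char by char and uses the increment to step over the second half of each surrogate pair.

```java
/** Hypothetical sketch of the for-loop pattern over UTF-16 strings. */
class CodePointCountSketch {
    static int incrementIfNonBMP(int codePoint) {
        return (codePoint > 0xFFFF) ? 1 : 0;
    }

    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            int cp = s.codePointAt(i);     // whole code point, even if it spans two chars
            count++;
            i += incrementIfNonBMP(cp);    // skip the low surrogate of a pair
        }
        return count;
    }
}
```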
    • isSurrogatePair

      public static boolean isSurrogatePair(char ch)
      Determine if the given character is part of a surrogate pair.
      Parameters:
      ch - character to be checked
      Returns:
      true if ch is a high surrogate or a low surrogate
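Here is a one-line sketch of the documented behavior. The JDK offers the same test as Character.isSurrogate(char) (Java 7 and later). SurrogateSketch is hypothetical.

```java
/** Hypothetical sketch: true for either half of a surrogate pair. */
class SurrogateSketch {
    static boolean isSurrogatePair(char ch) {
        // High surrogates are 0xD800..0xDBFF, low surrogates 0xDC00..0xDFFF.
        return Character.isHighSurrogate(ch) || Character.isLowSurrogate(ch);
    }
}
```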
    • containsSurrogatePairAt

      public static boolean containsSurrogatePairAt(CharSequence chars, int index)
      Tells whether there is a surrogate pair starting at the given index in the CharSequence. If the character at index is a high surrogate, the character at index+1 is checked to be a low surrogate. If a malformed surrogate pair is encountered, an IllegalArgumentException is thrown.
       high surrogate: 0xD800 to 0xDBFF (inclusive)
       low surrogate:  0xDC00 to 0xDFFF (inclusive)
      Parameters:
      chars - CharSequence to check
      index - index in the CharSequence where to start the check
      Returns:
      true if there is a well-formed surrogate pair at index
      Throws:
      IllegalArgumentException - if a malformed surrogate pair is encountered
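Here is a sketch of the documented check. SurrogatePairAtSketch is hypothetical. Throwing for a low surrogate found at index is an assumption, because the description only says that malformed pairs cause an IllegalArgumentException.

```java
/** Hypothetical sketch; the lone-low-surrogate behavior is an assumption. */
class SurrogatePairAtSketch {
    static boolean containsSurrogatePairAt(CharSequence chars, int index) {
        char ch = chars.charAt(index);
        if (Character.isHighSurrogate(ch)) {
            // A high surrogate must be followed by a low surrogate.
            if (index + 1 < chars.length() && Character.isLowSurrogate(chars.charAt(index + 1))) {
                return true;
            }
            throw new IllegalArgumentException("unpaired high surrogate at index " + index);
        }
        if (Character.isLowSurrogate(ch)) {
            // Assumed: a pair cannot start with a low surrogate.
            throw new IllegalArgumentException("unpaired low surrogate at index " + index);
        }
        return false; // an ordinary BMP character
    }
}
```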
    • codepointsIter

      public static Iterable<Integer> codepointsIter(CharSequence s)
      Creates an iterable over the code points of a CharSequence.
      Parameters:
      s - CharSequence to iterate over
      Returns:
      codepoint iterator for the given CharSequence.
    • codepointsIter

      public static Iterable<Integer> codepointsIter(CharSequence s, int beginIndex, int endIndex)
      Creates an iterable over the code points of a sub-range of a CharSequence.
      Parameters:
      s - CharSequence to iterate over
      beginIndex - lower bound of the index range
      endIndex - upper bound of the index range
      Returns:
      codepoint iterator for the given sub-CharSequence.
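Here is a minimal sketch of both overloads. CodepointsIterSketch is hypothetical. It assumes endIndex is exclusive (as in String.substring) and that a surrogate pair is combined only when both halves fall inside the range. An unpaired surrogate is returned as its own value.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch (not FOP's source); assumes endIndex is exclusive. */
class CodepointsIterSketch {
    static Iterable<Integer> codepointsIter(CharSequence s) {
        return codepointsIter(s, 0, s.length());
    }

    static Iterable<Integer> codepointsIter(CharSequence s, int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > s.length() || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException("range [" + beginIndex + ", " + endIndex + ")");
        }
        // Iterable has one abstract method, so a lambda can supply iterator().
        return () -> new Iterator<Integer>() {
            private int next = beginIndex;

            @Override
            public boolean hasNext() {
                return next < endIndex;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                char hi = s.charAt(next);
                // Combine a surrogate pair only if both halves lie inside the range.
                if (Character.isHighSurrogate(hi) && next + 1 < endIndex
                        && Character.isLowSurrogate(s.charAt(next + 1))) {
                    int cp = Character.toCodePoint(hi, s.charAt(next + 1));
                    next += 2;
                    return cp;
                }
                next += 1;
                return (int) hi; // BMP character or unpaired surrogate
            }
        };
    }
}
```

Because each call to iterator() creates a fresh Iterator, the returned Iterable can be traversed more than once, for example in two successive for-each loops.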