\iffalse
# The `Lua-UCA` package
\fi

This package adds support for the [Unicode Collation Algorithm](https://unicode.org/reports/tr10/) (UCA) to Lua 5.3.


## Usage

To sort a table using Czech collation rules:

   
    kpse.set_program_name "luatex"
    local ducet = require "lua-uca.lua-uca-ducet"
    local collator = require "lua-uca.lua-uca-collator"
    local languages = require "lua-uca.lua-uca-languages"
    
    local collator_obj = collator.new(ducet)
    -- load Czech rules
    collator_obj = languages.cs(collator_obj)
    
    local t = {"cihla",  "chochol", "hudba", "jasan", "čáp"}
    
    table.sort(t, function(a,b) 
      return collator_obj:compare_strings(a,b) 
    end)
    
    for _, v in ipairs(t) do
      print(v)
    end

The output:

> cihla
> čáp
> hudba
> chochol
> jasan
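
Real data is often a list of records rather than plain strings, for example index entries with a term and a page number. The following is a minimal sketch of that case, not part of the library: `sort_by_field` and the sample `entries` table are hypothetical names introduced here, and only `collator.new`, `languages.cs` and `compare_strings` come from the example above. The library part is assumed to run under `texlua`, so it is guarded by a check for the `kpse` table.

```lua
-- Sort a list of records in place by the string stored in `field`.
-- `collator_obj` can be any object with a compare_strings(a, b) method,
-- such as a Lua-UCA collator object. (Hypothetical helper.)
local function sort_by_field(collator_obj, records, field)
  table.sort(records, function(a, b)
    return collator_obj:compare_strings(a[field], b[field])
  end)
  return records
end

-- The kpse table exists only in LuaTeX/texlua, so this part is skipped
-- when the file is run with a plain Lua interpreter.
if kpse then
  kpse.set_program_name "luatex"
  local ducet = require "lua-uca.lua-uca-ducet"
  local collator = require "lua-uca.lua-uca-collator"
  local languages = require "lua-uca.lua-uca-languages"

  -- collator with Czech rules, built as in the example above
  local collator_obj = languages.cs(collator.new(ducet))

  -- made-up sample data
  local entries = {
    {term = "hudba",   page = 12},
    {term = "čáp",     page = 3},
    {term = "chochol", page = 7},
  }

  for _, e in ipairs(sort_by_field(collator_obj, entries, "term")) do
    print(e.term, e.page)
  end
end
```

Keeping the comparison in a small helper means the same collator object can be reused for several lists without repeating the `table.sort` callback.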

More examples of library usage can be found in the source repository of this package on [GitHub](https://github.com/michal-h21/lua-uca).
% See the `HACKING.md` file in the repo for more information.

## Use with Xindex processor

[Xindex](https://www.ctan.org/pkg/xindex) is a flexible index processor written
in Lua by Herbert Voß. It has had built-in `Lua-UCA` support since version
`0.23`. The support can be enabled with the `-u` option:

     xindex -u -l no -c norsk filename.idx


## Change sorting rules

The simplest way to change the default sorting order is to use the
`tailor_string` method of the `collator_obj` object. It updates the collator
object using a special syntax that is a subset of the format used by the
[Unicode Locale Data Markup
Language](https://www.unicode.org/reports/tr35/tr35-collation.html#Orderings).

    collator_obj:tailor_string "&a<b"

Full example with Czech rules:

    kpse.set_program_name "luatex"
    local ducet = require "lua-uca.lua-uca-ducet"
    local collator = require "lua-uca.lua-uca-collator"
    local languages = require "lua-uca.lua-uca-languages"
    
    local collator_obj = collator.new(ducet)
    local tailoring = function(s) collator_obj:tailor_string(s) end

    tailoring "&c<č<<<Č"
    tailoring "&h<ch<<<cH<<<Ch<<<CH"
    tailoring "&R<ř<<<Ř"
    tailoring "&s<š<<<Š"
    tailoring "&z<ž<<<Ž"

Note that the letter sequences `ch`, `Ch`, `cH` and `CH` are sorted as a unit after `h`.

It is also possible to expand a letter to multiple letters, as in this example for DIN 2:


    tailoring "&Ö=Oe"
    tailoring "&ö=oe"
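
When a language needs several rules, it can be convenient to keep them in a table and apply them in one pass. This sketch assumes nothing beyond the `tailor_string` and `compare_strings` methods shown above. The helper `apply_tailorings`, the table `din2_rules` and the word list are hypothetical names chosen for illustration, and the rule strings are the two from the text.

```lua
-- Apply each rule string in `rules` to the collator, in order.
-- (Hypothetical helper; it only calls tailor_string.)
local function apply_tailorings(collator_obj, rules)
  for _, rule in ipairs(rules) do
    collator_obj:tailor_string(rule)
  end
  return collator_obj
end

-- The two expansion rules from the text above.
local din2_rules = {
  "&Ö=Oe",
  "&ö=oe",
}

-- The kpse table exists only in LuaTeX/texlua, so this part is skipped
-- when the file is run with a plain Lua interpreter.
if kpse then
  kpse.set_program_name "luatex"
  local ducet = require "lua-uca.lua-uca-ducet"
  local collator = require "lua-uca.lua-uca-collator"

  local collator_obj = apply_tailorings(collator.new(ducet), din2_rules)

  -- made-up sample words
  local words = {"Ofen", "Öl", "Oder", "Österreich"}
  table.sort(words, function(a, b)
    return collator_obj:compare_strings(a, b)
  end)

  for _, v in ipairs(words) do
    print(v)
  end
end
```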

Some languages, like Norwegian, sort uppercase letters before lowercase. This
can be enabled using the `collator_obj:uppercase_first()` method:

    local tailoring = function(s) collator_obj:tailor_string(s) end
    collator_obj:uppercase_first()
    tailoring("&D<<đ<<<Đ<<ð<<<Ð")
    tailoring("&th<<<þ")
    tailoring("&TH<<<Þ")
    tailoring("&Y<<ü<<<Ü<<ű<<<Ű")
    tailoring("&ǀ<æ<<<Æ<<ä<<<Ä<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<å<<<Å<<<aa<<<Aa<<<AA")
    tailoring("&oe<<œ<<<Œ")

% More information on adding support for a new language is in the `HACKING.md`
% document in the [`Lua-UCA` GitHub repo](https://github.com/michal-h21/lua-uca/blob/master/HACKING.md).

### Script reordering

Many languages sort other scripts after the script that the language itself
uses. Because Latin-based scripts are sorted first by default, the scripts
need to be reordered in such cases.

The `collator_obj:reorder` method takes a table of scripts that need to be reordered.
For example, Cyrillic can be sorted before Latin using:

    collator_obj:reorder {"cyrillic"}

In German or Czech, numbers should be sorted after all other characters. This can be done using:

    collator_obj:reorder {"others", "digits"}

The special keyword `"others"` means that the scripts that follow it in the table
will be sorted at the very end.
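
The digits example can be put into a complete script to check the effect on real data. This is a sketch under the same assumptions as the earlier examples: it is meant to run under `texlua`, the helper `digits_last` and the sample list are hypothetical, and the only library calls are `collator.new`, `reorder` and `compare_strings`.

```lua
-- Move digits behind all other characters, using the reorder table
-- from the text above. (Hypothetical helper.)
local function digits_last(collator_obj)
  collator_obj:reorder {"others", "digits"}
  return collator_obj
end

-- The kpse table exists only in LuaTeX/texlua, so this part is skipped
-- when the file is run with a plain Lua interpreter.
if kpse then
  kpse.set_program_name "luatex"
  local ducet = require "lua-uca.lua-uca-ducet"
  local collator = require "lua-uca.lua-uca-collator"

  local collator_obj = digits_last(collator.new(ducet))

  -- made-up sample data mixing digits and letters
  local t = {"2 cm", "abeceda", "10 cm", "zebra"}
  table.sort(t, function(a, b)
    return collator_obj:compare_strings(a, b)
  end)

  for _, v in ipairs(t) do
    print(v)
  end
end
```

Commenting out the `digits_last` call and comparing the two outputs is a quick way to see what the reordering changes.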


# What is missing

- Algorithm for setting implicit sort weights of characters that are not explicitly listed in DUCET.
- Special handling of CJK scripts.

\iffalse
# Copyright

Michal Hoftich, 2021. See LICENSE file for more details.


\fi