Quickstart
==========

This is a very brief introduction to the use of PyStemmer.

First, import the library:

>>> import Stemmer

Just for show, we'll display a list of the available stemming algorithms:

>>> print(Stemmer.algorithms())
['arabic', 'armenian', 'basque', 'catalan', 'danish', 'dutch', 'english', 'finnish', 'french', 'german', 'greek', 'hindi', 'hungarian', 'indonesian', 'irish', 'italian', 'lithuanian', 'nepali', 'norwegian', 'porter', 'portuguese', 'romanian', 'russian', 'serbian', 'spanish', 'swedish', 'tamil', 'turkish', 'yiddish']

Now, we'll get an instance of the English stemming algorithm:

>>> stemmer = Stemmer.Stemmer('english')

Stem a single word:

>>> print(stemmer.stemWord('cycling'))
cycl

Stem a list of words:

>>> print(stemmer.stemWords(['cycling', 'cyclist']))
['cycl', 'cyclist']

Strings supplied as input are assumed to be Unicode.
UTF-8 encoded byte strings are accepted too, and are returned as bytes:

>>> print(stemmer.stemWords(['cycling', b'cyclist']))
['cycl', b'cyclist']

Each instance of the stemming algorithms uses a cache to speed up processing of
common words.  By default, the cache holds 10000 words, but this may be
modified.  The cache may be disabled entirely by setting the cache size to 0:

>>> print(stemmer.maxCacheSize)
10000

>>> stemmer.maxCacheSize = 1000

>>> print(stemmer.maxCacheSize)
1000

Generally you should create a stemmer object and reuse it rather than creating
a fresh object for each word stemmed, since there's some cost to creating and
destroying the object.  Reusing the object is also needed to benefit from the
caching.
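
As a rough sketch of the reuse pattern described above, the helper below takes
an already constructed stemmer and applies it to several batches of words.
The name `stem_batches` is hypothetical and not part of PyStemmer; it relies
only on the `stemWords` method shown earlier.

```python
def stem_batches(stemmer, batches):
    """Stem every batch of words with one shared stemmer object.

    `stemmer` is any object with a stemWords(list) method, for example
    Stemmer.Stemmer('english').  This helper is illustrative only and is
    not part of PyStemmer.
    """
    # One stemmer object serves all batches, so the construction cost is
    # paid once and the object's cache is shared across batches.
    return [stemmer.stemWords(list(batch)) for batch in batches]
```

It could be called as `stem_batches(Stemmer.Stemmer('english'), batches)`,
where `batches` is any iterable of word lists.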

The stemmer code is re-entrant, but a single stemmer object is not thread-safe:
it must not be used concurrently from different threads.

If you want to perform stemming concurrently in different threads, we suggest
creating a new stemmer object for each thread.  The alternative is to share
stemmer objects between threads and protect access using a mutex or similar
(e.g. `threading.Lock` in Python) but that's liable to slow your program down
as threads can end up waiting for the lock.
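
One possible way to follow the one-stemmer-per-thread suggestion is sketched
below, using `threading.local` from the standard library so that each thread
lazily creates its own stemmer.  The names `english_stemmer`,
`PerThreadStemmer` and `stem_in_threads` are assumptions made for this
example and are not part of PyStemmer.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def english_stemmer():
    # Imported lazily; calling this requires PyStemmer to be installed.
    import Stemmer
    return Stemmer.Stemmer('english')


class PerThreadStemmer:
    """Hand out one stemmer object per thread (illustrative helper)."""

    def __init__(self, factory=english_stemmer):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        # Attributes on a threading.local object are private to each thread,
        # so a stemmer created here is never shared with another thread.
        stemmer = getattr(self._local, 'stemmer', None)
        if stemmer is None:
            stemmer = self._local.stemmer = self._factory()
        return stemmer


def stem_in_threads(batches, stemmers, workers=4):
    """Stem each batch of words in a worker thread."""
    def work(batch):
        return stemmers.get().stemWords(list(batch))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input batches.
        return list(pool.map(work, batches))
```

A call such as `stem_in_threads(batches, PerThreadStemmer())` needs no lock,
because no stemmer object is ever used by more than one thread.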