GETTING STARTED Summary: 0. Terminology 1. Installing Bogofilter 2. Preparing for use a. Configuring bogofilter b. Training bogofilter 3. Setting up the mail transfer and delivery agents 4. Use with mail user agent 5. Ongoing training 6. Tuning bogofilter 7. The bogoutil program 8. Other useful commands 9. Additional information 0. Terminology -------------- spam - unwanted email ham - wanted mail (also called non-spam) false positive - a ham message that is wrongly scored as spam false negative - a spam message that is wrongly scored as ham 1. Installing Bogofilter ------------------------ Bogofilter can be installed from source or from a binary package. Releases are made available on SourceForge.net. If you're a newbie, installing from a binary package is quickest and easiest. If you're running an rpm based distro like Fedora or OpenSUSE, install bogofilter from an rpm. Similarly if you're running Debian, Mint, or Ubuntu, install from a deb package. Once downloaded and untarred, build and install with the usual commands, i.e. "configure", "make", and "make install". To ensure that the newly built bogofilter is running properly on your hardware and operating system, use "make check" to run a series of tests. For source rpms, use "rpm -bb bogofilter.spec" and "rpm -ivh bogofilter" (or comparable commands). Binary formats include builds for dynamically linked (shared) libaries, e.g. bogofilter-VER.x64_64.rpm. See the INSTALL file for more info. 2. Preparing for use -------------------- Once bogofilter has been installed, it needs to be configured and trained, i.e. given messages that you classify as spam and ham. 2a. Configuring bogofilter -------------------------- Bogofilter's default configuration is conservative, i.e. only messages that score very high on the ham/spam scale are classified as spam. This is done to minimize the number of false positives (non-spam messages which are classified as spam). If you need (or wish) to change bogofilter's configuration options, the file is named "bogofilter.cf" and bogofilter first checks for /etc/bogofilter.cf and then for ~/.bogofilter/bogofilter.cf. The configuration options are described in file bogofilter.cf.example. 2b. Training bogofilter ----------------------- Bogofilter uses a database for storing its tokens and their ham and spam counts. The file is commonly called "the wordlist" and its standard location is ~/.bogofilter/wordlist.db. The simple rule when training bogofilter is "more is better". As distributed, bogofilter does not include a wordlist. You, the user, need to tell bogofilter what you consider spam and what you consider ham. This is bogofilter's training process and involves running bogofilter with appropriate flags and with messages you've determined are ham and spam. As bogofilter can work with multiple mail formats, e.g. mailboxes, maildirs, MH directories, etc, the training commands will depend on your environment. As the default wordlist directory is $HOME/.bogofilter, the wordlist itself will be in $HOME/.bogofilter/wordlist.db. For user john, this is /home/john/.bogofilter/wordlist.db. Some useful options for training include: -s - register message(s) as spam. -n - register message(s) as non-spam. -M - use mailbox mode, i.e. classify multiple messages in an mbox formatted file. -B file1, file2, ... - set bulk mode, i.e. process multiple messages (files or directories) named on the command line. -v - sets the verbosity level, with the -s and -n training options, this will give the number of messages read and words entered in wordlist.db These options are documented in the bogofilter man page. Here are some sample commands: bogofilter -vn < ham.message.file bogofilter -vnM <ham.message.mbox bogofilter -vnMB ham.maildir bogofilter -vs < spam.message.file bogofilter -vsM <spam.message.mbox bogofilter -vsMB spam.maildir 3. Setting up the mail transfer and delivery agents --------------------------------------------------- Bogofilter works with many mail transfer agents (such as postfix, sendmail, and qmail) and many mail delivery agents (for example procmail and maildrop). Each of these has its own configuration file and methods for invoking spam filters. Bogofilter's documentation includes files "integrating-with-postfix" and "integrating-with-qmail". Read them for ideas on how to set up bogofilter for your environment. The most common setup uses bogofilter's "-p" (passthrough) option which adds an "X-Bogosity:" line as the end of the message's mail header. Typical examples of this line are: (for spam) X-Bogosity: Spam, tests=bogofilter, spamicity=1.000000, version=0.92.8 X-Bogosity: Spam, tests=bogofilter, spamicity=0.999765, version=0.92.8 (for non-spam) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=0.92.8 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000413, version=0.92.8 X-Bogosity: Ham, tests=bogofilter, spamicity=0.373476, version=0.92.8 (for "unsures") X-Bogosity: Unsure, tests=bogofilter, spamicity=0.500332, version=0.92.8 X-Bogosity: Unsure, tests=bogofilter, spamicity=0.463498, version=0.92.8 X-Bogosity: Unsure, tests=bogofilter, spamicity=0.640426, version=0.92.8 X-Bogosity: Unsure, tests=bogofilter, spamicity=0.824933, version=0.92.8 Alternatively, bogofilter's return codes can be used by procmail (or maildrop) rules to put spam in one mailbox and ham in another. 4. Use with mail user agent --------------------------- Bogofilter is compatible with all mail user agents. MUAs with filtering abilities can check the headers for "X-Bogosity: Spam" and "X-Bogosity: Ham" and take the appropriate actions for spam and ham. Alternatively, if your MUA has sufficient scripting capabilities, the MUA can run bogofilter and take the appropriate action. As time goes by and bogofilter encounters messages that it can not classify with certainty, there will be messages classified as "Unsure". As these messages are in the "gray" area, meaning "not clearly ham and not clearly spam" it's useful to have your MUA filter these messages to a separate folder (or mailbox) so you can use them to train bogofilter. 5. Ongoing training ------------------- Bogofilter can only do a good job if it has accurate and comprehensive information in its wordlist. As time goes by and bogofilter classifies messages for you, it will encounter problems because it does not have enough information to correctly classify each and every message. It's important to check message classifications! "False negatives", i.e. spam classified as ham, are easy since they'll appear in your inbox and be noticed. "False positives" are important to find because they're messages you want! All messages in these groups should be used to train bogofilter. Filtering "Unsure" messages into a separate folder (or mailbox), and manually classifying and separating them into spam and ham, gives a good set of messages for training (using bogofilter's "-s" and "-n" flags). Bogofilter's FAQ has two entries that provide additional info: How do I start my bogofilter training?" What are "training on error" and "training to exhaustion"? The FAQ can be online in English and French at: http://bogofilter.sourceforge.net/bogofilter-faq.html http://bogofilter.sourceforge.net/bogofilter-faq-fr.html 6. Tuning bogofilter -------------------- Once you've use bogofilter for a while, you may wish to optimize its classification parameters. The bogotune utility uses your wordlist and additional ham and spam messages to check a large variety of possible parameter values and find what'll work best for your environment. For more info, read the bogotune man page and file bogofilter-tuning.HOWTO.html. 7. The bogoutil program ----------------------- Bogoutil is a program that allows dumping the wordlist (as a text file), loading the wordlist (from a text file), displaying information about individual words, etc. Here are some sample uses of it: To display the wordlist contents: bogoutil -d ~/.bogofilter/wordlist.db To display the message counts for a word: bogoutil -w ~/.bogofilter .MSG_COUNT 8. Other useful commands ------------------------ To test scoring of individual words: echo show these words | bogofilter -H -vvv or: bogoutil -p ~/.bogofilter show these words To see the tokens and their spamicity scores for a message: bogofilter -vvv < message 9. Additional information ------------------------- Bogofilter's distribution includes a number of files containing more information. You'll find them in /usr/share/doc (or comparable location). The following files are included: FAQs: English - bogofilter-faq.html French - bogofilter-faq-fr.html General: INSTALL NEWS README RELEASE.NOTES Man pages: bogofilter bogolexer bogoutil bogotune bogoupgrade (also distributed in html and xml formats) HOWTOS: bogofilter-tuning.HOWTO.html integrating-with-postfix integrating-with-qmail Operating System specific README files: README.freebsd README.hp-ux README.RISC-OS
Generated by dwww version 1.15 on Sun May 19 03:22:44 CEST 2024.