PYTHEAS Software & Services PYTHEAS MailGate - POP3 Connector for Microsoft Exchange and Lotus Domino

Spam Detection Using SpamAssassin with PYTHEAS MailGate

News (June 16th, 2008)

This page now includes instructions on how to install SpamAssassin release 3.2.5. Upgrade instructions are in the Upgrading SpamAssassin section below.

What is SpamAssassin?

SpamAssassin (tm) is an open source product that performs heuristic spam analysis and RBL (Realtime Blackhole List) lookups among other tests, to clearly tag spam mail as such. PYTHEAS MailGate can then be instructed to handle spam mail in a particular way.

SpamAssassin (tm) is open source software, licensed under the Apache Software License (which you can find at http://www.apache.org/foundation/licence-FAQ.html). No guarantees or warranties apply to the software. You use it entirely at your own risk.

Neither SpamAssassin nor the software components it requires are installed by the PYTHEAS MailGate setup program. Please note that you need a PYTHEAS MailGate license key which activates the Content-Checking Rules engine; see the About tab of the Configuration Program to learn about the options activated by your license key.

In its default form, SpamAssassin is designed and written for Unix platforms. This document outlines how to get SpamAssassin working on a Win32 platform such as Windows 200x/XP. Although the procedure may seem a little cumbersome at first glance, we are sure you will find it worth the trouble: SpamAssassin is remarkably effective.

Upgrading SpamAssassin

If you are doing a fresh install, you can skip this section.

Upgrading a SpamAssassin v. 3.x Installation

For the duration of the upgrade, you should stop the Pytheas.MailGate service (or the Communication Task). To upgrade to a newer version of SpamAssassin:

  • If you upgrade from SpamAssassin 2.x to 3.x, be sure to read the additional instructions for upgrading from SpamAssassin 2.x (at the end of this page) first.
  • Uninstall ActivePerl. Then delete the whole c:\perl subtree. Be sure not to delete the c:\etc\mail\spamassassin folder. You may also want to move the NMAKE utility from C:\perl\bin to a safe place before deleting the subtree.
  • Be sure to get the new SpamAssassin support files. The sa.cmd file required for SpamAssassin v.3.2.5 is different from the one included in the package for the previous version of SpamAssassin. Please copy DOS2UNIX.EXE and UNIX2DOS.EXE to the folder where PYTHEAS MailGate has been installed.
  • Your configuration file pmg-local.cf may contain options which are no longer supported in the new version. Carefully read the beginning of spamdebug.txt when checking your new SpamAssassin installation later.
  • Proceed the same way as you would for a fresh installation, starting with the Installing Perl section.

Installing Perl

You should install Perl on Windows 200x/XP platforms only. It appears to be possible to run it on Microsoft Windows 95/98/ME, but Perl is said to be unreliable on those platforms.

Go to http://www.activestate.com, go to the ActivePerl download page, and get ActivePerl 5.8.x. Choose the Windows MSI installer version.
Please note: even on 64-bit systems, you should install the Windows (x86) package.

  • Double-click the MSI file to run it. Keep the features Perl and PPM selected. You may deselect the features Perl ISAPI, PerlEx, PerlScript, Documentation and Examples.
  • Open a Command-Line window and type PERL -V to check that everything is fine.
  • In subsequent sections, it will be assumed that Perl has been installed in C:\PERL. Make appropriate changes if necessary.
  • Configure access to public DNS (this is not yet needed here, but would require another reboot if we do it later): The following environment variables need to be defined at system level:
    RES_NAMESERVERS = ipaddress
    LANG = en_US
    ipaddress represents the IP address of your ISP's DNS server or your own DNS server, provided it is linked to the public DNS. To add more than one, separate the addresses with a space character. Add these to the global environment variables of your operating system which can be defined in Control Panel / System, on the Advanced tab.
  • Reboot the computer. If Perl had already been installed on your computer and the environment variables had already been defined (for example, during an upgrade), you may skip the reboot.
  • After rebooting, open a command line window, and type PATH to make sure that C:\PERL\BIN is now part of your PATH environment variable.
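
The checks described in the steps above can also be scripted. The following is a minimal sketch, not part of the PYTHEAS MailGate or SpamAssassin distributions; the function name check_perl_environment is hypothetical, and the script only inspects the variables named in this section (RES_NAMESERVERS, LANG and PATH).

```python
import ntpath


def check_perl_environment(env, perl_bin=r"C:\PERL\BIN"):
    """Return a list of problems found in env; an empty list means all checks passed.

    env is any mapping of environment variable names to values (for example os.environ).
    perl_bin is the folder assumed in this document; change it if Perl lives elsewhere.
    """
    problems = []

    # RES_NAMESERVERS holds one or more DNS server addresses separated by spaces.
    if not env.get("RES_NAMESERVERS", "").split():
        problems.append("RES_NAMESERVERS is not set")

    if env.get("LANG") != "en_US":
        problems.append("LANG is not set to en_US")

    # Windows paths are case-insensitive, so compare normalized forms.
    wanted = ntpath.normcase(ntpath.normpath(perl_bin))
    entries = [
        ntpath.normcase(ntpath.normpath(entry))
        for entry in env.get("PATH", "").split(";")
        if entry.strip()
    ]
    if wanted not in entries:
        problems.append(perl_bin + " is not part of PATH")

    return problems
```

Calling check_perl_environment(os.environ) after the reboot (with import os) and printing the returned list gives a quick overview of anything still missing.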

Installing NMAKE

Installing the Necessary Perl Modules

Perl uses modules to extend the language's capabilities. Many of them are included with the core distribution, but many others are available. SpamAssassin requires several modules which are not in the core distribution of ActivePerl.

Install the DB_File, IP-Country, Mail-SPF-Query and Win32-Registry-File modules

  • Open a command line window.
  • Type: PPM-Shell
    note 1: PPM connects to the repository via TCP Port 80 so you should be connected to the Internet and keep this port open.
    note 2: if PPM finds several repositories from where a module can be installed, it will show a list of these instead of installing the requested module. In this case, type INSTALL n to install the module from repository number n.
  • At the PPM> prompt, type: install DB_File
    Expected response:
    Downloading DB_File-1.816
    (...)
    Updating files in site area...done
    7 files installed
  • Still at the PPM> prompt, type: install IP-Country
    Expected response:
    Downloading IP-Country-2.23
    (...)
    Updating files in site area...done
    19 files installed
  • Still at the PPM> prompt, type: install Mail-SPF-Query
    Expected response:
    Downloading Mail-SPF-Query-1.999.1
    Downloading Net-DNS-0.63
    Downloading Net-CIDR-Lite-0.20
    Downloading Sys-Hostname-Long-1.4
    Downloading Net-IP-1.25
    (...)
    Updating files in site area...done
    121 files installed
  • Still at the PPM> prompt, type: install Win32-Registry-File
    Expected response:
    Downloading Win32-Registry-File-1.10
    Downloading Tie-IxHash-1.21
    (...)
    Updating files in site area...done
    5 files installed
  • Type: quit

Obtaining and Installing SpamAssassin

  • Be sure to have PYTHEAS MailGate v. 2.32a (or a newer version). Upgrade if necessary.
  • Go to http://spamassassin.apache.org/downloads.html, and download the ZIP file distribution. Extract the ZIP file to the root of drive C:. For SpamAssassin version 3.2.5, for example, this will create C:\Mail-SpamAssassin-3.2.5 or C:\Mail-SpamAssassin-3.2.5\Mail-SpamAssassin-3.2.5, depending on how you proceed. We'll refer to this folder as the SPAMSOURCE folder in subsequent sections.
  • Open a command-line window, go to the SPAMSOURCE folder and type:
    PERL MAKEFILE.PL
    You will be asked a couple of questions. Be sure to answer No to the first one, which is not the default response:
    First question: Build spamc.exe (...)?
    Answer: N
    Next question: What email address or URL should be used (...)
    Answer: give a meaningful answer for your site.
    You may safely ignore the warnings about optional missing modules:
    (...)
    optional module missing: Razor2
    optional module missing: Net::Ident
    optional module missing: IO::Socket::INET6
    optional module missing: IO::Socket::SSL
    (...)
  • Still in the SPAMSOURCE folder, type:
    NMAKE
    NMAKE INSTALL
  • Make a backup copy of c:\perl\site\etc\mail\spamassassin\v310.pre (name it v310.backup, for example; in any case, don't give it the .pre extension). Open the file c:\perl\site\etc\mail\spamassassin\v310.pre in a text editor (Wordpad.exe will handle the line endings better than Notepad.exe).
    At the beginning of the lines
    loadplugin Mail::SpamAssassin::Plugin::Pyzor
    loadplugin Mail::SpamAssassin::Plugin::Razor2
    add the character # to transform them into a comment and avoid loading the plug-ins.
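
If you prefer not to edit v310.pre by hand, the same change can be sketched in a few lines of Python. This is an illustration only; the function name disable_plugins is hypothetical, and it assumes the two lines appear in the file as shown above.

```python
def disable_plugins(pre_text, plugins=("Pyzor", "Razor2")):
    """Return pre_text with the loadplugin lines for the given plug-ins commented out."""
    targets = {"Mail::SpamAssassin::Plugin::" + name for name in plugins}
    result = []
    # keepends=True preserves the original line endings (LF or CRLF).
    for line in pre_text.splitlines(keepends=True):
        words = line.split()
        # Only touch active lines of the form "loadplugin <module>";
        # lines that already start with '#' do not match.
        if len(words) >= 2 and words[0] == "loadplugin" and words[1] in targets:
            line = "#" + line
        result.append(line)
    return "".join(result)
```

To apply it, read the file with open(path, newline="") so that line endings are kept unchanged, pass the text through disable_plugins, and write the result back the same way. Keep the backup copy described above in any case.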

Configure Access to Public DNS

DNS access is needed for all RBL lookups. We already set the required environment variables:
SET RES_NAMESERVERS=ipaddress
SET LANG=en_US

Testing Your SpamAssassin Installation

Rename the SPAMSOURCE\rules subfolder (call it rules-orig, for example).

From a command line window, in the SPAMSOURCE folder, type:
c:\perl\site\bin\spamassassin -D < sample-nonspam.txt 2>spamdebug.txt

This command should run without errors. The command line window shows the message as it looks after passing through SpamAssassin. The output should indicate that this sample message is not spam: look at the X-Spam-... lines added by SpamAssassin in the header part of the message.

Please note: the file spamassassin.bat may be created in the c:\perl\bin folder instead of the c:\perl\site\bin folder. In this case, please adjust the commands suggested in the subsequent sections.

Have a look at spamdebug.txt which has been created by this run. Check for DNS resolution. In the Received header parsing part of it, you should see:
dbg: dns: is Net::DNS::Resolver available? yes
dbg: dns: Net::DNS version: (...)
dbg: dns: trying (3) w3.org...
dbg: dns: looking up NS for 'w3.org'
dbg: dns: NS lookup of w3.org using (...) succeeded => DNS available (set dns_available to override)

If there is trouble with DNS resolution, verify that you properly configured access to public DNS. If you are unsure about a DNS server, you can check it with NSLOOKUP (use its server command to connect to the DNS server in question).

At the end of the file, check for the results:
dbg: check: is spam? score=0 required=5
dbg: check: tests=
dbg: check: subtests=__CT,__CTYPE_CHARSET_QUOTED, __CT_TEXT_PLAIN, __DOS_BODY_STOCK, __DOS_BODY_SUN, __DOS_HAS_ANY_URI, __DOS_LINK, __DOS_RCVD_FRI, __FB_PICK, __FB_S_STOCK, __FM_STOCK_WORDS, __HAS_ANY_EMAIL, __HAS_ANY_URI, __HAS_MSGID, __HAS_RCVD, __HAS_SUBJECT, __LAST_UNTRUSTED_RELAY_NO_AUTH, __MIME_VERSION, __MISSING_REF, __MSOE_MID_WRONG_CASE, __NAKED_TO, __NONEMPTY_BODY, __RCVD_IN_SORBS, __RCVD_IN_ZEN, __SANE_MSGID, __TOCC_EXISTS, __YOUR_ACCOUNT

Now let's check if a message is correctly identified as spam. From the SPAMSOURCE folder, type:
c:\perl\site\bin\spamassassin -D < sample-spam.txt 2>spamdebug.txt

The output in the command line window should indicate that this sample message is spam (look at the X-Spam-... lines added by SpamAssassin in the header part of the message, and the body of the message which has been modified by SpamAssassin).

Have a look at spamdebug.txt. At the end of the file, check for the results:
dbg: check: is spam? score=999.998 required=5
dbg: check: tests=GTUBE,NO_RECEIVED,NO_RELAYS
dbg: check: subtests=__CT,__CTE,__CT_TEXT_PLAIN,__HAS_MSGID,__HAS_SUBJECT, __MIME_VERSION, __MISSING_REF, __MSGID_OK_HOST, __NONEMPTY_BODY, __SANE_MSGID, __TOCC_EXISTS, __UNUSABLE_MSGID
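
Instead of scrolling to the end of spamdebug.txt, the result lines can be extracted with a short script. The sketch below assumes the debug lines have the form shown above; the function name read_verdict is hypothetical.

```python
import re

# Patterns for the two result lines shown above. The "subtests=" line does not
# match _TESTS, because "tests=" must directly follow "dbg: check: ".
_RESULT = re.compile(
    r"dbg: check: is spam\? score=(-?\d+(?:\.\d+)?) required=(-?\d+(?:\.\d+)?)"
)
_TESTS = re.compile(r"dbg: check: tests=(.*)")


def read_verdict(debug_text):
    """Return (score, required, tests) from the debug output, or None if no result line exists."""
    score = required = None
    tests = []
    for line in debug_text.splitlines():
        match = _RESULT.search(line)
        if match:
            score, required = float(match.group(1)), float(match.group(2))
            continue
        match = _TESTS.search(line)
        if match:
            tests = [name.strip() for name in match.group(1).split(",") if name.strip()]
    if score is None:
        return None
    return score, required, tests
```

For the two sample messages, the first run should report a score below the required value with an empty test list, and the second a score far above it with GTUBE among the tests.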

The Online Documentation

You can access the documentation at http://spamassassin.apache.org/full/3.1.x/dist/doc/. The most important file to read is Mail::SpamAssassin::Conf, which outlines all major configuration parameters.

Connect SpamAssassin and PYTHEAS MailGate

Download and unzip the SpamAssassin support files. If you do not have a pmg-local.cf file, copy this file from the pack to C:\etc\mail\spamassassin. Create this folder if it does not exist. Use this file to configure the way SpamAssassin should work for your site. You should not edit global configuration files in C:\perl\site\share\spamassassin as your settings could be lost during the next upgrade. Of course, it is a good idea to look at the global configuration files to know what parameters can be changed.

You may also want to check out the configuration tool available at http://www.openhandhome.com/saconf.html.

Copy the files sa.cmd, DOS2UNIX.EXE and UNIX2DOS.EXE to the C:\Program Files\PytheasMailgate folder. The downloadable version of sa.cmd assumes that Perl has been installed in the C:\perl folder. Please note that DOS2UNIX.EXE and UNIX2DOS.EXE are not really needed for the current version of SpamAssassin, but they may be useful for future versions. Here are some comments about the contents of sa.cmd:

  • -D: Instructs SpamAssassin to produce diagnostic output, which PYTHEAS MailGate may optionally insert into its Session Log message. You may change this option to obtain different diagnostic output. You can also omit this parameter altogether if you do not need the diagnostic output.
  • -e: Instructs SpamAssassin to set the exit code depending on the spam status. PYTHEAS MailGate uses this exit code to pick up the spam status.
  • -p ...: Instructs SpamAssassin to use the pmg-local.cf file, regardless of the user context in which it is running.
  • %1, %2, %3, %4: PYTHEAS MailGate always calls sa.cmd with 4 parameters, which are described in the following items.
  • %1: Path name of the file containing the message to be checked.
  • %2: Path name of the file to contain the checked message (this is always Temp_folder\PmgSaChk.tmp).
  • %3: Path name of the file to contain the diagnostic output produced by SpamAssassin (this is always Temp_folder\PmgSpamA.log).
  • %4: Determined by the POP3 account configuration in PYTHEAS MailGate. Note: the downloadable version of sa.cmd includes code to handle the value NoSpamCheck for this parameter, which does what its name suggests: if you add Spam-A:NoSpamCheck to the Comment of a POP3 account, that account will be excluded from spam checking.
  • Exit code or Errorlevel: Since v. 2.31c, PYTHEAS MailGate no longer relies on the exit code (or Errorlevel value) of the sa.cmd command file, as previous versions did.

To check your installation, you may use sapmg.cmd from the SpamAssassin support files. This command file calls SpamAssassin the same way PYTHEAS MailGate does. You will find the message which has been checked by SpamAssassin, and the diagnostic output spamdebug.txt, in the folder referenced by the TEMP environment variable (use the SET command to show environment variables).

Test it

If you activate spam-checking for the first time, you may want to activate it for a single POP3 account only, with the following options:

  • Check incoming mail with SpamAssassin... Only from POP3 accounts with the word Spam-A in the comment. Put the word Spam-A into the Comment field of the POP3 account entry.
  • Forward messages identified as Spam to... The intended Recipient as usual
  • Add SpamAssassin's report to the Session Log message...Always. Be sure to have your Recipient entry configured to receive Session Log messages (check the corresponding box on its property sheet). This is for debugging purposes only. Be sure to remove this option once you have SpamAssassin up and running.

After messages have been spam-checked, look for the following lines in the Remote Control Program or in the Session Log message:

[11:16] [Spamassassin] Spam status: No, score=-4.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.2.5

or

[11:06] *** [Spamassassin] Spam status: Yes, score=8.8 required=5.0 tests=BAYES_99, BIZ_TLD, HTML_60_70, HTML_MESSAGE, HTML_TITLE_UNTITLED, HTTP_EXCESSIVE_ESCAPES, MIME_BASE64_TEXT, MIME_HTML_NO_CHARSET, MIME_HTML_ONLY autolearn=no version=3.2.5
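
If you want to collect these status lines from saved Session Log messages, for example to see which tests fire most often, they can be parsed as sketched below. The function name parse_status is hypothetical, and the pattern assumes the line layout of the two examples above.

```python
import re

_STATUS = re.compile(
    r"Spam status: (?P<flag>Yes|No), "
    r"score=(?P<score>-?[\d.]+) required=(?P<required>-?[\d.]+) "
    r"tests=(?P<tests>.*?) autolearn=(?P<autolearn>\S+) version=(?P<version>\S+)"
)


def parse_status(line):
    """Return the fields of a 'Spam status' log line as a dict, or None if the line does not match."""
    match = _STATUS.search(line)
    if match is None:
        return None
    return {
        "is_spam": match.group("flag") == "Yes",
        "score": float(match.group("score")),
        "required": float(match.group("required")),
        # Test names are separated by commas, optionally followed by a space.
        "tests": [t.strip() for t in match.group("tests").split(",") if t.strip()],
        "autolearn": match.group("autolearn"),
        "version": match.group("version"),
    }
```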

In case you have problems:

  • Please have a look at PmgSpamA.log or at PmgSaChk.tmp (you will need to make a copy of these files while the download session is still in progress, as they are deleted when the session terminates). You will find these files in the folder you specified on the Environment tab of the Configuration Program.
  • If you have trouble getting SpamAssassin to work while running PYTHEAS MailGate as a service: please try to run the PYTHEAS MailGate Communication Task from the Start menu; you will need to stop the service for this purpose.
  • Did you really restart the computer since you installed Perl?

Cleaning up

The SPAMSOURCE folder is no longer needed once the installation is completed.

Spam Handling in PYTHEAS MailGate

To activate spam detection in PYTHEAS MailGate, open the configuration form which can be accessed from the Content Checking page. The SpamAssassin diagnostic output can be inserted into the PYTHEAS MailGate Session Log messages (please note that they will not be visible in the Remote Control Program).

[Screenshot: SpamAssassin interface configuration form]

Setting Spam Delivery Options in PYTHEAS MailGate

You have the following options for the delivery of messages which have been identified as spam:

  • deliver as usual (please note that the spam will have been tagged as such by SpamAssassin),
  • always deliver to a particular Recipient,
  • do not deliver to anybody. If you have configured PYTHEAS MailGate to write a log entry for every incoming message, messages identified as spam are logged even if they are not forwarded to any internal Recipient at all. Such messages receive a [Spam] tag at the beginning of the message subject.
  • Messages with a spam score above a certain level can be handled in a different way, as compared to spam messages with a spam score below this level.

Specific Configuration Settings for POP3 Accounts

You can activate spam analysis for all POP3 accounts, or only for selected ones. The Comments field in the POP3 Account properties is used for this purpose.

To activate spam detection only for certain POP3 accounts, configure the corresponding option in the PYTHEAS MailGate configuration (see screen shot above), and type the word Spam-A anywhere as a separate word into the Comment field of the selected POP3 accounts.

To use specific SpamAssassin configuration settings for POP3 accounts, proceed as follows:

  • Put the following expression into the Comment field of each POP3 Account entry: Spam-A:ConfigTag, where ConfigTag is an identifier composed only of letters and numbers. It will be passed as the 4th parameter to sa.cmd.
  • You can now write code in sa.cmd to switch to different configuration files, based on this parameter.
  • If, for a particular POP3 account, no ConfigTag value is found in the Comments field, the word Nothing is passed as the 4th parameter (so you can be sure that your sa.cmd file always gets 4 parameters).
  • The sa.cmd file included in the SpamAssassin support files contains code to handle the ConfigTag value of NoSpamCheck, to exclude a particular POP3 account from spam checking.
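
The rules above can be summarized as a small decision sketch. It is written in Python for readability only: sa.cmd itself is a Windows command file, and the code below is neither taken from it nor from PYTHEAS MailGate. The function names, the tag Strict and the file pmg-strict.cf are hypothetical, and the way the Comment field is scanned (whitespace-separated words, case-sensitive) is an assumption.

```python
import re

DEFAULT_CONFIG = r"C:\etc\mail\spamassassin\pmg-local.cf"

# Hypothetical mapping from ConfigTag values to configuration files.
CONFIG_BY_TAG = {
    "Strict": r"C:\etc\mail\spamassassin\pmg-strict.cf",
}


def spam_a_parameter(comment):
    """Return the 4th sa.cmd parameter for a Comment field, or None if Spam-A is absent.

    Assumption: Spam-A must stand as a separate whitespace-delimited word,
    optionally followed by ':' and a tag made of letters and digits.
    """
    match = re.search(r"(?<!\S)Spam-A(?::([A-Za-z0-9]+))?(?!\S)", comment)
    if match is None:
        return None
    # Without a ConfigTag, the word Nothing is passed instead.
    return match.group(1) or "Nothing"


def choose_config(tag):
    """Return the configuration file for a tag, or None if the account is excluded."""
    if tag == "NoSpamCheck":
        return None
    # Unknown tags, including Nothing, fall back to the default file.
    return CONFIG_BY_TAG.get(tag, DEFAULT_CONFIG)
```

In a real sa.cmd, the same branching would be done with IF statements on %4, selecting a different file for the -p option.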

Spam/Ham Learning for SpamAssassin

For spam/ham learning with sa-learn, messages are needed in text format according to RFC822, with the complete message header lines. Unfortunately, there does not seem to be an easy way to save messages in such a format using Microsoft Outlook.

How to save incoming messages to files in RFC822 format

PYTHEAS MailGate v. 2.30c (or later) supports a new way to write messages to disk files in RFC822 format. This function is controlled by a tag in the Comment field of POP3 account entries. The name of the tag is SaveToDisk, and it has two parameters, which are separated by a vertical bar (ASCII 124):

  • a name for a folder (which will be created if it does not exist). Messages will be saved to this folder. It will be located in Program_Files\PytheasMailgate\Incoming;
  • an age limit (in hours). Any files in this folder older than the age limit will automatically be deleted. An age limit of 0 (zero) will disable automatic cleaning.

As an example, adding the expression SaveToDisk:SpamHam|24 to the Comment field of a POP3 account entry will save all messages from this POP3 mailbox to the folder Program_Files\PytheasMailgate\Incoming\SpamHam, and any file older than 24 hours in this folder will be cleaned out at the beginning of the upcoming download session. Message delivery will continue as usual. Several POP3 mailboxes can have their messages dropped into the same folder.
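
To make the tag syntax and the age limit concrete, here is a minimal sketch of how such a tag could be interpreted. It does not reproduce PYTHEAS MailGate's own code: the function names are hypothetical, folder names containing spaces are not handled, and using the file modification time as the file age is an assumption.

```python
import os
import re
import time


def parse_save_to_disk(comment):
    """Return (folder_name, age_limit_hours) from a Comment field, or None if the tag is absent."""
    match = re.search(r"(?<!\S)SaveToDisk:([^|\s]+)\|(\d+)(?!\S)", comment)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def expired(entries, age_limit_hours, now):
    """Return the names from (name, mtime) pairs that are older than the age limit.

    An age limit of 0 disables cleaning, so nothing is reported as expired.
    """
    if age_limit_hours == 0:
        return []
    cutoff = now - age_limit_hours * 3600
    return [name for name, mtime in entries if mtime < cutoff]


def expired_in_folder(folder, age_limit_hours):
    """List the files in folder that the age limit would remove (nothing is deleted here)."""
    entries = [(e.path, e.stat().st_mtime) for e in os.scandir(folder) if e.is_file()]
    return expired(entries, age_limit_hours, time.time())
```

With the example above, parse_save_to_disk returns the folder name SpamHam and an age limit of 24, and expired_in_folder lists the files that have passed that limit.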

Another way to obtain messages in RFC822 format is to use the View/Delete messages function (accessible from the POP3 account property page). It has a Save message as function (press F10 to access it). For this to work, configure PYTHEAS MailGate not to delete messages after downloading them, and to clean them up after a day or two. This way you can get messages in RFC822 format directly from the POP3 account, including messages for which the Bayes engine did not yield the correct result, so that you can teach it with them.

To streamline the process, you could do the following:

  • Set up a folder structure as described in the SpamAssassin support files package.
  • Make shortcuts on the desktop for the programs LearnHam.cmd and LearnSpam.cmd, and the folders SpamTest\Ham and SpamTest\Spam.

Now the learning procedure could look like this:

  • If you configured your POP3 account to have the messages saved to files by using the SaveToDisk option (see above), open the ...\Incoming\... folder. Drag-and-drop the messages to the SpamTest\Spam or SpamTest\Ham shortcut.
  • Alternatively, you can save the message to feed into the learning process on the desktop (View/Delete messages, F10, Save message as). Then drag-and-drop the file to the shortcut pointing to the SpamTest\Spam or SpamTest\Ham folder.
  • Double-click on the shortcut for LearnSpam.cmd or LearnHam.cmd (this will feed all files contained in the corresponding folder into sa-learn).
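
For readers who want to see what such a learning step amounts to, the sketch below builds an sa-learn command line for a folder of saved messages. It is not the content of the command files from the support package, which may differ. The function names are hypothetical, and the paths are taken from this document and need to be adjusted to your installation.

```python
import subprocess

# Paths as used elsewhere in this document; adjust them if your installation differs.
SA_LEARN = r"c:\perl\site\bin\sa-learn"
PREFS = r"c:\etc\mail\spamassassin\pmg-local.cf"


def learn_command(folder, kind):
    """Build the sa-learn argument list for a folder of messages; kind is 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    # -p selects the preferences file; --spam / --ham tell sa-learn how to classify the input.
    return [SA_LEARN, "-p", PREFS, "--" + kind, folder]


def learn(folder, kind):
    """Run sa-learn on Windows and return its exit code."""
    # On Windows, shell=True runs the command through cmd.exe,
    # which can resolve the .bat wrapper created by the Perl installation.
    return subprocess.run(learn_command(folder, kind), shell=True).returncode
```

A call such as learn(folder, "spam"), where folder is the full path of your SpamTest\Spam folder, would then feed all messages in that folder to the Bayes database.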

Additional instructions for upgrading from SpamAssassin 2.x

  • Before installing a 3.x version of SpamAssassin over a 2.x version, you should put your Bayes database into a "clean" state.
    From a command line prompt, execute:
    sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf --rebuild
  • Clean the c:\etc\mail\spamassassin folder: leave only pmg-local.cf and the bayesdb subfolder and its contents; delete all the other files.
  • After installing the 3.x version of SpamAssassin: From a command line prompt, execute...
    c:\perl\site\bin\sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf --sync
    followed by
    c:\perl\site\bin\sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf -D --import
    to migrate the data into the new DB_File format. Be patient: these commands may take a couple of minutes to complete, depending on the size of your Bayes database.
  • Check that the new version of SpamAssassin works on your machine (we recommend using the spam-a.cmd command file included in the SpamAssassin support files for this purpose, because it includes a reference to your pmg-local.cf preferences file, which in turn contains the pointer to your Bayes database in c:\etc\mail\spamassassin\bayesdb). Look in the debug output for configuration options in pmg-local.cf which may no longer be supported or which have a new syntax. You may want to compare your configuration file to the sample pmg-local.cf file contained in the SpamAssassin support files.

More Information

Credits

This document has been inspired by USING SpamAssassin WITH WIN32, (c) 2002,2004 by Michael Bell (thanks!), which can be found at http://www.openhandhome.com/howtosa310.html.

SpamAssassin is a trademark of the Apache Software Foundation.