Spam Detection Using SpamAssassin with PYTHEAS MailGate

This page now includes instructions on how to install SpamAssassin release 3.4.1. Upgrade instructions are in the section Upgrading SpamAssassin below.

What is SpamAssassin?

SpamAssassin (tm) is an open source product that performs heuristic spam analysis and RBL (Realtime Blackhole List) lookups among other tests, to clearly tag spam mail as such. PYTHEAS MailGate can then be instructed to handle spam mail in a particular way.

SpamAssassin (tm) is open source software, licensed under the Apache Software License (which you can find at http://www.apache.org/foundation/licence-FAQ.html). No guarantees or warranties apply to the software. You use it entirely at your own risk.

Neither SpamAssassin nor the software components it requires are installed by the PYTHEAS MailGate setup program. Please note that you need a PYTHEAS MailGate license key which activates the Content-Checking Rules engine; see the About tab to learn about the options activated by your license key.

In its default form, SpamAssassin is designed and written for Unix platforms. This document outlines how to get SpamAssassin working on a Windows platform. Although the procedure may look cumbersome at first glance, the result is worth the effort: SpamAssassin is a very effective spam filter.

Upgrading SpamAssassin

If you are doing a fresh install, you can skip this section.

Upgrading a SpamAssassin v. 3.x Installation

Stop the Pytheas.MailGate service (or the Communication Task) for the duration of the upgrade. To upgrade to a newer version of SpamAssassin:

  • If you upgrade from SpamAssassin 2.x to 3.x, be sure to read the section Additional instructions for upgrading from SpamAssassin 2.x first.
  • Uninstall ActivePerl. Then delete the whole c:\perl subtree. Be sure not to delete the c:\etc\mail\spamassassin folder. You may also want to move the NMAKE utility from C:\perl\bin to some safe place before deleting.
  • Be sure to get the new SpamAssassin support files. The sa.cmd file required for SpamAssassin v. 3.4.1 is different from the one included in the package for earlier (pre 3.3.0) versions of SpamAssassin. Please copy DOS2UNIX.EXE and UNIX2DOS.EXE to the folder where PYTHEAS MailGate has been installed.
  • Your configuration file pmg-local.cf may contain options which are no longer supported in the new version. Carefully read the beginning of spamdebug.txt when checking your new SpamAssassin installation later.
  • Proceed the same way as you would for a fresh installation, starting from the section Installing Perl.

Installing Perl

Check that you have the latest version of the SpamAssassin support files. If not, download and unzip them.

  • Install ActivePerl (v. 5.8.8.822). Keep the features Perl and PPM selected. You may unselect the features Perl ISAPI, PerlEx, PerlScript, Documentation and Examples.
  • Open a Command-Line window and type PERL -v to check that everything is fine.
  • In subsequent sections, it will be assumed that Perl has been installed in C:\PERL. Make appropriate changes if necessary.
  • Reboot the computer. If Perl had already been installed on your computer and the PATH environment variable had already been defined (for example, during an upgrade), you may skip the reboot. After rebooting, open a command line window and type PATH to make sure that C:\PERL\BIN is now part of your PATH environment variable.
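
The PATH check in the last step can also be scripted. The following is a minimal Python sketch, not part of the PYTHEAS support files; the function name path_contains is hypothetical, and the folder C:\Perl\bin is the install location assumed throughout this page.

```python
import ntpath
import os


def path_contains(path_value: str, directory: str) -> bool:
    """Return True if `directory` is one of the entries of a Windows PATH string.

    The comparison ignores case, a trailing backslash and surrounding quotes,
    which is how Windows itself treats PATH entries.
    """
    wanted = ntpath.normcase(ntpath.normpath(directory))
    for entry in path_value.split(";"):
        entry = entry.strip().strip('"')
        if entry and ntpath.normcase(ntpath.normpath(entry)) == wanted:
            return True
    return False


if __name__ == "__main__":
    # Assumption: Perl was installed in C:\Perl, as this page assumes.
    print(path_contains(os.environ.get("PATH", ""), r"C:\Perl\bin"))
```

If the script prints False after a reboot, the Perl installer did not update PATH and the entry has to be added manually.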

Installing NMAKE

  • Download NMAKE.
  • Extract the files, and place them in C:\PERL\BIN. Both NMAKE.EXE and NMAKE.ERR are needed.
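
As a quick sanity check, a few lines of Python can confirm that both NMAKE files ended up in the Perl folder. This is only a sketch; the function name missing_files is hypothetical and C:\Perl\bin is the assumed install folder.

```python
import os

# Both files are needed according to the step above.
REQUIRED = ("NMAKE.EXE", "NMAKE.ERR")


def missing_files(folder_listing, required=REQUIRED):
    """Return the required file names that do not appear in `folder_listing`.

    File names are compared without regard to case, as on Windows.
    """
    present = {name.upper() for name in folder_listing}
    return [name for name in required if name.upper() not in present]


if __name__ == "__main__":
    folder = r"C:\Perl\bin"  # assumption: default location used on this page
    if os.path.isdir(folder):
        print(missing_files(os.listdir(folder)))
```

An empty list means that both files are in place.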

Installing the Necessary Perl Modules

Perl uses modules to extend the language's capabilities. Many of them are included with the core distribution, but many others are available. SpamAssassin requires several modules which are not in the core distribution of ActivePerl.

  • Open a command line window (an elevated command line window on Windows Server 2008 and later).
  • Type: PPM-Shell
    note 1: PPM connects to the repository via TCP Port 80 so you should be connected to the Internet and keep this port open.
  • At the PPM> prompt, type: repo list
  • Disable every repository in the list whose name is NOT PerlSaRepo, using this command:
    repo [id] off
    where [id] is the number of the repository in the list returned by the previous command.
  • If the PerlSaRepo repository is not yet in the list, we add it now:
    repo add http://www.pytheas.com/pmg/PerlSaRepo PerlSaRepo
  • We can now add the missing Perl modules:
    install SA_PerlModules
    Expected response:
    Syncing site PPM database with .packlists...done
    Downloading SA_PerlModules-1.816...done
    Downloading IP-Country_2-27-2.27...done
    Downloading Win32-Registry-File_1-00-1.10...done
    Downloading NetAddr-IP_4-026-4.026...done
    Downloading Net-DNS_2-61-0.61...done
    Downloading Mail-SPF_2-00-2.00...done
    Downloading Geography-Countries-2009041301...done
    Downloading Tie-IxHash-1.21...done
    Downloading Net-IP-1.25...done
    Unpacking SA_PerlModules-1.816...done
    Unpacking IP-Country_2-27-2.27...done
    Unpacking Win32-Registry-File_1-00-1.10...done
    Unpacking NetAddr-IP_4-026-4.026...done
    Unpacking Net-DNS_2-61-0.61...done
    Unpacking Mail-SPF_2-00-2.00...done
    Unpacking Geography-Countries-2009041301...done
    Unpacking Tie-IxHash-1.21...done
    Unpacking Net-IP-1.25...done
    Generating HTML for SA_PerlModules-1.816...done
    Generating HTML for IP-Country_2-27-2.27...done
    Generating HTML for NetAddr-IP_4-026-4.026...done
    Generating HTML for Net-DNS_2-61-0.61...done
    Generating HTML for Geography-Countries-2009041301...done
    Generating HTML for Net-IP-1.25...done
    Updating files in site area...done
     201 files installed
  • To exit the package manager, type: quit
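
The sequence of PPM commands depends on what repo list returns. The Python sketch below only assembles the commands described above so that they can be reviewed before typing them at the PPM> prompt. The function name ppm_commands and the (id, name) input format are assumptions made for illustration; the script does not parse PPM output itself.

```python
REPO_NAME = "PerlSaRepo"
REPO_URL = "http://www.pytheas.com/pmg/PerlSaRepo"


def ppm_commands(repos):
    """Build the PPM commands for a list of (id, name) pairs.

    `repos` is what you read off the `repo list` output by hand,
    for example [(1, "ActiveState Package Repository")].
    """
    commands = []
    names = [name for _, name in repos]
    # Switch off every repository except PerlSaRepo.
    for repo_id, name in repos:
        if name != REPO_NAME:
            commands.append(f"repo {repo_id} off")
    # Add PerlSaRepo only if it is not in the list yet.
    if REPO_NAME not in names:
        commands.append(f"repo add {REPO_URL} {REPO_NAME}")
    commands.append("install SA_PerlModules")
    commands.append("quit")
    return commands


if __name__ == "__main__":
    for line in ppm_commands([(1, "ActiveState Package Repository")]):
        print(line)
```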

Obtaining and Installing SpamAssassin

  • Be sure to have PYTHEAS MailGate v. 2.32a (or a newer version). Upgrade if necessary.
  • Go to http://spamassassin.apache.org/downloads.html and download the ZIP file distribution. Extract the ZIP file to the root of drive C:. For SpamAssassin version 3.4.1, for example, this will create C:\Mail-SpamAssassin-3.4.1 or C:\Mail-SpamAssassin-3.4.1\Mail-SpamAssassin-3.4.1, depending on how you extract it. We'll refer to this folder as the SPAMSOURCE folder in subsequent sections.
  • Open a command-line window (an elevated command line window on Windows Server 2008 and later), go to the SPAMSOURCE folder and type:
    PERL MAKEFILE.PL
    You will be asked a couple of questions. Be sure to answer No to the first one, which is not the default response:
    First question: Build spamc.exe (...)?
    Answer: N
    Next question: What email address or URL should be used (...)
    Answer: give a meaningful answer for your site.
    You may safely ignore the warnings about optional missing modules:
    (...)
    optional module missing: Razor2
    optional module missing: Net::Ident
    optional module missing: IO::Socket::INET6
    optional module missing: IO::Socket::SSL
    (...)
  • Still in the SPAMSOURCE folder, type:
    NMAKE
    NMAKE INSTALL
  • Make a backup copy of c:\perl\site\etc\mail\spamassassin\v310.pre (name it v310.backup, for example; in any case, do not give it the .pre extension). Open the file c:\perl\site\etc\mail\spamassassin\v310.pre in a text editor (Wordpad.exe will handle the line endings better than Notepad.exe).
    At the beginning of the lines
    loadplugin Mail::SpamAssassin::Plugin::Pyzor
    loadplugin Mail::SpamAssassin::Plugin::Razor2
    add the character # to transform them into a comment and avoid loading the plug-ins.
  • Finally type:
    C:\Perl\Site\Bin\SpamAssassin -V
    You should get the following response:
    SpamAssassin version 3.4.1
      running on Perl version 5.8.8
  • Download the SpamAssassin rules:
    C:\Perl\Site\Bin\sa-update --nogpg -v
    With the --nogpg option, the update works even if you do not have gpg installed. The command should run without an error message.
    We recommend running this command regularly (once a week, for example) to keep the SpamAssassin rules up to date.
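
The edit of v310.pre described above can also be done with a small script instead of a text editor. The following Python sketch is an illustration only; the function name disable_plugins is hypothetical. It prefixes the two loadplugin lines with # and leaves every other line, including lines that are already commented out, unchanged.

```python
PLUGINS_TO_DISABLE = (
    "Mail::SpamAssassin::Plugin::Pyzor",
    "Mail::SpamAssassin::Plugin::Razor2",
)


def disable_plugins(text, plugins=PLUGINS_TO_DISABLE):
    """Comment out the `loadplugin` lines for the given plug-ins.

    Line endings are preserved, and running the function twice
    gives the same result as running it once.
    """
    result = []
    for line in text.splitlines(keepends=True):
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "loadplugin" and parts[1] in plugins:
            line = "#" + line
        result.append(line)
    return "".join(result)


if __name__ == "__main__":
    sample = (
        "loadplugin Mail::SpamAssassin::Plugin::Pyzor\n"
        "loadplugin Mail::SpamAssassin::Plugin::SpamCop\n"
    )
    print(disable_plugins(sample), end="")
```

To apply it to the real file, read and write v310.pre with open(path, newline="") so that the original line endings are kept, and make the backup copy first as described above.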

Testing Your SpamAssassin Installation

From a command line window, in the SPAMSOURCE folder, type:
c:\perl\site\bin\spamassassin -D < sample-nonspam.txt 2>spamdebug.txt

This command should run without errors. The command line window shows the message as returned by SpamAssassin. The output should indicate that this sample message is not spam: look at the X-Spam-... lines added by SpamAssassin in the header part of the message.

Please note: it may happen that the file spamassassin.bat is not created in the c:\perl\site\bin folder, but in the c:\perl\bin folder. In this case, please adjust the suggested commands in the subsequent sections.

Have a look at spamdebug.txt which has been created by this run. Check for DNS resolution. In the Received header parsing part of it, you should see:

dbg: dns: servers obtained from Net::DNS : [...]:53
dbg: dns: nameservers set to ...
(...)
dbg: dns: is Net::DNS::Resolver available? yes

At the end of the file, check for the results:
dbg: check: is spam? score=0 required=5
dbg: check: tests=
dbg: check: subtests=__CT,__CTYPE_CHARSET_QUOTED, __CT_TEXT_PLAIN, __DOS_BODY_STOCK, __DOS_BODY_SUN, __DOS_HAS_ANY_URI, __DOS_LINK, __DOS_RCVD_FRI, __FB_PICK, __FB_S_STOCK, __FM_STOCK_WORDS, __HAS_ANY_EMAIL, __HAS_ANY_URI, __HAS_MSGID, __HAS_RCVD, __HAS_SUBJECT, __LAST_UNTRUSTED_RELAY_NO_AUTH, __MIME_VERSION, __MISSING_REF, __MSOE_MID_WRONG_CASE, __NAKED_TO, __NONEMPTY_BODY, __RCVD_IN_SORBS, __RCVD_IN_ZEN, __SANE_MSGID, __TOCC_EXISTS, __YOUR_ACCOUNT

Now let's check if a message is correctly identified as spam. From the SPAMSOURCE folder, type:
c:\perl\site\bin\spamassassin -D < sample-spam.txt 2>spamdebug.txt

The output in the command line window should indicate that this sample message is spam (look at the X-Spam-... lines added by SpamAssassin in the header part of the message, and the body of the message which has been modified by SpamAssassin).

Have a look at spamdebug.txt. At the end of the file, check for the results:
dbg: check: is spam? score=999.998 required=5
dbg: check: tests=GTUBE,NO_RECEIVED,NO_RELAYS
dbg: check: subtests=__CT,__CTE,__CT_TEXT_PLAIN,__HAS_MSGID,__HAS_SUBJECT, __MIME_VERSION, __MISSING_REF, __MSGID_OK_HOST, __NONEMPTY_BODY, __SANE_MSGID, __TOCC_EXISTS, __UNUSABLE_MSGID
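
Instead of scrolling to the end of spamdebug.txt, the result lines can be extracted with a short script. The Python sketch below relies only on the line formats shown above; the function name parse_result is hypothetical, and the rule that a message counts as spam when the score reaches the required value is stated here as an assumption about SpamAssassin's threshold handling.

```python
import re

# Patterns for the two result lines shown above. The "subtests=" line
# is deliberately not matched by TESTS_RE.
RESULT_RE = re.compile(
    r"dbg: check: is spam\? score=(-?\d+(?:\.\d+)?) required=(-?\d+(?:\.\d+)?)"
)
TESTS_RE = re.compile(r"dbg: check: tests=(.*)")


def parse_result(debug_text):
    """Return score, threshold, verdict and test names from debug output.

    Returns None if no "is spam?" line is found.
    """
    score = required = None
    tests = []
    for line in debug_text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            score, required = float(match.group(1)), float(match.group(2))
            continue
        match = TESTS_RE.search(line)
        if match:
            tests = [t.strip() for t in match.group(1).split(",") if t.strip()]
    if score is None:
        return None
    return {
        "score": score,
        "required": required,
        "is_spam": score >= required,  # assumption: threshold is inclusive
        "tests": tests,
    }


if __name__ == "__main__":
    with open("spamdebug.txt", encoding="utf-8", errors="replace") as handle:
        print(parse_result(handle.read()))
```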

The Online Documentation

You can access the documentation at http://spamassassin.apache.org/full/3.3.x/dist/doc/. The most important page to read is Mail::SpamAssassin::Conf, which describes all major configuration parameters.

Connect SpamAssassin and PYTHEAS MailGate

If you are upgrading, you are now ready to restart PYTHEAS MailGate.

If you do not have a pmg-local.cf file, copy this file from the SpamAssassin support files to C:\etc\mail\spamassassin. Create this folder if it does not exist. Use this file to configure the way SpamAssassin should work for your site. You should not edit global configuration files in C:\perl\site\share\spamassassin as your settings could be lost during the next upgrade. Of course, it is a good idea to look at the global configuration files to know what parameters can be changed.

Please note: For PYTHEAS MailGate v. 2.75c and earlier, on Microsoft Windows Server 2012, please avoid folder names containing spaces for temporary storage of incoming messages.

Copy the files sa.cmd, DOS2UNIX.EXE and UNIX2DOS.EXE to the C:\Program Files\PytheasMailgate folder. The downloadable version of sa.cmd assumes that Perl has been installed in the C:\perl folder. Please note that DOS2UNIX.EXE and UNIX2DOS.EXE are not really needed for the current version of SpamAssassin, but they may be useful for future versions. Here are some comments about the contents of sa.cmd:

  • -D: Instructs SpamAssassin to produce diagnostic output (see below). You may change this option to obtain different diagnostic output. You can also omit this parameter altogether if you do not need it.
  • -e: Instructs SpamAssassin to set the exit code depending on the spam status. PYTHEAS MailGate uses this exit code to pick up the spam status.
  • -p ...: Instructs SpamAssassin to use the pmg-local.cf file, regardless of the user context in which it is running.
  • %1, %2, %3, %4: PYTHEAS MailGate will always call sa.cmd with 4 parameters. Please see details below.
  • %1: Path name of the file containing the message to be checked.
  • %2: Path name of the file to contain the checked message (this is always Temp_folder\PmgSaChki.tmp, i being a number from 1 to 12).
  • %3: Path name of the file to contain the diagnostic output produced by SpamAssassin (this is always Temp_folder\PmgSpamAi.log, i being a number from 1 to 12).
  • %4: Determined by the POP3 account configuration in PYTHEAS MailGate. Note: the downloadable version of sa.cmd includes code to handle the value NoSpamCheck for this parameter, which does what its name suggests: if you add Spam-A:NoSpamCheck to the Comment of a POP3 account, it will be excluded from spam checking.
  • Exit code or Errorlevel: Since v. 2.31c, PYTHEAS MailGate no longer relies on the exit code (or Errorlevel value) of the sa.cmd command file, as previous versions did.
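
To make the role of the options and of the first three parameters concrete, here is a minimal Python sketch of the same call pattern. It is not a replacement for sa.cmd and does not reproduce its actual contents. The function names are hypothetical, and the path to spamassassin.bat is an assumption based on the default Perl folder used on this page.

```python
import subprocess

# Assumptions: default locations used elsewhere on this page.
SPAMASSASSIN = r"C:\Perl\site\bin\spamassassin.bat"
PREFS_FILE = r"C:\etc\mail\spamassassin\pmg-local.cf"


def build_command(debug=True, executable=SPAMASSASSIN, prefs=PREFS_FILE):
    """Assemble the SpamAssassin command line with the options listed above."""
    command = [executable]
    if debug:
        command.append("-D")          # diagnostic output, may be omitted
    command += ["-e", "-p", prefs]    # exit code reflects status, fixed prefs file
    return command


def run_spam_check(msg_in, msg_out, diag_out, command=None):
    """Feed msg_in (%1) to SpamAssassin, write the checked message to
    msg_out (%2) and the diagnostic output to diag_out (%3).

    Returns the exit code of the SpamAssassin process.
    """
    command = command or build_command()
    with open(msg_in, "rb") as source, \
            open(msg_out, "wb") as checked, \
            open(diag_out, "wb") as diagnostics:
        completed = subprocess.run(
            command, stdin=source, stdout=checked, stderr=diagnostics
        )
    return completed.returncode
```

The message is passed on standard input and the debug output arrives on standard error, which matches the redirections used in the test commands earlier on this page.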

To check your installation, you may use sapmg.cmd from the SpamAssassin support files. This command file calls SpamAssassin the same way PYTHEAS MailGate does. You will find the message which has been checked by SpamAssassin, and the diagnostic output spamdebug.txt, in the folder referenced by the TEMP environment variable (use the SET command to show environment variables).

Test it

If you activate spam-checking for the first time, you may want to activate it for a single POP3 account only, with the following options:

  • Check incoming mail with SpamAssassin... Only from POP3 accounts with the word Spam-A in the comment. Put the word Spam-A into the Comment field of the POP3 account entry.
  • Forward messages identified as Spam to... The intended Recipient as usual

After messages have been spam-checked, look for the following lines in the Remote Control Program or in the Session Log message:

[11:16] [Spamassassin] Spam status: No, score=-4.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.1

or

[11:06] *** [Spamassassin] Spam status: Yes, score=8.8 required=5.0 tests=BAYES_99, BIZ_TLD, HTML_60_70, HTML_MESSAGE, HTML_TITLE_UNTITLED, HTTP_EXCESSIVE_ESCAPES, MIME_BASE64_TEXT, MIME_HTML_NO_CHARSET, MIME_HTML_ONLY autolearn=no version=3.4.1
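
If you want to collect these status lines from many sessions, they are easy to parse. The Python sketch below assumes that every status line has exactly the layout of the two examples above; the function name parse_status is hypothetical.

```python
import re

STATUS_RE = re.compile(
    r"Spam status: (Yes|No), score=(-?\d+(?:\.\d+)?) required=(-?\d+(?:\.\d+)?)"
    r" tests=(.*?) autolearn=(\S+) version=(\S+)"
)


def parse_status(line):
    """Parse one Session Log status line; return None if it does not match."""
    match = STATUS_RE.search(line)
    if not match:
        return None
    verdict, score, required, tests, autolearn, version = match.groups()
    return {
        "is_spam": verdict == "Yes",
        "score": float(score),
        "required": float(required),
        "tests": [t.strip() for t in tests.split(",") if t.strip()],
        "autolearn": autolearn,
        "version": version,
    }


if __name__ == "__main__":
    example = (
        "[11:16] [Spamassassin] Spam status: No, score=-4.9 required=5.0 "
        "tests=BAYES_00 autolearn=ham version=3.4.1"
    )
    print(parse_status(example))
```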

In case you have problems:

  • Please have a look at PmgSpamAn.log or at PmgSaChkn.tmp (you will need to make a copy of these files while the download session is still in progress, as they will be deleted upon termination). You will find these files in the folder you specified on the Service Options page, Incoming mail tab (in v. 2.x: on the Environment tab of the Configuration Program).
  • Did you really restart the computer since you installed Perl for the first time?
  • Did you check the paths in sa.cmd?
  • Did you create the C:\etc\mail\SpamAssassin folder? Did you put your copy of pmg-local.cf there?

Cleaning up

The SPAMSOURCE folder is no longer needed once the installation is completed.

Setting Spam Delivery Options in PYTHEAS MailGate

You have the following options for the delivery of messages which have been identified as spam:

  • deliver as usual (please note that the spam will have been tagged as such by SpamAssassin),
  • always deliver to a particular Recipient
  • do not deliver to anybody. If you have configured PYTHEAS MailGate to write a log entry for every incoming message, messages identified as spam are logged even if they are not forwarded to any internal Recipient. Such messages receive a [Spam] tag at the beginning of the message subject.
  • Messages with a spam score above a certain level can be handled in a different way, as compared to spam messages with a spam score below this level.

Specific Configuration Settings for POP3 Accounts

You can activate spam analysis for all POP3 accounts, or only for selected ones. The Comments field in the POP3 Account properties is used for this purpose.

To activate spam detection only for certain POP3 accounts, configure the corresponding option in the PYTHEAS MailGate configuration (see screen shot above), and type the word Spam-A anywhere as a separate word into the Comment field of the selected POP3 accounts.

To use specific SpamAssassin configuration settings for POP3 accounts, proceed as follows:

  • Put the following expression into the Comment field of each POP3 Account entry: Spam-A:ConfigTag, where ConfigTag is an identifier composed only of letters and numbers. It will be passed as the 4th parameter to sa.cmd.
  • You can now write code in sa.cmd to switch to different configuration files, based on this parameter.
  • If for a particular POP3 account, no ConfigTag value is found in the Comments field, the word Nothing is passed as 4th parameter (so you can be sure that your sa.cmd file always gets 4 parameters).
  • The sa.cmd file included in the SpamAssassin support files contains code to handle the ConfigTag value of NoSpamCheck, to exclude a particular POP3 account from spam checking.
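
The selection logic that such code in sa.cmd has to implement is small. The Python sketch below shows one possible mapping from the 4th parameter to a configuration file. The tag Sales and the file pmg-sales.cf are invented for illustration, and exact, case-sensitive matching of the tag is an assumption.

```python
DEFAULT_CONFIG = r"C:\etc\mail\spamassassin\pmg-local.cf"

# Hypothetical mapping; use your own ConfigTag values and file names.
CONFIG_BY_TAG = {
    "Sales": r"C:\etc\mail\spamassassin\pmg-sales.cf",
}


def select_config(tag):
    """Return the configuration file for a ConfigTag value.

    Returns None for NoSpamCheck (the account is excluded from checking)
    and the default file for Nothing or for any unknown tag.
    """
    if tag == "NoSpamCheck":
        return None
    return CONFIG_BY_TAG.get(tag, DEFAULT_CONFIG)


if __name__ == "__main__":
    for tag in ("Nothing", "Sales", "NoSpamCheck"):
        print(tag, "->", select_config(tag))
```

Falling back to the default file for unknown tags keeps a mistyped Comment field from disabling spam checking for that account.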

Spam/Ham Learning for SpamAssassin

For spam/ham learning with sa-learn, messages are needed in text format according to RFC822, with the complete message header lines. Unfortunately, there does not seem to be an easy way to save messages in such a format using Microsoft Outlook.

How to save incoming messages to files in RFC822 format

PYTHEAS MailGate v. 2.30c (or later) can write messages to disk files in RFC822 format. This function is controlled by a tag in the Comment field of POP3 account entries. The name of the tag is SaveToDisk, and it has two parameters, which are separated by a vertical bar (ASCII 124):

  • a name for a folder (which will be created if it does not exist). Messages will be saved to this folder. It will be located in ProgramData\PytheasMailgate\Incoming or Program_Files\PytheasMailgate\Incoming (depending on where your PMailGat.INI configuration file is located);
  • an age limit (in hours). Any files in this folder older than the age limit will automatically be deleted. An age limit of 0 (zero) will disable automatic cleaning.

As an example, adding the expression SaveToDisk:SpamHam|24 to the Comment field of a POP3 account entry will save all messages from this POP3 mailbox to the Incoming\SpamHam subfolder of the folder where the PYTHEAS MailGate configuration files are stored, and any file older than 24 hours in this folder will be cleaned out at the beginning of the upcoming download session. Message delivery will continue as usual. Several POP3 mailboxes can have their messages dropped into the same folder.
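
The tag syntax and the age rule can be summarized in a few lines of Python. This sketch only illustrates the format described above; it is not how PYTHEAS MailGate is implemented, and both function names are hypothetical.

```python
import re

# SaveToDisk:<folder>|<age limit in hours>
SAVE_RE = re.compile(r"SaveToDisk:([^|\s]+)\|(\d+)")


def parse_save_to_disk(comment):
    """Return (folder, age_limit_hours) from a Comment field, or None."""
    match = SAVE_RE.search(comment)
    if not match:
        return None
    return match.group(1), int(match.group(2))


def should_delete(file_age_hours, age_limit_hours):
    """A file is cleaned out when it is older than the limit.

    An age limit of 0 disables automatic cleaning.
    """
    return age_limit_hours > 0 and file_age_hours > age_limit_hours


if __name__ == "__main__":
    print(parse_save_to_disk("Spam-A SaveToDisk:SpamHam|24"))
```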

Another way to obtain messages in RFC822 format is to use the View/Delete messages function (accessible from the POP3 account property page). It has a Save message as function (press F10 to access it). You should also configure PYTHEAS MailGate not to delete messages after downloading them, and to clean them up after a day or two. This way you can get messages in RFC822 format directly from the POP3 account. With this method, you can also collect the messages for which the Bayes engine does not yield the correct result and use them to train it.

To streamline the process, you could do the following:

  • Set up a folder structure as described in the SpamAssassin support files package.
  • Make shortcuts on the desktop for the programs LearnHam.cmd and LearnSpam.cmd, and the folders SpamTest\Ham and SpamTest\Spam.

Now the learning procedure could look like this:

  • If you configured your POP3 account to have the messages saved to files by using the SaveToDisk option (see above), open the ...\Incoming\... folder. Drag-and-drop the messages to the SpamTest\Spam or SpamTest\Ham shortcut.
  • Alternatively, you can save the message to feed into the learning process on the desktop (View/Delete messages, F10, Save message as). Then drag-and-drop the file to the shortcut pointing to the SpamTest\Spam or SpamTest\Ham folder.
  • Double-click on the shortcut for LearnSpam.cmd or LearnHam.cmd (this will feed all files contained in the corresponding folder into sa-learn).
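
For reference, the kind of sa-learn call behind such a command file can be sketched as follows. The actual contents of LearnSpam.cmd and LearnHam.cmd are not reproduced here; the function name learn_command and the folder C:\SpamTest are assumptions, while --spam, --ham and -p are standard sa-learn options.

```python
# Assumptions: default locations used elsewhere on this page.
SA_LEARN = r"C:\Perl\site\bin\sa-learn"
PREFS_FILE = r"C:\etc\mail\spamassassin\pmg-local.cf"


def learn_command(kind, folder, prefs=PREFS_FILE):
    """Build the sa-learn argument list for a folder of RFC822 files.

    `kind` must be "spam" or "ham".
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return [SA_LEARN, "-p", prefs, "--" + kind, folder]


if __name__ == "__main__":
    # Print the commands so they can be run from a command line window.
    print(" ".join(learn_command("spam", r"C:\SpamTest\Spam")))
    print(" ".join(learn_command("ham", r"C:\SpamTest\Ham")))
```

Passing the pmg-local.cf file with -p matters because that file contains the pointer to the Bayes database.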

Additional instructions for upgrading from SpamAssassin 2.x

  • Before installing a 3.x version of SpamAssassin over a 2.x version, you should put your Bayes database into a "clean" state.
    From a command line prompt, execute:
    sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf --rebuild
  • Clean the c:\etc\mail\spamassassin folder: leave only pmg-local.cf and the bayesdb subfolder and its contents; delete all the other files.
  • After installing the 3.x version of SpamAssassin, execute from a command line prompt:
    c:\perl\site\bin\sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf --sync
    followed by
    c:\perl\site\bin\sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf -D --import
    to migrate the data into the new DB_File format. Be patient: these commands may take a couple of minutes to complete, depending on the size of your Bayes database.
  • Check that the new version of SpamAssassin works on your machine (we recommend using the spam-a.cmd command file included in the SpamAssassin support files for this purpose, because it includes a reference to your pmg-local.cf preferences file, which in turn contains the pointer to your Bayes database in c:\etc\mail\spamassassin\bayesdb). Look in the debug output for configuration options in pmg-local.cf which may no longer be supported or which have a new syntax. You may want to compare your configuration file to the sample pmg-local.cf file contained in the SpamAssassin support files.

More Information

Credits

This document has been inspired by USING SpamAssassin WITH WIN32, (c) 2002,2004 by Michael Bell (thanks!).

SpamAssassin is a trademark of the Apache Software Foundation.
