Spam Detection Using SpamAssassin with PYTHEAS MailGate
This page now includes instructions how to install SpamAssassin release
Upgrade instructions are here.
SpamAssassin (tm) is an open source product that performs
heuristic spam analysis and RBL (Realtime Blackhole List) lookups among other tests,
to clearly tag spam mail as such. PYTHEAS MailGate can then be instructed
to handle spam mail in a particular way.
SpamAssassin (tm) is open source software, licensed
under the Apache Software License (which you can find at
No guarantees or warranties apply to the software. You use it entirely at your own
Neither SpamAssassin nor the software components it
requires are installed by the PYTHEAS MailGate setup program. Please note
that you need a PYTHEAS MailGate license key which activates the
Content-Checking Rules engine; see the
About tab to learn about
the options activated by your license key.
In its default form, SpamAssassin is designed and written
for Unix platforms. This document outlines how to get SpamAssassin
working on a Windows platform. Although the procedure may seem a little
cumbersome at first glance, it is worth the trouble: SpamAssassin is
remarkably effective at detecting spam.
If you are doing a fresh install, you can skip this section.
Upgrading a SpamAssassin v. 3.x Installation
For the duration of the upgrade, you should
stop the Pytheas.MailGate service (or the Communication Task). To upgrade to a newer version of SpamAssassin:
- If you upgrade from SpamAssassin
2.x to 3.x, be sure to read these notes first.
- Uninstall ActivePerl. Then delete the whole
Be sure not to delete the
c:\etc\mail\spamassassin folder. You
may also want to move the NMAKE utility
C:\perl\bin to some safe place.
- Be sure to get the new
SpamAssassin support files. The
sa.cmd file required for SpamAssassin v.3.4.1
is different from the one included in the package for earlier (pre 3.3.0) versions of
SpamAssassin. Please copy
UNIX2DOS.EXE to the folder
where PYTHEAS MailGate has been installed.
- Your configuration file
may contain options which are no longer supported in the new version.
Carefully read the beginning of
spamdebug.txt when checking your
new SpamAssassin installation later.
- Proceed the same way as you would for a fresh installation, starting from here.
Check that you have the latest version of the
SpamAssassin support files. If not,
download and unzip.
- Install ActivePerl (v. 18.104.22.1682). Keep the features Perl
and PPM selected. You may unselect the features Perl ISAPI,
PerlEx, PerlScript, Documentation et
- Open a Command-Line window and type
PERL -v to check that everything
- In subsequent sections, it will be assumed that Perl has been installed in
C:\PERL. Make appropriate changes if necessary.
- Reboot the computer. If Perl had already been installed on your
computer and the PATH environment variable had already been defined (for example,
during an upgrade), you may skip the reboot. After rebooting, open a command-line window and type
to make sure that
C:\PERL\BIN is now part of your
- Download NMAKE.
- Extract the files, and place them in
NMAKE.ERR are needed.
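The PATH check described in the steps above can also be scripted. The following is a minimal Python sketch, not part of the PYTHEAS MailGate or SpamAssassin packages. The function name path_contains is hypothetical, and the sketch assumes that Perl has been installed in C:\PERL, as stated above.

```python
import ntpath
import os


def path_contains(path_value, folder):
    """Return True if `folder` is one of the entries of a Windows PATH string."""
    # Windows paths are case-insensitive, so compare normalized, lower-cased forms.
    wanted = ntpath.normcase(ntpath.normpath(folder))
    for entry in path_value.split(";"):
        entry = entry.strip().strip('"')
        if entry and ntpath.normcase(ntpath.normpath(entry)) == wanted:
            return True
    return False


if __name__ == "__main__":
    # Assumption: Perl lives in C:\PERL; adjust if you installed it elsewhere.
    found = path_contains(os.environ.get("PATH", ""), r"C:\PERL\BIN")
    print("C:\\PERL\\BIN on PATH:", found)
```

If the script reports False after a reboot, the PATH variable was probably not updated by the Perl installer.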
Installing the Necessary Perl Modules
Perl uses modules to extend the language's capabilities. Many of them are included
with the core distribution, but many others are available. SpamAssassin
requires several modules which are not in the core distribution of ActivePerl.
Obtaining and Installing SpamAssassin
- Be sure to have PYTHEAS MailGate v. 2.32a (or a newer version).
Upgrade if necessary.
- Go to
and download the ZIP file distribution. Extract the ZIP file to the root of the drive. For
example, this will create
depending on how you proceed. We'll refer to this folder as the SPAMSOURCE
folder in subsequent sections.
- Open a command-line window (an elevated command line window on Windows
Server 2008 and later), go to the SPAMSOURCE folder and type:
You will be asked a couple of questions. Be sure to answer
the first one, which is not the default response:
Build spamc.exe (...)?
What email address or URL should be used (...)
Answer: give a meaningful answer for your site.
You may safely ignore the warnings about optional missing modules:
optional module missing: Razor2
optional module missing: Net::Ident
optional module missing: IO::Socket::INET6
optional module missing: IO::Socket::SSL
- Still in the SPAMSOURCE folder, type:
- Make a backup copy of
v310.backup for ex.; in any case, don't give it the
.pre extension). Open the file
in a text editor (Wordpad.exe will handle the line endings better
At the beginning of the lines
add the character
# to transform them into a comment and avoid
loading the plug-ins.
- Finally type:
You should get the following response:
running on Perl version 5.8.8
- Download the SpamAssassin rules:
C:\Perl\Site\Bin\sa-update --nogpg -v
The --nogpg option skips GPG signature verification, so the command works even if gpg is not installed. It should run without an error message.
We recommend running this command regularly (once a week, for example) to keep
the SpamAssassin rules up to date.
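If you run the update from a scheduled job, a small wrapper can report what happened. The sketch below is an illustration only. The exit-code meanings (0 = update installed, 1 = no fresh update, 2 = lint check failed, 4 or higher = download or extraction errors) follow the sa-update documentation as we understand it, so please verify them against your installed version. The function names describe_exit_code and run_update are hypothetical.

```python
import subprocess

# Path used in this document; adjust if Perl is installed elsewhere.
SA_UPDATE = r"C:\Perl\Site\Bin\sa-update"


def describe_exit_code(code):
    """Map an sa-update exit code to a short description (assumed meanings)."""
    if code == 0:
        return "rules updated"
    if code == 1:
        return "no fresh updates available"
    if code == 2:
        return "update available, but lint check failed"
    if code >= 4:
        return "error while downloading or extracting updates"
    return "unexpected exit code, see the sa-update documentation"


def run_update():
    """Run sa-update with the options shown above and describe the result."""
    # Assumption: SA_UPDATE can be started directly; on some installations
    # the script is wrapped in a .bat file and the path must be adjusted.
    result = subprocess.run([SA_UPDATE, "--nogpg", "-v"])
    return describe_exit_code(result.returncode)
```

Calling run_update() once a week, for example from the Windows Task Scheduler, would match the recommendation above.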
Testing Your SpamAssassin Installation
From a command line window, in the SPAMSOURCE folder, type:
c:\perl\site\bin\spamassassin -D < sample-nonspam.txt 2>spamdebug.txt
This command should run smoothly. The command-line window shows the
message after it has passed through SpamAssassin. The output should indicate that this
sample message is not spam: look at the
X-Spam-... lines added by SpamAssassin in
the header part of the message.
Please note: it may happen that the file
spamassassin.bat is not
created in the
c:\perl\site\bin folder, but in the
folder. In this case please adjust the suggested commands in the subsequent
Have a look at
spamdebug.txt which has been created by this run.
Check for DNS resolution. In the Received header parsing part of it,
you should see:
dbg: dns: servers obtained from Net::DNS : [...]:53
dbg: dns: nameservers set to ...
dbg: dns: is Net::DNS::Resolver available? yes
At the end of the file, check for the results:
dbg: check: is spam? score=0 required=5
dbg: check: tests=
dbg: check: subtests=__CT,__CTYPE_CHARSET_QUOTED, __CT_TEXT_PLAIN, __DOS_BODY_STOCK, __DOS_BODY_SUN, __DOS_HAS_ANY_URI, __DOS_LINK, __DOS_RCVD_FRI, __FB_PICK, __FB_S_STOCK, __FM_STOCK_WORDS, __HAS_ANY_EMAIL, __HAS_ANY_URI, __HAS_MSGID, __HAS_RCVD, __HAS_SUBJECT, __LAST_UNTRUSTED_RELAY_NO_AUTH, __MIME_VERSION, __MISSING_REF, __MSOE_MID_WRONG_CASE, __NAKED_TO, __NONEMPTY_BODY, __RCVD_IN_SORBS, __RCVD_IN_ZEN, __SANE_MSGID, __TOCC_EXISTS, __YOUR_ACCOUNT
Now let's check if a message is correctly identified as spam. From the SPAMSOURCE folder, type:
c:\perl\site\bin\spamassassin -D < sample-spam.txt 2>spamdebug.txt
The output in the command line window should indicate that this sample message
is spam (look at the X-Spam-... lines added by SpamAssassin
in the header part of the message, and the body of the message which has been modified
Have a look at spamdebug.txt. At the end of the file, check for the results:
dbg: check: is spam? score=999.998 required=5
dbg: check: tests=GTUBE,NO_RECEIVED,NO_RELAYS
dbg: check: subtests=__CT,__CTE,__CT_TEXT_PLAIN,__HAS_MSGID,__HAS_SUBJECT, __MIME_VERSION, __MISSING_REF, __MSGID_OK_HOST, __NONEMPTY_BODY, __SANE_MSGID, __TOCC_EXISTS, __UNUSABLE_MSGID
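Instead of scrolling to the end of spamdebug.txt by hand, the result lines shown in this section can be extracted with a short script. This is a minimal Python sketch that assumes the debug output contains lines in exactly the format shown above. The function name summarize_debug is hypothetical, and the comparison score >= required reflects our understanding of how SpamAssassin decides.

```python
import re

CHECK_RE = re.compile(r"dbg: check: is spam\? score=(-?[\d.]+) required=(-?[\d.]+)")
# Matches "tests=" but not "subtests=", because a space must precede "tests".
TESTS_RE = re.compile(r"dbg: check: tests=(.*)")


def summarize_debug(text):
    """Extract score, threshold and triggered tests from SpamAssassin -D output."""
    summary = {"score": None, "required": None, "tests": [], "is_spam": None}
    for line in text.splitlines():
        match = CHECK_RE.search(line)
        if match:
            summary["score"] = float(match.group(1))
            summary["required"] = float(match.group(2))
            continue
        match = TESTS_RE.search(line)
        if match:
            names = match.group(1).split(",")
            summary["tests"] = [name.strip() for name in names if name.strip()]
    if summary["score"] is not None:
        # Assumption: a message counts as spam when score >= required.
        summary["is_spam"] = summary["score"] >= summary["required"]
    return summary


if __name__ == "__main__":
    with open("spamdebug.txt", encoding="utf-8", errors="replace") as handle:
        print(summarize_debug(handle.read()))
```

For the two sample messages, the summary should report is_spam as False for sample-nonspam.txt and True (with the GTUBE test) for sample-spam.txt.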
The Online Documentation
You can access the documentation at
http://spamassassin.apache.org/full/3.3.x/dist/doc/. The most
important file to read is Mail::SpamAssassin::Conf; it outlines all major
Connect SpamAssassin and PYTHEAS MailGate
If you are upgrading, you are now ready to restart PYTHEAS MailGate.
If you do not have a
pmg-local.cf file, copy this file from the
SpamAssassin support files to
C:\etc\mail\spamassassin. Create this
folder if it does not exist. Use this file to configure the way SpamAssassin
should work for your site. Do not edit the global configuration files in
C:\perl\site\share\spamassassin, as your settings could be lost during
the next upgrade. It is nevertheless a good idea to look at the global configuration
files to learn which parameters can be changed.
Please note: For PYTHEAS MailGate v. 2.75c and earlier, on Microsoft Windows
Server 2012, please avoid folder names containing spaces for temporary storage
of incoming messages.
Copy the files
UNIX2DOS.EXE to the
folder. The downloadable version of the file assumes that Perl has
been installed in the
Please note that we do
not really need
UNIX2DOS.EXE for the current version of
SpamAssassin, but it may be useful for future versions.
Here are some comments about
the contents of
Instructs SpamAssassin to produce diagnostic output (see below). You may change this option to obtain different diagnostic output.
You can also omit this parameter altogether, if you do not need it.
Instructs SpamAssassin to set the exit code depending
on the spam status. PYTHEAS MailGate uses this exit code to pick up the
Instructs SpamAssassin to use the
file, regardless of the user context in which it is running.
%1, %2, %3, %4
PYTHEAS MailGate will always call
with 4 parameters. Please see details below.
Path name of the file containing the message to be checked.
Path name of the file to contain the checked message (this is
being a number from 1 to 12).
Path name of the file to contain the diagnostic output produced
by SpamAssassin (this is always
i being a number from 1 to 12).
Determined by the POP3 account configuration
in PYTHEAS MailGate. Note: the downloadable
sa.cmd includes code to handle the value
NoSpamCheck for this parameter, which does what its name suggests: if you add
Spam-A:NoSpamCheck to the Comment of a
POP3 account, it will be excluded from spam checking.
Exit code or Errorlevel
Since v. 2.31c,
MailGate no longer relies on the exit code (or Errorlevel
value) of the
sa.cmd command file, as previous versions did.
To check your installation, you may use
sapmg.cmd from the
SpamAssassin support files. This
command file calls SpamAssassin the same way PYTHEAS MailGate
does. You will find the message which has been checked by SpamAssassin,
and the diagnostic output
spamdebug.txt, in the folder referenced
by the TEMP environment variable (use the SET command
to show environment variables).
If you activate spam-checking for the first time, you may want to activate it
for a single POP3 account only, with the following options:
- Check incoming mail with SpamAssassin... Only from POP3 accounts with
the word Spam-A in the comment. Put the word Spam-A into the
Comment field of the POP3 account entry.
- Forward messages identified as Spam to... The intended Recipient as
After messages have been spam-checked, look for the following lines in the Remote Control Program
or in the Session Log message:
[11:16] [Spamassassin] Spam status: No, score=-4.9 required=5.0 tests=BAYES_00
[11:06] *** [Spamassassin] Spam status: Yes, score=8.8 required=5.0
tests=BAYES_99, BIZ_TLD, HTML_60_70, HTML_MESSAGE, HTML_TITLE_UNTITLED, HTTP_EXCESSIVE_ESCAPES,
MIME_BASE64_TEXT, MIME_HTML_NO_CHARSET, MIME_HTML_ONLY autolearn=no version=3.4.1
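If you keep the Session Log, the status lines shown above can be evaluated with a script, for example to count how many messages were tagged. The following Python sketch assumes the log lines have exactly the format of the two samples above. The function name parse_status is hypothetical, and only the first line of a wrapped entry is parsed.

```python
import re

STATUS_RE = re.compile(
    r"\[Spamassassin\] Spam status: (Yes|No), "
    r"score=(-?\d+(?:\.\d+)?) required=(-?\d+(?:\.\d+)?)"
)


def parse_status(line):
    """Return spam flag, score and threshold from a log line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return {
        "spam": match.group(1) == "Yes",
        "score": float(match.group(2)),
        "required": float(match.group(3)),
    }


def count_spam(lines):
    """Count the log lines that report a message as spam."""
    results = (parse_status(line) for line in lines)
    return sum(1 for result in results if result and result["spam"])
```

Lines that do not contain a spam status, such as the wrapped tests= continuation lines, are simply ignored.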
In case you have problems:
- Please have a look at
PmgSpamAn.log or at
(you will need to make a copy of these files while the download session is still
in progress, as they will be deleted upon termination). You will find these files
in the folder you specified on the Service Options
page, Incoming mail tab (in v. 2.x:
on the Environment tab of the Configuration
- Did you really restart the computer since you installed Perl for the first
- Did you check the paths in
- Did you create the
C:\etc\mail\SpamAssassin folder? Did you
put your copy of
The SPAMSOURCE folder is no longer needed once the installation
Setting Spam Delivery Options in PYTHEAS MailGate
You have the following options for the delivery of messages which have been identified
- deliver as usual (please note that the spam will have been tagged as such
- always deliver to a particular Recipient
- do not deliver to anybody. If you have configured PYTHEAS MailGate to write a log entry for
every incoming message, messages identified as spam are logged even if they are
not forwarded to any internal Recipient. Such messages
receive a [Spam] tag at the beginning of the message subject.
- Messages with a spam score above a certain level can be handled differently
from spam messages with a spam score below this level.
Specific Configuration Settings for POP3 Accounts
You can activate spam analysis for all POP3 accounts, or only for selected ones.
The Comments field in the POP3 Account properties is used for this
To activate spam detection only for certain POP3 accounts, configure the corresponding
option in the PYTHEAS MailGate configuration (see screen shot above), and type the word Spam-A
anywhere as a separate word into the Comment field of the selected
To use specific SpamAssassin configuration settings for POP3 accounts,
proceed as follows:
- Put the following expression into the Comment field of each POP3
ConfigTag is some identifier (only composed of letters and numbers). It
will be passed as 4th parameter to
- You can now write code in
sa.cmd to switch to different configuration
files, based on this parameter.
- If for a particular POP3 account, no ConfigTag value is found
in the Comments field, the word Nothing is passed as
4th parameter (so you can be sure that your
sa.cmd file always gets
sa.cmd file included in the
SpamAssassin support files
contains code to handle the ConfigTag value of
to exclude a particular POP3 account from spam checking.
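The rules for the Comment field described in this section can be summarized in a few lines of code. This Python sketch is only a model of the documented behavior, not PYTHEAS MailGate's actual implementation. It assumes that the expression has the form Spam-A:ConfigTag (as in the Spam-A:NoSpamCheck example), that words are separated by whitespace, and that matching is case-sensitive. The function names are hypothetical.

```python
import re

# "Spam-A" as a separate word, optionally followed by ":ConfigTag",
# where ConfigTag consists of letters and numbers only.
TAG_RE = re.compile(r"(?<!\S)Spam-A(?::([A-Za-z0-9]+))?(?!\S)")


def parse_comment(comment):
    """Return (spam_a_present, fourth_parameter) for a POP3 account Comment."""
    match = TAG_RE.search(comment)
    if match is None:
        return (False, "Nothing")
    # Without a ConfigTag, the word Nothing is passed as the 4th parameter.
    return (True, match.group(1) or "Nothing")


def skips_spam_check(fourth_parameter):
    """Model of the NoSpamCheck handling in the downloadable sa.cmd."""
    return fourth_parameter == "NoSpamCheck"
```

For example, a Comment of "Sales mailbox Spam-A:Sales2" would yield the fourth parameter Sales2, which a command file could use to select a different configuration file.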
Spam/Ham Learning for SpamAssassin
For spam/ham learning with sa-learn, messages
are needed in text format according to RFC822, with the complete message header
lines. Unfortunately, there does not seem to be an easy way to save messages in
such a format using Microsoft Outlook.
How to save incoming messages to files in RFC822 format
PYTHEAS MailGate v. 2.30c (or later) supports a new way to write messages to disk
files in RFC822 format. This new function is managed by a tag in the
Comment field of POP3 account entries. The name of the tag is
and it has two parameters, which are separated by a vertical bar (ASCII 124):
- a name for a folder (which will be created if it does not exist). Messages
will be saved to this folder. It will be located in
(depending on where your
PMailGat.INI configuration file is
- an age limit (in hours). Any files in this folder older than the age limit will
automatically be deleted. An age limit of 0 (zero) will disable automatic
As an example, adding the expression
the Comment field of a POP3 account entry will save all
messages from this POP3 mailbox to the
of the folder where the PYTHEAS MailGate configuration files are stored,
and any file older than 24 hours in this folder will be cleaned out at the
beginning of the upcoming download
session. Message delivery will continue as usual. Several POP3 mailboxes can
have their messages dropped into the same folder.
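The age-limit rule can be illustrated with a short script. This Python sketch only models the rule described above (files older than the limit are removed, and a limit of 0 disables cleanup). It is not PYTHEAS MailGate's code, and it assumes that the file age is judged by the modification time. The function names expired_files and demo are hypothetical.

```python
import os
import tempfile
import time


def expired_files(folder, age_limit_hours, now=None):
    """List files in `folder` older than the age limit; 0 disables cleanup."""
    if age_limit_hours <= 0:
        return []
    now = time.time() if now is None else now
    cutoff = now - age_limit_hours * 3600
    expired = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Assumption: the age of a file is its last modification time.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            expired.append(path)
    return expired


def demo():
    """Create one old and one recent file, then report which one has expired."""
    with tempfile.TemporaryDirectory() as folder:
        now = time.time()
        for name, age_hours in (("old.eml", 30), ("new.eml", 1)):
            path = os.path.join(folder, name)
            with open(path, "w") as handle:
                handle.write("Subject: test\n\nbody\n")
            stamp = now - age_hours * 3600
            os.utime(path, (stamp, stamp))
        return [os.path.basename(p) for p in expired_files(folder, 24, now)]
```

With an age limit of 24 hours, demo() reports only the 30-hour-old file; with an age limit of 0, expired_files returns an empty list and nothing would be deleted.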
Another way to obtain messages in RFC822 format is to use the View/Delete messages function (accessible from the POP3 account
property page). It has a Save message as function (press F10 to access
it). You should also configure PYTHEAS
MailGate not to delete messages after downloading them, and to clean them up
after a day or two. This way, you can get messages in RFC822 format directly from the POP3
account. This method also lets you collect the messages for which the Bayes engine
does not yield the correct result, so that you can use them to train it.
To streamline the process, you could do the following:
- Set up a folder structure as described in the
SpamAssassin support files package.
- Make shortcuts on the desktop for the programs
LearnSpam.cmd, and the folders
Now the learning procedure could look like this:
- If you configured your POP3 account to have the messages saved to files by
SaveToDisk option (see above), open the
...\Incoming\... folder. Drag-and-drop the messages to the
- Alternatively, you can save the message that you want to feed into the learning process to the desktop (View/Delete
messages, F10, Save message as). Then drag-and-drop the file to the shortcut pointing to the
- Double-click on the shortcut for
(this will feed
all files contained in this folder into sa-learn).
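For readers who prefer a script to the command files, the learning step could be sketched as follows. This Python example is not the content of the command files in the SpamAssassin support files package. It assumes the sa-learn location and the pmg-local.cf path used elsewhere in this document, and it relies on the sa-learn options --spam and --ham, which accept a folder of messages. The function names and the folder names in the usage note are hypothetical.

```python
import subprocess

# Paths as used elsewhere in this document; adjust to your installation.
SA_LEARN = r"c:\perl\site\bin\sa-learn"
PREFS = r"c:\etc\mail\spamassassin\pmg-local.cf"


def learn_command(kind, folder):
    """Build the sa-learn command line for a folder of spam or ham messages."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return [SA_LEARN, "-p", PREFS, "--" + kind, folder]


def learn(kind, folder):
    """Run sa-learn on all messages in `folder` and return its exit code."""
    # Assumption: SA_LEARN can be started directly; on some installations
    # the script is wrapped in a .bat file and the path must be adjusted.
    return subprocess.run(learn_command(kind, folder)).returncode
```

A call such as learn("spam", r"C:\Learn\Spam") would feed every message in that (hypothetical) folder into sa-learn as spam; learn("ham", ...) does the same for legitimate mail.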
Additional instructions for upgrading from SpamAssassin
Before installing a 3.x version of SpamAssassin over a 2.x version, you should
put your Bayes database into a "clean" state:
from a command line prompt, execute:
sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf --rebuild
- Clean the
c:\etc\mail\spamassassin folder: leave only
bayesdb subfolder and its contents; delete all the other
- After installing the 3.x version of SpamAssassin: From a command line prompt, execute...
c:\perl\site\bin\sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf --sync
c:\perl\site\bin\sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf -D --import
to migrate the data into the new DB_File format. Be patient: these commands may take
a couple of minutes to complete, depending on the size of your Bayes database.
- Check that the new version of SpamAssassin works on your machine
(we recommend using the
spam-a.cmd command file included
SpamAssassin support files for this
purpose, because it includes a reference to your
preferences file, which in turn contains the pointer to your Bayes database
c:\etc\mail\spamassassin\bayesdb). Look in the debug output for
configuration options in
pmg-local.cf which may be no longer supported
or which have a new syntax. You may want to compare your configuration file to
the pmg-local.cf file contained in the
SpamAssassin support files.
This document has been inspired by USING SpamAssassin WITH
WIN32, (c) 2002,2004 by Michael Bell (thanks!).
SpamAssassin is a trademark of the Apache Software Foundation.