sa-heatu

version 3.03

A utility to display, edit and age the Spam Assassin Heuristic Email Address Tracker database ~/.spamassassin/auto-whitelist.

HEAT Background

This version contains a significant enhancement:

There is now a parallel hash file containing date entries.

sa-heatu matches entries from auto-whitelist with entries in timestamps. If the count in the auto-whitelist entry is greater than the count in the timestamps entry, the timestamps entry is updated with the current time and count. If there is no timestamps entry, one is created.
Entries that have not been updated in many days are expired.
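
The matching and aging rule above can be sketched in shell. This is only an illustration, not the code inside sa-heatu: the function name age_entry, its arguments, and the labels "updated" and "expired" are hypothetical ("new" and "kept" appear in the sample output further down), and whether the real comparison is "older than" or "at least" the limit is an assumption.

```shell
#!/bin/sh
# Hypothetical helper, not part of sa-heatu: decide what happens to one entry.
# $1 = count in auto-whitelist
# $2 = count stored in timestamps ("" if there is no timestamps entry)
# $3 = time stored in timestamps, in epoch seconds ("" if none)
# $4 = current time, in epoch seconds
# $5 = expiry limit in days (sa-heatu's documented default is 183)
age_entry() {
    awl_count=$1; ts_count=$2; ts_time=$3; now=$4; max_days=$5
    if [ -z "$ts_count" ]; then
        echo "new"        # no timestamps entry yet: one is created
        return
    fi
    if [ "$awl_count" -gt "$ts_count" ]; then
        echo "updated"    # more mail seen: refresh time and count
        return
    fi
    age_days=$(( (now - ts_time) / 86400 ))
    if [ "$age_days" -gt "$max_days" ]; then
        echo "expired"    # not written to the output files
    else
        echo "kept"
    fi
}

age_entry 5 "" "" 1000000 183
age_entry 5 5 0 17280000 183
```

The first call has no timestamps entry, so it reports "new"; the second has an unchanged count and a 200 day old timestamp, so it reports "expired".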

This utility now operates in a "current file in, new file out" mode, as opposed to the previous "update in place" mode. Because deleting entries from a hash file does not reduce its size, writing new files is what lets the hash files shrink when entries expire.

This very simple-minded approach to aging permits expiring old entries without any impact on spamassassin's operation.


sa-heatu is used to maintain the database

sa-heatu --quiet | --showUpdates | --verbose
--firstTimes | --DONTupdateTimestamps | --noTimestamps
--expireOlderThan days
--remove nnnnnn@dddddd.xxx [dbFile [timeStampFile ]]

--quiet                      Don't output anything.
--showUpdates                Output entries updated, added or removed, in addition to the summary.
--verbose                    Output every entry.
                             Warning: the output should be piped to a filter or redirected to a file, as it will be very long.
--firstTimes                 Use this for the first run to avoid reading timestamps.
--DONTupdateTimestamps
--noTimestamps               No timestamps processing is done.
--expireOlderThan days       Expire entries older than days, i.e. they are not written to the output files.
                             Default: 183
--remove nnnnnn@dddddd.xxx   Remove entries with this email address and any IP address.
dbfile                       Score and count database name (perl hash file). Default: auto-whitelist
timestamps                   Timestamps database (perl hash file). Default: timestamps
-h
sa-heatu Spam Assassin - Heuristic Email Address Tracker Utility  3.03 101129  
                    DGermansa@Real-world-Systems.com (c)2010 Dennis G German  

 usage: sa-heatu --quiet --showUpdates --verbose  
                 --firstTimes --DONTupdateTimestamps  --noTimestamps   
                 --expireOlderThan days 
                 --remove nnnnnn@dddddd.xxx                   dbfile timestamps 

Recommended Operation

Run daily from cron. Suggested script:

# Assumes the databases are in ~/.spamassassin.
cd ~/.spamassassin || exit 1
# Remove output files left over from an earlier run, before sa-heatu writes new ones.
rm -f auto-whitelisto timestampso
/usr/local/bin/sa-heatu --showUpdates >> sa-heatu.log
if [ $? -ne 0 ]; then echo "sa-heatu failed, databases unmoved!"; exit 1; fi
#
# Now save the old files to xxx-1 and install the ones output by sa-heatu.
mv -f auto-whitelist  auto-whitelist-1 ; if [ $? -ne 0 ]; then echo "mv auto-whitelist  failed, I quit!";exit 1; fi
mv auto-whitelisto auto-whitelist;       if [ $? -ne 0 ]; then echo "mv auto-whitelisto failed, I quit!";exit 1; fi
mv -f timestamps  timestamps-1;          if [ $? -ne 0 ]; then echo "mv timestamps      failed, I quit!";exit 1; fi
mv timestampso timestamps;               if [ $? -ne 0 ]; then echo "mv timestampso     failed, I quit!";exit 1; fi
echo "auto-whitelist, timestamps updated"

Output from sa-heatu can be sorted to display frequent (or rare) senders of spam (or ham).

Running sa-heatu --verbose should be avoided unless the output is redirected to a file or piped to a filter, since the database contains a (surprisingly) large number of entries.


Display ham senders:

(Remember: the date and time stamp is the time sa-heatu was run, not the time the email was received.)

sa-heatu --verbose --DONTupdateTimestamps | sort -n | head -5

 average     total count email address                               ip network address   last time updated
   -19.3     -96.3   5   jason.haar@trimble.co.nz                    222.154; kept, Aug 20 21:24 2010
   -19.3     -96.3   5   karliak@ajetaci.cz                          77.48; kept, Aug 20 21:24 2010
   -19.3    -115.6   6   scheidell@secnap.net                        204.89; new,
   -19.3    -115.6   6   si@yacc.co.uk                               62.232; new, Aug 27 21:59 2010
   -19.3    -134.9   7   mkitchin.public@gmail.com                   66.238; kept, Aug 20 21:24 2010

Display spammers:

sa-heatu --verbose --DONTupdateTimestamps |sort -rn | head -4

    61.8     123.5   2   claims_office001@kimo.com                    221.2; kept,Aug 20 21:24 2010
    60.8      60.8   1   mr.williams.wright@gmail.com                 82.128; kept, Aug 20 21:24 2010
    56.2     112.4   2   danjos_01@yahoo.com                          41.26; kept, Aug 20 21:24 2010
    55.2     110.5   2   danjos_01@yahoo.com                          67.205; kept, Aug 20 21:24 2010

Find senders whose messages are incorrectly adjusted.

To display a single sender's record:
    sa-heatu --noTimestamps --verbose | grep -i Spammer@example.com


Remove the entries for a particular email address, for all IP networks:
   sa-heatu --noTimestamps --remove spammer@example.com
   mv -f ~/.spamassassin/auto-whitelisto ~/.spamassassin/auto-whitelist


Included in the tar is 64c.hexdump which is a formatting specification file for hexdump which can be used to display the timestamps and other perl hash files.

hexdump -f 64c.hexdump timestamps

See www.Real-World-Systems.com/docs/hexdump.1.html


HEAT Background

The Heuristic Email Address Tracker feature in spamassassin retains a summary of scores from messages received by email address and IP network address.
When a new message is received, the final score is adjusted as a function of the previous average value resulting in a:
  • boost  from senders who have sent ham (nice messages) or a
  • penalty for senders who have sent spam

    The final SCORE of a message is calculated by:
  1. SCORE based on rules
  2. Compute DELTA as (MEAN-SCORE)*auto_whitelist_factor (from configuration)
  3. Bump SCORE by DELTA
The result is compared against required_score and if the message score is greater, it is considered spam.
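
As a worked example of the three steps, the arithmetic can be reproduced with a few lines of shell and awk. This is a sketch only: the function name adjust_score is hypothetical, the input scores are made up for illustration, and 0.5 is used here as an illustrative auto_whitelist_factor (check your own configuration).

```shell
#!/bin/sh
# Hypothetical helper, not part of spamassassin or sa-heatu.
# $1 = SCORE based on rules
# $2 = MEAN of the sender's previous scores
# $3 = auto_whitelist_factor
adjust_score() {
    awk -v score="$1" -v mean="$2" -v factor="$3" 'BEGIN {
        delta = (mean - score) * factor    # step 2: compute DELTA
        printf "%.2f\n", score + delta     # step 3: bump SCORE by DELTA
    }'
}

adjust_score 10 -19.3 0.5    # spam-like message, sender with a ham history
adjust_score 2 61.8 0.5      # mild message, sender with a spam history
```

The first call prints -4.65: a message that scored 10 on rules alone is pulled below zero by the sender's history. The second prints 31.90: a message that scored only 2 is pushed well above it.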

Negative values indicate senders of ham, positive values senders of spam.

The sender's email address, the IP address, the accumulated score, and the number of emails received are stored in a perl hash.

Spammers have been known to use this to their advantage by sending a benign email which scores high as ham. They then send spam which has its score "neutralized" by the Heuristic Email Address Tracker scheme, and the message will be, falsely, considered ham!
If you receive a message that is clearly spam, check X-Spam-Report in the message header for the string:
     AWL: From: address
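
One way to make that check on a message saved to a file is to search only its header. A minimal sketch, assuming the message is stored as a plain file; the function name show_awl_hit and the file name are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the AWL line(s) found in a message header.
# $1 = path to a saved message file (an assumption; adapt to your mail store).
show_awl_hit() {
    # The header ends at the first empty line, so stop reading there
    # and search only what came before it.
    sed '/^$/q' "$1" | grep 'AWL:'
}

# Example call; suspect.eml is a placeholder name.
if [ -f suspect.eml ]; then
    show_awl_hit suspect.eml
fi
```

The function returns grep's exit status, so it can also be used in a test such as: if show_awl_hit file; then ...; fi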

There is no mechanism within spamassassin to remove incorrect entries from the database.

Although this is a small amount of data, no mechanism is provided within spamassassin to expire old entries.

There has long been a discussion regarding the significantly misnamed AWL (Auto White List).

Next revision

After significant analysis and thought, it seems that giving a bonus to users who send ham is not necessary. People who usually send you nice emails will continue to send nice emails and don't need any help in scoring.
Giving hammers a bonus has the unfortunate consequence that if ham is received from a spammer, a subsequent message is given a little slack, i.e. a bonus, for having sent ham previously. Spammers have taken advantage of this.
The PLANNED future option --dehammer deletes all entries with a negative score, i.e. senders who previously sent ham.
Please send me a message if you have comments on this, or anything else about sa-heatu.

This is an enhanced version of the original tool.

This document and the current version of sa-heatu can be downloaded at: sa-heatu.3.xx.tar



Previous versions of this utility included --prune, which has been deprecated. The idea was that:
"The size of the database can be significantly reduced by using:     sa-heat --prune This caused any entry with only 1 entry to be removed on the (somewhat mistaken) assumption that an emailer that has only sent 1 email isn't worth remembering."

This is definitely mistaken: a spammer who has recently sent a message may soon send another.