SpamAssassin
sa-learn: Bayes learner, version 3.2.4

sa-learn [options] [file]...

Given a "typical" selection of incoming mail classified as spam or ham (non-spam), feed each mail to SpamAssassin, allowing it to 'learn' what signs are likely to mean spam, and which are likely to mean ham.

Run it once for each mail folder, and it will 'learn' from the mail in that folder.

Globbing of the folder name is supported: * will scan every folder that matches (see Mail::SpamAssassin::ArchiveIterator for details).

SpamAssassin remembers which messages it has already learnt and will not learn them again unless you use the --forget option. Messages learnt as spam have their SpamAssassin markup removed on the fly.

If you make a mistake and learn a message as ham when it is spam, or vice versa, simply rerun this command with the correct classification and the mistake will be corrected: SpamAssassin automatically 'forgets' the previous classification.
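To script the learning and correction steps above, a minimal Python sketch could wrap sa-learn with subprocess. It only uses the --spam, --ham and --mbox options from the list below; the function names and folder paths are hypothetical. No shell is involved, so a literal * reaches sa-learn unexpanded and the globbing is left to sa-learn itself, as described above.

```python
import subprocess
from typing import List, Sequence

def sa_learn_cmd(kind: str, paths: Sequence[str], mbox: bool = False) -> List[str]:
    """Build (but do not run) an sa-learn command that learns `paths` as spam or ham."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = ["sa-learn", "--" + kind]
    if mbox:
        cmd.append("--mbox")  # input files are in mbox format
    cmd.extend(paths)         # passed as-is: a literal "*" is expanded by sa-learn, not a shell
    return cmd

def learn(kind: str, paths: Sequence[str], mbox: bool = False) -> str:
    """Run sa-learn and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(sa_learn_cmd(kind, paths, mbox),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

For example, learn("ham", ["/home/user/Mail/false-positives"]) (a hypothetical folder) would re-learn a misfiled folder as ham; per the text above, the earlier spam classification is forgotten automatically.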

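spamd users: to train remotely over the network, use spamc's -L (--learntype) option, with spamd started with --allow-tell. Do not confuse it with sa-learn's own -L, which means --local (no network access).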

--ham | (--spam) Learn messages as ham, i.e. non-spam (as spam)
--mbox | (--mbx) Input sources are in mbox (mbx) format
-f file
--folders=file
Read list of files / directories from file
--forget Forget a message from STDIN
--use-ignores Use bayes_ignore_from and bayes_ignore_to
--sync | (--no-sync) Synchronize (skip synchronization of) the database and the journal if needed
--force-expire Force a database sync and expiry run
--dump [all|data|magic] Display the contents of the Bayes database
--regexp re With --dump, select which tokens to dump by regular expression
--showdots|
(--progress)
Show progress using dots (progress bar)
-L
--local
Operate locally, no network accesses
--import Migrate data from older version/non DB_file based databases
--clear Wipe out existing database
--backup |
(--restore filename)
Back up the existing database to STDOUT (redirect it with > file)
Restore a database from filename
--dbpath Override, from the command line, where to read the Bayes DB from (given in bayes_path form)
-u username
--username=username
Override username taken from the runtime environment, used with SQL
-C path
--configpath=path
--config-file=path
Path to standard configuration dir
-p file
--prefspath=file
--prefs-file=file
Set user preferences file (default: ~/.spamassassin/user_prefs)
--siteconfigpath=path Path for site configs (default: /etc/mail/spamassassin)
--cf='config line' Additional line of configuration
-D
--debug [area=n,...]
If no areas are listed, all debugging information is printed.
Diagnostic output can also be enabled for individual areas; for example, for bayes, learn and dns:
spamassassin -D bayes,learn,dns
For more information about which areas (also known as channels) are available, please see: wiki.apache.org/spamassassin/DebugChannels
-V --version Print version
-h --help Print usage message
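The --backup / --restore pair can also be scripted. A minimal Python sketch that builds both commands and performs the "> file" redirection in Python instead of the shell; the helper names and paths are hypothetical, and --dbpath is included only to show where that override would go.

```python
import subprocess
from typing import List, Optional

def _base(dbpath: Optional[str]) -> List[str]:
    # --dbpath overrides where the Bayes DB lives (bayes_path form)
    return ["sa-learn"] + (["--dbpath", dbpath] if dbpath else [])

def backup_cmd(dbpath: Optional[str] = None) -> List[str]:
    return _base(dbpath) + ["--backup"]

def restore_cmd(backup_file: str, dbpath: Optional[str] = None) -> List[str]:
    return _base(dbpath) + ["--restore", backup_file]

def backup_to_file(out_path: str, dbpath: Optional[str] = None) -> None:
    """Same effect as the shell redirection `sa-learn --backup > out_path`."""
    with open(out_path, "w") as fh:
        subprocess.run(backup_cmd(dbpath), stdout=fh, check=True)

def restore_from_file(backup_file: str, dbpath: Optional[str] = None) -> None:
    """Run `sa-learn --restore backup_file`; raises CalledProcessError on failure."""
    subprocess.run(restore_cmd(backup_file, dbpath), check=True)
```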
 

 sa-learn --dump magic
 0.000          0          3          0  non-token data: bayes db version
 0.000          0     261396          0  non-token data: nspam
 0.000          0      18089          0  non-token data: nham
 0.000          0     148790          0  non-token data: ntokens 
 0.000          0 1230126517          0  non-token data: oldest atime
 0.000          0 1236139617          0  non-token data: newest atime
 0.000          0 1236140767          0  non-token data: last journal sync atime
 0.000          0 1235651034          0  non-token data: last expiry atime
 0.000          0    5529600          0  non-token data: last expire atime delta
 0.000          0      10952          0  non-token data: last expire reduction count
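The atime values in this dump are Unix timestamps, which are easier to read as dates. A small Python sketch, assuming the layout shown above where each value sits in the third column; the function names are hypothetical.

```python
import re
from datetime import datetime, timezone

# "<prob> <n> <value> <n>  non-token data: <label>", as printed by sa-learn --dump magic
MAGIC_LINE = re.compile(r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data:\s*(.+?)\s*$")

def parse_magic(text: str) -> dict:
    """Map each non-token label (e.g. 'nspam') to its integer value."""
    values = {}
    for line in text.splitlines():
        m = MAGIC_LINE.match(line)
        if m:
            values[m.group(2)] = int(m.group(1))
    return values

def as_utc(epoch: int) -> str:
    """Format a Unix timestamp such as 'oldest atime' as a UTC date and time."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

SAMPLE = """\
 0.000          0     261396          0  non-token data: nspam
 0.000          0      18089          0  non-token data: nham
 0.000          0 1230126517          0  non-token data: oldest atime
"""

magic = parse_magic(SAMPLE)
print(magic["nspam"], magic["nham"], as_utc(magic["oldest atime"]))
# prints: 261396 18089 2008-12-24 13:48:37
```

Applied to the dump above, the oldest and newest atimes come out as 2008-12-24 and 2009-03-04 (UTC), about 70 days apart.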
 

 sa-learn --dump data
(columns: probability, spam count, ham count, atime, token hash)
0.062       2274       2400 1236137441  c0614089c0
0.507      13202        889 1236085607  2dd27dc5f9
0.001          2        226 1236119931  461312c98e
0.003          8        170 1235173556  262e33315c
…     (148790 lines in total)
0.016          0          1 1235685679  da21efbad4
0.016          0          1 1235706793  2f834646e6
0.987          1          0 1235769482  ab8e7006c3
0.987          1          0 1236111118  77f749b43b
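The probability column can be recomputed from the two count columns plus the nspam/nham totals of the --dump magic output above. A minimal Python sketch, assuming SpamAssassin applies Gary Robinson's smoothing f(w) = (s·x + n·p(w)) / (s + n) to the normalised spam ratio p(w), with s = 0.030 and x = 0.538. Those two constants are an assumption (recalled from the Bayes plugin source, not stated on this page), but with nspam = 261396 and nham = 18089 they reproduce the sample rows above to three decimals.

```python
def token_spam_prob(spam: int, ham: int, nspam: int, nham: int,
                    s: float = 0.030, x: float = 0.538) -> float:
    """Robinson-smoothed spamminess of one token (s and x are assumed constants)."""
    if spam == 0 and ham == 0:
        return x                      # no evidence at all: fall back to the prior
    spam_ratio = spam / nspam         # normalise: far more spam than ham was learnt
    ham_ratio = ham / nham
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = spam + ham                    # amount of evidence behind p
    return (s * x + n * p) / (s + n)  # pull p towards x when n is small

NSPAM, NHAM = 261396, 18089           # from the --dump magic output above
for spam, ham in [(2274, 2400), (13202, 889), (0, 1), (1, 0)]:
    print(spam, ham, round(token_spam_prob(spam, ham, NSPAM, NHAM), 3))
# prints 0.062, 0.507, 0.016 and 0.987 in the last column
```

The single-occurrence rows show the smoothing at work: a token seen only once ends up at 0.016 or 0.987 rather than at 0 or 1.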

SpamAssassin database sync (sa-learn --sync)

Output from --sync -D (debug)
dbg: locker: safe_lock: link to /home/dauser/.spamassassin/bayes.lock: link ok
dbg: bayes: tie-ing to DB file R/W /home/dauser/.spamassassin/bayes_toks
dbg: bayes: tie-ing to DB file R/W /home/dauser/.spamassassin/bayes_seen
dbg: bayes: found bayes db version 3
dbg: locker: refresh_lock: refresh /home/dauser/.spamassassin/bayes.lock
dbg: bayes: DB expiry: tokens in DB: 148790, Expiry max size: 150000, 
                   Oldest atime: 1230126517, 
                   Newest atime: 1236139617, 
                   Last expire:  1235651034,    
                   Current time: 1236140791
dbg: bayes: expiry completed
dbg: bayes: untie-ing
dbg: bayes: files locked, now unlocking lock
dbg: locker: safe_unlock: unlink /home/dauser/.spamassassin/bayes.lock
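The "DB expiry" debug message packs several counters into one line. A hedged Python sketch that extracts them and derives three readable figures: remaining headroom below "Expiry max size" (presumably the bayes_expiry_max_db_size setting, default 150000), the span in days between the oldest and newest token atimes, and the days since the last expiry. It does not reproduce SpamAssassin's own rules for deciding when to expire; the field names are copied from the log and the function names are hypothetical.

```python
import re

FIELDS = ("tokens in DB", "Expiry max size", "Oldest atime",
          "Newest atime", "Last expire", "Current time")
PATTERN = re.compile(r"(%s):\s*(\d+)" % "|".join(map(re.escape, FIELDS)))

def parse_expiry(log: str) -> dict:
    """Extract the integer counters from a 'bayes: DB expiry:' debug message."""
    return {name: int(value) for name, value in PATTERN.findall(log)}

def summarize(f: dict) -> dict:
    day = 86400  # seconds
    return {
        "headroom_tokens": f["Expiry max size"] - f["tokens in DB"],
        "token_age_span_days": (f["Newest atime"] - f["Oldest atime"]) / day,
        "days_since_last_expire": (f["Current time"] - f["Last expire"]) / day,
    }

LOG = """dbg: bayes: DB expiry: tokens in DB: 148790, Expiry max size: 150000,
                   Oldest atime: 1230126517,
                   Newest atime: 1236139617,
                   Last expire:  1235651034,
                   Current time: 1236140791"""

print(summarize(parse_expiry(LOG)))
```

For the log above this gives 1210 tokens of headroom, an atime span of about 69.6 days, and about 5.7 days since the last expiry.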

backup format (output of sa-learn --backup):

v   3   db_version # this must be the first line!!!
v   261603  num_spam
v   19752   num_nonspam
t   2275    2542    1239258238  c0614089c0
t   13207   1334    1239311789  2dd27dc5f9
t   2   275 1239303031  461312c98e
t   8   189 1239254329  262e33315c
t   1   107 1239123759  91919f0fac
t   3   206 1239312110  90775ea219 
… 
s   s   f0a5362fbb8e433dd8378e7986bb8d80edffbc03@sa_generated
s   h   266cee79307b372d7104c3dae598dca8acb9e15e@sa_generated
s   s   00ff668488135f17f4ac7c5a2ab7f573abfbef2f@sa_generated
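A small Python parser for the backup format above, useful for checking a backup before restoring it. It assumes three record types, read off the sample: "v" (value, name), "t" (spam count, ham count, atime, token) and "s" (s/h flag, presumably spam/ham, then message id). It splits on any whitespace, so tabs or the spaces shown here both work; the function name is hypothetical.

```python
from typing import Dict, Tuple

def parse_backup(text: str):
    """Split `sa-learn --backup` output into variables, tokens and seen messages."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    first = lines[0].split() if lines else []
    if first[:1] != ["v"] or first[2:3] != ["db_version"]:
        raise ValueError("first record must be 'v <version> db_version'")
    variables: Dict[str, int] = {}
    tokens: Dict[str, Tuple[int, int, int]] = {}
    seen: Dict[str, str] = {}
    for ln in lines:
        kind, *fields = ln.split()      # any whitespace: tabs or spaces
        if kind == "v":                 # v <value> <name> [# comment]
            variables[fields[1]] = int(fields[0])
        elif kind == "t":               # t <spam count> <ham count> <atime> <token>
            spam, ham, atime = (int(f) for f in fields[:3])
            tokens[fields[3]] = (spam, ham, atime)
        elif kind == "s":               # s <s|h> <message id>
            seen[fields[1]] = fields[0]
        else:
            raise ValueError("unknown record type: %r" % kind)
    return variables, tokens, seen

SAMPLE = "\n".join([
    "v\t3\tdb_version # this must be the first line!!!",
    "v\t261603\tnum_spam",
    "v\t19752\tnum_nonspam",
    "t\t2275\t2542\t1239258238\tc0614089c0",
    "s\th\t266cee79307b372d7104c3dae598dca8acb9e15e@sa_generated",
])

variables, tokens, seen = parse_backup(SAMPLE)
print(variables, len(tokens), len(seen))
```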
SpamAssassin configuration

Original sa-learn

On an ongoing basis, keep training the filter so that it has fresh data to work from.


site config

 /usr/local/cpanel/etc/mail/spamassassin/
    750 Jun  6  2019  BAYES_POISON_DEFENSE.cf
   4233 Jun  6  2019  CPANEL.cf
 488518 Aug 25 03:49  KAM.cf
   4983 Sep 14  2020  deadweight2.cf
  31173 Aug 25 03:49  deadweight2_meta.cf
   8342 Aug 25 03:49  deadweight2_sub.cf
  23681 Aug 25 03:49  deadweight.cf
   1316 Aug 25 03:49  kam_heavyweights.cf
   3504 Nov 18  2015  P0f.cf
   1869 May 29  2018  user_prefs.template

   4073 Dec  7 03:50  local.cf
   3213 May 11  2020  local.cf.rpmnew

   1194 Sep 16 18:29  init.pre
   2524 Sep 16 18:29  v310.pre
   1194 Sep 16 18:29  v312.pre
   1237 Sep 16 18:29  v330.pre
   2416 May 11  2020  v320.pre.rpmnew
   2412 Feb  2  2017  v320.pre

   4096 Feb  1 03:53  sa-update-keys/
#BAYES_POISON_DEFENSE.cf - SpamAssassin Rules
bayes_auto_learn_threshold_spam 8.0        # Default is 6.0
bayes_auto_learn_threshold_nonspam -2.0    # Default is +0.1

#KAM.cf
#Author: Kevin A. McGrail et al
#HomePage: http://www.mcgrail.com/downloads/KAM.cf the McGrail Foundation
#Installation: There are multiple files that make up the KAM ruleset including
#heavyweight, deadweight, & nonKAMrules. The KAM ruleset is now a channel!
#
#Please see https://mcgrail.com/template/kam.cf_channel for more information
#The ruleset includes internal rules so not every rule will be useful but
#we encapsulate those in a KAMOnly defined loop.
#KAM.cf is maintained by The McGrail Foundation, a 501(c)(3) charity. Donations
#are appreciated. See www.mcgrail.com for more information on donations and
#sponsorships.
#THANK YOU TO OUR SPONSORS (in Alphabetical Order):
#cPanel, INKY, Invaluement, iSpark, Linode, PCCC, ShipShapeIT and Zix/Appriver
#This is a collection of special rules that I have developed and use on my system.
#
#The exact date is lost to the sands of time but we have been publishing this
#ruleset since at least May 2004.
#
#They are intended as live research for committal to SpamAssassin's SVN sandbox but
#often rely on my corpora so they do not fair well in masschecks.
#
#You are welcome and encouraged to email me directly regarding suggestions.
#To avoid being caught by our filters, False positives and negatives should be
#submitted to https://raptor.pccc.com/raptor.cgim?template=report_problem
#
#I believe the rules are safe and they are in use on production systems so I will
#do my best to respond to FPs *especially* if you can send me an email sample.
#
#IMPORTANT: This cf file is designed for systems with a threshold of 5.0 or higher.
#It is best to save an email sample in mbox format and zip it to attach to get
#around my filters. It is sometimes best to send samples in a second email so I
#know to go looking for it in my spam folders.
#
#NOTE: I do use some poison pill (i.e. Automatic HAM/SPAM rules).
#
# - I don't view many of my rules as single rules as I typically use meta rules.
#   I view meta rules as multiple rules hence a larger score is acceptable.
#
# - Some content needs to be blocked either due to large number of complaints or
#   for content. For example, the sexually explicit items and the stock tips.
#   FPs in these rules will be quickly addressed.
# COURTESY OF Marcin Mirosław
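BAYES_POISON_DEFENSE.cf raises the spam auto-learn threshold to 8.0 and lowers the ham one to -2.0, so fewer borderline messages are fed to Bayes automatically. A simplified Python sketch of how the two thresholds split the score range. It assumes a score at or above the spam threshold is auto-learned as spam and a score strictly below the ham threshold as ham; the boundary handling is an assumption. SpamAssassin's real check is more involved: it ignores some rules (including the Bayes rules) when computing the score it compares, and it requires points from both headers and body before learning spam. The function name is hypothetical.

```python
from typing import Optional

# Values set in BAYES_POISON_DEFENSE.cf
SPAM_THRESHOLD = 8.0    # bayes_auto_learn_threshold_spam
HAM_THRESHOLD = -2.0    # bayes_auto_learn_threshold_nonspam

def autolearn_decision(score: float,
                       spam_at: float = SPAM_THRESHOLD,
                       ham_below: float = HAM_THRESHOLD) -> Optional[str]:
    """Simplified: return 'spam', 'ham', or None when the message is not auto-learned.

    Assumes the spam threshold is inclusive and the ham threshold exclusive;
    the plugin's extra checks and rule exclusions are not modelled here.
    """
    if score >= spam_at:
        return "spam"
    if score < ham_below:
        return "ham"
    return None

for s in (9.3, 8.0, 5.0, -2.0, -3.1):
    print(s, autolearn_decision(s))
```

With these settings a message scoring 5.0, which a 5.0 threshold would still tag as spam, is neither learned as spam nor as ham.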