sa-learn [options] [file]...
Given a "typical" selection of incoming mail classified as spam or ham (non-spam), feed each mail to SpamAssassin, allowing it to 'learn' what signs are likely to mean spam, and which are likely to mean ham.
Run this command once for each mail folder, and it will 'learn' from the mail in that folder.
Globbing of the folder argument is supported; a * pattern scans every folder that matches. See Mail::SpamAssassin::ArchiveIterator for more details.
SpamAssassin remembers which mail messages it has already learnt and will not re-learn them unless you first use the --forget option. Messages learnt as spam have any SpamAssassin markup removed on the fly.
If you make a mistake and scan a mail as ham when it is spam, or vice versa, simply rerun this command with the correct classification and the mistake will be corrected: SpamAssassin automatically 'forgets' the previous indication.
spamd users: to perform training remotely, over a network, see the -L switch of spamc.
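As a rough illustration of the workflow described above, the following Python sketch only builds the sa-learn command line for a folder; it does not run anything. The helper name `sa_learn_argv` and the folder paths are hypothetical, while `--ham`, `--spam`, `--forget` and `--mbox` are sa-learn options.

```python
import shlex


def sa_learn_argv(classification, paths, mbox=False):
    """Build (but do not run) an sa-learn argument list.

    classification: "ham", "spam" or "forget".
    paths: mail folders or files; shell-style globs are passed through as-is.
    mbox: add --mbox when the inputs are mbox files.
    """
    if classification not in ("ham", "spam", "forget"):
        raise ValueError("classification must be 'ham', 'spam' or 'forget'")
    argv = ["sa-learn", "--" + classification]
    if mbox:
        argv.append("--mbox")
    argv.extend(paths)
    return argv


# Hypothetical folder names, for illustration only.
print(shlex.join(sa_learn_argv("spam", ["Maildir/.Junk/cur"])))
print(shlex.join(sa_learn_argv("ham", ["mail/inbox.mbox"], mbox=True)))
```

Correcting a mistake needs no special handling in such a wrapper: building the command again with the other classification is enough, since SpamAssassin forgets the earlier indication itself.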
--ham | --spam: learn the given messages as ham (non-spam) or as spam.
sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0     261396          0  non-token data: nspam
0.000          0      18089          0  non-token data: nham
0.000          0     148790          0  non-token data: ntokens
0.000          0 1230126517          0  non-token data: oldest atime
0.000          0 1236139617          0  non-token data: newest atime
0.000          0 1236140767          0  non-token data: last journal sync atime
0.000          0 1235651034          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0      10952          0  non-token data: last expire reduction count
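The magic dump above can be read mechanically. The sketch below is a minimal Python parser, assuming (from the listing itself, not from documented behaviour) that the third numeric column carries the value and that the atime fields are Unix epoch seconds. The function name `parse_magic` is hypothetical.

```python
from datetime import datetime, timezone

# A few lines copied from the listing above.
MAGIC_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0     261396          0  non-token data: nspam
0.000          0      18089          0  non-token data: nham
0.000          0     148790          0  non-token data: ntokens
0.000          0 1230126517          0  non-token data: oldest atime
0.000          0 1236139617          0  non-token data: newest atime
"""


def parse_magic(text):
    """Map each 'non-token data' label to its integer value."""
    values = {}
    for line in text.splitlines():
        fields, sep, label = line.partition("non-token data:")
        if not sep:
            continue  # not a magic line
        # Assumption: the value sits in the third numeric column.
        values[label.strip()] = int(fields.split()[2])
    return values


magic = parse_magic(MAGIC_DUMP)
# Assumption: atimes are seconds since the Unix epoch.
oldest = datetime.fromtimestamp(magic["oldest atime"], tz=timezone.utc)
print(magic["nspam"], magic["nham"], magic["ntokens"], oldest.isoformat())
```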
sa-learn --dump data
0.062       2274       2400 1236137441  c0614089c0
0.507      13202        889 1236085607  2dd27dc5f9
0.001          2        226 1236119931  461312c98e
0.003          8        170 1235173556  262e33315c
… 148790 lines !
0.016          0          1 1235685679  da21efbad4
0.016          0          1 1235706793  2f834646e6
0.987          1          0 1235769482  ab8e7006c3
0.987          1          0 1236111118  77f749b43b
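Each data line above appears to hold a probability, the token's spam count, its ham count, an atime and the hashed token. The following Python sketch shows one way the first column can be reproduced from the counts, using a Robinson-style smoothed estimate. The constants `S = 0.030` and `X = 0.538` are an assumption: they were chosen because they reproduce the probabilities in the listing when combined with the nspam and nham totals from the magic dump, and should not be read as a documented interface.

```python
# Assumed smoothing constants (strength and prior); see the note above.
S = 0.030
X = 0.538


def token_probability(spam_hits, ham_hits, nspam, nham):
    """Estimate how 'spammy' a token is from its counts.

    spam_hits / ham_hits: messages of each class containing the token.
    nspam / nham: total messages learnt as spam / ham.
    """
    n = spam_hits + ham_hits
    if n == 0:
        return X  # no evidence: fall back to the prior
    ratio_spam = spam_hits / nspam
    ratio_ham = ham_hits / nham
    raw = ratio_spam / (ratio_spam + ratio_ham)
    # Pull the raw estimate towards the prior when evidence is thin.
    return (S * X + n * raw) / (S + n)


NSPAM, NHAM = 261396, 18089  # totals from the magic dump
for spam_hits, ham_hits in [(2274, 2400), (13202, 889), (0, 1), (1, 0)]:
    p = token_probability(spam_hits, ham_hits, NSPAM, NHAM)
    print(spam_hits, ham_hits, round(p, 3))
```

The smoothing explains why a token seen once, only in ham, is listed at 0.016 rather than 0, and one seen once, only in spam, at 0.987 rather than 1.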
dbg: locker: safe_lock: link to /home/dauser/.spamassassin/bayes.lock: link ok
dbg: bayes: tie-ing to DB file R/W /home/dauser/.spamassassin/bayes_toks
dbg: bayes: tie-ing to DB file R/W /home/dauser/.spamassassin/bayes_seen
dbg: bayes: found bayes db version 3
dbg: locker: refresh_lock: refresh /home/dauser/.spamassassin/bayes.lock
dbg: bayes: DB expiry: tokens in DB: 148790, Expiry max size: 150000, Oldest atime: 1230126517, Newest atime: 1236139617, Last expire: 1235651034, Current time: 1236140791
dbg: bayes: expiry completed
dbg: bayes: untie-ing
dbg: bayes: files locked, now unlocking lock
dbg: locker: safe_unlock: unlink /home/dauser/.spamassassin/bayes.lock
v 3 db_version # this must be the first line!!!
v 261603 num_spam
v 19752 num_nonspam
t 2275 2542 1239258238 c0614089c0
t 13207 1334 1239311789 2dd27dc5f9
t 2 275 1239303031 461312c98e
t 8 189 1239254329 262e33315c
t 1 107 1239123759 91919f0fac
t 3 206 1239312110 90775ea219
…
s s f0a5362fbb8e433dd8378e7986bb8d80edffbc03@sa_generated
s h 266cee79307b372d7104c3dae598dca8acb9e15e@sa_generated
s s 00ff668488135f17f4ac7c5a2ab7f573abfbef2f@sa_generated
Spam Assassin configuration
Original sa-learn
Keep training the filter on an ongoing basis so that it always has fresh data to work from.
auto-learning
Based on statistical analysis of the success rates, SpamAssassin can automatically train the database when it has a certain degree of confidence that the training data is accurate. Auto-learning is enabled by default; to disable it, set bayes_auto_learn to 0.
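The idea behind auto-learning can be sketched as a simple threshold test on a message's score. The Python below is a deliberately simplified illustration, not SpamAssassin's actual logic: the function name is hypothetical, the boundary handling (`>=` and `<`) is an assumption, and the real implementation applies further conditions that are omitted here. The default values are the bayes_auto_learn_threshold_spam and bayes_auto_learn_threshold_nonspam settings from the BAYES_POISON_DEFENSE.cf file quoted on this page.

```python
def auto_learn_decision(score, threshold_spam=8.0, threshold_nonspam=-2.0):
    """Return "spam", "ham" or None for a message score.

    Simplified sketch: only the two thresholds are considered.
    """
    if score >= threshold_spam:
        return "spam"   # confident enough to train as spam
    if score < threshold_nonspam:
        return "ham"    # confident enough to train as ham
    return None         # in between: do not auto-learn


for score in (9.5, 2.0, -3.0):
    print(score, auto_learn_decision(score))
```

Widening the gap between the two thresholds, as that file does, means fewer messages are auto-learnt, but those that are learnt carry more confidence.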
/usr/local/cpanel/etc/mail/spamassassin/
   750 Jun  6  2019 BAYES_POISON_DEFENSE.cf
  4233 Jun  6  2019 CPANEL.cf
488518 Aug 25 03:49 KAM.cf
  4983 Sep 14  2020 deadweight2.cf
 31173 Aug 25 03:49 deadweight2_meta.cf
  8342 Aug 25 03:49 deadweight2_sub.cf
 23681 Aug 25 03:49 deadweight.cf
  1316 Aug 25 03:49 kam_heavyweights.cf
  3504 Nov 18  2015 P0f.cf
  1869 May 29  2018 user_prefs.template
  4073 Dec  7 03:50 local.cf
  3213 May 11  2020 local.cf.rpmnew
  1194 Sep 16 18:29 init.pre
  2524 Sep 16 18:29 v310.pre
  1194 Sep 16 18:29 v312.pre
  1237 Sep 16 18:29 v330.pre
  2416 May 11  2020 v320.pre.rpmnew
  2412 Feb  2  2017 v320.pre
  4096 Feb  1 03:53 sa-update-keys/
#BAYES_POISON_DEFENSE.cf - SpamAssassin Rules
bayes_auto_learn_threshold_spam 8.0 # Default is 6.0
bayes_auto_learn_threshold_nonspam -2.0 # Default is +0.1
#KAM.cf
#Author: Kevin A. McGrail et al
#HomePage: http://www.mcgrail.com/downloads/KAM.cf the McGrail Foundation
#Installation: There are multiple files that make up the KAM ruleset including
#heavyweight, deadweight, & nonKAMrules. The KAM ruleset is now a channel!
#
#Please see https://mcgrail.com/template/kam.cf_channel for more information
#The ruleset includes internal rules so not every rule will be useful but
#we encapsulate those in a KAMOnly defined loop.
#KAM.cf is maintained by The McGrail Foundation, a 501(c)(3) charity. Donations
#are appreciated. See www.mcgrail.com for more information on donations and
#sponsorships.
#THANK YOU TO OUR SPONSORS (in Alphabetical Order):
#cPanel, INKY, Invaluement, iSpark, Linode, PCCC, ShipShapeIT and Zix/Appriver
#This is a collection of special rules that I have developed and use on my system.
#
#The exact date is lost to the sands of time but we have been publishing this
#ruleset since at least May 2004.
#
#They are intended as live research for committal to SpamAssassin's SVN sandbox but
#often rely on my corpora so they do not fair well in masschecks.
#
#You are welcome and encouraged to email me directly regarding suggestions.
#To avoid being caught by our filters, False positives and negatives should be
#submitted to https://raptor.pccc.com/raptor.cgim?template=report_problem
#
#I believe the rules are safe and they are in use on production systems so I will
#do my best to respond to FPs *especially* if you can send me an email sample.
#
#IMPORTANT: This cf file is designed for systems with a threshold of 5.0 or higher.
#It is best to save an email sample in mbox format and zip it to attach to get
#around my filters. It is sometimes best to send samples in a second email so I
#know to go looking for it in my spam folders.
#
#NOTE: I do use some poison pill (i.e. Automatic HAM/SPAM rules).
#
# - I don't view many of my rules as single rules as I typically use meta rules.
# I view meta rules as multiple rules hence a larger score is acceptable.
#
# - Some content needs to be blocked either due to large number of complaints or
# for content. For example, the sexually explicit items and the stock tips.
# FPs in these rules will be quickly addressed.
# COURTESY OF Marcin Mirosław