Posted to users@spamassassin.apache.org by "Kevin W. Gagel" <ga...@cnc.bc.ca> on 2007/12/20 17:54:34 UTC

Re: hits gawk program re: spamassassin

----- Original Message -----
>I am trying to learn how you did your hits gawk program and how to use it
>and what the logfile should look like
>
>How do I learn more regarding your setup so I can use or change to use on
>my setup

Robert,

To use the hits report the command line is:
gawk -f hits /path/to/logfile

The -f option tells gawk to read its commands from the file named hits and
then run them against the logfile you give it.
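
If you have not used -f before, here is a small sketch of the idea. The
program file and the log lines in it are made up for the demonstration (this
is not the hits file), and it calls plain awk; gawk accepts the same
program.

```shell
# Hypothetical demo: a one-rule awk program stored in a file, run with -f.
prog=$(mktemp)
log=$(mktemp)

# The program file: count the lines that contain " result: ".
cat > "$prog" <<'EOF'
/ result: / { n++ }
END { print n + 0 }
EOF

# Made-up input: two matching lines and one unrelated line.
cat > "$log" <<'EOF'
spamd: result: Y 12 - TEST_A,TEST_B scantime=1.5
spamd: connection from localhost
spamd: result: . 0 - TEST_B scantime=0.5
EOF

# Same shape as the real command: awk -f PROGRAM LOGFILE
matched=$(awk -f "$prog" "$log")
echo "$matched"
rm -f "$prog" "$log"
```

For this sample it prints 2, one for each line that contains " result: ".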

To learn more about gawk, search Google; there are plenty of tutorials out
there.

My setup uses the spamd daemon, which logs to its own logfile. Nonetheless,
it should still work if there are other log entries, because gawk is
searching only for lines that contain "result:". You can verify that it
will find what it needs by doing a grep on your logfile for that same
thing. Do it like this:
cat /var/log/maillog | grep " result: "

What should happen is that every line in your maillog that contains the
characters " result: " will be echoed to your screen. These are the lines
spamd writes to log what it found for each message.

That is the information gawk parses to find out which tests scored, and
from it gawk builds a list of how often each test scored. Since a test can
only score once in any given message, the number of times a test scored is
the number of messages that had the pattern that particular test looks
for. The gawk file also counts the messages that did not score on any test.
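
To make the counting concrete, here is a rough sketch of that kind of
tally. It is not the hits file itself. It assumes each spamd line looks
like "result: FLAG SCORE - TEST1,TEST2,... scantime=..." and the sample log
lines are invented, so check the field positions against your own log
before trusting the numbers. It uses plain awk; gawk runs the same program.

```shell
# Made-up sample log; real spamd lines carry more fields after scantime=.
log=$(mktemp)
cat > "$log" <<'EOF'
Dec 20 10:00:01 mail spamd[101]: spamd: result: Y 15 - BAYES_99,RCVD_IN_XBL,HTML_MESSAGE scantime=2.5,size=3120
Dec 20 10:00:02 mail spamd[101]: spamd: connection from localhost [127.0.0.1] at port 40001
Dec 20 10:00:03 mail spamd[102]: spamd: result: . 0 - HTML_MESSAGE scantime=1.0,size=2048
Dec 20 10:00:05 mail spamd[103]: spamd: result: . 0 -  scantime=0.5,size=512
Dec 20 10:00:07 mail spamd[104]: spamd: result: Y 9 - BAYES_99,HTML_MESSAGE scantime=3.0,size=4096
EOF

report=$(awk '
/ result: / {
    msgs++
    # Find the "result:" field; the layout assumed after it is
    #   result: FLAG SCORE - TEST1,TEST2,... scantime=...
    for (i = 1; i <= NF; i++)
        if ($i == "result:")
            break
    tests = $(i + 4)
    # No test list: the next field is already scantime=...
    if (tests == "" || tests ~ /^scantime=/) {
        none++
        next
    }
    # Each test appears at most once per message, so one hit per name.
    n = split(tests, t, ",")
    for (k = 1; k <= n; k++)
        hits[t[k]]++
}
END {
    # Most frequent tests first; ties broken by test name.
    cmd = "sort -k1,1nr -k2,2"
    for (name in hits)
        printf "%d %s\n", hits[name], name | cmd
    close(cmd)
    printf "%d messages scanned, %d hit no tests\n", msgs, none
}' "$log")

echo "$report"
rm -f "$log"
```

On the sample above this lists HTML_MESSAGE with 3 hits, BAYES_99 with 2
and RCVD_IN_XBL with 1, then reports 4 messages scanned, 1 of which hit no
tests.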

The theory is that if a test rarely scores at a given site, then that site
may not need the test. Not running a test should reduce the scan time and
improve overall performance.

I wrote this because of a test that scored in a single message that I was
examining. It turned out to be a blacklist that I didn't know about, and if
I'd been using it I would have blocked around 30,000 messages from entering
my site. Then I wondered how much time the CPU spent looking at these
messages, so I wrote another one to sum up the scantimes for those
particular messages. The end result is that the messages chewed up over 60
hours of time in just 3 weeks. So, if I stop them from entering, I improve
performance. Of course, the trick is to make sure they really are spam...
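
A scantime sum along those lines could look roughly like the sketch below.
It is not the second script mentioned above. It assumes the same
"result: FLAG SCORE - TESTS scantime=N" layout, the log lines are invented,
and RCVD_IN_XBL is only a stand-in for whichever test you are curious
about.

```shell
# Made-up sample log; two of the three messages hit the chosen test.
log=$(mktemp)
cat > "$log" <<'EOF'
Dec 20 10:00:01 mail spamd[101]: spamd: result: Y 15 - BAYES_99,RCVD_IN_XBL scantime=2.5,size=3120
Dec 20 10:00:03 mail spamd[102]: spamd: result: . 0 - HTML_MESSAGE scantime=1.0,size=2048
Dec 20 10:00:07 mail spamd[104]: spamd: result: Y 9 - RCVD_IN_XBL scantime=3.0,size=4096
EOF

summary=$(awk -v test="RCVD_IN_XBL" '
/ result: / {
    # The test list is assumed to sit four fields after "result:".
    for (i = 1; i <= NF; i++)
        if ($i == "result:")
            break
    n = split($(i + 4), t, ",")
    hit = 0
    for (k = 1; k <= n; k++)
        if (t[k] == test)
            hit = 1
    # Add the scantime only when the chosen test is in the list.
    if (hit && match($0, /scantime=[0-9.]+/)) {
        count++
        # Skip the 9 characters of "scantime=" to get the number.
        total += substr($0, RSTART + 9, RLENGTH - 9) + 0
    }
}
END { printf "%d messages, %.1f seconds\n", count, total }' "$log")

echo "$summary"
rm -f "$log"
```

For the sample it prints "2 messages, 5.5 seconds". Change the value passed
with -v test= to total up a different test.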

For those of you interested in using it, it's located here:
http://mail.cnc.bc.ca/users/gagel@cnc.bc.ca/spamassassin/

=================================
Kevin W. Gagel
Network Administrator
Information Technology Services
(250) 562-2131 local 5448
My Blog:
http://mail.cnc.bc.ca/blogs/gagel
My File share:
http://mail.cnc.bc.ca/users/gagel

-------------------------------------------------------------------
The College of New Caledonia, Visit us at http://www.cnc.bc.ca
Virus scanning is done on all incoming and outgoing email.
Anti-spam information for CNC can be found at http://avas.cnc.bc.ca
-------------------------------------------------------------------