Posted to dev@spamassassin.apache.org by John Beck <jb...@eng.sun.com> on 2005/01/12 00:07:19 UTC
check_whitelist extensions, changes
I found the check_whitelist tool very convenient for examining who had sent
me mail and what scores they had earned. But I often found myself wanting
to study certain individuals and/or domains, and sometimes to use the same
as --clean fodder. So I extended the script to understand the new options
--addr and --domain.
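
As a rough illustration of what --domain is meant to select (keys whose address is at the given domain or at any sub-domain of it), the check can be approximated with grep -E over a few sample keys. This is only a sketch: the sample keys are hypothetical, and grep stands in for the Perl regex that the patch uses inside check_whitelist.

```shell
#!/bin/sh
# Hypothetical database keys, in the "address|ip=..." form that
# check_whitelist prints in its example output.
keys() {
  printf '%s\n' \
    'dawson@example.com|ip=208.192' \
    'bob@mail.example.com|ip=10.1' \
    'eve@notexample.com|ip=10.2' \
    'carol@example.org|ip=10.3'
}

# Approximation of the patch's domain test for "example.com":
# the domain must directly follow an "@" or a "." and be followed by "|".
# The dots are matched literally here ([.]); the patch interpolates the
# option value into the regex without escaping it.
matched=$(keys | grep -E '[@.]example[.]com[|]')
printf '%s\n' "$matched"
```

Here dawson@example.com and bob@mail.example.com are selected, while notexample.com and example.org are not, because the domain has to be preceded by "@" or "." and followed by the "|" separator.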
I also found the output format a little too rigid for my taste, as I like
to pipe the output to sort with k1n, k3n or k5n as an argument, but the
'(' / ')' were sometimes interfering, and typing a long sed entry every
time was a pain. So I altered the white-space a bit accordingly.
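
A small sketch of why the extra white-space helps: with the parentheses and the slash separated by blanks, the average, total score and count land in fields 1, 3 and 5, so sort can pick them up directly. The two sample lines reuse the values from the example in the patch; their exact spacing is reconstructed from the new printf format and is an assumption, as is the use of LC_ALL=C to keep "." as the decimal point.

```shell
#!/bin/sh
# Sample output lines in the new format (spacing assumed, values from
# the example in the patch).
sample() {
  printf '%s\n' \
    '  21.8 (    43.7 /   2 ) -- mcdaniel_2s2000@example.com|ip=200.106' \
    '   0.0 (     0.0 /   7 ) -- dawson@example.com|ip=208.192'
}

# Field 1 is the average, field 3 the total score, field 5 the count.
by_avg=$(sample | LC_ALL=C sort -k1n)
by_total=$(sample | LC_ALL=C sort -k3n)
by_count=$(sample | LC_ALL=C sort -k5n)

printf 'by average:\n%s\n' "$by_avg"
printf 'by total score:\n%s\n' "$by_total"
printf 'by count:\n%s\n' "$by_count"
```

With the old "(43.7/2)" form, the total and count were glued into a single field beginning with "(", so a numeric sort on it would not see a number.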
I will list the diffs (in -u format) in-line below and attach the entire
updated file; hopefully others will find these changes worthwhile as well.
--- check_whitelist~ Thu Jul 15 03:47:38 2004
+++ check_whitelist Tue Jan 11 14:58:42 2005
@@ -4,7 +4,7 @@
sub usage {
die "
-usage: check_whitelist [--clean] [--min n] [dbfile]
+usage: check_whitelist [--clean] [--min n] [--addr addr | --domain domain] [dbfile]
";
}
@@ -13,18 +13,26 @@
use Getopt::Long;
use vars qw(
- $opt_clean $opt_min $opt_help
+ $opt_clean $opt_min $opt_addr $opt_domain $opt_help
);
GetOptions(
'clean' => \$opt_clean,
'min:i' => \$opt_min,
+ 'addr:s' => \$opt_addr,
+ 'domain:s' => \$opt_domain,
'help' => \$opt_help
) or usage();
$opt_help and usage();
$opt_min ||= 2;
+$opt_addr ||= '';
+$opt_domain ||= '';
+if ($opt_addr ne '' && $opt_domain ne '') {
+ die "addr and domain options are mutually exclusive\n";
+}
+
BEGIN { @AnyDBM_File::ISA = qw(DB_File GDBM_File NDBM_File SDBM_File); }
use AnyDBM_File ;
@@ -51,14 +59,28 @@
my $count = $h{$key};
next unless defined($totscore);
+ # There are 3 reasons to skip a given key:
+ # 1. clean was specified (but no addr or domain) and the count is above min.
+ if ($opt_clean && $count >= $opt_min && $opt_addr eq '' && $opt_domain eq '') {
+ #printf "skipping (count) %s\n", $key;
+ next;
+ }
+ # 2. An addr was specified but the key does not match.
+ if ($opt_addr ne '' && !($key =~ /^$opt_addr/)) {
+ #printf "skipping (addr) %s\n", $key;
+ next;
+ }
+ # 3. A domain was specified but the key does not match.
+ if ($opt_domain ne '' && !($key =~ /^.*[\@\.]$opt_domain\|/)) {
+ #printf "skipping (domain) '%s'\n", $key;
+ next;
+ }
if ($opt_clean) {
- if ($count >= $opt_min) { next; }
print "cleaning: ";
}
- printf "% 8.1f %15s -- %s\n",
- $totscore/$count, (sprintf "(%.1f/%d)",$totscore,$count),
- $key;
+ printf "% 6.1f %15s -- %s\n", $totscore/$count,
+ (sprintf "( % 7.1f / %3d )",$totscore,$count), $key;
if ($opt_clean) {
delete $h{"$key|totscore"};
@@ -73,7 +95,7 @@
=head1 SYNOPSIS
-B<check_whitelist> [--clean] [--min n] [dbfile]
+B<check_whitelist> [--clean] [--min n] [--addr s | --domain s] [dbfile]
=head1 DESCRIPTION
@@ -97,6 +119,15 @@
used. The default is C<2>, so entries that have only been seen once are
deleted.
+=item --addr s
+
+Select an individual address to be deleted.
+
+=item --domain s
+
+Select a domain to be deleted: all addresses @ that domain or @ any
+sub-domain of that domain will be deleted.
+
=back
=head1 OUTPUT
@@ -107,8 +138,8 @@
For example:
- 0.0 (0.0/7) -- dawson@example.com|ip=208.192
- 21.8 (43.7/2) -- mcdaniel_2s2000@example.com|ip=200.106
+ 0.0 ( 0.0 / 7 ) -- dawson@example.com|ip=208.192
+ 21.8 ( 43.7 / 2 ) -- mcdaniel_2s2000@example.com|ip=200.106
C<AVG> is the average score; C<TOTSCORE> is the total score of all mails seen
so far; C<COUNT> is the number of messages seen from that sender; C<EMAIL> is
-- John
Re: check_whitelist extensions, changes
Posted by Duncan Findlay <du...@debian.org>.
On Tue, Jan 11, 2005 at 03:07:19PM -0800, John Beck wrote:
> I found the check_whitelist tool very convenient for examining who had sent
> me mail and what scores they had earned. But I often found myself wanting
> to study certain individuals and/or domains, and sometimes to use the same
> as --clean fodder. So I extended the script to understand the new options
> --addr and --domain.
>
> I also found the output format a little too rigid for my taste, as I like
> to pipe the output to sort with k1n, k3n or k5n as an argument, but the
> '(' / ')' were sometimes interfering, and typing a long sed entry every
> time was a pain. So I altered the white-space a bit accordingly.
>
> I will list the diffs (in -u format) in-line below and attach the entire
> updated file; hopefully others will find these changes worthwhile as well.
Bugzilla, http://bugzilla.spamassassin.org, is the best place for
this. Be sure to "attach" the patch rather than include it in your
comment.
Thanks,
--
Duncan Findlay