Posted to users@spamassassin.apache.org by Andy Jezierski <AJ...@stepan.com> on 2014/02/11 18:25:51 UTC

Increase in Image Spam

I've been seeing a pretty big increase in image spam over the last month 
or so. I remember using FuzzyOCR years ago when image spam was a much 
bigger problem.

Since FuzzyOCR hasn't been maintained in several years, is there an 
alternative that would work?  Or is there another way to try and catch 
them?

They don't really hit on any rules....

X-Spam-Status: No, score=3.5 required=5.0 tests=BAYES_99,HTML_MESSAGE,
        SPF_HELO_PASS,SPF_PASS autolearn=no autolearn_force=no
        version=3.4.0-rc5

Thanks
Andy

Re: Increase in Image Spam

Posted by Benny Pedersen <me...@junc.eu>.

On 2014-02-20 18:06, Amir Caspi wrote:

> for whatever reason, many of the FNs I've been getting lately are
> passing because they hit BAYES_00, even though they are matching
> AC_SPAMMY_URI_PATTERNS.  I need to enable bayes tokens in the headers
> so I can see why these are considered so hammy when I know for sure
> they're not...

meta AC_URI_BAYES_HAM (AC_SPAMMY_URI_PATTERNS && BAYES_00)

score with 5 ?
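
A minimal sketch of how the meta rule and its score could be written out together. AC_SPAMMY_URI_PATTERNS is the poster's own local rule; the function name, the describe text, the 5.0 score and the target file are illustrative assumptions, not tested values.

```shell
#!/bin/sh
# Append a meta rule that adds points when the custom URI rule fires on a
# message that Bayes rated as very hammy (BAYES_00).
# Assumptions: the function name, the describe text and the 5.0 score are
# illustrative; AC_SPAMMY_URI_PATTERNS must already exist as a local rule.
write_bayes_override_rule() {
    cat >> "$1" <<'EOF'
meta     AC_URI_BAYES_HAM  (AC_SPAMMY_URI_PATTERNS && BAYES_00)
describe AC_URI_BAYES_HAM  Spammy URI but Bayes says ham
score    AC_URI_BAYES_HAM  5.0
EOF
}

# Example: write_bayes_override_rule /etc/mail/spamassassin/local.cf
```

After appending the rule, running spamassassin --lint should report any syntax problems before the rule goes live.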

> But, I would love if there were a way to ignore the bayes score if
> AC_SPAMMY_URI_PATTERNS matches.

See above: don't count on the scores; make rules that add score for the 
spam that really is spam.

> I know this is rather silly -- the
> whole point of Bayes is to help determine if an email is spam or ham
> regardless of the other rules -- but I'm just flummoxed by having
> these obviously-spammy emails being treated as ham.

You should really just train Bayes more, then; spammers will always lose 
if Bayes is well trained.

> Should I create a rule that adds extra points if
> AC_SPAMMY_URI_PATTERNS hits AND a low Bayes score is found?

Yep, as I showed above.

> Or should
> I just make AC_SPAMMY_URI_PATTERNS a poison pill, since I've never
> gotten an FP out of it?

This will work as well, but if Bayes is trained to BAYES_60 or higher it 
does not really need more help with Bayes scoring.

> Not sure what else to do about these
> Bayes-killing spams (besides wiping my entire Bayes DB and starting
> over).

That would be counterproductive :=)

> Thoughts?

Samples somewhere?

Re: Increase in Image Spam

Posted by Amir 'CG' Caspi <ce...@3phase.com>.

On Thu, February 20, 2014 3:16 pm, Kevin A. McGrail wrote:
> Are you using 3.4.0?  I believe the size was hard-coded until then when
> the max-size option was added to sa-learn.

No, as mentioned previously in this flurry of emails, I'm using 3.3.2. 
However, note that using spamassassin directly (not learning, just
classifying) works just fine, there is no complaint of max message size. 
Using spamc with --max-size, no complaints either.  And, finally, sa-learn
with -D (debug) does not show me any error messages or warnings related to
message size, or ANYTHING in fact that would lead me to understand why
it's skipping these messages.  If they exceed the maximum size, sa-learn
is being very quiet about it and not throwing an explicit error in the
debug output.

I echo Martin's question of whether it's possible to override the max size
in local.cf, because on my system (with virtual hosts that call spamc)
that would be much more preferable than having to specify max-size in
every virtual host's /etc/procmailrc (which is how I have to do it now).

Thanks.

						--- Amir

Re: Increase in Image Spam

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

I think you were just on the email chain on list so my reply to another 
person went to you.

On 2/20/2014 5:21 PM, Benny Pedersen wrote:
> On 2014-02-20 23:16, Kevin A. McGrail wrote:
>
>> Are you using 3.4.0?  I believe the size was hard-coded until then
>> when the max-size option was added to sa-learn.
>
> SpamAssassin 3.4.0 (2014-02-07)
>
> Yes, I make the Gentoo ebuilds myself.
>
> 3.4 is not in Gentoo yet.
>
> Kevin: do I need to reply privately here?


-- 
*Kevin A. McGrail*
President

Peregrine Computer Consultants Corporation
3927 Old Lee Highway, Suite 102-C
Fairfax, VA 22030-2422

http://www.pccc.com/

703-359-9700 x50 / 800-823-8402 (Toll-Free)
703-359-8451 (fax)
KMcGrail@PCCC.com <ma...@pccc.com>

Re: Increase in Image Spam

Posted by Benny Pedersen <me...@junc.eu>.

On 2014-02-20 23:16, Kevin A. McGrail wrote:

> Are you using 3.4.0?  I believe the size was hard-coded until then
> when the max-size option was added to sa-learn.

SpamAssassin 3.4.0 (2014-02-07)

Yes, I make the Gentoo ebuilds myself.

3.4 is not in Gentoo yet.

Kevin: do I need to reply privately here?

Re: Increase in Image Spam

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/20/2014 5:07 PM, Amir 'CG' Caspi wrote:
> On Thu, February 20, 2014 2:49 pm, Benny Pedersen wrote:
>> On 2014-02-20 22:39, Kevin A. McGrail wrote:
>>> --max-size= I believe.  Default is 256K.
> sa-learn barfs, that flag is not accepted.  That flag works for spamc, but
> not for sa-learn.  sa-learn man page and CLI help don't have any mention
> of a max message size.
Are you using 3.4.0?  I believe the size was hard-coded until then when 
the max-size option was added to sa-learn.

Re: Increase in Image Spam

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/20/2014 10:35 PM, Amir Caspi wrote:
> On Feb 20, 2014, at 8:07 PM, Kevin A. McGrail <KM...@PCCC.com> wrote:
>
>> No need to run through 3.3.2.  The emails are well over the 256KB limit hard coded in sa-learn with 3.3.2.
> Understood, and thanks for checking on this.  Now that I know this is the problem, I've manually edited Mail::SpamAssassin::ArchiveIterator.pm to change the BIG_BYTES limit from 256K to 1500K (which I've found is a reasonable size for my small system).  I've verified that this change allows sa-learn to work properly for these messages.
>
> Is there any reason that such a manual edit could cause problems elsewhere, or am I safe to have made this change?  (Neglect the fact that large messages could cause high loads, my system can handle that.)
>
> Or, would you recommend that instead of making this change, I just set opt_all => 1 in sa-learn's instantiation of ArchiveIterator?  (That is, modify sa-learn instead of ArchiveIterator.)
I don't know, sorry.  Let us know if you find any issues for sure.
> Now, that brings up the other question: I have other mails that are well below the 256K limit (and certainly below the 1500K limit I just made), but they are still not being examined by sa-learn.  These messages are pretty old (from July 2013) ... are they being ignored because they are too old?  I don't see that sa-learn is using opt_before or opt_after for Archive_Iterator, and I don't see anywhere else where it's excluding old messages... and there are no errors in the debug output, but I'm still getting "0 message examined."
>
> This sample mbox of old mails is here:
>
> https://www.dropbox.com/s/zvbmvk8pb06v0m8/SA_testspam_old.mbox
>
> If it's being ignored based on date, how would I know that?
>
> Sorry for being dense. =)

The file isn't in mbox format.  No From separators.
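
A quick way to check for this, as a minimal sketch: count the lines that start with "From " followed by a space, which is what separates messages in an mbox file. The function name is an assumption for illustration, and a plain count like this can also pick up unescaped "From " lines inside message bodies.

```shell
#!/bin/sh
# Count mbox separator lines: lines that begin with "From " (no colon).
# A result of 0 means the file has no mbox separators at all.
# Assumption: the function name is illustrative only.
count_mbox_separators() {
    awk '/^From / { n++ } END { print n + 0 }' "$1"
}

# Example: count_mbox_separators SA_testspam_old.mbox
```

If this prints 0, the file only contains "From:" headers and no "From " separator lines.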

Regards,
KAM

Re: Increase in Image Spam

Posted by Amir Caspi <ce...@3phase.com>.

On Feb 20, 2014, at 8:07 PM, Kevin A. McGrail <KM...@PCCC.com> wrote:

> No need to run through 3.3.2.  The emails are well over the 256KB limit hard coded in sa-learn with 3.3.2.

Understood, and thanks for checking on this.  Now that I know this is the problem, I've manually edited Mail::SpamAssassin::ArchiveIterator.pm to change the BIG_BYTES limit from 256K to 1500K (which I've found is a reasonable size for my small system).  I've verified that this change allows sa-learn to work properly for these messages.

Is there any reason that such a manual edit could cause problems elsewhere, or am I safe to have made this change?  (Neglect the fact that large messages could cause high loads, my system can handle that.)

Or, would you recommend that instead of making this change, I just set opt_all => 1 in sa-learn's instantiation of ArchiveIterator?  (That is, modify sa-learn instead of ArchiveIterator.)

Now, that brings up the other question: I have other mails that are well below the 256K limit (and certainly below the 1500K limit I just made), but they are still not being examined by sa-learn.  These messages are pretty old (from July 2013) ... are they being ignored because they are too old?  I don't see that sa-learn is using opt_before or opt_after for Archive_Iterator, and I don't see anywhere else where it's excluding old messages... and there are no errors in the debug output, but I'm still getting "0 message examined."

This sample mbox of old mails is here:

https://www.dropbox.com/s/zvbmvk8pb06v0m8/SA_testspam_old.mbox

If it's being ignored based on date, how would I know that?

Sorry for being dense. =)

Thanks.

--- Amir

Re: Increase in Image Spam

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/20/2014 7:18 PM, Amir 'CG' Caspi wrote:
> If you have a chance, please run it through both 3.3.2 and 3.4.0, to 
> see if there's a difference... clearly, it's not working on _MY_ 3.3.2 
> for some reason! I sent the exact commands that I used in a prior 
> email a couple of hours ago. Thanks. =) --- Amir

No need to run through 3.3.2.  The emails are well over the 256KB limit 
hard coded in sa-learn with 3.3.2.

3.4.0:

sa-learn -D --mbox --progress --spam < /tmp/temp.mbox 2>&1 | tee /tmp/output

Feb 20 21:51:33.484 [21525] dbg: archive-iterator: _run_mailbox 
/tmp/.spamassassin2152599LqEKtmp, ofs 0, limit 262144
Feb 20 21:51:33.500 [21525] info: archive-iterator: skipping large 
message: 4089 lines, 262160 bytes, limit 262144 bytes
Feb 20 21:51:33.501 [21525] dbg: archive-iterator: _run_mailbox 
/tmp/.spamassassin2152599LqEKtmp, ofs 429849, limit 262144
Feb 20 21:51:33.517 [21525] info: archive-iterator: skipping large 
message: 4088 lines, 262169 bytes, limit 262144 bytes
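
To pull just the skipped messages out of a long debug log, something like the following sketch may help. It assumes the mbox log format shown above, with each entry on one line as sa-learn emits it; the function name is made up for illustration.

```shell
#!/bin/sh
# Read sa-learn debug output on stdin and print, for each skipped message,
# "<bytes> <limit> <excess>".
# Assumption: entries look like the "skipping large message" lines above;
# the function name is illustrative only.
skipped_large_messages() {
    sed -n 's/.*skipping large message: [0-9]* lines, \([0-9]*\) bytes, limit \([0-9]*\) bytes.*/\1 \2/p' |
        while read -r bytes limit; do
            echo "$bytes $limit $((bytes - limit))"
        done
}

# Example: skipped_large_messages < /tmp/output
```

For the first message above this would report an excess of 16 bytes over the limit.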


Re-running with a limit high enough to include them:

sa-learn -D --mbox --progress --spam < /tmp/temp.mbox --max-size=600000 2>&1 | tee /tmp/output

Learned tokens from 2 message(s) (2 message(s) examined)
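
To pick a suitable --max-size beforehand, a rough sketch like this can list the messages in an mbox that exceed a given limit. The function name is hypothetical, and the byte count is only an approximation of what sa-learn measures: it counts from one "From " separator line to the next.

```shell
#!/bin/sh
# Print "<message-number> <bytes>" for every message in an mbox file that is
# larger than the limit (default 262144 bytes = 256 KiB).
# Assumptions: messages are delimited by lines starting with "From "; the
# function name and the way bytes are counted are illustrative only.
oversized_messages() {
    LC_ALL=C awk -v limit="${2:-262144}" '
        /^From / { if (n && bytes > limit) print n, bytes; n++; bytes = 0 }
        n        { bytes += length($0) + 1 }   # +1 for the newline
        END      { if (n && bytes > limit) print n, bytes }
    ' "$1"
}

# Example: oversized_messages /tmp/temp.mbox 262144
```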


Output from debug and everything ;-)

regards,
KAM

Re: Increase in Image Spam

Posted by Benny Pedersen <me...@junc.eu>.

On 2014-02-20 22:56, Amir 'CG' Caspi wrote:

> I run a virtual-hosting server where the individual site RPMs are 
> copied
> from server-level RPMs. Basically all software has to be installed as 
> RPMs
> in order to propagate to the individual virtual hosts.

Google for dist2rpm: you basically just use the source from CPAN to make 
RPMs; once the RPMs are built, update as you always do on CentOS.

I still don't understand why CentOS people don't simply create the spec 
file themselves and rebuild from a source RPM if CPAN is not an option.

Re: Increase in Image Spam

Posted by Amir 'CG' Caspi <ce...@3phase.com>.

On Thu, February 20, 2014 2:39 pm, Axb wrote:
> what's wrong with installing from source?

I run a virtual-hosting server where the individual site RPMs are copied
from server-level RPMs. Basically all software has to be installed as RPMs
in order to propagate to the individual virtual hosts.

--- Amir

Re: Increase in Image Spam

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/20/2014 4:39 PM, Axb wrote:
> On 02/20/2014 10:35 PM, Amir 'CG' Caspi wrote:
>> Note that I have some other spams for which this is now an issue but 
>> which
>> I think worked fine in the past (with SA 3.3.1 for sure); is it possible
>> something got borked in sa-learn between 3.3.1 and 3.3.2 and nobody
>> noticed?  (I can't install 3.4 since it hasn't been RPM'd for CentOS 5.x
>> yet.)
>
> what's wrong with installing from source?
> (NOT Cpan install)
Theoretically CPAN install should work now as well though FreeBSD users 
will need to wait for the 3.4.1 release to install cleanly due to a 
variable collision (script).

Regards,
KAM

Re: Increase in Image Spam

Posted by Benny Pedersen <me...@junc.eu>.

On 2014-02-20 22:39, Axb wrote:
>> noticed?  (I can't install 3.4 since it hasn't been RPM'd for CentOS 
>> 5.x
>> yet.)
> 
> what's wrong with installing from source?
> (NOT Cpan install)

http://searchcode.com/codesearch/view/21483839

The hardest part is knowing how to :=)

Re: Increase in Image Spam

Posted by Axb <ax...@gmail.com>.

On 02/20/2014 10:35 PM, Amir 'CG' Caspi wrote:
> Note that I have some other spams for which this is now an issue but which
> I think worked fine in the past (with SA 3.3.1 for sure); is it possible
> something got borked in sa-learn between 3.3.1 and 3.3.2 and nobody
> noticed?  (I can't install 3.4 since it hasn't been RPM'd for CentOS 5.x
> yet.)

what's wrong with installing from source?
(NOT Cpan install)

Re: Increase in Image Spam

Posted by Amir 'CG' Caspi <ce...@3phase.com>.

On Thu, February 20, 2014 2:49 pm, Benny Pedersen wrote:
> On 2014-02-20 22:39, Kevin A. McGrail wrote:
>> --max-size= I believe.  Default is 256K.

sa-learn barfs, that flag is not accepted.  That flag works for spamc, but
not for sa-learn.  sa-learn man page and CLI help don't have any mention
of a max message size.

> And small mbox files do exist; it could just be a missing --mbox on the
> command line, since otherwise it would use maildir as the default.

Here is the exact command I am running, and the exact output:

-bash-3.2$ file SA_testspam.mbox
testspam: ASCII mail text

-bash-3.2$ sa-learn --mbox --progress --spam SA_testspam.mbox
Learned tokens from 0 message(s) (0 message(s) examined)

As you can see, it is an MBOX file, and I'm passing the --mbox flag, it
just doesn't like these two messages.  (To reiterate, adding a few other
spams results in THOSE spams getting considered, but these two messages
still being ignored.)

Very strange.

--- Amir

Re: Increase in Image Spam

Posted by Benny Pedersen <me...@junc.eu>.

On 2014-02-20 22:39, Kevin A. McGrail wrote:
> On 2/20/2014 4:35 PM, Amir 'CG' Caspi wrote:
>> If it's a size issue, how can I increase the size limit for sa-learn?
>> But, I don't think it's a size issue since these messages are under 
>> 512k
>> each.
> --max-size= I believe.  Default is 256K.

And small mbox files do exist; it could just be a missing --mbox on the 
command line, since otherwise it would use maildir as the default.

Re: Increase in Image Spam

Posted by Amir 'CG' Caspi <ce...@3phase.com>.

On Thu, February 20, 2014 5:13 pm, Kevin A. McGrail wrote:
> Resend the mbox link and I will likely have a cycle to throw it through.

https://www.dropbox.com/s/m4fuv670wnvwa16/SA_testspam.mbox

To be deleted in 24-48 hours (don't want spammers harvesting it).

If you have a chance, please run it through both 3.3.2 and 3.4.0, to see
if there's a difference... clearly, it's not working on _MY_ 3.3.2 for
some reason!  I sent the exact commands that I used in a prior email a
couple of hours ago.

Thanks. =)

--- Amir

Re: Increase in Image Spam

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

Resend the mbox link and I will likely have a cycle to throw it through.
Regards,
KAM

Amir 'CG' Caspi <ce...@3phase.com> wrote:

>On Thu, February 20, 2014 4:08 pm, Kevin A. McGrail wrote:
>> Probably best if you install 3.4.0 (or even trunk) on a test system
>and
>> throw the offending email onto that server and run sa-learn on that
>box
>> with -D.
>
>In the meantime, anyone want to do it on my behalf? =)  I provided the
>mbox link earlier; I unfortunately do not have a test system available.
>
>(I'm not quite a professional sysadmin...)
>
>						--- Amir

Re: Increase in Image Spam

Posted by Amir 'CG' Caspi <ce...@3phase.com>.

On Thu, February 20, 2014 4:08 pm, Kevin A. McGrail wrote:
> Probably best if you install 3.4.0 (or even trunk) on a test system and
> throw the offending email onto that server and run sa-learn on that box
> with -D.

In the meantime, anyone want to do it on my behalf? =)  I provided the
mbox link earlier; I unfortunately do not have a test system available. 
(I'm not quite a professional sysadmin...)

						--- Amir

Re: Increase in Image Spam

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/20/2014 6:01 PM, Amir 'CG' Caspi wrote:
> On Thu, February 20, 2014 3:52 pm, Kevin A. McGrail wrote:
>> Questions that will be answered by "that is solved in 3.4.0" aren't
>> really going to get much support from me...
> Understood, though it'll be a while before I can upgrade to 3.4 due to the
> RPM issue that I've mentioned previously.  However, I Googled this issue
> before mailing and this iterator error you posted SHOULD appear in
> sa-learn even in 3.3.x, but it does not seem to.  More to the point, when
> trying to run on a spam that had previously worked fine with v3.3.1,
> sa-learn STILL says "0 messages examined" and that spam is only 4K, so
> there's no chance it's running up against the max-size limit.  (On the
> other hand, that spam is many months old -- does sa-learn have a date
> limit as well?  If so, is that customizable?)
Probably best if you install 3.4.0 (or even trunk) on a test system and 
throw the offending email onto that server and run sa-learn on that box 
with -D.

Then we can start discussing apples to apples and add more debugging if 
needed.

regards,
KAM

Re: Increase in Image Spam

Posted by Amir 'CG' Caspi <ce...@3phase.com>.

On Thu, February 20, 2014 3:52 pm, Kevin A. McGrail wrote:
> Questions that will be answered by "that is solved in 3.4.0" aren't
> really going to get much support from me...

Understood, though it'll be a while before I can upgrade to 3.4 due to the
RPM issue that I've mentioned previously.  However, I Googled this issue
before mailing and this iterator error you posted SHOULD appear in
sa-learn even in 3.3.x, but it does not seem to.  More to the point, when
trying to run on a spam that had previously worked fine with v3.3.1,
sa-learn STILL says "0 messages examined" and that spam is only 4K, so
there's no chance it's running up against the max-size limit.  (On the
other hand, that spam is many months old -- does sa-learn have a date
limit as well?  If so, is that customizable?)

--- Amir

Re: Increase in Image Spam

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/20/2014 5:38 PM, Amir 'CG' Caspi wrote:
> On Thu, February 20, 2014 3:29 pm, Kevin A. McGrail wrote:
>> Unifying wouldn't be something I would want to see.
> Well, no one is arguing to _force_ unification, but to provide an option
> for it.  That is, max-size could be set in local.cf and would become a
> global parameter, but could still be overridden with CLI options.
I think based on your idea below, the CLI could not override it. For 
example, if I have a max size on spamd, how could spamc override it?  
Right now, spamd has no limit and spamc enforces a limit.

As with before, just because *I* don't want to see it just means that 
you have to figure it out on your own and come up with a patch that 
doesn't break existing functionality but adds what you want. And I'm 
willing to discuss the concept and test patches because I see some 
merit.  But I know I have other issues I want to focus on right now with SA.
>> Typically, if you were using spamassassin, a size limit would be
>> implemented by your .procmailrc, for example.
> Well, at least in 3.3.2, there is no apparent max-size parameter for
> spamassassin (the direct SA executable, not spamc/spamd or sa-learn).
> Older messages from the archives of this very mailing list seem to
> suggest that spamassassin itself has no message size limit.
That is correct.  There is no size limit in that scenario of using the 
spamassassin executable directly.  However, in the real world, there is 
almost no reason to use it in a live environment because the startup 
time is too high.

As noted, if you were using spamassassin, you would likely be using 
something like .procmailrc with a rule that limits the size, a la:

:0fw
* < 1572864
| spamassassin
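
The same size gate can be sketched in plain shell for scripts that call spamc outside procmail. The function name and the example file name are assumptions; 1572864 is simply 1536 * 1024, i.e. 1.5 MiB.

```shell
#!/bin/sh
# Succeed (exit 0) when the file is smaller than the limit in bytes,
# mirroring the procmail size condition "* < 1572864".
# Assumption: the function name is illustrative only.
smaller_than() {
    [ "$(($(wc -c < "$1")))" -lt "$2" ]
}

# Example: smaller_than message.eml 1572864 && spamc < message.eml
```

Keeping the check in the caller preserves the per-host flexibility discussed in this thread, at the cost of repeating the limit in each script.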

>   spamc
> certainly does, which as you say is overridden with the -s parameter.
> sa-learn apparently has a hardcoded limit, although as I mentioned in my
> previous email, I'm not seeing any error in the debug output that it's
> skipping due to size.
Please try with 3.4.0 and if there is still no output in debug, let me 
know and I'll add something.  But from looking at the code, I believe 
this is addressed:

  info("archive-iterator: skipping large message: ".
          "file size %d, limit %d bytes", -s _, $opt_max_size);

Questions that will be answered by "that is solved in 3.4.0" aren't 
really going to get much support from me...
>
>> More to the point, spamc would have to process all config files first
>> which would slow it down.  The point of spamc is to be a VERY
>> lightweight connection to spamd.
> Actually, if a limit is imposed centrally in spamd, I think this could be
> accomplished without any changes to spamc except to remove spamc's default
> size limit.  spamc would remain lightweight, simply piping email to
> spamd... if the message exceeds spamd's size limit, spamd would simply
> regurgitate the X-Spam-Status: No header, which is exactly what spamc
> currently does locally when the message size limit is exceeded -- the
> difference is only that spamc would send the message to spamd and spamd
> would barf, rather than spamc barfing locally.  Only spamd would have to
> read its central config.  (A local size limit COULD still be imposed for
> spamc via CLI, the difference is that no local size limit would exist by
> default, it would have to be done via CLI.)

More to the point though, a local size limit SHOULD be imposed.  Do you 
really want Spamc sending giant messages to spamd just to have it say, 
no, that's too large?

If you really want this, I'd say off the cuff you should implement a new 
version of the spamc protocol and have the spamc/spamd negotiate whether 
the connection was going to be accepted by sending the message size 
ahead of time coupled with a local.cf option for the spamd max message size.

You can open a feature request for this at bugzilla and I'd be happy to 
help testing any patches you might come up with.

However, in my case, I use spamc and multiple factors to determine what 
the max size is to send to spamd.  For example, if our load average is 
very low, I will send very large messages to spamd.  I enjoy the 
flexibility of the setting.

regards,
KAM

Re: Increase in Image Spam

Posted by Amir 'CG' Caspi <ce...@3phase.com>.

On Thu, February 20, 2014 3:29 pm, Kevin A. McGrail wrote:
> Unifying wouldn't be something I would want to see.

Well, no one is arguing to _force_ unification, but to provide an option
for it.  That is, max-size could be set in local.cf and would become a
global parameter, but could still be overridden with CLI options.

> Typically, if you were using spamassassin, a size limit would be
> implemented by your .procmailrc, for example.

Well, at least in 3.3.2, there is no apparent max-size parameter for
spamassassin (the direct SA executable, not spamc/spamd or sa-learn). 
Older messages from the archives of this very mailing list seem to
suggest that spamassassin itself has no message size limit.  spamc
certainly does, which as you say is overridden with the -s parameter. 
sa-learn apparently has a hardcoded limit, although as I mentioned in my
previous email, I'm not seeing any error in the debug output that it's
skipping due to size.

> More to the point, spamc would have to process all config files first
> which would slow it down.  The point of spamc is to be a VERY
> lightweight connection to spamd.

Actually, if a limit is imposed centrally in spamd, I think this could be
accomplished without any changes to spamc except to remove spamc's default
size limit.  spamc would remain lightweight, simply piping email to
spamd... if the message exceeds spamd's size limit, spamd would simply
regurgitate the X-Spam-Status: No header, which is exactly what spamc
currently does locally when the message size limit is exceeded -- the
difference is only that spamc would send the message to spamd and spamd
would barf, rather than spamc barfing locally.  Only spamd would have to
read its central config.  (A local size limit COULD still be imposed for
spamc via CLI, the difference is that no local size limit would exist by
default, it would have to be done via CLI.)

Cheers.
						--- Amir

Re: Increase in Image Spam

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/20/2014 5:48 PM, Martin Gregorie wrote:
> On Thu, 2014-02-20 at 17:29 -0500, Kevin A. McGrail wrote:
>> More to the point, spamc would have to process all config files first
>> which would slow it down.  The point of spamc is to be a VERY
>> lightweight connection to spamd.
>>
> That's why I suggested that spamc could be handed that value by spamd
> before it ships the message over.
I had the same suggestion.  "If you really want this, I'd say off the 
cuff you should implement a new version of the spamc protocol and have 
the spamc/spamd negotiate whether the connection was going to be 
accepted by sending the message size ahead of time coupled with a 
local.cf option for the spamd max message size.

You can open a feature request for this at bugzilla and I'd be happy to 
help testing any patches you might come up with."

So in short, if you like the idea, take a whack at the code and make a 
patch.

regards,
KAM

Re: Increase in Image Spam

Posted by Martin Gregorie <ma...@gregorie.org>.

On Thu, 2014-02-20 at 17:29 -0500, Kevin A. McGrail wrote:
> More to the point, spamc would have to process all config files first 
> which would slow it down.  The point of spamc is to be a VERY 
> lightweight connection to spamd.
> 
That's why I suggested that spamc could be handed that value by spamd
before it ships the message over. 

This is or should be lightweight: in the past I was able to get 25,000
request/responses per second from a process that was answering queries
against a large (500k entry) in-memory red/black btree. This was on a
single core 625 MHz AlphaServer with both processes on the same box. IOW
the cost per message pair was comfortably under 40 µs once the time
needed to search the btree is subtracted. Most present-day servers
should do considerably better.

Martin

Re: Increase in Image Spam

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/20/2014 5:16 PM, Martin Gregorie wrote:
> On Thu, 2014-02-20 at 16:39 -0500, Kevin A. McGrail wrote:
>> On 2/20/2014 4:35 PM, Amir 'CG' Caspi wrote:
>>> If it's a size issue, how can I increase the size limit for sa-learn?
>>> But, I don't think it's a size issue since these messages are under 512k
>>> each.
>> --max-size= I believe.  Default is 256K.
>>
> Sorry, no. According to my manpage (SA 3.3.2) there is no --max-size
> option and (second try) sa-learn --max-size is rejected as an unknown
> option.
Try 3.4.0

  --max-size <b>        Skip messages larger than b bytes;
                               defaults to 256 KiB, 0 implies no limit

I'll fix KiB to read KB.
>
> On the same subject, is there any chance that a max-size configuration
> parameter could be supplied via local.cf?
Don't believe so.
> 1) IMO a single central setting is better than remembering to specify
>     and change it in several scripts. Currently it needs to be set to
>     the same value in every script or MTA configuration that can run
>     spamc and/or sa-learn and it's quite easy to miss one.
My systems run with different limits in different places and in fact on 
different servers with spamc connecting to spamd boxes on other 
systems.  Unifying wouldn't be something I would want to see.
>
> 2) There currently seems to be no way of overriding the default max
>     message size for the commands spamassassin, spamd or sa-learn.
I believe this is false.

Typically, if you were using spamassassin, a size limit would be 
implemented by your .procmailrc, for example.

Spamd would be limited by spamc -s parameter.

sa-learn has the --max-size option added with 3.4.0
> 3) It improves system documentation to have all parameter settings in
>     one place.
SA is an API as well as a collection of programs implementing the API.  
It's a Swiss army tool with a whole bunch of configurable settings.  
And, as in my case, many of the tools can run on different servers by 
different users, etc.  One place for parameters is very hard.

But if you want to discuss further and can provide patches that don't 
break existing functionality, I'm always looking to get more people 
involved and submitting patches.
> I accept that setting the message size in local.cf may slow spamc down
> slightly if spamd doesn't already send a reply to spamc, which could
> pass the setting back, before accepting the message but the overhead of
> adding the reply message should be quite small.
More to the point, spamc would have to process all config files first 
which would slow it down.  The point of spamc is to be a VERY 
lightweight connection to spamd.

regards,
KAM

Re: Increase in Image Spam

Posted by Martin Gregorie <ma...@gregorie.org>.

On Thu, 2014-02-20 at 16:39 -0500, Kevin A. McGrail wrote:
> On 2/20/2014 4:35 PM, Amir 'CG' Caspi wrote:
> > If it's a size issue, how can I increase the size limit for sa-learn?
> > But, I don't think it's a size issue since these messages are under 512k
> > each.
> --max-size= I believe.  Default is 256K.
> 
Sorry, no. According to my manpage (SA 3.3.2) there is no --max-size
option and (second try) sa-learn --max-size is rejected as an unknown
option.

On the same subject, is there any chance that a max-size configuration
parameter could be supplied via local.cf? 

Reasons:

1) IMO a single central setting is better than remembering to specify
   and change it in several scripts. Currently it needs to be set to 
   the same value in every script or MTA configuration that can run
   spamc and/or sa-learn and it's quite easy to miss one.

2) There currently seems to be no way of overriding the default max
   message size for the commands spamassassin, spamd or sa-learn.

3) It improves system documentation to have all parameter settings in
   one place.

I accept that setting the message size in local.cf may slow spamc down
slightly if spamd doesn't already send a reply to spamc, which could
pass the setting back, before accepting the message but the overhead of
adding the reply message should be quite small.

Martin

Re: Increase in Image Spam

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/20/2014 4:35 PM, Amir 'CG' Caspi wrote:
> If it's a size issue, how can I increase the size limit for sa-learn?
> But, I don't think it's a size issue since these messages are under 512k
> each.
--max-size= I believe.  Default is 256K.

Re: Increase in Image Spam

Posted by Amir 'CG' Caspi <ce...@3phase.com>.

On Thu, February 20, 2014 12:57 pm, John Hardin wrote:
> "0 messages examined" generally means either the format isn't what
> sa-learn expected, or the message is larger than the size limit.

The file format is most certainly MBOX... it was created by my MUA, and
running "file" on it tells me that it is "ASCII mail text."  As I
mentioned, adding other spams to it results in those other spams being
properly learned, so it can't be a format issue unless the specific
messages themselves are not formatted in a way that sa-learn likes (though
the MTA and MUA like it just fine).

If it's a size issue, how can I increase the size limit for sa-learn? 
But, I don't think it's a size issue since these messages are under 512k
each.

Note that I have some other spams for which this is now an issue but which
I think worked fine in the past (with SA 3.3.1 for sure); is it possible
something got borked in sa-learn between 3.3.1 and 3.3.2 and nobody
noticed?  (I can't install 3.4 since it hasn't been RPM'd for CentOS 5.x
yet.)

I tried running sa-learn -D but the debug output didn't tell me anything
(that I could see) about why it was skipping the messages.  Running
spamassassin on the messages works just fine (I see SA output, so it's
matching rules), as does running spamc/spamd.  It is only sa-learn that
seems to be choking, and I have no idea why.

Any additional suggestions on how I can diagnose this?  Is it looking like
something I can fix, or a bug in sa-learn?

Thanks.

--- Amir

Re: Increase in Image Spam

Posted by John Hardin <jh...@impsec.org>.

On Thu, 20 Feb 2014, Ian Zimmerman wrote:

> On Thu, 20 Feb 2014 11:57:17 -0800 (PST)
> John Hardin <jh...@impsec.org> wrote:
>
> Amir> When I run sa-learn on this mailbox, it says:
>
> Amir> Learned tokens from 0 message(s) (0 message(s) examined)
>
> John> "0 messages examined" generally means either the format isn't what
> John> sa-learn expected, or the message is larger than the size limit.
>
> In my case it usually means the message has been learned already and SA
> just refuses to do so for the 2nd time :-)

That would be "learned tokens from 0 messages (n > 0 messages examined)".
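
The distinction drawn in this exchange can be written down as a small check on the summary line sa-learn prints. This is only a sketch: the function name diagnose is hypothetical, the regular expression is based solely on the one output line quoted in this thread, and the suggested causes are the ones given by the posters, not an exhaustive list.

```python
import re

# Pattern taken from the output line quoted in this thread.
SUMMARY_RE = re.compile(
    r"Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)


def diagnose(summary_line):
    """Classify an sa-learn summary line into the cases discussed above."""
    match = SUMMARY_RE.search(summary_line)
    if match is None:
        return "unrecognized output"
    learned = int(match.group(1))
    examined = int(match.group(2))
    if examined == 0:
        # John's reading: wrong format, or message over the size limit.
        return "nothing examined: check mailbox format or size limit"
    if learned == 0:
        # Ian's case: messages were seen but had been learned before.
        return "examined but not learned: probably learned already"
    return "learned %d of %d examined" % (learned, examined)
```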

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   You do not examine legislation in the light of the benefits it
   will convey if properly administered, but in the light of the
   wrongs it would do and the harms it would cause if improperly
   administered.                                  -- Lyndon B. Johnson
-----------------------------------------------------------------------
  2 days until George Washington's 282nd Birthday

Re: Increase in Image Spam

Posted by Amir Caspi <ce...@3phase.com>.

On Feb 20, 2014, at 7:07 PM, Ian Zimmerman <it...@buug.org> wrote:

> In my case it usually means the message has been learned already and SA
> just refuses to do so for the 2nd time :-)

When I run sa-learn on already-learned messages, it says 0 tokens learned, but it still says N messages examined (where N > 0).  That is, it _examines_ the messages, but does not learn from them, because they were already processed.

In this case, it's not even examining the messages, which is a different problem.

Thanks.

--- Amir

Re: Increase in Image Spam

Posted by Ian Zimmerman <it...@buug.org>.

On Thu, 20 Feb 2014 11:57:17 -0800 (PST)
John Hardin <jh...@impsec.org> wrote:

Amir> When I run sa-learn on this mailbox, it says:

Amir> Learned tokens from 0 message(s) (0 message(s) examined)

John> "0 messages examined" generally means either the format isn't what
John> sa-learn expected, or the message is larger than the size limit.

In my case it usually means the message has been learned already and SA
just refuses to do so for the 2nd time :-)

-- 
Please *no* private copies of mailing list or newsgroup messages.

gpg public key: 2048R/984A8AE4
fingerprint: 7953 ADA1 0E8E AB57 FB79  FFD2 360A 88B2 984A 8AE4
Funny pic: http://bit.ly/ZNE2MX

Re: Increase in Image Spam

Posted by John Hardin <jh...@impsec.org>.

On Thu, 20 Feb 2014, Amir Caspi wrote:

> When I run sa-learn on this mailbox, it says:
>
> Learned tokens from 0 message(s) (0 message(s) examined)

"0 messages examined" generally means either the format isn't what 
sa-learn expected, or the message is larger than the size limit.


Re: Increase in Image Spam

Posted by Amir Caspi <ce...@3phase.com>.

On Feb 20, 2014, at 11:21 AM, Kris Deugau <kd...@vianet.ca> wrote:

> Have you tried learning one specific FN, then reprocessing that message
> to see what Bayes score it gets?  IME it will usually shift from
> BAYES_00 to at least BAYES_40 in most cases, even with a large sitewide
> DB with far more tokens than the usual per-user DB.

Well, I just tried this, and sa-learn seems to be refusing to learn the messages.  I've placed an example MBOX here, temporarily (I will delete this within the next 24-48 hours for security):

https://www.dropbox.com/s/m4fuv670wnvwa16/SA_testspam.mbox

When I run sa-learn on this mailbox, it says:

Learned tokens from 0 message(s) (0 message(s) examined)

(This is using SA 3.3.2 on a CentOS 5.10 box.)

I tried placing other spam in here and it learned those fine, so clearly something about these two messages is confusing sa-learn.

Anyone have an idea why sa-learn is refusing to even examine these messages?

(Note that the messages are out of order; the first one is newer than the second.  The older one scored Bayes_50, the newer one scored Bayes_00.)

Any thoughts are greatly appreciated, I don't know why sa-learn won't even touch these... and that may explain why they continue to have low scores!

--- Amir

Re: Increase in Image Spam

Posted by Kris Deugau <kd...@vianet.ca>.

Amir Caspi wrote:
> Bayes is set to autolearn, and I manually run sa-learn about once a week on my spam folder (to learn the FNs, plus lower-scoring spam that was not autolearned).

Try setting up a cron job to run this daily or even as often as hourly.
 The faster you get feedback into the system the less likely it is
you'll end up with strange results.

Have you tried learning one specific FN, then reprocessing that message
to see what Bayes score it gets?  IME it will usually shift from
BAYES_00 to at least BAYES_40 in most cases, even with a large sitewide
DB with far more tokens than the usual per-user DB.

-kgd

Re: Increase in Image Spam

Posted by Benny Pedersen <me...@junc.eu>.

On 2014-02-20 21:43, Axb wrote:

> Redis DB in RAM - do the math :)

got results as 7812500000

now it's time to see how much power that many pis are using :=)

has anyone thought about running mysql in memory, if it's slow?

engine=memory in the spamd init script, and engine=myisam on shutdown

yes i know it's risky, but it would be nice to see comparisons

Re: Increase in Image Spam

Posted by Axb <ax...@gmail.com>.

On 02/20/2014 07:46 PM, Benny Pedersen wrote:
> On 2014-02-20 19:34, Axb wrote:
>> well, not huge...let me brag :)
>>
>> sa-learn --dump magic
>> 0.000          0          3          0  non-token data: bayes db version
>> 0.000          0   17663091          0  non-token data: nspam
>> 0.000          0    6768342          0  non-token data: nham
>
> how many raspberry pis are needed in a cluster setup to handle this? :=)

# Memory
used_memory:4072212184
used_memory_human:3.79G
used_memory_rss:4163964928
used_memory_peak:4076821712
used_memory_peak_human:3.80G

Redis DB in RAM - do the math :)
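
For anyone who does want to do the math, here is a rough sketch. The used_memory figure is the one in the output above; the 512 MiB of RAM per Raspberry Pi is an assumption that nobody in the thread stated, and the estimate ignores the operating system, Redis overhead and any sharding cost, so it is a lower bound at best.

```python
import math

USED_MEMORY_BYTES = 4072212184  # used_memory from the Redis output above
NODE_RAM_MIB = 512              # assumption: RAM per board, not from the thread


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 1024 ** 3


def nodes_needed(total_bytes, node_ram_mib):
    """Smallest number of nodes whose combined RAM covers total_bytes."""
    node_bytes = node_ram_mib * 1024 ** 2
    return math.ceil(total_bytes / node_bytes)


print(round(to_gib(USED_MEMORY_BYTES), 2))
print(nodes_needed(USED_MEMORY_BYTES, NODE_RAM_MIB))
```

The first value agrees with the used_memory_human line in the output above.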

Re: Increase in Image Spam

Posted by Benny Pedersen <me...@junc.eu>.

On 2014-02-20 19:34, Axb wrote:
> well, not huge...let me brag :)
> 
> sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0   17663091          0  non-token data: nspam
> 0.000          0    6768342          0  non-token data: nham

how many raspberry pis are needed in a cluster setup to handle this? :=)

/me hides

Re: Increase in Image Spam

Posted by Axb <ax...@gmail.com>.

On 02/20/2014 06:44 PM, Amir Caspi wrote:
> On Feb 20, 2014, at 10:34 AM, Axb <ax...@gmail.com> wrote:
>
>> I hope you're running SA 3.4 so:
>
> I am still on 3.3.2 because nobody has yet packaged 3.4 for CentOS
> 5.x, from what I can tell.  I have the package from the
> rpmforge-extras repo, and 3.3.2 is still the most current version
> there (and on Atomic and AtRPMs).
>
> I'm not sure who is responsible for updating the packages, but I'll
> probably have to wait a while until they get 3.4 uploaded there.
>
>> Assuming you can check maillogs and can either detect some spammed
>> unknown user patterns or have  a dedicated trap domain to spare,
>> I'd accept that mail and write some header rules to score the trap
>> rcpt/domain REAL high and use a rule like
>>
>> tflags RULENAME autolearn_force
>
> I'm not entirely sure what you mean here.  Are you saying to use a
> honeypot/spamtrap to feed the Bayes DB?

yep, exactly.

>  My problem is not that my Bayes DB doesn't have enough spam in it, it's that these particular
> FNs are scoring 00.  Let me note that the Bayes DBs are per-user, not
> per-domain.  Here's the magic output from my Bayes DB:

Personally I wouldn't use per-user Bayes DBs but a site-wide one, so all 
users get the benefit of your trapped data/learnt spam.
I'd bet you'd see a major improvement in spam detection and no FPs.

> I don't think this counts as a "small" DB, does it?

well, not huge...let me brag :)

sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0   17663091          0  non-token data: nspam
0.000          0    6768342          0  non-token data: nham


> Bayes is set to autolearn, and I manually run sa-learn about once a
> week on my spam folder (to learn the FNs, plus lower-scoring spam
> that was not autolearned).  MANY such image spams are caught
> properly, including by Bayes; the problem is that some of them,
> somehow, manage to slip through and score very low (00 or 20).  I
> just have no idea how that is happening (which is why I should start
> enabling token output in the headers and look), but that's why I was
> thinking of scoring AC_SPAMMY_URI_PATTERNS very high if Bayes is
> scoring very low, although I guess that kind of defeats the purpose
> of Bayes and introduces the risk of FPs.

It seems obvious that learning manually a week later isn't doing the trick.
imo, you need a better method to learn "in the flow", such as
an imap folder to drop FNs into and a script that learns spam from there 
every hour, for example...
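
A minimal sketch of such a script, assuming Python 3.7 or later is available and that the missed spam is collected in a single mbox file. The function names are hypothetical; --spam and --mbox are sa-learn's own options. If the IMAP folder is stored as a maildir directory instead, the --mbox option would be dropped.

```python
import shutil
import subprocess


def build_learn_command(mbox_path, sa_learn="sa-learn"):
    """Build the sa-learn command line that learns an mbox file as spam."""
    return [sa_learn, "--spam", "--mbox", mbox_path]


def learn_spam(mbox_path):
    """Run sa-learn on the mbox and return its summary output."""
    command = build_learn_command(mbox_path)
    if shutil.which(command[0]) is None:
        raise RuntimeError("sa-learn not found on PATH")
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()
```

A small wrapper that calls learn_spam() on the folder's file could then be run from cron once an hour; the exact paths depend on the local mail setup and are not given in this thread.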

Axb

Re: Increase in Image Spam

Posted by Amir Caspi <ce...@3phase.com>.

On Feb 20, 2014, at 10:34 AM, Axb <ax...@gmail.com> wrote:

> I hope you're running SA 3.4 so:

I am still on 3.3.2 because nobody has yet packaged 3.4 for CentOS 5.x, from what I can tell.  I have the package from the rpmforge-extras repo, and 3.3.2 is still the most current version there (and on Atomic and AtRPMs).

I'm not sure who is responsible for updating the packages, but I'll probably have to wait a while until they get 3.4 uploaded there.

> Assuming you can check maillogs and can either detect some spammed unknown user patterns or have  a dedicated trap domain to spare, I'd accept that mail and write some header rules to score the trap rcpt/domain REAL high and use a rule like
> 
> tflags RULENAME autolearn_force

I'm not entirely sure what you mean here.  Are you saying to use a honeypot/spamtrap to feed the Bayes DB?  My problem is not that my Bayes DB doesn't have enough spam in it, it's that these particular FNs are scoring 00.  Let me note that the Bayes DBs are per-user, not per-domain.  Here's the magic output from my Bayes DB:

0.000          0          3          0  non-token data: bayes db version
0.000          0     239650          0  non-token data: nspam
0.000          0      85695          0  non-token data: nham
0.000          0     145773          0  non-token data: ntokens
0.000          0 1387110367          0  non-token data: oldest atime
0.000          0 1392917375          0  non-token data: newest atime
0.000          0 1392886526          0  non-token data: last journal sync atime
0.000          0 1392637273          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       9005          0  non-token data: last expire reduction count

I don't think this counts as a "small" DB, does it?
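
To compare dumps like the one above without reading columns by eye, the "non-token data" lines can be parsed into a dictionary. This is a sketch based only on the layout shown in this thread; parse_magic is a hypothetical helper name, and it assumes the value always sits in the column just before the final 0.

```python
def parse_magic(dump_text):
    """Map each 'non-token data' label to its integer value."""
    values = {}
    for line in dump_text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, label = line.split("non-token data:", 1)
        columns = fields.split()
        # In the output shown above the value is the second-to-last column;
        # indexing from the end also tolerates a leading "> " quote marker.
        values[label.strip()] = int(columns[-2])
    return values


SAMPLE = """\
0.000          0     239650          0  non-token data: nspam
0.000          0      85695          0  non-token data: nham
0.000          0    5529600          0  non-token data: last expire atime delta
"""

magic = parse_magic(SAMPLE)
spam_share = magic["nspam"] / (magic["nspam"] + magic["nham"])
expire_days = magic["last expire atime delta"] / 86400
print(magic["nspam"], magic["nham"], round(spam_share, 2), expire_days)
```

Here the expiry delta of 5529600 seconds works out to 64 days.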

Bayes is set to autolearn, and I manually run sa-learn about once a week on my spam folder (to learn the FNs, plus lower-scoring spam that was not autolearned).  MANY such image spams are caught properly, including by Bayes; the problem is that some of them, somehow, manage to slip through and score very low (00 or 20).  I just have no idea how that is happening (which is why I should start enabling token output in the headers and look), but that's why I was thinking of scoring AC_SPAMMY_URI_PATTERNS very high if Bayes is scoring very low, although I guess that kind of defeats the purpose of Bayes and introduces the risk of FPs.

-- Amir

Re: Increase in Image Spam

Posted by Axb <ax...@gmail.com>.

On 02/20/2014 06:22 PM, Amir Caspi wrote:
> On Feb 20, 2014, at 10:15 AM, Axb <ax...@gmail.com> wrote:
>
>> What kind of traffic are you dealing with? personal, corporate?
>> ISPish? How many domains/users/msgs/day?
>
> This is mostly personal email with a little bit of corporate.  In
> this instance, it is for a single domain with 3 users and
> approximately 50-100 total legitimate messages per day (but HUNDREDS
> of spams per day, most of which are properly classified; I am seeing
> only a few [<10] FNs per day, although those FNs are, as I described,
> getting Bayes_00... they are almost always image spam with not much
> text.)
>
> I do have a number of other domains but I don't monitor the spam
> quality on those actively (and I haven't received complaints).


In your case this is what I'd do.

I hope you're running SA 3.4 so:

Assuming you can check maillogs and can either detect some spammed 
unknown user patterns or have  a dedicated trap domain to spare, I'd 
accept that mail and write some header rules to score the trap 
rcpt/domain REAL high and use a rule like

tflags RULENAME autolearn_force

obviously you'll need
bayes_auto_learn  1


That would help feed your small Bayes DB pretty fast and help detect all 
kinds of crap.

h2h

Re: Increase in Image Spam

Posted by Amir Caspi <ce...@3phase.com>.

On Feb 20, 2014, at 10:15 AM, Axb <ax...@gmail.com> wrote:

> What kind of traffic are you dealing with? personal, corporate? ISPish?
> How many domains/users/msgs/day?

This is mostly personal email with a little bit of corporate.  In this instance, it is for a single domain with 3 users and approximately 50-100 total legitimate messages per day (but HUNDREDS of spams per day, most of which are properly classified; I am seeing only a few [<10] FNs per day, although those FNs are, as I described, getting Bayes_00... they are almost always image spam with not much text.)

I do have a number of other domains but I don't monitor the spam quality on those actively (and I haven't received complaints).

Thanks.

--- Amir

Re: Increase in Image Spam

Posted by Axb <ax...@gmail.com>.

On 02/20/2014 06:06 PM, Amir Caspi wrote:
> Hi all,
>
> 	Following some off-list discussions with Kevin, John, et al., I had a question that was suggested I bring up on-list, so here it is:
>
> 	For whatever reason, many of the FNs I've been getting lately are passing because they hit BAYES_00, even though they are matching AC_SPAMMY_URI_PATTERNS.  I need to enable bayes tokens in the headers so I can see why these are considered so hammy when I know for sure they're not...
>
> 	But, I would love if there were a way to ignore the bayes score if AC_SPAMMY_URI_PATTERNS matches.  I know this is rather silly -- the whole point of Bayes is to help determine if an email is spam or ham regardless of the other rules -- but I'm just flummoxed by having these obviously-spammy emails being treated as ham.
>
> 	Should I create a rule that adds extra points if AC_SPAMMY_URI_PATTERNS hits AND a low Bayes score is found?  Or should I just make AC_SPAMMY_URI_PATTERNS a poison pill, since I've never gotten an FP out of it?  Not sure what else to do about these Bayes-killing spams (besides wiping my entire Bayes DB and starting over).
>
> Thoughts?

Amir,

What kind of traffic are you dealing with? personal, corporate? ISPish?
How many domains/users/msgs/day?

There are a number of options depending on the amount of traffic you handle.

Re: Increase in Image Spam

Posted by Amir Caspi <ce...@3phase.com>.

Hi all,

	Following some off-list discussions with Kevin, John, et al., I had a question that was suggested I bring up on-list, so here it is:

	For whatever reason, many of the FNs I've been getting lately are passing because they hit BAYES_00, even though they are matching AC_SPAMMY_URI_PATTERNS.  I need to enable bayes tokens in the headers so I can see why these are considered so hammy when I know for sure they're not...

	But, I would love if there were a way to ignore the bayes score if AC_SPAMMY_URI_PATTERNS matches.  I know this is rather silly -- the whole point of Bayes is to help determine if an email is spam or ham regardless of the other rules -- but I'm just flummoxed by having these obviously-spammy emails being treated as ham.

	Should I create a rule that adds extra points if AC_SPAMMY_URI_PATTERNS hits AND a low Bayes score is found?  Or should I just make AC_SPAMMY_URI_PATTERNS a poison pill, since I've never gotten an FP out of it?  Not sure what else to do about these Bayes-killing spams (besides wiping my entire Bayes DB and starting over).

Thoughts?

Thanks.

--- Amir

Re: Increase in Image Spam

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/11/2014 2:02 PM, John Hardin wrote:
> On Tue, 11 Feb 2014, Amir Caspi wrote:
>
>> I could release the rules publicly but that may end up backfiring, 
>> per above.  John, Kevin, what do you guys think?
>
> Spammers can install SpamAssassin as easily as anyone else, that's a 
> known risk. Any rules we provide they can potentially test against 
> their spams to minimize score.
>
> How much they actually *do* this I can't say.
>
> We could try it with one of your rules, and if it suddenly stops 
> hitting then the spammers are reacting.
>
> I think it has value, even if they do react.
I agree with John's assessment.

Re: Increase in Image Spam

Posted by John Hardin <jh...@impsec.org>.

On Tue, 11 Feb 2014, Amir Caspi wrote:

> I could release the rules publicly but that may end up backfiring, per 
> above.  John, Kevin, what do you guys think?

Spammers can install SpamAssassin as easily as anyone else, that's a known 
risk. Any rules we provide they can potentially test against their spams 
to minimize score.

How much they actually *do* this I can't say.

We could try it with one of your rules, and if it suddenly stops hitting 
then the spammers are reacting.

I think it has value, even if they do react.


Re: Increase in Image Spam

Posted by Amir Caspi <ce...@3phase.com>.

On Feb 11, 2014, at 10:25 AM, Andy Jezierski <AJ...@stepan.com> wrote:
> They don't really hit on any rules.... 

A number of image spams have certain template formats and I've written custom rules to catch many... however, I've been hesitant to release those rules publicly since spammers could just change their templates easily to circumvent this.  (Most image spams for me hit moderate or very low Bayes scores, sometimes Bayes_00, presumably due to the low amount of spammy tokens and large amount of innocuous/hammy tokens...)

I could release the rules publicly but that may end up backfiring, per above.  John, Kevin, what do you guys think?

--- Amir

Re: Increase in Image Spam

Posted by Benny Pedersen <me...@junc.eu>.

On 2014-02-11 20:59, RW wrote:

> Actually I find BAYES_99 to be so reliable that I'd be happy to score
> it above 5.0. Others have made similar comments too.

there are a number of ways to punish spf pass domains for spamming :)

blacklist_from *@foo.example.org

and for bayes one could make another meta like:

meta NOT_BAYES_HAM_SPF_PASS (!BAYES_00 && SPF_PASS)

or simply reject the sender domain in the mta

Re: Increase in Image Spam

Posted by RW <rw...@googlemail.com>.

On Tue, 11 Feb 2014 20:22:00 +0100
Benny Pedersen wrote:

> On 2014-02-11 18:25, Andy Jezierski wrote:
> 
> > They don't really hit on any rules....
> > 
> > X-Spam-Status: No, score=3.5 required=5.0 tests=BAYES_99,HTML_MESSAGE,
> >         SPF_HELO_PASS,SPF_PASS autolearn=no autolearn_force=no
> >         version=3.4.0-rc5
> 
> bayes is seeing it as spam, so it might be in vain :)
> 
> well, if bayes is well trained you can add more score to that hit
> with a meta, but also maybe meta it with "not user in spf whitelist"
> or something?

Actually I find BAYES_99 to be so reliable that I'd be happy to score
it above 5.0. Others have made similar comments too.
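
To see how far the quoted message was from the threshold, the X-Spam-Status header can be pulled apart as below. This is a sketch built only on the header shown in this thread; parse_spam_status is a hypothetical helper, and the header does not list per-rule scores, so it says nothing about how much BAYES_99 itself contributed.

```python
import re


def parse_spam_status(header_value):
    """Return (score, required, tests) from an X-Spam-Status header value."""
    # Unfold continuation lines, then rejoin a tests list split after a comma.
    flat = " ".join(header_value.split())
    flat = re.sub(r",\s+", ",", flat)
    score = float(re.search(r"score=(-?\d+(?:\.\d+)?)", flat).group(1))
    required = float(re.search(r"required=(-?\d+(?:\.\d+)?)", flat).group(1))
    tests = re.search(r"tests=(\S*)", flat).group(1).split(",")
    return score, required, tests


HEADER = """No, score=3.5 required=5.0 tests=BAYES_99,HTML_MESSAGE,
        SPF_HELO_PASS,SPF_PASS autolearn=no autolearn_force=no
        version=3.4.0-rc5"""

score, required, tests = parse_spam_status(HEADER)
print(score, required, required - score, tests)
```

The message fell 1.5 points short of the 5.0 threshold even though BAYES_99 hit, which is the gap a higher BAYES_99 score would have to close.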

Re: Increase in Image Spam

Posted by Benny Pedersen <me...@junc.eu>.

On 2014-02-11 18:25, Andy Jezierski wrote:

> They don't really hit on any rules....
> 
> X-Spam-Status: No, score=3.5 required=5.0 tests=BAYES_99,HTML_MESSAGE,
>         SPF_HELO_PASS,SPF_PASS autolearn=no autolearn_force=no
>         version=3.4.0-rc5

bayes is seeing it as spam, so it might be in vain :)

well, if bayes is well trained you can add more score to that hit with a 
meta, but also maybe meta it with "not user in spf whitelist" or something?

e.g. if an spf pass domain is spamming, remove its whitelist entry for 
that envelope sender (not the From: header) from local.cf

meta UNTRUSTED_SPF_PASS (SPF_PASS && !USER_IN_SPF_WHITELIST)

score based on that meta

for this to be useful, add whitelist_from_spf *@foo.example.com to 
local.cf for sender domains that are not spamming

the same meta can be made with dkim