Posted to users@spamassassin.apache.org by Amir 'CG' Caspi <ce...@3phase.com> on 2013/08/10 22:10:36 UTC

Re: LONGWORDS not hitting?

At 12:42 PM -0600 06/30/2013, Amir 'CG' Caspi wrote:
>Hi all,
>
>	Just got this spam:
>
>http://pastebin.com/KM5paaZ9
>
>To me, it looks like LONGWORDS should have hit... but it didn't.  I 
>ran it manually through spamassassin and spamc, and LONGWORDS still 
>didn't hit, so it seems to just not be hitting that rule.  But, to 
>my eye, it looks like it should.  Any idea why it failed, and should 
>LONGWORDS be updated?

OK, more info and potentially new problem.  I re-tested one of the 
spams I posted yesterday:
http://pastebin.com/VCtvzjzV

When running this example through SA (either SA standalone, or 
spamc/spamd) now, LONGWORDS hits, as follows:

Aug 10 15:47:20.115 [21805] dbg: rules: ran body rule __LONGWORDS_C 
======> got hit: "authenticate dearth deplorers hogmane 
fraudulentness going pillowcases believing vagotomy mastoidectomies "
Aug 10 15:46:20.613 [21757] dbg: rules: ran body rule __LONGWORDS_B 
======> got hit: "family husbandry allowed walloper little length 
voluntaries weothao sternward "

... BUT... this pastebin example is the copy/paste of "view raw 
source" from my MUA.  If I run SA on the original server-side email 
(i.e. the email as stored in my IMAP mailbox), LONGWORDS does _NOT_ 
hit.  That is, neither _C nor _B hit on the server-side version, 
despite hitting on the MUA version.

For your perusal, I've copied the output of SA when running on the 
server-side version, i.e. with all MIME content fully intact... see 
here:

http://pastebin.com/keNi5BjN

What the heck is going on?  Why would LONGWORDS hit on the MUA 
version but not the server-side?  Since LONGWORDS is a rawbody rule, 
not based on headers, it seems like it should pop on both versions. 
I'm guessing that there's something about the MIME content that's 
making LONGWORDS fail to hit on the server-side (MBX) email, but 
allows it to hit on the MUA ("view raw source") email... but I just 
don't understand why that would be.

I've had LONGWORDS hit at the server-side (pre-MUA) level, though not 
very often (only 4 out of 465 messages currently in my spam box), so 
it _is_ running... but for whatever reason, LONGWORDS hits much more 
often (i.e. as it should) with the MUA "raw source" versions than it 
does with server-side (MBOX/MBX) versions, so this is not an isolated 
occurrence.

So WTF is going on?  Does anyone have ideas?  To my eyeballs, the 
exact same text is contained in both versions and therefore should 
hit LONGWORDS in either version, but only one version pops.

I'm happy to paste more debug output if it might help someone debug the rule.

Thanks in advance.

						--- Amir

Re: LONGWORDS not hitting?

Posted by Amir 'CG' Caspi <ce...@3phase.com>.
At 1:43 PM +0100 08/24/2013, RW wrote:
>LONGWORDS is a body rule, i.e. it runs on a normalized version of the

Gah, THAT'S why it wasn't working?  I feel like an idiot now. =P

						--- Amir

Re: LONGWORDS not hitting?

Posted by RW <rw...@googlemail.com>.
On Sat, 24 Aug 2013 00:23:17 -0600
Amir 'CG' Caspi wrote:

> Hi all,
> 
> 	Since it's been a couple of weeks with no reply, I thought I 
> might ask this again.  See below.
> 	Do I need to file a bug for SA?  Is this something obvious 
> that I'm missing?  Does the LONGWORDS rule need an update?

LONGWORDS is a body rule, i.e. it runs on a normalized version of the
rendered text. Neither Bayes nor LONGWORDS sees any of the words
you're looking at.

You could try writing a separate rawbody rule, but it would see all
of the HTML and not just the comments.
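
To make the body/rawbody distinction concrete, here is a minimal Python
sketch. It is not SpamAssassin code: the LONG_RUN pattern is a hypothetical
stand-in for a "run of long words" test (not the real LONGWORDS regex), the
sample message is made up, and rendered_text() only roughly approximates
the rendered text a body rule sees by dropping tags and comments.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for a "run of long words" pattern; this is NOT the
# real LONGWORDS regex shipped with SpamAssassin.
LONG_RUN = re.compile(r"(?:\b[a-z]{6,}\s+){5}", re.IGNORECASE)


class VisibleText(HTMLParser):
    """Collect only text nodes; tags and comments are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def rendered_text(html):
    """Rough approximation of the text a body-style rule would see."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so the result resembles normalized text.
    return " ".join(" ".join(parser.chunks).split())


# Made-up sample: the long words sit inside an HTML comment.
raw_html = (
    "<html><body><p>Click here now</p>"
    "<!-- authenticate deplorers fraudulentness pillowcases believing vagotomy -->"
    "</body></html>"
)

body_hit = bool(LONG_RUN.search(rendered_text(raw_html)))
rawbody_hit = bool(LONG_RUN.search(raw_html))
print("body-style match:", body_hit)
print("rawbody-style match:", rawbody_hit)
```

In this sketch the pattern matches the raw HTML but not the rendered text,
because the long words only exist inside the comment. It also shows the
trade-off mentioned above: the raw view exposes every tag and attribute to
the pattern, not just the comments.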

Re: LONGWORDS not hitting?

Posted by Amir 'CG' Caspi <ce...@3phase.com>.
Hi all,

	Since it's been a couple of weeks with no reply, I thought I 
might ask this again.  See below.
	Do I need to file a bug for SA?  Is this something obvious 
that I'm missing?  Does the LONGWORDS rule need an update?

	It appears that LONGWORDS is failing to hit on the original 
(server-side, MBOX) email with all MIME components... but hits on the 
email once it has been interpreted as text by the MUA.  Something 
about the MIME-encoding is confusing LONGWORDS even though I can't 
see why, with my naked eye.
	Pastebin examples of both (server-side and MUA) versions are below.

Thanks.

						--- Amir

At 2:10 PM -0600 08/10/2013, Amir 'CG' Caspi wrote:
>At 12:42 PM -0600 06/30/2013, Amir 'CG' Caspi wrote:
>>Hi all,
>>
>>	Just got this spam:
>>
>>http://pastebin.com/KM5paaZ9
>>
>>To me, it looks like LONGWORDS should have hit... but it didn't.  I 
>>ran it manually through spamassassin and spamc, and LONGWORDS still 
>>didn't hit, so it seems to just not be hitting that rule.  But, to 
>>my eye, it looks like it should.  Any idea why it failed, and 
>>should LONGWORDS be updated?
>
>OK, more info and potentially new problem.  I re-tested one of the 
>spams I posted yesterday:
>http://pastebin.com/VCtvzjzV
>
>When running this example through SA (either SA standalone, or 
>spamc/spamd) now, LONGWORDS hits, as follows:
>
>Aug 10 15:47:20.115 [21805] dbg: rules: ran body rule __LONGWORDS_C 
>======> got hit: "authenticate dearth deplorers hogmane 
>fraudulentness going pillowcases believing vagotomy mastoidectomies "
>Aug 10 15:46:20.613 [21757] dbg: rules: ran body rule __LONGWORDS_B 
>======> got hit: "family husbandry allowed walloper little length 
>voluntaries weothao sternward "
>
>... BUT... this pastebin example is the copy/paste of "view raw 
>source" from my MUA.  If I run SA on the original server-side email 
>(i.e. the email as stored in my IMAP mailbox), LONGWORDS does _NOT_ 
>hit.  That is, neither _C nor _B hit on the server-side version, 
>despite hitting on the MUA version.
>
>For your perusal, I've copied the output of SA when running on the 
>server-side version, i.e. with all MIME content fully intact... see 
>here:
>
>http://pastebin.com/keNi5BjN
>
>What the heck is going on?  Why would LONGWORDS hit on the MUA 
>version but not the server-side?  Since LONGWORDS is a rawbody rule, 
>not based on headers, it seems like it should pop on both versions. 
>I'm guessing that there's something about the MIME content that's 
>making LONGWORDS fail to hit on the server-side (MBX) email, but 
>allows it to hit on the MUA ("view raw source") email... but I just 
>don't understand why that would be.
>
>I've had LONGWORDS hit at the server-side (pre-MUA) level, though 
>not very often (only 4 out of 465 messages currently in my spam 
>box), so it _is_ running... but for whatever reason, LONGWORDS hits 
>much more often (i.e. as it should) with the MUA "raw source" 
>versions than it does with server-side (MBOX/MBX) versions, so this 
>is not an isolated occurrence.
>
>So WTF is going on?  Does anyone have ideas?  To my eyeballs, the 
>exact same text is contained in both versions and therefore should 
>hit LONGWORDS in either version, but only one version pops.
>
>I'm happy to paste more debug output if it might help someone debug the rule.
>
>Thanks in advance.
>
>						--- Amir