Posted to users@spamassassin.apache.org by "Sharma, Ashish" <as...@hp.com> on 2010/06/21 21:25:54 UTC

unable to find logic behind spamassassin rule

Hi,

I have the latest version of SpamAssassin, but I am unable to find the logic behind the following rule and its high spam score.

MANY_SPAN_IN_TEXT 3.099


Can anybody give a reason?

Thanks in advance

Ashish Sharma

Re: unable to find logic behind spamassassin rule

Posted by John Hardin <jh...@impsec.org>.
On Mon, 21 Jun 2010, Bowie Bailey wrote:

> Michael Scheidell wrote:
>> On 6/21/10 3:25 PM, Sharma, Ashish wrote:
>>>
>>> I have the latest version of spamassassin, I am unable to find the
>>> logic behind the following rule and its high spam score.
>>>
>>> MANY_SPAN_IN_TEXT 3.099
>>>
>>> Can anybody give a reason?
>>
>> 72_scores.cf:score MANY_SPAN_IN_TEXT         1.862 2.398 1.862 2.398
>
> 72_active.cf:rawbody        __SPAN_BEG_TEXT     /[a-z]{2}<(?i:span)\s/
> 72_active.cf:tflags         __SPAN_BEG_TEXT     multiple
> 72_active.cf:rawbody        __SPAN_END_TEXT     /[^;>]<\/(?i:span)>[a-z]{3}/
> 72_active.cf:tflags         __SPAN_END_TEXT     multiple
>
> In other words, the message has more than 4 <span> tags and more than 4
> </span> tags.

It's slightly more than that. There aren't just <span> tags, there are 
<span> tags embedded within lowercase text. It appears to be a way to try 
to break pattern matching on spammy words, by dropping a <span></span> tag 
pair in the middle:

    via<span>sausage</span>gra

this renders visually as a single word commonly seen in pharma spam, but a 
naive string matching spam filter may be spoofed and miss it.
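
To illustrate the point above, here is a minimal Python sketch (not SpamAssassin code). The two patterns are the ones quoted in this thread, translated to Python's re syntax; the sample string and the crude tag stripping are assumptions made up for the demonstration. Note that the opening pattern as quoted requires whitespace after "span", so it only fires on a tag that carries an attribute.

```python
import re

# Sub-rule patterns quoted in this thread, translated to Python's re syntax.
SPAN_BEG_TEXT = re.compile(r'[a-z]{2}<(?i:span)\s')
SPAN_END_TEXT = re.compile(r'[^;>]</(?i:span)>[a-z]{3}')

# Hypothetical sample: lowercase text interrupted by a span pair with an attribute.
raw = 'spam<span class="x">my</span>word'

# A plain substring search on the raw HTML misses the text.
naive_hit = 'spammyword' in raw

# Crude tag stripping (illustration only) recovers the text a reader would see.
stripped = re.sub(r'<[^>]+>', '', raw)

# Both sub-rule patterns fire on the raw body.
beg_hit = SPAN_BEG_TEXT.search(raw) is not None
end_hit = SPAN_END_TEXT.search(raw) is not None

print(naive_hit, stripped, beg_hit, end_hit)
```

The sample is deliberately tiny; real messages would trip the patterns many times, which is what the meta rule counts.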

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79

Re: unable to find logic behind spamassassin rule

Posted by Bowie Bailey <Bo...@BUC.com>.
Michael Scheidell wrote:
> On 6/21/10 3:25 PM, Sharma, Ashish wrote:
>> Hi,
>>
>> I have the latest version of spamassassin, I am unable to find the
>> logic behind the following rule and its high spam score.
>>
>> MANY_SPAN_IN_TEXT 3.099
>>
>>
>> Can anybody give a reason?
>>
>>    
> grep MANY_SPAN_IN_TEXT *
> 72_active.cf:##{ MANY_SPAN_IN_TEXT
> 72_active.cf:meta           MANY_SPAN_IN_TEXT   (__SPAN_BEG_TEXT > 4) && (__SPAN_END_TEXT > 4)
> 72_active.cf:describe       MANY_SPAN_IN_TEXT   Many <SPAN> tags embedded within text
> 72_active.cf:##} MANY_SPAN_IN_TEXT
> 72_scores.cf:score MANY_SPAN_IN_TEXT                     1.862 2.398 1.862 2.398

72_active.cf:rawbody        __SPAN_BEG_TEXT     /[a-z]{2}<(?i:span)\s/
72_active.cf:tflags         __SPAN_BEG_TEXT     multiple
72_active.cf:rawbody        __SPAN_END_TEXT     /[^;>]<\/(?i:span)>[a-z]{3}/
72_active.cf:tflags         __SPAN_END_TEXT     multiple


In other words, the message has more than 4 <span> tags and more than 4
</span> tags.  The scores are generated automatically based on the fact
that this pattern matches much more often on spam messages than on ham
messages.  If it is causing problems for you, you can override the score
in your local.cf file like this:

score MANY_SPAN_IN_TEXT 1.0

Use whatever score you want.  A score of 0 will disable the rule.
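
As a rough illustration of how the meta rule combines the two sub-rules, here is a Python sketch. It is an approximation, not SpamAssassin's implementation: it counts matches over one string, whereas SpamAssassin applies rawbody rules to the message in its own chunks. The function name many_span_in_text and the sample text are made up for the example.

```python
import re

# Sub-rule patterns quoted in this thread, translated to Python's re syntax.
SPAN_BEG_TEXT = re.compile(r'[a-z]{2}<(?i:span)\s')
SPAN_END_TEXT = re.compile(r'[^;>]</(?i:span)>[a-z]{3}')

def many_span_in_text(rawbody: str) -> bool:
    """Approximate the meta rule: both sub-rules must hit more than 4 times."""
    # "tflags multiple" means every match counts, not just the first one.
    beg_count = len(SPAN_BEG_TEXT.findall(rawbody))
    end_count = len(SPAN_END_TEXT.findall(rawbody))
    return beg_count > 4 and end_count > 4

# Hypothetical fragment containing one opening hit and one closing hit.
piece = 'ab<span class="x">cd</span>efg '

print(many_span_in_text(piece * 4))  # four hits each: threshold not exceeded
print(many_span_in_text(piece * 5))  # five hits each: threshold exceeded
```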

-- 
Bowie

Re: unable to find logic behind spamassassin rule

Posted by Michael Scheidell <sc...@secnap.net>.
On 6/21/10 3:25 PM, Sharma, Ashish wrote:
> Hi,
>
> I have the latest version of spamassassin, I am unable to find the logic behind the following rule and its high spam score.
>
> MANY_SPAN_IN_TEXT 3.099
>
>
> Can anybody give a reason?
>
>    
grep MANY_SPAN_IN_TEXT *
72_active.cf:##{ MANY_SPAN_IN_TEXT
72_active.cf:meta           MANY_SPAN_IN_TEXT   (__SPAN_BEG_TEXT > 4) && (__SPAN_END_TEXT > 4)
72_active.cf:describe       MANY_SPAN_IN_TEXT   Many <SPAN> tags embedded within text
72_active.cf:##} MANY_SPAN_IN_TEXT
72_scores.cf:score MANY_SPAN_IN_TEXT                     1.862 2.398 1.862 2.398


> Thanks in advance
>
> Ashish Sharma
>    


-- 
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
SECNAP Network Security Corporation

    * Certified SNORT Integrator
    * 2008-9 Hot Company Award Winner, World Executive Alliance
    * Five-Star Partner Program 2009, VARBusiness
    * Best Anti-Spam Product 2008, Network Products Guide
    * King of Spam Filters, SC Magazine 2008

______________________________________________________________________
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/
______________________________________________________________________  

Re: unable to find logic behind spamassassin rule

Posted by Michael Scheidell <li...@secnap.com>.
On 6/21/10 3:25 PM, Sharma, Ashish wrote:
> Hi,
>
> I have the latest version of spamassassin, I am unable to find the logic behind the following rule and its high spam score.
>
> MANY_SPAN_IN_TEXT 3.099
>
>
>    
As for the scoring, it is generated automatically by checking how much spam, versus ham, has more than 4 <span>jlkjlkj</span> tags embedded in text.

The current scoring is:

score MANY_SPAN_IN_TEXT 1.862 2.398 1.862 2.398

Which of the four values applies depends on whether network tests and Bayes are enabled.
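
For anyone wondering how the four values on a score line are chosen, here is a small Python sketch. It assumes the ordering described in the SpamAssassin configuration documentation: first value with Bayes and network tests both off, second with network tests only, third with Bayes only, fourth with both on. The helper name pick_score is made up for the illustration.

```python
# The four values from the score line quoted in this thread.
SCORES = (1.862, 2.398, 1.862, 2.398)

def pick_score(scores, bayes_enabled, net_enabled):
    """Pick one of four scores; assumed order: none, net only, Bayes only, both."""
    index = (2 if bayes_enabled else 0) + (1 if net_enabled else 0)
    return scores[index]

# Example: Bayes disabled, network tests enabled selects the second value.
print(pick_score(SCORES, bayes_enabled=False, net_enabled=True))
```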

> Can anybody give a reason?
>
> Thanks in advance
>
> Ashish Sharma
>    
