Posted to users@spamassassin.apache.org by Shane Metler <me...@homes.com> on 2004/09/20 21:51:34 UTC

'body', 'uri', 'rawbody' rules ...

Hi there,

Using SpamAssassin 2.64, I have found a few cases where the target
domain of a custom rule cannot be matched via any of these three rule
types.

The target of my rule is a plain-text (non-HREF) URL that is
obfuscated in a way that seems to evade my rules.

My rule (I have both rawbody and body rules with this same pattern,
but the body rule is the one I think should be working):

describe	SKM_SPAM_LIST_B_236		SKM Rules
body		SKM_SPAM_LIST_B_236		m/jgsgfta\.com/i
score		SKM_SPAM_LIST_B_236		50.0

The email in question is Base64 encoded.

The Base64-decoded text has the URL not as an HREF, but as plain
text. It does have HTML tags placed within the text to break up the
URL when viewed as source.

<font face="Arial, Helvetica, sans-serif"><br>
        <br>
        <font color="#0000FF"
size="2"><strong>http://www.jg<ksbloatcasualtycoyoteplastisoldiscusskais
eroceanographylevelstonylingeriesubmersiblenightshirtbromfieldbritannicc
redenzaexcretorycentigradelamentationbawddilettantehydrogenateanonymityv
ellaurethracoastal>sgf<a
target="_blank"larmhallmarkhopemicrographytinkermiscreantmacedoniabarbar
ismcorpsmenhabitationdesertinflationaryexpatiateupburkesuperstitiousgrav
itatepuscircabalddoomsdaytrinidadgold>ta.com/index.php?ID=JNL</strong></
font></font><br>

Shouldn't the 'body' type rule remove the HTML markup, so the remaining
text is "http://www.jgsgfta.com/index.php?ID=JNL" ?

I thought that should be the case, but these messages keep coming
through ...

Any clarification or suggestions would be most appreciated!

Thanks in advance,
Shane


Re: 'body', 'uri', 'rawbody' rules ...

Posted by Matt Kettler <mk...@evi-inc.com>.
At 03:51 PM 9/20/2004, Shane Metler wrote:
>describe        SKM_SPAM_LIST_B_236             SKM Rules
>body            SKM_SPAM_LIST_B_236             m/jgsgfta\.com/i
>score           SKM_SPAM_LIST_B_236             50.0

Style note: the m modifier to regexes is pointless for body rules. All
EOLs are stripped out of the message prior to running body rules.

>Shouldn't the 'body' type rule remove the HTML markup, so the remaining
>text is "http://www.jgsgfta.com/index.php?ID=JNL" ?

Yes, and your rule works for me when I paste your sample HTML into a 
text/plain email.
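
As a rough illustration of why the rule should hit once the markup is gone, here is a minimal Python sketch. It is not SpamAssassin's HTML renderer: strip_tags() is a hypothetical helper that crudely drops everything between '<' and the next '>', and SAMPLE is a shortened copy of the posted HTML with the filler words abbreviated.

```python
import re

# Shortened copy of the sample from the post; the filler words inside the
# bogus tags are abbreviated here (assumption made for readability).
SAMPLE = (
    '<font color="#0000FF" size="2"><strong>http://www.jg'
    '<ksbloatcasualtycoyote>sgf<a\n'
    'target="_blank"larmhallmarkhope>ta.com/index.php?ID=JNL</strong></font>'
)

def strip_tags(html: str) -> str:
    """Crude tag removal: delete everything from '<' up to the next '>'.

    Hypothetical helper for illustration only; SpamAssassin's own HTML
    rendering is more involved than this.
    """
    return re.sub(r"<[^>]*>", "", html)

# Same pattern as the body rule in the original post, case-insensitive.
RULE = re.compile(r"jgsgfta\.com", re.IGNORECASE)

text = strip_tags(SAMPLE)
print(text)
print("raw HTML matches:     ", bool(RULE.search(SAMPLE)))
print("stripped text matches:", bool(RULE.search(text)))
```

The raw HTML does not contain the domain as one contiguous string, while the stripped text does, which is the behaviour expected of a 'body' rule compared with a 'rawbody' rule.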

Any chance you can provide an mbox file of the exact email that's a problem?

Base64 encoding shouldn't be a problem for SA, but I'm wondering if there's
some weirdness in the encoding or the HTML that isn't obvious from the HTML
you posted.
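
One way to look for that kind of weirdness outside SA is to decode the message yourself and inspect the result. The sketch below uses Python's standard email package; decoded_text_parts() is a hypothetical helper, and the demo message is made up for illustration rather than taken from the real spam.

```python
import base64
from email import message_from_string

def decoded_text_parts(raw_message: str):
    """Yield (content_type, decoded_text) for each text/* part.

    Hypothetical helper: undoes the Content-Transfer-Encoding (e.g.
    Base64) so the part can be inspected as plain characters.
    """
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and attachments
        payload = part.get_payload(decode=True)  # bytes, or None
        if payload is None:
            continue
        charset = part.get_content_charset() or "latin-1"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to a lossless 8-bit decode.
            text = payload.decode("latin-1")
        yield part.get_content_type(), text

# Made-up demo message (assumption): a Base64-encoded text/html body.
html = "<b>http://www.jg<x>sgf<y>ta.com/</b>"
encoded = base64.b64encode(html.encode("ascii")).decode("ascii")
raw = (
    "From: sender@example.com\n"
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="us-ascii"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n" + encoded + "\n"
)

parts = list(decoded_text_parts(raw))
for content_type, text in parts:
    print(content_type)
    print(text)
```

Running this against the problem message (read from the mbox file) would show whether the decoded part really contains the HTML as posted, or something subtly different.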



RE: 'body', 'uri', 'rawbody' rules ...

Posted by Shane Metler <me...@homes.com>.
Thank you for the tip!

These messages keep squeaking through, and now I feel like I HAVE to
stop them.
:O)

Shane

-----Original Message-----
From: Loren Wilton [mailto:lwilton@earthlink.net] 
Sent: Tuesday, September 21, 2004 2:02 AM
To: users@spamassassin.apache.org
Subject: Re: 'body', 'uri', 'rawbody' rules ...


> Shouldn't the 'body' type rule remove the HTML markup, so the 
> remaining text is "http://www.jgsgfta.com/index.php?ID=JNL" ?

It should.  But on 2.63/64 it will also stick a "URI:" in there
somewhere, just to screw up things if you aren't expecting it.

Try this:

body  LW_PRINTIT   /(^.*$)(?{ print "Body:\n$^N\nEnd Body\n\n" })/i

That will print out the decoded body.  Then you can see what you are
working with.

        Loren


Re: 'body', 'uri', 'rawbody' rules ...

Posted by Loren Wilton <lw...@earthlink.net>.
> Shouldn't the 'body' type rule remove the HTML markup, so the remaining
> text is "http://www.jgsgfta.com/index.php?ID=JNL" ?

It should.  But on 2.63/64 it will also stick a "URI:" in there somewhere,
just to screw up things if you aren't expecting it.

Try this:

body  LW_PRINTIT   /(^.*$)(?{ print "Body:\n$^N\nEnd Body\n\n" })/i

That will print out the decoded body.  Then you can see what you are working
with.

        Loren