Posted to users@spamassassin.apache.org by Craig Baird <cr...@xpressweb.com> on 2005/05/05 19:33:51 UTC

URIs being split over multiple lines

Most of my spam that's getting through at this point is stuff that has a URI 
with multiple carriage returns in it like this:

<A href="h
ttp://eafbfowksugw.org&ghikk2hnvo32i7d21gun%2Eetn
eanim
bme%2Ecom/">

I know this trick has been discussed.  I looked for a bug report, and couldn't 
find one on this particular thing.  I did find a thread in the archives about 
this, and a couple of rules were suggested, but someone mentioned that at 
least one of the rules results in a lot of FPs.  Is anyone aware of a rule 
that will catch these that doesn't trigger a lot of FPs?

Thanks!

Craig

Re: URIs being split over multiple lines

Posted by Martin Hepworth <ma...@solid-state-logic.com>.
Craig

I found SA 3.0.3 correctly spotted this and fed it to surbl.org URI-RBLs 
which trapped it.
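
As a rough illustration of why the line breaks need not hide the address from a URI blocklist lookup, the sketch below rejoins the href from the example, assuming the mail client simply drops CR and LF characters inside the attribute value. This is not SpamAssassin's parser; `rejoin_href` is a hypothetical helper written only for this example.

```python
import re

def rejoin_href(html):
    """Return href values with any CR/LF characters removed.

    Hypothetical helper for illustration only; it is not how
    SpamAssassin itself parses HTML.
    """
    # [^"] already matches CR and LF, so no extra flag is needed
    # for the value to span several physical lines.
    values = re.findall(r'href="([^"]*)"', html, flags=re.IGNORECASE)
    return [re.sub(r'[\r\n]+', '', value) for value in values]

# The example from the original message, with bare CRs at the breaks.
sample = ('<A href="h\rttp://eafbfowksugw.org&ghikk2hnvo32i7d21gun%2Eetn'
          '\reanim\rbme%2Ecom/">')

uris = rejoin_href(sample)
print(uris[0])
```

Once the breaks are removed, the value is an ordinary-looking URI that can be handed to whatever lookup is in use.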

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300


Craig Baird wrote:
> Most of my spam that's getting through at this point is stuff that has a URI 
> with multiple carriage returns in it like this:
> 
> <A href="h
> ttp://eafbfowksugw.org&ghikk2hnvo32i7d21gun%2Eetn
> eanim
> bme%2Ecom/">
> 
> I know this trick has been discussed.  I looked for a bug report, and couldn't 
> find one on this particular thing.  I did find a thread in the archives about 
> this, and a couple of rules were suggested, but someone mentioned that at 
> least one of the rules results in a lot of FPs.  Is anyone aware of a rule 
> that will catch these that doesn't trigger a lot of FPs?
> 
> Thanks!
> 
> Craig


Re: URIs being split over multiple lines

Posted by Loren Wilton <lw...@earthlink.net>.
> Then there's the other problem: rawbody rules seem to act on a
> line-by-line basis,

Yes.  By design.  :-(  (Which I consider broken design.)

> so you can look for /href=h$/ or /^ttp/ but not
> /href=h\nttp/

I'm pretty sure (but not positive) that the rawbody rule might have been
hitting in this case, because the line was broken with cr's rather than
actual newlines.  On the other hand, I may well have added the rawbody rule
as a duplicate of the full rule without thinking about it.  I tend to
duplicate most of my rawbody or full rules as both, since they will often
hit different cases.

        Loren

A quick check of last month's spam indicates several hundred hits on these
rules.  In every case it looks like both rules hit.


Re: URIs being split over multiple lines

Posted by Kelson <ke...@speed.net>.
Robert Menschel wrote:
> Best I've seen in a bunch of testing:
> rawbody   __LW_URI_CR1 /href=\"[^"]*\r[^\n]/is
> full      __LW_URI_CR2 /href=\"[^"]*\r[^\n]/is
> meta      LW_URI_CR  __LW_URI_CR1 || __LW_URI_CR2
> score     LW_URI_CR  2
> describe  LW_URI_CR  unescaped cr in uri
> #hist     LW_URI_CR  Loren Wilton
> #counts   LW_URI_CR  49s/0h of 292007 corpus (122219s/169788h RM) 04/27/05
> 
> Doesn't catch all of them, for reasons I haven't yet figured out, but
> catches some, and no FPs here.

I have yet to get any hits on this one in over a week, despite receiving 
several mails that look like they use this pattern.  From what I can 
tell, either the raw-CR spammers aren't targeting us, or something is 
converting them to newlines before SA gets to see it.

Then there's the other problem: rawbody rules seem to act on a 
line-by-line basis, so you can look for /href=h$/ or /^ttp/ but not 
/href=h\nttp/
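
The difference can be sketched outside SpamAssassin. The snippet below uses Python's `re` module rather than SA itself, and assumes the line-by-line behaviour described above; the sample body and variable names are made up for illustration, and a quote character is added to the patterns to fit the sample.

```python
import re

# Hypothetical sample: the href is split by a real newline.
body = '<A href="h\nttp://example.invalid/">'

# A pattern that needs to see both sides of the line break.
spanning = re.compile(r'href="h\nttp', re.IGNORECASE)

# Whole-text matching: the pattern can cross the newline.
whole_text_hit = spanning.search(body) is not None

# Line-by-line matching: each line is tested alone, so the
# spanning pattern never sees both halves at once.
lines = body.split("\n")
per_line_hit = any(spanning.search(line) for line in lines)

# Anchored per-line patterns can still match each half separately.
ends_with_h = any(re.search(r'href="h$', line, re.IGNORECASE) for line in lines)
starts_with_ttp = any(re.search(r'^ttp', line, re.IGNORECASE) for line in lines)

print(whole_text_hit, per_line_hit, ends_with_h, starts_with_ttp)
```

Under these assumptions only the whole-text search and the two anchored per-line patterns match; the spanning pattern fails when each line is tested on its own.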

-- 
Kelson Vibber
SpeedGate Communications <www.speed.net>

Re: URIs being split over multiple lines

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Craig,

Thursday, May 5, 2005, 10:33:51 AM, you wrote:

CB> Most of my spam that's getting through at this point is stuff that has a URI
CB> with multiple carriage returns in it like this:

CB> <A href="h
CB> ttp://eafbfowksugw.org&ghikk2hnvo32i7d21gun%2Eetn
CB> eanim
CB> bme%2Ecom/">

CB> I know this trick has been discussed.  I looked for a bug report, and couldn't
CB> find one on this particular thing.  I did find a thread in the archives about
CB> this, and a couple of rules were suggested, but someone mentioned that at
CB> least one of the rules results in a lot of FPs.  Is anyone aware of a rule
CB> that will catch these that doesn't trigger a lot of FPs?

Best I've seen in a bunch of testing:
rawbody   __LW_URI_CR1 /href=\"[^"]*\r[^\n]/is
full      __LW_URI_CR2 /href=\"[^"]*\r[^\n]/is
meta      LW_URI_CR  __LW_URI_CR1 || __LW_URI_CR2
score     LW_URI_CR  2
describe  LW_URI_CR  unescaped cr in uri
#hist     LW_URI_CR  Loren Wilton
#counts   LW_URI_CR  49s/0h of 292007 corpus (122219s/169788h RM) 04/27/05

Doesn't catch all of them, for reasons I haven't yet figured out, but
catches some, and no FPs here.
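
One property of the pattern that may account for some misses: it needs a CR that is not immediately followed by a newline, so it can only hit on bare CRs. The sketch below checks this with Python's `re` module (not Perl or SpamAssassin), on made-up sample strings; if something upstream turns the CRs into CRLF or LF, the pattern has nothing to match.

```python
import re

# The rule's pattern, transcribed for Python's re module.
# The /i and /s flags map to re.IGNORECASE and re.DOTALL.
rule = re.compile(r'href="[^"]*\r[^\n]', re.IGNORECASE | re.DOTALL)

# Hypothetical samples that differ only in the line-break style.
samples = {
    "bare CR": '<A href="h\rttp://example.invalid/">',
    "CRLF": '<A href="h\r\nttp://example.invalid/">',
    "LF only": '<A href="h\nttp://example.invalid/">',
}

hits = {name: rule.search(text) is not None for name, text in samples.items()}
for name, hit in hits.items():
    print(f"{name}: {'hit' if hit else 'no hit'}")
```

Only the bare-CR sample matches here.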

Bob Menschel




Re: URIs being split over multiple lines

Posted by Loren Wilton <lw...@earthlink.net>.
> Most of my spam that's getting through at this point is stuff that has a URI
> with multiple carriage returns in it like this:
>
> I know this trick has been discussed.  I looked for a bug report, and couldn't
> find one on this particular thing.  I did find a thread in the archives about

There was a bug report; I filed one.  I don't know its current status; it may
have been closed.

I think we may have a new SARE rule to catch this sort of thing.  I've
been using a simple rule here that hasn't fp'ed.  But then, I don't get a
huge amount of mail or spam. (Although spam is approximately 100 times the
volume of real mail.)

        Loren