Posted to users@spamassassin.apache.org by Jeremy Kister <sp...@jeremykister.com> on 2005/12/05 09:06:35 UTC

URIBL_SBL error

I noticed that after I sent an email, it got tagged with an incorrect rule:

 1.1 URIBL_SBL              Contains an URL listed in the SBL blocklist
                            [URIs: illas.com]

In fact, what I sent was a long list of email addresses at getawayvillas.com.

The messages are temporarily at http://jeremy.kister.net/tmp/

uribl_sbl.txt is the original message
uribl_sbl-sa.txt is the message after spamc processing.

Note: It's only the URIBL_SBL hit that I'm concerned with.

Any idea what's going on?


-- 

Jeremy Kister
http://jeremy.kister.net./

Re: URIBL_SBL error

Posted by Theo Van Dinter <fe...@apache.org>.
On Mon, Dec 05, 2005 at 12:02:49PM -0500, Matt Kettler wrote:
> I also tried editing the first line to "<censored>@getawayillas.com" and then
> all the HTTP URI bits and the domains to query became "llas.com"
> 
> Since when does "getawa" parse as http:/ ??
> 
> I think the parser is getting very confused.

No, the URI parser is actually working right.  Per my previous message,
the message gets turned into a paragraph which is too large and gets
split.  "llas.com" is then seen as a raw domain in the message, and it's
assumed to be http.
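
As a rough illustration of that last step, here is a small Python sketch.
It is not SpamAssassin's code (SA is written in Perl and its real URI
matching is far more thorough); the names BARE_DOMAIN and guess_uris are
hypothetical, and the three-TLD list is a simplifying assumption.  It only
shows why a fragment that begins a line looks like a schemeless web
address, while the same text after an "@" does not.

```python
import re

# Hypothetical, simplified pattern: a hostname ending in one of a few TLDs,
# not preceded by a word character, "@", "." or "-" (so the domain part of
# an email address is skipped).
BARE_DOMAIN = re.compile(
    r"(?<![\w@.-])([a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|org))\b",
    re.IGNORECASE,
)


def guess_uris(line):
    """Return bare domains found in `line`, with http:// assumed."""
    return ["http://" + m.group(1) for m in BARE_DOMAIN.finditer(line)]


# The two halves of one over-long line after a mid-word split.
first_half = "spidb@getawayvillas.com 1 srahtz@getawayv"
second_half = "illas.com 1 stenman@getawayvillas.com"

print(guess_uris(first_half))   # []
print(guess_uris(second_half))  # ['http://illas.com']
```

In the second half, "illas.com" no longer has an "@" in front of it, so
nothing marks it as part of an email address.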

-- 
Randomly Generated Tagline:
Isn't "half-duplex" just an apartment?

Re: URIBL_SBL error

Posted by Matt Kettler <mk...@evi-inc.com>.
Jeremy Kister wrote:
> I noticed that after I sent an email, it got tagged with an incorrect rule:
> 
>  1.1 URIBL_SBL              Contains an URL listed in the SBL blocklist
>                             [URIs: illas.com]
> 
> in fact, what I sent was a lot of email addresses at getawayvillas.com
> 
> the messages are temporarily at http://jeremy.kister.net/tmp/
> 
> uribl_sbl.txt is the original message
> uribl_sbl-sa.txt is the message after spamc processing.
> 
> Note: It's only the URIBL_SBL that i'm concerned with.
> 
> Any idea what's going on?


Looks like a bug in SA 3.1.0's parsing of mailto URIs.

Using SA 3.1.0 on your input, I get this debug output:


[23263] dbg: uri: parsed uri found, mailto:<CENSORED>@getawayvillas.com
[23263] dbg: uri: cleaned parsed uri, mailto:<CENSORED>@getawayvillas.com
[23263] dbg: uri: parsed domain, getawayvillas.com
[23263] dbg: uri: parsed uri found, illas.com
[23263] dbg: uri: cleaned parsed uri, illas.com
[23263] dbg: uri: cleaned parsed uri, http://illas.com
[23263] dbg: uri: parsed domain, illas.com
[23263] dbg: uri: parsed uri found, http://illas.com
[23263] dbg: uri: cleaned parsed uri, http://illas.com
[23263] dbg: uri: parsed domain, illas.com

<snip, whole bunch of the same>
[21710] dbg: uri: parsed uri found, mailto:<censored>@getawayvillas.com
[21710] dbg: uri: cleaned parsed uri, mailto:<censored>@getawayvillas.com
[21710] dbg: uri: parsed domain, getawayvillas.com
[21710] dbg: uridnsbl: domains to query: illas.com



I also tried editing the first line to "<censored>@getawayillas.com" and then
all the HTTP URI bits and the domains to query became "llas.com"

Since when does "getawa" parse as http:/ ??

I think the parser is getting very confused.

Re: URIBL_SBL error

Posted by Theo Van Dinter <fe...@apache.org>.
On Mon, Dec 05, 2005 at 03:06:35AM -0500, Jeremy Kister wrote:
>  1.1 URIBL_SBL              Contains an URL listed in the SBL blocklist
>                             [URIs: illas.com]
> 
> in fact, what I sent was a lot of email addresses at getawayvillas.com
> 
> Note: It's only the URIBL_SBL that i'm concerned with.
> 
> Any idea what's going on?

Interesting.  The issue is related to SA's internal maximum line length.
Basically the whole thing looks like one big paragraph, so SA puts it all on
one line, realizes the line is too long, and splits it.  The split, though,
doesn't happen at a word boundary; the line just gets cut mid-word:

[...]
jgb@getawayvillas.com 1 jgd@getawayvillas.com 1 jjs@getawayvillas.c
om 1 johninsd@getawayvillas.com 1 josh@getawayvillas.com
[...]
spidb@getawayvillas.com 1 srahtz@getawayv
illas.com 1 stenman@getawayvillas.com
[...]

I'd open a Bugzilla ticket about it.  We should probably try splitting at a
word boundary so we avoid issues like this.
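
To make the difference concrete, here is a minimal Python sketch of the two
splitting strategies.  It is only an illustration, not SA's implementation
(SA is Perl); the function names and the limit of 41 characters are
assumptions chosen so that the cut lands where it did in the excerpt above.

```python
def split_fixed(text, limit):
    """Cut every `limit` characters, ignoring word boundaries."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def split_at_whitespace(text, limit):
    """Cut at the last space at or before `limit`; hard-cut if there is none."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space in range: fall back to a hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip(" ")
    if text:
        chunks.append(text)
    return chunks


line = ("spidb@getawayvillas.com 1 srahtz@getawayvillas.com 1 "
        "stenman@getawayvillas.com")

for chunk in split_fixed(line, 41):
    print(repr(chunk))
# 'spidb@getawayvillas.com 1 srahtz@getawayv'
# 'illas.com 1 stenman@getawayvillas.com'

for chunk in split_at_whitespace(line, 41):
    print(repr(chunk))
# 'spidb@getawayvillas.com 1'
# 'srahtz@getawayvillas.com 1'
# 'stenman@getawayvillas.com'
```

With the whitespace-aware version every address stays in one piece, so no
chunk starts with a stray "illas.com".  A single token longer than the
limit would still be hard-cut, so this reduces the problem rather than
eliminating it.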

-- 
Randomly Generated Tagline:
Lowery's Law:
 	If it jams -- force it.  If it breaks, it needed replacing anyway.