Posted to users@spamassassin.apache.org by Wiebe Cazemier <ha...@gmx.net> on 2006/05/04 14:42:03 UTC

URI_NO_WWW_INFO_CGI rule

Hi,

What exactly does the URI_NO_WWW_INFO_CGI rule mean? The description is: 

        URI: CGI in .info TLD other than third-level "www"

I get false positives on mail that contains URIs in the .info TLD, like:

        http://foo.hello.info/forum/viewtopic.php?p=1

Does this rule mean that the webpage accessed by this URI is different than the
one accessed by:

        http://far.hello.info/forum/viewtopic.php?p=1

If so, I would find that rather strange, as the webpages will most likely
always be different, so any message with an .info URI will be marked as spam
(as the score is rather high: 4.1).

Any info is appreciated.

Regards,

Wiebe Cazemier


Re: URI_NO_WWW_INFO_CGI rule

Posted by Wiebe Cazemier <ha...@gmx.net>.
On Friday 05 May 2006 02:05, Loren Wilton wrote:

> Essentially ALL spam rules "can" misfire on legit mail.  In fact
> statistically most of them WILL misfire on some small percentage of legit
> mail.  If they are tested and scored reasonably then there should be a
> fairly small chance of legit mail getting tagged as spam.  If they are
> scored appropriately no one rule will make a mail spam, it will take at
> least two hits on the mail from different rules.
> 
> Of course mail patterns change with time, and the corpus used for scoring
> isn't world-wide.  So there is certainly the occasional need to rescore a
> rule that is found to hit more ham than expected.

Perhaps I've gotten too used to the idea that SpamAssassin hardly ever gives
false positives. The only times that's happened to me were with mails that have
all the characteristics of spam but are not really spam: newsletters from
telephone companies and such (which, judging by SpamAssassin's matches, they
seem to send out with bulk-mail software intended for spamming).


Re: URI_NO_WWW_INFO_CGI rule

Posted by Loren Wilton <lw...@earthlink.net>.
> If all the rule does is check for uri's in a certain form, then I would say
> that this specific rule can backfire on completely legitimate mail.

Essentially ALL spam rules "can" misfire on legit mail.  In fact
statistically most of them WILL misfire on some small percentage of legit
mail.  If they are tested and scored reasonably then there should be a
fairly small chance of legit mail getting tagged as spam.  If they are
scored appropriately no one rule will make a mail spam, it will take at
least two hits on the mail from different rules.

Of course mail patterns change with time, and the corpus used for scoring
isn't world-wide.  So there is certainly the occasional need to rescore a
rule that is found to hit more ham than expected.
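
The scoring argument above can be sketched in a few lines of Python. This is
only an illustration, not how SpamAssassin is implemented: it assumes the
default required_score of 5.0 and the 4.1 score mentioned at the start of this
thread, and the second rule name is made up.

```python
# Minimal sketch of threshold scoring: a message is tagged as spam when the
# summed scores of the rules it hits reach the required score.
REQUIRED_SCORE = 5.0  # assumed default; sites can configure a different value


def is_spam(hit_scores, required=REQUIRED_SCORE):
    """Return True if the total score of all rule hits reaches the threshold."""
    return sum(hit_scores.values()) >= required


# One hit on the .info rule alone stays below the threshold.
only_info_rule = {"URI_NO_WWW_INFO_CGI": 4.1}

# HYPOTHETICAL_OTHER_RULE is an invented name and score, for illustration only.
with_second_rule = {"URI_NO_WWW_INFO_CGI": 4.1, "HYPOTHETICAL_OTHER_RULE": 1.2}

print(is_spam(only_info_rule))
print(is_spam(with_second_rule))
```

With a 4.1 score, this one rule needs less than one further point from any
other rule to tip a message over, which is why a misfire here hurts more than
most.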

        Loren


Re: URI_NO_WWW_INFO_CGI rule

Posted by Wiebe Cazemier <ha...@gmx.net>.
On Thursday 04 May 2006 16:00, Magnus Holmgren wrote:

> uri URI_NO_WWW_INFO_CGI /^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?/i
> 
> Let's see if I can get this straight...
> 
> (?:https?:\/\/)?   (optionally) "http://" or "https://" followed by
> [^\/]+             one or more of any characters except forward slash /
> (?<!\/www)         of which the last part is not "/www", followed by
> \.[^.]{7,}         a dot and at least 7 characters that are not dots, and
> \.info\/           ".info/"
> (?=\S{15,})        which must be followed by at least 15 non-space characters,
> \S*\?              which are matched again here and must contain a question
>                    mark.
> 
> So it should match e.g. "foo.hellothere.info/forum/viewtopic.php?p=1 " as
> well as "www.hellothere.info/forum/viewtopic.php?p=1 " and
> "http://www.foo.hellothere.info/forum/viewtopic.php?p=1 "
> 
> but not "http://www.hellothere.info/forum/viewtopic.php?p=1 ",
> "foo.hello.info/forum/viewtopic.php?p=1 ",
> "hellothere.info/forum/viewtopic.php?p=1 ",
> or "foo.hellothere.info/bar.php?p=1 ".

(The following is also in reply to Bowie Bailey's message. BTW, Bowie, your
mail client doesn't set a message reference, so threading is messed up.)

The real URLs are (why I didn't post them before, I don't know...):

http://studentwebzone.tc-online.info/forum2/viewtopic.php?p=47#47

http://studentwebzone.tc-online.info/forum2/viewtopic.php?t=17&unwatch=topic

If all the rule does is check for URIs of a certain form, then I would say
that this specific rule can backfire on completely legitimate mail.

Also, I know I can look up the rules (in /usr/share/spamassassin) myself, but I
got very confused by all the regexps. I also didn't know what to do with the
regexp result, but I now understand that the rule simply checks whether the
regexp matches.

> 
>> I get false positives on mail that contains URIs in the .info TLD, like:
>>
>>         http://foo.hello.info/forum/viewtopic.php?p=1
>>
>> Does this rule mean that the webpage accessed by this URI is different than
>> the one accessed by:
>>
>>         http://far.hello.info/forum/viewtopic.php?p=1
> 
> It just means that someone has seen a lot of spam containing URIs of the
> former form and that the mass-checks confirmed it.

I can't connect that to the description "URI: CGI in .info TLD other than
third-level "www"". 

> You can always lower the score of any rule you feel misfires. 

I'm trying to help the forum owner keep his reply notifications from being
marked as spam, so what I do to my own config is irrelevant.


Re: URI_NO_WWW_INFO_CGI rule

Posted by Magnus Holmgren <ho...@lysator.liu.se>.
On Thursday 04 May 2006 14:42, Wiebe Cazemier wrote:
> Hi,
>
> What exactly does the URI_NO_WWW_INFO_CGI rule mean? The description is:
>
>         URI: CGI in .info TLD other than third-level "www"
>

uri URI_NO_WWW_INFO_CGI /^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?/i

Let's see if I can get this straight...

(?:https?:\/\/)?   (optionally) "http://" or "https://" followed by
[^\/]+             one or more of any characters except forward slash /
(?<!\/www)         of which the last part is not "/www", followed by
\.[^.]{7,}         a dot and at least 7 characters that are not dots, and
\.info\/           ".info/"
(?=\S{15,})        which must be followed by at least 15 non-space characters,
\S*\?              which are matched again here and must contain a question
                   mark.

So it should match e.g. "foo.hellothere.info/forum/viewtopic.php?p=1 " as well 
as "www.hellothere.info/forum/viewtopic.php?p=1 " and
"http://www.foo.hellothere.info/forum/viewtopic.php?p=1 "

but not "http://www.hellothere.info/forum/viewtopic.php?p=1 ", 
"foo.hello.info/forum/viewtopic.php?p=1 ", 
"hellothere.info/forum/viewtopic.php?p=1 ", 
or "foo.hellothere.info/bar.php?p=1 ".
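
For anyone who wants to check these cases rather than trace them by hand, here
is a minimal Python sketch. It assumes that Python's re module treats this
pattern the same way Perl does (it should for these constructs), and it does
not reproduce whatever URI parsing or normalisation SpamAssassin performs
before the rule runs. The helper name hits and the sample list are only for
illustration.

```python
import re

# Pattern copied from the rule above. Python's re supports the fixed-width
# negative lookbehind and the lookahead used here.
RULE = re.compile(
    r"^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?",
    re.IGNORECASE,
)


def hits(uri):
    """Return True if the pattern matches at the start of the given URI."""
    return RULE.search(uri) is not None


# Sample URIs taken from this thread.
SAMPLES = [
    "foo.hellothere.info/forum/viewtopic.php?p=1",
    "www.hellothere.info/forum/viewtopic.php?p=1",
    "http://www.foo.hellothere.info/forum/viewtopic.php?p=1",
    "http://www.hellothere.info/forum/viewtopic.php?p=1",
    "foo.hello.info/forum/viewtopic.php?p=1",
    "hellothere.info/forum/viewtopic.php?p=1",
    "foo.hellothere.info/bar.php?p=1",
    "http://studentwebzone.tc-online.info/forum2/viewtopic.php?p=47#47",
]

for uri in SAMPLES:
    print("match   " if hits(uri) else "no match", uri)
```

Under those assumptions the first three samples match and the next four do
not, as described above. The studentwebzone.tc-online.info URL posted elsewhere
in this thread also matches, while the anonymised http://foo.hello.info/ example
does not, because "hello" is shorter than 7 characters.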

> I get false positives on mail that contains URIs in the .info TLD, like:
>
>         http://foo.hello.info/forum/viewtopic.php?p=1
>
> Does this rule mean that the webpage accessed by this URI is different than
> the one accessed by:
>
>         http://far.hello.info/forum/viewtopic.php?p=1

It just means that someone has seen a lot of spam containing URIs of the former
form and that the mass-checks confirmed it.

> If so, I would find that rather strange, as the webpages will most likely
> always be different, so any message with an .info URI will be marked as
> spam (as the score is rather high: 4.1).

You can always lower the score of any rule you feel misfires.

-- 
Magnus Holmgren
holmgren@lysator.liu.se

Re: URI_NO_WWW_INFO_CGI rule

Posted by mouss <us...@free.fr>.
jdow wrote:
> You mean you actually found a REAL .info site!!!!!! Wow! Good digging!
How about www.mailscanner.info?
This one even gets inserted into mail scanned by MailScanner, which will
cause that mail to be caught by other SA installations :)


Re: URI_NO_WWW_INFO_CGI rule

Posted by Magnus Holmgren <ho...@lysator.liu.se>.
On Friday 05 May 2006 12:50, jdow wrote:
> It sounds like he got suckered with the .info site. It seems like all the
> spammers in the known universe dove into that one wholesale.

.biz has to be a tad worse, right? I even know someone with a .info domain,
although she only appears to use it for mail.

-- 
Magnus Holmgren
holmgren@lysator.liu.se

Re: URI_NO_WWW_INFO_CGI rule

Posted by jdow <jd...@earthlink.net>.
It sounds like he got suckered with the .info site. It seems like all the
spammers in the known universe dove into that one wholesale.

{o.o}
----- Original Message ----- 
From: "Wiebe Cazemier" <ha...@gmx.net>


> On Friday 05 May 2006 00:29, jdow wrote:
>> You mean you actually found a REAL .info site!!!!!! Wow! Good digging!
>> 
>> {^_^}
> 
> Well, it's not a "real" site. I just got into a discussion about Settlers 2
> with somebody, and he pointed me to his (personal) forum. It is not "real" in
> the sense that it has hardly any visitors.

Re: URI_NO_WWW_INFO_CGI rule

Posted by Wiebe Cazemier <ha...@gmx.net>.
On Friday 05 May 2006 00:29, jdow wrote:
> You mean you actually found a REAL .info site!!!!!! Wow! Good digging!
> 
> {^_^}

Well, it's not a "real" site. I just got into a discussion about Settlers 2
with somebody, and he pointed me to his (personal) forum. It is not "real" in
the sense that it has hardly any visitors.


Re: URI_NO_WWW_INFO_CGI rule

Posted by jdow <jd...@earthlink.net>.
From: "Wiebe Cazemier" <ha...@gmx.net>

> Hi,
> 
> What exactly does the URI_NO_WWW_INFO_CGI rule mean? The description is: 
> 
>        URI: CGI in .info TLD other than third-level "www"
> 
> I get false positives on mail that contains URIs in the .info TLD, like:
> 
>        http://foo.hello.info/forum/viewtopic.php?p=1
> 
> Does this rule mean that the webpage accessed by this URI is different than the
> one accessed by:
> 
>        http://far.hello.info/forum/viewtopic.php?p=1
> 
> If so, I would find that rather strange, as the webpages will most likely
> always be different, so any message with an .info URI will be marked as spam
> (as the score is rather high: 4.1).
> 
> Any info is appreciated.
> 
> Regards,
> 
> Wiebe Cazemier

You mean you actually found a REAL .info site!!!!!! Wow! Good digging!

{^_^}