Posted to users@spamassassin.apache.org by Richard Smits <R....@tudelft.nl> on 2010/10/26 08:07:44 UTC

List of urls

Hello,

Does anyone know if it's possible to have a list of URLs, and define a
score for all of them in one line?

----
Now I do it like this:
----
uri url_1 /www.domain1.com/
uri url_2 /www.domain2.com/
uri url_3 /www.domain3.com/
uri url_4 /www.domain4.com/

score url_1 10
score url_2 10
score url_3 10
score url_4 10

----
But I want just one line to define the score. Are there more ways to do
this?

Greetings .. Richard


Re: List of urls

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2010-10-26 at 08:07 +0200, Richard Smits wrote:
> Hello,
> 
> Does anyone know if it's possible to have a list of URLs, and define a
> score for all of them in one line?
> 
I developed a similar system for my own purposes that you might want to
look at.

The idea is that you define this type of rule in an easily edited file
which contains header lines that set the rule name, score, description,
whether it ignores case, etc. These are followed by one or more
sections, each consisting of a line saying which part of the message it
applies to (body, uri, etc) and a list of match terms. A shell script,
which uses gawk for the heavy lifting, converts one or more definition
files into rules (one rule per definition) and outputs a single .cf file
containing them all. There's even a man page.

It's all available in a GPLed tarball:
http://www.libelle-systems.com/free/portmanteau/portmanteau.tgz
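For readers who want to see roughly what such a generator might emit, here is a minimal Python sketch of the same idea. It is not Martin's portmanteau script (which uses gawk), and the input is a plain Python structure rather than his definition-file format; the function name `build_rules`, the `__NAME_n` sub-rule naming, and the choice of one meta rule to join the sections are all assumptions made for illustration. It assumes Python 3.7+ (where `re.escape` leaves "/" alone).

```python
import re


def build_rules(name, score, description, sections, nocase=True):
    """Emit SpamAssassin config text for one rule definition (hypothetical format).

    sections is a list of (area, terms) pairs, e.g. ("uri", ["example.com"]).
    Each section becomes a non-scoring "__" sub-rule; one meta rule ORs the
    sub-rules together and carries the only score line.
    """
    flags = "i" if nocase else ""
    lines, subs = [], []
    for n, (area, terms) in enumerate(sections, start=1):
        sub = f"__{name}_{n}"
        # Escape metacharacters so '.' matches a literal dot; also escape '/'
        # because it is the regex delimiter in the generated rule.
        alternation = "|".join(re.escape(t).replace("/", r"\/") for t in terms)
        lines.append(f"{area} {sub} /(?:{alternation})/{flags}")
        subs.append(sub)
    lines.append(f"meta {name} ({' || '.join(subs)})")
    lines.append(f"describe {name} {description}")
    lines.append(f"score {name} {score}")
    return "\n".join(lines)


print(build_rules(
    "MY_SHOES", 10, "Known shoe spam",
    [("uri", ["example.com", "example.net"]), ("body", ["basketMax"])],
))
```

The terms are left unanchored here, so the remarks about escaping and anchoring elsewhere in this thread still apply to what goes into each list.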


Martin



Re: List of urls

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Tue, 2010-10-26 at 20:10 +0100, Martin Gregorie wrote:
> On Tue, 2010-10-26 at 10:37 -0700, John Hardin wrote:

> > The OP wasn't clear whether he wanted ten points _per URI hit_. If that's 
> > the case, the regex alternatives and meta solutions aren't appropriate and 
> > there's no way to avoid one score line per URI rule.
> 
> ????? What about 'tflags multiple' as in:
> 
> uri    RULE /(example.(com|net)|example.org|...)/
> tflags RULE multiple
> score  RULE 10
> 
> The only (minor) drawback I've found is that the list of firing rules
> can be filled with RULE, RULE, RULE,.... by the type of spam that contains
> nothing but tens of lines pushing variations on a theme such as:

tflags multiple can be quite dangerous, though, if it directly results
in a hit. As per your example. Besides possibly flooding the report, it
also can seriously bias the overall score easily.

URI DNSBL hits, for example, do not count how often a domain is in the
spam, but hit once only.

The safest approach for tflags multiple rules is to trigger other rules
based on the number of hits. meta rules explicitly support this.

  meta FOO_4  __TFLAGS_MULTIPLE_SUB >= 4
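To make the difference between these layouts concrete, here is a toy model in Python of the arithmetic for one message containing several matching URIs. It is not SpamAssassin code: it assumes each matching URI counts as exactly one hit, and the function name, the sample URIs and the threshold of 4 are made up for illustration.

```python
import re


def score_strategies(uris, patterns, score=10.0, threshold=4):
    """Toy model: total points each rule layout would add for one message."""
    compiled = [re.compile(p) for p in patterns]
    # Separate scored rules: each rule fires at most once per message.
    fired = [any(c.search(u) for u in uris) for c in compiled]
    # One combined rule with tflags multiple: one hit per matching URI
    # (a simplifying assumption of this model).
    hits = sum(1 for u in uris if any(c.search(u) for c in compiled))
    return {
        "separate_rules": score * sum(fired),
        # meta rule ORing non-scoring sub-rules: fires once if anything hit.
        "meta_or": score if any(fired) else 0.0,
        # tflags multiple on a scored rule: the score is added per hit.
        "tflags_multiple": score * hits,
        # meta rule on the hit count (like FOO_4 above): fires once at the threshold.
        "meta_threshold": score if hits >= threshold else 0.0,
    }


spam_uris = [
    "http://shoes.example/basketMax",
    "http://shoes.example/basketSuper",
    "http://shoes.example/basketWimp",
    "http://shoes.example/runningMax",
    "http://shoes.example/runningMin",
    "http://other.example/",
]
print(score_strategies(spam_uris, [r"^https?://shoes\.example/", r"^https?://other\.example/"]))
# {'separate_rules': 20.0, 'meta_or': 10.0, 'tflags_multiple': 60.0, 'meta_threshold': 10.0}
```

The 60 points in the tflags_multiple case is the bias described above: one spammy theme repeated six times outweighs everything else, while the meta threshold adds a fixed amount once the count is reached.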


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: List of urls

Posted by John Hardin <jh...@impsec.org>.
On Tue, 26 Oct 2010, Martin Gregorie wrote:

> On Tue, 2010-10-26 at 10:37 -0700, John Hardin wrote:
>>
>> The OP wasn't clear whether he wanted ten points _per URI hit_. If that's
>> the case, the regex alternatives and meta solutions aren't appropriate and
>> there's no way to avoid one score line per URI rule.
>
> ????? What about 'tflags multiple' as in:
>
> uri    RULE /(example.(com|net)|example.org|...)/
> tflags RULE multiple
> score  RULE 10

You're right. I didn't think of that.


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   ...the Fates notice those who buy chainsaws...
                                               -- www.darwinawards.com
-----------------------------------------------------------------------
  5 days until Halloween

Re: List of urls

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2010-10-26 at 10:37 -0700, John Hardin wrote:
> On Tue, 26 Oct 2010, Karsten Bräckelmann wrote:
> 
> > On Tue, 2010-10-26 at 10:53 +0200, Raymond Dijkxhoorn wrote:
> >> For your question, why don't you regexp it?
> >>
> >> uri url_1 /www.domain(1|2|3|4).com/
> >
> > The other technique you can use is meta rules
> >
> >  uri __MY_BL_001 /example.(com|net)/
> >  uri __MY_BL_002 /example.org/
> >
> >  meta  MY_BL  __MY_BL_001 || __MY_BL_002
> >  score MY_BL  10.0
> 
> The OP wasn't clear whether he wanted ten points _per URI hit_. If that's 
> the case, the regex alternatives and meta solutions aren't appropriate and 
> there's no way to avoid one score line per URI rule.
> 
????? What about 'tflags multiple' as in:

uri    RULE /(example.(com|net)|example.org|...)/
tflags RULE multiple
score  RULE 10

The only (minor) drawback I've found is that the list of firing rules
can be filled with RULE, RULE, RULE,.... by the type of spam that contains
nothing but tens of lines pushing variations on a theme such as:

Buy FAMOUS SHOE basketMax
Buy FAMOUS SHOE basketSuper
Buy FAMOUS SHOE basketWimp
Buy FAMOUS SHOE runningMax
.... 


Martin




Re: List of urls

Posted by John Hardin <jh...@impsec.org>.
On Tue, 26 Oct 2010, Karsten Bräckelmann wrote:

> On Tue, 2010-10-26 at 10:53 +0200, Raymond Dijkxhoorn wrote:
>> For your question, why don't you regexp it?
>>
>> uri url_1 /www.domain(1|2|3|4).com/
>
> The other technique you can use is meta rules
>
>  uri __MY_BL_001 /example.(com|net)/
>  uri __MY_BL_002 /example.org/
>
>  meta  MY_BL  __MY_BL_001 || __MY_BL_002
>  score MY_BL  10.0

The OP wasn't clear whether he wanted ten points _per URI hit_. If that's 
the case, the regex alternatives and meta solutions aren't appropriate and 
there's no way to avoid one score line per URI rule.


Re: List of urls

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Tue, 2010-10-26 at 10:53 +0200, Raymond Dijkxhoorn wrote:
> For your question, why don't you regexp it?
> 
> uri url_1 /www.domain(1|2|3|4).com/
> 
> The exact regexp naturally depends on the domains, but you don't need a
> separate check for each.

One way to consolidate them, yes -- depending on the nature of the
strings to match it can be very intuitive and natural.

The other technique you can use is meta rules, together with
non-scoring sub-rules (names starting with "__") to prevent the
individual parts from scoring (any other rule gets a default score of 1
if none is set explicitly).

  uri __MY_BL_001 /example.(com|net)/
  uri __MY_BL_002 /example.org/

  meta  MY_BL  __MY_BL_001 || __MY_BL_002
  score MY_BL  10.0

Note, though, that the above uri matches are not sufficiently strict
(similar to the OP's example) and might result in FPs.

The dot in an RE matches any char, and must be escaped to match a
literal dot. Also, the REs should be anchored, either at the left or
right end, to prevent possibly matching innocent bystanders. Since
parsed URIs are guaranteed to have a protocol (prepended by SA if
none), anchoring on it is much safer than the simple example above.

  uri __MY_BL_000  m~^https?://(www\.)?example\.org(/|$)~

It is anchored at the beginning of the URI, allows an optional "www"
host name, and is anchored at the end to further prevent FPs. Oh, and it
also uses m// with an alternative delimiter, so I don't have to escape
the slash in the RE.
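The difference is easy to check outside SpamAssassin. Below is a small Python comparison of the loose pattern and the stricter one above, with the Perl m~...~ wrapper removed (the constructs used behave the same in Python's `re`), run against some made-up URIs:

```python
import re

# Loose pattern as in the earlier examples: unescaped dot, no anchors.
LOOSE = re.compile(r"example.org")
# The stricter pattern from above, minus the Perl m~...~ wrapper.
STRICT = re.compile(r"^https?://(www\.)?example\.org(/|$)")


def classify(uri):
    """Return (loose_hit, strict_hit) for one URI."""
    return bool(LOOSE.search(uri)), bool(STRICT.search(uri))


samples = [
    "http://example.org/",            # intended target
    "https://www.example.org",        # intended target, no path
    "http://exampleXorg.test/",       # '.' matched the 'X'
    "http://example.org.evil.test/",  # target name reused as a subdomain
    "http://notexample.org/",         # unrelated domain containing the string
]
for uri in samples:
    loose, strict = classify(uri)
    print(f"{uri:32} loose={loose} strict={strict}")
```

The loose pattern hits all five samples; the strict one hits only the first two.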

How strict you want your uri rule REs depends on your level of paranoia
and the domains to match.


> The best way to handle domains is to put them in a small RBL, or get them
> added to an existing RBL.

Well, it certainly depends on the number of URIs, and how frequently the
list may change. SA config is not suitable for frequent changes, but
would be way easier to set up than a local RBL, if the list isn't too
large and mostly static.

Adding to existing URI DNSBLs isn't always an option, btw. URL
shorteners may have a place in severely size-constrained messages of
sorts, but have no business in mail. They won't be blacklisted by the
major players out there, though. ;)




Re: List of urls

Posted by Raymond Dijkxhoorn <ra...@prolocation.net>.
Hi!

> Now I do it like this:
> ----
> uri url_1 /www.domain1.com/
> uri url_2 /www.domain2.com/
> uri url_3 /www.domain3.com/
> uri url_4 /www.domain4.com/
>
> score url_1 10
> score url_2 10
> score url_3 10
> score url_4 10

Isn't this a bit expensive? Report them to SURBL or something and you get them
added ;) (send a mail to raymond@surbl.org)

For your question, why don't you regexp it?

uri url_1 /www.domain(1|2|3|4).com/

The exact regexp naturally depends on the domains, but you don't need a
separate check for each.

The best way to handle domains is to put them in a small RBL, or get them
added to an existing RBL.
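Combining the single-regex idea with the escaping and anchoring recommended elsewhere in this thread, the OP's four rules could collapse into one `uri` line and one `score` line. The hypothetical Python helper below just prints that config; the rule name `URL_LIST` is made up, and note that this adds 10 points once per message, not 10 per matching domain.

```python
import re


def consolidate(rule, domains, score):
    """Build one uri rule plus one score line covering every listed domain."""
    # Drop a leading "www." since the pattern below makes it optional anyway.
    bare = [d[4:] if d.startswith("www.") else d for d in domains]
    # Escape each domain so '.' only matches a literal dot.
    alternation = "|".join(re.escape(d) for d in bare)
    return [
        f"uri   {rule} m~^https?://(www\\.)?({alternation})(/|$)~",
        f"score {rule} {score}",
    ]


for line in consolidate("URL_LIST", ["www.domain1.com", "www.domain2.com",
                                     "www.domain3.com", "www.domain4.com"], 10):
    print(line)
```

For larger or frequently changing lists, the RBL route above is still the better fit; this only tidies up a small static list.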

Bye,
Raymond.

Re: List of urls

Posted by John Hardin <jh...@impsec.org>.
On Tue, 26 Oct 2010, Richard Smits wrote:

> Does anyone know if it's possible to have a list of URLs, and define a
> score for all of them in one line?
>
> ----
> Now I do it like this:
> ----
> uri url_1 /www.domain1.com/
> uri url_2 /www.domain2.com/
> uri url_3 /www.domain3.com/
> uri url_4 /www.domain4.com/
>
> score url_1 10
> score url_2 10
> score url_3 10
> score url_4 10
>
> ----
> But I want just one line to define the score. Are there more ways to do 
> this?

Do you want ten points total if _any_ targeted URI hits, or ten points for 
each targeted URI that hits regardless of how many hit?

The latter is what you are doing above.
