Posted to dev@spamassassin.apache.org by Marc Perkel <ma...@perkel.com> on 2004/04/04 06:04:39 UTC

Good Job Guys!!! - SpamCopURI

I've been doing something like this for the last 2 years - but I've 
built my own list and have had to use Exim tricks to make it work. 
Nonetheless - it was my best rule.

Having said that - this new one works better. I'm sure that's because it 
has a bigger URI list. So - just wanted to say - GOOD JOB! And - this 
rule seriously moves SA forward a LOT!



Re: Improving SpamCopURI

Posted by Daniel Quinlan <qu...@pathname.com>.
Jeff Chan <je...@surbl.org> writes:

> At this point the best solution is probably for you to send me
> the FP domains to manually whitelist at:  whitelist at surbl dot org
> Please send any you find if you get a chance.

Note that political spam is not uncommon.  Some people may be
legitimately subscribed (or want the mail), but many people get spams
from political organizations, candidates, etc.
 
>> However - many of these messages link to a variety of sites. From what I 
>> understand - the way the current rule works is if ANY link matches then 
>> the rule is triggered. Makes me wonder if there's a way to look at a 
>> situation where if most of the links are not positive - the rule isn't 
>> tripped.

Spammers are already inserting fake and innocent links to throw off
checkers.

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting

Re: Improving SpamCopURI

Posted by Jeff Chan <je...@surbl.org>.
On Monday, April 5, 2004, 7:36:47 AM, Marc Perkel wrote:
> Also - is there a way to feed back to the system new URIs for the list?
> A URI reporting system?

There is no way to report URIs directly to SURBL currently.  The
best way is to report them in spams to SpamCop.  It's indirect
but does the right thing if enough people do likewise and report
the same domain a few more times.

That said, I'm reworking the thresholding and retention system
and will probably make the threshold much lower for known spam domain
IPs and name servers, as Daniel Quinlan suggested.

After watching the data for a while, I think a longer general
retention of, say, 10 days might be a good idea to catch reports
spread over more than a week.  For known spam gang domains/name
servers/IPs we could make the retention a whole lot longer.
Domains that get dozens to hundreds of reports should
probably also be watched a lot longer through a longer retention.
The domains that get reported the most probably deserve the most
attention, through longer retention and perhaps a lower
inclusion threshold.

We would get external "known bad guys" data from other RBLs in
order to adjust thresholds and expirations, but the inclusion of
a domain in SURBL would still be triggered by SpamCop URI
reports.  But the trigger point would be lower for "bad guys".
This was a good suggestion from Daniel.
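
As a rough illustration of the thresholding and retention idea described above, here is a minimal Python sketch. It is not SURBL's implementation: the function name `is_listed`, the report threshold values, and the 30-day retention for known bad guys are assumptions made up for the example. Only the "say 10 days" general retention comes from the message.

```python
from datetime import datetime, timedelta

# All numbers below are illustrative assumptions, not SURBL's real settings.
GENERAL_RETENTION = timedelta(days=10)   # "say 10 days" from the message above
BAD_GUY_RETENTION = timedelta(days=30)   # assumed "a whole lot longer"
GENERAL_THRESHOLD = 5                    # assumed report count needed to list a domain
BAD_GUY_THRESHOLD = 2                    # assumed lower trigger point for "bad guys"


def is_listed(report_times, now, known_bad=False):
    """Return True if enough reports fall inside the retention window."""
    retention = BAD_GUY_RETENTION if known_bad else GENERAL_RETENTION
    threshold = BAD_GUY_THRESHOLD if known_bad else GENERAL_THRESHOLD
    recent = [t for t in report_times if now - t <= retention]
    return len(recent) >= threshold


now = datetime(2004, 4, 5)
# Three reports for one domain: 1, 8 and 20 days ago.
reports = [now - timedelta(days=d) for d in (1, 8, 20)]

print(is_listed(reports, now))                  # ordinary domain
print(is_listed(reports, now, known_bad=True))  # domain tied to known spam IPs/name servers
```

With these assumed numbers the same three reports do not list an ordinary domain, but they do list one that external data marks as a known bad guy, since both the window and the trigger point are more generous in that case.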

Are there any RBLs that are widely regarded as good indicators of
spam gang/spamhaus IPs other than SBLs?

Also, can anyone help us set up (or know where we can set up)
a discussion forum for SURBL?  We'd like to use it as a "star
chamber" for anti-spam veterans to join us in judging incoming
spam domains reaching the threshold to decide whether they belong
to spammers or are a false alarm and should be whitelisted.
We could also have blacklist recommendations and other discussion
there.  At this point we may need the help a community could
bring to help run things with SURBL.

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/


Re: Improving SpamCopURI

Posted by Loren Wilton <lw...@earthlink.net>.
> Also - is there a way to feed back to the system new URIs for the list? 
> A URI reporting system?

Sure - SpamCop!  If it makes it through it must need to be reported, nie?

        Loren


Re: Improving SpamCopURI

Posted by Marc Perkel <ma...@perkel.com>.
One thing that would be nice would be the ability for us to add our own 
URLs to our own private list. That way when something sneaks through we 
can add that and they go away - permanently.
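
A private list of this kind could be as simple as a local set that is consulted alongside the shared list. The sketch below only illustrates that idea. The names `local_blacklist`, `local_whitelist` and `local_verdict`, and the `.test` domains, are hypothetical and not part of SpamCopURI or SpamAssassin.

```python
# Hypothetical local lists; the domain names are placeholders.
local_blacklist = {"spam-example.test"}
local_whitelist = {"friendly-example.test"}


def local_verdict(domain):
    """Return 'white', 'black', or None when the local lists have no opinion."""
    d = domain.lower().rstrip(".")
    if d in local_whitelist:
        return "white"
    if d in local_blacklist:
        return "black"
    return None


# Something sneaked through: add it once and it is caught from then on.
local_blacklist.add("new-spam-example.test")
print(local_verdict("NEW-spam-example.test."))
print(local_verdict("unknown-example.test"))
```

Checking the whitelist first is a design choice in this sketch, so that a local exception can override a local or shared blacklist entry.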

Also - is there a way to feed back to the system new URIs for the list? 
A URI reporting system?


Re: Improving SpamCopURI

Posted by Jeff Chan <je...@surbl.org>.
On Sunday, April 4, 2004, 8:34:25 AM, Marc Perkel wrote:
> I've gotten very few but some false positives on this rule. And - I 
> can't tell what link produced the false positive and what to do about it 
> - which is a separate issue to address.

> The false positives are political in nature - mostly anti-war or 
> anti-Bush stuff.

At this point the best solution is probably for you to send me
the FP domains to manually whitelist at:  whitelist at surbl dot org
Please send any you find if you get a chance.

> However - many of these messages link to a variety of sites. From what I 
> understand - the way the current rule works is if ANY link matches then 
> the rule is triggered. Makes me wonder if there's a way to look at a 
> situation where if most of the links are not positive - the rule isn't 
> tripped.

Yes, any one URI matched against a spam domain will trigger
the rule for a given message.  One could implement your suggested
solution by scoring links within a message, requiring a certain
threshold or making the final score an accumulation or average
of the individual link scores within a single message having
multiple links.

One issue with that might be that an average could reward Joe Jobs
or invisible links.  In other words, a spammer could beat an
averaging rule by having say 20 invisible links to legitimate
sites and one visible link to their spam site.   I'm sure other
people have other ideas on this....
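
To make the trade-off above concrete, here is a small Python sketch comparing the "any link matches" behaviour with an averaging and an accumulating variant. It is a toy model under stated assumptions: each link is reduced to a boolean (listed or not), and the function names and the 0.5 cut-off are invented for the example, not taken from SpamCopURI.

```python
def any_match(hits):
    """Behaviour as described above: one listed link trips the rule."""
    return any(hits)


def fraction_listed(hits):
    """Averaging variant: share of the message's links that are listed."""
    return sum(hits) / len(hits) if hits else 0.0


def listed_count(hits):
    """Accumulating variant: number of listed links, ignoring clean ones."""
    return sum(hits)


# One visible spam link padded with 20 invisible links to legitimate sites.
padded = [True] + [False] * 20

THRESHOLD = 0.5  # assumed cut-off for the averaging variant
print(any_match(padded))                     # True
print(fraction_listed(padded) >= THRESHOLD)  # False
print(listed_count(padded))                  # 1
```

The padded message still trips the any-match rule, but its average is only 1/21, so an averaging rule with this cut-off would let it through. An accumulating count is not diluted by the innocent links, which is one reason it may be the safer of the two alternatives.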

Regarding white rules, the best one I can think of is whether
a message is signed by someone on your public key ring....
That should get the message an automatic final score of 0.
(Not sure why public key encryption has not taken off for mail.
It seems entirely logical and useful to me....)

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/


Improving SpamCopURI

Posted by Marc Perkel <ma...@perkel.com>.
I've gotten very few but some false positives on this rule. And - I 
can't tell what link produced the false positive and what to do about it 
- which is a separate issue to address.

The false positives are political in nature - mostly anti-war or 
anti-Bush stuff.

However - many of these messages link to a variety of sites. From what I 
understand - the way the current rule works is if ANY link matches then 
the rule is triggered. Makes me wonder if there's a way to look at a 
situation where if most of the links are not positive - the rule isn't 
tripped.

Or - any other ideas or thoughts?

Wondering about a white link version of this. No specific ideas yet 
though. But pushing for more white rules.

Just trying to make a really good rule better.


Re: Good Job Guys!!! - SpamCopURI

Posted by Marc Perkel <ma...@perkel.com>.
I don't have stats but it looks like it's catching about 60% of all spam 
with no false positives. About 2500 messages caught so far today.

It's a WINNER! Best rule ever! I'm scoring it at 6 points.

Jeff Chan wrote:

>Thanks for your feedback and kind words.  Do you happen to
>have any stats handy that we could use to help convince people
>to give this a try?
>
>By the way, we had noticed your earlier comment on bugzilla that
>your hand built URI lists were working well against spams:
>
>  http://bugzilla.spamassassin.org/show_bug.cgi?id=1375#c6
>

Re: Good Job Guys!!! - SpamCopURI

Posted by Jeff Chan <je...@surbl.org>.
On Saturday, April 3, 2004, 8:04:39 PM, Marc Perkel wrote:
> I've been doing something like this for the last 2 years - but I've 
> built my own list and have had to use Exim tricks to make it work. 
> Nonetheless - it was my best rule.

> Having said that - this new one works better. I'm sure that's because it 
> has a bigger URI list. So - just wanted to say - GOOD JOB! And - this 
> rule seriously moves SA forward a LOT!

Hi Marc,
Thanks for your feedback and kind words.  Do you happen to
have any stats handy that we could use to help convince people
to give this a try?

By the way, we had noticed your earlier comment on bugzilla that
your hand built URI lists were working well against spams:

  http://bugzilla.spamassassin.org/show_bug.cgi?id=1375#c6

> I've made some rules that blacklist based on specific hand-built
> URIs and with about 150 that I test for I am really catching a lot of 
> spam and the accuracy of what is caught is almost 100%. I REALLY think 
> that the ability to somehow automatically generate a blacklist of URIs 
> of spam links will be a very effective spam control tool.

That was part of our justification for thinking this was a good
idea, so we owe you thanks for your earlier support of the idea.

We also owe Daniel Quinlan some thanks for responding in comment
#7 that setting up a new spam domain RBL would be a good way to
do this, which is what we did:   :-)

> Maintaining such a list inside of SpamAssassin is the wrong way to go.
> We can only afford to list URLs/hostnames that are particularly
> frequent.  The URLs change frequently, would need to be very numerous to
> be much more effective than what we have now, and we don't have the
> capacity to maintain so many.
> 
> The best route would be for someone to create a new RBL (a domain-based
> one) with spam domains.  Perhaps even one that provided for some way to
> do look-ups of full or partial URIs (some type of encoding to allow URIs
> to be expressed in hostnames).  The list would need maintenance,
> specific policies for listing/delisting, etc. -- the usual stuff.

RBLs are pretty much for domains or IPs, so we're not doing
URIs (has anyone made an RBL serving up URIs?), but we feel
these domains can be powerful indicators of spam.
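
For readers unfamiliar with how a domain-based RBL is queried, the usual DNS blacklist convention is to prepend the name being checked to the list's zone and look the result up in DNS. The Python sketch below shows only that name-building step. The zone `rbl.example` is a placeholder, `query_name` is a hypothetical helper, and reducing the host to its registered domain (which a real checker would likely need) is left out.

```python
from urllib.parse import urlsplit

ZONE = "rbl.example"  # hypothetical zone name, not a real list


def query_name(uri, zone=ZONE):
    """Build the DNS name to look up for the host part of a URI."""
    host = urlsplit(uri).hostname
    if host is None:
        raise ValueError("no host in URI: %r" % uri)
    return host.rstrip(".") + "." + zone


print(query_name("http://www.spam-example.test/offer?id=1"))
# www.spam-example.test.rbl.example
```

Note that the path and query string are dropped, which matches the point above that the list carries domains rather than full URIs. By the common DNSBL convention, a listed name resolves to an address in 127.0.0.0/8 and an unlisted one does not resolve at all.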

We expect to do some more tuning of the way the data is handled,
and I am building a new system to produce SURBL which will be more
streamlined and handle some new features.  We're also taking
advantage of some improvements in the data feed from SpamCop.

The current system will continue to run until the new one tests
well in parallel for a while.  Then we'll cut over to the new
system.  SURBL users should notice no difference other than
more persistence of larger spam domains, and probably more
domains on the list with no major increase in false positives.

Thanks again,

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/