Posted to users@spamassassin.apache.org by Jeff Chan <je...@surbl.org> on 2005/03/13 14:12:30 UTC

Re: [SURBL-Discuss] List of spamvertised sites sent via zombies, open proxies, etc.?

On Friday, March 11, 2005, 11:27:52 PM, Jeff Chan wrote:
> Does anyone have or know about a list of spam-advertised URIs
> where the spam they appeared in was sent through open relays,
> zombies, open proxies, etc.?  In other words, does anyone know
> of a list of spamvertised web sites or their domains that's
> been cross-referenced to exploited hosts?

> We could use that information as a valuable tool for getting
> more records into SURBLs.

One fairly easy way for anyone running a large SpamAssassin
installation to help us get this data would be to simply grep
for "XBL" and "SURBL" rules hitting the same message and report
the URI domains from those messages.

Perhaps some kind person could write a reporting function in
SpamAssassin for this?
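
As a very rough sketch of that grep-and-report step: the script below
assumes each archived message sits in its own file with an X-Spam-Status
header carrying the rule hits, and that the local ruleset uses names like
RCVD_IN_XBL and URIBL_*_SURBL (both assumptions; adjust to taste).

    #!/usr/bin/env python3
    # Sketch: frequency-ranked list of URI hosts taken from messages whose
    # X-Spam-Status header shows both an XBL and a SURBL rule hit.
    # Header layout and rule names (RCVD_IN_XBL, URIBL_*_SURBL) are
    # assumptions; adjust for the local ruleset.
    import glob, re, sys
    from collections import Counter

    uri_host = re.compile(r'https?://([a-z0-9.-]+)', re.I)
    counts = Counter()

    for path in glob.glob(sys.argv[1] + '/*'):
        raw = open(path, errors='replace').read()
        header, _, body = raw.partition('\n\n')
        if 'RCVD_IN_XBL' not in header or 'SURBL' not in header:
            continue                   # need both kinds of hits
        for host in uri_host.findall(body):
            counts[host.lower()] += 1  # tally each spamvertised host

    for host, n in counts.most_common():
        print(n, host)

Running something like that over a day's quarantine and sending in the
top of the list would be one low-effort way to contribute.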

Jeff C.
--
"If it appears in hams, then don't list it."


Re: Was: List of spamvertised sites sent via zombies, open proxies, etc.?

Posted by Jeff Chan <je...@surbl.org>.
It would probably help if I explained that I brought up two
different but related ideas in quick succession:

1.  Asking for URI domains of messages sent through zombies, open
relays, open proxies, etc. detected by XBL, where those messages
also mentioned URIs already listed in SURBLs.

2.  Asking for URI domains of messages sent through zombies, open
relays, open proxies, etc. detected by XBL regardless of whether
those domains were already listed in SURBLs or not.

The latter may actually be more useful since it's broader and
more inclusive.  We could easily intersect them against SURBLs
ourselves if it were useful for other applications.
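
The intersection step is just a DNS check against the combined SURBL
zone.  A minimal sketch, assuming a plain file of domains, one per line,
and that multi.surbl.org is an acceptable zone to query for this:

    #!/usr/bin/env python3
    # Sketch: split a domain list into SURBL-listed and unlisted entries
    # by querying multi.surbl.org; any A-record answer means "listed".
    import socket, sys

    for line in open(sys.argv[1]):
        domain = line.strip().lower()
        if not domain:
            continue
        try:
            socket.gethostbyname(domain + '.multi.surbl.org')
            print('listed  ', domain)      # already in a SURBL list
        except socket.gaierror:
            print('unlisted', domain)      # candidate new data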

I believe this could be a valuable new data source.  It's true
that Spamhaus and others probably already have this data
internally, but we don't.  ;-)  It's also possibly true that
existing trap-based lists like ob.surbl.org and jp.surbl.org
may already have similar data in them.  As Paul notes, there
is probably a lot of overlap between the various datasets
being used or proposed.

I'd probably ask for messages sent through XBL- and list.dsbl.org-listed
hosts, since both lists are pretty reliable.  Completeness
of compromised host detection is probably non-essential for this
application.  The resulting dataset would be so large that missing
some fraction of zombies probably would not affect the end result
very much.  The sites of the biggest spammers would tend to
bubble to the top of a volume-ranked list.
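
The listing check itself is the usual reversed-octet DNSBL lookup, for
example (zone names as discussed above; 127.0.0.2 is the customary
DNSBL test address, though not every list necessarily honours it):

    #!/usr/bin/env python3
    # Sketch: test whether a sending IP is listed in XBL or list.dsbl.org
    # via the standard reversed-octet DNSBL query.  IPv6 and timeouts are
    # ignored here.
    import socket

    def dnsbl_listed(ip, zone):
        query = '.'.join(reversed(ip.split('.'))) + '.' + zone
        try:
            socket.gethostbyname(query)   # any A answer means listed
            return True
        except socket.gaierror:
            return False

    for zone in ('xbl.spamhaus.org', 'list.dsbl.org'):
        print(zone, dnsbl_listed('127.0.0.2', zone))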

Jeff C.
--
"If it appears in hams, then don't list it."


Re: [SURBL-Discuss] Re: Was: List of spamvertised sites sent via zombies, open proxies, etc.?

Posted by Jeff Chan <je...@surbl.org>.
On Sunday, March 13, 2005, 7:31:01 AM, Raymond Dijkxhoorn wrote:
>> I'm not asking for trap data.  I'm asking to look for XBL hits,
>> then take the URIs from messages that hit XBL.  In other words
>> I want to get the sites that are being advertised through
>> exploited hosts.
>>
>> Nothing to do with traps or SBL.  ;-)

> If you can get a feed, why limit this to hosts found inside XBL?

This is not for a spam feed specifically.  It's to get data about
which sites are spam-advertised through compromised hosts.  XBL
happens to be a good, reliable list of compromised hosts.  Other
lists like list.dsbl.org may be ok too, but those are the only
two RBLs I have a lot of confidence in.  The goal would not be to
get all data but to get all reliable data.

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/


Re: Was: List of spamvertised sites sent via zombies, open proxies, etc.?

Posted by Jeff Chan <je...@surbl.org>.
On Sunday, March 13, 2005, 5:36:55 AM, Raymond Dijkxhoorn wrote:
> Hi!

>>> Perhaps some kind person could write a reporting function in
>>> SpamAssassin for this?

>> Hmm, perhaps if we could extract *all* URI domains from messages
>> sent through XBLed senders, then prioritize those, say, by frequency
>> of appearance, we could create a new SURBL list of spamvertised
>> domains sent through exploited hosts.  That would pretty directly
>> address the use of zombies, etc. and put a penalty on using them
>> to advertise sites.  Even with volume weighting, such a list of
>> sites could be attacked by a major joe job unless we took
>> additional countermeasures, but does anyone else think this might
>> be a useful type of data source for SURBLs?
[...]

> Spamtraps are bad news if you use them 1:1; you need to parse out a LOT.
> Have you run polluted spamtraps? I have been running two proxypots, I
> still might have some tars, and most of it was really useless. What
> helps more is wider coverage. I'd rather see some automated system set
> up like SpamCop's, so people can report and we auto-parse it with Joe's
> tool, for example. With a larger footprint we also get spam earlier.
> It's not as if they first send to the spamtraps and only then to 'real'
> users.

> I understand you want to cover new areas, but please don't rely on other
> RBLs too much; I think doing our own checks does much more in the end.
> If SBL picks it up, we can pick it up faster. But we also want to pick up
> ones NOT listed by any RBL, don't we?

I think you're not understanding what I'm asking for.  :-)

I'm not asking for trap data.  I'm asking to look for XBL hits,
then take the URIs from messages that hit XBL.  In other words
I want to get the sites that are being advertised through
exploited hosts.

Nothing to do with traps or SBL.  ;-)

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/


Was: List of spamvertised sites sent via zombies, open proxies, etc.?

Posted by Jeff Chan <je...@surbl.org>.
On Sunday, March 13, 2005, 5:12:30 AM, Jeff Chan wrote:
> On Friday, March 11, 2005, 11:27:52 PM, Jeff Chan wrote:
>> Does anyone have or know about a list of spam-advertised URIs
>> where the spam they appeared in was sent through open relays,
>> zombies, open proxies, etc.?  In other words, does anyone know
>> of a list of spamvertised web sites or their domains that's
>> been cross-referenced to exploited hosts?

>> We could use that information as a valuable tool for getting
>> more records into SURBLs.

> One fairly easy way for anyone running a large SpamAssassin
> installation to help us get this data would be to simply grep
> for "XBL" and "SURBL" rules hitting the same message and report
> the URI domains from those messages.

> Perhaps some kind person could write a reporting function in
> SpamAssassin for this?

Hmm, perhaps if we could extract *all* URI domains from messages
sent through XBLed senders, then prioritize those, say, by frequency
of appearance, we could create a new SURBL list of spamvertised
domains sent through exploited hosts.  That would pretty directly
address the use of zombies, etc. and put a penalty on using them
to advertise sites.  Even with volume weighting, such a list of
sites could be attacked by a major joe job unless we took
additional countermeasures, but does anyone else think this might
be a useful type of data source for SURBLs?
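
A sketch of what that extraction could look like end to end, assuming a
directory of raw messages, a naive read of the most recent Received
header for the connecting IP, and a crude two-label cut for the base
domain (SURBL's real registrar-boundary handling is more involved):

    #!/usr/bin/env python3
    # Sketch: volume-ranked URI domains from mail whose connecting relay
    # is XBL-listed.  The Received parsing and the two-label "base domain"
    # cut are deliberate simplifications.
    import glob, re, socket, sys
    from collections import Counter

    ip_re    = re.compile(r'\[(\d{1,3}(?:\.\d{1,3}){3})\]')
    uri_host = re.compile(r'https?://([a-z0-9.-]+)', re.I)
    ranked   = Counter()

    def xbl_listed(ip):
        query = '.'.join(reversed(ip.split('.'))) + '.xbl.spamhaus.org'
        try:
            socket.gethostbyname(query)   # any A answer means listed
            return True
        except socket.gaierror:
            return False

    for path in glob.glob(sys.argv[1] + '/*'):
        raw = open(path, errors='replace').read()
        header, _, body = raw.partition('\n\n')
        received = [l for l in header.splitlines() if l.startswith('Received:')]
        m = ip_re.search(received[0]) if received else None  # most recent hop, naively
        if not m or not xbl_listed(m.group(1)):
            continue
        for host in uri_host.findall(body):
            base = '.'.join(host.lower().split('.')[-2:])     # crude base-domain cut
            ranked[base] += 1

    for domain, n in ranked.most_common(50):
        print(n, domain)

The most-listed domains at the top of that output are the volume-ranked
candidates described above; anything joe-jobbed would need the extra
countermeasures mentioned before being trusted.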

Jeff C.
--
"If it appears in hams, then don't list it."


Re: [SURBL-Discuss] List of spamvertised sites sent via zombies, open proxies, etc.?

Posted by Kai Schaetzl <ma...@conactive.com>.
Jeff Chan wrote on Sun, 13 Mar 2005 05:12:30 -0800:

> One fairly easy way for anyone running a large SpamAssassin
> installation to help us get this data would be to simply grep
> for "XBL" and "SURBL" rules hitting the same message and report
> the URI domains from those messages.
>

I have a large corpus of spam and ham from quarantining in MailScanner.
Unfortunately, MailScanner doesn't alter the quarantined messages, so I
would need a tool that scans the saved score data in the Mailwatch db
and then scans each corresponding message for URIs (and it wouldn't know
which one of them matched).
So, depending on how you run SA, it's not that easy to get at this data.
Wouldn't it be possible to have an option in SA that adds the matching URI
to the reported rule hit (URI_SURBL_domain.com) or saves it in a
"summary"? Wouldn't a statistics module for SA make sense anyway?
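
Until something like that exists, one heavyweight workaround would be to
re-feed each quarantined file through "spamassassin -t" (test mode) and
read the rule names back out of the appended report.  A sketch, with the
caveat that re-scanning is slow and DNSBL answers may differ from the
original scan, so the results are only approximate:

    #!/usr/bin/env python3
    # Sketch: re-feed unaltered quarantine files through "spamassassin -t"
    # and report URI hosts from messages whose test report includes an XBL
    # hit.  The rule name RCVD_IN_XBL is an assumption about the ruleset.
    import glob, re, subprocess, sys

    uri_host = re.compile(r'https?://([a-z0-9.-]+)', re.I)

    for path in glob.glob(sys.argv[1] + '/*'):
        raw = open(path, 'rb').read()
        out = subprocess.run(['spamassassin', '-t'], input=raw,
                             capture_output=True).stdout.decode(errors='replace')
        if 'RCVD_IN_XBL' not in out:
            continue
        hosts = sorted(set(h.lower() for h in uri_host.findall(out)))
        print(path, ' '.join(hosts))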

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org