Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/01/19 05:29:11 UTC

[Bug 2948] New: URI tests against DNSBL and RHSBL

http://bugzilla.spamassassin.org/show_bug.cgi?id=2948

           Summary: URI tests against DNSBL and RHSBL
           Product: Spamassassin
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: spamassassin
        AssignedTo: spamassassin-dev@incubator.apache.org
        ReportedBy: jdl@imaginenet.net


Many additional spam e-mails could be caught if the URIs were extracted
from the message and the FQDN portion compared against one or more DNSBLs
and/or RHSBLs. Of course the FQDN would have to be resolved to an IP
address before testing against a DNSBL. Some sort of caching and
de-duplication routines would probably be needed to prevent excessive
queries.
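
To make the proposal concrete, here is a minimal sketch in Python (not
SpamAssassin code) of the extraction and de-duplication steps and of how
the two kinds of query names are usually formed. The regular expression,
the function names and the zone names in the usage note are assumptions
made for illustration. The actual DNS queries and any caching are left out.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Deliberately simple pattern (an assumption): only http/https URIs are found.
URI_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(body):
    """Return the unique host names found in the URIs of `body`,
    lowercased, in first-seen order."""
    seen = {}
    for uri in URI_PATTERN.findall(body):
        try:
            host = urlsplit(uri).hostname
        except ValueError:
            # Malformed URI (for example a broken IPv6 literal): skip it.
            continue
        if host:
            seen.setdefault(host.lower().rstrip("."), None)
    return list(seen)


def rhsbl_query_name(host, zone):
    """An RHSBL is queried by name: <host>.<zone>."""
    return f"{host}.{zone}"


def dnsbl_query_name(ipv4, zone):
    """A DNSBL is queried by address: the IPv4 octets reversed, then the
    zone. The host must already have been resolved to `ipv4`."""
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")
    return ".".join(reversed(octets)) + "." + zone
```

With a hypothetical zone such as rhsbl.example.org, each unique host
yields exactly one RHSBL query name, however many times it appears in
the message.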




Re: [Bug 2948] New: URI tests against DNSBL and RHSBL

Posted by Marc Perkel <ma...@perkel.com>.
Yes - but invisible text is also a dead giveaway of spam. Forcing
spammers to be clever makes them do other things that make them
identifiable. So - I would welcome such a ploy.

Besides - my server is fast, and even with some DNS lookups it's still
faster than me reading the message.

Also - if the spam contains an unusually high number of what would be
artificial links - that could also be detected. If there were, say, more
than 6 links, then we could check the first three and the last three.
That would catch most of them - the rest would be caught by other rules.
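
The "first three and last three" idea can be sketched as follows. This
is an illustration in Python rather than SpamAssassin code, and the
function name and its parameters are assumptions.

```python
def links_to_check(links, head=3, tail=3):
    """Return the links worth looking up: all of them when there are at
    most head + tail, otherwise only the first `head` and last `tail`."""
    links = list(links)
    if len(links) <= head + tail:
        return links
    # Slice from an explicit index so that tail=0 selects nothing.
    return links[:head] + links[len(links) - tail:]
```

A message with ten links would then cost at most six lookups, and the
fact that the list was truncated could itself be used as a signal.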

Dan DeVoe wrote:

>On Tue, 20 Jan 2004, Marc Perkel wrote:
>
>Spammers also want you to read the text of their message, but this doesn't
>stop them from including text just to foil spam filters; invisible,
>visible, at the bottom, at the top, in the middle, even between WORDS
>just to muck with spam filters. If, for each parsed spam, I knew I could
>cause the remote MX to generate arbitrary DNS lookups, not only do I
>potentially have a way to DoS _that_ system, I also have a way to
>potentially DoS any nameserver out there.
>
>I'm not saying that designing a system to detect spammer URLs and score
>against them is impossible or even a bad idea; I'm simply suggesting that
>it will need to be a LOT more intelligent than simply "pull out all the
>URLs, perform DNS queries".
>
>Your reasoning that spammers will not risk ever confusing the user for the
>sake of wreaking havoc makes far too many assumptions about the mindset
>of the spammer based on the evidence we have in 2004, IMNSHO.
>
>>I think that a reverse lookup on URLs would create a lot of good token
>>data for the bayesian filter and would add to the accuracy of
>>identifying spam.
>
>Well, to each their own I suppose, but I would not consider running
>anything like this unless I'd seen good hard evidence that it could not
>be used nefariously when deployed on a large scale. So far I've seen
>nothing like that.

Re: [Bug 2948] New: URI tests against DNSBL and RHSBL

Posted by Dan DeVoe <dd...@zeus.netset.com>.
On Tue, 20 Jan 2004, Marc Perkel wrote:

> I don't agree with you on this because I think it would be unlikely that
> a spammer would include a lot of other URLs. The reason is that the
> spammer wants you to click on the URL that takes you to their web site -
> so they can sell you something - so other URLs compete with that link and
> would reduce the spammer's sales.

Spammers also want you to read the text of their message, but this doesn't
stop them from including text just to foil spam filters; invisible,
visible, at the bottom, at the top, in the middle, even between WORDS
just to muck with spam filters. If, for each parsed spam, I knew I could
cause the remote MX to generate arbitrary DNS lookups, not only do I
potentially have a way to DoS _that_ system, I also have a way to
potentially DoS any nameserver out there.

I'm not saying that designing a system to detect spammer URLs and score
against them is impossible or even a bad idea; I'm simply suggesting that
it will need to be a LOT more intelligent than simply "pull out all the
URLs, perform DNS queries".

Your reasoning that spammers will not risk ever confusing the user for the
sake of wreaking havoc makes far too many assumptions about the mindset
of the spammer based on the evidence we have in 2004, IMNSHO.
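
As an illustration only, one ingredient such a system might include is a
per-message cap on lookups combined with a cache, sketched here in
Python. The class name, the `resolver` callable and the default budget
are assumptions. A cap like this bounds the number of queries per
message but does nothing about slow, spammer-controlled nameservers,
which would need query timeouts as well.

```python
class BoundedLookup:
    """Cache results and stop issuing new queries once a budget is spent.

    `resolver` is any callable that takes a query name and returns a
    result; it stands in for the real DNS query (hypothetical here).
    """

    def __init__(self, resolver, max_lookups=6):
        self._resolver = resolver
        self._max_lookups = max_lookups
        self._cache = {}
        self.lookups_made = 0

    def check(self, name):
        """Return the cached or freshly resolved result for `name`,
        or None when the budget is exhausted and `name` is not cached."""
        if name in self._cache:
            return self._cache[name]
        if self.lookups_made >= self._max_lookups:
            return None
        self.lookups_made += 1
        result = self._resolver(name)
        self._cache[name] = result
        return result
```

One instance would be created per message, so a message stuffed with
hundreds of URLs could trigger no more than `max_lookups` queries.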

> I think that a reverse lookup on URLs would create a lot of good token
> data for the bayesian filter and would add to the accuracy of
> identifying spam.

Well, to each their own I suppose, but I would not consider running
anything like this unless I'd seen good hard evidence that it could not
be used nefariously when deployed on a large scale. So far I've seen
nothing like that.

-- 
 .''`.     Daniel DeVoe <dd...@netset.com>
: :'  :    http://www.netset.com/~ddevoe
`. `'`
  `-  Debian - when you have better things to do than fix a system

Re: [Bug 2948] New: URI tests against DNSBL and RHSBL

Posted by Marc Perkel <ma...@perkel.com>.
I don't agree with you on this because I think it would be unlikely that
a spammer would include a lot of other URLs. The reason is that the
spammer wants you to click on the URL that takes you to their web site -
so they can sell you something - so other URLs compete with that link and
would reduce the spammer's sales.

I think that a reverse lookup on URLs would create a lot of good token 
data for the bayesian filter and would add to the accuracy of 
identifying spam.
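
A minimal sketch of what such token data might look like, in Python. The
token prefixes and the function name are purely hypothetical and are not
SpamAssassin's token format. The IP address and reverse (PTR) name are
assumed to have been looked up already, for instance with Python's
socket.gethostbyname and socket.gethostbyaddr.

```python
def uri_tokens(host, ip=None, ptr=None):
    """Build Bayes-style tokens for one URI host.

    `ip` is the address the host resolved to and `ptr` is the name a
    reverse lookup on that address returned; either may be missing.
    """
    tokens = [f"urihost:{host.lower()}"]
    if ip:
        tokens.append(f"uriip:{ip}")
    if ptr:
        tokens.append(f"uriptr:{ptr.lower().rstrip('.')}")
    return tokens
```

Tokens like these would simply be fed to the classifier alongside the
ordinary word tokens from the message body.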

Dan DeVoe wrote:

>On Sun, 18 Jan 2004 bugzilla-daemon@bugzilla.spamassassin.org wrote:
>
>I'm sure this has been discussed on the list before, but the problem I
>see with performing DNS lookups on URLs contained in possible spam is that
>a nefarious spammer could simply include a very large list of URLs, which
>would reduce performance or even cause a denial of service as each one was
>looked up in turn. That's a bad enough problem if only DNSBLs are queried,
>but even worse when each domain is resolved to an IP in ADDITION to the
>DNSBL lookup.
>
>Especially given domains and nameservers under the control of spammers,
>this could be evil evil evil.

Re: [Bug 2948] New: URI tests against DNSBL and RHSBL

Posted by Dan DeVoe <dd...@zeus.netset.com>.
On Sun, 18 Jan 2004 bugzilla-daemon@bugzilla.spamassassin.org wrote:

> http://bugzilla.spamassassin.org/show_bug.cgi?id=2948
>
> Many additional spam e-mails could be caught if the URIs were extracted
> from the message and the FQDN portion compared against one or more DNSBLs
> and/or RHSBLs. Of course the FQDN would have to be resolved to an IP
> address before testing against a DNSBL. Some sort of caching and
> de-duplication routines would probably be needed to prevent excessive
> queries.

I'm sure this has been discussed on the list before, but the problem I
see with performing DNS lookups on URLs contained in possible spam is that
a nefarious spammer could simply include a very large list of URLs, which
would reduce performance or even cause a denial of service as each one was
looked up in turn. That's a bad enough problem if only DNSBLs are queried,
but even worse when each domain is resolved to an IP in ADDITION to the
DNSBL lookup.

Especially given domains and nameservers under the control of spammers,
this could be evil evil evil.

-- 
 .''`.     Daniel DeVoe <dd...@netset.com>
: :'  :    http://www.netset.com/~ddevoe
`. `'`
  `-  Debian - when you have better things to do than fix a system