Posted to users@spamassassin.apache.org by Adam Katz <an...@khopis.com> on 2009/04/27 22:10:48 UTC

emailBL

(note, I'm guessing at the appropriate mailing list for cross-post)

Dennis Davis wrote:
> http://code.google.com/p/anti-phishing-email-reply/
> 
> is also useful as it attempts to detail the compromised accounts.
> Just block/quarantine email for those accounts.

Interesting ... this seems like it would be best served by DNS in a
manner similar to URIBLs ... does such an "emailBL" exist?

A lookup for 8help@osu.edu (pulled from the live list) on emailBL
server "emailbl.org" could look like this:

$ host 8help.AT.osu.edu.emailbl.org
8help.AT.osu.edu.emailbl.org has address 127.0.0.1
$ host -t txt 8help.AT.osu.edu.emailbl.org
8help.AT.osu.edu.emailbl.org has descriptive text "20090310"
$

Here 127.0.0.1 means listing type A, 127.0.0.2 type B, etc.  Expiration
dates, if they are even necessary given that the server should keep its
DNS entries up to date, would go in the TXT records as illustrated above.
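
A client-side reading of such answers could be sketched like this. It is a
minimal Python sketch under stated assumptions, not part of the proposal:
the function names are hypothetical, the mapping of .1 through .26 onto
letters A through Z is extrapolated from "127.0.0.1 to type A, .2 to type
B, etc.", and the TXT value is assumed to be a date in YYYYMMDD form.

```python
import datetime
import ipaddress


def listing_type(a_record):
    """Map an A-record answer such as 127.0.0.2 to a listing type letter.

    Assumption: last octet 1..26 maps to "A".."Z"; anything else,
    including addresses outside 127.0.0.0/24, counts as "not listed".
    """
    addr = int(ipaddress.IPv4Address(a_record))
    last_octet = addr & 0xFF
    if addr >> 8 != 0x7F0000 or not 1 <= last_octet <= 26:
        return None
    return chr(ord("A") + last_octet - 1)


def parse_expiry(txt_record):
    """Parse a TXT value like "20090310" into a date (assumed YYYYMMDD)."""
    return datetime.datetime.strptime(txt_record, "%Y%m%d").date()
```

With the answers shown above, listing_type("127.0.0.1") gives "A" and
parse_expiry("20090310") gives the date 2009-03-10.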

Since email addresses contain everything a valid domain can contain,
the user.AT.domain.tld (which is really user.at.domain.tld since
domains are not case-sensitive) could be ambiguous if the "user" or
the "domain" contains ".at." in itself, or whatever workaround we
create.  My proposed workaround is ".real-at." and an incremented
numeric suffix like ".real-at2." if needed.  As to pluses, just snip
them and their trailing data out.

8help@osu.edu -> 8help.at.osu.edu
portal.ac.at.edu@live.com -> portal.ac.at.edu.real-at.live.com
123+456@789.xyz -> 123.at.789.xyz
abc.real-at.def@ghi.jkl -> abc.real-at.def.real-at1.ghi.jkl
mno.real-at5.pqr@stu.vwx -> mno.real-at5.pqr.real-at6.stu.vwx
y.real-at999.z@a.at.real-at2.bc ->
    y.real-at999.z.real-at1000.a.at.real-at2.bc

This workaround should only find trouble when there are so many digits
that the overflow creates an invalid email address, which isn't a
realistic problem.
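
One possible reading of the encoding rule, as a minimal Python sketch. The
function name and the "emailbl.org" default are placeholders. The numbering
rule (the separator is one step above the highest marker already present,
with "at" < "real-at" < "real-at1" < ...) is inferred from the examples
above rather than stated outright. The sketch follows the rule as written
(a marker only counts when it has a dot on both sides inside the user or
the domain), so parts that merely start or end with an "at" label are not
treated specially.

```python
import re

# ".at." or ".real-atN." inside a part; the trailing dot is only peeked at
# so that back-to-back markers are all found.
_MARKER = re.compile(r"\.(?:at|real-at(\d*))(?=\.)")


def _rank(match):
    """Order markers: at -> 0, real-at -> 1, real-atN -> N + 1."""
    digits = match.group(1)
    if digits is None:
        return 0
    return 1 if digits == "" else int(digits) + 1


def encode_address(address, zone="emailbl.org"):
    """Turn user@domain into a DNS name under the (hypothetical) zone."""
    local, _, domain = address.lower().rpartition("@")
    local = local.split("+", 1)[0]  # snip the plus and its trailing data
    ranks = [_rank(m) for part in (local, domain)
             for m in _MARKER.finditer(part)]
    next_rank = max(ranks) + 1 if ranks else 0
    if next_rank == 0:
        separator = "at"
    elif next_rank == 1:
        separator = "real-at"
    else:
        separator = f"real-at{next_rank - 1}"
    return f"{local}.{separator}.{domain}.{zone}"
```

For instance, encode_address("8help@osu.edu") returns
"8help.at.osu.edu.emailbl.org", and encode_address("123+456@789.xyz")
returns "123.at.789.xyz.emailbl.org".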

(Oh crap, is this a draft for an RFC?)

Re: emailBL

Posted by Henrik K <he...@hege.li>.
On Wed, Apr 29, 2009 at 08:27:34PM +0200, Benny Pedersen wrote:
> 
> On Tue, April 28, 2009 12:19, Henrik K wrote:
> > On Tue, Apr 28, 2009 at 10:51:33AM +0100, Matt wrote:
> >> Henrik K wrote:
> >>> If someone wants to try it on their mail feed:
> >>> http://sa.hege.li/pra.cf
> 
> can be made to milter-regex.conf ?

Then modify it. ;)

The list is somewhat useless in its current form anyway. For this to really
work, we need addresses to be in DNS less than 5 minutes after they arrive
in spamtraps. That's what some of us are working on.


Re: emailBL

Posted by Benny Pedersen <me...@junc.org>.
On Tue, April 28, 2009 12:19, Henrik K wrote:
> On Tue, Apr 28, 2009 at 10:51:33AM +0100, Matt wrote:
>> Henrik K wrote:
>>> If someone wants to try it on their mail feed:
>>> http://sa.hege.li/pra.cf

Can it be converted to a milter-regex.conf?

-- 
http://localhost/ 100% uptime and 100% mirrored :)


Re: emailBL

Posted by Henrik K <he...@hege.li>.
On Tue, Apr 28, 2009 at 10:51:33AM +0100, Matt wrote:
> Henrik K wrote:
>>
>> If someone wants to try it on their mail feed:
>>
>> http://sa.hege.li/pra.cf
>>
>> Don't mind the size, as optimized they only take millisecond or two to run.
>>
>> Of course when if it starts getting 10x the size, DNS will start looking
>> attractive..
>>
>>   
>
> I have been publishing a sa-update channel for this for some time
>
> the details are on Julian Field's blog (he wrote a script to do what  
> Regexp::Assemble does)
>
> http://www.jules.fm/Logbook/files/anti-spear-phishing.html

Ah nice.. though I'd rather see an actually optimized regexp and not 200
separate rules. :)

As for my previous files: since it isn't clear to some of you, my code is
an example. I have given no usage instructions and make no promise to
update the rules. Try it at your own discretion.

Hopefully someone will come up with a DNS-based list; it would certainly
remove the need for costly SpamAssassin reloads.


Re: emailBL

Posted by Matt <sp...@coders.co.uk>.
Henrik K wrote:
>
> If someone wants to try it on their mail feed:
>
> http://sa.hege.li/pra.cf
>
> Don't mind the size, as optimized they only take millisecond or two to run.
>
> Of course when if it starts getting 10x the size, DNS will start looking
> attractive..
>
>   

I have been publishing a sa-update channel for this for some time

the details are on Julian Field's blog (he wrote a script to do what 
Regexp::Assemble does)

http://www.jules.fm/Logbook/files/anti-spear-phishing.html

matt

Re: emailBL

Posted by Mike Cardwell <sp...@lists.grepular.com>.
Henrik K wrote:

>>>> This might sound a big picky, but using backticks to call the date   
>>>> command in a perl script is horrible. Try using the standard gmtime   
>>>> function. Eg:
>>>>
>>>> $date = gmtime().' (UTC)';
>>>>
>>>> Rather than:
>>>>
>>>> $date = `date -u`; chomp($date);
>>> /me too busy to man perlfunc
>>>
>>> Let this thread be an inspiration for all coders out there.
>>>
>>> Now back to the real world..
>> Sorry, I assumed that if you were releasing source code to the public,  
>> you'd want to make sure it was cross platform compatible. I wont point  
>> out the various other limitations with your script then.
> 
> Are you actually serious or is this some geek humor that I don't get?

I was serious. Your code is a bit shit. I was just trying to help. Never 
mind.

> If you are serious, would you be willing to audit SpamAssassin code with such
> enthusiasm? It might actually _matter_.

No, I'm too busy.

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)

Re: emailBL

Posted by Henrik K <he...@hege.li>.
On Tue, Apr 28, 2009 at 10:31:42AM +0100, Mike Cardwell wrote:
> Henrik K wrote:
>
>>> This might sound a big picky, but using backticks to call the date   
>>> command in a perl script is horrible. Try using the standard gmtime   
>>> function. Eg:
>>>
>>> $date = gmtime().' (UTC)';
>>>
>>> Rather than:
>>>
>>> $date = `date -u`; chomp($date);
>>
>> /me too busy to man perlfunc
>>
>> Let this thread be an inspiration for all coders out there.
>>
>> Now back to the real world..
>
> Sorry, I assumed that if you were releasing source code to the public,  
> you'd want to make sure it was cross platform compatible. I wont point  
> out the various other limitations with your script then.

Are you actually serious or is this some geek humor that I don't get? If you
are serious, would you be willing to audit SpamAssassin code with such
enthusiasm? It might actually _matter_.


Re: emailBL

Posted by Mike Cardwell <sp...@lists.grepular.com>.
Henrik K wrote:

>> This might sound a big picky, but using backticks to call the date  
>> command in a perl script is horrible. Try using the standard gmtime  
>> function. Eg:
>>
>> $date = gmtime().' (UTC)';
>>
>> Rather than:
>>
>> $date = `date -u`; chomp($date);
> 
> /me too busy to man perlfunc
> 
> Let this thread be an inspiration for all coders out there.
> 
> Now back to the real world..

Sorry, I assumed that if you were releasing source code to the public,
you'd want to make sure it was cross-platform compatible. I won't point
out the various other limitations of your script then.

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)

Re: emailBL

Posted by Henrik K <he...@hege.li>.
On Tue, Apr 28, 2009 at 09:46:44AM +0100, Mike Cardwell wrote:
> Henrik K wrote:
>
>>> (note, I'm guessing at the appropriate mailing list for cross-post)
>>>
>>> Dennis Davis wrote:
>>>> http://code.google.com/p/anti-phishing-email-reply/
>>>>
>>>> is also useful as it attempts to detail the compromised accounts.
>>>> Just block/quarantine email for those accounts.
>>> Interesting ... this seems like it would be best served by DNS in a
>>> manner similar to URIBLs ... does such an "emailBL" exist?
>>
>> If someone wants to try it on their mail feed:
>>
>> http://sa.hege.li/pra.cf
>>
>> Don't mind the size, as optimized they only take millisecond or two to run.
>>
>> Of course when if it starts getting 10x the size, DNS will start looking
>> attractive..
>
> This might sound a big picky, but using backticks to call the date  
> command in a perl script is horrible. Try using the standard gmtime  
> function. Eg:
>
> $date = gmtime().' (UTC)';
>
> Rather than:
>
> $date = `date -u`; chomp($date);

/me too busy to man perlfunc

Let this thread be an inspiration for all coders out there.

Now back to the real world..


Re: emailBL

Posted by Mike Cardwell <sp...@lists.grepular.com>.
Henrik K wrote:

>> (note, I'm guessing at the appropriate mailing list for cross-post)
>>
>> Dennis Davis wrote:
>>> http://code.google.com/p/anti-phishing-email-reply/
>>>
>>> is also useful as it attempts to detail the compromised accounts.
>>> Just block/quarantine email for those accounts.
>> Interesting ... this seems like it would be best served by DNS in a
>> manner similar to URIBLs ... does such an "emailBL" exist?
> 
> If someone wants to try it on their mail feed:
> 
> http://sa.hege.li/pra.cf
> 
> Don't mind the size, as optimized they only take millisecond or two to run.
> 
> Of course when if it starts getting 10x the size, DNS will start looking
> attractive..

This might sound a bit picky, but using backticks to call the date
command in a perl script is horrible. Try using the standard gmtime
function. E.g.:

$date = gmtime().' (UTC)';

Rather than:

$date = `date -u`; chomp($date);

-- 
Mike Cardwell
(https://secure.grepular.com) (http://perlcv.com/)

Re: emailBL

Posted by Henrik K <he...@hege.li>.
On Mon, Apr 27, 2009 at 04:10:48PM -0400, Adam Katz wrote:
> (note, I'm guessing at the appropriate mailing list for cross-post)
> 
> Dennis Davis wrote:
> > http://code.google.com/p/anti-phishing-email-reply/
> > 
> > is also useful as it attempts to detail the compromised accounts.
> > Just block/quarantine email for those accounts.
> 
> Interesting ... this seems like it would be best served by DNS in a
> manner similar to URIBLs ... does such an "emailBL" exist?

If someone wants to try it on their mail feed:

http://sa.hege.li/pra.cf

Don't mind the size; once optimized, they only take a millisecond or two
to run.

Of course, if it starts getting 10x the size, DNS will start looking
attractive..


Re: emailBL

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Mon, 27 Apr 2009, Karsten Bräckelmann wrote:

> On Mon, 2009-04-27 at 16:10 -0400, Adam Katz wrote:
> > Since email addresses contain everything a valid domain can contain,
> > the user.AT.domain.tld (which is really user.at.domain.tld since
> > domains are not case-sensitive) could be ambiguous if the "user" or
> > the "domain" contains ".at." in itself, or whatever workaround we
> > create.  My proposed workaround is ".real-at." and an incremented
> > numeric suffix like ".real-at2." if needed.
>
> You are aware there's a ccTLD .at? :)
>
>
> > (Oh crap, is this a draft for an RFC?)
>
> This pretty much was one of my first thoughts, too. I vaguely recall
> coming across such an RFC before. Hope someone else can point it out.
>

There is the RFC 1035 (section 8) encoding of e-mail addresses for DNS
usage (e.g. in the SOA or RP record), which could be used here.
There is slight potential for confusion, but it is at least a starting
point.
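
For reference, a minimal Python sketch of that mapping. RFC 1035 section 8
maps HOSTMASTER@SRI-NIC.ARPA to HOSTMASTER.SRI-NIC.ARPA and calls for
backslash quoting of dots in the local part when written in a master file.
The function name is hypothetical, and only dots and backslashes are
escaped here, which is a simplification.

```python
def mailbox_to_domain_name(address):
    """Write local@domain as a master-file domain name (RFC 1035 sec. 8)."""
    local, _, domain = address.rpartition("@")
    # The local part becomes a single label, so characters that would be
    # read as label separators or escapes are backslash-quoted.
    escaped = local.replace("\\", "\\\\").replace(".", "\\.")
    return f"{escaped}.{domain}"
```

Here mailbox_to_domain_name("Action.domains@ISI.EDU") yields
Action\.domains.ISI.EDU, which is where the potential for confusion comes
from: the local-part dot survives only as an escaped character inside one
label.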


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: emailBL

Posted by Adam Katz <an...@khopis.com>.
Karsten Bräckelmann wrote:
> You are aware there's a ccTLD .at? :)

Yes, but the TLD goes at the very end of the email, so the parser,
which strips ".emailbl.org" with that leading dot, can only trip over
invalid domains like "a.at..emailbl.org" ... my latter two examples
below show what the parser might do if actually handed such things.

a.at.b.at.emailbl.org -> a @ b.at
c.real-at.d.at.at.emailbl.org -> c @ d.at.at
a.at.b.at..emailbl.org (invalid) ~-> a@b.at OR a.at.b@.emailbl.org
a.b.at..emailbl.org (invalid) ~-> a.b@.emailbl.org

Recall:
a @ b.at -> a.at.b.at -> a.at.b.at.emailbl.org
c @ d.at.at -> c.real-at.d.at.at -> c.real-at.d.at.at.emailbl.org

$ host www..google.com
host: 'www..google.com' is not a legal name (empty label)
$ host .google.com
host: '.google.com' is not a legal name (empty label)
$
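
The parsing described above could look roughly like the following Python
sketch. It is an illustration only: the function name is hypothetical, the
marker ordering (at < real-at < real-at1 < ...) is inferred from the
earlier examples, and returning None for anything with an empty label or
an ambiguous split is a design choice of the sketch, not something the
proposal specifies.

```python
import re

# ".at" or ".real-atN" followed by a dot; the trailing dot is only peeked at.
_MARKER = re.compile(r"\.(?:at|real-at(\d*))(?=\.)")


def _rank(match):
    """Order markers: at -> 0, real-at -> 1, real-atN -> N + 1."""
    digits = match.group(1)
    if digits is None:
        return 0
    return 1 if digits == "" else int(digits) + 1


def decode_name(name, zone="emailbl.org"):
    """Recover user@domain from a query name, or None if it is invalid."""
    suffix = "." + zone
    name = name.lower()
    if not name.endswith(suffix):
        return None
    encoded = name[:-len(suffix)]
    if "" in encoded.split("."):
        return None  # empty label, e.g. "a.at.b.at..emailbl.org"
    markers = list(_MARKER.finditer(encoded))
    if not markers:
        return None
    top = max(_rank(m) for m in markers)
    best = [m for m in markers if _rank(m) == top]
    if len(best) != 1:
        return None  # more than one candidate separator: ambiguous
    split = best[0]
    # split.end() points at the dot that follows the marker label.
    return encoded[:split.start()] + "@" + encoded[split.end() + 1:]
```

decode_name("a.at.b.at.emailbl.org") returns "a@b.at", and
decode_name("c.real-at.d.at.at.emailbl.org") returns "c@d.at.at"; both of
the invalid double-dot names are rejected with None.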

>> (Oh crap, is this a draft for an RFC?)
> 
> This pretty much was one of my first thoughts, too. I vaguely recall
> coming across such an RFC before. Hope someone else can point it out.

Indeed.

Re: emailBL

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Mon, 2009-04-27 at 16:10 -0400, Adam Katz wrote:
> Since email addresses contain everything a valid domain can contain,
> the user.AT.domain.tld (which is really user.at.domain.tld since
> domains are not case-sensitive) could be ambiguous if the "user" or
> the "domain" contains ".at." in itself, or whatever workaround we
> create.  My proposed workaround is ".real-at." and an incremented
> numeric suffix like ".real-at2." if needed.

You are aware there's a ccTLD .at? :)


> (Oh crap, is this a draft for an RFC?)

This pretty much was one of my first thoughts, too. I vaguely recall
coming across such an RFC before. Hope someone else can point it out.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: emailBL

Posted by John Hardin <jh...@impsec.org>.
On Tue, 28 Apr 2009, Mike Cardwell wrote:

> Alternatively, just stick the original email address in the 
> TXT record. So in rbldnsd, you'd have a record like this:
>
> 98f22901b17b13d910456597685c1963 :127.0.0.1:the.real@email.address

I was going to suggest that. Another thing to put in the TXT record might 
be a URL to evidence - e.g. (one of) the phishing emails containing that 
address as the contact point.
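
A minimal Python sketch of how such hashed names and records might be
generated. Everything beyond the record shape quoted above is an
assumption: the function names, the "emailbl.example" zone, and the
normalization (trim and lowercase before hashing) are hypothetical, and
client and server would have to agree on the normalization for lookups to
match.

```python
import hashlib


def address_digest(address):
    """MD5 hex digest of the normalized address (normalization assumed)."""
    normalized = address.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def query_name(address, zone="emailbl.example"):
    """DNS name a client would look up for this address."""
    return f"{address_digest(address)}.{zone}"


def record_line(address, code="127.0.0.1"):
    """Data line in the same "name :A-value:TXT" shape as quoted above."""
    return f"{address_digest(address)} :{code}:{address}"
```

Hashing keeps every query name a fixed 32 hex characters, which sidesteps
the question of how to encode "@" and odd local parts in DNS labels.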

> There's no advantage of sticking the email address in the TXT record 
> rather than having a separate file, apart from keeping the data 
> together.

Ease of access?

OTOH, if you're (not you, Mike) going to host this data, you'll probably 
have a webby interface for interactive lookups, and that might be the 
proper way to publish the evidence. If the email address typed into the 
web form hits, offer a link to view the evidence supporting the listing.

I don't think there's any reason to keep the email address or the evidence 
(suitably sanitized of the targeted victim's contact information) 
confidential.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Windows Genuine Advantage (WGA) means that now you use your
   computer at the sufferance of Microsoft Corporation. They can
   kill it remotely without your consent at any time for any reason;
   it also shuts down in sympathy when the servers at Microsoft crash.
-----------------------------------------------------------------------
  10 days until the 64th anniversary of VE day

Re: 419 emailBL?

Posted by Henrik K <he...@hege.li>.
On Mon, May 04, 2009 at 10:51:14PM +0200, mouss wrote:
>
> That said, I am surprised because you defended the fact that the
> freemail plugin includes the list of freemail domains...

Think about it. There are maybe a few thousand freemail domains, which
hardly ever change. Why would that require realtime updating? They can
simply be updated with sa-update. It's strange that someone would have to
"defend" this.

> This wasn't intended as a list to download. I didn't even check the
> license. I was simply replying to your "I'm surprised there still hasn't
> been an emailBL around". or if you prefer: the idea of an "emailBL
> around" isn't new. note also that SARE has a ruleset with phone numbers
> and "snail mail" infos found in spam.

Ideas are one thing, but implementing this is actually simple. We are
already at the alpha stage, with an almost finished plugin for SA, and are
harvesting lots of addresses. Results remain to be seen..


Re: 419 emailBL?

Posted by mouss <mo...@ml.netoyen.net>.
Henrik K wrote:
> On Sun, May 03, 2009 at 06:25:01PM +0200, mouss wrote:
>> I can't use a dnsbl on recipient addresses in postfix. This requires
>> additionnal code (exceptionally if the records are hashed...). MySQL on
>> the other hand is supported by many daemons. Sure, SA would need a mysql
>> access db plugin, but that would be beneficial for other things I think.
> 
> MySQL is not a global solution.
> 

It is for me, since I use it. I don't see why I should load gigabytes into
rbldnsd when I can query mysql. But I agree that this is a personal
view, so let's leave it at that.

That said, I am surprised because you defended the fact that the
freemail plugin includes the list of freemail domains...

> Fixing up a postfix policyd is no problem and exim supports it out of the
> box, md5 is hardy "exceptional" function.
> 

sure.

>>> Personally I'm only interested in "freemails", I don't know how feasible it
>>> would be to create a global email blacklist. 419/phishers are pretty much
>>> the only spam that's hard to catch. I'm surprised there still hasn't been an
>>> emailBL around, 
>>
>> http://www.419scam.org/419-bl.htm
> 
> Sorry I'm not interested in wgetting a humongous list, which happens also to
> be 2 days old, also no mention of anything about freshness. :)
> 

This wasn't intended as a list to download. I didn't even check the
license. I was simply replying to your "I'm surprised there still hasn't
been an emailBL around". or if you prefer: the idea of an "emailBL
around" isn't new. note also that SARE has a ruleset with phone numbers
and "snail mail" infos found in spam.

Re: 419 emailBL?

Posted by Henrik K <he...@hege.li>.
On Sun, May 03, 2009 at 06:25:01PM +0200, mouss wrote:
>
> I can't use a dnsbl on recipient addresses in postfix. This requires
> additionnal code (exceptionally if the records are hashed...). MySQL on
> the other hand is supported by many daemons. Sure, SA would need a mysql
> access db plugin, but that would be beneficial for other things I think.

MySQL is not a global solution.

Fixing up a postfix policyd is no problem and exim supports it out of the
box; md5 is hardly an "exceptional" function.

> > Personally I'm only interested in "freemails", I don't know how feasible it
> > would be to create a global email blacklist. 419/phishers are pretty much
> > the only spam that's hard to catch. I'm surprised there still hasn't been an
> > emailBL around, 
> 
> 
> http://www.419scam.org/419-bl.htm

Sorry, I'm not interested in wgetting a humongous list that also happens to
be 2 days old and says nothing about freshness. :)


Re: 419 emailBL?

Posted by mouss <mo...@ml.netoyen.net>.
Benny Pedersen wrote:
> On Sun, May 3, 2009 18:25, mouss wrote:
>> stock postfix. something I can't do with a dnsbl since there is no
>> reject_rhsbl_recipient...
> 

correction: There is no DNSBL check that acts on the full email address.
reject_rhsbl_recipient will look up the domain part.

> http://www.docunext.com/blog/2006/12/07/sorbs-settings/

or simply

http://www.postfix.org/postconf.5.html#reject_rhsbl_recipient

Re: 419 emailBL?

Posted by Benny Pedersen <me...@junc.org>.
On Sun, May 3, 2009 18:25, mouss wrote:
> stock postfix. something I can't do with a dnsbl since there is no
> reject_rhsbl_recipient...

http://www.docunext.com/blog/2006/12/07/sorbs-settings/

-- 
http://localhost/ 100% uptime and 100% mirrored :)


Re: 419 emailBL?

Posted by mouss <mo...@ml.netoyen.net>.
Henrik K wrote:
> On Sun, May 03, 2009 at 03:14:22PM +0200, mouss wrote:
>> Henrik K wrote:
>>> On Sun, May 03, 2009 at 03:40:47AM +0200, mouss wrote:
>>>> with rsync or the like, you can simply add the addresses (no MD5, no
>>>> anything) to an access list that your MTA can use.
>>> You don't get free rsyncs for big players like uribl for reason (um, traffic
>>> etc?).
>> some DNSBLs are available via rsync.
>>
>> $ wc -l psbl.txt
>>  1494939 psbl.txt
>> $ ls -l psbl.txt
>> ... 20969353 ...
> 
> Like I said, no one is stopping offering it. It's up to the list or someone
> donating resources to such list. But the bigger/more popular the list,
> harder it is to create a reliable rsync-network that can handle hoardes of
> clients checking stuff every 15 minutes.
> 
>>> If we had a big emailbl, obviously it would be impractical as well.
>>> You really want to be updated every 5-15 minutes, which DNS allows.
>>>
>> It is possible to use a mechanism similar to SA update:
>> - use DNS to see if there is an update
>> - if so, download changes since some recent version
> 
> See the DNS part? You already got answer there so why complicate things? ;)
> 

not the same.

1- here, you do one dns check every 5-15 minutes. the number has nothing
to do with the amount of mail you see.
2- and the query is not done while checking mail. it's asynchronous and
adds no latency to mail checking.
3- it requires no integration with MTA or whatever. I can use this with
stock postfix. something I can't do with a dnsbl since there is no
reject_rhsbl_recipient...



>>> Of course no one stops such list offering the plain text emails as plain
>>> file. But do you want potentially millions of emails in a file?
>>>
>> 1- I prefer that over latency
> 
> You can use rbldnsd, if the data is available.. I just meant why would you
> want to have a complicated setup, especially if you are going to use the
> data possibly on several levels (MTA, SA). Transferring files around and
> reloading daemons is silly.
> 

I can't use a dnsbl on recipient addresses in postfix. This requires
additional code (exceptionally if the records are hashed...). MySQL on
the other hand is supported by many daemons. Sure, SA would need a mysql
access db plugin, but that would be beneficial for other things I think.

(and with local data, you can support regular expressions [except for
the "simple" wildcard things]. AFAIK, rbldnsd doesn't support these).

>> - the disabled addresses do not need to be "shared" anymore.
> 
> I'm asking because I don't know: is that reality? Do you get confirmation
> from i.e. gmail that some account is disabled? From the list point of view
> it's simple enough to wait a month or so to see if the email is still found
> in spams. Reporting etc is another thing and not necessarily concern of the
> list.
> 

I have no evidence for email addresses, but fraud domains/subdomains get
disabled (except at "uncollaborative" sites or registrars. but there I
blacklist the whole domain...).

> Personally I'm only interested in "freemails", I don't know how feasible it
> would be to create a global email blacklist. 419/phishers are pretty much
> the only spam that's hard to catch. I'm surprised there still hasn't been an
> emailBL around, 


http://www.419scam.org/419-bl.htm


> but maybe this time it becomes reality.. atleast to have
> some scoring in SA.
> 
>> I don't have a "fixed" opinion. I am just trying to see if using the
>> well-known dns hack (dnsbl) is the best choice.
> 
> DNS is simple and effective remote database for simple queries. Unless
> someone invents even better and easy to use global solution.
> 



Re: 419 emailBL?

Posted by Henrik K <he...@hege.li>.
On Sun, May 03, 2009 at 03:14:22PM +0200, mouss wrote:
> Henrik K wrote:
> > On Sun, May 03, 2009 at 03:40:47AM +0200, mouss wrote:
> >> with rsync or the like, you can simply add the addresses (no MD5, no
> >> anything) to an access list that your MTA can use.
> > 
> > You don't get free rsyncs for big players like uribl for reason (um, traffic
> > etc?).
> 
> some DNSBLs are available via rsync.
> 
> $ wc -l psbl.txt
>  1494939 psbl.txt
> $ ls -l psbl.txt
> ... 20969353 ...

Like I said, no one is stopping anyone from offering it. It's up to the list
or someone donating resources to such a list. But the bigger and more popular
the list, the harder it is to create a reliable rsync network that can handle
hordes of clients checking stuff every 15 minutes.

> > If we had a big emailbl, obviously it would be impractical as well.
> > You really want to be updated every 5-15 minutes, which DNS allows.
> > 
> 
> It is possible to use a mechanism similar to SA update:
> - use DNS to see if there is an update
> - if so, download changes since some recent version

See the DNS part? You already got the answer there, so why complicate things? ;)

> > Of course no one stops such list offering the plain text emails as plain
> > file. But do you want potentially millions of emails in a file?
> > 
> 
> 1- I prefer that over latency

You can use rbldnsd, if the data is available.. I just meant why would you
want to have a complicated setup, especially if you are going to use the
data possibly on several levels (MTA, SA). Transferring files around and
reloading daemons is silly.

> - the disabled addresses do not need to be "shared" anymore.

I'm asking because I don't know: is that reality? Do you get confirmation
from e.g. gmail that some account is disabled? From the list's point of view
it's simple enough to wait a month or so to see if the address is still found
in spam. Reporting etc. is another thing and not necessarily a concern of the
list.

Personally I'm only interested in "freemails"; I don't know how feasible it
would be to create a global email blacklist. 419/phishers are pretty much
the only spam that's hard to catch. I'm surprised there still hasn't been an
emailBL around, but maybe this time it becomes reality.. at least to have
some scoring in SA.

> I don't have a "fixed" opinion. I am just trying to see if using the
> well-known dns hack (dnsbl) is the best choice.

DNS is a simple and effective remote database for simple queries. Unless
someone invents an even better and easy-to-use global solution.

Cheers,
Henrik

Re: 419 emailBL?

Posted by mouss <mo...@ml.netoyen.net>.
Henrik K wrote:
> On Sun, May 03, 2009 at 03:40:47AM +0200, mouss wrote:
>> with rsync or the like, you can simply add the addresses (no MD5, no
>> anything) to an access list that your MTA can use.
> 
> You don't get free rsyncs for big players like uribl for reason (um, traffic
> etc?).

some DNSBLs are available via rsync.

$ wc -l psbl.txt
 1494939 psbl.txt
$ ls -l psbl.txt
... 20969353 ...


> If we had a big emailbl, obviously it would be impractical as well.
> You really want to be updated every 5-15 minutes, which DNS allows.
> 

It is possible to use a mechanism similar to SA update:
- use DNS to see if there is an update
- if so, download changes since some recent version
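
As a rough illustration of those two steps, here is a minimal Python
sketch. Every name in it is hypothetical: the "serial.emailbl.example"
record, the integer serial it is assumed to carry, and the delta file
naming are placeholders, and lookup_txt stands for whatever resolver call
returns the TXT string for a name.

```python
def plan_update(local_serial, lookup_txt, name="serial.emailbl.example"):
    """Return the delta file to fetch, or None when already up to date.

    lookup_txt is any callable taking a DNS name and returning its TXT
    value; the published serial is assumed to be a plain integer.
    """
    published = int(lookup_txt(name))
    if published <= local_serial:
        return None  # one cheap DNS query, nothing to download
    # Hypothetical naming scheme for "changes since some recent version".
    return f"changes-{local_serial}-to-{published}.txt"
```

The check runs on a timer rather than per message, so its cost is
independent of mail volume.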


> Of course no one stops such list offering the plain text emails as plain
> file. But do you want potentially millions of emails in a file?
> 

1- I prefer that over latency
2- do we _now_ have millions of such addresses? if not, premature
optimization...


here is how I see things:

- criminals (AFF, phish, ...) use some email addresses
- these addresses get listed
- the addresses are reported to domain owners
- domain owners disable these addresses (if the domain owner is the
criminal, then the full domain can be listed, and/or it can be reported
to the registrar... etc.)
- the disabled addresses do not need to be "shared" anymore.
- ... etc


I don't have a "fixed" opinion. I am just trying to see if using the
well-known dns hack (dnsbl) is the best choice.


Re: 419 emailBL?

Posted by Henrik K <he...@hege.li>.
On Sun, May 03, 2009 at 03:40:47AM +0200, mouss wrote:
> 
> with rsync or the like, you can simply add the addresses (no MD5, no
> anything) to an access list that your MTA can use.

You don't get free rsyncs for big players like uribl for a reason (um, traffic
etc?). If we had a big emailbl, obviously it would be impractical as well.
You really want to be updated every 5-15 minutes, which DNS allows.

Of course, nothing stops such a list from offering the plain-text addresses as
a plain file. But do you want potentially millions of addresses in a file?


Re: 419 emailBL?

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Apr 29, 2009 at 7:56 PM, Adam Katz <an...@khopis.com> wrote:
>> I guess it depends what you mean by "enormous".  A sought rule update is 135k.
>
> And 135k doesn't add up to a lot of bandwidth?  I suppose it depends
> on the number of users, and I'm figuring worst-case scenario, e.g.
> when/if it ships enabled in the default SA install.

Well, it depends what you're measuring.  :)

The update itself isn't large, it's just 135k, which is the "not
enormous" bit.  135k in and of itself is a pretty tiny file, but I'm
not sure what "enormous" means in this context -- megs?  gigs?

The aggregate bandwidth could very well be large, depending on update
publish frequency, client update frequency, number of clients, client
bandwidth, etc.  From what I've seen, the standard SA updates w/ the
same ~130k size and the current number of users ... isn't a lot of
bandwidth.
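
To put rough numbers on that, a small Python sketch of the arithmetic. Only
the 135k update size comes from this thread; the client count and check
frequencies are hypothetical, 135k is taken as 135,000 bytes, and the
worst case assumes every check downloads the full update.

```python
def daily_transfer_bytes(update_bytes, clients, checks_per_day, hit_ratio=1.0):
    """Aggregate bytes served per day.

    hit_ratio is the fraction of checks that actually download the
    update; 1.0 is the worst case where every check fetches it.
    """
    return int(update_bytes * clients * checks_per_day * hit_ratio)


# Hypothetical population of 10,000 clients.
once_a_day = daily_transfer_bytes(135_000, 10_000, 1)      # 1.35 GB/day
every_15_min = daily_transfer_bytes(135_000, 10_000, 96)   # 129.6 GB/day
```

The per-file size stays tiny either way; it is the check frequency that
multiplies the total by roughly a hundred.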

There are some pretty standard ways to deal with this issue though, such as:

a) have lots of mirrors, same idea as your P2P idea though less
dynamic  (oh, that was another thought I had ... go short of using
torrents since they're resource heavy and instead make our own P2P
protocol doing a dynamic http/mirrored.by system)

b) split the channel into a frequent / not frequent channel (or stable
/ testing, or split based on content, or ...) for patterns which don't
change often, there's no reason to keep sending them out.  same idea I
mentioned before.

c) shrink or hold update size steady in face of updates.  hard.

d) make updates less frequently.  defeats the purpose?  clearly every
15m is different than every day is different than weekly ...


To be perfectly honest, I really don't worry about the "omg, update
bandwidth" issue right now.  I worry that there aren't enough updates
right now.  The only auto-generated one, sought, is daily, and the
manual ones now are more than weekly on average.  I don't know if
sought could even be produced faster, you need a certain amount of
incoming ham and spam to sample and produce test rules, and enough
diversity of mails to test against to avoid "obvious" bad rules...

Re: [SA] 419 emailBL?

Posted by Adam Katz <an...@khopis.com>.
>> And if bandwidth at the server is a problem, would publishing the ruleset
>> updates via the Coral Cache network work?
> 
> Unfortunately, no.  In fact, they kind of suck as a CDN.  We
> originally were putting updates through there and would regularly have
> issues w/ 404s, corrupt or incomplete downloads, etc.
> 
> It may have improved since the 2005 or so timeframe when we started w/
> updates, but ...  Haven't checked in a while.

Still has the same issues.  I'll be removing them from my sa-update
channels mirror files very soon.

Re: 419 emailBL?

Posted by John Hardin <jh...@impsec.org>.
On Wed, 29 Apr 2009, Theo Van Dinter wrote:

> On Wed, Apr 29, 2009 at 8:06 PM, John Hardin <jh...@impsec.org> wrote:
>>> And 135k doesn't add up to a lot of bandwidth?
>> And if bandwidth at the server is a problem, would publishing the ruleset
>> updates via the Coral Cache network work?
>
> Unfortunately, no.  In fact, they kind of suck as a CDN.  We
> originally were putting updates through there and would regularly have
> issues w/ 404s, corrupt or incomplete downloads, etc.
>
> It may have improved since the 2005 or so timeframe when we started w/
> updates, but ...  Haven't checked in a while.

I've edited my MIRRORED.BY, we'll see how it goes...

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The real opiate of the masses isn't religion; it's the belief that
   somewhere there is a benefit that can be delivered without a
   corresponding cost.                       -- Tom of "Radio Free NJ"
-----------------------------------------------------------------------
  9 days until the 64th anniversary of VE day

Re: 419 emailBL?

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Apr 29, 2009 at 8:06 PM, John Hardin <jh...@impsec.org> wrote:
>> And 135k doesn't add up to a lot of bandwidth?
>
> ...so don't look for updates more than once every day or two.

Yeah, but I think the point was that a frequently changing ruleset
would be downloaded frequently.

> And if bandwidth at the server is a problem, would publishing the ruleset
> updates via the Coral Cache network work?

Unfortunately, no.  In fact, they kind of suck as a CDN.  We
originally were putting updates through there and would regularly have
issues w/ 404s, corrupt or incomplete downloads, etc.

It may have improved since the 2005 or so timeframe when we started w/
updates, but ...  Haven't checked in a while.

Re: 419 emailBL?

Posted by John Hardin <jh...@impsec.org>.
On Wed, 29 Apr 2009, Adam Katz wrote:

> Theo Van Dinter wrote:
>> On Wed, Apr 29, 2009 at 6:24 PM, Adam Katz <an...@khopis.com> wrote:
>>> The mechanism for sa-update is brilliant, but
>>> doesn't lend itself to enormous indices of frequently-changing rulesets.
>>
>> I guess it depends what you mean by "enormous".  A sought rule update is 135k.
>
> And 135k doesn't add up to a lot of bandwidth?

...so don't look for updates more than once every day or two.

And if bandwidth at the server is a problem, would publishing the ruleset 
updates via the Coral Cache network work?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   A superior gunman is one who uses his superior judgment to keep
   himself out of situations that would require the use of his
   superior skills.
-----------------------------------------------------------------------
  9 days until the 64th anniversary of VE day

Re: 419 emailBL?

Posted by Adam Katz <an...@khopis.com>.
Theo Van Dinter wrote:
> On Wed, Apr 29, 2009 at 6:24 PM, Adam Katz <an...@khopis.com> wrote:
>> The mechanism for sa-update is brilliant, but
>> doesn't lend itself to enormous indices of frequently-changing rulesets.
> 
> I guess it depends what you mean by "enormous".  A sought rule update is 135k.

And 135k doesn't add up to a lot of bandwidth?  I suppose it depends
on the number of users, and I'm figuring worst-case scenario, e.g.
when/if it ships enabled in the default SA install.
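
For a rough sense of scale, the arithmetic can be sketched in Python.
Only the 135k figure comes from the thread; the client counts and
download rates are made-up illustrations, and "135k" is assumed to mean
roughly 135 KiB per download.

```python
# Back-of-the-envelope server traffic for one update channel.
UPDATE_BYTES = 135 * 1024  # "135k", assumed to be ~135 KiB per download


def daily_traffic_gib(clients: int, downloads_per_day: float) -> float:
    """Total bytes served per day, expressed in GiB."""
    return clients * downloads_per_day * UPDATE_BYTES / 1024**3


# Hypothetical install bases and download rates, for illustration only.
for clients in (10_000, 1_000_000):
    for per_day in (1, 24):
        gib = daily_traffic_gib(clients, per_day)
        print(f"{clients:>9} clients x {per_day:>2}/day = {gib:8.1f} GiB/day")
```

As far as I understand sa-update, a client only downloads when the
DNS-announced channel version changes, so the downloads-per-day figure
is bounded by how often the channel itself is republished.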

> The likelihood is, imo, that you would probably split up your updates
> into multiple channels before they really got out of control in size.
> For example, you could do something like a weekly, daily, and
> sub-daily channel, and move rules appropriately between them.  Yes, a
> little more of a PITA for clients, but how much churn do you really
> expect?

How about hierarchical channel support, e.g. a channel's MIRRORED.BY
file is merely itself a sa-update-channels file.

>> Justin:  Perhaps sa-update could support [version].torrent in addition
>> to [version].tar.gz on each mirror?  (This doesn't touch the current
>> DNS-based version/announce system.)  Channels hosted for versions of
>> SA after the supporting release (e.g. 0.4.3.[channel] and "higher")
>> would be allowed to host only the torrent file.
> 
> I had actually thought about doing a P2P sa-update so as to better
> withstand DoS issues, skip the need for a mirrored.by file, etc.  But
> the main issue is that most channel updates are rather small, and so
> therefore the downloads are rather fast.  Compared to doing a torrent,
> which takes relatively a long time to get setup, and just as you
> start, you're done.  Also, it means clients are serving data, which
> makes the "quick sa-update and move on" more of a procedure and you
> have to worry about remote connectivity, etc, etc.
> 
> In the end it didn't seem worthwhile beyond the security aspect, so I
> didn't move beyond the "thinking about" stage.
> 
> (and yes, I know I'm not Justin. ;))

You're close enough on the SA development order.  For BT, I was
actually envisioning much larger rulesets with sought merely heralding
a future with lots of large auto-generated rulesets, but perhaps it
doesn't scale at the right point.  I think I'm trying to squeeze too
much :-p

-- 
Adam Katz
khopesh on irc://irc.freenode.net/#spamassassin
http://khopesh.com/Anti-spam

Re: [SA] 419 emailBL?

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Apr 29, 2009 at 6:24 PM, Adam Katz <an...@khopis.com> wrote:
> The mechanism for sa-update is brilliant, but
> doesn't lend itself to enormous indices of frequently-changing rulesets.

I guess it depends what you mean by "enormous".  A sought rule update is 135k.

The likelihood is, imo, that you would probably split up your updates
into multiple channels before they really got out of control in size.
For example, you could do something like a weekly, daily, and
sub-daily channel, and move rules appropriately between them.  Yes, a
little more of a PITA for clients, but how much churn do you really
expect?

> Justin:  Perhaps sa-update could support [version].torrent in addition
> to [version].tar.gz on each mirror?  (This doesn't touch the current
> DNS-based version/announce system.)  Channels hosted for versions of
> SA after the supporting release (e.g. 0.4.3.[channel] and "higher")
> would be allowed to host only the torrent file.

I had actually thought about doing a P2P sa-update so as to better
withstand DoS issues, skip the need for a mirrored.by file, etc.  But
the main issue is that most channel updates are rather small, and so
therefore the downloads are rather fast.  Compare that to a torrent,
which takes a relatively long time to get set up, and just as you
start, you're done.  Also, it means clients are serving data, which
makes the "quick sa-update and move on" more of a procedure and you
have to worry about remote connectivity, etc, etc.

In the end it didn't seem worthwhile beyond the security aspect, so I
didn't move beyond the "thinking about" stage.


(and yes, I know I'm not Justin. ;))

Re: my emailBL is live!

Posted by John Wilcock <jo...@tradoc.fr>.
Le 29/04/2009 02:40, Adam Katz a écrit :
> replaces the @ with a dot (not an underscore, that's not a legal
> character).

Won't that pose problems distinguishing between fred.bloggs@example.tld 
and fred@bloggs.example.tld ?
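
To make the collision concrete, here is a minimal Python sketch (an
illustration, not code from the thread).  The dotted() helper mimics the
"replace the @ with a dot" encoding; hashed() shows one way around the
ambiguity, in the spirit of the full-length MD5 hashes mentioned
elsewhere in this thread.  Lowercasing before hashing is an assumption
here, not something the thread specifies.

```python
import hashlib


def dotted(address: str) -> str:
    """Naive encoding: replace the '@' with a dot."""
    return address.replace("@", ".")


def hashed(address: str) -> str:
    """Alternative: hex MD5 digest of the (assumed lowercased) address."""
    return hashlib.md5(address.lower().encode("utf-8")).hexdigest()


a = "fred.bloggs@example.tld"
b = "fred@bloggs.example.tld"

print(dotted(a), dotted(b))          # both map to the same DNS label string
print(dotted(a) == dotted(b))        # the naive encoding cannot tell them apart
print(hashed(a) == hashed(b))        # the digests keep them distinct
```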

John.

-- 
-- Over 3000 webcams from ski resorts around the world - www.snoweye.com
-- Translate your technical documents and web pages    - www.tradoc.fr

Re: my emailBL is live!

Posted by John Hardin <jh...@impsec.org>.
On Wed, 29 Apr 2009, Jesse Thompson wrote:

> A word of caution.  Be very careful how you use the list.  The intended 
> usage for the list is to prevent (or monitor) local users from sending 
> email to the listed addresses.  The phishers frequently use compromised 
> end-user accounts to receive the phishing replies, so there is a high 
> risk of false positives, especially if you attempt to classify messages 
> containing one of these addresses as spam.

+1

Given the context of this information, the only safe way to use it is as a 
component of a meta that also requires phishy text fragments.
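
The "meta" idea amounts to a logical AND of two weak signals.  A minimal
Python sketch of that logic follows; the phrase list and function name
are hypothetical, and a real deployment would express this as tested
SpamAssassin rules rather than standalone code.

```python
# Hypothetical phrases; real rules would be tested against ham and spam.
PHISHY_FRAGMENTS = (
    "verify your account",
    "your webmail",
    "confirm your password",
)


def is_phishy(addresses, body, listed):
    """Fire only when a listed address AND phishy text both appear."""
    has_listed = any(addr.lower() in listed for addr in addresses)
    text = body.lower()
    has_text = any(fragment in text for fragment in PHISHY_FRAGMENTS)
    return has_listed and has_text


listed = {"drop@example.com"}  # stand-in for the published address list
print(is_phishy(["drop@example.com"], "Please verify your account", listed))
print(is_phishy(["drop@example.com"], "Lunch on Friday?", listed))
```

A listed address on its own never fires here, which is the point: a
compromised but since-cleaned account can keep corresponding normally.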

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   You do not examine legislation in the light of the benefits it
   will convey if properly administered, but in the light of the
   wrongs it would do and the harms it would cause if improperly
   administered.                                  -- Lyndon B. Johnson
-----------------------------------------------------------------------
  9 days until the 64th anniversary of VE day

Re: [SA] emailBL code

Posted by John Hardin <jh...@impsec.org>.
On Fri, 1 May 2009, Adam Katz wrote:

> John Hardin wrote:
>> How would the phisher collect the password info from their target using 
>> a forged sender address?
>
> A web form.

Hrm. Okay, I'll buy that. If you're going to spear-phish a specific 
organization then it would be reasonable to put the effort into forging a 
password capture website that looks plausible.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Ignorance doesn't make stuff not exist.               -- Bucky Katt
-----------------------------------------------------------------------
  7 days until the 64th anniversary of VE day

Re: [SA] emailBL code

Posted by Adam Katz <an...@khopis.com>.
John Hardin wrote:
> How would the phisher collect the password info from their target using
> a forged sender address?

A web form.


Re: emailBL code

Posted by Adam Katz <an...@khopis.com>.
Jesse Thompson wrote:
>     Possible values for TYPE:
>         E: The ADDRESS (usually in the From header) might receive replies
>             but it was not intended to receive the replies.

Oh!  That's a new one.  Changes my code.  My code now supports Z as
requesting a hidden email address, A-J as codes (with FGHIJ being
currently undefined), and ignores K-Y (as both undefined and not noted).

  $type_list =~ s/.*,([A-Z]+),.*/$1/;   # keep only the TYPE field
  if ($type_list =~ /Z/) {
    $email =~ s/\t".*"/\t"\@hidden\@"/; # hide the email address
  }
  $type_list =~ tr/A-J//cd; # drop unhandled types K-Y (and Z, handled above)
  my $bits = 0;
  # one bit per type: A=1, B=2, C=4, ...; OR-ing avoids string eval and
  # double-counting.  NB: I and J (256, 512) will not fit in one octet.
  $bits |= 1 << (ord($_) - ord('A')) for split //, $type_list;
  $type_list = "\tA\t127.0.0.$bits\n";
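
For anyone writing the lookup side, the same bitmask can be sketched in
Python, both encoding a type string and decoding an A record back into
type letters.  The function names are hypothetical, and the sketch
assumes only A-H are usable because the last octet holds eight bits.

```python
import ipaddress


def encode_types(types: str) -> str:
    """Map type letters to 127.0.0.x, one bit per letter (A=1, B=2, C=4...)."""
    bits = 0
    for letter in types:
        if "A" <= letter <= "H":  # assumption: only 8 bits fit in the octet
            bits |= 1 << (ord(letter) - ord("A"))
    return f"127.0.0.{bits}"


def decode_types(address: str) -> str:
    """Recover the type letters from the last octet of an A record."""
    last_octet = int(ipaddress.IPv4Address(address)) & 0xFF
    return "".join(
        chr(ord("A") + i) for i in range(8) if (last_octet >> i) & 1
    )


print(encode_types("AC"))
print(decode_types("127.0.0.5"))
```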

Other suggestions to my list before somebody works on a plugin?
Other sources with which to seed it?
Volunteers to test it?  I'm not sure if I have enough volume surviving
greylisting (which nabs ~90% of my incoming mail) for useful stats, e.g.
my hit count on malware-patrol is exactly zero (and yes, I run ClamAV
*after* SA).

-- 
Adam Katz
khopesh on irc://irc.freenode.net/#spamassassin
http://khopesh.com/Anti-spam

Re: emailBL code

Posted by Henrik K <he...@hege.li>.
On Fri, May 01, 2009 at 02:36:28PM -0500, Jesse Thompson wrote:
> John Hardin wrote:
>> On Fri, 1 May 2009, Adam Katz wrote:
>>
>>> The emailBL mechanism could easily be populated by a spamtrap, but the
>>> danger from false positives (forged sender addresses) would be quite
>>> real.
>
> On a related note: you also need to worry about the phishers  
> intentionally forging the Reply-To with normal addresses in an attempt  
> to poison the list.

Especially if one only lists freemail addresses on the list (like we are
going to do for now), the worry is pretty small. Why would the 419/phishers
want to spend time blocking normal people's freemail addresses? And if the
spamtraps are well hidden (or the process is otherwise manual), it would
take serious effort to get someone listed.


Re: emailBL code

Posted by Jesse Thompson <je...@doit.wisc.edu>.
John Hardin wrote:
> On Fri, 1 May 2009, Adam Katz wrote:
> 
>> The emailBL mechanism could easily be populated by a spamtrap, but the
>> danger from false positives (forged sender addresses) would be quite
>> real.

On a related note: you also need to worry about the phishers 
intentionally forging the Reply-To with normal addresses in an attempt 
to poison the list.


> Suggestion: ignore the sender address if there is a Reply-To: header or 
> if there is an email address in the body of the message. There might 
> need to be some logic around detecting the contact address in the 
> message body - there could be garbage addresses inserted to get the 
> phishtrap to ignore the sender address...

That's what we do.  We've had lengthy discussions about this issue.  It 
all boils down to accurately gauging the intention of the phisher, which is 
essentially impossible to automate.

It gets tricky when you consider the situation where the phisher 
intended the user to reply to the address included in the body, but the 
user doesn't pay attention and replies to the From instead, *and* the 
phisher happens to still have access to the original compromised account 
(the From address) used to send the phish.  So, it makes sense to add 
the From to the list in this case.  However, the account in question is 
usually cleaned up by the email provider quickly, so now a normal user's 
address is on the list.  And... to make matters worse, that user will 
potentially start receiving credentials from other users that are 
replying to the phish messages.

Anyway, here is the current state of how we classify the addresses:

     Possible values for TYPE:

         A: The ADDRESS was used in the Reply-To header.

         B: The ADDRESS was used in the From header.

         C: The content of the phishing message contained the ADDRESS.

         D: The content of the phishing message contained the ADDRESS,
             and it was obfuscated.

         E: The ADDRESS (usually in the From header) might receive replies
             but it was not intended to receive the replies.

     Note: unless otherwise specified, in order for the ADDRESS to
           qualify for each TYPE, it must have been intended to
           receive the replies.
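
A small Python sketch of reading such entries may help plugin authors.
It assumes each line of the list has the form ADDRESS,TYPE,DATE with
"#" comment lines, which is an assumption about the file layout rather
than something stated above; the sample line is invented.

```python
from typing import List, NamedTuple, Optional

# Meanings as described in the TYPE list above (paraphrased).
TYPE_MEANING = {
    "A": "used in the Reply-To header",
    "B": "used in the From header",
    "C": "contained in the message content",
    "D": "contained in the message content, obfuscated",
    "E": "might receive replies but was not intended to",
}


class Entry(NamedTuple):
    address: str
    types: str
    date: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one ADDRESS,TYPE,DATE line; return None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = line.split(",")
    if len(parts) != 3:
        return None
    address, types, date = parts
    return Entry(address.lower(), types, date)


def describe(entry: Entry) -> List[str]:
    """Spell out each type letter, flagging ones this sketch doesn't know."""
    return [TYPE_MEANING.get(t, "unknown type") for t in entry.types]


entry = parse_line("user@example.com,AC,20090501")  # invented sample line
print(entry, describe(entry))
```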

Jesse

-- 
   Jesse Thompson
   Division of Information Technology, University of Wisconsin-Madison
   Email/IM: jesse.thompson@doit.wisc.edu

Re: emailBL code

Posted by John Hardin <jh...@impsec.org>.
On Fri, 1 May 2009, Adam Katz wrote:

> The emailBL mechanism could easily be populated by a spamtrap, but the
> danger from false positives (forged sender addresses) would be quite
> real.

How would the phisher collect the password info from their target using a 
forged sender address?

Suggestion: ignore the sender address if there is a Reply-To: header or if 
there is an email address in the body of the message. There might need to 
be some logic around detecting the contact address in the message body - 
there could be garbage addresses inserted to get the phishtrap to ignore 
the sender address...
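
The suggestion can be sketched in Python with the standard email
module.  This is a simplified illustration with hypothetical names: it
only looks at single-part plain-text bodies, uses a loose address regex,
and does nothing about the garbage-address problem noted above.

```python
import re
from email import message_from_string
from email.utils import getaddresses
from typing import List

# Loose pattern for addresses in body text; an assumption, not RFC-exact.
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def candidate_addresses(raw_message: str) -> List[str]:
    """Prefer Reply-To and body addresses; fall back to From otherwise."""
    msg = message_from_string(raw_message)
    reply_to = [a for _, a in getaddresses(msg.get_all("Reply-To", [])) if a]
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""  # skip multipart
    in_body = ADDRESS_RE.findall(body)
    if reply_to or in_body:
        # A contact address exists, so the sender address is ignored.
        return sorted({a.lower() for a in reply_to + in_body})
    sender = [a for _, a in getaddresses(msg.get_all("From", [])) if a]
    return sorted({a.lower() for a in sender})


raw = "From: a@example.org\nReply-To: b@example.net\n\nSend your password.\n"
print(candidate_addresses(raw))
```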

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Warning Labels we'd like to see #1: "If you are a stupid idiot while
  using this product you may hurt yourself. And it won't be our fault."
-----------------------------------------------------------------------
  7 days until the 64th anniversary of VE day

Re: emailBL code

Posted by Adam Katz <an...@khopis.com>.
Yet Another Ninja wrote:
>> I'm trying hard to convince myself this data is really useful.
>> 
>> the whole 
>> http://anti-phishing-email-reply.googlecode.com/svn/trunk/phishing_reply_addresses
>> file has 4518 entries, including vintage 2008
>> 
>> compared to the big_boyz my trap feed is quite small and I
>> collected 1598 entries during the last 4 hrs

Well, this is different from traps ... though admittedly not by much.
The fact that it's updated so frequently is a merit, and the reason
dates are noted is so that you can adjust accordingly.

The emailBL mechanism could easily be populated by a spamtrap, but the
danger from false positives (forged sender addresses) would be quite
real.  Maybe only publish addresses that pass or fail SPF/DKIM/etc, so
that domains without a way to verify authenticity are immune to it?

>> does anybody have any hit metrics?

Mike Cardwell responded:
> The list was set up to satisfy a very specific group of users that
> were being targeted by a very specific scam. Spear Phishing
> against Higher Education institutions in the UK and USA. It was
> originally discussed on a mailing list run by "nd.edu" which can
> only be subscribed to by people who are in that particular sector.
> For that particular group, the list has been useful. How useful it
> is for people outside of that scenario, I don't know.

This is why I set up the emailbl in the first place:  to see what it
does.  We need an SA plugin next.

Re: emailBL code

Posted by Mike Cardwell <sp...@lists.grepular.com>.
Yet Another Ninja wrote:

>>> This is not to suggest that I ever understood the part about using
>>> half-length MD5.
>>
>> No need.  I'm using full-length hashes now, plus the SURBL/chmod style
>> IP addresses.  I must have lost the email I was composing on the topic,
>> but it's fully propagated by now.  I've attached my code.
>>
>> Note that the code still supports the old truncated string.  I'll rip
>> that out soon.  Also note that I'm not an advanced perl coder (almost
>> all of my perl scripts start as POSIX shell scripts, including this one)
>> .... so while I'm happy to get *suggestions*, I'm not so eager for the
>> insults and harsh words this list tends to give instead.
> 
> I'm trying hard to convince myself this data is really useful.
> 
> the whole 
> http://anti-phishing-email-reply.googlecode.com/svn/trunk/phishing_reply_addresses 
> file has 4518 entries, including vintage 2008
> 
> compared to the big_boyz my trap feed is quite small and I collected 
> 1598 entries during the last 4 hrs
> 
> hmmmmm
> 
> does anybody have any hit metrics?

The list was set up to satisfy a very specific group of users that were 
being targeted by a very specific scam. Spear Phishing against Higher 
Education institutions in the UK and USA. It was originally discussed on 
a mailing list run by "nd.edu" which can only be subscribed to by people 
who are in that particular sector. For that particular group, the list 
has been useful. How useful it is for people outside of that scenario, I 
don't know.

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)

Anti-phishing outside of just detection

Posted by Adam Katz <an...@khopis.com>.
I wrote:
>> I'd still rather block the offending message than intercept responses
>> to it (as that means it has suckered users, which means it has wasted
>> their time).  I see APER as a possible aid in that pursuit, though as
>> Jesse has mentioned, it is not fully reliable (as to be determined).
>> Still, these little checks add up, so even if APER gives a message 0.1
>> points, that might be enough to mark it as spam or even block it at
>> the door.
>>
>> As a secondary defense, blocking replies sounds like a grand idea.

Mandy wrote:
> I absolutely agree that the messages should be stopped on their way
> in.  I'd rather our users not have an opportunity to be suckered.  But
> at least knowing about the replies gives us a way to target our
> education efforts (now, where'd I put that LART...)

In the same vein, I'd love to honeypot it; complement phishing
detection with an automated responder along the lines of "okay, here's
my login information" which of course is connected to a meaningless
account that merely informs the admins that somebody has logged on.

With that information, the admins can dig up the offending message and
see who else received it, they can examine the IP of the login and
track who else it has logged in as, and of course, the authorities can
be involved.  All before the users would have concluded there was a
problem.

Going the other direction, I read (maybe a year ago?) that some US
government organization was actually sending fake phishing emails to
their users.  When the user clicks on it, they are informed of what
they did and how to prevent it.  KnujOn (or maybe it was somebody else
presenting at this year's MIT Spam Conference?) is now pushing for
sites taken down for phishing (et al) to be replaced with information
on what happened rather than generic placeholders or nothing at all.
This is a GRAND idea!

Re: emailBL code

Posted by Mandy <me...@gmail.com>.
On Fri, May 1, 2009 at 3:37 PM, Adam Katz <an...@khopis.com> wrote:
> Can you determine how many of those were out-of-office messages?  Then
> again, even at just two, if you can stop such compromises, it's worth
> it (and then some).

The replies I was talking about were, sadly, manually filtered to
remove everything that looked like an auto response.  What I couldn't
tell was how many were "yeah, right!" or "die, spammer, die!" style
responses.  Thankfully we only had 2 compromised accounts (but that's
two too many).

> I'd still rather block the offending message than intercept responses
> to it (as that means it has suckered users, which means it has wasted
> their time).  I see APER as a possible aid in that pursuit, though as
> Jesse has mentioned, it is not fully reliable (as to be determined).
> Still, these little checks add up, so even if APER gives a message 0.1
> points, that might be enough to mark it as spam or even block it at
> the door.
>
> As a secondary defense, blocking replies sounds like a grand idea.

I absolutely agree that the messages should be stopped on their way
in.  I'd rather our users not have an opportunity to be suckered.  But
at least knowing about the replies gives us a way to target our
education efforts (now, where'd I put that LART...)

As far as blocking inbound messages, I'm going to have to remove a few
addresses from the list before I can do that.  My initial search
results were chock full of false positives.  One of the people who
made the list corresponds very regularly with 10 - 20 people in my
organization.  Granted, at 0.1, it's not a big deal, and such a rule
would probably make a fantastic META companion (warning, fictional,
unlinted rule follows)

meta   L_PHISHY   FROM_ON_APER && WEBMAIL_SUBJECT
score  L_PHISHY   2.5

anyone?

Re: emailBL code

Posted by Adam Katz <an...@khopis.com>.
I forgot to also mention honeypots here.

Create a few accounts whose sole purpose is finding these phishing
attacks.  They are email accounts which will appear to fall victim to
the attack, sending their "password" which gains "access" to the
company's web portal.  Of course, all this "access" does is tell the
admins that something bad is happening (watch that IP!) and that an
announcement to the user base is probably in order.

I believe that the best response to spam is ... responses to spam.
Especially for phishers, giving bad data and creating automated methods
by which to notice the attacks will help fight against them and will
alert them to the fact that phishing is not profitable.

After all, they only do it because it makes money.  An unfortunately
large amount of money.  Report your spam to places like KnujOn and
SpamCop!  KnujOn's mission statement is something along the lines of
stopping the profitability of spam, one domain (registrar) at a time,
and SpamCop sends out nice little complaint letters to registrars and
upstream network admins.

I'd really like to see the KnujOn reporting bug (SpamAssassin bug 6085)
filled so that it is easier to report directly to them, especially for
phishing spam for its obvious importance over standard spam.

Garth and Robert:  This thread is a bit big (sorry) ... basically, we're
(I'm?) working on putting up a URIBL based on phishing email reply-tos
(emailBL).  This email you're reading should be indexed as a child of
this post: http://www.nabble.com/Phishing-tt23226790.html#a23339685 (so
you can climb up the thread) ... I cc'd you because this might be of
interest to you (aside from the plug).  No reason you can't pursue email
domains in addition to web domains...

Re: emailBL code

Posted by Adam Katz <an...@khopis.com>.
Mandy wrote:
> I work for a Canadian provincial government, on a system with about
> 50,000 mailboxes.  I scanned our outbound mail logs over the past 6
> months with this data.  There were 31 replies to "Your webmail is
> expired!! !" type messages in that period.
> 
> If we had been blocking outbound mail based on this list, the two
> compromised accounts we had to deal with (one of which made the list
> in its turn) wouldn't have happened.
> 
> I definitely see value here.

Can you determine how many of those were out-of-office messages?  Then
again, even at just two, if you can stop such compromises, it's worth
it (and then some).

I'd still rather block the offending message than intercept responses
to it (as that means it has suckered users, which means it has wasted
their time).  I see APER as a possible aid in that pursuit, though as
Jesse has mentioned, it is not fully reliable (as to be determined).
Still, these little checks add up, so even if APER gives a message 0.1
points, that might be enough to mark it as spam or even block it at
the door.

As a secondary defense, blocking replies sounds like a grand idea.

Re: emailBL code

Posted by Mandy <me...@gmail.com>.
On Fri, May 1, 2009 at 7:52 AM, Jesse Thompson
<je...@doit.wisc.edu> wrote:
> Yet Another Ninja wrote:
>>
>> I'm trying hard to convince myself this data is really useful.

I work for a Canadian provincial government, on a system with about
50,000 mailboxes.  I scanned our outbound mail logs over the past 6
months with this data.  There were 31 replies to "Your webmail is
expired!! !" type messages in that period.

If we had been blocking outbound mail based on this list, the two
compromised accounts we had to deal with (one of which made the list
in its turn) wouldn't have happened.
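
A scan like that can be sketched in a few lines of Python.  The
"to=<address>" log format and the sample lines are assumptions for
illustration; adjust the pattern to whatever the local MTA actually
writes.

```python
import re

# Assumed log convention: recipients appear as to=<address>.
TO_RE = re.compile(r"to=<([^>]+)>")


def listed_recipients(log_lines, listed):
    """Return (recipient, log line) pairs for mail sent to listed addresses."""
    hits = []
    for line in log_lines:
        for rcpt in TO_RE.findall(line):
            if rcpt.lower() in listed:
                hits.append((rcpt.lower(), line.rstrip("\n")))
    return hits


listed = {"drop@example.com"}  # stand-in for the published address list
sample_log = [
    "May  1 10:00:01 mx smtp[1]: ABC: to=<friend@example.org>, status=sent\n",
    "May  1 10:00:02 mx smtp[2]: DEF: to=<Drop@example.com>, status=sent\n",
]
for rcpt, line in listed_recipients(sample_log, listed):
    print(rcpt, "<-", line)
```

The same lookup could gate outbound delivery instead of just reporting,
subject to the false-positive cautions raised elsewhere in the thread.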

I definitely see value here.

>> compared to the big_boyz my trap feed is quite small and I collected 1598
>> entries during the last 4 hrs
>
> Hello Yet Another Ninja,
>
> "big_boyz": as in a small collection of university postmasters?  I guess we
> should be honored, but I have a feeling that you were being condescending.

I got the impression he was talking about the major RBL providers
(spamhaus, spamcop), and the commercial filtering vendors.

[snip]

> Even the largest password-reply phishing campaign we've seen was only sent
> to 2500 of our users (and that was using the same reply-to).  On average, we
> see around 200 messages (30 unique reply-to's; not all new) of this type of
> phishing attempt every day.  I assume that the other universities see
> something similar.

After I spend some more time evaluating things, and looking for this
specific type of campaign, I'm planning to start blocking outbound
mail based on your list.  If I develop some tools for finding the
campaigns I'd be happy to contribute the messages.

Austin.

Re: emailBL code

Posted by John Hardin <jh...@impsec.org>.
On Fri, 1 May 2009, Yet Another Ninja wrote:

> Only little drawback is how to centralize (or not) all this gold to make 
> it useful to more than me and my dog.

I (and I'm sure others) would be willing to feed phishing corpora from our 
quarantines, so long as it's easy to do.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Warning Labels we'd like to see #1: "If you are a stupid idiot while
  using this product you may hurt yourself. And it won't be our fault."
-----------------------------------------------------------------------
  7 days until the 64th anniversary of VE day

Re: emailBL code

Posted by Yet Another Ninja <sa...@alexb.ch>.
On 5/1/2009 4:52 PM, Jesse Thompson wrote:
> Yet Another Ninja wrote:
>> I'm trying hard to convince myself this data is really useful.
>>
>> the whole 
>> http://anti-phishing-email-reply.googlecode.com/svn/trunk/phishing_reply_addresses 
>> file has 4518 entries, including vintage 2008
>>
>> compared to the big_boyz my trap feed is quite small and I collected 
>> 1598 entries during the last 4 hrs
> 
> Hello Yet Another Ninja,
> 
> "big_boyz": as in a small collection of university postmasters?  I guess 
> we should be honored, but I have a feeling that you were being 
> condescending.

Feel as you please.
I manage a relatively small trap space compared to some of the players 
here, so I meant what I said. Traps never correlate to a number of 
specific rcpt addresses, only.

> If you are the opposite of a "big_boy", that must mean that your domain 
> is smaller than a large university's, so you must have less than, say, 
> 50,000 unique active users.  
I'm definitely smaller, that doesn't mean that trap traffic can't be 
huge. Traps aren't active - they sit there and get hammered.

> Are you truly saying that every 4 hours you 
> have 1598 unique (as in the reply-to is unique) phishing attempts, in 
> which the phisher asks one of your users to reply with their credentials?

nope - I'm collecting generic drop boxes type of stuff and not specific 
phishes for a specific group.
these include phishes, lotto scams, etc using specific domains. (not 
rcpt domains)

> If what you are saying is true, then you are standing on a gold mine. 
> Would you mind contributing to the project?

every school, corp, ISP, soho server, etc is standing on a similar gold 
mine, I'm not re-inventing the wheel.
Only little drawback is how to centralize (or not) all this gold to make 
it useful to more than me and my dog.
Until I have some minimal metrics I can't say.

> As for the vintage of the addresses.  No, I don't have metrics.  But 
> most of the addresses are in the freemail domains, and we have no 
> indication that the freemail providers are shutting down this type of 
> account.  I don't mind scanning logs for, or blocking mail to, the "old" 
> addresses.  But we do include the date (however accurate it is) so you 
> can choose to filter the list any way you desire.

no need to go through that trouble - you guys know its value. Once apps 
are here to test the data, others outside your space will report, 
I'm sure.

We have different targets. I misunderstood APER's

this is all work in progress so keep tuned....

Axb

Re: emailBL code

Posted by Jesse Thompson <je...@doit.wisc.edu>.
Yet Another Ninja wrote:
> I'm trying hard to convince myself this data is really useful.
> 
> the whole 
> http://anti-phishing-email-reply.googlecode.com/svn/trunk/phishing_reply_addresses 
> file has 4518 entries, including vintage 2008
> 
> compared to the big_boyz my trap feed is quite small and I collected 
> 1598 entries during the last 4 hrs

Hello Yet Another Ninja,

"big_boyz": as in a small collection of university postmasters?  I guess 
we should be honored, but I have a feeling that you were being 
condescending.

What exactly are you collecting?  Keep in mind that the APER project is 
very focused on preventing email replies to phishing (hence the name). 
We aren't trying to stop the phishing itself (directly); there are 
others that do that.

If you are the opposite of a "big_boy", that must mean that your domain 
is smaller than a large university's, so you must have less than, say, 
50,000 unique active users.  Are you truly saying that every 4 hours you 
have 1598 unique (as in the reply-to is unique) phishing attempts, in 
which the phisher asks one of your users to reply with their credentials?

If what you are saying is true, then you are standing on a gold mine. 
Would you mind contributing to the project?

Even the largest password-reply phishing campaign we've seen was only 
sent to 2500 of our users (and that was using the same reply-to).  On 
average, we see around 200 messages (30 unique reply-to's; not all new) 
of this type of phishing attempt every day.  I assume that the other 
universities see something similar.

As for the vintage of the addresses.  No, I don't have metrics.  But 
most of the addresses are in the freemail domains, and we have no 
indication that the freemail providers are shutting down this type of 
account.  I don't mind scanning logs for, or blocking mail to, the "old" 
addresses.  But we do include the date (however accurate it is) so you 
can choose to filter the list any way you desire.

Jesse

-- 
   Jesse Thompson
   Division of Information Technology, University of Wisconsin-Madison
   Email/IM: jesse.thompson@doit.wisc.edu

Re: emailBL code

Posted by Yet Another Ninja <sa...@alexb.ch>.
On 5/1/2009 3:56 PM, Adam Katz wrote:
> Jeff Moss wrote:
>> This is not to suggest that I ever understood the part about using
>> half-length MD5.
> 
> No need.  I'm using full-length hashes now, plus the SURBL/chmod style
> IP addresses.  I must have lost the email I was composing on the topic,
> but it's fully propagated by now.  I've attached my code.
> 
> Note that the code still supports the old truncated string.  I'll rip
> that out soon.  Also note that I'm not an advanced perl coder (almost
> all of my perl scripts start as POSIX shell scripts, including this one)
> .... so while I'm happy to get *suggestions*, I'm not so eager for the
> insults and hash words this list tends to give instead.

I'm trying hard to convince myself this data is really useful.

the whole 
http://anti-phishing-email-reply.googlecode.com/svn/trunk/phishing_reply_addresses 
file has 4518 entries, including vintage 2008

compared to the big_boyz my trap feed is quite small and I collected 
1598 entries during the last 4 hrs

hmmmmm

does anybody have any hit metrics?

emailBL code

Posted by Adam Katz <an...@khopis.com>.
Jeff Moss wrote:
> This is not to suggest that I ever understood the part about using
> half-length MD5.

No need.  I'm using full-length hashes now, plus the SURBL/chmod style
IP addresses.  I must have lost the email I was composing on the topic,
but it's fully propagated by now.  I've attached my code.

Note that the code still supports the old truncated string.  I'll rip
that out soon.  Also note that I'm not an advanced perl coder (almost
all of my perl scripts start as POSIX shell scripts, including this one)
... so while I'm happy to get *suggestions*, I'm not so eager for the
insults and hash words this list tends to give instead.

RE: my emailBL is live!

Posted by Jeff Moss <jm...@Huffmancorp.com>.
>> The chance of a collision really is much smaller than I thought, even
>> including the birthday paradox.  But rather than just say it's small and
>> ask you to take my word for it I'm providing a link.  The Wikipedia page
>> for Birthday Attack has a chart that shows the probability of collision
>> for hashes of various lengths.
>>
>> http://en.wikipedia.org/wiki/Birthday_attack
>
>Well nuts.  Unless my estimation is wrong, my half-length MD5sum would
>be 64-bit and thus the 10^-18 probability of collisions would require
>a db of 190 entries rather than full-length MD5sum's 820 billion.
>
>Unless corrected, I'll revise my algorithm this evening.

Well, a 64-bit hash with a 10^-18 probability of collisions would only require 6 entries in the DB.  However a 10^-12 probability should be good enough because there probably aren't a trillion unique email addresses.  A 10^-12 probability of collision would allow about 6,100 entries in the DB; 6 million entries would put the probability at roughly 10^-6.
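
For anyone who wants to check the chart's figures, here is a rough sketch using the usual birthday approximation n ~ sqrt(2 * 2^bits * p), which is only reasonable for small p.  The helper name entries_for_probability is made up for illustration and is not part of any posted code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Approximate number of entries n at which a hash of $bits bits reaches
# collision probability $p, using n ~ sqrt(2 * 2**bits * p) (small p only).
sub entries_for_probability {
    my ($bits, $p) = @_;
    return sqrt(2 * (2 ** $bits) * $p);
}

# Print the estimate for half-length (64-bit) and full-length (128-bit) MD5.
for my $bits (64, 128) {
    for my $p (1e-18, 1e-15, 1e-12, 1e-6) {
        printf "%3d bits, p=%g: about %.2g entries\n",
            $bits, $p, entries_for_probability($bits, $p);
    }
}
```

For 64 bits this works out to about 6 entries at 10^-18, about 190 at 10^-15, about 6,100 at 10^-12 and about 6 million at 10^-6.  For 128 bits, 10^-15 gives roughly 820 billion.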
 
This is not to suggest that I ever understood the part about using half-length MD5.

  Jeff Moss



Re: my emailBL is live!

Posted by Adam Katz <an...@khopis.com>.
Jeff Moss wrote:
> The chance of a collision really is much smaller than I thought, even
> including the birthday paradox.  But rather than just say it's small and
> ask you to take my word for it I'm providing a link.  The Wikipedia page
> for Birthday Attack has a chart that shows the probability of collision
> for hashes of various lengths.
> 
> http://en.wikipedia.org/wiki/Birthday_attack

Well nuts.  Unless my estimation is wrong, my half-length MD5sum would
be 64-bit and thus the 10^-18 probability of collisions would require
a db of 190 entries rather than full-length MD5sum's 820 billion.

Unless corrected, I'll revise my algorithm this evening.

RE: my emailBL is live!

Posted by Jeff Moss <jm...@Huffmancorp.com>.
Rob McEwen wrote:

>>> A word of caution.  Be very careful how you use the list.
>>
>> OK. I was wrong. Due to this discussion, I'm convinced that MD5 of the
>> whole (lower case!) e-mail address is best, with the entire e-mail
>> address still showing up in plain text in the DNS txt record.
>>
>> But I have some questions:
>>
>> (1) is MD5 of the entire address reasonably safe from collisions.
>> (consider the 'birthday paradox' before being too quick to answer)
>
>Yes. The chance of a collision is ridiculously small. Not worth worrying
>about.

The chance of a collision really is much smaller than I thought, even including the birthday paradox.  But rather than just say it's small and ask you to take my word for it I'm providing a link.  The Wikipedia page for Birthday Attack has a chart that shows the probability of collision for hashes of various lengths.

http://en.wikipedia.org/wiki/Birthday_attack

  Jeff Moss




Re: my emailBL is live!

Posted by Mike Cardwell <sp...@lists.grepular.com>.
Rob McEwen wrote:

>> A word of caution.  Be very careful how you use the list.
> 
> OK. I was wrong. Due to this discussion, I'm convinced that MD5 of the
> whole (lower case!) e-mail address is best, with the entire e-mail
> address still showing up in plain text in the DNS txt record.
> 
> But I have some questions:
> 
> (1) is MD5 of the entire address reasonably safe from collisions.
> (consider the 'birthday paradox' before being too quick to answer)

Yes. The chance of a collision is ridiculously small. Not worth worrying 
about.

> (2) I'm also interested in knowing more specifics about the data found
> at
> http://anti-phishing-email-reply.googlecode.com/svn/trunk/phishing_reply_addresses
> 
> (2.a.) how frequently are new scam addresses added to that list?
> 
> (2.b.) how long does an address take to expire since the last e-mail
> address is used for scams "in the wild"
> 
> (2.c.) Is the data auto-added? or must e-mail addresses go through a
> manual review first?
> 
> (2.d.) Moreover, what is a typical time between the "419" spammer's last
> spotted use of the e-mail, and appearance in that list?
> 
> (I don't need exactly precise answers which spammers might use to 'game'
> the system... just basic estimates will do)

There's actually a mailing list for the project. You're probably better 
off asking these questions there:

http://groups.google.com/group/anti-phishing-email-reply-discuss

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)

Re: my emailBL is live!

Posted by Jesse Thompson <je...@doit.wisc.edu>.
Rob McEwen wrote:
> Jesse Thompson wrote:
>> A word of caution.  Be very careful how you use the list.
> 
> OK. I was wrong. Due to this discussion, I'm convinced that MD5 of the
> whole (lower case!) e-mail address is best, with the entire e-mail
> address still showing up in plain text in the DNS txt record.
> 
> But I have some questions:
> 
> (1) is MD5 of the entire address reasonably safe from collisions.
> (consider the 'birthday paradox' before being too quick to answer)
> 
> (2) I'm also interested in knowing more specifics about the data found
> at
> http://anti-phishing-email-reply.googlecode.com/svn/trunk/phishing_reply_addresses
> 
> (2.a.) how frequently are new scam addresses added to that list?

Every day.  Contributors add addresses when they find them.


> (2.b.) how long does an address take to expire since the last e-mail
> address is used for scams "in the wild"

They don't expire.  You can use the date to make up your own policies 
depending on what you are doing.

We do have a 'phishing_cleared_addresses' list which we use when we get 
confirmation that an account has been locked down.  Addresses on the 
cleared list are automatically removed from the 
'phishing_reply_addresses' list if the activity date is older than the 
cleared date.
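
A minimal sketch of that removal rule, assuming both lists use comma-separated "address,TYPES,YYYYMMDD" lines.  The layout, the function names and the sample addresses are assumptions for illustration, not the project's actual tooling.

```perl
use strict;
use warnings;

# Parse "address,TYPES,YYYYMMDD" lines into { address => date }, keeping the
# newest date per address.  The three-field layout is an assumption here.
sub parse_list {
    my (@lines) = @_;
    my %date_for;
    for my $line (@lines) {
        next if $line =~ /^\s*(?:#|$)/;           # skip comments and blanks
        my ($addr, undef, $date) = split /,/, $line;
        next unless defined $date && $date =~ /^(\d{8})/;
        $date = $1;
        $addr = lc $addr;
        $date_for{$addr} = $date
            if !exists $date_for{$addr} || $date > $date_for{$addr};
    }
    return \%date_for;
}

# Drop a reply address when its last activity is older than its cleared date.
sub apply_cleared {
    my ($reply, $cleared) = @_;
    my %kept;
    for my $addr (keys %$reply) {
        my $cleared_on = $cleared->{$addr};
        next if defined $cleared_on && $reply->{$addr} < $cleared_on;
        $kept{$addr} = $reply->{$addr};
    }
    return \%kept;
}

my $reply   = parse_list('a@example.com,A,20090301', 'b@example.com,B,20090420');
my $cleared = parse_list('a@example.com,A,20090310', 'b@example.com,B,20090401');
print join(',', sort keys %{ apply_cleared($reply, $cleared) }), "\n";
```

In this example a@example.com is dropped (cleared after its last activity) while b@example.com stays listed, because it was seen again after being cleared.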


> (2.c.) Is the data auto-added? or must e-mail addresses go through a
> manual review first?

Manually added.  But I can't speak for the methods of everyone that 
contributes.


> (2.d.) Moreover, what is a typical time between the "419" spammer's last
> spotted use of the e-mail, and appearance in that list?

It's reactive, so the spam must be received before it can be discovered.


> (I don't need exactly precise answers which spammers might use to 'game'
> the system... just basic estimates will do)


Jesse

-- 
   Jesse Thompson
   Division of Information Technology, University of Wisconsin-Madison
   Email/IM: jesse.thompson@doit.wisc.edu

Re: my emailBL is live!

Posted by Rob McEwen <ro...@invaluement.com>.
Jesse Thompson wrote:
> A word of caution.  Be very careful how you use the list.

OK. I was wrong. Due to this discussion, I'm convinced that MD5 of the
whole (lower case!) e-mail address is best, with the entire e-mail
address still showing up in plain text in the DNS txt record.

But I have some questions:

(1) is MD5 of the entire address reasonably safe from collisions.
(consider the 'birthday paradox' before being too quick to answer)

(2) I'm also interested in knowing more specifics about the data found
at
http://anti-phishing-email-reply.googlecode.com/svn/trunk/phishing_reply_addresses

(2.a.) how frequently are new scam addresses added to that list?

(2.b.) how long does an address take to expire since the last e-mail
address is used for scams "in the wild"

(2.c.) Is the data auto-added? or must e-mail addresses go through a
manual review first?

(2.d.) Moreover, what is a typical time between the "419" spammer's last
spotted use of the e-mail, and appearance in that list?

(I don't need exactly precise answers which spammers might use to 'game'
the system... just basic estimates will do)

-- 
Rob McEwen
http://dnsbl.invaluement.com/
rob@invaluement.com
+1 (478) 475-9032



Re: 419 emailBL?

Posted by Mike Cardwell <sp...@lists.grepular.com>.
mouss wrote:

>>> Is the best way to do this - not via DNS.
>> Depends what you're trying to achieve. I thought the objective was a
>> block list of email addresses that could be queried via the DNS by any
>> application... Your suggestion doesn't really capture the requirements.
> and what is the benefit of using DNS? why not rsync/svn/wget/... ?
> 
>> In this particular example, the list should be used for preventing your
>> users sending emails *to* those addresses. Many organisations rightly or
>> wrongly don't perform spam filtering on their outgoing relays so
>> spamassassin is a bit over the top when you can just use another dns
>> based bl.
>
> with rsync or the like, you can simply add the addresses (no MD5, no
> anything) to an access list that your MTA can use.

It sounds like you're asking me what the benefit of distributing a block 
list via the DNS is? If yes, type "dnsbl" into google. If not, please 
clarify ...

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)

Re: 419 emailBL?

Posted by mouss <mo...@ml.netoyen.net>.
Mike Cardwell a écrit :
> Steve Freegard wrote:
> [snip]
>>
>> Is the best way to do this - not via DNS.
> 
> Depends what you're trying to achieve. I thought the objective was a
> block list of email addresses that could be queried via the DNS by any
> application... Your suggestion doesn't really capture the requirements.
> 

and what is the benefit of using DNS? why not rsync/svn/wget/... ?


> In this particular example, the list should be used for preventing your
> users sending emails *to* those addresses. Many organisations rightly or
> wrongly don't perform spam filtering on their outgoing relays so
> spamassassin is a bit over the top when you can just use another dns
> based bl.
> 

with rsync or the like, you can simply add the addresses (no MD5, no
anything) to an access list that your MTA can use.

Re: [SA] 419 emailBL?

Posted by Mike Cardwell <sp...@lists.grepular.com>.
Adam Katz wrote:

>>>> For listing both emails and uri's it would be useful if you could add
>>>> regular expressions. [...]
> 
> Steve Freegard responded:
>>> Yuck; if you want to do stuff using regexp then:
>>>
>>> uri RULE_NAME /<regexp>/
>>> score RULE_NAME nn.nnn
>>>
>>> Is the best way to do this - not via DNS.
> 
> Mike Cardwell defended:
>> Depends what you're trying to achieve. I thought the objective was a
>> block list of email addresses that could be queried via the DNS by any
>> application... Your suggestion doesn't really capture the requirements.
>>
>> In this particular example, the list should be used for preventing your
>> users sending emails *to* those addresses. Many organisations rightly or
>> wrongly don't perform spam filtering on their outgoing relays so
>> spamassassin is a bit over the top when you can just use another dns
>> based bl.
> 
> If by "any application" you mean "any application that can handle
> full-blown perl regular expressions" ... your regex examples are
> nontrivial, so you're already pretty much catering to SA anyway.

You completely misunderstood what I was suggesting. On the server side I 
shove this in my list:

^foo-\d+@example\.com$

Then when the client looks up foo-5@example.com I return a positive 
result. The client needs no regex capability.
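
Roughly, the server-side check could look like the sketch below.  The pattern list and the is_listed name are hypothetical, and how this would be wired into a DNS server is left open.

```perl
use strict;
use warnings;

# Hypothetical server-side pattern list; clients only ever send a plain address.
my @listed_patterns = (
    qr/^foo-\d+\@example\.com$/,
);

# Return 1 when the queried address matches any listed pattern, else 0.
sub is_listed {
    my ($address) = @_;
    for my $pattern (@listed_patterns) {
        return 1 if $address =~ $pattern;
    }
    return 0;
}

print is_listed('foo-5@example.com') ? "listed\n" : "not listed\n";
print is_listed('bar@example.com')   ? "listed\n" : "not listed\n";
```

Note that this only works if the query carries the address itself in some encoded form; a server that receives only an MD5 of the address has nothing to match a pattern against.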

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)

Re: [SA] 419 emailBL?

Posted by Adam Katz <an...@khopis.com>.
Mike Cardwell wrote:
>>> For listing both emails and uri's it would be useful if you could add
>>> regular expressions. [...]

Steve Freegard responded:
>> Yuck; if you want to do stuff using regexp then:
>>
>> uri RULE_NAME /<regexp>/
>> score RULE_NAME nn.nnn
>>
>> Is the best way to do this - not via DNS.

Mike Cardwell defended:
> Depends what you're trying to achieve. I thought the objective was a
> block list of email addresses that could be queried via the DNS by any
> application... Your suggestion doesn't really capture the requirements.
> 
> In this particular example, the list should be used for preventing your
> users sending emails *to* those addresses. Many organisations rightly or
> wrongly don't perform spam filtering on their outgoing relays so
> spamassassin is a bit over the top when you can just use another dns
> based bl.

If by "any application" you mean "any application that can handle
full-blown perl regular expressions" ... your regex examples are
nontrivial, so you're already pretty much catering to SA anyway.

There's also the question of handling quotes and other forbidden
characters in the TXT field, plus its length limit.  Once that's all
solved, the question of feasibility and efficiency still looms.

Given the options of putting that kind of thing in (A) DNS or (B)
sa-channels, I'd lean towards (B) on the way to (C) something else:

I'm sure Justin Mason (for his sought channel) has thought long and
hard about this.  The mechanism for sa-update is brilliant, but
doesn't lend itself to enormous indices of frequently-changing
rulesets.  Even if it were revised to enable a diff/patch system (hint
hint), it would still fail to distribute the remaining load.

Justin:  Perhaps sa-update could support [version].torrent in addition
to [version].tar.gz on each mirror?  (This doesn't touch the current
DNS-based version/announce system.)  Channels hosted for versions of
SA after the supporting release (e.g. 0.4.3.[channel] and "higher")
would be allowed to host only the torrent file.

Either the self-healing nature of BT would implement the diffing
portion for free, or SA's BT client would merely choose which files in
the torrent to download (assuming there are perl-based clients that
support that... libtorrent does, but that's C-based), as it would
contain full.cf, [n-1].diff, [n-2].diff, [n-3].diff, and [last release
yesterday].diff (or the like).

... this is similar to my proposal for a distributed Blue Frog rehash,
http://khopesh.com/wiki/Ending_spam

-- 
Adam Katz
khopesh on irc://irc.freenode.net/#spamassassin
http://khopesh.com/Anti-spam

Re: 419 emailBL?

Posted by Mike Cardwell <sp...@lists.grepular.com>.
Steve Freegard wrote:

>> For listing both emails and uri's it would be useful if you could add
>> regular expressions. I'm not sure how you'd serve such an RBL though
>> without writing your own custom software or modifying an existing dns
>> server. Eg, it would be nice if you could add entries like this to the rbl:
>>
>> ^(?i)https?://[a-z]+\.example\.com/unsubscribe\.cgi\?id=\d+$
>>
>> And:
>>
>> ^(?i)customer-service-[A-Z]\d+@example\.(?:com|co\.uk)$
>>
> 
> Yuck; if you want to do stuff using regexp then:
> 
> uri RULE_NAME /<regexp>/
> score RULE_NAME nn.nnn
> 
> Is the best way to do this - not via DNS.

Depends what you're trying to achieve. I thought the objective was a 
block list of email addresses that could be queried via the DNS by any 
application... Your suggestion doesn't really capture the requirements.

In this particular example, the list should be used for preventing your 
users sending emails *to* those addresses. Many organisations rightly or 
wrongly don't perform spam filtering on their outgoing relays so 
spamassassin is a bit over the top when you can just use another dns 
based bl.

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)

Re: 419 emailBL?

Posted by Steve Freegard <st...@stevefreegard.com>.
Mike Cardwell wrote:
> Steve Freegard wrote:
> 
>>>> A word of caution.  Be very careful how you use the list.  The
>>>> intended usage for the list is to prevent (or monitor) local users
>>>> from sending email to the listed addresses.  The phishers frequently
>>>> use compromised end-user accounts to receive the phishing replies, so
>>>> there is a high risk of false positives, especially if you attempt to
>>>> classify messages containing one of these addresses as spam.
>>> Thread fork!
>>>
>>> Would it be useful to have a similar list for 419 fraud contact
>>> addresses?
>>>
>>> Discuss...
>>
>> That was always my intention - there are a couple of us looking at
>> several methods of automatically listing e-mail addresses present in the
>> body of spam or the Reply-To header to specifically target stuff that
>> often slips through with low scores.
>>
>> I'm also looking at listing URIs that are impossible to list in the
>> traditional URIBLs  e.g. groups.yahoo.com/groupname/message/1
> 
> For listing both emails and uri's it would be useful if you could add
> regular expressions. I'm not sure how you'd serve such an RBL though
> without writing your own custom software or modifying an existing dns
> server. Eg, it would be nice if you could add entries like this to the rbl:
> 
> ^(?i)https?://[a-z]+\.example\.com/unsubscribe\.cgi\?id=\d+$
> 
> And:
> 
> ^(?i)customer-service-[A-Z]\d+@example\.(?:com|co\.uk)$
> 

Yuck; if you want to do stuff using regexp then:

uri RULE_NAME /<regexp>/
score RULE_NAME nn.nnn

Is the best way to do this - not via DNS.

Regards,
Steve.

Re: 419 emailBL?

Posted by Mike Cardwell <sp...@lists.grepular.com>.
Steve Freegard wrote:

>>> A word of caution.  Be very careful how you use the list.  The
>>> intended usage for the list is to prevent (or monitor) local users
>>> from sending email to the listed addresses.  The phishers frequently
>>> use compromised end-user accounts to receive the phishing replies, so
>>> there is a high risk of false positives, especially if you attempt to
>>> classify messages containing one of these addresses as spam.
>> Thread fork!
>>
>> Would it be useful to have a similar list for 419 fraud contact addresses?
>>
>> Discuss...
> 
> That was always my intention - there are a couple of us looking at
> several methods of automatically listing e-mail addresses present in the
> body of spam or the Reply-To header to specifically target stuff that
> often slips through with low scores.
> 
> I'm also looking at listing URIs that are impossible to list in the
> traditional URIBLs  e.g. groups.yahoo.com/groupname/message/1

For listing both emails and uri's it would be useful if you could add 
regular expressions. I'm not sure how you'd serve such an RBL though 
without writing your own custom software or modifying an existing dns 
server. Eg, it would be nice if you could add entries like this to the rbl:

^(?i)https?://[a-z]+\.example\.com/unsubscribe\.cgi\?id=\d+$

And:

^(?i)customer-service-[A-Z]\d+@example\.(?:com|co\.uk)$

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)

Re: 419 emailBL?

Posted by Steve Freegard <st...@stevefreegard.com>.
John Hardin wrote:
> On Wed, 29 Apr 2009, Jesse Thompson wrote:
> 
>> A word of caution.  Be very careful how you use the list.  The
>> intended usage for the list is to prevent (or monitor) local users
>> from sending email to the listed addresses.  The phishers frequently
>> use compromised end-user accounts to receive the phishing replies, so
>> there is a high risk of false positives, especially if you attempt to
>> classify messages containing one of these addresses as spam.
> 
> Thread fork!
> 
> Would it be useful to have a similar list for 419 fraud contact addresses?
> 
> Discuss...
> 

That was always my intention - there are a couple of us looking at
several methods of automatically listing e-mail addresses present in the
body of spam or the Reply-To header to specifically target stuff that
often slips through with low scores.

I'm also looking at listing URIs that are impossible to list in the
traditional URIBLs  e.g. groups.yahoo.com/groupname/message/1

Cheers,
Steve.

419 emailBL?

Posted by John Hardin <jh...@impsec.org>.
On Wed, 29 Apr 2009, Jesse Thompson wrote:

> A word of caution.  Be very careful how you use the list.  The intended 
> usage for the list is to prevent (or monitor) local users from sending 
> email to the listed addresses.  The phishers frequently use compromised 
> end-user accounts to receive the phishing replies, so there is a high 
> risk of false positives, especially if you attempt to classify messages 
> containing one of these addresses as spam.

Thread fork!

Would it be useful to have a similar list for 419 fraud contact addresses?

Discuss...

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   You do not examine legislation in the light of the benefits it
   will convey if properly administered, but in the light of the
   wrongs it would do and the harms it would cause if improperly
   administered.                                  -- Lyndon B. Johnson
-----------------------------------------------------------------------
  9 days until the 64th anniversary of VE day

Re: my emailBL is live!

Posted by John Hardin <jh...@impsec.org>.
On Wed, 29 Apr 2009, Adam Katz wrote:

> Okay, back to using the second half of the MD5 (simple enough, since
> that was my original implementation).  Relevant code:
>
> $hash =~ s/@.*//;
> $hash =~ tr [A-Z] [a-z];
> $hash = substr(Digest::MD5::md5_hex($hash),16); # 2nd 16 of 32 chars

...can you go through your logic for throwing away half of the MD5 again?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   We are hell-bent and determined to allocate the talent, the
   resources, the money, the innovation to absolutely become a
   powerhouse in the ad business.       -- Microsoft CEO Steve Ballmer
   ...because allocating talent to securing Windows isn't profitable?
-----------------------------------------------------------------------
  9 days until the 64th anniversary of VE day

Re: my emailBL is live!

Posted by Adam Katz <an...@khopis.com>.
Jesse Thompson wrote:
> A word of caution.  Be very careful how you use the list.  The
> intended usage for the list is to prevent (or monitor) local users
> from sending email to the listed addresses.  The phishers
> frequently use compromised end-user accounts to receive the
> phishing replies, so there is a high risk of false positives,
> especially if you attempt to classify messages containing one of these
> addresses as spam.

That might just mean that the SpamAssassin rule that uses it should
require some other phishing-detection rule(s) to hit as well.

Rob McEwen wrote:
> OK. I was wrong. Due to this discussion, I'm convinced that MD5 of
> the whole (lower case!) e-mail address is best, with the entire
> e-mail address still showing up in plain text in the DNS txt
> record.

Okay, back to using the second half of the MD5 (simple enough, since
that was my original implementation).  Relevant code:

$hash =~ s/@.*//;
$hash =~ tr [A-Z] [a-z];
$hash = substr(Digest::MD5::md5_hex($hash),16); # 2nd 16 of 32 chars

You can look up an address by hash by pretending its domain is "hash"
(note the collision in the example).
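
As a self-contained sketch of how a client might build these lookup names (the emailbl_query function and its arguments are illustrative only, not taken from the attached code):

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build the lookup name: second half of the MD5 of the lowercased local part,
# then the domain (or the literal "hash"), then the zone.
sub emailbl_query {
    my ($address, $zone, $by_hash) = @_;
    my ($local, $domain) = split /\@/, lc($address), 2;
    my $half = substr(md5_hex($local), 16);       # last 16 of 32 hex chars
    return join '.', $half, ($by_hash ? 'hash' : $domain), $zone;
}

print emailbl_query('Test@example.com', 'emailbl.khopesh.com'), "\n";
print emailbl_query('test@example.com', 'emailbl.khopesh.com', 1), "\n";
```

The first call yields cade4e832627b4f6.example.com.emailbl.khopesh.com and the second cade4e832627b4f6.hash.emailbl.khopesh.com.  Because only the local part is hashed, every address with the same username shares one "hash" entry.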

I've added support for a new type, Z.  Z means the email address
should not be revealed (perhaps this should always be the case?).  The
TXT record for hash lookup will return "@hidden@" ... see the test for
hidden@ example.com below.

David B Funk recommended SURBL's merged results as a bandwidth-saver:
> EG: A == 127.0.0.2
>     B == 127.0.0.4
>     C == 127.0.0.8
>     D == 127.0.0.16
> 
> thus AB == 127.0.0.6
>     AC == 127.0.0.10
> 
> etc.

I like it!  Why not start at one? A=1 B=2 C=4 D=8 Z=n/a.  This
facilitates:

  $type_list =~ s/.*,([A-IZ]+),.*/$1/;
  if ($type_list =~ /Z/) {
    $email =~ s/\t".*"/\t"\@hidden\@"/; # hide the email address
    $type_list =~ s/Z//g;
  }
  $type_list =~ s/(?=.)/+/g;
  $type_list =~ tr [ABCD] [1248]; # rewrite when we get an E!
  $type_list = eval 0 . $type_list;
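
For comparison, a sketch of the same A=1 B=2 C=4 D=8 mapping without the string eval, plus the reverse direction a client would need.  The function names are made up for illustration.

```perl
use strict;
use warnings;

# Bit values per type letter, following A=1 B=2 C=4 D=8 from the message.
my %bit_for = (A => 1, B => 2, C => 4, D => 8);

# Combine a string of type letters (e.g. "AC") into the last octet value.
sub types_to_octet {
    my ($types) = @_;
    my $octet = 0;
    for my $letter (split //, uc $types) {
        $octet |= $bit_for{$letter} if exists $bit_for{$letter};
    }
    return $octet;
}

# Recover the type letters from the last octet of a 127.0.0.x answer.
sub octet_to_types {
    my ($octet) = @_;
    return join '', grep { $octet & $bit_for{$_} } sort keys %bit_for;
}

print "127.0.0.", types_to_octet('ABCD'), "\n";   # 127.0.0.15
print octet_to_types(5), "\n";                     # AC
```

One difference from the summing approach: OR-ing the bits ignores a letter that appears twice, whereas adding would count it twice and spill into the next bit.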


Here are some tests:

$ eblhash() { perl -MDigest::MD5 -e \
  'print substr(Digest::MD5::md5_hex(q('$1')),16)."\n"'; }
$ eblhash test
cade4e832627b4f6
$ host -t txt `eblhash test`.hash.emailbl.khopesh.com.
cade4e832627b4f6.hash.emailbl.khopesh.com descriptive text "test@example.com"
cade4e832627b4f6.hash.emailbl.khopesh.com descriptive text "test@emailbl.khopesh.com"
$ host `eblhash test`.example.com.emailbl.khopesh.com.
cade4e832627b4f6.example.com.emailbl.khopesh.com has address 127.0.0.15
$ host -t txt `eblhash hidden`.hash.emailbl.khopesh.com.
e8238a6c0be92190.hash.emailbl.khopesh.com descriptive text "@hidden@"

For test purposes, "test" and "hidden" are also included as their own
hashes.  For legacy purposes, the truncated username model used
earlier is still there.  I'll remove it in time.


I should also mention that this index is updated regularly and will
stay up until it suffers from its success or misuse.

The last-seen date stamp on test@example.com serves as the last day it
was updated, and the SOA record for my khopesh.com domain shows the
last DNS update (the last two digits are the EDT hour or higher if
I've been toying with it).  My sa-channels update every four hours.


-----

I'm also toying with the idea of making the khop-sc-neighbors list
(currently an sa-update channel) a DNSBL with return codes indicating
its networks' rank on an inverse scale from 1-100 (though not
necessarily with a hundred entries), so 127.0.0.100 means the
top-ranked spamming network and 127.0.0.1 is the lowest noted spamming
network.  This becomes a percent to apply a plugin configuration
option's multiplier, so a multiplier of 3 would give the top-ranked
network (127.0.0.100) a score of 3 and a network in the middle
(127.0.0.50) a score of 1.5, etc.  The khop-sc-neighbors channel
examines /24 CIDRs (x in a.b.c.x) and /8 CIDRs (x.y.z in a.x.y.z),
which would be 127.0.CIDR.PERCENT, so 127.0.24.100 would be the top
/24 offender and 127.0.8.100 would be the top /8 offender.
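
To make the arithmetic concrete, here is a sketch of how a plugin might turn such an answer into a score.  Nothing like this exists yet; the neighbor_score name and its return shape are purely hypothetical.

```perl
use strict;
use warnings;

# Decode a hypothetical 127.0.CIDR.PERCENT answer and scale it by a
# configured multiplier, as described above.
sub neighbor_score {
    my ($answer, $multiplier) = @_;
    my ($cidr, $percent) = $answer =~ /^127\.0\.(\d+)\.(\d+)$/
        or return;                                # not a recognized answer
    return { cidr => $cidr, score => $multiplier * $percent / 100 };
}

my $hit = neighbor_score('127.0.24.50', 3);
printf "/%d network, score %.2f\n", $hit->{cidr}, $hit->{score};
```

With a multiplier of 3, 127.0.24.100 scores 3 and 127.0.24.50 scores 1.5, matching the examples in the paragraph above.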


-- 
Adam Katz
khopesh on irc://irc.freenode.net/#spamassassin
http://khopesh.com/Anti-spam

Re: my emailBL is live!

Posted by Jesse Thompson <je...@doit.wisc.edu>.
Adam Katz wrote:
> This was actually rather simple to set up.  I'll publish the code
[snip]

Thanks for your efforts with this.  I forwarded your message to the APER 
mailing list.

A word of caution.  Be very careful how you use the list.  The intended 
usage for the list is to prevent (or monitor) local users from sending 
email to the listed addresses.  The phishers frequently use compromised 
end-user accounts to receive the phishing replies, so there is a high 
risk of false positives, especially if you attempt to classify messages 
containing one of these addresses as spam.

Jesse

-- 
   Jesse Thompson
   Division of Information Technology, University of Wisconsin-Madison
   Email/IM: jesse.thompson@doit.wisc.edu

Re: my emailBL is live!

Posted by Mike Cardwell <sp...@lists.grepular.com>.
Adam Katz wrote:

> Mike Cardwell contended:
>>> It would definitely require a hashing algorithm, like MD5. IIRC
>>> there is a maximum length for a hostname, and that is 255
>>> characters. What if the hostname in your email address is 255
>>> characters long on its own...?
> 
> When MD5sums were first proposed (in place of my wild escaping), it
> seemed like a great idea.  However, a voice in the back of my head,
> now spoken (typed?) by Rob, has been growing louder.  My
> implementation now merely truncates email usernames to 16 characters
> (plus the noted defanging, which makes it complicated again ...) and
> replaces the @ with a dot (not an underscore, that's not a legal
> character).

Hmmm. I'm still not convinced you've done it the best way. That 
conversion sounds a lot more complicated than a straight MD5 conversion, 
and it doesn't deal with the fact that there is a maximum length for an 
FQDN.

> In fact, collisions here could be regarded as good, as usernames that
> long can include tracking strings (e.g. the mailer for our list,
> users-return-12345-joe=bob.com@ spamassassin.apache.org, becomes
> users-return-123.spamassassin.apache.org), which should help.

That could be seen as an advantage I suppose. But, the particular source 
list being used here wasn't meant to be used that way. Some people might 
consider such hits as false positives.

> I did fully implement my proposed latter 16 characters (of MD5's 32)
> plus dot plus the domain, complete with hash lookups, but I just
> removed it (which is why non-test lookups will fail for the next ~4h).
> 
>>> Having access to the plain text email address would only make it
>>> easier for ISPs to do anything if they had access to the zone file.
>>> In which case, you could just give them access to a separate list
>>> which has the email addresses in plain text.
> 
> Unless we're replacing the currently well-groomed upstream source at
> http://anti-phishing-email-reply.googlecode.com/#, I see no reason to
> offer such services (since they do it better).
> 
>>> So in rbldnsd, ...
> 
> Whoa, what's that?!  Interesting ... it's even in Debian.  I think I'm
> happy with BIND for the moment, since my origin point is hidden from
> use and the actual NS records are merely slaves run by zoneedit (so
> efficiency isn't really important).  I probably need to stay on BIND
> as I doubt I could use rbldnsd to host my SpamAssassin channels.

I implemented pretty much exactly the same thing that you did, except it 
uses a straight hexadecimal MD5 digest of the full address. I know this 
isn't strictly correct as the local part of an email address is 
technically case sensitive, but as email addresses in the real world are 
case *insensitive* I convert it to lower case before hashing.

Eg:

root@haven:/var/lib/rbldns# host -t a bda05135a5b8a92d5d2934531864442d.phishing.email.rbl.grepular.com
bda05135a5b8a92d5d2934531864442d.phishing.email.rbl.grepular.com A       127.0.0.3
bda05135a5b8a92d5d2934531864442d.phishing.email.rbl.grepular.com A       127.0.0.1
root@haven:/var/lib/rbldns# host -t txt bda05135a5b8a92d5d2934531864442d.phishing.email.rbl.grepular.com
bda05135a5b8a92d5d2934531864442d.phishing.email.rbl.grepular.com TXT     "20090411"
root@haven:/var/lib/rbldns#

That RBL won't stay public for long, so don't use it for anything other 
than a quick test.
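
For anyone who wants to build the lookup name outside of Exim, here is a
minimal Python sketch of the scheme described above (lower-case the full
address, take the hexadecimal MD5 digest, prepend it to the zone). The
function name emailbl_query_name is hypothetical, and the sample address
is made up.

```python
import hashlib

def emailbl_query_name(address: str, zone: str) -> str:
    """Return the DNS name to query for `address` in an MD5-keyed zone."""
    # Lower-case the whole address before hashing, as the post describes.
    digest = hashlib.md5(address.lower().encode("utf-8")).hexdigest()
    return f"{digest}.{zone}"

# The zone name is the one from the post; the address is only an example.
print(emailbl_query_name("Test@Example.COM", "phishing.email.rbl.grepular.com"))
```

The resulting name can be passed to "host -t a" or "host -t txt" exactly
as in the transcript above.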

Here's the code I use to download the data and populate an rbldnsd file:

https://secure.grepular.com/phishing_addresses.txt

You might find something you can strip out and re-use.

Here are the Exim ACLs I use to query it for the envelope sender, the From 
header, and the Reply-To header:

acl_smtp_mail:

deny dnslists   = phishing.email.rbl.grepular.com/${md5:${lc:$sender_address}}

acl_smtp_data:

deny dnslists   = phishing.email.rbl.grepular.com/${md5:${lc:${address:$h_From:}}}

deny dnslists   = phishing.email.rbl.grepular.com/${md5:${lc:${address:$h_Reply-To:}}}
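
The header part of those ACLs can be approximated outside Exim. The
following is a rough Python sketch, not a port of Exim's behaviour: it
pulls the bare address out of the From and Reply-To headers with the
standard library and builds the same kind of MD5 lookup name. The
function name header_query_names is hypothetical, and the envelope
sender is left out because it is not part of the message headers.

```python
import hashlib
from email import message_from_string
from email.utils import parseaddr

ZONE = "phishing.email.rbl.grepular.com"  # zone name taken from the post

def header_query_names(raw_message: str, zone: str = ZONE) -> dict:
    """Map From / Reply-To to the MD5-based DNS name to look up."""
    msg = message_from_string(raw_message)
    names = {}
    for header in ("From", "Reply-To"):
        # parseaddr() returns (display name, address); address is "" if absent.
        _, addr = parseaddr(msg.get(header, ""))
        if addr:
            digest = hashlib.md5(addr.lower().encode("utf-8")).hexdigest()
            names[header] = f"{digest}.{zone}"
    return names

sample = "From: Help Desk <HelpDesk@Example.com>\nSubject: hi\n\nbody\n"
print(header_query_names(sample))
```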

I'm not familiar enough with writing SpamAssassin rules yet to write a 
SpamAssassin recipe.

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)

Re: my emailBL is live!

Posted by Mike Cardwell <sp...@lists.grepular.com>.
David B Funk wrote:

>> When MD5sums were first proposed (in place of my wild escaping), it
>> seemed like a great idea.  However, a voice in the back of my head,
>> now spoken (typed?) by Rob, has been growing louder.  My
>> implementation now merely truncates email usernames to 16 characters
>> (plus the noted defanging, which makes it complicated again ...) and
>> replaces the @ with a dot (not an underscore, that's not a legal
>> character).
> 
> Repeat after me, ALMOST ALL characters (octets actually) are now
> LEGAL in DNS queries (see RFC-2181 section 11).
> 
> There is NO need for -any- kind of munging.

That same RFC says labels are limited to 63 chars and FQDNs are limited 
to 255 chars. So you'd need to munge for those two cases, wouldn't you? 
Also, are you 100% sure there are no characters that are allowed in an 
email address local part which aren't allowed in a domain name?
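
To make the two limits concrete, here is a small Python sketch that
checks whether a candidate lookup name fits within them. It assumes the
usual wire-format accounting (one length octet per label plus one for
the root); the function name fits_dns_limits is hypothetical, and the
check says nothing about which characters a given resolver will accept.

```python
def fits_dns_limits(name: str) -> bool:
    """Check the 63-octet label and 255-octet name limits."""
    if name.endswith("."):
        name = name[:-1]  # drop the trailing root dot, if present
    labels = [label.encode("utf-8") for label in name.split(".")]
    if any(len(label) == 0 or len(label) > 63 for label in labels):
        return False
    # On the wire each label costs one length octet, plus one for the root.
    wire_length = sum(len(label) + 1 for label in labels) + 1
    return wire_length <= 255

print(fits_dns_limits("abuse-t@live.com.phish.icaen.uiowa.edu."))  # True
print(fits_dns_limits("a" * 64 + ".example.com"))                  # False
```

A raw address with a local part longer than 63 characters fails this
check, whereas a 32-character hex digest always fits in one label.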

> I've set up an emailBL directly from the Google list, try:
> 
>  host abuse-t@live.com.phish.icaen.uiowa.edu.

"host" on my Debian system spits out warnings. It does however do the 
lookup correctly. You must recognise that there will be compatibility 
problems with your solution in the wild though. One example being Exim's 
dnsdb lookup type, which fails outright doing that lookup.

Here's the warning I get from "host".

host -t a abuse-t@live.com.phish.icaen.uiowa.edu
  *** invalid answer name abuse-t\@live.com.phish.icaen.uiowa.edu after A query for abuse-t@live.com.phish.icaen.uiowa.edu
abuse-t\@live.com.phish.icaen.uiowa.edu	A	127.0.0.2
  !!! abuse-t\@live.com.phish.icaen.uiowa.edu A record has illegal name

What exactly is the problem with hashing the address anyway? We'll 
forget accidental collisions as they simply won't happen.

> IE "address.phish.icaen.uiowa.edu"
> 
> NO need for hashing, no collisions, etc.
>
> Also makes it easier to deploy into an address filter/blocker in
> your smtp-MTA (to prevent local llusers from replying to one
> of those addresses).
>
> 
> BTW notice that the Google data is multi-valued in the TYPE field.
> rather than a simple enumeration of that data into an address it
> is better to turn it into a bit-mask, as then multiple values can
> be represented (and queried) in a single address/operation.
> 
> EG: A == 127.0.0.2
>     B == 127.0.0.4
>     C == 127.0.0.8
>     D == 127.0.0.16
> 
> thus AB == 127.0.0.6
>     AC == 127.0.0.10
> 
> etc.
> 
> So the entry for 'abuse-t@live.com' only has an 'A' type.
> 
>  host account-teamdept@live.com.phish.icaen.uiowa.edu. => 127.0.0.10
> 
> so the entry for 'account-teamdept@live.com' has an 'A' & 'C' type.

Yeah, that might be a good idea.

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)

Re: my emailBL is live!

Posted by Adam Katz <an...@khopis.com>.
David B Funk wrote:
> Umm, I guess you didn't understand what the ".phish.icaen.uiowa.edu" part
> of "address.phish.icaen.uiowa.edu" meant.

D'oh!  Sorry, doing too many things at once.  You're right, that
worked for me.  However, you still have Mike's issue of 63 characters
per label and 255 characters total, the support issue, plus all the
wasted bandwidth with such a long name.

Also, I'd be hesitant dealing with certain special characters,
technically legal but potentially dangerous, like [@*"'?%,] et al.

> Unless you've got an obsolete version of software this does work.
> In bind if you use the "check-names ignore" option for that zone it
> does -NOT- require munging. (I'm running mine that way, so I know
> that it works.)

Isn't there a reason they recently re-enabled check-names by default?

> Have you followed the development of the SURBL service? They
> explicitly switched to the bit-mask format to reduce DNS load.

Obviously not.  Interesting.

Re: my emailBL is live!

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Wed, 29 Apr 2009, Adam Katz wrote:

> David B Funk wrote:
> > Repeat after me, ALMOST ALL characters (octets actually) are now
> > LEGAL in DNS queries (see RFC-2181 section 11).
> >
> > There is NO need for -any- kind of munging.
>
> First, you must start and end a domain label ("octet" refers to IP
> addresses) with a letter or number, so munging is still required.
> Second, DNS thrives on caching, peering, and slaves; if BIND or other
> major name servers can't handle it, it won't fly.  I'm running the
> latest version of BIND and it required each of the munging steps I
> implemented (except the truncation to 16 chars, which was for
> bandwidth) in order to work.
>
> Also, some of the addresses are forged and should not be listed in the
> plain anyway.  More on that in my next email announcing my md5-enabled
> list, in which I'll propose a type Z for "do not reveal this address."
>
> >     host abuse-t@live.com.phish.icaen.uiowa.edu.
> > NO need for hashing, no collisions, etc.
>
> How about the first entry in the upstream list:
> $ host -- -helpdesk@live.com
> Host -helpdesk@live.com not found: 3(NXDOMAIN)
> $
>
> I guess you have to munge it.

Umm, I guess you didn't understand what the ".phish.icaen.uiowa.edu" part
of "address.phish.icaen.uiowa.edu" meant.

Try:
  host -- -helpdesk@live.com.phish.icaen.uiowa.edu.

Unless you've got an obsolete version of software this does work.
In bind if you use the "check-names ignore" option for that zone it
does -NOT- require munging. (I'm running mine that way, so I know
that it works.)

-- 
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: my emailBL is live!

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Wed, 29 Apr 2009, Adam Katz wrote:

> But your very next topic is contrary to that philosophy...
>
> > BTW notice that the Google data is multi-valued in the TYPE field.
> > rather than a simple enumeration of that data into an address it
> > is better to turn it into a bit-mask, as then multiple values can
> > be represented (and queried) in a single address/operation.
> >
> > EG: A == 127.0.0.2
> >     B == 127.0.0.4
> >     C == 127.0.0.8
> >     D == 127.0.0.16
> >
> > thus AB == 127.0.0.6
> >     AC == 127.0.0.10
> >
> > etc.
>
> I was just following the model used by all other DNSBL/URIBLs.  Round
> robin A records for each letter.  To quote somebody you hold near and
> dear:  it "makes it easier to deploy into an address filter/blocker in
> your smtp-MTA ..."

Have you followed the development of the SURBL service? They explicitly
switched to the bit-mask format to reduce DNS load.


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: my emailBL is live!

Posted by Adam Katz <an...@khopis.com>.
David B Funk wrote:
> Repeat after me, ALMOST ALL characters (octets actually) are now
> LEGAL in DNS queries (see RFC-2181 section 11).
> 
> There is NO need for -any- kind of munging.

First, you must start and end a domain label ("octet" refers to IP
addresses) with a letter or number, so munging is still required.
Second, DNS thrives on caching, peering, and slaves; if BIND or other
major name servers can't handle it, it won't fly.  I'm running the
latest version of BIND and it required each of the munging steps I
implemented (except the truncation to 16 chars, which was for
bandwidth) in order to work.

Also, some of the addresses are forged and should not be listed in the
plain anyway.  More on that in my next email announcing my md5-enabled
list, in which I'll propose a type Z for "do not reveal this address."

>     host abuse-t@live.com.phish.icaen.uiowa.edu.
> NO need for hashing, no collisions, etc.

How about the first entry in the upstream list:
$ host -- -helpdesk@live.com
Host -helpdesk@live.com not found: 3(NXDOMAIN)
$

I guess you have to munge it.

> Also makes it easier to deploy into an address filter/blocker in
> your smtp-MTA (to prevent local llusers from replying to one
> of those addresses).

But your very next topic is contrary to that philosophy...

> BTW notice that the Google data is multi-valued in the TYPE field.
> rather than a simple enumeration of that data into an address it
> is better to turn it into a bit-mask, as then multiple values can
> be represented (and queried) in a single address/operation.
> 
> EG: A == 127.0.0.2
>     B == 127.0.0.4
>     C == 127.0.0.8
>     D == 127.0.0.16
> 
> thus AB == 127.0.0.6
>     AC == 127.0.0.10
> 
> etc.

I was just following the model used by all other DNSBL/URIBLs.  Round
robin A records for each letter.  To quote somebody you hold near and
dear:  it "makes it easier to deploy into an address filter/blocker in
your smtp-MTA ..."

Re: my emailBL is live!

Posted by David B Funk <db...@engineering.uiowa.edu>.
> When MD5sums were first proposed (in place of my wild escaping), it
> seemed like a great idea.  However, a voice in the back of my head,
> now spoken (typed?) by Rob, has been growing louder.  My
> implementation now merely truncates email usernames to 16 characters
> (plus the noted defanging, which makes it complicated again ...) and
> replaces the @ with a dot (not an underscore, that's not a legal
> character).

Repeat after me, ALMOST ALL characters (octets actually) are now
LEGAL in DNS queries (see RFC-2181 section 11).

There is NO need for -any- kind of munging.

I've set up an emailBL directly from the Google list, try:

 host abuse-t@live.com.phish.icaen.uiowa.edu.

IE "address.phish.icaen.uiowa.edu"

NO need for hashing, no collisions, etc.
Also makes it easier to deploy into an address filter/blocker in
your smtp-MTA (to prevent local llusers from replying to one
of those addresses).


BTW notice that the Google data is multi-valued in the TYPE field.
rather than a simple enumeration of that data into an address it
is better to turn it into a bit-mask, as then multiple values can
be represented (and queried) in a single address/operation.

EG: A == 127.0.0.2
    B == 127.0.0.4
    C == 127.0.0.8
    D == 127.0.0.16

thus AB == 127.0.0.6
    AC == 127.0.0.10

etc.

So the entry for 'abuse-t@live.com' only has an 'A' type.

 host account-teamdept@live.com.phish.icaen.uiowa.edu. => 127.0.0.10

so the entry for 'account-teamdept@live.com' has an 'A' & 'C' type.
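
The bit-mask mapping above can be sketched in a few lines of Python.
The bit values are the ones listed in this message; the function names
encode_types and decode_types are hypothetical.

```python
# Bit values for each TYPE letter, as listed in the message above.
TYPE_BITS = {"A": 2, "B": 4, "C": 8, "D": 16}

def encode_types(types: str) -> str:
    """Combine TYPE letters into a single 127.0.0.x answer."""
    mask = 0
    for letter in types:
        mask |= TYPE_BITS[letter]
    return f"127.0.0.{mask}"

def decode_types(address: str) -> str:
    """Recover the TYPE letters from the last octet of the answer."""
    mask = int(address.rsplit(".", 1)[1])
    return "".join(letter for letter, bit in TYPE_BITS.items() if mask & bit)

print(encode_types("AC"))         # 127.0.0.10
print(decode_types("127.0.0.6"))  # AB
```

A client only needs one A lookup and a bitwise AND to test for any
combination of types.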

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

my emailBL is live!

Posted by Adam Katz <an...@khopis.com>.
This was actually rather simple to set up.  I'll publish the code
(AGPL) that runs it in a bit (I need to clean it up to withstand the
heavy-handed criticism on this list ...).  Note, I'm using ZoneEdit's
free NS mirroring, which has limited bandwidth.  I'm willing to pay
their minimum threshold if it gets that popular, but any more than
that and I'll be looking for other options.  (NOT PRODUCTION GRADE!)

A SpamAssassin plugin will be needed to get it working, too ... I
suspect there are gurus here who can do that part as easily as I did
the scraper and BIND code.  If nobody bites, I'll get to it in time.

For now, we have a functional proof-of-concept.  I'll post the code, a
more formal announcement, and more documentation to my blog and
website in a few days ("a few" might be a large number).  The emailBL
syncs with the upstream every 4h (I'd reduce the TTL and increase the
syncing frequency, but I'd risk running out of bandwidth).

(Note, the DNS will take another 1-4 hours to propagate.)


The structure of the upstream list:

    ADDRESS,TYPE[TYPE...],DATE

ADDRESS is an email address like <test@ emailbl.khopesh.com>
TYPE is one or more letters of A B C D as follows:
    A (reply-to)
    B (from, !reply-to)
    C (msg body has ADDRESS)
    D (msg body has ADDRESS obfuscated)
DATE is the last time it was seen, formatted YYYYMMDD, in UTC(?).

The structure of domains in my emailBL index:

    USER.DOMAIN.emailbl.khopesh.com  TXT  <DATE>
    USER.DOMAIN.emailbl.khopesh.com  A    127.0.0.<N_TYPE>

USER is the ADDRESS's username, altered as follows:
  s/^([^@+]{1,16})[^@]*@.*/$1/;  # truncate to 16 characters
  s/^[^a-z0-9]*|[^a-z0-9]*$//g;  # fix leading/trailing chars
  s/[^-a-z.0-9]/-/g;             # fix illegal chars
DOMAIN is the ADDRESS's domain
N_TYPE is a numerical version of TYPE above (A=1, B=2, C=3, D=4)
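
For readers who do not speak Perl, here is a rough Python rendering of
the three substitutions and the N_TYPE mapping above. It is a sketch,
not the code that runs the list: the function names are hypothetical,
and lower-casing the address first is an assumption (the substitutions
as written only pass a-z, so upper-case letters would otherwise turn
into hyphens).

```python
import re

ZONE = "emailbl.khopesh.com"
N_TYPE = {"A": 1, "B": 2, "C": 3, "D": 4}

def munge_user(address: str) -> str:
    """Apply the three substitutions listed above to get USER."""
    # Keep at most 16 leading characters of the username, dropping any +tag.
    user = re.sub(r"^([^@+]{1,16})[^@]*@.*", r"\1", address)
    # Strip leading/trailing characters that are not a-z or 0-9.
    user = re.sub(r"^[^a-z0-9]*|[^a-z0-9]*$", "", user)
    # Replace any remaining illegal character with a hyphen.
    return re.sub(r"[^-a-z.0-9]", "-", user)

def query_name(address: str) -> str:
    """Build USER.DOMAIN.<zone> for an address."""
    address = address.lower()  # assumption, see the note above
    domain = address.partition("@")[2]
    return f"{munge_user(address)}.{domain}.{ZONE}"

def a_records(types: str) -> list:
    """One A record per TYPE letter (A=1, B=2, C=3, D=4)."""
    return [f"127.0.0.{N_TYPE[t]}" for t in types]

print(query_name("test@example.com"))  # test.example.com.emailbl.khopesh.com
print(a_records("AC"))                 # ['127.0.0.1', '127.0.0.3']
```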

Main test points (with no space after the at sign, obviously):

    test@ example.com
        -> test.example.com.emailbl.khopesh.com
    test@ emailbl.khopesh.com
        -> test.emailbl.khopesh.com.emailbl.khopesh.com

Alternate test point (mimicking DNSBLs):

    2.0.0.127.emailbl.khopesh.com


Let's pretend we're in a shell (I've spaced all emails):
################

# Look up TXT record (last-seen DATE) for <test@ example.com>
$ host -t txt test.example.com.emailbl.khopesh.com.
test.example.com.emailbl.khopesh.com descriptive text "20090328"
$

# Look up A record (inclusion TYPE[s]) for <test@ example.com>
$ host test.example.com.emailbl.khopesh.com.
test.example.com.emailbl.khopesh.com has address 127.0.0.3
test.example.com.emailbl.khopesh.com has address 127.0.0.4
test.example.com.emailbl.khopesh.com has address 127.0.0.1
test.example.com.emailbl.khopesh.com has address 127.0.0.2
$

################


More comments in-line:

Jesse Thompson (developer of anti-phishing-email-reply) wrote me:
> Yes, I and others have thought of it.  But I don't need it since we
> only use the list to scan log files and populate mapping tables.  I
> don't have time or money to do any of this, and I'm kept pretty
> busy just updating the list...on top of my other bazillion other
> responsibilities.
> 
> You are welcome to use the list to create your own URIBL of course.

(Jesse is BCC'd.)  And so I did.  Thanks for keeping the list updated.
 Hopefully this emailBL will open your list to new horizons.  Clearly,
credit for the real work goes to you and the other APER developers.

Rob McEwen wrote:
>>> Personally, I think the obfuscation is overkill. Instead, I'd
>>> prefer to change the "@" symbol to an underscore (and any other
>>> minor change that might be needed to work with dns queries) and
>>> be done with it. This would also make the implementation easier,
>>> and research by ISPs easier.

Mike Cardwell contended:
>> It would definitely require a hashing algorithm, like MD5. IIRC
>> there is a maximum length for a hostname, and that is 255
>> characters. What if the hostname in your email address is 255
>> characters long on its own...?

When MD5sums were first proposed (in place of my wild escaping), it
seemed like a great idea.  However, a voice in the back of my head,
now spoken (typed?) by Rob, has been growing louder.  My
implementation now merely truncates email usernames to 16 characters
(plus the noted defanging, which makes it complicated again ...) and
replaces the @ with a dot (not an underscore, that's not a legal
character).

In fact, collisions here could be regarded as good, as usernames that
long can include tracking strings (e.g. the mailer for our list,
users-return-12345-joe=bob.com@ spamassassin.apache.org, becomes
users-return-123.spamassassin.apache.org), which should help.

I did fully implement my proposed latter 16 characters (of MD5's 32)
plus dot plus the domain, complete with hash lookups, but I just
removed it (which is why non-test lookups will fail for the next ~4h).

>> Having access to the plain text email address would only make it
>> easier for ISPs to do anything if they had access to the zone file.
>> In which case, you could just give them access to a separate list
>> which has the email addresses in plain text.

Unless we're replacing the currently well-groomed upstream source at
http://anti-phishing-email-reply.googlecode.com/#, I see no reason to
offer such services (since they do it better).

>> So in rbldnsd, ...

Whoa, what's that?!  Interesting ... it's even in Debian.  I think I'm
happy with BIND for the moment, since my origin point is hidden from
use and the actual NS records are merely slaves run by zoneedit (so
efficiency isn't really important).  I probably need to stay on BIND
as I doubt I could use rbldnsd to host my SpamAssassin channels.


-- 
Adam Katz
khopesh on irc://irc.freenode.net/#spamassassin
http://khopesh.com/Anti-spam

Re: emailBL

Posted by Mike Cardwell <sp...@lists.grepular.com>.
Rob McEwen wrote:

>> If you're worried about spammers gaming the hash system
> 
> Most likely, they won't care. They'll happily pursue the "low hanging
> fruit". The only exception is if/when freemail ISPs started using such a
> list to start investigating individual accounts for possible
> termination. But, even then, that is a good problem to have.
> 
> Personally, I think the obfuscation is overkill. Instead, I'd prefer to
> change the "@" symbol to an underscore (and any other minor change that
> might be needed to work with dns queries) and be done with it. This
> would also make the implementation easier, and research by ISPs easier.

It would definitely require a hashing algorithm, like MD5. IIRC there is 
a maximum length for a hostname, and that is 255 characters. What if the 
hostname in your email address is 255 characters long on its own...?

Having access to the plain text email address would only make it easier 
for ISPs to do anything if they had access to the zone file. In which 
case, you could just give them access to a separate list which has the 
email addresses in plain text. Alternatively, just stick the original 
email address in the TXT record. So in rbldnsd, you'd have a record like 
this:

98f22901b17b13d910456597685c1963 :127.0.0.1:the.real@email.address

Doing an A record lookup on 98f22901b17b13d910456597685c1963.example.com 
would return "127.0.0.1" and doing a TXT record returns 
"the.real@email.address". There's no advantage of sticking the email 
address in the TXT record rather than having a separate file, apart from 
keeping the data together.
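
A short Python sketch of generating such lines follows. It only mirrors
the record layout shown in this message (hash, then ":A value:TXT
value"); check the rbldnsd documentation for the dataset type you use
before relying on it. The function name rbldnsd_entry is hypothetical,
and lower-casing before hashing is an assumption borrowed from later in
the thread.

```python
import hashlib

def rbldnsd_entry(address: str, a_value: str = "127.0.0.1") -> str:
    """Format one line in the layout shown above: '<md5> :<A>:<TXT>'."""
    # Assumption: hash the lower-cased address, keep the original in the TXT.
    key = hashlib.md5(address.lower().encode("utf-8")).hexdigest()
    return f"{key} :{a_value}:{address}"

for addr in ("the.real@email.address", "another@example.com"):
    print(rbldnsd_entry(addr))
```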

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)

Re: emailBL

Posted by Rob McEwen <ro...@invaluement.com>.
Ben Winslow wrote:
> If you're worried about spammers gaming the hash system

Most likely, they won't care. They'll happily pursue the "low hanging
fruit". The only exception is if/when freemail ISPs started using such a
list to start investigating individual accounts for possible
termination. But, even then, that is a good problem to have.

Personally, I think the obfuscation is overkill. Instead, I'd prefer to
change the "@" symbol to an underscore (and any other minor change that
might be needed to work with dns queries) and be done with it. This
would also make the implementation easier, and research by ISPs easier.

As with all DNSBLs, the really hard part is not listing legitimate
items. For example, consider that some guy out there is probably sending
financial newsletters to his very own clients, uses his ISP's MTA for
sending, but uses a gmail "from" address. His e-mail address might have
a high chance of being mistakenly blacklisted!

The last 2-3 times I saw this idea come up on either SA or Spam-L,
I recall that the idea was strongly shot down by a number of people for
this and other reasons. But I kept out of the discussion and I actually
thought this could be a great idea... if done right and if FPs are kept
to a minimum. I'd been planning on starting such a list for quite some
time, but it kept getting delayed by more urgent needs.

-- 
Rob McEwen
http://dnsbl.invaluement.com/
rob@invaluement.com
+1 (478) 475-9032



Re: emailBL

Posted by Ben Winslow <wi...@pa.net>.
On Tue, 28 Apr 2009 02:09:02 +0100
Steve Freegard <st...@stevefreegard.com> wrote:
> Well in the case of an emailBL - the worst that can happen is that one
> listed md5 collides with an innocent e-mail address.  By adding in the
> string length it reduces that possibility because both colliding
> addresses would have to be exactly the same length.  I believe you'll
> find that ClamAV uses this method for its MD5 signatures - to get a
> match it has to match the MD5 and the file size has to match.

MD5 already adds the message length (in bits, as a 64-bit integer) at
the very end of the input before the hash is finalized, so adding it
again as an ASCII representation of bytes isn't really going to improve
anything.

If you're worried about spammers gaming the hash system (e.g. using a
botnet to compute an address with a hash which collides with some
target address), you should bite the bullet and use a longer hash
(something in the SHA family, maybe?)  You could make up for the extra
hash length (in terms of DNS traffic) by using a more efficient encoding
of the hash than hex (e.g. base64 or better) with the obvious caveat
that it'd be more difficult to query.
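
As one illustration of the encoding trade-off, here is a Python sketch
using SHA-1 with base32 instead of hex. Base32 is a substitution made
for this example, not the base64 suggested above: DNS names compare
case-insensitively, and base64 relies on case and on "+" and "/". The
function name sha1_base32_label is hypothetical, and lower-casing the
address first is an assumption.

```python
import base64
import hashlib

def sha1_base32_label(address: str) -> str:
    """Return a case-insensitive DNS label for the SHA-1 of `address`."""
    digest = hashlib.sha1(address.lower().encode("utf-8")).digest()  # 20 bytes
    # 20 bytes is 160 bits, which is exactly 32 base32 characters, no padding.
    return base64.b32encode(digest).decode("ascii").lower()

label = sha1_base32_label("test@example.com")
print(label, len(label))
```

The label comes out at 32 characters, the same length as a hex MD5
digest, so the longer hash costs no extra query bytes with this
encoding.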

Given that most software will need new code to support an
email-address-based BL, you should give operational concerns (e.g.
bandwidth requirements) some serious thought while you have the chance.

-- 
Ben Winslow <wi...@pa.net>

Re: emailBL

Posted by Steve Freegard <st...@stevefreegard.com>.
John Hardin wrote:
> 
> I suppose I should ask, what do you mean by a spammer "reversing the list"?
> 

I guess I meant that it makes it harder for the spammer if he/she gets a
copy of the list to casually look for addresses to avoid without doing
the extra work of encoding the address in the same way and looking it
up.  But with fresh eyes this morning the benefit of this is tenuous -
it just means that they have to do a bit of extra work ;-)

My idea for creating an emailBL was in the vain hope that if I could get
it to work well enough that the actual mailbox providers hosting the
dropboxes might actually use it to terminate the mailbox provided I let
them see evidence for each address (I know - probably no chance of that;
but I can hope).

I'm also thinking of doing the same with 'full URIs' that cannot be
listed by the existing URI blacklists due to the spammers abusing
services specifically to avoid the existing lists so they don't burn up
an actual domain name e.g. http://groups.yahoo.com/groupname/message/1
would be as easy as:

smf@laptop-smf:~$ perl -MDigest::MD5 -e '$uri="http://groups.yahoo.com/groupname/message/1"; print Digest::MD5::md5_hex($uri).length($uri).".bl.org\n"'
f499f872e8276a4777c3dba48481915a43.bl.org
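
The same construction (hex MD5 of the full URI, followed by the URI's
length, followed by the zone) looks like this as a Python sketch. The
function name hashed_uri_name is hypothetical and "bl.org" is just the
placeholder zone from the one-liner above.

```python
import hashlib

def hashed_uri_name(uri: str, zone: str = "bl.org") -> str:
    """Return '<md5 hex><length>.<zone>' for a full URI."""
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
    # Append the character count of the URI directly after the digest.
    return f"{digest}{len(uri)}.{zone}"

print(hashed_uri_name("http://groups.yahoo.com/groupname/message/1"))
```

For that sample URI the appended length is 43, which matches the "43"
at the end of the label in the output above.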

Cheers,
Steve.

Re: emailBL

Posted by John Hardin <jh...@impsec.org>.
On Tue, 28 Apr 2009, Steve Freegard wrote:

> John Hardin wrote:
>> On Tue, 28 Apr 2009, Steve Freegard wrote:
>>
>>> To reduce the likelihood of collisions then it's better to add the input
>>> string length at the end of the md5 like ClamAV does in its MD5 sigs
>>> e.g.
>>>
>>> smf@laptop-smf:~$ perl -MDigest::MD5 -e '$email="sa\@fsg.com"; print
>>> Digest::MD5::md5_hex($email).length($email).".emailbl.org\n"'
>>> c18782f8d94595d5e016e3ab9ab3f8f610.emailbl.org
>>>
>>> This also has the benefit of making it impossible to reverse the list
>>> if the spammer were to rsync the list.
>>
>> ...huh? If MD5 isn't cryptographically secure, how will adding some
>> extra characters onto the end make it stronger?
>
> Well in the case of an emailBL - the worst that can happen is that one
> listed md5 collides with an innocent e-mail address.

I get that. That's a reasonable counter to hash collisions.

I suppose I should ask, what do you mean by a spammer "reversing the 
list"?

>> And there's no way to keep a spammer from checking to see if a given
>> email address is listed, just as there's no way to keep them from
>> checking whether a given domain name is listed.
>
> Ok - you're right. It's late here ;-)

Sleep well! :)

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Theraputic Phrenologist - send email for affordable rate schedule.
-----------------------------------------------------------------------
  96 days since Obama's inauguration and still no unicorn!

Re: emailBL

Posted by Steve Freegard <st...@stevefreegard.com>.
John Hardin wrote:
> On Tue, 28 Apr 2009, Steve Freegard wrote:
> 
>> To reduce the likelihood of collisions then it's better to add the input
>> string length at the end of the md5 like ClamAV does in its MD5 sigs
>> e.g.
>>
>> smf@laptop-smf:~$ perl -MDigest::MD5 -e '$email="sa\@fsg.com"; print
>> Digest::MD5::md5_hex($email).length($email).".emailbl.org\n"'
>> c18782f8d94595d5e016e3ab9ab3f8f610.emailbl.org
>>
>> This also has the benefit of making it impossible to reverse the list
>> if the spammer were to rsync the list.
> 
> ...huh? If MD5 isn't cryptographically secure, how will adding some
> extra characters onto the end make it stronger?

Well in the case of an emailBL - the worst that can happen is that one
listed md5 collides with an innocent e-mail address.  By adding in the
string length it reduces that possibility because both colliding
addresses would have to be exactly the same length.  I believe you'll
find that ClamAV uses this method for its MD5 signatures - to get a
match it has to match the MD5 and the file size has to match.

> And there's no way to keep a spammer from checking to see if a given
> email address is listed, just as there's no way to keep them from
> checking whether a given domain name is listed.

Ok - you're right. It's late here ;-)

Cheers,
Steve.

Re: emailBL

Posted by John Hardin <jh...@impsec.org>.
On Tue, 28 Apr 2009, Steve Freegard wrote:

> To reduce the likelihood of collisions then it's better to add the input
> string length at the end of the md5 like ClamAV does in its MD5 sigs e.g.
>
> smf@laptop-smf:~$ perl -MDigest::MD5 -e '$email="sa\@fsg.com"; print
> Digest::MD5::md5_hex($email).length($email).".emailbl.org\n"'
> c18782f8d94595d5e016e3ab9ab3f8f610.emailbl.org
>
> This also has the benefit of making it impossible to reverse the list if 
> the spammer were to rsync the list.

...huh? If MD5 isn't cryptographically secure, how will adding some extra 
characters onto the end make it stronger?

And there's no way to keep a spammer from checking to see if a given email 
address is listed, just as there's no way to keep them from checking 
whether a given domain name is listed.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Public Education: the bureaucratic process of replacing
   an empty mind with a closed one.                          -- Thorax
-----------------------------------------------------------------------
  96 days since Obama's inauguration and still no unicorn!

Re: emailBL

Posted by Mike Cardwell <sp...@lists.grepular.com>.
Dave Funk wrote:

>> Nah - I really don't like it that way; it doesn't really bring you any
>> benefit and is more likely to cause collisions if you do it that way.
>> Don't see how it can cause less DNS traffic either.  At least using MD5
>> hashes your DNS query will only be 32 characters + blacklist zone name
>> regardless of the size of the input string.
>>
>> To reduce the likelihood of collisions then it's better to add the input
>> string length at the end of the md5 like ClamAV does in its MD5 sigs 
>> e.g.
>>
>> smf@laptop-smf:~$ perl -MDigest::MD5 -e '$email="sa\@fsg.com"; print
>> Digest::MD5::md5_hex($email).length($email).".emailbl.org\n"'
>> c18782f8d94595d5e016e3ab9ab3f8f610.emailbl.org
>>
>> This also has the benefit of making it impossible to reverse the list if
>> the spammer were to rsync the list.
> 
> Silly question, given that RFC-2181 says that you can put almost anything
> you want into a DNS zone file, why go to the bother with the munging, 
> why not just put the raw unadulterated e-mail address in there and do 
> direct queries on it?
> 
> EG: nslookup systems@administrativos.com.marc.icaen.uiowa.edu.
> 
> Assuming you're running reasonably up-2-date DNS stuff it does just work.

You can also put pretty much any character you want in an email address 
local part. Eg, this is a valid email address...

"Personal Email@O'Reilly, Peter"@example.com

MD5 is cryptographically secure enough for this purpose. Just hashing 
the entire address with MD5 is the simplest and most workable solution. 
I expect it would be simple to use such a BL in all modern MTAs without 
too much hacking. Eg, in Exim, the configuration to look up such an 
address against an emailBL called "example.com" would be (untested):

deny dnslists = example.com/${md5:$sender_address}
      message  = $sender_address is listed on $dnslist_domain

-- 
Mike Cardwell
(https://secure.grepular.com) (http://perlcv.com/)

Re: emailBL

Posted by Dave Funk <db...@engineering.uiowa.edu>.
On Tue, 28 Apr 2009, Steve Freegard wrote:

> Nah - I really don't like it that way; it doesn't really bring you any
> benefit and is more likely to cause collisions if you do it that way.
> Don't see how it can cause less DNS traffic either.  At least using MD5
> hashes your DNS query will only be 32 characters + blacklist zone name
> regardless of the size of the input string.
>
> To reduce the likelihood of collisions then it's better to add the input
> string length at the end of the md5 like ClamAV does in its MD5 sigs e.g.
>
> smf@laptop-smf:~$ perl -MDigest::MD5 -e '$email="sa\@fsg.com"; print
> Digest::MD5::md5_hex($email).length($email).".emailbl.org\n"'
> c18782f8d94595d5e016e3ab9ab3f8f610.emailbl.org
>
> This also has the benefit of making it impossible to reverse the list if
> the spammer were to rsync the list.

Silly question, given that RFC-2181 says that you can put almost anything
you want into a DNS zone file, why go to the bother with the munging, why 
not just put the raw unadulterated e-mail address in there and do direct 
queries on it?

EG: nslookup systems@administrativos.com.marc.icaen.uiowa.edu.

Assuming you're running reasonably up-2-date DNS stuff it does just work.


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: emailBL

Posted by Steve Freegard <st...@stevefreegard.com>.
Adam Katz wrote:
> Steve Freegard wrote:
>> I've been thinking about creating an emailBL to target dropboxes used
>> for 419 scams, phishing, russian penpals etc. as I have a reasonable way
>> to collect these in real-time and it would close a lot of doors on these
>> folks provided I can avoid being caught by address stuffing.
>>
>> However - rather than trying to do some sort of munging to work with
>> DNS; I was simply going to either MD5 or SHA1 the e-mail address e.g.
>>
>> smf@laptop-smf:~$ perl -MDigest::MD5 -e 'print
>> Digest::MD5::md5_hex("sa\@fsg.com").".emailbl.org\n"'
>> c18782f8d94595d5e016e3ab9ab3f8f6.emailbl.org
> 
> I'm under the impression that DNSBLs reverse the IP address (e.g.
> 2.0.0.127.bl.spamcop.net) so the hierarchical ordering can be
> preserved

Yeah - only really relevant if you are listing IP addresses and want to
use wildcards in the DNS zone.  For listing e-mail addresses it isn't
relevant.

>  but checksumming would be *significantly* better than my
> proposal.  Perhaps just the username, and perhaps a tighter hash (more
> collisions, less DNS traffic), e.g. for your proffered sa @ fsg.com:
> 
> $ perl -MDigest::MD5 -e 'print substr(Digest::MD5::md5_hex("sa"),16) .
> ".fsg.com.emailbl.org\n"'
> 7e1e9e4aedb8242d.fsg.com.emailbl.org
> $
> 

Nah - I really don't like it that way; it doesn't really bring you any
benefit and is more likely to cause collisions if you do it that way.
Don't see how it can cause less DNS traffic either.  At least using MD5
hashes your DNS query will only be 32 characters + blacklist zone name
regardless of the size of the input string.
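
To put rough numbers on the collision argument, the usual birthday
approximation can be used: for n listed addresses and a b-bit hash, the
chance of at least one collision is about 1 - exp(-n(n-1)/2^(b+1)). A
back-of-the-envelope Python sketch. The list size of one million entries
is an arbitrary assumption, and the 64-bit case corresponds to keeping 16
hex characters of the digest:

```python
import math

def collision_probability(n_entries, hash_bits):
    """Birthday approximation for at least one collision among n_entries."""
    exponent = -n_entries * (n_entries - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)

for bits in (64, 128):
    print(bits, collision_probability(1_000_000, bits))
```

Under these assumptions the truncated hash is many orders of magnitude
more collision-prone than the full digest, though both figures are small
for a list of this size.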

To reduce the likelihood of collisions, it's better to append the input
string length to the md5, as ClamAV does in its MD5 sigs, e.g.

smf@laptop-smf:~$ perl -MDigest::MD5 -e '$email="sa\@fsg.com"; print
Digest::MD5::md5_hex($email).length($email).".emailbl.org\n"'
c18782f8d94595d5e016e3ab9ab3f8f610.emailbl.org

This also has the benefit of making the list much harder to reverse if
a spammer were to rsync it, although an address the spammer already
knows can still be checked by hashing it.
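
For anyone not using Perl, the same key can be produced elsewhere. A
Python sketch of the digest-plus-length scheme, assuming the length is
counted the way length($email) counts it above (for plain ASCII addresses
characters and bytes agree). The function name is illustrative only:

```python
import hashlib

def md5_with_length(address, zone="emailbl.org"):
    """MD5 hex digest of the address, followed by its length, under the zone."""
    digest = hashlib.md5(address.encode("utf-8")).hexdigest()
    return f"{digest}{len(address)}.{zone}"

print(md5_with_length("sa@fsg.com"))
```

For "sa@fsg.com" the label is 34 characters long: 32 hex digits followed
by "10".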

>> If you want to separate stuff out into different meanings e.g. the
>> Google Anti-Phishing stuff; then just use a different sub-domain for each.
> 
> Ah, but DNSBLs and URIBLs already have that ability; they can answer
> anything in the 127.0.0.0/8 space.  Using a different sub-domain would
> mean differing DNS lookups, which means more traffic (which is why if
> you look at the SA code for Spamhaus's DNSBL, all queries go to
> zen.spamhaus.org).
> 

Yeah - you're absolutely right.  Be sure to read
http://tools.ietf.org/html/draft-irtf-asrg-bcp-blacklists-05 if you are
going to publish a public list.

Regards,
Steve.

Re: emailBL

Posted by Adam Katz <an...@khopis.com>.
Steve Freegard wrote:
> I've been thinking about creating an emailBL to target dropboxes used
> for 419 scams, phishing, russian penpals etc. as I have a reasonable way
> to collect these in real-time and it would close a lot of doors on these
> folks provided I can avoid being caught by address stuffing.
> 
> However - rather than trying to do some sort of munging to work with
> DNS; I was simply going to either MD5 or SHA1 the e-mail address e.g.
> 
> smf@laptop-smf:~$ perl -MDigest::MD5 -e 'print
> Digest::MD5::md5_hex("sa\@fsg.com").".emailbl.org\n"'
> c18782f8d94595d5e016e3ab9ab3f8f6.emailbl.org

I'm under the impression that DNSBLs reverse the IP address (e.g.
2.0.0.127.bl.spamcop.net) so the hierarchical ordering can be
preserved, but checksumming would be *significantly* better than my
proposal.  Perhaps just the username, and perhaps a tighter hash (more
collisions, less DNS traffic), e.g. for your proffered sa @ fsg.com:

$ perl -MDigest::MD5 -e 'print substr(Digest::MD5::md5_hex("sa"),16) .
".fsg.com.emailbl.org\n"'
7e1e9e4aedb8242d.fsg.com.emailbl.org
$

> If you want to separate stuff out into different meanings e.g. the
> Google Anti-Phishing stuff; then just use a different sub-domain for each.

Ah, but DNSBLs and URIBLs already have that ability; they can answer
anything in the 127.0.0.0/8 space.  Using a different sub-domain would
mean differing DNS lookups, which means more traffic (which is why if
you look at the SA code for Spamhaus's DNSBL, all queries go to
zen.spamhaus.org).
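
A sketch of how a client might tell categories apart from a single lookup,
in the style of existing DNSBLs. The mapping of the last octet to a
category is purely hypothetical (the proposal only says .1 for one type,
.2 for another), and the A records are passed in as strings so the example
does not depend on any particular resolver library:

```python
import ipaddress

LIST_NET = ipaddress.ip_network("127.0.0.0/8")

# Hypothetical category codes; a real list would publish its own table.
CATEGORIES = {1: "phishing reply address", 2: "419 dropbox"}

def classify(a_records):
    """Map A-record answers in 127.0.0.0/8 to list categories."""
    hits = []
    for record in a_records:
        ip = ipaddress.ip_address(record)
        if ip not in LIST_NET:
            continue  # not a list answer, ignore it
        hits.append(CATEGORIES.get(ip.packed[-1], "unknown"))
    return hits

print(classify(["127.0.0.1", "127.0.0.2"]))
```

One query to one zone can then carry several meanings, which is the
traffic saving argued for above.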

Re: emailBL

Posted by Steve Freegard <st...@stevefreegard.com>.
Adam Katz wrote:
> (note, I'm guessing at the appropriate mailing list for cross-post)
> 
> Dennis Davis wrote:
>> http://code.google.com/p/anti-phishing-email-reply/
>>
>> is also useful as it attempts to detail the compromised accounts.
>> Just block/quarantine email for those accounts.
> 
> Interesting ... this seems like it would be best served by DNS in a
> manner similar to URIBLs ... does such an "emailBL" exist?
> 
> A lookup for 8help@osu.edu (pulled from the live list) on emailBL
> server "emailbl.org" could look like this:
> 
> $ host 8help.AT.osu.edu.emailbl.org
> 8help.AT.osu.edu.emailbl.org has address 127.0.0.1
> $ host -t txt 8help.AT.osu.edu.emailbl.org
> 8help.AT.osu.edu.emailbl.org has descriptive text "20090310"
> $
> 
> This maps 127.0.0.1 to type A, .2 to type B, etc.  Expirations, if
> even necessary given the fact that the DNS entries should be updated
> by the server, would be in the TXT records as illustrated above.
> 
> Since email addresses contain everything a valid domain can contain,
> the user.AT.domain.tld (which is really user.at.domain.tld since
> domains are not case-sensitive) could be ambiguous if the "user" or
> the "domain" contains ".at." in itself, or whatever workaround we
> create.  My proposed workaround is ".real-at." and an incremented
> numeric suffix like ".real-at2." if needed.  As to pluses, just snip
> them and their trailing data out.
> 
> 8help@osu.edu -> 8help.at.osu.edu
> portal.ac.at.edu@live.com -> portal.ac.at.edu.real-at.live.com
> 123+456@789.xyz -> 123.at.789.xyz
> abc.real-at.def@ghi.jkl -> abc.real-at.def.real-at1.ghi.jkl
> mno.real-at5.pqr@stu.vwx -> mno.real-at5.pqr.real-at6.stu.vwx
> y.real-at999.z@a.at.real-at2.bc ->
>     y.real-at4.z.real-at1000.a.at.real-at999.bc
> 
> This workaround should only find trouble when there are so many digits
> that the overflow creates an invalid email address, which isn't a
> realistic problem.
> 
> (Oh crap, is this a draft for an RFC?)
> 

I've been thinking about creating an emailBL to target dropboxes used
for 419 scams, phishing, russian penpals etc. as I have a reasonable way
to collect these in real-time and it would close a lot of doors on these
folks provided I can avoid being caught by address stuffing.

However - rather than trying to do some sort of munging to work with
DNS; I was simply going to either MD5 or SHA1 the e-mail address e.g.

smf@laptop-smf:~$ perl -MDigest::MD5 -e 'print
Digest::MD5::md5_hex("sa\@fsg.com").".emailbl.org\n"'
c18782f8d94595d5e016e3ab9ab3f8f6.emailbl.org

If you want to separate stuff out into different meanings e.g. the
Google Anti-Phishing stuff; then just use a different sub-domain for each.

Just an idea.

Cheers,
Steve.

Re: emailBL

Posted by John Hardin <jh...@impsec.org>.
On Mon, 27 Apr 2009, David B Funk wrote:

> On Mon, 27 Apr 2009, John Hardin wrote:
>
>>  How about "_at_" - I think a leading and trailing underscore will be
>>  very rare in real world domain name parts, especially as you can't
>>  register a domain name having an underscore, and many apps will
>>  discard hostnames with underscores as invalid.
>
> Ever seen a Microsoft AD "SRV" DNS query? Try something like:
>  "_gc._tcp.Default-First-Site._sites.win.ccad.uiowa.edu."

I don't treat anything from Microsoft as a reference implementation.

:)

> Haven't seen one that contains leading and trailing underscores, -yet-.
>
>>  Will the DNS server choke on that? Remember, it only has to be valid
>>  within the scope of a DNS query.
>
> All modern DNS servers will handle it. One of the newer RFCs for
> DNS pretty much throws things wide open (can you say UTF-8 and
> internationalization?).

...yeah I'd forgotten about that. It looks like the MD5 solution is 
probably the most reliable.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   A sword is never a killer, it is but a tool in the killer's hands.
                           -- Lucius Annaeus Seneca (Martial) 4BC-65AD
-----------------------------------------------------------------------
  96 days since Obama's inauguration and still no unicorn!

Re: emailBL

Posted by SM <sm...@resistor.net>.
At 14:54 27-04-2009, David B Funk wrote:
>On Mon, 27 Apr 2009, John Hardin wrote:
>
>>How about "_at_" - I think a leading and trailing underscore will be very
>>rare in real world domain name parts, especially as you can't register
>>a domain name having an underscore, and many apps will discard hostnames
>>with underscores as invalid.
>
>Ever seen a Microsoft AD "SRV" DNS query? Try something like:
>  "_gc._tcp.Default-First-Site._sites.win.ccad.uiowa.edu."
>
>Haven't seen one that contains leading and trailing underscores, -yet-.

The previous comment was about hostnames.  An underscore is not a 
valid character for a hostname.  The example you gave is not a hostname.
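
The distinction can be made concrete. Hostname labels are restricted to
letters, digits and hyphens (RFC 952 and RFC 1123), while DNS itself
accepts other octets in a label. A small Python sketch of the hostname
rule. The regular expression here is shorthand for that rule, not taken
from any library:

```python
import re

# One hostname label: letters/digits, hyphens only in the middle, max 63 chars.
HOSTNAME_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

def is_valid_hostname(name):
    """True if every dot-separated label follows the letter-digit-hyphen rule."""
    labels = name.rstrip(".").split(".")
    return all(HOSTNAME_LABEL.match(label) for label in labels)

print(is_valid_hostname("win.ccad.uiowa.edu"))  # True
print(is_valid_hostname("_gc._tcp.Default-First-Site._sites.win.ccad.uiowa.edu."))  # False
```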

Regards,
-sm 


Re: emailBL

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Mon, 27 Apr 2009, John Hardin wrote:

> How about "_at_" - I think a leading and trailing underscore will be very
> rare in real world domain name parts, especially as you can't register
> a domain name having an underscore, and many apps will discard hostnames
> with underscores as invalid.

Ever seen a Microsoft AD "SRV" DNS query? Try something like:
  "_gc._tcp.Default-First-Site._sites.win.ccad.uiowa.edu."

Haven't seen one that contains leading and trailing underscores, -yet-.

> Will the DNS server choke on that? Remember, it only has to be valid
> within the scope of a DNS query.

All modern DNS servers will handle it. One of the newer RFCs for
DNS pretty much throws things wide open (can you say UTF-8 and
internationalization?).

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: emailBL

Posted by Adam Katz <an...@khopis.com>.
mouss:

My list has been using an md5sum hash for the username portion of the
email address for a while now.  Before that, it replaced any
nonstandard characters with dashes.  Please see my other emails in this
lengthy thread.

Re: emailBL

Posted by mouss <mo...@ml.netoyen.net>.
John Hardin a écrit :
> On Mon, 27 Apr 2009, Karsten Bräckelmann wrote:
> 
>>> y.real-at999.z @ a.at.real-at2.bc ->
>>>     y.real-at999.z.real-at1000.a.at.real-at2.bc
>>
>> Still ambiguous. So the generated s/at/real-at$n/ is the last occurrence
>> of a numbered "real-at" plus 1.
>>
>> What if we need it twice, and there are 3 such thingies in total? How do
>> we know we only need to "decode" 1 -- or do we need to decode 2? Or maybe
>> even all three, if they start at 1...
>>
>> Sorry, Adam. ;)
> 
> How about "_at_" - I think a leading and trailing underscore will be
> very rare in real world domain name parts, especially as you can't
> register a domain name having an underscore, and many apps will discard
> hostnames with underscores as invalid.
> 
> Will the DNS server choke on that? Remember, it only has to be valid
> within the scope of a DNS query.
> 

just use a TXT record (look at DKIM for an example).

>  y.real-at999.z @ a.at.real-at2.bc ->
>      y.real-at999.z._at_.a.at.real-at2.bc
> 

but don't forget addresses like:

päl.émànïtè  @  example.com
"joe @ smith.example" @ example.org
...



Re: emailBL

Posted by John Hardin <jh...@impsec.org>.
On Mon, 27 Apr 2009, Karsten Bräckelmann wrote:

>> y.real-at999.z @ a.at.real-at2.bc ->
>>     y.real-at999.z.real-at1000.a.at.real-at2.bc
>
> Still ambiguous. So the generated s/at/real-at$n/ is the last occurrence
> of a numbered "real-at" plus 1.
>
> What if we need it twice, and there are 3 such thingies in total? How do
> we know we only need to "decode" 1 -- or do we need to decode 2? Or maybe
> even all three, if they start at 1...
>
> Sorry, Adam. ;)

How about "_at_" - I think a leading and trailing underscore will be very 
rare in real world domain name parts, especially as you can't register 
a domain name having an underscore, and many apps will discard hostnames
with underscores as invalid.

Will the DNS server choke on that? Remember, it only has to be valid 
within the scope of a DNS query.

  y.real-at999.z @ a.at.real-at2.bc ->
      y.real-at999.z._at_.a.at.real-at2.bc

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   A sword is never a killer, it is but a tool in the killer's hands.
                           -- Lucius Annaeus Seneca (Martial) 4BC-65AD
-----------------------------------------------------------------------
  96 days since Obama's inauguration and still no unicorn!

Re: Next Version of SA and New Rule Updates

Posted by Raymond Dijkxhoorn <ra...@prolocation.net>.
Hi!

> Any idea when we can expect a new version of SA or new rule
> updates? We are getting hit pretty hard with spam lately.

Feel free to submit rules, don't just sit and wait. ;)

Bye,
Raymond.

Re: Next Version of SA and New Rule Updates

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
Removing the quoted body and changing the Subject after hitting the
Reply button doesn't make it a new post. It is still a reply. Aka
"please don't hijack unrelated threads".

Frankly, I'm almost surprised to see *that* old a version of Lotus Notes
actually honor and set an In-Reply-To header at all. :)


On Mon, 2009-04-27 at 16:58 -0400, Jeremy Davila wrote:
> Any idea when we can expect a new version of SA or new rule
> updates?

When it is done. So much for the standard Open Source answer. You did
read some recent posts talking about "3.3 work ongoing"?

Which version are you running?

> We are getting hit pretty hard with Spam lately.

Did you verify it's not a mis-configuration that's letting too much spam
slip through? Did you try adding some of the local tweaking stuff,
custom rule-sets and third-party plugins that are discussed and
recommended here kind of on a weekly basis?


Oh, and BTW -- your add_header X-Spam-Report setting is broken. :-)

  guenther


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Next Version of SA and New Rule Updates

Posted by Jeremy Davila <JD...@languageworks.com>.
Any idea when we can expect a new version of SA or new rule updates?
We are getting hit pretty hard with spam lately.

Re: emailBL

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
> y.real-at999.z @ a.at.real-at2.bc ->
>     y.real-at999.z.real-at1000.a.at.real-at2.bc

Still ambiguous. So the generated s/at/real-at$n/ is the last occurrence
of a numbered "real-at" plus 1.

What if we need it twice, and there are 3 such thingies in total? How do
we know we only need to "decode" 1 -- or do we need to decode 2? Or maybe
even all three, if they start at 1...

Sorry, Adam. ;)


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: emailBL

Posted by Adam Katz <an...@khopis.com>.
Adam Katz wrote:
> (note, I'm guessing at the appropriate mailing list for cross-post)

Failure.  I've sent a lead developer a link to an online cache of my
post.

Also, I borked my last example, and online caching sites' defanging
techniques make this proposal impossible to read, so I've spaced out
my refined examples:

8help @ osu.edu -> 8help.at.osu.edu
portal.ac.at.edu @ live.com -> portal.ac.at.edu.real-at.live.com
123+456 @ 789.xyz -> 123.at.789.xyz
abc.real-at.def @ ghi.jkl -> abc.real-at.def.real-at1.ghi.jkl
mno.real-at5.pqr @ stu.vwx -> mno.real-at5.pqr.real-at6.stu.vwx
y.real-at999.z @ a.at.real-at2.bc ->
    y.real-at999.z.real-at1000.a.at.real-at2.bc
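
One reading of the rule behind these examples: treat "at", "real-at",
"real-at1", "real-at2", ... as an ordered series of separator labels, and
join the local part and domain with the first label that ranks above every
such label already present. A Python sketch of that reading follows. The
helper names are invented, lowercasing the whole address is an assumption,
quoted local parts are not handled, and the spaces around "@" above are
only there to survive defanging, so the function expects a plain
user@domain string:

```python
import re

# Separator labels used by the scheme: "at", "real-at", "real-at1", ...
_AT_LABEL = re.compile(r"at|real-at([1-9]\d*)?")

def _rank(label):
    """0 for 'at', 1 for 'real-at', N+1 for 'real-atN'; None for other labels."""
    match = _AT_LABEL.fullmatch(label)
    if match is None:
        return None
    if label == "at":
        return 0
    return 1 + int(match.group(1) or 0)

def _separator(rank):
    """Inverse of _rank: the separator label for a given rank."""
    if rank == 0:
        return "at"
    return "real-at" if rank == 1 else f"real-at{rank - 1}"

def encode(address):
    """Encode user@domain as a DNS-style name, dropping any +tag."""
    local, _, domain = address.lower().rpartition("@")
    local = local.split("+", 1)[0]
    ranks = [r for r in map(_rank, f"{local}.{domain}".split(".")) if r is not None]
    # Pick the first separator that outranks every look-alike label present.
    rank = max(ranks) + 1 if ranks else 0
    return f"{local}.{_separator(rank)}.{domain}"

print(encode("y.real-at999.z@a.at.real-at2.bc"))
```

A decoder under the same reading would split at the highest-ranked
separator label in the name.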