Posted to users@spamassassin.apache.org by Christian Recktenwald <sp...@citecs.de> on 2013/03/20 13:59:37 UTC

Re: URL and mail address RBL [was: Hot News]

On Wed, Mar 20, 2013 at 10:26:21AM +0000, Steve Freegard wrote:
> Listing e-mail addresses and URL paths could be done by normalizing them 

yup 

> (e.g. lower-case, stripping query parameters etc.) 

Not necessarily - as I see it, there would be use cases for complete URLs
as well as for stripped ones, maybe even for the domain part only.
A further aspect: there are URLs pointing clearly to spammy sites,
and others (I see them often in 419s) pointing to
a completely legit page (say, an article on bbc.co.uk) used
for illustrative purposes only.

Similarly for email addresses - the domain part or the full address
may be of some value depending on the situation.
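
To make these variants concrete, here is a minimal Python sketch. The
function names url_keys and email_keys are made up for illustration, and
"domain" here simply means the host name as it appears in the URL or
address, not the registered domain.

```python
from urllib.parse import urlsplit


def url_keys(url):
    """Return candidate lookup keys for a URL, most specific first."""
    full = url.strip()
    parts = urlsplit(full)
    # urlsplit().hostname is already lower-cased; it is None if no host is present.
    host = parts.hostname or ""
    return {
        "full": full,                      # complete URL, untouched
        "stripped": f"{host}{parts.path}",  # no scheme, query or fragment
        "domain": host,                    # host part only (assumption: not the registered domain)
    }


def email_keys(address):
    """Return candidate lookup keys for an email address."""
    local, _, domain = address.strip().rpartition("@")
    domain = domain.lower()  # only the domain is lower-cased here
    return {"full": f"{local}@{domain}", "domain": domain}


print(url_keys("http://WWW.Example.com/Path/page.html?id=42#top"))
print(email_keys("John.Doe@Example.COM"))
```

Which of the keys gets listed would then depend on the case: the full
URL for a clearly spammy page, and nothing at all for the legit page
that is only linked for illustration.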

> and then hashing them 
> (e.g. MD5/SHA1 etc) and listing the hash.

Good idea. Hashing should completely circumvent character issues.
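
A rough sketch of the hashing step, assuming the list would be served as
a DNS zone that is queried by hash. The zone name hashbl.example.invalid
and the function hashed_query_name are hypothetical, and lower-casing
the whole key is just one possible normalization.

```python
import hashlib


def hashed_query_name(key, zone="hashbl.example.invalid", algo="sha1"):
    """Build a DNS query name from the hash of a normalized key.

    The zone name is a placeholder, not a real list.
    """
    normalized = key.strip().lower()
    # Hash the UTF-8 bytes, so any character in the key ends up as plain hex.
    digest = hashlib.new(algo, normalized.encode("utf-8")).hexdigest()
    return f"{digest}.{zone}"


print(hashed_query_name("John.Doe@Example.COM"))
print(hashed_query_name("www.example.com/path/page.html", algo="md5"))
```

The hex digest is 32 characters for MD5 and 40 for SHA1, so it fits in a
single DNS label (limit 63 octets) whatever the original URL or address
contained.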

> As you say though - the issue is collecting the data and populating the 
> lists, along with maintaining the rest of the infrastructure that serves it.

How about honeynet.org?