Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2005/03/08 07:23:10 UTC

Re: Re[4]: learn_with_whitelist?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Robert Menschel writes:
> Hello Justin,
> 
> Monday, March 7, 2005, 11:34:00 AM, you wrote:
> 
> JM> Yes, it'd sound great!   That and whitelist_from_spf entries
> JM> (arriving shortly) ;)
> 
> Definitely to be part of this offering!  Look forward to it!
> 
> JM> However we'd want to ensure we can get it into the main distro,
> JM> btw, and I'm not sure we've figured out the ins-and-outs of
> JM> putting SARE stuff into the SpamAssassin main distro (with CLAs
> JM> and all that).  or have we?
> 
> Not in the general case, no, but in this case, where the general
> process of collecting/providing the rules is from a CLA contributor
> (me), and any individual whitelist rule is trivial and therefore does
> not need a CLA as long as the intent to contribute is clearly stated,
> then I think we're covered.
> 
> Alternately, if individual whitelist rules are not considered trivial
> enough to not need a CLA for their submission, we can instead ask for
> copies of the headers of emails, and have CLA contributors create the
> whitelist rules from those, which would ensure all contributed
> whitelist rules are covered by CLA.

I think individual whitelist_foo lines are indeed trivial enough.
A block of like 10 or so probably wouldn't though... but the
building-from-samples case does take care of that as you say.
Cool, sounds good to me ;)

- --j.

> Bob Menschel
> 
> JM> Robert Menschel writes:
> >> Saturday, March 5, 2005, 11:40:54 PM, Matt wrote:
> >> 
> >> MK> If you really need a whitelist, use whitelist_from_rcvd entries,
> >> MK> ...
> >> 
> >> I've been using William Stearns' compiled blacklist available at
> >> http://www.stearns.org/sa-blacklist/sa-blacklist.current.cf and have
> >> contributed to it from time to time.
> >> 
> >> I'm wondering whether it'd be worth while to have a similar file of
> >> contributed whitelist entries (whitelist_from_rcvd rules), contributed
> >> by the community, maintained by one or two people, and available to
> >> the community. 
> >> 
> >> The purpose would be to provide reliable (non-abusable) whitelist
> >> records for common vendors, newsletters, and other sources of email
> >> whose content can sometimes cause false positives (such as when a
> >> Washington Post newsletter mentions events in Nigeria, drug use, and
> >> advances in high-end time keeping apparel all in the same issue).
> >> 
> >> It can also help web site admins receive admin emails from their hosts
> >> and domain name registrars, without those emails being flagged as spam
> >> by search engine spam rules and others.
> >> 
> >> Some sample rules I've collected:
> >> 
> >> whitelist_from_rcvd   no.reply@1and1.com             kundenserver.de
> >> whitelist_from_rcvd   *@info.aa.com                  aa.com 
> >> whitelist_from_rcvd   *@yahoo.americangreetings.com  americangreetings.com
> >> whitelist_from_rcvd   *@mail.cnn.com                 cnn.com
> >> whitelist_from_rcvd   info@govdelivery.com           agiliti.net
> >> whitelist_from_rcvd   insurance.ca.gov               agiliti.net
> >> whitelist_from_rcvd   *@*.nordstrom.com              nordstrom.com
> >> whitelist_from_rcvd   *@nordstrom.com                nordstrom.com
> >> whitelist_from_rcvd   *@pageaday.com                 workman.com
> >> whitelist_from_rcvd   *@hallmark.com                 hallmark.com
> >> whitelist_from_rcvd   *@*.hallmark.com               hallmark.com
> >> whitelist_from_rcvd   *@freelancejobreport.com       freelance-2.unknowndns.net
> >> 
> >> Do others think this type of collection might be worth while?
> >> 
> >> Is anyone else already providing this type of collection?
> >> 
> >> If Yes to the first, and No to the second, would such a collection
> >> maintained within SARE, with contributions from the entire community,
> >> be of interest?
> >> 
> >> Bob Menschel
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCLUTNMJF5cimLx9ARAiTLAJsEoLlXOODikBsAIWpALrpKnUkH6ACeKtmx
FhEnt6JOhECMyXKWUPN9zp8=
=jS1A
-----END PGP SIGNATURE-----

Re: Whitelist collection project

Posted by Jeff Chan <je...@surbl.org>.

On Wednesday, March 9, 2005, 6:20:10 PM, Robert Menschel wrote:
> Tuesday, March 8, 2005, 8:44:43 PM, Daryl wrote:

>>> Assumption: This activity will focus only on public newsletters,
>>> services, etc., which normally do not contain any private
>>> information. Therefore there will not be any privacy or
>>> confidentiality concerns for the great majority of emails from
>>> these sources.    

DCWOS>> What about emails from banks etc?  I'd think they'd be a good
DCWOS>> candidate for something you want to whitelist based on their
DCWOS>> received headers or SPF.

> Excellent class of whitelist entities, and with high
> privacy/confidentiality concerns.  I don't want people sending banking
> userids or passwords or such around by email.
[...]

DCWOS>> Then again I don't know how many submissions you are expecting,
DCWOS>> so that may be overkill.

> Over time I would expect many hundreds -- all banks, credit unions,
> airlines, hotel chains, retail chains, newspapers, major magazines,
> major NPOs, etc.

> Bob Menschel

If there are going to be different sources and different reasons
for whitelisting, it will probably be useful to take note of
that source/reason info for each record.  If that's a comment
after each record, or a field in a database, or both, it would
probably be helpful for managing these records in the long run.
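
To picture what keeping the source/reason with each record might look
like, here is a rough Python sketch that stores them as fields and emits
them as a '#' comment line above each rule. The WhitelistRecord type, its
field names and the sample values are made up for illustration; they are
not part of SpamAssassin or SARE. The def_whitelist_from_rcvd form is the
one named in the draft proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhitelistRecord:
    """Hypothetical record layout; the field names are illustrative only."""
    address: str  # From-address glob, e.g. "*@mail.cnn.com"
    relay: str    # relay rDNS domain, e.g. "cnn.com"
    source: str   # who submitted the entry
    reason: str   # why it was whitelisted


def render(record: WhitelistRecord) -> str:
    """Emit one rule preceded by a comment line carrying source and reason."""
    comment = f"# source: {record.source}; reason: {record.reason}"
    rule = f"def_whitelist_from_rcvd {record.address} {record.relay}"
    return comment + "\n" + rule


if __name__ == "__main__":
    # Sample values are invented for the demonstration.
    example = WhitelistRecord("*@mail.cnn.com", "cnn.com",
                              "list submission", "news newsletter")
    print(render(example))
```

The same fields could equally live in a database table, with the comment
generated whenever the rules file is written out.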

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/

Re: Whitelist collection project

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.

Robert Menschel wrote:
> Hello Daryl,
> 
> Thursday, March 10, 2005, 5:51:26 PM, you wrote:
> DCWOS> Whatever your favourite way of retrieving DNS records is, will work.
> DCWOS> On Windows you could use nslookup, at a command prompt: ...
> 
> Thanks.  That's a good start.  Now, how will I know when a domain has
> an SPF record to validate upon?  What do they look like?
> 
> When I do this on your domain, I see a TXT record that begins
> 
>>v=spf1
> 
> and has what appears to be two SMTP system addresses (a:), and an
> ~all.

My own domain's SPF record doesn't end in ~all so I'm not sure which 
domain you queried.

> Would I be correct to read it as,
> 
>>If you receive email that you can verify comes from one of these two
>>SMTP machines, then you can be confident that the email does indeed
>>come from this domain.
>>However, the ~all indicates that we do not limit all email users to
>>these two machines, and you could receive valid domain email from
>>other sources.

You've pretty much got it... ~all stands for soft fail, which does mean 
that there may be some mail that comes from unlisted hosts, but most 
should come from the listed hosts.

Check out http://spf.pobox.com for more info.

If you go to that page and enter a domain that already has an SPF record 
in the text box on the left side of the page, it'll explain what each 
part of the record means half way down the page.
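
As a rough illustration of reading such a record, the Python sketch below
splits an SPF string into its mechanisms and reports what the trailing
"all" term says. The qualifier meanings (+ pass, - fail, ~ softfail,
? neutral, with + as the default) come from the SPF specification; the
function names and the example.com hosts are made up for the example, and
the parsing is deliberately simplified (no macro or include handling).

```python
import re

# Qualifier characters and their results, per the SPF specification.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

# Terms such as "redirect=..." or "exp=..." are modifiers, not mechanisms.
MODIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9._-]*=")


def parse_spf(record):
    """Return a list of (result, mechanism) pairs for an SPF v1 record."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    parsed = []
    for term in terms[1:]:
        if MODIFIER.match(term):
            continue  # skip modifiers in this simplified sketch
        if term[0] in QUALIFIERS:
            qualifier, mechanism = term[0], term[1:]
        else:
            qualifier, mechanism = "+", term  # no qualifier means "+"
        parsed.append((QUALIFIERS[qualifier], mechanism))
    return parsed


def all_policy(record):
    """Return the result attached to the 'all' mechanism, or None."""
    for result, mechanism in parse_spf(record):
        if mechanism.lower() == "all":
            return result
    return None


if __name__ == "__main__":
    # Hypothetical record with two listed hosts and a soft fail.
    print(all_policy("v=spf1 a:mail1.example.com a:mail2.example.com ~all"))
```

A record ending in ?all would come back as "neutral", which is the kind
of record that says little about unlisted hosts.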

Daryl

Re[2]: Whitelist collection project

Posted by Robert Menschel <Ro...@Menschel.net>.

Hello Daryl,

Thursday, March 10, 2005, 5:51:26 PM, you wrote:

DCWOS> Robert Menschel wrote:
>> And that leads to the second question: what's the best way for an "end
>> user" to obtain/verify SPF records? I have all the capabilities of XP
>> (shudder) and Cygwin readily available, and can get Linux command-line
>> capabilities via SSH to SARE's server, I believe.

DCWOS> Whatever your favourite way of retrieving DNS records is, will work.
DCWOS> On Windows you could use nslookup, at a command prompt: ...

Thanks.  That's a good start.  Now, how will I know when a domain has
an SPF record to validate upon?  What do they look like?

When I do this on your domain, I see a TXT record that begins
> v=spf1
and has what appears to be two SMTP system addresses (a:), and an
~all.

Would I be correct to read it as,
> If you receive email that you can verify comes from one of these two
> SMTP machines, then you can be confident that the email does indeed
> come from this domain.
> However, the ~all indicates that we do not limit all email users to
> these two machines, and you could receive valid domain email from
> other sources.

Thanks again.

Bob Menschel

Re: Whitelist collection project

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.

Robert Menschel wrote:
> And that leads to the second question: what's the best way for an "end
> user" to obtain/verify SPF records? I have all the capabilities of XP
> (shudder) and Cygwin readily available, and can get Linux command-line
> capabilities via SSH to SARE's server, I believe.

Whatever your favourite way of retrieving DNS records is, will work.

On Windows you could use nslookup, at a command prompt:

nslookup	<enter>
set type=txt	<enter>
somedomain.com	<enter>
another.ca	<enter>
exit		<enter>	when you're done


On a *nix system you could use:   dig txt somedomain.com
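
If you end up checking many domains, the dig lookup could be wrapped in a
small script. The Python sketch below is only one possible way to do it:
it assumes dig is installed and on the PATH, the function names are made
up, and it does not rejoin long TXT records that dig prints as several
quoted strings.

```python
import subprocess
import sys


def spf_records(txt_lines):
    """Return the TXT strings that look like SPF version 1 records."""
    found = []
    for line in txt_lines:
        text = line.strip().strip('"')  # dig +short wraps TXT data in quotes
        lowered = text.lower()
        if lowered == "v=spf1" or lowered.startswith("v=spf1 "):
            found.append(text)
    return found


def lookup_spf(domain):
    """Run `dig +short txt DOMAIN` and keep only the SPF records."""
    result = subprocess.run(["dig", "+short", "txt", domain],
                            capture_output=True, text=True, check=True)
    return spf_records(result.stdout.splitlines())


if __name__ == "__main__":
    # Usage: python spfcheck.py somedomain.com another.ca
    for name in sys.argv[1:]:
        print(name, lookup_spf(name))
```

An empty list for a domain simply means no TXT record starting with
v=spf1 was returned.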


Daryl

Re[2]: Whitelist collection project

Posted by Robert Menschel <Ro...@Menschel.net>.

Hello Daryl,

[BTW, thanks to you and others for the direct response/cc in addition
to the list posting. I get the list by digest, and so list-only
responses don't get to me until the digest is released.]

Wednesday, March 9, 2005, 6:55:50 PM, you wrote:

>> DCWOS> Don't forget about the new whitelist_from_spf capability that
>> DCWOS> should be in the next major release.
>> 
>> Not forgetting about it.  Looking forward to it.  However, hoping to
>> have a first whitelist.cf file out before 3.1 is officially released,
>> we can't very well include whitelist_from_spf rules in there, at least
>> not immediately.

DCWOS> Of course!  I wasn't sure on the time line for this whitelist,
DCWOS> or 3.1 for that matter.

Not sure about 3.1, but hope to have the first release of the
whitelist file out in a couple of weeks.

DCWOS> I believe you'd need to use two separate if structures:

DCWOS> if (version >= 3.001000)
DCWOS> 	whitelist_from_spf
DCWOS> endif

DCWOS> if (version < 3.001000)
DCWOS> 	whitelist_from_rcvd
DCWOS> endif

Captured and will test/verify.

Related question to anyone/everyone: I've been unable to get
mass-check results on whitelist_from_rcvd ... I can test against
individual emails using
> spamassassin --prefs-file=file <testemail >outputfile
but nothing I've done has gotten any results through mass-check and
hit-frequencies. Is there a way to get hit-frequencies results from
whitelist_from_rcvd?

>> DCWOS> For info on whitelist_from_spf (and def_whitelist_from_spf)
>> DCWOS> implementation see bug 3487.
>> 
>> Thanks for the pointer. This functionality is active in the current
>> svn bleeding edge version, yes? If so, then perhaps later this week I
>> can download that and begin playing with it.

DCWOS> I haven't committed it yet, but the latest patch in the bug will
DCWOS> probably be what is used unless somebody thinks of something I
DCWOS> missed.

I'll probably proceed with whitelist.cf release 1 strictly on _rcvd
then, to get it out this month, and will roll in _spf capabilities
once it's stable.

DCWOS> It'd be extremely unlikely that the actual rule form would change from
DCWOS> what it is now:		whitelist_from_spf	*@domain
DCWOS> ...just like whitelist_from & blacklist_from.

And that leads to the second question: what's the best way for an "end
user" to obtain/verify SPF records? I have all the capabilities of XP
(shudder) and Cygwin readily available, and can get Linux command-line
capabilities via SSH to SARE's server, I believe.

>> Rather than maintain the rules and then separately a list of domains,
>> IMO it'd be easier to simply make sure the rules are ordered by domain
>> name, and make the online link point to the rules file itself.
>> 
>> I'd rather go to the slight extra effort of making the rules file more
>> readable than try to keep two different files in sync.

DCWOS> Well, if it comes to the point where you implement a database
DCWOS> for submissions, you can automatically generate both the rules
DCWOS> page and the submission page / duplication checks rather
DCWOS> trivially.

Good point!

Bob Menschel

Re: Whitelist collection project

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.

Robert Menschel wrote:
> Hello Daryl,
> 
> Tuesday, March 8, 2005, 8:44:43 PM, you wrote:
> 
> DCWOS> Robert Menschel wrote:
> DCWOS> <snipped a little for brevity>
> 
>>>Summary: A group of volunteers will maintain a collected/distributed
>>>whitelist, using SpamAssassin's whitelist_from_rcvd capabilities,
>>>similar to (but in the opposite direction as) William Stearns'
>>>collected/distributed blacklist at
>>>http://www.stearns.org/sa-blacklist/sa-blacklist.current.cf
> 
> 
> DCWOS> Don't forget about the new whitelist_from_spf capability that
> DCWOS> should be in the next major release.
> 
> Not forgetting about it.  Looking forward to it.  However, hoping to
> have a first whitelist.cf file out before 3.1 is officially released,
> we can't very well include whitelist_from_spf rules in there, at least
> not immediately.

Of course!  I wasn't sure on the time line for this whitelist, or 3.1 
for that matter.


> However, thinking back, we have the ability in SA to do conditional
> rules now, yes? So we could (using meta language rather than the
> actual SA language):
> 
> if version < 3.1.0 then
>    put here the whitelist_from_rcvd rules for which we don't have spf
> else
>    put here the whitelist_from_spf rules for these sources
> end
> put here the whitelist_from_rcvd rules where spf doesn't apply.
> 
> I'll look into that capability and begin testing it.

I believe you'd need to use two separate if structures:

if (version >= 3.001000)
	whitelist_from_spf
endif

if (version < 3.001000)
	whitelist_from_rcvd
endif
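
If the rules file ends up being generated rather than hand-edited, the
two-block layout could be produced along these lines. This Python sketch
is just an illustration: the function name and the (address, relay,
has_spf) tuple layout are invented, and the has_spf flags in the sample
data are placeholders, not checked facts about those domains.

```python
def render_rules(entries):
    """Render whitelist rules, wrapping SPF-capable entries in version checks.

    entries: iterable of (address_glob, relay_domain, has_spf) tuples.
    """
    entries = list(entries)
    with_spf = [(a, r) for a, r, has_spf in entries if has_spf]
    without_spf = [(a, r) for a, r, has_spf in entries if not has_spf]

    lines = []
    if with_spf:
        # 3.1.0 and later: rely on the domain's own SPF record.
        lines.append("if (version >= 3.001000)")
        lines += [f"  whitelist_from_spf {a}" for a, _ in with_spf]
        lines.append("endif")
        # Older releases: fall back to matching the relay.
        lines.append("if (version < 3.001000)")
        lines += [f"  whitelist_from_rcvd {a} {r}" for a, r in with_spf]
        lines.append("endif")
    # Sources without usable SPF get an unconditional rcvd rule.
    lines += [f"whitelist_from_rcvd {a} {r}" for a, r in without_spf]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        ("*@mail.cnn.com", "cnn.com", True),       # has_spf is a placeholder
        ("*@pageaday.com", "workman.com", False),  # has_spf is a placeholder
    ]
    print(render_rules(sample), end="")
```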


> DCWOS> I think we'd like to get away from whitelist_from_rcvd if
> DCWOS> possible for domains that appear to have sensible SPF records
> DCWOS> (records that actually list their hosts and don't use ?all,
> DCWOS> etc). ...   
> 
> Agreed.
> 
> DCWOS> For info on whitelist_from_spf (and def_whitelist_from_spf)
> DCWOS> implementation see bug 3487.
> 
> Thanks for the pointer. This functionality is active in the current
> svn bleeding edge version, yes? If so, then perhaps later this week I
> can download that and begin playing with it.

I haven't committed it yet, but the latest patch in the bug will probably 
be what is used unless somebody thinks of something I missed.

It'd be extremely unlikely that the actual rule form would change from 
what it is now:		whitelist_from_spf	*@domain
...just like whitelist_from & blacklist_from.


>>>Assumption: This activity will focus only on public newsletters,
>>>services, etc., which normally do not contain any private
>>>information. Therefore there will not be any privacy or
>>>confidentiality concerns for the great majority of emails from
>>>these sources.    
> 
> 
> DCWOS> What about emails from banks etc?  I'd think they'd be a good
> DCWOS> candidate for something you want to whitelist based on their
> DCWOS> received headers or SPF.
> 
> Excellent class of whitelist entities, and with high
> privacy/confidentiality concerns.  I don't want people sending banking
> userids or passwords or such around by email.
> 
> We'll have to come up with some guidelines concerning how to
> communicate and validate those.
> 
> 
>>>[Question to the devs: would you agree this is a valid use for the
>>>def_whitelist_from_rcvd rule?]
> 
> 
> DCWOS> I don't see a problem with it, although we may want to create
> DCWOS> an alias for it, such as sare_whitelist_from_(rcvd|spf), to
> DCWOS> prevent confusion between entries included in the distribution
> DCWOS> with those available from SARE.
> 
> That would be excellent. I don't see any way for SARE to create such
> an alias ourselves (though it might be feasible via plugin) -- what
> would be involved in getting such aliases defined?
> 
> And to generalize the alias name more, maybe call it
> 
>>contrib_whitelist_from_rcvd (and _spf)
>>describe: contributed whitelist entry

There are various ways to do it, if the rest of the devs feel there is a 
need for it.


>>>Any submission which already matches a def_whitelist_from_rcvd rule
>>>within the SA distribution will be identified and ignored (after
>>>response back to the submitter). (We are not going to try to develop
>>>pre-3.0.0 files.)
> 
> 
> DCWOS> There aren't too many of them so it shouldn't be a problem.
> DCWOS> I'd suggest listing those domains (along with new domains as
> DCWOS> added) on a web page at your site, along with the info on how
> DCWOS> to submit new domains.
> 
> Rather than maintain the rules and then separately a list of domains,
> IMO it'd be easier to simply make sure the rules are ordered by domain
> name, and make the online link point to the rules file itself.
> 
> I'd rather go to the slight extra effort of making the rules file more
> readable than try to keep two different files in sync.

Well, if it comes to the point where you implement a database for 
submissions, you can automatically generate both the rules page and the 
submission page / duplication checks rather trivially.


>>>Can anyone think of any guidelines to be added or changed?
> 
> 
> DCWOS> You might want to set up a web form for submissions.  You could
> DCWOS> use it (well, the script behind it & a database) to
> DCWOS> automatically filter out duplicate submissions -- but still
> DCWOS> tally submissions for a domain.
> 
> Yep, good enhancement to the idea. We'll use email to get started
> (since there's nothing to build), but a web form is a good way to go.
> 
> DCWOS> Then again I don't know how many submissions you are expecting,
> DCWOS> so that may be overkill.
> 
> Over time I would expect many hundreds -- all banks, credit unions,
> airlines, hotel chains, retail chains, newspapers, major magazines,
> major NPOs, etc.
> 
> Bob Menschel

Daryl

Re[2]: Whitelist collection project

Posted by Robert Menschel <Ro...@Menschel.net>.

Hello Daryl,

Tuesday, March 8, 2005, 8:44:43 PM, you wrote:

DCWOS> Robert Menschel wrote:
DCWOS> <snipped a little for brevity>

>> Summary: A group of volunteers will maintain a collected/distributed
>> whitelist, using SpamAssassin's whitelist_from_rcvd capabilities,
>> similar to (but in the opposite direction as) William Stearns'
>> collected/distributed blacklist at
>> http://www.stearns.org/sa-blacklist/sa-blacklist.current.cf

DCWOS> Don't forget about the new whitelist_from_spf capability that
DCWOS> should be in the next major release.

Not forgetting about it.  Looking forward to it.  However, hoping to
have a first whitelist.cf file out before 3.1 is officially released,
we can't very well include whitelist_from_spf rules in there, at least
not immediately.

However, thinking back, we have the ability in SA to do conditional
rules now, yes? So we could (using meta language rather than the
actual SA language):

if version < 3.1.0 then
   put here the whitelist_from_rcvd rules for which we don't have spf
else
   put here the whitelist_from_spf rules for these sources
end
put here the whitelist_from_rcvd rules where spf doesn't apply.

I'll look into that capability and begin testing it.

DCWOS> I think we'd like to get away from whitelist_from_rcvd if
DCWOS> possible for domains that appear to have sensible SPF records
DCWOS> (records that actually list their hosts and don't use ?all,
DCWOS> etc). ...   

Agreed.

DCWOS> For info on whitelist_from_spf (and def_whitelist_from_spf)
DCWOS> implementation see bug 3487.

Thanks for the pointer. This functionality is active in the current
svn bleeding edge version, yes? If so, then perhaps later this week I
can download that and begin playing with it.

>> Assumption: This activity will focus only on public newsletters,
>> services, etc., which normally do not contain any private
>> information. Therefore there will not be any privacy or
>> confidentiality concerns for the great majority of emails from
>> these sources.    

DCWOS> What about emails from banks etc?  I'd think they'd be a good
DCWOS> candidate for something you want to whitelist based on their
DCWOS> received headers or SPF.

Excellent class of whitelist entities, and with high
privacy/confidentiality concerns.  I don't want people sending banking
userids or passwords or such around by email.

We'll have to come up with some guidelines concerning how to
communicate and validate those.

>> [Question to the devs: would you agree this is a valid use for the
>> def_whitelist_from_rcvd rule?]

DCWOS> I don't see a problem with it, although we may want to create
DCWOS> an alias for it, such as sare_whitelist_from_(rcvd|spf), to
DCWOS> prevent confusion between entries included in the distribution
DCWOS> with those available from SARE.

That would be excellent. I don't see any way for SARE to create such
an alias ourselves (though it might be feasible via plugin) -- what
would be involved in getting such aliases defined?

And to generalize the alias name more, maybe call it
> contrib_whitelist_from_rcvd (and _spf)
> describe: contributed whitelist entry

>> Any submission which already matches a def_whitelist_from_rcvd rule
>> within the SA distribution will be identified and ignored (after
>> response back to the submitter). (We are not going to try to develop
>> pre-3.0.0 files.)

DCWOS> There aren't too many of them so it shouldn't be a problem.
DCWOS> I'd suggest listing those domains (along with new domains as
DCWOS> added) on a web page at your site, along with the info on how
DCWOS> to submit new domains.

Rather than maintain the rules and then separately a list of domains,
IMO it'd be easier to simply make sure the rules are ordered by domain
name, and make the online link point to the rules file itself.

I'd rather go to the slight extra effort of making the rules file more
readable than try to keep two different files in sync.

>> Can anyone think of any guidelines to be added or changed?

DCWOS> You might want to set up a web form for submissions.  You could
DCWOS> use it (well, the script behind it & a database) to
DCWOS> automatically filter out duplicate submissions -- but still
DCWOS> tally submissions for a domain.

Yep, good enhancement to the idea. We'll use email to get started
(since there's nothing to build), but a web form is a good way to go.

DCWOS> Then again I don't know how many submissions you are expecting,
DCWOS> so that may be overkill.

Over time I would expect many hundreds -- all banks, credit unions,
airlines, hotel chains, retail chains, newspapers, major magazines,
major NPOs, etc.

Bob Menschel

Re: Whitelist collection project

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.

Robert Menschel wrote:

<snipped a little for brevity>

> OK, based on what little discussion there's been so far, here's a
> draft proposal for people to think about.
> 
> Summary: A group of volunteers will maintain a collected/distributed
> whitelist, using SpamAssassin's whitelist_from_rcvd capabilities,
> similar to (but in the opposite direction as) William Stearns'
> collected/distributed blacklist at
> http://www.stearns.org/sa-blacklist/sa-blacklist.current.cf

Don't forget about the new whitelist_from_spf capability that should be 
in the next major release.

I think we'd like to get away from whitelist_from_rcvd if possible for 
domains that appear to have sensible SPF records (records that actually 
list their hosts and don't use ?all, etc).  There's no point in keeping 
up with the addition/removal of a domain's hosts if we don't have to 
(which will be more common than the current whitelist_from_rcvd domains 
if there are lots of them added).  It's also somewhat unfair to restrict 
a domain to their current mail hosts due to concerns about 'their 
SpamAssassin score' if we don't have to.

For info on whitelist_from_spf (and def_whitelist_from_spf) 
implementation see bug 3487.

> Assumption: This activity will focus only on public newsletters,
> services, etc., which normally do not contain any private information.
> Therefore there will not be any privacy or confidentiality concerns
> for the great majority of emails from these sources.

What about emails from banks etc?  I'd think they'd be a good candidate 
for something you want to whitelist based on their received headers or SPF.

> Distribution:  The rules file which results from this activity will be
> maintained within the SARE system, as file 70_sare_whitelist.cf -- it
> can be downloaded manually or via RDJ.
> 
> Rules: Since these rules are gathered by the community at large,
> rather than use the whitelist_from_rcvd rule which normally scores
> -100, we will use the def_whitelist_from_rcvd rule, which scores only
> -15. Any site that wishes can copy any of these rules to a
> def_whitelist_from rule to gain the full -100 points.

While it doesn't really matter how it's done, I'd suggest that a user 
just sets the score for def_whitelist_from_rcvd to -100 or whatever they 
want if -15 isn't enough (which I don't think I've seen a case where it 
isn't enough).

def_whitelist_from_spf currently doesn't assign the full -15, unless the 
'From:' header matches the envelope sender (see bug 3487), and would 
have to be scored appropriately if desired.

> [Question to the devs: would you agree this is a valid use for the
> def_whitelist_from_rcvd rule?]

I don't see a problem with it, although we may want to create an alias 
for it, such as sare_whitelist_from_(rcvd|spf), to prevent confusion 
between entries included in the distribution with those available from SARE.

> Any submission which already matches a def_whitelist_from_rcvd rule
> within the SA distribution will be identified and ignored (after
> response back to the submitter). (We are not going to try to develop
> pre-3.0.0 files.)

There aren't too many of them so it shouldn't be a problem.  I'd suggest 
listing those domains (along with new domains as added) on a web page at 
your site, along with the info on how to submit new domains.

> Can anyone think of any guidelines to be added or changed?

You might want to set up a web form for submissions.  You could use it 
(well, the script behind it & a database) to automatically filter out 
duplicate submissions -- but still tally submissions for a domain.
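
The "filter duplicates but still tally" part could be as small as the
Python sketch below. It is only a sketch: the function names are made up,
a plain in-memory Counter stands in for the database, and lower-casing
and trimming the fields before comparing them is an assumption about what
should count as a duplicate.

```python
from collections import Counter


def tally_submissions(submissions):
    """Count submissions per normalised (address, relay) pair."""
    counts = Counter()
    for address, relay in submissions:
        key = (address.strip().lower(), relay.strip().lower())
        counts[key] += 1
    return counts


def unique_rules(counts):
    """One rule per distinct pair, ordered by relay domain, then address."""
    ordered = sorted(counts, key=lambda pair: (pair[1], pair[0]))
    return [f"def_whitelist_from_rcvd {a} {r}" for a, r in ordered]


if __name__ == "__main__":
    received = [
        ("*@Hallmark.com", "hallmark.com"),
        ("*@hallmark.com", "hallmark.com "),
        ("*@mail.cnn.com", "cnn.com"),
    ]
    counts = tally_submissions(received)
    for rule in unique_rules(counts):
        print(rule)
    print(counts.most_common(1))
```

The tally could then be used to see which domains people ask for most
often, while the rules file itself carries each entry only once.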

Then again I don't know how many submissions you are expecting, so that 
may be overkill.

Daryl

Re: Whitelist collection project

Posted by Jeff Chan <je...@surbl.org>.

On Wednesday, March 9, 2005, 6:20:49 PM, Robert Menschel wrote:
> Wednesday, March 9, 2005, 1:00:33 AM, Jeff Chan wrote:

>>> Goal: There are public newsletters, services, etc., which a) do not
>>> spam, and b) can easily be mistaken as spam by SpamAssassin for a
>>> variety of reasons (overly aggressive custom rules, wrongly taught
>>> Bayes system, paid advertising listing SURBL URIs, etc). Though
>>> anti-spam devotees see these as opportunities for cleaning up our
>>> system, for the purposes of email reliability we want these emails to
>>> go through unhindered. 

JC>> This is something that could potentially be useful to SURBLs as a
JC>> whitelist source (used for exclusion from SURBLs), so I'm in
JC>> favor of it.  Daryl's ideas of a web form feeding a database
JC>> and a separately named rule to use it within SA seem like good
JC>> suggestions.

> In that a whitelisted "from" domain can be assumed to reference its
> own domain in URIs within its emails, I agree.  A SURBL whitelist
> probably should contain these primary domain names, to ensure that
> spammers don't pollute SURBL by using/abusing these domains.

Yes, that's one of the current functions of the SURBL whitelist:
to prevent Joe Jobs of bad guys trying to get legitimate domains
listed in SURBLs by mentioning them in spams.

> Theo's comment about being able to skip SURBL checks is a good idea
> also.

In case it's not clear, that's already being done in URIDNSBL.pm.

[...]
JC>> As you outline it above it seems like it would be a global
JC>> publishing of local whitelists where there was strong consensus
JC>> about what should be whitelisted.  That could be a subset of
JC>> the local whitelist_froms of all SpamAssassin installations.
JC>> It could also grow into something larger, and that's not
JC>> necessarily a bad thing.  Collecting up SA local whitelist_froms
JC>> is a reasonable place to start.

> Problem with the whitelist_from's is that they don't have the rcvd
> information needed to prevent abuse. Change that to a collection of
> local whitelist_from_rcvd's, and I would agree with you completely.

Yes, whitelist_from_rcvd is what I meant, not whitelist_from.
:-)

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/

Re[2]: Whitelist collection project

Posted by Robert Menschel <Ro...@Menschel.net>.

Hello Jeff,

Wednesday, March 9, 2005, 1:00:33 AM, you wrote:

>> Goal: There are public newsletters, services, etc., which a) do not
>> spam, and b) can easily be mistaken as spam by SpamAssassin for a
>> variety of reasons (overly aggressive custom rules, wrongly taught
>> Bayes system, paid advertising listing SURBL URIs, etc). Though
>> anti-spam devotees see these as opportunities for cleaning up our
>> system, for the purposes of email reliability we want these emails to
>> go through unhindered. 

JC> This is something that could potentially be useful to SURBLs as a
JC> whitelist source (used for exclusion from SURBLs), so I'm in
JC> favor of it.  Daryl's ideas of a web form feeding a database
JC> and a separately named rule to use it within SA seem like good
JC> suggestions.

In that a whitelisted "from" domain can be assumed to reference its
own domain in URIs within its emails, I agree.  A SURBL whitelist
probably should contain these primary domain names, to ensure that
spammers don't pollute SURBL by using/abusing these domains.

Theo's comment about being able to skip SURBL checks is a good idea
also.

JC> Because this would be a centrally maintained, hand-edited list I
JC> don't see it occupying the exact same space as SPF. 

Agreed. It will acknowledge/use SPF as much as possible, but it will
extend past SPF by making determinations of "not a spammer", which SPF
itself cannot do.

JC> Ultimately something like an RBL may be the best way to get
JC> the data out unless the list is relatively static and not
JC> too large.

Yes ... the list will be relatively static, probably have a fast
growth period at some point, but unlike spammers it won't be
continually morphing. I don't see it getting large enough to warrant
an RBL for a long while, but eventually it might be useful.

JC> As you outline it above it seems like it would be a global
JC> publishing of local whitelists where there was strong consensus
JC> about what should be whitelisted.  That could be a subset of
JC> the local whitelist_froms of all SpamAssassin installations.
JC> It could also grow into something larger, and that's not
JC> necessarily a bad thing.  Collecting up SA local whitelist_froms
JC> is a reasonable place to start.

Problem with the whitelist_from's is that they don't have the rcvd
information needed to prevent abuse. Change that to a collection of
local whitelist_from_rcvd's, and I would agree with you completely.
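
To make the difference concrete, here is a simplified Python model of the
two checks. It is not SpamAssassin's actual code: the function names are
invented, fnmatch is used as a stand-in for SA's address globbing (it
also treats [ ] specially, which SA does not), and the relay hostnames
are hypothetical.

```python
from fnmatch import fnmatchcase


def address_matches(from_addr, rule_addr):
    """Glob-match the From address, as a plain whitelist_from would."""
    return fnmatchcase(from_addr.lower(), rule_addr.lower())


def relay_matches(relay_rdns, rule_domain):
    """True if the relay's rDNS is the rule domain or a host within it."""
    relay = relay_rdns.lower().rstrip(".")
    domain = rule_domain.lower().rstrip(".")
    return relay == domain or relay.endswith("." + domain)


def from_rcvd_matches(from_addr, relay_rdns, rule_addr, rule_domain):
    """Both the address and the handing-over relay must match."""
    return (address_matches(from_addr, rule_addr)
            and relay_matches(relay_rdns, rule_domain))


if __name__ == "__main__":
    forged_from = "news@mail.cnn.com"
    # A forged From alone satisfies the address-only check...
    print(address_matches(forged_from, "*@mail.cnn.com"))
    # ...but not the rcvd check when the relay is some unrelated host.
    print(from_rcvd_matches(forged_from, "dsl-1-2-3-4.example.net",
                            "*@mail.cnn.com", "cnn.com"))
```

The address-only check passes for anyone who types the right From line;
the rcvd variant also needs the mail to arrive from a host in the
expected domain.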

Bob Menschel

Re: Whitelist collection project

Posted by Jeff Chan <je...@surbl.org>.

On Tuesday, March 8, 2005, 8:13:05 PM, Robert Menschel wrote:
> OK, based on what little discussion there's been so far, here's a
> draft proposal for people to think about.

> Summary: A group of volunteers will maintain a collected/distributed
> whitelist, using SpamAssassin's whitelist_from_rcvd capabilities,
> similar to (but in the opposite direction as) William Stearns'
> collected/distributed blacklist at
> http://www.stearns.org/sa-blacklist/sa-blacklist.current.cf

> Goal: There are public newsletters, services, etc., which a) do not
> spam, and b) can easily be mistaken as spam by SpamAssassin for a
> variety of reasons (overly aggressive custom rules, wrongly taught
> Bayes system, paid advertising listing SURBL URIs, etc). Though
> anti-spam devotees see these as opportunities for cleaning up our
> system, for the purposes of email reliability we want these emails to
> go through unhindered. 

> Assumption: This activity will focus only on public newsletters,
> services, etc., which normally do not contain any private information.
> Therefore there will not be any privacy or confidentiality concerns
> for the great majority of emails from these sources.

> Distribution:  The rules file which results from this activity will be
> maintained within the SARE system, as file 70_sare_whitelist.cf -- it
> can be downloaded manually or via RDJ.

This is something that could potentially be useful to SURBLs as a
whitelist source (used for exclusion from SURBLs), so I'm in
favor of it.  Daryl's ideas of a web form feeding a database
and a separately named rule to use it within SA seem like good
suggestions. 

Because this would be a centrally maintained, hand-edited list I
don't see it occupying the exact same space as SPF.

Ultimately something like an RBL may be the best way to get
the data out unless the list is relatively static and not
too large.

As you outline it above it seems like it would be a global
publishing of local whitelists where there was strong consensus
about what should be whitelisted.  That could be a subset of
the local whitelist_froms of all SpamAssassin installations.
It could also grow into something larger, and that's not
necessarily a bad thing.  Collecting up SA local whitelist_froms
is a reasonable place to start.

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/

Whitelist collection project

Posted by Robert Menschel <Ro...@Menschel.net>.

OK, based on what little discussion there's been so far, here's a
draft proposal for people to think about.

Summary: A group of volunteers will maintain a collected/distributed
whitelist, using SpamAssassin's whitelist_from_rcvd capabilities,
similar to (but in the opposite direction as) William Stearns'
collected/distributed blacklist at
http://www.stearns.org/sa-blacklist/sa-blacklist.current.cf

Goal: There are public newsletters, services, etc., which a) do not
spam, and b) can easily be mistaken as spam by SpamAssassin for a
variety of reasons (overly aggressive custom rules, wrongly taught
Bayes system, paid advertising listing SURBL URIs, etc). Though
anti-spam devotees see these as opportunities for cleaning up our
system, for the purposes of email reliability we want these emails to
go through unhindered. 

Assumption: This activity will focus only on public newsletters,
services, etc., which normally do not contain any private information.
Therefore there will not be any privacy or confidentiality concerns
for the great majority of emails from these sources.

Distribution:  The rules file which results from this activity will be
maintained within the SARE system, as file 70_sare_whitelist.cf -- it
can be downloaded manually or via RDJ.

Rules: Since these rules are gathered by the community at large,
rather than use the whitelist_from_rcvd rule, which normally scores
-100, we will use the def_whitelist_from_rcvd rule, which scores only
-15. Any site that wishes can copy any of these rules to a
whitelist_from_rcvd rule to gain the full -100 points.
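
A minimal sketch of the two forms such an entry could take, assuming the
usual two-argument layout of these directives (address pattern, then relay
domain). The helper render_rule() and the example.com values are
hypothetical, chosen only for illustration:

```python
# Illustrative sketch only: render_rule() and the sample values are hypothetical.
# The directive names come from the proposal; the two-argument layout
# (address pattern, relay domain) is assumed from SpamAssassin's
# whitelist_from_rcvd configuration syntax.

def render_rule(address_pattern: str, relay_domain: str, full_trust: bool = False) -> str:
    """Return one whitelist line suitable for a SpamAssassin .cf file."""
    directive = "whitelist_from_rcvd" if full_trust else "def_whitelist_from_rcvd"
    return f"{directive} {address_pattern} {relay_domain}"

# Community-maintained entry, as it might appear in 70_sare_whitelist.cf
shared = render_rule("*@example.com", "example.com")

# The same entry copied by a local admin who wants the stronger whitelist score
local = render_rule("*@example.com", "example.com", full_trust=True)

print(shared)
print(local)
```

The only difference between the shared and the local line is the directive
name, so a site could promote an entry by copying it into its own local
configuration under the stronger directive.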

[Question to the devs: would you agree this is a valid use for the
def_whitelist_from_rcvd rule?]

Adoption to SA Core: We expect that some/most of these rules will be
adopted into the SpamAssassin core distribution when new releases are
issued. At such times, rules adopted into SA will be migrated to a
stable file named 70_sare_whitelist_pre_vvvvvv.cf (eg:
70_sare_whitelist_pre_030100.cf for SA 3.1.0). This file will be
announced on the SA-Users list, and can be retrieved once into any/all
systems. That file will not change. These rules will be deleted from
the primary 70_sare_whitelist.cf file concurrent with the official
release of the new SA version. Systems that migrate to a new SA
release would then be expected to delete all "pre" files, though there
wouldn't be any harm other than a minimal performance hit if they do
not.
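
The "vvvvvv" naming could be derived from a version string roughly as
follows. This is a sketch under an assumption: the single example given
(030100 for 3.1.0) suggests two zero-padded digits per version component,
but the proposal does not spell that out. The function name
pre_release_filename() is hypothetical.

```python
# Illustrative sketch only. Assumes "vvvvvv" means major, minor and patch,
# each zero-padded to two digits, as inferred from 030100 for SA 3.1.0.

def pre_release_filename(version: str) -> str:
    """Map an SA version such as '3.1.0' to the stable 'pre' rules file name."""
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() and len(p) <= 2 for p in parts):
        raise ValueError(f"expected a version like 3.1.0, got {version!r}")
    stamp = "".join(p.zfill(2) for p in parts)
    return f"70_sare_whitelist_pre_{stamp}.cf"

print(pre_release_filename("3.1.0"))
```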

Contributions: Anyone within the SA community can submit a single
whitelist rule for inclusion into the active rules file, and/or submit
a qualifying email (with full headers), from which the whitelist team
will create a rule. Submissions with a full qualifying email are
preferred. Submissions should be sent to whitelist@menschel.net

[Submissions can be sent to that address immediately -- something will
be done with them, whether following this proposal or not.]

Process: At least at first, we expect many/most of these submissions
to be obvious ham sources (such as NYTimes.com would be). Obvious ham
sources will be agreed upon unanimously by the whitelist team, and
added to the whitelist. 

Any non-obvious ham source (that is, any that has majority agreement
within the whitelist team but not unanimous agreement) will be examined
in detail by one of the team. The results of that examination will be
reviewed by the whitelist team, and any that then receives unanimous
agreement will be added to the rules file.

Any submission which does not obtain unanimous agreement will not be
added to the rules file.

Any submission which already matches a def_whitelist_from_rcvd rule
within the SA distribution will be identified and ignored (after
response back to the submitter). (We are not going to try to develop
pre-3.0.0 files.)
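
The voting steps in the Process section can be sketched as below. The names
Outcome, first_vote() and second_vote() are hypothetical. One point is an
assumption rather than something the proposal states: a submission that
does not even reach a majority on the first vote is treated here as
rejected without a detailed examination.

```python
# Illustrative sketch only: models the proposed review flow, not any real tool.
from enum import Enum
from typing import Sequence


class Outcome(Enum):
    ADD = "add to rules file"
    EXAMINE = "examine in detail, then vote again"
    REJECT = "do not add"


def first_vote(votes: Sequence[bool]) -> Outcome:
    """Initial review: unanimous adds, majority triggers an examination."""
    if not votes:
        raise ValueError("at least one team member must vote")
    yes = sum(votes)
    if yes == len(votes):
        return Outcome.ADD
    if yes * 2 > len(votes):
        return Outcome.EXAMINE
    # Assumption: below a majority, the submission is dropped outright.
    return Outcome.REJECT


def second_vote(votes: Sequence[bool]) -> Outcome:
    """Review after examination: only unanimous agreement adds the rule."""
    return Outcome.ADD if votes and all(votes) else Outcome.REJECT


print(first_vote([True, True, True]).value)
print(first_vote([True, True, False]).value)
print(second_vote([True, True, False]).value)
```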

Team members: Anyone within the SpamAssassin community who has
submitted a CLA to Apache, and who is accepted unanimously by the
whitelist team, can join the team.

[The CLA requirement is to ensure that rules can be adopted into
SA without any licensing concerns.]

- - - - - - -

Can anyone think of any guidelines to be added or changed?

Any volunteers willing to help out?

Bob Menschel