Posted to users@spamassassin.apache.org by "Matthew Kitchin (public/usenet)" <mk...@gmail.com> on 2010/08/05 19:47:37 UTC

List of "banned" words/bounce to sender

Hello all. I have been a loyal user for years, but have never had to 
do much more than make a few custom rules. I work for a healthcare 
company, and I have been asked to implement a mechanism to search for 
patient names in outgoing emails and bounce them back to the sender if 
one is identified.
We would search for them in the format "John Smith" and "Smith, John".
We would like to bounce them back to the sender (that would be within 
our company) with a custom notice indicating what they should do to 
properly send the email.
My typical setups are Postfix -> amavisd -> SA.
In this case, the setup doesn't exist yet, because I'm just exploring 
the feasibility of doing it.  I would run the latest versions of CentOS 
64-bit, Postfix, Amavisd, and SA.
It would be great if it could search attachments too, but I could 
probably get by with just looking at the body. Of course, the emails 
will be HTML and RTF too. They originate in an Outlook/Exchange 
environment.
Is this a realistic setup?

Thanks,
Matthew

RE: List of "banned" words/bounce to sender

Posted by "Kelly, James" <ja...@chapman.edu>.

It is more associated with Mailscanner rather than Amavis, but the
Scamnailer project does something very similar with a long list of
"known bad" phishing mail addresses. It scans each message for the
presence of any of thousands of addresses from a frequently-updated
list, and when it finds one, it adds an additional header. You then
configure a Mailscanner "Spamassassin Rule Action" setting to perform
some custom action if it finds that Scamnailer header in the message.
Perhaps that action could include the kind of bounce/sender notify you'd
want to do.

Scamnailer also works great for stopping targeted phishing attacks, too,
which as a university we get a lot of.

Scamnailer is at http://www.scamnailer.info/

Thanks,
James
__

James Kelly
Network Administrator
IS&T Network Operations
Chapman University
Phone: 714-744-7833
Email: jakelly@chapman.edu

-----Original Message-----
From: Matthew Kitchin (public/usenet) [mailto:mkitchin.public@gmail.com]
Sent: Thursday, August 05, 2010 11:11 AM
To: Spamassassin
Subject: Re: List of "banned" words/bounce to sender

  On 8/5/2010 1:03 PM, Evan Platt wrote:
>
> Spamassassin can't handle this - it has no capability to reject mail, 
> however you need to think - are you going to have a database of 
> patients' names, or is your intention to block anything with a "Name"? 
> Are you really going to want to manage a database of every name out 
> there? If so, what happens when someone e-mails "I watched a 
> presentation from Bill Gates on...." Well, that's a name, right?
>
> So let's take the alternative - you have a database of just custom 
> names (of your patients). Whose job is it to maintain that? And what 
> happens if, again, in the above situation, a patient has the same name 
> as say a celebrity or even worse, say a doctor? Let's say there's a 
> world famous doctor James Bond. But James Bond (different person) is a 
> patient. One of your staff members e-mails "We need to go see the 
> conference Dr. James Bond is putting on". Bounced.
>
Amavisd could reject the mail. I was planning on using Spamassassin 
(with a custom built rule) to examine the email for the names. We would 
only use the names of our patients. The names would be dumped out of our 
patient DB every night. If a patient has the same name as a friend, 
there would be a code we would put in the subject to bypass the filter. 
I was thinking of a custom rule for that code that would have a score of 
-20 or something like that. Basically, Spamassassin's role would be 
deciding whether or not one of the names was in the email and if the 
override code was in the subject. I'm not saying it is the most 
brilliant idea in the world, but it is what I have been told to implement.

I know Amavisd well, so I can handle that part. I guess my main question 
should be, could I have Spamassassin read a custom rule to look for 
several thousand patient names in the format "John Smith" and "Smith, John"?

Re: List of "banned" words/bounce to sender

Posted by "Matthew Kitchin (public/usenet)" <mk...@gmail.com>.

  On 8/5/2010 2:05 PM, Bowie Bailey wrote:
> I would tend to say that something that large would not be practical.
> On the other hand, there's no way to really know until you try it.
>
> A database lookup is possible, but the problem is determining what to
> look up.  You would have to somehow identify possible names for
> comparison to the database.
>
Thanks. I think I had a brain fart here. Obviously we would have to have 
identified the names before we could look them up... I think I divided 
by 0 in my head at some point :)

Re: List of "banned" words/bounce to sender

Posted by Bowie Bailey <Bo...@BUC.com>.

 On 8/5/2010 3:00 PM, Matthew Kitchin (public/usenet) wrote:
>  On 8/5/2010 1:52 PM, Bowie Bailey wrote:
>> My approach to doing something like this would be to have a rule that
>> matches the names (however you implement it), and then have the MTA
>> check for that particular rule hit and bounce the message if it exists.
>> This is the same way you generally use the VBounce plugin.  Then do the
>> same thing for your "bypass" rule.
>>
> That is pretty much what I wanted to do. The best way I know to make
> Postfix use SA is with Amavisd.

The point being that the score is irrelevant.  If the rule hits, the
message gets bounced.

>>
>> Spamassassin can use whatever custom rule you care to come up with.  It
>> will happily use a regex with hundreds of names listed.  The question is
>> whether the rule would cause a noticeable slowdown in processing speed.
>> The only way to find out is to try it.  Using compiled rules would
>> probably help here.
>>
> Thanks. We are looking at roughly 70,000 names and always growing. If
> I gave it sufficient hardware, would you expect that to be practical,
> or is that totally ridiculous? Any options for a database look up here?

I would tend to say that something that large would not be practical. 
On the other hand, there's no way to really know until you try it.

A database lookup is possible, but the problem is determining what to
look up.  You would have to somehow identify possible names for
comparison to the database.

-- 
Bowie

Re: List of "banned" words/bounce to sender

Posted by Dominic Benson <do...@lenny.cus.org>.

On 5 Aug 2010, at 20:13, Matthew Kitchin (public/usenet) wrote:

> On 8/5/2010 2:10 PM, Noel Jones wrote:
>> 
>> Use your database to generate rules for clamav.  You could even remove
>> the stock clamav rules if you want.  Matching the body for 70,000
>> names would probably take less than 0.1 seconds.
> That sounds like a really good idea. I do use ClamAV but have never written any rules of my own. Thanks for the tip!

I'd set it up to check for surnames from the list in groups first, then if it matches one of those look for the various permutations of the full names that correspond to each set. I'm thinking of these in terms of calling out from Exim's acl_check_data section, using various database dirs depending on the rule set (like the Bayes filter), but there are other ways of achieving the same thing. That ought to reduce the amount of work per message for those that will be let through. You'd have to experiment to find the best group size; it would depend on how many distinct surnames there are in your set, as well as the callout cost relative to the time for each expression. That would also give you a good shot at identifying J. Smith as well, for example.
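
One way to read the two-stage idea above, as a minimal Perl sketch rather than anything Exim-specific: a cheap alternation over the distinct surnames runs first, and only surnames that actually occur trigger the per-person checks (including the "J. Smith" form). The sample names, the %by_surname layout and the find_patient_names helper are made up for illustration, and the splitting into several groups or databases is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: surname => list of first names (all lowercase).
my %by_surname = (
    smith => [ 'john', 'anna' ],
    doe   => [ 'jane' ],
);

# Stage 1: one cheap alternation over the distinct surnames only.
my $surname_alt = join '|', map { quotemeta($_) } sort keys %by_surname;
my $surname_re  = qr/(?<![\w-])($surname_alt)(?![\w-])/i;

# Returns the matched patients as lowercase "first last" strings.
sub find_patient_names {
    my ($text) = @_;
    my (%seen, @hits);
    my @found = $text =~ /$surname_re/g;    # every surname occurrence
    for my $surname (map { lc } @found) {
        next if $seen{$surname}++;
        # Stage 2: only now test the permutations for this surname,
        # including the initial form "J. Smith".
        for my $first (@{ $by_surname{$surname} }) {
            my $f       = quotemeta $first;
            my $initial = quotemeta substr($first, 0, 1);
            my $s       = quotemeta $surname;
            if ($text =~ /(?<![\w-])(?:$f|$initial\.?)\s+$s(?![\w-])/i
                || $text =~ /(?<![\w-])$s,?\s+$f(?![\w-])/i) {
                push @hits, "$first $surname";
            }
        }
    }
    return @hits;
}

print join(', ', find_patient_names('Results for J. Smith are attached.')), "\n";
```

Matching on initials widens the net considerably (any "A. Smith" would also count for an Anna Smith in this sketch), so whether that trade-off is acceptable would need testing against real mail.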

Re: List of "banned" words/bounce to sender

Posted by "Matthew Kitchin (public/usenet)" <mk...@gmail.com>.

  On 8/5/2010 2:10 PM, Noel Jones wrote:
>
> Use your database to generate rules for clamav.  You could even remove
> the stock clamav rules if you want.  Matching the body for 70,000
> names would probably take less than 0.1 seconds.
That sounds like a really good idea. I do use ClamAV but have never 
written any rules of my own. Thanks for the tip!

Re: List of "banned" words/bounce to sender

Posted by Noel Jones <no...@gmail.com>.

On Thu, Aug 5, 2010 at 2:00 PM, Matthew Kitchin (public/usenet)
<mk...@gmail.com> wrote:
>  On 8/5/2010 1:52 PM, Bowie Bailey wrote:
>>
>> My approach to doing something like this would be to have a rule that
>> matches the names (however you implement it), and then have the MTA
>> check for that particular rule hit and bounce the message if it exists.
>> This is the same way you generally use the VBounce plugin.  Then do the
>> same thing for your "bypass" rule.
>>
> That is pretty much what I wanted to do. The best way I know to make Postfix
> use SA is with Amavisd.
>>
>> Spamassassin can use whatever custom rule you care to come up with.  It
>> will happily use a regex with hundreds of names listed.  The question is
>> whether the rule would cause a noticeable slowdown in processing speed.
>> The only way to find out is to try it.  Using compiled rules would
>> probably help here.
>>
> Thanks. We are looking at roughly 70,000 names and always growing. If I gave
> it sufficient hardware, would you expect that to be practical, or is that
> totally ridiculous? Any options for a database look up here?
>

Use your database to generate rules for clamav.  You could even remove
the stock clamav rules if you want.  Matching the body for 70,000
names would probably take less than 0.1 seconds.

Re: List of "banned" words/bounce to sender

Posted by "Matthew Kitchin (public/usenet)" <mk...@gmail.com>.

  On 8/9/2010 8:27 AM, Henrik K wrote:
> Nope, people constantly underestimate the power of regexes.. of course you
> can easily make bad ones, but Perl can run huge lists of simple alternations
> FAST.
>
> I downloaded a 10000 random name pack, and made a quick hack to regexify it
> with my favourite Regexp::Assemble.
>
> ------------------------------
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Regexp::Assemble;
>
> my $ra = Regexp::Assemble->new;
> my ($cnt, $idx) = (0, 0);
> while (<STDIN>) {
>     chomp;
>     # Read comma separated names from stdin: Firstname,Lastname
>     my ($firstname, $lastname) = split(',', lc);
>     # Firstname Lastname
>     $ra->add("$firstname $lastname");
>     # Lastname,? Firstname
>     $ra->add("$lastname,? $firstname");
>     # Print rule every 10000 names
>     # (?:^| ) instead of \b since "Kate" would hit "Mary-Kate"
>     if (++$cnt % 10000 == 0 || eof STDIN) {
>         print 'body TEST_NAMES_'.++$idx;
>         print ' /(?:^| )'.$ra->as_string.'(?:$| )/i'."\n";
>     }
> }
> ------------------------------
> ./names.pl < names.csv > names.cf
>
> The resulting single 170000 byte rule did not affect SA in any way; there was
> virtually no difference in my mass check tests. Running the regex through
> some file manually results in 80000 lines/second. This is with one 3 GHz core.
> I think you can make rules/REs of MBs in size, but that probably gains nothing.
>
> About ClamAV...
>
> + It would probably handle this even faster
> + Easy logging of exact signature that got hit (single name per sig)
> - It would also match any header like To: From: etc (PRETTY BAD...)
>
> I'd choose SA since it's way more flexible. I doubt performance here is a
> factor, especially with outgoing mail..
>
Thanks for the info.

> - It would also match any header like To: From: etc (PRETTY BAD...)

That could be an issue. I will check to see if I can find a workaround; 
if not, ClamAV may not be an option.

Re: List of "banned" words/bounce to sender

Posted by Henrik K <he...@hege.li>.

On Mon, Aug 09, 2010 at 07:28:42AM -0500, Daniel McDonald wrote:
>
> This technique might cut down the number of rules by 93.5%, but then you
> have to do database lookups and some fancy parsing to verify the hit. 
> Don't know if that would be worth it.

Nope, people constantly underestimate the power of regexes.. of course you
can easily make bad ones, but Perl can run huge lists of simple alternations
FAST.

I downloaded a 10000 random name pack, and made a quick hack to regexify it
with my favourite Regexp::Assemble.

------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new;
my ($cnt, $idx) = (0, 0);
while (<STDIN>) {
    chomp;
    # Read comma separated names from stdin: Firstname,Lastname
    my ($firstname, $lastname) = split(',', lc);
    # Firstname Lastname
    $ra->add("$firstname $lastname");
    # Lastname,? Firstname
    $ra->add("$lastname,? $firstname");
    # Print rule every 10000 names
    # (?:^| ) instead of \b since "Kate" would hit "Mary-Kate"
    if (++$cnt % 10000 == 0 || eof STDIN) {
        print 'body TEST_NAMES_'.++$idx;
        print ' /(?:^| )'.$ra->as_string.'(?:$| )/i'."\n";
    }
}
------------------------------
./names.pl < names.csv > names.cf

The resulting single 170000 byte rule did not affect SA in any way; there was
virtually no difference in my mass check tests. Running the regex through
some file manually results in 80000 lines/second. This is with one 3 GHz core.
I think you can make rules/REs of MBs in size, but that probably gains nothing.

About ClamAV...

+ It would probably handle this even faster
+ Easy logging of exact signature that got hit (single name per sig)
- It would also match any header like To: From: etc (PRETTY BAD...)

I'd choose SA since it's way more flexible. I doubt performance here is a
factor, especially with outgoing mail..
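
The remark about `(?:^| )` versus `\b` in the script can be checked in isolation. The following is a small sketch in plain Perl, outside SpamAssassin, with a made-up list entry; it only compares the two delimiters on a hyphenated name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = 'kate olsen';   # hypothetical entry from the name list

# Word-boundary version: "-" is a non-word character, so \b matches
# between "-" and "K" and the pattern fires inside "Mary-Kate Olsen".
my $with_b     = qr/\b$name\b/i;

# Space-or-edge version, as used in the generated rules.
my $with_space = qr/(?:^| )$name(?:$| )/i;

for my $line ('Mary-Kate Olsen called', 'ask Kate Olsen today') {
    printf "%-24s \\b:%s  space:%s\n", $line,
        ($line =~ $with_b     ? 'hit' : 'miss'),
        ($line =~ $with_space ? 'hit' : 'miss');
}
```

The space-delimited form has its own cost: a name followed directly by punctuation, such as "Kate Olsen.", is not matched by it, so the trailing delimiter may need widening depending on how strict the filter has to be.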

Re: List of "banned" words/bounce to sender

Posted by jdow <jd...@earthlink.net>.

From: "Daniel McDonald" <da...@austinenergy.com>
Sent: Monday, 2010/August/09 05:28


> On 8/9/10 6:58 AM, "Martin Gregorie" <ma...@gregorie.org> wrote:
>
>> On Mon, 2010-08-09 at 14:17 +0300, Henrik K wrote:
>>> On Mon, Aug 09, 2010 at 11:38:50AM +0100, Martin Gregorie wrote:
>>>> On Thu, 2010-08-05 at 14:00 -0500, Matthew Kitchin (public/usenet)
>>>> wrote:
>>>>> Thanks. We are looking at roughly 70,000 names and always growing. If I
>>>>> gave it sufficient hardware, would you expect that to be practical, or
>>>>> is that totally ridiculous? Any options for a database look up here?
>>>>>
>>>> I'd use a plugin that simply queries the database plus a rule to
>>>> activate the plugin by calling its eval() method and sets the score if
>>>> the rule fires.
>>>
>>> Queries database for what? I guess you didn't read the thread fully. :-)
>>>
>> Queries the patient data DB for patient names - obviously. I made the
>> offer because I found it useful to be able to modify an existing plugin
>> that queried a database. Exactly what the SQL query does is largely
>> irrelevant. I found that the difficult bit was working out how to
>> configure the plugin to access my database. Constructing the query and
>> interpreting its result were relatively easy.
>
> So, you are recommending that he use a plugin to query 70,000 records from a
> database, and perform 140,000 body matches, for every e-mail message he
> receives?  Doesn't seem very efficient.  It would make sense if it were
> structured data he was looking at, to then perform one-off queries to see if
> that data matched the database.  But the original post was discussing a
> data-loss-prevention scheme to avoid unstructured data leaks.
>
> If the data could be regularized somehow, that might be different.  For
> example, if there were a limited number of first names, you could write
> signatures that looked for first names with another capitalized word nearby,
> and then do a database lookup to see if the capitalized word was a last name
> associated with the first name that you discovered.  Unfortunately, people
> are pretty random with first names.  I have a database of some 600K voters
> in Travis County, Texas.  There are 38,808 distinct first names.  This
> technique might cut down the number of rules by 93.5%, but then you have to
> do database lookups and some fancy parsing to verify the hit.  Don't know if
> that would be worth it.

Um, a query for "firstname=John and lastname=Smith" and a query for
"firstname=Smith and lastname=John" is a start. (Match with the format for
the database.) One of the problems is picking out names and matching them with
other names close enough to them to be "John Smith". Then you have to guess
the order; the two queries above handle that. Then you have to settle on
whether this is one of our John Smiths or a third party unrelated to our
database. I see that last one as the real problem.

{^_^}

Re: List of "banned" words/bounce to sender

Posted by Daniel McDonald <da...@austinenergy.com>.

On 8/9/10 6:58 AM, "Martin Gregorie" <ma...@gregorie.org> wrote:

> On Mon, 2010-08-09 at 14:17 +0300, Henrik K wrote:
>> On Mon, Aug 09, 2010 at 11:38:50AM +0100, Martin Gregorie wrote:
>>> On Thu, 2010-08-05 at 14:00 -0500, Matthew Kitchin (public/usenet)
>>> wrote:
>>>> Thanks. We are looking at roughly 70,000 names and always growing. If I
>>>> gave it sufficient hardware, would you expect that to be practical, or
>>>> is that totally ridiculous? Any options for a database look up here?
>>>> 
>>> I'd use a plugin that simply queries the database plus a rule to
>>> activate the plugin by calling its eval() method and sets the score if
>>> the rule fires.
>> 
>> Queries database for what? I guess you didn't read the thread fully. :-)
>> 
> Queries the patient data DB for patient names - obviously. I made the
> offer because I found it useful to be able to modify an existing plugin
> that queried a database. Exactly what the SQL query does is largely
> irrelevant. I found that the difficult bit was working out how to
> configure the plugin to access my database. Constructing the query and
> interpreting its result were relatively easy.

So, you are recommending that he use a plugin to query 70,000 records from a
database, and perform 140,000 body matches, for every e-mail message he
receives?  Doesn't seem very efficient.  It would make sense if it were
structured data he was looking at, to then perform one-off queries to see if
that data matched the database.  But the original post was discussing a
data-loss-prevention scheme to avoid unstructured data leaks.

If the data could be regularized somehow, that might be different.  For
example, if there were a limited number of first names, you could write
signatures that looked for first names with another capitalized word nearby,
and then do a database lookup to see if the capitalized word was a last name
associated with the first name that you discovered.  Unfortunately, people
are pretty random with first names.  I have a database of some 600K voters
in Travis County, Texas.  There are 38,808 distinct first names.  This
technique might cut down the number of rules by 93.5%, but then you have to
do database lookups and some fancy parsing to verify the hit.  Don't know if
that would be worth it.

-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281
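
The "first name plus nearby capitalized word, then look it up" idea can be sketched as follows. This is only an illustration under stated assumptions: the %patients hash stands in for the database lookup, "nearby" is narrowed to the word directly before or after, and candidate_hits is a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the database:
# first name => { last name => 1 }, all lowercase.
my %patients = (
    john => { smith => 1 },
    jane => { doe => 1, roe => 1 },
);

# Return "first last" for every known first name that has an associated
# last name as the capitalized word directly before or after it.
sub candidate_hits {
    my ($text) = @_;
    my @words = $text =~ /([A-Za-z][A-Za-z'-]*)/g;
    my @hits;
    for my $i (0 .. $#words) {
        my $lasts = $patients{ lc $words[$i] } or next;   # not a first name
        for my $j ($i - 1, $i + 1) {
            next if $j < 0 || $j > $#words;
            next unless $words[$j] =~ /^[A-Z]/;           # capitalized neighbour only
            push @hits, lc("$words[$i] $words[$j]")
                if $lasts->{ lc $words[$j] };
        }
    }
    return @hits;
}

print "$_\n" for candidate_hits('Please forward the chart for Smith, John to billing.');
```

Because the neighbour on either side is checked, both "John Smith" and "Smith, John" are covered by one lookup per known first name; the sketch ignores sentence boundaries and does nothing about the harder question of whether the match is really the patient.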

Re: List of "banned" words/bounce to sender

Posted by Martin Gregorie <ma...@gregorie.org>.

On Mon, 2010-08-09 at 14:17 +0300, Henrik K wrote:
> On Mon, Aug 09, 2010 at 11:38:50AM +0100, Martin Gregorie wrote:
> > On Thu, 2010-08-05 at 14:00 -0500, Matthew Kitchin (public/usenet)
> > wrote:
> > > Thanks. We are looking at roughly 70,000 names and always growing. If I 
> > > gave it sufficient hardware, would you expect that to be practical, or 
> > > is that totally ridiculous? Any options for a database look up here?
> > >
> > I'd use a plugin that simply queries the database plus a rule to
> > activate the plugin by calling its eval() method and sets the score if
> > the rule fires.
> 
> Queries database for what? I guess you didn't read the thread fully. :-)
> 
Queries the patient data DB for patient names - obviously. I made the
offer because I found it useful to be able to modify an existing plugin
that queried a database. Exactly what the SQL query does is largely
irrelevant. I found that the difficult bit was working out how to
configure the plugin to access my database. Constructing the query and
interpreting its result were relatively easy.

Martin

Re: List of "banned" words/bounce to sender

Posted by Henrik K <he...@hege.li>.

On Mon, Aug 09, 2010 at 11:38:50AM +0100, Martin Gregorie wrote:
> On Thu, 2010-08-05 at 14:00 -0500, Matthew Kitchin (public/usenet)
> wrote:
> > Thanks. We are looking at roughly 70,000 names and always growing. If I 
> > gave it sufficient hardware, would you expect that to be practical, or 
> > is that totally ridiculous? Any options for a database look up here?
> >
> I'd use a plugin that simply queries the database plus a rule to
> activate the plugin by calling its eval() method and sets the score if
> the rule fires.

Queries database for what? I guess you didn't read the thread fully. :-)

> I'm currently doing the reverse: 
> - I use a view on a mail archive database to check whether I've
>   previously sent mail to the sender of an incoming message and wrote
>   an SA plugin to query the view. 

In case someone is interested, I wrote a similar policy daemon for Postfix:
http://mailfud.org/postpals/

Re: List of "banned" words/bounce to sender

Posted by Martin Gregorie <ma...@gregorie.org>.

On Thu, 2010-08-05 at 14:00 -0500, Matthew Kitchin (public/usenet)
wrote:
> Thanks. We are looking at roughly 70,000 names and always growing. If I 
> gave it sufficient hardware, would you expect that to be practical, or 
> is that totally ridiculous? Any options for a database look up here?
>
I'd use a plugin that simply queries the database plus a rule to
activate the plugin by calling its eval() method and set the score if
the rule fires.

I'm currently doing the reverse: 
- I use a view on a mail archive database to check whether I've
  previously sent mail to the sender of an incoming message and wrote
  an SA plugin to query the view. 
- a rule causes a plugin to query the view and whitelists the message
  by applying a large negative score if the query got a hit.
- the plugin + view also detects incoming messages containing my
  addresses as forged senders since that's a common spammer trick.

This works very well. E-mail me off-list if you'd like a copy of the
plugin since I haven't yet published it. 

However, if you're not intending to apply any other rules to the
outgoing messages then using SA sounds a bit like overkill when you
could simply write a small program in Perl, Python, Java, etc. that simply
runs the query and causes your MTA to bounce any messages that contain
matches with a suitable error code or diagnostic.

Martin
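
A rough sketch of the "small program" alternative, assuming the MTA hands the whole message to the program and treats a non-zero exit status as a rejection. The name list, the helper names and the choice of 69 (EX_UNAVAILABLE in sysexits.h) are assumptions for illustration; how an exit status turns into a bounce depends on how the MTA invokes the program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical name list; in practice this would come from the query
# against the patient database (or its nightly dump).
my @names = ( [ 'John', 'Smith' ], [ 'Jane', 'Doe' ] );

# Return the first listed name found in the message body, or undef.
sub first_match {
    my ($message) = @_;
    # Skip the header block so To:/From: lines are not scanned.
    my (undef, $body) = split /\r?\n\r?\n/, $message, 2;
    $body = '' unless defined $body;
    for my $n (@names) {
        my ($f, $l) = map { quotemeta($_) } @$n;
        return "$n->[0] $n->[1]"
            if $body =~ /(?<![\w-])(?:$f\s+$l|$l,?\s+$f)(?![\w-])/i;
    }
    return;
}

# Assumed convention: 0 = let the message through, 69 = reject it.
sub filter_status {
    my ($message) = @_;
    return defined first_match($message) ? 69 : 0;
}

# Real use would slurp the message, e.g.:
#   local $/; exit filter_status(scalar <STDIN>);
my $demo = "From: a\@example.com\nSubject: chart\n\nPlease review Doe, Jane today.\n";
print filter_status($demo), "\n";
```

Splitting off the header block first avoids flagging a message merely because a patient's name appears in To: or From:.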

Re: List of "banned" words/bounce to sender

Posted by "Matthew Kitchin (public/usenet)" <mk...@gmail.com>.

  On 8/5/2010 1:52 PM, Bowie Bailey wrote:
> My approach to doing something like this would be to have a rule that
> matches the names (however you implement it), and then have the MTA
> check for that particular rule hit and bounce the message if it exists.
> This is the same way you generally use the VBounce plugin.  Then do the
> same thing for your "bypass" rule.
>
That is pretty much what I wanted to do. The best way I know to make 
Postfix use SA is with Amavisd.
>
> Spamassassin can use whatever custom rule you care to come up with.  It
> will happily use a regex with hundreds of names listed.  The question is
> whether the rule would cause a noticeable slowdown in processing speed.
> The only way to find out is to try it.  Using compiled rules would
> probably help here.
>
Thanks. We are looking at roughly 70,000 names and always growing. If I 
gave it sufficient hardware, would you expect that to be practical, or 
is that totally ridiculous? Any options for a database look up here?

Re: List of "banned" words/bounce to sender

Posted by Bowie Bailey <Bo...@BUC.com>.

 On 8/5/2010 2:11 PM, Matthew Kitchin (public/usenet) wrote:
>  
> Amavisd could reject the mail. I was planning on using Spamassassin
> (with a custom built rule) to examine the email for the names. We
> would only use the names of our patients. The names would be dumped
> out of our patient DB every night. If a patient has the same name as a
> friend, there would be a code we would put in the subject to bypass
> the filter. I was thinking of a custom rule for that code that would
> have a score of -20 or something like that. Basically, Spamassassin's
> role would be deciding whether or not one of the names was in the
> email and if the override code was in the subject. I'm not saying it
> is the most brilliant idea in the world, but it is what I have been
> told to implement.

My approach to doing something like this would be to have a rule that
matches the names (however you implement it), and then have the MTA
check for that particular rule hit and bounce the message if it exists. 
This is the same way you generally use the VBounce plugin.  Then do the
same thing for your "bypass" rule.

> I know Amavisd well, so I can handle that part. I guess my main
> question should be, could I have Spamassassin read a custom rule to
> look for several thousand patient names in the format "John Smith" and
> "Smith, John"?

Spamassassin can use whatever custom rule you care to come up with.  It
will happily use a regex with hundreds of names listed.  The question is
whether the rule would cause a noticeable slowdown in processing speed. 
The only way to find out is to try it.  Using compiled rules would
probably help here.

body BAD_NAMES /John Smith|Smith, John|Jane Doe|Doe, Jane|....../

Not the most efficient rule, but it would work.  You would probably have
to split it into multiple rules and combine them with a meta rule.

body __BAD_NAMES1 .....
body __BAD_NAMES2 .....
body __BAD_NAMES3 .....
meta BAD_NAMES __BAD_NAMES1 || __BAD_NAMES2 || __BAD_NAMES3

Regexp::Optimizer would probably also help when creating the rules.

-- 
Bowie
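
The split-and-combine layout above can be generated rather than written by hand. A minimal sketch, assuming the names arrive as first/last pairs; build_rules and the chunk size are made up for illustration, the alternation is a plain join (no Regexp::Optimizer), and an empty name list is not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build SpamAssassin rule lines from [first, last] pairs, $per_rule names
# per sub-rule, plus a meta rule that fires if any sub-rule fires.
sub build_rules {
    my ($names, $per_rule) = @_;
    my (@lines, @subrules);
    my @todo = @$names;
    while (my @chunk = splice(@todo, 0, $per_rule)) {
        my @alts;
        for my $n (@chunk) {
            # quotemeta keeps characters like "." or "/" in a name literal.
            my ($f, $l) = map { quotemeta($_) } @$n;
            push @alts, "$f $l", "$l, $f";
        }
        my $rule = '__BAD_NAMES' . (@subrules + 1);
        push @subrules, $rule;
        push @lines, "body $rule /(?:" . join('|', @alts) . ")/i";
    }
    push @lines, 'meta BAD_NAMES ' . join(' || ', @subrules);
    return @lines;
}

# Hypothetical sample list; the real one would be the nightly patient dump.
my @names = ( [qw(John Smith)], [qw(Jane Doe)], [qw(Ann O'Hara)] );
print "$_\n" for build_rules(\@names, 2);
```

The output can be redirected into a .cf file; the right number of names per sub-rule is something only a timing test on real mail would settle.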

Re: List of "banned" words/bounce to sender

Posted by "Matthew Kitchin (public/usenet)" <mk...@gmail.com>.

  On 8/5/2010 1:03 PM, Evan Platt wrote:
>
> Spamassassin can't handle this - it has no capability to reject mail, 
> however you need to think - are you going to have a database of 
> patients' names, or is your intention to block anything with a "Name"? 
> Are you really going to want to manage a database of every name out 
> there? If so, what happens when someone e-mails "I watched a 
> presentation from Bill Gates on...." Well, that's a name, right?
>
> So let's take the alternative - you have a database of just custom 
> names (of your patients). Whose job is it to maintain that? And what 
> happens if, again, in the above situation, a patient has the same name 
> as say a celebrity or even worse, say a doctor? Let's say there's a 
> world famous doctor James Bond. But James Bond (different person) is a 
> patient. One of your staff members e-mails "We need to go see the 
> conference Dr. James Bond is putting on". Bounced.
>
Amavisd could reject the mail. I was planning on using Spamassassin 
(with a custom built rule) to examine the email for the names. We would 
only use the names of our patients. The names would be dumped out of our 
patient DB every night. If a patient has the same name as a friend, 
there would be a code we would put in the subject to bypass the filter. 
I was thinking of a custom rule for that code that would have a score of 
-20 or something like that. Basically, Spamassassin's role would be 
deciding whether or not one of the names was in the email and if the 
override code was in the subject. I'm not saying it is the most 
brilliant idea in the world, but it is what I have been told to implement.

I know Amavisd well, so I can handle that part. I guess my main question 
should be, could I have Spamassassin read a custom rule to look for 
several thousand patient names in the format "John Smith" and "Smith, John"?

Re: List of "banned" words/bounce to sender

Posted by Evan Platt <ev...@espphotography.com>.

On 08/05/2010 10:47 AM, Matthew Kitchin (public/usenet) wrote:
> Hello all. I have been a loyal user for years, but have never had to 
> do much more than make a few custom rules. I work for a healthcare 
> company, and I have been asked to implement a mechanism to search for 
> patient names in outgoing emails and bounce them back to the sender if 
> one is identified.
> We would search for them in the format "John Smith" and "Smith, John".
> We would like to bounce them back to the sender (that would be within 
> our company) with a custom notice indicating what they should do to 
> properly send the email.
> My typical setups are Postfix -> amavisd -> SA.
> In this case, the setup doesn't exist yet, because I'm just exploring 
> the feasibility of doing it.  I would run the latest versions of 
> CentOS 64-bit, Postfix, Amavisd, and SA.
> It would be great if it could search attachments too, but I could 
> probably get by with just looking at the body. Of course, the emails 
> will be HTML and RTF too. They originate in an Outlook/Exchange 
> environment.
> Is this a realistic setup?

Spamassassin can't handle this - it has no capability to reject mail, 
however you need to think - are you going to have a database of patients' 
names, or is your intention to block anything with a "Name"? Are you 
really going to want to manage a database of every name out there? If so, 
what happens when someone e-mails "I watched a presentation from Bill 
Gates on...." Well, that's a name, right?

So let's take the alternative - you have a database of just custom names 
(of your patients). Whose job is it to maintain that? And what happens 
if, again, in the above situation, a patient has the same name as say a 
celebrity or even worse, say a doctor? Let's say there's a world famous 
doctor James Bond. But James Bond (different person) is a patient. One 
of your staff members e-mails "We need to go see the conference Dr. James 
Bond is putting on". Bounced.

While it's a great idea in theory (IMHO), it's going to be a headache.

One company I worked at a while ago implemented a web filter. The IT guy 
implemented it, then went to lunch. Unless a site was allowed, it was 
blocked. We very quickly realized that while he had added, say, 
www.yahoo.com, http://mail.yahoo.com was blocked. So he added 
*.yahoo.com. But then we found out that there were a dozen other 
DOMAINS needed too - one by one. Say yahoomail.com, yahoohosting.com, 
etc. His first few days were spent whitelisting site after site after site.

Eventually, they gave up on the idea.

Re: List of "banned" words/bounce to sender

Posted by "Matthew Kitchin (public/usenet)" <mk...@gmail.com>.

  On 8/5/2010 1:19 PM, Benny Pedersen wrote:
> On Thu 05 Aug 2010 19:47:37 CEST, "Matthew Kitchin (public/usenet)" wrote
>
>> Is this a realistic setup?
>
> Postfix will love it if done right with local SMTP auth senders, i.e. no 
> sender sends unauthenticated; then it's just a matter of adding 
> sender_bcc_maps from a list of all local recipients
>
> just don't do it if sender auth is not in place first!
>
> More questions? It's not a SpamAssassin answer :)
>
I'm not sure what you mean. I'm not looking for anything along the lines 
of authorized senders. I'm wanting to search an email to see if it has 
one of several thousand patient names in it.
I guess my main question should be, could I have Spamassassin read a 
custom rule to look for several thousand patient names in the format 
"John Smith" and "Smith, John"?

Re: List of "banned" words/bounce to sender

Posted by Benny Pedersen <me...@junc.org>.

On Thu 05 Aug 2010 19:47:37 CEST, "Matthew Kitchin (public/usenet)" wrote

> Is this a realistic setup?

Postfix will love it if done right with local SMTP auth senders, i.e. no 
sender sends unauthenticated; then it's just a matter of adding 
sender_bcc_maps from a list of all local recipients

just don't do it if sender auth is not in place first!

More questions? It's not a SpamAssassin answer :)

-- 
xpoint http://www.unicom.com/pw/reply-to-harmful.html