
Posted to users@spamassassin.apache.org by "Tuc at T-B-O-H.NET" <ml...@t-b-o-h.net> on 2008/03/05 19:30:40 UTC

How to report 120,000 spams a day

Hi,

	Our mail server receives about 128K emails a day. Of
those, 120K are absolutely known spam so I don't even run
them through SpamAssassin. Of the 8K left, 6K are determined
to be spam, and 2K are considered "good".

	I'm wondering if there is some way to help the
community (and, admittedly, ourselves) by processing
and reporting that spam to various databases. For the
smaller users, I've implemented the SiteWideRazor setup and
use procmail to save their spam to a "probably-spam" mailbox,
which I run through "spamassassin -r" once an hour.

	For our bigger ones, though, so as not to wear
a hole in the disk drive, I wondered if anyone had
suggestions on what to do.

		Thanks, Tuc

RE: How to report 120,000 spams a day

Posted by SM <sm...@resistor.net>.

At 13:38 10-03-2008, James E. Pratt wrote:
>No. "Possible mail loss" is really the correct term. Just because I have
>no backup MX, it does not mean I will lose mail... (Mail loss can be, and
>usually is, caused by many more issues than just no backup/secondary MX.)

Yes.

At 14:20 10-03-2008, Bob Proulx wrote:
>Loss of mail cannot result solely from a primary MX being offline.

See comment about "possible".

>If you believe that it can then let me ask a related but different
>question.  Does your mail server ever return a 4xx code such as:
>
>  - out of disk space / insufficient system storage
>  - dns temporarily unavailable / domain service not available
>  - service not available
>  - mailbox not available
>  - user quota exceeded
>  - too many simultaneous connections
>  - other

Yes.

>If the MTA ever returns such a code then (for the sake of
>argument) the same potential for mail loss exists.  If you believe
>that this causes loss then you would want to ensure that none of these
>conditions can ever happen.  Of course this is impossible due to
>practical limits.

In general, we seek to minimize the impact within practical
limits.  I have seen cases where a backup MX is useful, but I won't
call it a general rule, as it can be more of a problem than it is worth.

Regards,
-sm

Re: How to report 120,000 spams a day

Posted by Bob Proulx <bo...@proulx.com>.

James E. Pratt wrote:
> > Bob Proulx wrote:
> > >What would have been the downside of *not* having a backup MX?  The
> > 
> > Loss of mail.
> 
> No. "Possible mail loss" is really the correct term. Just because I have
> no backup MX, it does not mean I will lose mail... (Mail loss can be, and
> usually is, caused by many more issues than just no backup/secondary MX.)

Loss of mail cannot result solely from a primary MX being offline.

If you believe that it can then let me ask a related but different
question.  Does your mail server ever return a 4xx code such as:

 - out of disk space / insufficient system storage
 - dns temporarily unavailable / domain service not available
 - service not available
 - mailbox not available
 - user quota exceeded
 - too many simultaneous connections
 - other

If the MTA ever returns such a code then (for the sake of
argument) the same potential for mail loss exists.  If you believe
that this causes loss then you would want to ensure that none of these
conditions can ever happen.  Of course this is impossible due to
practical limits.

Fortunately for us, mail transfer was designed to be robust in the
presence of these problems.  When a mail transfer agent receives a 4xx
(400-level) response, it knows the action was not taken and the condition
is temporary in nature.  The MTA retry interval should be at least 30
minutes.  The MTA should keep retrying for at least 4-5 days before giving up.
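Bob's argument can be checked with a toy model. The Python below is a sketch under stated assumptions, not any real MTA's scheduler: the sender retries at a fixed 30-minute interval and gives up after exactly 5 days (Bob's minimums), whereas real MTAs pick their own, often growing, intervals, which is the point SM raises elsewhere in the thread. The function name and defaults are illustrative.

```python
from typing import Optional


def first_delivery_minute(outage_minutes: int,
                          retry_interval: int = 30,
                          give_up_minutes: int = 5 * 24 * 60) -> Optional[int]:
    """Return the minute at which a queued message gets through, or None.

    Assumed (simplified) model: the first attempt happens at minute 0, when
    the receiving server goes offline; the sender retries every
    `retry_interval` minutes and stops after `give_up_minutes`; the receiver
    accepts any attempt made once `outage_minutes` have passed.
    """
    t = 0
    while t <= give_up_minutes:
        if t >= outage_minutes:
            return t  # receiver is back up, so this attempt succeeds
        t += retry_interval
    return None  # sender gave up first; the message bounces


if __name__ == "__main__":
    print(first_delivery_minute(8 * 60))       # an 8-hour outage
    print(first_delivery_minute(6 * 24 * 60))  # a 6-day outage
```

Under these assumptions, an 8-hour outage like the one James describes is delivered on the attempt at minute 480, while a 6-day outage outlasts the 5-day give-up time and the message bounces.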

Bob

RE: How to report 120,000 spams a day

Posted by "James E. Pratt" <jp...@norwich.edu>.

> -----Original Message-----
> From: SM [mailto:sm@resistor.net]
> Sent: Monday, March 10, 2008 3:49 PM
> To: users@spamassassin.apache.org
> Subject: Re: How to report 120,000 spams a day
> 
> At 11:47 10-03-2008, Bob Proulx wrote:
> >What would have been the downside of *not* having a backup MX?  The
> 
> Loss of mail.

No. "Possible mail loss" is really the correct term. Just because I have
no backup MX, it does not mean I will lose mail... (Mail loss can be, and
usually is, caused by many more issues than just no backup/secondary MX.)

> 
> >mail would have remained in the mailqueue.  Comcast, AOL, Yahoo,
> >Gmail, corporate servers, private servers, etc. would have retried to
> >send the mail to you later.  When your main mail relay came online
> >they would have retried and delivered it.  There would have been NO
> >DIFFERENCE at all.  You didn't need your backup MX relay to proxy
> >relay the mail to you.
> 
> The difference is that you are making assumptions about their retry
> strategy.

Yes, all are different. In the grand scheme, though, who cares? We've had
no backup MX here for over 5 years and have lost no mail that I'm
aware of (or rather, no one has complained, anyhow). We've been down
once for about 8 hours and lost nothing as far as I could tell. If it
were down longer (unlikely with a hot spare ready to go, but that's beside
the point), some stuff would just bounce and the senders would resend it.
Life goes on.

Regards,
jp

Re: How to report 120,000 spams a day

Posted by SM <sm...@resistor.net>.

At 11:47 10-03-2008, Bob Proulx wrote:
>What would have been the downside of *not* having a backup MX?  The

Loss of mail.

>mail would have remained in the mailqueue.  Comcast, AOL, Yahoo,
>Gmail, corporate servers, private servers, etc. would have retried to
>send the mail to you later.  When your main mail relay came online
>they would have retried and delivered it.  There would have been NO
>DIFFERENCE at all.  You didn't need your backup MX relay to proxy
>relay the mail to you.

The difference is that you are making assumptions about their retry strategy.

Regards,
-sm

Re: How to report 120,000 spams a day

Posted by Bob Proulx <bo...@proulx.com>.

Tuc at T-B-O-H.NET wrote:
> 	Everyone keeps telling me to push the userlist out to the
> MX. This isn't possible, since everything is handled in virtusertable.
> So then they tell me to push the virtusertable out to the MX's.

You are beginning to understand why MX relays are recommended against.
They don't really serve a good purpose today.  They do cause hard
problems to solve.  If you do need one then you also need to solve the
hard problems that it pulls in too.

Mail transfer agents can retry mail delivery.  They don't need to
deliver it elsewhere if your main mail server is offline.  They can wait
and send it later when it is back online.

> So I've asked multiple people multiple times how, using sendmail
> on an MX that's not a final delivery server, to use the virtusertable

Did you ask that on the sendmail users mailing list?  That would be
the place.  I wouldn't recommend using Sendmail anymore.  I generally
recommend Postfix, but Exim is also a fine MTA.

> 	I think the issue most people are having is that they
> have the luxury that every MX in their list is a final delivery
> host. We don't. MX's for us fall under the heading of "If the
> sole final delivery host is too overburdened, or is down
> for maintenance, hold the mail at least until it comes back".

These days many people think that is not a worthwhile reason to have a
backup MX.  (I am one of those people.)  Because of this, no one has
solved it, since no one wants to work on it.  It is your
server, and it is okay for you to want to do this.  But since you are
going against current best practices, fewer people care
about solving that problem, which means that you would need to do it
yourself.  But if you come up with a nice solution to the problem, other
people who think like you do will be grateful for your efforts.  If it
is a really nice solution, it might even come back into favor
as an okay way to do things.

> That REALLY REALLY worked well for us when the datacenter we
> were at in NYC went down during 9/11 because the National 
> Guard stopped a fuel delivery truck for an hour. Our MX
> was uptown. When we finally came back online.

What would have been the downside of *not* having a backup MX?  The
mail would have remained in the mailqueue.  Comcast, AOL, Yahoo,
Gmail, corporate servers, private servers, etc. would have retried to
send the mail to you later.  When your main mail relay came online
they would have retried and delivered it.  There would have been NO
DIFFERENCE at all.  You didn't need your backup MX relay to proxy
relay the mail to you.

Bob

Re: How to report 120,000 spams a day

Posted by Kelson <ke...@speed.net>.

Sandy S wrote:
> OK, I admit I haven't been following this thread closely, so I may have
> missed something and maybe my suggestion won't fit your needs.  However,
> we're accomplishing something like what you describe above using
> MIMEDefang.  The MIMEDefang milter includes a function called
> md_check_against_smtp_server, which checks the recipient address against
> whatever SMTP server you give it (in effect, against that server's
> virtusertable).  If it's not a valid user, voila: the message is rejected
> during MIMEDefang processing, i.e. as soon as the connecting server has
> provided the recipient address, before the whole message has been
> transmitted.  Otherwise, processing and mail delivery continue as normal.

You beat me to it!

I'll just add that people have discussed alternate solutions on the MD 
archives that, instead of using md_check_against_smtp_server, involve 
exporting the list to the remote MX so that it can still query that
information if/when the primary is unavailable.

Looking through the MIMEDefang mailing list archives is left as an 
exercise for the reader.

-- 
Kelson Vibber
SpeedGate Communications <www.speed.net>

Re: How to report 120,000 spams a day

Posted by Sandy S <sa...@boreal.org>.

Tuc at T-B-O-H.NET wrote:
> Hi,
>
> 	Everyone keeps telling me to push the userlist out to the
> MX. This isn't possible, since everything is handled in virtusertable.
> So then they tell me to push the virtusertable out to the MX's.
> So I've asked multiple people multiple times how, using sendmail
> on an MX that's not a final delivery server, to use the virtusertable
> to accept the mail, process it against the virtusertable, and then,
> when the final delivery server is contactable, send it there. From
> what I've read, no one can tell me. Maybe I'm missing a fundamental
> fact. Are virtusertables checked during non final delivery MX
> handling in sendmail?
>
>
>   
OK, I admit I haven't been following this thread closely, so I may have
missed something and maybe my suggestion won't fit your needs.  However,
we're accomplishing something like what you describe above using
MIMEDefang.  The MIMEDefang milter includes a function called
md_check_against_smtp_server, which checks the recipient address against
whatever SMTP server you give it (in effect, against that server's
virtusertable).  If it's not a valid user, voila: the message is rejected
during MIMEDefang processing, i.e. as soon as the connecting server has
provided the recipient address, before the whole message has been
transmitted.  Otherwise, processing and mail delivery continue as normal.

The man pages warn about this causing a little more overhead on the 
server you're checking along with extra log entries, but it has not been 
a problem here.
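The call-ahead idea described above can be sketched with Python's standard smtplib module. This is not MIMEDefang's md_check_against_smtp_server (that is a Perl function inside MIMEDefang); it is an illustration of the same technique, assuming the final delivery host answers RCPT TO with a 2xx code for real users and a 5xx code for unknown ones. The function names, the HELO name, and the `when_unreachable` policy knob are hypothetical.

```python
import smtplib


def classify_rcpt_reply(code: int) -> str:
    """Turn the final host's reply to RCPT TO into a decision for the MX."""
    if 200 <= code < 300:
        return "accept"    # e.g. 250: the final host knows this recipient
    if 500 <= code < 600:
        return "reject"    # e.g. 550 User unknown: refuse it on the MX too
    return "tempfail"      # 4xx or anything unexpected: let the sender retry


def call_ahead(recipient: str, server: str, helo: str, port: int = 25,
               timeout: float = 10.0, when_unreachable: str = "tempfail") -> str:
    """Ask `server` whether it would take mail for `recipient`; no message is sent.

    `when_unreachable` is a policy choice (hypothetical parameter): a backup MX
    meant to hold mail while the primary is down would pass "accept" here.
    """
    try:
        with smtplib.SMTP(server, port, local_hostname=helo, timeout=timeout) as smtp:
            smtp.ehlo_or_helo_if_needed()
            code, _ = smtp.mail("")           # MAIL FROM:<> (null sender)
            if code != 250:
                return "tempfail"
            code, _ = smtp.rcpt(recipient)    # the actual question
            smtp.rset()                       # abandon the transaction
            return classify_rcpt_reply(code)
    except (OSError, smtplib.SMTPException):
        # Connection refused, timeout, or protocol error talking to the primary.
        return when_unreachable
```

The `when_unreachable` choice matters for a setup like Tuc's: if the MX exists to hold mail while the primary is down, verification is impossible exactly then, so the MX must either accept unverified mail or tempfail it.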

Sandy

Re: How to report 120,000 spams a day

Posted by Richard Frovarp <ri...@sendit.nodak.edu>.

Tuc at T-B-O-H.NET wrote:
>> Seriously...
>>
>> How hard is it to set up the MX boxen to only allow 4 email addresses to pass
>> for that particular domain, rejecting all others in the SMTP conversation?
>>
>> Unless the customer is dropping BIG DADDY $$$ with you, tell him policy
>> change and that he isn't losing any email if you do not do a catchall for
>> his domain
>>
>> That postmaster thing is a monster. Send the postmaster stuff to that
>> customer and see how soon they want it turned off
>>
>> ;->
>>
>> Otherwise do what Kris said and push or pull or whatever all the
>> validrcptto's out to the MX's
>>
>>  - rh
>>
>>     
> Hi,
>
> 	Everyone keeps telling me to push the userlist out to the
> MX. This isn't possible, since everything is handled in virtusertable.
> So then they tell me to push the virtusertable out to the MX's.
> So I've asked multiple people multiple times how, using sendmail
> on an MX that's not a final delivery server, to use the virtusertable
> to accept the mail, process it against the virtusertable, and then,
> when the final delivery server is contactable, send it there. From
> what I've read, no one can tell me. Maybe I'm missing a fundamental
> fact. Are virtusertables checked during non final delivery MX
> handling in sendmail?
>
> 	The postmaster emails are necessary to be able to find
> issues with the systems before clients do. I've caught issues
> with disks going bad, perl updates gone wrong, memory problems,
> and the most recent was that a client was having email sent
> directly to their ISP, who finally decided I was a spammer. The
> "5 days worth of attempts" finally expired and I started seeing
> all the upchuck from the system. If I turn postmaster bounce off,
> I lose that. But yeah, it might become something I have to do.
> Lose the ability to monitor things happening on my systems in
> the name of spam.
>
> 	I think the issue most people are having is that they
> have the luxury that every MX in their list is a final delivery
> host. We don't. MX's for us fall under the heading of "If the
> sole final delivery host is too overburdened, or is down
> for maintenance, hold the mail at least until it comes back".
> That REALLY REALLY worked well for us when the datacenter we
> were at in NYC went down during 9/11 because the National 
> Guard stopped a fuel delivery truck for an hour. Our MX
> was uptown. When we finally came back online.
>
> 	In any case, if someone can explain the mechanics
> of having a sendmail MX that is not the final delivery server
> do localized verification against something and then pass
> it along to the final delivery server please let me know.
> It's not that I don't want to do any of this at all; it's that
> from what I know, at last look, the virtusertable is only
> consulted during final delivery.
>
> 		Thanks, Tuc
>
>   

You can do this in the access table. You say you only have 4 users, so
it isn't going to be much work. Otherwise, you can install smf-sav to do
the call-ahead. I'd probably just do the manual method in the access
table, however. We have several MXes in front of several backends and use
redundant LDAP for our lookup and routing.
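For a handful of addresses, the access-table route can be generated from the list of valid users. This is a minimal Python sketch, assuming the MX runs sendmail with FEATURE(`access_db`) and FEATURE(`blacklist_recipients`), so that "To:" entries are consulted for recipients and a full address is looked up before the bare domain. Those assumptions, and the exact ERROR syntax, should be checked against sendmail's cf/README before use. The function name is hypothetical; the user names are the example ones from this thread.

```python
from typing import Iterable, List


def backup_mx_access_entries(domain: str, valid_users: Iterable[str]) -> List[str]:
    """Build access-table lines that relay known users and refuse everyone else.

    Assumption: sendmail checks the "To:user@domain" key before "To:domain",
    so the RELAY lines win for real users and the ERROR line catches the rest.
    """
    lines = [f"To:{user}@{domain}\tRELAY" for user in sorted(set(valid_users))]
    lines.append(f'To:{domain}\tERROR:"550 User unknown"')
    return lines


if __name__ == "__main__":
    for line in backup_mx_access_entries("example.com",
                                         ["bingo", "bangob", "bongo", "irving"]):
        print(line)
```

The printed lines would be added to the access source file and the map rebuilt, for example with `makemap hash /etc/mail/access < /etc/mail/access`; the file location depends on the installation.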

Re: How to report 120,000 spams a day

Posted by Yet Another Ninja <sa...@alexb.ch>.

On 3/10/2008 7:15 PM, Tuc at T-B-O-H.NET wrote:

> 	In any case, if someone can explain the mechanics
> of having a sendmail MX that is not the final delivery server
> do localized verification against something and then pass
> it along to the final delivery server please let me know.
> It's not that I don't want to do any of this at all; it's that
> from what I know, at last look, the virtusertable is only
> consulted during final delivery.

"milter-ahead"

http://snertsoft.com is your friend

easy to use and rock solid

Re: How to report 120,000 spams a day

Posted by Joseph Brennan <br...@columbia.edu>.

> So then they tell me to push the virtusertable out to the MX's.
> So I've asked multiple people multiple times how, using sendmail
> on an MX that's not a final delivery server, to use the virtusertable
> to accept the mail, process it against the virtusertable, and then,
> when the final delivery server is contactable, send it there. From
> what I've read, no one can tell me.


What's the problem?  Sendmail will resolve the recipients using
virtusertable, and queue and retry until it can send.  There's
nothing to it.

Joseph Brennan
Columbia University Information Technology

Re: How to report 120,000 spams a day

Posted by "Tuc at T-B-O-H.NET" <ml...@t-b-o-h.net>.

> 
> Seriously...
> 
> How hard is it to set up the MX boxen to only allow 4 email addresses to pass
> for that particular domain, rejecting all others in the SMTP conversation?
> 
> Unless the customer is dropping BIG DADDY $$$ with you, tell him policy
> change and that he isn't losing any email if you do not do a catchall for
> his domain
> 
> That postmaster thing is a monster. Send the postmaster stuff to that
> customer and see how soon they want it turned off
> 
> ;->
> 
> Otherwise do what Kris said and push or pull or whatever all the
> validrcptto's out to the MX's
> 
>  - rh
> 
Hi,

	Everyone keeps telling me to push the userlist out to the
MX. This isn't possible, since everything is handled in virtusertable.
So then they tell me to push the virtusertable out to the MX's.
So I've asked multiple people multiple times how, using sendmail
on an MX that's not a final delivery server, to use the virtusertable
to accept the mail, process it against the virtusertable, and then,
when the final delivery server is contactable, send it there. From
what I've read, no one can tell me. Maybe I'm missing a fundamental
fact. Are virtusertables checked during non final delivery MX
handling in sendmail?

	The postmaster emails are necessary to be able to find
issues with the systems before clients do. I've caught issues
with disks going bad, perl updates gone wrong, memory problems,
and the most recent was that a client was having email sent
directly to their ISP, who finally decided I was a spammer. The
"5 days worth of attempts" finally expired and I started seeing
all the upchuck from the system. If I turn postmaster bounce off,
I lose that. But yeah, it might become something I have to do.
Lose the ability to monitor things happening on my systems in
the name of spam.

	I think the issue most people are having is that they
have the luxury that every MX in their list is a final delivery
host. We don't. MX's for us fall under the heading of "If the
sole final delivery host is too overburdened, or is down
for maintenance, hold the mail at least until it comes back".
That REALLY REALLY worked well for us when the datacenter we
were at in NYC went down during 9/11 because the National 
Guard stopped a fuel delivery truck for an hour. Our MX
was uptown. When we finally came back online.

	In any case, if someone can explain the mechanics
of having a sendmail MX that is not the final delivery server
do localized verification against something and then pass
it along to the final delivery server please let me know.
It's not that I don't want to do any of this at all; it's that
from what I know, at last look, the virtusertable is only
consulted during final delivery.

		Thanks, Tuc

RE: [spamassassin] Re: How to report 120,000 spams a day

Posted by Robert - elists <li...@abbacomm.net>.

Seriously...

How hard is it to set up the MX boxen to only allow 4 email addresses to pass
for that particular domain, rejecting all others in the SMTP conversation?

Unless the customer is dropping BIG DADDY $$$ with you, tell him policy
change and that he isn't losing any email if you do not do a catchall for
his domain

That postmaster thing is a monster. Send the postmaster stuff to that
customer and see how soon they want it turned off

;->

Otherwise do what Kris said and push or pull or whatever all the
validrcptto's out to the MX's

 - rh

[spamassassin] Re: How to report 120,000 spams a day

Posted by Kris Deugau <kd...@vianet.ca>.

Tuc at T-B-O-H.NET wrote:
> 	There are "considerations" in doing this. Right now,
> all my systems are set up running sendmail, and all with the
> config of :
> 
> 	define(`confCOPY_ERRORS_TO',`Postmaster')
> 
> 	As such, true to its name, anytime there is an error, the
> postmaster gets a copy. 120K copies of
[snip]

... eww.  <g>

> 	isn't acceptable. Yes, I could take out the COPY_ERRORS_TO,
> but we also run a lot of things that are piped to programs, and we
> usually don't see the errors unless that is set.

... O_o   Like what?  I'm sure there are better ways to receive these
other messages without relying on something of a hack to get them.  I'd
never enable that on any production system I maintain;  the
(legitimate!) mail volume alone would generate far more error messages
than would be worth wading through, most of which I really don't need to
know about.  (Do you *really* want to get copies of every postmaster
response to a legitimate user's mistyped outbound mail?)

For instance, systems here have one of our NOC staff aliases set as the 
cron mailto;  in the event of a cronjob failure, off goes the mail to 
the people who can deal with it.  Many tasks send email to a specific 
person or alias;  and if mail falls apart completely we have the 
capability to send to pagers or SMS cell phones.

> 	Even if I did that, though, the next thing I run into is
> MX's. The MX blindly accepts the mail.

Push a user list out to the MX.  Seriously.  Blind relays like that are, 
um, nasty.  Mail forwarding is slightly less nasty (you usually only 
have *one* destination address instead of any destination attracting 
spam).  I've been there;  on a legacy system here I stopped relaying 
mail for domains I don't have a user list for some time ago - the 
limited benefit it offered in getting mail to the customer faster wasn't 
worth the glop in the queue, the postmaster mess, or the hardware and 
staff-time cost.  (Now to convince head office...  <g>)

If you can't cut down the volume on the front-line MX, you *will* have 
to spend CPU and/or disk, somewhere, to deal with the mess.  Feeding it 
to /dev/null as you've been doing is probably about as cheap as you can get.

And as others have noted, it's a tainted feed as a "spamtrap";  you'd 
still have to postprocess it to some degree to make it useful anyway.

-kgd

Re: [spamassassin] RE: [spamassassin] Re: [spamassassin] Re: How to report 120,000 spams a day

Posted by "Tuc at T-B-O-H.NET" <ml...@t-b-o-h.net>.

> 
> > Hi,
> > 
> > 	Thanks for the reply. Inasmuch as I'd like to help the community,
> > I'm under a set of constraints. Setting up a whole other server to do
> > this isn't something that fits under those constraints. It looks like
> > I'll probably just end up having to /dev/null them as I have been.
> > 
> > 		Tuc
> 
> Tuc
> 
> Didn't it come out that you were accepting emails to any email address
> whether it is a valid email address or not?
> 
> If so, that is where to start...
> 
> do not accept those emails... reject them properly.
> 
>  - rh
>
	There are "considerations" in doing this. Right now,
all my systems are set up running sendmail, and all with the
config of :

	define(`confCOPY_ERRORS_TO',`Postmaster')

	As such, true to its name, anytime there is an error, the
postmaster gets a copy. 120K copies of 

The original message was received at Sun, 9 Mar 2008 15:12:41 -0400 (EDT)
from pD9E3AE30.dip.t-dialin.net [217.227.174.48]

   ----- The following addresses had permanent fatal errors -----
<ho...@example.com>
    (reason: 550 5.0.0 <ho...@example.com>... No such user here)

   ----- Transcript of session follows -----
... while talking to smtp.example.com.:
>>> DATA
<<< 550 5.0.0 <ho...@example.com>... No such user here
550 5.1.1 <ho...@example.com>... User unknown
<<< 503 5.0.0 Need RCPT (recipient)

   ----- Message header follows -----

	isn't acceptable. Yes, I could take out the COPY_ERRORS_TO,
but we also run a lot of things that are piped to programs, and we
usually don't see the errors unless that is set. If there is some way
to have my errors copied to me, but not the "User unknown" ones, then I'll
implement it. My way of preventing it from happening, while still seeing
my errors, was to /dev/null addresses that don't exist. I could have
the COPY_ERRORS_TO copies sent to a special user that uses procmail to weed
them out, but that defeats my attempts to reduce disk wear
and tear, CPU, etc.

	Even if I did that, though, the next thing I run into is
MXes. The MX blindly accepts the mail. If the destination server
rejects it, the original sender is usually forged or invalid,
etc. That causes a mail spool backup on the MX host until the
message finally errors out after 5 days of failed delivery attempts.

	I'd love to take advantage of the functionality ZoneEdit
(my DNS provider) offers and let them scan and forward the email.
However, given the volume of email and data involved, I think the
cost would be more than I care to pay, given it's a "favor" account.
(That's also why setting up another server doesn't make sense.)

		Tuc

RE: [spamassassin] Re: [spamassassin] Re: How to report 120,000 spams a day

Posted by Robert - elists <li...@abbacomm.net>.

> Hi,
> 
> 	Thanks for the reply. Inasmuch as I'd like to help the community,
> I'm under a set of constraints. Setting up a whole other server to do
> this isn't something that fits under those constraints. It looks like
> I'll probably just end up having to /dev/null them as I have been.
> 
> 		Tuc

Tuc

Didn't it come out that you were accepting emails to any email address
whether it is a valid email address or not?

If so, that is where to start...

do not accept those emails... reject them properly.

 - rh

Was: How to report 120,000 spams a day

Posted by "Tuc at T-B-O-H.NET" <ml...@t-b-o-h.net>.

Hi,

	I wanted to thank everyone who responded both on and off list. 

	In the end there was still a lot of confusion from people about my
configuration, my intentions, my setup, some things I said... But it's
really not worth rehashing again. The end result is I've changed my
setup.

	The other good that came out of this is that my Seti@Home 
Recent Average Credit went up by 10% total.

	Thanks again,

		Tuc

Re: [spamassassin] Re: How to report 120,000 spams a day

Posted by Marc Perkel <ma...@perkel.com>.

SM wrote:
> At 17:51 08-03-2008, Tuc at T-B-O-H.NET wrote:
>>         As part of it all, I also want to try to keep disk usage and CPU
>> down to as little as possible. With 120,000 per day, that's a junk mail
>> every 3/4 of a second. Since I have it set to deliver to /dev/null, I
>> reduce the amount of disk usage. I'm looking for a solution that would
>> be easy on the disk and easy on the CPU.  So something directly out of
>> the MTA (sendmail) would be great, or something where the delivery would
>> not store it locally.
>
> Rewrite the recipient address of these emails to another address.  
> That should reduce disk usage on that server and filtering load.  You 
> can run the reporting on another server.  It can be done hourly by 
> processing the mailbox instead of one message at a time.  That would 
> require some code changes.
>
> Regards,
> -sm
>

I'm doing spam reporting and I'm using Exim to do it. I'm not sure what
you are trying to do, but I've configured Exim to try one time to send an
abuse report, and if it fails, it goes to /dev/null. In order to get speed,
I mount a RAM disk and put my queues in RAM, which makes it run really
fast. I'm delivering about 10k abuse reports an hour and not having any
problems. It runs fine as a separate server in a VPS.

Re: [spamassassin] Re: [spamassassin] Re: How to report 120,000 spams a day

Posted by "Tuc at T-B-O-H.NET" <ml...@t-b-o-h.net>.

> 
> At 17:51 08-03-2008, Tuc at T-B-O-H.NET wrote:
> >         As part of it all, I also want to try to keep disk usage and CPU
> >down to as little as possible. With 120,000 per day, that's a junk mail
> >every 3/4 of a second. Since I have it set to deliver to /dev/null, I
> >reduce the amount of disk usage. I'm looking for a solution that would be
> >easy on the disk and easy on the CPU.  So something directly out of the MTA
> >(sendmail) would be great, or something where the delivery would not store
> >it locally.
> 
> Rewrite the recipient address of these emails to another 
> address.  That should reduce disk usage on that server and filtering 
> load.  You can run the reporting on another server.  It can be done 
> hourly by processing the mailbox instead of one message at a 
> time.  That would require some code changes.
> 
> Regards,
> -sm 
> 
Hi,

	Thanks for the reply. Inasmuch as I'd like to help the community,
I'm under a set of constraints. Setting up a whole other server to do
this isn't something that fits under those constraints. It looks like
I'll probably just end up having to /dev/null them as I have been.

		Tuc

Re: [spamassassin] Re: How to report 120,000 spams a day

Posted by SM <sm...@resistor.net>.

At 17:51 08-03-2008, Tuc at T-B-O-H.NET wrote:
>         As part of it all, I also want to try to keep disk usage and CPU
>down to as little as possible. With 120,000 per day, that's a junk mail
>every 3/4 of a second. Since I have it set to deliver to /dev/null, I
>reduce the amount of disk usage. I'm looking for a solution that would be
>easy on the disk and easy on the CPU.  So something directly out of the MTA
>(sendmail) would be great, or something where the delivery would not store
>it locally.

Rewrite the recipient address of these emails to another 
address.  That should reduce disk usage on that server and filtering 
load.  You can run the reporting on another server.  It can be done 
hourly by processing the mailbox instead of one message at a 
time.  That would require some code changes.
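A minimal Python sketch of the hourly batch idea, assuming the rewritten recipient delivers into a single mbox file and that each message is reported by piping it to `spamassassin -r`, as Tuc already does for his smaller users. The function names are hypothetical. Messages are removed only after a successful report, so failures stay in the mailbox for the next run.

```python
import mailbox
import subprocess
from typing import Callable, Tuple


def spamassassin_report(raw: bytes) -> bool:
    """Report one message by piping it to `spamassassin -r`; True if it exited 0."""
    result = subprocess.run(["spamassassin", "-r"], input=raw,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def process_spool(path: str,
                  report: Callable[[bytes], bool] = spamassassin_report) -> Tuple[int, int]:
    """Report every message in the mbox at `path` in one pass.

    Returns (reported, failed). Reported messages are deleted from the mbox;
    failed ones are kept so the next run can try again.
    """
    box = mailbox.mbox(path)
    box.lock()  # dot-lock plus fcntl lock, so a cooperating delivery agent waits
    reported = failed = 0
    try:
        for key in list(box.keys()):
            if report(box.get_bytes(key)):
                box.remove(key)
                reported += 1
            else:
                failed += 1
        box.flush()  # rewrite the mbox without the removed messages
    finally:
        box.unlock()
        box.close()
    return reported, failed
```

Run from cron once an hour against the mailbox the rewritten address delivers to, e.g. `process_spool("/var/mail/probably-spam")` (a hypothetical path). The `report` argument lets another reporter, or a test stub, be swapped in.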

Regards,
-sm

Re: How to report

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

> > >etc. I have in my sendmail virtusertable:
> > >*@example.com                           nobody

> > The above is incorrect as there is still a processing overhead.  I 
> > suggest using:
> > 
> > @example.com                           error:nouser User unknown

On 09.03.08 15:05, Tuc at T-B-O-H.NET wrote:
> 	Can't do that as much as I'd like to. Mail comes through
> an MX. The MX just passes it along. When the final machine errors
> it out, the MX is stuck with trying to get rid of it. The postmaster
> also ends up getting a copy of the emails (Yes, I could turn that off,
> but for the number of times it's pointed out potential hacks, system
> issues, etc, I'd rather not. ).

It would be good to set up LDAP or a similar address lookup, so that even
the MX relay can reject mail to unknown users.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Microsoft dick is soft to do no harm

Re: [spamassassin] Re: [spamassassin] Re: [spamassassin] Re: How to report

Posted by "Tuc at T-B-O-H.NET" <ml...@t-b-o-h.net>.

> 
> I see delivery attempts to invalid email address regularly.  They get 
> rejected at the SMTP level.  Running such messages through 
> SpamAssassin doesn't make sense.  Your previous message mentioned 
> that you wanted to report these "spam" messages and my reply was 
> based upon that.
>
	I don't run them through SA. I /dev/null them. They go
to email addresses that don't exist, 120K of them a day to
a SINGLE domain; they are spam and don't even need to be run through
SA or anything else. They get discarded as soon as they arrive.
> 
> >etc. I have in my sendmail virtusertable:
> >
> >bingo@example.com                       bingo
> >bangob@example.com                      bango
> >bongo@example.com                       bongo
> >irving@example.com                      irving
> >*@example.com                           nobody
> 
> The above is incorrect as there is still a processing overhead.  I 
> suggest using:
> 
> @example.com                           error:nouser User unknown
> 
	I can't do that, as much as I'd like to. Mail comes through
an MX. The MX just passes it along. When the final machine errors
it out, the MX is stuck with trying to get rid of it. The postmaster
also ends up getting a copy of the emails (Yes, I could turn that off,
but for the number of times it's pointed out potential hacks, system
issues, etc., I'd rather not).

			Tuc

Re: [spamassassin] Re: [spamassassin] Re: How to report 120,000 spams a day

Posted by SM <sm...@resistor.net>.

At 11:01 09-03-2008, Tuc at T-B-O-H.NET wrote:
>         I guess I'm still not being clear. There are 120K emails a day coming
>to INVALID EMAIL ADDRESSES THAT NEVER EXISTED. It's not a case of a user being
>fickle, it's a case that they are emailing addresses that NEVER EVER ACTUALLY
>EXISTED. About 1 every 3/4 of a second. So running them through ANYTHING is
>counterproductive since, at least in my eyes, if you try to email an email
>address that never existed... IT'S SPAM. It's not things the user ever
>sees/knows,

I see delivery attempts to invalid email addresses regularly.  They get 
rejected at the SMTP level.  Running such messages through 
SpamAssassin doesn't make sense.  Your previous message mentioned 
that you wanted to report these "spam" messages and my reply was 
based upon that.

>etc. I have in my sendmail virtusertable:
>
>bingo@example.com                       bingo
>bangob@example.com                      bango
>bongo@example.com                       bongo
>irving@example.com                      irving
>*@example.com                           nobody

The above is incorrect as there is still a processing overhead.  I 
suggest using:

@example.com                           error:nouser User unknown
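Why the catch-all target matters can be shown with a simplified Python model of a virtusertable lookup: try the full address first, then fall back to the `@domain` entry (the form used in the line above; Tuc's table writes it as `*@example.com`). This is only an illustration under that two-step assumption; it ignores sendmail's +detail matching and other lookup steps, and the table and function names are hypothetical.

```python
from typing import Optional

# Simplified model of a virtusertable lookup: exact address first, then the
# "@domain" catch-all. Illustration only: sendmail's +detail matching and
# other lookup steps are deliberately left out.
ACCEPT_ALL = {
    "bingo@example.com": "bingo",
    "bongo@example.com": "bongo",
    "@example.com": "nobody",  # catch-all: every unknown address is accepted
}
REJECT_UNKNOWN = {
    "bingo@example.com": "bingo",
    "bongo@example.com": "bongo",
    "@example.com": "error:nouser User unknown",  # catch-all: refuse unknowns
}


def lookup(table: dict, address: str) -> Optional[str]:
    """Return the right-hand side that matches address, or None."""
    address = address.lower()  # case folded for simplicity
    if address in table:
        return table[address]
    _, _, domain = address.partition("@")
    return table.get("@" + domain)


def outcome(table: dict, address: str) -> str:
    """Describe what happens to mail for address under the given table."""
    target = lookup(table, address)
    if target is None:
        return "no virtual mapping"
    if target.startswith("error:"):
        # "error:nouser User unknown" -> reply text after the first space
        return "rejected during SMTP: " + target.split(" ", 1)[1]
    return "accepted and delivered to " + target
```

With `ACCEPT_ALL`, `outcome(ACCEPT_ALL, "bungo@example.com")` returns "accepted and delivered to nobody", meaning the whole message has already been received and processed; with `REJECT_UNKNOWN` the same address is refused before the body is transferred.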

Regards,
-sm

Re: [spamassassin] Re: [spamassassin] Re: How to report 120,000 spams

Posted by Shane Williams <sh...@shanew.net>.

On Sun, 9 Mar 2008, Tuc at T-B-O-H.NET wrote:

> 	But it still remains, I'm looking to find what people think is
> the best way on an MX host to do the rejecting at SMTP time.

I'm coming to this conversation kind of late, so I apologize if I've
missed something important earlier in the thread, but it sounds like
what you want is a call-ahead milter (an example is at
http://www.snertsoft.com/sendmail/milter-ahead/).

Call-ahead milters allow MXs to contact the ultimate destination
server, determine whether an address is valid at the end of the
line and then take various actions, including rejection.
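The call-ahead idea can be sketched with Python's standard `smtplib`. This is not how milter-ahead is implemented; it only shows the SMTP dialogue such a check performs: connect to the final destination, send `MAIL FROM:<>` and `RCPT TO` for the address, and map the reply code to an action on the MX. The function names and the choice to temp-fail whenever the answer is unclear are assumptions.

```python
import smtplib


def classify(code: int) -> str:
    """Map the destination's RCPT TO reply code to an action for the MX."""
    if 200 <= code < 300:
        return "accept"
    if 500 <= code < 600:
        return "reject"  # e.g. 550 User unknown: refuse at the MX as well
    return "tempfail"  # 4xx or anything unexpected: answer 4xx, sender retries


def call_ahead(host: str, recipient: str, port: int = 25,
               timeout: float = 10.0) -> str:
    """Ask the final destination whether it would accept mail for recipient.

    No message is sent: the session stops after RCPT TO and the
    context manager closes it with QUIT.
    """
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.ehlo_or_helo_if_needed()
            code, _ = smtp.mail("<>")  # null sender, as used for bounces
            if code != 250:
                return "tempfail"  # cannot judge the recipient; do not guess
            code, _ = smtp.rcpt(recipient)
    except (OSError, smtplib.SMTPException):
        return "tempfail"  # destination unreachable or misbehaving
    return classify(code)
```

A production setup would presumably cache the answers, since otherwise every one of the 120K daily attempts turns into a second SMTP session to the destination server.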

-- 
Public key #7BBC68D9 at            |                 Shane Williams
http://pgp.mit.edu/                |      System Admin - UT iSchool
=----------------------------------+-------------------------------
All syllogisms contain three lines |              shanew@shanew.net
Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew

Re: [spamassassin] Re: [spamassassin] Re: How to report 120,000 spams

Posted by "Tuc at T-B-O-H.NET" <ml...@t-b-o-h.net>.

> > 	Bango said that if his mom can't spell his name right, he doesn't
> > care if he gets her emails. :)
> >   
> 
> fair enough (he can also discard delivered mail anyway). but I've seen a 
> lot of people subscribing to services with a mistyped address (their 
> own) and then calling us to complain that they didn't get the 
> confirmation request...
> 
> anyway, your "corpus" is probably usable provided one uses heuristics to 
> avoid hitting possible ham (for example by computing a distance between 
> the recipient address and your valid addresses to make sure the 
> recipient address is not mistyped, etc.). but I still believe it 
> should be "reduced" by rejecting mail at smtp time and only keeping some 
> selected "trap" addresses (for example /\d{5}.*@example\.com$/ to catch 
> attempts to use a phone-like address).
> 
	The bango/mom thing was a joke. Not to make the situation any
worse, but the user has never called me wondering where an email he was
expecting went. Then again, I rarely hear from the user, period.

	Anyway, I'm fine with the 120,000 mails now being considered
useless in the long run. At least 2 people put it well enough to me that
I get it. I'm fine with not having ANY spam traps either.

	But it still remains, I'm looking to find what people think is
the best way on an MX host to do the rejecting at SMTP time.

		Thanks, Tuc

Re: [spamassassin] Re: How to report 120,000 spams

Posted by mouss <mo...@netoyen.net>.

Tuc at T-B-O-H.NET wrote:
>> If you are proposing some kind of checksums or other types of 'message
>> identifying' techniques on the messages,  those few mistyped addresses
>> could certainly make a difference for your site.   What if bongo's mom
>> mistypes to bungo, realizes her mistake and resends it to bongo a few
>> minutes later.  It is quite likely that the valid message will be
>> rejected now since it's (almost) identical to the one your proposed
>> system just marked as spam.  What if bongo signs up for a mailing
>> list and mistypes his own email address (yes, this happens).  Now your
>> system marks all list mailings as spam, so everyone using your system
>> starts losing their copies of the mailing list messages too?
>>
>>     
> 	Bango said that if his mom can't spell his name right, he doesn't
> care if he gets her emails. :)
>   

fair enough (he can also discard delivered mail anyway). but I've seen a 
lot of people subscribing to services with a mistyped address (their 
own) and then calling us to complain that they didn't get the 
confirmation request...

anyway, your "corpus" is probably usable provided one uses heuristics to 
avoid hitting possible ham (for example by computing a distance between 
the recipient address and your valid addresses to make sure the 
recipient address is not mistyped, etc.). but I still believe it 
should be "reduced" by rejecting mail at smtp time and only keeping some 
selected "trap" addresses (for example /\d{5}.*@example\.com$/ to catch 
attempts to use a phone-like address).
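The heuristics described above can be sketched in Python: reuse the phone-like trap pattern as a regular expression, and use a Levenshtein (edit) distance to set aside addresses that are suspiciously close to a real mailbox. The threshold of 2 edits, the label strings and the function names are assumptions for illustration, not anything proposed in the thread.

```python
import re

VALID = ["bingo", "bango", "bongo", "irving"]  # local parts from the thread
TRAP = re.compile(r"\d{5}.*@example\.com$")    # the phone-like trap pattern


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def classify_invalid(address: str, max_distance: int = 2) -> str:
    """Label mail to an invalid address: 'trap', 'possible-typo' or 'unknown'."""
    address = address.lower()
    if TRAP.search(address):
        return "trap"
    local = address.split("@", 1)[0]
    if min(edit_distance(local, v) for v in VALID) <= max_distance:
        return "possible-typo"  # too close to a real mailbox to trust as spam
    return "unknown"
```

For example, `bungo@example.com` is one edit away from `bongo` and would be kept out of any spam corpus, while an address starting with five digits would be treated as a trap hit.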

> 	I'm not proposing anything. I originally wanted to see if there
> was some way that these 120,000 emails that don't go to a valid/usable
> end user could be used to help the community out in some way. I had 2
> filtering systems agree to do something with them, but for reasons I'd
> rather not share, neither one worked out. (One may still yet; I'm not
> sure, waiting to hear back.)
>
> 	We also don't do sitewide Bayes/etc. We do it per receiving user.
> For this domain, it just happens that all 4 users of the domain
> constitute a single receiving user. I realize that collectively this
> list could propose well over 5000 reasons that make sense why "good" 
> mail could be part of that 120,000. I just didn't think the ever so
> insignificant percentage mattered. For as much as spam gets through,
> and good mail gets marked bad also, I thought this was "acceptable".
>   
>> I think you have good intentions but the source of your data is flawed
>> for anything but maybe limited statistical training.  Unfortunately it
>> probably is not great for that either, since the mail you are seeing
>> for nonexistent users is probably not at all similar to the mix of
>> spam you get to real accounts.  The scanner would end up biased
>> towards whatever junk the spammers desperate enough to use
>> dictionaries send, which would drown out the stats from those spams
>> that are actually difficult to detect.
>>
>>     
> 	Ok, very valid point that makes a lot of sense. Thank you.
>   
>> Why do you accept messages for nonexistent accounts?  You're wasting
>> bandwidth, regardless of what you do or don't do with the junk after
>> you accept it.  From the sound of it you could reduce your mail
>> bandwidth to a tiny fraction of what it is now by just refusing this
>> stuff (which is what most everyone else does, AFAIK).
>>
>>     
> 	How do you do it on MX hosts? I realize that if I stop
> the wildcard acceptance and stop copying errors to postmaster that
> I can do it on the destination server. However, due to circumstances
> out of my control for the next few months, all email arrives to the
> main mail server via MXs ONLY.
>
> 		Thanks, Tuc
>

Re: [spamassassin] Re: How to report 120,000 spams

Posted by "Tuc at T-B-O-H.NET" <ml...@t-b-o-h.net>.

> 
> If you are proposing some kind of checksums or other types of 'message
> identifying' techniques on the messages,  those few mistyped addresses
> could certainly make a difference for your site.   What if bongo's mom
> mistypes to bungo, realizes her mistake and resends it to bongo a few
> minutes later.  It is quite likely that the valid message will be
> rejected now since it's (almost) identical to the one your proposed
> system just marked as spam.  What if bongo signs up for a mailing
> list and mistypes his own email address (yes, this happens).  Now your
> system marks all list mailings as spam, so everyone using your system
> starts losing their copies of the mailing list messages too?
>
	Bango said that if his mom can't spell his name right, he doesn't
care if he gets her emails. :)

	I'm not proposing anything. I originally wanted to see if there
was some way that these 120,000 emails that don't go to a valid/usable
end user could be used to help the community out in some way. I had 2
filtering systems agree to do something with them, but for reasons I'd
rather not share, neither one worked out. (One may still yet; I'm not
sure, waiting to hear back.)

	We also don't do sitewide Bayes/etc. We do it per receiving user.
For this domain, it just happens that all 4 users of the domain
constitute a single receiving user. I realize that collectively this
list could propose well over 5000 reasons that make sense why "good" 
mail could be part of that 120,000. I just didn't think the ever so
insignificant percentage mattered. For as much as spam gets through,
and good mail gets marked bad also, I thought this was "acceptable".
>
> I think you have good intentions but the source of your data is flawed
> for anything but maybe limited statistical training.  Unfortunately it
> probably is not great for that either, since the mail you are seeing
> for nonexistent users is probably not at all similar to the mix of
> spam you get to real accounts.  The scanner would end up biased
> towards whatever junk the spammers desperate enough to use
> dictionaries send, which would drown out the stats from those spams
> that are actually difficult to detect.
>
	Ok, very valid point that makes a lot of sense. Thank you.
> 
> Why do you accept messages for nonexistent accounts?  You're wasting
> bandwidth, regardless of what you do or don't do with the junk after
> you accept it.  From the sound of it you could reduce your mail
> bandwidth to a tiny fraction of what it is now by just refusing this
> stuff (which is what most everyone else does, AFAIK).
> 
	How do you do it on MX hosts? I realize that if I stop
the wildcard acceptance and stop copying errors to postmaster that
I can do it on the destination server. However, due to circumstances
out of my control for the next few months, all email arrives to the
main mail server via MXs ONLY.

		Thanks, Tuc

Re: How to report 120,000 spams

Posted by Aaron Wolfe <aa...@gmail.com>.

On Sun, Mar 9, 2008 at 8:53 PM, Tuc at T-B-O-H <ml...@t-b-o-h.net> wrote:
> >
>  > Tuc at T-B-O-H.NET wrote:
>  > >     I guess I'm still not being clear. There are 120K emails a day coming
>  > > to INVALID EMAIL ADDRESSES THAT NEVER EXISTED. It's not a case of a user being
>  > > fickle, it's a case that they are emailing addresses that NEVER EVER ACTUALLY
>  > > EXISTED. About 1 every 3/4 of a second. So running them through ANYTHING is
>  > > counterproductive since, at least in my eyes, if you try to email an email
>  > > address that never existed... IT'S SPAM. It's not things the user ever sees/knows,
>  > > etc. I have in my sendmail virtusertable:
>  > >
>  > > bingo@example.com                   bingo
>  > > bangob@example.com                  bango
>  > > bongo@example.com                   bongo
>  > > irving@example.com                  irving
>  > > *@example.com                               nobody
>  > >
>  > >     The user doesn't even SEE the emails, and processing what they consider
>  > > spam I really don't care about. But getting 120K emails to *@ that are absolutely
>  > > known spam... I would like to help the community out by reporting them to every
>  > > system possible. Yeah, if the added benefit is that the mail bingo, bango, bongo
>  > > and irving get is filtered a little better... I won't complain at all.
>  > >
>  > >                     Tuc
>  > >
>  >
>  > Just because mail goes to invalid addresses does not mean it is spam.
>  > People do mistype addresses sometimes, so this "corpus" is not safe.
>  >
>         Yes, I realize people mistype email addresses. But the domain gets
>  121,000 emails on an average day.
>
>         Of those 121,000 emails a day, 120,000 are to email addresses that
>  aren't one of the 4 known/valid/acceptable ones. What percentage of sent
>  emails would you like to assume are mistyped? One out of 1000? That means
>  121 mistyped email addresses a day? But the other 999 of 1000 aren't valid...
>
>         Of the other 1000 that ARE to the 4 known/valid/acceptable email
>  addresses, about 900 of them are marked by SA as a spam level over 5.
>  Usually WILDLY over 5, like 20's and 30's.
>
>         Of those 100 delivered, 75 of them are rejected by the spam
>  filter (Using a method that violates the standard RFC's according to
>  sendmail) of the "final destination" for all 4 of those email boxes (Yes,
>  bingo, bango, bongo, irving actually all end up forwarded to
>  bingobangobongoirving@satelliteexample.com).
>
>         Of the 25 that make it through, the user tells me 15 of them are
>  usually spam.
>
>         So, 10 VALID/ACCEPTABLE emails a day out of 121,000 emails received
>  a day... or 8 THOUSANDTHS OF A SINGLE PERCENT.
>
>         So, while I definitely don't think people can type bingo, bango,
>  bongo, irving correctly 100% of the time, with a valid email ratio of 8
>  thousandths of a percent, I don't think in the grand scheme of things
>  that mistyped email addresses really account for much/any.
>
>                         Tuc
>
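Tuc's funnel can be replayed step by step. The short Python script below only restates the figures from the quoted message (121,000 received, 1,000 to the real mailboxes, 100 not marked as spam by SA, 25 passing the downstream filter, 10 legitimate) and prints each stage as a share of the daily total; the last stage comes out at about 0.0083%, i.e. the "8 thousandths of a percent" quoted above. The variable and function names are illustrative.

```python
# Tuc's daily figures for example.com, taken from the message quoted above.
TOTAL = 121_000
funnel = [
    ("received", TOTAL),
    ("addressed to the 4 valid mailboxes", 1_000),  # 120,000 went elsewhere
    ("not marked as spam by SA", 100),              # 900 scored over 5
    ("accepted by the downstream filter", 25),      # 75 rejected there
    ("legitimate, per the user", 10),               # 15 still spam
]


def percent_of_total(count: int, total: int = TOTAL) -> float:
    """Share of all received mail, in percent."""
    return 100 * count / total


for stage, count in funnel:
    print(f"{stage:36s} {count:>7,d}  {percent_of_total(count):8.4f}%")
```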

If you are proposing some kind of checksums or other types of 'message
identifying' techniques on the messages,  those few mistyped addresses
could certainly make a difference for your site.   What if bongo's mom
mistypes to bungo, realizes her mistake and resends it to bongo a few
minutes later.  It is quite likely that the valid message will be
rejected now since it's (almost) identical to the one your proposed
system just marked as spam.  What if bongo signs up for a mailing
list and mistypes his own email address (yes, this happens).  Now your
system marks all list mailings as spam, so everyone using your system
starts losing their copies of the mailing list messages too?

I think you have good intentions but the source of your data is flawed
for anything but maybe limited statistical training.  Unfortunately it
probably is not great for that either, since the mail you are seeing
for nonexistent users is probably not at all similar to the mix of
spam you get to real accounts.  The scanner would end up biased
towards whatever junk the spammers desperate enough to use
dictionaries send, which would drown out the stats from those spams
that are actually difficult to detect.

Why do you accept messages for nonexistent accounts?  You're wasting
bandwidth, regardless of what you do or don't do with the junk after
you accept it.  From the sound of it you could reduce your mail
bandwidth to a tiny fraction of what it is now by just refusing this
stuff (which is what most everyone else does, AFAIK).

-Aaron

Re: [spamassassin] Re: [spamassassin] Re: How to report 120,000 spams a day

Posted by mouss <mo...@netoyen.net>.

Tuc at T-B-O-H.NET wrote:
> 	I guess I'm still not being clear. There are 120K emails a day coming
> to INVALID EMAIL ADDRESSES THAT NEVER EXISTED. It's not a case of a user being
> fickle, it's a case that they are emailing addresses that NEVER EVER ACTUALLY
> EXISTED. About 1 every 3/4 of a second. So running them through ANYTHING is
> counterproductive since, at least in my eyes, if you try to email an email
> address that never existed... IT'S SPAM. It's not things the user ever sees/knows,
> etc. I have in my sendmail virtusertable:
>
> bingo@example.com			bingo
> bangob@example.com			bango
> bongo@example.com			bongo
> irving@example.com			irving
> *@example.com				nobody
>
> 	The user doesn't even SEE the emails, and processing what they consider
> spam I really don't care about. But getting 120K emails to *@ that are absolutely
> known spam... I would like to help the community out by reporting them to every
> system possible. Yeah, if the added benefit is that the mail bingo, bango, bongo
> and irving get is filtered a little better... I won't complain at all.
>
> 			Tuc
>   

Just because mail goes to invalid addresses does not mean it is spam.
People do mistype addresses sometimes, so this "corpus" is not safe.

Re: [spamassassin] Re: [spamassassin] Re: How to report 120, 000 spams a day

Posted by Benny Pedersen <me...@junc.org>.

On Sun, March 9, 2008 19:01, Tuc at T-B-O-H.NET wrote:

> I guess I'm still not being clear. There are 120K emails a day coming
> to INVALID EMAIL ADDRESSES THAT NEVER EXISTED.

this is an ERROR IN YOUR DUMB SENDMAIL :=)

> *@example.com				nobody

FIX IT !

Benny Pedersen

Re: [spamassassin] Re: [spamassassin] Re: How to report 120,000 spams a day

Posted by "Tuc at T-B-O-H.NET" <ml...@t-b-o-h.net>.

> 
> Automatic reporting - that's another thing entirely.  As was pointed out in
> previous replies, the user community is not always accurate in reporting
> what is legit spam, and what is/was requested or "permitted".  I tend to
> report manually, although I am writing some code to semi-automate the
> process.  The program picks out domains, TLDs in URLs and IP addresses (in
> spam), puts them in edit windows, and then allows me to view the message.
> At this point, I can click a button to report the offending hosts/IPs/etc.
> or not.  But it is semi-manual and therefore involves time.  The tradeoff
> is accurate reporting to the various block lists.
> 
	I guess I'm still not being clear. There are 120K emails a day coming
to INVALID EMAIL ADDRESSES THAT NEVER EXISTED. It's not a case of a user being
fickle, it's a case that they are emailing addresses that NEVER EVER ACTUALLY
EXISTED. About 1 every 3/4 of a second. So running them through ANYTHING is
counterproductive since, at least in my eyes, if you try to email an email
address that never existed... IT'S SPAM. It's not things the user ever sees/knows,
etc. I have in my sendmail virtusertable:

bingo@example.com			bingo
bangob@example.com			bango
bongo@example.com			bongo
irving@example.com			irving
*@example.com				nobody

	The user doesn't even SEE the emails, and processing what they consider
spam I really don't care about. But getting 120K emails to *@ that are absolutely
known spam... I would like to help the community out by reporting them to every
system possible. Yeah, if the added benefit is that the mail bingo, bango, bongo
and irving get is filtered a little better... I won't complain at all.

			Tuc

Re: [spamassassin] Re: How to report 120,000 spams a day

Posted by Steve Cloutier <cl...@piesky.com>.



Tuc at T-B-O-H.NET wrote:
> 
>> 
>> On 08.03.08 18:28, Tuc at T-B-O-H wrote:
>> > > 	Our mail server receives about 128K emails a day. Of
>> > > those, 120K are absolutely known spam so I don't even run
>> > > them through spamassassin. Of the 8K left, 6K are determined 
>> > > to be spams, and 2K are considered "good".
>> > > 
>> > > 	I'm wondering if there is some way to help the 
>> > > community (and, admittedly, ourselves) to somehow process
>> > > and report those spams to various databases. For the 
>> > > smaller users, I've implemented the SiteWideRazor and
>> > > use procmail to save off their spams to "probably-spam"
>> > > and process them through "spamassassin -r" once an hour.
>> > > 
>> > > 	For our bigger ones, though, so as not to wear
>> > > a hole in the disk drive, I wondered if there were any
>> > > suggestions what to do.
>> 
>> > 	Anyone??
>> 
>> afaik razor requires manual reporting, not anything automatic. Also note
>> that some people tend to mark as "spam" anything they don't like, even
>> mailing lists they have subscribed to (but are unable to unsubscribe - this
>> is a very common form of dumbness)
>> 
>> You can run DCC server which does something similar but is completely
>> automated.
>> 
> Hi,
> 
> 	Thanks for the reply.
> 
> 	I have a feeling that I'm not explaining myself well enough given
> this and private replies I've received.
> 
> 	I am mail hosting for a domain, we'll call it example.com . There
> are, and have only been 4 VALID email addresses for example.com such as :
> 
> bingo@example.com
> bango@example.com
> bongo@example.com
> irving@example.com
> 
> 	Those come in, get scanned by SA, and the ones we think are good
> enough we pass along to the owner's email address at his local ISP
> (Hughes.net, who has their email processed by Tucows's
> securehostedemail.com, which violates RFCs and causes sendmail to pump
> out kernel-based messages which I can't get anyone there to listen to!).
> 
> 	In the meantime, anything that isn't going to bingo, bango, bongo
> or irving is sent straight to /dev/null from the MTA. It's these messages
> that go straight to /dev/null that I'd like to somehow get processed into
> something useful for the community. It's not the result of a user getting
> an email from examplemacys.com and saying "Well, I did subscribe, but I
> have no need for their shoe sale this week, I call "SPAM!!!!" ". These are
> messages to email addresses at example.com that were NEVER legit email
> addresses.
> 
> 	As part of it all, I also want to try to keep disk usage and CPU
> down to as little as possible. With 120,000 per day, that's a junk mail 
> every 3/4 of a second. Since I have it set to deliver to /dev/null, I
> reduce the amount of disk usage. I'm looking for a solution that would be
> easy on the disk and easy on the CPU.  So something directly out of the
> MTA (sendmail) would be great, or something where the delivery would not
> store it locally.
> 
> 	I'm concerned that if I set up another user who has a .procmailrc to
> send it directly to "spamassassin -r", it will start spawning off way too
> many processes, too many perl invocations, etc. Same for piping to
> razor-report (and it only benefits razor, no one else). 
> 
> 	I thought DCC was running on this system, but it appears not. I'll
> have to check why and get it running. I thought it was just another
> database for SA to check; I'll have to read more about it. Thanks.
> 
> 			Tuc
> 
> 		Thanks, Tuc
> 
> 
> 

Wow!  You receive a LOT of spam.  I manage a site which, for today (so far -
there's an hour left), we have blocked 139,980 spam emails !!!  And, this is
down from what we used to get.

The problems you are dealing with - disk space, resource usage, etc. - are
why we finally resorted to writing a spam blocker (in C - no perl) that
blocks the spam at the SMTP protocol level (there is another topic titled
"Yet another spam blocker" which discusses this) and never lets the
messages make it to the disk at all.  There are also other advantages to
blocking at the protocol level.

Automatic reporting - that's another thing entirely.  As was pointed out in
previous replies, the user community is not always accurate in reporting
what is legit spam, and what is/was requested or "permitted".  I tend to
report manually, although I am writing some code to semi-automate the
process.  The program picks out domains, TLDs in URLs and IP addresses (in
spam), puts them in edit windows, and then allows me to view the message.
At this point, I can click a button to report the offending hosts/IPs/etc.
or not.  But it is semi-manual and therefore involves time.  The tradeoff
is accurate reporting to the various block lists.
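Steve does not post his program, so the following is only a hypothetical Python sketch of the extraction step he describes: pulling hostnames, their TLDs, and IP addresses out of a spam before a human decides whether to report them. It uses the standard `re`, `urllib.parse` and `ipaddress` modules; the regular expressions and the `extract_indicators` name are assumptions, and it deliberately stops before any reporting.

```python
import ipaddress
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_indicators(text: str) -> dict:
    """Collect URL hostnames, their TLDs and IP addresses from message text."""
    hosts, ips = set(), set()
    for url in URL_RE.findall(text):
        try:
            host = urlsplit(url).hostname  # lower-cased, no port/credentials
        except ValueError:
            continue  # malformed URL, e.g. an unbalanced IPv6 bracket
        host = (host or "").rstrip(".")
        if not host:
            continue
        try:
            ips.add(str(ipaddress.ip_address(host)))  # URL points at a bare IP
        except ValueError:
            hosts.add(host)
    for candidate in IPV4_RE.findall(text):
        try:
            ips.add(str(ipaddress.IPv4Address(candidate)))
        except ValueError:
            pass  # e.g. 999.1.1.1 matches the pattern but is not an address
    tlds = {h.rsplit(".", 1)[-1] for h in hosts if "." in h}
    return {"hosts": hosts, "tlds": tlds, "ips": ips}
```

Keeping extraction separate from reporting matches the trade-off described above: the tool only prepares candidates, and the report button stays a human decision.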

I wish I had a better answer for you!

Regards,

Steve


Re: [spamassassin] Re: How to report 120,000 spams a day

Posted by Michael Scheidell <sc...@secnap.net>.

-- 
Michael Scheidell, CTO
>|SECNAP Network Security
Winner 2008 Network Products Guide Hot Companies
FreeBsd SpamAssassin Ports maintainer
Charter member, ICSA labs anti-spam consortium
> From: "Tuc at T-B-O-H.NET" <ml...@t-b-o-h.net>
> Date: Sat, 8 Mar 2008 19:51:49 -0500 (EST)
> To: Matus UHLAR - fantomas <uh...@fantomas.sk>
> Cc: <us...@spamassassin.apache.org>
> Subject: Re: [spamassassin] Re: How to report 120,000 spams a day
> 
>> 
>> On 08.03.08 18:28, Tuc at T-B-O-H wrote:

> 
> Thanks for the reply.
> 
> I have a feeling that I'm not explaining myself well enough given
> this and private replies I've received.
> 
> I am mail hosting for a domain, we'll call it example.com . There
> are, and have only been 4 VALID email addresses for example.com such as :
> 
> bingo@example.com
> bango@example.com
> bongo@example.com
> irving@example.com
> 
> Those come in, get scanned by SA, and the ones we think are good
> enough we pass along to the owner's email address at his local ISP (Hughes.net,
> who has their email processed by Tucows's securehostedemail.com, which violates
> RFCs and causes sendmail to pump out kernel-based messages which I can't get
> anyone there to listen to!).
> 
> In the meantime, anything that isn't going to bingo, bango, bongo

Then you need to enlist the paid services of an email consultant since you
have things totally fsucked up and no amount of reporting is going to help
you or your client, and no, 'the community' doesn't want more copies of the
same zombot spam that we all get all day long.

(and, sorry, about 120,000 emails a day for 4 users? At, what, 99.99% spam
ratio? Maybe if you started to drop the emails to unknown users it would
never have gotten that bad)

> or irving is sent straight to /dev/null from the MTA. It's these messages that
> go straight to /dev/null that I'd like to somehow get processed into something

Don't send them to dev null, don't accept them.  By accepting them you are
wasting bandwidth, and will waste it more trying to report it.

> I thought DCC was running on this system, but it appears not. I'll
> have to check why and get it running. I thought it was just another database
> for SA to check; I'll have to read more about it. Thanks.

At 120,000 per day, you are required to run a local DCC server, and running
a local DCC server is the only way to process that much email. Trying to
push 120,000 emails a day through the overburdened public servers will most
likely, eventually get your ip address blacklisted, if it isn't already.


Re: [spamassassin] Re: How to report 120,000 spams a day

Posted by "Tuc at T-B-O-H.NET" <ml...@t-b-o-h.net>.

> 
> On 08.03.08 18:28, Tuc at T-B-O-H wrote:
> > > 	Our mail server receives about 128K emails a day. Of
> > > those, 120K are absolutely known spam so I don't even run
> > > them through spamassassin. Of the 8K left, 6K are determined 
> > > to be spams, and 2K are considered "good".
> > > 
> > > 	I'm wondering if there is some way to help the 
> > > community (and, admittedly, ourselves) to somehow process
> > > and report those spams to various databases. For the 
> > > smaller users, I've implemented the SiteWideRazor and
> > > use procmail to save off their spams to "probably-spam"
> > > and process them through "spamassassin -r" once an hour.
> > > 
> > > 	For our bigger ones, though, so as not to wear
> > > a hole in the disk drive, I wondered if there were any
> > > suggestions what to do.
> 
> > 	Anyone??
> 
> afaik razor requires manual reporting, not anything automatic. Also note
> that some people tend to mark as "spam" anything they don't like, even
> mailing lists they have subscribed to (but are unable to unsubscribe - this
> is a very common form of dumbness)
> 
> You can run DCC server which does something similar but is completely
> automated.
> 
Hi,

	Thanks for the reply.

	I have a feeling that I'm not explaining myself well enough given
this and private replies I've received.

	I am mail hosting for a domain, we'll call it example.com . There
are, and have only been 4 VALID email addresses for example.com such as :

bingo@example.com
bango@example.com
bongo@example.com
irving@example.com

	Those come in, get scanned by SA, and the ones we think are good
enough we pass along to the owner's email address at his local ISP (Hughes.net,
who has their email processed by Tucows's securehostedemail.com, which violates
RFCs and causes sendmail to pump out kernel-based messages which I can't get
anyone there to listen to!).

	In the meantime, anything that isn't going to bingo, bango, bongo
or irving is sent straight to /dev/null from the MTA. It's these messages that
go straight to /dev/null that I'd like to somehow get processed into something
useful for the community. It's not the result of a user getting an email from
examplemacys.com and saying "Well, I did subscribe, but I have no need for
their shoe sale this week, I call "SPAM!!!!" ". These are messages to email
addresses at example.com that were NEVER legit email addresses.

	As part of it all, I also want to try to keep disk usage and CPU
down to as little as possible. With 120,000 per day, that's a junk mail
every 3/4 of a second. Since I have it set to deliver to /dev/null, I
reduce the amount of disk usage. I'm looking for a solution that would be
easy on the disk and easy on the CPU.  So something directly out of the MTA
(sendmail) would be great, or something where the delivery would not store
it locally.
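The "every 3/4 of a second" figure checks out, taking a day as 86,400 seconds:

```latex
\frac{86\,400\ \text{s/day}}{120\,000\ \text{messages/day}} = 0.72\ \text{s/message} \approx \tfrac{3}{4}\ \text{s},
\qquad
\frac{120\,000}{86\,400} \approx 1.39\ \text{messages/s}
```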

	I'm concerned that if I set up another user who has a .procmailrc to
send it directly to "spamassassin -r", it will start spawning off way too
many processes, too many perl invocations, etc. Same for piping to
razor-report (and it only benefits razor, no one else).

	I thought DCC was running on this system, but it appears not. I'll
have to check why and get it running. I thought it was just another database
for SA to check; I'll have to read more about it. Thanks.

			Tuc

		Thanks, Tuc

Re: How to report 120,000 spams a day

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

On 08.03.08 18:28, Tuc at T-B-O-H wrote:
> > 	Our mail server receives about 128K emails a day. Of
> > those, 120K are absolutely known spam so I don't even run
> > them through spamassassin. Of the 8K left, 6K are determined 
> > to be spams, and 2K are considered "good".
> > 
> > 	I'm wondering if there is some way to help the 
> > community (and, admittedly, ourselves) to somehow process
> > and report those spams to various databases. For the 
> > smaller users, I've implemented the SiteWideRazor and
> > use procmail to save off their spams to "probably-spam"
> > and process them through "spamassassin -r" once an hour.
> > 
> > 	For our bigger ones, though, so as not to wear
> > a hole in the disk drive, I wondered if there were any
> > suggestions what to do.

> 	Anyone??

afaik razor requires manual reporting, not anything automatic. Also note
that some people tend to mark as "spam" anything they don't like, even
mailing lists they have subscribed to (but are unable to unsubscribe - this
is a very common form of dumbness)

You can run DCC server which does something similar but is completely
automated.
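The automated part of DCC comes from counting how many times the same checksum is reported, rather than from anyone declaring a message to be spam. The toy Python model below illustrates only that counting idea; it is not the DCC client, server or protocol. It assumes a SHA-256 of a lower-cased, whitespace-collapsed body in place of DCC's real fuzzy checksums, and `BulkCounter` is a hypothetical name.

```python
import hashlib
import re
from collections import Counter


def body_checksum(body: str) -> str:
    """Checksum of a normalized body: lower-cased, whitespace collapsed.

    A deliberately simple stand-in for DCC's fuzzy checksums, which are
    designed to survive more variation than this.
    """
    normalized = re.sub(r"\s+", " ", body.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class BulkCounter:
    """Count how often each checksum is seen; flag counts at a threshold."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts = Counter()

    def report(self, body: str) -> bool:
        """Record one message; return True once its checksum counts as bulk."""
        digest = body_checksum(body)
        self.counts[digest] += 1
        return self.counts[digest] >= self.threshold
```

Note that such a count measures bulk, not spam: a legitimate mailing-list post reaches the threshold just as quickly, which is why bulk counts are normally combined with whitelisting or other scoring rather than used on their own.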

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
How does cat play with mouse? cat /dev/mouse