Posted to users@spamassassin.apache.org by Reindl Harald <h....@thelounge.net> on 2014/08/30 23:41:29 UTC

SA works great!

after two days running SA for the first two test-domains with a
well trained bayes for the global milter-user: impressive!

the little junk that makes it through postscreen RBL scoring is detected

0.000          0          3          0  non-token data: bayes db version
0.000          0       1389          0  non-token data: nspam
0.000          0       1350          0  non-token data: nham
0.000          0     257152          0  non-token data: ntokens
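For anyone who wants to track these counters over time, here is a minimal Python sketch that pulls them out of the four lines above. It assumes the text is `sa-learn --dump magic` output and that the value sits in the third numeric column, as it does in this sample; `parse_magic` is a made-up helper name, not part of SpamAssassin.

```python
import re

# Sample copied from the message above (assumed to be `sa-learn --dump magic` output).
DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       1389          0  non-token data: nspam
0.000          0       1350          0  non-token data: nham
0.000          0     257152          0  non-token data: ntokens
"""

# Four whitespace-separated columns, then the "non-token data: <label>" text.
# The third column carries the value in the sample above.
LINE = re.compile(r"^\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$")


def parse_magic(text):
    """Return {label: value} for every 'non-token data' line in the dump."""
    stats = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            stats[match.group(2)] = int(match.group(1))
    return stats


stats = parse_magic(DUMP)
print(stats["nspam"], stats["nham"], stats["ntokens"])  # prints: 1389 1350 257152
```

With 1389 spam and 1350 ham messages learned, this database is well past the 200-of-each minimum that SpamAssassin documents as its default before Bayes scoring kicks in.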

Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5) for sa-milt:189 in 0.6 seconds, 2454 bytes.
Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 -
BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS
scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=<SN...@phx.gbl>,bayes=0.842503,autolearn=disabled
Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w: milter-reject: END-OF-MESSAGE from
snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin; from=<je...@hotmail.com> to=<***>
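If you want to collect such verdicts from the log, a small Python sketch along these lines should do. The pattern only covers the "identified spam" form shown above; other spamd log variants are not handled, and `parse_verdict` is a hypothetical helper name.

```python
import re

# Log line copied from the message above.
LOG = ("Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam "
       "(8.9/4.5) for sa-milt:189 in 0.6 seconds, 2454 bytes.")

# score/required, user:uid, scan time and message size, in the order spamd prints them.
PATTERN = re.compile(
    r"spamd: identified spam \((?P<score>-?[\d.]+)/(?P<required>-?[\d.]+)\) "
    r"for (?P<user>[^:\s]+):(?P<uid>\d+) "
    r"in (?P<seconds>[\d.]+) seconds, (?P<size>\d+) bytes"
)


def parse_verdict(line):
    """Return the fields of an 'identified spam' line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "score": float(match.group("score")),
        "required": float(match.group("required")),
        "user": match.group("user"),
        "uid": int(match.group("uid")),
        "seconds": float(match.group("seconds")),
        "size": int(match.group("size")),
    }


verdict = parse_verdict(LOG)
print(verdict["score"], verdict["required"], verdict["user"])  # prints: 8.9 4.5 sa-milt
```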




Re: SA works great!

Posted by Axb <ax...@gmail.com>.
On 09/01/2014 02:18 PM, Timothy Murphy wrote:
> use_bayes 0

this is the "master" switch
the rest are not necessary if use_bayes is set to 0
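A quick way to check what a given config snippet ends up with is to look only at that one switch. The Python sketch below is not how SpamAssassin itself parses its config; it assumes `#` starts a comment, that a later `use_bayes` line overrides an earlier one, and that Bayes is on when the option is absent. `bayes_enabled` is a hypothetical helper name.

```python
def bayes_enabled(config_text, default=True):
    """Return the effective use_bayes setting found in config_text.

    Assumptions (not taken from the thread): '#' starts a comment, the last
    use_bayes line wins, and Bayes is enabled when no such line is present.
    """
    enabled = default
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        parts = line.split()
        if len(parts) == 2 and parts[0] == "use_bayes":
            enabled = parts[1] != "0"
    return enabled


# The only Bayes line Timothy had: Bayes stays on.
print(bayes_enabled("bayes_path /home/tim/.spamassassin/bayes\n"))  # prints: True

# With the master switch off, the other options no longer matter.
print(bayes_enabled("use_bayes 0\nbayes_auto_learn 1\n"))  # prints: False
```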

Re: SA works great!

Posted by Benny Pedersen <me...@junc.eu>.
On 1. sep. 2014 14.19.23 Timothy Murphy <ga...@alice.it> wrote:

> On Monday, September 01, 2014 01:28:24 PM Reindl Harald wrote:
> > > As a matter of interest, how can one turn Bayes on/off?
>
> > use_learner 0
> > use_bayes 0
> > use_bayes_rules 0

Check all the .pre files; it might be enabled there in a loadplugin line. Comment it out,
check that the config still lints, restart spamd, done.

> Thanks very much.
> I learn something new almost every time you respond!
>
> But someone complained that SA did not work well if Bayes were turned off,
> so I thought this must be something one might do by mistake.
> Now it seems a bit like saying that the internet does not work well
> if the router is turned off ...

Depends on rules used

Re: SA works great!

Posted by Timothy Murphy <ga...@alice.it>.
On Monday, September 01, 2014 01:28:24 PM Reindl Harald wrote:

> > As a matter of interest, how can one turn Bayes on/off?

> use_learner 0
> use_bayes 0
> use_bayes_rules 0
...

Thanks very much.
I learn something new almost every time you respond!

But someone complained that SA did not work well if Bayes were turned off,
so I thought this must be something one might do by mistake.
Now it seems a bit like saying that the internet does not work well
if the router is turned off ...

-- 
Timothy Murphy  
e-mail: gayleard /at/ eircom.net
School of Mathematics, Trinity College, Dublin 2, Ireland

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.
On 01.09.2014 at 13:19, Timothy Murphy wrote:
>>>> Unfortunately if Bayes is not turned on, it does not catch more than
>>>> around 60-70% of spam.  As a Spamassassin user & server admin, I would
>>>> really like to see that improve.
> 
> As a matter of interest, how can one turn Bayes on/off?
> 
> I take it that the appearance of BAYES_99, etc, in headers
> shows that Bayes is turned on?
> 
> As far as I can see, the only mention of Bayes in my SA configs is the line
>   bayes_path /home/tim/.spamassassin/bayes
> that I added to ~/.spamassassin/user_prefs

use_learner 0
use_bayes 0
use_bayes_rules 0
bayes_use_hapaxes 0
bayes_auto_expire 0
bayes_auto_learn 0
bayes_learn_during_report 0


Re: SA works great!

Posted by Timothy Murphy <ga...@alice.it>.
> >> Unfortunately if Bayes is not turned on, it does not catch more than
> >> around 60-70% of spam.  As a Spamassassin user & server admin, I would
> >> really like to see that improve.

As a matter of interest, how can one turn Bayes on/off?

I take it that the appearance of BAYES_99, etc, in headers
shows that Bayes is turned on?

As far as I can see, the only mention of Bayes in my SA configs is the line
  bayes_path /home/tim/.spamassassin/bayes
that I added to ~/.spamassassin/user_prefs .

-- 
Timothy Murphy  
e-mail: gayleard /at/ eircom.net
School of Mathematics, Trinity College, Dublin 2, Ireland


Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 9/2/2014 1:45 PM, David F. Skoll wrote:
> On Tue, 02 Sep 2014 13:32:26 -0700
> Ted Mittelstaedt<te...@ipinc.net>  wrote:
>
>> The point of blocking on DNS or IP based blocking is to issue
>> that error 5xx because that is the ONLY thing that is going to
>> cause the spammer to delist.
>
> You are an optimist, aren't you?
>
>> Because at that point they are
>> now wasting money and time and resources attempting to deliver
>> to an address that probably does not exist.
>
> Botnet users have so many resources to waste that it's cheaper for them
> to ignore SMTP reply codes than do anything with them.
>

No, they don't.  How long do you think the average botnet mule lasts
before it's BLed?

How much spam can the average hijacked server send before it's BLed?

I have seen what is happening and these days, a hijacked mailserver can 
get off maybe 5,000 items before bang - it's BLed by Google and Hotmail 
and Yahoo.  Then its queue overflows and it shuts down, particularly 
Exchange servers since Exchange cannot deal with large mail flows.  And 
that's a mailserver which likely has past ham transmissions to the
big 3.  A virused-up end user system that has a PTR of 
"ip234-567-876.dynamic.wonkulating.gronkulator.comcat.net" or whatever 
can maybe get off 500 pieces before the big 3 block it.

The independent BL's then follow shortly after the big 3 and then it's 
all over.

If all you're talking about is botnet spam, no, the spammers don't have unlimited
resources.  Every email address on their victim list which is bona fide
nonexistent means a probably legitimate address will not get a chance to
go on that run.

And, as I have said repeatedly, my observation of my own incoming spam 
shows that the growth today is in spam that is funded by real businesses 
with real street addresses and business licenses, who pay spammers, who 
are then using large networks where they supply matching forward and 
reverse DNS records, actual domain names, and all of that.  Registries 
like EURid and the company that owns .tk and whoever owns .co are making 
tens of millions off mostly spammers.

The people pushing ICANN to expand the number of TLDs are 99.99% spammers.

We have discussed this and seen this coming for a long time now.  The
spammers know that for them to ever have any hope of a long term 
industry they must go legit - which is why they are all pushing groups
like the Direct Email Sending Association (or whatever the hell they
are calling it) and companies like Return Path.

We thought we fought and won the spam wars in the legislative halls.  I'm 
warning you, they are coming back with lobbyists and money and
the next round isn't going to be as easy.

Ted

> Regards,
>
> David.

Re: SA works great!

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Tue, 02 Sep 2014 13:32:26 -0700
Ted Mittelstaedt <te...@ipinc.net> wrote:

> The point of blocking on DNS or IP based blocking is to issue
> that error 5xx because that is the ONLY thing that is going to
> cause the spammer to delist.

You are an optimist, aren't you?

> Because at that point they are
> now wasting money and time and resources attempting to deliver
> to an address that probably does not exist.

Botnet users have so many resources to waste that it's cheaper for them
to ignore SMTP reply codes than do anything with them.

Regards,

David.

Re: SA works great!

Posted by Noel Butler <no...@ausics.net>.
 

Heh, yeah I know kids of today are so much worse than 20 years ago :) 

But either way, a line needs to be drawn. So many newbies are
scared to post their newbie questions on so many lists because of
people like Harry. He's got a long history of moderation and bannings,
but even I admit he has improved in recent times, after I think finally
accepting that his actions are not going to be tolerated by many, and is
trying to change. 

On 04/09/2014 03:29, Ted Mittelstaedt wrote: 

> While I appreciate the support, Noel, I'm not in favor of banning
> people from mailing lists for using what they think are insulting terms.
> 
> Truth is that Harry's insults are really kind of cute, like the 6 year old all decked out in a Jedi lightsaber doing battle with Darth Vader.
> 
> My 16 year old son's insults could burn him to a crisp. Now that's
> some seriously nasty stuff!
> 
> Ted
 

Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.
While I appreciate the support, Noel, I'm not in favor of banning
people from mailing lists for using what they think are insulting terms.

Truth is that Harry's insults are really kind of cute, like the 6 year 
old all decked out in a Jedi lightsaber doing battle with Darth Vader.

My 16 year old son's insults could burn him to a crisp.  Now that's
some seriously nasty stuff!

Ted

On 9/3/2014 12:13 AM, Noel Butler wrote:
> Doesnt take you long does it Harry, you've been on this list a month and
> already your abusing and putting ppl down, calling child, telling to
> STFU, and some other tripe you levelled at Ted.
>
> Karsten already warned you once, I suggest you remember that.
>
> On 03/09/2014 06:52, Reindl Harald wrote:
>
>> On 02.09.2014 at 22:32, Ted Mittelstaedt wrote:
>>> On 9/2/2014 4:59 AM, Reindl Harald wrote:
>>>>> just get a proper MTA, enable debug logging and watch the commands
>>>>> / responses between client and server due a message transmission
>>>> and to make it clear for you: until after end of data itslef is
>>>> responded with success the message is *undelivered* and tried again
>>>> from the sendig client if it is a proper MTA
>>> However you have GIVEN THE SPAMMER AN OK that they have a valid
>>> victim address. You had to issue an OK to the RCPT TO: to get that
>>> DATA from them. You just told them "you got a good email address"
>> child you do not realize that all you claim below has
>> nothing to do with SA, nor did you understand how
>> a *layered* spam protecton works nor did you try
>> to understand *anything* i explained you
>>
>> so what - otherwise i had even accepted the message as you do
>>
>> your setup:
>>   * not on RBL
>>   * accept it and drop it silently because the score
>>   * issue "250 OK i even took the whole message"
>>   * how do you think you don't leak the RCPT case
>>   * frankly with the 250 OK you invite to send more spam
>>
>> my setup:
>>   * not on a RBL
>>   * reject it
>>   * don't issue "250 OK i even took the whole message""
>>   * if it was not a spammer trigger a bounce on the
>>     senders server so that he don't think it was
>>     successful delivered and can even prove it by logs
>>>> if your MTA *don't repsond with success* at END-OF-DATA the message
>>>> implicit is counted as *not delivered* because simply in the middle
>>>> of data the server could raised an error by a full disk or something
>>>> else
>>> Yes and the spammer just tries again. And again. And again, forever
>>> and forever.
>> so what - what has that to do with anything i explained
>> and you refuse to understand over the whole thread?
>>> The point of blocking on DNS or IP based blocking is to issue that
>>> error 5xx because that is the ONLY thing that is going to cause the
>>> spammer to delist. Because at that point they are now wasting money
>>> and time and resources attempting to deliver to an address that
>>> probably does not exist.
>> so what - that one was not on a RBL
>> and now?
>>
>> accept the spam message or *reject* it?
>>
>> i at least reject it
>> you accept it, say "250 OK" and then drop it silent
>>> Sure they can parse the return code, looking for polite language
>>> saying something to the effect "this email is being blocked because
>>> you are on Wonkulating Gronkluator's blacklist" that some sites issue
>>> to "help" newbie Postmasters realize that their mailserver is being
>>> hijacked, or something of that nature.
>> what has that to to with the topic?
>>> But they GUESS so many of their victim addresses that they can't
>>> spend the resources doing that on a dictionary attack, they KNOW that
>>> 99.99% of the error 5xx's they get back are for User Unknown. So the
>>> few times they guess a real address and get that polite
>>> human-readable explanation that they are on a blacklist, gets lost in
>>> the noise.
>> what has that to to with the topic?
>>> But YOUR setup - why that's spam flypaper. Because, YOU are NOT
>>> issuing an error 5xx on a sender IP that happens to guess one of your
>>> users email addresses - because your just too curious to get at the
>>> DATA and inspect the Subject: line.
>> jesus christ - Subject is not data - subject is part of the header
>> come back after you made it through *basic lessons*
>>
>>
>> the client makes it to spamass-milter because he is *not*
>> on the 15 blacklists in front of
>>> Thus you are HELPING the spammers build a list of valid email
>>> addresses on your domain.
>> bullshit - i reject more than 90% like you
>> but i don't issue "250 OK" for clear spam, i reject it
>>> No wonder you have such spectacular spam counts. The spammers must
>>> just love you. Your handing them over your user email list.
>> sorry, but you are an idiot
>>
>> i handle nothing because the accept of DATA only happens
>> if the client is not listed on RBL's and so you better
>> stop to spread bullshit just because you don't understand
>> what people exlaining you by wasting time with your posts
>>> Sure, you may determine they are operating from a blacklist and shut
>>> them down after they throw you 1,000 guesses from an IP address. But
>>> in so doing you have handed them 10 good addresses that they will
>>> remember and just attack you from somewhere else from
>> bullshit again - more than 90% are rejected by postscreen and RBL's
>> frankly postscreen can't leak valid addresses because it even
>> don't know them - that information has only the smtpd process
>> if you make it through RBL's and protocol tests
>>> Do that a couple hundred times and they have thousands of your valid
>>> emails.
>>>> so the communication looks somehow like:
>>>>  * client: i am sending now data
>>>>  * server: fine, do so
>>>>  * client: sending data
>>>>  * client: i have finished with sending data
>>>>  * server: ok, i accepted that
>>>>  * client: fine, QUIT
>>>>  * client: closes the connection AFTER QUIT
>> you missed to understand to understand that the above communication
>> *happens only* if the client *passed RBL checks* and also *passed*
>> HELO checks and also *passed* a lot of other tests and until DATA
>> was no other reason found to reject it before
>>> Here is how yours looks:
>>> Spammer: HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
>>> you: OK I did my DNS check, my RBL check, my PTR check and that's a good host so go ahead
>>> Spammer: MAIL FROM: <fakename@throwawayhostname.throwawaydomainname.TLD>
>>> You: OK
>>> Spammer: RCPT TO <usernameofyoursthatIjustguessed@oneofyourdomains.com>
>>> You: OK looks good but I want to see your content, so start sending
>>> Spammer to itself HOT DAM I GOT ANOTHER VALID EMAIL ADDRESS TO TORTURE FOREVER
>>> Spammer to you: DATA, blah blah blah, Subject: Viakkagra blah blah
>>> you: OK your content says your a spammer so I'm going to blow the TCP connection and not send a final OK
>>> Spammer to itself SUCKER!!!! IF HE THINKS I'LL FALL FOR THAT HE'S DUMBER THAN A
>>> POST! I'LL JUST ATTACK FROM SOMEWHERE ELSE USING DIFFERENT CONTENT
>>> UNTIL I GET PAST HIS BLOCK.
>> BULLSHIT
>>
>> the communication looks like "postfix/postscreen[30894]: NOQUEUE: reject: RCPT from [187.163.175.185]:61326:
>> 550 5.7.1 Service unavailable; client [187.163.175.185] blocked using RBL xyz"
>>> And here is how it SHOULD look:
>>> Spammer: HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
>>> you: OK
>>> Spammer: MAIL FROM: <fakename@throwawayhostname.throwawaydomainname.TLD>
>>> you: OK
>>> Spammer: RCPT TO: <usernameofyoursthatIjustguessed@oneofyourdomains.com>
>>> You to yourself: Hmm - looks like my user has a blacklist against .co so this guy is unwanted
>>> You to spammer: 500 User Unknown
>>> Spammer: DAMN, I guessed wrong. Toss that one and go on to the next guess.
>>> Or even better:
>>> Spammer: HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
>>> you to yourself: We block mail from Russian federation Mafia
>>> you to spammer: error 5xx go to hell, spammer. End TCP connection.
>>> Spammer: WTF just happened???
>>> Granted, this isn't how SA works but you have been talking about
>>> prefiltering and this is how it should look
>> damned SA is not part of the whole game
>>
>> more than 90% rejects happening *before handover the connection to smtpd*
>> and so *long before* SA comes in the mix at all
>>>> so *please* refrain from reply and discuss about good or bad
>>>> defaults until you learned your *basic sessions*
>> why did you not follow the advice above?
>>
>

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.
On 03.09.2014 at 09:13, Noel Butler wrote:
> Doesnt take you long does it Harry, you've been on this list a 
> month and already your abusing and putting ppl down, calling 
> child, telling to STFU, and some other tripe you levelled at Ted.  
> 
> Karsten already warned you once, I suggest you remember that.

read the whole thread and how much time i already wasted
trying to explain to Ted how a MTA works, only to be told
at the end "i leak my valid users list"

that's a thread i started and if he needs basic MTA
lessons he could start his own topic!

> On 03/09/2014 06:52, Reindl Harald wrote:
> 
>> On 02.09.2014 at 22:32, Ted Mittelstaedt wrote:
>>> On 9/2/2014 4:59 AM, Reindl Harald wrote:
>>>>> just get a proper MTA, enable debug logging and watch the commands / responses between client and server due a
>>>>> message transmission
>>>> and to make it clear for you: until after end of data itslef is responded with success the message is
>>>> *undelivered* and tried again from the sendig client if it is a proper MTA
>>> However you have GIVEN THE SPAMMER AN OK that they have a valid victim address. You had to issue an OK to the
>>> RCPT TO: to get that DATA from them. You just told them "you got a good email address"
>> child you do not realize that all you claim below has
>> nothing to do with SA, nor did you understand how
>> a *layered* spam protecton works nor did you try
>> to understand *anything* i explained you
>>
>> so what - otherwise i had even accepted the message as you do
>>
>> your setup:
>>  * not on RBL
>>  * accept it and drop it silently because the score
>>  * issue "250 OK i even took the whole message"
>>  * how do you think you don't leak the RCPT case
>>  * frankly with the 250 OK you invite to send more spam
>>
>> my setup:
>>  * not on a RBL
>>  * reject it
>>  * don't issue "250 OK i even took the whole message""
>>  * if it was not a spammer trigger a bounce on the
>>    senders server so that he don't think it was
>>    successful delivered and can even prove it by logs
>>>> if your MTA *don't repsond with success* at END-OF-DATA the message implicit is counted as *not delivered*
>>>> because simply in the middle of data the server could raised an error by a full disk or something else
>>> Yes and the spammer just tries again. And again. And again, forever and forever.
>> so what - what has that to do with anything i explained
>> and you refuse to understand over the whole thread?
>>> The point of blocking on DNS or IP based blocking is to issue that error 5xx because that is the ONLY thing that
>>> is going to cause the spammer to delist. Because at that point they are now wasting money and time and resources
>>> attempting to deliver to an address that probably does not exist.
>> so what - that one was not on a RBL
>> and now?
>>
>> accept the spam message or *reject* it?
>>
>> i at least reject it
>> you accept it, say "250 OK" and then drop it silent
>>> Sure they can parse the return code, looking for polite language saying something to the effect "this email is
>>> being blocked because you are on Wonkulating Gronkluator's blacklist" that some sites issue to "help" newbie
>>> Postmasters realize that their mailserver is being hijacked, or something of that nature.
>> what has that to to with the topic?
>>> But they GUESS so many of their victim addresses that they can't spend the resources doing that on a dictionary
>>> attack, they KNOW that 99.99% of the error 5xx's they get back are for User Unknown. So the few times they guess
>>> a real address and get that polite human-readable explanation that they are on a blacklist, gets lost in the noise.
>> what has that to to with the topic?
>>> But YOUR setup - why that's spam flypaper. Because, YOU are NOT issuing an error 5xx on a sender IP that happens
>>> to guess one of your users email addresses - because your just too curious to get at the DATA and inspect the
>>> Subject: line.
>> jesus christ - Subject is not data - subject is part of the header
>> come back after you made it through *basic lessons*
>>
>>
>> the client makes it to spamass-milter because he is *not*
>> on the 15 blacklists in front of
>>> Thus you are HELPING the spammers build a list of valid email addresses on your domain.
>> bullshit - i reject more than 90% like you
>> but i don't issue "250 OK" for clear spam, i reject it
>>> No wonder you have such spectacular spam counts. The spammers must just love you. Your handing them over your
>>> user email list.
>> sorry, but you are an idiot
>>
>> i handle nothing because the accept of DATA only happens
>> if the client is not listed on RBL's and so you better
>> stop to spread bullshit just because you don't understand
>> what people exlaining you by wasting time with your posts
>>> Sure, you may determine they are operating from a blacklist and shut them down after they throw you 1,000
>>> guesses from an IP address. But in so doing you have handed them 10 good addresses that they will remember and
>>> just attack you from somewhere else from
>> bullshit again - more than 90% are rejected by postscreen and RBL's
>> frankly postscreen can't leak valid addresses because it even
>> don't know them - that information has only the smtpd process
>> if you make it through RBL's and protocol tests
>>> Do that a couple hundred times and they have thousands of your valid emails.
>>>> so the communication looks somehow like:
>>>>  * client: i am sending now data
>>>>  * server: fine, do so
>>>>  * client: sending data
>>>>  * client: i have finished with sending data
>>>>  * server: ok, i accepted that
>>>>  * client: fine, QUIT
>>>>  * client: closes the connection AFTER QUIT
>> you missed to understand to understand that the above communication
>> *happens only* if the client *passed RBL checks* and also *passed*
>> HELO checks and also *passed* a lot of other tests and until DATA
>> was no other reason found to reject it before
>>> Here is how yours looks:
>>> Spammer: HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
>>> you: OK I did my DNS check, my RBL check, my PTR check and that's a good host so go ahead
>>> Spammer: MAIL FROM: <fakename@throwawayhostname.throwawaydomainname.TLD>
>>> You: OK
>>> Spammer: RCPT TO <usernameofyoursthatIjustguessed@oneofyourdomains.com>
>>> You: OK looks good but I want to see your content, so start sending
>>> Spammer to itself HOT DAM I GOT ANOTHER VALID EMAIL ADDRESS TO TORTURE FOREVER
>>> Spammer to you: DATA, blah blah blah, Subject: Viakkagra blah blah
>>> you: OK your content says your a spammer so I'm going to blow the TCP connection and not send a final OK
>>> Spammer to itself SUCKER!!!! IF HE THINKS I'LL FALL FOR THAT HE'S DUMBER THAN A POST!
>>> I'LL JUST ATTACK FROM SOMEWHERE ELSE USING DIFFERENT CONTENT UNTIL I GET PAST HIS BLOCK.
>> BULLSHIT
>>
>> the communication looks like "postfix/postscreen[30894]: NOQUEUE: reject: RCPT from [187.163.175.185]:61326:
>> 550 5.7.1 Service unavailable; client [187.163.175.185] blocked using RBL xyz"
>>> And here is how it SHOULD look:
>>> Spammer: HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
>>> you: OK
>>> Spammer: MAIL FROM: <fakename@throwawayhostname.throwawaydomainname.TLD>
>>> you: OK
>>> Spammer: RCPT TO: <usernameofyoursthatIjustguessed@oneofyourdomains.com>
>>> You to yourself: Hmm - looks like my user has a blacklist against .co so this guy is unwanted
>>> You to spammer: 500 User Unknown
>>> Spammer: DAMN, I guessed wrong. Toss that one and go on to the next guess.
>>> Or even better:
>>> Spammer: HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
>>> you to yourself: We block mail from Russian federation Mafia
>>> you to spammer: error 5xx go to hell, spammer. End TCP connection.
>>> Spammer: WTF just happened???
>>> Granted, this isn't how SA works but you have been talking about prefiltering and this is how it should look
>> damned SA is not part of the whole game
>>
>> more than 90% rejects happening *before handover the connection to smtpd*
>> and so *long before* SA comes in the mix at all
>>>> so *please* refrain from reply and discuss about good or bad defaults until you learned your *basic sessions*
>> why did you not follow the advice above?
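
To make the layered setup argued over in this message easier to follow, here is a toy Python model of it. It is only a sketch: `Delivery` and `smtp_outcome` are hypothetical names, the 4.5 threshold comes from the spamd log earlier in the thread, and the reply strings are pieced together from the log lines quoted in the thread (the 550 in front of "Blocked by SpamAssassin" is an assumption, since the log shows only the 5.7.1 part).

```python
from dataclasses import dataclass

REQUIRED_SCORE = 4.5  # threshold seen in the spamd log line earlier in the thread


@dataclass
class Delivery:
    client_ip: str
    on_rbl: bool       # hypothetical flag: is the client on one of the DNS blacklists?
    spam_score: float  # hypothetical content score, as the milter would report it


def smtp_outcome(d):
    """Return (stage, reply) for one delivery attempt in the layered setup."""
    # Layer 1: listed clients are rejected up front, before smtpd and before
    # any recipient lookup, so no valid address is confirmed to them.
    if d.on_rbl:
        return ("postscreen",
                "550 5.7.1 Service unavailable; client [%s] blocked" % d.client_ip)
    # Layer 2: unlisted clients get to send DATA; the content filter answers
    # at end-of-data, so spam is rejected instead of accepted and dropped.
    if d.spam_score >= REQUIRED_SCORE:
        return ("end-of-data", "550 5.7.1 Blocked by SpamAssassin")
    # Only now does the sender get a success reply.
    return ("end-of-data", "250 OK")


print(smtp_outcome(Delivery("187.163.175.185", True, 0.0)))
print(smtp_outcome(Delivery("65.55.90.48", False, 8.9)))
print(smtp_outcome(Delivery("192.0.2.10", False, 1.2)))
```

The point of the model is the ordering: the sending side never sees a success reply unless the message passed both layers, and a legitimate sender that gets a 5xx at end-of-data produces a bounce on its own server.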


Re: SA works great!

Posted by Noel Butler <no...@ausics.net>.
 

It doesn't take you long, does it, Harry? You've been on this list a month and
already you're abusing people and putting them down, calling them a child, telling
them to STFU, and some other tripe you levelled at Ted. 

Karsten already warned you once, I suggest you remember that.

On 03/09/2014 06:52, Reindl Harald wrote: 

> On 02.09.2014 at 22:32, Ted Mittelstaedt wrote:
> On 9/2/2014 4:59 AM, Reindl Harald wrote:
>>> just get a proper MTA, enable debug logging and watch the commands / responses between client and server due a message transmission
>> and to make it clear for you: until after end of data itself is responded with success the message is *undelivered* and tried again from the sending client if it is a proper MTA
> However you have GIVEN THE SPAMMER AN OK that they have a valid victim address. You had to issue an OK to the RCPT TO: to get that DATA from them. You just told them "you got a good email address"

child you do not realize that all you claim below has
nothing to do with SA, nor did you understand how
a *layered* spam protecton works nor did you try
to understand *anything* i explained you

so what - otherwise i had even accepted the message as you do

your setup:
 * not on RBL
 * accept it and drop it silently because the score
 * issue "250 OK i even took the whole message"
 * how do you think you don't leak the RCPT case
 * frankly with the 250 OK you invite to send more spam

my setup:
 * not on a RBL
 * reject it
 * don't issue "250 OK i even took the whole message""
 * if it was not a spammer trigger a bounce on the
 senders server so that he don't think it was
 successful delivered and can even prove it by logs

>> if your MTA *don't repsond with success* at END-OF-DATA the message implicit is counted as *not delivered* because simply in the middle of data the server could raised an error by a full disk or something else
> Yes and the spammer just tries again. And again. And again, forever and forever.

so what - what has that to do with anything i explained
and you refuse to understand over the whole thread?

> The point of blocking on DNS or IP based blocking is to issue that error 5xx because that is the ONLY thing that is going to cause the spammer to delist. Because at that point they are now wasting money and time and resources attempting to deliver to an address that probably does not exist.

so what - that one was not on a RBL
and now?

accept the spam message or *reject* it?

i at least reject it
you accept it, say "250 OK" and then drop it silent

> Sure they can parse the return code, looking for polite language saying something to the effect "this email is being blocked because you are on Wonkulating Gronkluator's blacklist" that some sites issue to "help" newbie Postmasters realize that their mailserver is being hijacked, or something of that nature.

what has that to to with the topic?

> But they GUESS so many of their victim addresses that they can't spend the resources doing that on a dictionary attack, they KNOW that 99.99% of the error 5xx's they get back are for User Unknown. So the few times they guess a real address and get that polite human-readable explanation that they are on a blacklist, gets lost in the noise.

what has that to to with the topic?

> But YOUR setup - why that's spam flypaper. Because, YOU are NOT issuing an error 5xx on a sender IP that happens to guess one of your users email addresses - because your just too curious to get at the DATA and inspect the Subject: line.

jesus christ - Subject is not data - subject is part of the header
come back after you made it through *basic lessons*

the client makes it to spamass-milter because he is *not*
on the 15 blacklists in front of

> Thus you are HELPING the spammers build a list of valid email addresses on your domain.

bullshit - i reject more than 90% like you
but i don't issue "250 OK" for clear spam, i reject it

> No wonder you have such spectacular spam counts. The spammers must just love you. Your handing them over your user email list.

sorry, but you are an idiot

i handle nothing because the accept of DATA only happens
if the client is not listed on RBL's and so you better
stop to spread bullshit just because you don't understand
what people exlaining you by wasting time with your posts

> Sure, you may determine they are operating from a blacklist and shut them down after they throw you 1,000 guesses from an IP address. But in so doing you have handed them 10 good addresses that they will remember and just attack you from somewhere else from

bullshit again - more than 90% are rejected by postscreen and RBL's
frankly postscreen can't leak valid addresses because it even
don't know them - that information has only the smtpd process
if you make it through RBL's and protocol tests

> Do that a couple hundred times and they have thousands of your valid emails. 
> 
>> so the communication looks somehow like:
>>  * client: i am sending now data
>>  * server: fine, do so
>>  * client: sending data
>>  * client: i have finished with sending data
>>  * server: ok, i accepted that
>>  * client: fine, QUIT
>>  * client: closes the connection AFTER QUIT

you missed to understand to understand that the above communication
*happens only* if the client *passed RBL checks* and also *passed*
HELO checks and also *passed* a lot of other tests and until DATA
was no other reason found to reject it before

> Here is how yours looks:
> Spammer: HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
> you: OK I did my DNS check, my RBL check, my PTR check and that's a good host so go ahead
> Spammer: MAIL FROM: <fakename@throwawayhostname.throwawaydomainname.TLD>
> You: OK
> Spammer: RCPT TO <us...@oneofyourdomains.com>
> You: OK looks good but I want to see your content, so start sending
> Spammer to itself HOT DAM I GOT ANOTHER VALID EMAIL ADDRESS TO TORTURE FOREVER
> Spammer to you: DATA, blah blah blah, Subject: Viakkagra blah blah
> you: OK your content says your a spammer so I'm going to blow the TCP connection and not send a final OK
> Spammer to itself SUCKER!!!! IF HE THINKS I'LL FALL FOR THAT HE'S DUMBER THAN A POST!
> I'LL JUST ATTACK FROM SOMEWHERE ELSE USING DIFFERENT CONTENT UNTIL I GET PAST HIS BLOCK.

BULLSHIT

the communication looks like "postfix/postscreen[30894]: NOQUEUE:
reject: RCPT from [187.163.175.185]:61326:
550 5.7.1 Service unavailable; client [187.163.175.185] blocked using
RBL xyz"

> And here is how it SHOULD look:
>
> Spammer: HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
> you: OK
> Spammer: MAIL FROM: <fakename@throwawayhostname.throwawaydomainname.TLD>
> you: OK
> Spammer: RCPT TO: <us...@oneofyourdomains.com>
> You to yourself: Hmm - looks like my user has a blacklist against .co
>   so this guy is unwanted
> You to spammer: 500 User Unknown
> Spammer: DAMN, I guessed wrong. Toss that one and go on to the next
>   guess.
>
> Or even better:
>
> Spammer: HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
> you to yourself: We block mail from Russian federation Mafia
> you to spammer: error 5xx go to hell, spammer. End TCP connection.
> Spammer: WTF just happened???
>
> Granted, this isn't how SA works but you have been talking about
> prefiltering and this is how it should look

damn it, SA is not part of that whole game

more than 90% of the rejects happen *before the connection is handed
over to smtpd*
and so *long before* SA comes into the mix at all

>> so *please* refrain from reply and discuss about good or bad defaults until you learned your *basic sessions*

why did you not follow the advice above?

 

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.

On 04.09.2014 at 19:25, Reindl Harald wrote:
>> Now as for dynamic or dialup RBLs go, UNFORTUNATELY although
>> many responsible ISPs do insert the word dynamic or dialup
>> in the PTRs of their dialup or dynamic pools, a great many
>> still do not.  Which means the RBL's that track those need
>> to try and tease the list of dynamic RBL's out of those
>> providers.  They don't have all of them.
> 
> most do one way or another
> 
> x.dyn.
> x.dyn-
> .........
> 
> i started together with another sysadmin to collect Regex rules
> for postfix (sorry but i use postfix and you could too if you
> miss features somewhere else) on top with some keywords like
> "mta", "outbound".... and have that rules running in log-only
> mode, with the final data there will be some more "whitelists"
> of legit mail servers / ISP's with a dumb PTR and after that
> have fun with a botnet zombie
> 
> "dul.dnsbl.sorbs.net" and "pbl.spamhaus.org" catch almost all
> enduser-IP's, the rest is done via PTR filters and the remaining
> 8-10% eats spamassassin

and BTW the problem is not ISPs failing to insert "dyn", "dynamic"
or "dialup"

the problem is the foolish mailserver admins whose legit mail would
be rejected by the two simple PTR rules below and whose PTR does not
start with one of the DUNNO prefixes, and sometimes i feel they
should all get a reject with a hint about an unacceptable PTR

/^bounceout.*\..*/ DUNNO
/^gate.*\..*/ DUNNO
/^gw\-.*\..*/ DUNNO
/^gw\..*\..*/ DUNNO
/^hub.*\..*/ DUNNO
/^incoming.*\..*/ DUNNO
/^list.*\..*/ DUNNO
/^mail.*\..*/ DUNNO
/^mbox.*\..*/ DUNNO
/^mda.*\..*/ DUNNO
/^message.*\..*/ DUNNO
/^mgate.*\..*/ DUNNO
/^mgw.*\..*/ DUNNO
/^mhub.*\..*/ DUNNO
/^mout.*\..*/ DUNNO
/^msend.*\..*/ DUNNO
/^msg.*\..*/ DUNNO
/^mta.*\..*/ DUNNO
/^mx.*\..*/ DUNNO
/^out.*\..*/ DUNNO
/^relay.*\..*/ DUNNO
/^send.*\..*/ DUNNO
/^smarthost.*\..*/ DUNNO
/^smtp.*\..*/ DUNNO
/.*[0-9]{1,3}(\-|\.)[0-9]{1,3}(\-|\.)[0-9]{1,3}(\-|\.)[0-9]{1,3}(\-|\.)[0-9]{1,3}(\-|\.)[0-9]{1,3}.*/ REJECT
/([0-9]{1,3}(\-|\.)[0-9]{1,3}(\-|\.)[0-9]{1,3}(\-|\.)[0-9]{1,3}.+[a-z0-9]+\.[a-z0-9]{2,6})$/ REJECT
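The way such a table is evaluated can be sketched in a few lines of Python. This is only an illustration and not part of the setup described above: the function name `classify_ptr` is made up, only a handful of the DUNNO prefixes are copied in, and it assumes the usual Postfix regexp-table behaviour (patterns are tried in order, the first match decides, matching is case-insensitive).

```python
import re

# (pattern, action) pairs in table order; only a subset of the DUNNO
# prefixes from the list above, followed by the two REJECT rules.
RULES = [
    (r"^mail.*\..*", "DUNNO"),
    (r"^mta.*\..*", "DUNNO"),
    (r"^mx.*\..*", "DUNNO"),
    (r"^smtp.*\..*", "DUNNO"),
    (r".*[0-9]{1,3}(\-|\.)[0-9]{1,3}(\-|\.)[0-9]{1,3}(\-|\.)[0-9]{1,3}"
     r"(\-|\.)[0-9]{1,3}(\-|\.)[0-9]{1,3}.*", "REJECT"),
    (r"([0-9]{1,3}(\-|\.)[0-9]{1,3}(\-|\.)[0-9]{1,3}(\-|\.)[0-9]{1,3}"
     r".+[a-z0-9]+\.[a-z0-9]{2,6})$", "REJECT"),
]

# Assumption: case-insensitive matching, as in Postfix regexp tables.
COMPILED = [(re.compile(p, re.IGNORECASE), action) for p, action in RULES]


def classify_ptr(ptr):
    """Return the action of the first matching rule, or None if no rule matches."""
    for regex, action in COMPILED:
        if regex.search(ptr):
            return action
    return None


if __name__ == "__main__":
    for name in ("mail.example.com",
                 "187-163-175-185.dyn.example.net",
                 "mail-187-163-175-185.example.net",
                 "www.example.com"):
        print(name, "->", classify_ptr(name))
```

Because the DUNNO rules come first, a PTR that starts with "mail" is never rejected even if it also embeds an IP address, which is exactly why the order of the list matters.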




Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.

On 04.09.2014 at 19:08, Ted Mittelstaedt wrote:
>> there are no countermeasures for a spammer against make it
>> on a RBL or use a zombie on a infected machine and get
>> blocked by Dialup-RBL's before the first mail or by
>> get rejected because the dynamic PTR of the infected
>> zombie
> 
> Yes, there are.  The countermeasure is to shift the list of
> victim addresses to a different bot or hijacked machine.
> 
> Here's how I have observed spammers dealing with RBLs.  You
> may have dumber spammers but here is what _I_ have seen:
> 
> 1) spammer places a list of 10,000 victim addresses in the
> job queue of the mothership.  Periodically in that list there
> are monitoring addresses.
> 
> 2) bot picks up the list & supplies it's IP.  Spammer starts
> monitoring RBLs for that IP, pulls list from mothership.
> 
> 3) bot starts spewing
> 
> 4) bot gets on RBL
> 
> 5) Spammer checks monitoring boxes to see how far bot got in
> the list.  Spammer then puts job in the mothership queue to
> idle the RBLd bot. Spammer then subtracts successfully sent victims
> from the list and then adds more new victims to the list
> then starts over at #1.  Spammer then waits a while and starts
> issuing de-list requests to the RBL's for the list of bots that
> have been RBLed.
> 
> The whole thing is done under software control.  The effect
> is the list is cycled through hundreds of bots that get RBLd
> each bot working further along the list

and *that is* why i referred to postscreen

* you can add a lot of lists with different weights
* you steal some seconds from any new bot before answering at all
* most of the bots have a score above 20 with the config below
* our own RBL is fed by honeypots on free IPs on a lot of ports

postscreen_dnsbl_ttl = 5m
postscreen_dnsbl_threshold = 8
postscreen_dnsbl_action = enforce
postscreen_greet_action  = enforce
postscreen_greet_wait = ${stress?2}${stress:10}s
postscreen_dnsbl_sites = dnsbl.thelounge.net*16
 dul.dnsbl.sorbs.net*8
 b.barracudacentral.org*7
 dnsbl.inps.de*7
 zen.spamhaus.org=127.0.0.[10;11]*6
 zen.spamhaus.org=127.0.0.[4..7]*5
 bl.spamcop.net*4
 ix.dnsbl.manitu.net*4
 zen.spamhaus.org=127.0.0.3*4
 bl.mailspike.net*3
 dnsbl-1.uceprotect.net*3
 zen.spamhaus.org=127.0.0.2*3
 bl.spameatingmonkey.net*2
 dnsrbl.swinog.ch*2
 psbl.surriel.com*2
 spam.dnsbl.sorbs.net*2
 ips.backscatterer.org*1
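How the weights and the threshold interact can be sketched roughly as follows. This is a simplified model in Python, not postscreen itself: the DNS lookups are replaced by a plain dictionary, only some of the lists above are copied in, the reply filters are expanded by hand, and the names `SITES`, `dnsbl_score` and `would_reject` are invented for the illustration. It assumes that each matching entry adds its weight once and that the threshold is an inclusive lower bound.

```python
THRESHOLD = 8  # postscreen_dnsbl_threshold from the config above

# (domain, accepted reply addresses or None for "any reply", weight)
# Subset of postscreen_dnsbl_sites; filters like [4..7] expanded by hand.
SITES = [
    ("dnsbl.thelounge.net", None, 16),
    ("dul.dnsbl.sorbs.net", None, 8),
    ("b.barracudacentral.org", None, 7),
    ("zen.spamhaus.org", {"127.0.0.10", "127.0.0.11"}, 6),
    ("zen.spamhaus.org",
     {"127.0.0.4", "127.0.0.5", "127.0.0.6", "127.0.0.7"}, 5),
    ("bl.spamcop.net", None, 4),
    ("zen.spamhaus.org", {"127.0.0.3"}, 4),
    ("zen.spamhaus.org", {"127.0.0.2"}, 3),
    ("ips.backscatterer.org", None, 1),
]


def dnsbl_score(replies):
    """Sum the weights of all entries matched by one client.

    `replies` maps a DNSBL domain to the set of A-record replies
    received for that client (stand-in for real DNS lookups).
    """
    score = 0
    for domain, accepted, weight in SITES:
        got = replies.get(domain, set())
        if not got:
            continue  # client not listed on this DNSBL
        if accepted is None or got & accepted:
            score += weight
    return score


def would_reject(replies):
    """True if the combined score reaches the threshold."""
    return dnsbl_score(replies) >= THRESHOLD


if __name__ == "__main__":
    client = {"bl.spamcop.net": {"127.0.0.2"},
              "zen.spamhaus.org": {"127.0.0.4"}}
    print(dnsbl_score(client), would_reject(client))
```

A client listed only on a low-weight list stays below the threshold, while two medium-weight listings together are enough for a reject; that is the point of mixing many lists with different weights.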

> Now as for dynamic or dialup RBLs go, UNFORTUNATELY although
> many responsible ISPs do insert the word dynamic or dialup
> in the PTRs of their dialup or dynamic pools, a great many
> still do not.  Which means the RBL's that track those need
> to try and tease the list of dynamic RBL's out of those
> providers.  They don't have all of them.

most do one way or another

x.dyn.
x.dyn-
.........

together with another sysadmin i started to collect regex rules
for postfix (sorry, but i use postfix and you could too if you
miss features somewhere else), on top of some keywords like
"mta", "outbound"..., and have those rules running in log-only
mode; with the final data there will be some more "whitelists"
of legit mail servers / ISPs with a dumb PTR, and after that:
have fun with a botnet zombie

"dul.dnsbl.sorbs.net" and "pbl.spamhaus.org" catch almost all
enduser IPs, the rest is done via PTR filters, and spamassassin
eats the remaining 8-10%

i know of days with up to one million spam attempts where the
ones slipping through our filters were not really more than usual;
it was only noticed through some alert messages about unusual CPU load


Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 9/3/2014 11:13 AM, Reindl Harald wrote:
>
>
> On 03.09.2014 at 19:16, Ted Mittelstaedt wrote:
>>
>>
>> On 9/2/2014 1:52 PM, Reindl Harald wrote:
>>>
>>> On 02.09.2014 at 22:32, Ted Mittelstaedt wrote:
>>>> On 9/2/2014 4:59 AM, Reindl Harald wrote:
>>>>>> just get a proper MTA, enable debug logging
>>>>>> and watch the commands / responses between
>>>>>> client and server due a message transmission
>>>>>
>>>>> and to make it clear for you:
>>>>>
>>>>> until after end of data itslef is responded with success
>>>>> the message is *undelivered* and tried again from the
>>>>> sendig client if it is a proper MTA
>>>>>
>>>>
>>>> However you have GIVEN THE SPAMMER AN OK that they have a
>>>> valid victim address.  You had to issue an OK to the
>>>> RCPT TO: to get that DATA from them.  You just told them "you
>>>> got a good email address"
>>>
>>> child you do not realize that all you claim below has
>>> nothing to do with SA, nor did you understand how
>>> a *layered* spam protecton works nor did you try
>>> to understand *anything* i explained you
>>>
>>
>> You changed the discussion from what SA does to your Postfix
>> pre acceptance filter solution many posts ago.  Weren't you
>> paying attention to your own posts?
>
> i explained you a *complete setup* and why SA is only a piece
>
>>> so what - otherwise i had even accepted the message as you do
>>>
>>> your setup:
>>>    * not on RBL
>>>    * accept it and drop it silently because the score
>>>    * issue "250 OK i even took the whole message"
>>>    * how do you think you don't leak the RCPT case
>>>    * frankly with the 250 OK you invite to send more spam
>>>
>>> my setup:
>>>    * not on a RBL
>>>    * reject it
>>>    * don't issue "250 OK i even took the whole message""
>>>    * if it was not a spammer trigger a bounce on the
>>>      senders server so that he don't think it was
>>>      successful delivered and can even prove it by logs
>>>
>>
>> Your words:
>>
>> "i am saying to my milter "above score XYZ reject the message" because
>> only the [SPAM] in the subject is worthless, it just indicates which
>> messages should go to "spam" or "ham" fpr bayes training"
>>
>> I interpreted this to mean you are attempting to content filter on
>> the Subject: line with a milter.  That is a post-acceptance,
>> post-recipient-address exposure-leak kind of filter
>> with the drawbacks I already illustrated
>
> what SA typically does is add the [SPAM] to subejct and
> a header to allow filtering for MUA or Sieve - and
> that is not really helpful
>
>> Once more your treating postscreen like a black box and conflating
>> it with the discussion of what pre acceptance filtering is and
>> what content filtering is.  It's just more Postfix
>> evangelizing.
>
> no - the be behavior of a complete setup is that only
> a small piece makes it to the extensive content filter
>
> wether you use postfix , exim or something else
> well, postfix indeed has some unique features
>
>> if postscreen rejects on a DNSBL without issuing an OK to the
>> RCPT TO: then it's NOT content filtering or leaking addresses
>
> the *whole* point was to explain you that content filtering
> is only one part of the game
>
>> It is pre acceptance filtering.  This is the same as greylisting
>> (which I do and which mainly eliminates dumb bots that ignore
>> SMTP response codes), and many other kinds of pre-acceptance
>> filters (PTR checks, milter-callback schemes, SPF and DKIM, etc.)
>
> it has *nothing* to do with greylisting
> greylisting answers with a 4xx code
>
> http://www.postfix.org/postconf.5.html#smtpd_delay_reject
>
>> And this discussion has little to do with SpamAssassin.  YOU dragged
>> postscreen in here and the notion of pre-acceptance filtering
>> (when you referred to postscreen using DNSBLs) several posts ago.
>
> because i tried to explain you that SA alone is not
> a ready solution as you expect and as many others
> explained you: it is a framework and *part* of a
> solution which needs knowledge to setup
>
>> If postscreen rejects on a scan of the Subject: line then it
>> has issued an OK to a RCPT TO: and leaked an address - and it is
>> content filtering the same as SpamAssassin.
>
> that's how a spamfilter works: different stages:
>
> * IP / DNSBL ->  OK / Reject
> * PTR: OK / Reject
> * Sender address: OK / Reject
> * SPF: OK / Reject
> * Subject: OK / Reject
> * Contentfilter: Ok / REJECT within the session
> * Virus Scanner: OK / REJECT within the session
>
>> At this point I think that further demonstrations from each of us as to what the
>> SMTP handshake is are rather pointless.
>
> yes
>
>> In reviewing your posts and my responses it's obvious that your posts are
>> all based on a Postfix/postscreen setup and you are making an assumption
>> that everyone out there works with this software, and can understand what
>> your saying because they understand that context.
>
> i expect if someone talks about a contentscanner he understands
> the context independet of the software and keep in mind milters
> are not postfix-specific, frankly they are a sendmail thing
> originally
>
> http://www.postfix.org/MILTER_README.html
> http://en.wikipedia.org/wiki/Milter
>
> *that* would have been postfix specific
> http://www.postfix.org/SMTPD_PROXY_README.html
>
>> This is the mark of a software zealot and evangelist.
>
> it don't matter which MTA you use in combination with a contentfilter
>
> the main difference is reject or accept and deliver anyways or drop
> silent and the main point is that you don't want to handoff 90%
> of your incoming mailflow to the contentscanner at all
>
>> I am attempting to respond based on the normal understanding of what the
>> basic blocks of email transmission are using vendor-neutral language
>> as much as possible.  It is no wonder your getting terribly pissed off
>> about it.  My refusal to engage in this discussion by accepting all your
>> vendor-specific Postfix terms, you are interpreting as a slap in the face to
>> Postfix and your enraged by this.
>
> if we are talk about email you need to understand what a milter is, what
> before-queue filtering is, what a queue is and on which levels of the
> protocl you can snap in which filters and how "expensive" they are
> to make decisions
>
>> I want to talk about ideas
>
> me too
>
>> You want to talk about Postfix.
>
> no i explained on *examples* how things are woring
>
>> That is fine except this isn't a Postfix mailing list.
>
> that's not the point
>
> you expect 90% filtering by a single contentfilter out of the box

Ah, now I understand why you think that.

You are correct I expected 90% filtering by a content filter out
of the box.

But, I made an assumption when I said that - which was that
the content filter was being used after MTA checks.  Because I
assumed that this is how most people use SA.  My bad.

> and refuse to understand that this is just impossible and that
> you pretend gmail can do that is a impudence because no user
> ever had a touch with "oout-of-the-box" of gmail
>
>> When you can drop all references to Postfix terms and drop the
>> insistence that all MTA's operate the same as an MTA running
>> Postfix and all it's associated programs, then I think we can get
>> somewhere.
>
> no, you just could Google somethings, read what i talk
> about *before* you respond and then seek for how to
> do that with your MTA or even realize why i use
> a specific one
>
>> I will simply end by echoing something you said earlier:
>> content filtering should be the LAST RESORT in the filtering chain
>
> which i told you at the very beginning of that thread
> and after you insited in SA has to catch more i explained you
> why that is impossible and others tried that too
>
>> and I will add my own to that:
>>
>> NOBODY has come up with a filter yet that works well and
>> ONLY considers the SMTP handshake BEFORE issuing an OK on
>> receipt of the recipient's address, that has any long-term
>> stick-tion
>
> because it is technically impossible
>
> * you don't know if you accept the body before you scan it
> * you must answer the question for the valid RCPT before
>

Which is why filtering spam without a content filter at the end of the
chain is an exercise in futility.  And which, conversely, is why
filtering BEFORE leaking the recipient address is not that effective -
or as you say "technically impossible"

>> Every one that has been created has had countermeasures worked
>> out by spammers. The fact that many unsophisticated spammers
>> don't use these countermeasures is beside the point, enough do
>> that too much spam gets past to NOT use content filtering
>> (such as provided by SA)
>
> there are no countermeasures for a spammer against make it
> on a RBL or use a zombie on a infected machine and get
> blocked by Dialup-RBL's before the first mail or by
> get rejected because the dynamic PTR of the infected
> zombie
>

Yes, there are.  The countermeasure is to shift the list of
victim addresses to a different bot or hijacked machine.

Here's how I have observed spammers dealing with RBLs.  You
may have dumber spammers but here is what _I_ have seen:

1) spammer places a list of 10,000 victim addresses in the
job queue of the mothership.  Periodically in that list there
are monitoring addresses.

2) bot picks up the list & supplies its IP.  Spammer starts
monitoring RBLs for that IP, pulls list from mothership.

3) bot starts spewing

4) bot gets on RBL.

5) Spammer checks monitoring boxes to see how far bot got in
the list.  Spammer then puts job in the mothership queue to
idle the RBLd bot. Spammer then subtracts successfully sent victims
from the list and then adds more new victims to the list
then starts over at #1.  Spammer then waits a while and starts
issuing de-list requests to the RBL's for the list of bots that
have been RBLed.

The whole thing is done under software control.  The effect
is the list is cycled through hundreds of bots that get RBLd
each bot working further along the list.

I also see many spammers who have a pretty good idea of how
many spams they can send out in a bot, so they break the list
into smaller chunks and put each chunk into the mothership
job queue.  Then the bots pick up each chunk and work on it.
By the time the bot is about finished with the job, it's on an
RBL.

Now as far as dynamic or dialup RBLs go, UNFORTUNATELY although
many responsible ISPs do insert the word dynamic or dialup
in the PTRs of their dialup or dynamic pools, a great many
still do not.  Which means the RBLs that track those need
to try and tease the list of dynamic pools out of those
providers.  They don't have all of them.

> you can setup your own servers but they will land in
> a short on a RBL (the R is for real time)

Yes in which case the spammer just bounces the circuit to
the provider, obtains a new IP address via DHCP, and changes the
DNS and keeps going on.

Years ago when I sold DSL I got contacted periodically by
people who would sign up a DSL account then a month later they
would cancel for no good reason.  When I first started selling,
not knowing anything, I called some of the people cancelling their
accounts to ask them why they cancelled.  I got it explained
more than once that they were cancelling because when they
issued a new DHCP request they were getting the same IP address
back that they had previously.  We had our DHCP server setup
so that all accounts got a reserved lease, basically, so that
all customers essentially got a static IP handed out via DHCP.
This was to make it easier on us if we got a complaint from
somewhere that a particular IP was doing something illegal.

> and they
> have not the same throughbot than a zombie-network
>

Our local cable provider here sells residential accounts with gobs of 
throughput and their DHCP server is setup to always
give a new IP out when a residential account requests DHCP, this
is to prevent businesses from buying the cheaper residential accounts
that only come with dynamic IP addresses, and basically using them
as statically assigned IP accounts.

And they have a huge network block.

There are also gigantic IP network blocks tied up by the cellular
companies, and it is also easy to set up 4G systems that refresh
their IPs out of those.

With repeated use, botnets become less usable.  Many RBLs when you
request repeated de-lists on the RBL, they make it harder and harder
to delist.

For example take yahoo's RBL.  The first time you get on it, simply
waiting 24 hours will get you delisted.

The next time you have to wait a week - or send in a complaint to
them.

The 3rd time you won't auto delist.  You MUST send in a complaint.

The 4th time even sending in a complaint won't get you an immediate
delist.  Instead they will say they delisted you but they still will
have you listed for a week.

The 5th time they will start denying delist complaints.  To get
delisted you have to complain, get denied, then complain a week later
before they will delist.

The 6th time is the same as the 5th time but after they say you are
delisted, you still have to wait a week.

And, when you send in complaints, if they have no record of receiving
legitimate mail from you in between listings - then forget it, they
will never delist you.


Microsoft, on the other hand, works a little differently.  After the 3rd 
or 4th listing, you will NEVER get delisted unless you sign up
for their junk mail limitation programs.  BUT, once you sign up for
their junk mail feedback programs then they will delist immediately
if you complain.

And with both of those providers, those are the -internal- RBLs which
are not queryable by the general public; you can only find out you're
on them if you have a bona fide mailserver that will record the return
codes.

And the complaints can only be submitted via webform so a human has to 
spend time filling out the forms.

Many of the public RBLs are now also warning against repeated delisting
requests for the same IP.

In my opinion the ONLY thing that keeps spam botnets alive nowadays
is many providers' insistence on rotating IPs that are handed out via
DHCP - so they can block businesses from using residential accounts
and the dynamic IPs on them as static IPs.

And now in the US we have Google pushing to enter the ISP market with 
Google Fiber - and Google DOES NOT and WILL NOT offer business accounts
AT ALL so it's clear that ALL IP numbering on their network will be as
forced to be dynamic as possible.  Botnet heaven!

And the whole thing is also changing with IPv6.  Practically all mail I 
get from Google nowadays is via IPv6.  RBLs will not be that useful for
those without significant logic changes.

>> Thus, while SA (and any content filter) should be the last
>> filter in the chain, it's impossible to not have it - it
>> is required.
>
> to filter out the remaining 5-10% percent
>

I think the percentages might be a little higher than that
for a lot of sites.

>> In fact, (although I DO NOT do this) many people just accept
>> everything and put SA at the beginning of the filter chain
>> and get almost the same results
>
> no - read how a milter works (not postfix specific)
>

I avoid use of the term "milter" since it implies Sendmail.

I use Sendmail myself and use milters.

> it's in the middle of the chain and get the whole data
> from headers to body, but the MTA can reject a message
> based on the envelope data and stop before body
>

It can - and that is how I do it - but SA makes the same checks - in 
fact that is a feature that has been touted for it - RBL checking - and
so if a site is not doing an RBL check with their MTA then
SA is going to get it anyway.

SA's documentation even advises that having SA do the RBL check is 
better than having the MTA do it before accepting the message.

Ted

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.

On 03.09.2014 at 19:16, Ted Mittelstaedt wrote:
> 
> 
> On 9/2/2014 1:52 PM, Reindl Harald wrote:
>>
>> On 02.09.2014 at 22:32, Ted Mittelstaedt wrote:
>>> On 9/2/2014 4:59 AM, Reindl Harald wrote:
>>>>> just get a proper MTA, enable debug logging
>>>>> and watch the commands / responses between
>>>>> client and server due a message transmission
>>>>
>>>> and to make it clear for you:
>>>>
>>>> until after end of data itslef is responded with success
>>>> the message is *undelivered* and tried again from the
>>>> sendig client if it is a proper MTA
>>>>
>>>
>>> However you have GIVEN THE SPAMMER AN OK that they have a
>>> valid victim address.  You had to issue an OK to the
>>> RCPT TO: to get that DATA from them.  You just told them "you
>>> got a good email address"
>>
>> child you do not realize that all you claim below has
>> nothing to do with SA, nor did you understand how
>> a *layered* spam protecton works nor did you try
>> to understand *anything* i explained you
>>
> 
> You changed the discussion from what SA does to your Postfix
> pre acceptance filter solution many posts ago.  Weren't you 
> paying attention to your own posts?

i explained a *complete setup* to you and why SA is only one piece of it

>> so what - otherwise i had even accepted the message as you do
>>
>> your setup:
>>   * not on RBL
>>   * accept it and drop it silently because the score
>>   * issue "250 OK i even took the whole message"
>>   * how do you think you don't leak the RCPT case
>>   * frankly with the 250 OK you invite to send more spam
>>
>> my setup:
>>   * not on a RBL
>>   * reject it
>>   * don't issue "250 OK i even took the whole message""
>>   * if it was not a spammer trigger a bounce on the
>>     senders server so that he don't think it was
>>     successful delivered and can even prove it by logs
>>
> 
> Your words:
> 
> "i am saying to my milter "above score XYZ reject the message" because
> only the [SPAM] in the subject is worthless, it just indicates which
> messages should go to "spam" or "ham" fpr bayes training"
> 
> I interpreted this to mean you are attempting to content filter on
> the Subject: line with a milter.  That is a post-acceptance, 
> post-recipient-address exposure-leak kind of filter
> with the drawbacks I already illustrated

what SA typically does is add the [SPAM] to the subject and
a header to allow filtering in the MUA or with Sieve - and
that is not really helpful

> Once more your treating postscreen like a black box and conflating
> it with the discussion of what pre acceptance filtering is and 
> what content filtering is.  It's just more Postfix
> evangelizing.

no - the behavior of a complete setup is that only a small
part of the mail makes it to the extensive content filter

whether you use postfix, exim or something else
well, postfix indeed has some unique features

> if postscreen rejects on a DNSBL without issuing an OK to the
> RCPT TO: then it's NOT content filtering or leaking addresses

the *whole* point was to explain to you that content filtering
is only one part of the game

> It is pre acceptance filtering.  This is the same as greylisting 
> (which I do and which mainly eliminates dumb bots that ignore 
> SMTP response codes), and many other kinds of pre-acceptance
> filters (PTR checks, milter-callback schemes, SPF and DKIM, etc.)

it has *nothing* to do with greylisting
greylisting answers with a 4xx code

http://www.postfix.org/postconf.5.html#smtpd_delay_reject

> And this discussion has little to do with SpamAssassin.  YOU dragged
> postscreen in here and the notion of pre-acceptance filtering
> (when you referred to postscreen using DNSBLs) several posts ago.

because i tried to explain to you that SA alone is not
the ready-made solution you expect, and as many others
explained to you: it is a framework and *part* of a
solution which needs knowledge to set up

> If postscreen rejects on a scan of the Subject: line then it
> has issued an OK to a RCPT TO: and leaked an address - and it is
> content filtering the same as SpamAssassin.

that's how a spamfilter works: different stages:

* IP / DNSBL -> OK / Reject
* PTR: OK / Reject
* Sender address: OK / Reject
* SPF: OK / Reject
* Subject: OK / Reject
* Contentfilter: Ok / REJECT within the session
* Virus Scanner: OK / REJECT within the session
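The staging above can be sketched as a simple chain where the cheap checks run first and the first failing stage ends the session. This is only a Python illustration of the idea, not a description of any particular MTA: the message is a plain dictionary with made-up keys, the stage functions are stubs, and `REJECT_SCORE = 8.0` is a hypothetical reject threshold for the content filter (the thread only calls it "score XYZ").

```python
REJECT_SCORE = 8.0  # hypothetical content-filter reject score
DNSBL_THRESHOLD = 8  # same value as postscreen_dnsbl_threshold


def check_dnsbl(msg):
    if msg.get("dnsbl_score", 0) >= DNSBL_THRESHOLD:
        return "client listed on DNSBL"
    return None


def check_ptr(msg):
    return "unacceptable PTR" if msg.get("ptr_dynamic", False) else None


def check_sender(msg):
    return "sender address rejected" if msg.get("sender_blocked", False) else None


def check_spf(msg):
    return "SPF fail" if msg.get("spf") == "fail" else None


def check_subject(msg):
    return "subject rejected" if msg.get("subject_blocked", False) else None


def check_content(msg):
    if msg.get("spam_score", 0.0) >= REJECT_SCORE:
        return "spam score too high"
    return None


def check_virus(msg):
    return "virus found" if msg.get("infected", False) else None


# Cheapest checks first; the expensive scanners only see what is left.
STAGES = [
    ("dnsbl", check_dnsbl),
    ("ptr", check_ptr),
    ("sender", check_sender),
    ("spf", check_spf),
    ("subject", check_subject),
    ("content", check_content),
    ("virus", check_virus),
]


def filter_message(msg):
    """Run the stages in order and stop at the first reject."""
    for name, check in STAGES:
        reason = check(msg)
        if reason is not None:
            return ("REJECT", name, reason)
    return ("OK", None, None)


if __name__ == "__main__":
    print(filter_message({"dnsbl_score": 20, "spam_score": 9.0}))
    print(filter_message({"spf": "pass", "spam_score": 1.2}))
```

In the first example the content filter is never consulted, because the DNSBL stage already rejected the client; only messages that survive the cheap stages cost scanner time.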

> At this point I think that further demonstrations from each of us as to what the
> SMTP handshake is are rather pointless.

yes

> In reviewing your posts and my responses it's obvious that your posts are 
> all based on a Postfix/postscreen setup and you are making an assumption 
> that everyone out there works with this software, and can understand what 
> your saying because they understand that context.

i expect that someone who talks about a content scanner understands
the context independent of the software, and keep in mind that
milters are not postfix-specific; frankly, they are originally
a sendmail thing

http://www.postfix.org/MILTER_README.html
http://en.wikipedia.org/wiki/Milter

*that* would have been postfix specific
http://www.postfix.org/SMTPD_PROXY_README.html

> This is the mark of a software zealot and evangelist.

it doesn't matter which MTA you use in combination with a contentfilter

the main difference is whether you reject, or accept and deliver
anyway, or drop silently; and the main point is that you don't want
to hand off 90% of your incoming mailflow to the contentscanner at all

> I am attempting to respond based on the normal understanding of what the
> basic blocks of email transmission are using vendor-neutral language
> as much as possible.  It is no wonder your getting terribly pissed off 
> about it.  My refusal to engage in this discussion by accepting all your 
> vendor-specific Postfix terms, you are interpreting as a slap in the face to
> Postfix and your enraged by this.

if we are talking about email you need to understand what a milter is,
what before-queue filtering is, what a queue is, and at which levels of
the protocol you can hook in which filters and how "expensive" they are
when making decisions

> I want to talk about ideas

me too

> You want to talk about Postfix.  

no, i explained with *examples* how things are working

> That is fine except this isn't a Postfix mailing list.

that's not the point

you expect 90% filtering by a single contentfilter out of the box
and refuse to understand that this is just impossible, and your
claim that gmail can do that is an impudence, because no user
has ever been in touch with an "out-of-the-box" gmail

> When you can drop all references to Postfix terms and drop the 
> insistence that all MTA's operate the same as an MTA running 
> Postfix and all it's associated programs, then I think we can get 
> somewhere.

no, you could just google some things, read what i am talking
about *before* you respond, and then look for how to
do that with your MTA, or even realize why i use
a specific one

> I will simply end by echoing something you said earlier:
> content filtering should be the LAST RESORT in the filtering chain

which i told you at the very beginning of this thread,
and after you insisted that SA has to catch more i explained to you
why that is impossible, and others tried that too

> and I will add my own to that:
> 
> NOBODY has come up with a filter yet that works well and 
> ONLY considers the SMTP handshake BEFORE issuing an OK on
> receipt of the recipient's address, that has any long-term 
> stick-tion

because it is technically impossible

* you don't know if you will accept the body before you scan it
* you must answer the question of whether the RCPT is valid before that

> Every one that has been created has had countermeasures worked 
> out by spammers. The fact that many unsophisticated spammers 
> don't use these countermeasures is beside the point, enough do 
> that too much spam gets past to NOT use content filtering 
> (such as provided by SA)

there are no countermeasures for a spammer against landing
on an RBL, or against using a zombie on an infected machine
and getting blocked by dialup RBLs before the first mail, or
getting rejected because of the dynamic PTR of the infected
zombie

you can set up your own servers, but they will land on an RBL
in short order (the R is for real time) and they do not
have the same throughput as a zombie network

> Thus, while SA (and any content filter) should be the last 
> filter in the chain, it's impossible to not have it - it 
> is required.

to filter out the remaining 5-10%

> In fact, (although I DO NOT do this) many people just accept 
> everything and put SA at the beginning of the filter chain
> and get almost the same results

no - read how a milter works (not postfix specific)

it's in the middle of the chain and gets the whole data
from headers to body, but the MTA can reject a message
based on the envelope data and stop before the body


Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 9/2/2014 1:52 PM, Reindl Harald wrote:
>
> On 02.09.2014 at 22:32, Ted Mittelstaedt wrote:
>> On 9/2/2014 4:59 AM, Reindl Harald wrote:
>>>> just get a proper MTA, enable debug logging
>>>> and watch the commands / responses between
>>>> client and server due a message transmission
>>>
>>> and to make it clear for you:
>>>
>>> until after end of data itslef is responded with success
>>> the message is *undelivered* and tried again from the
>>> sendig client if it is a proper MTA
>>>
>>
>> However you have GIVEN THE SPAMMER AN OK that they have a
>> valid victim address.  You had to issue an OK to the
>> RCPT TO: to get that DATA from them.  You just told them "you
>> got a good email address"
>
> child you do not realize that all you claim below has
> nothing to do with SA, nor did you understand how
> a *layered* spam protecton works nor did you try
> to understand *anything* i explained you
>

You changed the discussion from what SA does to your Postfix
pre-acceptance filter solution many posts ago.  Weren't you paying
attention to your own posts?

> so what - otherwise i had even accepted the message as you do
>
> your setup:
>   * not on RBL
>   * accept it and drop it silently because the score
>   * issue "250 OK i even took the whole message"
>   * how do you think you don't leak the RCPT case
>   * frankly with the 250 OK you invite to send more spam
>
> my setup:
>   * not on a RBL
>   * reject it
>   * don't issue "250 OK i even took the whole message""
>   * if it was not a spammer trigger a bounce on the
>     senders server so that he don't think it was
>     successful delivered and can even prove it by logs
>

Your words:

"i am saying to my milter "above score XYZ reject the message" because
only the [SPAM] in the subject is worthless, it just indicates which
messages should go to "spam" or "ham" for bayes training"

I interpreted this to mean you are attempting to content filter on
the Subject: line with a milter.  That is a post-acceptance filter that
exposes the recipient address, with the drawbacks I already illustrated.

>>> if your MTA *don't repsond with success* at END-OF-DATA
>>> the message implicit is counted as *not delivered* because
>>> simply in the middle of data the server could raised
>>> an error by a full disk or something else
>>
>> Yes and the spammer just tries again.  And again.  And again,
>> forever and forever.
>
> so what - what has that to do with anything i explained
> and you refuse to understand over the whole thread?
>
>> The point of blocking on DNS or IP based blocking is to issue
>> that error 5xx because that is the ONLY thing that is going to
>> cause the spammer to delist.  Because at that point they are
>> now wasting money and time and resources attempting to deliver
>> to an address that probably does not exist.
>
> so what - that one was not on a RBL
> and now?
>
> accept the spam message or *reject* it?
>
> i at least reject it
> you accept it, say "250 OK" and then drop it silent
>
>> Sure they can parse the return code, looking for polite language
>> saying something to the effect "this email is being blocked because
>> you are on Wonkulating Gronkluator's blacklist" that some sites
>> issue to "help" newbie Postmasters realize that their mailserver
>> is being hijacked, or something of that nature.
>
> what has that to do with the topic?
>
>> But they GUESS so many of their victim addresses that they
>> can't spend the resources doing that on a dictionary attack,
>> they KNOW that 99.99% of the error 5xx's they get back are
>> for User Unknown.  So the few times they guess a real address
>> and get that polite human-readable explanation that they are
>> on a blacklist, gets lost in the noise.
>
> what has that to do with the topic?
>
>> But YOUR setup - why that's spam flypaper.  Because, YOU are
>> NOT issuing an error 5xx on a sender IP that happens
>> to guess one of your users email addresses - because your just
>> too curious to get at the DATA and inspect the Subject: line.
>
> jesus christ - Subject is not data - subject is part of the header
> come back after you made it through *basic lessons*
>

Subject: is part of DATA.  If you're going to claim otherwise
then you're simply not understanding the SMTP handshake.

>
> the client makes it to spamass-milter because he is *not*
> on the 15 blacklists in front of
>
>> Thus you are HELPING the spammers build a list of valid email
>> addresses on your domain.
>
> bullshit - i reject more than 90% like you
> but i don't issue "250 OK" for clear spam, i reject it
>
>> No wonder you have such spectacular spam counts. The spammers
>> must just love you.  Your handing them over your user email
>> list.
>
> sorry, but you are an idiot
>
> i handle nothing because the accept of DATA only happens
> if the client is not listed on RBL's and so you better
> stop to spread bullshit just because you don't understand
> what people exlaining you by wasting time with your posts
>
>> Sure, you may determine they are operating from a blacklist
>> and shut them down after they throw you 1,000 guesses from an IP
>> address.  But in so doing you have handed them 10 good addresses
>> that they will remember and just attack you from
>> somewhere else from
>
> bullshit again - more than 90% are rejected by postscreen and RBL's
> frankly postscreen can't leak valid addresses because it even
> don't know them - that information has only the smtpd process
> if you make it through RBL's and protocol tests
>

Once more you're treating postscreen like a black box and conflating
it with the discussion of what pre-acceptance filtering is and what
content filtering is.  It's just more Postfix evangelizing.

If postscreen rejects on a DNSBL without issuing an OK to the
RCPT TO: then it's NOT content filtering or leaking addresses.  It is
pre-acceptance filtering.  This is the same as greylisting (which I
do and which mainly eliminates dumb bots that ignore SMTP response
codes), and many other kinds of pre-acceptance
filters (PTR checks, milter-callback schemes, SPF and DKIM, etc.)

And this discussion has little to do with SpamAssassin.  YOU dragged
postscreen in here and the notion of pre-acceptance filtering
(when you referred to postscreen using DNSBLs) several posts ago.

If postscreen rejects on a scan of the Subject: line then it
has issued an OK to a RCPT TO: and leaked an address - and it is
content filtering the same as SpamAssassin.

Pretty cut and dried.  Also obvious if you stop conflating the MTA 
operation and the filtering operation.
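
The distinction can be sketched in a few lines.  This is only an illustration of the definition used above (was an OK issued to RCPT TO: before the rejection or not); the function name and the tuple format are made up for the example and do not come from any MTA:

```python
def classify_rejection(replies):
    """Classify a session from (command, reply_code) tuples in session order.

    A permanent 5xx rejection counts as pre-acceptance if no RCPT was answered
    with a 2xx before it, and as post-acceptance (address exposed) otherwise.
    """
    rcpt_accepted = False
    for command, code in replies:
        if command == "RCPT" and 200 <= code < 300:
            rcpt_accepted = True
        if 500 <= code < 600:
            return "post-acceptance" if rcpt_accepted else "pre-acceptance"
    return "accepted"


# Rejected at RCPT: the sender learns nothing about the address.
print(classify_rejection([("HELO", 250), ("MAIL", 250), ("RCPT", 550)]))
# Rejected after the content was sent: RCPT already got its 250.
print(classify_rejection([("HELO", 250), ("MAIL", 250), ("RCPT", 250),
                          ("DATA", 354), ("END-OF-DATA", 550)]))
```

The first call prints pre-acceptance and the second prints post-acceptance, regardless of which MTA or filter produced the reply codes.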

>> Do that a couple hundred times and they have thousands of your
>> valid emails.
>>
>>> so the communication looks somehow like:
>>>
>>> * client: i am sending now data
>>> * server: fine, do so
>>> * client: sending data
>>> * client: i have finished with sending data
>>> * server: ok, i accepted that
>>> * client: fine, QUIT
>>> * client: closes the connection AFTER QUIT
>
> you missed to understand to understand that the above communication
> *happens only* if the client *passed RBL checks* and also *passed*
> HELO checks and also *passed* a lot of other tests and until DATA
> was no other reason found to reject it before
>
>> Here is how yours looks:
>>
>> Spammer:  HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
>> you:  OK I did my DNS check, my RBL check, my PTR check and that's a good host so go ahead
>> Spammer:  MAIL FROM:<fakename@throwawayhostname.throwawaydomainname.TLD)
>> You:  OK
>> Spammer: RCPT TO<us...@oneofyourdomains.com>
>> You:  OK looks good but I want to see your content, so start sending
>> Spammer to itself HOT DAM I GOT ANOTHER VALID EMAIL ADDRESS TO TORTURE FOREVER
>> Spammer to you:  DATA, blah blah blah, Subject: Viakkagra
>> blah
>> blah
>> you:  OK your content says your a spammer so I'm going to blow the
>> TCP connection and not send a final OK
>> Spammer to itself  SUCKER!!!!  IF HE THINKS I'LL FALL FOR THAT HE'S
>> DUMBER THAN A POST!  I'LL JUST ATTACK FROM SOMEWHERE ELSE USING DIFFERENT CONTENT UNTIL I GET PAST HIS BLOCK.
>
> BULLSHIT
>
> the communication looks like "postfix/postscreen[30894]: NOQUEUE: reject: RCPT from [187.163.175.185]:61326:
> 550 5.7.1 Service unavailable; client [187.163.175.185] blocked using RBL xyz"
>
>> And here is how it SHOULD look:
>>
>> Spammer:  HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
>> you: OK
>> Spammer:   MAIL FROM:<fakename@throwawayhostname.throwawaydomainname.TLD)
>> you:  OK
>> Spammer: RCPT TO:<us...@oneofyourdomains.com>
>> You to yourself:  Hmm - looks like my user has a blacklist against .co so this guy is unwanted
>> You to spammer:  500 User Unknown
>> Spammer:  DAMN, I guessed wrong.  Toss that one and go on to the next guess.
>>
>> Or even better:
>>
>> Spammer: HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
>> you to yourself:  We block mail from Russian federation Mafia
>> you to spammer: error 5xx go to hell, spammer.  End TCP connection.
>> Spammer: WTF just happened???
>>
>> Granted, this isn't how SA works but you have been talking about prefiltering and this is how it should look
>
> damned SA is not part of the whole game
>
> more than 90% rejects happening *before handover the connection to smtpd*
> and so *long before* SA comes in the mix at all
>
>>> so *please* refrain from reply and discuss about good
>>> or bad defaults until you learned your *basic sessions*
>
> why did you not follow the advice above?
>

At this point I think that further demonstrations from each of us as to 
what the SMTP handshake is are rather pointless.

In reviewing your posts and my responses it's obvious that your posts
are all based on a Postfix/postscreen setup and you are making an
assumption that everyone out there works with this software, and can
understand what you're saying because they understand that context.

This is the mark of a software zealot and evangelist.

I am attempting to respond based on the normal understanding of what the
basic blocks of email transmission are using vendor-neutral language
as much as possible.  It is no wonder you're getting terribly pissed off
about it.  You are interpreting my refusal to engage in this discussion
by accepting all your vendor-specific Postfix terms as a slap in the
face to Postfix, and you're enraged by this.

I want to talk about ideas.  You want to talk about Postfix.  That is 
fine except this isn't a Postfix mailing list.

When you can drop all references to Postfix terms and drop the
insistence that all MTAs operate the same as an MTA running Postfix and
all its associated programs, then I think we can get somewhere.

I will simply end by echoing something you said earlier:

content filtering should be the LAST RESORT in the filtering chain

and I will add my own to that:

NOBODY has come up with a filter yet that works well and ONLY considers
the SMTP handshake BEFORE issuing an OK on receipt of the recipient's
address, that has any long-term stick-tion.  Every one that has been
created has had countermeasures worked out by spammers.  The fact that
many unsophisticated spammers don't use these countermeasures is beside
the point; enough do that too much spam gets past to NOT use content
filtering (such as provided by SA).

Thus, while SA (and any content filter) should be the last filter in the 
chain, it's impossible to not have it - it is required.

In fact, (although I DO NOT do this) many people just
accept everything and put SA at the beginning of the filter chain
and get almost the same results.


Ted

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.
Am 02.09.2014 um 22:32 schrieb Ted Mittelstaedt:
> On 9/2/2014 4:59 AM, Reindl Harald wrote:
>>> just get a proper MTA, enable debug logging
>>> and watch the commands / responses between
>>> client and server due a message transmission
>>
>> and to make it clear for you:
>>
>> until after end of data itslef is responded with success
>> the message is *undelivered* and tried again from the
>> sendig client if it is a proper MTA
>>
> 
> However you have GIVEN THE SPAMMER AN OK that they have a
> valid victim address.  You had to issue an OK to the
> RCPT TO: to get that DATA from them.  You just told them "you
> got a good email address"

child, you do not realize that everything you claim below has
nothing to do with SA, nor did you understand how
*layered* spam protection works, nor did you try
to understand *anything* i explained to you

so what - otherwise i would have accepted the message just as you do

your setup:
 * not on an RBL
 * accept it and drop it silently because of the score
 * issue "250 OK i even took the whole message"
 * how do you think you don't leak the RCPT in that case
 * frankly, with the 250 OK you invite them to send more spam

my setup:
 * not on an RBL
 * reject it
 * don't issue "250 OK i even took the whole message"
 * if it was not a spammer, trigger a bounce on the
   sender's server so that he doesn't think it was
   successfully delivered and can even prove it by logs

>> if your MTA *don't repsond with success* at END-OF-DATA
>> the message implicit is counted as *not delivered* because
>> simply in the middle of data the server could raised
>> an error by a full disk or something else
> 
> Yes and the spammer just tries again.  And again.  And again,
> forever and forever.

so what - what has that to do with anything i explained
and you refuse to understand over the whole thread?

> The point of blocking on DNS or IP based blocking is to issue
> that error 5xx because that is the ONLY thing that is going to
> cause the spammer to delist.  Because at that point they are
> now wasting money and time and resources attempting to deliver
> to an address that probably does not exist.

so what - that one was not on a RBL
and now?

accept the spam message or *reject* it?

i at least reject it
you accept it, say "250 OK" and then drop it silently

> Sure they can parse the return code, looking for polite language
> saying something to the effect "this email is being blocked because
> you are on Wonkulating Gronkluator's blacklist" that some sites
> issue to "help" newbie Postmasters realize that their mailserver
> is being hijacked, or something of that nature.

what has that to do with the topic?

> But they GUESS so many of their victim addresses that they
> can't spend the resources doing that on a dictionary attack,
> they KNOW that 99.99% of the error 5xx's they get back are
> for User Unknown.  So the few times they guess a real address
> and get that polite human-readable explanation that they are
> on a blacklist, gets lost in the noise.

what has that to do with the topic?

> But YOUR setup - why that's spam flypaper.  Because, YOU are
> NOT issuing an error 5xx on a sender IP that happens
> to guess one of your users email addresses - because your just
> too curious to get at the DATA and inspect the Subject: line.

jesus christ - Subject is not data - subject is part of the header
come back after you have made it through the *basic lessons*


the client makes it to spamass-milter because he is *not*
on the 15 blacklists in front of it

> Thus you are HELPING the spammers build a list of valid email
> addresses on your domain.

bullshit - i reject more than 90% like you
but i don't issue "250 OK" for clear spam, i reject it

> No wonder you have such spectacular spam counts. The spammers
> must just love you.  Your handing them over your user email
> list.

sorry, but you are an idiot

i hand over nothing, because DATA is only accepted
if the client is not listed on RBLs, so you'd better
stop spreading bullshit just because you don't understand
what people are explaining to you while wasting time on your posts

> Sure, you may determine they are operating from a blacklist
> and shut them down after they throw you 1,000 guesses from an IP
> address.  But in so doing you have handed them 10 good addresses 
> that they will remember and just attack you from
> somewhere else from

bullshit again - more than 90% are rejected by postscreen and RBLs
frankly, postscreen can't leak valid addresses because it doesn't
even know them - only the smtpd process has that information, and you
only reach it if you make it through the RBLs and protocol tests

> Do that a couple hundred times and they have thousands of your
> valid emails.
> 
>> so the communication looks somehow like:
>>
>> * client: i am sending now data
>> * server: fine, do so
>> * client: sending data
>> * client: i have finished with sending data
>> * server: ok, i accepted that
>> * client: fine, QUIT
>> * client: closes the connection AFTER QUIT

you failed to understand that the above communication
*happens only* if the client *passed RBL checks* and also *passed*
HELO checks and also *passed* a lot of other tests, and no other
reason to reject it was found before DATA

> Here is how yours looks:
> 
> Spammer:  HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
> you:  OK I did my DNS check, my RBL check, my PTR check and that's a good host so go ahead
> Spammer:  MAIL FROM: <fakename@throwawayhostname.throwawaydomainname.TLD)
> You:  OK
> Spammer: RCPT TO <us...@oneofyourdomains.com>
> You:  OK looks good but I want to see your content, so start sending
> Spammer to itself HOT DAM I GOT ANOTHER VALID EMAIL ADDRESS TO TORTURE FOREVER
> Spammer to you:  DATA, blah blah blah, Subject: Viakkagra
> blah
> blah
> you:  OK your content says your a spammer so I'm going to blow the
> TCP connection and not send a final OK
> Spammer to itself  SUCKER!!!!  IF HE THINKS I'LL FALL FOR THAT HE'S
> DUMBER THAN A POST!  I'LL JUST ATTACK FROM SOMEWHERE ELSE USING DIFFERENT CONTENT UNTIL I GET PAST HIS BLOCK.

BULLSHIT

the communication looks like "postfix/postscreen[30894]: NOQUEUE: reject: RCPT from [187.163.175.185]:61326:
550 5.7.1 Service unavailable; client [187.163.175.185] blocked using RBL xyz"

> And here is how it SHOULD look:
> 
> Spammer:  HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
> you: OK
> Spammer:   MAIL FROM: <fakename@throwawayhostname.throwawaydomainname.TLD)
> you:  OK
> Spammer: RCPT TO:  <us...@oneofyourdomains.com>
> You to yourself:  Hmm - looks like my user has a blacklist against .co so this guy is unwanted
> You to spammer:  500 User Unknown
> Spammer:  DAMN, I guessed wrong.  Toss that one and go on to the next guess.
> 
> Or even better:
> 
> Spammer: HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
> you to yourself:  We block mail from Russian federation Mafia
> you to spammer: error 5xx go to hell, spammer.  End TCP connection.
> Spammer: WTF just happened???
> 
> Granted, this isn't how SA works but you have been talking about prefiltering and this is how it should look

damn it, SA is not part of that game at all

more than 90% of the rejects happen *before the connection is handed over to smtpd*
and so *long before* SA comes into the mix at all

>> so *please* refrain from reply and discuss about good
>> or bad defaults until you learned your *basic sessions*

why did you not follow the advice above?


Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 9/2/2014 4:59 AM, Reindl Harald wrote:
>
>
> Am 02.09.2014 um 13:54 schrieb Reindl Harald:
>>
>> Am 02.09.2014 um 13:43 schrieb Ted Mittelstaedt:
>>>> as explained above:
>>>>
>>>> * the users don't want to see clear spam at all
>>>> * in many countries *you must* reject before-queue
>>>> * frankly, where i live for drop a accepted messages
>>>>     you can go up to 2 years *in jail*
>>>
>>> This is really getting silly
>>
>> yes, your response
>>
>>> Once you accept DATA on the SMTP handshake so you can
>>> read the Subject: line you have accepted the message
>>> whether you queue it or not.
>>
>> bullshit - the is a "END-OF-DATA" response in the SMTP
>> protocol and if what you say would be true even a basic
>> postfix reject on headers would not work
>>
>> the prerequisite to discuss about high level MTA
>> technology is that you understand basics but you
>> even fail to distinct between data and headers
>> nor do you realize how a SMTP session works
>>
>> just get a proper MTA, enable debug logging
>> and watch the commands / responses between
>> client and server due a message transmission
>
> and to make it clear for you:
>
> until after end of data itslef is responded with success
> the message is *undelivered* and tried again from the
> sendig client if it is a proper MTA
>

However you have GIVEN THE SPAMMER AN OK that they have a
valid victim address.  You had to issue an OK to the
RCPT TO: to get that DATA from them.  You just told them "you
got a good email address"

> if your MTA *don't repsond with success* at END-OF-DATA
> the message implicit is counted as *not delivered* because
> simply in the middle of data the server could raised
> an error by a full disk or something else
>

Yes and the spammer just tries again.  And again.  And again,
forever and forever.

The point of blocking on DNS or IP based blocking is to issue
that error 5xx because that is the ONLY thing that is going to
cause the spammer to delist.  Because at that point they are
now wasting money and time and resources attempting to deliver
to an address that probably does not exist.

Sure they can parse the return code, looking for polite language
saying something to the effect "this email is being blocked because
you are on Wonkulating Gronkluator's blacklist" that some sites
issue to "help" newbie Postmasters realize that their mailserver
is being hijacked, or something of that nature.

But they GUESS so many of their victim addresses that they
can't spend the resources doing that on a dictionary attack,
they KNOW that 99.99% of the error 5xx's they get back are
for User Unknown.  So the few times they guess a real address
and get that polite human-readable explanation that they are
on a blacklist, gets lost in the noise.

But YOUR setup - why that's spam flypaper.  Because, YOU are
NOT issuing an error 5xx on a sender IP that happens
to guess one of your users' email addresses - because you're just
too curious to get at the DATA and inspect the Subject: line.

Thus you are HELPING the spammers build a list of valid email
addresses on your domain.

No wonder you have such spectacular spam counts.  The spammers
must just love you.  You're handing them your user email
list. Sure, you may determine they are operating from a blacklist
and shut them down after they throw you 1,000 guesses from an IP
address.  But in so doing you have handed them 10 good addresses that
they will remember, and they will just attack you from somewhere else.
Do that a couple hundred times and they have thousands of your
valid emails.

> so the communication looks somehow like:
>
> * client: i am sending now data
> * server: fine, do so
> * client: sending data
> * client: i have finished with sending data
> * server: ok, i accepted that
> * client: fine, QUIT
> * client: closes the connection AFTER QUIT
>

Here is how yours looks:

Spammer:  HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
you:  OK I did my DNS check, my RBL check, my PTR check and that's a good host so go ahead
Spammer:  MAIL FROM: <fakename@throwawayhostname.throwawaydomainname.TLD>
You:  OK
Spammer:  RCPT TO: <us...@oneofyourdomains.com>
You:  OK looks good but I want to see your content, so start sending
Spammer to itself:  HOT DAMN I GOT ANOTHER VALID EMAIL ADDRESS TO TORTURE FOREVER
Spammer to you:  DATA, blah blah blah, Subject: Viakkagra
blah
blah
blah
you:  OK your content says you're a spammer so I'm going to blow the
TCP connection and not send a final OK
Spammer to itself:  SUCKER!!!!  IF HE THINKS I'LL FALL FOR THAT HE'S
DUMBER THAN A POST!  I'LL JUST ATTACK FROM SOMEWHERE ELSE USING
DIFFERENT CONTENT UNTIL I GET PAST HIS BLOCK.

And here is how it SHOULD look:

Spammer:  HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
you:  OK
Spammer:  MAIL FROM: <fakename@throwawayhostname.throwawaydomainname.TLD>
you:  OK
Spammer:  RCPT TO: <us...@oneofyourdomains.com>
You to yourself:  Hmm - looks like my user has a blacklist against .co so this guy is unwanted
You to spammer:  550 User Unknown
Spammer:  DAMN, I guessed wrong.  Toss that one and go on to the next guess.

Or even better:

Spammer: HELO throwawayhostname.throwawaydomainname.TLD like .eu or .co
you to yourself:  We block mail from Russian federation Mafia
you to spammer: error 5xx go to hell, spammer.  End TCP connection.
Spammer: WTF just happened???

Granted, this isn't how SA works but you have been talking about 
prefiltering and this is how it should look.

Ted


> so *please* refrain from reply and discuss about good
> or bad defaults until you learned your *basic sessions*
>
>

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.

Am 02.09.2014 um 13:54 schrieb Reindl Harald:
> 
> Am 02.09.2014 um 13:43 schrieb Ted Mittelstaedt:
>>> as explained above:
>>>
>>> * the users don't want to see clear spam at all
>>> * in many countries *you must* reject before-queue
>>> * frankly, where i live for drop a accepted messages
>>>    you can go up to 2 years *in jail*
>>
>> This is really getting silly
> 
> yes, your response
> 
>> Once you accept DATA on the SMTP handshake so you can 
>> read the Subject: line you have accepted the message 
>> whether you queue it or not. 
> 
> bullshit - the is a "END-OF-DATA" response in the SMTP
> protocol and if what you say would be true even a basic
> postfix reject on headers would not work
> 
> the prerequisite to discuss about high level MTA
> technology is that you understand basics but you
> even fail to distinct between data and headers
> nor do you realize how a SMTP session works
> 
> just get a proper MTA, enable debug logging
> and watch the commands / responses between
> client and server due a message transmission

and to make it clear for you:

until the end of data itself is answered with success,
the message is *undelivered* and will be tried again by the
sending client if it is a proper MTA

if your MTA *doesn't respond with success* at END-OF-DATA,
the message implicitly counts as *not delivered*, simply
because in the middle of data the server could have raised
an error due to a full disk or something else

so the communication looks somewhat like:

* client: i am sending data now
* server: fine, do so
* client: sending data
* client: i have finished sending data
* server: ok, i accepted that
* client: fine, QUIT
* client: closes the connection AFTER QUIT
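
the rule behind that exchange can be sketched as follows, assuming the usual SMTP reply classes (2xx success, 4xx temporary failure, 5xx permanent failure). The function name is made up for the illustration, and `None` stands for a connection that dropped before any reply arrived:

```python
def delivery_outcome(end_of_data_reply):
    """Map the server's reply to the end-of-data marker to the sending MTA's view."""
    if end_of_data_reply is None:
        return "retry"        # no reply at all: treated like a temporary failure
    if 200 <= end_of_data_reply < 300:
        return "delivered"    # only now is the server responsible for the message
    if 400 <= end_of_data_reply < 500:
        return "retry"        # temporary failure, e.g. a full disk
    if 500 <= end_of_data_reply < 600:
        return "bounce"       # permanent rejection, the sender is notified
    raise ValueError(f"unexpected reply code: {end_of_data_reply}")


print(delivery_outcome(250))   # delivered
print(delivery_outcome(None))  # retry
print(delivery_outcome(550))   # bounce
```

so reading the body does not by itself mean the message was accepted; only a success reply after end of data does.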

so *please* refrain from replying and discussing good
or bad defaults until you have learned your *basic lessons*



Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.
Am 02.09.2014 um 13:43 schrieb Ted Mittelstaedt:
>> as explained above:
>>
>> * the users don't want to see clear spam at all
>> * in many countries *you must* reject before-queue
>> * frankly, where i live for drop a accepted messages
>>    you can go up to 2 years *in jail*
> 
> This is really getting silly

yes, your response

> Once you accept DATA on the SMTP handshake so you can 
> read the Subject: line you have accepted the message 
> whether you queue it or not. 

bullshit - there is an "END-OF-DATA" response in the SMTP
protocol, and if what you say were true even a basic
postfix reject on headers would not work

the prerequisite for discussing high-level MTA
technology is that you understand the basics, but you
even fail to distinguish between data and headers,
nor do you realize how an SMTP session works

just get a proper MTA, enable debug logging
and watch the commands / responses between
client and server during a message transmission

> Failing to properly TCP close or whatever other trick 
> you want to use to claim you haven't accepted it is 
> semantics used to fool the fools. When you issue 
> the OK to the DATA then you have accepted it

that is *pure bullshit* because the milter's job
is to *not* issue the OK at end of data, and that
is why it is running as a milter

realize that what i say now is as polite as possible:
please shut up until you RTFM or at least try it out yourself

a simple PHP formmailer using "phpmailer" in SMTP mode and
a proper setup is enough to test the behavior: if you are
not above the score, "phpmailer" gives back 'true', and in
case you hit the "-r 8" in spamass-milter you get 'false'
and even the response

Sep  2 13:44:35 *** postfix/cleanup[24670]: 3hnRJz1NNVz1w: milter-reject: END-OF-MESSAGE from
unknown[173.232.204.100]: 5.7.1 Blocked by SpamAssassin; from=<**> to=<**> proto=ESMTP helo=<**>
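
the decision behind that log line can be sketched like this, assuming the two thresholds mentioned in this thread (required_score 4.5 for tagging, "-r 8" for rejecting in the milter). The function is only an illustration and not part of SpamAssassin or spamass-milter:

```python
def milter_action(score, required_score=4.5, reject_score=8.0):
    """Decide what happens to a message at END-OF-MESSAGE based on its score."""
    if score >= reject_score:
        return "reject"   # answered with 5xx, the sending side sees a failure
    if score >= required_score:
        return "tag"      # delivered, but marked as spam for sorting and training
    return "accept"       # delivered unchanged


print(milter_action(8.9))  # reject
print(milter_action(5.0))  # tag
print(milter_action(1.2))  # accept
```

only the middle band ever reaches a mailbox tagged as spam; everything at or above the reject score is refused inside the SMTP session.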


Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 9/2/2014 3:48 AM, Reindl Harald wrote:
>
>
> Am 02.09.2014 um 12:15 schrieb Ted Mittelstaedt:
>>
>>
>> On 8/31/2014 7:35 AM, Reindl Harald wrote:
>>>
>>> Am 31.08.2014 um 16:08 schrieb Ted Mittelstaedt:
>>>> On 8/31/2014 2:21 AM, Reindl Harald wrote:
>>>>>
>>>>> Am 31.08.2014 um 02:15 schrieb Ted Mittelstaedt:
>>>>>> Yes, it does work great when you have the bayes filter turned on and
>>>>>> you take the time to feed it.  And that means you have to feed the
>>>>>> learner both ham and spam and set up reliable sources for those.
>>>>>>
>>>>>> Unfortunately if Bayes is not turned on, it does not catch more than
>>>>>> around 60-70% of spam.  As a SpamAssassin user & server admin, I would
>>>>>> really like to see that improve.
>>>>>
>>>>> 60-70% without training is great
>>>>>
>>>>> keep in mind that the first 90% of incoming is eaten by RBL's
>>>>> and the 60% are from the remaining 10% at all :-)
>>>>>
>>>>> i think it's impossible to improve that much "out-of-the-box" because
>>>>> that would make it to sensitive while the bayes has the ham side of
>>>>> your communication too for decisions
>>>>>
>>>>
>>>> Google does it.  It's not impossible.
>>>
>>> Google has a lot of more data and power to feed a global bayes
>>> and even then: they fail as you say yourself in the next paragraph
>>>
>>> i don't care for the 5 spam messages
>>> i care for the eaten important one
>>>
>>
>> eaten?  Your the one who is deleting the stuff because it's being tagged
>> as spam, that's YOUR decision not SA's.  SA is just saying "we think this
>> is spam" you are deciding to eat the message.
>
> i am saying to my milter "above score XYZ reject the message" because
> only the [SPAM] in the subject is worthless, it just indicates which
> messages should go to "spam" or "ham" for bayes training
>
>>>>> i am coming from a commercial device trying to block 100% and there
>>>>> it ends in zero-hour-blocklists with domains even if they are only
>>>>> linked on the youtube page of the blocked facebook notification
>>>>>
>>>>> so i am glad that i have to do some training myself instead of fearing
>>>>> false positives, which do much more harm
>>>>
>>>> My experience is that the commercial providers like Gmail are now
>>>> so aggressive that false positives are VERY common on their systems,
>>>> this leads to people nowadays quite commonly saying "check your
>>>> spam folder" on their websites and such that send feedback messages.
>>>
>>> which defeats the intention of a spamfilter and the whole idea
>>> of a junk-folder is broken - i need a content filter running
>>> reliably before-queue to not see the real crap, plus some [SPAM]
>>> tagged messages which are moved by hand to ham/spam to train bayes
>>>
>>>> Out of the box the default decision point of 5 is too high anyway
>
> it is  a safe default
>
> i changed it to 4.5 after configuring everything correctly, and starting at 8.0
> the milter throws messages away - only a handful of spam falls between those
> scores - but *in any case* it means you have to configure things for
> your needs
>
> no defaults ever will satisfy everybody
> that is just impossible
>

But even you recognized that 5 is not adequate.

>>>> I think the emphasis on avoiding false positives in the stock
>>>> (non-Bayes) distribution is far too high. I suspect that over
>>>> the years many good rule submissions have been ignored because
>>>> incidence of false positives with them was too high for the
>>>> SA maintainers.
>>>
>>> if you have users to support there is nothing worse than
>>> a false positive - 10 slipped junk mails are not as bad
>>> as having a user complaining that he doesn't get legit mail
>>> and is tired of trying to explain to his customers how they could
>>> make it through the filter
>>
>> If you tag the message as SPAM, either in the header or in the
>> subject line and pass it to the user, the user gets the message.
>
> that's not a professional spamfilter
>
> a good setup tags between careful selected values and rejects
> message above a specific score - we had days with 500000 spam
> delivery tries - have fun deliver them all to the users and
> let them sort out the crap manually
>
>> The user determines the level that the message "slips as junk"
>> not SA - they determine it through the spam score.
>
> depends on the setup
>
> running as milter before-queue you have *one* user
>
> in many countries like here you are required to run before-queue
> because after you accepted a message you *must* deliver it and
> so you have to *reject* messages with a score above XYZ
> during the running SMTP session of the client
>
>> They can change their spam score (or you can change it for them)
>> so that SA is less aggressive and catches less spam.
>
> depends on the setup
>
>> They can change whatever rule they have that moves spam into a
>> junk mail folder to simply leave it in the inbox  (or you can)
>
> and that does not work in general
>
> i have 8 years experience with normal endusers
>
> they won't change anything, or they change things without understanding
> the implications and often make things worse, and *user bayes*
> doesn't work at all - over the years i had 12000 users and not a single
> one made it to 200 ham samples, they only hit the "spam" button night
> and day and so the user-bayes never was used
>
>> The only way a user would complain that they didn't get a message
>> is if they had configured their setup so that any incoming message
>> that SA thinks is spam, gets deleted.
>
> they expect spam to get rejected
> at least business users do so
>
> at the same time they expect litte to zero false positives
> that's simply impossible out of the box
>
>> They could instead configure their setup so that messages SA thinks
>> are definitely spam (high spam score) go into junk, messages that
>> SA thinks might be spam (moderate spam score) are merely flagged in
>> the subject line as "POSSIBLE SPAM" then put into the inbox where
>> they see them.
>
> they are not interested to get definitely anywhere
> they just don't want to see that
>
> frankly i got spam forwared with the words "why so
> many" (already [SPAM] in the subject) and the "many"
> defined 5 spam messages for a complete week and the
> forwarded was one out of 12 mails to the whole domain
> on a saturdy 3:00 PM
>
>> Or they could just have all mail delivered to their inbox and
>> tag it spam in the subject line.
>>
>> You merely have SA put the spam score in the header then use Procmail
>> to munge up the subject line or delete the message or whatever.
>
> as explained above:
>
> * the users don't want to see clear spam at all
> * in many countries *you must* reject before-queue
> * frankly, where i live for drop a accepted messages
>    you can go up to 2 years *in jail*
>

This is really getting silly.  Once you accept DATA on the SMTP 
handshake so you can read the Subject: line you have accepted the 
message whether you queue it or not. Failing to properly TCP close or 
whatever other trick you want to use to claim you haven't accepted it is 
semantics used to fool the fools.  When you issue the OK to the
DATA then you have accepted it.

I congratulate you on the con job you have pulled on your obviously
extremely ignorant law enforcement to get them to believe you can 
actually take the whole message in and read the body of it, make
decisions based on that, then reject it. <eyeroll>

I also sympathize with you for dealing with whatever cocked-up 
legislative body would pass anything saying that dropping an e-mail 
message you have accepted means you go to jail, since they obviously
have zero clue on what email actually is.

To truly reject it you can only do so on DNS stuff (like PTR checks) 
blacklists and other IP address based stuff, or irregularities in
the envelope like invalid recipient or invalid sender.  And none of that
has anything to do with SpamAssassin.

I reject a lot of mail based on that as well.  My point was if you
accept in a message, content scan it with SA, then you delete it, then
quit complaining about FPs.  I'm not talking about deleting it
before you read it in.

>>>> For a newbie to SA it is disheartening to install SA and not
>>>> get 90% with a 2% false positive, out of the box, but rather get
>>>> 50% with a 0% false positive.  And I think that is a mistake the
>>>> maintainers are making is over-reliance on bayes.
>>>
>>> no - as i showed in another thread that day the opposite is true
>>> the bayes could and should have more impact
>>>
>> I did not see that other thread (and I'm not really interested in
>> looking it up) if you're going to disagree at least explain the
>> reasoning in the same thread and don't make people dig it out.
>
> here you go, as you are subscribed you have that messages
>
> http://www.gossamer-threads.com/lists/spamassassin/users/187322
>
>>>> Their design approach has been to rely on Bayes to be trained to go from 50%
>>>> capture out of box with 0% FP to 80-90% capture with 0% FP.
>>>
>>> easy spoken words
>>>
>>> spammer are not dumb and follow SA updates too
>>> how long do you think would such a default survive in the wild?
>>>
>>
>> Uh, spammers don't even like the 50% capture out of the box
>> and constantly work to defeat the rules.  If even 1 of their
>> messages is blocked that is too many.
>
> sorry, but you have no clue about the spammer business
>
> they don't care about the 50% because they have no costs
> in sending out 5 million junk mails but would have costs
> by optimize things
>
> the spam business has the logic below:
>
> * we send out 100000000 junk mails
> * if only 100 makes it to a inbox we nearly won
> * if only one of the hundret hits an idiot buying something we have won
>

They DO care about that 50% because the tests I have run show that
for every default install of SA that is run out of the box with NO
Bayes, and NO rule updates, that over time the amount of spam that
it catches gets lower and lower and lower until it's down around 20%.
That shows that spammers are continually adapting and working on
lowering the effectiveness of even old rulesets.  And that is 
understandable since certainly many people have older servers or
firewalls that incorporate older SA.

Don't forget some (maybe many) of the spammers' customers maintain
honeypot addresses just to verify that yes, the spammer did send all
of the messages out that they paid him to send out.

You seem to think it's the spammers selling the junk.  What I see
is more and more businesses paying the spammers to send out the
spam.

>>>> But, the design approach could easily be relying on Bayes to go
>>>> from 90% capture with 5% FP out of the box, to 90% capture with
>>>> 0% FP with Bayes, and the emphasis being on training Bayes on ham,
>>>> not spam.
>>>
>>> 5% false positives out of the box is just inacceptable
>>>
>>
>> To you, maybe.  Not to Google or Hotmail, and a lot of people use
>> those services.  No, they are not 5% FP but they ARE accepting some
>> FP - and I'm quite sure the actual amount is a trade secret.
>
> if i tag or reject 5% of business customers mail they
> are no longer customer - period
>

Same here.

>> Granted, a lot of their base is free clients so they can tell them
>> to go pound sand if those clients complain about FPs. But many
>> are businesses and I think their reasoning is sound. They are selling
>> into the real world and the real world has a lot more people complaining
>> a lot more about spam, than about FP's.
>
> they accept it at Google - just because they can't call somebody
> that easy

That is a strong point.  Google is indeed sort of self-selecting.

> - if we reject a message a customer has waited for 5
> minutes later my mobile rings - proven by expierience
>
>>> the contentfilter anyways should be only the last defense
>>> and your 90% spam eaten by postscreen and DNSBL scores
>>> combined with postfix-PTR-regex reject dailup networks
>>>
>>> only with the PTR check you get rid of around 80% of
>>> botnet junk without anything else
>>>
>>
>> Those are the easiest things to defeat, and today I see most
>> spam coming in from hosts with valid PTRs and valid domain
>> names.  And the DNSBLs are getting less effective probably
>> because spammers are using large cable networks that hand
>> out IP numbers via DHCP and the spammers use these to rapidly
>> cycle through many IP numbers with their fake servers.
>
> that is simply not true or you are too small to see
> the real spam making it through the internet
>
> i have a domain with subdomains in 90 countries and
> peak days had 500000 delivery attempts with 80% from
> generic IP adressess which are part of *botnets* with
> infected clients
>

We are on the same Internet and the spammers have just as much
access to your mailservers as they do to mine.  You are talking
like you're getting -more- spam than I am on a per user basis and like
you're seeing a completely different pattern.  Well OK maybe you are but I
cannot imagine why.  I can only report on what I'm seeing and
my observations - and those are that more and more spam is coming
from throwaway domain names and less and less from botnets.

> honestly with "Note I am pulling the percentages out
> of my ass" at the begin of your first response someone
> could ignore the rest
>
>>>> Note I am pulling the percentages out of my ass, but I
>>>> think you get the idea.
>>>
>>> i get the idea and a few years ago a thought the same way
>>>
>>> but looking what support times angry customers not get
>>> important mail (including myself) wasted and how less
>>> time it takes for each user to just delete his 10 daily
>>> spam never face the other thounsands already blocked
>>> my attitude in that context changed dramatically
>>
>> I do not agree than 10 daily spams is acceptable. The only
>> valid number of spams a user should ever get is 0.  If you
>> say that 10 out of 10,000 are OK then the spammers just think
>> "wheeee!  That means all I have to do is send that guy
>> 1000 spams and I'll get 1 of them through to him.  And if can get
>> 1 though that lets me steal his credit card data from his
>> PC then it's a great day for me!"  And the spammers can
>> definitely send 1000 spams.
>
> spam != phising
> phishing is a subset of spam
>
> anyways, there is nothhing to discuss, 100% is
> not possible out of the box and if you think it
> is without damage prove it by write that SW
>

The original poster said they were coming from a commercial device
"trying to block 100%".  I said 90% and I made it clear I was talking
comparison numbers not actual numbers.

>> Gmail is also aiming at 0 daily spams not 10, and people rave about
>> how good their spam filter is, and those same
>> people NOT
>> complaining about losing important mail in Gmail's junk
>> folder (even though they do), so my attitude is 180 degrees
>> opposite yours.
>
> because you talk all day long about a junk folder
>
> frankly i do not see anything but 1 up to 5 messages
> per day in that folder because the rest is rejected
> by the milter or postscreen
>
>>> that's also why postscreen with a lot of RBL's combined
>>> with differernt weighted DNSWL's to not allow a single
>>> RBL by mistake do damage like block large providers
>>> like GMX/Web.de (United Internet) not so long ago
>>>
>>> i am a new SA user built up a complete mailfilter system
>>> the last few weeks but with some years expierience from
>>> other systems
>>>
>>> what i see here at least over the weekend is the result below
>>> and says clearly "rely on a contentfilter only as last defense
>>> for several reasons"
>>>
>>> SA is very expensive (connection time, resources), postscreen is
>>> for free and don't eat a single smtpd process most of the time
>>>
>>> [root@localhost:~]$ cat maillog | grep "CONNECT from" | wc -l
>>> 1940
>>>
>>> [root@localhost:~]$ cat maillog | grep "NOQUEUE" | grep postscreen | wc -l
>>> 1584
>>>
>>> [root@localhost:~]$ cat maillog | grep "relay=" | wc -l
>>> 286
>>>
>>> [root@localhost:~]$ cat maillog | grep "SpamAssassin" | wc -l
>>> 58
>>>
>>> [root@localhost:~]$ cat maillog | grep "cannot find your reverse hostname" | wc -l
>>> 12
>>
>> I don't use postscreen or Postfix.  but I do greylist and that does
>> a similar thing, gets rid of spambot mail.  Even though spambot mail
>> is nowadays a small amount of spam anymore.
>
> i does *nothing* similar, you are talking about postscreens
> deep-protocol-tests which are *off* by default - please RTFM
>

As I already said I do not use Postfix.  You are assuming the world
uses Postfix.  Barely a quarter does.

> postscreen asks a bundle of RBL's and you define a score for each
> of them and a total score to reject which means you don't rely on
> a single RBL while at the same time you avoid reject legit mail
> because one RBL made a mistake
>

Yay for it.  I guess you're a Postfix evangelist also.  Probably should
have seen that.

> 95% of any spam-attempt is killed with the config below
> and even makes it not to the smtpd process at all, how
> many of that list will be there at the end and which
> score/trust-level they get will be ruled out the next
> weeks
>
> postscreen_cache_retention_time      = 7d
> postscreen_bare_newline_ttl          = 7d
> postscreen_greet_ttl                 = 7d
> postscreen_non_smtp_command_ttl      = 7d
> postscreen_pipelining_ttl            = 7d
> postscreen_dnsbl_ttl                 = 30m
> postscreen_dnsbl_threshold           = 8
> postscreen_dnsbl_action              = enforce
> postscreen_dnsbl_sites = dul.dnsbl.sorbs.net*8
>   b.barracudacentral.org*7
>   dnsbl.inps.de*7
>   zen.spamhaus.org=127.0.0.[10;11]*6
>   zen.spamhaus.org=127.0.0.[4..7]*5
>   bl.spamcop.net*4
>   ix.dnsbl.manitu.net*4
>   zen.spamhaus.org=127.0.0.3*4
>   bl.mailspike.net*3
>   dnsbl-1.uceprotect.net*3
>   zen.spamhaus.org=127.0.0.2*3
>   bl.spameatingmonkey.net*2
>   dnsrbl.swinog.ch*2
>   psbl.surriel.com*2
>   spam.dnsbl.sorbs.net*2
>   ips.backscatterer.org*1
>   dnswl-low.thelounge.net*-3
>   list.dnswl.org=127.0.[0..255].0*-3
>   list.dnswl.org=127.0.[0..255].1*-4
>   list.dnswl.org=127.0.[0..255].2*-5
>   list.dnswl.org=127.0.[0..255].3*-6
>
>
>> In my world the cost of hardware that has CPU power and memory power that far
>> and away exceeds the disk I/O is a little bit higher than dirt.
>
> one reason more to care about need 1, 2 or hundret
> expensive servers for the same load
>
>> Have you profiled your servers?  Mine spend most of their CPU power loafing
>> along, the disk I/O channel can be almost saturated and the CPU's cores are
>> still idling along.  But of course, these are servers that are only a few
>> years old.
>
> that is not the point, having them consume more power which
> can be avoided at no cost also means you need more cooling
> and in case of a power outage your UPS may not last until power
> comes back and all is fine
>
> and the other point is: how many spamd processes can scan
> at the same time and how many legit mail you do expect
> in what timeframe - so you need to calculate if you
> pass any junk to SA if you have enough ressources for
> the legit mail to not produce timeouts
>
> i expect a legit message after the first contact to postscreen
> to be in the uers inbox in the destionation server within 2 or
> 5 seconds and only as exception up to 10 seconds
>
>> If I was serving 100K mail clients I might feel differently
>
> for sure - don't get me wrong but that is my last response
> in that context, i need to do my work and already wasted
> too much time
>

No problem, I'm not interested in getting into a discussion with you
on what MTA is the best, I already know what your answer is going to be.

I understand where you're coming from although I don't think you have
made much attempt to understand my position - but I thank you for
your posts because they have many good ideas - for Postfix users in 
particular - and hopefully readers will benefit.

Ted

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.

Am 02.09.2014 um 12:15 schrieb Ted Mittelstaedt:
> 
> 
> On 8/31/2014 7:35 AM, Reindl Harald wrote:
>>
>> Am 31.08.2014 um 16:08 schrieb Ted Mittelstaedt:
>>> On 8/31/2014 2:21 AM, Reindl Harald wrote:
>>>>
>>>> Am 31.08.2014 um 02:15 schrieb Ted Mittelstaedt:
>>>>> Yes, it does work great when you have the bayes filter turned on and
>>>>> you take the time to feed it.  And that means you have to feed the
>>>>> learner both ham and spam and setup reliable sources for those.
>>>>>
>>>>> Unfortunately if Bayes is not turned on, it does not catch more than
>>>>> around 60-70% of spam.  As a Spamassassin user & server admin, I would
>>>>> really like to see that improve.
>>>>
>>>> 60-70% without training is great
>>>>
>>>> keep in mind that the first 90% of incoming is eaten by RBL's
>>>> and the 60% are from the remaining 10% at all :-)
>>>>
>>>> i think it's impossible to improve that much "out-of-the-box" because
>>>> that would make it to sensitive while the bayes has the ham side of
>>>> your communication too for decisions
>>>>
>>>
>>> Google does it.  It's not impossible.
>>
>> Google has a lot of more data and power to feed a global bayes
>> and even then: they fail as you say yourself in the next paragraph
>>
>> i don't care for the 5 spam messages
>> i care for the eaten important one
>>
> 
> eaten?  You're the one who is deleting the stuff because it's being tagged
> as spam, that's YOUR decision not SA's.  SA is just saying "we think this 
> is spam" you are deciding to eat the message.

i am saying to my milter "above score XYZ reject the message" because
the [SPAM] in the subject alone is worthless, it just indicates which
messages should go to "spam" or "ham" for bayes training

>>>> i am coming from a commercial device trying to block 100% and there
>>>> it ends in zero-hour-blocklists with domains even if they are only
>>>> linked on the youtube page of the blocked facebook notification
>>>>
>>>> so i am glad that i have to do soem training by myself instead fear
>>>> of false positives which do much more harm
>>>
>>> My experience is that the commercial providers like Gmail are now
>>> so aggressive that false positives are VERY common on their systems,
>>> this leads to people nowadays quite commonly saying "check your
>>> spam folder" on their websites and such that send feedback messages.
>>
>> which defeats the intention of a spamfilter and the whole idea
>> of a junk-folder is broken - i need a contenfilter running
>> relieable before-queue to not see the real crap and some [SPAM]
>> tagged messages which are hand-move to ham/spam for train bayes
>>
>>> Out of the box the default decision point of 5 is too high anyway

it is a safe default

i changed it to 4.5 after configuring everything correctly, and starting
with 8.0 the milter throws messages away - only a handful of spam falls
between those scores - but *in any case* it means you have to configure
things for your needs
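
a minimal sketch of that tag/reject split in Python - the 4.5 and 8.0
values are the ones from this post, the function name and the returned
strings are made up for illustration and are not part of any milter API

```python
# Thresholds taken from the post; names and return values are hypothetical.
TAG_SCORE = 4.5      # SpamAssassin required_score: message gets [SPAM] in the subject
REJECT_SCORE = 8.0   # from this score on, the milter refuses the message


def milter_action(score: float) -> str:
    """Map a SpamAssassin score to what the before-queue milter does."""
    if score >= REJECT_SCORE:
        return "reject"   # refused during the SMTP session, never queued
    if score >= TAG_SCORE:
        return "tag"      # delivered, but marked for manual bayes training
    return "accept"       # delivered unchanged


if __name__ == "__main__":
    for score in (1.2, 4.5, 6.0, 8.9):
        print(f"{score:4.1f} -> {milter_action(score)}")
```

the band between the two values is what ends up as [SPAM] tagged mail
for training, everything above it never reaches a mailbox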

no defaults ever will satisfy everybody
that is just impossible

>>> I think the emphasis on avoiding false positives in the stock
>>> (non-Bayes) distribution is far too high. I suspect that over
>>> the years many good rule submissions have been ignored because
>>> incidence of false positives with them was too high for the
>>> SA maintainers.
>>
>> if you have users to support there is nothing worse than
>> a false positive - 10 slipped junk mails are not as bad
>> as having a user complaining that he doesn't get legit mail
>> and is tired of trying to explain to his customers how they could
>> make it through the filter
> 
> If you tag the message as SPAM, either in the header or in the
> subject line and pass it to the user, the user gets the message.

that's not a professional spamfilter

a good setup tags between carefully selected values and rejects
messages above a specific score - we had days with 500000 spam
delivery attempts - have fun delivering them all to the users and
letting them sort out the crap manually

> The user determines the level that the message "slips as junk"
> not SA - they determine it through the spam score.

depends on the setup

running as milter before-queue you have *one* user

in many countries like here you are required to run before-queue
because after you accepted a message you *must* deliver it and
so you have to *reject* messages with a score above XYZ
during the running SMTP session of the client

> They can change their spam score (or you can change it for them)
> so that SA is less aggressive and catches less spam.

depends on the setup

> They can change whatever rule they have that moves spam into a
> junk mail folder to simply leave it in the inbox  (or you can)

and that does not work in general

i have 8 years experience with normal endusers

they won't change anything, or they change things without understanding
the implications and often make things worse, and *user bayes*
doesn't work at all - over the years i had 12000 users and not a single
one made it to 200 ham samples, they only hit the "spam" button night
and day and so the user-bayes never was used

> The only way a user would complain that they didn't get a message
> is if they had configured their setup so that any incoming message
> that SA thinks is spam, gets deleted.

they expect spam to get rejected
at least business users do so

at the same time they expect little to zero false positives
that's simply impossible out of the box

> They could instead configure their setup so that messages SA thinks
> are definitely spam (high spam score) go into junk, messages that
> SA thinks might be spam (moderate spam score) are merely flagged in
> the subject line as "POSSIBLE SPAM" then put into the inbox where
> they see them.

they are not interested in getting definite spam anywhere
they just don't want to see it

frankly i got spam forwarded with the words "why so
many" (already [SPAM] in the subject) and the "many"
meant 5 spam messages for a complete week and the
forwarded one was one out of 12 mails to the whole domain
on a saturday 3:00 PM

> Or they could just have all mail delivered to their inbox and
> tag it spam in the subject line.
> 
> You merely have SA put the spam score in the header then use Procmail
> to munge up the subject line or delete the message or whatever.

as explained above:

* the users don't want to see clear spam at all
* in many countries *you must* reject before-queue
* frankly, where i live for dropping an accepted message
  you can get up to 2 years *in jail*

>>> For a newbie to SA it is disheartening to install SA and not
>>> get 90% with a 2% false positive, out of the box, but rather get
>>> 50% with a 0% false positive.  And I think that is a mistake the
>>> maintainers are making is over-reliance on bayes.
>>
>> no - as i showed in another thread that day the opposite is true
>> the bayes could and should have more impact
>>
> I did not see that other thread (and I'm not really interested in
> looking it up) if you're going to disagree at least explain the
> reasoning in the same thread and don't make people dig it out.

here you go, as you are subscribed you have those messages

http://www.gossamer-threads.com/lists/spamassassin/users/187322

>>> Their design approach has been to rely on Bayes to be trained to go from 50%
>>> capture out of box with 0% FP to 80-90% capture with 0% FP.
>>
>> easy spoken words
>>
>> spammer are not dumb and follow SA updates too
>> how long do you think would such a default survive in the wild?
>>
> 
> Uh, spammers don't even like the 50% capture out of the box
> and constantly work to defeat the rules.  If even 1 of their 
> messages is blocked that is too many.

sorry, but you have no clue about the spammer business

they don't care about the 50% because they have no costs
in sending out 5 million junk mails but would have costs
by optimizing things

the spam business has the logic below:

* we send out 100000000 junk mails
* if only 100 make it to an inbox we nearly won
* if only one of the hundred hits an idiot buying something we have won

>>> But, the design approach could easily be relying on Bayes to go
>>> from 90% capture with 5% FP out of the box, to 90% capture with
>>> 0% FP with Bayes, and the emphasis being on training Bayes on ham,
>>> not spam.
>>
>> 5% false positives out of the box is just inacceptable
>>
> 
> To you, maybe.  Not to Google or Hotmail, and a lot of people use
> those services.  No, they are not 5% FP but they ARE accepting some
> FP - and I'm quite sure the actual amount is a trade secret.

if i tag or reject 5% of business customers' mail they
are no longer customers - period

> Granted, a lot of their base is free clients so they can tell them
> to go pound sand if those clients complain about FPs. But many 
> are businesses and I think their reasoning is sound. They are selling 
> into the real world and the real world has a lot more people complaining
> a lot more about spam, than about FP's.

they accept it at Google - just because they can't call somebody
that easily - if we reject a message a customer has waited for, 5
minutes later my mobile rings - proven by experience

>> the contentfilter anyways should be only the last defense
>> and your 90% spam eaten by postscreen and DNSBL scores
>> combined with postfix-PTR-regex reject dailup networks
>>
>> only with the PTR check you get rid of around 80% of
>> botnet junk without anything else
>>
> 
> Those are the easiest things to defeat, and today I see most
> spam coming in from hosts with valid PTRs and valid domain
> names.  And the DNSBLs are getting less effective probably
> because spammers are using large cable networks that hand
> out IP numbers via DHCP and the spammers use these to rapidly
> cycle through many IP numbers with their fake servers.

that is simply not true or you are too small to see
the real spam making it through the internet

i have a domain with subdomains in 90 countries and
peak days had 500000 delivery attempts with 80% from
generic IP addresses which are part of *botnets* with
infected clients

honestly with "Note I am pulling the percentages out
of my ass" at the beginning of your first response someone
could ignore the rest

>>> Note I am pulling the percentages out of my ass, but I
>>> think you get the idea.
>>
>> i get the idea and a few years ago a thought the same way
>>
>> but looking what support times angry customers not get
>> important mail (including myself) wasted and how less
>> time it takes for each user to just delete his 10 daily
>> spam never face the other thounsands already blocked
>> my attitude in that context changed dramatically
> 
> I do not agree than 10 daily spams is acceptable. The only
> valid number of spams a user should ever get is 0.  If you
> say that 10 out of 10,000 are OK then the spammers just think
> "wheeee!  That means all I have to do is send that guy
> 1000 spams and I'll get 1 of them through to him.  And if can get
> 1 though that lets me steal his credit card data from his
> PC then it's a great day for me!"  And the spammers can
> definitely send 1000 spams.

spam != phishing
phishing is a subset of spam

anyways, there is nothing to discuss, 100% is
not possible out of the box and if you think it
is possible without damage prove it by writing that software

> Gmail is also aiming at 0 daily spams not 10, and people rave about
> how good their spam filter is, and those same people are NOT
> complaining about losing important mail in Gmail's junk
> folder (even though they do), so my attitude is 180 degrees
> opposite yours.

because you talk all day long about a junk folder

frankly i do not see anything but 1 up to 5 messages
per day in that folder because the rest is rejected
by the milter or postscreen

>> that's also why postscreen with a lot of RBL's combined
>> with differernt weighted DNSWL's to not allow a single
>> RBL by mistake do damage like block large providers
>> like GMX/Web.de (United Internet) not so long ago
>>
>> i am a new SA user built up a complete mailfilter system
>> the last few weeks but with some years expierience from
>> other systems
>>
>> what i see here at least over the weekend is the result below
>> and says clearly "rely on a contentfilter only as last defense
>> for several reasons"
>>
>> SA is very expensive (connection time, resources), postscreen is
>> for free and don't eat a single smtpd process most of the time
>>
>> [root@localhost:~]$ cat maillog | grep "CONNECT from" | wc -l
>> 1940
>>
>> [root@localhost:~]$ cat maillog | grep "NOQUEUE" | grep postscreen | wc -l
>> 1584
>>
>> [root@localhost:~]$ cat maillog | grep "relay=" | wc -l
>> 286
>>
>> [root@localhost:~]$ cat maillog | grep "SpamAssassin" | wc -l
>> 58
>>
>> [root@localhost:~]$ cat maillog | grep "cannot find your reverse hostname" | wc -l
>> 12
> 
> I don't use postscreen or Postfix.  but I do greylist and that does
> a similar thing, gets rid of spambot mail.  Even though spambot mail 
> is nowadays a small amount of spam anymore.

it does *nothing* similar, you are talking about postscreen's
deep-protocol-tests which are *off* by default - please RTFM

postscreen asks a bundle of RBL's and you define a score for each
of them and a total score to reject which means you don't rely on
a single RBL while at the same time you avoid rejecting legit mail
because one RBL made a mistake

95% of all spam attempts are killed with the config below
and never even make it to the smtpd process, how
many of that list will be there at the end and which
score/trust-level they get will be worked out over the next
weeks

postscreen_cache_retention_time      = 7d
postscreen_bare_newline_ttl          = 7d
postscreen_greet_ttl                 = 7d
postscreen_non_smtp_command_ttl      = 7d
postscreen_pipelining_ttl            = 7d
postscreen_dnsbl_ttl                 = 30m
postscreen_dnsbl_threshold           = 8
postscreen_dnsbl_action              = enforce
postscreen_dnsbl_sites = dul.dnsbl.sorbs.net*8
 b.barracudacentral.org*7
 dnsbl.inps.de*7
 zen.spamhaus.org=127.0.0.[10;11]*6
 zen.spamhaus.org=127.0.0.[4..7]*5
 bl.spamcop.net*4
 ix.dnsbl.manitu.net*4
 zen.spamhaus.org=127.0.0.3*4
 bl.mailspike.net*3
 dnsbl-1.uceprotect.net*3
 zen.spamhaus.org=127.0.0.2*3
 bl.spameatingmonkey.net*2
 dnsrbl.swinog.ch*2
 psbl.surriel.com*2
 spam.dnsbl.sorbs.net*2
 ips.backscatterer.org*1
 dnswl-low.thelounge.net*-3
 list.dnswl.org=127.0.[0..255].0*-3
 list.dnswl.org=127.0.[0..255].1*-4
 list.dnswl.org=127.0.[0..255].2*-5
 list.dnswl.org=127.0.[0..255].3*-6
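
the scoring idea can be sketched in a few lines of Python - this is only
a simplified model, assuming a client is blocked once the summed weights
of all lists it appears on reach postscreen_dnsbl_threshold; the weights
are copied from a few entries of the list above, the listings are
simulated (no DNS lookups) and the function names are hypothetical

```python
# Weights copied from some postscreen_dnsbl_sites entries above.
# Negative weights are whitelists (DNSWL) that offset blacklist hits.
WEIGHTS = {
    "dul.dnsbl.sorbs.net": 8,
    "b.barracudacentral.org": 7,
    "bl.spamcop.net": 4,
    "bl.mailspike.net": 3,
    "psbl.surriel.com": 2,
    "ips.backscatterer.org": 1,
    "dnswl-low.thelounge.net": -3,
    "list.dnswl.org=127.0.[0..255].3": -6,
}
THRESHOLD = 8  # postscreen_dnsbl_threshold


def dnsbl_score(listed_on):
    """Sum the weights of every list the client is (hypothetically) on."""
    return sum(WEIGHTS[site] for site in listed_on)


def blocked(listed_on):
    """Blocked when the combined score reaches the threshold."""
    return dnsbl_score(listed_on) >= THRESHOLD


if __name__ == "__main__":
    cases = {
        "one medium RBL only": ["bl.spamcop.net"],
        "three smaller RBLs": ["bl.spamcop.net", "bl.mailspike.net",
                               "psbl.surriel.com"],
        "RBL mistake on a whitelisted host": ["b.barracudacentral.org",
                                              "list.dnswl.org=127.0.[0..255].3"],
    }
    for label, sites in cases.items():
        print(f"{label}: score {dnsbl_score(sites)}, blocked={blocked(sites)}")
```

the third case is the point of the whole setup: a single listing of a
well-known sender does not reach the threshold because the DNSWL entry
pulls the score back down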


> In my world the cost of hardware that has CPU power and memory power that far 
> and away exceeds the disk I/O is a little bit higher than dirt.

one more reason to care whether you need 1, 2 or a hundred
expensive servers for the same load

> Have you profiled your servers?  Mine spend most of their CPU power loafing 
> along, the disk I/O channel can be almost saturated and the CPU's cores are 
> still idling along.  But of course, these are servers that are only a few 
> years old.

that is not the point, having them consume more power which
can be avoided at no cost also means you need more cooling
and in case of a power outage your UPS may not last until power
comes back and all is fine

and the other point is: how many spamd processes can scan
at the same time and how much legit mail do you expect
in what timeframe - so if you pass any junk to SA you need
to calculate whether you have enough resources left for
the legit mail to not produce timeouts

i expect a legit message after the first contact to postscreen
to be in the user's inbox on the destination server within 2 to
5 seconds and only as an exception up to 10 seconds
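
a rough way to do that calculation, sketched in Python - all inputs are
assumptions to be replaced with measured values: 0.6 seconds is just the
scan time from one spamd log line earlier in this thread, the number of
spamd children and the 50% reserve are arbitrary picks for illustration

```python
def max_scans_per_minute(spamd_children: int, avg_scan_seconds: float) -> float:
    """Upper bound on messages spamd can scan per minute."""
    return spamd_children * 60.0 / avg_scan_seconds


def has_headroom(expected_per_minute: float, spamd_children: int,
                 avg_scan_seconds: float, reserve: float = 0.5) -> bool:
    """True if the expected load stays below a share of the upper bound.

    reserve=0.5 keeps half of the capacity free for bursts (arbitrary choice).
    """
    capacity = max_scans_per_minute(spamd_children, avg_scan_seconds)
    return expected_per_minute <= capacity * reserve


if __name__ == "__main__":
    children, scan = 5, 0.6   # hypothetical child count, sample scan time
    print("upper bound per minute:", round(max_scans_per_minute(children, scan)))
    for load in (100, 400):   # messages per minute reaching SA, junk included
        print(load, "per minute fits:", has_headroom(load, children, scan))
```

every junk mail already rejected by postscreen is one scan less in that
budget, which is why the cheap checks come first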

> If I was serving 100K mail clients I might feel differently

for sure - don't get me wrong but that is my last response
in that context, i need to do my work and already wasted
too much time


Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 8/31/2014 7:35 AM, Reindl Harald wrote:
>
> Am 31.08.2014 um 16:08 schrieb Ted Mittelstaedt:
>> On 8/31/2014 2:21 AM, Reindl Harald wrote:
>>>
>>> Am 31.08.2014 um 02:15 schrieb Ted Mittelstaedt:
>>>> Yes, it does work great when you have the bayes filter turned on and you
>>>> take the time to feed it.  And that means you have to feed the
>>>> learner both ham and spam and setup reliable sources for those.
>>>>
>>>> Unfortunately if Bayes is not turned on, it does not catch more than
>>>> around 60-70% of spam.  As a Spamassassin user & server admin, I would
>>>> really like to see that improve.
>>>
>>> 60-70% without training is great
>>>
>>> keep in mind that the first 90% of incoming is eaten by RBL's
>>> and the 60% are from the remaining 10% at all :-)
>>>
>>> i think it's impossible to improve that much "out-of-the-box" because
>>> that would make it to sensitive while the bayes has the ham side of
>>> your communication too for decisions
>>>
>>
>> Google does it.  It's not impossible.
>
> Google has a lot of more data and power to feed a global bayes
> and even then: they fail as you say yourself in the next paragraph
>
> i don't care for the 5 spam messages
> i care for the eaten important one
>

eaten?  You're the one who is deleting the stuff because it's being tagged
as spam; that's YOUR decision, not SA's.  SA is just saying "we think
this is spam"; you are deciding to eat the message.

>>> i am coming from a commercial device trying to block 100% and there
>>> it ends in zero-hour-blocklists with domains even if they are only
>>> linked on the youtube page of the blocked facebook notification
>>>
>>> so i am glad that i have to do soem training by myself instead fear
>>> of false positives which do much more harm
>>
>> My experience is that the commercial providers like Gmail are now
>> so aggressive that false positives are VERY common on their systems,
>> this leads to people nowadays quite commonly saying "check your
>> spam folder" on their websites and such that send feedback messages.
>
> which defeats the intention of a spamfilter and the whole idea
> of a junk-folder is broken - i need a contenfilter running
> relieable before-queue to not see the real crap and some [SPAM]
> tagged messages which are hand-move to ham/spam for train bayes
>
>> Out of the box the default decision point of 5 is too high anyway.
>>
>> I think the emphasis on avoiding false positives in the stock
>> (non-Bayes) distribution is far too high. I suspect that over
>> the years many good rule submissions have been ignored because
>> incidence of false positives with them was too high for the
>> SA maintainers.
>
> if you have users to support there is nothing more bad than
> a false positive - 10 slipped junk mails are not that worse
> as having a user complaining that ge don't get legit mail
> and is tired of try to explain his customers how the could
> make it through the filter
>


If you tag the message as SPAM, either in the header or in the
subject line and pass it to the user, the user gets the message.

The user determines the level that the message "slips as junk"
not SA - they determine it through the spam score.

They can change their spam score (or you can change it for them)
so that SA is less aggressive and catches less spam.

They can change whatever rule they have that moves spam into a
junk mail folder to simply leave it in the inbox  (or you can)

The only way a user would complain that they didn't get a message
is if they had configured their setup so that any incoming message
that SA thinks is spam, gets deleted.

They could instead configure their setup so that messages SA thinks
are definitely spam (high spam score) go into junk, messages that
SA thinks might be spam (moderate spam score) are merely flagged in
the subject line as "POSSIBLE SPAM" then put into the inbox where
they see them.

Or they could just have all mail delivered to their inbox and
tag it spam in the subject line.

You merely have SA put the spam score in the header then use Procmail
to munge up the subject line or delete the message or whatever.
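
As a rough sketch of that kind of tiered handling (written here in Python rather than as a Procmail recipe), something like the following would work. It assumes SA adds its usual "X-Spam-Status: Yes, score=... required=..." header; the `JUNK_SCORE` cut-off and the `classify` function are hypothetical names made up for this example.

```python
import email
import re

# Hypothetical cut-off for "definitely spam"; tune per user.
JUNK_SCORE = 10.0

# Pulls score and required threshold out of the X-Spam-Status header.
SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)")


def classify(raw_message):
    """Return 'inbox', 'tag' or 'junk' based on the X-Spam-Status header."""
    msg = email.message_from_string(raw_message)
    match = SCORE_RE.search(msg.get("X-Spam-Status", ""))
    if match is None:
        return "inbox"      # not scanned: deliver untouched
    score, required = float(match.group(1)), float(match.group(2))
    if score >= JUNK_SCORE:
        return "junk"       # high score: file into the junk folder
    if score >= required:
        return "tag"        # moderate score: inbox, with a subject marker
    return "inbox"


sample = (
    "X-Spam-Status: Yes, score=8.9 required=4.5 tests=BAYES_80,HTML_MESSAGE\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
print(classify(sample))
```

Note that no branch deletes anything; whether a message is ever thrown away stays a decision of whoever writes the delivery rule.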

>> For a newbie to SA it is disheartening to install SA and not
>> get 90% with a 2% false positive, out of the box, but rather get
>> 50% with a 0% false positive.  And I think that is a mistake the
>> maintainers are making is over-reliance on bayes.
>
> no - as i showed in another thread that day the opposite is true
> the bayes could and should have more impact
>

I did not see that other thread (and I'm not really interested in
looking it up); if you're going to disagree, at least explain the
reasoning in the same thread and don't make people dig it out.

> but that can't be default values because no software can know
> how good the bayes data (ham and spam) are really and if it
> is trained by a noob fire any newsletter into "spam" it makes
> damage - mine is trustable because i know what i am doing in
> that context
>
> the most important thing in train a bayes is to know what
> messages you should strongly avoid to feed in
>

of course.

>> At the least the SA maintainers should maintain a separate
>> "highly aggressive" rule distro that was optional that would
>> give us a much higher success rate with a corresponding
>> slight increase in false positives.
>
> here i agree - maybe with a meta-rule or such which have
> it's own score in "local.cf" - but i still think you
> need to know what you are doing because such meta value
> also makes compromises and in my case i trust my base
> nearly unconditional but would not have other default
> rules with the same power
>
>> Their design approach has been to rely on Bayes to be trained to go from 50%
>> capture out of box with 0% FP to 80-90% capture with 0% FP.
>
> easy spoken words
>
> spammer are not dumb and follow SA updates too
> how long do you think would such a default survive in the wild?
>

Uh, spammers don't even like the 50% capture out of the box
and constantly work to defeat the rules.  If even 1 of their messages
is blocked that is too many.

>> But, the design approach could easily be relying on Bayes to go
>> from 90% capture with 5% FP out of the box, to 90% capture with
>> 0% FP with Bayes, and the emphasis being on training Bayes on ham,
>> not spam.
>
> 5% false positives out of the box is just inacceptable
>

To you, maybe.  Not to Google or Hotmail, and a lot of people use
those services.  No, they are not 5% FP but they ARE accepting some
FP - and I'm quite sure the actual amount is a trade secret.

Granted, a lot of their base is free clients so they can tell them
to go pound sand if those clients complain about FPs.  But many are 
businesses and I think their reasoning is sound.  They are selling into
the real world and the real world has a lot more people complaining
a lot more about spam, than about FP's.

> the contentfilter anyways should be only the last defense
> and your 90% spam eaten by postscreen and DNSBL scores
> combined with postfix-PTR-regex reject dailup networks
>
> only with the PTR check you get rid of around 80% of
> botnet junk without anything else
>

Those are the easiest things to defeat, and today I see most
spam coming in from hosts with valid PTRs and valid domain
names.  And the DNSBLs are getting less effective probably
because spammers are using large cable networks that hand
out IP numbers via DHCP and the spammers use these to rapidly
cycle through many IP numbers with their fake servers.

>> Note I am pulling the percentages out of my ass, but I
>> think you get the idea.
>
> i get the idea and a few years ago a thought the same way
>
> but looking what support times angry customers not get
> important mail (including myself) wasted and how less
> time it takes for each user to just delete his 10 daily
> spam never face the other thounsands already blocked
> my attitude in that context changed dramatically
>

I do not agree that 10 daily spams is acceptable.  The only
valid number of spams a user should ever get is 0.  If you
say that 10 out of 10,000 are OK then the spammers just think
"wheeee!  That means all I have to do is send that guy
1000 spams and I'll get 1 of them through to him.  And if I can get
1 through that lets me steal his credit card data from his
PC then it's a great day for me!"  And the spammers can
definitely send 1000 spams.
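
The arithmetic behind that can be sketched as below. It assumes, purely for illustration, that each message gets through independently with the same probability; real filters do not behave that simply, and the function names are made up.

```python
# 10 out of 10,000 getting through is a pass rate of 0.1 %.
PASS_RATE = 10 / 10_000


def expected_delivered(sent, pass_rate):
    """Average number of spams that reach the inbox."""
    return sent * pass_rate


def prob_at_least_one(sent, pass_rate):
    """Chance that at least one of the spams gets through,
    assuming every message passes or fails independently."""
    return 1.0 - (1.0 - pass_rate) ** sent


print(expected_delivered(1000, PASS_RATE))
print(round(prob_at_least_one(1000, PASS_RATE), 3))
```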

Gmail is also aiming at 0 daily spams, not 10, and people rave about how
good their spam filter is, and those same people are NOT
complaining about losing important mail in Gmail's junk
folder (even though they do), so my attitude is 180 degrees
opposite yours.

> that's also why postscreen with a lot of RBL's combined
> with differernt weighted DNSWL's to not allow a single
> RBL by mistake do damage like block large providers
> like GMX/Web.de (United Internet) not so long ago
>
> i am a new SA user built up a complete mailfilter system
> the last few weeks but with some years expierience from
> other systems
>
> what i see here at least over the weekend is the result below
> and says clearly "rely on a contentfilter only as last defense
> for several reasons"
>
> SA is very expensive (connection time, resources), postscreen is
> for free and don't eat a single smtpd process most of the time
>
> [root@localhost:~]$ cat maillog | grep "CONNECT from" | wc -l
> 1940
>
> [root@localhost:~]$ cat maillog | grep "NOQUEUE" | grep postscreen | wc -l
> 1584
>
> [root@localhost:~]$ cat maillog | grep "relay=" | wc -l
> 286
>
> [root@localhost:~]$ cat maillog | grep "SpamAssassin" | wc -l
> 58
>
> [root@localhost:~]$ cat maillog | grep "cannot find your reverse hostname" | wc -l
> 12
>

I don't use postscreen or Postfix, but I do greylist, and that does
a similar thing: it gets rid of spambot mail.  Even though spambot mail
is only a small share of spam nowadays.

In my world the cost of hardware that has CPU power and memory power 
that far and away exceeds the disk I/O is a little bit higher than dirt.

Have you profiled your servers?  Mine spend most of their CPU power 
loafing along, the disk I/O channel can be almost saturated and the
CPU's cores are still idling along.  But of course, these are servers
that are only a few years old.

If I was serving 100K mail clients I might feel differently.

Ted

>>>> On 8/30/2014 2:41 PM, Reindl Harald wrote:
>>>>> after two days running SA for the first two test-domains with a
>>>>> well trained bayes for the global milter-user: impressive!
>>>>>
>>>>> the few crap making it through poscreen RBL scroing is detected
>>>>>
>>>>> 0.000          0          3          0  non-token data: bayes db version
>>>>> 0.000          0       1389          0  non-token data: nspam
>>>>> 0.000          0       1350          0  non-token data: nham
>>>>> 0.000          0     257152          0  non-token data: ntokens
>>>>>
>>>>> Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5) for sa-milt:189 in 0.6 seconds, 2454
>>>>> bytes.
>>>>> Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 -
>>>>> BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS
>>>>>
>>>>>
>>>>> scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=<SN...@phx.gbl>,bayes=0.842503,autolearn=disabled
>>>>>
>>>>>
>>>>> Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w: milter-reject: END-OF-MESSAGE from
>>>>> snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin; from=<je...@hotmail.com>
>>>>> to=<***>
>

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.
Am 31.08.2014 um 16:08 schrieb Ted Mittelstaedt:
> On 8/31/2014 2:21 AM, Reindl Harald wrote:
>>
>> Am 31.08.2014 um 02:15 schrieb Ted Mittelstaedt:
>>> Yes, it does work great when you have the bayes filter turned on and you
>>> take the time to feed it.  And that means you have to feed the
>>> learner both ham and spam and setup reliable sources for those.
>>>
>>> Unfortunately if Bayes is not turned on, it does not catch more than
>>> around 60-70% of spam.  As a Spamassassin user & server admin, I would
>>> really like to see that improve.
>>
>> 60-70% without training is great
>>
>> keep in mind that the first 90% of incoming is eaten by RBL's
>> and the 60% are from the remaining 10% at all :-)
>>
>> i think it's impossible to improve that much "out-of-the-box" because
>> that would make it to sensitive while the bayes has the ham side of
>> your communication too for decisions
>>
> 
> Google does it.  It's not impossible.

Google has a lot more data and power to feed a global bayes,
and even then they fail, as you say yourself in the next paragraph

i don't care about the 5 spam messages
i care about the one important message that got eaten

>> i am coming from a commercial device trying to block 100% and there
>> it ends in zero-hour-blocklists with domains even if they are only
>> linked on the youtube page of the blocked facebook notification
>>
>> so i am glad that i have to do soem training by myself instead fear
>> of false positives which do much more harm
> 
> My experience is that the commercial providers like Gmail are now
> so aggressive that false positives are VERY common on their systems,
> this leads to people nowadays quite commonly saying "check your
> spam folder" on their websites and such that send feedback messages.

which defeats the purpose of a spam filter, and the whole idea
of a junk folder is broken - i need a content filter running
reliably before-queue so that i don't see the real crap, plus some
[SPAM] tagged messages which are moved by hand to ham/spam to train bayes

> Out of the box the default decision point of 5 is too high anyway.
> 
> I think the emphasis on avoiding false positives in the stock
> (non-Bayes) distribution is far too high. I suspect that over
> the years many good rule submissions have been ignored because
> incidence of false positives with them was too high for the
> SA maintainers.

if you have users to support there is nothing worse than
a false positive - 10 junk mails slipping through are not as bad
as a user complaining that he doesn't get legit mail
and is tired of trying to explain to his customers how they could
make it through the filter

> For a newbie to SA it is disheartening to install SA and not
> get 90% with a 2% false positive, out of the box, but rather get
> 50% with a 0% false positive.  And I think that is a mistake the
> maintainers are making is over-reliance on bayes.

no - as i showed in another thread that day the opposite is true:
bayes could and should have more impact

but that can't be the default because no software can know
how good the bayes data (ham and spam) really is, and if it
is trained by a noob who fires any newsletter into "spam" it does
damage - mine is trustworthy because i know what i am doing in
that context

the most important thing in training bayes is to know which
messages you should strictly avoid feeding in

> At the least the SA maintainers should maintain a separate
> "highly aggressive" rule distro that was optional that would
> give us a much higher success rate with a corresponding
> slight increase in false positives.

here i agree - maybe with a meta-rule or such which has
its own score in "local.cf" - but i still think you
need to know what you are doing because such a meta value
also makes compromises, and in my case i trust my bayes
nearly unconditionally but would not give other default
rules the same power

> Their design approach has been to rely on Bayes to be trained to go from 50% 
> capture out of box with 0% FP to 80-90% capture with 0% FP.

easily said

spammers are not dumb and follow SA updates too
how long do you think such a default would survive in the wild?

> But, the design approach could easily be relying on Bayes to go
> from 90% capture with 5% FP out of the box, to 90% capture with
> 0% FP with Bayes, and the emphasis being on training Bayes on ham,
> not spam.

5% false positives out of the box is just unacceptable

the content filter should be only the last defense anyway,
with 90% of your spam eaten by postscreen and DNSBL scores
combined with postfix PTR regexes rejecting dialup networks

with the PTR check alone you get rid of around 80% of
botnet junk without anything else

> Note I am pulling the percentages out of my ass, but I 
> think you get the idea.

i get the idea and a few years ago i thought the same way

but looking at how much support time is wasted on angry customers
who did not get important mail (including myself), and how little
time it takes each user to just delete his 10 daily
spam mails without ever facing the other thousands already blocked,
my attitude in that context changed dramatically

that's also why i use postscreen with a lot of RBLs combined
with differently weighted DNSWLs: so that a single
RBL can not do damage by mistake, like blocking large providers
such as GMX/Web.de (United Internet) not so long ago

i am a new SA user who built up a complete mail filter system
in the last few weeks, but with some years of experience from
other systems

what i see here, at least over the weekend, is the result below,
and it says clearly "rely on a content filter only as last defense
for several reasons"

SA is very expensive (connection time, resources); postscreen is
free and doesn't eat a single smtpd process most of the time

[root@localhost:~]$ cat maillog | grep "CONNECT from" | wc -l
1940

[root@localhost:~]$ cat maillog | grep "NOQUEUE" | grep postscreen | wc -l
1584

[root@localhost:~]$ cat maillog | grep "relay=" | wc -l
286

[root@localhost:~]$ cat maillog | grep "SpamAssassin" | wc -l
58

[root@localhost:~]$ cat maillog | grep "cannot find your reverse hostname" | wc -l
12
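
Put into proportion, those counts look like the sketch below. It only does arithmetic on the five numbers above (the four outcome counts add up exactly to the 1940 connections); the dictionary keys and the `share` helper are names made up for this example.

```python
# Counts copied from the grep output above.
counts = {
    "connects": 1940,
    "postscreen_rejects": 1584,
    "relayed": 286,
    "spamassassin_rejects": 58,
    "no_reverse_hostname": 12,
}


def share(part, total):
    """Percentage of all connections that ended with this outcome."""
    return 100.0 * part / total


outcomes = ("postscreen_rejects", "relayed",
            "spamassassin_rejects", "no_reverse_hostname")
for name in outcomes:
    print(f"{name}: {share(counts[name], counts['connects']):.1f}%")
```

In other words, roughly four out of five connections never reach an smtpd process, let alone the content filter.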

>>> On 8/30/2014 2:41 PM, Reindl Harald wrote:
>>>> after two days running SA for the first two test-domains with a
>>>> well trained bayes for the global milter-user: impressive!
>>>>
>>>> the few crap making it through poscreen RBL scroing is detected
>>>>
>>>> 0.000          0          3          0  non-token data: bayes db version
>>>> 0.000          0       1389          0  non-token data: nspam
>>>> 0.000          0       1350          0  non-token data: nham
>>>> 0.000          0     257152          0  non-token data: ntokens
>>>>
>>>> Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5) for sa-milt:189 in 0.6 seconds, 2454
>>>> bytes.
>>>> Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 -
>>>> BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS
>>>>
>>>>
>>>> scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=<SN...@phx.gbl>,bayes=0.842503,autolearn=disabled
>>>>
>>>>
>>>> Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w: milter-reject: END-OF-MESSAGE from
>>>> snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin; from=<je...@hotmail.com>  
>>>> to=<***>


Re: SA works great!

Posted by Bob Proulx <bo...@proulx.com>.
Reindl Harald wrote:
> schrieb Bob Proulx:
> > Being able to undeliver spam after it has been detected later and if
> > it is as yet unread is none of those bad things.  This is a positive
> > anti-spam feature in the core feature set of an email provider.
> 
> honestly i would not want to get a message removed which was already
> in my inbox because someone *later* decides it is spam and hence
> the sender got no NDR in case that action was a false postivie

You haven't seen the message yet.  You never knew it was ever in your
inbox.  It doesn't move messages you have already seen and read.  It
only does this if you haven't seen it.  Since you haven't read it yet
you never knew that it wasn't delivered to the Junk folder on the
first pass.

This is identical to the behavior you will get if SpamAssassin were to
have identified it as spam on the first pass.  It is impossible for
you to tell the difference.  A difference which is impossible to tell
is no difference.

There is never an NDR (non-delivery receipt) sent for email classified
as spam.  Or at least shouldn't ever be one sent.  That would create
backscatter spam itself.  So again that is no difference.  The only
time an NDR should be sent is if the email message is identified as
spam and rejected at SMTP time.  Many sites have no ability to do this
and instead accept the email and filter it later.  If it is identified
as spam later it cannot/should-not generate a bounce later or it will
be backscatter.

> in case of a false positive the sender can pretend with
> his logs that my server accepted the message and he is
> right in pretend that - so any message which was accepted
> with "250 OK" has to made it in my inbox and there is no
> but and if in that context
> 
> that it what makes mail *relieable*

Sorry but that did not parse.  I think you are saying that once
accepted at SMTP time email must be delivered.  Which is obviously not
true once anti-spam is applied.  Spammers would love it though.

> * no sending attempt not confirmed by "250 OK" is counted as
>   successful and retried up to 5 days
> * any message which is rejected produces a NDR from the sending
>   server to his user (or ignored by a spammer)
> * any message which can not be successful delivered
>   produces a bounce from the sending server to his
>   user (or ignored by a spammer)
> * any message not producing a NDR within 5 days can
>   be counted as delivered

None of the above is related to Google's ability to reclassify messages
later.

> anything which leads in
> 
> * accept and drop silent
> * accept and reject internally leading in a backscatter
> 
> makes email at a whole unrelieable and so does the same harm as spammers
> you can't justify with the fight against spam any bad design

This conversation is taking a turn off into the weeds.  I can see that
this is some crusade topic of yours.  Sorry but I am not interested in
investing the time to delve into it.

> if you do you end where the USA ended with justify anything with the
> fight against terrorism - that's really the same: the other side won
> because one did enough damage to itself to make them win

I see that we have left the road far behind, driven over the weeds
along the edge, and have gone seriously off-roading and off topic.
This has nothing to do with the email topic we started discussing.

> > Therefore the simple argument of "more code bad" does not apply.
> > Otherwise everyone who starts a program by copying "Hello world." and
> > expanding it would be stopped immediately by the inability to add code
> > in order to have it provide more functionality
> 
> you need always to draw a line - but with care
> feature creep in many cases proved later "OK, now we can throw
> away all that code and start from scratch because it became
> unmiantainable over the time" and i am somehow tired of all
> that rewrites starting each time with the early bugs again

I haven't looked at Google's code base for undelivering mail later
classified as spam.  If you haven't either then the assumption that it
*must* be bad from the start is unfounded.  And even if their version
1.0 were a mess then surely all would agree that Google easily has the
resources to rewrite it several times.

Bob

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.

Am 03.09.2014 um 00:39 schrieb Bob Proulx:
> Reindl Harald wrote:
>> schrieb Bob Proulx:
>>> Ted Mittelstaedt wrote:
>>>> Bob Proulx wrote:
>>>>> Plus Google can "undeliver" a message from your Inbox if you have not
>>>>> read it yet.  Say a spammer slowly sends sneaky spam to 10,000 people.
>>>>> After the first dozen report the message as spam then the next 9988
>>>>> have the message undelivered from their Inbox over to the Junk folder.
>>>>
>>>> While I can assume they would have this capability have you ever seen
>>>> them actually do it?
>>>
>>> Like Herman Cain "I don't have facts to back this up." but believe it
>>> to be true based upon other people's reports on the net.
>>>
>>> The capability seems plausible.  It would be easy and reasonable for
>>> Google to implement.  For any large email provider such as Google,
>>> Yahoo, others *not* to implement that feature seems implausible.  If
>>> you could then why wouldn't you do it?
>>
>> because if i am smart i do not implement any feature which i do
>> not use as i do not install any package i do not use
>>
>> why?
>>
>> because every feature and code lying around may and will
>> sooner or later introduce side effects at updates or
>> unexpected situations and makes it harder to maintain
>> the codebase
> 
> If it were a bad feature I would agree.  If it were a feature that
> frivolously did unrelated things then I would agree.  But it doesn't.
> Is it a creeping feature?  No.  Is it core to the problem of
> anti-spam?  Yes.  Is it useful?  Yes.  Bad effects?  No.
> 
> Being able to undeliver spam after it has been detected later and if
> it is as yet unread is none of those bad things.  This is a positive
> anti-spam feature in the core feature set of an email provider.

honestly i would not want a message removed which was already
in my inbox because someone *later* decides it is spam, with
the sender getting no NDR in case that action was a false positive

in case of a false positive the sender can claim from
his logs that my server accepted the message, and he is
right to claim that - so any message which was accepted
with "250 OK" has to make it into my inbox, with no
ifs and buts in that context

that is what makes mail *reliable*

* no sending attempt is counted as successful unless it is
  confirmed by "250 OK"; it is retried for up to 5 days
* any message which is rejected produces an NDR from the sending
  server to its user (or is ignored by a spammer)
* any message which can not be delivered successfully
  produces a bounce from the sending server to its
  user (or is ignored by a spammer)
* any message not producing an NDR within 5 days can
  be counted as delivered

anything which leads to

* accept and silently drop
* accept and reject internally, leading to backscatter

makes email as a whole unreliable and so does the same harm as spammers
you can't justify any bad design with the fight against spam
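
seen from the sending server, the rules listed above come down to the class of the SMTP reply code. a minimal sketch, with `sender_action` as a made-up helper name and the returned labels chosen only for illustration:

```python
def sender_action(reply_code):
    """Map the receiver's final SMTP reply code to what the sender does next."""
    if 200 <= reply_code < 300:
        return "delivered"   # receiver accepted responsibility for the message
    if 400 <= reply_code < 500:
        return "retry"       # temporary failure: keep it queued and try again
    if 500 <= reply_code < 600:
        return "bounce"      # permanent failure: send an NDR to the author
    raise ValueError(f"unexpected SMTP reply code: {reply_code}")


for code in (250, 451, 554):
    print(code, sender_action(code))
```

a receiver that answers 250 and then drops the message leaves the sender in the "delivered" branch with nothing to tell its user.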

if you do, you end up where the USA ended up by justifying anything with
the fight against terrorism - that's really the same: the other side won
because one did enough damage to oneself to let them win

> Therefore the simple argument of "more code bad" does not apply.
> Otherwise everyone who starts a program by copying "Hello world." and
> expanding it would be stopped immediately by the inability to add code
> in order to have it provide more functionality

you always need to draw a line - but with care
feature creep has in many cases later ended in "OK, now we can throw
away all that code and start from scratch because it became
unmaintainable over time" and i am somewhat tired of all
those rewrites starting each time with the early bugs again


Re: SA works great!

Posted by Bob Proulx <bo...@proulx.com>.
Reindl Harald wrote:
> schrieb Bob Proulx:
> > Ted Mittelstaedt wrote:
> >> Bob Proulx wrote:
> >>> Plus Google can "undeliver" a message from your Inbox if you have not
> >>> read it yet.  Say a spammer slowly sends sneaky spam to 10,000 people.
> >>> After the first dozen report the message as spam then the next 9988
> >>> have the message undelivered from their Inbox over to the Junk folder.
> >>
> >> While I can assume they would have this capability have you ever seen
> >> them actually do it?
> > 
> > Like Herman Cain "I don't have facts to back this up." but believe it
> > to be true based upon other people's reports on the net.
> > 
> > The capability seems plausible.  It would be easy and reasonable for
> > Google to implement.  For any large email provider such as Google,
> > Yahoo, others *not* to implement that feature seems implausible.  If
> > you could then why wouldn't you do it?
> 
> because if i am smart i do not implement any feature which i do
> not use as i do not install any package i do not use
> 
> why?
> 
> because every feature and code lying around may and will
> sooner or later introduce side effects at updates or
> unexpected situations and makes it harder to maintain
> the codebase

If it were a bad feature I would agree.  If it were a feature that
frivolously did unrelated things then I would agree.  But it doesn't.
Is it a creeping feature?  No.  Is it core to the problem of
anti-spam?  Yes.  Is it useful?  Yes.  Bad effects?  No.

Being able to undeliver spam after it has been detected later and if
it is as yet unread is none of those bad things.  This is a positive
anti-spam feature in the core feature set of an email provider.

Therefore the simple argument of "more code bad" does not apply.
Otherwise everyone who starts a program by copying "Hello world." and
expanding it would be stopped immediately by the inability to add code
in order to have it provide more functionality.

Bob

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.

Am 02.09.2014 um 22:24 schrieb Bob Proulx:
> Ted Mittelstaedt wrote:
>> Bob Proulx wrote:
>>> Plus Google can "undeliver" a message from your Inbox if you have not
>>> read it yet.  Say a spammer slowly sends sneaky spam to 10,000 people.
>>> After the first dozen report the message as spam then the next 9988
>>> have the message undelivered from their Inbox over to the Junk folder.
>>
>> While I can assume they would have this capability have you ever seen
>> them actually do it?
> 
> Like Herman Cain "I don't have facts to back this up." but believe it
> to be true based upon other people's reports on the net.
> 
> The capability seems plausible.  It would be easy and reasonable for
> Google to implement.  For any large email provider such as Google,
> Yahoo, others *not* to implement that feature seems implausible.  If
> you could then why wouldn't you do it?

because if i am smart i do not implement any feature which i do
not use, just as i do not install any package i do not use

why?

because every feature and piece of code lying around may and will
sooner or later introduce side effects during updates or in
unexpected situations, and makes it harder to maintain
the codebase


Re: SA works great!

Posted by Bob Proulx <bo...@proulx.com>.
Ted Mittelstaedt wrote:
> Bob Proulx wrote:
> >Plus Google can "undeliver" a message from your Inbox if you have not
> >read it yet.  Say a spammer slowly sends sneaky spam to 10,000 people.
> >After the first dozen report the message as spam then the next 9988
> >have the message undelivered from their Inbox over to the Junk folder.
> 
> While I can assume they would have this capability have you ever seen
> them actually do it?

Like Herman Cain "I don't have facts to back this up." but believe it
to be true based upon other people's reports on the net.

The capability seems plausible.  It would be easy and reasonable for
Google to implement.  For any large email provider such as Google,
Yahoo, others *not* to implement that feature seems implausible.  If
you could then why wouldn't you do it?

Bob

Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 8/31/2014 4:46 PM, Bob Proulx wrote:
> Ted Mittelstaedt wrote:
>> Reindl Harald wrote:
>>> i think it's impossible to improve that much "out-of-the-box" because
>>> that would make it to sensitive while the bayes has the ham side of
>>> your communication too for decisions
>>
>> Google does it.  It's not impossible.
>
> But not "out of the box".  Google is at long term steady-state and
> can't really compare to a fresh installation of any spam filter.
>
> Plus Google can "undeliver" a message from your Inbox if you have not
> read it yet.  Say a spammer slowly sends sneaky spam to 10,000 people.
> After the first dozen report the message as spam then the next 9988
> have the message undelivered from their Inbox over to the Junk folder.

While I can assume they would have this capability have you ever seen
them actually do it?

Ted

> That is a powerful feature but one I have never implemented for
> myself.
>
> Bob

Re: SA works great!

Posted by Bob Proulx <bo...@proulx.com>.
Ted Mittelstaedt wrote:
> Reindl Harald wrote:
> > i think it's impossible to improve that much "out-of-the-box" because
> > that would make it to sensitive while the bayes has the ham side of
> > your communication too for decisions
> 
> Google does it.  It's not impossible.

But not "out of the box".  Google is at long term steady-state and
can't really compare to a fresh installation of any spam filter.

Plus Google can "undeliver" a message from your Inbox if you have not
read it yet.  Say a spammer slowly sends sneaky spam to 10,000 people.
After the first dozen report the message as spam then the next 9988
have the message undelivered from their Inbox over to the Junk folder.
That is a powerful feature but one I have never implemented for
myself.
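
To make the mechanism concrete, here is a minimal sketch of how such an "undeliver" step could work. It is purely illustrative: nobody in this thread knows how Google does it, and the names (Mailbox, report_spam, REPORT_THRESHOLD) and the threshold of twelve reports are assumptions made up for the example.

```python
from dataclasses import dataclass, field

REPORT_THRESHOLD = 12  # "the first dozen" reports; hypothetical value


@dataclass
class Mailbox:
    # msg_id -> True if the recipient has already read the message
    inbox: dict = field(default_factory=dict)
    junk: set = field(default_factory=set)


def report_spam(msg_id, reports, mailboxes, threshold=REPORT_THRESHOLD):
    """Record one spam report; at the threshold, move unread copies to Junk.

    Returns the number of mailboxes the message was pulled from.
    """
    reports[msg_id] = reports.get(msg_id, 0) + 1
    moved = 0
    if reports[msg_id] >= threshold:
        for box in mailboxes:
            # Only touch copies that exist and have not been read yet.
            if box.inbox.get(msg_id) is False:
                del box.inbox[msg_id]
                box.junk.add(msg_id)
                moved += 1
    return moved
```

Copies that were already read stay where they are, which matches the "if you have not read it yet" condition above.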

Bob

Re: SA works great!

Posted by Axb <ax...@gmail.com>.
On 09/02/2014 11:06 AM, Ted Mittelstaedt wrote:

> masscheck runs against your spam and ham.  But, masscheck does not know
> if what your feeding it is actually ham or spam until you have gone
> through your corpora and sorted it - moved the spam to the spam folder
> and the ham to the ham folder (assuming that is that you get any false
> positives)  That is why you say you want the corpora cleaned and hand
> classified.
>
> This is something that I only do every once in a while when I'm
> preparing corpora for my bayes database.  If I setup masscheck to
> look at my inbox and my junk mail folder on a nightly basis, there
> is no guarantee that I happened to get to my mail that day or that week
> even to make sure that only ham is in my inbox and only spam is in
> my junk mail folder.

This is where you need *commitment* (a few hours/week)  to sort your 
stuff. If you can't be bothered, it's much easier to sit back and drop 
the load on others...

> If I have a folder full of spam that my local install of SpamAssassin
> has already marked as spam, then how does telling the SA project
> "yep, ya got that right" change anything in the rules scoring?

It helps by pushing autopromoting sandbox rules, raising scores, etc.
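
As a rough illustration of why confirmed hits matter, the sketch below counts how often each rule fires in a hand-sorted spam corpus versus a ham corpus and derives a spam-over-total ratio per rule. This is not the project's masscheck code; rule_stats and its input format are assumptions for the example, and the real tooling does considerably more.

```python
from collections import Counter


def rule_stats(spam_hits, ham_hits):
    """Per-rule hit percentages and spam-over-total ratio.

    spam_hits / ham_hits: one list of rule names per message, taken from
    corpora that were sorted by hand (hypothetical input format).
    """
    spam_counts = Counter(r for msg in spam_hits for r in set(msg))
    ham_counts = Counter(r for msg in ham_hits for r in set(msg))
    stats = {}
    for rule in set(spam_counts) | set(ham_counts):
        spam_pct = 100.0 * spam_counts[rule] / len(spam_hits) if spam_hits else 0.0
        ham_pct = 100.0 * ham_counts[rule] / len(ham_hits) if ham_hits else 0.0
        total = spam_pct + ham_pct
        # 1.0 means the rule only ever hit spam, 0.0 means only ham.
        ratio = spam_pct / total if total else 0.0
        stats[rule] = (spam_pct, ham_pct, ratio)
    return stats
```

A rule that keeps hitting confirmed spam and never ham ends up with a ratio near 1.0, which is the kind of evidence that lets a score be raised. A spam left in the ham folder pushes the ratio the wrong way, which is why the sorting matters.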

>
> There is a lack of explanation on the masscheck page as to how and
> why it's useful.  And it is also clear that accidentally leaving spam
> (spam that has not been identified as spam by SA) in your ham folder,
> and false positives (ham) in your spam folder, is not going to help
> masscheck any - if anything it's going to make the SA scoring worse.
> That seems to me to be very important.
>
> Perhaps that is why so few participate?  They do not understand why
> masscheck is important to the SA project because the documentation on
> it does not explain why.

Filling a wiki with lots of information tends to scare ppl away. Those
who are truly interested in contributing will ask for information, hints,
help and there are always devs available willing to help.

> Most "others" out there using OSS packages do not have the skills to
> contribute development time, even to contribute rules that do not
> have unintended consequences.  You might think it simple to write
> a rule but it's not the writing it that is the problem it is the
> thinking about the consequences.

Which is why rules are not published blindly - there's GA which does a
pretty good job at weeding the bad stuff out and user feedback isn't
ignored.

> I've seen some real showstoppers in SpamAssassin rules such as the time
> that someone wrote a rule to target certain spam that ended up
> triggering off Outlook Express.

Don't know when that happened or who wrote that rule, but I do know that
there are devs who are *very* sensitive to that sort of stuff leaking
into SA's ruleset and battle it real loud.

> I just think the SA developers are falling just a bit too conservative on this.

And that's the good thing - the SA SVN tree gives you all the tools to
run your own fork, with GA/Perceptron and ALL the goodies.
(you just need to glue the whole party together and no, that is *not*
well documented)


> For starters as a SA user I do not feel the project is served by
> multiple sa-update channels promulgating different rulesets, if I
> had the coding ability to create a huge body of rules on par with
> the existing SA rules, I would absolutely not set it up as a competing
> ruleset.

Contributing by setting up and maintaining an extra sa-update channel,
while others take over writing the rules, is also an approach, but
again the magic word is *commitment*

and now back to reviewing my spam trap inboxes for today's masscheck 
run.....







Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 8/31/2014 7:55 AM, Axb wrote:
> On 08/31/2014 04:08 PM, Ted Mittelstaedt wrote:
>> Out of the box the default decision point of 5 is too high anyway.
>
> SA is the framework - you can tune to your need as much as you want.
>
>> I think the emphasis on avoiding false positives in the stock
>> (non-Bayes) distribution is far too high. I suspect that over
>> the years many good rule submissions have been ignored because
>> incidence of false positives with them was too high for the
>> SA maintainers.
>
> During the last +-4 years, scores have been set by the masscheck GA system.
> IF more ppl would contribute with masschecks and rules, detection could
> be better, but the lack of volunteers doing this shows that apparently
> what SA does is good enough or there is little interest in commitment.
>

masscheck runs against your spam and ham.  But masscheck does not know
if what you're feeding it is actually ham or spam until you have gone
through your corpora and sorted it: moved the spam to the spam folder
and the ham to the ham folder (assuming, that is, that you get any false
positives).  That is why you say you want the corpora cleaned and hand
classified.

This is something that I only do every once in a while when I'm 
preparing corpora for my bayes database.  If I setup masscheck to
look at my inbox and my junk mail folder on a nightly basis, there
is no guarantee that I happened to get to my mail that day or that week
even to make sure that only ham is in my inbox and only spam is in
my junk mail folder.

If I have a folder full of spam that my local install of SpamAssassin 
has already marked as spam, then how does telling the SA project
"yep, ya got that right" change anything in the rules scoring?

There is a lack of explanation on the masscheck page as to how and
why it's useful.  And it is also clear that accidentally leaving spam
(spam that has not been identified as spam by SA) in your ham folder,
and false positives (ham) in your spam folder, is not going to help
masscheck any - if anything it's going to make the SA scoring worse.
That seems to me to be very important.

Perhaps that is why so few participate?  They do not understand why
masscheck is important to the SA project because the documentation on
it does not explain why.

> For the same reason, SARE went belly up after volunteers drifted to new
> interests, jobs, had families, etc.
>
> The lack of general commitment and a general passive attitude expecting
> "others" to do the job doesn't help at all.
>

That is a blame game that a lot of people on OSS projects take.

Most "others" out there using OSS packages do not have the skills to
contribute development time, even to contribute rules that do not
have unintended consequences.  You might think it simple to write
a rule but it's not the writing it that is the problem it is the
thinking about the consequences.

I've seen some real showstoppers in SpamAssassin rules such as the time
that someone wrote a rule to target certain spam that ended up
triggering off Outlook Express.  When confronted with this, the
author's response was along the lines of "well OE does not produce
an RFC compliant header so it's not MY problem".  Well sure, he was
right that OE does not produce an RFC compliant header - it's a piece
of crap.  Unfortunately at the time the rule was inserted it was a
piece of crap used by 1/2 of the Internet.  In other words he was not
willing to own the fact that this clever thing he had discovered
and turned into a rule was unusable because 1/2 of the users on the
Internet are morons using ancient crap mail client software.

I'd rather have fewer better rules from the SA developers who seem
to have the understanding of unintended consequences than more rules
from hotshots that figure they can go on a crusade to make everyone
on the Internet use the latest version of Thunderbird.  I just think
the SA developers are falling just a bit too conservative on this.

Anyway, IMHO the people complaining about others not kicking back to OSS
projects really need to start by taking this beef up
with the people bundling SA in commercial products (like Untangle
firewall - which uses SA in its "free" version of Untangle which
acts as marketing slippery slope fodder to get people into the
commercial product) because those people are developers already,
and making significant coin off the OSS project.  It seems to me that 
those people have a far stronger moral obligation on them to contribute
development time to the SA project, than some admin out there of a 
company mailserver who barely knows what the term regexp means.

Disclaimer:  For all I know Untangle developers do kick coding time
back to the SA project - I'm using them as a convenient example to 
illustrate the issue.

>> For a newbie to SA it is disheartening to install SA and not
>> get 90% with a 2% false positive, out of the box, but rather get
>> 50% with a 0% false positive. And I think that is a mistake the
>> maintainers are making is over-reliance on bayes.
>
> Mantainers do what they can, on a voluntary basis. If newbies expect SA
> to be FUSP out of the box, then they didn't get enough info beforehand.
>

newbies expect any software product they install be it commercial or OSS
to be fully configured out of the box.  That's the definition of a 
newbie.  I'm not excusing it, merely explaining reality.

>> At the least the SA maintainers should maintain a separate
>> "highly aggressive" rule distro that was optional that would
>> give us a much higher success rate with a corresponding
>> slight increase in false positives.
>
> "should" ? SA devs are volunteers, contributing time and resources with
> little return other than some personal satisfaction of helping others.
> SA's develpment is not funded or backed by some multimillion corp.
>
> What are you doing to contribute ?
>
> SA is the framework - if you wish to start a sa-update channel for extra
> agressive rules_du_jour you're welcome to do it and if you find some
> volunteers to help you, even better.
>

For starters as a SA user I do not feel the project is served by
multiple sa-update channels promulgating different rulesets, if I
had the coding ability to create a huge body of rules on par with
the existing SA rules, I would absolutely not set it up as a competing
ruleset.

I feel that whenever someone develops a rule that succeeds in catching
some spam and not damaging ham, that it should go into the main SA 
ruleset. That will get the widest distribution as quickly as possible.

But you misunderstand what I am saying.  I do not find fault with
how SA operates internally, my beef is with its out-of-box
configuration not being aggressive enough.  Since I know how to modify
its configuration to get it more aggressive I do not have a problem
with it.  But I believe that newbies who do not engage Bayes are not
getting enough filtering from it.

>> Their design approach has been to rely on Bayes to be trained to go from
>> 50% capture out of box with 0% FP to 80-90% capture with 0% FP.
>
> an assumption, based on what?
>

Observation.  Fine you don't want to believe me, go ahead but I have
spent a lot of time observing it on my servers.

>> But, the design approach could easily be relying on Bayes to go
>> from 90% capture with 5% FP out of the box, to 90% capture with
>> 0% FP with Bayes, and the emphasis being on training Bayes on ham,
>> not spam.
>>
>> Note I am pulling the percentages out of my ass, but I think you
>> get the idea.
>
> By design, SA's Bayes is not FUSP, it's a small part of the arsenal -
> depending on your skill to write rules, make use of other SA features,
> etc, you can even run a very efficient filtering system without it.
>
> There are simple methods to automagically feed Bayes with lots of spam
> or ham - depending on what you feel you need most. It's up to you to be
> creative and make use of SA's ton of features (including third party
> rules/plugins)
>

I do not understand why you think I don't agree with this statement.

I am merely making an observation that the maintainers are approaching
SA with the idea of its out of box configuration being very soft, and
letting a lot of spam through until the admin starts tweaking knobs
and flipping switches.

There is nothing you have said that invalidates the approach of SA
having an out of box config that is very hard and lets very little spam
through until the admin starts tweaking knobs and flipping switches.  It 
is simply the opposite approach.

And I am saying that I think it would be a better approach.  You are
not addressing that.

Ted

Re: SA works great!

Posted by Axb <ax...@gmail.com>.
On 08/31/2014 10:54 PM, Ian Zimmerman wrote:
> On Sun, 31 Aug 2014 16:55:50 +0200,
> Axb <ax...@gmail.com> wrote:
>
> Axb> During the last +-4 years, scores have been set by the masscheck GA
> Axb> system.  IF more ppl would contribute with masschecks and rules,
> Axb> detection could be better, but the lack of volunteers doing this
> Axb> shows that apparently what SA does is good enough or there is
> Axb> little interest in commitment.
>
> So, how do I take part in masscheck?
>

Please see

http://wiki.apache.org/spamassassin/NightlyMassCheck



Re: SA works great!

Posted by Ian Zimmerman <it...@buug.org>.
On Sun, 31 Aug 2014 16:55:50 +0200,
Axb <ax...@gmail.com> wrote:

Axb> During the last +-4 years, scores have been set by the masscheck GA
Axb> system.  IF more ppl would contribute with masschecks and rules,
Axb> detection could be better, but the lack of volunteers doing this
Axb> shows that apparently what SA does is good enough or there is
Axb> little interest in commitment.

So, how do I take part in masscheck?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Local Variables:
mode:claws-external
End:

Re: SA works great!

Posted by Axb <ax...@gmail.com>.
On 08/31/2014 04:08 PM, Ted Mittelstaedt wrote:
> Out of the box the default decision point of 5 is too high anyway.

SA is the framework - you can tune to your need as much as you want.

> I think the emphasis on avoiding false positives in the stock
> (non-Bayes) distribution is far too high.  I suspect that over
> the years many good rule submissions have been ignored because
> incidence of false positives with them was too high for the
> SA maintainers.

During the last +-4 years, scores have been set by the masscheck GA system.
IF more ppl would contribute with masschecks and rules, detection could 
be better, but the lack of volunteers doing this shows that apparently 
what SA does is good enough or there is little interest in commitment.

For the same reason, SARE went belly up after volunteers drifted to new 
interests, jobs, had families, etc.

The lack of general commitment and a general passive attitude expecting 
"others" to do the job doesn't help at all.

> For a newbie to SA it is disheartening to install SA and not
> get 90% with a 2% false positive, out of the box, but rather get
> 50% with a 0% false positive.  And I think that is a mistake the
> maintainers are making is over-reliance on bayes.

Maintainers do what they can, on a voluntary basis. If newbies expect SA
to be FUSP out of the box, then they didn't get enough info beforehand.

> At the least the SA maintainers should maintain a separate
> "highly aggressive" rule distro that was optional that would
> give us a much higher success rate with a corresponding
> slight increase in false positives.

"should" ? SA devs are volunteers, contributing time and resources with 
little return other than some personal satisfaction of helping others.
SA's development is not funded or backed by some multimillion corp.

What are you doing to contribute ?

SA is the framework - if you wish to start a sa-update channel for extra
aggressive rules_du_jour you're welcome to do it and if you find some
volunteers to help you, even better.

> Their design approach has been to rely on Bayes to be trained to go from
> 50% capture out of box with 0% FP to 80-90% capture with 0% FP.

an assumption, based on what?

> But, the design approach could easily be relying on Bayes to go
> from 90% capture with 5% FP out of the box, to 90% capture with
> 0% FP with Bayes, and the emphasis being on training Bayes on ham,
> not spam.
>
> Note I am pulling the percentages out of my ass, but I think you
> get the idea.

By design, SA's Bayes is not FUSP, it's a small part of the arsenal - 
depending on your skill to write rules, make use of other SA features, 
etc, you can even run a very efficient filtering system without it.

There are simple methods to automagically feed Bayes with lots of spam 
or ham - depending on what you feel you need most. It's up to you to be 
creative and make use of SA's ton of features (including third party 
rules/plugins)




Re: SA works great!

Posted by Kai Schaetzl <ma...@conactive.com>.
Ted Mittelstaedt wrote on Sun, 31 Aug 2014 07:08:11 -0700:

> Out of the box the default decision point of 5 is too high anyway.

No. You can always lower it yourself. With the result of more FPs. If you 
or your users can live with that. Fine. Many can't.
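
The trade-off can be sketched with a few lines of Python. Everything here is an assumption made for illustration: rates_at is a hypothetical helper, the scores in the usage note are invented, and a message is assumed to count as spam when its score is at or above the threshold (the role required_score plays in SA).

```python
def rates_at(threshold, scored):
    """Return (catch_rate, fp_rate) for a hand-classified sample.

    scored: list of (score, is_spam) pairs; the data is supplied by the
    caller and must already be sorted into spam and ham by hand.
    """
    spam = [s for s, is_spam in scored if is_spam]
    ham = [s for s, is_spam in scored if not is_spam]
    caught = sum(1 for s in spam if s >= threshold)
    false_pos = sum(1 for s in ham if s >= threshold)
    catch_rate = caught / len(spam) if spam else 0.0
    fp_rate = false_pos / len(ham) if ham else 0.0
    return catch_rate, fp_rate
```

Run against your own sorted mail, comparing rates_at(5.0, sample) with rates_at(4.5, sample) shows how much extra spam a lower threshold catches and how much ham it costs, which is the decision each site has to make for itself.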

> I think the emphasis on avoiding false positives in the stock
> (non-Bayes) distribution is far too high.

No. This and the "multi-axe" approach are what kept alive SA over this
long time and led to wide adoption in businesses as well and made it a
de-facto standard for content-based filtering.



Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com




Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 9/2/2014 3:54 AM, Reindl Harald wrote:
>
>
> Am 02.09.2014 um 12:37 schrieb Ted Mittelstaedt:
>> On 9/2/2014 2:16 AM, Reindl Harald wrote:
>>> and here you prove again that it don't work really out-of-the-box
>>> because if i have to look all day long in my spam folder because
>>> a noticeable part of my legit mail lands there it *do not work*
>>
>> Are you one of those idiot users?  No.  But, you are NOT the average
>> Internet user, your NOT Google's target market, they don't want you on
>> their service, bitching to them about your FP's and them putting
>> your ham in the junk folder
>>
>> Your doing the usual techie thing which is to assume that since your a
>> techie that everyone selling anything on the Internet wants to market
>> and sell to you
>
> this is a *technical list*
>
> if you are not a sysadmin preapre and configure SA you are just
> not the audience - what is so hard to understand that this is
> typical *server software* and no "non-techie" ever should setup
> a public server
>
> given what damage one can do with a wrong configured MTA it should
> be even forbidden by law to setup one without prove the knowledge
> before and guess what: in that case 20-30% of spam would disappear
> from one day to the next
>
>

No argument there; my point to him is that he's not thinking about
how the typical user is viewing their email service.

Ted

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.

Am 02.09.2014 um 12:37 schrieb Ted Mittelstaedt:
> On 9/2/2014 2:16 AM, Reindl Harald wrote:
>> and here you prove again that it don't work really out-of-the-box
>> because if i have to look all day long in my spam folder because
>> a noticeable part of my legit mail lands there it *do not work*
>
> Are you one of those idiot users?  No.  But, you are NOT the average
> Internet user, your NOT Google's target market, they don't want you on
> their service, bitching to them about your FP's and them putting
> your ham in the junk folder
> 
> Your doing the usual techie thing which is to assume that since your a
> techie that everyone selling anything on the Internet wants to market
> and sell to you

this is a *technical list*

if you are not a sysadmin able to prepare and configure SA you are just
not the audience - what is so hard to understand that this is
typical *server software* and no "non-techie" ever should set up
a public server

given what damage one can do with a wrongly configured MTA it should
even be forbidden by law to set one up without proving the knowledge
beforehand and guess what: in that case 20-30% of spam would disappear
from one day to the next



Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 9/2/2014 3:48 AM, Axb wrote:
> On 09/02/2014 12:37 PM, Ted Mittelstaedt wrote:
>> I'm just saying that out of box it should catch more spam and assume
>> people will tolerate a few FPs. Because that is what I am seeing people
>> demand in the real world. This insistence that "if SA is responsible
>> for even ONE FP it's a disaster" is a drag on SA.
>
> There is no such "insistence" - there's a target: deliver a framework
> and a rule base which does a pretty good job on a global basis, from the
> micro hoster in Bolivia, the school in Scotland to the guy hosting 500k
> Plesk'd domains in some datacenter in Tampa.
>
> SA is NOT and will never be a spam filtering service with zero hour
> detection. It's a piece of software, a collectiong of tools, with a few
> templates (rules)
>
> Do you also rant at Microsoft because you expect MS-Word to write your
> biography for you?
>

Apples and Oranges there, we are arguing over a fine point of
software that actually operates.

Ted

> EOT

Re: SA works great!

Posted by Axb <ax...@gmail.com>.
On 09/02/2014 12:37 PM, Ted Mittelstaedt wrote:
> I'm just saying that out of box it should catch more spam and assume
> people will tolerate a few FPs.  Because that is what I am seeing people
> demand in the real world.  This insistence that "if SA is responsible
> for even ONE FP it's a disaster" is a drag on SA.

There is no such "insistence" - there's a target: deliver a framework 
and a rule base which does a pretty good job on a global basis, from the 
micro hoster in Bolivia, the school in Scotland to the guy hosting 500k 
Plesk'd domains in some datacenter in Tampa.

SA is NOT and will never be a spam filtering service with zero hour
detection. It's a piece of software, a collection of tools, with a few
templates (rules)

Do you also rant at Microsoft because you expect MS-Word to write your 
biography for you?

EOT

Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 9/2/2014 2:16 AM, Reindl Harald wrote:
>
>
> Am 02.09.2014 um 09:57 schrieb Ted Mittelstaedt:
>> On 8/31/2014 5:11 PM, LuKreme wrote:
>>>
>>> On 31 Aug 2014, at 08:08 , Ted Mittelstaedt<te...@ipinc.net>   wrote:
>>>> Google does it.  It's not impossible.
>>>
>>> [snip]
>>>
>>>> My experience is that the commercial providers like Gmail are now
>>>> so aggressive that false positives are VERY common on their systems,
>>>> this leads to people nowadays quite commonly saying "check your
>>>> spam folder" on their websites and such that send feedback messages.
>>>
>>> These two statements do not go together.
>>
>> Only because your stubbornly sticking your head in the sand.
>
> stop that trolling please
>

Then you stop being deliberately ignorant.

>> Google has well over 90% catch rate on spam out of the box
>
> fine, you ignore that it is *not* out of the box
>
> it is the same way a built up and configured system as i built mine
> the last few weeks and train bayes was one of the setup steps
>

OK that is a point.  But does the typical admin who first installs SA
train the Bayes filter for all users?  I don't believe they do.  In fact
the documentation for SA focuses so heavily on individual bayes 
databases for each user that probably most of them don't even realize 
they can use a single bayes database for all users on the server.

But my point is with the rules database anyway, not with bayes.  I don't
see a problem with calling for a lower default spam score in the default
global config which is going to make the rules database
catch more spam, with a side effect of a slightly higher FP rate.

>> Google ALSO has a 1-2% False Positive rate out of the box.  Their catch
>> rate is so high because they are willing to accept a high false positive rate.
>
> and so it don't work really
>

Your definition of it not working.  But their users seem to disagree.

>> Most users of Google are, in my opinion, idiots, and when their friends
>> email them and they don't get the email, once their friends contact them
>> later they almost NEVER go to Google's Junk Mail box - and notice that
>> Google blocked their legitimate mail.  And if they DO notice this they
>> blame the sender (their friends) because Google Is Never Wrong.
>
> and here you prove again that it don't work really out-of-the-box
> because if i have to look all day long in my spam folder because
> a noticeable part of my legit mail lands there it *do not work*
>

Are you one of those idiot users?  No.  But, you are NOT the average
Internet user, you're NOT Google's target market, they don't want you on
their service, bitching to them about your FP's and them putting
your ham in the junk folder.

You're doing the usual techie thing which is to assume that since you're a
techie that everyone selling anything on the Internet wants to market
and sell to you.

Sorry to burst your bubble, that isn't how it works; the tech companies
don't want people like you buying stuff from them.

They don't want people like you returning the video card they bought 
because you actually tested it and discovered they were lying about 
their FPS or whatever.

They don't want people like you buying the 5 year warranty disk drive
then returning it 4 years into it when it crashes.

They want stupid people.  Stupid people who will pay them and accept the
dreck they sell. Stupid people who will not even know the warranty on 
their disk drive and won't bother finding it out they will just toss it.

This is the reality out there.  The majority of people using
email services aren't like you: they don't look in their junk mail
folders, and if a correspondent calls them and asks them to look for a
message in their junk folder and they find it, they will assume the
correspondent is at fault "because everyone else doesn't go into my junk
mail folder, only you do so it's your problem"

>> Unfortunately, the number of idiots on the Internet vastly outweighs
>> the number of smart people which is why Google is bigger
>
> that may be true but don't change the fact that no spamfilter software
> now, tomorrow or in 5 years will be perfect out of the box and frankly
> it does not need to - it's typically part of a mailserver and nobody
> should run a MTA "out-of-the-box" and expect "somehow it will work"
>

I'm just saying that out of box it should catch more spam and assume
people will tolerate a few FPs.  Because that is what I am seeing people 
demand in the real world.  This insistence that "if SA is responsible 
for even ONE FP it's a disaster" is a drag on SA.

Ted

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.

Am 02.09.2014 um 09:57 schrieb Ted Mittelstaedt:
> On 8/31/2014 5:11 PM, LuKreme wrote:
>>
>> On 31 Aug 2014, at 08:08 , Ted Mittelstaedt<te...@ipinc.net>  wrote:
>>> Google does it.  It's not impossible.
>>
>> [snip]
>>
>>> My experience is that the commercial providers like Gmail are now
>>> so aggressive that false positives are VERY common on their systems,
>>> this leads to people nowadays quite commonly saying "check your
>>> spam folder" on their websites and such that send feedback messages.
>>
>> These two statements do not go together.
> 
> Only because your stubbornly sticking your head in the sand.

stop that trolling please

> Google has well over 90% catch rate on spam out of the box

fine, you ignore that it is *not* out of the box

it is a built-up and configured system in the same way as the one i built
over the last few weeks, and training bayes was one of the setup steps

> Google ALSO has a 1-2% False Positive rate out of the box.  Their catch
> rate is so high because they are willing to accept a high false positive rate.

and so it don't work really

> Most users of Google are, in my opinion, idiots, and when their friends 
> email them and they don't get the email, once their friends contact them
> later they almost NEVER go to Google's Junk Mail box - and notice that
> Google blocked their legitimate mail.  And if they DO notice this they
> blame the sender (their friends) because Google Is Never Wrong.

and here you prove again that it doesn't really work out-of-the-box
because if i have to look in my spam folder all day long because
a noticeable part of my legit mail lands there it *does not work*

> Unfortunately, the number of idiots on the Internet vastly outweighs
> the number of smart people which is why Google is bigger

that may be true but don't change the fact that no spamfilter software
now, tomorrow or in 5 years will be perfect out of the box and frankly
it does not need to - it's typically part of a mailserver and nobody
should run a MTA "out-of-the-box" and expect "somehow it will work"


Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.
Am 02.09.2014 um 22:40 schrieb Ted Mittelstaedt:
> Yes, that is my experience when I setup test addresses on Gmail and
> stick them into spammer unsubscribe links.  Lots of spam starts showing
> up and over 90% in the junk folder

Bruhaha and that is "working out of the box"?

your problem is that you don't understand SA, nor how spamfiltering
works or even SMTP at all nor the difference between ACCEPT, FLAG
and REJECT

if 90% of the spam makes it through to a folder visible to me the
whole filter doesn't work at all - in a 3 level setup 99% of that
90% is *rejected* and never seen by the RCPT
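
The difference between the three outcomes can be sketched as a simple score-to-action mapping. This is only an illustration: action_for is a hypothetical helper and both thresholds are assumed values, not settings taken from any real setup.

```python
FLAG_AT = 4.5    # hypothetical: at or above this, deliver but mark as spam
REJECT_AT = 8.0  # hypothetical: at or above this, refuse during SMTP


def action_for(score, flag_at=FLAG_AT, reject_at=REJECT_AT):
    """Map a spam score to ACCEPT, FLAG or REJECT."""
    if score >= reject_at:
        # The sending server gets an error; nothing reaches any folder.
        return "REJECT"
    if score >= flag_at:
        # Delivered, but tagged so it can be filed into a junk folder.
        return "FLAG"
    return "ACCEPT"
```

Only FLAG produces mail sitting in a junk folder; a REJECT never shows up for the recipient at all, so counting what lands in the junk folder says little about what a rejecting setup stops.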

read my other reply and then go to bed and *please* don't
ever respond to any thread i started because you have proven
enough that you have no clue what you are talking about


Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 9/2/2014 12:19 PM, LuKreme wrote:
> On 02 Sep 2014, at 01:57 , Ted Mittelstaedt<te...@ipinc.net>  wrote:
>> On 8/31/2014 5:11 PM, LuKreme wrote:
>>>
>>> On 31 Aug 2014, at 08:08 , Ted Mittelstaedt<te...@ipinc.net>
>>> wrote:
>>>> Google does it.  It's not impossible.
>>>
>>> [snip]
>>>
>>>> My experience is that the commercial providers like Gmail are
>>>> now so aggressive that false positives are VERY common on their
>>>> systems, this leads to people nowadays quite commonly saying
>>>> "check your spam folder" on their websites and such that send
>>>> feedback messages.
>>>
>>> These two statements do not go together.
>>
>> Only because your stubbornly sticking your head in the sand.
>>
>> Google has well over 90% catch rate on spam out of the box.
>
> "Out of the box"? What does that even mean for Google? Do you mean
> that when the introduced their gmail service they had 90% spam catch
> rate? I don't recall that being the case at all.
>

Yes, that is my experience when I set up test addresses on Gmail and
stick them into spammer unsubscribe links.  Lots of spam starts showing
up and over 90% in the junk folder.

>> Google ALSO has a 1-2% False Positive rate out of the box.  Their
>> catch rate is so high because they are willing to accept a high
>> false positive rate.
>
> That is one reason. The other reason, of course, is that they have
> literally BILLIONS of mail messages to train from. In fact, Google
> has so much mail to train from, that it is shocking to me they have
> any false positives at all.
>

Any statistician will tell you that the billions is immaterial, that
if they get a small cross section of those billions they can train from
that and get the same results.

In other words, once you get enough spam, it all starts to look the
same.  Once you get enough ham it all starts to look the same.  That's 
the whole point of the SA rulesets after all.
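
The sampling claim can be checked against the standard margin-of-error formula for a proportion estimated from a simple random sample. A minimal sketch, assuming a truly random sample and a 95% confidence level (z = 1.96); it illustrates the statistics only and says nothing about how any particular provider actually trains its filter.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate half-width of the confidence interval for a
    proportion p estimated from n randomly sampled messages.

    The size of the total population does not appear in the formula;
    only the sample size n matters (for populations much larger than n).
    """
    return z * math.sqrt(p * (1.0 - p) / n)


# Worst case p = 0.5 for a few sample sizes.
for n in (1_000, 100_000, 10_000_000):
    print(f"n={n:>10,}  margin={margin_of_error(0.5, n):.5f}")
```

The margin shrinks with the square root of the sample size, so each further tenfold increase in training mail buys less than the previous one.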

> The fact is, if 2% of my mail ends up in my spam folder then I have
> to spend a lot more time in my spam folder than I want to, and enough
> time that it makes my spam folder useless because not only do I have
> to scan it constantly, but I have to then go jump through some sorts
> of hoops to train it to hopefully not be spam in the future.
>
> Spread that 2% error rate over a half dozen email addresses and I am
> back to the bad old days of the late 90s when the majority of the
> time I spent in email was spent dealing with the spam.
>

I agree totally.  But how do you answer the business owner who is as
ignorant of these things as a box of rocks and who just sees an empty
inbox?  Even though they might have good mails going into their junk
folder.  Since they are a typical ignorant user they don't know enough to
check their spam folder.

Even worse are some of them who think that if I spend 30 minutes a day
digging ham out of my junk folder it's better than spending 30 minutes
a day deleting spam from my inbox.  How do you respond to that?

Ted

Re: SA works great!

Posted by LuKreme <kr...@kreme.com>.
On 02 Sep 2014, at 01:57 , Ted Mittelstaedt <te...@ipinc.net> wrote:
> On 8/31/2014 5:11 PM, LuKreme wrote:
>> 
>> On 31 Aug 2014, at 08:08 , Ted Mittelstaedt<te...@ipinc.net>  wrote:
>>> Google does it.  It's not impossible.
>> 
>> [snip]
>> 
>>> My experience is that the commercial providers like Gmail are now
>>> so aggressive that false positives are VERY common on their systems,
>>> this leads to people nowadays quite commonly saying "check your
>>> spam folder" on their websites and such that send feedback messages.
>> 
>> These two statements do not go together.
> 
> Only because your stubbornly sticking your head in the sand.
> 
> Google has well over 90% catch rate on spam out of the box.

"Out of the box"? What does that even mean for Google? Do you mean that when they introduced their gmail service they had 90% spam catch rate? I don't recall that being the case at all.

> Google ALSO has a 1-2% False Positive rate out of the box.  Their catch
> rate is so high because they are willing to accept a high false positive rate.

That is one reason. The other reason, of course, is that they have literally BILLIONS of mail messages to train from. In fact, Google has so much mail to train from, that it is shocking to me they have any false positives at all.

The fact is, if 2% of my mail ends up in my spam folder then I have to spend a lot more time in my spam folder than I want to, and enough time that it makes my spam folder useless because not only do I have to scan it constantly, but I have to then go jump through some sorts of hoops to train it to hopefully not be spam in the future.

Spread that 2% error rate over a half dozen email addresses and I am back to the bad old days of the late 90s when the majority of the time I spent in email was spent dealing with the spam.
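
To put rough numbers on that, here is a small sketch of how a false positive rate turns into daily work. The 100 legitimate messages per day per mailbox is a made-up figure for illustration; only the 2% rate and the half dozen addresses come from the discussion above.

```python
def misfiled_per_day(ham_per_day, false_positive_rate, mailboxes=1):
    """Expected number of legitimate messages landing in spam folders
    per day, summed over all mailboxes."""
    return ham_per_day * false_positive_rate * mailboxes


# Hypothetical volume: 100 legitimate messages per day per mailbox.
one_box = misfiled_per_day(100, 0.02)
six_boxes = misfiled_per_day(100, 0.02, mailboxes=6)
print(f"one mailbox: {one_box:.0f} per day, six mailboxes: {six_boxes:.0f} per day")
```

The cost is not just the count: each misfiled message can only be found by scanning the whole spam folder, so the work scales with the amount of spam as well.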

-- 
No matter how fast light travels it finds the darkness has always got
there first, and is waiting for it.


Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 8/31/2014 5:11 PM, LuKreme wrote:
>
> On 31 Aug 2014, at 08:08 , Ted Mittelstaedt<te...@ipinc.net>  wrote:
>> Google does it.  It's not impossible.
>
> [snip]
>
>> My experience is that the commercial providers like Gmail are now
>> so aggressive that false positives are VERY common on their systems,
>> this leads to people nowadays quite commonly saying "check your
>> spam folder" on their websites and such that send feedback messages.
>
> These two statements do not go together.
>
>

Only because you're stubbornly sticking your head in the sand.

Google has well over 90% catch rate on spam out of the box.

Google ALSO has a 1-2% False Positive rate out of the box.  Their catch
rate is so high because they are willing to accept a high false positive 
rate.

Most users of Google are, in my opinion, idiots, and when their friends 
email them and they don't get the email, once their friends contact them
later they almost NEVER go to Google's Junk Mail box - and notice that
Google blocked their legitimate mail.  And if they DO notice this they
blame the sender (their friends) because Google Is Never Wrong.

Unfortunately, the number of idiots on the Internet vastly outweighs
the number of smart people which is why Google is bigger.

Ted

Re: SA works great!

Posted by LuKreme <kr...@kreme.com>.
On 31 Aug 2014, at 08:08 , Ted Mittelstaedt <te...@ipinc.net> wrote:
> Google does it.  It's not impossible.

[snip]

> My experience is that the commercial providers like Gmail are now
> so aggressive that false positives are VERY common on their systems,
> this leads to people nowadays quite commonly saying "check your
> spam folder" on their websites and such that send feedback messages.

These two statements do not go together.


-- 
People only think for themselves if you tell them to.


Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 8/31/2014 2:21 AM, Reindl Harald wrote:
>
> Am 31.08.2014 um 02:15 schrieb Ted Mittelstaedt:
>> Yes, it does work great when you have the bayes filter turned on and you take the time to feed it.  And that means
>> you have to feed the
>> learner both ham and spam and setup reliable sources for those.
>>
>> Unfortunately if Bayes is not turned on, it does not catch more than
>> around 60-70% of spam.  As a Spamassassin user&  server admin, I would
>> really like to see that improve.
>
> 60-70% without training is great
>
> keep in mind that the first 90% of incoming is eaten by RBL's
> and the 60% are from the remaining 10% at all :-)
>
> i think it's impossible to improve that much "out-of-the-box" because
> that would make it to sensitive while the bayes has the ham side of
> your communication too for decisions
>

Google does it.  It's not impossible.

> i am coming from a commercial device trying to block 100% and there
> it ends in zero-hour-blocklists with domains even if they are only
> linked on the youtube page of the blocked facebook notification
>
> so i am glad that i have to do soem training by myself instead fear
> of false positives which do much more harm
>

My experience is that the commercial providers like Gmail are now
so aggressive that false positives are VERY common on their systems,
this leads to people nowadays quite commonly saying "check your
spam folder" on their websites and such that send feedback messages.

Out of the box the default decision point of 5 is too high anyway.

I think the emphasis on avoiding false positives in the stock
(non-Bayes) distribution is far too high.  I suspect that over
the years many good rule submissions have been ignored because
incidence of false positives with them was too high for the
SA maintainers.

For a newbie to SA it is disheartening to install SA and not
get 90% with a 2% false positive, out of the box, but rather get
50% with a 0% false positive.  And I think the mistake the
maintainers are making is over-reliance on Bayes.

At the least the SA maintainers should maintain a separate
"highly aggressive" rule distro that was optional that would
give us a much higher success rate with a corresponding
slight increase in false positives.

Their design approach has been to rely on Bayes to be trained to go from 
50% capture out of box with 0% FP to 80-90% capture with 0% FP.

But, the design approach could easily be relying on Bayes to go
from 90% capture with 5% FP out of the box, to 90% capture with
0% FP with Bayes, and the emphasis being on training Bayes on ham,
not spam.

Note I am pulling the percentages out of my ass, but I think you
get the idea.

Ted

>> On 8/30/2014 2:41 PM, Reindl Harald wrote:
>>> after two days running SA for the first two test-domains with a
>>> well trained bayes for the global milter-user: impressive!
>>>
>>> the few crap making it through poscreen RBL scroing is detected
>>>
>>> 0.000          0          3          0  non-token data: bayes db version
>>> 0.000          0       1389          0  non-token data: nspam
>>> 0.000          0       1350          0  non-token data: nham
>>> 0.000          0     257152          0  non-token data: ntokens
>>>
>>> Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5) for sa-milt:189 in 0.6 seconds, 2454 bytes.
>>> Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 -
>>> BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS
>>>
>>> scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=<SN...@phx.gbl>,bayes=0.842503,autolearn=disabled
>>>
>>> Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w: milter-reject: END-OF-MESSAGE from
>>> snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin; from=<je...@hotmail.com>   to=<***>
>

Re: SA works great!

Posted by Reindl Harald <h....@thelounge.net>.
Am 31.08.2014 um 02:15 schrieb Ted Mittelstaedt:
> Yes, it does work great when you have the bayes filter turned on and you take the time to feed it.  And that means
> you have to feed the
> learner both ham and spam and setup reliable sources for those.
> 
> Unfortunately if Bayes is not turned on, it does not catch more than
> around 60-70% of spam.  As a Spamassassin user & server admin, I would
> really like to see that improve.

60-70% without training is great

keep in mind that the first 90% of incoming is eaten by RBLs
and the 60% are only from the remaining 10% :-)
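
The arithmetic behind this can be written out. A minimal sketch, treating the percentages mentioned in the thread (90% stopped by RBLs, 60% of the remainder caught by untrained SA) as illustrative inputs rather than measurements, and assuming each stage only sees what the previous stage let through:

```python
def combined_catch_rate(stage_rates):
    """Overall fraction of spam stopped by a chain of filters.

    stage_rates: catch rate of each stage, measured on the mail
    that actually reaches that stage.
    """
    passed = 1.0
    for rate in stage_rates:
        passed *= 1.0 - rate   # fraction still getting through
    return 1.0 - passed


overall = combined_catch_rate([0.90, 0.60])
print(f"overall catch rate: {overall:.0%}")
```

So a 60% content filter sitting behind a 90% RBL stage leaves about 4 in 100 spam messages, which is a very different picture from 40 in 100.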

i think it's impossible to improve that much "out-of-the-box" because
that would make it too sensitive, while the bayes also has the ham side
of your communication for its decisions

i am coming from a commercial device trying to block 100% and there
it ends in zero-hour-blocklists with domains even if they are only
linked on the youtube page of the blocked facebook notification

so i am glad that i have to do some training by myself instead of fearing
false positives which do much more harm

> On 8/30/2014 2:41 PM, Reindl Harald wrote:
>> after two days running SA for the first two test-domains with a
>> well trained bayes for the global milter-user: impressive!
>>
>> the few crap making it through poscreen RBL scroing is detected
>>
>> 0.000          0          3          0  non-token data: bayes db version
>> 0.000          0       1389          0  non-token data: nspam
>> 0.000          0       1350          0  non-token data: nham
>> 0.000          0     257152          0  non-token data: ntokens
>>
>> Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5) for sa-milt:189 in 0.6 seconds, 2454 bytes.
>> Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 -
>> BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS
>>
>> scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=<SN...@phx.gbl>,bayes=0.842503,autolearn=disabled
>>
>> Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w: milter-reject: END-OF-MESSAGE from
>> snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin; from=<je...@hotmail.com>  to=<***>


Re: SA works great!

Posted by Ted Mittelstaedt <te...@ipinc.net>.
Yes, it does work great when you have the bayes filter turned on and you 
take the time to feed it.  And that means you have to feed the
learner both ham and spam and set up reliable sources for those.

Unfortunately if Bayes is not turned on, it does not catch more than
around 60-70% of spam.  As a SpamAssassin user & server admin, I would
really like to see that improve.

Ted

On 8/30/2014 2:41 PM, Reindl Harald wrote:
> after two days running SA for the first two test-domains with a
> well trained bayes for the global milter-user: impressive!
>
> the few crap making it through poscreen RBL scroing is detected
>
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0       1389          0  non-token data: nspam
> 0.000          0       1350          0  non-token data: nham
> 0.000          0     257152          0  non-token data: ntokens
>
> Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5) for sa-milt:189 in 0.6 seconds, 2454 bytes.
> Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 -
> BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS
> scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=<SN...@phx.gbl>,bayes=0.842503,autolearn=disabled
> Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w: milter-reject: END-OF-MESSAGE from
> snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin; from=<je...@hotmail.com>  to=<***>
>
>
>