Posted to users@spamassassin.apache.org by Reindl Harald <h....@thelounge.net> on 2016/05/13 13:42:07 UTC
FSL_HELO_HOME: deep headers again
WTF - Received: from daves-air.home ([1.125.7.92]) is yet another
DEEP HEADER inspection - what about scoring not-well-thought-out rules,
which aren't even worth a description, no higher than 0.5?
3.7 FSL_HELO_HOME No description available
score FSL_HELO_HOME 2.641 3.722 2.641 3.722
AND YES IT WAS A FALSE-POSITIVE
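Until a corrected update arrives, the usual local mitigation is to override the rule's score in local.cf (a sketch, not from the thread; the rule name is the one reported above, and the 0.5 value is purely illustrative, echoing the cap suggested in the complaint):

```
# local.cf - cap the locally applied score for this rule
# (illustrative value; setting it to 0 disables the rule's contribution)
score FSL_HELO_HOME 0.5
```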
Re: FSL_HELO_HOME: deep headers again
Posted by Reindl Harald <h....@thelounge.net>.
On 13.05.2016 at 15:42, Reindl Harald wrote:
> WTF - Received: from daves-air.home ([1.125.7.92]) is yet another
> DEEP HEADER inspection - what about scoring not-well-thought-out rules,
> which aren't even worth a description, no higher than 0.5?
>
> 3.7 FSL_HELO_HOME No description available
> score FSL_HELO_HOME 2.641 3.722 2.641 3.722
>
> AND YES IT WAS A FALSE-POSITIVE
looks like it was introduced with one of the few updates this month, and
from May 12 06:21:46 until now it has hit 6 *100% ham messages* and not a
single spam, while the last one was even rejected because of the 3.7 points
again: Auto-QA does *not* work, and who is this "FSL" who keeps writing
deep-header rules without a sensible max score?
08-Mai-2016 01:38:23: SpamAssassin: No update available
09-Mai-2016 00:02:56: SpamAssassin: No update available
10-Mai-2016 01:10:29: SpamAssassin: No update available
11-Mai-2016 00:55:46: SpamAssassin: No update available
12-Mai-2016 00:21:17: SpamAssassin: Update processed successfully
13-Mai-2016 00:33:31: SpamAssassin: No update available
Re: FSL_HELO_HOME: deep headers again
Posted by John Hardin <jh...@impsec.org>.
On Thu, 26 May 2016, Reindl Harald wrote:
> On 13.05.2016 at 18:18, John Hardin wrote:
>> On Fri, 13 May 2016, RW wrote:
>>
>> > On Fri, 13 May 2016 15:42:07 +0200
>> > Reindl Harald wrote:
>> >
>> > > WTF - Received: from daves-air.home ([1.125.7.92]) is yet another
>> > > DEEP HEADER inspection -
>> >
>> > This looks like a simple mistake rather than a deliberate attempt at a
>> > deep check. You should file a bug report.
>>
>> Please don't. The rule has been disabled
>
> has it?
At the time I wrote that I'd looked at the sandbox file and the rule had
been commented out.
http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/maddoc/99_doc_test.cf?r1=1726846&r2=1743683&sortby=date&diff_format=h
Checking SVN shows it has not been reenabled.
http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/maddoc/99_doc_test.cf?diff_format=h&sortby=date&view=log
There hasn't been a rules update since that change - the last update
covers through revision 1743621, just before that change. The corpora
look large enough, so it seems to be a timing issue now.
A bug report about this rule would result in the rule being modified or
disabled - which has already been done - but wouldn't cause an update to
be delivered any sooner than the masscheck corpora allow.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Public Education: the bureaucratic process of replacing
an empty mind with a closed one. -- Thorax
-----------------------------------------------------------------------
4 days until Memorial Day - honor those who sacrificed for our liberty
Re: FSL_HELO_HOME: deep headers again
Posted by Reindl Harald <h....@thelounge.net>.
On 13.05.2016 at 18:18, John Hardin wrote:
> On Fri, 13 May 2016, RW wrote:
>
>> On Fri, 13 May 2016 15:42:07 +0200
>> Reindl Harald wrote:
>>
>>> WTF - Received: from daves-air.home ([1.125.7.92]) is yet another
>>> DEEP HEADER inspection -
>>
>> This looks like a simple mistake rather than a deliberate attempt at a
>> deep check. You should file a bug report.
>
> Please don't. The rule has been disabled
has it?
May 24 14:57:15 mail-gw spamd[17055]: spamd: result: . -3 -
BAYES_00,CUST_DNSWL_7_ORG_LOW,CUST_DNSWL_8_TL_NT,FSL_HELO_HOME,HTML_MESSAGE,SPF_NONE
scantime=2.3,size=39898,user=sa-milt,uid=189,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=/run/spamassassin/spamassassin.sock,mid=<A3...@warga-hack.at>,bayes=0.000000,autolearn=disabled,shortcircuit=no
May 24 16:26:51 mail-gw spamd[21920]: spamd: result: . -1 -
BAYES_20,CUST_DNSWL_2_SENDERSC_LOW,CUST_DNSWL_7_ORG_LOW,CUST_DNSWL_8_TL_NT,DKIM_SIGNED,DKIM_VALID,FSL_HELO_HOME,HTML_MESSAGE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_NONE
scantime=1.9,size=7282,user=sa-milt,uid=189,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=/run/spamassassin/spamassassin.sock,mid=<87...@franzwine.com>,bayes=0.176735,autolearn=disabled,shortcircuit=no
May 24 18:55:13 mail-gw spamd[2470]: spamd: result: . -3 -
BAYES_00,CUST_BODY_BEGINS_VL,CUST_DNSWL_8_TL_NT,FSL_HELO_HOME,HTML_MESSAGE,SPF_NONE
scantime=3.8,size=168647,user=sa-milt,uid=189,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=/run/spamassassin/spamassassin.sock,mid=<00...@saeco-professional.pl>,bayes=0.000000,autolearn=disabled,shortcircuit=no
May 24 23:37:48 mail-gw spamd[11907]: spamd: result: . 2 -
BAYES_00,CUST_DNSBL_20_SORBS_SPAM,CUST_DNSBL_30_SENDERSC_MED,CUST_DNSBL_34_BACKSCATTER,CUST_DNSWL_7_ORG_LOW,CUST_DNSWL_8_TL_NT,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,FSL_HELO_HOME,HTML_IMAGE_RATIO_02,HTML_MESSAGE,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_PASS
scantime=2.5,size=955522,user=sa-milt,uid=189,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=/run/spamassassin/spamassassin.sock,mid=<5A...@gmail.com>,bayes=0.000000,autolearn=disabled,shortcircuit=no
May 25 07:03:18 mail-gw spamd[833]: spamd: result: . -5 -
BAYES_00,CUST_DNSWL_12_TL_MED,CUST_DNSWL_2_SENDERSC_LOW,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FSL_HELO_HOME,RP_MATCHES_RCVD,SPF_PASS
scantime=2.6,size=2243,user=sa-milt,uid=189,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=/run/spamassassin/spamassassin.sock,mid=<2A...@me.com>,bayes=0.000000,autolearn=disabled,shortcircuit=no
May 25 11:46:26 mail-gw spamd[7557]: spamd: result: . -1 -
BAYES_00,CUST_DNSBL_17_SPAMCANNIBAL,CUST_DNSBL_26_NSZONES,CUST_DNSBL_34_BACKSCATTER,CUST_DNSWL_7_ORG_LOW,CUST_DNSWL_8_TL_NT,FSL_HELO_HOME,HTML_MESSAGE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_NONE
scantime=3.7,size=20552,user=sa-milt,uid=189,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=/run/spamassassin/spamassassin.sock,mid=<4D...@komma.cc>,bayes=0.000000,autolearn=disabled,shortcircuit=no
May 25 11:54:57 mail-gw spamd[25762]: spamd: result: . -3 -
BAYES_00,CUST_DNSWL_8_TL_NT,FSL_HELO_HOME,HTML_MESSAGE,SPF_NONE,T_KAM_HTML_FONT_INVALID
scantime=2.6,size=923615,user=sa-milt,uid=189,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=/run/spamassassin/spamassassin.sock,mid=<7D...@intel>,bayes=0.000000,autolearn=disabled,shortcircuit=no
May 25 16:33:20 mail-gw spamd[26346]: spamd: result: . -3 -
BAYES_00,CUST_DNSWL_7_ORG_LOW,CUST_DNSWL_8_TL_NT,FREEMAIL_FROM,FSL_HELO_HOME,HTML_MESSAGE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,RP_MATCHES_RCVD,SPF_NONE
scantime=5.8,size=74496,user=sa-milt,uid=189,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=/run/spamassassin/spamassassin.sock,mid=<6A...@aon.at>,bayes=0.000000,autolearn=disabled,shortcircuit=no
Re: FSL_HELO_HOME: deep headers again
Posted by John Hardin <jh...@impsec.org>.
On Fri, 13 May 2016, RW wrote:
> On Fri, 13 May 2016 15:42:07 +0200
> Reindl Harald wrote:
>
>> WTF - Received: from daves-air.home ([1.125.7.92]) is yet another
>> DEEP HEADER inspection -
>
> This looks like a simple mistake rather than a deliberate attempt at a
> deep check. You should file a bug report.
Please don't. The rule has been disabled.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
...much of our country's counterterrorism security spending is not
designed to protect us from the terrorists, but instead to protect
our public officials from criticism when another attack occurs.
-- Bruce Schneier
-----------------------------------------------------------------------
143 days since the first successful real return to launch site (SpaceX)
Re: FSL_HELO_HOME: deep headers again
Posted by Reindl Harald <h....@thelounge.net>.
On 13.05.2016 at 23:08, Tom Hendrikx wrote:
> On 13-05-16 18:29, Reindl Harald wrote:
>> especially you would not get much from the bayes samples because they
>> would trigger all sorts of wrong rules after stripping most headers and
>> adding a generic Received header (which seems to be needed by the bayes
>> engine for whatever reason, since it otherwise scores samples completely
>> differently)
>
> This is an assumption: you can't know what your data would contribute to
> the masscheck process
this is *not* an assumption - the setup is maintained in a way that i
don't have to make many assumptions at all
i run tools on corpus files and downloads to pass them through SA and
regularly see all sorts of rules hit on stripped samples which would not
hit on the untouched email
guess what remains with a 2292-line "bayes_ignore_header" list, which is
also used to strip messages with formail, compared to the original ones
the reason is that we maintain a really huge bayes which is intended to
contain only the body and a few headers; otherwise 90000 samples would
not take only 800 MB of storage and result in "only" 2818486 tokens
why?
because we keep samples and bayes forever while training every spam
message below BAYES_99 and every ham message >= BAYES_50, to keep the
option of rebuilding from scratch at any point in time (tokenizer changes
in future versions, maybe multi-word tokens in future versions, or if
needed switching to a different solution without starting to collect from
scratch)
Re: FSL_HELO_HOME: deep headers again
Posted by Tom Hendrikx <to...@whyscream.net>.
On 13-05-16 18:29, Reindl Harald wrote:
>
> On 13.05.2016 at 18:11, John Hardin wrote:
>> On Fri, 13 May 2016, Reindl Harald wrote:
>>
>>> the problem is pushing out such rules with such scores at all with a
>>> non-working auto-QA (non-working as in: no correction for days, as
>>> well as dangerous scoring of new rules from the start)
>>>
>>> 02-Mai-2016 00:12:34: SpamAssassin: No update available
>>> 03-Mai-2016 01:55:05: SpamAssassin: No update available
>>> 04-Mai-2016 00:43:33: SpamAssassin: No update available
>>> 05-Mai-2016 01:48:15: SpamAssassin: Update processed successfully
>>> 06-Mai-2016 00:53:17: SpamAssassin: No update available
>>> 07-Mai-2016 01:21:23: SpamAssassin: No update available
>>> 08-Mai-2016 01:38:23: SpamAssassin: No update available
>>> 09-Mai-2016 00:02:56: SpamAssassin: No update available
>>> 10-Mai-2016 01:10:29: SpamAssassin: No update available
>>> 11-Mai-2016 00:55:46: SpamAssassin: No update available
>>> 12-Mai-2016 00:21:17: SpamAssassin: Update processed successfully
>>> 13-Mai-2016 00:33:31: SpamAssassin: No update available
>>
>> Perhaps you could help with that by participating in masscheck. You seem
>> to get a lot of FPs on base rules; contributing masscheck results on
>> your ham would reduce those
>
> i can't rsync customer mails to a 3rd party
That is not necessary for masscheck.
>
> if that were based on some webservice where you just feed in local
> samples and only give out the rules which hit and the spam/ham flag, it
> would be somehow possible
The process is clearly documented on the wiki:
https://wiki.apache.org/spamassassin/MassCheck
>
> especially you would not get much from the bayes samples because they
> would trigger all sorts of wrong rules after stripping most headers and
> adding a generic Received header (which seems to be needed by the bayes
> engine for whatever reason, since it otherwise scores samples completely
> differently)
This is an assumption: you can't know what your data would contribute to
the masscheck process.
>
> in any case: such a rule with 3.7 must not happen at all, even if it had
> no such bad impact - 3.7 is very high and only deserved when you are
> certain that a mail is spam, and a single header can *not* back up that
> certainty, deep inspection or not
>
That is true, but I think you should put your money where your mouth is:
just run the masscheck on your corpus and send the results to the devs
for inspection. If it's not working, you lost nothing. If the data *is*
useful, we all win from your work by getting better scores.
Just my 2 cents.
Regards,
Tom
Re: FSL_HELO_HOME: deep headers again
Posted by John Hardin <jh...@impsec.org>.
On Sat, 14 May 2016, Reindl Harald wrote:
> On 14.05.2016 at 19:10, John Hardin wrote:
>> On Sat, 14 May 2016, Reindl Harald wrote:
>>
>> > On 14.05.2016 at 04:50, John Hardin wrote:
>> > > On Sat, 14 May 2016, Reindl Harald wrote:
>> > > > On 14.05.2016 at 04:04, John Hardin wrote:
>> > > > > How would a webservice be better? That would still be
>> > > > > sending customer emails to a third party for processing.
>> > > > uhm, you missed "and only give out the rules which hit and the
>> > > > spam/ham flag"
>> > >
>> > > Ah, OK, I misunderstood what you were suggesting.
>> > >
>> > > That wouldn't work. That tells you the rules they hit at the time
>> > > they were scanned, not which rules they would hit from the
>> > > current testing rules.
>> >
>> > on the other hand it would reflect the complete mail-flow and not just
>> > hand-crafted samples
>>
>> It's not hand *crafted* samples, it's hand *classified* samples. The
>> message needs to be classified by a reliable human as ham or spam for
>> the analysis of the rules that it hits to have any use, or even be
>> possible.
>
> that's just nitpicking - i could correct you the same way in German for
> most of what you would try to express :-)
Yes, probably.
>> That's why doing something like having an SA install that's based on the
>> current SVN sandbox rules, and that gets a forked copy of your mail
>> stream, and that captures the hits, is still not useful for anything
>> other than gross "this rule didn't hit anything" analysis - you don't
>> know what a given message *should* have been, so you can't say anything
>> about the rules that hit it - whether they aid that result, or hinder it.
>
how do you imagine such a setup *in practice*?
Somewhat stream-of-consciousness:
In addition to your normal deliver-to-the-user MTA, have another MTA that
is running against an SA that is configured from SVN. Note that this
wouldn't be a backup MTA, it would have to get a copy of your inbound mail
stream. Not sure how you'd fork the mail delivery process, that's probably
MTA-dependent.
The masscheck MTA would deliver to SA, record the rule hits and
classification in the masscheck upload format, and discard the message.
Normal delivery would usually be suspended so that messages queue.
When the masscheck start time is reached, update from SVN, recompile the
rules, clear the log and enable MTA delivery. The queued messages would be
scanned and recorded until the upload time is reached, at which time
delivery is suspended again. This may or may not be long enough to clear
the queue.
The results would then be uploaded.
As you noted, there would have to be some minimum score for recording the
message as spam, and some maximum score for recording it as ham. Anything
in between would have to be discarded as ambiguous. There might also need
to be some kind of weighting on the results when they are incorporated
into masscheck to reflect that they are not hand-classified and thus their
reliability isn't as good as we'd like, however there have been
misclassifications in hand-classified corpora before so if the thresholds
are well-chosen that may not be an issue.
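The threshold scheme described above could look roughly like this (an illustrative sketch only; the -1.0 and 12.0 cut-offs are assumptions, and a real deployment would tune them per site):

```python
def auto_classify(score, ham_max=-1.0, spam_min=12.0):
    """Map a SpamAssassin score to 'ham', 'spam', or None (ambiguous).

    Messages between the two thresholds are discarded rather than
    contributed to masscheck, since their classification is unreliable.
    """
    if score <= ham_max:
        return "ham"
    if score >= spam_min:
        return "spam"
    return None  # ambiguous - do not record
```

Anything returning None would simply never appear in the uploaded results.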
But note, this would probably not help offset a high-scoring FP rule, as
the message would be auto-classified as spam or, at best, ambiguous - it
might actually be self-reinforcing and make the situation worse, rather
than help it be self-correcting as hand-classified corpora would. It also
probably won't help much with new rules.
I don't really think there's any way around having hand-classified clean
and complete corpora for running masschecks.
>> Unless your mail stream prior to SA is *guaranteed* 100% ham (which is
>> hugely unlikely or why would you be running SA at all?) or 100% spam
>> (which might be the case for a clean honeypot), you need to review and
>> classify the messages manually before performing the scan and reporting
>> the rule hits, and that means keeping copies of the pristine messages,
>> at least for a while.
>>
>> I don't know whether statutory requirements make this impossible for you
>> even if you did obtain consent from some of your clients to use their
>> mail stream in that manner.
>
> i don't have access to the whole mailflow to classify it nor is there a
> technical way to mirror it on a different setup
OK
> nor would SA or even smtpd ever see 95% of junk because content filters
> are the last resort by definition
It's not too difficult for masscheck to get spam, as there are honeypots
feeding masscheck. It's harder to get ham, especially non-English ham, so
contributing to masscheck from a 99% clean feed is still helpful.
>> > should be chained to a minimum negative score to count as ham and a
>> > minimum positive score to count as spam - configurable, because it
>> > depends on the local environment and adjustments which scores are
>> > clear classifications; 7.0 here would not be 100% spam, 12.0 would
>> > be, as an example
>>
>> That's probably still not reliable enough for use in masscheck. Ham is a
>> bit more important; what would you recommend as a lower limit for
>> considering a message as ham? How many actual hams would meet that
>> requirement? It might be a lot of work for little final benefit. What
>> percentage of actual FNs would you see with that setting? Those would
>> damage the masscheck analysis.
>
> i would agree if we could call the current masscheck results reliable
>
>> > it would at least help in the current situation and with a rule like
>> > FSL_HELO_HOME when it hits only clear ham and has a high spam score -
>> > and when it only needs to be enabled, collects the information through
>> > scanning and submits the results once per day, a lot of people running
>> > milter-like setups with reject and no access to rejected mails could
>> > help to improve the auto-QA without collecting whole mails
>>
>> Potentially. You'd have to be willing to set up a parallel mail
>> processing stream using the current SVN sandbox rules as I described
>> above. Performing analysis on the released rules provides no benefit to
>> masscheck
>
> why would it provide no benefit when one part of the "sa-update" - which
> currently doesn't get any updates most of the time - is to re-score badly
> scored rules - that's really not only about sandbox rules
Because the rules in question may have changed since the last update was
released. The analysis needs to be of the current state of the rules in
SVN - take a snapshot, masscheck it and generate scores, and those rules
and their scores are released as an update if the corpora are large enough
for the results to be considered reliable. (Note that "reliability" is
based on the *size* of the corpora. We sadly don't have any way to judge
it based on broadness of content.)
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
...much of our country's counterterrorism security spending is not
designed to protect us from the terrorists, but instead to protect
our public officials from criticism when another attack occurs.
-- Bruce Schneier
-----------------------------------------------------------------------
144 days since the first successful real return to launch site (SpaceX)
Re: FSL_HELO_HOME: deep headers again
Posted by Reindl Harald <h....@thelounge.net>.
On 14.05.2016 at 19:10, John Hardin wrote:
> On Sat, 14 May 2016, Reindl Harald wrote:
>
>> On 14.05.2016 at 04:50, John Hardin wrote:
>>> On Sat, 14 May 2016, Reindl Harald wrote:
>>> > On 14.05.2016 at 04:04, John Hardin wrote:
>>> > > How would a webservice be better? That would still be sending
>>> > > customer emails to a third party for processing.
>>> > uhm, you missed "and only give out the rules which hit and the
>>> > spam/ham flag"
>>>
>>> Ah, OK, I misunderstood what you were suggesting.
>>>
>>> That wouldn't work. That tells you the rules they hit at the time they
>>> were scanned, not which rules they would hit from the current testing
>>> rules.
>>
>> on the other hand it would reflect the complete mail-flow and not just
>> hand-crafted samples
>
> It's not hand *crafted* samples, it's hand *classified* samples. The
> message needs to be classified by a reliable human as ham or spam for
> the analysis of the rules that it hits to have any use, or even be
> possible.
that's just nitpicking - i could correct you the same way in German for
most of what you would try to express :-)
> That's why doing something like having an SA install that's based on the
> current SVN sandbox rules, and that gets a forked copy of your mail
> stream, and that captures the hits, is still not useful for anything
> other than gross "this rule didn't hit anything" analysis - you don't
> know what a given message *should* have been, so you can't say anything
> about the rules that hit it - whether they aid that result, or hinder it.
how do you imagine such a setup *in practice*?
> Unless your mail stream prior to SA is *guaranteed* 100% ham (which is
> hugely unlikely or why would you be running SA at all?) or 100% spam
> (which might be the case for a clean honeypot), you need to review and
> classify the messages manually before performing the scan and reporting
> the rule hits, and that means keeping copies of the pristine messages,
> at least for a while.
>
> I don't know whether statutory requirements make this impossible for you
> even if you did obtain consent from some of your clients to use their
> mail stream in that manner.
i don't have access to the whole mailflow to classify it, nor is there a
technical way to mirror it on a different setup, nor would SA or even
smtpd ever see 95% of junk because content filters are the last resort
by definition
>> should be chained to a minimum negative score to count as ham and a
>> minimum positive score to count as spam - configurable, because it
>> depends on the local environment and adjustments which scores are
>> clear classifications; 7.0 here would not be 100% spam, 12.0 would
>> be, as an example
>
> That's probably still not reliable enough for use in masscheck. Ham is a
> bit more important; what would you recommend as a lower limit for
> considering a message as ham? How many actual hams would meet that
> requirement? It might be a lot of work for little final benefit. What
> percentage of actual FNs would you see with that setting? Those would
> damage the masscheck analysis.
i would agree if we could call the current masscheck results reliable
>> it would at least help in the current situation and with a rule like
>> FSL_HELO_HOME when it hits only clear ham and has a high spam score -
>> and when it only needs to be enabled, collects the information through
>> scanning and submits the results once per day, a lot of people running
>> milter-like setups with reject and no access to rejected mails could
>> help to improve the auto-QA without collecting whole mails
>
> Potentially. You'd have to be willing to set up a parallel mail
> processing stream using the current SVN sandbox rules as I described
> above. Performing analysis on the released rules provides no benefit to
> masscheck
why would it provide no benefit when one part of the "sa-update" - which
currently doesn't get any updates most of the time - is to re-score badly
scored rules - that's really not only about sandbox rules
Re: FSL_HELO_HOME: deep headers again
Posted by John Hardin <jh...@impsec.org>.
On Sat, 14 May 2016, Reindl Harald wrote:
> On 14.05.2016 at 04:50, John Hardin wrote:
>> On Sat, 14 May 2016, Reindl Harald wrote:
>> > On 14.05.2016 at 04:04, John Hardin wrote:
>> > > How would a webservice be better? That would still be sending
>> > > customer
>> > > emails to a third party for processing.
>> >
>> > uhm, you missed "and only give out the rules which hit and the
>> > spam/ham flag"
>>
>> Ah, OK, I misunderstood what you were suggesting.
>>
>> That wouldn't work. That tells you the rules they hit at the time they
>> were scanned, not which rules they would hit from the current testing
>> rules.
>
> on the other hand it would reflect the complete mail-flow and not just
> hand-crafted samples
It's not hand *crafted* samples, it's hand *classified* samples. The
message needs to be classified by a reliable human as ham or spam for the
analysis of the rules that it hits to have any use, or even be possible.
That's why doing something like having an SA install that's based on the
current SVN sandbox rules, and that gets a forked copy of your mail
stream, and that captures the hits, is still not useful for anything other
than gross "this rule didn't hit anything" analysis - you don't know what
a given message *should* have been, so you can't say anything about the
rules that hit it - whether they aid that result, or hinder it.
Unless your mail stream prior to SA is *guaranteed* 100% ham (which is
hugely unlikely or why would you be running SA at all?) or 100% spam
(which might be the case for a clean honeypot), you need to review and
classify the messages manually before performing the scan and reporting
the rule hits, and that means keeping copies of the pristine messages, at
least for a while.
I don't know whether statutory requirements make this impossible for you
even if you did obtain consent from some of your clients to use their mail
stream in that manner.
> should be chained to a minimum negative score to count as ham and a
> minimum positive score to count as spam - configurable, because it
> depends on the local environment and adjustments which scores are clear
> classifications; 7.0 here would not be 100% spam, 12.0 would be, as an
> example
That's probably still not reliable enough for use in masscheck. Ham is a
bit more important; what would you recommend as a lower limit for
considering a message as ham? How many actual hams would meet that
requirement? It might be a lot of work for little final benefit. What
percentage of actual FNs would you see with that setting? Those would
damage the masscheck analysis.
> it would at least help in the current situation and with a rule like
> FSL_HELO_HOME when it hits only clear ham and has a high spam score - and
> when it only needs to be enabled, collects the information through
> scanning and submits the results once per day, a lot of people running
> milter-like setups with reject and no access to rejected mails could help
> to improve the auto-QA without collecting whole mails
Potentially. You'd have to be willing to set up a parallel mail processing
stream using the current SVN sandbox rules as I described above.
Performing analysis on the released rules provides no benefit to
masscheck.
>> > > Corpora with headers stripped do present a problem. The masscheck
>> > > corpora should be complete as received
>> >
>> > and that is not possible - samples are stripped and anonymized
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Maxim IX: Never turn your back on an enemy.
-----------------------------------------------------------------------
144 days since the first successful real return to launch site (SpaceX)
Re: FSL_HELO_HOME: deep headers again
Posted by Reindl Harald <h....@thelounge.net>.
On 14.05.2016 at 04:50, John Hardin wrote:
> On Sat, 14 May 2016, Reindl Harald wrote:
>> On 14.05.2016 at 04:04, John Hardin wrote:
>>> How would a webservice be better? That would still be sending customer
>>> emails to a third party for processing.
>>
>> uhm, you missed "and only give out the rules which hit and the spam/ham
>> flag"
>
> Ah, OK, I misunderstood what you were suggesting.
>
> That wouldn't work. That tells you the rules they hit at the time they
> were scanned, not which rules they would hit from the current testing
> rules.
on the other hand it would reflect the complete mail-flow and not just
hand-crafted samples
should be chained to a minimum negative score to count as ham and a
minimum positive score to count as spam - configurable, because it depends
on the local environment and adjustments which scores are clear
classifications; 7.0 here would not be 100% spam, 12.0 would be, as an
example
it would at least help in the current situation and with a rule like
FSL_HELO_HOME when it hits only clear ham and has a high spam score - and
when it only needs to be enabled, collects the information through
scanning and submits the results once per day, a lot of people running
milter-like setups with reject and no access to rejected mails could
help to improve the auto-QA without collecting whole mails
>>> Corpora with headers stripped do present a problem. The masscheck
>>> corpora should be complete as received
>>
>> and that is not possible - samples are stripped and anonymized
Re: FSL_HELO_HOME: deep headers again
Posted by John Hardin <jh...@impsec.org>.
On Sat, 14 May 2016, Reindl Harald wrote:
>
>
> On 14.05.2016 at 04:04, John Hardin wrote:
>> On Fri, 13 May 2016, Reindl Harald wrote:
>> > i can't rsync customer mails to a 3rd party
>>
>> You don't have to. You run the masscheck locally and only upload the
>> rule hit results. I upload my corpora because they are just my email and
>> are thus tiny.
>>
>> If you select your corpora filenames properly, no information should leak.
>
> OK
>
>> > if that were based on some webservice where you just feed in local
>> > samples and only give out the rules which hit and the spam/ham flag,
>> > it would be somehow possible
>>
>> How would a webservice be better? That would still be sending customer
>> emails to a third party for processing.
>
> uhm, you missed "and only give out the rules which hit and the spam/ham flag"
Ah, OK, I misunderstood what you were suggesting.
That wouldn't work. That tells you the rules they hit at the time they
were scanned, not which rules they would hit from the current testing
rules.
>> > especially you would not get much from the bayes samples because they
>> > would trigger all sorts of wrong rules after stripping most headers
>> > and adding a generic Received header (which seems to be needed by the
>> > bayes engine for whatever reason, since it otherwise scores samples
>> > completely differently)
>>
>> Corpora with headers stripped do present a problem. The masscheck
>> corpora should be complete as received
>
> and that is not possible - samples are stripped and anonymized
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Justice is justice, whereas "social justice" is code for one set
of rules for the rich, another for the poor; one set for whites,
another set for minorities; one set for straight men, another for
women and gays. In short, it's the opposite of actual justice.
-- Burt Prelutsky
-----------------------------------------------------------------------
143 days since the first successful real return to launch site (SpaceX)
Re: FSL_HELO_HOME: deep headers again
Posted by Reindl Harald <h....@thelounge.net>.
On 14.05.2016 at 04:04, John Hardin wrote:
> On Fri, 13 May 2016, Reindl Harald wrote:
>> I can't rsync customer mails to a third party
>
> You don't have to. You run the masscheck locally and only upload the
> rule hit results. I upload my corpora because they are just my email and
> are thus tiny.
>
> If you select your corpora filenames properly, no information should leak.
OK
>> if that were based on some webservice where you just feed it local
>> samples and only give out the rules that hit and the spam/ham flag, it
>> would be somehow possible
>
> How would a webservice be better? That would still be sending customer
> emails to a third party for processing.
uhm, you missed "and only give out the rules that hit and the spam/ham flag"
>> especially, you would not get much from the Bayes samples because they
>> would trigger all sorts of wrong rules after stripping most headers and
>> adding a generic Received header (which seems to be needed by the Bayes
>> engine for whatever reason, since it otherwise scores samples
>> completely differently)
>
> Corpora with stripped headers do present a problem. The masscheck
> corpora should be complete as received.
and that is not possible: samples are stripped and anonymized
Re: FSL_HELO_HOME: deep headers again
Posted by John Hardin <jh...@impsec.org>.
On Fri, 13 May 2016, Reindl Harald wrote:
>
> On 13.05.2016 at 18:11, John Hardin wrote:
>> On Fri, 13 May 2016, Reindl Harald wrote:
>>
>> > the problem is pushing out such rules with such scores at all with a
>> > non-working auto-QA (non-working as in: no correction for days, as
>> > well as dangerous scoring of new rules from the start)
>> >
>> > 02-Mai-2016 00:12:34: SpamAssassin: No update available
>> > 03-Mai-2016 01:55:05: SpamAssassin: No update available
>> > 04-Mai-2016 00:43:33: SpamAssassin: No update available
>> > 05-Mai-2016 01:48:15: SpamAssassin: Update processed successfully
>> > 06-Mai-2016 00:53:17: SpamAssassin: No update available
>> > 07-Mai-2016 01:21:23: SpamAssassin: No update available
>> > 08-Mai-2016 01:38:23: SpamAssassin: No update available
>> > 09-Mai-2016 00:02:56: SpamAssassin: No update available
>> > 10-Mai-2016 01:10:29: SpamAssassin: No update available
>> > 11-Mai-2016 00:55:46: SpamAssassin: No update available
>> > 12-Mai-2016 00:21:17: SpamAssassin: Update processed successfully
>> > 13-Mai-2016 00:33:31: SpamAssassin: No update available
>>
>> Perhaps you could help with that by participating in masscheck. You
>> seem to get a lot of FPs on base rules; contributing masscheck results
>> on your ham would reduce those.
>
> I can't rsync customer mails to a third party
You don't have to. You run the masscheck locally and only upload the rule
hit results. I upload my corpora because they are just my email and are
thus tiny.
If you select your corpora filenames properly, no information should leak.
> if that were based on some webservice where you just feed it local
> samples and only give out the rules that hit and the spam/ham flag, it
> would be somehow possible
How would a webservice be better? That would still be sending customer
emails to a third party for processing.
> especially, you would not get much from the Bayes samples because they
> would trigger all sorts of wrong rules after stripping most headers and
> adding a generic Received header (which seems to be needed by the Bayes
> engine for whatever reason, since it otherwise scores samples completely
> differently)
Corpora with stripped headers do present a problem. The masscheck
corpora should be complete as received.
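The run-locally-then-upload-only-hits workflow can be sketched roughly
like this (the mass-check invocation and log format are illustrative,
from memory of the rules checkout's masses/ directory; all paths and the
rule tally are assumptions, not real results):

```shell
# Hedged sketch of the local masscheck workflow.
# The real invocation lives in a SpamAssassin rules checkout, e.g.:
#
#   ./mass-check --progress ham:dir:/var/corpora/ham spam:dir:/var/corpora/spam
#
# What gets uploaded is only the per-message rule-hit log, along the
# lines of (illustrative data):
cat > ham.log <<'EOF'
. 0 /var/corpora/ham/msg1 FSL_HELO_HOME,HTML_MESSAGE time=1463145600
. 0 /var/corpora/ham/msg2 HTML_MESSAGE time=1463145601
. 0 /var/corpora/ham/msg3 FSL_HELO_HOME,HTML_MESSAGE time=1463145602
EOF

# Tally how often each rule hits in ham -- the signal that would push an
# FP-prone rule's score back down during rescoring:
awk '{ n = split($4, r, ","); for (i = 1; i <= n; i++) hits[r[i]]++ }
     END { for (rule in hits) print hits[rule], rule }' ham.log | sort -rn
# prints:
#   3 HTML_MESSAGE
#   2 FSL_HELO_HOME
```

Note that no message content leaves the site, only rule names and counts.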
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
People think they're trading chaos for order [by ceding more and
more power to the Government], but they're just trading normal
human evil for the really dangerous organized kind of evil, the
kind that simply does not give a shit. Only bureaucrats can give
you true evil. -- Larry Correia
-----------------------------------------------------------------------
143 days since the first successful real return to launch site (SpaceX)
Re: FSL_HELO_HOME: deep headers again
Posted by Reindl Harald <h....@thelounge.net>.
On 13.05.2016 at 18:11, John Hardin wrote:
> On Fri, 13 May 2016, Reindl Harald wrote:
>
>> the problem is pushing out such rules with such scores at all with a
>> non-working auto-QA (non-working as in: no correction for days, as well
>> as dangerous scoring of new rules from the start)
>>
>> 02-Mai-2016 00:12:34: SpamAssassin: No update available
>> 03-Mai-2016 01:55:05: SpamAssassin: No update available
>> 04-Mai-2016 00:43:33: SpamAssassin: No update available
>> 05-Mai-2016 01:48:15: SpamAssassin: Update processed successfully
>> 06-Mai-2016 00:53:17: SpamAssassin: No update available
>> 07-Mai-2016 01:21:23: SpamAssassin: No update available
>> 08-Mai-2016 01:38:23: SpamAssassin: No update available
>> 09-Mai-2016 00:02:56: SpamAssassin: No update available
>> 10-Mai-2016 01:10:29: SpamAssassin: No update available
>> 11-Mai-2016 00:55:46: SpamAssassin: No update available
>> 12-Mai-2016 00:21:17: SpamAssassin: Update processed successfully
>> 13-Mai-2016 00:33:31: SpamAssassin: No update available
>
> Perhaps you could help with that by participating in masscheck. You seem
> to get a lot of FPs on base rules; contributing masscheck results on
> your ham would reduce those.
I can't rsync customer mails to a third party
if that were based on some webservice where you just feed it local
samples and only give out the rules that hit and the spam/ham flag, it
would be somehow possible
especially, you would not get much from the Bayes samples because they
would trigger all sorts of wrong rules after stripping most headers and
adding a generic Received header (which seems to be needed by the Bayes
engine for whatever reason, since it otherwise scores samples completely
differently)
in any case: such a rule with 3.7 points must not happen at all, even if
it had no such bad impact. 3.7 is very high and only deserved when you
are certain that a mail is spam, and no single header can back that
certainty, deep inspection or not.
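Until a rescoring update lands, the usual local stopgap is a score
override in local.cf (a stock SpamAssassin mechanism; the value below is
only an illustration, not a recommendation):

```
# local.cf: cap the contested rule locally until auto-QA rescores it
score FSL_HELO_HOME 0.5
```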
Re: FSL_HELO_HOME: deep headers again
Posted by John Hardin <jh...@impsec.org>.
On Fri, 13 May 2016, Reindl Harald wrote:
> the problem is pushing out such rules with such scores at all with a
> non-working auto-QA (non-working as in: no correction for days, as well
> as dangerous scoring of new rules from the start)
>
> 02-Mai-2016 00:12:34: SpamAssassin: No update available
> 03-Mai-2016 01:55:05: SpamAssassin: No update available
> 04-Mai-2016 00:43:33: SpamAssassin: No update available
> 05-Mai-2016 01:48:15: SpamAssassin: Update processed successfully
> 06-Mai-2016 00:53:17: SpamAssassin: No update available
> 07-Mai-2016 01:21:23: SpamAssassin: No update available
> 08-Mai-2016 01:38:23: SpamAssassin: No update available
> 09-Mai-2016 00:02:56: SpamAssassin: No update available
> 10-Mai-2016 01:10:29: SpamAssassin: No update available
> 11-Mai-2016 00:55:46: SpamAssassin: No update available
> 12-Mai-2016 00:21:17: SpamAssassin: Update processed successfully
> 13-Mai-2016 00:33:31: SpamAssassin: No update available
Perhaps you could help with that by participating in masscheck. You seem
to get a lot of FPs on base rules; contributing masscheck results on your
ham would reduce those.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
...much of our country's counterterrorism security spending is not
designed to protect us from the terrorists, but instead to protect
our public officials from criticism when another attack occurs.
-- Bruce Schneier
-----------------------------------------------------------------------
143 days since the first successful real return to launch site (SpaceX)
Re: FSL_HELO_HOME: deep headers again
Posted by Reindl Harald <h....@thelounge.net>.
On 13.05.2016 at 16:25, RW wrote:
> On Fri, 13 May 2016 15:42:07 +0200
> Reindl Harald wrote:
>
>> WTF - Received: from daves-air.home ([1.125.7.92]) is once again a
>> DEEP HEADER inspection -
>
> This looks like a simple mistake rather than a deliberate attempt at a
> deep check. You should file a bug report.
the problem is pushing out such rules with such scores at all with a
non-working auto-QA (non-working as in: no correction for days, as well
as dangerous scoring of new rules from the start)
02-Mai-2016 00:12:34: SpamAssassin: No update available
03-Mai-2016 01:55:05: SpamAssassin: No update available
04-Mai-2016 00:43:33: SpamAssassin: No update available
05-Mai-2016 01:48:15: SpamAssassin: Update processed successfully
06-Mai-2016 00:53:17: SpamAssassin: No update available
07-Mai-2016 01:21:23: SpamAssassin: No update available
08-Mai-2016 01:38:23: SpamAssassin: No update available
09-Mai-2016 00:02:56: SpamAssassin: No update available
10-Mai-2016 01:10:29: SpamAssassin: No update available
11-Mai-2016 00:55:46: SpamAssassin: No update available
12-Mai-2016 00:21:17: SpamAssassin: Update processed successfully
13-Mai-2016 00:33:31: SpamAssassin: No update available
Re: FSL_HELO_HOME: deep headers again
Posted by RW <rw...@googlemail.com>.
On Fri, 13 May 2016 15:42:07 +0200
Reindl Harald wrote:
> WTF - Received: from daves-air.home ([1.125.7.92]) is once again a
> DEEP HEADER inspection -
This looks like a simple mistake rather than a deliberate attempt at a
deep check. You should file a bug report.
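Whether a check is "deep" comes down to what the rule matches: a rule on
the raw Received header scans every hop, while SpamAssassin's
X-Spam-Relays-External pseudo-header can be anchored so that only the
last external relay is tested. A hedged sketch of such a shallow variant
(rule name, pattern, and score are illustrative, not the actual FSL
rule):

```
# Sketch: match a ".home" HELO only on the last external relay.
# X-Spam-Relays-External is SpamAssassin's parsed-relay pseudo-header;
# anchoring at ^\[ restricts the match to the most recent external hop.
header   LOCAL_HELO_HOME  X-Spam-Relays-External =~ /^\[ [^\]]*\bhelo=\S+\.home[ \]]/i
describe LOCAL_HELO_HOME  HELO hostname ends in .home (last external relay only)
score    LOCAL_HELO_HOME  0.5
```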