Posted to users@spamassassin.apache.org by Sebastian Arcus <s....@open-t.co.uk> on 2017/12/01 10:17:03 UTC

Re: HTML_IMAGE_ONLY_* generating too many FP's

On 30/11/17 12:45, Matus UHLAR - fantomas wrote:
> On 28.11.17 19:39, Sebastian Arcus wrote:
>> I'm having more and more problems with the HTML_IMAGE_ONLY_* set of 
>> rules recently generating false positives.
>>
>> Plenty of business emails will include a logo at the bottom - and not 
>> everybody is a graphics expert to make their logo a tiny optimised gif 
>> or png - so some of these are slightly bigger than they should be.
>>
>> However, this seems to be sufficiently widespread. Also, many 
>> business emails can be just a few words reply - so the ratio of words 
>> to images triggers the filter in SA. Could the scores on 
>> HTML_IMAGE_ONLY_* set of rules be lowered a bit - or is there anything 
>> else to be done - aside from educating all the internet on optimising 
>> logos in the email signatures? :-)
> 
> those have lower scores with BAYES and network rules enabled.
> configure BAYES and enable network rules...

Hi. I have BAYES enabled and DNSBLs enabled (I assume that's what you 
mean by network rules?). I still think that a score of 1.6 is quite a 
lot, considering that so many emails nowadays contain either an embedded 
logo in the signature with just a few words (in a quick email reply, 
for example), or even images inserted inline rather than attached to the 
email. Please see below an example of an SA report:

-0.2 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
                             [212.227.126.131 listed in wl.mailspike.net]
0.4 MIME_HTML_MOSTLY       BODY: Multipart message mostly text/html MIME
1.6 HTML_IMAGE_ONLY_24     BODY: HTML: images with 2000-2400 bytes of words
2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                              [score: 0.4808]
0.8 MPART_ALT_DIFF         BODY: HTML and text parts are different
0.0 HTML_MESSAGE           BODY: HTML included in message
2.5 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at http://www.dnswl.org/, no
                              trust
                              [212.227.126.131 listed in list.dnswl.org]

Re: Re: HTML_IMAGE_ONLY_* generating too many FP's

Posted by Mark London <mr...@psfc.mit.edu>.
On 12/5/2017 5:28 AM, Sebastian Arcus wrote:
> On 02/12/17 18:45, David Jones wrote:
>> On 12/02/2017 11:22 AM, Sebastian Arcus wrote:
>>> On 02/12/17 13:06, Matus UHLAR - fantomas wrote:
>>>>>> On 12/01/2017 11:17 AM, Sebastian Arcus wrote:
>>>>>>> -0.2 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
>>>>>>>                              [212.227.126.131 listed in 
>>>>>>> wl.mailspike.net]
>>>>>>> 0.4 MIME_HTML_MOSTLY       BODY: Multipart message mostly 
>>>>>>> text/html MIME
>>>>>>> 1.6 HTML_IMAGE_ONLY_24     BODY: HTML: images with 2000-2400 
>>>>>>> bytes of words
>>>>>>> 2.0 BAYES_50               BODY: Bayes spam probability is 40 to 
>>>>>>> 60%
>>>>>>>                               [score: 0.4808]
>>>>>>> 0.8 MPART_ALT_DIFF         BODY: HTML and text parts are different
>>>>>>> 0.0 HTML_MESSAGE           BODY: HTML included in message
>>>>>>> 2.5 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
>>>>>>> -0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at 
>>>>>>> http://www.dnswl.org/, no
>>>>>>>                               trust
>>>>>>>                               [212.227.126.131 listed in 
>>>>>>> list.dnswl.org]
>>>>> On 01/12/17 10:54, Axb wrote:
>>>>>> you've changed SA default scores and now complain about one which 
>>>>>> hasn't been touched as cause for FPs?
>>>>>> compare the defaults with yours...
>>>>>> score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
>>>>>> score BAYES_50  0  0  2.0    0.8
>>>>>> hmmmm.... maybe you should rethink those changes.
>>>> On 01.12.17 12:23, Sebastian Arcus wrote:
>>>>> Indeed, I did amend some of the default SA scores, to catch more 
>>>>> spam for the type of email received at this particular site. That 
>>>>> doesn't change the fact that 1.6 seems to me a pretty high score 
>>>>> for a rule which would be triggered on such a large number of ham 
>>>>> emails. Just saying.
>>>> You should understand that when you start tuning scores, you can 
>>>> get to hell
>>>> very fast, unless you do your own mass-checks and tune according to 
>>>> them.
>>> I'm not too sure I understand this attitude. The whole reason I 
>>> started to tweak the scores for certain rules is that too much spam 
>>> was going through. The false negatives have gone down considerably 
>>> since I have altered the scores - and yes, I do keep an eye on them 
>>> constantly and adjust depending on the number of false positive and 
>>> negatives, and what triggers what. I also use network tests / RBL's 
>>> as well and Bayes. The simple fact of the matter is that on plenty 
>>> of spam emails, only one significant rule might get triggered - be 
>>> it a high bayes score, one of the DNS RBL's or something else. If 
>>> the rule doesn't have a high enough score, the email passes through.
>>>
>>> Spammers change their tactics and content of their emails all the 
>>> time - and the rule scores haven't been updated in months - because 
>>> of the problems with the updating system (which is not a criticism - 
>>> I understand the situation). So for people to advise sticking 
>>> religiously to the default scores, well, frankly I don't get it.
>> The rulesets and dynamic scores in 72_scores.cf are updating again 
>> for the past 2 weeks.
>> I recommend only changing a few of the default scores and make meta 
>> rules that combine the hits to add points when you see a pattern of 2 
>> or more rules being hit.
>> If you add enough add-ons to your SA instance, then you shouldn't be 
>> impacted too much by the default scores.  SA has to be generic out of 
>> the box to cover all types of mail flow.  You have to tune it a bit 
>> for your particular recipients, language, and location.  See my email 
>> moments ago about tuning suggestions.
>> I used to constantly adjust scores to react to new spam campaigns but 
>> found I was always behind the spammers.  The more RBLs and meta rules 
>> you can setup, the more you can stay ahead of them.  Compromised 
>> accounts are the exception to this with zero-hour spam that is very 
>> difficult to block so try to keep that separate in your mind and not 
>> chase after those with score adjustments. These tend to stop 
>> automatically after 30 minutes or so when RBLs and DCC catch up to 
>> them or the account gets locked or its password changed.  I report 
>> these to Spamcop as quickly as I can.
> Thank you David. Those are useful tips.

I have also encountered FPs due to the scores of all the 
HTML_IMAGE_ONLY_* rules.  I have changed their score to be 0.001. I have 
meta rules that combine __HTML_IMG_ONLY with the RBLs, and I've found 
that to be useful.   But for some reason, __HTML_IMG_ONLY does not 
include HTML_IMAGE_ONLY_32.   Is there any reason that this was left out?

- Mark


Re: HTML_IMAGE_ONLY_* generating too many FP's

Posted by Sebastian Arcus <s....@open-t.co.uk>.
On 02/12/17 18:45, David Jones wrote:
> On 12/02/2017 11:22 AM, Sebastian Arcus wrote:
>>
>> On 02/12/17 13:06, Matus UHLAR - fantomas wrote:
>>>>> On 12/01/2017 11:17 AM, Sebastian Arcus wrote:
>>>>>> -0.2 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
>>>>>>                              [212.227.126.131 listed in 
>>>>>> wl.mailspike.net]
>>>>>> 0.4 MIME_HTML_MOSTLY       BODY: Multipart message mostly 
>>>>>> text/html MIME
>>>>>> 1.6 HTML_IMAGE_ONLY_24     BODY: HTML: images with 2000-2400 bytes 
>>>>>> of words
>>>>>> 2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>>>>>>                               [score: 0.4808]
>>>>>> 0.8 MPART_ALT_DIFF         BODY: HTML and text parts are different
>>>>>> 0.0 HTML_MESSAGE           BODY: HTML included in message
>>>>>> 2.5 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
>>>>>> -0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at 
>>>>>> http://www.dnswl.org/, no
>>>>>>                               trust
>>>>>>                               [212.227.126.131 listed in 
>>>>>> list.dnswl.org]
>>>
>>>> On 01/12/17 10:54, Axb wrote:
>>>>> you've changed SA default scores and now complain about one which 
>>>>> hasn't been touched as cause for FPs?
>>>>>
>>>>> compare the defaults with yours...
>>>>> score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
>>>>> score BAYES_50  0  0  2.0    0.8
>>>>>
>>>>> hmmmm.... maybe you should rethink those changes.
>>>
>>> On 01.12.17 12:23, Sebastian Arcus wrote:
>>>> Indeed, I did amend some of the default SA scores, to catch more 
>>>> spam for the type of email received at this particular site. That 
>>>> doesn't change the fact that 1.6 seems to me a pretty high score for 
>>>> a rule which would be triggered on such a large number of ham 
>>>> emails. Just saying.
>>>
>>> You should understand that when you start tuning scores, you can get 
>>> to hell
>>> very fast, unless you do your own mass-checks and tune according to 
>>> them.
>>
>> I'm not too sure I understand this attitude. The whole reason I 
>> started to tweak the scores for certain rules is that too much spam 
>> was going through. The false negatives have gone down considerably 
>> since I have altered the scores - and yes, I do keep an eye on them 
>> constantly and adjust depending on the number of false positive and 
>> negatives, and what triggers what. I also use network tests / RBL's as 
>> well and Bayes. The simple fact of the matter is that on plenty of 
>> spam emails, only one significant rule might get triggered - be it a 
>> high bayes score, one of the DNS RBL's or something else. If the rule 
>> doesn't have a high enough score, the email passes through.
>>
>> Spammers change their tactics and content of their emails all the time 
>> - and the rule scores haven't been updated in months - because of the 
>> problems with the updating system (which is not a criticism - I 
>> understand the situation). So for people to advise sticking 
>> religiously to the default scores, well, frankly I don't get it.
> 
> The rulesets and dynamic scores in 72_scores.cf are updating again for 
> the past 2 weeks.
> 
> I recommend only changing a few of the default scores and make meta 
> rules that combine the hits to add points when you see a pattern of 2 or 
> more rules being hit.
> 
> If you add enough add-ons to your SA instance, then you shouldn't be 
> impacted too much by the default scores.  SA has to be generic out of 
> the box to cover all types of mail flow.  You have to tune it a bit for 
> your particular recipients, language, and location.  See my email 
> moments ago about tuning suggestions.
> 
> I used to constantly adjust scores to react to new spam campaigns but 
> found I was always behind the spammers.  The more RBLs and meta rules 
> you can setup, the more you can stay ahead of them.  Compromised 
> accounts are the exception to this with zero-hour spam that is very 
> difficult to block so try to keep that separate in your mind and not 
> chase after those with score adjustments. These tend to stop 
> automatically after 30 minutes or so when RBLs and DCC catch up to them 
> or the account gets locked or its password changed.  I report these to 
> Spamcop as quickly as I can.

Thank you David. Those are useful tips.

Re: HTML_IMAGE_ONLY_* generating too many FP's

Posted by David Jones <dj...@ena.com>.
On 12/02/2017 11:22 AM, Sebastian Arcus wrote:
> 
> On 02/12/17 13:06, Matus UHLAR - fantomas wrote:
>>>> On 12/01/2017 11:17 AM, Sebastian Arcus wrote:
>>>>> -0.2 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
>>>>>                              [212.227.126.131 listed in 
>>>>> wl.mailspike.net]
>>>>> 0.4 MIME_HTML_MOSTLY       BODY: Multipart message mostly text/html 
>>>>> MIME
>>>>> 1.6 HTML_IMAGE_ONLY_24     BODY: HTML: images with 2000-2400 bytes 
>>>>> of words
>>>>> 2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>>>>>                               [score: 0.4808]
>>>>> 0.8 MPART_ALT_DIFF         BODY: HTML and text parts are different
>>>>> 0.0 HTML_MESSAGE           BODY: HTML included in message
>>>>> 2.5 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
>>>>> -0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at 
>>>>> http://www.dnswl.org/, no
>>>>>                               trust
>>>>>                               [212.227.126.131 listed in 
>>>>> list.dnswl.org]
>>
>>> On 01/12/17 10:54, Axb wrote:
>>>> you've changed SA default scores and now complain about one which 
>>>> hasn't been touched as cause for FPs?
>>>>
>>>> compare the defaults with yours...
>>>> score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
>>>> score BAYES_50  0  0  2.0    0.8
>>>>
>>>> hmmmm.... maybe you should rethink those changes.
>>
>> On 01.12.17 12:23, Sebastian Arcus wrote:
>>> Indeed, I did amend some of the default SA scores, to catch more spam 
>>> for the type of email received at this particular site. That doesn't 
>>> change the fact that 1.6 seems to me a pretty high score for a rule 
>>> which would be triggered on such a large number of ham emails. Just 
>>> saying.
>>
>> You should understand that when you start tuning scores, you can get 
>> to hell
>> very fast, unless you do your own mass-checks and tune according to them.
> 
> I'm not too sure I understand this attitude. The whole reason I started 
> to tweak the scores for certain rules is that too much spam was going 
> through. The false negatives have gone down considerably since I have 
> altered the scores - and yes, I do keep an eye on them constantly and 
> adjust depending on the number of false positive and negatives, and what 
> triggers what. I also use network tests / RBL's as well and Bayes. The 
> simple fact of the matter is that on plenty of spam emails, only one 
> significant rule might get triggered - be it a high bayes score, one of 
> the DNS RBL's or something else. If the rule doesn't have a high enough 
> score, the email passes through.
> 
> Spammers change their tactics and content of their emails all the time - 
> and the rule scores haven't been updated in months - because of the 
> problems with the updating system (which is not a criticism - I 
> understand the situation). So for people to advise sticking religiously 
> to the default scores, well, frankly I don't get it.

The rulesets and dynamic scores in 72_scores.cf have been updating again 
for the past 2 weeks.

I recommend changing only a few of the default scores and writing meta 
rules that combine the hits, adding points when you see a pattern of 2 
or more rules being hit.

If you add enough add-ons to your SA instance, then you shouldn't be 
impacted too much by the default scores.  SA has to be generic out of 
the box to cover all types of mail flow.  You have to tune it a bit for 
your particular recipients, language, and location.  See my email 
moments ago about tuning suggestions.

I used to constantly adjust scores to react to new spam campaigns but 
found I was always behind the spammers.  The more RBLs and meta rules 
you can set up, the more you can stay ahead of them.  Compromised 
accounts are the exception: their zero-hour spam is very difficult to 
block, so try to keep that separate in your mind and don't chase it 
with score adjustments.  These campaigns tend to stop automatically 
after 30 minutes or so, when RBLs and DCC catch up to them or the 
account gets locked or its password changed.  I report these to 
Spamcop as quickly as I can.

-- 
David Jones
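
The meta-rule idea above can be modelled outside SpamAssassin to see why it
helps with false positives. The following is only a Python sketch of the
scoring logic, not SpamAssassin configuration syntax. The meta name
LOCAL_IMG_ONLY_AND_PYZOR and its 1.5 bonus are hypothetical, the 0.001 score
is the figure mentioned elsewhere in this thread, and 1.392 is the default
PYZOR_CHECK score quoted in the thread.

```python
def evaluate(hits, scores, metas):
    """Return (total, fired_metas) for a set of rule hits.

    hits   -- set of rule names that matched the message
    scores -- dict mapping rule name to its score
    metas  -- dict mapping meta name to (required_rules, bonus);
              a meta fires only when ALL of its required rules hit
    """
    fired = [name for name, (required, _bonus) in metas.items()
             if all(rule in hits for rule in required)]
    total = sum(scores.get(rule, 0.0) for rule in sorted(hits))
    total += sum(metas[name][1] for name in fired)
    return round(total, 3), fired


# Hypothetical local tuning: the image rule alone is nearly free,
# but it adds weight when it coincides with a Pyzor listing.
SCORES = {"HTML_IMAGE_ONLY_24": 0.001, "PYZOR_CHECK": 1.392}
METAS = {
    "LOCAL_IMG_ONLY_AND_PYZOR": (("HTML_IMAGE_ONLY_24", "PYZOR_CHECK"), 1.5),
}

# A short reply with a logo: only the image rule hits.
print(evaluate({"HTML_IMAGE_ONLY_24"}, SCORES, METAS))
# (0.001, [])

# Image-heavy mail that is also listed in Pyzor: the meta adds points.
print(evaluate({"HTML_IMAGE_ONLY_24", "PYZOR_CHECK"}, SCORES, METAS))
# (2.893, ['LOCAL_IMG_ONLY_AND_PYZOR'])
```

In this model a logo-only reply is barely penalised, while the combination of
two independent signals scores more than either would alone.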

Re: HTML_IMAGE_ONLY_* generating too many FP's

Posted by Sebastian Arcus <s....@open-t.co.uk>.
On 02/12/17 13:06, Matus UHLAR - fantomas wrote:
>>> On 12/01/2017 11:17 AM, Sebastian Arcus wrote:
>>>> -0.2 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
>>>>                              [212.227.126.131 listed in 
>>>> wl.mailspike.net]
>>>> 0.4 MIME_HTML_MOSTLY       BODY: Multipart message mostly text/html 
>>>> MIME
>>>> 1.6 HTML_IMAGE_ONLY_24     BODY: HTML: images with 2000-2400 bytes 
>>>> of words
>>>> 2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>>>>                               [score: 0.4808]
>>>> 0.8 MPART_ALT_DIFF         BODY: HTML and text parts are different
>>>> 0.0 HTML_MESSAGE           BODY: HTML included in message
>>>> 2.5 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
>>>> -0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at 
>>>> http://www.dnswl.org/, no
>>>>                               trust
>>>>                               [212.227.126.131 listed in 
>>>> list.dnswl.org]
> 
>> On 01/12/17 10:54, Axb wrote:
>>> you've changed SA default scores and now complain about one which 
>>> hasn't been touched as cause for FPs?
>>>
>>> compare the defaults with yours...
>>> score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
>>> score BAYES_50  0  0  2.0    0.8
>>>
>>> hmmmm.... maybe you should rethink those changes.
> 
> On 01.12.17 12:23, Sebastian Arcus wrote:
>> Indeed, I did amend some of the default SA scores, to catch more spam 
>> for the type of email received at this particular site. That doesn't 
>> change the fact that 1.6 seems to me a pretty high score for a rule 
>> which would be triggered on such a large number of ham emails. Just 
>> saying.
> 
> You should understand that when you start tuning scores, you can get to 
> hell
> very fast, unless you do your own mass-checks and tune according to them.

I'm not too sure I understand this attitude. The whole reason I started 
to tweak the scores for certain rules is that too much spam was going 
through. The false negatives have gone down considerably since I have 
altered the scores - and yes, I do keep an eye on them constantly and 
adjust depending on the number of false positives and negatives, and what 
triggers what. I also use network tests / RBLs as well as Bayes. The 
simple fact of the matter is that on plenty of spam emails, only one 
significant rule might get triggered - be it a high bayes score, one of 
the DNS RBL's or something else. If the rule doesn't have a high enough 
score, the email passes through.

Spammers change their tactics and content of their emails all the time - 
and the rule scores haven't been updated in months - because of the 
problems with the updating system (which is not a criticism - I 
understand the situation). So for people to advise sticking religiously 
to the default scores, well, frankly I don't get it.

Re: HTML_IMAGE_ONLY_* generating too many FP's

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>>On 12/01/2017 11:17 AM, Sebastian Arcus wrote:
>>>-0.2 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
>>>                             [212.227.126.131 listed in wl.mailspike.net]
>>>0.4 MIME_HTML_MOSTLY       BODY: Multipart message mostly text/html MIME
>>>1.6 HTML_IMAGE_ONLY_24     BODY: HTML: images with 2000-2400 
>>>bytes of words
>>>2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>>>                              [score: 0.4808]
>>>0.8 MPART_ALT_DIFF         BODY: HTML and text parts are different
>>>0.0 HTML_MESSAGE           BODY: HTML included in message
>>>2.5 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
>>>-0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at 
>>>http://www.dnswl.org/, no
>>>                              trust
>>>                              [212.227.126.131 listed in list.dnswl.org]

>On 01/12/17 10:54, Axb wrote:
>>you've changed SA default scores and now complain about one which 
>>hasn't been touched as cause for FPs?
>>
>>compare the defaults with yours...
>>score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
>>score BAYES_50  0  0  2.0    0.8
>>
>>hmmmm.... maybe you should rethink those changes.

On 01.12.17 12:23, Sebastian Arcus wrote:
>Indeed, I did amend some of the default SA scores, to catch more spam 
>for the type of email received at this particular site. That doesn't 
>change the fact that 1.6 seems to me a pretty high score for a rule 
>which would be triggered on such a large number of ham emails. Just 
>saying.

You should understand that when you start tuning scores, you can get to hell
very fast, unless you do your own mass-checks and tune according to them.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
42.7 percent of all statistics are made up on the spot. 

Re: HTML_IMAGE_ONLY_* generating too many FP's

Posted by Sebastian Arcus <s....@open-t.co.uk>.
On 01/12/17 10:54, Axb wrote:
> On 12/01/2017 11:17 AM, Sebastian Arcus wrote:
>>
>> On 30/11/17 12:45, Matus UHLAR - fantomas wrote:
>>> On 28.11.17 19:39, Sebastian Arcus wrote:
>>>> I'm having more and more problems with the HTML_IMAGE_ONLY_* set of 
>>>> rules recently generating false positives.
>>>>
>>>> Plenty of business emails will include a logo at the bottom - and 
>>>> not everybody is a graphics expert to make their logo a tiny 
>>>> optimised gif or png - so some of these are slightly bigger than 
>>>> they should be.
>>>>
>>>> However, this seems to be sufficiently widespread. Also, many 
>>>> business emails can be just a few words reply - so the ratio of 
>>>> words to images triggers the filter in SA. Could the scores on 
>>>> HTML_IMAGE_ONLY_* set of rules be lowered a bit - or is there 
>>>> anything else to be done - aside from educating all the internet on 
>>>> optimising logos in the email signatures? :-)
>>>
>>> those have lower scores with BAYES and network rules enabled.
>>> configure BAYES and enable network rules...
>>
>> Hi. I have BAYES enabled and DNSBL's enabled (I assume that's what you 
>> mean by network rules?). I still think that a score of 1.6 is quite a 
>> lot, considering that so many emails nowadays contain either an 
>> embedded logo in the signature, with just a few words (in a quick 
>> email reply, for example), or even images inserted, instead of 
>> attached to the email. Please see below an example of a SA report:
>>
>> -0.2 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
>>                              [212.227.126.131 listed in wl.mailspike.net]
>> 0.4 MIME_HTML_MOSTLY       BODY: Multipart message mostly text/html MIME
>> 1.6 HTML_IMAGE_ONLY_24     BODY: HTML: images with 2000-2400 bytes of 
>> words
>> 2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>>                               [score: 0.4808]
>> 0.8 MPART_ALT_DIFF         BODY: HTML and text parts are different
>> 0.0 HTML_MESSAGE           BODY: HTML included in message
>> 2.5 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
>> -0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at 
>> http://www.dnswl.org/, no
>>                               trust
>>                               [212.227.126.131 listed in list.dnswl.org]
> 
> you've changed SA default scores and now complain about one which hasn't 
> been touched as cause for FPs?
> 
> compare the defaults with yours...
> score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
> score BAYES_50  0  0  2.0    0.8
> 
> hmmmm.... maybe you should rethink those changes.

Indeed, I did amend some of the default SA scores, to catch more spam 
for the type of email received at this particular site. That doesn't 
change the fact that 1.6 seems to me a pretty high score for a rule 
which would be triggered on such a large number of ham emails. Just saying.

Re: HTML_IMAGE_ONLY_* generating too many FP's

Posted by Axb <ax...@gmail.com>.
On 12/01/2017 11:17 AM, Sebastian Arcus wrote:
> 
> On 30/11/17 12:45, Matus UHLAR - fantomas wrote:
>> On 28.11.17 19:39, Sebastian Arcus wrote:
>>> I'm having more and more problems with the HTML_IMAGE_ONLY_* set of 
>>> rules recently generating false positives.
>>>
>>> Plenty of business emails will include a logo at the bottom - and not 
>>> everybody is a graphics expert to make their logo a tiny optimised 
>>> gif or png - so some of these are slightly bigger than they should be.
>>>
>>> However, this seems to be sufficiently widespread. Also, many 
>>> business emails can be just a few words reply - so the ratio of words 
>>> to images triggers the filter in SA. Could the scores on 
>>> HTML_IMAGE_ONLY_* set of rules be lowered a bit - or is there 
>>> anything else to be done - aside from educating all the internet on 
>>> optimising logos in the email signatures? :-)
>>
>> those have lower scores with BAYES and network rules enabled.
>> configure BAYES and enable network rules...
> 
> Hi. I have BAYES enabled and DNSBL's enabled (I assume that's what you 
> mean by network rules?). I still think that a score of 1.6 is quite a 
> lot, considering that so many emails nowadays contain either an embedded 
> logo in the signature, with just a few words (in a quick email reply, 
> for example), or even images inserted, instead of attached to the email. 
> Please see below an example of a SA report:
> 
> -0.2 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
>                              [212.227.126.131 listed in wl.mailspike.net]
> 0.4 MIME_HTML_MOSTLY       BODY: Multipart message mostly text/html MIME
> 1.6 HTML_IMAGE_ONLY_24     BODY: HTML: images with 2000-2400 bytes of words
> 2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>                               [score: 0.4808]
> 0.8 MPART_ALT_DIFF         BODY: HTML and text parts are different
> 0.0 HTML_MESSAGE           BODY: HTML included in message
> 2.5 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
> -0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at http://www.dnswl.org/, no
>                               trust
>                               [212.227.126.131 listed in list.dnswl.org]

you've changed SA default scores and now complain about one which hasn't 
been touched as cause for FPs?

compare the defaults with yours...
score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
score BAYES_50  0  0  2.0    0.8

hmmmm.... maybe you should rethink those changes.
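
For readers unfamiliar with the four-column score lines above: per the
SpamAssassin configuration documentation, the four values apply to score
sets 0 to 3 (no Bayes and no network tests, network tests only, Bayes only,
Bayes plus network tests). A minimal Python sketch of picking the effective
default follows. The helper names are made up for illustration and are not
part of SpamAssassin.

```python
def parse_score_line(line):
    """Parse 'score NAME v0 v1 v2 v3' into (name, [four floats]).

    Anything after '#' is treated as a comment. A single value is
    expanded so that it applies to all four score sets.
    """
    fields = line.split("#", 1)[0].split()
    if len(fields) < 3 or fields[0] != "score":
        raise ValueError(f"not a score line: {line!r}")
    values = [float(v) for v in fields[2:]]
    if len(values) == 1:
        values = values * 4
    elif len(values) != 4:
        raise ValueError(f"expected 1 or 4 scores: {line!r}")
    return fields[1], values


def score_set_index(use_bayes, use_net):
    """Score set number: 0 = neither, 1 = net, 2 = Bayes, 3 = both."""
    return (2 if use_bayes else 0) + (1 if use_net else 0)


def effective_score(line, use_bayes, use_net):
    name, values = parse_score_line(line)
    return name, values[score_set_index(use_bayes, use_net)]


# The two default lines quoted above.
DEFAULTS = [
    "score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2",
    "score BAYES_50  0  0  2.0    0.8",
]

# With both Bayes and network tests enabled, score set 3 applies.
for line in DEFAULTS:
    print(effective_score(line, use_bayes=True, use_net=True))
# ('PYZOR_CHECK', 1.392)
# ('BAYES_50', 0.8)
```

So on a setup with Bayes and network tests both active, the stock values
would be 1.392 and 0.8, against the 2.5 and 2.0 shown in the posted report.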


Re: HTML_IMAGE_ONLY_* generating too many FP's

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>>On 28.11.17 19:39, Sebastian Arcus wrote:
>>>I'm having more and more problems with the HTML_IMAGE_ONLY_* set 
>>>of rules recently generating false positives.

>On 30/11/17 12:45, Matus UHLAR - fantomas wrote:
>>those have lower scores with BAYES and network rules enabled.
>>configure BAYES and enable network rules...

On 01.12.17 10:17, Sebastian Arcus wrote:
>Hi. I have BAYES enabled and DNSBL's enabled (I assume that's what 
>you mean by network rules?). I still think that a score of 1.6 is 
>quite a lot, considering that so many emails nowadays contain either 
>an embedded logo in the signature, with just a few words (in a quick 
>email reply, for example), or even images inserted, instead of 
>attached to the email. Please see below an example of a SA report:

>1.6 HTML_IMAGE_ONLY_24     BODY: HTML: images with 2000-2400 bytes of words
>2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>                             [score: 0.4808]

configuring BAYES includes training it, so that your mail doesn't get a 0.48 score.

>2.5 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)

now I really wonder why you blame HTML_IMAGE_ONLY_24, when BAYES_50 and
PYZOR_CHECK each gave you a higher score?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I just got lost in thought. It was unfamiliar territory.
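
The arithmetic behind that question can be checked with a short Python
sketch. It adds up the scores exactly as printed in the posted report, then
swaps in the default BAYES_50 and PYZOR_CHECK values quoted in the thread
(0.8 and 1.392). Two assumptions: the threshold of 5.0 is the stock
required_score, since the site's actual setting is not given in the thread,
and the report shows scores rounded to one decimal, so the totals are
approximate.

```python
from decimal import Decimal

# Scores as printed in the posted SA report (site-modified values).
REPORT = {
    "RCVD_IN_MSPIKE_H2": "-0.2",
    "MIME_HTML_MOSTLY": "0.4",
    "HTML_IMAGE_ONLY_24": "1.6",
    "BAYES_50": "2.0",
    "MPART_ALT_DIFF": "0.8",
    "HTML_MESSAGE": "0.0",
    "PYZOR_CHECK": "2.5",
    "RCVD_IN_DNSWL_NONE": "-0.0",
}

# Default values quoted in the thread for Bayes + network tests.
DEFAULT_OVERRIDES = {"BAYES_50": "0.8", "PYZOR_CHECK": "1.392"}

# Assumption: stock threshold; the site's own value is not in the thread.
REQUIRED_SCORE = Decimal("5.0")


def total(scores):
    """Sum rule scores exactly, avoiding binary float rounding."""
    return sum((Decimal(v) for v in scores.values()), Decimal("0"))


as_posted = total(REPORT)
with_defaults = total({**REPORT, **DEFAULT_OVERRIDES})
without_image_rule = total({**REPORT, "HTML_IMAGE_ONLY_24": "0"})

print(f"as posted:                  {as_posted}")           # 7.1
print(f"default BAYES_50/PYZOR:     {with_defaults}")       # 4.792
print(f"HTML_IMAGE_ONLY_24 zeroed:  {without_image_rule}")  # 5.5
print(f"tagged as posted?           {as_posted >= REQUIRED_SCORE}")           # True
print(f"tagged with defaults?       {with_defaults >= REQUIRED_SCORE}")       # False
print(f"tagged without image rule?  {without_image_rule >= REQUIRED_SCORE}")  # True
```

Under these assumptions, restoring the two default scores would bring the
message below the threshold, while zeroing HTML_IMAGE_ONLY_24 alone would
not.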