Posted to users@spamassassin.apache.org by Brett Schenker <bs...@salsalabs.com> on 2012/06/05 17:31:10 UTC

CKEditor causing high spam score

Hey everyone, I'm new to the list and so glad it exists.

We're running into an issue with CKEditor,
http://stackoverflow.com/questions/10890407/ckeditors-html-artifacts-trigger-spamassassin-can-you-turn-ckeditors-html-mod
and the following rules:

body __SEEK_A5MEIH  / cellspacing=\"0\"> <tbody> <tr> <td></
body __SEEK_FFM_YL  /\"> <tbody> <tr> <td> <p style=/
body __SEEK_JOUSPM  /> <\/tbody> <\/table> <\/td> <\/tr> </

CKEditor inserts that HTML whenever you use its WYSIWYG editor.  We don't
want to turn the editor off for clients, since most of them aren't very
technical, but we take the rule seriously: it seems to come with standard
installs, so it is probably widely deployed.

Has anyone else tackled this problem and found a solution?  Thanks for any
help!

Brett Schenker
Salsa Labs

Re: CKEditor causing high spam score

Posted by Benny Pedersen <me...@junc.org>.
On 2012-06-05 17:31, Brett Schenker wrote:

> Has anyone else tackled the problem and have a solution?  Thanks for
> any help!

Such rules need more ham from that editor to compensate, so that this
markup stops being treated as a sign of spam.


Re: CKEditor causing high spam score

Posted by Christopher Tiwald <ct...@salsalabs.com>.
On Tue, Jun 05, 2012 at 12:46:27PM -0400, Bowie Bailey wrote:
> Those rules don't exist in the current sought rule set.  You *are*
> keeping the sought rules updated, right?
> 
> What is the date of your 20_sought.cf file?

My file was a copy made Monday morning from our spam-check server.
I hadn't checked for updates since then, but I just pulled the current
file locally and it looks like the rules are gone. Thank you for the
help. It is much appreciated.

--
Christopher Tiwald

Re: CKEditor causing high spam score

Posted by Bowie Bailey <Bo...@BUC.com>.
On 6/5/2012 12:33 PM, Axb wrote:
> On 06/05/2012 06:26 PM, Christopher Tiwald wrote:
>> On Tue, Jun 05, 2012 at 11:39:29AM -0400, Kevin A. McGrail wrote:
>>> A) These are just sub rules for use in a meta.  As a specialist in
>>> meta rules, just because you hit a sub rule doesn't matter.  What
>>> matters is if it triggers a scoring rule.  Does it?
>>>
>>> B) I don't recognize those rules or know where they came from.
>>> Where did they come from?
>>>
>> The scoring rule is 4.0 JM_SOUGHT_3, which is one of the "sought
>> channel" rules distributed (and regularly updated) by the
>> sought.rules.yerp.org channel in SpamAssassin [1].
>>
>> That link is a little dated, but the channel is not. It comes stock now
>> with `yum install spamassassin` on RHEL 6, and can be added to a local
>> installation of SA by following the instructions in the link above. The
>> specific path for my vanilla install is:
>>
>> /var/lib/spamassassin/3.003002/sought_rules_yerp_org/20_sought.cf
>>
>> As far as I can tell (admittedly, I haven't studied source), it's simply
>> doing regex matching on a variety of spammy content. Nothing terribly
>> sophisticated -- the pattern matching is straight up "does this exact
>> string exist?" The problem is it's picked up artifacts of CKEditor, a
>> common CRM/CMS editor. I was able to demonstrate the problem using
>> CKEditor's demo page [2], and posted the SO question Brett cited earlier
>> [3].
> The SOUGHT rules are auto generated, several times/day by a third party 
> and not part of the SpamAssassin project.
>
> Pls paste a sample msg in pastebin. IF we can get the right person's 
> attention we may get this fixed.

Those rules don't exist in the current sought rule set.  You *are*
keeping the sought rules updated, right?

What is the date of your 20_sought.cf file?

-- 
Bowie
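
One way to answer the file-date question is to look at the modification
time of 20_sought.cf. The following is only a sketch in Python. The path
is the one quoted in this thread and will differ between installs; the
helper names rules_age_days and is_stale, and the one-day threshold, are
assumptions made for illustration.

```python
import os
import time

# Path as quoted in this thread; the version directory varies per install.
SOUGHT_CF = "/var/lib/spamassassin/3.003002/sought_rules_yerp_org/20_sought.cf"


def rules_age_days(path, now=None):
    """Return how many days ago the file at `path` was last modified."""
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(path)) / 86400.0


def is_stale(path, max_age_days=1.0, now=None):
    """True if the file is older than `max_age_days` (hypothetical threshold)."""
    return rules_age_days(path, now=now) > max_age_days


if __name__ == "__main__":
    if os.path.exists(SOUGHT_CF):
        print(f"{SOUGHT_CF}: {rules_age_days(SOUGHT_CF):.1f} days old")
    else:
        print(f"{SOUGHT_CF} not found")
```

A file that is several days old suggests the update job is not running,
since the channel is described in this thread as regenerated several
times a day.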

Re: CKEditor causing high spam score

Posted by Axb <ax...@gmail.com>.
On 06/05/2012 06:26 PM, Christopher Tiwald wrote:
> On Tue, Jun 05, 2012 at 11:39:29AM -0400, Kevin A. McGrail wrote:
>> A) These are just sub rules for use in a meta.  As a specialist in
>> meta rules, just because you hit a sub rule doesn't matter.  What
>> matters is if it triggers a scoring rule.  Does it?
>>
>> B) I don't recognize those rules or know where they came from.
>> Where did they come from?
>>
>
> The scoring rule is 4.0 JM_SOUGHT_3, which is one of the "sought
> channel" rules distributed (and regularly updated) by the
> sought.rules.yerp.org channel in SpamAssassin [1].
>
> That link is a little dated, but the channel is not. It comes stock now
> with `yum install spamassassin` on RHEL 6, and can be added to a local
> installation of SA by following the instructions in the link above. The
> specific path for my vanilla install is:
>
> /var/lib/spamassassin/3.003002/sought_rules_yerp_org/20_sought.cf
>
> As far as I can tell (admittedly, I haven't studied source), it's simply
> doing regex matching on a variety of spammy content. Nothing terribly
> sophisticated -- the pattern matching is straight up "does this exact
> string exist?" The problem is it's picked up artifacts of CKEditor, a
> common CRM/CMS editor. I was able to demonstrate the problem using
> CKEditor's demo page [2], and posted the SO question Brett cited earlier
> [3].

The SOUGHT rules are auto-generated several times a day by a third party
and are not part of the SpamAssassin project.

Please paste a sample message to pastebin. If we can get the right
person's attention, we may get this fixed.

Axb







Re: CKEditor causing high spam score

Posted by Benny Pedersen <me...@junc.org>.
On 2012-06-13 08:24, Axb wrote:

> Not officially, but if the SOUGHT creator is reading, he may get back
> to you offlist.

Who has set him to read-only here?



Re: CKEditor causing high spam score

Posted by Axb <ax...@gmail.com>.
On 06/13/2012 08:16 AM, Niamh Holding wrote:
>
> Hello Axb,
>
> Wednesday, June 13, 2012, 7:07:59 AM, you wrote:
>
> A>  Nobody stops you from changing the SOUGHT rules' scores if you think
> A>  they're scored too high.
>
> I'm keeping an eye on the false positives caused by them to make that
> call.
>
> Is there anywhere we can send misscored ham to help improve the rules?
>

Not officially, but if the SOUGHT creator is reading, he may get back to 
you offlist.

Re: CKEditor causing high spam score

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Axb,

Wednesday, June 13, 2012, 7:07:59 AM, you wrote:

A> Nobody stops you from changing the SOUGHT rules' scores if you think 
A> they're scored too high.

I'm keeping an eye on the false positives caused by them to make that
call.

Is there anywhere we can send misscored ham to help improve the rules?

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: CKEditor causing high spam score

Posted by Axb <ax...@gmail.com>.
On 06/13/2012 07:53 AM, Niamh Holding wrote:
>
> Hello Benny,
>
> Wednesday, June 13, 2012, 1:36:37 AM, you wrote:
>
> BP>  nope sought rules just needs more ham
>
> Unless a rule is almost perfect then for it to apply 80% of the
> default spam identification score is probably excessive.
>

Nobody is stopping you from changing the SOUGHT rules' scores if you
think they're scored too high.
The meta names don't change between updates, so a local override keeps
applying.
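
The arithmetic behind this exchange can be sketched in a few lines of
Python. This is an illustration only: the 4.0 score for JM_SOUGHT_3 and
the default threshold of 5.0 are the figures discussed in the thread,
while HYPOTHETICAL_MINOR_RULE, its score of 1.2, and the lowered score of
2.0 are made up for the example.

```python
# SpamAssassin's default required_score; a message at or above it is spam.
DEFAULT_REQUIRED_SCORE = 5.0


def total_score(hits, scores):
    """Sum the scores of every rule that hit."""
    return sum(scores[name] for name in hits)


def is_spam(hits, scores, required=DEFAULT_REQUIRED_SCORE):
    """A message is classified as spam when its total reaches the threshold."""
    return total_score(hits, scores) >= required


def share_of_threshold(rule_score, required=DEFAULT_REQUIRED_SCORE):
    """Fraction of the threshold that a single rule contributes."""
    return rule_score / required


# JM_SOUGHT_3 at 4.0 is from the thread; the second rule is hypothetical.
STOCK_SCORES = {"JM_SOUGHT_3": 4.0, "HYPOTHETICAL_MINOR_RULE": 1.2}
# Same rules with a locally lowered JM_SOUGHT_3 score (example value).
LOWERED_SCORES = dict(STOCK_SCORES, JM_SOUGHT_3=2.0)
HITS = ["JM_SOUGHT_3", "HYPOTHETICAL_MINOR_RULE"]

if __name__ == "__main__":
    print(share_of_threshold(STOCK_SCORES["JM_SOUGHT_3"]))
    print(is_spam(HITS, STOCK_SCORES))
    print(is_spam(HITS, LOWERED_SCORES))
```

In SpamAssassin itself the corresponding override is a line such as
"score JM_SOUGHT_3 2.0" in a local configuration file like local.cf; the
value 2.0 is only an example.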

Re: CKEditor causing high spam score

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Benny,

Wednesday, June 13, 2012, 6:59:39 PM, you wrote:

BP>  there is plenty of other rules that can adjust 
BP> the complete scores up or down

Very few have such a high score, that's my point.

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: CKEditor causing high spam score

Posted by Benny Pedersen <me...@junc.org>.
On 2012-06-13 18:20, Niamh Holding wrote:
> Hello Benny,
>
> Wednesday, June 13, 2012, 5:12:16 PM, you wrote:
>
> BP> true if it only hits spam and not ham
>
> That's the point this high scoring rule hits ham and causes false
> positives.

Is there just one rule in SpamAssassin? Your claim would be correct if
there were just one rule, but there are plenty of other rules that can
adjust the overall score up or down.

meta foo-rule (score-adj)

where score-adj is an integer value that adjusts foo-rule's score; that
will work as well.




Re: CKEditor causing high spam score

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Benny,

Wednesday, June 13, 2012, 5:12:16 PM, you wrote:

BP> true if it only hits spam and not ham

That's the point: this high-scoring rule hits ham and causes false
positives.

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: CKEditor causing high spam score

Posted by Benny Pedersen <me...@junc.org>.
On 2012-06-13 07:53, Niamh Holding wrote:
> Hello Benny,
>
> Wednesday, June 13, 2012, 1:36:37 AM, you wrote:
>
> BP> nope sought rules just needs more ham
>
> Unless a rule is almost perfect then for it to apply 80% of the
> default spam identification score is probably excessive.

True, if it only hits spam and not ham. But since SA is based on
scores, where are the ham scores, then?


Re: CKEditor causing high spam score

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Benny,

Wednesday, June 13, 2012, 1:36:37 AM, you wrote:

BP> nope sought rules just needs more ham

Unless a rule is almost perfect, having it contribute 80% of the
default spam threshold score is probably excessive.

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: CKEditor causing high spam score

Posted by Benny Pedersen <me...@junc.org>.
On 2012-06-12 09:02, Niamh Holding wrote:

> Though I must admit I'm finding the score of 4 a bit high and it's
> causing misclassification of the occasional ham.

Nope, sought rules just need more ham.


Re: CKEditor causing high spam score

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Christopher,

Tuesday, June 5, 2012, 5:26:43 PM, you wrote:

CT> The scoring rule is 4.0 JM_SOUGHT_3, which is one of the "sought
CT> channel" rules distributed (and regularly updated) by the
CT> sought.rules.yerp.org channel in SpamAssassin [1].

Though I must admit I'm finding the score of 4 a bit high and it's
causing misclassification of the occasional ham.

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: CKEditor causing high spam score

Posted by Christopher Tiwald <ct...@salsalabs.com>.
On Tue, Jun 05, 2012 at 11:39:29AM -0400, Kevin A. McGrail wrote:
> A) These are just sub rules for use in a meta.  As a specialist in
> meta rules, just because you hit a sub rule doesn't matter.  What
> matters is if it triggers a scoring rule.  Does it?
> 
> B) I don't recognize those rules or know where they came from.
> Where did they come from?
> 

The scoring rule is 4.0 JM_SOUGHT_3, which is one of the "sought
channel" rules distributed (and regularly updated) by the
sought.rules.yerp.org channel in SpamAssassin [1].

That link is a little dated, but the channel is not. It comes stock now
with `yum install spamassassin` on RHEL 6, and can be added to a local
installation of SA by following the instructions in the link above. The
specific path for my vanilla install is:

/var/lib/spamassassin/3.003002/sought_rules_yerp_org/20_sought.cf

As far as I can tell (admittedly, I haven't studied the source), it's
simply doing regex matching on a variety of spammy content. Nothing
terribly sophisticated -- the pattern matching is a straight "does this
exact string exist?" check. The problem is that it has picked up
artifacts of CKEditor, a common CRM/CMS editor. I was able to demonstrate
the problem using CKEditor's demo page [2], and posted the SO question
Brett cited earlier [3].
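
The "does this exact string exist?" behaviour can be reproduced with a
short Python sketch. The three patterns are copied from the rules quoted
in this thread; the sample markup is hypothetical (it is not captured
CKEditor output), and the function names are made up for the example.
SpamAssassin applies body rules to its own rendering of the message,
which this sketch does not reproduce; it only collapses whitespace.

```python
import re

# Sub-rule patterns as quoted in this thread (the escapes are valid in
# both Perl and Python regular expressions).
SEEK_PATTERNS = {
    "__SEEK_A5MEIH": r' cellspacing=\"0\"> <tbody> <tr> <td><',
    "__SEEK_FFM_YL": r'\"> <tbody> <tr> <td> <p style=',
    "__SEEK_JOUSPM": r'> <\/tbody> <\/table> <\/td> <\/tr> <',
}

# Hypothetical nested-table markup of the kind a WYSIWYG editor might emit.
SAMPLE_HTML = """
<table cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td><table width="100%">
<tbody>
<tr>
<td>
<p style="margin: 0">Hello</p>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
"""


def collapse_whitespace(text):
    """Replace every run of whitespace with one space (a simplification)."""
    return " ".join(text.split())


def matching_rules(text):
    """Return the sorted names of the sub-rules whose pattern occurs in text."""
    normalized = collapse_whitespace(text)
    return sorted(
        name
        for name, pattern in SEEK_PATTERNS.items()
        if re.search(pattern, normalized)
    )


if __name__ == "__main__":
    print(matching_rules(SAMPLE_HTML))
```

Ordinary nested layout tables are enough to hit all three patterns here,
with no spam-specific content involved.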

One option for us would be to disable the WYSIWYG, but I can't imagine
we're the only ones affected. The CKEditor user page lists a variety of
large companies and bulk email providers, including MailChimp [4].

[1] http://taint.org/2007/08/15/004348a.html
[2] http://ckeditor.com/demo
[3] http://stackoverflow.com/questions/10890407/ckeditors-html-artifacts-trigger-spamassassin-can-you-turn-ckeditors-html-mod
[4] http://ckeditor.com/who-is-using-ckeditor

--
Christopher Tiwald

Re: CKEditor causing high spam score

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 6/5/2012 11:31 AM, Brett Schenker wrote:
> Hey everyone, I'm new to the list and so glad it exists.
>
> We're running into an issue with CKEditor, 
> http://stackoverflow.com/questions/10890407/ckeditors-html-artifacts-trigger-spamassassin-can-you-turn-ckeditors-html-mod 
> and the following rules,
>
> body __SEEK_A5MEIH  / cellspacing=\"0\"> <tbody> <tr> <td></
> body __SEEK_FFM_YL  /\"> <tbody> <tr> <td> <p style=/
> body __SEEK_JOUSPM  /> <\/tbody> <\/table> <\/td> <\/tr> </
>
> CKEditor is placing that html if you use their WYSIWYG editor.  We 
> don't want to turn that off for clients, as they're pretty low in the 
> tech ability, but take the rule seriously as it seems to come with 
> installs, so probably pretty prevalent.
>
> Has anyone else tackled the problem and have a solution?  Thanks for 
> any help!
A) These are just sub-rules for use in a meta rule.  Speaking as a
specialist in meta rules: hitting a sub-rule doesn't matter by itself.
What matters is whether it triggers a scoring rule.  Does it?

B) I don't recognize those rules or know where they came from.  Where 
did they come from?

regards,
KAM