Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2011/10/13 14:56:50 UTC

[Bug 6674] New: new rules for polish users

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

             Bug #: 6674
           Summary: new rules for polish users
           Product: Spamassassin
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Translations and Languages
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: lemat@lemat.priv.pl
    Classification: Unclassified


In Poland there are spammers who send "an invitation to receive spam". They
often refer to our legal system, which disallows UCE but allows UBE (something
like the CAN-SPAM Act). These spammers think that if they include such
references in the signature of the spam, it will look less spammy. The
following rules catch those references.
These rules have been working for me for ~3 years with 1 false positive.
The score of 99 works for me.
The ".{1,8}" stands in for the non-ASCII characters (byte values above 127)
ĄąĆćĘꣳŃńÓ󌜯żŹź (AaCcEeLlNnOoSsZzZz with diacritics).
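
To make the wildcard concrete: the same letter can occupy a different number of bytes depending on the message charset (for example, one byte in ISO-8859-2 and two in UTF-8), so a single "." is not always enough. The following is only a sketch of a tighter variant of the first rule, assuming that 1 to 3 bytes cover the accented letter plus the space that follows it. The rule name LEMAT_1_TIGHT and its score are hypothetical and not part of the submitted ruleset.

```
# Hypothetical variant of LEMAT_1: literal dots are escaped and the
# wildcard is narrowed to the accented letter plus the following space.
body     LEMAT_1_TIGHT  /Zgodnie z Ustaw.{1,3}z dnia 26\.08\.2002 r\. \(Dz\. U\. nr 144, poz\.1204\)/i
describe LEMAT_1_TIGHT  Pytanie o zgode na otrzymywanie spamu
# Near-zero score so the variant can be observed before it is trusted.
score    LEMAT_1_TIGHT  0.001
```

A narrower bound reduces the chance of the wildcard swallowing unrelated text, at the cost of missing messages where the letter is rendered as a longer sequence.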

body LEMAT_1            /Zgodnie z Ustaw.{1,8}z dnia 26.08.2002 r. \(Dz. U. nr 144, poz.1204\)/i
# describe in English: "Request for consent to receive spam"
describe LEMAT_1        Pytanie o zgode na otrzymywanie spamu
score LEMAT_1           99

body LEMAT_2            /wiadomo.{1,8}t.{1,8}wys.{1,8}ali.{1,8}my na og.{1,8}lnodost.{1,8}pny adres e-mailowy/i
# describe in English: "Email address dealer"
describe LEMAT_2        Handlarz adresow email
score LEMAT_2           99

body LEMAT_3            /Je.{1,8}eli nie .{1,8}ycz.{1,8}sobie Pa.{1,8}stwo otrzymywania podobnych informacji/i
describe LEMAT_3        Opt-out spam
score LEMAT_3           99

body LEMAT_5            /zgodnie z Ustaw.{1,8}z dnia 18 lipca 2002 r. o .{1,8}wiadczeniu us.{1,8}ug drog.{1,8}elektroniczn.{1,8}/i
# describe in English: "Request for consent to receive spam"
describe LEMAT_5        Pytanie o zgode na otrzymywanie spamu
score LEMAT_5           99

body LEMAT_6            /Sekcja Informacji Ekonomicznej INSTYTUTU PROMOCJI EKSPORTU I KOOPERACJI bazy danych: FIRMY W POLSCE, FIRMY EUROPY/i
describe LEMAT_6        Spammer IPEIK
score LEMAT_6           99

body LEMAT_7            /\(Dz.\s?U. z 2002r, nr 144 poz. 1204 ustawy z dnia 18 lipca 2002 r.\) oraz dyrektyw.{1,8}UOKiK/i
# describe in English: "Request for consent to receive spam"
describe LEMAT_7        Pytanie o zgode na otrzymywanie spamu
score LEMAT_7           99

body LEMAT_8            /Dopuszczalne jest przes.{1,8}anie na adres e-mail pytania.{1,8}czy adresat zgadza si.{1,8}na otrzymywanie drog.{1,8}elektroniczn.{1,8}informacji handlowej/i
# describe in English: "Request for consent to receive spam"
describe LEMAT_8        Pytanie o zgode na otrzymywanie spamu
score LEMAT_8           99

body LEMAT_9            /materia.{1,8}y z konferencji nt. Bezpiecze.{1,8}stwa w Sieci Internet, Warszawa 14 marca 2006/i
# describe in English: "Request for consent to receive spam"
describe LEMAT_9        Pytanie o zgode na otrzymywanie spamu
score LEMAT_9           99

body LEMAT_10            /Pa.{1,8}stwa adres e-mail pochodzi z og.{1,8}lnie dost.{1,8}pnych .{1,8}r.{1,8}de/i
# describe in English: "Email address harvester"
describe LEMAT_10        Harvester adresow email
score LEMAT_10           99

body LEMAT_26           /Niniejsza wiadomo.{1,8}nie jest informacj.{1,8}handlow.{1,8}, a jedynie zapytaniem o zgod.{1,8}na przesy.{1,8}anie informacji handlowych drog.{1,8}elektroniczn/i
# describe in English: "Request for consent to receive spam"
describe LEMAT_26       Pytanie o zgode na otrzymywanie spamu
score LEMAT_26          99

body LEMAT_27           /ustaw.{1,8}z dnia 18 lipca 2002\s*r.{1,8}o.{1,8}wiadczeniu us.{1,8}ug drog.{1,8}elektroniczn.{1,8}\(Dz.\s?U. z (9 wrze.{1,8}snia )?2002\s*r. Nr 144, poz 1204/i
# describe in English: "Request for consent to receive spam"
describe LEMAT_27       Pytanie o zgode na otrzymywanie spamu
score LEMAT_27          99

body LEMAT_29           /Pa.{1,8}stwa dane teleadresowe otrzymali.{1,8}my z bazy HBI Polska sp. z o. o./i
# describe in English: "Fence (dealer in stolen goods) for email addresses"
describe LEMAT_29       Paser adresow email
score LEMAT_29          99

body LEMAT_31           /ustaw.{1,8}z dnia 18 lipca 2002\s*r.{1,8}o.{1,8}wiadczeniu us.{1,8}ug drog.{1,8}elektroniczn.{1,8}\(Dz.\s?U. Nr 144 z 9 wrze.{1,8}snia 2002\s*r., poz 1204/i
# describe in English: "Request for consent to receive spam"
describe LEMAT_31       Pytanie o zgode na otrzymywanie spamu
score LEMAT_31          99

body LEMAT_34           /Pa.{1,8}stwa adres mejlowy pobrali.{1,8}my z og.{1,8}lnie dost.{1,8}pnych serwis.{1,8}w internetowych/i
# describe in English: "Email address harvester"
describe LEMAT_34       Harvester adresow email
score LEMAT_34          99

body LEMAT_35           /W zwi.{1,8}zku z art. 10 ustawy z dnia 18 lipca 2002 r. o .{1,8}wiadczeniu us.{1,8}ug drog.{1,8}elektroniczn.{1,8}\(Dz.U. nr 144, poz. 1204\)/i
# describe in English: "Request for consent to receive spam"
describe LEMAT_35       Pytanie o zgode na otrzymywanie spamu
score LEMAT_35          99

body LEMAT_36           /Ustaw.{1,8}z dnia 18.07.2002 r. o.{1,8}wiadczeniu Us.{1,8}ug Drog.{1,8}Elektroniczn.{1,8}\(Dz. U. 2002, nr 144, poz. 1204\)/i
# describe in English: "Request for consent to receive spam"
describe LEMAT_36       Pytanie o zgode na otrzymywanie spamu
score LEMAT_36          99


The above rules are for Poland only. In other countries they will just waste
CPU cycles.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

Lemat <le...@lemat.priv.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lemat@lemat.priv.pl

--- Comment #5 from Lemat <le...@lemat.priv.pl> 2011-10-13 18:28:54 UTC ---
1) I can start maintaining these rules as CustomRuleset, I see that "Polish
Language Ruleset" is empty and Status is "?". I just need to know what shall I
do.

2) I'm thinking that maybe SA rules could be packaged with country-specific
customrulesets and the postmaster would decide which rulesets are used,
something like:

preload_rulesets pl de gr
in local.cf

or maybe SA could detect a language (which is not trivial) and load appropriate
customruleset.

I see: Greek, German are active. Romanian is marked as active, but it is empty.

3) Problem with NightlyMassCheck. I like to set up my mail servers to reject
all spam before DATA, during the SMTP session. Therefore I reject most English
spam from zombies, leaving some 419ers and lots of Polish spam, which makes
that 99% figure not exactly true for me. I have /var/amavis/quarantine with the
remaining pile of spam, and the majority of it is Polish. I'm not allowed to
peek at other users' mailboxes and I do not receive much spam myself.
Altogether, this makes NightlyMassCheck unusable for me.

Therefore my scores are like GTUBE/EICAR.
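
The "preload_rulesets" line in point 2 is a proposed directive, not an existing one. A rough approximation with the existing "include" configuration directive could look like the sketch below, assuming the per-language rule files are kept in a directory that SpamAssassin does not load automatically. The directory and file names are hypothetical.

```
# local.cf: the postmaster opts in to language-specific rule files.
# /usr/local/share/sa-lang/ is a made-up location outside the directories
# that SpamAssassin scans on its own.
include /usr/local/share/sa-lang/pl.cf
include /usr/local/share/sa-lang/de.cf
include /usr/local/share/sa-lang/gr.cf
```

Unlike the proposal, this does nothing to distribute or update the files; it only lets the administrator choose which ones are read.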

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #8 from Darxus <Da...@ChaosReigns.com> 2011-10-13 19:18:13 UTC ---
Just so others don't need to dig up these links:
http://wiki.apache.org/spamassassin/CustomRulesets lists a "Polish Language
Ruleset" at http://wiki.apache.org/spamassassin/BodyTestsPl .

(In reply to comment #6)
> Perhaps the custom ruleset web page should be subdivided into two specific
> sections:  Languages other than English, and "other" collections.  I also note

Looks like it.  Go for it.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #16 from Karsten Bräckelmann <gu...@rudersport.de> 2011-10-27 00:49:55 UTC ---
(In reply to comment #15)
> OK. Let me explain using an example.
> 
> While testing 419 emails SA accumulates score from rules like:
> LOTTO_AGENT+MONEY_FRAUD_3+ADVANCE_FEE_3_NEW_MONEY+ADVANCE_FEE_4_NEW+... (many
> more). And the cumulative score is usually above kill level. And this is
> exactly what I expect from SA - to kill.

The operative word here is "cumulative". Many rules, not a single one.
Precisely what SA and a scoring system in general is about.

> Therefore, if I want spam in Polish emails to be killed, I have to set the
> score like the EICAR/GTUBE tests. I have only one bullet (rule) to kill with,
> and I want this bullet to kill, not to wound.

GTUBE has a score of 1000 -- for the sole purpose of outweighing *any* other
rules. It is a *test-point*, not a production rule for catching spam.
Fortunately, you are wrong and did not set the scores as high as GTUBE's.

> If you want me to set SCORE=1 then my rules will be wasting CPU cycles because
> the cumulative score will be much lower than $sa_tag2_level_deflt, not to
> mention $sa_kill_level_deflt (amavis variables).

I don't think you understand why amavis even has more than one such level...

And no one told you to set the scores to 1. We told you scores of 5 or even 10
definitely are bad. (Unless deliberately set by the admin.)

Moreover, the most important point was the uridnsbl rules, and their
requirement for a local rbldnsd. As for your rather strict (read: safe) body
rules from your original report in comment 0, IMHO it is likely safe to use a
score >1. Though 20 is not.
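
As an illustration of scoring below the threshold, one option is to give each phrase rule a moderate score and add a meta rule that contributes more only when several distinct phrases appear together. This is a sketch only: LEMAT_MULTI is a hypothetical rule name, and every number is a placeholder rather than a tested value.

```
# Placeholder scores, each below the default threshold of 5.
score LEMAT_27        3.0
score LEMAT_10        3.0

# Hypothetical meta rule: fires only when a legal-reference phrase and an
# address-harvester phrase both match in the same message.
meta     LEMAT_MULTI  (LEMAT_27 && LEMAT_10)
describe LEMAT_MULTI  Several Polish spam-disclaimer phrases in one message
score    LEMAT_MULTI  1.5
```

With this layout a single phrase still needs support from other rules to cross the threshold, while a message carrying two of them does not.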


> I have just commented out the scores and rbl.tld rules. And I believe I gave
> enough explanation of how the file should be used.

Thanks!

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #4 from AXB <ax...@gmail.com> 2011-10-13 17:09:41 UTC ---
(In reply to comment #3)
> So you actually think it's appropriate for spamassassin to only be useful in
> English?  That seems pretty wrong to me.

When 99% of the spam is in English I don't see the problem.
or do we want to impose 100 language rulesets on ppl who don't need them?

> Although it's a good point that without any Polish data coming in through
> masscheck we couldn't use these in the default set.

SA is a framework with a basic set of rules. These work for most ppl for pretty
decent default spam detection.
Those who need more can find rules via third party sa-update channels or
download links in the Wiki.

Same applies to obscure RBLs, etc.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

Darxus <Da...@ChaosReigns.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #10 from Darxus <Da...@ChaosReigns.com> 2011-10-26 22:18:02 UTC ---
Lemat added a "Polish Language Ruleset 2" to
http://wiki.apache.org/spamassassin/CustomRulesets
I expect that's as far as this will go.  Closing.

(In reply to comment #4)
> When 99% of the spam is in English I don't see the problem.
> or do we want to impose 100 language rulesets on ppl who don't need them?

I believe the majority of the spam I receive that SA misses is not English.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

Marcin <bu...@mejor.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bug@mejor.pl

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #15 from Lemat <le...@lemat.priv.pl> 2011-10-27 00:09:18 UTC ---
OK. Let me explain using an example.

While testing 419 emails SA accumulates score from rules like:
LOTTO_AGENT+MONEY_FRAUD_3+ADVANCE_FEE_3_NEW_MONEY+ADVANCE_FEE_4_NEW+... (many
more). And the cumulative score is usually above kill level. And this is
exactly what I expect from SA - to kill.

For Polish spam, almost none of the standard SA rules match. Below is an
example from the most recent Polish spam:

BAYES_99+SPF_PASS+HTML_MESSAGE+MIME_HTML_ONLY+MISSING_MID+FORGED_OUTLOOK_HTML+LEMAT_27

Therefore, if I want spam in Polish emails to be killed, I have to set the
score like the EICAR/GTUBE tests. I have only one bullet (rule) to kill with,
and I want this bullet to kill, not to wound.

If you want me to set SCORE=1 then my rules will be wasting CPU cycles because
the cumulative score will be much lower than $sa_tag2_level_deflt, not to
mention $sa_kill_level_deflt (amavis variables).

I have just commented out the scores and rbl.tld rules. And I believe I gave
enough explanation of how the file should be used.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

Lemat <le...@lemat.priv.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #13 from Lemat <le...@lemat.priv.pl> 2011-10-26 22:41:29 UTC ---
(In reply to comment #11)

> OUCH! This is NOT "pretty" and should be removed

please elaborate...

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #7 from Darxus <Da...@ChaosReigns.com> 2011-10-13 19:14:46 UTC ---
(In reply to comment #5)
> 1) I can start maintaining these rules as CustomRuleset, I see that "Polish
> Language Ruleset" is empty and Status is "?". I just need to know what shall I
> do.

That would probably be great.  

At the top of the wiki page, click "Login", then "you can create one now" to
create a wiki account, then email your wiki username to
dev@spamassassin.apache.org to request write access.  

The Polish language rule set is here:
http://svn.apache.org/repos/asf/spamassassin/branches/3.1/rules/25_body_tests_pl.cf

It looks like the download link was broken by somebody disabling the ability to
attach files to the wiki.  It might be best to just copy the entire contents of
that 25_body_tests_pl.cf file onto that wiki page and start editing from there. 

Let us know if you need any help.

> 2) I'm thinking that maybe SA rules could be packaged with country-specific
> customrulesets and the postmaster would decide which rulesets are used,
> something like:
> 
> preload_rulesets pl de gr
> in local.cf

Sounds nice to me.

> 3) Problem with NightlyMassCheck. I like to set up my mail servers to reject
> all spam before DATA, during the SMTP session. Therefore I reject most English
> spam from zombies, leaving some 419ers and lots of Polish spam, which makes
> that 99% figure not exactly true for me. I have /var/amavis/quarantine with the
> remaining pile of spam, and the majority of it is Polish. I'm not allowed to
> peek at other users' mailboxes and I do not receive much spam myself.
> Altogether, this makes NightlyMassCheck unusable for me.

I believe your situation is not uncommon for people who are contributing via
masscheck.  Some people only provide non-spam.  So your data would still be
useful to us.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #11 from AXB <ax...@gmail.com> 2011-10-26 22:28:44 UTC ---
(In reply to comment #10)
> Lemat added a "Polish Language Ruleset 2" to
> http://wiki.apache.org/spamassassin/CustomRulesets
> I expect that's as far as this will go.  Closing.
> 
> (In reply to comment #4)
> > When 99% of the spam is in English I don't see the problem.
> > or do we want to impose 100 language rulesets on ppl who don't need them?
> 
> I believe the majority of the spam I receive that SA misses is not English.

OUCH! This is NOT "pretty" and should be removed

http://lemat.priv.pl/pliki/sa_body_test_pl.cf

header LEMAT_CHIKOR     eval:check_rbl_txt('chikor.rbl.tld', 'chikor.rbl.tld.')
describe LEMAT_CHIKOR   chikor
score LEMAT_CHIKOR      5

uridnssub       URIBL_TLD2        dynamic.rbl.tld.       A      127.0.0.2
body            URIBL_TLD2        eval:check_uridnsbl('URIBL_TLD2')
describe        URIBL_TLD2        Contains an URL listed in the dynamic.rbl.tld blocklist
tflags          URIBL_TLD2        net
reuse           URIBL_TLD2

uridnssub       URIBL_TLD3        chikor.rbl.tld.       A       127.0.0.3
body            URIBL_TLD3        eval:check_uridnsbl('URIBL_TLD3')
describe        URIBL_TLD3        Contains an URL listed in the chikor.rbl.tld (China) blocklist
tflags          URIBL_TLD3        net
reuse           URIBL_TLD3

uridnssub       URIBL_TLD4        chikor.rbl.tld.       A       127.0.0.4
body            URIBL_TLD4        eval:check_uridnsbl('URIBL_TLD4')
describe        URIBL_TLD4        Contains an URL listed in the chikor.rbl.tld (Korea) blocklist
tflags          URIBL_TLD4        net
reuse           URIBL_TLD4

uridnssub       URIBL_TLD5        chikor.rbl.tld.       A       127.0.0.5
body            URIBL_TLD5        eval:check_uridnsbl('URIBL_TLD5')
describe        URIBL_TLD5        Contains an URL listed in the chikor.rbl.tld (Misc) blocklist
tflags          URIBL_TLD5        net
reuse           URIBL_TLD5

score URIBL_TLD2 2.0
score URIBL_TLD3 4.0
score URIBL_TLD4 4.0
score URIBL_TLD5 10.0

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #18 from Darxus <Da...@ChaosReigns.com> 2011-10-27 01:42:00 UTC ---
I think in the absence of enough other rules to accumulate to push an email
over the threshold, it makes plenty of sense to use a blacklist with a single
rule that alone is over the threshold.  Not an ideal situation, but if it's
your only way to effectively block spam, and you can do it without causing a
problematic false positive rate, go for it.


Lemat changing the status of this bug back to "fixed" when posting comment #13
was weird.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

Darxus <Da...@ChaosReigns.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

--- Comment #12 from Darxus <Da...@ChaosReigns.com> 2011-10-26 22:38:19 UTC ---
# Section requires local rbldnsd below with zones from
http://lemat.priv.pl/pliki/tld.gz
# Below ¼ odpalonego First section requires the file locally rbldnsd
http://lemat.priv.pl/pliki/tld.gz

Huh.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #6 from D. Stussy <so...@kd6lvw.ampr.org> 2011-10-13 19:00:21 UTC ---
I concur:  A custom ruleset with its own update channel is the way to go for
all non-English languages (with very limited exceptions).

The most common exception is when English-speaking users are spammed in
other languages.  As I'm in the U.S., I've seen spam in Spanish, German,
Chinese, and Japanese, but not other languages.  There already exist
Chinese, French, German, Greek, and Japanese SA channels, and a prior Polish
ruleset from 2005.

Perhaps the custom ruleset web page should be subdivided into two specific
sections:  Languages other than English, and "other" collections.  I also note
that we could add a "SA channel" and PGP key fields to the list, where
available (or leave that for the "more information" link).

http://wiki.apache.org/spamassassin/ContributingNewRules should also point to
the custom ruleset page.  It doesn't.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

Darxus <Da...@ChaosReigns.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Darxus@ChaosReigns.com

--- Comment #1 from Darxus <Da...@ChaosReigns.com> 2011-10-13 16:34:50 UTC ---
Seems like these should be added to the default rule set?  Lots of language
specific rules in there already, mostly English.

Lemat, we could always use more non-English corpora for rule checking and score
generation via masscheck:  http://wiki.apache.org/spamassassin/NightlyMassCheck

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #19 from Karsten Bräckelmann <gu...@rudersport.de> 2011-10-27 02:05:06 UTC ---
(In reply to comment #18)
> I think in the absence of enough other rules to accumulate to push an email
> over the threshold, it makes plenty of sense to use a blacklist with a single
> rule that alone is over the threshold.  Not an ideal situation, but if it's
> your only way to effectively block spam, and you can do it without causing a
> problematic false positive rate, go for it.

If $admin does it on his server, and knows the blacklist, sure. If you are
publishing rules for others, your responsibilities are much greater.

Also, again, this is about SCORING, thus pushing the score above the SA default
threshold of 5. Aiming at 15 or 20 is something different.

classified spam != SMTP reject

> Lemat changing the status of this bug back to "fixed" when posting comment #13
> was weird.

Dunno how that happened, but keeping a bug open in the browser and reloading
(without shift) tends to preserve the drop-down boxes' state -- and thus
reverts changes with the next comment.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #14 from Karsten Bräckelmann <gu...@rudersport.de> 2011-10-26 22:54:15 UTC ---
> > OUCH! This is NOT "pretty" and should be removed
> 
> please elaborate...

(a) It violates SA conventions and best-practices by using ridiculously high
scores. In a scoring system like SA, no single rule should score above the
default threshold.

(b) The URI DNSBL lookups will fail with this rule-set out of the box, since it
requires a local rbldnsd.

I strongly suggest wrapping that part in an "if(0) ... endif" block by default,
and having the admin explicitly enable it, IFF the local rbldnsd has been set up.
With some additional, verbose explanation.
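
A minimal sketch of that wrapper, using one of the URIBL rules quoted in comment 11 and its score from the same file. The comment text is illustrative wording, and the suggestion to flip the condition to "if (1)" is an assumption about how an admin would enable the block.

```
# DISABLED BY DEFAULT.
# The lookups below only work against a local rbldnsd that serves the
# rbl.tld zones. After setting that up, change "if (0)" to "if (1)".
if (0)
uridnssub  URIBL_TLD2  dynamic.rbl.tld.  A  127.0.0.2
body       URIBL_TLD2  eval:check_uridnsbl('URIBL_TLD2')
describe   URIBL_TLD2  Contains an URL listed in the dynamic.rbl.tld blocklist
tflags     URIBL_TLD2  net
score      URIBL_TLD2  2.0
endif
```

Everything between "if (0)" and "endif" is skipped when the configuration is read, so the file works out of the box on systems without the local DNS zone.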

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #17 from Karsten Bräckelmann <gu...@rudersport.de> 2011-10-27 01:18:46 UTC ---
Lemat, parts of my previous comment 16 may sound harsher than intended, sorry.
That was based on my initial mis-understanding of your last paragraph. I did
remove my rant and adjusted the other comments, after I checked your custom
rule-set again.

Your contribution, especially the commonly used Polish phrases in spam to make
it look legit, is much appreciated. And since the audience of bugzilla (and
dev@) is rather limited, you might even want to announce your Polish rule-set
to the users@ list, providing a link to the wiki page. That should reach more
users and admins interested in Polish specific rules -- and might get you some
feedback to refine the rules further.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #3 from Darxus <Da...@ChaosReigns.com> 2011-10-13 16:59:04 UTC ---
So you actually think it's appropriate for spamassassin to only be useful in
English?  That seems pretty wrong to me.

Although it's a good point that without any Polish data coming in through
masscheck we couldn't use these in the default set.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

--- Comment #2 from AXB <ax...@gmail.com> 2011-10-13 16:51:22 UTC ---
(In reply to comment #1)
> Seems like these should be added to the default rule set?  Lots of language
> specific rules in there already, mostly English.

Imo, such specific regional rules should be a separate, OPTIONAL rule set
under

http://wiki.apache.org/spamassassin/CustomRulesets

So anybody who needs them can add them.
In the default ruleset they would add little value for most users.

Ideally some volunteer would start language based sa-update channels.. ideally.

Without a masscheck / Polish corpus the scores stand no chance anyway so
CustomRulesets is the ideal place to go.

[Bug 6674] new rules for polish users

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6674

Adam Katz <an...@khopis.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |antispam@khopis.com

--- Comment #9 from Adam Katz <an...@khopis.com> 2011-10-13 21:15:04 UTC ---
(In reply to comment #5)
> 1) I can start maintaining these rules as CustomRuleset, I see that "Polish
> Language Ruleset" is empty and Status is "?". I just need to know what shall I
> do.

Segregating rulesets by language is generally a bad idea because it limits
visibility (FPs get minimized and ignored) and it becomes impossible to
maintain.  There is nothing wrong with this approach if it is not a part of the
main project, say as an sa-update channel.

> or maybe SA could detect a language (which is not trivial) and load
> appropriate customruleset.
> 
> I see: Greek, German are active. Romanian is marked as active, but it is
> empty.

Language detection with TextCat is awful.  It's better than nothing, but it is
frequently wrong.

> 2) I'm thinking that maybe SA rules could be packaged with country-specific
> customrulesets and the postmaster would decide which rulesets are used,
> something like:
> 
> preload_rulesets pl de gr
> in local.cf

I believe language-specific rulesets are already possible in SA via locale
support (though note you can currently only have one locale).  Though I've
never tried it, you can conceivably write rules like this:

  lang pl body PL_FOO /\btawerna\b/i

PL_FOO would then only be run if the system locale is Polish.  This is
currently only used for "describe" lines.
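
For the "describe" case, a minimal sketch might look as follows, assuming an English default description with a Polish override. The English wording is an illustrative translation, not text from the submitted ruleset.

```
# Default description, shown under any locale without an override.
describe LEMAT_27          Request for consent to receive spam
# Override used only when the locale is Polish.
lang pl describe LEMAT_27  Pytanie o zgode na otrzymywanie spamu
```

This only localizes the report text; the rule itself still runs for every message regardless of locale.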

However, I'd rather see this implemented as channels.

If we wanted to get more specific, I'd say the channels should be vetted
through mass-check (as my channels are), so that rules good enough to be
mainstream can be automatically promoted.  It should be noted that the current
ruleqa system with its current corpora is not at all set up to properly
evaluate rule efficacy for Polish language mail and would do an awful job.
