Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2011/05/01 19:05:47 UTC

[Bug 6229] [review] TextCat is too case sensitive

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6229

Henrik Krohns <he...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hege@hege.li
   Target Milestone|Undefined                   |3.3.2
            Summary|TextCat is too case         |[review] TextCat is too
                   |sensitive                   |case sensitive

--- Comment #1 from Henrik Krohns <he...@hege.li> 2011-05-01 17:05:47 UTC ---
Please vote on my patch above, and on whether to apply it to 3.3 as well (+1
from me).

It's obvious that no one is going to rewrite this for Unicode support soon (it
would probably need extensive changes, and I'm no guru in that area). Also,
unless someone is willing to spend time remaking the database, this is the
easier workaround.

I did lots of corpus testing, and the patch doesn't seem to break anything; on
the contrary, it fixes many cases. The positives most likely far outweigh any
possible negatives. I think [a-z]{4} is pretty safe for matching Latin-1
languages.

Feel free to test it if you have corpora in less common languages (Warren,
etc.?). I can supply some Perl scripts for testing if needed.
