Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2015/04/09 02:22:38 UTC

[Bug 5590] Scantime is very long on certain messages unless "use bytes" hack is used

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5590

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #58 from Mark Martinec <Ma...@ijs.si> ---
I'm tentatively closing this issue.

There has been extensive work in 3.4.1 to stay consistent with
byte semantics (e.g. using UTF-8 encodings) in decoding, HTML decoding,
Bayes tokenization, etc., so the original concern has either been
resolved or avoided by now.

In the future, the remaining 'use bytes' pragmas should probably be
removed, as they are unnecessary when we can ensure that
both the text and the regular expressions use byte semantics
(i.e. have the utf8 flag off), which was not always the case in
the past. Alternatively, we can start considering a switch to
character semantics throughout. In any case, it's probably
best to open a new ticket when the time comes.
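
For illustration only, here is a minimal Perl sketch of the distinction
between byte and character semantics mentioned above. It is not taken
from the SpamAssassin code; the sample string and all variable names are
assumptions chosen for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: the two UTF-8 octets of U+00E9, as they would
# arrive in a raw message body.
my $octets = "\xC3\xA9";

# Byte semantics: utf8 flag off, every octet is one element.
my $octets_flag  = utf8::is_utf8($octets) ? 1 : 0;     # 0
my $octets_len   = length($octets);                    # 2
my $octets_1char = ($octets =~ /\A.\z/s) ? 1 : 0;      # 0: '.' matches one octet

# Character semantics: decoding gives one character, utf8 flag on.
my $chars       = decode('UTF-8', $octets);
my $chars_flag  = utf8::is_utf8($chars) ? 1 : 0;       # 1
my $chars_len   = length($chars);                      # 1
my $chars_1char = ($chars =~ /\A.\z/s) ? 1 : 0;        # 1: '.' matches one character

# The 'use bytes' pragma makes length() count the internal octets,
# even for a string that has the utf8 flag on.
my $forced_len = do { use bytes; length($chars) };     # 2

# Encoding back to UTF-8 yields octets with the utf8 flag off again.
my $roundtrip = encode('UTF-8', $chars);

print "octets: flag=$octets_flag len=$octets_len\n";
print "chars:  flag=$chars_flag len=$chars_len (use bytes: $forced_len)\n";
```

If both the text and the patterns are kept as plain octets, as in the
first half of the sketch, the pragma changes nothing, which is why it
becomes redundant.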
