Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2015/08/06 19:52:56 UTC
[Bug 7232] New: Getting rid of 'use bytes' crouches throughout
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
Bug ID: 7232
Summary: Getting rid of 'use bytes' crouches throughout
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Libraries
Assignee: dev@spamassassin.apache.org
Reporter: Mark.Martinec@ijs.si
I'd like to comment out (or delete) the 'use bytes' in all modules,
in preparation for more sensible internal use of Unicode.
So far the historical use of 'use bytes' has already bitten us
at least twice (Bug 7215 and in Bayes tokenization a few months ago).
It is sprinkled all over the place, even though it may have been
needed in only a couple of places.
The 'bytes' pragma man page says:
NAME
bytes - Perl pragma to force byte semantics rather than character
semantics
NOTICE
This pragma reflects early attempts to incorporate Unicode into perl
and has since been superseded. It breaks encapsulation (i.e. it exposes
the innards of how the perl executable currently happens to store a
string), and use of this module for anything other than debugging
purposes is strongly discouraged. If you feel that the functions here
within might be useful for your application, this possibly indicates a
mismatch between your mental model of Perl Unicode and the current
reality. In that case, you may wish to read some of the perl Unicode
documentation: perluniintro, perlunitut, perlunifaq and perlunicode.
Its use affects functions ord, chr, length, substr, index, rindex.
If there is ever a need to convert Unicode into UTF-8 octets,
it should be done explicitly, e.g. through utf8::encode($s),
possibly conditionalized by: if utf8::is_utf8($s)
I believe this explicit encoding has already been done in most
cases where it was necessary. Nevertheless we should keep an eye open
for corner cases which may pop up.
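
To make the difference concrete, here is a minimal sketch (not taken from the SpamAssassin code) contrasting length() under 'use bytes' with the explicit utf8::encode approach suggested above. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: "caf", U+00E9, a space and U+20AC (euro sign).
# The code point above 255 forces Perl to store the string as UTF-8.
my $s = "caf\x{e9} \x{20ac}";

# Character semantics: six characters.
my $chars = length($s);

# Byte semantics, lexically scoped: counts the internal storage bytes.
my $bytes_len;
{
    use bytes;
    $bytes_len = length($s);
}

# Explicit alternative from the report: encode a copy to UTF-8 octets,
# conditionalized on the string being flagged as characters.
my $octets = $s;
utf8::encode($octets) if utf8::is_utf8($octets);
my $octet_len = length($octets);

print "$chars $bytes_len $octet_len\n";   # prints "6 9 9"
```

After the explicit encode, $octets is an ordinary byte string, so the code no longer depends on how perl happens to store $s internally.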
The patch is purely mechanical:
$ perl -i -pe 's/^(\s*)use\s+bytes\s*;/$1# use bytes;/'
and can be easily reverted if necessary.
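
As an illustration of what that substitution does, the sketch below applies the same regex to a few sample lines in memory. The sample lines are hypothetical; only the pattern comes from the command above.

```perl
use strict;
use warnings;

# Hypothetical input lines, not taken from any SpamAssassin module.
my @lines = (
    "use bytes;\n",
    "  use bytes ;\n",
    "use strict;\n",
    "# use bytes;\n",
);

# Apply the substitution from the report to a copy of each line.
my @patched = map {
    (my $copy = $_) =~ s/^(\s*)use\s+bytes\s*;/$1# use bytes;/;
    $copy;
} @lines;

print @patched;
```

The first two lines come out commented with their indentation preserved, while the 'use strict' line and the already-commented line are left untouched, so running the substitution a second time changes nothing.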
All tests pass (5.22.0 and 5.8.9). In the couple of hours that
I have been running this code (with charset normalization enabled)
I haven't noticed anything unusual (like warnings or changes in
Bayes tokenization). There is also no change/slowdown in timing,
but that's expected as rules are still (mostly?) not yet exposed
to Unicode.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7232] Getting rid of 'use bytes' crutches throughout
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
RW <rw...@googlemail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rwmaillists@googlemail.com
--- Comment #8 from RW <rw...@googlemail.com> ---
I estimate that there are about 150 regular expression rules that make use of
byte values, either directly or via 35 of the templates. This isn't counting
meta-rules that depend on them.
Dropping 'use bytes' probably won't cause any FPs from these rules, it will just
cause their TP rates to degrade unobtrusively to varying degrees.
Does rule QA have anything that could be used to see the overall effect of a
change like this?
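
A minimal sketch of how such a silent miss could happen, assuming the message text reaches a rule as decoded characters rather than as UTF-8 octets. The patterns and sample text are hypothetical and are not actual SpamAssassin rules.

```perl
use strict;
use warnings;

# Hypothetical rule written against raw UTF-8 octets: the bytes
# 0xC3 0xA9 are the UTF-8 encoding of U+00E9.
my $byte_rule = qr/caf\xc3\xa9/;

# The same text as undecoded octets and as decoded characters.
my $octets = "caf\xc3\xa9";
my $chars  = $octets;
utf8::decode($chars);   # in place: now c, a, f, U+00E9

my $hit_on_octets = $octets =~ $byte_rule ? 1 : 0;   # matches
my $hit_on_chars  = $chars  =~ $byte_rule ? 1 : 0;   # silently misses

# A pattern written against code points matches the decoded text.
my $char_rule = qr/caf\x{e9}/;
my $char_hit  = $chars =~ $char_rule ? 1 : 0;

print "$hit_on_octets $hit_on_chars $char_hit\n";   # prints "1 0 1"
```

The byte-oriented pattern raises no warning and produces no false positive on the decoded text; it simply stops hitting, which is the kind of quiet TP degradation described above.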
[Bug 7232] Getting rid of 'use bytes' crutches throughout
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
--- Comment #9 from Karsten Bräckelmann <gu...@rudersport.de> ---
(In reply to Bill Cole from comment #7)
> We know it basically works, as it has been in trunk for 3 years & 3.4 for
> over a year, and both are being used in production. We know it's the right
> direction.
Agreed.
[Bug 7232] Getting rid of 'use bytes' crouches throughout
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
Kevin A. McGrail <km...@apache.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.0.0 |3.4.2
Severity|enhancement |blocker
Status|RESOLVED |REOPENED
Resolution|FIXED |---
--- Comment #6 from Kevin A. McGrail <km...@apache.org> ---
Do we want to remove this patch from 3.4.2?
[Bug 7232] Getting rid of 'use bytes' crouches throughout
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
Mark Martinec <Ma...@ijs.si> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|Undefined |4.0.0
[Bug 7232] Getting rid of 'use bytes' crutches throughout
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
Kevin A. McGrail <km...@apache.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|REOPENED |RESOLVED
--- Comment #10 from Kevin A. McGrail <km...@apache.org> ---
RW, what rules are broken by use bytes going away, please?
Closing this ticket as keeping in 3.4.2
[Bug 7232] Getting rid of 'use bytes' crouches throughout
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
Bill Cole <bi...@apache.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |billcole@apache.org
--- Comment #7 from Bill Cole <bi...@apache.org> ---
(In reply to Kevin A. McGrail from comment #6)
> Do we want to remove this patch from 3.4.2?
I vote KEEP.
We know it basically works, as it has been in trunk for 3 years & 3.4 for over
a year, and both are being used in production. We know it's the right
direction. We don't want to support the byte-imposed botches needed to catch
non-ascii characters any longer than absolutely necessary.
[Bug 7232] Getting rid of 'use bytes' crouches throughout
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
--- Comment #1 from Mark Martinec <Ma...@ijs.si> ---
trunk:
Sending lib/Mail/SpamAssassin/ArchiveIterator.pm
Sending lib/Mail/SpamAssassin/AsyncLoop.pm
Sending lib/Mail/SpamAssassin/AutoWhitelist.pm
Sending lib/Mail/SpamAssassin/Bayes/CombineChi.pm
Sending lib/Mail/SpamAssassin/Bayes/CombineNaiveBayes.pm
Sending lib/Mail/SpamAssassin/Bayes.pm
Sending lib/Mail/SpamAssassin/BayesStore/BDB.pm
Sending lib/Mail/SpamAssassin/BayesStore/DBM.pm
Sending lib/Mail/SpamAssassin/BayesStore/MySQL.pm
Sending lib/Mail/SpamAssassin/BayesStore/PgSQL.pm
Sending lib/Mail/SpamAssassin/BayesStore/Redis.pm
Sending lib/Mail/SpamAssassin/BayesStore/SDBM.pm
Sending lib/Mail/SpamAssassin/BayesStore/SQL.pm
Sending lib/Mail/SpamAssassin/BayesStore.pm
Sending lib/Mail/SpamAssassin/Conf/LDAP.pm
Sending lib/Mail/SpamAssassin/Conf/Parser.pm
Sending lib/Mail/SpamAssassin/Conf/SQL.pm
Sending lib/Mail/SpamAssassin/Conf.pm
Sending lib/Mail/SpamAssassin/DBBasedAddrList.pm
Sending lib/Mail/SpamAssassin/Dns.pm
Sending lib/Mail/SpamAssassin/DnsResolver.pm
Sending lib/Mail/SpamAssassin/Locales.pm
Sending lib/Mail/SpamAssassin/Locker/Flock.pm
Sending lib/Mail/SpamAssassin/Locker/UnixNFSSafe.pm
Sending lib/Mail/SpamAssassin/Locker/Win32.pm
Sending lib/Mail/SpamAssassin/Locker.pm
Sending lib/Mail/SpamAssassin/Logger/File.pm
Sending lib/Mail/SpamAssassin/Logger/Stderr.pm
Sending lib/Mail/SpamAssassin/Logger/Syslog.pm
Sending lib/Mail/SpamAssassin/Logger.pm
Sending lib/Mail/SpamAssassin/MailingList.pm
Sending lib/Mail/SpamAssassin/Message/Metadata/Received.pm
Sending lib/Mail/SpamAssassin/Message/Metadata.pm
Sending lib/Mail/SpamAssassin/NetSet.pm
Sending lib/Mail/SpamAssassin/PerMsgLearner.pm
Sending lib/Mail/SpamAssassin/PersistentAddrList.pm
Sending lib/Mail/SpamAssassin/Plugin/AWL.pm
Sending lib/Mail/SpamAssassin/Plugin/AccessDB.pm
Sending lib/Mail/SpamAssassin/Plugin/AntiVirus.pm
Sending lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm
Sending lib/Mail/SpamAssassin/Plugin/Bayes.pm
Sending lib/Mail/SpamAssassin/Plugin/BodyEval.pm
Sending lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending lib/Mail/SpamAssassin/Plugin/DCC.pm
Sending lib/Mail/SpamAssassin/Plugin/DKIM.pm
Sending lib/Mail/SpamAssassin/Plugin/DNSEval.pm
Sending lib/Mail/SpamAssassin/Plugin/HTMLEval.pm
Sending lib/Mail/SpamAssassin/Plugin/HTTPSMismatch.pm
Sending lib/Mail/SpamAssassin/Plugin/Hashcash.pm
Sending lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
Sending lib/Mail/SpamAssassin/Plugin/ImageInfo.pm
Sending lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
Sending lib/Mail/SpamAssassin/Plugin/MIMEHeader.pm
Sending lib/Mail/SpamAssassin/Plugin/NetCache.pm
Sending lib/Mail/SpamAssassin/Plugin/P595Body.pm
Sending lib/Mail/SpamAssassin/Plugin/PDFInfo.pm
Sending lib/Mail/SpamAssassin/Plugin/Pyzor.pm
Sending lib/Mail/SpamAssassin/Plugin/RabinKarpBody.pm
Sending lib/Mail/SpamAssassin/Plugin/Razor2.pm
Sending lib/Mail/SpamAssassin/Plugin/RelayCountry.pm
Sending lib/Mail/SpamAssassin/Plugin/RelayEval.pm
Sending lib/Mail/SpamAssassin/Plugin/ReplaceTags.pm
Sending lib/Mail/SpamAssassin/Plugin/Reuse.pm
Sending lib/Mail/SpamAssassin/Plugin/Rule2XSBody.pm
Sending lib/Mail/SpamAssassin/Plugin/SPF.pm
Sending lib/Mail/SpamAssassin/Plugin/Shortcircuit.pm
Sending lib/Mail/SpamAssassin/Plugin/SpamCop.pm
Sending lib/Mail/SpamAssassin/Plugin/Test.pm
Sending lib/Mail/SpamAssassin/Plugin/TextCat.pm
Sending lib/Mail/SpamAssassin/Plugin/TxRep.pm
Sending lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending lib/Mail/SpamAssassin/Plugin/URIDetail.pm
Sending lib/Mail/SpamAssassin/Plugin/URIEval.pm
Sending lib/Mail/SpamAssassin/Plugin/URILocalBL.pm
Sending lib/Mail/SpamAssassin/Plugin/WLBLEval.pm
Sending lib/Mail/SpamAssassin/Plugin/WhiteListSubject.pm
Sending lib/Mail/SpamAssassin/Plugin.pm
Sending lib/Mail/SpamAssassin/PluginHandler.pm
Sending lib/Mail/SpamAssassin/RegistryBoundaries.pm
Sending lib/Mail/SpamAssassin/Reporter.pm
Sending lib/Mail/SpamAssassin/SQLBasedAddrList.pm
Sending lib/Mail/SpamAssassin/SpamdForkScaling.pm
Sending lib/Mail/SpamAssassin/SubProcBackChannel.pm
Sending lib/Mail/SpamAssassin/Timeout.pm
Sending lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Sending lib/Mail/SpamAssassin/Util/MemoryDump.pm
Sending lib/Mail/SpamAssassin/Util/Progress.pm
Sending lib/Mail/SpamAssassin/Util/RegistrarBoundaries.pm
Sending lib/Mail/SpamAssassin/Util/ScopedTimer.pm
Sending lib/Mail/SpamAssassin/Util.pm
Sending lib/Mail/SpamAssassin.pm
Sending sa-learn.raw
Sending spamassassin.raw
Committed revision 1694545.
[Bug 7232] Getting rid of 'use bytes' crouches throughout
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
--- Comment #4 from Mark Martinec <Ma...@ijs.si> ---
That was a biggie for backporting - not in patch size, but in potential
implications.
I hope older perls will be happy with introducing more Unicode strings in
processing.
The change is well tested in trunk and solves a couple of issues, but it is
quite deep reaching and required compensating for the change in places, so my
intention was to target it for 4.0, not with a minor release.
Anyway, my +0.5 for 3.4.
[Bug 7232] Getting rid of 'use bytes' crouches throughout
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
--- Comment #5 from Kevin A. McGrail <km...@pccc.com> ---
Understood. We had two people look at it and I did testing on 5.8.6 on an old
box and 5.16.3 if it makes you feel better. I'm at $dayjob right now but will
make sure to double check this tonight.
[Bug 7232] Getting rid of 'use bytes' crouches throughout
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
Kevin A. McGrail <km...@pccc.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
CC| |kmcgrail@pccc.com
Status|NEW |RESOLVED
--- Comment #3 from Kevin A. McGrail <km...@pccc.com> ---
Applying also to 3.4 branch and marking as resolved
svn commit -m 'KG: Syncing Trunk to 3.4:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232 removing use bytes'
Sending lib/Mail/SpamAssassin/ArchiveIterator.pm
Sending lib/Mail/SpamAssassin/AsyncLoop.pm
Sending lib/Mail/SpamAssassin/AutoWhitelist.pm
Sending lib/Mail/SpamAssassin/Bayes/CombineChi.pm
Sending lib/Mail/SpamAssassin/Bayes/CombineNaiveBayes.pm
Sending lib/Mail/SpamAssassin/Bayes.pm
Sending lib/Mail/SpamAssassin/BayesStore/BDB.pm
Sending lib/Mail/SpamAssassin/BayesStore/DBM.pm
Sending lib/Mail/SpamAssassin/BayesStore/MySQL.pm
Sending lib/Mail/SpamAssassin/BayesStore/PgSQL.pm
Sending lib/Mail/SpamAssassin/BayesStore/Redis.pm
Sending lib/Mail/SpamAssassin/BayesStore/SDBM.pm
Sending lib/Mail/SpamAssassin/BayesStore/SQL.pm
Sending lib/Mail/SpamAssassin/BayesStore.pm
Sending lib/Mail/SpamAssassin/Conf/LDAP.pm
Sending lib/Mail/SpamAssassin/Conf/Parser.pm
Sending lib/Mail/SpamAssassin/Conf/SQL.pm
Sending lib/Mail/SpamAssassin/Conf.pm
Sending lib/Mail/SpamAssassin/DBBasedAddrList.pm
Sending lib/Mail/SpamAssassin/Dns.pm
Sending lib/Mail/SpamAssassin/DnsResolver.pm
Sending lib/Mail/SpamAssassin/Locales.pm
Sending lib/Mail/SpamAssassin/Locker/Flock.pm
Sending lib/Mail/SpamAssassin/Locker/UnixNFSSafe.pm
Sending lib/Mail/SpamAssassin/Locker/Win32.pm
Sending lib/Mail/SpamAssassin/Locker.pm
Sending lib/Mail/SpamAssassin/Logger/File.pm
Sending lib/Mail/SpamAssassin/Logger/Stderr.pm
Sending lib/Mail/SpamAssassin/Logger/Syslog.pm
Sending lib/Mail/SpamAssassin/Logger.pm
Sending lib/Mail/SpamAssassin/MailingList.pm
Sending lib/Mail/SpamAssassin/Message/Metadata/Received.pm
Sending lib/Mail/SpamAssassin/Message/Metadata.pm
Sending lib/Mail/SpamAssassin/NetSet.pm
Sending lib/Mail/SpamAssassin/PerMsgLearner.pm
Sending lib/Mail/SpamAssassin/PersistentAddrList.pm
Sending lib/Mail/SpamAssassin/Plugin/AWL.pm
Sending lib/Mail/SpamAssassin/Plugin/AccessDB.pm
Sending lib/Mail/SpamAssassin/Plugin/AntiVirus.pm
Sending lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm
Sending lib/Mail/SpamAssassin/Plugin/Bayes.pm
Sending lib/Mail/SpamAssassin/Plugin/BodyEval.pm
Sending lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending lib/Mail/SpamAssassin/Plugin/DCC.pm
Sending lib/Mail/SpamAssassin/Plugin/DKIM.pm
Sending lib/Mail/SpamAssassin/Plugin/DNSEval.pm
Sending lib/Mail/SpamAssassin/Plugin/HTMLEval.pm
Sending lib/Mail/SpamAssassin/Plugin/HTTPSMismatch.pm
Sending lib/Mail/SpamAssassin/Plugin/Hashcash.pm
Sending lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
Sending lib/Mail/SpamAssassin/Plugin/ImageInfo.pm
Sending lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
Sending lib/Mail/SpamAssassin/Plugin/MIMEHeader.pm
Sending lib/Mail/SpamAssassin/Plugin/NetCache.pm
Sending lib/Mail/SpamAssassin/Plugin/P595Body.pm
Sending lib/Mail/SpamAssassin/Plugin/PDFInfo.pm
Sending lib/Mail/SpamAssassin/Plugin/Pyzor.pm
Sending lib/Mail/SpamAssassin/Plugin/RabinKarpBody.pm
Sending lib/Mail/SpamAssassin/Plugin/Razor2.pm
Sending lib/Mail/SpamAssassin/Plugin/RelayCountry.pm
Sending lib/Mail/SpamAssassin/Plugin/RelayEval.pm
Sending lib/Mail/SpamAssassin/Plugin/ReplaceTags.pm
Sending lib/Mail/SpamAssassin/Plugin/Reuse.pm
Sending lib/Mail/SpamAssassin/Plugin/Rule2XSBody.pm
Sending lib/Mail/SpamAssassin/Plugin/SPF.pm
Sending lib/Mail/SpamAssassin/Plugin/Shortcircuit.pm
Sending lib/Mail/SpamAssassin/Plugin/SpamCop.pm
Sending lib/Mail/SpamAssassin/Plugin/Test.pm
Sending lib/Mail/SpamAssassin/Plugin/TextCat.pm
Sending lib/Mail/SpamAssassin/Plugin/TxRep.pm
Sending lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending lib/Mail/SpamAssassin/Plugin/URIDetail.pm
Sending lib/Mail/SpamAssassin/Plugin/URIEval.pm
Sending lib/Mail/SpamAssassin/Plugin/URILocalBL.pm
Sending lib/Mail/SpamAssassin/Plugin/WLBLEval.pm
Sending lib/Mail/SpamAssassin/Plugin/WhiteListSubject.pm
Sending lib/Mail/SpamAssassin/Plugin.pm
Sending lib/Mail/SpamAssassin/PluginHandler.pm
Sending lib/Mail/SpamAssassin/RegistryBoundaries.pm
Sending lib/Mail/SpamAssassin/Reporter.pm
Sending lib/Mail/SpamAssassin/SQLBasedAddrList.pm
Sending lib/Mail/SpamAssassin/SpamdForkScaling.pm
Sending lib/Mail/SpamAssassin/SubProcBackChannel.pm
Sending lib/Mail/SpamAssassin/Timeout.pm
Sending lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Sending lib/Mail/SpamAssassin/Util/MemoryDump.pm
Sending lib/Mail/SpamAssassin/Util/Progress.pm
Sending lib/Mail/SpamAssassin/Util/RegistrarBoundaries.pm
Sending lib/Mail/SpamAssassin/Util/ScopedTimer.pm
Sending lib/Mail/SpamAssassin/Util.pm
Sending lib/Mail/SpamAssassin.pm
Transmitting file data
...........................................................................................
Committed revision 1790912.
[Bug 7232] Getting rid of 'use bytes' crouches throughout
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
--- Comment #2 from Mark Martinec <Ma...@ijs.si> ---
Works well, no need for 'use bytes' any longer, closing.
[Bug 7232] Getting rid of 'use bytes' crutches throughout
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232
Kevin A. McGrail <km...@apache.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Getting rid of 'use bytes' crouches throughout |Getting rid of 'use bytes' crutches throughout