Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2015/08/06 19:52:56 UTC

[Bug 7232] New: Getting rid of 'use bytes' crouches throughout

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

            Bug ID: 7232
           Summary: Getting rid of 'use bytes' crouches throughout
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Libraries
          Assignee: dev@spamassassin.apache.org
          Reporter: Mark.Martinec@ijs.si

I'd like to comment out (or delete) the 'use bytes' in all modules,
in preparation for a more sensible Unicode use internally.

So far the historical use of 'use bytes' has already bitten us
at least twice (Bug 7215 and in Bayes tokenization a few months ago).
It is sprinkled all over the place, even though it may have been
needed in only a couple of places.


The 'bytes' pragma man page says:


NAME
  bytes - Perl pragma to force byte semantics rather than character
  semantics

NOTICE
  This pragma reflects early attempts to incorporate Unicode into perl
  and has since been superseded. It breaks encapsulation (i.e. it exposes
  the innards of how the perl executable currently happens to store a
  string), and use of this module for anything other than debugging
  purposes is strongly discouraged. If you feel that the functions here
  within might be useful for your application, this possibly indicates a
  mismatch between your mental model of Perl Unicode and the current
  reality. In that case, you may wish to read some of the perl Unicode
  documentation: perluniintro, perlunitut, perlunifaq and perlunicode.


Its use affects functions ord, chr, length, substr, index, rindex.
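
As an illustration of the difference (a minimal sketch, not taken from the
SpamAssassin code; the sample string is an assumption), length() gives
different answers for the same string inside and outside the pragma's scope:

```perl
use strict;
use warnings;

# "\x{263A}" is one character (WHITE SMILING FACE); perl stores it
# internally as three UTF-8 octets.
my $s = "\x{263A}";

my $chars = length($s);       # character semantics: counts characters

my $octets;
{
    use bytes;                # lexically scoped: only affects this block
    $octets = length($s);     # byte semantics: counts internal octets
}

printf "chars=%d octets=%d\n", $chars, $octets;   # chars=1 octets=3
```

The same split applies to ord, chr, substr, index and rindex, which is why
a stray 'use bytes' can silently change results once strings start
carrying characters rather than octets.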

If there is ever a need to convert Unicode into UTF-8 octets,
it should be done explicitly, e.g. through utf8::encode($s),
possibly conditionalized by:  if utf8::is_utf8($s)
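
A minimal sketch of such an explicit conversion follows. The helper name
to_octets and the sample text are hypothetical, chosen for illustration;
only utf8::encode and utf8::is_utf8 come from the suggestion above:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the argument as UTF-8 octets.
# Strings that are not flagged as character strings pass through unchanged.
sub to_octets {
    my ($s) = @_;                             # work on a copy
    utf8::encode($s) if utf8::is_utf8($s);    # encodes the copy in place
    return $s;
}

# Character string: "caf", e-acute, space, smiling face.
my $text   = "caf\x{E9} \x{263A}";
my $octets = to_octets($text);

printf "%d characters -> %d octets\n", length($text), length($octets);
# prints "6 characters -> 9 octets"
```

Note that the utf8::is_utf8 guard tests perl's internal flag, so a Latin-1
string that perl happens to store without the flag is left as-is by this
sketch.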

I believe this explicit encoding has already been done in most
cases where it was necessary. Nevertheless we should keep an eye open
for corner cases which may pop up.


The patch is purely mechanical:
  $ perl -i -pe 's/^(\s*)use\s+bytes\s*;/$1# use bytes;/'
and can be easily reverted if necessary.
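
To show what the substitution does and does not touch, here is a small
sketch that applies the same regex to a few made-up lines. The sample
lines and the inverse substitution are assumptions for illustration, not
part of the patch:

```perl
use strict;
use warnings;

# Made-up sample lines, not taken from any SpamAssassin module.
my @before = (
    "use bytes;\n",
    "  use bytes;\n",
    "use strict;\n",
    "print 'use bytes;';\n",
);

# The substitution from the one-liner above, applied line by line.
my @after;
for my $line (@before) {
    (my $l = $line) =~ s/^(\s*)use\s+bytes\s*;/$1# use bytes;/;
    push @after, $l;
}

# Assumed inverse: turn the comment back into the pragma.
my @reverted;
for my $line (@after) {
    (my $l = $line) =~ s/^(\s*)# use bytes;/$1use bytes;/;
    push @reverted, $l;
}

print @after;
```

Only lines that start with optional whitespace followed by 'use bytes;'
are commented out; the other two sample lines are left alone. The assumed
inverse would also uncomment any '# use bytes;' line that was already
commented out before the patch, so reverting through version control is
the safer route.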

All tests pass (5.22.0 and 5.8.9). In the couple of hours that
I have been running this code (with charset normalization enabled)
I haven't noticed anything unusual (like warnings or changes in
Bayes tokenization). There is also no change/slowdown in timing,
but that's expected, as rules are still (mostly?) not yet exposed
to Unicode.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7232] Getting rid of 'use bytes' crutches throughout

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

RW <rw...@googlemail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rwmaillists@googlemail.com

--- Comment #8 from RW <rw...@googlemail.com> ---

I estimate that there are about 150 regular expression rules that make use of
byte values, either directly or via 35 of the templates. This isn't counting
meta-rules that depend on them.

Dropping 'use bytes' probably won't cause any FPs from these rules; it will just
cause their TP rates to degrade unobtrusively to varying degrees.

Does rule QA have anything that could be used to see the overall effect of a
change like this?
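
The kind of silent miss described in this comment can be sketched as
follows. The pattern and the sample text are hypothetical and this is not
how SpamAssassin's rule engine is wired; it only shows a byte-oriented
regex hitting undecoded octets and missing the same text once it has been
decoded into characters:

```perl
use strict;
use warnings;

# Hypothetical rule pattern written against the UTF-8 octets of
# e-acute (C3 A9), as a byte-value rule would be.
my $byte_rule = qr/caf\xC3\xA9/;

my $octets = "caf\xC3\xA9";   # undecoded text: five raw octets
my $chars  = $octets;
utf8::decode($chars);         # decoded text: four characters, "caf\x{E9}"

printf "octets: %s\n", ($octets =~ $byte_rule ? "hit" : "miss");   # hit
printf "chars:  %s\n", ($chars  =~ $byte_rule ? "hit" : "miss");   # miss
```

No warning is raised in the second case; the rule simply stops firing,
which matches the "unobtrusive" degradation of TP rates mentioned above.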

[Bug 7232] Getting rid of 'use bytes' crutches throughout

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

--- Comment #9 from Karsten Bräckelmann <gu...@rudersport.de> ---
(In reply to Bill Cole from comment #7)
> We know it basically works, as it has been in trunk for 3 years & 3.4 for
> over a year, and both are being used in production. We know it's the right
> direction.

Agreed.

[Bug 7232] Getting rid of 'use bytes' crouches throughout

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

Kevin A. McGrail <km...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.0                       |3.4.2
           Severity|enhancement                 |blocker
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #6 from Kevin A. McGrail <km...@apache.org> ---
Do we want to remove this patch from 3.4.2?

[Bug 7232] Getting rid of 'use bytes' crouches throughout

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |4.0.0

[Bug 7232] Getting rid of 'use bytes' crutches throughout

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

Kevin A. McGrail <km...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REOPENED                    |RESOLVED

--- Comment #10 from Kevin A. McGrail <km...@apache.org> ---
RW, what rules are broken by use bytes going away, please? 

Closing this ticket; the patch stays in 3.4.2.

[Bug 7232] Getting rid of 'use bytes' crouches throughout

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

Bill Cole <bi...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |billcole@apache.org

--- Comment #7 from Bill Cole <bi...@apache.org> ---
(In reply to Kevin A. McGrail from comment #6)
> Do we want to remove this patch from 3.4.2?

I vote KEEP. 

We know it basically works, as it has been in trunk for 3 years & 3.4 for over
a year, and both are being used in production. We know it's the right
direction. We don't want to support the byte-imposed botches needed to catch
non-ascii characters any longer than absolutely necessary.

[Bug 7232] Getting rid of 'use bytes' crouches throughout

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

--- Comment #1 from Mark Martinec <Ma...@ijs.si> ---
  trunk:
Sending        lib/Mail/SpamAssassin/ArchiveIterator.pm
Sending        lib/Mail/SpamAssassin/AsyncLoop.pm
Sending        lib/Mail/SpamAssassin/AutoWhitelist.pm
Sending        lib/Mail/SpamAssassin/Bayes/CombineChi.pm
Sending        lib/Mail/SpamAssassin/Bayes/CombineNaiveBayes.pm
Sending        lib/Mail/SpamAssassin/Bayes.pm
Sending        lib/Mail/SpamAssassin/BayesStore/BDB.pm
Sending        lib/Mail/SpamAssassin/BayesStore/DBM.pm
Sending        lib/Mail/SpamAssassin/BayesStore/MySQL.pm
Sending        lib/Mail/SpamAssassin/BayesStore/PgSQL.pm
Sending        lib/Mail/SpamAssassin/BayesStore/Redis.pm
Sending        lib/Mail/SpamAssassin/BayesStore/SDBM.pm
Sending        lib/Mail/SpamAssassin/BayesStore/SQL.pm
Sending        lib/Mail/SpamAssassin/BayesStore.pm
Sending        lib/Mail/SpamAssassin/Conf/LDAP.pm
Sending        lib/Mail/SpamAssassin/Conf/Parser.pm
Sending        lib/Mail/SpamAssassin/Conf/SQL.pm
Sending        lib/Mail/SpamAssassin/Conf.pm
Sending        lib/Mail/SpamAssassin/DBBasedAddrList.pm
Sending        lib/Mail/SpamAssassin/Dns.pm
Sending        lib/Mail/SpamAssassin/DnsResolver.pm
Sending        lib/Mail/SpamAssassin/Locales.pm
Sending        lib/Mail/SpamAssassin/Locker/Flock.pm
Sending        lib/Mail/SpamAssassin/Locker/UnixNFSSafe.pm
Sending        lib/Mail/SpamAssassin/Locker/Win32.pm
Sending        lib/Mail/SpamAssassin/Locker.pm
Sending        lib/Mail/SpamAssassin/Logger/File.pm
Sending        lib/Mail/SpamAssassin/Logger/Stderr.pm
Sending        lib/Mail/SpamAssassin/Logger/Syslog.pm
Sending        lib/Mail/SpamAssassin/Logger.pm
Sending        lib/Mail/SpamAssassin/MailingList.pm
Sending        lib/Mail/SpamAssassin/Message/Metadata/Received.pm
Sending        lib/Mail/SpamAssassin/Message/Metadata.pm
Sending        lib/Mail/SpamAssassin/NetSet.pm
Sending        lib/Mail/SpamAssassin/PerMsgLearner.pm
Sending        lib/Mail/SpamAssassin/PersistentAddrList.pm
Sending        lib/Mail/SpamAssassin/Plugin/AWL.pm
Sending        lib/Mail/SpamAssassin/Plugin/AccessDB.pm
Sending        lib/Mail/SpamAssassin/Plugin/AntiVirus.pm
Sending        lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm
Sending        lib/Mail/SpamAssassin/Plugin/Bayes.pm
Sending        lib/Mail/SpamAssassin/Plugin/BodyEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending        lib/Mail/SpamAssassin/Plugin/DCC.pm
Sending        lib/Mail/SpamAssassin/Plugin/DKIM.pm
Sending        lib/Mail/SpamAssassin/Plugin/DNSEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/HTMLEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/HTTPSMismatch.pm
Sending        lib/Mail/SpamAssassin/Plugin/Hashcash.pm
Sending        lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/ImageInfo.pm
Sending        lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/MIMEHeader.pm
Sending        lib/Mail/SpamAssassin/Plugin/NetCache.pm
Sending        lib/Mail/SpamAssassin/Plugin/P595Body.pm
Sending        lib/Mail/SpamAssassin/Plugin/PDFInfo.pm
Sending        lib/Mail/SpamAssassin/Plugin/Pyzor.pm
Sending        lib/Mail/SpamAssassin/Plugin/RabinKarpBody.pm
Sending        lib/Mail/SpamAssassin/Plugin/Razor2.pm
Sending        lib/Mail/SpamAssassin/Plugin/RelayCountry.pm
Sending        lib/Mail/SpamAssassin/Plugin/RelayEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/ReplaceTags.pm
Sending        lib/Mail/SpamAssassin/Plugin/Reuse.pm
Sending        lib/Mail/SpamAssassin/Plugin/Rule2XSBody.pm
Sending        lib/Mail/SpamAssassin/Plugin/SPF.pm
Sending        lib/Mail/SpamAssassin/Plugin/Shortcircuit.pm
Sending        lib/Mail/SpamAssassin/Plugin/SpamCop.pm
Sending        lib/Mail/SpamAssassin/Plugin/Test.pm
Sending        lib/Mail/SpamAssassin/Plugin/TextCat.pm
Sending        lib/Mail/SpamAssassin/Plugin/TxRep.pm
Sending        lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending        lib/Mail/SpamAssassin/Plugin/URIDetail.pm
Sending        lib/Mail/SpamAssassin/Plugin/URIEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/URILocalBL.pm
Sending        lib/Mail/SpamAssassin/Plugin/WLBLEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/WhiteListSubject.pm
Sending        lib/Mail/SpamAssassin/Plugin.pm
Sending        lib/Mail/SpamAssassin/PluginHandler.pm
Sending        lib/Mail/SpamAssassin/RegistryBoundaries.pm
Sending        lib/Mail/SpamAssassin/Reporter.pm
Sending        lib/Mail/SpamAssassin/SQLBasedAddrList.pm
Sending        lib/Mail/SpamAssassin/SpamdForkScaling.pm
Sending        lib/Mail/SpamAssassin/SubProcBackChannel.pm
Sending        lib/Mail/SpamAssassin/Timeout.pm
Sending        lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Sending        lib/Mail/SpamAssassin/Util/MemoryDump.pm
Sending        lib/Mail/SpamAssassin/Util/Progress.pm
Sending        lib/Mail/SpamAssassin/Util/RegistrarBoundaries.pm
Sending        lib/Mail/SpamAssassin/Util/ScopedTimer.pm
Sending        lib/Mail/SpamAssassin/Util.pm
Sending        lib/Mail/SpamAssassin.pm
Sending        sa-learn.raw
Sending        spamassassin.raw
Committed revision 1694545.

[Bug 7232] Getting rid of 'use bytes' crouches throughout

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

--- Comment #4 from Mark Martinec <Ma...@ijs.si> ---
That was a biggie for backporting - not in patch size, but in potential
implications.

I hope older perls will be happy with introducing more Unicode strings in
processing. The change is well tested in trunk and solves a couple of issues,
but it is quite deep reaching and required compensating for the change in
places, so my intention was to target it for 4.0, not with a minor release.
Anyway, my +0.5 for 3.4.

[Bug 7232] Getting rid of 'use bytes' crouches throughout

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

--- Comment #5 from Kevin A. McGrail <km...@pccc.com> ---
Understood.  We had two people look at it and I did testing on 5.8.6 on an old
box and 5.16.3 if it makes you feel better.  I'm at $dayjob right now but will
make sure to double check this tonight.

[Bug 7232] Getting rid of 'use bytes' crouches throughout

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
                 CC|                            |kmcgrail@pccc.com
             Status|NEW                         |RESOLVED

--- Comment #3 from Kevin A. McGrail <km...@pccc.com> ---
Applying also to 3.4 branch and marking as resolved

svn commit -m 'KG: Syncing Trunk to 3.4:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232 removing use bytes'
Sending        lib/Mail/SpamAssassin/ArchiveIterator.pm
Sending        lib/Mail/SpamAssassin/AsyncLoop.pm
Sending        lib/Mail/SpamAssassin/AutoWhitelist.pm
Sending        lib/Mail/SpamAssassin/Bayes/CombineChi.pm
Sending        lib/Mail/SpamAssassin/Bayes/CombineNaiveBayes.pm
Sending        lib/Mail/SpamAssassin/Bayes.pm
Sending        lib/Mail/SpamAssassin/BayesStore/BDB.pm
Sending        lib/Mail/SpamAssassin/BayesStore/DBM.pm
Sending        lib/Mail/SpamAssassin/BayesStore/MySQL.pm
Sending        lib/Mail/SpamAssassin/BayesStore/PgSQL.pm
Sending        lib/Mail/SpamAssassin/BayesStore/Redis.pm
Sending        lib/Mail/SpamAssassin/BayesStore/SDBM.pm
Sending        lib/Mail/SpamAssassin/BayesStore/SQL.pm
Sending        lib/Mail/SpamAssassin/BayesStore.pm
Sending        lib/Mail/SpamAssassin/Conf/LDAP.pm
Sending        lib/Mail/SpamAssassin/Conf/Parser.pm
Sending        lib/Mail/SpamAssassin/Conf/SQL.pm
Sending        lib/Mail/SpamAssassin/Conf.pm
Sending        lib/Mail/SpamAssassin/DBBasedAddrList.pm
Sending        lib/Mail/SpamAssassin/Dns.pm
Sending        lib/Mail/SpamAssassin/DnsResolver.pm
Sending        lib/Mail/SpamAssassin/Locales.pm
Sending        lib/Mail/SpamAssassin/Locker/Flock.pm
Sending        lib/Mail/SpamAssassin/Locker/UnixNFSSafe.pm
Sending        lib/Mail/SpamAssassin/Locker/Win32.pm
Sending        lib/Mail/SpamAssassin/Locker.pm
Sending        lib/Mail/SpamAssassin/Logger/File.pm
Sending        lib/Mail/SpamAssassin/Logger/Stderr.pm
Sending        lib/Mail/SpamAssassin/Logger/Syslog.pm
Sending        lib/Mail/SpamAssassin/Logger.pm
Sending        lib/Mail/SpamAssassin/MailingList.pm
Sending        lib/Mail/SpamAssassin/Message/Metadata/Received.pm
Sending        lib/Mail/SpamAssassin/Message/Metadata.pm
Sending        lib/Mail/SpamAssassin/NetSet.pm
Sending        lib/Mail/SpamAssassin/PerMsgLearner.pm
Sending        lib/Mail/SpamAssassin/PersistentAddrList.pm
Sending        lib/Mail/SpamAssassin/Plugin/AWL.pm
Sending        lib/Mail/SpamAssassin/Plugin/AccessDB.pm
Sending        lib/Mail/SpamAssassin/Plugin/AntiVirus.pm
Sending        lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm
Sending        lib/Mail/SpamAssassin/Plugin/Bayes.pm
Sending        lib/Mail/SpamAssassin/Plugin/BodyEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending        lib/Mail/SpamAssassin/Plugin/DCC.pm
Sending        lib/Mail/SpamAssassin/Plugin/DKIM.pm
Sending        lib/Mail/SpamAssassin/Plugin/DNSEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/HTMLEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/HTTPSMismatch.pm
Sending        lib/Mail/SpamAssassin/Plugin/Hashcash.pm
Sending        lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/ImageInfo.pm
Sending        lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/MIMEHeader.pm
Sending        lib/Mail/SpamAssassin/Plugin/NetCache.pm
Sending        lib/Mail/SpamAssassin/Plugin/P595Body.pm
Sending        lib/Mail/SpamAssassin/Plugin/PDFInfo.pm
Sending        lib/Mail/SpamAssassin/Plugin/Pyzor.pm
Sending        lib/Mail/SpamAssassin/Plugin/RabinKarpBody.pm
Sending        lib/Mail/SpamAssassin/Plugin/Razor2.pm
Sending        lib/Mail/SpamAssassin/Plugin/RelayCountry.pm
Sending        lib/Mail/SpamAssassin/Plugin/RelayEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/ReplaceTags.pm
Sending        lib/Mail/SpamAssassin/Plugin/Reuse.pm
Sending        lib/Mail/SpamAssassin/Plugin/Rule2XSBody.pm
Sending        lib/Mail/SpamAssassin/Plugin/SPF.pm
Sending        lib/Mail/SpamAssassin/Plugin/Shortcircuit.pm
Sending        lib/Mail/SpamAssassin/Plugin/SpamCop.pm
Sending        lib/Mail/SpamAssassin/Plugin/Test.pm
Sending        lib/Mail/SpamAssassin/Plugin/TextCat.pm
Sending        lib/Mail/SpamAssassin/Plugin/TxRep.pm
Sending        lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending        lib/Mail/SpamAssassin/Plugin/URIDetail.pm
Sending        lib/Mail/SpamAssassin/Plugin/URIEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/URILocalBL.pm
Sending        lib/Mail/SpamAssassin/Plugin/WLBLEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/WhiteListSubject.pm
Sending        lib/Mail/SpamAssassin/Plugin.pm
Sending        lib/Mail/SpamAssassin/PluginHandler.pm
Sending        lib/Mail/SpamAssassin/RegistryBoundaries.pm
Sending        lib/Mail/SpamAssassin/Reporter.pm
Sending        lib/Mail/SpamAssassin/SQLBasedAddrList.pm
Sending        lib/Mail/SpamAssassin/SpamdForkScaling.pm
Sending        lib/Mail/SpamAssassin/SubProcBackChannel.pm
Sending        lib/Mail/SpamAssassin/Timeout.pm
Sending        lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Sending        lib/Mail/SpamAssassin/Util/MemoryDump.pm
Sending        lib/Mail/SpamAssassin/Util/Progress.pm
Sending        lib/Mail/SpamAssassin/Util/RegistrarBoundaries.pm
Sending        lib/Mail/SpamAssassin/Util/ScopedTimer.pm
Sending        lib/Mail/SpamAssassin/Util.pm
Sending        lib/Mail/SpamAssassin.pm
Transmitting file data
...........................................................................................
Committed revision 1790912.

[Bug 7232] Getting rid of 'use bytes' crouches throughout

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

--- Comment #2 from Mark Martinec <Ma...@ijs.si> ---
Works well, no need for 'use bytes' any longer, closing.

[Bug 7232] Getting rid of 'use bytes' crutches throughout

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7232

Kevin A. McGrail <km...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Getting rid of 'use bytes'  |Getting rid of 'use bytes'
                   |crouches throughout         |crutches throughout
