Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/10/23 17:48:49 UTC
[Bug 5696] New: cut regexp base strings at Unicode high codepoints
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5696
Summary: cut regexp base strings at Unicode high codepoints
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: minor
Priority: P5
Component: sa-compile
AssignedTo: dev@spamassassin.apache.org
ReportedBy: jm@jmason.org
a pattern like /foo bar baz \x{e2}\x{a2}\x{ac}/ winds up with the UTF-8
codepoints corrupted as it passes through the base-extraction code. to avoid
this, we should cut the base string at the first high codepoint found.
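The proposed behaviour can be sketched roughly as follows. This is only an illustration in Python, not the actual patch to BodyRuleBaseExtractor.pm (which is Perl); the function name cut_at_high_codepoint is hypothetical, and it assumes "high codepoint" means any character above 0x7F.

```python
def cut_at_high_codepoint(base: str) -> str:
    """Return the prefix of `base` that precedes the first character above 0x7F.

    Hypothetical helper for illustration only; the real extractor may
    differ in how it treats the truncated remainder.
    """
    for i, ch in enumerate(base):
        if ord(ch) > 0x7F:
            # Stop before the first high character so it never reaches
            # the later stages that could corrupt it.
            return base[:i]
    return base


if __name__ == "__main__":
    # The literal part of /foo bar baz \x{e2}\x{a2}\x{ac}/
    base = "foo bar baz \xe2\xa2\xac"
    print(repr(cut_at_high_codepoint(base)))  # 'foo bar baz '
```

The shortened base string is still a valid substring of everything the full pattern can match, so it stays safe to use as a pre-filter; it is just less selective than the full string would have been.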
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 5696] [review] cut regexp base strings at Unicode high codepoints
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5696
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #4172 is|0 |1
obsolete| |
------- Additional Comments From jm@jmason.org 2007-12-21 07:17 -------
Created an attachment (id=4211)
--> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4211&action=view)
fix r2
doh! well spotted.
the test failure was because the test used the 3.3.0 output format,
which is different from 3.2.x. and of course the # of tests was wrong.
looks like I didn't test it :(
the undef warning was illustrating a bug, I think; as far as I know it's
not safe to use $1 after another match, so to be paranoid I've changed it
to take a copy, now in trunk:
: jm 15...; svn commit -m "avoid matching with \$1 active; save it beforehand"
lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Transmitting file data .
Committed revision 606216.
this patch fixes those for 3.2.x.
[Bug 5696] [review] cut regexp base strings at Unicode high codepoints
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5696
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|minor |major
Priority|P5 |P2
------- Additional Comments From jm@jmason.org 2007-12-21 02:00 -------
fixing pri; this is actually quite a biggie, since it changes the hit rate of
8-bit rules once they're compiled.
[Bug 5696] [review] cut regexp base strings at Unicode high codepoints
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5696
sidney@sidney.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|major |minor
Priority|P2 |P5
------- Additional Comments From sidney@sidney.com 2007-12-21 05:39 -------
The patch doesn't apply cleanly to 3.2 because the number of tests in
t/re_base_extraction.t seems to be different in trunk. When I corrected for that,
I still got this in that test file:
t/re_base_extraction..............59/115 Use of uninitialized value in length at
../lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 590.
100% Completed 85.33 rules/sec in 00m00s
100% Completed 2732.45 bases/sec in 00m00s
# Failed test 64 in t/re_base_extraction.t at line 423 fail #52
failed to find 'foobar:FOO,[l=0]' at t/re_base_extraction.t line 423.
I don't have time to investigate that more thoroughly right now, but at first
glance it looks like it may be a bug in the patch.
[Bug 5696] [review] cut regexp base strings at Unicode high codepoints
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5696
spamassassin@dostech.ca changed:
What |Removed |Added
----------------------------------------------------------------------------
Status Whiteboard|needs 2 votes for 3.2 |needs 1 votes for 3.2
------- Additional Comments From spamassassin@dostech.ca 2007-11-06 13:56 -------
sure, +1
[Bug 5696] [review] cut regexp base strings at Unicode high codepoints
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5696
maddoc@maddoc.net changed:
What |Removed |Added
----------------------------------------------------------------------------
Status Whiteboard|needs 1 votes for 3.2 |can be commited
------- Additional Comments From maddoc@maddoc.net 2007-12-22 16:32 -------
+1
[Bug 5696] [review] cut regexp base strings at Unicode high codepoints
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5696
sidney@sidney.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status Whiteboard|needs 2 votes for 3.2 |needs 1 votes for 3.2
------- Additional Comments From sidney@sidney.com 2007-12-21 11:09 -------
+1 This is back to needing one more vote
[Bug 5696] [review] cut regexp base strings at Unicode high codepoints
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5696
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Additional Comments From jm@jmason.org 2007-12-28 05:17 -------
applied to 3.2.x:
: jm 242...; svn commit -m "bug 5696: cut regexp base strings at Unicode high
codepoints, to avoid corruption of patterns containing UTF-8"
Sending lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending t/re_base_extraction.t
Transmitting file data ..
Committed revision 607239.
[Bug 5696] [review] cut regexp base strings at Unicode high codepoints
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5696
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|cut regexp base strings at |[review] cut regexp base
|Unicode high codepoints |strings at Unicode high
| |codepoints
Status Whiteboard| |needs 2 votes for 3.2
Target Milestone|Undefined |3.2.4
[Bug 5696] cut regexp base strings at Unicode high codepoints
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5696
------- Additional Comments From jm@jmason.org 2007-10-23 08:50 -------
Created an attachment (id=4172)
--> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4172&action=view)
fix
this patch implements it, as applied to trunk:
: jm 43...; svn commit -m "bug 5696: cut regexp base strings at Unicode high
codepoints, to avoid corruption of patterns containing UTF-8"
lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm t/re_base_extraction.t
Sending lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending t/re_base_extraction.t
Transmitting file data ..
Committed revision 587545.
[Bug 5696] [review] cut regexp base strings at Unicode high codepoints
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5696
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status Whiteboard|needs 1 votes for 3.2 |needs 2 votes for 3.2