Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2015/06/22 21:06:26 UTC

[Bug 7215] New: Towards supporting IDNA (Internationalizing Domain Names in Applications)

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

            Bug ID: 7215
           Summary: Towards supporting IDNA (Internationalizing Domain
                    Names in Applications)
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Libraries
          Assignee: dev@spamassassin.apache.org
          Reporter: Mark.Martinec@ijs.si

Opening this ticket to coordinate our efforts towards supporting
Internationalized Domain Names (which is also coupled with
better use of the Unicode features of Perl).

As Kevin's plan is to play with this during the summer, I'm attaching
my current work in this area to avoid duplicating effort. None of
this is set in stone yet, so it is open to reshuffling of code
or ditching/replacing/reorganizing it altogether. The main idea
is to provide some tools and example code.

It makes use of the Perl module Net::LibIDN, and issues a warning
if this module is not available (in which case the feature is off).
It should be compatible with existing code. It might even work with
perl 5.8.9, although 5.12 or later would be a better choice
for its much improved Unicode support.

I have been running this changed code (on SA trunk (4.0), with Perl
5.20 and 5.22) for the last two months: it solves my immediate problem
of turning U-labels (in Unicode URIs) into ASCII Compatible Encoding
(ACE) for the purpose of URI lookups against black/white-lists.
Not perfect, but better than nothing.
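
For illustration only, the U-label to ACE conversion described above
can be sketched in Python (not the Perl of the attached patch). The
sketch uses Python's built-in "idna" codec (IDNA 2003) as a stand-in
for Net::LibIDN; the name to_ace and the choice to return the input
unchanged on failure are assumptions, not the patch's actual behaviour.

```python
def to_ace(domain: str) -> str:
    """Convert a domain name containing U-labels to ACE.

    Hypothetical helper: returns the input unchanged when it is already
    ASCII or when the conversion fails, so callers always get a string.
    """
    if domain.isascii():
        return domain  # nothing to convert
    try:
        # The built-in codec applies nameprep and Punycode label by label.
        return domain.encode("idna").decode("ascii")
    except UnicodeError:
        return domain  # conversion failed; leave the name as it was


print(to_ace("www.žarnice.si"))  # www.xn--arnice-2pb.si
```

Returning the original string on failure keeps a lookup going with
whatever was extracted, which is one possible policy; logging and
skipping the lookup would be another.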

The main problem there is that the text parser (or HTML parser)
does a poor job of extracting Unicode URIs from text, e.g. it
has no notion of Unicode whitespace or of the set of characters
allowed in U-labels. Instead of the more complex task of fixing
the text parser, as a stop-gap solution I added some sanitization
code for extracted URIs: trimming prefix and suffix characters
that cannot appear in a valid Unicode URI. This sanitization code
would eventually be removed once the parser is improved.
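
The trimming step described above might look roughly like the
following Python sketch. It is not the code from the patch: the
helper names are hypothetical, and the test for "characters that can
appear in a host name" is approximated with Unicode general
categories, which is much cruder than a real IDNA2008 character set.

```python
import unicodedata


def is_host_char(ch: str) -> bool:
    """Rough stand-in for a U-label character test (not the IDNA2008 set)."""
    if ch in "-.":
        return True
    cat = unicodedata.category(ch)
    # Accept letters (L*), combining marks (M*) and decimal digits (Nd).
    return cat[0] in "LM" or cat == "Nd"


def trim_host(candidate: str) -> str:
    """Skip leading junk, then cut at the first character a host cannot contain."""
    start = 0
    while start < len(candidate) and not is_host_char(candidate[start]):
        start += 1
    end = start
    while end < len(candidate) and is_host_char(candidate[end]):
        end += 1
    return candidate[start:end].strip(".-")


print(trim_host("www.incose.org)会员"))  # www.incose.org
```

Cutting at the first disallowed character, rather than only stripping
from both ends, is a design choice of this sketch: it also drops text
that follows a closing bracket or quote.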

The general-purpose subroutines provided are:
- MS::Util::idn_to_ascii
- MS::Util::is_valid_utf_8

and the three user-defined character classes:
- InIDNAWhitespace,
- InIDNAFullStop,
- InIDNA2008
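
As a rough illustration of what two of these tools do, here is a
Python sketch (the patch itself is Perl). The validity check simply
relies on Python's strict UTF-8 decoder, and the full-stop set is the
four label separators listed in RFC 3490; whether InIDNAFullStop
covers exactly these code points is an assumption, and normalize_dots
is a hypothetical name.

```python
import re

# The four code points that RFC 3490 treats as label separators:
# full stop, ideographic full stop, fullwidth full stop,
# halfwidth ideographic full stop.
IDNA_FULL_STOPS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def is_valid_utf_8(octets: bytes) -> bool:
    """True if the octet string is well-formed UTF-8."""
    try:
        octets.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def normalize_dots(name: str) -> str:
    """Replace every IDNA label separator with an ASCII full stop."""
    return IDNA_FULL_STOPS.sub(".", name)


print(is_valid_utf_8("žarnice".encode("utf-8")))  # True
print(is_valid_utf_8(b"\xc5"))                    # False (truncated sequence)
print(normalize_dots("自由。”"))                   # 自由.”
```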

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |4.0.0


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #10 from Mark Martinec <Ma...@ijs.si> ---
> My expectation with using more UTF8 (or 16) as the internal guts for SA is
> that perl 5.14 becomes a baseline install where that works well from my
> memory at least.
> I want to set a good expectation that newer perl is needed for SA 4.X unless
> you think I'm just being elitist.

I agree that 5.14 or maybe 5.12 is the baseline for more serious
Unicode support, although we may do it in steps: bump up the minimal
version one notch with each specific problem we encounter during
development. With basic Unicode support (and even the user-defined
character classes) it seems that 5.8.9 is still able to cope somehow.

> unless you think I'm just being elitist.

Elitist? We are running 5.22 and 5.20 on our servers here :))
Have to dig deep to find something running 5.16, let alone 5.14
around here.


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #20 from Mark Martinec <Ma...@ijs.si> ---
> Assuming that we still want to leave this in trunk and NOT backport to 3.4/

Yes, I think these changes are too heavyweight for a minor release.


[ but it's also true that I would not like to run 3.4 now at our site
  in production (with a reasonably fresh version of perl), as the
  4.0(=trunk) is better behaved and has been better tested (by me),
  compared to 3.4 with multiple lightly tested backports ]


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #14 from Mark Martinec <Ma...@ijs.si> ---
Collected a couple of debug log entries produced by idn_to_ascii()
to get a feel for how successful the conversion is.

The IDN handling seems quite useful, as shown in the following
samples. Most of these are in Russian Cyrillic, some are Slovenian
(and I remember some German samples, but can't find them in my
recent logs):

util: idn_to_ascii: converted to ACE (0):
  /www.грузтранском.рф/ -> /www.xn--80afmnkeilbmhk.xn--p1ai/
  /www.отличное-мнение.рф/ -> /www.xn----itbbaldqlgdbdd6c9d.xn--p1ai/
  /www.правильный-директ.рф/ -> /www.xn----7sbgjfpcfnewtnj6a0kpa.xn--p1ai/
  /грамотное-сео.рф/ -> /xn----7sbijb3bhhbdnti.xn--p1ai/
  /грузтранском.рф/ -> /xn--80afmnkeilbmhk.xn--p1ai/
  /frižider.si/ -> /xn--friider-fxb.si/
  /www.žarnice.si/ -> /www.xn--arnice-2pb.si/
  /žarnice.si/ -> /xn--arnice-2pb.si/
  /www.контролируемый-имидж.рф/ -> /www.xn----htbbggcafgkndfnb5ad5ay6n.xn--p1ai/
  /www.на-отдых-в-сахару.рф/ -> /www.xn------5cddalo2fm2ajiwwf9g.xn--p1ai/
  /www.работа-на-себя.рф/ -> /www.xn-----6kcabde5a3ehtuh4q.xn--p1ai/
  /www.стройметком.рф/ -> /www.xn--e1ahegchekikf.xn--p1ai/
  /делай-деньги.ком.рф/ -> /xn----7sbkbcddzes1a4p.xn--j1aef.xn--p1ai/
  /заказ-грузоперевозок.орг.рф/ -> /xn----7sbajemakccd1aj5cdblpe9c.xn--c1avg.xn--p1ai/
  /играть-за-деньги.рф/ -> /xn-----6kcbmegiogj2d5a3a4mh.xn--p1ai/
  /идеал-мастер.рф/ -> /xn----7sbbnfdp1ak6bjm.xn--p1ai/
  /курсы-шоуменов.рф/ -> /xn----dtbislhedmkue7dyb.xn--p1ai/
  /люди-и-цифры.рф/ -> /xn-----jlcqbbp0c9as2d0a.xn--p1ai/
  /на-отдых-в-сахару.рф/ -> /xn------5cddalo2fm2ajiwwf9g.xn--p1ai/
  /обучаем-иностранному.рф/ -> /xn----7sbbbvt0adhbachd0aprjp8d.xn--p1ai/
  /плавучая-баня.рф/ -> /xn----7sbabed5dwak7b5b6fe.xn--p1ai/
  /работа-на-себя.рф/ -> /xn-----6kcabde5a3ehtuh4q.xn--p1ai/
  /скайвуд-лиственница.рф/ -> /xn----7sbbgbkjwcdjr3aa2cirm2e.xn--p1ai/
  /такси-московское.орг.рф/ -> /xn----7sbhmmlcbpubc4aede.xn--c1avg.xn--p1ai/


...but there are also plenty of samples which indicate a miserable
failure of the URL extraction code to properly delimit a URL
from the surrounding text when dealing with UTF-8 encoded (normalized)
text:

util: idn_to_ascii: alternative dots normalized:
  /自由。”/ -> /自由.”/

util: idn_to_ascii: conversion to ACE failed (0):
  /他一向令女人神魂颠倒的抚摸,就真的那么令她讨厌吗?/
  /www.EPChinaShow.com&t=China’s Largest and Most Authorized Electric Power Exhibition/
  /经五年没有见到你了,求求你了妈妈,陪陪我,好不好?”/

util: idn_to_ascii: converted to ACE (0):
  /www.eme2015.org】/ -> /www.eme2015.xn--org-003b/
  /www.pdma.org)会员/ -> /www.pdma.xn--org)-ye6ft1z/
  /www.uradni-list.si•/ -> /www.uradni-list.xn--si-g3t/
  /#IUS_INFO_ČISTOPISI/ -> /xn--#ius_info_istopisi-3gc/
  /#STROKOVNI_ČLANKI/ -> /xn--#strokovni_lanki-27b/
  /179英文.files/ -> /xn--179-4p8fh21k.files/
  /kjn.uradni-list.si / -> /kjn.uradni-list.si /
  /t.c…”/ -> /t.xn--c...-jb7a/
  /www.WaterNexus.net│info@WaterNexus.net/ -> /www.WaterNexus.xn--netinfo@waternexus-w78l.net/
  /www.adatours.com / -> /www.adatours.com /
  /www.aijssnet.com With/ -> /www.aijssnet.com with/
  /www.aloftcupertino.com / -> /www.aloftcupertino.com /
  /www.defensedaily.com that/ -> /www.defensedaily.com that/
  /www.disclaimer-uk.wur.nl / -> /www.disclaimer-uk.wur.nl /
  /www.eme2015.org】/ -> /www.eme2015.xn--org-003b/
  /www.hotchips.org   For/ -> /www.hotchips.org for/
  /www.hoti.org EarlyRegistration/ -> /www.hoti.org earlyregistration/
  /www.laboratory-journal.com”/ -> /www.laboratory-journal.xn--com-9o0a/
  /www.ontoresinc.com)provides/ -> /www.ontoresinc.com)provides/
  /www.particulars.eu”/ -> /www.particulars.xn--eu-02t/
  /www.pdma.org)会员/ -> /www.pdma.xn--org)-ye6ft1z/
  /www.sunseeker.deÂ/ -> /www.sunseeker.xn--de-qia/
  /Õ÷¸åº¯Ó¢ÎÄ1.files/ -> /xn-- o 1-7ea01bd9ezbpw814d5ma.files/
  /”http/ -> /xn--http-fb7a/
  /中文11.files/ -> /xn--11-py2cs33g.files/
  /自由.”/ -> /xn--sny74y.xn--ivg/
  /ó/ -> /xn--kda/

The URL extraction code seems to be an area that calls for much more
love in the near future. Properly recognizing Unicode delimiters
is one obvious defect, but a trickier one is probably recognizing
word boundaries in Chinese, Japanese, and Korean writing.
It would probably be valuable to reach out to the contributors of
this project:
  http://emaillab.jp/spamassassin/ja-patch/
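
To see why a stray delimiter ends up inside a bogus ACE label, it may
help to look at the Punycode step alone. The Python sketch below is
deliberately naive: ace_label is a hypothetical helper that skips
nameprep and every validity check, so whatever characters the URL
extractor leaves attached to a label simply get encoded along with it.

```python
def ace_label(label: str) -> str:
    """Punycode one label with no validity checks (deliberately naive)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


# The closing bracket was not recognized as a delimiter, so it becomes
# part of the "TLD", as in the eme2015 log entry above.
print(ace_label("org】"))  # xn--org-003b
```

The conversion itself succeeds, which is why such cases show up under
"converted to ACE" rather than as failures.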


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #13 from Mark Martinec <Ma...@ijs.si> ---
> Hah. Well jenkins is stable and 5.8.6 tests worked though a bit noisy with
> INFO: module Net::LibIDN not available warnings.

Can turn that warn() into info() for comfort, at least temporarily.

> For 4.X, we need good, stable UTF support which in my experience means using
> 5.14+.  
> I could examine other distros but I think we can require 5.14.8+ for 4.X
> especially since I'm expecting distros won't include 4.X except on the next
> major release.
> Any arguments against changing trunk INSTALL to require 5.14.8 as well as
> the PACKAGING, Makefile.PL and UPGRADE files?  I can also look at making
> Makefile.PL bootstrap with system perl and download perlbrew to make a newer
> perl available.

I think it's way too early to set the minimal version of perl as a
firm requirement. We haven't even started the more intricate work,
it's still a long way to 4.0.  It's fine to make 5.14 the recommended
minimal version, but I'd prefer to enforce it in Makefile.PL much
closer to a 4.0 release.



For starters I'd be happy to have 5.10 as a firm requirement,
which will enable the use of a possessive quantifier syntactic sugar
( "?+","*+", "++", "{min,max}+" ), particularly useful in rules,
and the defined-or operator ( // ), which can cut down the clutter
in code somewhat:
( $a // $b  is equivalent to:  defined $a ? $a : $b,
  similarly: $c //= $d  instead of:  $c = $d unless defined $c ).

Also the Digest::SHA module became a core module, so we can get
rid of Digest::SHA1 as a backward compatibility fallback.

Also in 5.10: (perl5100delta) The regular expression engine is
no longer recursive, meaning that patterns that used to overflow
the stack will either die with useful explanations, or run
to completion.

And: (perl5100delta) Alternations, where possible, are optimised
into more efficient matching structures. String literal alternations
are merged into a trie and are matched simultaneously.  This means
that instead of O(N) time for matching N alternations at a given
point, the new code performs in O(1) time.
[...] Note: Much code exists that works around perl's historic
poor performance on alternations. Often the tricks used to do so
will disable the new optimisations.
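
The idea behind the trie optimisation quoted above can be sketched in
a few lines of Python. This is only an illustration of the data
structure, not of how the Perl regex engine implements it; build_trie
and match_at are hypothetical names, and the sketch returns the
longest literal, whereas Perl preserves the order of alternatives.

```python
def build_trie(words):
    """Merge string literals into a nested-dict trie."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True  # end-of-word marker (never equals a character)
    return root


def match_at(trie, text, pos=0):
    """Return the longest literal that starts at text[pos], or None."""
    node, best = trie, None
    for i in range(pos, len(text)):
        node = node.get(text[i])
        if node is None:
            break  # no literal continues with this character
        if "" in node:
            best = text[pos:i + 1]
    return best


tlds = build_trie(["com", "org", "орг", "ком"])
print(match_at(tlds, "org.example"))  # org
print(match_at(tlds, "net"))          # None
```

Each input character is examined once regardless of how many literals
were merged, which is where the gain over trying the N alternatives
one after another comes from.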


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #18 from Mark Martinec <Ma...@ijs.si> ---
partly related:

trunk:
 Added a sample international mail message (as allowed
 by RFC 6532 - SMTPUTF8) and a test for it
  Adding t/data/nice/unicode1
  Adding t/header_utf8.t
Committed revision 1707600.


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #16 from Mark Martinec <Ma...@ijs.si> ---
Created attachment 5330
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5330&action=edit
Let RegistryBoundaries.pm be able to deal with IDN

Bug 7215: Towards supporting IDNA - handle IDN domain boundaries
  Sending lib/Mail/SpamAssassin/Conf.pm
  Sending lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
  Sending lib/Mail/SpamAssassin/RegistryBoundaries.pm
  Sending lib/Mail/SpamAssassin/Util.pm
  Sending rules/20_aux_tlds.cf
Committed revision 1703247.

Let RegistryBoundaries handle Internationalized Domain Names in Unicode,
update documentation on directives util_rb_tld, util_rb_2tld, util_rb_3tld,
update comment in 20_aux_tlds.cf.


This is now possible too (but not required, handles ACE just fine like before):

util_rb_tld कॉम 佛山 慈善 集团 在线 한국 点看 คอม ভারত 八卦 موقع 公益 公司 移动 我爱你
util_rb_tld москва қаз онлайн сайт срб бел קום 时尚 淡马锡 орг नेट 삼성 சிங்கப்பூர் 商标
util_rb_tld 商店 商城 дети мкд 新闻 工行 كوم 中文网 中信 中国 中國 娱乐 谷歌 భారత్ ලංකා
util_rb_tld ભારત भारत 网店 संगठन 餐厅 网络 ком укр 香港 飞利浦 台湾 台灣 手机 мон الجزائر
util_rb_tld عمان ایران امارات بازار الاردن بھارت المغرب السعودية سودان مليسيا 닷컴 政府
util_rb_tld شبكة გე 机构 组织机构 健康 ไทย سورية рус рф تونس 大拿 みんな グーグル 世界 ਭਾਰਤ
util_rb_tld 网址 닷넷 コム 游戏 vermögensberater vermögensberatung 企业 信息 مصر قطر 广东
util_rb_tld இலங்கை இந்தியா հայ 新加坡 فلسطين 政务
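
One way to accept TLDs in either form is to normalize everything to
ACE before comparing. The Python sketch below illustrates that idea
only; it is not how RegistryBoundaries.pm is written, the names ace,
VALID_TLDS and has_known_tld are hypothetical, and the TLD list is a
tiny hand-picked sample.

```python
def ace(name: str) -> str:
    """Lower-case a name and convert any U-labels to ACE (IDNA 2003 codec)."""
    return name.lower().encode("idna").decode("ascii")


# A few TLDs as they might be listed with util_rb_tld, in either form.
# The Unicode and ACE spellings of the same TLD collapse to one entry.
VALID_TLDS = {ace(t) for t in ["рф", "한국", "xn--p1ai", "si", "com"]}


def has_known_tld(domain: str) -> bool:
    """True if the last label is a listed TLD, given in Unicode or ACE form."""
    return ace(domain).rsplit(".", 1)[-1] in VALID_TLDS


print(has_known_tld("грузтранском.рф"))              # True
print(has_known_tld("xn--80afmnkeilbmhk.xn--p1ai"))  # True
print(has_known_tld("example.invalid"))              # False
```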


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #12 from Kevin A. McGrail <km...@pccc.com> ---
(In reply to Mark Martinec from comment #10)
> > My expectation with using more UTF8 (or 16) as the internal guts for SA is
> > that perl 5.14 becomes a baseline install where that works well from my
> > memory at least.
> > I want to set a good expectation that newer perl is needed for SA 4.X unless
> > you think I'm just being elitist.
> 
> I agree that 5.14 or maybe 5.12 is the baseline for more serious
> Unicode support. Although we may do it in steps: bump up the minimal
> version one notch with each specific problem we encounter during
> development. With basic Unicode support (and even the user-defined
> character classes) it seems the 5.8.9 is still able to cope somehow.
> 
> > unless you think I'm just being elitist.
> 
> Elitist? We are running 5.22 and 5.20 on our servers here :))
> Have to dig deep to find something running 5.16, let alone 5.14
> around here.

Hah. Well, Jenkins is stable and the 5.8.6 tests worked, though they are
a bit noisy with INFO: module Net::LibIDN not available warnings.

For 4.X, we need good, stable UTF support which in my experience means using
5.14+.  

That said, RHEL/CentOS 5, which isn't EOL until 2017, ships with 5.8.8.
RHEL/CentOS 6 ships with 5.10.1, and RHEL/CentOS 7 ships with 5.16.3.  

I could examine other distros but I think we can require 5.14.8+ for 4.X
especially since I'm expecting distros won't include 4.X except on the next
major release.

To make things easier for those with older perls, we can document and even
provide some automation to use the system/distro perl to bootstrap a newer
version of perl dedicated for SA with something like perlbrew.  It will require
compilation tools/libraries but effectively it's like installing your own JRE
for a specific product.

See https://github.com/gugod/App-perlbrew under the Synopsis to see just how
easily people can add alternate perl versions to their system.  

Any arguments against changing trunk INSTALL to require 5.14.8 as well as the
PACKAGING, Makefile.PL and UPGRADE files?  I can also look at making
Makefile.PL bootstrap with system perl and download perlbrew to make a newer
perl available.


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #15 from Mark Martinec <Ma...@ijs.si> ---
trunk:
  add Net::LibIDN as an optional dependency,
  add one more missing call to idn_to_ascii() in URIDNSBL.pm
Sending lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Committed revision 1695622.


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #22 from Kevin A. McGrail <km...@apache.org> ---
Continuing to target this for 4.0.  Working hard to produce our last
(hopefully) 3.4.2 release.


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #8 from Mark Martinec <Ma...@ijs.si> ---
> NOTE: For IDN work, I think we will want to require a newer perl and possibly
> use perlbrew or something similar for SA to install its own perl
> installation so we aren't distro dependent.  Thoughts?

The oldest 5.8 in perlbrew is 5.8.9; I tried it and it works.
I didn't notice the error because I had Net::LibIDN installed,
which did the dots replacement instead of our explicit code,
thus masking the problem in the idn_dots.t test.

If we can afford to have multiple versions of perl installed and
running under jenkins, perlbrew is probably the most straightforward.
Dependency modules would need to be installed (e.g. by cpanm) into
each brew instance.


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #5313|0                           |1
        is obsolete|                            |

--- Comment #3 from Mark Martinec <Ma...@ijs.si> ---
Created attachment 5322
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5322&action=edit
Introduce Util::idn_to_ascii and make use of it

Here is a focused variant of the previously attached patch,
which provides the Util::idn_to_ascii (which calls Net::LibIDN
if it is available and is needed), and makes use of it where
necessary (i.e. wherever DNS query is being assembled).
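
As a rough picture of "convert wherever a DNS query is being
assembled", the following Python sketch builds a query name for a
DNS-based list. It is not taken from the patch: dnsbl_query_name is a
hypothetical helper, "dnsbl.example" is a placeholder zone, and
Python's built-in IDNA 2003 codec stands in for Net::LibIDN.

```python
def dnsbl_query_name(domain: str, zone: str) -> str:
    """Build the name to look up for `domain` in the DNS-based list `zone`."""
    # DNS carries ASCII labels, so convert any U-labels to ACE first.
    ace = domain.rstrip(".").lower().encode("idna").decode("ascii")
    return f"{ace}.{zone}"


print(dnsbl_query_name("žarnice.si", "dnsbl.example"))
# xn--arnice-2pb.si.dnsbl.example
```

Doing the conversion at the point where the query is assembled means
the rest of the code can keep passing domain names around in whatever
form they were extracted.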

The following feature as mentioned in the opening posting
is _not_ included with this patch - it should be implemented
elsewhere:
| Instead of the more complex task of fixing
| the text parser, as a stop-gap solution I added some sanitization
| code for extracted URIs: trimming prefix and suffix characters
| that cannot appear in a valid Unicode URI. This sanitization code
| would eventually be removed once the parser is improved.

Similarly the user-defined character classes InIDNAWhitespace,
InIDNAFullStop, and InIDNA2008 are _not_ included in this patch
so that it can remain clean and focused on a single task of
introducing idn_to_ascii() and is_valid_utf_8().

trunk:
  Sending lib/Mail/SpamAssassin/AsyncLoop.pm
  Sending lib/Mail/SpamAssassin/Dns.pm
  Sending lib/Mail/SpamAssassin/DnsResolver.pm
  Sending lib/Mail/SpamAssassin/Plugin/AskDNS.pm
  Sending lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
  Sending lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
  Sending lib/Mail/SpamAssassin/Util.pm
Committed revision 1694252.


Please yell if this is for some reason unacceptable :)


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@pccc.com

--- Comment #5 from Kevin A. McGrail <km...@pccc.com> ---
I tested locally with 5.8.6 and reproduced the failure.  Testing now with 5.17.8.


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #4 from Mark Martinec <Ma...@ijs.si> ---
I see that Jenkins is not happy: the t/idn_dots.t test failed,
although it does pass here (perl 5.22). Seems like the version
of perl on Jenkins is 5.8.6 (i.e. 11 years old:) .
Will look for a workaround...


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #2 from Mark Martinec <Ma...@ijs.si> ---
Some examples from our logs:

util: idn_to_ascii: converted to ACE (0):
  /www.tretjičlen.si/ -> /www.xn--tretjilen-qfb.si/

util: idn_to_ascii: converted to ACE (0):
  /www.fenster-türen-technik.de/ -> /www.xn--fenster-tren-technik-xec.de/

idn_to_ascii: converted to ACE (0):
  /www.zdš.si/ -> /www.xn--zd-mta.si/

util: idn_to_ascii: extracted:
  /www.ichc2016.com’’/ -> /www.ichc2016.com/

util: idn_to_ascii: extracted:
  /www.incose.org)会员/ -> /www.incose.org/


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #9 from Kevin A. McGrail <km...@pccc.com> ---
My expectation with using more UTF8 (or 16) as the internal guts for SA is that
perl 5.14 becomes a baseline install where that works well from my memory at
least.

I want to set a good expectation that newer perl is needed for SA 4.X unless
you think I'm just being elitist.


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
                 CC|                            |apache@hege.li
             Status|NEW                         |RESOLVED

--- Comment #23 from Henrik Krohns <ap...@hege.li> ---
Wasn't all this committed to trunk ages ago? Everything works fine as far
as I understand. Closing.


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #21 from Kevin A. McGrail <km...@pccc.com> ---
Understood.  My plan is not to backport the full IDN stuff.

I will have a few more bugs backported and then that will be 3.4.2.

Then perhaps we get 4.0 (3.5?) moving since these major changes have been
tested in production.

I can switch to the same version too for testing.


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #19 from Kevin A. McGrail <km...@pccc.com> ---
Hi Mark, 

Assuming that we still want to leave this in trunk and NOT backport to 3.4/


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

Benny Pedersen <me...@junc.eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |me@junc.eu

--- Comment #11 from Benny Pedersen <me...@junc.eu> ---
(In reply to Kevin A. McGrail from comment #7)
> Failed on 5.14.8 likely your use bytes change will fix it though.
> 
> NOTE: For IDN work, I think we will want to require a newer perl and possibly
> use perlbrew or something similar for SA to install its own perl
> installation so we aren't distro dependent.  Thoughts?

there is precompiled problems everywhere, but i keep away from them :=)

gentoo and freebsd works for me, and google kill the rest of my day now :/


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #7 from Kevin A. McGrail <km...@pccc.com> ---
Failed on 5.14.8; likely your use bytes change will fix it though.

NOTE: For IDN work, I think we will want to require a newer perl and possibly
use perlbrew or something similar for SA to install its own perl
installation so we aren't distro dependent.  Thoughts?


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #1 from Mark Martinec <Ma...@ijs.si> ---
Created attachment 5313
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5313&action=edit
Provides idn_to_ascii() and is_valid_utf_8(), and some char classes

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #17 from Mark Martinec <Ma...@ijs.si> ---
trunk:
 Bug 7215: Towards supporting IDNA
  - tags AUTHORDOMAIN and SENDERDOMAIN to ACE,
  - add metadata X-AuthorDomain and X-SenderDomain (to facilitate testing),
  - domain to ACE in a call to Mail::DKIM::AuthorDomainPolicy->fetch
Sending lib/Mail/SpamAssassin/PerMsgStatus.pm
Sending lib/Mail/SpamAssassin/Plugin/DKIM.pm
Committed revision 1707578.


[Bug 7215] Towards supporting IDNA (Internationalizing Domain Names in Applications)

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7215

--- Comment #6 from Mark Martinec <Ma...@ijs.si> ---
got it:

trunk:
  Override the silly global "use bytes", breaks Unicode handling
    Sending lib/Mail/SpamAssassin/Util.pm
Committed revision 1694272.


This common idiom 'use bytes' will keep biting us on the way to
better Unicode support. Should get rid of it eventually throughout.
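
The kind of breakage meant here can be shown with a loose Python
analogy: treating text as raw octets versus as decoded characters.
This is only an analogy, not a description of what Perl's 'use bytes'
does internally; the sample string is made up from words seen in the
logs earlier in this ticket.

```python
import re

raw = "ČLANKI žarnice".encode("utf-8")  # the octets as they arrive
text = raw.decode("utf-8")              # the same data as characters

# Lengths differ: two of the characters take two octets each.
print(len(raw), len(text))              # 16 14

# Case mapping on octets only touches ASCII letters.
print(text.lower())                     # članki žarnice
print(raw.lower() == text.lower().encode("utf-8"))  # False

# \w on octets knows nothing about non-ASCII letters, so words get cut.
print(re.findall(rb"\w+", raw))         # [b'LANKI', b'arnice']
print(re.findall(r"\w+", text))         # ['ČLANKI', 'žarnice']
```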
