
Posted to oro-user@jakarta.apache.org by Matthew Stevens <ma...@sun.com> on 2001/11/27 16:14:50 UTC

Problems with non-ASCII Character Encoding

Hi,

I have used ORO since the beginning of 2001 for an API mapping tool for the
iPlanet Migration Toolbox.  Our software processes source files and passes
them through rules defined in XML.  Each rule allows for one or more PRIMARY
matches.  If a PRIMARY match is found (using the <Perl5Matcher>.contains()
APIs), the matching region is passed to one or more SECONDARY
replacement rules associated with that PRIMARY match.  In this way,
developers can find targeted regions for API mapping and provide a 'program'
of changes to apply to each region to achieve a complex API mapping.
Enough of an introduction...
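
To make the PRIMARY/SECONDARY flow concrete, here is a minimal sketch.  It
is only an illustration: it uses java.util.regex as a stand-in for ORO's
Perl5Matcher so that it is self-contained, and the Rule and Replacement
classes, the apply() and demo() methods, and the oldApi/newApi patterns are
all hypothetical names, not part of the actual tool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrimarySecondarySketch {
    // Hypothetical SECONDARY rule: a pattern and the text that replaces it.
    static final class Replacement {
        final Pattern pattern;
        final String replacement;
        Replacement(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    // Hypothetical rule: one PRIMARY pattern plus its ordered SECONDARY rules.
    static final class Rule {
        final Pattern primary;
        final List<Replacement> secondaries;
        Rule(String primaryRegex, List<Replacement> secondaries) {
            this.primary = Pattern.compile(primaryRegex);
            this.secondaries = secondaries;
        }
    }

    // Finds each PRIMARY region and rewrites only that region with the
    // SECONDARY replacements; text outside the regions is copied unchanged.
    static String apply(Rule rule, String source) {
        Matcher m = rule.primary.matcher(source);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(source, last, m.start());
            String region = m.group();
            for (Replacement r : rule.secondaries) {
                region = r.pattern.matcher(region).replaceAll(r.replacement);
            }
            out.append(region);
            last = m.end();
        }
        out.append(source, last, source.length());
        return out.toString();
    }

    // Example: map calls on a made-up "oldApi" object to a made-up "newApi".
    static String demo() {
        Rule rule = new Rule(
            "oldApi\\.\\w+\\([^)]*\\)",
            Arrays.asList(
                new Replacement("oldApi", "newApi"),
                new Replacement("\\.open\\(", ".connect(")));
        return apply(rule, "x = oldApi.open(a); y = other.open(b);");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only the call on oldApi is rewritten; other.open(b) is left alone because it
never falls inside a PRIMARY region.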

We now have customers in the PACRIM who are trying to use our software on
source files containing non-Latin (double-byte) characters.  We have found
that the <Perl5Matcher>.contains() method fails with an
ArrayIndexOutOfBoundsException against arbitrary patterns with arbitrary
source.  In the case of Japanese, Shift-JIS is the encoding commonly used.
Although it is uncommon to find any double-byte characters in the compilable
source (Java), there is often double-byte data in the comments, DocComments
and static String data.
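
One thing that may be worth ruling out is how the source bytes are turned
into Java characters before they reach the matcher.  The sketch below is
only an assumption about where trouble could arise, not a diagnosis of the
failure: it decodes the same Shift-JIS bytes once with the Shift_JIS charset
and once with ISO-8859-1, and compares the resulting string lengths.  The
class and method names are hypothetical, and it relies on the JDK shipping
the Shift_JIS charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ShiftJisDecodeSketch {
    // The two characters U+65E5 U+672C encoded as Shift-JIS, two bytes each.
    static final byte[] SJIS_BYTES = {
        (byte) 0x93, (byte) 0xFA, (byte) 0x96, (byte) 0x7B
    };

    // Decodes raw bytes into a String using the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        String asShiftJis = decode(SJIS_BYTES, Charset.forName("Shift_JIS"));
        String asLatin1 = decode(SJIS_BYTES, StandardCharsets.ISO_8859_1);

        // Decoded correctly, each double-byte character becomes one char.
        System.out.println(asShiftJis.length()); // 2
        // Decoded as a single-byte charset, every byte becomes its own char.
        System.out.println(asLatin1.length());   // 4
    }
}
```

Note that the last byte, 0x7B, is '{' when read as a single-byte character,
so input decoded with the wrong charset can present the matcher with
characters that were never in the original text.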

I am interested to know whether the ORO project was built with double-byte
characters in mind.  If so, then I have found a bug and have a reproducible
case.  Otherwise, I will have to use a workaround.  Your comments and
suggestions are appreciated.

regards,
matt

* * * * * * * * * * * * * * * * * * * * * * * * *
* Matthew Stevens, Sun/iPlanet, Senior Engineer *
* matthew.stevens@sun.com                       *
* 610-415-2212 (office)    610-331-8511 (cell)  *
* * * * * * * * * * * * * * * * * * * * * * * * *

