Posted to oro-dev@jakarta.apache.org by Takashi Okamoto <to...@rd.nttdata.co.jp> on 2000/12/01 02:49:38 UTC

Re: bug with CP1252 characters and Perl5Matcher

>Special characters in the CP1252 character set (but not in ISO Latin-1)
>cause an ArrayIndexOutOfBoundsException deep within Perl5Matcher.  The
>problem occurs if I use "[^.]*\." as the pattern but not if I use
>"Test" as the pattern.  The characters are fancy forms of apostrophe
>and double-quotes (decimal 146, 147, and 148).  Use IE 5 to view the
>test files to see what they look like.  I am running Jakarta ORO 2.0,
>JDK 1.2.2, WinNT 4.0sp5.

Could you read the TODO file in Jakarta ORO 2.0.1?

It says:

o Make Perl5 character classes (e.g., [abcde...]) fully support Unicode
  input.  Currently character classes only match 8-bit characters.
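To illustrate why these characters hit that limit: when the input is
decoded as CP1252, bytes 146, 147, and 148 become the Java chars U+2019,
U+201C, and U+201D, all of which are above 255. The following is only a
minimal sketch, assuming a 256-entry lookup table indexed by the char
value. The class and method names are hypothetical and are not ORO's
actual code.

```java
// Hypothetical illustration only; this is NOT the Perl5Matcher source.
final class EightBitTableDemo {
    // Assumption: the character class is stored as one slot per 8-bit value.
    static final int TABLE_SIZE = 256;

    // Unicode characters that CP1252 bytes 146, 147, 148 decode to.
    static final char RIGHT_SINGLE_QUOTE = '\u2019'; // CP1252 146
    static final char LEFT_DOUBLE_QUOTE  = '\u201C'; // CP1252 147
    static final char RIGHT_DOUBLE_QUOTE = '\u201D'; // CP1252 148

    // True if the char can safely index a TABLE_SIZE-entry table.
    static boolean fitsInTable(char c) {
        return c < TABLE_SIZE;
    }

    // Unguarded lookup: throws ArrayIndexOutOfBoundsException
    // whenever c >= table.length.
    static boolean unguardedLookup(boolean[] table, char c) {
        return table[c];
    }

    public static void main(String[] args) {
        boolean[] table = new boolean[TABLE_SIZE];
        System.out.println("'.' fits: " + fitsInTable('.'));
        System.out.println("U+2019 fits: " + fitsInTable(RIGHT_SINGLE_QUOTE));
        try {
            unguardedLookup(table, LEFT_DOUBLE_QUOTE);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("lookup failed for U+201C");
        }
    }
}
```

A literal pattern such as "Test" needs no table lookup of this kind, which
would be consistent with the report that only the character-class pattern
fails.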

I posted a patch for this problem.
You can use it as a temporary workaround.
Note that the patch may use a fair amount of extra memory (about 8 KB).
See the attached file.
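
For reference, one way a figure of about 8 KB could arise: a bitmap with
one bit for every 16-bit Java char needs 65536 / 8 = 8192 bytes. The
sketch below assumes that layout. The class CharBitmap and its methods
are hypothetical and are not taken from the attached patch.

```java
// Hypothetical sketch of a full 16-bit character-class bitmap.
// Assumption: one bit per possible char value (0..65535).
final class CharBitmap {
    // 65536 bits / 8 bits per byte = 8192 bytes.
    private final byte[] bits = new byte[65536 / 8];

    // Mark c as a member of the class.
    void add(char c) {
        bits[c >>> 3] |= (byte) (1 << (c & 7));
    }

    // True if c was added; valid for every char, not only 0..255.
    boolean contains(char c) {
        return (bits[c >>> 3] & (1 << (c & 7))) != 0;
    }

    // Storage used by the bitmap itself, in bytes.
    int sizeInBytes() {
        return bits.length;
    }

    public static void main(String[] args) {
        // Model the negated class [^.] by storing '.' and inverting the test.
        CharBitmap dot = new CharBitmap();
        dot.add('.');
        char quote = '\u2019'; // CP1252 byte 146 decoded to Unicode
        System.out.println("[^.] matches U+2019: " + !dot.contains(quote));
        System.out.println("[^.] matches '.': " + !dot.contains('.'));
        System.out.println("bytes used: " + dot.sizeInBytes());
    }
}
```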
--------------
Takashi Okamoto