Posted to oro-dev@jakarta.apache.org by "Daniel F. Savarese" <df...@savarese.org> on 2002/01/21 08:36:32 UTC

Re: Question

In message <00...@eurolink.stpn.soft.net>, "Hardeep Singh" writes:
>I have had this problem for a long time now:
...
>However, when I try to use this to search into a binary file (esp. a JAR
>file), it gives me
>
>Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
>        at org.apache.oro.text.awk.AwkMatcher._search(AwkMatcher.java:717)

The awk package and AwkMatcher are implemented to work only with input
containing characters with 8-bit values (0-255).  This is because it is
a straight-up DFA implementation, which results in fast matches (no
backtracking) but extremely large state transition tables if the range
of input is expanded beyond 8 bits.  This will be documented more
prominently in the future.  At any rate, you're getting the exception
because a char value greater than 255 is being encountered, for which
no state transition is defined.  For full Unicode, use the Perl or glob
matchers.
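
To illustrate the trade-off described above, here is a minimal sketch of a
table-driven DFA with one column per 8-bit value.  It is a hypothetical
example, not the AwkMatcher source: the class name TinyDfa, the hard-coded
pattern "ab", and the table layout are all assumptions made for
illustration.  It shows why a char above 255 indexes past the end of a
transition row.

```java
// Hypothetical illustration only; this is NOT the AwkMatcher implementation.
class TinyDfa {
    // One column per 8-bit input value (0-255).
    private static final int ALPHABET = 256;

    // States: 0 = start, 1 = just saw 'a', 2 = saw "ab" (accepting).
    private static final int[][] NEXT = new int[3][ALPHABET];

    static {
        for (int c = 0; c < ALPHABET; c++) {
            NEXT[0][c] = 0;   // anything else keeps us at the start
            NEXT[1][c] = 0;   // a non-'b' after 'a' resets the match
            NEXT[2][c] = 2;   // once accepted, stay accepted
        }
        NEXT[0]['a'] = 1;
        NEXT[1]['a'] = 1;     // "aa" still leaves us one 'b' away
        NEXT[1]['b'] = 2;
    }

    /** Returns true if the input contains "ab"; one table lookup per char. */
    static boolean containsAb(String input) {
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            // A char above 255 indexes past the end of the row and throws
            // ArrayIndexOutOfBoundsException.
            state = NEXT[state][input.charAt(i)];
        }
        return state == 2;
    }
}
```

Each input char costs exactly one array lookup, which is where the speed
comes from.  Widening every row to cover all 65536 char values would
multiply the table size by 256.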

daniel

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: Question

Posted by Hardeep Singh <ha...@in.velocient.com>.

But how can it get a value larger than 255? Even if a value is wider than
one byte, it should be interpreted as two consecutive characters, not just
one. The problem at hand requires the speed. So what can I do to make it
either ignore Unicode files altogether or ignore the high-order bit (this
should work correctly for UTF-8)?
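
One possible explanation, offered as an assumption rather than something
confirmed in this thread: if the file is read through a Reader or a String
constructor that uses the platform default encoding, some single bytes can
decode to chars above 255 (in windows-1252, for example, byte 0x80 decodes
to U+20AC).  Decoding the raw bytes as ISO-8859-1 instead maps every byte
to the char with the same value, so no char exceeds 255.  A minimal sketch,
with the hypothetical helper names ByteChars, toLatin1Chars, and maxChar,
and assuming Java 7 or later for StandardCharsets:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative and not part of ORO.
class ByteChars {
    /** Decodes each byte to the char with the same unsigned value (0-255). */
    static char[] toLatin1Chars(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1).toCharArray();
    }

    /** Returns the largest char value in the array, or -1 if it is empty. */
    static int maxChar(char[] chars) {
        int max = -1;
        for (char c : chars) {
            if (c > max) {
                max = c;
            }
        }
        return max;
    }
}
```

The resulting chars can then be searched as 8-bit input.  Multi-byte UTF-8
sequences appear as several separate chars in the 0-255 range, which
matches the "two consecutive characters" behavior asked about above.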
