Posted to oro-user@jakarta.apache.org by "Daniel F. Savarese" <df...@savarese.org> on 2002/10/10 20:34:55 UTC

Re: absolute match offset

In message <5....@biomail.ucsd.edu>, Dmitry Beransky writes:
>In the process of evaluating the library I came across the following 
>problem.  I use Awk* classes to search for patterns in a big (>100MB) 
>file.  What I can't figure out is how to keep track of the matches' absolute 
>offsets (relative to the beginning of the file).  I guess, I don't quite 

I assume you're using an AwkStreamInput instance as the input.  You
don't have to do anything special to keep track of the match offset
relative to the beginning of the file.  beginOffset and endOffset
return an offset relative to the beginning of the input.  AwkStreamInput
keeps track of the offset and AwkMatcher initializes the MatchResult
with the proper offsets.
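
To illustrate the general idea of a stream input that keeps a running
offset, here is a minimal sketch in plain Java.  It is not ORO code and
does not use the Awk* classes: the class AbsoluteOffsetSketch, the method
findAll, and the literal (non-regex) search are all assumptions made for
the illustration.  The point is only that a buffer-relative index becomes
an absolute offset once the number of characters already discarded is
added to it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Jakarta ORO.
class AbsoluteOffsetSketch {

    // Returns the absolute start offset of every occurrence of a literal
    // needle, reading the input in chunks of chunkSize characters.
    static List<Long> findAll(Reader in, String needle, int chunkSize) {
        if (needle.isEmpty() || chunkSize <= 0) {
            throw new IllegalArgumentException("needle and chunkSize must be non-empty/positive");
        }
        List<Long> offsets = new ArrayList<>();
        StringBuilder window = new StringBuilder();
        char[] chunk = new char[chunkSize];
        long base = 0; // characters already discarded from the front of the window
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                window.append(chunk, 0, n);
                int i = window.indexOf(needle);
                while (i >= 0) {
                    // buffer-relative index + discarded count = absolute offset
                    offsets.add(base + i);
                    i = window.indexOf(needle, i + 1);
                }
                // Keep only a tail too short to hold a full match, so a match
                // straddling two chunks is still found and none is reported twice.
                int keep = Math.min(window.length(), needle.length() - 1);
                int drop = window.length() - keep;
                window.delete(0, drop);
                base += drop;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Prints [2, 7, 10] even though the buffer holds only 4 characters at a time.
        System.out.println(findAll(new StringReader("xxabxxxabxab"), "ab", 4));
    }
}
```

If the running count (base in the sketch) were left out, the reported
positions would be relative to the current buffer rather than to the
beginning of the input.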

daniel



--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: absolute match offset

Posted by Dmitry Beransky <db...@dembel.org>.
At 11:34 AM 10/10/2002, Daniel F. Savarese wrote:
>I assume you're using an AwkStreamInput instance as the input.  You
>don't have to do anything special to keep track of the match offset
>relative to the beginning of the file.

That's what I originally thought.  But the output I'm getting doesn't 
support this.  Given the code:


    AwkStreamInput input = new AwkStreamInput(new FileReader(genomeFile));
    org.apache.oro.text.regex.Pattern p = new AwkCompiler().compile(anchorPatternStr);
    AwkMatcher matcher = new AwkMatcher();

    while( matcher.contains(input, p))
         System.err.println("found at " + matcher.getMatch().beginOffset(0));

the output I get is:

found at 110
found at 460
found at 931
found at 1027
found at 413
found at 1657
found at 1756
found at 1946
found at 0
found at 55
found at 529
found at 816
found at 1965


As you can see, the offsets do not increase monotonically.  Am I doing something wrong?
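
For reference, the check I have in mind can be written as a small
stand-alone helper.  This is only a sketch: the class OffsetOrderCheck
and the method firstOutOfOrder are names made up for the illustration,
and the array holds the offsets from the output above.  It assumes that
successive matches over a single pass through the input should never
start earlier than the previous match.

```java
// Hypothetical helper; not part of Jakarta ORO.
class OffsetOrderCheck {

    // Returns the index of the first offset that is smaller than its
    // predecessor, or -1 if the sequence never decreases.
    static int firstOutOfOrder(long[] offsets) {
        for (int i = 1; i < offsets.length; i++) {
            if (offsets[i] < offsets[i - 1]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] reported = {110, 460, 931, 1027, 413, 1657, 1756, 1946,
                           0, 55, 529, 816, 1965};
        // Prints 4: the offset 413 follows 1027.
        System.out.println(firstOutOfOrder(reported));
    }
}
```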


Dmitry

>>In the process of evaluating the library I came across the following 
>>problem.  I use Awk* classes to search for patterns in a big (>100MB) 
>>file.  What I can't figure out is how to keep track of the matches' 
>>absolute offsets (relative to the beginning of the file).

