Posted to oro-user@jakarta.apache.org by Gang Wu <gw...@molbio.mgh.harvard.edu> on 2002/07/29 22:38:42 UTC

Pattern match result

Hi everyone,

When I tried to use the ORO package in my project, I was confused by the
search results. Below is a piece of my code that searches for the pattern
"gaaga" in the sequence "gaagaagaagaaga". I expected 4 matches (at offsets
0, 3, 6 and 9), but the actual output gives only 2 matches (at offsets 0
and 6; see the source code and output below). It seems that the matcher
resumes searching after the last character of the previous match instead of
at the second character of the previous match. Does anyone know whether
this is a bug or the intended behavior?

It's easy to get the result I expect by changing the 'input' string before
each match attempt, but performance suffers. Does anyone know a better way?

I would deeply appreciate any assistance!

Gang Wu


Source Code:
==============================================================
import org.apache.oro.text.regex.*;

public class TestYard {
    public static void main(String [] args) {
        try {
            int counter = 0;
            String seq = "gaagaagaagaaga";
            String pat = "gaaga";
            System.out.println("Input: " + seq + "\nPattern: " + pat);
            PatternCompiler compiler = new Perl5Compiler();
            Pattern pattern = compiler.compile(pat);
            PatternMatcherInput input = new PatternMatcherInput(seq);
            PatternMatcher matcher = new Perl5Matcher();
            while(matcher.contains(input, pattern)) {
                MatchResult matchResult = matcher.getMatch();
                System.out.println("beginOffset: " + matchResult.beginOffset(0)
                    + " " + matchResult.toString());
                counter = counter + 1;
            }
            System.out.print("found: " + counter);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
====================================================================

Output:
====================================================================
Input: gaagaagaagaaga
Pattern: gaaga
beginOffset: 0 gaaga
beginOffset: 6 gaaga
found: 2
====================================================================
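
For comparison, here is a minimal sketch of the overlapping search I am
after. It is an assumption-laden illustration, not ORO code: it uses the
standard java.util.regex classes instead of ORO, and the names OverlapDemo
and findOverlapping are made up for this example. The idea is to restart the
search one character past the start of the previous match rather than at its
end, without rebuilding the input string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of ORO or of the code above.
class OverlapDemo {

    // Returns the start offset of every match of pat in seq,
    // including matches that overlap an earlier match.
    static List<Integer> findOverlapping(String seq, String pat) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile(pat).matcher(seq);
        int from = 0;
        // find(int) resets the matcher and searches from the given index,
        // so the same input string is reused on every iteration.
        while (from <= seq.length() && m.find(from)) {
            offsets.add(m.start());
            // Resume at the second character of this match,
            // not after its last character.
            from = m.start() + 1;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Prints [0, 3, 6, 9]
        System.out.println(findOverlapping("gaagaagaagaaga", "gaaga"));
    }
}
```

If PatternMatcherInput.setCurrentOffset() behaves the way its name suggests,
calling it with matchResult.beginOffset(0) + 1 inside the while loop above
might give the same effect with ORO, but I have not verified that.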
