Posted to dev@commons.apache.org by "Noel J. Bergman" <no...@devtech.com> on 2003/09/04 03:18:34 UTC

Wanted: Regex[Input|Output]Stream

Does anyone know of an implementation of the titular classes?

Basically, I want to compile multiple regex patterns into a stream so that,
as I read through the stream, the data is checked against all of them
efficiently.  This is a classic FSA situation.  If a match occurs, I think
that I'd like to receive an exception at that point in the stream, but the
data would still be valid, and I could continue processing.

The read methods might throw a MatchException, whose constructor might be
loosely defined as:

  MatchException(long offset, byte b, String name);
  long offset - offset in the overall data stream
                at which the match occurred.
  byte b      - the byte that caused the thrown
                MatchException.  If read(byte[])
                were called, data up to but not
                including that byte would be in
                the byte[].
  String name - A user-provided name for the pattern
                provided when the pattern was added
                to the stream.
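
A minimal sketch of the exception described above.  The constructor
arguments come from the description; extending IOException (so that the
existing read() signatures can throw it unchanged) and the getter names
are assumptions made for illustration, not part of the proposal.

```java
import java.io.IOException;

// Hypothetical sketch: signals that a registered pattern matched while reading.
// Extending IOException is an assumption so InputStream.read() can throw it.
class MatchException extends IOException {
    private final long offset;  // offset in the overall data stream of the match
    private final byte b;       // the byte that caused the exception to be thrown
    private final String name;  // user-provided name of the pattern that matched

    MatchException(long offset, byte b, String name) {
        super("pattern '" + name + "' matched at offset " + offset);
        this.offset = offset;
        this.b = b;
        this.name = name;
    }

    long getOffset() { return offset; }

    byte getByte() { return b; }

    String getName() { return name; }
}
```

A caller would catch MatchException around read(), note the offset and
name, and then carry on reading from the same stream.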

I've looked around, and not found anything like this class.  The idea is to
be able to handle my data as normal, but be alerted to the presence of data
patterns embedded therein.

	--- Noel


RE: Wanted: Regex[Input|Output]Stream

Posted by "Noel J. Bergman" <no...@devtech.com>.
Henri,

> Any reason why this wouldn't be better served if it were on the ORO or
> Regexp lists?

I cc'd dfs, and embarrassingly, I forgot at the time that regex isn't in
Commons.  Irony is that ORO is talking about moving to Commons, AIUI.

	--- Noel


RE: Wanted: Regex[Input|Output]Stream

Posted by Henri Yandell <ba...@generationjava.com>.
Any reason why this wouldn't be better served if it were on the ORO or
Regexp lists?

Am just assuming they'd be the experts here.

Hen

On Thu, 4 Sep 2003, Noel J. Bergman wrote:

> Leo Sutic observed:
> > Just a comment: Instead of using exceptions, how about this:
> >     public interface MatchObserver {
> >         public void onMatch (long offset, byte b, String pattern);
> >     }
>
> Excellent comment; even better.  :-)  And since we're not interrupting the
> I/O stream with the exception, we can probably get rid of the byte.
>
> An alternative would be to add an Observer (actually, wouldn't that be a
> Listener, to remain consistent with Java terminology? :-)) with the pattern,
> although it seems that none of the regex engines support compiling multiple
> patterns, which I find truly bizarre.  That would allow the Listener code
> to execute as each pattern is found in the stream.
>
> I would expect further interface tweaking, but the first thing we need is
> code capable of supporting this construct.
>
> 	--- Noel


RE: Wanted: Regex[Input|Output]Stream

Posted by "Noel J. Bergman" <no...@devtech.com>.
Leo Sutic observed:
> Just a comment: Instead of using exceptions, how about this:
>     public interface MatchObserver {
>         public void onMatch (long offset, byte b, String pattern);
>     }

Excellent comment; even better.  :-)  And since we're not interrupting the
I/O stream with the exception, we can probably get rid of the byte.

An alternative would be to add an Observer (actually, wouldn't that be a
Listener, to remain consistent with Java terminology? :-)) with the pattern,
although it seems that none of the regex engines support compiling multiple
patterns, which I find truly bizarre.  That would allow the Listener code
to execute as each pattern is found in the stream.

I would expect further interface tweaking, but the first thing we need is
code capable of supporting this construct.
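
One way the per-pattern Listener idea might look, as a rough sketch only.
It uses java.util.regex and fakes "compiling multiple patterns" by joining
them into a single alternation with named groups (Java 7 or later).  It
scans a CharSequence rather than a stream, since java.util.regex does not
match against streams.  MatchListener, MultiPattern, add() and scan() are
all hypothetical names, and sub-patterns that use numbered backreferences
or their own named groups g0, g1, ... would break it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical listener: called once for each match of the pattern it was added with.
interface MatchListener {
    void onMatch(long offset, String name);
}

// Hypothetical holder for several named patterns, each with its own listener.
class MultiPattern {
    private final List<String> names = new ArrayList<>();
    private final List<String> regexes = new ArrayList<>();
    private final List<MatchListener> listeners = new ArrayList<>();

    void add(String name, String regex, MatchListener listener) {
        Pattern.compile(regex); // fail fast if the regex is malformed
        names.add(name);
        regexes.add(regex);
        listeners.add(listener);
    }

    // Scans the data once; the leftmost match wins, ties go to the pattern added first.
    void scan(CharSequence data) {
        if (names.isEmpty()) {
            return;
        }
        StringBuilder combined = new StringBuilder();
        for (int i = 0; i < regexes.size(); i++) {
            if (i > 0) {
                combined.append('|');
            }
            // Wrap each pattern in a named group so we can tell which one matched.
            combined.append("(?<g").append(i).append('>').append(regexes.get(i)).append(')');
        }
        Matcher m = Pattern.compile(combined.toString()).matcher(data);
        while (m.find()) {
            for (int i = 0; i < names.size(); i++) {
                if (m.group("g" + i) != null) {
                    listeners.get(i).onMatch(m.start(), names.get(i));
                    break;
                }
            }
        }
    }
}
```

Because alternation reports only one match at each position, overlapping
matches of different patterns are not all seen; a real multi-pattern FSA
would not have that limitation.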

	--- Noel


RE: Wanted: Regex[Input|Output]Stream

Posted by Leo Sutic <le...@inspireinfrastructure.com>.
Just a comment: Instead of using exceptions, how about this:

    public interface MatchObserver {
        public void onMatch (long offset, byte b, String pattern);
    }

    public class Regex(Input|Output)Stream {

        public Regex(Input|Output)Stream ((Input|Output)Stream stream, 
                                          MatchObserver observer);
    }

and use callbacks instead?
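
A rough sketch of the callback approach for the input side, built on
java.io.FilterInputStream.  The matcher is only a stand-in: it watches for
one literal byte sequence using a sliding window instead of running a
compiled regex FSA.  LiteralMatchInputStream is a hypothetical name, and
reporting the offset of the byte that completes the match is an assumption.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

interface MatchObserver {
    void onMatch(long offset, byte b, String pattern);
}

// Hypothetical stand-in for RegexInputStream: matches a single literal, not a regex.
class LiteralMatchInputStream extends FilterInputStream {
    private final byte[] literal;
    private final String name;
    private final MatchObserver observer;
    private final byte[] window; // circular buffer holding the most recent bytes
    private long count;          // total number of bytes seen so far

    LiteralMatchInputStream(InputStream in, String literal, MatchObserver observer) {
        super(in);
        if (literal.isEmpty()) {
            throw new IllegalArgumentException("literal must not be empty");
        }
        this.literal = literal.getBytes(StandardCharsets.ISO_8859_1);
        this.name = literal;
        this.observer = observer;
        this.window = new byte[this.literal.length];
    }

    @Override
    public int read() throws IOException {
        int v = super.read();
        if (v >= 0) {
            check((byte) v);
        }
        return v;
    }

    // read(byte[]) in FilterInputStream delegates here, so it is covered too.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = 0; i < n; i++) {
            check(buf[off + i]);
        }
        return n;
    }

    // Records one byte and notifies the observer if the window now equals the literal.
    private void check(byte b) {
        window[(int) (count % window.length)] = b;
        count++;
        if (count < literal.length) {
            return;
        }
        for (int i = 0; i < literal.length; i++) {
            // The oldest byte in the window sits at index count % window.length.
            if (window[(int) ((count + i) % window.length)] != literal[i]) {
                return;
            }
        }
        observer.onMatch(count - 1, b, name);
    }
}
```

The data reaches the caller untouched and the observer fires as a side
effect of reading.  Bytes passed over with skip() are not inspected in
this sketch.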

I looked through the ORO classes, but it appears that the Perl5 regexes
can't be used with streams. (Possibly because they aren't true FSAs.)

/LS
