Posted to oro-user@jakarta.apache.org by "Daniel F. Savarese" <df...@savarese.org> on 2001/07/26 16:25:54 UTC

Re: searching in a text file

In message <3B...@dia.uniroma3.it>, Andrea Palmieri writes:
>I am new to Oro. I noticed that Perl5StreamInput and methods
>manipulating Perl5StreamInput have been removed in the version 2.0. Is
>there any other way to search text from a file?

Read in the entire file and do the search in memory.  This is what
winds up happening for most regular expressions anyway.  There's
a paper (mentioned in the CHANGES file) explaining why.  The instant
you start throwing in stuff like .*, any stream matcher is going to
have to read in the entire input anyway and probably backtrack after
hitting the end.  This is pretty much why it was removed, so as not
to fool the programmer into thinking something more efficient was
going on.  The
situation is slightly different with AWK expressions, so AwkStreamInput
remains and you can use that to directly search a stream.
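
The read-it-all-then-search approach can be sketched roughly as follows.
This is only an illustration: it uses the JDK's java.util.regex classes as
a stand-in rather than the ORO API, and the class and method names
(InMemorySearch, findAll, findAllInFile) as well as the UTF-8 encoding are
assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ORO: search text that is fully in memory.
final class InMemorySearch {
    /** Returns every match of regex found in text that is already in memory. */
    static List<String> findAll(String text, String regex) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    /** Reads the whole file into a String (UTF-8 assumed), then searches it. */
    static List<String> findAllInFile(Path file, String regex) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return findAll(text, regex);
    }
}
```

The file is read exactly once and the matcher is free to backtrack over
the whole buffer, which is the cost a stream matcher would end up paying
anyway for most patterns.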

Stream matching is something that may have to be revisited in a future
version, but a better interface will have to be devised and, at least
for Perl expressions, a contract specified that chunks the input like
expect does, so you accept that there's a limited buffering/lookahead
in the stream.  Another alternative for Perl is to make all stream
matching non-greedy.  At any rate, stream matching used to be a
distinguishing feature of the software, but I observed how it was
behaving most of the time, did a little analysis and dug up the ACM
paper which confirmed my analysis, so Perl5StreamInput/Perl5Reader had
to be removed.
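
To make the .* point and the non-greedy alternative concrete, here is a
small sketch.  It again uses java.util.regex as a stand-in, not ORO, and
the class and method names are made up for the example.  DOTALL is set so
that . also matches newlines, which is the worst case for a stream.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of ORO.
final class GreedyVsReluctant {
    /** End offset of the first match of regex in text, or -1 if there is none. */
    static int firstMatchEnd(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(text);
        return m.find() ? m.end() : -1;
    }

    /** True if the engine ran into the end of the input while searching for the first match. */
    static boolean hitEndOfInput(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(text);
        m.find();
        return m.hitEnd();
    }
}
```

On the input "a<x>b<y>c", the greedy pattern <.*> runs to the end of the
input and backtracks to the last >, so the match ends at offset 8.  The
non-greedy <.*?> stops at the first >, at offset 4, and never looks at the
rest of the input.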

daniel