Posted to c-dev@xerces.apache.org by Chris Hill <ch...@wolfram.com> on 2000/07/25 16:04:25 UTC

Streaming input to the parser

I've been working with Xerces in an attempt to send it parts of the
input data as they become available to me.  I provided a custom InputSource
that hands out a custom BinInputStream.  I have looked at the progressive
parse example and decided to try putting "marker comments" into the stream
at the end of each chunk, pausing the parse when I encounter one until more
data arrives.  Will I have any problems with this approach?  Is there a
better way to do this?

I'm concerned that the parser might not read more data from the stream
after it has exhausted the data from the first read (if the amount of data
available is less than it requested).  I need to provide input to the
parser in chunks; it's not a matter of waiting for more data to arrive, I
need to process it in segments.
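
A minimal sketch of the chunked stream idea described above, for
illustration only. ChunkedStream and addChunk are hypothetical names; the
class is a stand-in that mirrors the readBytes/curPos shape of a Xerces
BinInputStream but does not derive from it, so it builds without the
library. The exact virtual signatures differ between Xerces versions and
should be checked against the headers in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <string>

// Hypothetical stand-in for a custom BinInputStream (not the real base class).
class ChunkedStream {
public:
    // Queue one chunk of input; empty chunks are ignored.
    void addChunk(const std::string& chunk) {
        if (!chunk.empty()) chunks_.push_back(chunk);
    }

    // Copy at most maxToRead bytes, never crossing a chunk boundary, so a
    // call may return fewer bytes than requested (a "short read").
    // Returns 0 only when no chunks are left.
    std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) {
        assert(toFill != nullptr);
        if (chunks_.empty()) return 0;
        const std::string& front = chunks_.front();
        const std::size_t n = std::min(maxToRead, front.size() - offset_);
        std::memcpy(toFill, front.data() + offset_, n);
        offset_ += n;
        pos_ += n;
        if (offset_ == front.size()) {  // current chunk fully consumed
            chunks_.pop_front();
            offset_ = 0;
        }
        return n;
    }

    // Total number of bytes handed out so far.
    std::size_t curPos() const { return pos_; }

private:
    std::deque<std::string> chunks_;
    std::size_t offset_ = 0;  // read offset inside the front chunk
    std::size_t pos_ = 0;
};
```

Stopping at chunk boundaries is a design choice made for this sketch: it
keeps each segment visible as its own read, which is what the marker
comment approach relies on.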

Chris


Re: Streaming input to the parser

Posted by Dean Roddey <dr...@charmedquark.com>.
> How does the parser decide that there is nothing left to parse?

If the input stream returns zero bytes, then it assumes it's done, or at
least that there is no more data to be had. If you aren't at the end of a
legal document, that will be reported as an unexpected end of file.

You need to reset the parser after you finish a legal document, i.e. it will
only parse a valid document. Once it has seen a valid document, you must
start a new parse.
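
To make the zero-byte rule concrete, here is a small sketch of the consumer
side. It is not Xerces code: drain is a hypothetical loop that behaves the
way the rule above describes, and makeReader is a hypothetical test source.
An empty piece models a stream that has only momentarily run dry but
reports zero bytes anyway.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using ReadFn = std::function<std::size_t(unsigned char*, std::size_t)>;

// Hypothetical source: hands out one piece per call, then reports 0.
// Pieces are assumed to fit in the caller's buffer.
ReadFn makeReader(std::vector<std::string> pieces) {
    return [pieces = std::move(pieces), next = std::size_t{0}](
               unsigned char* toFill, std::size_t maxToRead) mutable -> std::size_t {
        assert(toFill != nullptr);
        if (next >= pieces.size()) return 0;
        const std::string& piece = pieces[next++];
        const std::size_t n = std::min(maxToRead, piece.size());
        std::memcpy(toFill, piece.data(), n);
        return n;
    };
}

// Hypothetical consumer: keeps reading while reads return data and treats
// the first zero-byte read as the end of the input.
std::string drain(const ReadFn& readBytes) {
    std::string out;
    unsigned char buf[16];
    for (;;) {
        const std::size_t n = readBytes(buf, sizeof buf);
        if (n == 0) break;  // zero bytes: no more data is expected
        out.append(reinterpret_cast<const char*>(buf), n);
    }
    return out;
}
```

With the pieces "<a>", "" and "</a>", drain stops after "<a>". A real
parser in that position would be left in the middle of a document, which
is the unexpected end of file case.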

--------------------------
Dean Roddey
The CIDLib C++ Frameworks
Charmed Quark Software
droddey@charmedquark.com
http://www.charmedquark.com

"You young, and you gotcha health. Whatchoo wanna job fer?"




Re: Streaming input to the parser

Posted by Chris Hill <ch...@wolfram.com>.
How does the parser decide that there is nothing left to parse?

I'm having a little difficulty with my comment marker approach under
certain conditions.  For example, I first send a doctype declaration
followed by a comment, but my comment handler doesn't seem to get
called.  If I send more data, it crashes.  This example is actually the
main reason I need this functionality: I want to load a doctype
declaration that has many character entity definitions and reuse those
definitions in each subsequent data block passed to the parser.

If I just send an opening tag+comment marker and then a data+comment
marker, it works, and I can continue to send more data+comment markers
until I close the opening tag.
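
One possible workaround, sketched here purely as an assumption and not as
anything Xerces provides: since each parse covers one complete document,
every data block could be turned into its own small document that repeats
the shared entity declarations in an internal DTD subset. wrapBlock, the
root element name and the entity below are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Build a standalone document for one data block by prefixing a DOCTYPE
// whose internal subset carries the shared entity declarations.
// root is used both as the DOCTYPE name and as the wrapping element.
std::string wrapBlock(const std::string& root,
                      const std::string& entityDecls,
                      const std::string& body) {
    assert(!root.empty());
    return "<!DOCTYPE " + root + " [\n" + entityDecls + "\n]>\n" +
           "<" + root + ">" + body + "</" + root + ">";
}
```

For example, wrapBlock("block", "<!ENTITY nm \"Xerces\">", "&nm;") yields
a three-line document in which &nm; is declared before it is used. The
cost is that the declarations are parsed again for every block.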

At 08:03 PM 7/25/2000, you wrote:
>I think that your approach will work.  The parser will continue to read
>from an input stream so long as the previous read returned anything at
>all.
>
>Andy Heninger
>IBM XML Technology Group, Cupertino, CA
>heninger@us.ibm.com


Re: Streaming input to the parser

Posted by Andy Heninger <an...@jtcsv.com>.
I think that your approach will work.  The parser will continue to read
from an input stream so long as the previous read returned anything at
all.

Andy Heninger
IBM XML Technology Group, Cupertino, CA
heninger@us.ibm.com






Re: Streaming input to the parser

Posted by Chris Hill <ch...@wolfram.com>.
I need to work with a single thread.  I'm upgrading some code that was
using expat, which allows more data to be added by calling the parse
function multiple times with additional data.  This seems like a very
useful feature, and I'm surprised that Xerces doesn't support similar
functionality.

At 10:05 PM 7/25/2000, you wrote:
>The best you can do is to just block inside your input stream if it calls
>you and you have no more data. So if you run the parser in another thread,
>and have the input stream block until another thread feeds it some more
>data, that would work ok.


Re: Streaming input to the parser

Posted by Dean Roddey <dr...@charmedquark.com>.
The best you can do is to just block inside your input stream if it calls
you and you have no more data. So if you run the parser in another thread,
and have the input stream block until another thread feeds it some more
data, that would work ok.
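
A rough sketch of that blocking arrangement, using only the standard
library. BlockingStream, push, close and runDemo are hypothetical names,
and the class is a stand-in rather than a real BinInputStream subclass; in
runDemo a plain read loop on a second thread plays the role of the parser.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstring>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

class BlockingStream {
public:
    // Feeding side: queue a chunk and wake a waiting reader.
    void push(std::string chunk) {
        if (chunk.empty()) return;
        {
            std::lock_guard<std::mutex> lock(m_);
            chunks_.push_back(std::move(chunk));
        }
        cv_.notify_one();
    }

    // Feeding side: no more data will ever arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Reading side: blocks while the queue is empty and the stream is open.
    // Returns 0 only after close() once all queued data has been read.
    std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) {
        assert(toFill != nullptr);
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !chunks_.empty() || closed_; });
        if (chunks_.empty()) return 0;
        std::string& front = chunks_.front();
        const std::size_t n = std::min(maxToRead, front.size());
        std::memcpy(toFill, front.data(), n);
        front.erase(0, n);
        if (front.empty()) chunks_.pop_front();
        return n;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::string> chunks_;
    bool closed_ = false;
};

// Demo: a reader thread stands in for the parser while this thread feeds it.
std::string runDemo() {
    BlockingStream stream;
    std::string seen;
    std::thread parser([&] {
        unsigned char buf[8];
        while (std::size_t n = stream.readBytes(buf, sizeof buf))
            seen.append(reinterpret_cast<const char*>(buf), n);
    });
    stream.push("<doc>");
    stream.push("hello");
    stream.push("</doc>");
    stream.close();
    parser.join();
    return seen;
}
```

The key point is that an empty queue makes the reader wait instead of
returning zero bytes, so the reader never mistakes a pause for the end of
the input.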

--------------------------
Dean Roddey
The CIDLib C++ Frameworks
Charmed Quark Software
droddey@charmedquark.com
http://www.charmedquark.com

"You young, and you gotcha health. Whatchoo wanna job fer?"

