Posted to general@xml.apache.org by Charles Owen <ow...@addr.com> on 2001/08/04 00:55:04 UTC

continuous document; sax

Hello.

I'm trying to use Crimson to parse a "continuous" XML document using SAX.

More specifically, an xml stream is passing through a socket, and the end of
the document may not arrive for quite some time. However, I'd like the SAX
parser to act on elements as soon as they arrive.

I haven't yet looked at the code, but it seems that the parser is waiting until 
it sees an EOF before beginning to parse. (Just a guess. Sorry.)

In any case, the parser simply hangs until I close the socket, then it processes
the document (complaining about things at the end).

Is there a way to do what I want using Crimson?

Thanks for any help,

Charles.


---------------------------------------------------------------------
In case of troubles, e-mail:     webmaster@xml.apache.org
To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org


Re: continuous document; sax

Posted by Donald Ball <ba...@webslingerZ.com>.
can crimson get its own mailing list please? seems that about 1/3 of the
traffic on general is crimson-specific and there's no point in inflicting
it on everyone.

(note, not a bust on charles, he's following the documented procedure!)

- donald





Re: continuous document; sax

Posted by Andy Clark <an...@apache.org>.
Duane Stoddard wrote:
> The Xerces FAQ actually contains information on how to do this 
> also (they have an additional requirement in that you should
> also implement your own StreamingCharFactory, but see that 
> FAQ for more information about this).

Xerces2 doesn't have this read-entire-buffer-before-doing-anything
problem. It will take whatever it can get (even if that's only a
character at a time) and process it.

> For our purposes, we use our own reader which extends the 
> BufferedReader class. I have attached this class so you can see 
> how we handle it. 

These types of solutions are perfect when you know the encoding
of the stream ahead of time. If you don't, then you really
shouldn't try to auto-detect all of the various encodings
yourself, because that is error-prone.

If you don't know the encoding, then your best bet is to build
some kind of protocol into the stream so that the receiver can
detect the end of each document and pretend the stream has
ended (without really closing the socket) so that the parser
can complete the document.
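A minimal sketch of such a protocol follows. The framing is an assumption of this sketch, not the Xerces2 sample: the sender splits each document into chunks, each preceded by a 4-byte big-endian length, and marks the end of a document with a zero-length chunk. On the receiving side, ChunkedDocumentInput (a hypothetical name) returns -1 to the parser when it sees that marker and ignores close(), so the socket stays open for the next document. Because it passes raw bytes through, the parser still does its own encoding detection.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Sender side (hypothetical framing): [int length][bytes]... then [int 0]. */
class ChunkedDocumentOutput {
    private final DataOutputStream out;

    ChunkedDocumentOutput(OutputStream out) {
        this.out = new DataOutputStream(out);
    }

    /** Sends one chunk of the current document and flushes it onto the wire. */
    void write(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return; // a zero-length chunk would be mistaken for the end marker
        }
        out.writeInt(len);
        out.write(b, off, len);
        out.flush();
    }

    /** Marks the end of the current document. */
    void endDocument() throws IOException {
        out.writeInt(0);
        out.flush();
    }
}

/** Receiver side: hands chunk bytes to the parser and reports EOF at the marker. */
class ChunkedDocumentInput extends InputStream {
    private final DataInputStream in;
    private int remaining = 0;   // bytes left in the current chunk
    private boolean eof = false; // end-of-document marker seen

    ChunkedDocumentInput(InputStream in) {
        this.in = new DataInputStream(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (!fillChunk()) {
            return -1; // "pretend" end-of-file for the parser
        }
        int n = in.read(b, off, Math.min(len, remaining));
        if (n < 0) {
            throw new EOFException("connection closed in the middle of a chunk");
        }
        remaining -= n;
        return n;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n < 0 ? -1 : one[0] & 0xff;
    }

    /** Reads the next chunk header if needed; false once the end marker is seen. */
    private boolean fillChunk() throws IOException {
        if (remaining == 0 && !eof) {
            int len = in.readInt();
            if (len < 0) {
                throw new IOException("bad chunk length: " + len);
            }
            if (len == 0) {
                eof = true;
            } else {
                remaining = len;
            }
        }
        return !eof;
    }

    /** Re-arms the stream so the next document on the same connection can be read. */
    void nextDocument() {
        eof = false;
    }

    @Override
    public void close() {
        // Deliberately a no-op: the parser's close() must not close the socket.
    }
}
```

On the receiving end, one would parse each document with something like reader.parse(new InputSource(chunkedIn)) and call chunkedIn.nextDocument() before parsing the next one. Both ends have to use the framing, which is the "control over the writer and reader" requirement mentioned below.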

> The licensing agreement uses a GPL style license - so 

If you're leery of the GPL kind of license and want to employ
a solution such as the one I stated earlier, there is a sample
with Xerces2 that handles arbitrary-length data even when you
don't know the length of the document up front. However, it
requires you to control both the writer and the reader of the
data, since both ends must speak the protocol; once you use
the sample classes on both ends, they handle it automatically.

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org



RE: continuous document; sax

Posted by Duane Stoddard <Du...@yahoo.com>.
Hi Charles,
We've also run into this issue at open3.org and have come up with a solution. Your analysis of the problem is close. There
are two problems in using Crimson in this manner (they also show up with most other parsers, such as Xerces and Saxon).

1) Parsers usually read in a chunk of data at a time from the stream and then parse it.
2) Parsers will typically close the stream when the end is reached (or an exception is thrown).

There are a couple of solutions to this. You can read the relevant data from the stream yourself, place it into a buffer, and
then give that buffer to the parser; or you can implement your own stream reader that changes the implementation of the read/close
methods. The Xerces FAQ actually contains information on how to do this as well (they have an additional requirement in that you should
also implement your own StreamingCharFactory, but see that FAQ for more information about this).
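The first option (reading the data into a buffer yourself) can be sketched roughly as follows. It assumes, which the post does not specify, that the sender prefixes each complete document with its length as a 4-byte big-endian integer; BufferedDocumentLoop and parseAll are hypothetical names. The receiver reads exactly that many bytes and only then hands them to the parser, so a parser that reads ahead or closes its input never touches the socket. The trade-off is that elements are seen only once their whole document has arrived.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

/** Hypothetical framing: each document arrives as [int length][length bytes]. */
class BufferedDocumentLoop {

    /** Parses documents until the peer closes the connection; returns how many were parsed. */
    static int parseAll(InputStream socketIn, ContentHandler handler) throws Exception {
        DataInputStream in = new DataInputStream(socketIn);
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        int count = 0;
        while (true) {
            int len;
            try {
                len = in.readInt(); // length header of the next document
            } catch (EOFException e) {
                return count; // connection closed between documents
            }
            if (len < 0) {
                throw new IOException("bad document length: " + len);
            }
            byte[] doc = new byte[len];
            in.readFully(doc); // blocks until the whole document has arrived
            // The parser only ever sees (and may close) this in-memory stream.
            reader.parse(new InputSource(new ByteArrayInputStream(doc)));
            count++;
        }
    }
}
```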

For our purposes, we use our own reader which extends the BufferedReader class. I have attached this class so you can see how we
handle it. We create the stream for the reader and give it to the parser like this:
	XMLStreamReader is = new XMLStreamReader(
		new InputStreamReader(socket.getInputStream(), "UTF8"));
	// Get an XMLReader (here via JAXP; any SAX2 XMLReader will do)
	XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
	reader.parse(new InputSource(is));
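The attached class is not reproduced in this archive. As a rough, hypothetical sketch of the same idea (the name NonClosingReader and its details are assumptions, not the open3.org code), a BufferedReader subclass can address both problems listed above: read(char[], int, int) blocks for at least one character but then returns only what has already arrived instead of waiting to fill the parser's buffer, and close() is a no-op so the parser cannot close the socket. A parser that keeps calling read until its own buffer is full may still wait, so this is not guaranteed to help with every parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical sketch, not the open3.org class: a BufferedReader that
 * (1) returns only the characters that have already arrived, and
 * (2) ignores close(), so the parser cannot shut the socket.
 */
class NonClosingReader extends BufferedReader {

    NonClosingReader(Reader in) {
        super(in);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int c = super.read(); // block for at least one character (or end of stream)
        if (c == -1) {
            return -1;
        }
        cbuf[off] = (char) c;
        int n = 1;
        // Copy further characters only while they can be read without blocking.
        while (n < len && ready()) {
            c = super.read();
            if (c == -1) {
                break;
            }
            cbuf[off + n++] = (char) c;
        }
        return n;
    }

    @Override
    public void close() {
        // Deliberately a no-op: the parser's close() leaves the socket open.
    }

    /** Closes the underlying reader once the application is really done with it. */
    void reallyClose() throws IOException {
        super.close();
    }
}
```

It would be handed to the parser as new InputSource(new NonClosingReader(new InputStreamReader(socket.getInputStream(), "UTF8"))), with reallyClose() called when the application is finished with the connection. Like any Reader-based approach, this fixes the encoding (UTF-8 here) on the application side.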

I should point out that the included code does carry a copyright notice, as it is part of our open source distribution. The
licensing agreement uses a GPL-style license, so depending on your needs this may not be a problem. If it is an issue, or you are
timid about such things, you can use this (or the Xerces help) as an example to formulate your own solution.

Hope that helped,
Duane L. Stoddard
duane@open3.org         http://www.open3.org
Open Source Integration Solutions for the Enterprise


-----Original Message-----
From: Charles Owen [mailto:owen@addr.com]
Sent: Friday, August 03, 2001 4:55 PM
To: general@xml.apache.org
Subject: continuous document; sax




