Posted to users@cocoon.apache.org by jcplerm <jc...@ameritech.net> on 2003/11/06 17:22:06 UTC
Stopping a SAX parser
I'm not sure which list to post this question to, but maybe one of you knows whether it's possible to stop a SAX parser once the immediate content handler has found whatever info it is looking for, in order to avoid unnecessary parsing of the rest of a potentially large XML document?
Thanks,
jlerm
Re: Stopping a SAX parser
Posted by Jorg Heymans <jh...@domek.be>.
If this functionality is needed within a custom transformer then you
could extend AbstractDOMTransformer instead. There you have direct
access to the full document, so no need to parse every single element.
Could you be a bit more specific?
jcplerm wrote:
> I'm not sure which list to post this question to, but maybe one of you
> know if it's possible at all to stop a SAX parser once the immediate
> content handler finds whatever info it is looking for, in order to avoid
> unnecessary parsing of the rest of a potentially large XML document?
>
> Thanks,
>
> jlerm
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org
Re: Stopping a SAX parser
Posted by Bruno Dumon <br...@outerthought.org>.
On Thu, 2003-11-06 at 18:03, jcplerm wrote:
> I guess throwing an exception is a pretty good suggestion that I will try out.
>
> Sorry for my ignorance, but is there any XML pull parser that is already
> available in Cocoon?
not that I know of.
> Or would this require still some integration effort?
Yep. Basically you would need to create a custom generator that pull
parses a file and only sends the events of interest down the pipeline.
Make sure though the events are nicely balanced, i.e. for each
startElement there should be a corresponding endElement call, etc. Also
beware of possible differences in conventions of what the definitions of
namespaceURI, localName and qName are. (e.g. in SAX an empty namespace
is represented by an empty string while in DOM it is null)
>
> jlerm
>
> ----- Original Message -----
> From: "Bruno Dumon" <br...@outerthought.org>
> To: <us...@cocoon.apache.org>
> Sent: Thursday, November 06, 2003 10:46 AM
> Subject: Re: Stopping a SAX parser
>
>
> > On Thu, 2003-11-06 at 17:36, David Kavanagh wrote:
> > > If I'm not mistaken, you can call endDocument() in your own
> > > transformer at any time. That effectively puts an end to pipeline
> > > processing.
> >
> > But that won't stop the parser from parsing the rest of the file and
> > pushing SAX events out.
> >
> > The only way of stopping a SAX-parser is throwing an exception.
> >
> > Using a pull-parser you can decide yourself when you stop reading.
> >
> > > Hopefully, at that point, you've also generated events for
> > > sending some data out of your transformer!
> > >
> > > David
> > >
> > > jcplerm wrote:
> > > > I'm not sure which list to post this question to, but maybe one of
> > > > you know if it's possible at all to stop a SAX parser once the
> > > > immediate content handler finds whatever info it is looking for, in
> > > > order to avoid unnecessary parsing of the rest of a potentially
> > > > large XML document?
> > > >
> > > > Thanks,
> > > >
> > > > jlerm
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org bruno@apache.org
Re: Stopping a SAX parser
Posted by jcplerm <jc...@ameritech.net>.
I guess throwing an exception is a pretty good suggestion that I will try out.
Sorry for my ignorance, but is there any XML pull parser that is already
available in Cocoon?
Or would this require still some integration effort?
jlerm
----- Original Message -----
From: "Bruno Dumon" <br...@outerthought.org>
To: <us...@cocoon.apache.org>
Sent: Thursday, November 06, 2003 10:46 AM
Subject: Re: Stopping a SAX parser
> On Thu, 2003-11-06 at 17:36, David Kavanagh wrote:
> > If I'm not mistaken, you can call endDocument() in your own
> > transformer at any time. That effectively puts an end to pipeline
> > processing.
>
> But that won't stop the parser from parsing the rest of the file and
> pushing SAX events out.
>
> The only way of stopping a SAX-parser is throwing an exception.
>
> Using a pull-parser you can decide yourself when you stop reading.
>
> > Hopefully, at that point, you've also generated events for
> > sending some data out of your transformer!
> >
> > David
> >
> > jcplerm wrote:
> > > I'm not sure which list to post this question to, but maybe one of
> > > you know if it's possible at all to stop a SAX parser once the
> > > immediate content handler finds whatever info it is looking for, in
> > > order to avoid unnecessary parsing of the rest of a potentially
> > > large XML document?
> > >
> > > Thanks,
> > >
> > > jlerm
> --
> Bruno Dumon http://outerthought.org/
> Outerthought - Open Source, Java & XML Competence Support Center
> bruno@outerthought.org bruno@apache.org
Re: Stopping a SAX parser
Posted by jcplerm <jc...@ameritech.net>.
What Bruno says makes sense. A SAX pipeline is actually just a sequence of classes, each one invoking methods of another one.
For instance:
SaxParser->ContHandler1->ContHandler2->...->ContHandlerN
If ContHandler1 invokes endDocument(), all this does is call this method in ContHandler2. ContHandler1 is free to do this at any time.
But that has no effect at all on the SaxParser, which is oblivious to what happens further down the line.
The SaxParser will continue parsing the source XML, identifying tokens and invoking ContHandler1's methods.
Of course, if ContHandler1 invokes endDocument() twice, this will most likely result in an exception, but that's because ContHandler1 was not coded correctly.
I had thought there might be a "back door" somewhere through which a ContHandlerX could signal the SaxParser to stop, but that does not seem to exist.
The exception is when one of the ContHandlerX throws an exception (which might be the way to go, at least in my own application).
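
Here is a rough sketch of both behaviours with the plain JAXP SAX parser, outside Cocoon. StopSaxDemo, FindHandler and StopParsingException are names invented for this example, and the <target> element is just a stand-in for "whatever info the handler is looking for". The handler counts how many startElement calls it still receives after the match, once without aborting and once with the exception.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StopSaxDemo {

    /** Hypothetical marker type, so a deliberate stop can be told apart from a real error. */
    static class StopParsingException extends SAXException {
        StopParsingException(String message) { super(message); }
    }

    /** Looks for the first <target> element; optionally aborts the parse when found. */
    static class FindHandler extends DefaultHandler {
        final boolean abortWhenFound;
        boolean found = false;
        int eventsAfterFound = 0;

        FindHandler(boolean abortWhenFound) { this.abortWhenFound = abortWhenFound; }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if (found) {
                eventsAfterFound++;   // the parser is still calling us
                return;
            }
            if ("target".equals(qName)) {
                found = true;
                if (abortWhenFound) {
                    throw new StopParsingException("found <target>, stopping");
                }
            }
        }
    }

    static FindHandler run(String xml, boolean abortWhenFound) throws Exception {
        FindHandler handler = new FindHandler(abortWhenFound);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsingException expected) {
            // Deliberate early exit, not a parse failure.
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><target/><a/><b/><c/></root>";
        System.out.println("without abort: " + run(xml, false).eventsAfterFound);
        System.out.println("with abort:    " + run(xml, true).eventsAfterFound);
    }
}
```

Using a dedicated exception type keeps the catch block from swallowing genuine well-formedness errors. Inside a Cocoon pipeline the exception would also have to be caught somewhere sensible, which this sketch does not attempt to show.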
Thanks,
jlerm
----- Original Message -----
From: David Kavanagh
To: users@cocoon.apache.org
Sent: Thursday, November 06, 2003 10:58 AM
Subject: Re: Stopping a SAX parser
Interesting. I was under the assumption that the content handler events were synchronous, so the processing could be interrupted by calling endDocument(). Doesn't the pipeline get recycled at this point? I thought that a generator would stop processing when it gets a life cycle event to clean up.
This was my expectation, but I haven't looked at the code (which is always the last word!)
David
Bruno Dumon wrote:
> On Thu, 2003-11-06 at 17:36, David Kavanagh wrote:
> > If I'm not mistaken, you can call endDocument() in your own
> > transformer at any time. That effectively puts an end to pipeline
> > processing.
>
> But that won't stop the parser from parsing the rest of the file and
> pushing SAX events out.
>
> The only way of stopping a SAX-parser is throwing an exception.
>
> Using a pull-parser you can decide yourself when you stop reading.
>
> > Hopefully, at that point, you've also generated events for
> > sending some data out of your transformer!
> >
> > David
> >
> > jcplerm wrote:
> > > I'm not sure which list to post this question to, but maybe one of
> > > you know if it's possible at all to stop a SAX parser once the
> > > immediate content handler finds whatever info it is looking for, in
> > > order to avoid unnecessary parsing of the rest of a potentially
> > > large XML document?
> > >
> > > Thanks,
> > >
> > > jlerm
Re: Stopping a SAX parser
Posted by David Kavanagh <da...@dotech.com>.
Interesting. I was under the assumption that the content handler events
were synchronous, so the processing could be interrupted by calling
endDocument(). Doesn't the pipeline get recycled at this point? I
thought that a generator would stop processing when it gets a life cycle
event to clean up.
This was my expectation, but I haven't looked at the code (which is
always the last word!)
David
Bruno Dumon wrote:
>On Thu, 2003-11-06 at 17:36, David Kavanagh wrote:
>
>
>>If I'm not mistaken, you can call endDocument() in your own
>>transformer at any time. That effectively puts an end to pipeline
>>processing.
>>
>>
>
>But that won't stop the parser from parsing the rest of the file and
>pushing SAX events out.
>
>The only way of stopping a SAX-parser is throwing an exception.
>
>Using a pull-parser you can decide yourself when you stop reading.
>
>
>
>> Hopefully, at that point, you've also generated events for
>>sending some data out of your transformer!
>>
>>David
>>
>>jcplerm wrote:
>>
>>
>>>I'm not sure which list to post this question to, but maybe one of
>>>you know if it's possible at all to stop a SAX parser once the
>>>immediate content handler finds whatever info it is looking for, in
>>>order to avoid unnecessary parsing of the rest of a potentially
>>>large XML document?
>>>
>>>Thanks,
>>>
>>>jlerm
>>>
>>>
Re: Stopping a SAX parser
Posted by Bruno Dumon <br...@outerthought.org>.
On Thu, 2003-11-06 at 17:36, David Kavanagh wrote:
> If I'm not mistaken, you can call endDocument() in your own
> transformer at any time. That effectively puts an end to pipeline
> processing.
But that won't stop the parser from parsing the rest of the file and
pushing SAX events out.
The only way of stopping a SAX-parser is throwing an exception.
Using a pull-parser you can decide yourself when you stop reading.
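
To illustrate the pull-parser point, here is a small sketch using StAX (javax.xml.stream). StAX is my choice for the example and is not something named in this thread or shipped as part of Cocoon; PullStopDemo, firstTitle and the <title> element are likewise invented for illustration. The calling code drives the reader, so it simply returns as soon as it has what it needs and never asks for the remaining events.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullStopDemo {

    /** Returns the text of the first <title> element, or null if there is none. */
    static String firstTitle(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // We decide when to stop: return without pulling further events.
                    return reader.getElementText();
                }
            }
            return null;   // reached the end of the document without a match
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<doc><title>Hello</title><body><p>lots more</p></body></doc>";
        System.out.println(firstTitle(xml));
    }
}
```

No exception is needed for the early exit here, which is the main difference from the SAX approach.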
> Hopefully, at that point, you've also generated events for
> sending some data out of your transformer!
>
> David
>
> jcplerm wrote:
> > I'm not sure which list to post this question to, but maybe one of
> > you know if it's possible at all to stop a SAX parser once the
> > immediate content handler finds whatever info it is looking for, in
> > order to avoid unnecessary parsing of the rest of a potentially
> > large XML document?
> >
> > Thanks,
> >
> > jlerm
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org bruno@apache.org
Re: Stopping a SAX parser
Posted by David Kavanagh <da...@dotech.com>.
If I'm not mistaken, you can call endDocument() in your own transformer
at any time. That effectively puts an end to pipeline processing.
Hopefully, at that point, you've also generated events for sending
some data out of your transformer!
David
jcplerm wrote:
> I'm not sure which list to post this question to, but maybe one of you
> know if it's possible at all to stop a SAX parser once the immediate
> content handler finds whatever info it is looking for, in order to
> avoid unnecessary parsing of the rest of a potentially large XML document?
>
> Thanks,
>
> jlerm