Posted to users@cocoon.apache.org by jcplerm <jc...@ameritech.net> on 2003/11/06 17:22:06 UTC

Stopping a SAX parser

I'm not sure which list to post this question to, but maybe one of you knows if it's possible at all to stop a SAX parser once the immediate content handler finds whatever info it is looking for, in order to avoid unnecessary parsing of the rest of a potentially large XML document?

Thanks,

jlerm

Re: Stopping a SAX parser

Posted by Jorg Heymans <jh...@domek.be>.
If this functionality is needed within a custom transformer then you 
could extend AbstractDOMTransformer instead. There you have direct 
access to the full document, so no need to parse every single element.

Could you be a bit more specific?



jcplerm wrote:

> I'm not sure which list to post this question to, but maybe one of you 
> know if it's possible at all to stop a SAX parser once the immediate 
> content handler finds whatever info it is looking for, in order to avoid 
> unnecessary parsing of the rest of a potentially large XML document?
>  
> Thanks,
>  
> jlerm


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org


Re: Stopping a SAX parser

Posted by Bruno Dumon <br...@outerthought.org>.
On Thu, 2003-11-06 at 18:03, jcplerm wrote:
> I guess throwing an exception is a pretty good suggestion that I'll try out.
> 
> Sorry for my ignorance, but is there any XML pull parser that is already
> available in Cocoon?

not that I know of.

> Or would this require still some integration effort?

Yep. Basically you would need to create a custom generator that pull
parses a file and only sends the events of interest down the pipeline.
Make sure though the events are nicely balanced, i.e. for each
startElement there should be a corresponding endElement call, etc. Also
beware of possible differences in conventions of what the definitions of
namespaceURI, localName and qName are. (e.g. in SAX an empty namespace
is represented by an empty string while in DOM it is null)
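
A rough sketch of the pull-parsing idea described above, in case it helps. It uses StAX (javax.xml.stream, part of the JDK since Java 6), which is an assumption on my side: the thread does not name a pull parser and none ships with Cocoon. The names PullToSaxDemo, forwardFirstWanted and Recorder, and the <wanted> element, are made up for illustration. The sketch forwards only the first <wanted> element as balanced SAX events, maps a null namespace to "" as SAX expects, and then simply stops reading. Attributes are left out to keep it short, and wiring this into a real Cocoon generator is not shown.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class PullToSaxDemo {

    /** SAX expects "" for "no namespace", while StAX may return null. */
    static String nsOrEmpty(String uri) {
        return uri == null ? "" : uri;
    }

    /** Builds the SAX qName ("prefix:local" or just "local") for the current element. */
    static String qName(XMLStreamReader in) {
        String prefix = in.getPrefix();
        return (prefix == null || prefix.isEmpty())
                ? in.getLocalName()
                : prefix + ":" + in.getLocalName();
    }

    /**
     * Pulls events until the first <wanted> element has been read completely,
     * forwards only that element (balanced) to the handler, then stops reading.
     * Returns the number of start tags pulled from the reader.
     */
    static int forwardFirstWanted(String xml, ContentHandler out) throws Exception {
        XMLStreamReader in = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int pulled = 0;
        int depth = 0; // greater than 0 while we are inside <wanted>
        out.startDocument();
        try {
            while (in.hasNext()) {
                int event = in.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    pulled++;
                    if (depth > 0 || "wanted".equals(in.getLocalName())) {
                        depth++;
                        // Attributes are not copied in this sketch.
                        out.startElement(nsOrEmpty(in.getNamespaceURI()),
                                in.getLocalName(), qName(in), new AttributesImpl());
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && depth > 0) {
                    out.characters(in.getTextCharacters(), in.getTextStart(), in.getTextLength());
                } else if (event == XMLStreamConstants.END_ELEMENT && depth > 0) {
                    out.endElement(nsOrEmpty(in.getNamespaceURI()),
                            in.getLocalName(), qName(in));
                    if (--depth == 0) {
                        break; // we decide when to stop: nothing after this is read
                    }
                }
            }
        } finally {
            in.close();
        }
        out.endDocument();
        return pulled;
    }

    /** Minimal handler that records the events it receives, for demonstration only. */
    static class Recorder extends DefaultHandler {
        final StringBuilder log = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            log.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            log.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            log.append(ch, start, length);
        }
    }

    public static void main(String[] args) throws Exception {
        Recorder rec = new Recorder();
        int pulled = forwardFirstWanted(
                "<root><skip/><wanted>hi<b/></wanted><x/><y/></root>", rec);
        System.out.println(rec.log + " (start tags pulled: " + pulled + ")");
    }
}
```

The depth counter is what keeps the forwarded events balanced: every startElement sent downstream gets its endElement before the loop exits.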

> 
> jlerm
> 
> ----- Original Message ----- 
> From: "Bruno Dumon" <br...@outerthought.org>
> To: <us...@cocoon.apache.org>
> Sent: Thursday, November 06, 2003 10:46 AM
> Subject: Re: Stopping a SAX parser
> 
> 
> > On Thu, 2003-11-06 at 17:36, David Kavanagh wrote:
> > > If I'm not mistaken, you can call endDocument() in your own
> > > transformer at any time. That effectively puts an end to pipeline
> > > processing.
> >
> > But that won't stop the parser from parsing the rest of the file and
> > pushing SAX events out.
> >
> > The only way of stopping a SAX-parser is throwing an exception.
> >
> > Using a pull-parser you can decide yourself when you stop reading.
> >
> > >  Hopefully, at that point, you've also generated events for
> > > sending some data out of your transformer!
> > >
> > > David
> > >
> > > jcplerm wrote:
> > > > I'm not sure which list to post this question to, but maybe one of
> > > > you know if it's possible at all to stop a SAX parser once the
> > > > immediate content handler finds whatever info it is looking for, in
> > > > order to avoid unnecessary parsing of the rest of a potentially
> > > > large XML document?
> > > >
> > > > Thanks,
> > > >
> > > > jlerm
-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org




Re: Stopping a SAX parser

Posted by jcplerm <jc...@ameritech.net>.
I guess throwing an exception is a pretty good suggestion that I'll try out.

Sorry for my ignorance, but is there any XML pull parser that is already
available in Cocoon?
Or would this require still some integration effort?

jlerm

----- Original Message ----- 
From: "Bruno Dumon" <br...@outerthought.org>
To: <us...@cocoon.apache.org>
Sent: Thursday, November 06, 2003 10:46 AM
Subject: Re: Stopping a SAX parser


> On Thu, 2003-11-06 at 17:36, David Kavanagh wrote:
> > If I'm not mistaken, you can call endDocument() in your own
> > transformer at any time. That effectively puts an end to pipeline
> > processing.
>
> But that won't stop the parser from parsing the rest of the file and
> pushing SAX events out.
>
> The only way of stopping a SAX-parser is throwing an exception.
>
> Using a pull-parser you can decide yourself when you stop reading.
>
> >  Hopefully, at that point, you've also generated events for
> > sending some data out of your transformer!
> >
> > David
> >
> > jcplerm wrote:
> > > I'm not sure which list to post this question to, but maybe one of
> > > you know if it's possible at all to stop a SAX parser once the
> > > immediate content handler finds whatever info it is looking for, in
> > > order to avoid unnecessary parsing of the rest of a potentially
> > > large XML document?
> > >
> > > Thanks,
> > >
> > > jlerm
> -- 
> Bruno Dumon                             http://outerthought.org/
> Outerthought - Open Source, Java & XML Competence Support Center
> bruno@outerthought.org                          bruno@apache.org




Re: Stopping a SAX parser

Posted by jcplerm <jc...@ameritech.net>.
What Bruno says makes sense. A SAX pipeline is actually just a sequence of classes, each one invoking methods of another one.

For instance:

    SaxParser->ContHandler1->ContHandler2->...->ContHandlerN

If ContHandler1 invokes endDocument(), all this does is call that method on ContHandler2. ContHandler1 is free to do this at any time.
But that has no effect at all on the SaxParser, which is totally oblivious to what happens down the line.
The SaxParser will continue parsing the source XML, identifying tokens and invoking ContHandler1's methods.
Of course, if ContHandler1 invokes endDocument() twice, this will most likely result in an exception, but that's because ContHandler1 was not coded correctly.
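
To make the point above concrete, here is a small sketch using only the JDK's built-in SAX parser (no Cocoon classes). Upstream and Downstream are hypothetical stand-ins for ContHandler1 and ContHandler2, and the <wanted> element is made up. Upstream calls endDocument() on Downstream as soon as it sees <wanted>, then counts how many startElement calls the parser still delivers afterwards.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EndDocumentDemo {

    /** Stands in for ContHandler2: only records that endDocument() arrived. */
    static class Downstream extends DefaultHandler {
        boolean ended = false;

        @Override
        public void endDocument() {
            ended = true;
        }
    }

    /** Stands in for ContHandler1: ends the downstream document early, then keeps counting. */
    static class Upstream extends DefaultHandler {
        private final ContentHandler next;
        private boolean done = false;
        int eventsAfterEnd = 0;

        Upstream(ContentHandler next) {
            this.next = next;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (done) {
                eventsAfterEnd++; // the parser is still calling us
                return;
            }
            if ("wanted".equals(qName)) {
                next.endDocument(); // only a method call on the next handler
                done = true;
            }
        }
    }

    /** Parses the XML and returns how many start tags arrived after the early endDocument(). */
    static int countEventsAfterEarlyEnd(String xml) throws Exception {
        Upstream up = new Upstream(new Downstream());
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), up);
        return up.eventsAfterEnd;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><wanted/><a/><b/><c/></root>";
        System.out.println("startElement calls after endDocument(): "
                + countEventsAfterEarlyEnd(xml));
    }
}
```

The count is non-zero whenever elements follow <wanted>, which is exactly the behaviour described: the parser never learns that the downstream document was closed.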

What I thought was that there might be a "back door" somewhere through which a ContHandler could signal the SaxParser to stop, but that does not seem to exist.

Unless any of the ContHandlerX throws an exception (which might be the way to go, at least in my own application).
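
A minimal sketch of the exception approach, again with just the JDK's SAX parser. StopParsing, FirstTitleHandler and the <title> element are hypothetical names chosen for the example, not anything from Cocoon. The handler throws as soon as it has the text of the first <title>, and the caller treats that as a normal outcome rather than an error.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class StopParsingDemo {

    /** Hypothetical marker exception: means "found it", not a real error. */
    static class StopParsing extends SAXException {
        StopParsing() {
            super("handler has what it needs");
        }
    }

    /** Collects the text of the first <title> element, then aborts the parse. */
    static class FirstTitleHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inTitle = false;
        int elementsSeen = 0;
        String title; // stays null if no <title> was found

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            elementsSeen++;
            if ("title".equals(qName)) {
                inTitle = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inTitle) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if ("title".equals(qName)) {
                title = text.toString();
                throw new StopParsing(); // the only way to make the parser stop
            }
        }
    }

    /** Parses until the first <title> has been read; the rest of the input is never parsed. */
    static FirstTitleHandler findFirstTitle(String xml) throws Exception {
        FirstTitleHandler handler = new FirstTitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (handler.title == null) {
                throw e; // nothing was found, so this is a genuine parse error
            }
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        FirstTitleHandler h = findFirstTitle("<doc><title>Hello</title><a/><b/><c/></doc>");
        System.out.println("title=" + h.title + ", elements seen=" + h.elementsSeen);
    }
}
```

Checking handler.title in the catch block, rather than relying only on the exception type, keeps the sketch working even if a parser wraps the exception thrown by the handler. Inside a Cocoon pipeline the exception would presumably also have to be caught somewhere sensible; that part is not covered here.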

Thanks,

jlerm
  ----- Original Message ----- 
  From: David Kavanagh 
  To: users@cocoon.apache.org 
  Sent: Thursday, November 06, 2003 10:58 AM
  Subject: Re: Stopping a SAX parser


  Interesting. I was under the assumption that the content handler events were synchronous, so the processing could be interrupted by calling endDocument(). Doesn't the pipeline get recycled at this point? I thought that a generator would stop processing when it gets a life cycle event to clean up.
  This was my expectation, but I haven't looked at the code (which is always the last word!)

  David

  Bruno Dumon wrote:

  > On Thu, 2003-11-06 at 17:36, David Kavanagh wrote:
  > > If I'm not mistaken, you can call endDocument() in your own
  > > transformer at any time. That effectively puts an end to pipeline
  > > processing.
  >
  > But that won't stop the parser from parsing the rest of the file and
  > pushing SAX events out.
  >
  > The only way of stopping a SAX-parser is throwing an exception.
  >
  > Using a pull-parser you can decide yourself when you stop reading.
  >
  > > Hopefully, at that point, you've also generated events for
  > > sending some data out of your transformer!
  > >
  > > David
  > >
  > > jcplerm wrote:
  > > > I'm not sure which list to post this question to, but maybe one of
  > > > you know if it's possible at all to stop a SAX parser once the
  > > > immediate content handler finds whatever info it is looking for, in
  > > > order to avoid unnecessary parsing of the rest of a potentially
  > > > large XML document?
  > > >
  > > > Thanks,
  > > >
  > > > jlerm
      

Re: Stopping a SAX parser

Posted by David Kavanagh <da...@dotech.com>.
Interesting. I was under the assumption that the content handler events 
were synchronous, so the processing could be interrupted by calling 
endDocument(). Doesn't the pipeline get recycled at this point? I 
thought that a generator would stop processing when it gets a life cycle 
event to clean up.
This was my expectation, but I haven't looked at the code (which is 
always the last word!)

David

Bruno Dumon wrote:

>On Thu, 2003-11-06 at 17:36, David Kavanagh wrote:
>  
>
>>If I'm not mistaken, you can call endDocument() in your own
>>transformer at any time. That effectively puts an end to pipeline
>>processing.
>>    
>>
>
>But that won't stop the parser from parsing the rest of the file and
>pushing SAX events out.
>
>The only way of stopping a SAX-parser is throwing an exception.
>
>Using a pull-parser you can decide yourself when you stop reading.
>
>  
>
>> Hopefully, at that point, you've also generated events for
>>sending some data out of your transformer!
>>
>>David
>>
>>jcplerm wrote:
>>    
>>
>>>I'm not sure which list to post this question to, but maybe one of
>>>you know if it's possible at all to stop a SAX parser once the
>>>immediate content handler finds whatever info it is looking for, in
>>>order to avoid unnecessary parsing of the rest of a potentially
>>>large XML document?
>>> 
>>>Thanks,
>>> 
>>>jlerm
>>>      
>>>

Re: Stopping a SAX parser

Posted by Bruno Dumon <br...@outerthought.org>.
On Thu, 2003-11-06 at 17:36, David Kavanagh wrote:
> If I'm not mistaken, you can call endDocument() in your own
> transformer at any time. That effectively puts an end to pipeline
> processing.

But that won't stop the parser from parsing the rest of the file and
pushing SAX events out.

The only way of stopping a SAX-parser is throwing an exception.

Using a pull-parser you can decide yourself when you stop reading.

>  Hopefully, at that point, you've also generated events for
> sending some data out of your transformer!
> 
> David
> 
> jcplerm wrote:
> > I'm not sure which list to post this question to, but maybe one of
> > you know if it's possible at all to stop a SAX parser once the
> > immediate content handler finds whatever info it is looking for, in
> > order to avoid unnecessary parsing of the rest of a potentially
> > large XML document?
> >  
> > Thanks,
> >  
> > jlerm
-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org




Re: Stopping a SAX parser

Posted by David Kavanagh <da...@dotech.com>.
If I'm not mistaken, you can call endDocument() in your own transformer 
at any time. That effectively puts an end to pipeline processing. 
Hopefully, at that point, you've also generated events for sending 
some data out of your transformer!

David

jcplerm wrote:

> I'm not sure which list to post this question to, but maybe one of you 
> know if it's possible at all to stop a SAX parser once the immediate 
> content handler finds whatever info it is looking for, in order to 
> avoid unnecessary parsing of the rest of a potentially large XML document?
>  
> Thanks,
>  
> jlerm