Posted to general@xml.apache.org by Scott Boag/CAM/Lotus <Sc...@lotus.com> on 2000/07/03 04:39:27 UTC

Re: Using own HTML parser with XSLT

> The idea is that the HTML file -> XSLT -> own processing -> XML file should
> be fast and not require a DOM tree to be built.

All XSLT processors currently available need to build a tree at some point.
To avoid building one, you would have to restrict yourself to a subset of
XSLT that can be "streamed", and the processor would have to pre-analyze
the stylesheet to make sure it never needs to keep parts of the tree
around.  Xalan currently doesn't do this (nor do any processors that I am
aware of, though I could be wrong).  What can be done is an incremental
parse/transform: the parse proceeds in fairly small blocks, and the
transform runs, when possible, on the partially built tree.  Xalan 2.0 will
be very good at this, taking SAX events as input for the source tree, while
Xalan 1.0 only does it with the DTM, which doesn't fit your bill.
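
A rough sketch of what feeding SAX events into the transform could look
like, assuming the JAXP/TrAX interfaces in javax.xml.transform.sax that
Xalan 2.x implements.  The parseHtml method and the class name are
hypothetical stand-ins for your own HTML parser, and the stylesheet is made
up for illustration.  The processor may still build its own internal tree
from the events, as described above; the point is only that your parser
never has to produce a DOM itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class SaxPipelineSketch {

    // Hypothetical stand-in for a custom HTML parser: all it has to do is
    // report what it reads as SAX events on the ContentHandler it is given.
    static void parseHtml(ContentHandler out) throws SAXException {
        AttributesImpl noAttrs = new AttributesImpl();
        char[] text = "Hello".toCharArray();
        out.startDocument();
        out.startElement("", "html", "html", noAttrs);
        out.startElement("", "h1", "h1", noAttrs);
        out.characters(text, 0, text.length);
        out.endElement("", "h1", "h1");
        out.endElement("", "html", "html");
        out.endDocument();
    }

    static String run() throws Exception {
        // Illustrative stylesheet: copies the first h1 into a title element.
        String xslt =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='/'>"
          + "<title><xsl:value-of select='html/h1'/></title>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("processor cannot take SAX input");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // Compile the stylesheet once; a Templates object can be reused.
        Templates compiled =
            stf.newTemplates(new StreamSource(new StringReader(xslt)));

        // The TransformerHandler is itself a SAX ContentHandler, so the
        // parser can push events straight into the transform.
        TransformerHandler handler = stf.newTransformerHandler(compiled);
        handler.getTransformer()
               .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter sink = new StringWriter();
        handler.setResult(new StreamResult(sink));

        parseHtml(handler);
        return sink.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

For the "own processing" step, the StreamResult could presumably be swapped
for a javax.xml.transform.sax.SAXResult wrapping another ContentHandler, so
that the transform output also arrives as events rather than as a tree.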

-scott




                                                                                                                             
Tobias Wahlström <tobias.wahlstrom@pricerunner.se>
06/30/2000 05:53 AM
Please respond to general

To:      "'general@xml.apache.org'" <ge...@xml.apache.org>
cc:      (bcc: Scott Boag/CAM/Lotus)
Subject: Using own HTML parser with XSLT
                                                                                                                             
                                                                                                                             



Hello!

I want to read an HTML file using my own HTML parser. The output of the
parser should be processed with XSLT, then passed through another layer of
processing, and finally written to a file. I would also like to use the
parser to produce a DOM tree at times.

The idea is that the HTML file -> XSLT -> own processing -> XML file
pipeline should be fast and not require a DOM tree to be built. The
preferred way would be to use some kind of event-driven mechanism. The
intention is for it to be rather fast and not too memory-consuming.

Is this possible using Xalan (and Xerces)?
Is the SAX interface DocumentHandler usable to accomplish this?
Might the precompiled stylesheets (StylesheetRoot) be used?

If the questions above are completely stupid but you understood the
background I wrote above - please tell me what I should know instead of
answering my questions ;-)


Best regards,
Tobias Walström
Developer @ pricerunner.com

