Posted to fop-dev@xmlgraphics.apache.org by Keiron Liddle <ke...@aftexsw.com> on 2002/02/11 10:19:51 UTC

XML Parsing [2]

Since everyone knows the basics we can get into the various stages 
starting with the XML handling.


XML Input
---------

FOP can take the input XML in a number of ways:
- SAX Events through SAX Handler
- DOM which is converted into SAX Events
- data source which is parsed and converted into SAX Events
- XML+XSLT which is transformed using an XSLT Processor and the result is
fired as SAX Events

The SAX Events are fired on the SAX Handler, class FOTreeBuilder, and
must represent an XSL-FO document; if they do not, an error is raised.
Problems with the XML not being well formed are also handled here.
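
As a rough illustration, here is a minimal sketch that uses only the JDK's
SAX API. RecordingHandler is a hypothetical stand-in for FOTreeBuilder (it
only records element names) and SaxInputSketch is an invented helper;
neither is FOP code. It shows SAX events arriving at a handler, and a
well-formedness problem surfacing as a SAXParseException.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for FOTreeBuilder: records element events only.
class RecordingHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + localName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + localName);
    }
}

class SaxInputSketch {
    // Parses the text and returns the recorded events.
    // Throws IllegalArgumentException if the XML is not well formed.
    static List<String> parse(String xml) {
        RecordingHandler handler = new RecordingHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // DefaultHandler rethrows fatal errors, so they reach the catch below.
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (SAXParseException e) {
            throw new IllegalArgumentException(
                "not well formed at line " + e.getLineNumber() + ": " + e.getMessage(), e);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fo = "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
                  + "<fo:block>hi</fo:block></fo:root>";
        System.out.println(parse(fo));
    }
}
```

Checking that the events really describe an XSL-FO document (rather than
merely well-formed XML) would be the handler's job and is left out here.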


Element Mappings
----------------

An element mapping is a hashmap of all the elements in a particular
namespace. This makes it easy to create a different object for each
element. Element mappings are static to save memory.

To add an extension a developer can put a jar in the classpath that
contains the file "/META-INF/services/org.apache.fop.fo.ElementMapping".
This must contain a line with the fully qualified name of a class that
implements the "org.apache.fop.fo.ElementMapping" interface. This will
then be loaded automatically at startup.

Internal mappings are: FO, SVG and Extension (PDF bookmarks).
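
A minimal sketch of the idea follows. All names in it are hypothetical
(SketchElementMapping is not the real org.apache.fop.fo.ElementMapping
interface), and java.util.ServiceLoader is used as a stand-in for whatever
lookup FOP itself performs on the META-INF/services file. It shows one
static hashmap per namespace and the discovery of extra mappings from the
classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.function.Supplier;

// Hypothetical interface: one mapping covers one namespace.
interface SketchElementMapping {
    String namespaceUri();
    Map<String, Supplier<Object>> makers();
}

// Hypothetical built-in mapping for a couple of FO elements.
class SketchFoMapping implements SketchElementMapping {
    // Static, so the table is built once however many documents are processed.
    private static final Map<String, Supplier<Object>> MAKERS = new HashMap<>();
    static {
        MAKERS.put("root", () -> "FO root object");
        MAKERS.put("block", () -> "FO block object");
    }

    public String namespaceUri() { return "http://www.w3.org/1999/XSL/Format"; }
    public Map<String, Supplier<Object>> makers() { return MAKERS; }
}

class MappingRegistry {
    private final Map<String, Map<String, Supplier<Object>>> byNamespace = new HashMap<>();

    void add(SketchElementMapping mapping) {
        byNamespace.put(mapping.namespaceUri(), mapping.makers());
    }

    // Adds every implementation named in a META-INF/services/<interface name>
    // file found on the classpath. With no such file nothing is added.
    void addDiscovered() {
        for (SketchElementMapping mapping : ServiceLoader.load(SketchElementMapping.class)) {
            add(mapping);
        }
    }

    // Returns a new object for the element, or null if there is no mapping.
    Object create(String namespaceUri, String localName) {
        Map<String, Supplier<Object>> makers = byNamespace.get(namespaceUri);
        if (makers == null) {
            return null;
        }
        Supplier<Object> maker = makers.get(localName);
        return maker == null ? null : maker.get();
    }

    public static void main(String[] args) {
        MappingRegistry registry = new MappingRegistry();
        registry.add(new SketchFoMapping());
        registry.addDiscovered();
        System.out.println(registry.create("http://www.w3.org/1999/XSL/Format", "block"));
    }
}
```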

Tree Building
-------------

The SAX Events fire all the information for the document: start
element, end element, text data etc. This information is used to build up
a representation of the FO document. Each namespace has a set of element
mappings for this purpose. When a mapping is found for an element and its
namespace, it is used to create an object for that element. If no mapping
is found, a dummy object is created, or a generic DOM for unknown
namespaces.

The object is then set up and given the attributes for the element. For
the FO Tree the attributes are converted into properties: the FO objects
use a property list mapping to convert the attributes into a list of
properties for the element.

For other XML, for example SVG, a DOM of the XML is constructed. This DOM
can then be passed through to the renderer.

Other element mappings can be used in different ways, for example to
create elements that create areas during the layout process, or setup
information for the renderer etc.
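
The lookup-then-fallback step can be sketched roughly as below. This is
not FOTreeBuilder: SketchNode and SketchTreeBuilder are invented for the
example, the set of known elements is hard-coded, and attributes are
simply copied into a map where the real code would go through a property
list mapping.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical tree node; "kind" is "fo" for mapped elements, else "unknown".
class SketchNode {
    final String kind;
    final String name;
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String kind, String name) {
        this.kind = kind;
        this.name = name;
    }
}

class SketchTreeBuilder extends DefaultHandler {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";
    // Stand-in for the FO element mapping; a real one is far larger.
    private static final Set<String> KNOWN_FO = new HashSet<>(Arrays.asList("root", "block"));

    private final Deque<SketchNode> open = new ArrayDeque<>();
    SketchNode root;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        boolean mapped = FO_NS.equals(uri) && KNOWN_FO.contains(localName);
        // No mapping found: fall back to a placeholder node.
        SketchNode node = new SketchNode(mapped ? "fo" : "unknown", localName);
        // Attributes become the node's properties (copied as-is in this sketch).
        for (int i = 0; i < atts.getLength(); i++) {
            node.properties.put(atts.getLocalName(i), atts.getValue(i));
        }
        if (open.isEmpty()) {
            root = node;
        } else {
            open.peek().children.add(node);
        }
        open.push(node);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    // Parses the text and returns the root of the resulting tree.
    static SketchNode build(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            SketchTreeBuilder builder = new SketchTreeBuilder();
            reader.setContentHandler(builder);
            reader.parse(new InputSource(new StringReader(xml)));
            return builder.root;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```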

While the tree building is mainly about creating the FO Tree there are 
some stages that can propagate to the renderer. At the end of a page 
sequence we know that all pages in the page sequence can be rendered 
without being affected by any further XML. The end of the XML document 
also tells us that we can finalise the output document.
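
A small sketch of how those two boundaries could be detected from the SAX
stream. SequenceBoundaryHandler is a made-up class, and the log entries
stand in for whatever the layout and renderer would actually be told; the
sketch only shows where the signals come from.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that notes the two points at which work can be
// handed on: the end of each page sequence and the end of the document.
class SequenceBoundaryHandler extends DefaultHandler {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    final List<String> log = new ArrayList<>();
    private int sequences;

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (FO_NS.equals(uri) && "page-sequence".equals(localName)) {
            sequences++;
            // No later XML can change this sequence, so it can be passed on.
            log.add("page-sequence " + sequences + " complete");
        }
    }

    @Override
    public void endDocument() {
        // Nothing more will arrive, so the output can be finalised.
        log.add("document complete");
    }

    // Parses the text and returns the boundary log.
    static List<String> run(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            SequenceBoundaryHandler handler = new SequenceBoundaryHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.log;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```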


Associated Tasks
----------------

Error handling for XML that is not well formed.
Error handling for other XML parsing errors.
Developer info for adding namespace handlers.



---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org

Re: XML Parsing [2] (RTF document header)

Posted by Keiron Liddle <ke...@aftexsw.com>.

After a bit of a think I realise that this statement isn't completely 
true. Pages may still contain unresolved page numbers or links. What I 
meant is that the layout can be completed and pages that are finished can 
be rendered. The layout managers themselves can do this a page at a time.

If pages cannot be rendered immediately, because of unresolved items or
because of formats like RTF, then we should be able to cache them;
otherwise they must stay in memory. That detail is for later.

On 2002.02.11 10:33 Bertrand Delacretaz wrote:
> On Monday 11 February 2002 10:19, Keiron Liddle wrote:
> >. . .
> > At the end of a page sequence we know that all pages in the page
> > sequence can be rendered without being affected by any further XML.
> 
> Note that this won't be the case with RTF: AFAIK an RTF document has to
> contain a "document header" with font tables, tables of list formats
> etc. This header has to come at the beginning of the document but most
> of the information (notably information about list formats) it contains
> won't be available until much later in the document.
> 
> This is a problem if we want to generate RTF on the fly, and we don't
> have a solution for this in jfor yet, we just keep the RTF document in
> memory until it is complete.
> 
> - Bertrand

Re: XML Parsing [2] (RTF document header)

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.

On Monday 11 February 2002 10:19, Keiron Liddle wrote:
>. . .
> At the end of a page sequence we know that all pages in the page 
> sequence can be rendered without being affected by any further XML. 

Note that this won't be the case with RTF: AFAIK an RTF document has to 
contain a "document header" with font tables, tables of list formats 
etc. This header has to come at the beginning of the document but most 
of the information (notably information about list formats) it contains 
won't be available until much later in the document.

This is a problem if we want to generate RTF on the fly. We don't 
have a solution for this in jfor yet; we just keep the RTF document in 
memory until it is complete. 

- Bertrand

Re: XML Parsing [2]

Posted by Keiron Liddle <ke...@aftexsw.com>.

Class/method info:

> XML Input
> ---------
> 
> FOP can take the input XML in a number of ways:
> - SAX Events through SAX Handler
FOTreeBuilder is the SAX Handler; it is obtained through 
getContentHandler on Driver.

> - DOM which is converted into SAX Events
This is done via the render(Document) method on Driver.

> - data source which is parsed and converted into SAX Events
The Driver can take an InputSource as input; this can use a Stream, a 
String etc.

> - XML+XSLT which is transformed using an XSLT Processor and the result 
> is fired as SAX Events
XSLTInputHandler is used as an InputSource in the render(XMLReader, 
InputSource) method on Driver.
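
For readers unfamiliar with how a DOM or an XSLT result ends up as SAX
events, here is a sketch that uses only the standard JAXP classes, not
Driver or XSLTInputHandler. NameCollector and InputRoutesSketch are
hypothetical names, and NameCollector merely stands in for the real SAX
Handler. An identity Transformer replays a DOM as SAX events, and a
stylesheet Transformer fires its result as SAX events.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the SAX Handler: collects element names.
class NameCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(qName);
    }
}

class InputRoutesSketch {
    // DOM route: build a DOM, then replay it as SAX events with an
    // identity transformer.
    static List<String> fromDom(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
            NameCollector handler = new NameCollector();
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new DOMSource(doc), new SAXResult(handler));
            return handler.names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // XML+XSLT route: the transformation result is fired as SAX events.
    static List<String> fromXslt(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            NameCollector handler = new NameCollector();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new SAXResult(handler));
            return handler.names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In both routes the handler sees the same kind of event stream it would get
from a parser, which is why a single SAX Handler can serve every input
path.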

Re: XML Parsing [2]

Posted by "Peter B. West" <pb...@powerup.com.au>.

Keiron,

Keiron Liddle wrote:

> Since everyone knows the basics we can get into the various stages 
> starting with the XML handling.
>
>
> XML Input
> ---------
>
> FOP can take the input XML in a number of ways:
> - SAX Events through SAX Handler
> - DOM which is converted into SAX Events
> - data source which is parsed and converted into SAX Events
> - XML+XSLT which is transformed using an XSLT Processor and the result is
> fired as SAX Events

Could these functions have a class reference, as you have done below 
with the SAX Handler?  Could it be a general principle that even brief 
allusions to some functionality have a reference to the associated class 
or classes?  Perhaps others who are familiar with the code could follow 
behind you with annotations, which Cyril could collate in the xml/html 
version.

>
> The SAX Events which are fired on the SAX Handler, class FOTreeBuilder,
> must represent an XSL:FO document. If not there will be an error. Any
> problems with the XML being well formed are handled here.

Peter
