Posted to fop-dev@xmlgraphics.apache.org by Vincent Hennebert <vh...@gmail.com> on 2009/09/23 13:44:11 UTC

When must the structure tree be output in the PDF file?

To those PDF specialists around here: am I right that the structure tree
could as well be converted into PDF at the end of a page sequence, as at
the beginning?

In other words: could the piece of code dealing with the structure tree
be moved from PDFDocumentHandler.startPageSequence to
PDFDocumentHandler.endPageSequence?

Thanks,
Vincent

Re: When must the structure tree be output in the PDF file?

Posted by Vincent Hennebert <vh...@gmail.com>.
Hi Jeremias,

Jeremias Maerki wrote:
<snip/>
> IFParser is also still missing the parse code for the structure tree. I
> guess I would defer the call to startPageSequence in the IFParser, then
> parse the reduced FO tree using a ContentHandler delegate, set that on
> the user agent and then call startPageSequence when the first page tag
> is encountered.

That’s what I ended up doing.
Thanks for your input,
Vincent



Re: When must the structure tree be output in the PDF file?

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Ah, I see. Essentially, you're stuck in the case where the IF is loaded
from XML: there, the user agent hasn't received the structure tree yet
when startPageSequence is called. To visualize the two cases:

Direct Rendering:
- Reduced FO tree is built using XSLT and set on the user agent
- DocumentHandler.startPageSequence is called (structure tree is
  available)

IF Case:
- IFParser encounters starting page-sequence tag and calls
  DocumentHandler.startPageSequence but the FO tree is inside the
  page-sequence tag and therefore not available at this point.

IFParser is also still missing the parse code for the structure tree. I
guess I would defer the call to startPageSequence in the IFParser, then
parse the reduced FO tree using a ContentHandler delegate, set that on
the user agent and then call startPageSequence when the first page tag
is encountered. I wouldn't do the deferring in the document handler as
the problem only happens in the IF case.
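
To make the deferral idea concrete, here is a minimal, self-contained
sketch. It is not FOP code: RecordingHandler, DeferringParser and the
un-namespaced element names (page-sequence, structure-tree, page) are
hypothetical stand-ins, and the structure tree is just collected into
a string where a real ContentHandler delegate would build the reduced
FO tree. The only point is that the parser, not the document handler,
holds back startPageSequence until the first page tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: every class, method and element name below is hypothetical.
class DeferredStartSketch {

    /** Stand-in for the document handler; it just records the calls it receives. */
    static class RecordingHandler {
        final List<String> calls = new ArrayList<>();

        void startPageSequence(String structureTree) {
            calls.add("startPageSequence[" + structureTree + "]");
        }

        void startPage(String name) { calls.add("startPage " + name); }

        void endPageSequence() { calls.add("endPageSequence"); }
    }

    /** Parser side: holds back startPageSequence until the first page element. */
    static class DeferringParser extends DefaultHandler {
        private final RecordingHandler handler;
        private final StringBuilder structure = new StringBuilder();
        private boolean startPending;     // page-sequence seen, start not yet forwarded
        private boolean inStructureTree;  // currently inside the structure-tree element

        DeferringParser(RecordingHandler handler) { this.handler = handler; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (inStructureTree) {
                // Collect the tree; a real delegate would build objects instead.
                structure.append('<').append(qName).append('>');
            } else if ("page-sequence".equals(qName)) {
                startPending = true;      // defer instead of calling the handler now
                structure.setLength(0);
            } else if ("structure-tree".equals(qName)) {
                inStructureTree = true;
            } else if ("page".equals(qName)) {
                flushPendingStart();      // the tree is complete by now
                handler.startPage(atts.getValue("name"));
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if ("structure-tree".equals(qName)) {
                inStructureTree = false;
            } else if (inStructureTree) {
                structure.append("</").append(qName).append('>');
            } else if ("page-sequence".equals(qName)) {
                flushPendingStart();      // covers a page-sequence without pages
                handler.endPageSequence();
            }
        }

        private void flushPendingStart() {
            if (startPending) {
                startPending = false;
                handler.startPageSequence(structure.toString());
            }
        }
    }

    /** Parses the given XML and returns the handler calls in order. */
    static List<String> run(String xml) {
        RecordingHandler handler = new RecordingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DeferringParser(handler));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.calls;
    }

    public static void main(String[] args) {
        run("<page-sequence><structure-tree><block/></structure-tree>"
                + "<page name='1'/><page name='2'/></page-sequence>")
                .forEach(System.out::println);
    }
}
```

In this sketch the handler sees startPageSequence exactly once per
page-sequence, and always with the collected tree, so the handler
itself needs no "already built?" flag.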

HTH





Jeremias Maerki


Re: When must the structure tree be output in the PDF file?

Posted by Vincent Hennebert <vh...@gmail.com>.
Jeremias Maerki wrote:
> Not just like that (if at all). The content items being produced inside
> the page-sequence have to be linked into the structure tree. There are
> links (MCIDs) back and forth between the structure tree and the content
> streams. You have to have the structure tree available while you create
> the page contents to build up the links. You could probably move the
> generation to endPageSequence but you'd end up duplicating some of the
> data structures for establishing the links in the process which you'd
> then have to map to the PDF library in the end. Not sure if that's what
> you want. I don't have this stuff present as much as back when I helped
> Jost, so I may be missing something.

Ok, then there’s the following problem: when creating the PDF document
out of an IF XML file, the structure tree is not yet available at the
time PDFDocumentHandler.startPageSequence is called. Indeed in the IF
the structure tree is stored as a child of the page-sequence element.

Any idea of how to handle this, other than putting an ugly boolean at
the beginning of PDFDocumentHandler.startPage, “if structure tree not
yet built, then build structure tree”?




Thanks,
Vincent

Re: When must the structure tree be output in the PDF file?

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Not just like that (if at all). The content items being produced inside
the page-sequence have to be linked into the structure tree. There are
links (MCIDs) back and forth between the structure tree and the content
streams. You have to have the structure tree available while you create
the page contents to build up the links. You could probably move the
generation to endPageSequence but you'd end up duplicating some of the
data structures for establishing the links in the process which you'd
then have to map to the PDF library in the end. Not sure if that's what
you want. I don't have this stuff present as much as back when I helped
Jost, so I may be missing something.
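
As a rough illustration of the two-way linking (a simplified sketch,
not how FOP or its PDF library is actually structured): StructElem and
Page below are hypothetical classes. Each piece of page content is
wrapped in a marked-content sequence carrying an MCID, and the owning
structure element records that page/MCID pair. The element therefore
has to exist at the moment the content is written.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: StructElem and Page are hypothetical, not FOP classes.
class McidLinkSketch {

    /** Structure element: remembers which marked-content items belong to it. */
    static class StructElem {
        final String type;
        final List<String> kids = new ArrayList<>();

        StructElem(String type) { this.type = type; }
    }

    /** Page: hands out MCIDs, which only need to be unique within one page. */
    static class Page {
        final int number;
        final StringBuilder content = new StringBuilder();
        private int nextMcid = 0;

        Page(int number) { this.number = number; }

        /** Writes marked content and links it back to the structure element. */
        void addMarkedText(StructElem elem, String text) {
            int mcid = nextMcid++;
            // Content stream -> structure: the MCID tags this piece of content.
            // Simplified: a real stream would also need BT/ET and a font.
            content.append('/').append(elem.type)
                   .append(" <</MCID ").append(mcid).append(">> BDC (")
                   .append(text).append(") Tj EMC\n");
            // Structure -> content stream: the element lists page and MCID.
            elem.kids.add("page " + number + " MCID " + mcid);
        }
    }

    /** A paragraph split across two pages ends up with two linked kids. */
    static List<String> demo() {
        StructElem paragraph = new StructElem("P");
        Page first = new Page(1);
        Page second = new Page(2);
        first.addMarkedText(paragraph, "Hello");
        second.addMarkedText(paragraph, "world");
        return paragraph.kids;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

If the structure elements were only created in endPageSequence, the
page/MCID pairs would have to be kept in some intermediate structure
and mapped onto the elements afterwards, which is the duplication
mentioned above.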





Jeremias Maerki