Posted to fop-dev@xmlgraphics.apache.org by Paul Vinkenoog <pa...@vinkenoog.nl> on 2007/03/19 16:50:52 UTC

PDF internal links re-implementation - advice sought

Hi all,

I've recently re-implemented the full internal links behaviour in the
PDF renderer. (I produce documentation for the Firebird project. FOP
0.93 solved a lot of our problems but we make use of links
extensively, and having them land on the top of the page was a
showstopper.)

At this point, everything works fine for our purposes, but of course I
would like it to be of use for others as well, and acceptable to the
FOP developers.

Actually I didn't want to discuss my changes until Jay Bryant
committed his named destination implementation, because I suspect he's
altered some code that I've also touched. But that hasn't happened
yet, and I'll soon be in a situation where I may be too busy to work
on this, so...


First, I would like your opinion(s) on the way link information is
passed from the layout phase to the renderer.

Currently, this is done as follows:
- Link areas have an INTERNAL_LINK trait containing the unique
  PageViewport key of the target ('P1', 'P2' etc.).
- Bookmarks have a getPageViewport() method which returns a direct
  reference to the target PageViewport.

IMO, the on-page target position should ideally be resolved during the
layout phase, so that all renderers can benefit from it. I suppose
that the information necessary could be made available at the moment
addAreas is called on the LayoutManager - except that there may be
some inline shift if page number citations are resolved later?

But I'm afraid the layout code (especially the Knuth dance) is so
complicated that I couldn't master it in the time I had available.
I'm sorry about that, but please bear in mind that this was my first
introduction to the FOP source code ever.

Then an answer (to another person's question) on the fop-user list put
me in the right direction: keep track of the positions in the
renderer. So that's what I did.

For now, I've kept the existing functionality (see above) intact, both
as a fallback and because some other renderers may need it (although
within FOP itself, only PDFRenderer picks up INTERNAL_LINK traits).
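A minimal sketch of that resolution step, assuming the renderer keeps a map from target ID to on-page position and falls back to the old page-level behaviour (top of the page, via the PV key) when an ID was never recorded. Destination, LinkResolver, recordTarget and resolve are hypothetical names, not FOP's API; offsets are in millipoints. Because a link can point forward, resolve() is presumably only called once the target's page has been rendered, or deferred to the end of the document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical link destination: a page key plus the target's offset from the page top. */
final class Destination {
    final String pageKey; // e.g. "P2"
    final int bpFromTop;  // millipoints; 0 means "top of the page"

    Destination(String pageKey, int bpFromTop) {
        this.pageKey = pageKey;
        this.bpFromTop = bpFromTop;
    }
}

/**
 * Hypothetical resolver: prefers the on-page position recorded for a target ID and
 * falls back to the page-level PV key (i.e. top of the page) when the ID is unknown.
 */
final class LinkResolver {
    private static final Logger LOG = Logger.getLogger(LinkResolver.class.getName());
    private final Map<String, Destination> positionsById = new HashMap<>();

    /** Called while pages are rendered; the first area carrying an ID wins. */
    void recordTarget(String id, String pageKey, int bpFromTop) {
        positionsById.putIfAbsent(id, new Destination(pageKey, bpFromTop));
    }

    /** Resolves a link; assumed to run after the target's page has been rendered. */
    Destination resolve(String targetId, String fallbackPageKey) {
        Destination d = (targetId == null) ? null : positionsById.get(targetId);
        if (d != null) {
            return d;
        }
        // Same idea as in the mail: log a warning and keep the old page-level behaviour.
        LOG.warning("Cannot resolve link target '" + targetId
                + "'; linking to top of page " + fallbackPageKey);
        return new Destination(fallbackPageKey, 0);
    }
}
```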

To pass a link area's target ID to the renderer (bookmarks already
have it on board), I've revived the ID_LINK trait (it was commented
out in Trait.java). My internal link areas now carry the following
information:
  ID_LINK       : The target ID
  INTERNAL_LINK : The PV key string

This is not exactly elegant, of course. Some alternatives are:

- Create an internal link class carrying those two strings (or maybe
  rather a PageViewport field and an ID string), and make the
  INTERNAL_LINK trait of that class.

- Since we're resolving target areas in the renderer anyway, drop the
  PV key/reference altogether and pass the ID exclusively, be it in
  ID_LINK or in INTERNAL_LINK.

Both are easy to implement, but again, they may break custom renderers
that expect INTERNAL_LINK to be a string containing the PV key. (The
first alternative allows custom renderers to adapt; with the second,
the page reference is completely lost.)
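A sketch of the first alternative, assuming a plain immutable value object is stored as the INTERNAL_LINK trait value. InternalLink and LegacyLinkAdapter are hypothetical names, not existing FOP classes. Keeping the page key inside the object is what would let a custom renderer that used to read a String adapt with a small change:

```java
import java.util.Objects;

/** Hypothetical value object for the INTERNAL_LINK trait: target page key + target ID. */
final class InternalLink {
    private final String pageKey; // PageViewport key such as "P3"; may be null
    private final String idRef;   // ID of the target FO (internal-destination)

    InternalLink(String pageKey, String idRef) {
        this.pageKey = pageKey;
        this.idRef = Objects.requireNonNull(idRef, "idRef");
    }

    String getPageKey() { return pageKey; }

    String getIdRef() { return idRef; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof InternalLink)) return false;
        InternalLink other = (InternalLink) o;
        return Objects.equals(pageKey, other.pageKey) && idRef.equals(other.idRef);
    }

    @Override
    public int hashCode() { return Objects.hash(pageKey, idRef); }

    @Override
    public String toString() { return "InternalLink[page=" + pageKey + ", id=" + idRef + "]"; }
}

/** How a custom renderer that expects a String PV key could cope with both trait styles. */
final class LegacyLinkAdapter {
    private LegacyLinkAdapter() {}

    static String pageKeyOf(Object internalLinkTrait) {
        if (internalLinkTrait instanceof InternalLink) {
            return ((InternalLink) internalLinkTrait).getPageKey();
        }
        if (internalLinkTrait instanceof String) {
            return (String) internalLinkTrait; // old style: the trait value is the PV key
        }
        return null; // unknown trait type
    }
}
```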

I'd like to know which solution you - the FOP developers - prefer.
That is, if my approach of resolving the links in the renderer is
acceptable at all.


A second question I have is the following:

At this moment I've overridden renderInlineArea and renderBlock to get
the PROD_ID and the current IP and BP positions.  With the documents
I've rendered so far, this is enough to resolve all the links (if a
link cannot be resolved, a warning is logged). But that doesn't prove
it's always enough. Should I check more area types, or are Block and
InlineArea the only ones that ever receive the IDs of the producing FO
element? Or, more precisely: are they the only ones that are ever the
*first* (in doc order) to receive a certain ID?
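To illustrate the kind of position tracking described here, a heavily simplified model, assuming blocks stack top to bottom and inline areas run left to right on a single line. It keeps a running IP/BP offset while walking the tree and records the first position at which each prod-id appears. SimpleArea, Kind, Position and PositionTracker are hypothetical stand-ins, not FOP's Area classes or the real renderer methods; margins, borders, baseline offsets and reference orientation are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, heavily simplified area model (not FOP's Area classes). */
enum Kind { BLOCK, INLINE }

final class SimpleArea {
    final Kind kind;
    final String prodId; // ID of the producing FO, or null
    final int size;      // BLOCK: height (bpd); INLINE: width (ipd); millipoints
    final List<SimpleArea> children = new ArrayList<>();

    SimpleArea(Kind kind, String prodId, int size) {
        this.kind = kind;
        this.prodId = prodId;
        this.size = size;
    }

    SimpleArea add(SimpleArea child) {
        children.add(child);
        return this;
    }
}

/** Page key plus offsets from the page's top-left corner. */
final class Position {
    final String pageKey;
    final int ip;
    final int bp;

    Position(String pageKey, int ip, int bp) {
        this.pageKey = pageKey;
        this.ip = ip;
        this.bp = bp;
    }
}

/** Records where each prod-id first appears. A block holds either blocks or one line of inlines. */
final class PositionTracker {
    private final Map<String, Position> firstSeen = new HashMap<>();
    private String currentPage;
    private int currentIP;
    private int currentBP;

    void renderPage(String pageKey, SimpleArea content) {
        currentPage = pageKey;
        currentIP = 0;
        currentBP = 0;
        renderBlock(content);
    }

    Position positionOf(String id) {
        return firstSeen.get(id);
    }

    private void record(SimpleArea area) {
        if (area.prodId != null) {
            // Only the first area (in document order) with a given ID is a link target.
            firstSeen.putIfAbsent(area.prodId, new Position(currentPage, currentIP, currentBP));
        }
    }

    private void renderBlock(SimpleArea block) {
        record(block);
        int savedIP = currentIP;
        int savedBP = currentBP;
        for (SimpleArea child : block.children) {
            if (child.kind == Kind.BLOCK) {
                renderBlock(child);
                currentBP += child.size; // next block starts below this one
            } else {
                renderInlineArea(child);
                currentIP += child.size; // next inline starts to the right
            }
        }
        currentIP = savedIP;
        currentBP = savedBP;
    }

    private void renderInlineArea(SimpleArea inline) {
        record(inline);
        int savedIP = currentIP;
        for (SimpleArea child : inline.children) { // e.g. nested inline parents
            renderInlineArea(child);
            currentIP += child.size;
        }
        currentIP = savedIP;
    }
}
```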


Lastly, I'd like to tell you how much I appreciate all the effort you
people have put in this project. We've been using FOP for years in the
Firebird documentation project and it has become a tremendously
important tool for us. So if there's anything I can do to improve my
current implementation according to your wishes or requirements, I'll
be more than happy to do so -- provided I have the time and the
skills.


Kind regards,
Paul Vinkenoog

Re: PDF internal links re-implementation - advice sought

Posted by Adrian Cumiskey <ad...@gmail.com>.
Hi Paul,

Paul Vinkenoog wrote:
> Actually I didn't want to discuss my changes until Jay Bryant
> committed his named destination implementation, because I suspect he's
> altered some code that I've also touched. But that hasn't happened
> yet, and I'll soon be in a situation where I may be too busy to work
> on this, so...

It's best to submit your patch anyway with a subject of [PATCH] at 
http://issues.apache.org/bugzilla/enter_bug.cgi.  Even if the patch is 
not committed, other FOP developers will at least be able to try out 
your work and give you some early feedback :-).

> This is not exactly elegant, of course. Some alternatives are:
> 
> - Create an internal link class carrying those two strings (or maybe
>   rather a PageViewport field and an ID string), and make the
>   INTERNAL_LINK trait of that class.
> 
> - Since we're resolving target areas in the renderer anyway, drop the
>   PV key/reference altogether and pass the ID exclusively, be it in
>   ID_LINK or in INTERNAL_LINK.
> 
> Both are easy to implement, but again, they may break custom renderers
> that expect INTERNAL_LINK to be a string containing the PV key. (The
> first alternative allows custom renderers to adapt; with the second,
> the page reference is completely lost.)
> 
> I'd like to know which solution you - the FOP developers - prefer.
> That is, if my approach of resolving the links in the renderer is
> acceptable at all.

Without spending a lot of time on my own investigation, and going just 
by your description of the solutions, I prefer the idea of creating an 
internal link class to encapsulate the two strings. This should give us 
more flexibility and a standard interface that can be used across 
renderers.

Adrian.

Re: PDF internal links re-implementation - advice sought

Posted by Paul Vinkenoog <pa...@vinkenoog.nl>.
Hi all,

>> inlineparent, text, viewport, foreign-object, image can also be
>> targets of internal-destination.

> InlineParent, Text and Viewport areas pass through renderInlineArea,
> so they've been taken care of. I'll implement the others as well.

Ah, ForeignObject and Image are rendered via renderViewport, and their
Viewport areas already pass through renderInlineArea. Looks like
determining the positions in renderBlock and renderInlineArea is
enough after all.

Anyway, I'll test thoroughly before submitting a patch.


Cheers,
Paul Vinkenoog

Re: PDF internal links re-implementation - advice sought

Posted by Paul Vinkenoog <pa...@vinkenoog.nl>.
Adrian and Jeremias,

Thanks for your comments and advice.

> If you create this link class, you need to make sure it works
> together with the intermediate format (XMLRenderer and
> AreaTreeParser), the XML representation of the area tree.

OK, will do.

>> At this moment I've overridden renderInlineArea and renderBlock to
>> get the PROD_ID and the current IP and BP positions.  With the
>> documents I've rendered so far, this is enough to resolve all the
>> links

> inlineparent, text, viewport, foreign-object, image can also be
> targets of internal-destination.

InlineParent, Text and Viewport areas pass through renderInlineArea,
so they've been taken care of. I'll implement the others as well.

> Check out inline-level_id.xml and block-level_id.xml in the layout
> engine tests. If you generate [1] the Area Tree XML (intermediate
> format) from those files, you'll see where the prod-id can be set.
>
> [1] you can also just look in build\test-results\layoutengine after
> the build.

Thanks for the tip, I'll have a look at it.


Kind regards,
Paul Vinkenoog

Re: PDF internal links re-implementation - advice sought

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Cool to see you're working on this, Paul!


More inline...

On 19.03.2007 16:50:52 Paul Vinkenoog wrote:
> First, I would like your opinion(s) on the way link information is
> passed from the layout phase to the renderer.
> 
> Currently, this is done as follows:
> - Link areas have an INTERNAL_LINK trait containing the unique
>   PageViewport key of the target ('P1', 'P2' etc.).
> - Bookmarks have a getPageViewport() method which returns a direct
>   reference to the target PageViewport.
> 
> IMO, the on-page target position should ideally be resolved during the
> layout phase, so that all renderers can benefit from it. I suppose
> that the information necessary could be made available at the moment
> addAreas is called on the LayoutManager - except that there may be
> some inline shift if page number citations are resolved later?

Yes, the page number citations could be a problem here. Furthermore,
the layout managers currently don't do any absolute positioning at
all, so you'd have to introduce that everywhere.

> But I'm afraid the layout code (especially the Knuth dance) is so
> complicated that I couldn't master it in the time I had available.
> I'm sorry about that, but please bear in mind that this was my first
> introduction to the FOP source code ever.
> 
> Then an answer (to another person's question) on the fop-user list put
> me in the right direction: keep track of the positions in the
> renderer. So that's what I did.

I would have tried that route, too.

> For now, I've kept the existing functionality (see above) intact, both
> as a fallback and because some other renderers may need it (although
> within FOP itself, only PDFRenderer picks up INTERNAL_LINK traits).
> 
> To pass a link area's target ID to the renderer (bookmarks already
> have it on board), I've revived the ID_LINK trait (it was commented
> out in Trait.java). My internal link areas now carry the following
> information:
>   ID_LINK       : The target ID
>   INTERNAL_LINK : The PV key string
> 
> This is not exactly elegant, of course. Some alternatives are:
> 
> - Create an internal link class carrying those two strings (or maybe
>   rather a PageViewport field and an ID string), and make the
>   INTERNAL_LINK trait of that class.
> 
> - Since we're resolving target areas in the renderer anyway, drop the
>   PV key/reference altogether and pass the ID exclusively, be it in
>   ID_LINK or in INTERNAL_LINK.
> 
> Both are easy to implement, but again, they may break custom renderers
> that expect INTERNAL_LINK to be a string containing the PV key. (The
> first alternative allows custom renderers to adapt; with the second,
> the page reference is completely lost.)
> 
> I'd like to know which solution you - the FOP developers - prefer.
> That is, if my approach of resolving the links in the renderer is
> acceptable at all.

I generally concur with what Adrian suggested. Using only the page
viewport key is obviously not enough. However, if you can do without the
page viewport key, that would be a bonus, but if it helps to speed up
any lookup processes then I see nothing against keeping it.

If you create this link class, you need to make sure it works together
with the intermediate format (XMLRenderer and AreaTreeParser), the XML
representation of the area tree.
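For the intermediate format, the trait would have to survive a round trip through the area tree XML. A minimal sketch, assuming the two strings are packed into one attribute value of the hypothetical form "(pageKey,idRef)"; InternalLinkCodec is an invented name, and the real attribute name and format would be whatever XMLRenderer and AreaTreeParser end up agreeing on.

```java
/**
 * Hypothetical codec for writing an internal-link trait as one XML attribute value
 * and reading it back. Format (an assumption): "(pageKey,idRef)".
 * Splits on the first ',', so the page key (e.g. "P3") must not contain a comma.
 */
final class InternalLinkCodec {
    private InternalLinkCodec() {}

    /** Encodes the pair; a missing page key is written as an empty string. */
    static String encode(String pageKey, String idRef) {
        return "(" + (pageKey == null ? "" : pageKey) + "," + idRef + ")";
    }

    /** Decodes to {pageKey, idRef}; pageKey is null if it was written empty. */
    static String[] decode(String value) {
        if (value == null || value.length() < 3
                || value.charAt(0) != '(' || value.charAt(value.length() - 1) != ')') {
            throw new IllegalArgumentException("Malformed internal link: " + value);
        }
        String body = value.substring(1, value.length() - 1);
        int comma = body.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Missing ',' in internal link: " + value);
        }
        String pageKey = body.substring(0, comma);
        return new String[] {pageKey.isEmpty() ? null : pageKey, body.substring(comma + 1)};
    }
}
```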

Breaking custom renderers should of course be avoided if possible, but
if there's no other way, I guess this should be acceptable. Anyone who
objects can always chime in. And we can possibly improve on the first
patch you submit if necessary.

> 
> A second question I have is the following:
> 
> At this moment I've overridden renderInlineArea and renderBlock to get
> the PROD_ID and the current IP and BP positions.  With the documents
> I've rendered so far, this is enough to resolve all the links (if a
> link cannot be resolved, a warning is logged). But that doesn't prove
> it's always enough. Should I check more area types, or are Block and
> InlineArea the only ones that ever receive the IDs of the producing FO
> element? Or, more precisely: that are ever the *first* (in doc order) to
> receive a certain ID?

inlineparent, text, viewport, foreign-object, image can also be targets
of internal-destination. Check out inline-level_id.xml and
block-level_id.xml in the layout engine tests. If you generate [1] the
Area Tree XML (intermediate format) from those files, you'll see where
the prod-id can be set.

[1] you can also just look in build\test-results\layoutengine after
the build.
> 
> Lastly, I'd like to tell you how much I appreciate all the effort you
> people have put in this project. We've been using FOP for years in the
> Firebird documentation project and it has become a tremendously
> important tool for us. So if there's anything I can do to improve my
> current implementation according to your wishes or requirements, I'll
> be more than happy to do so -- provided I have the time and the
> skills.

Thanks. And thank you for diving into one of those features that FOP
0.93 still lacks compared to 0.20.5!


Jeremias Maerki