Posted to fop-dev@xmlgraphics.apache.org by Jeremias Maerki <de...@jeremias-maerki.ch> on 2008/09/03 10:26:38 UTC

Re: Links in SVGs included into PDFs

I have to call in sick today so this is my last message for now. Just to
give you some feedback. BTW, I noticed this should move to fop-dev since
we're discussing development. Please drop the fop-users CC in any
follow-ups.

On 03.09.2008 10:00:59 Stefan Bund wrote:
> Jeremias Maerki <de...@jeremias-maerki.ch> writes:
> > It just occurred to me that as an alternative we could simply track all
> > XSL-FO IDs unconditionally. 
> 
> That is what I was thinking about. As a very first step, I want to
> implement just the Renderer part: Resolve the internal id references
> if and only if they have been registered with the IdTracker already. 
> 
> If I have that, the next step is to get the target IDs into the
> IdTracker, and it has already occurred to me: why not just add all
> IDs, if that is not what is already done (from your comments, it's
> not).
> 
> > That would make scanning the SVG (or HTML or MathML) for links
> > unnecessary. 
> 
> Exactly. That would need an extension of Image and so on. I had
> thought of giving Image (or a baseclass) a list of resolvables and
> subclassing Image with a special SVGImage which would then do the
> parsing and store the created SVG DOM Tree to be reused at render
> time.
> 
> However, this seems to be FAR more complicated than just registering
> all IDs ...

Yes, and I wouldn't have liked an SVG-specific solution.

> > The overhead is probably even smaller and could actually have
> > additional side-benefits for certain special use cases. For example,
> > we could have an event that notifies the FOP-User which ID has its
> > first (and last) area on which page. The area tree (and AT XML) size
> > would increase a bit but I don't think by much (String plus
> > PageViewport reference per destination).
> 
> Sounds reasonable :-)
> 
> > On 03.09.2008 09:35:52 Jeremias Maerki wrote:
> >> Hmm, you're right. This is more complicated than I thought. Getting the
> >> actual position of a link destination on a page is easily done with
> >> information from PDFRenderer. 
> 
> This is where I currently stand. My problem is getting exactly
> that information.
> 
> I can use PDFRenderer.getPDFGoToForID(targetID, pvKey), however, for
> that I need to already know the pvKey. As far as I understand, I could
> get that information from the IdTracker, but after lots of grepping
> and reading through the source code I could not find a way to access
> the AreaTreeManager from the PDFRenderer.
> 
> So my question at the moment is: a) How do I get at the page viewport
> key given an id or more specifically b) How do I access the
> AreaTreeManager from PDFRenderer.

I don't think you should access the AreaTreeManager. The renderers are
designed to be passive. I believe the Renderer should be informed by the
AreaTreeHandler about any IDs it's managing. Not the other way around.
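
To make the direction of that information flow concrete, here is a
minimal sketch in plain Java. It is not FOP code: IdListener,
SketchAreaTreeHandler and SketchRenderer are hypothetical names invented
for this illustration, and the page viewport key is modelled as a simple
String. The only point is that the renderer gets told about IDs and
never queries the area tree side itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback interface; NOT an actual FOP type.
interface IdListener {
    void idPlaced(String id, String pageViewportKey);
}

// Stand-in for the component that owns the ID bookkeeping.
class SketchAreaTreeHandler {
    private final List<IdListener> listeners = new ArrayList<>();

    void addIdListener(IdListener listener) {
        listeners.add(listener);
    }

    // Called whenever an area carrying an id lands on a page.
    void notifyIdPlaced(String id, String pageViewportKey) {
        for (IdListener listener : listeners) {
            listener.idPlaced(id, pageViewportKey);
        }
    }
}

// Stand-in renderer: it only records what it is told.
class SketchRenderer implements IdListener {
    private final Map<String, String> firstPageForId = new LinkedHashMap<>();

    @Override
    public void idPlaced(String id, String pageViewportKey) {
        // Keep only the first page on which the id appears.
        firstPageForId.putIfAbsent(id, pageViewportKey);
    }

    // Returns null if the renderer has not been told about this id yet.
    String pageKeyFor(String id) {
        return firstPageForId.get(id);
    }
}

class PassiveRendererDemo {
    public static void main(String[] args) {
        SketchAreaTreeHandler handler = new SketchAreaTreeHandler();
        SketchRenderer renderer = new SketchRenderer();
        handler.addIdListener(renderer);
        handler.notifyIdPlaced("chapter1", "P1");
        System.out.println(renderer.pageKeyFor("chapter1"));
    }
}
```

Whether such a notification would be a listener, an event, or extra
data carried in the area tree is left open here.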

> >> Maybe that helps. I hope I didn't scare you too much.
> 
> No, you have cleared up lots of points which I guessed but was far from
> sure about.

Cool.

> I'd try to continue along the way assuming the register-all-id's stuff
> could work. And even if that would NOT work (register-all-id's), I
> think I would be MUCH farther ... as a last resort I can always hack
> references into the surrounding code, if needed even using some xslt
> to preprocess the fo+svg stuff (which is already generated from
> docbook in my case so adding some more xslt is relatively
> straightforward). Quite a hack and not a general solution but it could at
> least work for the moment :-) and I'm at the point where I'd be
> grateful for ANY working solution (I have not found ANY way to produce
> a PDF file with embedded images containing links into the PDF ... that
> is, using open source technology).
> 
> Many thanks for all your insight. I feel, this could even lead
> somewhere :-)

Good luck!



Jeremias Maerki


Re: Links in SVGs included into PDFs

Posted by Stefan Bund <as...@gmx.de>.
I have created a bug report on Bugzilla. The ID is #45759

  https://issues.apache.org/bugzilla/show_bug.cgi?id=45759

stefan.


Re: Links in SVGs included into PDFs

Posted by Stefan Bund <as...@gmx.de>.
Andreas Delmelle <an...@telenet.be> writes:
> From what you have described, it seems to be a valid approach, but
> without looking at the actual code, it's still an impression only...
>
> The only thing I'm slightly concerned with is that the proposed
> solution would only work for PDF output (not for the AWT preview
> panel, for example).

That's right. You have me there. On the other hand, a solution working
across all output formats would still need adjustments within each
output format, as far as I understand (in that format's renderer).
Since links are stored as annotations in the inline parent (whatever
that really is ;-) ) and it's only possible to store a single link
there, which always covers the complete object, the area tree would
have to be extended to support SVG links. So all the renderers would
have to be extended to support this extension ... of course this
would be a much better solution, since it would allow implementing
links from other graphical formats as well.

On the other hand, that solution is hugely more complex to implement,
at least for me ...

> However, since it is by far the most popular output format, this is
> not really a big issue. Supporting it for one format is still a lot
> better than nothing at all. :-)

That's my impression too. I'm coming from DocBook, so now it works for
DocBook->PDF. The next step on that front would be to get graphical
links to work in DocBook's HTML output using image maps. But that has
nothing to do with Apache-FOP.

> (... snip very helpful area-tree / resolvable explanation ...)
>
> As such, you don't need them, if you can resolve the links in the SVG
> entirely at the rendering stage. As Jeremias did mention, if you need
> to track them from the layoutengine (in which case the page they
> point to may not exist yet in the area tree) then you would need a
> Resolvable implementation to make sure the resolution is triggered
> automatically when the corresponding 'id' is encountered on a
> following page.

Ahhh ... ok. I get it. Since the PDF output engine can work with
unresolved links by placing something in the PDF trailer, the PDF
renderer does not rely on links being resolved in the area tree.

However, other output formats may have a problem with this, ok.

> Indeed. Upon adding a new attachment, you can mark the earlier version
> as 'obsolete'.

Ah, ok. I'll post the bugzilla ID here when I have uploaded the code.

stefan.


Re: Links in SVGs included into PDFs

Posted by Andreas Delmelle <an...@telenet.be>.
On Sep 4, 2008, at 10:03, Stefan Bund wrote:

> Andreas Delmelle <an...@telenet.be> writes:
>> Seems like a nice job you did here!
>
> Thanks. Do you know whether my assumptions are valid? I've been
> guessing my way along and it would be nice for someone in the know to
> check up on the code.

From what you have described, it seems to be a valid approach, but  
without looking at the actual code, it's still an impression only...

The only thing I'm slightly concerned with is that the proposed  
solution would only work for PDF output (not for the AWT preview  
panel, for example).
However, since it is by far the most popular output format, this is  
not really a big issue. Supporting it for one format is still a lot  
better than nothing at all. :-)

> The question is how to make the code
> available.

As I hinted, creating an 'svn diff' of the altered/added files, and  
attaching it to a Bugzilla entry (https://issues.apache.org/bugzilla)  
is the preferred option.

> What I don't understand is, how the linking stuff in the renderers
> integrates with the Resolvable stuff in the area tree. I do not add
> any resolvables in the area tree for the SVG images.

The area tree is created by the layoutengine. When an area has to be  
generated for a FO node that references another node by its  
'id' (like a basic-link, or a page-number-citation), and the 'id' it  
points to has not yet been associated to a previous page-viewport, a  
Resolvable is created and added to the IdTracker. Upon generating an  
area for a FO node with a given 'id', a check is performed to see if  
any Resolvables exist that reference it. If the 'id' was already  
added to a previous page, we don't need a Resolvable, as we can  
immediately get the information we need.
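
The mechanism described in the paragraph above can be sketched roughly
as follows. This is a simplified model, not FOP's actual Resolvable or
IdTracker: SketchResolvable and SketchIdTracker are hypothetical names,
and a page is represented by a String key rather than a PageViewport.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplification; FOP's real Resolvable interface differs.
interface SketchResolvable {
    void resolve(String id, String pageKey);
}

class SketchIdTracker {
    private final Map<String, String> pageForId = new HashMap<>();
    private final Map<String, List<SketchResolvable>> pending = new HashMap<>();

    // A reference to 'id' is laid out: resolve at once if the target is
    // already on a page, otherwise park the resolvable until it shows up.
    void reference(String id, SketchResolvable resolvable) {
        String pageKey = pageForId.get(id);
        if (pageKey != null) {
            resolvable.resolve(id, pageKey);
        } else {
            pending.computeIfAbsent(id, k -> new ArrayList<>()).add(resolvable);
        }
    }

    // An area with 'id' is generated on 'pageKey': wake up anything waiting.
    void register(String id, String pageKey) {
        if (pageForId.putIfAbsent(id, pageKey) != null) {
            return; // id was already seen on an earlier page
        }
        List<SketchResolvable> waiting = pending.remove(id);
        if (waiting != null) {
            for (SketchResolvable resolvable : waiting) {
                resolvable.resolve(id, pageKey);
            }
        }
    }

    // Number of references still waiting for their target.
    int pendingCount() {
        int count = 0;
        for (List<SketchResolvable> list : pending.values()) {
            count += list.size();
        }
        return count;
    }
}

class IdTrackerDemo {
    public static void main(String[] args) {
        SketchIdTracker tracker = new SketchIdTracker();
        tracker.reference("sec2", (id, page) ->
                System.out.println(id + " resolved to " + page));
        tracker.register("sec2", "P5"); // triggers the parked resolvable
    }
}
```

A forward reference stays parked until register() is called for its
target; a backward reference is resolved immediately, which is the case
where no Resolvable is needed at all.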

As such, you don't need them, if you can resolve the links in the SVG  
entirely at the rendering stage. As Jeremias did mention, if you need  
to track them from the layoutengine (in which case the page they  
point to may not exist yet in the area tree) then you would need a  
Resolvable implementation to make sure the resolution is triggered  
automatically when the corresponding 'id' is encountered on a  
following page.

>
>> If you like, you can already open a Bugzilla entry for this, and
>> attach a diff against FOP Trunk there.
>
> No problem. Only, what should I do when I later change the code
> further? Followup on the Bugzilla entry with a new patch ?

Indeed. Upon adding a new attachment, you can mark the earlier  
version as 'obsolete'.


Cheers

Andreas

Re: Links in SVGs included into PDFs

Posted by Stefan Bund <as...@gmx.de>.
Andreas Delmelle <an...@telenet.be> writes:
> Seems like a nice job you did here!

Thanks. Do you know whether my assumptions are valid? I've been
guessing my way along and it would be nice for someone in the know to
check up on the code. The question is how to make the code
available.

What I don't understand is, how the linking stuff in the renderers
integrates with the Resolvable stuff in the area tree. I do not add
any resolvables in the area tree for the SVG images.

> If you like, you can already open a Bugzilla entry for this, and
> attach a diff against FOP Trunk there.

No problem. Only, what should I do when I later change the code
further? Followup on the Bugzilla entry with a new patch ?

> This makes it more probable that someone else picks it up (in case
> you're really unsure when you'll be able to work on it further).

stefan.


Re: Links in SVGs included into PDFs

Posted by Andreas Delmelle <an...@telenet.be>.
On Sep 3, 2008, at 15:51, Stefan Bund wrote:

Hi Stefan

> <snip />

Seems like a nice job you did here!

> I don't know how much more time I can invest. I'd very much like
> working SVG internal link support to be added to the trunk so later
> versions will support it. But there is probably quite some work before
> this can be added and I don't have enough experience with Apache-FOP
> (say: almost none, I only started using it a few days ago) to go much
> further. I can happily hack on the code but I have no idea, whether
> the stuff I'm doing is good or perverse ;-)

If you like, you can already open a Bugzilla entry for this, and  
attach a diff against FOP Trunk there.
This makes it more probable that someone else picks it up (in case  
you're really unsure when you'll be able to work on it further).

Cheers

Andreas

Re: Links in SVGs included into PDFs

Posted by Stefan Bund <as...@gmx.de>.
Jeremias Maerki <de...@jeremias-maerki.ch> writes:
> I don't think you should access the AreaTreeManager. The renderers are
> designed to be passive. I believe the Renderer should be informed by the
> AreaTreeHandler about any IDs it's managing. Not the other way around.
[...]
> Good luck!

I hope, I've had that. I have a patch which works for the very simple
rudimentary test cases I've thrown at it. It is very likely missing
something but for now it's much better than nothing IMHO.

Here's what I've done:

I extended PDFGraphics2D to hold a reference to the PDFRenderer. I
added a method 'renderInternalLink(Rectangle2D rect, String idRef)' to
PDFRenderer which is really just an excerpt from 'renderInlineParent'.

And here's the crux: renderInternalLink takes ONLY the idRef, NOT the
pvKey, as an argument. I had this idea while looking over the
implementation of 'PDFRenderer.getPDFGoToForID'. There I got the
impression that I might possibly just set pvKey to null: the PDFGoTo
would be incomplete and be added to unfinishedGoTos, to be completed
later and added to the trailer.
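
As a rough illustration of that "create the link now, fill in the page
later" idea, here is a toy model in plain Java. It does not reproduce
PDFRenderer, PDFGoTo or the real unfinishedGoTos handling: SketchGoTo
and SketchGoToRegistry are hypothetical names, page references are
plain Strings, and how FOP actually splits this work between
getPDFGoToForID, saveAbsolutePosition and the trailer is not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy stand-in for a PDF "go to" action; NOT FOP's PDFGoTo.
class SketchGoTo {
    final String idRef;
    String pageRef; // null until the target page is known

    SketchGoTo(String idRef) {
        this.idRef = idRef;
    }

    boolean isComplete() {
        return pageRef != null;
    }
}

class SketchGoToRegistry {
    private final Map<String, String> knownPageRefs = new HashMap<>();
    private final List<SketchGoTo> unfinished = new ArrayList<>();

    // Create a link from the id alone; no page key is required up front.
    SketchGoTo goToForId(String idRef) {
        SketchGoTo goTo = new SketchGoTo(idRef);
        goTo.pageRef = knownPageRefs.get(idRef); // known only for backward links
        if (!goTo.isComplete()) {
            unfinished.add(goTo); // forward link: complete it later
        }
        return goTo;
    }

    // Called when the target with this id has been rendered on a page.
    void savePosition(String idRef, String pageRef) {
        knownPageRefs.putIfAbsent(idRef, pageRef);
        String resolved = knownPageRefs.get(idRef);
        for (Iterator<SketchGoTo> it = unfinished.iterator(); it.hasNext();) {
            SketchGoTo goTo = it.next();
            if (goTo.idRef.equals(idRef)) {
                goTo.pageRef = resolved;
                it.remove();
            }
        }
    }

    int unfinishedCount() {
        return unfinished.size();
    }
}

class UnfinishedGoToDemo {
    public static void main(String[] args) {
        SketchGoToRegistry registry = new SketchGoToRegistry();
        SketchGoTo link = registry.goToForId("fig2"); // target not placed yet
        registry.savePosition("fig2", "page-3");      // target shows up later
        System.out.println(link.idRef + " -> " + link.pageRef);
    }
}
```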

PDFGraphics2D got a member 'addInternalLink' akin to 'addLink' and
I extended PDFANode to call 'addInternalLink' when the link target
starts with '#'.
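
The '#' check itself is simple; a sketch of it as a standalone helper
follows. SketchLinkTargets is a hypothetical class written for this
mail, not part of FOP or Batik, and treating everything after the '#'
as the id reference is my assumption about how the target maps to an
XSL-FO id.

```java
// Hypothetical helper; not part of FOP or Batik.
final class SketchLinkTargets {
    private SketchLinkTargets() {
    }

    // A target such as "#chapter2" is taken to point at an id in the
    // same document; anything else is treated as an external link.
    static boolean isInternal(String href) {
        return href != null && href.length() > 1 && href.charAt(0) == '#';
    }

    // Strip the leading '#' to obtain the id reference (assumption).
    static String idRefOf(String href) {
        if (!isInternal(href)) {
            throw new IllegalArgumentException("not an internal link: " + href);
        }
        return href.substring(1);
    }
}

class LinkTargetDemo {
    public static void main(String[] args) {
        String href = "#chapter2";
        if (SketchLinkTargets.isInternal(href)) {
            System.out.println("internal link to id " + SketchLinkTargets.idRefOf(href));
        }
    }
}
```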

I had to re-enable a disabled (commented-out) piece of code in
'PDFRenderer.saveAbsolutePosition' so the missing pageRefs are set
correctly for backward links. But then:

  Voila: It works ! 

Very possibly, it will only work if some other internal link to the
same target exists (my first checks indicate, it works even if this is
not the case), but that's no problem for me right now. Possibly, the
PDF is not as efficient as it could be since ALL links from SVG's need
to be added to the PDF trailer (as far as I understand it, I really
don't know much about PDF internals) not only forward links. But again
this is not a problem for me either, at least not at the moment.

So I have come a long way. Next steps would be:

- Check more cases

- Test more thoroughly what happens if referencing a link target from
  SVG which is referenced nowhere else

- Maybe the PDFRenderer can be optimized to reduce some of the
  overhead added through the use of incomplete links, especially
  backward links (we'd just need to add a cache for all encountered
  link targets and add those targets when first referenced, something
  like that. For backward links, I don't think an optimization is
  possible)

- Possibly make this complete behavior user configurable

I don't know how much more time I can invest. I'd very much like
working SVG internal link support to be added to the trunk so later
versions will support it. But there is probably quite some work before
this can be added and I don't have enough experience with Apache-FOP
(say: almost none, I only started using it a few days ago) to go much
further. I can happily hack on the code but I have no idea, whether
the stuff I'm doing is good or perverse ;-)

So, if batik could generate HTML image-maps I could convert my DocBook to
PDF and HTML and have active links everywhere, but that's another
story ...

stefan.