Posted to fop-dev@xmlgraphics.apache.org by Jan Tosovsky <j....@email.cz> on 2015/05/25 21:16:54 UTC

FOP -> POI

Dear All,

can you imagine any way to convert virtual page objects into an office
document structure? I am actually thinking of a 'Slides' to PPTX (XSLF)
conversion.

There is not an easy way to produce paginated PPTX content using pure XSLT.
But FOP has all the required info somewhere in the memory before serializing
it into PDF, which could be somehow pushed to POI.

Why do I even need this? It is a real requirement to distribute our slides
(XML -> PDF by default) in PPTX as well, so that trainers on site can tailor
the given training to the target audience.

The best solution I've found so far is Acrobat's PDF -> PPTX conversion. But
even this is sometimes buggy.

I think this would be a killer feature not available even in professional
formatters.

Thanks, Jan

Re: FOP -> POI

Posted by Andreas Delmelle <an...@telenet.be>.
Hi Jan, Matthias,

> On 28 May 2015, at 21:29, Jan Tosovsky <j....@email.cz> wrote:
> 
> Hi Matthias,
> 
> On 2015-05-27 Matthias Reischenbacher wrote:
>> 
>> I know pptx a bit, because I had to implement an output channel based
>> mainly on XSLT and a little bit of java (used mainly for zip
>> compression). 
> 
> Just out of curiosity, what was the source format? Could this conversion be
> open-sourced?
> 
>> +1 for adding a pptx output to FOP, but I'd recommend doing it without
>> POI.
> 
> I am afraid it would lead to lots of code duplication. My idea was rather a
> dedicated project dependent on both 'libraries', utilizing the best of both
> worlds.

Matthias does make a good point about the dependency.

In the meantime, having given Jan's replies from yesterday and today some further thought, I was thinking it may be possible to take the route of adding a SAX ContentHandler to the processing chain that translates the events from the AreaTreeHandler or the IFHandler into POI API calls... Something like that?
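
As a rough illustration of that idea, here is a minimal sketch that uses only the JDK's SAX API. The `SlideSink` interface is hypothetical (it is neither FOP nor POI API); a POI-backed implementation would create slides and text boxes in its methods. The namespace URI, the element names (`page`, `text`) and the attributes (`width`, `height`, `x`, `y`) are assumptions about what the IF XML looks like and should be checked against real IF output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical target interface: a POI-backed implementation would create
// slides and text boxes here. It is not part of FOP or POI.
interface SlideSink {
    void startSlide(int widthMpt, int heightMpt);
    void addText(int xMpt, int yMpt, String text);
    void endSlide();
}

// Translates SAX events of an IF-like stream into SlideSink calls.
class IfToSlideHandler extends DefaultHandler {
    // Assumed namespace of FOP's intermediate format.
    static final String IF_NS = "http://xmlgraphics.apache.org/fop/intermediate";

    private final SlideSink sink;
    private StringBuilder text; // non-null only while inside a <text> element
    private int textX;
    private int textY;

    IfToSlideHandler(SlideSink sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!IF_NS.equals(uri)) {
            return; // ignore foreign content such as embedded SVG
        }
        if ("page".equals(localName)) {
            sink.startSlide(intAttr(atts, "width"), intAttr(atts, "height"));
        } else if ("text".equals(localName)) {
            textX = intAttr(atts, "x");
            textY = intAttr(atts, "y");
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!IF_NS.equals(uri)) {
            return;
        }
        if ("text".equals(localName)) {
            sink.addText(textX, textY, text.toString());
            text = null;
        } else if ("page".equals(localName)) {
            sink.endSlide();
        }
    }

    // Reads an unprefixed attribute as an integer.
    private static int intAttr(Attributes atts, String name) {
        return Integer.parseInt(atts.getValue("", name));
    }

    // Convenience entry point: parses an XML string and feeds the sink.
    static void parse(String xml, SlideSink sink) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so uri/localName are populated
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(xml)), new IfToSlideHandler(sink));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not process IF stream", e);
        }
    }
}

// Stand-in sink that only records the calls it receives.
class RecordingSink implements SlideSink {
    final List<String> calls = new ArrayList<>();

    public void startSlide(int widthMpt, int heightMpt) {
        calls.add("slide " + widthMpt + "x" + heightMpt);
    }

    public void addText(int xMpt, int yMpt, String text) {
        calls.add("text@" + xMpt + "," + yMpt + ": " + text);
    }

    public void endSlide() {
        calls.add("end");
    }
}
```

Keeping the POI-specific part behind an interface like this would fit the plugin idea: the handler itself needs nothing beyond the JDK, and only the sink implementation would pull in POI at runtime.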

I have to agree with the position that adding a hard dependency on POI would be too much, but maybe it could have merit as a plugin or optional runtime dependency? Sort of like how one currently needs the PDFBox plugin to include PDFs via fox:external-document.

If one wanted to use it, it would be technically possible and not too hard to achieve, but it would not come out of the box, i.e. POI would not be required to compile and build FOP.

Ideas...? Plenty of those over here.


KR

Andreas

RE: FOP -> POI

Posted by Jan Tosovsky <j....@email.cz>.
Hi Matthias,

On 2015-05-27 Matthias Reischenbacher wrote:
> 
> I know pptx a bit, because I had to implement an output channel based
> mainly on XSLT and a little bit of java (used mainly for zip
> compression). 

Just out of curiosity, what was the source format? Could this conversion be
open-sourced?

> +1 for adding a pptx output to FOP, but I'd recommend doing it without
> POI.

I am afraid it would lead to lots of code duplication. My idea was rather a
dedicated project dependent on both 'libraries', utilizing the best of both
worlds.

Jan


Re: FOP -> POI

Posted by Matthias Reischenbacher <ma...@gmx.at>.
Hi Jan,

I know pptx a bit, because I had to implement an output channel based 
mainly on XSLT and a little bit of java (used mainly for zip 
compression). Pptx is hard to understand because of all the cross
references between the different files, but that wouldn't justify adding
another dependency to FOP. So +1 for adding a pptx output to FOP, but
I'd recommend doing it without POI.

BR,
Matthias

On 27.05.2015 17:26, Jan Tosovsky wrote:
> Hi Andreas,
>
> On 2015-05-27 Andreas Delmelle wrote:
>> On 2015-05-27 Jan Tosovsky wrote:
>>> On 2015-05-25 Andreas Delmelle wrote:
>>>> <snip />
>>>> it seems like it may just be possible to achieve something like
>>>> this by means of FOP's Intermediate Formats[*], which can already
>>>> be utilised to split up the basic formatting and rendering
>>>> processes.
>>> This approach could theoretically eliminate POI completely as most
>>> of IF -> PPTX could be done via XSLT ;-)
>>> But it is too low level for me.
>> Can you clarify? Not sure I am completely following here... Is POI
>> *not* low level, then?
>>
> I meant the PPTX side, mainly its various 'dictionaries'. When, e.g., a new
> slide is added, its ID is registered in the main file. The slide is derived
> from a default template, which must be linked via its ID. Slides can have
> annotations; there is a special template for those as well, and it has to be
> linked to the slide too, and so on...
>
> POI handles this cross-referencing out of the box. You just add a new slide
> and all references are updated automatically. Yes, it is doable in pure
> XSLT, but with additional effort.
>
> Another thing is building the final PPTX file.
>   
>> What you rather vaguely describe as "the required info somewhere in the
>> memory before serializing it into PDF" for FOP basically *is* the Area
>> Tree. The AT and IF XML formats are just XML representations of said
>> info, so seems like you would not get around it either way...?
> I simply didn't know that.
>
> Thanks for the explanation,
>
> Jan
>


RE: FOP -> POI

Posted by Jan Tosovsky <j....@email.cz>.
Hi Andreas,

On 2015-05-27 Andreas Delmelle wrote:
> On 2015-05-27 Jan Tosovsky wrote:
> > On 2015-05-25 Andreas Delmelle wrote:
> > > <snip />
> > > it seems like it may just be possible to achieve something like
> > > this by means of FOP's Intermediate Formats[*], which can already
> > > be utilised to split up the basic formatting and rendering
> > > processes.
> >
> > This approach could theoretically eliminate POI completely as most
> > of IF -> PPTX could be done via XSLT ;-)
> > But it is too low level for me.
> 
> Can you clarify? Not sure I am completely following here... Is POI
> *not* low level, then?
>

I meant the PPTX side, mainly its various 'dictionaries'. When, e.g., a new
slide is added, its ID is registered in the main file. The slide is derived
from a default template, which must be linked via its ID. Slides can have
annotations; there is a special template for those as well, and it has to be
linked to the slide too, and so on...

POI handles this cross-referencing out of the box. You just add a new slide
and all references are updated automatically. Yes, it is doable in pure
XSLT, but with additional effort.

Another thing is building the final PPTX file.
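
To make that bookkeeping concrete, here is a minimal sketch of what a hand-rolled (non-POI) approach would have to track for each new slide. The class and method names are hypothetical, and the relationship ids are an arbitrary choice. The part names, relationship types and content type follow the usual PresentationML package layout as far as I know, but should be verified against a real PPTX. With POI, a single `XMLSlideShow.createSlide()` call is meant to take care of all of this.

```java
import java.util.ArrayList;
import java.util.List;

// Tracks the cross-references one new slide needs in a hand-built PPTX package.
// Hypothetical helper for illustration; not part of POI or FOP.
class PptxIndex {
    private static final String REL_TYPE_BASE =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/";
    private static final String SLIDE_CONTENT_TYPE =
            "application/vnd.openxmlformats-officedocument.presentationml.slide+xml";
    private static final int FIRST_SLIDE_ID = 256; // slide ids start at 256

    final List<String> slideIdEntries = new ArrayList<>();       // for ppt/presentation.xml
    final List<String> presentationRels = new ArrayList<>();     // for ppt/_rels/presentation.xml.rels
    final List<String> contentTypeOverrides = new ArrayList<>(); // for [Content_Types].xml
    private int slideCount;

    // Registers a slide in all three places and returns its part name.
    String addSlide() {
        slideCount++;
        String fileName = "slide" + slideCount + ".xml";
        String partName = "ppt/slides/" + fileName;
        String relId = "rIdSlide" + slideCount; // any id unique within the .rels file

        slideIdEntries.add("<p:sldId id=\"" + (FIRST_SLIDE_ID + slideCount - 1)
                + "\" r:id=\"" + relId + "\"/>");
        presentationRels.add(relationship(relId, "slide", "slides/" + fileName));
        contentTypeOverrides.add("<Override PartName=\"/" + partName
                + "\" ContentType=\"" + SLIDE_CONTENT_TYPE + "\"/>");
        return partName;
    }

    // Entry for ppt/slides/_rels/slideN.xml.rels: links a slide to its layout.
    static String slideRels(String layoutFile) {
        return relationship("rId1", "slideLayout", "../slideLayouts/" + layoutFile);
    }

    private static String relationship(String id, String type, String target) {
        return "<Relationship Id=\"" + id + "\" Type=\"" + REL_TYPE_BASE + type
                + "\" Target=\"" + target + "\"/>";
    }
}
```

Each `addSlide()` call touches three different files of the package, and a fourth (`slideN.xml.rels`) is needed per slide. Notes pages and their master would add further entries of the same kind.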
 
> What you rather vaguely describe as "the required info somewhere in the
> memory before serializing it into PDF" for FOP basically *is* the Area
> Tree. The AT and IF XML formats are just XML representations of said
> info, so seems like you would not get around it either way...?

I simply didn't know that.

Thanks for the explanation,

Jan


Re: FOP -> POI

Posted by Andreas Delmelle <an...@telenet.be>.
Hi Jan

> On 27 May 2015, at 21:22, Jan Tosovsky <j....@email.cz> wrote:
> 
> On 2015-05-25 Andreas Delmelle wrote:
>> <snip />
>> it seems like it may just be possible to achieve something like 
>> this by means of FOP's Intermediate Formats[*], which can already
>> be utilised to split up the basic formatting and rendering processes.
> 
> This approach could theoretically eliminate POI completely as most of IF ->
> PPTX could be done via XSLT ;-)

Right... I got to thinking that as well. 
Of course, that happened only after I had already sent this. :)

> But it is too low level for me.

Can you clarify? Not sure I am completely following here... Is POI *not* low level, then? 
I mean: it is basically an API to read/write MS Office document formats, so it would require some additional code as well, albeit in Java instead of XSLT?

What you rather vaguely describe as "the required info somewhere in the memory before serializing it into PDF" for FOP basically *is* the Area Tree. The AT and IF XML formats are just XML representations of said info, so seems like you would not get around it either way...?

That said, I feel like I may be missing a crucial piece of info here.

> Anyway, I'll also investigate the POI end.

OK, cool. If you do see a way that the fop-dev team can be of assistance, feel free to report back here.


KR

Andreas

RE: FOP -> POI

Posted by Jan Tosovsky <j....@email.cz>.
On 2015-05-25 Andreas Delmelle wrote:
> > On 2015-05-25 Jan Tosovsky wrote:
> >
> > can you imagine any way to convert virtual page objects into an
> > office document structure? I am actually thinking of a 'Slides' to
> > PPTX (XSLF) conversion.
> > 
> > There is not an easy way to produce paginated PPTX content using 
> > pure XSLT. But FOP has all the required info somewhere in the 
> > memory before serializing it into PDF, which could be somehow 
> > pushed to POI.
>
> it seems like it may just be possible to achieve something like 
> this by means of FOP's Intermediate Formats[*], which can already
> be utilised to split up the basic formatting and rendering processes.

This approach could theoretically eliminate POI completely as most of IF ->
PPTX could be done via XSLT ;-) But it is too low level for me.

Anyway, I'll also investigate the POI end.

Jan

Re: FOP -> POI

Posted by Andreas Delmelle <an...@telenet.be>.
> On 25 May 2015, at 21:16, Jan Tosovsky <j....@email.cz> wrote:
> 

Hi Jan

> can you imagine any way to convert virtual page objects into an office
> document structure? I am actually thinking of a 'Slides' to PPTX (XSLF)
> conversion.

Very interesting question...
Somewhat related, as I recall, a suggestion/feature request has been raised in the past to add OpenOffice's document format as a potential new output format to FOP.

> There is not an easy way to produce paginated PPTX content using pure XSLT.
> But FOP has all the required info somewhere in the memory before serializing
> it into PDF, which could be somehow pushed to POI.

I must admit that I am unfamiliar with the most recent Apache POI API. Last time I looked at POI must have been almost 10 years ago.

That said, it seems like it may just be possible to achieve something like this by means of FOP's Intermediate Formats[*], which can already be utilised to split up the basic formatting and rendering processes.

[*] see: http://xmlgraphics.apache.org/fop/trunk/intermediate.html

While it is still an XML format, the benefit would be that it is already paginated, which may make it easier to generate PPTX slide-decks from. 
Basically, you would use FOP to create an IF file (or stream) from XSL-FO input, as a basis for PDF rendering on the one hand, and then somehow feed that same intermediate file to POI for creation of the PPTX. Basic formatting and pagination would be done once, through FOP's layout engine.
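
One small step in such a pipeline is mapping page geometry. The sketch below assumes that IF expresses lengths in millipoints (1/1000 pt), which should be confirmed against actual IF output, and converts them either to points or to EMU, the unit used inside PPTX XML (12700 EMU per point). As far as I recall, POI's XSLF usermodel takes points, so the EMU variant would mainly matter for an XSLT-only route. The class name `SlideUnits` is hypothetical.

```java
// Unit helpers for mapping IF geometry onto slide geometry.
// Assumes IF lengths are millipoints (mpt); hypothetical helper class.
final class SlideUnits {
    private SlideUnits() {
    }

    // 1 pt = 1000 mpt.
    static double mptToPoints(long mpt) {
        return mpt / 1000.0;
    }

    // 1 pt = 12700 EMU, hence 1 mpt = 12.7 EMU.
    // Integer arithmetic, rounded to the nearest EMU (halves round up).
    static long mptToEmu(long mpt) {
        return Math.floorDiv(mpt * 127 + 5, 10);
    }
}
```

For example, a 720000 x 540000 mpt page (720 pt x 540 pt, i.e. 10 in x 7.5 in) comes out as 9144000 x 6858000 EMU.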

Not sure what POI can handle as input, though, or how difficult it would be to make it handle FOP's IF...


Not sure if that goes in the direction of what you were looking for, but hope this helps!



Andreas