Posted to general@incubator.apache.org by ro...@us.ibm.com on 2011/06/08 18:51:09 UTC

Document automation with ODF (was Re: Request: Can "proposed committers" introduce themselves?)

dsh <da...@googlemail.com> wrote on 06/08/2011 12:15:52 PM:

> 
> Of course we had been using ODFDOM but the issue is how do you get ODF
> transformed accordingly to other formats such as RTF, AFP or PDF and
> make those formats look consistent with what you would get if doing
> the transformation natively during design time in OO or Symphony.
> 
> 

I think your observation is correct.  The ODF Toolkit does not currently
have a good way of generating print or print-equivalent output from an ODF
document.  The Toolkit has no layout or rendering support.

But I wonder if this is something that Apache FOP could help solve?

The styling vocabulary of ODF is loosely borrowed from XSL Formatting
Objects (XSL-FO).  It may be possible to generate XSL-FO from ODF much
more easily than converting from ODF to PDF or PostScript directly.
Once we have the XSL-FO intermediary, the pipeline could continue
with Apache FOP to target formats ranging from PDF to raster images.
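
To make the idea concrete, here is a minimal sketch of how ODF paragraph
styling might be carried over to XSL-FO blocks. It relies on the fact that
ODF stores many formatting properties as attributes in an "fo"-compatible
namespace. The sample fragment, the wrapper element <doc>, and the helper
names collect_styles and paragraphs_to_fo_blocks are assumptions made for
illustration; this is not ODF Toolkit or FOP code, and a real converter
would also have to handle style inheritance, page layout, tables, lists
and so on.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as defined by the ODF and XSL-FO specifications.
ODF_FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
ODF_STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
ODF_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XSL_FO = "http://www.w3.org/1999/XSL/Format"

# Hypothetical, heavily simplified fragment (not a complete content.xml).
SAMPLE = f"""
<doc xmlns:fo="{ODF_FO}" xmlns:style="{ODF_STYLE}" xmlns:text="{ODF_TEXT}">
  <style:style style:name="P1" style:family="paragraph">
    <style:paragraph-properties fo:text-align="center" fo:margin-top="0.5cm"/>
    <style:text-properties fo:font-size="14pt" fo:font-weight="bold"/>
  </style:style>
  <text:p text:style-name="P1">Dear customer</text:p>
  <text:p>Plain paragraph</text:p>
</doc>
""".strip()


def collect_styles(root):
    """Map each style name to the fo:* properties declared inside it."""
    styles = {}
    for style in root.iter(f"{{{ODF_STYLE}}}style"):
        props = {}
        for node in style.iter():
            for key, value in node.attrib.items():
                # Keep only attributes from ODF's fo-compatible namespace.
                if key.startswith(f"{{{ODF_FO}}}"):
                    props[key.split("}", 1)[1]] = value
        styles[style.get(f"{{{ODF_STYLE}}}name")] = props
    return styles


def paragraphs_to_fo_blocks(root):
    """Turn each text:p into an fo:block carrying the same fo:* properties."""
    styles = collect_styles(root)
    blocks = []
    for p in root.iter(f"{{{ODF_TEXT}}}p"):
        block = ET.Element(f"{{{XSL_FO}}}block")
        style_name = p.get(f"{{{ODF_TEXT}}}style-name")
        for name, value in styles.get(style_name, {}).items():
            block.set(name, value)
        block.text = "".join(p.itertext())
        blocks.append(block)
    return blocks


if __name__ == "__main__":
    for block in paragraphs_to_fo_blocks(ET.fromstring(SAMPLE)):
        print(ET.tostring(block, encoding="unicode"))
```

The resulting fo:block elements would still need to be wrapped in an
fo:root with page masters before an XSL-FO processor could lay them out.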

Does that sound plausible?  Someone needs to do the layout and rendering,
but I would hate to see that code written more than once.  The ODF->XSL-FO
conversion would be a great toolkit enhancement.  Has POI done this with
the Microsoft formats?

-Rob

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: Document automation with ODF (was Re: Request: Can "proposed committers" introduce themselves?)

Posted by dsh <da...@googlemail.com>.

Actually, I evaluated XSL-FO rendering engines quite extensively,
including Apache FOP. At that time (2009), Apache FOP still had
performance issues in scenarios where you generate thousands of
business correspondence documents to be sent to clients on a daily
basis. We started out with Apache FOP but in the end decided to go
with a commercial XSL-FO rendering engine. Additionally, at that time
the most recent Apache FOP version did not have open source approval,
which is of course an IBM-internal detail.

We picked a commercial XSL-FO rendering engine for its stability, its
track record (clients were already using this particular engine), and
its feature richness, such as AFP and 2D barcode (e.g. Data Matrix)
support, which is essential if you want to print directly to
commercial printers. IIRC, besides those issues, one issue that
probably applies to any XSL-FO rendering engine, at least to a certain
degree, is that depending on how much time you spend on the XSL
stylesheet, it can be pretty expensive (in terms of man-hours) to
reproduce the layout of the original ODF document in the PDF document
(e.g. when the final document generated at runtime does not look the
same as what the business user defined at design time using either
Symphony or OO). Hence my statement that it would have been nice if
the core Symphony/OO ODF->PDF transformation had been available as a
separate library/module that could run on the server (AIX or z/OS).
That way the business user would have been using the same
transformation engine as the one used on the backend.

These days, if I were in a position to redo the design, I would be
tempted to find out whether the whole transformation process could be
off-loaded to a self-contained appliance such as a DataPower XA35 or
even XI50. The DataPower blade extension unit would even make it
possible to off-load MIPS from the mainframe, which illustrates that
efficient ODF transformation is key in commercial environments where
MIPS are expensive.

But anyway, I guess this is a pretty advanced scenario because it
involves a distributed server infrastructure and a business
application that generates large amounts of either PDF or AFP
documents on a daily basis.

Cheers
Daniel

Re: Document automation with ODF (was Re: Request: Can "proposed committers" introduce themselves?)

Posted by Dave Fisher <da...@comcast.net>.

>> Of course we had been using ODFDOM but the issue is how do you get ODF
>> transformed accordingly to other formats such as RTF, AFP or PDF and
>> make those formats look consistent with what you would get if doing
>> the transformation natively during design time in OO or Symphony.
>> 
>> 
> 
> I think your observation is correct.  The ODF Toolkit does not currently 
> have a good way of generating print or print equivalent output from an ODF 
> document.  The Toolkit had no layout or rendering support.
> 
> But I wonder if this is something that Apache FOP could help solve?
> 
> The styling vocabulary of ODF is loosely borrowed from XSL Formatting 
> Objects (XSL:FO).   It may be possible to generate XSL:FO from ODF much 
> more easily than converting from ODF to PDF or Postscript directly.  But 
> once we have the XSL:FO intermediary, then the pipeline could continue 
> with Apache FOP to target formats ranging from PDF to raster images.
> 
> Does that sound plausible?  Someone needs to do the layout and rendering. 
> But I hate to see that code written more than once.  The ODF->XSL:FO 
> conversion would be a great toolkit enhancement.  Has POI done this with 
> the Microsoft formats?

POI is more about reading, writing and calculating than it is about rendering. Users come to the list with questions about rendering, usually to HTML, and we help. In POI, Excel is much better covered than the other formats. Lately Word has finally been getting some attention.

Yegor and I have experimented outside the POI project with PPT2PS (and PDF) conversion so that we can make use of slides in our PostScript workflow. We have been using some EPS generated by OOo for this. However, likely due to the font embedding issues that Robert referenced earlier, these EPS files have the text rendered as shapes, which looks awful because font anti-aliasing is gone ... big fat lowercase "l" etc. for Arial of all things.

One trouble with the FOP approach is that layout and rendering of tricky features are pushed even farther away from OOo. I don't know the details, but knowing rendering and layout, there must certainly be code to do it in OOo. I would want to follow that: it is what the ODF Toolkit ought to use from the core.  Maybe the trouble with that approach is that the rendering there is too tied in with GUI considerations?

IIRC, FOP, like POI, can suffer from the need to have the whole DOM in memory. If you have ever built a 6000-page PDF ...
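
As a rough illustration of the alternative to holding the whole DOM in memory, the sketch below processes a large XML document one element at a time and discards each element once it has been handled. The <doc>/<page> structure and the function name process_pages_streaming are hypothetical, chosen only for this example; this is not how FOP or POI work internally, just the general streaming pattern.

```python
import io
import xml.etree.ElementTree as ET


def process_pages_streaming(source, page_tag="page"):
    """Visit each <page> as soon as it is fully parsed, then release it."""
    pages = 0
    chars = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == page_tag:
            pages += 1
            chars += len("".join(elem.itertext()))
            # Drop already-processed children so memory use stays bounded
            # instead of growing with the number of pages.
            root.clear()
    return pages, chars


# Hypothetical input: 6000 tiny "pages" standing in for a large document.
SAMPLE = "<doc>" + "".join(
    f"<page>Letter {i}</page>" for i in range(6000)
) + "</doc>"

if __name__ == "__main__":
    pages, chars = process_pages_streaming(io.BytesIO(SAMPLE.encode("utf-8")))
    print(pages, chars)
```

The trade-off is that anything needing a global view of the document (cross-references, total page counts, and the like) has to be handled in a separate pass or tracked explicitly.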

We have thought of using PDFBox...

I think until we figure out where the rendering and layout should come from, the ODF Toolkit should be included as part of the Apache OOo podling. If the community decides it needs separate incubation, that's fine.

Exploring these trade-offs scientifically is what's needed - in the podling.

I need to stop reading these emails and start reading the OOo site and looking at code.

Now back to work...

Best Regards,
Dave
