Posted to fop-dev@xmlgraphics.apache.org by Bertrand Delacretaz <bd...@codeconsult.ch> on 2001/11/23 13:32:00 UTC

Merging jfor into FOP - what's the plan?

(repost - I think the first one didn't get through)

Now that the introductions are done, I'd like to initiate the discussion 
about how to actually merge jfor into FOP.

Currently I have one major code contribution to integrate into the jfor code 
base. I expect to be done in a week and would like to release a last 
"non-FOP" version of jfor with these changes.

Regarding the merging of jfor, I see three options:

1) inclusion of the jfor.jar in the FOP distribution, "user-level" 
integration where a -rtf switch of FOP causes jfor to run instead of FOP

Makes it possible for users to generate RTF + PDF without needing a separate 
download. No benefits on the developer side. We might get a lot of questions 
like "why is the RTF output so poor compared to PDF".

2) same, but modify jfor to use the existing FOP infrastructure: startup, 
parser, configuration, logging, etc.

3) full integration of jfor as a FOP renderer, taking advantage of the FOP 
analysis of the XSL-FO document.
IMHO this needs to bypass the layout stage to stay quick and translate as 
much of the document structure as possible to RTF.

Considering that I won't have much time in the next few weeks, my suggestion 
would be to go ahead with 1) first, while studying and discussing how best to 
reach 2) and 3).

Any thoughts?

- Bertrand


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org


Re: Merging jfor into FOP - what's the plan?

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Thursday 29 November 2001 12:44, Keiron Liddle wrote:
> So are things like static areas, markers, page numbers etc. possible with
> rtf, or are these types of things simply not possible?

Keiron,

as far as I know, RTF supports all of the following, although jfor does not yet 
support most of them. In parentheses is my understanding of these concepts, 
to make sure we're on the same wavelength:

static areas - yes (headers and footers)
markers - yes (references like "see page N")
page numbers - yes (dynamic auto-numbering)

But things like page numbers must be left to RTF to compute: FOP will need to 
include an *RTF code* that lets the RTF reader compute page numbers, rather than 
computing them itself.
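
A minimal Java sketch of what such an RTF code could look like, assuming the
`\chpgn` control word (the current-page-number placeholder in the RTF spec)
placed inside a `\footer` group. The class and method names are hypothetical
and this is not jfor's code; the body text is assumed to contain no characters
that need RTF escaping.

```java
final class RtfPageNumberSketch {
    /** Builds a minimal RTF document whose footer asks the reader for the page number. */
    static String buildDocument(String bodyText) {
        StringBuilder rtf = new StringBuilder();
        rtf.append("{\\rtf1\\ansi\n");
        // Footer group: \chpgn is a placeholder that the RTF reader replaces
        // with the page number it computes during its own layout.
        rtf.append("{\\footer\\pard\\qc Page \\chpgn\\par}\n");
        // Body paragraph; assumes bodyText needs no escaping of '\', '{' or '}'.
        rtf.append("\\pard ").append(bodyText).append("\\par\n");
        rtf.append("}");
        return rtf.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildDocument("Hello from XSL-FO"));
    }
}
```

The generator never knows how many pages the document has; it only marks where
the reader should put the number.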

- Bertrand



Re: Merging jfor into FOP - what's the plan?

Posted by Keiron Liddle <ke...@aftexsw.com>.
On 2001.11.27 12:40 Bertrand Delacretaz wrote:
> Without knowing too much about FOP internals, I think a processing chain
> along these lines might help:
> 
> parsing if needed
> -> SAX events
> -> FO attributes processing (validation, inheritance)
> -> StructureRenderer
> 
> StructureRenderer is
> EITHER Layout + PrintRenderer
> OR StructureProcessor (RTF, MIF, etc.)
> 
> What we need to find out is how much the existing FOP and these
> "structure renderers" have in common.

This sounds like the sort of approach that we need.
If possible we might be able to have a "layout processor" which normally 
reads the fo objects and creates an area tree. An alternate implementation 
will instead directly create the output document.

The fo object tree does all the handling of attributes.

So are things like static areas, markers, page numbers etc. possible with 
rtf, or are these types of things simply not possible?



Re: Merging jfor into FOP - what's the plan?

Posted by "Peter B. West" <pb...@powerup.com.au>.
Bertrand et al,

It looks as though the principle of disentangling the FO and Area tree 
builds, with communication by a stream of FOEvents, would also be useful 
in this context.

Peter


-- 
Peter B. West  pbwest@powerup.com.au  http://powerup.com.au/~pbwest
"Lord, to whom shall we go?"




RE: Merging jfor into FOP - what's the plan?

Posted by Scott Sanders <ss...@nextance.com>.
The latest RTF Spec (1.7), pertaining to Word 2002 is at:

http://download.microsoft.com/download/Word2002/Install/1.7/W98NT42KMeXP/EN-US/W2KRTFSF.exe

Self Extracting exe with the Word doc inside.

Scott Sanders


-----Original Message-----
From: Bertrand Delacretaz [mailto:bdelacretaz@codeconsult.ch] 
Sent: Tuesday, November 27, 2001 3:40 AM
To: fop-dev@xml.apache.org
Subject: Re: Merging jfor into FOP - what's the plan?

<snip>


Re: Merging jfor into FOP - what's the plan?

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
Hi Arved,

> What are your recommendations for someone to come up to speed with RTF?

I'd recommend staying away from it unless you really have to ;-)
Seriously, to someone accustomed to clear and well-defined specs, RTF is 
somewhat messy: it is really a documented internal format, not a spec 
that has been agreed upon by a carefully selected committee.

The RTF spec that we use in jfor is (mostly) V1.5 from Microsoft, who have since 
moved on to 1.6 (at least), but apparently 1.5 is the most widely supported 
version. A Google search shows it at http://www.dubois.ws/software/RTF; it might 
be harder to find at Microsoft as it's not the latest.

The rtflib package of jfor (available at www.jfor.org) encapsulates our 
knowledge of RTF and is fairly simple and understandable, but it is still too 
element-oriented.
One important thing to realize (we realized it too late) is that RTF is 
more flow-based or stack-based than element-based: not everything that is 
opened has to be closed; it's more like a flow with embedded attribute 
changes.
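
To illustrate the flow-based nature, here is a small Java sketch that writes one
RTF paragraph. It relies on standard RTF behaviour: `\b` and `\i` switch an
attribute on, `\b0` and `\i0` switch it off, and a `{...}` group saves the
formatting state at `{` and restores it at `}`. The class name is hypothetical
and the code is not taken from rtflib.

```java
final class RtfFlowSketch {
    /** Emits one paragraph as a flow of text with embedded attribute changes. */
    static String flowParagraph() {
        StringBuilder rtf = new StringBuilder("{\\rtf1\\ansi ");
        rtf.append("plain ");
        rtf.append("\\b bold ");                  // bold switched on, no element is "opened"
        rtf.append("\\i bold-italic ");           // italic added to the current state
        rtf.append("\\b0 italic ");               // bold switched off, italic still active
        rtf.append("{\\ul underlined-italic }");  // group: state saved at '{', restored at '}'
        rtf.append("italic-again ");              // underline is gone after the group
        rtf.append("\\i0 plain\\par}");           // italic switched off, document group closed
        return rtf.toString();
    }

    public static void main(String[] args) {
        System.out.println(flowParagraph());
    }
}
```

Note that bold is turned off before italic, which an element-oriented writer
with strictly nested open/close pairs could not express directly.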

> As I understand it, RTF is presented
> to a user-agent which does a fair amount of layout; higher-level structures
> are still present in the RTF. 

Right - but there are both structure and presentation codes, so an RTF 
document could be both. 
Jfor has a strong bent towards structure, as the user's goal is usually to get 
an editable RTF document, in which as much of the original document structure 
as possible must be preserved for convenience. 
Precise appearance usually comes second, as applying a new wordprocessor 
style sheet can change a lot of it.

RTF is both a presentation and a structure format, and also a moving target, 
because the "spec" is expanded and rewritten with nearly every new version 
of winword. 
There are many grey areas in the spec, meaning the only possible test is 
opening the generated RTF in the desired wordprocessors (and often watching 
it crash...).

> <snip>
> This is not so different from MIF
Agreed. We are working with MIF for another project, and didn't choose FOP 
for that because of lack of precise control over the MIF output.

I tend to see these formats as:
-PDF for finished high-quality output ("presentation language"), layout 100% 
done by FOP

-MIF for semi-finished high-quality output ("typography language"), layout 
done by Framemaker according to MIF instructions.

-RTF for editable structure + presentation output ("wordprocessing 
language"), layout done by wordprocessor.

So I fully agree that MIF and RTF "renderers" share a lot in common - 
they must be able to get as much information as possible about the original 
document structure, and in my view do not need any layout computations.

> In a sense with RTF and MIF (and HTML for anyone who really desperately
> wants to see FO->HTML) we are talking about translators as opposed to
> formatters and renderers...

yes - that's why I called jfor a "converter" instead of "formatter"

Without knowing too much about FOP internals, I think a processing chain 
along these lines might help:

parsing if needed
-> SAX events
-> FO attributes processing (validation, inheritance) 
-> StructureRenderer

StructureRenderer is
EITHER Layout + PrintRenderer
OR StructureProcessor (RTF, MIF, etc.)

What we need to find out is how much the existing FOP and these "structure 
renderers" have in common.
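
The chain above could be sketched in Java roughly as follows. Only the SAX
classes (`SAXParserFactory`, `DefaultHandler`, `InputSource`) are real APIs;
`StructureRenderer`, `RtfStructureProcessor` and `ChainSketch` are hypothetical
names and not existing FOP or jfor classes. Only the structure-oriented branch
is shown, and the attribute-processing step is reduced to a comment.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sink at the end of the chain. */
interface StructureRenderer {
    void startBlock();
    void text(String chars);
    void endBlock();
}

/** Structure-oriented branch: maps fo:block to an RTF paragraph, no layout involved. */
final class RtfStructureProcessor implements StructureRenderer {
    private final StringBuilder out = new StringBuilder("{\\rtf1\\ansi ");
    public void startBlock() { out.append("\\pard "); }
    public void text(String chars) { out.append(chars); }
    public void endBlock() { out.append("\\par "); }
    String finish() { return out.toString() + "}"; }
}

final class ChainSketch {
    /** Parses the FO source and forwards block-level SAX events to the renderer. */
    static void run(String fo, StructureRenderer renderer) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(fo)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                // Attribute validation and inheritance would be handled here.
                if ("block".equals(local)) renderer.startBlock();
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                // Sketch only: forwards all non-blank text, trimmed.
                String s = new String(ch, start, length).trim();
                if (!s.isEmpty()) renderer.text(s);
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                if ("block".equals(local)) renderer.endBlock();
            }
        });
    }

    public static void main(String[] args) throws Exception {
        String fo = "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
                + "<fo:block>First</fo:block><fo:block>Second</fo:block></fo:root>";
        RtfStructureProcessor rtf = new RtfStructureProcessor();
        run(fo, rtf);
        System.out.println(rtf.finish());
    }
}
```

A "Layout + PrintRenderer" branch would be a second implementation of the same
interface that builds areas instead of writing output directly.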

- Bertrand



Re: Merging jfor into FOP - what's the plan?

Posted by Arved Sandstrom <Ar...@chebucto.ns.ca>.
Hi, Bertrand

What are your recommendations for someone to come up to speed with RTF? I 
(and possibly others) need to understand RTF better in order to assist.

The existing renderers for PDF, Postscript, XML and AWT can all handle raw 
areas...they do no layout whatsoever. As I understand it, RTF is presented 
to a user-agent which does a fair amount of layout; higher-level structures 
are still present in the RTF. This is not so different from MIF, and in 
fact, when the MIFRenderer was originally written, there _were_ some 
problems (as I recall) in working from the area tree directly. For example, 
MIF understands tables - this information needed to be passed along whereas 
other renderers no longer cared about such semantics.

Since the MIFRenderer is somewhat moribund (I think) then jfor really 
becomes the prototype for a different class of formatter/renderers, 
operating in parallel with the existing code for PDF etc. It would be 
interesting to see if we can do things in such a way so as to resurrect MIF 
also, since I think it never ought to have been a renderer in the first place.

In a sense with RTF and MIF (and HTML for anyone who really desperately 
wants to see FO->HTML) we are talking about translators as opposed to 
formatters and renderers...again, correct me if I am wrong, but the output 
of the translator is presented to a user-agent that will actually be doing 
layout.

Regards,
Arved Sandstrom

At 08:43 AM 11/27/01 +0100, Bertrand Delacretaz wrote:
><snip>


Re: Merging jfor into FOP - what's the plan?

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
Hi Keiron,

If there is not going to be a FOP release in the next few weeks, I 
agree that a minimal integration does not make sense.

Currently the jfor conversion is driven directly from SAX events, so the 
first thing that comes to mind is driving it from the FO tree.

You're right that, contrary to print renderers, the RTF one will need to know 
about the structure of the original document.

Does the FO tree handle things like attribute inheritance (e.g. a block 
inherits the font definition from an ancestor block), or is this handled 
while doing the layout? Such inheritance is currently missing in jfor.
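
For illustration, the kind of inheritance meant here can be sketched in Java as
a lookup that walks up the ancestors. `FoNode` is a hypothetical class, not one
from FOP or jfor, and the sketch treats every property as inherited, which is a
simplification (XSL-FO defines some properties as non-inherited).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical FO node holding only the properties specified on it directly. */
final class FoNode {
    private final FoNode parent;
    private final Map<String, String> specified = new HashMap<>();

    FoNode(FoNode parent) { this.parent = parent; }

    FoNode set(String name, String value) {
        specified.put(name, value);
        return this;
    }

    /** Returns the value set on this node, else the nearest ancestor's, else null. */
    String resolve(String name) {
        for (FoNode node = this; node != null; node = node.parent) {
            String value = node.specified.get(name);
            if (value != null) return value;
        }
        return null;
    }

    public static void main(String[] args) {
        FoNode outer = new FoNode(null).set("font-family", "Helvetica").set("font-size", "12pt");
        FoNode inner = new FoNode(outer).set("font-size", "10pt");
        System.out.println(inner.resolve("font-family")); // inherited from outer
        System.out.println(inner.resolve("font-size"));   // overridden on inner
    }
}
```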

To summarize:
-jfor needs to know about the original document structure: this speaks for option 
(A), plugging jfor in right after the FO tree stage, if I understand correctly.

-BUT jfor could probably benefit from some operations done at the layout 
stage (attribute inheritance, others?): this speaks for option (B), extending 
the renderer interface to let the renderers know (cleanly) about the original 
document structure.

If you can give me some pointers about where to look in the code to 
evaluate (A) and (B), I'll have a look.

- Bertrand



Re: Merging jfor into FOP - what's the plan?

Posted by Keiron Liddle <ke...@aftexsw.com>.
Hi Bertrand,

For the short term I think that (1) would be the thing to do but since 
there won't be a release of FOP for a while there may be no point doing 
anything for the short term.

As for how it will eventually end up working with the rest of FOP: 
can you give us a quick rundown of what is involved in creating an RTF 
document from XSL-FO? What sort of information is passed from the FO to 
the RTF, how is layout considered, etc.?

The way that FOP normally converts from fo to the output is by a few 
steps. First the fo is turned into the formatting object tree. This is 
then turned into an area tree. This area tree represents the final layout 
with data that any renderer can handle. The renderer then uses this area 
tree to create the pages.
This means that the renderer knows nothing about the original document and 
does not have a concept of lists, tables etc.
I should also point out that the MIF renderer used references to the 
formatting object tree to identify things like tables, so that it could create 
tables in the output. This sort of thing is being revisited as it causes problems.
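
A rough Java sketch of why a renderer working from the area tree has no concept
of tables: once layout has run, only positions and content are left. All class
names, the fixed column width and the row height are hypothetical, and units are
arbitrary; this is not how FOP's layout is implemented.

```java
import java.util.ArrayList;
import java.util.List;

final class AreaTreeSketch {
    /** Hypothetical formatting object: the cell still knows its place in a table. */
    static final class FoTableCell {
        final int row;
        final int column;
        final String text;
        FoTableCell(int row, int column, String text) {
            this.row = row;
            this.column = column;
            this.text = text;
        }
    }

    /** Hypothetical area: after layout only a position and the content remain. */
    static final class Area {
        final int x;
        final int y;
        final String text;
        Area(int x, int y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Layout step: turns table cells into positioned areas. */
    static List<Area> layout(List<FoTableCell> cells, int columnWidth, int rowHeight) {
        List<Area> areas = new ArrayList<>();
        for (FoTableCell cell : cells) {
            areas.add(new Area(cell.column * columnWidth, cell.row * rowHeight, cell.text));
        }
        return areas;
    }

    /** Renderer step: sees positioned text only, with no notion of rows or tables. */
    static String render(List<Area> areas) {
        StringBuilder page = new StringBuilder();
        for (Area area : areas) {
            page.append(area.text).append(" @ (").append(area.x).append(',')
                .append(area.y).append(")\n");
        }
        return page.toString();
    }

    public static void main(String[] args) {
        List<FoTableCell> cells = new ArrayList<>();
        cells.add(new FoTableCell(0, 0, "Name"));
        cells.add(new FoTableCell(0, 1, "Value"));
        cells.add(new FoTableCell(1, 0, "jfor"));
        cells.add(new FoTableCell(1, 1, "RTF"));
        System.out.print(render(layout(cells, 100, 20)));
    }
}
```

An RTF or MIF writer fed only the `Area` list would have to guess that four
positioned strings once formed a table.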

Regards,
Keiron.
