You are viewing a plain text version of this content. The canonical link for it is here.

Posted to fop-dev@xmlgraphics.apache.org by Simon Pepping <sp...@leverkruid.eu> on 2007/09/30 22:03:02 UTC

Re: Temp_Interleaved_Page_Line_Breaking, failed unit tests

On Sat, Sep 29, 2007 at 08:59:36AM +0200, Andreas L Delmelle wrote:
> As far as I can tell for the moment, the error is virtually always the same 
> ClassCastException, since the related LMs expect a list of KnuthElements, 
> where they currently get a list of ListElements.
> A KnuthElement is always a ListElement, but not vice versa. The newly added 
> ParagraphListElement, for instance, is not a KnuthElement. This seems to be 
> causing the CCE's.

Indeed, that is to be expected. Linebreaking for these paragraphs has
not yet been done. The essence of this branch is that linebreaking is
postponed until the pagebreaker needs to know the lines. For the page
elements---let me call this block mode, after TeX, which calls this
vertical mode---linebreaking during pagebreaking has been
implemented. See PageBreakingAlgorithm.resolveElements(KnuthSequence,
int) and its caller BreakingAlgorithm.findBreakingPoints. For
restricted block mode, named so after TeX's restricted vertical mode,
linebreaking is just not yet done. Finding out how to do that properly
when the pagebreaker requires the results, is the task that
remains. It applies a.o. to footnotes, list labels, table cells.

> Either the code in ListItemLM and others needs to be adapted to deal with 
> ListElements instead of KnuthElements, or the ParagraphListElement needs to 
> subclass KnuthElement instead of ListElement directly.

ParagraphListElements are not supposed be KnuthElements. The error is
genuine. After linebreaking has been applied to ParagraphListElements,
they have been replaced by KnuthBlockBoxes and similar
KnuthElements. Then the CCE will have been removed.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.eu

Re: Temp_Interleaved_Page_Line_Breaking

Posted by Simon Pepping <sp...@leverkruid.eu>.

That is a rather ideal situation. It requires not only interleaving of
page and line breaking, but also of page breaking and collection of
Knuth elements. That requires some communication. The collection of
Knuth elements is deeply recursive, LM.getNextKnuthElements. Each LM
now needs to pass its Knuth elements to the page breaker or the lowest
line breaker in the hierarchy. That can be accommodated by passing
that receiving object as an argument in
LM.getNextKnuthElements. Especially for the LMs in block mode (block
LMs and line LMs) it is important that they communicate their elements
soon, to allow the page breaker to interrupt the process and proceed
with the breaking calculations. In restricted block mode and inline
mode, i.e. mainly InlineLMs, that is not important, because the LineLM
will complete each paragraph before communicating it up.

This change is only meaningful for a best-fit strategy for one or a
few pages. For the total-fit strategy it adds complexity but no memory
efficiency. That is because this strategy cannot take a decision until
the whole page sequence is processed.

Simon

On Tue, Oct 02, 2007 at 11:31:12PM +0200, Andreas L Delmelle wrote:
>
> Just thought of it this way:
> Instead of collecting all the ListElements for the whole page-sequence in 
> one massive recursive iteration as is the case now 
> (getNextKnuthElements()), maybe the algorithm can be 'slightly' altered in 
> such a way that the FlowLM repeatedly checks back with the PageSequenceLM 
> and updates the LayoutContext for the active page.
> Not: collect *all* lines/paragraphs first, and only then *all* pages (may 
> be "total-fit", I'm not sure I would call it that...).
>
> But rather, an incremental total-fit:
>
> while (moreContent) {
>   collect more lines
>   if (accumulated line-height causes an implicit but unavoidable 
> page-break) {
>     run page-breaking algorithm over the accumulated lines
>   }
> }
>
> Obviously, the if-test is only a very rough estimation, but a good one, 
> since it guarantees that the sequence always generates at least one 
> page-break (no space-resolution, footnotes, floats taken into account here 
> yet)
>
> That would provide our 'interference' point, where decisions can be made 
> about whether to continue accumulating layout-possibilities or interrupt, 
> start adding areas based upon the best possibility so far, and resume, but 
> with a cleared state. The head of the list of lines/pages will be chopped 
> off, and their possibilities need no longer be considered.
>
> If I interpret correctly, the node corresponding to the 'best' overall 
> break for the first line/page (the one chosen by a total-fit 
> implementation), in many cases can be determined quite early in the 
> process. You don't always need to look at all words/lines in the 
> paragraph/page-sequence for that.

-- 
Simon Pepping
home page: http://www.leverkruid.eu

Re: Temp_Interleaved_Page_Line_Breaking

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Oct 2, 2007, at 18:47, Andreas L Delmelle wrote:

> On Oct 2, 2007, at 12:27, Chris Bowditch wrote:
>
>>
>> Hmm. I tend to agree with Simon's perspective here. The terms  
>> "total-fit" and "best-fit" refer to the implementation of the  
>> algorithms. Surely the end result of the total-fit algorithm is  
>> the same thing as the algorithm itself? The user expectation of  
>> total-fit may well be what you describe, but I'm not sure how it  
>> could be achieved using the algorithms themselves.
<snip />
> Also, my expression of "multiple best-fits" should actually be  
> "multiple total-fits".
> As in: There will always be some sort of threshold, like a certain  
> amount of content that can be fit onto, say, the first three pages.
>
> Once this threshold is known, running a total-fit algorithm over  
> the elements up to that point, and continuing from the next should  
> lead to *exactly* the same result as a global (uninterrupted) total- 
> fit.

Just thought of it this way:
Instead of collecting all the ListElements for the whole page- 
sequence in one massive recursive iteration as is the case now  
(getNextKnuthElements()), maybe the algorithm can be 'slightly'  
altered in such a way that the FlowLM repeatedly checks back with the  
PageSequenceLM and updates the LayoutContext for the active page.
Not: collect *all* lines/paragraphs first, and only then *all* pages  
(may be "total-fit", I'm not sure I would call it that...).

But rather, an incremental total-fit:

while (moreContent) {
   collect more lines
   if (accumulated line-height causes an implicit but unavoidable  
page-break) {
     run page-breaking algorithm over the accumulated lines
   }
}

Obviously, the if-test is only a very rough estimation, but a good  
one, since it guarantees that the sequence always generates at least  
one page-break (no space-resolution, footnotes, floats taken into  
account here yet)

That would provide our 'interference' point, where decisions can be  
made about whether to continue accumulating layout-possibilities or  
interrupt, start adding areas based upon the best possibility so far,  
and resume, but with a cleared state. The head of the list of lines/ 
pages will be chopped off, and their possibilities need no longer be  
considered.

If I interpret correctly, the node corresponding to the 'best'  
overall break for the first line/page (the one chosen by a total-fit  
implementation), in many cases can be determined quite early in the  
process. You don't always need to look at all words/lines in the  
paragraph/page-sequence for that.

Andreas

Re: Temp_Interleaved_Page_Line_Breaking

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Oct 2, 2007, at 12:27, Chris Bowditch wrote:

> Andreas L Delmelle wrote:
>
> <snip/>
>
> Hi Andreas,
>
>> That is not DISagreeing with me, I think (almost on the contrary).  
>> I  did not mean total-fit in the sense of the implementation of  
>> the  algorithm, but total-fit as to the end-result: as such, a  
>> total-fit  result may precisely require a breaking-up into many  
>> smaller best- fits. A total-fit result may just be much more  
>> difficult to  accomplish with a total-fit algorithm, than by a  
>> succession of best- fit loops, interrupted and resumed at regular  
>> intervals.
>
> Hmm. I tend to agree with Simon's perspective here. The terms  
> "total-fit" and "best-fit" refer to the implementation of the  
> algorithms. Surely the end result of the total-fit algorithm is the  
> same thing as the algorithm itself? The user expectation of total- 
> fit may well be what you describe, but I'm not sure how it could be  
> achieved using the algorithms themselves.

*sigh*
Reminds me how much I hate quibbling over semantics...
In any case: if the page-sequence is portrait-landscape-portrait- 
etc., and FOP breaks all lines as if it is a whole sequence of  
portrait-pages, then I would call that "no-fit" or "first-page-only- 
fit"... :-)

If this can only be resolved by providing the "total-fit" algorithm  
*all* possible pages in advance, then /that/ is what needs to happen.  
I understand Simon perfectly when he states that "total-fit has no  
point where you can interfere"... Well then, give it all the relevant  
info, then send it on its way, no? Using only the IPD of the first  
page is simply Wrong in many use-cases.

Also, my expression of "multiple best-fits" should actually be  
"multiple total-fits".
As in: There will always be some sort of threshold, like a certain  
amount of content that can be fit onto, say, the first three pages.

Once this threshold is known, running a total-fit algorithm over the  
elements up to that point, and continuing from the next should lead  
to *exactly* the same result as a global (uninterrupted) total-fit.

Just a thought

Cheers

Andreas

Re: Temp_Interleaved_Page_Line_Breaking

Posted by Chris Bowditch <bo...@hotmail.com>.

Andreas L Delmelle wrote:

<snip/>

Hi Andreas,

> That is not DISagreeing with me, I think (almost on the contrary). I  
> did not mean total-fit in the sense of the implementation of the  
> algorithm, but total-fit as to the end-result: as such, a total-fit  
> result may precisely require a breaking-up into many smaller best- fits. 
> A total-fit result may just be much more difficult to  accomplish with a 
> total-fit algorithm, than by a succession of best- fit loops, 
> interrupted and resumed at regular intervals.

Hmm. I tend to agree with Simon's perspective here. The terms 
"total-fit" and "best-fit" refer to the implementation of the 
algorithms. Surely the end result of the total-fit algorithm is the same 
thing as the algorithm itself? The user expectation of total-fit may 
well be what you describe, but I'm not sure how it could be achieved 
using the algorithms themselves.

> 
> What I meant was (in reply to Chris' question):
> If the layout-master-set is such that the region-body width changes  
> from one page to the next, I would not expect end-users to  additionally 
> have to fiddle with extension-properties to make FOP  behave as one 
> would generally expect... /That/ is simply native XSL- FO compliance to me.

That would be ideal :-)

However, I fear that would be too difficult to achieve. OTOH, my 
understanding of Layout is not good enough to be certain.

Thanks,

Chris

Re: Temp_Interleaved_Page_Line_Breaking

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Oct 1, 2007, at 21:50, Simon Pepping wrote:

> <snip />
   [Me:]
>> I wouldn't think so. Total-fit for me precisely implies taking  
>> into account
>> the fact that the available IPD may alter from one page to another.
>> What FOP Trunk currently does in that scenario is definitely not a  
>> total-fit, or
>> at least, only in the case the available IPD does not change  
>> between pages.
>> The key to that is the fact that FlowLM.getNextKnuthElements()  
>> gets called
>> only once per fo:flow, and in the base-loop passes on a  
>> LayoutContext to
>> its descendants with the IPD of the first page in the page- 
>> sequence. This
>> IPD (or a portion thereof) is ultimately used by *all* descendant
>> LineLayoutManagers...
>
> I definitely disagree with you. As we have it now, the total-fit
> algorithm needs to have all lines of the page sequence before it can
> calculate the best page breaks. Therefore all paragraphs must be
> broken into lines before page breaking starts. There is no room for
> changing the page widths.
>
> The total-fit algorithm has no point where you can interfere. You need
> to give it all information at the start.

That is not DISagreeing with me, I think (almost on the contrary). I  
did not mean total-fit in the sense of the implementation of the  
algorithm, but total-fit as to the end-result: as such, a total-fit  
result may precisely require a breaking-up into many smaller best- 
fits. A total-fit result may just be much more difficult to  
accomplish with a total-fit algorithm, than by a succession of best- 
fit loops, interrupted and resumed at regular intervals.

What I meant was (in reply to Chris' question):
If the layout-master-set is such that the region-body width changes  
from one page to the next, I would not expect end-users to  
additionally have to fiddle with extension-properties to make FOP  
behave as one would generally expect... /That/ is simply native XSL- 
FO compliance to me.

Cheers

Andreas

Re: Temp_Interleaved_Page_Line_Breaking

Posted by Simon Pepping <sp...@leverkruid.eu>.

On Mon, Oct 01, 2007 at 06:40:41PM +0200, Andreas L Delmelle wrote:
> One thing I was wondering about: doLineBreaking() is parameterless for the 
> moment. Do you see this changing in the future? i.e. If this method can be 
> called multiple times on the same paragraph, give it a position to start 
> from the second time? So something like
>
> doLineBreaking(LayoutContext, Position)...?

Probably. I will see that later. When in a best-fit algorithm the page
breaker has determined the page break, and it falls inside a
paragraph, then I plan to keep only that node in the active list, and
rerun the breaking algorithm from there. That should do the trick.

> If I get it correctly now, the LayoutContext is still handled through the 
> LineLM, while somehow I would see the LineLM as receiving the LayoutContext 
> only at the very latest instance. For example, the 
> paragraph-linefiller-width would no longer be stored in the KnuthParagraph 
> itself, but passed into the doLineBreaking() method. (Long shot :S)

The context is that of the LineLayoutLM, and can be determined early
on, like now. Only page-dependent parameters need to be passed in the
call of doLineBreaking(). Insofar as they are now in the layout
context, that needs to be changed.

> I wouldn't think so. Total-fit for me precisely implies taking into account 
> the fact that the available IPD may alter from one page to another. What 
> FOP Trunk currently does in that scenario is definitely not a total-fit, or 
> at least, only in the case the available IPD does not change between pages.
> The key to that is the fact that FlowLM.getNextKnuthElements() gets called 
> only once per fo:flow, and in the base-loop passes on a LayoutContext to 
> its descendants with the IPD of the first page in the page-sequence. This 
> IPD (or a portion thereof) is ultimately used by *all* descendant 
> LineLayoutManagers...

I definitely disagree with you. As we have it now, the total-fit
algorithm needs to have all lines of the page sequence before it can
calculate the best page breaks. Therefore all paragraphs must be
broken into lines before page breaking starts. There is no room for
changing the page widths.

The total-fit algorithm has no point where you can interfere. You need
to give it all information at the start.

Simon

-- 
Simon Pepping
home page: http://www.leverkruid.eu

Re: Temp_Interleaved_Page_Line_Breaking, failed unit tests

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Sep 30, 2007, at 22:03, Simon Pepping wrote:

> On Sat, Sep 29, 2007 at 08:59:36AM +0200, Andreas L Delmelle wrote:
>
>> Either the code in ListItemLM and others needs to be adapted to  
>> deal with
>> ListElements instead of KnuthElements, or the ParagraphListElement  
>> needs to
>> subclass KnuthElement instead of ListElement directly.
>
> ParagraphListElements are not supposed be KnuthElements. The error is
> genuine. After linebreaking has been applied to ParagraphListElements,
> they have been replaced by KnuthBlockBoxes and similar
> KnuthElements. Then the CCE will have been removed.

OK, I see. Just got to the conclusion myself that it looks quite  
simple though: at some point in the process, doLineBreaking() needs  
to be called to substitute the ParagraphListElements with the  
corresponding list of KnuthElements. I'm just not sure where to  
integrate it, yet.

One thing I was wondering about: doLineBreaking() is parameterless  
for the moment. Do you see this changing in the future? i.e. If this  
method can be called multiple times on the same paragraph, give it a  
position to start from the second time? So something like

doLineBreaking(LayoutContext, Position)...?

If I get it correctly now, the LayoutContext is still handled through  
the LineLM, while somehow I would see the LineLM as receiving the  
LayoutContext only at the very latest instance. For example, the  
paragraph-linefiller-width would no longer be stored in the  
KnuthParagraph itself, but passed into the doLineBreaking() method.  
(Long shot :S)

Concerning an earlier question/remark by Chris:

On Sep 25, 2007, at 10:19, Chris Bowditch wrote:
>> The page breaker can implement both the total fit strategy for a page
>> sequence and the best fit strategy for one page (or a range of
>> pages). The latter strategy allows one to have different widths for
>> different pages.
>
> I am curious to know how this works. So if I want to allow changing  
> IPD in a page-sequence, I specify a property on the page-sequence  
> to indicate best-fit instead of total-fit?

I wouldn't think so. Total-fit for me precisely implies taking into  
account the fact that the available IPD may alter from one page to  
another. What FOP Trunk currently does in that scenario is definitely  
not a total-fit, or at least, only in the case the available IPD does  
not change between pages.
The key to that is the fact that FlowLM.getNextKnuthElements() gets  
called only once per fo:flow, and in the base-loop passes on a  
LayoutContext to its descendants with the IPD of the first page in  
the page-sequence. This IPD (or a portion thereof) is ultimately used  
by *all* descendant LineLayoutManagers...

Cheers

Andreas