Posted to fop-users@xmlgraphics.apache.org by Georg Datterl <ge...@geneon.de> on 2009/03/04 14:32:56 UTC

Index and Pagenumbers

Hi everybody, it's me again.

Today I want to build a buzzword index, telling me on which pages a word appears. 
In my index, I write a list of page-number-citation with the appropriate ref-ids. 
This works fine so far, but if the word appears twice on the same page, obviously 
the page number appears twice and looks strange. 

On http://www.sagehill.net/docbookxsl/InstallingAnFO.html I found:

Index cleanup. The XSL-FO 1.0 standard has no way of specifying how page numbers in a book's index should be cleaned up. The cleanup process entails removing duplicate page numbers on an entry, and converting a sequence of consecutive numbers to a page range. This produces a more usable index. In XEP, the extension element is rx:page-index. In Antenna House, the extension is an attribute named axf:suppress-duplicate-page-number.

Is there something similar for fop?

Regards,
 
Georg Datterl
 
------ Contact ------
 
Georg Datterl
 
Geneon media solutions gmbh
Gutenstetter Straße 8a
90449 Nürnberg
 
HRB Nürnberg: 17193
Managing director: Yong-Harry Steiert 

Tel.: 0911/36 78 88 - 26
Fax: 0911/36 78 88 - 20
 
www.geneon.de
 
Other members of the Willmy MediaGroup:
 
IRS Integrated Realization Services GmbH:    www.irs-nbg.de 
Willmy PrintMedia GmbH:                            www.willmy.de
Willmy Consult & Content GmbH:                 www.willmycc.de 

---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org


Re: AW: Index and Pagenumbers

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
I haven't done any research in that direction, so I can't really tell
you how difficult this all is. If someone wants to work in this
direction, I'd personally prefer that the XSL 1.1 features at least get
some consideration over a proprietary extension (which can really only
serve as a temporary solution). At the very least, I would suggest you
take a short look at what XSL 1.1 provides.

At the moment, the resolution of page-number-citations can happen in two
places:
1. In PageNumberCitationLayoutManager where the page number is directly
entered as a TextArea if the target page is already known.
2. In area.inline.UnresolvedPageNumber if the target page is only known
at a later stage. In this case the page number is squeezed into some
pre-reserved space. This avoids a two-pass layout approach that other
implementations use but it has its limitations.
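
The two cases above can be pictured with a small model. This is only a
sketch in Python, not FOP's code: CitationResolver, cite and register are
hypothetical names, and the one-element "slot" merely stands in for the
pre-reserved space that gets filled in later.

```python
class CitationResolver:
    """Toy model of one-pass page-number-citation resolution (not FOP code)."""

    def __init__(self):
        self.known_pages = {}   # target id -> page number, filled as pages are laid out
        self.pending = {}       # target id -> placeholder slots still waiting for a number

    def cite(self, ref_id):
        """Return a one-element slot holding the page number text, or None if unresolved."""
        if ref_id in self.known_pages:
            # Case 1: the target page is already known, enter the number directly.
            return [str(self.known_pages[ref_id])]
        # Case 2: the target is not laid out yet, keep a placeholder to patch later.
        slot = [None]
        self.pending.setdefault(ref_id, []).append(slot)
        return slot

    def register(self, ref_id, page):
        """Called once the target's page is known; patches waiting slots in place."""
        self.known_pages[ref_id] = page
        for slot in self.pending.pop(ref_id, []):
            slot[0] = str(page)


resolver = CitationResolver()
forward = resolver.cite("block-2")     # target not laid out yet: placeholder
resolver.register("block-2", 2)        # later, the target lands on page 2
backward = resolver.cite("block-2")    # target known: resolved immediately
print(forward, backward)
```

Note that in the model, as in the description above, patching a slot never
re-runs any layout: the surrounding line was already built around the
placeholder.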

The problem for you: In case 2, the layout engine is no longer operating
and you cannot adjust properly for larger amounts of text (multiple
p-n-cs that are suddenly gone). However, since word indices are normally
at the end of a document, this might be something you can avoid.

Anyway, you may want to look at
layoutmgr.inline.PageNumberCitationLayoutManager. But I don't think it's
going to be easy to find a good way to merge multiple
page-number-citations. At any rate, it cannot be done in the
aforementioned layout manager alone. I suspect that a clean
implementation cannot avoid a multi-pass approach to layout, which FOP
currently can't do. But again, I haven't done any research in this
direction so I can only speculate. Maybe someone else has a better idea
of what the best approach would be.

On 04.03.2009 14:56:09 Georg Datterl wrote:
> Hi Jeremias,
> 
> Yes, postprocessing. I'm wondering, whether implementing the extension
> will be legally safe and easy enough to make it worth the performance
> gain. 
> I'd guess it would be an attribute of a block, looking at all children
> which are page-number-citations and throw away those with duplicate
> page numbers. Is there a point where I already have the actual page number
> but can still distinguish page-number-citations from ordinary blocks?

Jeremias Maerki




Re: AW: AW: Index and Pagenumbers

Posted by Andreas Delmelle <an...@telenet.be>.
On 05 Mar 2009, at 11:30, Georg Datterl wrote:

Hi Georg

> In my case, I don't need the index before the pages are created (and  
> as far as I understood, other implementations of this feature only  
> relayout the index pages, leaving empty pages, if too many lines are  
> cleaned). And I don't mind implementing a FOP1.1 feature, but I do  
> feel a bit like Frodo in Rivendell when reading the specification.

:-) Who doesn't? Even the fop-devs who have been here from the start  
still get that experience every once in a while, although you do get  
used to the wording and the vagueness over time.

> For example, according to the specification, what should happen if
> two referenced blocks following each other are in the same
> page-sequence as the index and after the index and at the beginning of
> a new page? When merging, one block would move to the previous page,
> therefore the page number would change, therefore no merging, so the
> block moves back one page, then the numbers could be merged, but the
> block would move again. I think I'm getting a headache.

Well, actually, the problem exists for regular page-number-citations
as well, although the chances of that happening are obviously much
smaller. A page-number-citation that turns out to be longer than the
space reserved for it (the width of 3 'M' glyphs is what FOP
currently uses, IIRC) could, strictly speaking, change the line-layout
and so, the page-layout for that page and all following pages.
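
To make the "longer than the space reserved" case concrete, here is a
minimal Python sketch. The glyph widths are assumed, illustrative values in
1/1000 em, the 12pt font size is an assumption too, and the three-'M'
placeholder is only what is recalled above; none of this is read from FOP.

```python
# Assumed glyph advance widths in 1/1000 em (illustrative values only).
GLYPH_WIDTHS = {"M": 833, **{str(d): 556 for d in range(10)}}


def text_width(text, font_size_pt):
    """Width of `text` in points under the assumed metrics."""
    return sum(GLYPH_WIDTHS[ch] for ch in text) * font_size_pt / 1000.0


def fits_reserved_space(page_number, font_size_pt=12.0, reserved="MMM"):
    """True if the resolved number is no wider than the reserved placeholder."""
    return text_width(str(page_number), font_size_pt) <= text_width(reserved, font_size_pt)


for n in (7, 123, 1234, 12345):
    print(n, fits_reserved_space(n))
```

Under these assumed metrics, only very long page numbers overflow the
placeholder, which matches the remark that the chances are small.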


Regards

Andreas



Re: AW: AW: AW: AW: Index and Pagenumbers

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
As I wrote before, the layout engine is no longer running at that
stage, i.e. there is no layout manager. The adjustment code lives in the
area tree and is called upon resolution of previously unresolved page
citations.

On 06.03.2009 11:42:01 Georg Datterl wrote:
> Hi Chris,
> 
> You are right. The line breaks are inserted before. But the area
> containing the page number fits the actual number, so there's a resize
> after everything is laid out. I'd assume a kind of LineLayoutManager
> is called again after the pages are generated. Maybe calling a
> BlockLayoutManager too would remove the space, and calling the
> PageLayoutManager would result in an endless loop.

Jeremias Maerki




Re: AW: AW: AW: Index and Pagenumbers

Posted by Chris Bowditch <bo...@hotmail.com>.
Georg Datterl wrote:

Hi Georg,

> Hi Andreas, 
> 
> Looks (surprisingly) good to me. I tried the same with a background color for page-number-citation, and the space between the text comes from the alignment, not from the page-number-citation block. Using page numbers > 100 still works, so the block at least is laid out after all page numbers are available, not with a reserved 3M space.

I'm not sure I agree with you. If the space came purely from the
justification, then why doesn't FOP put more of the citations on each
line? In fact, I think 44 citations with a single digit each should fit
on one line.

Regards,

Chris



AW: AW: AW: Index and Pagenumbers

Posted by Georg Datterl <ge...@geneon.de>.
Hi Andreas, 

Looks (surprisingly) good to me. I tried the same with a background color for page-number-citation, and the space between the text comes from the alignment, not from the page-number-citation block. Using page numbers > 100 still works, so the block at least is laid out after all page numbers are available, not with a reserved 3M space.

Regards,
 
Georg Datterl
 


Re: AW: AW: Index and Pagenumbers

Posted by Andreas Delmelle <an...@telenet.be>.
On 05 Mar 2009, at 20:31, Andreas Delmelle wrote:

>> <snip />
>> I think, if one would take the time to artificially generate a  
>> first page-sequence with pages containing a lot of citations  
>> pointing towards the end of the document, you would already see  
>> side-effects to some extent. The actual page-numbers cannot be  
>> resolved before the line-breaks are computed, so... and here I'm  
>> not entirely certain. I have not yet run such test extensively  
>> myself.

FWIW, just ran the quickest test I could think of... see attachment  
PDF for the result of:

<root xmlns="http://www.w3.org/1999/XSL/Format" >
  <layout-master-set>
   <simple-page-master page-height="11in"
        page-width="8.5in"
        margin-left="1in"
        margin-right="1in"
        margin-top="2in"
        margin-bottom="2in"
        master-name="foo">
    <region-body/>
   </simple-page-master>
  </layout-master-set>
  <page-sequence master-reference="foo">
   <flow flow-name="xsl-region-body">
     <block text-align="justify" id="block-1">
       <page-number-citation ref-id="block-2"/>,
       <!-- repeat x 44 -->
       <page-number-citation ref-id="block-2"/>
     </block>
   </flow>
  </page-sequence>
  <page-sequence master-reference="foo">
   <flow flow-name="xsl-region-body">
     <block id="block-2">This is the block we point to</block>
   </flow>
  </page-sequence>
</root>


Re: AW: AW: Index and Pagenumbers

Posted by Andreas Delmelle <an...@telenet.be>.
> For example, according to the specification, what should happen if
> two referenced blocks following each other are in the same
> page-sequence as the index and after the index and at the beginning of
> a new page? When merging, one block would move to the previous page,
> therefore the page number would change, therefore no merging, so the
> block moves back one page, then the numbers could be merged, but the
> block would move again. I think I'm getting a headache.

Apart from my earlier remarks, note that I personally do not yet  
consider merging to be a showstopper for a basic implementation. It  
would already be pretty cool if we could get to the point where the  
index just generates the sequence of page-numbers. Merging indeed  
magnifies some of the issues, which are quite unavoidable with FOP's  
current design.
Without merging, if we model the index on what PageNumberCitationLM
does, the reserved space will, in most cases, be larger than what is
actually needed. I think, if one would take the time to artificially
generate a first page-sequence with pages containing a lot of
citations pointing towards the end of the document, you would already
see side-effects to some extent. The actual page-numbers cannot be
resolved before the line-breaks are computed, so... and here I'm not
entirely certain. I have not yet run such a test extensively myself.
I think that we would end up with lines that are too short. Very
visible to the naked eye in cases of justified alignment...

At any rate, the more citations in sequence, the more this issue is  
magnified. We could end up with a block of citations, with a layout of  
3 lines, but the actual content could have been put on a bit more than  
one line.
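
A rough back-of-the-envelope version of this effect, as a Python sketch.
Every number in it is an assumption made for illustration: a 6.5in line,
44 single-digit citations separated by a comma and a space, a 12pt font
with made-up glyph widths, and a reserved width of three 'M' glyphs per
citation. Justification and real line-breaking rules are ignored.

```python
import math


def lines_needed(n_citations, number_width_pt, separator_width_pt, line_width_pt):
    """Rough line count for a run of citations, ignoring justification and break rules."""
    total = n_citations * (number_width_pt + separator_width_pt)
    return math.ceil(total / line_width_pt)


LINE_WIDTH = 6.5 * 72         # assumed: 8.5in page minus 1in margin on each side, in points
RESERVED = 3 * 0.833 * 12     # assumed: three 'M' glyphs at 12pt
ACTUAL = 0.556 * 12           # assumed: one digit at 12pt
SEPARATOR = 2 * 0.278 * 12    # assumed: comma plus space at 12pt

print("lines when breaking on reserved widths:",
      lines_needed(44, RESERVED, SEPARATOR, LINE_WIDTH))
print("lines the resolved content would need:",
      lines_needed(44, ACTUAL, SEPARATOR, LINE_WIDTH))
```

The exact counts depend entirely on the assumed metrics; the point is only
that line breaks computed from placeholder widths can yield noticeably more
lines than the resolved numbers would need.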

The biggest obstacle remains: FOP does not revisit line-breaks. The  
correction is done in the area tree, where the area corresponding to  
the page-number is always a descendant of a line-area. I don't think  
the effect you describe above could occur with the current design. The  
result would rather be a set of line-areas with too little content.  
The blocks would remain on the same pages, but they could turn out to  
look rather ugly...


HTH!

Andreas



AW: AW: Index and Pagenumbers

Posted by Georg Datterl <ge...@geneon.de>.
Hi Andreas, 

In my case, I don't need the index before the pages are created (and as far as I understood, other implementations of this feature only relayout the index pages, leaving empty pages, if too many lines are cleaned). And I don't mind implementing a FOP1.1 feature, but I do feel a bit like Frodo in Rivendell when reading the specification. 

For example, according to the specification, what should happen if two referenced blocks following each other are in the same page-sequence as the index and after the index and at the beginning of a new page? When merging, one block would move to the previous page, therefore the page number would change, therefore no merging, so the block moves back one page, then the numbers could be merged, but the block would move again. I think I'm getting a headache.

Regards,
 
Georg Datterl
 

Re: AW: Index and Pagenumbers

Posted by Andreas Delmelle <an...@telenet.be>.
On 04 Mar 2009, at 19:39, Andreas Delmelle wrote:

FWIW:

> On 04 Mar 2009, at 14:56, Georg Datterl wrote:
> <snip />
> I once started gathering some thoughts on the topic, but didn't get  
> very far...
> See: http://wiki.apache.org/xmlgraphics-fop/FormattingObjectsForIndexing

Re-reading this and the original question:
If we had fo:index-page-citation-list, fo:index-key-reference
and the index-key/ref-index-key property pair, the solution would be
fairly simple.
Adapt the stylesheet to append the node's id to the index-key
attribute, so that we get an index-key that maps one-to-one to the id. A
basic index-page-citation-list is all that is needed to produce
sequences without duplicates.
Merging sequential page-numbers is already a nice-to-have. By itself,  
once the basic mechanism proves to be working, this should become easy.
The priority after basic implementation (2 objects + 2 properties)  
should probably go to implementing different separator-sequences.  
(NTS: may need to adapt that order on the Wiki). I consider it to be  
on the same level as index-ranges in terms of complexity, but  
different separators suddenly look more useful...
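
Independent of where it would live in FOP, the cleanup itself (drop
duplicates, then optionally merge sequential numbers, with configurable
separators) is small. A minimal Python sketch follows; clean_page_list and
its parameters are hypothetical names, page numbers are assumed to be plain
integers, and treating only runs of three or more pages as a range is a
choice made for this sketch.

```python
def clean_page_list(pages, merge_sequential=True, list_sep=", ", range_sep="-"):
    """Drop duplicate page numbers and optionally fold consecutive runs into ranges."""
    unique = sorted(set(pages))
    if not merge_sequential:
        return list_sep.join(str(p) for p in unique)
    parts = []
    i = 0
    while i < len(unique):
        j = i
        # Extend the run while the next page is exactly one higher.
        while j + 1 < len(unique) and unique[j + 1] == unique[j] + 1:
            j += 1
        if j - i >= 2:
            # Run of three or more pages: emit a range.
            parts.append(f"{unique[i]}{range_sep}{unique[j]}")
        else:
            # Single page or a pair: list the pages individually.
            parts.extend(str(p) for p in unique[i:j + 1])
        i = j + 1
    return list_sep.join(parts)


print(clean_page_list([3, 3, 4, 5, 9, 12, 12, 13]))
print(clean_page_list([3, 3, 4, 5, 9], merge_sequential=False))
```

This only covers the bookkeeping. It says nothing about the hard part
discussed in this thread, namely what happens to the layout when the
resolved list turns out shorter or longer than the space it was given.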

> By itself, the implementation of the 'index-key' property is not  
> that difficult. It is its treatment further on during layout/ 
> rendering that will be a challenge.

As Jeremias hinted, especially layout/rendering.
We will get a LayoutManager whose content can be partly or entirely  
unresolved (like the PageNumberCitationLM), and can grow significantly  
in case of large documents. Sometimes reaching one or more lines. This  
could mean that the layout for all the following pages may need to be  
revisited, which is something FOP is currently not equipped to do...

It would be easy enough to have such a LayoutManager merge a sequence  
of citations into one, keeping track of and eliminating duplicates,  
but the possible side-effects of a mix of resolved and unresolved  
elements, without somehow redoing the layout, cannot be so easily  
avoided in the current design.


Regards

Andreas



Re: AW: Index and Pagenumbers

Posted by Andreas Delmelle <an...@telenet.be>.
On 04 Mar 2009, at 14:56, Georg Datterl wrote:

Hi Georg

> Yes, postprocessing. I'm wondering, whether implementing the  
> extension will be legally safe and easy enough to make it worth the  
> performance gain.

To confirm Jeremias' preference: whether it is safe and/or easy is  
only secondary.
We know that:
* XSL-FO 1.1 has indexing features. Implementing a proprietary
extension seems to be a slight waste of effort.
* I know of a few other users that are waiting for those features to
be implemented

The better option here is to investigate what needs to be done to  
implement the standard 1.1 features.

I once started gathering some thoughts on the topic, but didn't get  
very far...
See: http://wiki.apache.org/xmlgraphics-fop/FormattingObjectsForIndexing

By itself, the implementation of the 'index-key' property is not that  
difficult. It is its treatment further on during layout/rendering that  
will be a challenge.

If you're interested, I'll be glad to co-operate on this one.



Cheers

Andreas



AW: Index and Pagenumbers

Posted by Georg Datterl <ge...@geneon.de>.
Hi Jeremias,

Yes, postprocessing. I'm wondering whether implementing the extension would be legally safe and easy enough to make it worth the performance gain. 
I'd guess it would be an attribute of a block that looks at all children which are page-number-citations and throws away those with duplicate page numbers. Is there a point where I already have the actual page number but can still distinguish page-number-citations from ordinary blocks?


Kind regards
 
Georg Datterl
 


Re: Index and Pagenumbers

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
No, there's nothing like that in FOP. However, you could be adventurous
and post-process the Area Tree XML ("-at" on the command-line) that FOP
can generate. There you'd find all those references again.

XSL 1.1 contains support for building indices but that hasn't been
implemented in FOP, yet.

Jeremias Maerki

