Posted to fop-dev@xmlgraphics.apache.org by Ramin Firoozye <ra...@wizen.com> on 2001/11/11 23:12:24 UTC

Q: Generating indexes

Greetings all,

I've searched the fop-dev archive, combed through the draft spec, and
trawled around Google, but haven't found an easy way to solve the 'Index
Generation' problem.

I'm trying to generate a traditional end-of-book index using FOP (version
0.20.2).
The idea is that in the XML document source the author can specify something
like:

<doc>
...
The moose<index entry="Animal" /> frolics with the <index entry="Animal"
/>squirrel in the field.
...
The <index entry="Animal" />geese smiled at the Kayaker.
...
</doc>

From this, you want to end up with something like:

A
----------
Animal 12, 18, 20
  25, 39

Ant 1, 15
...
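
To make the target concrete, here is a minimal Python sketch of the formatting step alone. It assumes the (entry, page) pairs have already been collected somehow, which is the hard part this thread is about. The function name `format_index` and the `width` parameter are made up for illustration; nothing here is part of FOP.

```python
import textwrap
from collections import defaultdict


def format_index(entries, width=20):
    """Render (term, page) pairs as a plain-text index grouped by first letter.

    Hypothetical helper for illustration; it only formats data that some
    earlier step has already paginated.
    """
    pages_by_term = defaultdict(set)
    for term, page in entries:
        # A set collapses several hits for the same term on one page.
        pages_by_term[term].add(page)

    lines = []
    current_letter = None
    for term in sorted(pages_by_term, key=str.lower):
        letter = term[0].upper()
        if letter != current_letter:
            if lines:
                lines.append("")  # blank line between letter groups
            lines += [letter, "-" * 10]
            current_letter = letter
        pages = ", ".join(str(p) for p in sorted(pages_by_term[term]))
        # Long page lists wrap onto indented continuation lines.
        lines += textwrap.wrap(f"{term} {pages}", width=width,
                               subsequent_indent="  ")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [("Animal", 12), ("Animal", 12), ("Animal", 18), ("Ant", 1)]
    print(format_index(sample))
```

The set per term is one simple answer to the "several occurrences on one page" question: the page is listed once.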

Anyway, I've got most of it worked out in my head (and it hurts (:-)
especially the bit about multiple page-number-citations referencing a single
index entry). It all is so godawful gnarly and I thought I'd ask to see if
anybody has figured out an easier way to do this in XSL:FO.

Any tips or references would be appreciated.

Thanks,
Ramin
---
Ramin Firoozye - ramin@wizen.com - Wizen Software, San Francisco, CA.
---


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org


RE: Generating indexes

Posted by Arved Sandstrom <Ar...@chebucto.ns.ca>.
Ahhh, the discussion below would tend to give that impression, wouldn't it?
:-)

Answer is, markers were 75% in place a few versions ago, but then Mark 
Lillywhite's stuff got introduced and broke that code. I think it was a good 
decision. Now I think markers will wait until the rewrite is a bit further 
along.

Regards,
Arved

At 07:28 AM 11/13/01 -0600, Jim Urban wrote:
>Is fo:marker implemented in FOP?
>
>Jim




RE: Generating indexes

Posted by Jim Urban <ji...@netsteps.net>.
Is fo:marker implemented in FOP?

Jim





Re: Generating indexes

Posted by Arved Sandstrom <Ar...@chebucto.ns.ca>.
At 10:40 PM 11/12/01 +0100, Corinna Hischke wrote:
>Hi,
>
>> I'm trying to generate a traditional end-of-book index using FOP (version
>> 0.20.2).
>> The idea is that in the XML document source the author can specify
>> something like:
>> ...
>>
>> Anyway, I've got most of it worked out in my head (and it hurts (:-)
>> especially the bit about multiple page-number-citations referencing a
>> single index entry). It all is so godawful gnarly and I thought I'd ask
>> to see if anybody has figured out an easier way to do this in XSL:FO.
>>
>> Any tips or references would be appreciated.
>
>I also thought of something like 'multiple page-number-citations' and came
>to the conclusion that markers could be used for that. I didn't try yet,
>but am I wrong?
>
>Corinna

Markers (fo:retrieve-marker) put content (as determined by the "best" 
qualifying fo:marker) in the static content. Plus only one marker gets 
retrieved. So you can see they are not intended for indexes - the spec 
indicates that markers are suited for manufacturing header (or footer, or 
sidebar) content that is somewhat dependent on current context - e.g. what 
is the current chapter title, what is the current section title, etc etc.

I think you might be able to do something with straight XSL, but it would be 
ugly, and I think you would normally have redundancies (3 occurrences of an 
indexed word on one page - what do you do?). Perhaps the best solution is a 
2-stage one: consider the possibility of doing one formatting pass that 
generates the XML area tree. Use a Perl or Python script to generate an 
index from this data. After all, it _is_ paginated. Then write an extra 
fo:page-sequence that creates the index, and re-run FOP to produce the final 
PDF document.
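
As a rough illustration of that two-stage idea, here is a Python sketch of the middle step. Everything about the input is an assumption: the `AreaTree`/`Page`/`Inline` element names and the `idx:Entry:n` id scheme are a simplified, hypothetical stand-in, not the actual output of FOP's XML renderer. The only point is that once the first pass has paginated the document, each index point can be mapped to a page number.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified stand-in for a paginated area tree. The real
# FOP output uses different element names and carries far more detail.
AREA_TREE = """
<AreaTree>
  <Page number="12">
    <Inline id="idx:Animal:1"/>
    <Inline id="idx:Animal:2"/>
  </Page>
  <Page number="18">
    <Inline id="idx:Animal:3"/>
    <Inline id="idx:Ant:1"/>
  </Page>
</AreaTree>
"""


def collect_index(area_tree_xml, prefix="idx:"):
    """Map each index entry to the sorted list of pages it lands on.

    Assumes every index point was emitted with an id of the form
    'idx:<entry>:<serial>' during the first formatting pass.
    """
    pages = defaultdict(set)
    root = ET.fromstring(area_tree_xml)
    for page in root.iter("Page"):
        number = int(page.get("number"))
        for area in page.iter():
            area_id = area.get("id", "")
            if area_id.startswith(prefix):
                # Drop the prefix and the trailing serial number.
                entry = area_id[len(prefix):].rsplit(":", 1)[0]
                pages[entry].add(number)  # set: one hit per page is enough
    return {entry: sorted(nums) for entry, nums in pages.items()}


if __name__ == "__main__":
    print(collect_index(AREA_TREE))
```

The resulting mapping would then feed whatever generates the extra fo:page-sequence for the second FOP run.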

This is the kind of thing you have to do with LaTeX (well, with makeindex, 
not Perl scripts), for good reason. It's tough to do well any other way. :-)

I should add, LaTeX \index entries go right into the formatted text. There 
is an advantage to doing this with XSL also, as the decision-making remains 
with the original XML. In this case the XSL/FOP procedure for index 
generation could be identical to LaTeX (no need to use the XMLRenderer any 
more):

1) Place <index entry="index_text"/> in those spots in your original XML 
where you know that you have content that you wish to index with "index_text";
2) Run your XSLT, and have the <index.../> tags converted into some 
<fox:index.../> construct. These elements have meaning to the indexer only, 
which can be invoked when FOP is run - the effect is to open up an index 
file and record entries by page number;
3) Review the index file. Edit it, OR edit the original XML and rerun FOP, 
or both, until the index file is satisfactory;
4) Run a Perl or Python script (I admit grudgingly that it could be Java 
also) to take the index file and produce an XML file that will convert into 
a page-sequence (the XSLT needs to be ready for this, as required); this can 
be added into the original XML with a reference.
5) Rerun XSLT and FOP, and voila.
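
Step 4 above could be sketched in Python along these lines. The index file format (one tab-separated "entry, page" pair per line) and the output element names `index`, `group`, `item` and `page` are assumptions made up for this example; no such file format or `fox:index` extension is defined here, so treat this as a sketch of the shape of the script rather than something that plugs into FOP as-is.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_file_to_xml(index_text):
    """Turn 'entry<TAB>page' lines into an XML fragment for the stylesheet.

    Both the input format and the output vocabulary are hypothetical.
    """
    pages = defaultdict(set)
    for line in index_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines left over from hand editing
        entry, page = line.rsplit("\t", 1)
        pages[entry.strip()].add(int(page))

    root = ET.Element("index")
    groups = {}
    for entry in sorted(pages, key=str.lower):
        letter = entry[0].upper()
        group = groups.get(letter)
        if group is None:
            # One <group> per initial letter, created on first use.
            group = groups[letter] = ET.SubElement(root, "group", letter=letter)
        item = ET.SubElement(group, "item", term=entry)
        for number in sorted(pages[entry]):
            ET.SubElement(item, "page").text = str(number)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(index_file_to_xml("Animal\t12\nAnimal\t12\nAnimal\t18\nAnt\t1\n"))
```

The XSLT in step 5 would then need a template that turns this fragment into the index page-sequence.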

I think that an index will require this much work in general, no more and no 
less. It is an art form to produce a good, useful index and it is just not 
going to happen with a simple, automated pass. I also want to stress that 
indexes are derivative - they represent new content, and have parallels with 
footnotes. Some of the discussion so far has seemingly treated indexes as 
being more like word search indexes, and that is not what we are talking
about.

Just some thoughts.

AHS





Re: Generating indexes

Posted by Corinna Hischke <co...@infix.de>.
Hi,

> I'm trying to generate a traditional end-of-book index using FOP (version
> 0.20.2).
> The idea is that in the XML document source the author can specify
> something like:
> ...
>
> Anyway, I've got most of it worked out in my head (and it hurts (:-)
> especially the bit about multiple page-number-citations referencing a
> single index entry). It all is so godawful gnarly and I thought I'd ask
> to see if anybody has figured out an easier way to do this in XSL:FO.
>
> Any tips or references would be appreciated.

I also thought of something like 'multiple page-number-citations' and came
to the conclusion that markers could be used for that. I didn't try yet,
but am I wrong?

Corinna


