Posted to user@lenya.apache.org by dr...@sdf-eu.org on 2006/09/20 18:26:39 UTC

Virtual index page generation based on xpathdirectorygenerator results

Hi,

We have about 750 items in our core collection. Each item has a unique
catalogue code that begins with two or three capital letters, followed by
a 1 to 3 digit number, optionally followed by one or more subcodes separated
by dots, dashes or underscores (e.g., ALR001, BB050_1, PQ42-2).
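To pin the format down, here is a minimal Python sketch of a validator for
such codes. It assumes that subcodes are alphanumeric and that the subcode
part may be empty (as in ALR001). Both are our assumptions, not rules taken
from the collection itself, and `parse_code` is a hypothetical helper.

```python
import re

# Assumed pattern: 2-3 capital letters, a 1-3 digit number, then zero or more
# subcodes, each introduced by '.', '-' or '_'. The subcode alphabet
# ([A-Za-z0-9]+) is an assumption; adjust it to the real catalogue rules.
CODE_RE = re.compile(
    r"(?P<prefix>[A-Z]{2,3})(?P<number>\d{1,3})(?P<subcodes>(?:[._-][A-Za-z0-9]+)*)"
)


def parse_code(code):
    """Split a catalogue code into (prefix, number, [subcodes]).

    Raises ValueError if the whole string does not match the assumed format.
    """
    m = CODE_RE.fullmatch(code)
    if m is None:
        raise ValueError(f"not a catalogue code: {code!r}")
    # Splitting "_1" on the separators gives ['', '1']; drop the empty parts.
    subcodes = [s for s in re.split(r"[._-]", m.group("subcodes")) if s]
    return m.group("prefix"), m.group("number"), subcodes


for c in ["ALR001", "BB050_1", "PQ42-2"]:
    print(c, parse_code(c))
```

Using `fullmatch` rather than `match` means a code with a fourth digit
(e.g. ALR1234) is rejected instead of being silently truncated.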

Each item has an XML file based on a custom doctype (which includes the Lenya
metadata). All the items have been imported under the
pub/content/authoring/collections directory in standard Lenya fashion (i.e.,
the collections directory has 750 sub-directories named after the items' 
catalogue code and each of those directories contains six index_nn.xml files
for the six languages we need).

BXE has been configured for this doctype and works with it (including
asset/image management). So far so good.

We have created an index page which:

1) Uses xpathdirectorygenerator to extract the alphabetical prefix of each
code (e.g., ALR, BB, PQ).

2) Uses i18n translation as a hack to look up the code's meaning (and as 
a bonus returns it in the correct language ;-)

3) Uses XSLT to generate a unique list of letter codes (e.g., ALR001,
ALR002 etc. all become ALR in the index).

4) Generates links to the virtual page for each letter code, using the
code meaning as the link text (e.g., to ALR.html and BB.html, which will
index all the ALRnnn and BBnnn items respectively).
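Steps 1, 3 and 4 can be sketched in plain Python to make the logic
explicit. This is not the XSLT pipeline itself; `prefix_of`,
`unique_prefixes`, `index_links` and the `labels` dictionary (a stand-in
for the i18n lookup in step 2) are hypothetical names for illustration.

```python
import re
from html import escape

# Step 1: the alphabetical prefix is the leading run of 2-3 capital letters.
PREFIX_RE = re.compile(r"[A-Z]{2,3}")


def prefix_of(code):
    m = PREFIX_RE.match(code)
    if m is None:
        raise ValueError(f"no letter prefix in {code!r}")
    return m.group(0)


def unique_prefixes(codes):
    # Step 3: sort first, then emit a prefix only when it differs from the
    # previous one. This is a single linear pass with no recursion.
    result = []
    previous = None
    for prefix in sorted(prefix_of(c) for c in codes):
        if prefix != previous:
            result.append(prefix)
            previous = prefix
    return result


def index_links(codes, labels):
    # Step 4: one link per prefix to its virtual page (e.g. ALR.html).
    # `labels` is a hypothetical stand-in for the i18n catalogue; the prefix
    # itself is used as link text when no label is known.
    return [
        f'<a href="{escape(p)}.html">{escape(labels.get(p, p))}</a>'
        for p in unique_prefixes(codes)
    ]
```

For example, `index_links(["PQ42-2", "ALR001", "BB050_1", "ALR002"],
{"ALR": "Label for ALR"})` yields one link each for ALR, BB and PQ, in that
order. Comparing each sorted entry with its predecessor has no recursion
depth to exceed, which is why it avoids the stack overflow described in
issue 1.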

The issues so far have been:

1) Initially we tried recursive XSLT to generate the unique letter-code
list, but a java.lang.StackOverflowError occurred at a recursion depth of
572 (even when the template just made the recursive calls and did nothing
else!). So in the end we pre-sorted the codes and had the XSLT check
whether the code of the previous node differs from that of the current
node and, if so, emit the current node's code.

2) Because the codes are not known in advance (and we didn't want 92 new
pipelines, i.e., one for each code prefix), we needed (and still need)
some way to structure the URI space so that items and virtual indexes
don't collide. Luckily Lenya didn't seem to complain about the doctype in
the pipeline match.

3) We hit the broken-links problem mentioned elsewhere on this list,
which we fixed by adjusting the publication-sitemap, even though it's not
entirely clear to us how the cocoon://navigation calls work. Getting
breadcrumbs to work properly is still proving problematic.

4) Firefox takes ages to open the siteview or to move around in it.

5) Lenya seems to check whether a document really exists too early, so
we've had to create dummy documents for the index pages, which is messy
and further complicates the matching process.
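Because every item code contains digits and no bare prefix does, the shape
of a name alone is enough to tell a virtual index from an item, before any
existence check. A minimal Python sketch of that idea follows. It assumes
a hypothetical URI layout in which the last path segment is `<NAME>.html`,
leaves language handling out, and does not show how such a check would be
wired into Lenya or the Cocoon sitemap; `classify` is a hypothetical helper.

```python
import re

# Hypothetical layout: the last path segment is "<NAME>.html", where NAME is
# either a bare prefix (virtual index) or a full catalogue code (item).
VIRTUAL_RE = re.compile(r"[A-Z]{2,3}")
ITEM_RE = re.compile(r"[A-Z]{2,3}\d{1,3}(?:[._-][A-Za-z0-9]+)*")


def classify(segment):
    """Return ('virtual-index', prefix), ('item', code) or None."""
    suffix = ".html"
    name = segment[: -len(suffix)] if segment.endswith(suffix) else segment
    # A bare prefix has no digits, so it can never match an item code.
    if VIRTUAL_RE.fullmatch(name):
        return ("virtual-index", name)
    if ITEM_RE.fullmatch(name):
        return ("item", name)
    return None  # neither shape: let normal document handling deal with it
```

Note that a language suffix such as `_en` in the file name would clash
with underscore-separated subcodes under this pattern, so in this sketch
the language would have to be carried elsewhere in the URI.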

Questions:

Is there a better approach to this, one that is cleaner and more scalable
(e.g., if we increased the item count to, say, 1 million)?

What's the best way to adjust Lenya's error-checking code so that we
don't need dummy documents? Obviously we don't want to break Lenya's
handling of XHTML pages. Do we need some kind of flag in the URL to
indicate that a page is virtual? Could we use usecases? What doctype does
a virtual page have, and how could it interact with doctype.xmap?

Is there any documentation on how to go about designing a good URI space,
bearing in mind that we'll be serving multilingual documents in a variety
of output formats?

Thanks,

--

drseuk@sdf-eu.org
SDF-EU Public Access UNIX System - http://sdf-eu.org