Posted to dev@cocoon.apache.org by Andrew Franz <af...@optushome.com.au> on 2005/06/18 02:43:24 UTC

Re: Extending DirectoryGenerator

(reposted from the user mailing list)

Andrew Franz wrote:

> I am thinking about a simple CMS (Content Management System) which 
> would have the following features:
> 1. Ability to list MS-Office files along with their 
> <SummaryInformation> attributes (this would use Jakarta POI), to list 
> image files (basically by cloning the functionality in 
> ImageDirectoryGenerator), and to be extended to other commonly used 
> document formats such as PDF.
> 2. The output of #1 would be used as input to create a Lucene index.
> 3. The Lucene index would be used to search an Intranet by Author, 
> Title, Subject, etc.
>
> This would mean that content creators in the organisation would 
> categorise documents simply by updating <SummaryInformation> 
> ('Properties' in MS-Office applications) and then uploading the file 
> (the current implementation requires them to update a database 
> separate from the document itself). The Cocoon application would 
> automatically categorise the document, either by using Lucene or from 
> the SummaryInformation. Indexing would only apply to the header/meta 
> info; full-text indexing of content is not required.
>
> The question (to experienced Cocoon developers) is what is the 
> preferred method of implementation?
>
> Option 1. Extend DirectoryGenerator, in the same way that 
> ImageDirectoryGenerator does, to add support for new file types
>
> Option 2. Use DirectoryGenerator 'as is' but augment it with a 
> HeaderGenerator per file/mimetype and then aggregate results such that 
> the output is similar to #1
>
> Option 3. Tell the users to 'Save As' MS-Office documents into an XML 
> format and use XSLT to extract the summary information. For example, 
> the Visio binary format (VSD) can be saved as VDX and the same 
> information can be extracted via XSLT
>
> All of the above are feasible and independent of the user interface, 
> so the question is more about performance.
>
> Has anyone gone down this route? Are there any pitfalls I need to be 
> aware of? For the experienced Cocoon developers, what is your gut 
> feeling about which option is preferable?
>
> Replies much appreciated.
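
For illustration only, Option 2 could be sketched roughly as below: the 
directory listing stays generic, and header metadata is looked up per 
file extension. This is not Cocoon, POI or Lucene code. The class 
HeaderListing, its methods, and the <file> element with its attribute 
names are all hypothetical, and a real extractor would open the file 
(for instance via POI) rather than receive only its name.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/**
 * Sketch of Option 2 under stated assumptions: a generic listing plus one
 * header extractor per file extension. All names here are hypothetical.
 */
class HeaderListing {

    /** Lower-case extension -> function returning header fields for a file name. */
    private final Map<String, Function<String, Map<String, String>>> extractors =
            new LinkedHashMap<>();

    /** Registers an extractor for one extension, e.g. "doc" or "pdf". */
    void register(String extension, Function<String, Map<String, String>> extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the lower-cased text after the last dot, or "" if there is no dot. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Escapes the characters that are unsafe inside a double-quoted XML attribute. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** Builds one <file> element; files with no registered extractor get only a name. */
    String describe(String fileName) {
        StringBuilder xml = new StringBuilder("<file name=\"")
                .append(escape(fileName)).append('"');
        Function<String, Map<String, String>> extractor =
                extractors.get(extensionOf(fileName));
        if (extractor != null) {
            // Append each header field (author, title, ...) as an attribute.
            for (Map.Entry<String, String> e : extractor.apply(fileName).entrySet()) {
                xml.append(' ').append(e.getKey())
                   .append("=\"").append(escape(e.getValue())).append('"');
            }
        }
        return xml.append("/>").toString();
    }
}
```

With this shape, supporting a new format means registering one more 
extractor rather than subclassing the generator again, which is the 
main structural difference from Option 1. It says nothing about the 
performance question, which would still need measuring.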