Posted to dev@cocoon.apache.org by Andrew Franz <af...@optushome.com.au> on 2005/06/18 02:43:24 UTC
Re: Extending DirectoryGenerator
(reposted from the user mailing list)
Andrew Franz wrote:
> I am thinking about a simple CMS (Content Management System) which
> would have the following features:
> 1. Ability to list MS-Office files along with their
> <SummaryInformation> attributes (this would use Jakarta POI), to list
> image files (basically by cloning the functionality in
> ImageDirectoryGenerator), and to be extended to other commonly used
> document formats such as PDF.
> 2. The output of #1 would be used as input to create a Lucene index.
> 3. The Lucene index would be used to search an Intranet by Author,
> Title, Subject, etc.
>
> This would mean that content creators in the organisation would
> categorise documents simply by updating <SummaryInformation>
> ('Properties' in MS-Office applications) and then uploading the file
> (the current implementation requires them to update a database
> separate from the document itself). The Cocoon application would
> automatically categorise the document, either by using Lucene or from
> the SummaryInformation. Indexing would only apply to the header/meta
> info; full-text indexing of content is not required.
>
> The question (to experienced Cocoon developers) is what is the
> preferred method of implementation?
>
> Option 1. Extend DirectoryGenerator, much as ImageDirectoryGenerator
> does, but adding new file types.
>
> Option 2. Use DirectoryGenerator 'as is' but augment it with a
> HeaderGenerator per file/mimetype and then aggregate the results such
> that the output is similar to that of Option 1.
>
> Option 3. Tell the users to 'Save As' MS-Office documents into an XML
> format and use XSLT to extract the summary information. For example,
> the Visio binary format (VSD) can be saved as VDX and the same
> information can be extracted via XSLT.
>
> All of the above are feasible and independent of the user interface,
> so the question is more about performance.
>
> Has anyone gone down this route? Are there any pitfalls I need to be
> aware of? For the experienced Cocoon developers, what is your gut
> feeling about which is the preferred option?
>
> Replies much appreciated.
>
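
To make Option 2 a little more concrete, here is a rough sketch in
plain Java of the "one header extractor per file type, then aggregate"
idea. It is not Cocoon code: HeaderListing, Extractor and
readPlainHeaders are hypothetical names made up for illustration, and
the result is a simple list of maps rather than the SAX events a real
generator would emit. The only extractor shown is a stand-in that reads
"Key: value" lines from a text file; a POI-based extractor for
MS-Office files would presumably implement the same interface, but that
part is not shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: directory listing augmented with per-type header fields. */
final class HeaderListing {

    /** Hypothetical interface: pulls header/meta fields out of one kind of file. */
    interface Extractor {
        Map<String, String> extract(Path file) throws IOException;
    }

    // File extension (lower case, without the dot) -> extractor for that type.
    private final Map<String, Extractor> extractors = new TreeMap<>();

    void register(String extension, Extractor extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Lists the regular files in dir, sorted by path, with any metadata found. */
    List<Map<String, String>> list(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files);

        List<Map<String, String>> result = new ArrayList<>();
        for (Path file : files) {
            String name = file.getFileName().toString();
            Map<String, String> entry = new LinkedHashMap<>();
            entry.put("name", name);

            // Pick the extractor by extension; files of unknown type are
            // still listed, just without extra fields.
            int dot = name.lastIndexOf('.');
            String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
            Extractor extractor = extractors.get(ext);
            if (extractor != null) {
                entry.putAll(extractor.extract(file));
            }
            result.add(entry);
        }
        return result;
    }

    /** Stand-in extractor: reads "Key: value" lines up to the first blank line. */
    static Map<String, String> readPlainHeaders(Path file) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (line.trim().isEmpty()) {
                break;
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                fields.put(line.substring(0, colon).trim(),
                           line.substring(colon + 1).trim());
            }
        }
        return fields;
    }
}
```

Usage would be along the lines of
listing.register("txt", HeaderListing::readPlainHeaders) followed by
listing.list(dir). Since only the header of each file is read, the cost
per file should stay small, but that is an expectation rather than
something measured.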