Posted to dev@cocoon.apache.org by "Grzegorz Kossakowski (JIRA)" <ji...@apache.org> on 2007/07/08 12:44:04 UTC

[jira] Commented: (COCOON-2065) huge performance increase of LuceneIndexTransformer on large Lucene indexes

    [ https://issues.apache.org/jira/browse/COCOON-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12510968 ] 

Grzegorz Kossakowski commented on COCOON-2065:
----------------------------------------------

Thanks Dominique for posting a patch.

Since you have already offered to help with updating the documentation, would you like to move the page from the wiki to our official documentation repository, located at http://cocoon.zones.apache.org/daisy/? It is preferable to have that information in the official docs.

Documents from Daisy will be published on the official, reworked site soon.

> huge performance increase of LuceneIndexTransformer on large Lucene indexes
> ---------------------------------------------------------------------------
>
>                 Key: COCOON-2065
>                 URL: https://issues.apache.org/jira/browse/COCOON-2065
>             Project: Cocoon
>          Issue Type: Improvement
>          Components: Blocks: Lucene
>    Affects Versions: 2.1.6, 2.1.7, 2.1.8, 2.1.9, 2.1.10, 2.1.11-dev (Current SVN), 2.2-dev (Current SVN)
>            Reporter: Dominique De Munck
>            Priority: Minor
>             Fix For: 2.1.11-dev (Current SVN), 2.2-dev (Current SVN)
>
>         Attachments: LuceneIndexTransformer.patch
>
>
> PROBLEM:
> The LuceneIndexTransformer optimizes the Lucene index every time you add an entry to the index.
> This slows indexing down enormously when the index is large. If, for example, you use it to update
> the entry on every check-in of a document, each update will be slow.
> For example, on a Pentium IV 2.4 GHz with a Lucene index containing 10,000 documents,
> the index update itself takes only about 60 ms, while the optimize call that follows it can take 7 seconds!
> SOLUTION:
> I've created a patch that introduces an option, "optimize-frequency", to set the frequency of the optimize call.
> It defaults to 1 (the current behaviour); when a user sets it to 50, the index is optimized only once every 50 updates, and so on.
> If no optimization is wanted, you can set it to 0.
> This is consistent with the Lucene documentation (fragment of the Lucene FAQ):
> "The IndexWriter class supports an optimize() method that compacts the index database and speedup queries. You may want to use this method after performing a complete indexing of your document set or after incremental updates of the index. If your incremental update adds documents frequently, you want to perform the optimization only once in a while to avoid the extra overhead of the optimization."
> PATCH INFO:
> Added a configuration option plus a function "needToOptimize()", which is called before optimizing.
> needToOptimize() uses a random number generator, to keep the code simple.
> - When the option is not set, the CODE WILL BE EXECUTED AS BEFORE.
> - Tested on the 2.1.11 SVN branch; there are no differences in the "main" trunk, so the patch can be applied there as well.
> - Updated the API docs.
> - If the patch is accepted, I will also update the Wiki:
> http://wiki.apache.org/cocoon/LuceneIndexTransformer
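
The attached patch is not reproduced in this message. The following is only a minimal sketch of how a randomized needToOptimize() check like the one described above might look, assuming that a frequency of N means "optimize with probability 1/N on each update". The class name OptimizeFrequencySketch, its constructor, and its field are hypothetical and are not taken from the patch or from the Cocoon sources.

```java
import java.util.Random;

// Hypothetical illustration only; not the code from LuceneIndexTransformer.patch.
final class OptimizeFrequencySketch {

    // Assumed meaning: 0 = never optimize, 1 = optimize on every update,
    // N > 1 = optimize on average once every N updates.
    private final int optimizeFrequency;
    private final Random random;

    OptimizeFrequencySketch(int optimizeFrequency, Random random) {
        if (optimizeFrequency < 0) {
            throw new IllegalArgumentException("optimize-frequency must be >= 0");
        }
        this.optimizeFrequency = optimizeFrequency;
        this.random = random;
    }

    // Decides whether the caller should optimize the index after this update.
    boolean needToOptimize() {
        if (optimizeFrequency == 0) {
            return false; // optimization disabled
        }
        if (optimizeFrequency == 1) {
            return true;  // behaviour as before: optimize on every update
        }
        // nextInt(n) is uniform over 0..n-1, so this is true with probability 1/n.
        return random.nextInt(optimizeFrequency) == 0;
    }

    public static void main(String[] args) {
        OptimizeFrequencySketch sketch = new OptimizeFrequencySketch(50, new Random());
        int optimizations = 0;
        for (int update = 0; update < 10000; update++) {
            if (sketch.needToOptimize()) {
                optimizations++; // a real caller would optimize the index here
            }
        }
        System.out.println("optimize calls in 10000 updates: " + optimizations);
    }
}
```

With a random check of this kind no update counter has to be kept between calls, which is presumably what "to keep the code simple" refers to. The trade-off is that the interval between two optimize calls is only 50 updates on average, not exactly 50.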

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.