Posted to dev@cocoon.apache.org by Nicolas Maisonneuve <n....@gmail.com> on 2004/11/16 00:34:01 UTC

LuceneIndexTransformer not optimized

The method that updates a document (the reindexDocument method) is not
optimized. Its current behavior is:

1- open the reader if it is not open (in practice it is always closed, because of step 3)
2- delete the document
3- close the reader
4- open the writer
5- write the document to the index
6- close the writer

(NOTE: with this behavior the merge factor is useless, because the
method indexes only one document each time the IndexWriter is opened.)

- A common optimization in Lucene is to avoid opening and closing the
IndexReader and IndexWriter many times.

So I propose this simple optimization:
1- open the reader if it is not open
2- delete the document
3- store the Lucene document in a buffer (Stack)

// flush the buffer once it holds max_buffer documents
if (buffer.size() >= max_buffer) {

   // switch to write mode
4-   close the reader
5-   open the writer
   for (each document in the buffer) {
6-      write the document
   }
7-   close the writer
}
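
The proposed steps 1-7 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the actual component: the class and method names are hypothetical, no Lucene calls are used, and a plain in-memory map stands in for the index so that the reader/writer switches can be counted. Two details are assumptions that the proposal does not spell out: the buffer is keyed by document id (rather than a Stack) so that updating the same document twice within one batch keeps only the latest version, and the caller is expected to call flush() once at the end to write a partially filled buffer.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: batch updates so reader/writer switches happen once per batch. */
class BufferedReindexer {
    // Stand-in for the on-disk index (uid -> content). Not a Lucene API.
    private final Map<String, String> index = new HashMap<>();
    // Documents already deleted from the index but not yet written back.
    // Assumption: keyed by uid so a repeated update in one batch keeps the latest version.
    private final Map<String, String> buffer = new LinkedHashMap<>();
    private final int maxBuffer;

    private boolean readerOpen = false;
    private int readerOpens = 0; // times the simulated reader was opened
    private int writerOpens = 0; // times the simulated writer was opened

    BufferedReindexer(int maxBuffer) {
        if (maxBuffer < 1) {
            throw new IllegalArgumentException("maxBuffer must be >= 1");
        }
        this.maxBuffer = maxBuffer;
    }

    /** Steps 1-3: delete the old version, buffer the new one, flush when the buffer is full. */
    public void reindexDocument(String uid, String content) {
        if (!readerOpen) {        // 1- open the reader if it is not open
            readerOpen = true;
            readerOpens++;
        }
        index.remove(uid);        // 2- delete the document
        buffer.put(uid, content); // 3- store the document in the buffer
        if (buffer.size() >= maxBuffer) {
            flush();
        }
    }

    /** Steps 4-7: switch to write mode once and write every buffered document. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        readerOpen = false;       // 4- close the reader
        writerOpens++;            // 5- open the writer
        index.putAll(buffer);     // 6- write each buffered document
        buffer.clear();
        // 7- close the writer (nothing to release in this in-memory stand-in)
    }

    public int getReaderOpens() { return readerOpens; }
    public int getWriterOpens() { return writerOpens; }
    public int size() { return index.size(); }
    public String get(String uid) { return index.get(uid); }

    public static void main(String[] args) {
        BufferedReindexer indexer = new BufferedReindexer(100);
        for (int i = 0; i < 250; i++) {
            indexer.reindexDocument("doc" + i, "content " + i);
        }
        indexer.flush(); // write the last, partially filled buffer
        System.out.println(indexer.size() + " documents, "
                + indexer.getWriterOpens() + " writer opens");
    }
}
```

In a real implementation the counters would be replaced by actual opening and closing of the Lucene reader and writer; the point of the sketch is only that the switch to write mode happens once per full buffer instead of once per document.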


With this kind of method:
1- with a buffer of 100 documents, the number of switches between read
and write mode is divided by 100, and indexing is much, much faster
2- the merge factor becomes really useful, because the IndexWriter
indexes more than one document each time it is opened
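
As a rough worked example of point 1: assuming one writer open/close cycle per flushed buffer, plus one extra cycle for a final, partially filled buffer, the number of cycles is the document count divided by the buffer size, rounded up. The class and method names below are hypothetical and only illustrate the arithmetic.

```java
/** Hypothetical helper: counts writer open/close cycles for a given buffer size. */
class SwitchCount {
    /** Cycles needed to reindex `docs` documents with a buffer of `maxBuffer` documents. */
    static long writerCycles(long docs, long maxBuffer) {
        if (docs < 0 || maxBuffer < 1) {
            throw new IllegalArgumentException("docs must be >= 0 and maxBuffer >= 1");
        }
        return (docs + maxBuffer - 1) / maxBuffer; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(writerCycles(10_000, 1));   // one cycle per document
        System.out.println(writerCycles(10_000, 100)); // one cycle per full buffer
    }
}
```

A buffer of 1 corresponds to the current behavior (10000 cycles for 10000 documents), while a buffer of 100 needs 100 cycles.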


I've developed an Index component with 2 implementations:
1- indexerDefault, which uses this kind of method
2- MultiThreadIndexer, optimized for multiple CPUs

Maybe it would be interesting to integrate these components into the Lucene block.

Nicolas Maisonneuve

Re: LuceneIndexTransformer not optimized

Posted by Jeremy Quinn <je...@media.demon.co.uk>.
Many thanks
I will review this as soon as I can.

regards Jeremy


On 16 Nov 2004, at 23:03, Nicolas Maisonneuve wrote:

> see http://issues.apache.org/bugzilla/show_bug.cgi?id=32263
--------------------------------------------------------

                   If email from this address is not signed
                                 IT IS NOT FROM ME

                         Always check the label, folks !!!!!
--------------------------------------------------------


Re: LuceneIndexTransformer not optimized

Posted by Nicolas Maisonneuve <n....@gmail.com>.
see http://issues.apache.org/bugzilla/show_bug.cgi?id=32263


On Tue, 16 Nov 2004 11:04:39 +0000, Jeremy Quinn
<je...@media.demon.co.uk> wrote:
> Dear Nicolas
> 
> If you were to provide a patch and send it to bugzilla (then notify me
> of the bug #) I would be happy to review it.
> 
> regards Jeremy

Re: LuceneIndexTransformer not optimized

Posted by Jeremy Quinn <je...@media.demon.co.uk>.
Dear Nicolas

If you were to provide a patch and send it to bugzilla (then notify me 
of the bug #) I would be happy to review it.

regards Jeremy

