Posted to dev@cocoon.apache.org by Nicolas Maisonneuve <n....@gmail.com> on 2004/11/16 00:34:01 UTC
LuceneIndexTransformer not optimized
The method that updates a document (reindexDocument) is not
optimized. Its current behavior is:
1- open the reader if it is not open (in fact it is always closed, because of step 3)
2- delete the document
3- close the reader
4- open the writer
5- write the document to the index
6- close the writer
(NOTE: with this behavior the merge factor is useless, because the
method indexes only one document each time an IndexWriter is opened.)
- A standard optimization in Lucene is to avoid opening and closing the
IndexReader and IndexWriter many times.
So I propose this simple optimization:
1- open the reader if it is not open
2- delete the document
3- store the Lucene document in a buffer (Stack)
// flush the buffer once it holds max_buffer documents
if ((buffer.size() % max_buffer) == 0) {
// switch to write mode
4- close the reader
5- open the writer
for (i = 1 to max_buffer) {
6- remove one document from the buffer and write it
}
7- close the writer
}
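The steps above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the code attached to the patch: IndexBackend is a hypothetical interface standing in for the Lucene reader and writer calls, documents are reduced to an id and a content string, and the explicit flush() for a partially filled buffer is an addition that the numbered steps do not mention.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Buffered re-indexing sketch. IndexBackend is a hypothetical stand-in, not a Lucene API. */
class BufferedReindexer {

    /** Hypothetical wrapper around the reader/writer operations named in the steps. */
    interface IndexBackend {
        void openReader();
        void deleteDocument(String id);
        void closeReader();
        void openWriter();
        void writeDocument(String id, String content);
        void closeWriter();
    }

    private final IndexBackend backend;
    private final int maxBuffer;
    // Each entry is {id, content}. The proposal names a Stack; a queue is used here
    // (an assumption) so that documents are written in arrival order.
    private final Deque<String[]> buffer = new ArrayDeque<>();
    private boolean readerOpen = false;

    public BufferedReindexer(IndexBackend backend, int maxBuffer) {
        if (maxBuffer < 1) {
            throw new IllegalArgumentException("maxBuffer must be >= 1");
        }
        this.backend = backend;
        this.maxBuffer = maxBuffer;
    }

    /** Steps 1-3, followed by a flush once the buffer is full. */
    public void reindexDocument(String id, String content) {
        if (!readerOpen) {            // 1- open the reader if it is not open
            backend.openReader();
            readerOpen = true;
        }
        backend.deleteDocument(id);   // 2- delete the old version
        buffer.addLast(new String[] {id, content}); // 3- buffer the new version
        if (buffer.size() >= maxBuffer) {
            flush();
        }
    }

    /** Steps 4-7. Also call this once at the end to write a partially filled buffer. */
    public void flush() {
        if (readerOpen) {             // 4- close the reader
            backend.closeReader();
            readerOpen = false;
        }
        if (buffer.isEmpty()) {
            return;                   // nothing to write, so do not open a writer
        }
        backend.openWriter();         // 5- open the writer
        while (!buffer.isEmpty()) {   // 6- write every buffered document
            String[] doc = buffer.removeFirst();
            backend.writeDocument(doc[0], doc[1]);
        }
        backend.closeWriter();        // 7- close the writer
    }

    public static void main(String[] args) {
        // Backend that only prints the calls it receives.
        IndexBackend logging = new IndexBackend() {
            public void openReader() { System.out.println("open reader"); }
            public void deleteDocument(String id) { System.out.println("delete " + id); }
            public void closeReader() { System.out.println("close reader"); }
            public void openWriter() { System.out.println("open writer"); }
            public void writeDocument(String id, String content) { System.out.println("write " + id); }
            public void closeWriter() { System.out.println("close writer"); }
        };
        BufferedReindexer indexer = new BufferedReindexer(logging, 2);
        indexer.reindexDocument("a", "first");
        indexer.reindexDocument("b", "second");
        indexer.reindexDocument("c", "third");
        indexer.flush(); // writes the remaining document "c"
    }
}
```

The sketch flushes when the buffer size reaches maxBuffer instead of using a modulo test; the two are equivalent as long as the buffer is emptied on every flush.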
With this kind of method:
1- with a buffer of 100 documents, you divide the number of mode
switches (write/read) by 100, and indexing is much faster
2- the merge factor becomes really useful, because the IndexWriter
indexes more than one document each time it is opened
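To make the factor of 100 concrete, here is a small Java calculation. The class name, the method names and the figure of 10,000 documents are illustrative assumptions; the code counts reader/writer open-close cycles only and says nothing about actual indexing time.

```java
/** Counts reader/writer open-close cycles for both strategies (illustrative only). */
class SwitchCount {

    /** One cycle per document when each document is written on its own. */
    static long cyclesUnbuffered(long documents) {
        return documents;
    }

    /** One cycle per full or partial batch of maxBuffer documents. */
    static long cyclesBuffered(long documents, int maxBuffer) {
        if (maxBuffer < 1) {
            throw new IllegalArgumentException("maxBuffer must be >= 1");
        }
        return (documents + maxBuffer - 1) / maxBuffer; // ceiling division
    }

    public static void main(String[] args) {
        long documents = 10_000; // assumed workload, not a figure from this thread
        int maxBuffer = 100;     // buffer size used in the example above
        System.out.println("unbuffered cycles: " + cyclesUnbuffered(documents));
        System.out.println("buffered cycles:   " + cyclesBuffered(documents, maxBuffer));
    }
}
```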
I have developed an Index component with 2 implementations:
1- indexerDefault, which uses this kind of method
2- MultiThreadIndexer, optimized for multiple CPUs
Maybe it would be interesting to integrate these components into the Lucene block.
Nicolas Maisonneuve
Re: LuceneIndexTransformer not optimized
Posted by Jeremy Quinn <je...@media.demon.co.uk>.
Many thanks
I will review this as soon as I can.
regards Jeremy
On 16 Nov 2004, at 23:03, Nicolas Maisonneuve wrote:
> see http://issues.apache.org/bugzilla/show_bug.cgi?id=32263
--------------------------------------------------------
If email from this address is not signed
IT IS NOT FROM ME
Always check the label, folks !!!!!
--------------------------------------------------------
Re: LuceneIndexTransformer not optimized
Posted by Nicolas Maisonneuve <n....@gmail.com>.
see http://issues.apache.org/bugzilla/show_bug.cgi?id=32263
On Tue, 16 Nov 2004 11:04:39 +0000, Jeremy Quinn
<je...@media.demon.co.uk> wrote:
> Dear Nicolas
>
> If you were to provide a patch and send it to bugzilla (then notify me
> of the bug #) I would be happy to review it.
>
> regards Jeremy
Re: LuceneIndexTransformer not optimized
Posted by Jeremy Quinn <je...@media.demon.co.uk>.
Dear Nicolas
If you were to provide a patch and send it to bugzilla (then notify me
of the bug #) I would be happy to review it.
regards Jeremy