Posted to solr-user@lucene.apache.org by sunnyfr <jo...@gmail.com> on 2009/04/09 17:21:35 UTC

Re: Any tips for indexing large amounts of data?

Hi Otis,
How did you manage that? I have an 8-core machine with 8GB of RAM and an 11GB index
for 14M docs, with 50,000 updates every 30 min, but my replication kills everything.
My segments are merged too often, so the full index is replicated and the caches are lost,
and ... I have no idea what I can do now.
Some help would be brilliant;
btw I'm using Solr 1.4.

Thanks,


Otis Gospodnetic wrote:
> 
> Mike is right about the occasional slow-down, which appears as a pause and
> is due to large Lucene index segment merging.  This should go away with
> newer versions of Lucene where this is happening in the background.
> 
> That said, we just indexed about 20MM documents on a single 8-core machine
> with 8 GB of RAM, resulting in nearly 20 GB index.  The whole process took
> a little less than 10 hours - that's over 550 docs/second.  The vanilla
> approach before some of our changes apparently required several days to
> index the same amount of data.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> ----- Original Message ----
> From: Mike Klaas <mi...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, November 19, 2007 5:50:19 PM
> Subject: Re: Any tips for indexing large amounts of data?
> 
> There should be some slowdown in larger indices as occasionally large  
> segment merge operations must occur.  However, this shouldn't really  
> affect overall speed too much.
> 
> You haven't really given us enough data to tell you anything useful.   
> I would recommend trying to do the indexing via a webapp to eliminate  
> all your code as a possible factor.  Then, look for signs to what is  
> happening when indexing slows.  For instance, is Solr high in cpu, is  
> the computer thrashing, etc?
> 
> -Mike
> 
> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
> 
>> Hi,
>>
>> Thanks for answering this question a while back. I have made some  
>> of the suggestions you mentioned, i.e. not committing until I've
>> finished indexing. What I am seeing, though, is that as the index gets
>> larger (around 1GB), indexing is taking a lot longer. In fact it
>> slows down to a crawl. Have you got any pointers as to what I might  
>> be doing wrong?
>>
>> Also, I was looking at using MultiCore solr. Could this help in  
>> some way?
>>
>> Thank you
>> Brendan
>>
>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>
>>>
>>> : I would think you would see better performance by allowing auto commit
>>> : to handle the commit size instead of reopening the connection all the time.
>>>
>>> if your goal is "fast" indexing, don't use autoCommit at all ... just
>>> index everything, and don't commit until you are completely done.
>>>
>>> autoCommitting will slow your indexing down (the benefit being  
>>> that more
>>> results will be visible to searchers as you proceed)
>>>
>>>
>>>
>>>
>>> -Hoss
>>>
>>
> 
> 
> 
> 
> 
> 
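For the "don't autoCommit, commit once at the end" and "index via the webapp" advice quoted above, a minimal SolrJ sketch (assuming the 1.4-era CommonsHttpSolrServer client; newer SolrJ releases use HttpSolrClient instead, and the URL, field names and batch size here are illustrative):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // Indexing over HTTP rules the client code out as the bottleneck;
        // autoCommit is assumed to be disabled in solrconfig.xml.
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("text", "body of document " + i);
            batch.add(doc);

            if (batch.size() == 1000) {   // send in batches, but do not commit yet
                solr.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            solr.add(batch);
        }
        solr.commit();   // a single commit at the very end, as suggested above
    }
}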

-- 
View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Any tips for indexing large amounts of data?

Posted by Glen Newton <gl...@gmail.com>.
> - As per
> http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf
Sorry, the presentation covers a lot of ground: see slide #20:
"Standard thread pools can have high contention for task queue and
other data structures when used with fine-grained tasks"
[I haven't yet implemented work stealing]
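A rough sketch of the separate-pool idea: one ThreadPoolExecutor, and therefore one task queue, per index partition, so fine-grained add tasks never contend on a shared queue. This assumes a current Lucene API (the Lucene 2.x releases contemporary with this thread use different IndexWriter constructors); the class name, paths, partition count and fields are illustrative, not the code behind the numbers quoted below:

import java.nio.file.Paths;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class PartitionedIndexer {
    static final int PARTITIONS = 4;   // illustrative

    public static void main(String[] args) throws Exception {
        IndexWriter[] writers = new IndexWriter[PARTITIONS];
        ThreadPoolExecutor[] pools = new ThreadPoolExecutor[PARTITIONS];
        for (int p = 0; p < PARTITIONS; p++) {
            Directory dir = FSDirectory.open(Paths.get("/tmp/index-part-" + p));
            writers[p] = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
            // One pool -- and therefore one task queue -- per index partition.
            pools[p] = new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS,
                    new LinkedBlockingQueue<Runnable>());
        }

        for (int i = 0; i < 1000000; i++) {
            final int id = i;
            final int p = i % PARTITIONS;          // round-robin documents across partitions
            pools[p].submit(() -> {
                Document doc = new Document();
                doc.add(new TextField("body", "document " + id, Field.Store.NO));
                writers[p].addDocument(doc);       // IndexWriter is safe to share across threads
                return null;
            });
        }

        for (int p = 0; p < PARTITIONS; p++) {
            pools[p].shutdown();
            pools[p].awaitTermination(1, TimeUnit.HOURS);
            writers[p].close();
        }
        // The per-partition indexes can then be merged into a single index
        // with IndexWriter.addIndexes(Directory...).
    }
}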

-glen

2009/4/9 Glen Newton <gl...@gmail.com>:
> For Solr / Lucene:
> - use -XX:+AggressiveOpts
> - If available, huge pages can help. See
> http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html
>  I haven't yet followed up with my Lucene performance numbers using
> huge pages: the gain is 10-15% for large indexing jobs.
>
> For Lucene:
> - multi-thread using java.util.concurrent.ThreadPoolExecutor
> (http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
>  6.4 million full-text articles + metadata indexed, resulting in an 83GB
> index; these are old numbers: things are down to ~10 hours now)
> - while multithreading on multicore is particularly good, it also
> improves performance on single core, for small (<6 YMMV) numbers of
> threads & good I/O (test for your particular configuration)
> - Use multiple indexes & merge at the end
> - As per http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf
> use a separate ThreadPoolExecutor per index in the previous step, reducing queue
> contention. This is giving me an additional ~10%. I will blog about
> this in the near future...
>
> -glen
>
> 2009/4/9 sunnyfr <jo...@gmail.com>:
>>
>> Hi Otis,
>> How did you manage that? I have an 8-core machine with 8GB of RAM and an 11GB index
>> for 14M docs, with 50,000 updates every 30 min, but my replication kills everything.
>> My segments are merged too often, so the full index is replicated and the caches are lost,
>> and ... I have no idea what I can do now.
>> Some help would be brilliant;
>> btw I'm using Solr 1.4.
>>
>> Thanks,
>>
>>
>> Otis Gospodnetic wrote:
>>>
>>> Mike is right about the occasional slow-down, which appears as a pause and
>>> is due to large Lucene index segment merging.  This should go away with
>>> newer versions of Lucene where this is happening in the background.
>>>
>>> That said, we just indexed about 20MM documents on a single 8-core machine
>>> with 8 GB of RAM, resulting in nearly 20 GB index.  The whole process took
>>> a little less than 10 hours - that's over 550 docs/second.  The vanilla
>>> approach before some of our changes apparently required several days to
>>> index the same amount of data.
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>> ----- Original Message ----
>>> From: Mike Klaas <mi...@gmail.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Monday, November 19, 2007 5:50:19 PM
>>> Subject: Re: Any tips for indexing large amounts of data?
>>>
>>> There should be some slowdown in larger indices as occasionally large
>>> segment merge operations must occur.  However, this shouldn't really
>>> affect overall speed too much.
>>>
>>> You haven't really given us enough data to tell you anything useful.
>>> I would recommend trying to do the indexing via a webapp to eliminate
>>> all your code as a possible factor.  Then, look for signs to what is
>>> happening when indexing slows.  For instance, is Solr high in cpu, is
>>> the computer thrashing, etc?
>>>
>>> -Mike
>>>
>>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for answering this question a while back. I have made some
>>>> of the suggestions you mentioned, i.e. not committing until I've
>>>> finished indexing. What I am seeing, though, is that as the index gets
>>>> larger (around 1GB), indexing is taking a lot longer. In fact it
>>>> slows down to a crawl. Have you got any pointers as to what I might
>>>> be doing wrong?
>>>>
>>>> Also, I was looking at using MultiCore solr. Could this help in
>>>> some way?
>>>>
>>>> Thank you
>>>> Brendan
>>>>
>>>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>>>
>>>>>
>>>>> : I would think you would see better performance by allowing auto commit
>>>>> : to handle the commit size instead of reopening the connection all the time.
>>>>>
>>>>> if your goal is "fast" indexing, don't use autoCommit at all ... just
>>>>> index everything, and don't commit until you are completely done.
>>>>>
>>>>> autoCommitting will slow your indexing down (the benefit being
>>>>> that more
>>>>> results will be visible to searchers as you proceed)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -Hoss
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
>
> -
>



-- 

-

Re: Any tips for indexing large amounts of data?

Posted by Glen Newton <gl...@gmail.com>.
For Solr / Lucene:
- use -XX:+AggressiveOpts
- If available, huge pages can help. See
http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html
  I haven't yet followed up with my Lucene performance numbers using
  huge pages: the gain is 10-15% for large indexing jobs.

For Lucene:
- multi-thread using java.util.concurrent.ThreadPoolExecutor
(http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
  6.4 million full-text articles + metadata indexed, resulting in an 83GB
index; these are old numbers: things are down to ~10 hours now)
- while multithreading on multicore is particularly good, it also
improves performance on a single core, for small (<6, YMMV) numbers of
threads and good I/O (test for your particular configuration)
- use multiple indexes & merge at the end (a rough sketch of the merge
step follows after this list)
- as per http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf,
use a separate ThreadPoolExecutor per index in the previous step, reducing queue
contention. This is giving me an additional ~10%. I will blog about
this in the near future...
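A rough sketch of that merge-at-the-end step, assuming a current Lucene API (the 2.x releases contemporary with this thread use different IndexWriter constructors; the class name and paths are illustrative):

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeParts {
    public static void main(String[] args) throws Exception {
        // Each partition index was built independently, e.g. by its own thread pool.
        Directory part0 = FSDirectory.open(Paths.get("/tmp/index-part-0"));
        Directory part1 = FSDirectory.open(Paths.get("/tmp/index-part-1"));

        Directory merged = FSDirectory.open(Paths.get("/tmp/index-merged"));
        IndexWriter writer = new IndexWriter(merged,
                new IndexWriterConfig(new StandardAnalyzer()));
        writer.addIndexes(part0, part1);   // fold the partition indexes into one
        writer.commit();
        writer.close();
    }
}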

-glen

2009/4/9 sunnyfr <jo...@gmail.com>:
>
> Hi Otis,
> How did you manage that? I have an 8-core machine with 8GB of RAM and an 11GB index
> for 14M docs, with 50,000 updates every 30 min, but my replication kills everything.
> My segments are merged too often, so the full index is replicated and the caches are lost,
> and ... I have no idea what I can do now.
> Some help would be brilliant;
> btw I'm using Solr 1.4.
>
> Thanks,
>
>
> Otis Gospodnetic wrote:
>>
>> Mike is right about the occasional slow-down, which appears as a pause and
>> is due to large Lucene index segment merging.  This should go away with
>> newer versions of Lucene where this is happening in the background.
>>
>> That said, we just indexed about 20MM documents on a single 8-core machine
>> with 8 GB of RAM, resulting in nearly 20 GB index.  The whole process took
>> a little less than 10 hours - that's over 550 docs/second.  The vanilla
>> approach before some of our changes apparently required several days to
>> index the same amount of data.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>> From: Mike Klaas <mi...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Monday, November 19, 2007 5:50:19 PM
>> Subject: Re: Any tips for indexing large amounts of data?
>>
>> There should be some slowdown in larger indices as occasionally large
>> segment merge operations must occur.  However, this shouldn't really
>> affect overall speed too much.
>>
>> You haven't really given us enough data to tell you anything useful.
>> I would recommend trying to do the indexing via a webapp to eliminate
>> all your code as a possible factor.  Then, look for signs to what is
>> happening when indexing slows.  For instance, is Solr high in cpu, is
>> the computer thrashing, etc?
>>
>> -Mike
>>
>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>
>>> Hi,
>>>
>>> Thanks for answering this question a while back. I have made some
>>> of the suggestions you mentioned, i.e. not committing until I've
>>> finished indexing. What I am seeing, though, is that as the index gets
>>> larger (around 1GB), indexing is taking a lot longer. In fact it
>>> slows down to a crawl. Have you got any pointers as to what I might
>>> be doing wrong?
>>>
>>> Also, I was looking at using MultiCore solr. Could this help in
>>> some way?
>>>
>>> Thank you
>>> Brendan
>>>
>>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>>
>>>>
>>>> : I would think you would see better performance by allowing auto commit
>>>> : to handle the commit size instead of reopening the connection all the time.
>>>>
>>>> if your goal is "fast" indexing, don't use autoCommit at all ... just
>>>> index everything, and don't commit until you are completely done.
>>>>
>>>> autoCommitting will slow your indexing down (the benefit being
>>>> that more
>>>> results will be visible to searchers as you proceed)
>>>>
>>>>
>>>>
>>>>
>>>> -Hoss
>>>>
>>>
>>
>>
>>
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 

-

Re: Any tips for indexing large amounts of data?

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
They don't usually turn off the slave, but it is not a bad idea if
you can take it offline. It is a logistical headache.

BTW, do you have a very good cache hit ratio? Then it makes sense to autowarm.
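For reference, autowarming is configured per cache in solrconfig.xml; a typical filterCache entry looks something like the following (the sizes here are illustrative, not a recommendation):

<filterCache
  class="solr.LRUCache"
  size="16384"
  initialSize="4096"
  autowarmCount="1024"/>  <!-- entries replayed from the old cache when a new searcher opens -->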
--Noble

On Fri, Apr 10, 2009 at 4:07 PM, sunnyfr <jo...@gmail.com> wrote:
>
> OK, but how do people handle frequent updates on a large database with a lot of
> queries on it?
> Do they turn off the slave during the warmup?
>
>
> Noble Paul നോബിള്‍  नोब्ळ् wrote:
>>
>> On Thu, Apr 9, 2009 at 8:51 PM, sunnyfr <jo...@gmail.com> wrote:
>>>
>>> Hi Otis,
>>> How did you manage that? I have an 8-core machine with 8GB of RAM and an 11GB index
>>> for 14M docs, with 50,000 updates every 30 min, but my replication kills everything.
>>> My segments are merged too often, so the full index is replicated and the caches are lost,
>>> and ... I have no idea what I can do now.
>>> Some help would be brilliant;
>>> btw I'm using Solr 1.4.
>>>
>>
>> sunnyfr, whether the replication is full or delta, the caches are
>> lost completely.
>>
>> You can think of partitioning the index into separate Solrs, updating
>> one partition at a time, and performing a distributed search.
>>
>>> Thanks,
>>>
>>>
>>> Otis Gospodnetic wrote:
>>>>
>>>> Mike is right about the occasional slow-down, which appears as a pause
>>>> and
>>>> is due to large Lucene index segment merging.  This should go away with
>>>> newer versions of Lucene where this is happening in the background.
>>>>
>>>> That said, we just indexed about 20MM documents on a single 8-core
>>>> machine
>>>> with 8 GB of RAM, resulting in nearly 20 GB index.  The whole process
>>>> took
>>>> a little less than 10 hours - that's over 550 docs/second.  The vanilla
>>>> approach before some of our changes apparently required several days to
>>>> index the same amount of data.
>>>>
>>>> Otis
>>>> --
>>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>>
>>>> ----- Original Message ----
>>>> From: Mike Klaas <mi...@gmail.com>
>>>> To: solr-user@lucene.apache.org
>>>> Sent: Monday, November 19, 2007 5:50:19 PM
>>>> Subject: Re: Any tips for indexing large amounts of data?
>>>>
>>>> There should be some slowdown in larger indices as occasionally large
>>>> segment merge operations must occur.  However, this shouldn't really
>>>> affect overall speed too much.
>>>>
>>>> You haven't really given us enough data to tell you anything useful.
>>>> I would recommend trying to do the indexing via a webapp to eliminate
>>>> all your code as a possible factor.  Then, look for signs to what is
>>>> happening when indexing slows.  For instance, is Solr high in cpu, is
>>>> the computer thrashing, etc?
>>>>
>>>> -Mike
>>>>
>>>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for answering this question a while back. I have made some
>>>>> of the suggestions you mentioned, i.e. not committing until I've
>>>>> finished indexing. What I am seeing, though, is that as the index gets
>>>>> larger (around 1GB), indexing is taking a lot longer. In fact it
>>>>> slows down to a crawl. Have you got any pointers as to what I might
>>>>> be doing wrong?
>>>>>
>>>>> Also, I was looking at using MultiCore solr. Could this help in
>>>>> some way?
>>>>>
>>>>> Thank you
>>>>> Brendan
>>>>>
>>>>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>>>>
>>>>>>
>>>>>> : I would think you would see better performance by allowing auto commit
>>>>>> : to handle the commit size instead of reopening the connection all the time.
>>>>>>
>>>>>> if your goal is "fast" indexing, don't use autoCommit at all ... just
>>>>>> index everything, and don't commit until you are completely done.
>>>>>>
>>>>>> autoCommitting will slow your indexing down (the benefit being
>>>>>> that more
>>>>>> results will be visible to searchers as you proceed)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Hoss
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>>
>>
>
> --
> View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22986152.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul

Re: Any tips for indexing large amounts of data?

Posted by sunnyfr <jo...@gmail.com>.
OK, but how do people handle frequent updates on a large database with a lot of
queries on it?
Do they turn off the slave during the warmup?


Noble Paul നോബിള്‍  नोब्ळ् wrote:
> 
> On Thu, Apr 9, 2009 at 8:51 PM, sunnyfr <jo...@gmail.com> wrote:
>>
>> Hi Otis,
>> How did you manage that? I have an 8-core machine with 8GB of RAM and an 11GB index
>> for 14M docs, with 50,000 updates every 30 min, but my replication kills everything.
>> My segments are merged too often, so the full index is replicated and the caches are lost,
>> and ... I have no idea what I can do now.
>> Some help would be brilliant;
>> btw I'm using Solr 1.4.
>>
> 
> sunnyfr, whether the replication is full or delta, the caches are
> lost completely.
>
> You can think of partitioning the index into separate Solrs, updating
> one partition at a time, and performing a distributed search.
> 
>> Thanks,
>>
>>
>> Otis Gospodnetic wrote:
>>>
>>> Mike is right about the occasional slow-down, which appears as a pause
>>> and
>>> is due to large Lucene index segment merging.  This should go away with
>>> newer versions of Lucene where this is happening in the background.
>>>
>>> That said, we just indexed about 20MM documents on a single 8-core
>>> machine
>>> with 8 GB of RAM, resulting in nearly 20 GB index.  The whole process
>>> took
>>> a little less than 10 hours - that's over 550 docs/second.  The vanilla
>>> approach before some of our changes apparently required several days to
>>> index the same amount of data.
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>> ----- Original Message ----
>>> From: Mike Klaas <mi...@gmail.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Monday, November 19, 2007 5:50:19 PM
>>> Subject: Re: Any tips for indexing large amounts of data?
>>>
>>> There should be some slowdown in larger indices as occasionally large
>>> segment merge operations must occur.  However, this shouldn't really
>>> affect overall speed too much.
>>>
>>> You haven't really given us enough data to tell you anything useful.
>>> I would recommend trying to do the indexing via a webapp to eliminate
>>> all your code as a possible factor.  Then, look for signs to what is
>>> happening when indexing slows.  For instance, is Solr high in cpu, is
>>> the computer thrashing, etc?
>>>
>>> -Mike
>>>
>>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for answering this question a while back. I have made some
>>>> of the suggestions you mentioned, i.e. not committing until I've
>>>> finished indexing. What I am seeing, though, is that as the index gets
>>>> larger (around 1GB), indexing is taking a lot longer. In fact it
>>>> slows down to a crawl. Have you got any pointers as to what I might
>>>> be doing wrong?
>>>>
>>>> Also, I was looking at using MultiCore solr. Could this help in
>>>> some way?
>>>>
>>>> Thank you
>>>> Brendan
>>>>
>>>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>>>
>>>>>
>>>>> : I would think you would see better performance by allowing auto commit
>>>>> : to handle the commit size instead of reopening the connection all the time.
>>>>>
>>>>> if your goal is "fast" indexing, don't use autoCommit at all ... just
>>>>> index everything, and don't commit until you are completely done.
>>>>>
>>>>> autoCommitting will slow your indexing down (the benefit being
>>>>> that more
>>>>> results will be visible to searchers as you proceed)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -Hoss
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22986152.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Any tips for indexing large amounts of data?

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
On Thu, Apr 9, 2009 at 8:51 PM, sunnyfr <jo...@gmail.com> wrote:
>
> Hi Otis,
> How did you manage that? I have an 8-core machine with 8GB of RAM and an 11GB index
> for 14M docs, with 50,000 updates every 30 min, but my replication kills everything.
> My segments are merged too often, so the full index is replicated and the caches are lost,
> and ... I have no idea what I can do now.
> Some help would be brilliant;
> btw I'm using Solr 1.4.
>

sunnyfr, whether the replication is full or delta, the caches are
lost completely.

You can think of partitioning the index into separate Solrs, updating
one partition at a time, and performing a distributed search.
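A minimal SolrJ sketch of querying such partitions with Solr's distributed search (supported since Solr 1.3 via the shards parameter); the host names, core layout and query are illustrative:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardedQuery {
    public static void main(String[] args) throws Exception {
        // Any one node can coordinate the distributed request.
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://solr1:8983/solr");

        SolrQuery q = new SolrQuery("title:lucene");
        // Fan the query out over the index partitions (one Solr per partition).
        q.set("shards", "solr1:8983/solr,solr2:8983/solr,solr3:8983/solr");

        QueryResponse rsp = solr.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}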

> Thanks,
>
>
> Otis Gospodnetic wrote:
>>
>> Mike is right about the occasional slow-down, which appears as a pause and
>> is due to large Lucene index segment merging.  This should go away with
>> newer versions of Lucene where this is happening in the background.
>>
>> That said, we just indexed about 20MM documents on a single 8-core machine
>> with 8 GB of RAM, resulting in nearly 20 GB index.  The whole process took
>> a little less than 10 hours - that's over 550 docs/second.  The vanilla
>> approach before some of our changes apparently required several days to
>> index the same amount of data.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>> From: Mike Klaas <mi...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Monday, November 19, 2007 5:50:19 PM
>> Subject: Re: Any tips for indexing large amounts of data?
>>
>> There should be some slowdown in larger indices as occasionally large
>> segment merge operations must occur.  However, this shouldn't really
>> affect overall speed too much.
>>
>> You haven't really given us enough data to tell you anything useful.
>> I would recommend trying to do the indexing via a webapp to eliminate
>> all your code as a possible factor.  Then, look for signs to what is
>> happening when indexing slows.  For instance, is Solr high in cpu, is
>> the computer thrashing, etc?
>>
>> -Mike
>>
>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>
>>> Hi,
>>>
>>> Thanks for answering this question a while back. I have made some
>>> of the suggestions you mentioned, i.e. not committing until I've
>>> finished indexing. What I am seeing, though, is that as the index gets
>>> larger (around 1GB), indexing is taking a lot longer. In fact it
>>> slows down to a crawl. Have you got any pointers as to what I might
>>> be doing wrong?
>>>
>>> Also, I was looking at using MultiCore solr. Could this help in
>>> some way?
>>>
>>> Thank you
>>> Brendan
>>>
>>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>>
>>>>
>>>> : I would think you would see better performance by allowing auto commit
>>>> : to handle the commit size instead of reopening the connection all the time.
>>>>
>>>> if your goal is "fast" indexing, don't use autoCommit at all ... just
>>>> index everything, and don't commit until you are completely done.
>>>>
>>>> autoCommitting will slow your indexing down (the benefit being
>>>> that more
>>>> results will be visible to searchers as you proceed)
>>>>
>>>>
>>>>
>>>>
>>>> -Hoss
>>>>
>>>
>>
>>
>>
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul