Posted to solr-user@lucene.apache.org by Fuad Efendi <fu...@efendi.ca> on 2009/08/11 22:49:50 UTC

Performance Tuning: segment_merge:index_update=5:1 (timing)

In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM Buffer
Flush / Segment Merge per 1 minute of (heavy) batch document updates.

I am using mergeFactor=100, etc. (I already posted a message about this...)

So... I can't see how hardware is the problem: with more CPU and a faster
RAID-0 I'll get the same 5:1 ratio. Why?..

I'll try with better hardware anyway... during segment merge everything
stops in SOLR although I am (in theory) using ConcurrentMergeScheduler with
Java 6.

 

What about having mergeFactor=100000 and doing a "commit" only once a year
(and waiting a week or more until it merges/commits/optimizes)???
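For context, a minimal sketch of what the settings under discussion map to
at the Lucene level (the 2.9-era API that Solr 1.4 wraps); the index path
and the values shown are illustrative assumptions, not the poster's actual
configuration:

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.ConcurrentMergeScheduler;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MasterWriterSettings {
        public static void main(String[] args) throws Exception {
            // Hypothetical index path; Solr manages this via solrconfig.xml.
            IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/data/solr/index")),
                new StandardAnalyzer(Version.LUCENE_29),
                true, IndexWriter.MaxFieldLength.UNLIMITED);

            // <mergeFactor>: how many same-level segments accumulate before
            // they are merged into one larger segment.
            writer.setMergeFactor(100);

            // <ramBufferSizeMB>: buffered documents are flushed to a new
            // on-disk segment once this much RAM is used.
            writer.setRAMBufferSizeMB(256.0);

            // Merges run on background threads, so indexing should (in
            // theory) continue while a merge is in progress.
            writer.setMergeScheduler(new ConcurrentMergeScheduler());

            writer.close();
        }
    }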

 


RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

Posted by Fuad Efendi <fu...@efendi.ca>.
I upgraded "master" to 1.4-dev from trunk 3 days ago

BTW, such load broke my "commodity hardware", most probably the network
card... I can't SSH in to check stats; I need to check onsite what happened...


-----Original Message-----
From: Grant Ingersoll 
Sent: August-13-09 4:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

BTW, what version of Solr are you on?

On Aug 13, 2009, at 1:43 PM, Fuad Efendi wrote:

> UPDATE:
>
> I have 100,000,000 new documents in 24 hours, including possible updates
> or possibly adding the same document several times. I have two segments
> now (30Gb total), and the network is overloaded (I use a web crawler to
> generate documents). I never had more than 25,000,000 within a month
> before...
>
> I read that a high mergeFactor improves update performance; however, it
> didn't work (it only delays all merges... commit/optimize took similar
> time). A high ramBufferSizeMB does the job.
>
>
> [Fuad Efendi] >Looks like I temporarily solved the problem with
> not-so-obvious settings:
> [Fuad Efendi] >ramBufferSizeMB=8192
> [Fuad Efendi] >mergeFactor=10
>
>
>
>> Never tried profiling;
>> 3000-5000 docs per second if SOLR is not busy with a segment merge;
>>
>> During a segment merge, 99% CPU, no disk swap; I can't suspect I/O...
>>
>> During document updates (small batches of 100-1000 docs), only 5-15% CPU
>>
>> A constant 5:1 ratio is very suspicious...
>>
>>> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
>>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
>>> updates.
>
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search




Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

Posted by Grant Ingersoll <gs...@apache.org>.
BTW, what version of Solr are you on?

On Aug 13, 2009, at 1:43 PM, Fuad Efendi wrote:

> UPDATE:
>
> I have 100,000,000 new documents in 24 hours, including possible updates
> or possibly adding the same document several times. I have two segments
> now (30Gb total), and the network is overloaded (I use a web crawler to
> generate documents). I never had more than 25,000,000 within a month
> before...
>
> I read that a high mergeFactor improves update performance; however, it
> didn't work (it only delays all merges... commit/optimize took similar
> time). A high ramBufferSizeMB does the job.
>
>
> [Fuad Efendi] >Looks like I temporarily solved the problem with
> not-so-obvious settings:
> [Fuad Efendi] >ramBufferSizeMB=8192
> [Fuad Efendi] >mergeFactor=10
>
>
>
>> Never tried profiling;
>> 3000-5000 docs per second if SOLR is not busy with a segment merge;
>>
>> During a segment merge, 99% CPU, no disk swap; I can't suspect I/O...
>>
>> During document updates (small batches of 100-1000 docs), only 5-15% CPU
>>
>> A constant 5:1 ratio is very suspicious...
>>
>>> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
>>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
>>> updates.
>
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

Posted by Fuad Efendi <fu...@efendi.ca>.
UPDATE:

I have 100,000,000 new documents in 24 hours, including possible updates or
possibly adding the same document several times. I have two segments now
(30Gb total), and the network is overloaded (I use a web crawler to generate
documents). I never had more than 25,000,000 within a month before...

I read that a high mergeFactor improves update performance; however, it
didn't work (it only delays all merges... commit/optimize took similar time).
A high ramBufferSizeMB does the job.


[Fuad Efendi] >Looks like I temporarily solved the problem with
not-so-obvious settings:
[Fuad Efendi] >ramBufferSizeMB=8192
[Fuad Efendi] >mergeFactor=10



> Never tried profiling;
> 3000-5000 docs per second if SOLR is not busy with a segment merge;
>
> During a segment merge, 99% CPU, no disk swap; I can't suspect I/O...
>
> During document updates (small batches of 100-1000 docs), only 5-15% CPU
>
> A constant 5:1 ratio is very suspicious...
>
>> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
>> updates.




RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

Posted by Fuad Efendi <fu...@efendi.ca>.
Hi Grant,


Looks like I temporarily solved the problem with not-so-obvious settings:
ramBufferSizeMB=8192
mergeFactor=10


Starting from scratch on different hardware (with much more RAM and CPU;
regular SATA) I have added/updated 30 million docs within 3 hours... without
any merge yet! The index size went from 0 to 8Gb (5 files). Previously I had
a merge about 10 times per hour, and each took about 5 minutes.


Thanks for the link; is it easy to plug a MergePolicy into SOLR? I'll do
more research...


My specific "use case": many updates of documents already in the index
(although only the "timestamp" field changes in an existing "refreshed"
document)



-----Original Message-----
From: Grant Ingersoll 
Sent: August-11-09 9:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

Is there a time of day you could schedule merges?  See
http://www.lucidimagination.com/search/document/bd53b0431f7eada5/concurrentmergescheduler_and_mergepolicy_question

Or, you might be able to implement a scheduler that only merges the  
small segments, and then does the larger ones at slow times.  I  
believe there is a Lucene issue for this that is mentioned by Shai on  
that thread above.
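A knob that already exists in Lucene and approximates the "small segments
only" idea (a sketch, not the custom scheduler from that issue):
LogByteSizeMergePolicy can be told to leave segments above a size threshold
out of normal merging, so the big merges can be deferred to an explicit
optimize at a slow time. The cast assumes the default merge policy, and the
1Gb threshold is illustrative:

    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.LogByteSizeMergePolicy;

    public class SmallMergesOnly {
        static void capMergeSize(IndexWriter writer) {
            // LogByteSizeMergePolicy is the default policy; segments larger
            // than maxMergeMB are skipped by normal (non-optimize) merging.
            LogByteSizeMergePolicy policy =
                (LogByteSizeMergePolicy) writer.getMergePolicy();
            policy.setMaxMergeMB(1024.0);  // illustrative: defer merges > ~1Gb
        }
    }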


On Aug 11, 2009, at 5:31 PM, Fuad Efendi wrote:

> Forgot to add: committing only once a day
>
> I tried mergeFactor=1000 and index write performance was extremely good
> (more than 50,000,000 updates during part of a day).
> However, "commit" was taking 2 days or more and I simply killed the
> process (suspecting that it could break my hard drive); I had about 8000
> files in the index that day... 3 minutes of waiting until each new small
> *.del file appeared, and after several thousand such files I killed the
> process.
>
> Most probably it's "delete" in Lucene... it needs to rewrite the inverted
> index (in effect, to optimize)...? Not sure
>
>
>
> -----Original Message-----
>
> Never tried profiling;
> 3000-5000 docs per second if SOLR is not busy with a segment merge;
>
> During a segment merge, 99% CPU, no disk swap; I can't suspect I/O...
>
> During document updates (small batches of 100-1000 docs), only 5-15% CPU
>
> -server JVM option (JRockit) with a 2048Mb heap + 256Mb for the RAM buffer
>
> I can't suspect garbage collection... I'll try to do the same with much
> better hardware tomorrow (2 quad-cores instead of a single dual-core, SCSI
> RAID0 instead of a single SAS drive, 16Gb for Tomcat instead of the
> current 2Gb), but a constant 5:1 ratio is very suspicious...
>
>
>
> -----Original Message-----
> From: Grant Ingersoll
> Sent: August-11-09 5:01 PM
>
> Have you tried profiling?  How often are you committing?  Have you
> looked at Garbage Collection or any of the usual suspects like that?
>
>
> On Aug 11, 2009, at 4:49 PM, Fuad Efendi wrote:
>
>> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
>> updates.
>
> Define heavy.  How many docs per second?
>
>
>
>
>
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search




Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

Posted by Grant Ingersoll <gs...@apache.org>.
Is there a time of day you could schedule merges?  See http://www.lucidimagination.com/search/document/bd53b0431f7eada5/concurrentmergescheduler_and_mergepolicy_question

Or, you might be able to implement a scheduler that only merges the  
small segments, and then does the larger ones at slow times.  I  
believe there is a Lucene issue for this that is mentioned by Shai on  
that thread above.


On Aug 11, 2009, at 5:31 PM, Fuad Efendi wrote:

> Forgot to add: committing only once a day
>
> I tried mergeFactor=1000 and index write performance was extremely good
> (more than 50,000,000 updates during part of a day).
> However, "commit" was taking 2 days or more and I simply killed the
> process (suspecting that it could break my hard drive); I had about 8000
> files in the index that day... 3 minutes of waiting until each new small
> *.del file appeared, and after several thousand such files I killed the
> process.
>
> Most probably it's "delete" in Lucene... it needs to rewrite the inverted
> index (in effect, to optimize)...? Not sure
>
>
>
> -----Original Message-----
>
> Never tried profiling;
> 3000-5000 docs per second if SOLR is not busy with a segment merge;
>
> During a segment merge, 99% CPU, no disk swap; I can't suspect I/O...
>
> During document updates (small batches of 100-1000 docs), only 5-15% CPU
>
> -server JVM option (JRockit) with a 2048Mb heap + 256Mb for the RAM buffer
>
> I can't suspect garbage collection... I'll try to do the same with much
> better hardware tomorrow (2 quad-cores instead of a single dual-core, SCSI
> RAID0 instead of a single SAS drive, 16Gb for Tomcat instead of the
> current 2Gb), but a constant 5:1 ratio is very suspicious...
>
>
>
> -----Original Message-----
> From: Grant Ingersoll
> Sent: August-11-09 5:01 PM
>
> Have you tried profiling?  How often are you committing?  Have you
> looked at Garbage Collection or any of the usual suspects like that?
>
>
> On Aug 11, 2009, at 4:49 PM, Fuad Efendi wrote:
>
>> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
>> updates.
>
> Define heavy.  How many docs per second?
>
>
>
>
>
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

Posted by Fuad Efendi <fu...@efendi.ca>.
Hi Jason,

After moving to more RAM and CPUs and setting ramBufferSizeMB=8192, the
problem disappeared; I had 100 million documents added in 24 hours almost
without any index merge (mergeFactor=10). Lucene flushes a segment to disk
when the RAM buffer is full; then the MergePolicy orchestrates merges...
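A sketch of the flush triggers being described, with illustrative values:
when flushing by document count is disabled, only RAM usage decides when a
new segment reaches disk, so a large buffer yields few, large segments and
rare merges. (Note: IndexWriter tracks its buffer with ints, so the
effective maximum is just under 2048 MB, whatever larger value is configured.)

    import org.apache.lucene.index.IndexWriter;

    public class FlushByRamOnly {
        static void configure(IndexWriter writer) {
            // Flush a new segment only when the RAM buffer fills up...
            writer.setRAMBufferSizeMB(1024.0);  // illustrative value
            // ...and never because some document count was reached.
            writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
        }
    }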

However, the 500Gb Seagate SATA drive quickly failed on SuSE Linux 10 & a
Tyan Thunder motherboard :((( - when SOLR tried to merge 2 segments, about
10Gb... I reinstalled SLES and started again; I ordered an Adaptec SAS RAID
controller & Seagate Cheetah 15K.5 SAS drives.

I am wondering how one can run Nutch on SATA (if Nutch is fast enough)... I
had constant problems with Oracle block corruption on Seagate Barracuda SATA
several years ago, then moved to Cheetah...

A good SCSI controller (with a dedicated CPU and cache!!!) + Cheetah 15K.5
(with a 16Mb cache!!!) - and we don't need to flush 8Kb if we changed only a
few hundred bytes... it's not easy to assemble good "commodity" hardware
from parts...

I am going to use Hadoop for pre-data-mining before indexing with SOLR; I
currently use a mix of MySQL & HBase...

Thanks for the input!



-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherglen@gmail.com] 
Sent: August-17-09 1:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

Fuad,

I'd recommend indexing in Hadoop, then copying the new indexes to Solr
slaves.  This removes the need for Solr master servers.  Of course
you'd need a Hadoop cluster larger than the number of master servers
you have now.  The merge indexes command (which can be taxing on the
servers because it performs a copy) could be used.
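(The merge indexes command mentioned above is, as of Solr 1.4, the CoreAdmin
MERGEINDEXES action. A minimal sketch of invoking it over HTTP; the host,
core name, and index directories are hypothetical:

    import java.io.InputStream;
    import java.net.URL;

    public class MergeHadoopIndexes {
        public static void main(String[] args) throws Exception {
            // Merge two Hadoop-built indexes into the "live" core's index;
            // the source directories must be readable on the Solr master.
            URL url = new URL("http://solr-master:8983/solr/admin/cores"
                + "?action=mergeindexes&core=live"
                + "&indexDir=/hadoop/output/index1"
                + "&indexDir=/hadoop/output/index2");
            InputStream in = url.openStream();  // fire the request
            in.close();
        }
    }

The copy cost Jason mentions comes from this action physically merging the
source indexes into the target core's index directory.)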

It would be good to improve Solr's integration with Hadoop, as otherwise
reindexing (such as for a schema change) becomes an onerous task.

-J

On Tue, Aug 11, 2009 at 2:31 PM, Fuad Efendi<fu...@efendi.ca> wrote:
> Forgot to add: committing only once a day
>
> I tried mergeFactor=1000 and index write performance was extremely good
> (more than 50,000,000 updates during part of a day).
> However, "commit" was taking 2 days or more and I simply killed the process
> (suspecting that it could break my hard drive); I had about 8000 files in
> the index that day... 3 minutes of waiting until each new small *.del file
> appeared, and after several thousand such files I killed the process.
>
> Most probably it's "delete" in Lucene... it needs to rewrite the inverted
> index (in effect, to optimize)...? Not sure
>
>
>
> -----Original Message-----
>
> Never tried profiling;
> 3000-5000 docs per second if SOLR is not busy with a segment merge;
>
> During a segment merge, 99% CPU, no disk swap; I can't suspect I/O...
>
> During document updates (small batches of 100-1000 docs), only 5-15% CPU
>
> -server JVM option (JRockit) with a 2048Mb heap + 256Mb for the RAM buffer
>
> I can't suspect garbage collection... I'll try to do the same with much
> better hardware tomorrow (2 quad-cores instead of a single dual-core, SCSI
> RAID0 instead of a single SAS drive, 16Gb for Tomcat instead of the current
> 2Gb), but a constant 5:1 ratio is very suspicious...
>
>
>
> -----Original Message-----
> From: Grant Ingersoll
> Sent: August-11-09 5:01 PM
>
> Have you tried profiling?  How often are you committing?  Have you
> looked at Garbage Collection or any of the usual suspects like that?
>
>
> On Aug 11, 2009, at 4:49 PM, Fuad Efendi wrote:
>
>> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
>> updates.
>
> Define heavy.  How many docs per second?
>
>
>
>
>
>
>



Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

Posted by Jason Rutherglen <ja...@gmail.com>.
Fuad,

I'd recommend indexing in Hadoop, then copying the new indexes to Solr
slaves.  This removes the need for Solr master servers.  Of course
you'd need a Hadoop cluster larger than the number of master servers
you have now.  The merge indexes command (which can be taxing on the
servers because it performs a copy) could be used.

It would be good to improve Solr's integration with Hadoop, as otherwise
reindexing (such as for a schema change) becomes an onerous task.

-J

On Tue, Aug 11, 2009 at 2:31 PM, Fuad Efendi<fu...@efendi.ca> wrote:
> Forgot to add: committing only once a day
>
> I tried mergeFactor=1000 and index write performance was extremely good
> (more than 50,000,000 updates during part of a day).
> However, "commit" was taking 2 days or more and I simply killed the process
> (suspecting that it could break my hard drive); I had about 8000 files in
> the index that day... 3 minutes of waiting until each new small *.del file
> appeared, and after several thousand such files I killed the process.
>
> Most probably it's "delete" in Lucene... it needs to rewrite the inverted
> index (in effect, to optimize)...? Not sure
>
>
>
> -----Original Message-----
>
> Never tried profiling;
> 3000-5000 docs per second if SOLR is not busy with a segment merge;
>
> During a segment merge, 99% CPU, no disk swap; I can't suspect I/O...
>
> During document updates (small batches of 100-1000 docs), only 5-15% CPU
>
> -server JVM option (JRockit) with a 2048Mb heap + 256Mb for the RAM buffer
>
> I can't suspect garbage collection... I'll try to do the same with much
> better hardware tomorrow (2 quad-cores instead of a single dual-core, SCSI
> RAID0 instead of a single SAS drive, 16Gb for Tomcat instead of the current
> 2Gb), but a constant 5:1 ratio is very suspicious...
>
>
>
> -----Original Message-----
> From: Grant Ingersoll
> Sent: August-11-09 5:01 PM
>
> Have you tried profiling?  How often are you committing?  Have you
> looked at Garbage Collection or any of the usual suspects like that?
>
>
> On Aug 11, 2009, at 4:49 PM, Fuad Efendi wrote:
>
>> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
>> updates.
>
> Define heavy.  How many docs per second?
>
>
>
>
>
>
>

RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

Posted by Fuad Efendi <fu...@efendi.ca>.
Forgot to add: committing only once a day

I tried mergeFactor=1000 and index write performance was extremely good
(more than 50,000,000 updates during part of a day).
However, "commit" was taking 2 days or more and I simply killed the process
(suspecting that it could break my hard drive); I had about 8000 files in
the index that day... 3 minutes of waiting until each new small *.del file
appeared, and after several thousand such files I killed the process.

Most probably it's "delete" in Lucene... it needs to rewrite the inverted
index (in effect, to optimize)...? Not sure



-----Original Message-----

Never tried profiling;
3000-5000 docs per second if SOLR is not busy with a segment merge;

During a segment merge, 99% CPU, no disk swap; I can't suspect I/O...

During document updates (small batches of 100-1000 docs), only 5-15% CPU

-server JVM option (JRockit) with a 2048Mb heap + 256Mb for the RAM buffer

I can't suspect garbage collection... I'll try to do the same with much
better hardware tomorrow (2 quad-cores instead of a single dual-core, SCSI
RAID0 instead of a single SAS drive, 16Gb for Tomcat instead of the current
2Gb), but a constant 5:1 ratio is very suspicious...



-----Original Message-----
From: Grant Ingersoll 
Sent: August-11-09 5:01 PM

Have you tried profiling?  How often are you committing?  Have you  
looked at Garbage Collection or any of the usual suspects like that?


On Aug 11, 2009, at 4:49 PM, Fuad Efendi wrote:

> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
> updates.

Define heavy.  How many docs per second?







RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

Posted by Fuad Efendi <fu...@efendi.ca>.
Never tried profiling;
3000-5000 docs per second if SOLR is not busy with a segment merge;

During a segment merge, 99% CPU, no disk swap; I can't suspect I/O...

During document updates (small batches of 100-1000 docs), only 5-15% CPU

-server JVM option (JRockit) with a 2048Mb heap + 256Mb for the RAM buffer

I can't suspect garbage collection... I'll try to do the same with much
better hardware tomorrow (2 quad-cores instead of a single dual-core, SCSI
RAID0 instead of a single SAS drive, 16Gb for Tomcat instead of the current
2Gb), but a constant 5:1 ratio is very suspicious...



-----Original Message-----
From: Grant Ingersoll 
Sent: August-11-09 5:01 PM

Have you tried profiling?  How often are you committing?  Have you  
looked at Garbage Collection or any of the usual suspects like that?


On Aug 11, 2009, at 4:49 PM, Fuad Efendi wrote:

> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
> updates.

Define heavy.  How many docs per second?





Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

Posted by Grant Ingersoll <gs...@apache.org>.
Have you tried profiling?  How often are you committing?  Have you  
looked at Garbage Collection or any of the usual suspects like that?


On Aug 11, 2009, at 4:49 PM, Fuad Efendi wrote:

> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
> updates.

Define heavy.  How many docs per second?


>
> I am using mergeFactor=100, etc. (I already posted a message about this...)
>
> So... I can't see how hardware is the problem: with more CPU and a faster
> RAID-0 I'll get the same 5:1 ratio. Why?..
>
> I'll try with better hardware anyway... during segment merge everything
> stops in SOLR although I am (in theory) using ConcurrentMergeScheduler
> with Java 6.
>
>
>
> What about having mergeFactor=100000 and doing a "commit" only once a
> year (and waiting a week or more until it merges/commits/optimizes)???
>
>
>