Posted to solr-user@lucene.apache.org by "Burton-West, Tom" <tb...@umich.edu> on 2010/10/05 20:40:39 UTC

Experience with large merge factors

Hi all,

At some point we will need to re-build an index that totals about 3 terabytes in size (split over 12 shards).  At our current indexing speed we estimate that this will take about 4 weeks.  We would like to reduce that time.  It appears that our main bottleneck is disk I/O during index merging.

Each index is somewhere between 250 and 350GB.  We are currently using a mergeFactor of 10 and a ramBufferSizeMB of 32MB.  This means we get merges at approximately 320 MB, 3.2 GB, and 32 GB.  We are doing this offline and will run an optimize at the end.  What we would like to do is reduce the number of intermediate merges.  We thought about just using a noMerge merge policy and then optimizing at the end, but suspect we would run out of file handles and that merging 10,000 segments during an optimize might not be efficient.

We would like to find some optimum mergeFactor somewhere between 0 (noMerge merge policy) and 1,000.  (We are also planning to raise the ramBufferSizeMB significantly).
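
For reference, at the Lucene level those two settings amount to roughly the following (assuming the Lucene 2.9-era IndexWriter API that Solr wraps; the path and values shown are just our current setup, not a recommendation):

  import java.io.File;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  // Roughly what Solr's mergeFactor and ramBufferSizeMB map to at the
  // Lucene level (2.9-era API; path and values are illustrative only).
  public class ShardWriterSettings {
      public static IndexWriter open(File shardDir) throws Exception {
          IndexWriter writer = new IndexWriter(FSDirectory.open(shardDir),
              new StandardAnalyzer(Version.LUCENE_29),
              IndexWriter.MaxFieldLength.UNLIMITED);
          writer.setMergeFactor(10);        // merge every 10 same-level segments
          writer.setRAMBufferSizeMB(32.0);  // flush a new segment roughly every 32 MB
          return writer;
      }
  }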

What experience do others have using a large mergeFactor?

Tom




Re: Experience with large merge factors

Posted by Lance Norskog <go...@gmail.com>.
You could do periodic small optimizes. The optimize command now
includes 'maxSegments' which limits the target number of segments.
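Something like this (a sketch; it assumes your SolrJ version has the
optimize() overload that takes maxSegments -- the URL and the target of
16 segments are placeholders):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  // Periodically shrink the index to a target segment count instead of 1.
  public class PartialOptimize {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          server.optimize(true, true, 16); // waitFlush, waitSearcher, maxSegments
      }
  }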

It is possible to write a Lucene program that collects a bunch of
segments and anoints it as an index. This gives you a way to collect
segments after you write them with the nomergepolicy. As long as you
are strict about not writing duplicate records, you can shovel
segments here and there and collect them into the real index as you
please. Ugly? Yes.
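
A rough sketch of that collect-and-anoint step, using the Lucene 2.9-era
addIndexesNoOptimize() call (paths and the class name are placeholders):

  import java.io.File;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  // Pull independently written sub-indexes into the "real" index. Segments
  // are copied in as-is; further merging follows the writer's merge policy.
  public class CollectSegments {
      public static void collect(File realIndex, File[] subIndexes) throws Exception {
          IndexWriter writer = new IndexWriter(FSDirectory.open(realIndex),
              new StandardAnalyzer(Version.LUCENE_29),
              IndexWriter.MaxFieldLength.UNLIMITED);
          Directory[] dirs = new Directory[subIndexes.length];
          for (int i = 0; i < subIndexes.length; i++) {
              dirs[i] = FSDirectory.open(subIndexes[i]);
          }
          writer.addIndexesNoOptimize(dirs);
          writer.close();
      }
  }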

On Tue, Oct 5, 2010 at 4:12 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> 4 weeks is a depressingly long time to re-index!
>
> Do you use multiple threads for indexing?  Large RAM buffer size is
> also good, but I think perf peaks out maybe around 512 MB (at least
> based on past tests)?
>
> Believe it or not, merging is typically compute bound.  It's costly to
> decode & re-encode all the vInts.
>
> Larger merge factor is good because it means the postings are copied
> fewer times, but, it's bad because you could risk running out of
> descriptors, and, if the OS doesn't have enough RAM, you'll start to
> thin out the readahead that the OS can do (which makes the merge less
> efficient since the disk heads are seeking more).
>
> Cutting over to SSDs would also be a good idea, but, kinda pricey
> still ;)
>
> Do you do any deleting?
>
> Do you use stored fields and/or term vectors?  If so, try to make
> your docs "uniform" if possible, ie add the same fields in the same
> order.  This enables lucene to use bulk byte copy merging under the
> hood.
>
> I wouldn't set such a huge merge factor that you effectively disable
> all merging until the end... because, you want to take advantage of
> the concurrency while you're indexing docs to get any/all merging done
> that you can.  To wait and do all merging in the end means you
> serialize (unnecessarily) indexing & merging...
>
> Mike
>
> On Tue, Oct 5, 2010 at 2:40 PM, Burton-West, Tom <tb...@umich.edu> wrote:
>> Hi all,
>>
>> At some point we will need to re-build an index that totals about 3 terabytes in size (split over 12 shards).  At our current indexing speed we estimate that this will take about 4 weeks.  We would like to reduce that time.  It appears that our main bottleneck is disk I/O during index merging.
>>
>> Each index is somewhere between 250 and 350GB.  We are currently using a mergeFactor of 10 and a ramBufferSizeMB of 32MB.  This means we get merges at approximately 320 MB, 3.2 GB, and 32 GB.  We are doing this offline and will run an optimize at the end.  What we would like to do is reduce the number of intermediate merges.  We thought about just using a noMerge merge policy and then optimizing at the end, but suspect we would run out of file handles and that merging 10,000 segments during an optimize might not be efficient.
>>
>> We would like to find some optimum mergeFactor somewhere between 0 (noMerge merge policy) and 1,000.  (We are also planning to raise the ramBufferSizeMB significantly).
>>
>> What experience do others have using a large mergeFactor?
>>
>> Tom
>>
>>
>>
>>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Experience with large merge factors

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, Oct 6, 2010 at 9:57 PM, Burton-West, Tom <tb...@umich.edu> wrote:
> Hi Mike,
>
>>.Do you use multiple threads for indexing?  Large RAM buffer size is
>>>also good, but I think perf peaks out maybe around 512 MB (at least
>>>based on past tests)?
>
> We are using Solr, I'm not sure if Solr uses multiple threads for indexing.  We have 30 "producers" each sending documents to 1 of 12 Solr shards on a round robin basis.  So each shard will get multiple requests.

OK, I believe that if your clients are sending multiple requests to Solr,
it will in fact use multiple threads for adding the docs.

But, do you periodically call commit?  Because I believe Solr still
forces a close/open of the IW for commit (which is a holdover from
long ago when Lucene didn't have a .commit method), so that can really
slow down your indexing eg if there's a large running merge...
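
If you can, batch the adds on the client and defer a single commit to the
very end of the run -- roughly like this (a sketch only; the class name
and the batch size are made up):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.common.SolrInputDocument;

  // Add in batches, never commit mid-run; one commit when everything is in.
  public class BatchedIndexer {
      public static void indexAll(SolrServer server,
                                  Iterable<SolrInputDocument> docs) throws Exception {
          List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
          for (SolrInputDocument doc : docs) {
              batch.add(doc);
              if (batch.size() == 500) {
                  server.add(batch);   // add only, no commit here
                  batch.clear();
              }
          }
          if (!batch.isEmpty()) server.add(batch);
          server.commit();             // single commit at the end of the run
      }
  }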

>>>Believe it or not, merging is typically compute bound.  It's costly to
>>>decode & re-encode all the vInts.
>
> Sounds like we need to do some monitoring during merging to see what the cpu use is and also the io wait during large merges.

And please post back!

>>>Larger merge factor is good because it means the postings are copied
>>>fewer times, but, it's bad because you could risk running out of
>>>descriptors, and, if the OS doesn't have enough RAM, you'll start to
>>>thin out the readahead that the OS can do (which makes the merge less
>>>efficient since the disk heads are seeking more).
>
> Is there a way to estimate the amount of RAM for the readahead?   Once we start the re-indexing we will be running 12 shards on a 16 processor box with 144 GB of memory.

The "worst" part of merging is merging the postings, when we read from
3 (.tis, .frq, .prx) files simultaneously across all of the segments
being merged.  So eg if your MF is 10, then this means the OS has to
buffer readahead for 30 files (competing with other ways the OS uses
memory, eg buffer cache for say searches that are running, virtual
memory for processes, etc.).

I'm sure it's rather OS specific, how "well" the OS will read ahead in
this case.  You could test, instead, forcing this readahead in lucene,
by changing the private static constant MERGE_READ_BUFFER_SIZE in
IndexWriter eg to something largish (4 MB?  8 MB?).  A large enough
readahead (either by the OS "being smart" or by Lucene forcing it)
reduces the amortized disk head seek cost, making the merge more CPU
bound and less IO bound.
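
The hypothetical change is a one-liner against the 2.9.x IndexWriter
sources (it's a private constant, so this is a patch-and-rebuild
experiment, not a runtime setting):

  // In org.apache.lucene.index.IndexWriter (Lucene 2.9.x sources):
  // force a much larger per-file readahead buffer during merges.
  private static final int MERGE_READ_BUFFER_SIZE = 4 * 1024 * 1024; // e.g. 4 MB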

Just speculating...: it could also be possible that disabling CFS
improves the OS's readahead opto, since we are then truly opening a
file and doing a single read from start to finish.
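
If you want to try that, the Lucene-level switch is a one-liner (sketch
below; Solr exposes the same setting as useCompoundFile in solrconfig.xml,
and later Lucene versions move the setter onto the merge policy):

  import org.apache.lucene.index.IndexWriter;

  // 2.9-era setter: write each segment as separate files rather than one
  // .cfs, so a merge can read each posting file straight through.
  public class DisableCompoundFormat {
      public static void apply(IndexWriter writer) {
          writer.setUseCompoundFile(false);
      }
  }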

>>>Do you do any deleting?
> Deletes would happen as a byproduct of updating a record.  This shouldn't happen too frequently during re-indexing, but we update records when a document gets re-scanned and re-OCR'd.  This would probably amount to a few thousand.

OK that's interesting.

I asked because there is a potentially powerful merge opto that we
could implement (haven't yet) which is to bulk-copy the postings data
for a single term, in the case where there are no (or, not many)
deletions.  This should be a very large speedup for merging (and then
likely merging would in fact become IO bound).

>>>Do you use stored fields and/or term vectors?  If so, try to make
>>>your docs "uniform" if possible, ie add the same fields in the same
>>>order.  This enables lucene to use bulk byte copy merging under the hood.
>
> We use 4 or 5 stored fields.  They are very small compared to our huge OCR field.  Since we construct our Solr documents programmatically, I'm fairly certain that they are always in the same order.  I'll have to look at the code when I get back to make sure.

OK but that "or" makes me nervous ;)  Ie, for the 4-field case, if you still
add that 5th field, say with the empty string as its value, in the precise
same order that the fields are added in the 5-field case, then you'll
guarantee the bulk copy opto applies during merge.
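
Something along these lines on the document-building side (a sketch; the
field names are made up -- the point is only that every doc adds the same
fields, in the same order, with "" as the placeholder):

  import org.apache.solr.common.SolrInputDocument;

  // Keep the field set and order identical for every document so the
  // bulk byte-copy merge optimization can apply (field names are made up).
  public class UniformDocBuilder {
      public static SolrInputDocument build(String id, String title,
                                            String author, String ocrText) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", id);
          doc.addField("title", title);
          doc.addField("author", author == null ? "" : author); // placeholder, never skipped
          doc.addField("ocr", ocrText);
          return doc;
      }
  }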

And btw it's absurd that apps need to take such steps to ensure this
brittle opto "applies".  Really Lucene should be more deterministic in
how it assigns field names to numbers:

    https://issues.apache.org/jira/browse/LUCENE-1737

Once we do that opto (which is sorta tricky because Lucene assumes in
various places now that field name -> number mapping is "dense") then
you no longer have to resort to such silliness.

> We aren't using term vectors now, but we plan to add them as well as a number of fields based on MARC (cataloging) metadata in the future.

OK same caveat (on field order) applies...

Mike

Re: Experience with large merge factors

Posted by Jan Høydahl / Cominvent <ja...@cominvent.com>.
Vote for or implement :) this issue: 
https://issues.apache.org/jira/browse/SOLR-1565 StreamingUpdateSolrServer should support RequestWriter API

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> As far as I know, it is the simplest, but I found that it uses a 'lot' of CPU overhead because it doesn't support the javabin format for requests to Solr, thus marshaling everything to XML.
> Building our own queue with multiple CommonsHttpSolrServer's with the BinaryRequestWriter set greatly improved our throughput, as it reduced the CPU load on both the machine that gathered the documents and the machine running the Solr server.
> 
> Thijs


Re: Experience with large merge factors

Posted by Thijs <vo...@gmail.com>.

On 7-10-2010 5:36, Otis Gospodnetic wrote:
> Hi Tom,
>
>
>>> .Do you use multiple threads for indexing?  Large RAM  buffer size is
>>>> also good, but I think perf peaks out maybe around 512  MB (at least
>>>> based on past tests)?
>>
>> We are using Solr, I'm not  sure if Solr uses multiple threads for indexing.
>> We have 30 "producers"  each sending documents to 1 of 12 Solr shards on a round
>> robin basis.  So  each shard will get multiple requests.
>
> Solr itself doesn't use multiple threads for indexing, but you can easily do
> that on the client side.  SolrJ's StreamingUpdateSolrServer is the simplest
> thing to use for this.

As far as I know, it is the simplest, but I found that it uses a 'lot'
of CPU overhead because it doesn't support the javabin format for
requests to Solr, thus marshaling everything to XML.
Building our own queue with multiple CommonsHttpSolrServer's with the
BinaryRequestWriter set greatly improved our throughput, as it reduced
the CPU load on both the machine that gathered the documents and the
machine running the Solr server.
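
Roughly, the per-worker client setup amounts to something like this (a
sketch; the class name and URL handling are illustrative):

  import java.net.MalformedURLException;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  // One worker's client, configured to send updates as javabin instead of XML.
  public class ShardClientFactory {
      public static SolrServer create(String shardUrl) throws MalformedURLException {
          CommonsHttpSolrServer server = new CommonsHttpSolrServer(shardUrl);
          server.setRequestWriter(new BinaryRequestWriter());
          return server;
      }
  }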

Thijs


>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> ----- Original Message ----
>> From: "Burton-West, Tom"<tb...@umich.edu>
>> To: "solr-user@lucene.apache.org"<so...@lucene.apache.org>
>> Sent: Wed, October 6, 2010 9:57:12 PM
>> Subject: RE: Experience with large merge factors
>>
>> Hi Mike,
>>
>>> .Do you use multiple threads for indexing?  Large RAM  buffer size is
>>>> also good, but I think perf peaks out maybe around 512  MB (at least
>>>> based on past tests)?
>>
>> We are using Solr, I'm not  sure if Solr uses multiple threads for indexing.
>> We have 30 "producers"  each sending documents to 1 of 12 Solr shards on a round
>> robin basis.  So  each shard will get multiple requests.
>>
>>>> Believe it or not, merging  is typically compute bound.  It's costly to
>>>> decode & re-encode all the vInts.
>>
>> Sounds like we need to do some monitoring during  merging to see what the cpu
>> use is and also the io wait during large  merges.
>>
>>>> Larger merge factor is good because it means the postings  are copied
>>>> fewer times, but, it's bad because you could risk running  out of
>>>> descriptors, and, if the OS doesn't have enough RAM, you'll  start to
>>>> thin out the readahead that the OS can do (which makes the  merge less
>>>> efficient since the disk heads are seeking  more).
>>
>> Is there a way to estimate the amount of RAM for the  readahead?   Once we
>> start the re-indexing we will be running 12 shards on  a 16 processor box with
>> 144 GB of memory.
>>
>>>> Do you do any  deleting?
>> Deletes would happen as a byproduct of updating a record.   This shouldn't
>> happen too frequently during re-indexing, but we update records  when a document
>> gets re-scanned and re-OCR'd.  This would probably amount  to a few thousand.
>>
>>
>>>> Do you use stored fields and/or term  vectors?  If so, try to make
>>>> your docs "uniform" if possible, ie  add the same fields in the same
>>>> order.  This enables lucene to  use bulk byte copy merging under the hood.
>>
>> We use 4 or 5 stored  fields.  They are very small compared to our huge OCR
>> field.  Since we  construct our Solr documents programmatically, I'm fairly
>> certain that they are  always in the same order.  I'll have to look at the code
>> when I get back to  make sure.
>>
>> We aren't using term vectors now, but we plan to add them as  well as a number
>> of fields based on MARC (cataloging) metadata in the  future.
>>
>> Tom


Re: Experience with large merge factors

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Tom,


> >.Do you use multiple threads for indexing?  Large RAM  buffer size is
> >>also good, but I think perf peaks out maybe around 512  MB (at least
> >>based on past tests)?
> 
> We are using Solr, I'm not  sure if Solr uses multiple threads for indexing.  
>We have 30 "producers"  each sending documents to 1 of 12 Solr shards on a round 
>robin basis.  So  each shard will get multiple requests.

Solr itself doesn't use multiple threads for indexing, but you can easily do 
that on the client side.  SolrJ's StreamingUpdateSolrServer is the simplest 
thing to use for this.
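
A minimal sketch of using it (the URL, queue size, and thread count are 
illustrative):

  import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  // Adds are queued and sent to Solr by background threads.
  public class StreamingIndexer {
      public static void main(String[] args) throws Exception {
          StreamingUpdateSolrServer server =
              new StreamingUpdateSolrServer("http://localhost:8983/solr", 1000, 4);
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "example-1");
          server.add(doc);              // queued; a background thread sends it
          // ... add the rest of the batch ...
          server.blockUntilFinished();  // drain the queue
          server.commit();
      }
  }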

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: "Burton-West, Tom" <tb...@umich.edu>
> To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> Sent: Wed, October 6, 2010 9:57:12 PM
> Subject: RE: Experience with large merge factors
> 
> Hi Mike,
> 
> >.Do you use multiple threads for indexing?  Large RAM  buffer size is
> >>also good, but I think perf peaks out maybe around 512  MB (at least
> >>based on past tests)?
> 
> We are using Solr, I'm not  sure if Solr uses multiple threads for indexing.  
>We have 30 "producers"  each sending documents to 1 of 12 Solr shards on a round 
>robin basis.  So  each shard will get multiple requests.
> 
> >>Believe it or not, merging  is typically compute bound.  It's costly to
> >>decode &  re-encode all the vInts.
> 
> Sounds like we need to do some monitoring during  merging to see what the cpu 
>use is and also the io wait during large  merges.
> 
> >>Larger merge factor is good because it means the postings  are copied 
> >>fewer times, but, it's bad because you could risk running  out of
> >>descriptors, and, if the OS doesn't have enough RAM, you'll  start to
> >>thin out the readahead that the OS can do (which makes the  merge less
> >>efficient since the disk heads are seeking  more).
> 
> Is there a way to estimate the amount of RAM for the  readahead?   Once we 
>start the re-indexing we will be running 12 shards on  a 16 processor box with 
>144 GB of memory.
> 
> >>Do you do any  deleting?
> Deletes would happen as a byproduct of updating a record.   This shouldn't 
>happen too frequently during re-indexing, but we update records  when a document 
>gets re-scanned and re-OCR'd.  This would probably amount  to a few thousand.
> 
> 
> >>Do you use stored fields and/or term  vectors?  If so, try to make
> >>your docs "uniform" if possible, ie  add the same fields in the same
> >>order.  This enables lucene to  use bulk byte copy merging under the hood.
> 
> We use 4 or 5 stored  fields.  They are very small compared to our huge OCR 
>field.  Since we  construct our Solr documents programmatically, I'm fairly
>certain that they are  always in the same order.  I'll have to look at the code 
>when I get back to  make sure.
> 
> We aren't using term vectors now, but we plan to add them as  well as a number 
>of fields based on MARC (cataloging) metadata in the  future.
> 
> Tom

RE: Experience with large merge factors

Posted by "Burton-West, Tom" <tb...@umich.edu>.
Hi Mike,

>.Do you use multiple threads for indexing?  Large RAM buffer size is
>>also good, but I think perf peaks out maybe around 512 MB (at least
>>based on past tests)?

We are using Solr, I'm not sure if Solr uses multiple threads for indexing.  We have 30 "producers" each sending documents to 1 of 12 Solr shards on a round robin basis.  So each shard will get multiple requests.
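
Each producer does something like the following to spread documents across the shard clients (a rough sketch with made-up class and field names):

  import java.util.List;
  import java.util.concurrent.atomic.AtomicInteger;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.common.SolrInputDocument;

  // Round-robin fan-out across one SolrJ client per shard.
  public class RoundRobinSender {
      private final List<SolrServer> shards;
      private final AtomicInteger next = new AtomicInteger();

      public RoundRobinSender(List<SolrServer> shards) {
          this.shards = shards;
      }

      public void send(SolrInputDocument doc) throws Exception {
          int i = Math.abs(next.getAndIncrement() % shards.size());
          shards.get(i).add(doc);  // commits are handled separately, at the end
      }
  }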

>>Believe it or not, merging is typically compute bound.  It's costly to
>>decode & re-encode all the vInts.

Sounds like we need to do some monitoring during merging to see what the cpu use is and also the io wait during large merges.

>>Larger merge factor is good because it means the postings are copied 
>>fewer times, but, it's bad because you could risk running out of
>>descriptors, and, if the OS doesn't have enough RAM, you'll start to
>>thin out the readahead that the OS can do (which makes the merge less
>>efficient since the disk heads are seeking more).

Is there a way to estimate the amount of RAM for the readahead?   Once we start the re-indexing we will be running 12 shards on a 16 processor box with 144 GB of memory.

>>Do you do any deleting?
Deletes would happen as a byproduct of updating a record.  This shouldn't happen too frequently during re-indexing, but we update records when a document gets re-scanned and re-OCR'd.  This would probably amount to a few thousand.


>>Do you use stored fields and/or term vectors?  If so, try to make
>>your docs "uniform" if possible, ie add the same fields in the same
>>order.  This enables lucene to use bulk byte copy merging under the hood.

We use 4 or 5 stored fields.  They are very small compared to our huge OCR field.  Since we construct our Solr documents programmatically, I'm fairly certain that they are always in the same order.  I'll have to look at the code when I get back to make sure.

We aren't using term vectors now, but we plan to add them as well as a number of fields based on MARC (cataloging) metadata in the future.

Tom

Re: Experience with large merge factors

Posted by Michael McCandless <lu...@mikemccandless.com>.
4 weeks is a depressingly long time to re-index!

Do you use multiple threads for indexing?  Large RAM buffer size is
also good, but I think perf peaks out maybe around 512 MB (at least
based on past tests)?

Believe it or not, merging is typically compute bound.  It's costly to
decode & re-encode all the vInts.

Larger merge factor is good because it means the postings are copied
fewer times, but, it's bad because you could risk running out of
descriptors, and, if the OS doesn't have enough RAM, you'll start to
thin out the readahead that the OS can do (which makes the merge less
efficient since the disk heads are seeking more).

Cutting over to SSDs would also be a good idea, but, kinda pricey
still ;)

Do you do any deleting?

Do you use stored fields and/or term vectors?  If so, try to make
your docs "uniform" if possible, ie add the same fields in the same
order.  This enables lucene to use bulk byte copy merging under the
hood.

I wouldn't set such a huge merge factor that you effectively disable
all merging until the end... because, you want to take advantage of
the concurrency while you're indexing docs to get any/all merging done
that you can.  To wait and do all merging in the end means you
serialize (unnecessarily) indexing & merging...

Mike

On Tue, Oct 5, 2010 at 2:40 PM, Burton-West, Tom <tb...@umich.edu> wrote:
> Hi all,
>
> At some point we will need to re-build an index that totals about 3 terabytes in size (split over 12 shards).  At our current indexing speed we estimate that this will take about 4 weeks.  We would like to reduce that time.  It appears that our main bottleneck is disk I/O during index merging.
>
> Each index is somewhere between 250 and 350GB.  We are currently using a mergeFactor of 10 and a ramBufferSizeMB of 32MB.  This means we get merges at approximately 320 MB, 3.2 GB, and 32 GB.  We are doing this offline and will run an optimize at the end.  What we would like to do is reduce the number of intermediate merges.  We thought about just using a noMerge merge policy and then optimizing at the end, but suspect we would run out of file handles and that merging 10,000 segments during an optimize might not be efficient.
>
> We would like to find some optimum mergeFactor somewhere between 0 (noMerge merge policy) and 1,000.  (We are also planning to raise the ramBufferSizeMB significantly).
>
> What experience do others have using a large mergeFactor?
>
> Tom
>
>
>
>

RE: Experience with large merge factors

Posted by "Nguyen, Vincent (CDC/OD/OADS) (CTR)" <vn...@cdc.gov>.
Thank you once again Betsy!

Vincent Vu Nguyen
Division of Science Quality and Translation
Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329 


-----Original Message-----
From: Burton-West, Tom [mailto:tburtonw@umich.edu] 
Sent: Tuesday, October 05, 2010 2:41 PM
To: solr-user@lucene.apache.org
Subject: Experience with large merge factors

Hi all,

At some point we will need to re-build an index that totals about 3
terabytes in size (split over 12 shards).  At our current indexing speed
we estimate that this will take about 4 weeks.  We would like to reduce
that time.  It appears that our main bottleneck is disk I/O during index
merging.

Each index is somewhere between 250 and 350GB.  We are currently using a
mergeFactor of 10 and a ramBufferSizeMB of 32MB.  This means we get
merges at approximately 320 MB, 3.2 GB, and 32 GB.  We are doing this
offline and will run an optimize at the end.  What we would like to do is
reduce the number of intermediate merges.  We thought about just using a
noMerge merge policy and then optimizing at the end, but suspect we would
run out of file handles and that merging 10,000 segments during an
optimize might not be efficient.

We would like to find some optimum mergeFactor somewhere between 0
(noMerge merge policy) and 1,000.  (We are also planning to raise the
ramBufferSizeMB significantly).

What experience do others have using a large mergeFactor?

Tom