Posted to solr-user@lucene.apache.org by Per Steffensen <st...@designware.dk> on 2013/09/12 08:25:50 UTC

Storing/indexing speed drops quickly

Hi

SolrCloud 4.0: 6 machines, quad-core, 8GB RAM, 1TB disk, one Solr node on 
each, one collection across the 6 nodes, 4 shards per node.
Storing/indexing from 100 threads on external machines, each thread one 
doc at a time, at full speed (they always have a new doc to store/index).
See attached images
* iowait.png: Measured I/O wait on the Solr machines
* doccount.png: Measured number of doc in Solr collection

Starting from an empty collection, things are fine wrt storing/indexing 
speed for the first two to three hours (100M docs per hour); then speed 
goes down dramatically, to a level that is unacceptable for us (max 10M 
per hour). At the same time as speed goes down, we see that I/O wait 
increases dramatically. I am not 100% sure, but a quick investigation 
has shown that this is due to almost constant merging.

What to do about this problem?
We know that we can play around with mergeFactor and commit rate, but 
earlier tests have shown that this does not really do the job - it 
might postpone the point where the problem occurs, but basically it is 
just a matter of time before merging exhausts the system.
Is there a way to avoid merging entirely and keep indexing speed at a 
high level, while still making sure that searches will perform fairly 
well when data amounts become big? (I guess that without merging you 
will end up with lots and lots of "small" files, and that this is not 
good for search response time.)

Regards, Per Steffensen

Re: Storing/indexing speed drops quickly

Posted by Per Steffensen <st...@designware.dk>.
Maybe the fact that we are never going to delete or update documents 
can be used for something. If we delete, we will delete entire 
collections.

Regards, Per Steffensen



Re: Storing/indexing speed drops quickly

Posted by Per Steffensen <st...@designware.dk>.
Now running the tests on a slightly reduced setup (2 machines, quad-core, 
8GB RAM ...), but that doesn't matter.

We see that storing/indexing speed drops when using 
IndexWriter.updateDocument in DirectUpdateHandler2.addDoc, but it does 
not drop when just using IndexWriter.addDocument (update requests with 
overwrite=false).
Using addDocument: 
https://dl.dropboxusercontent.com/u/25718039/AddDocument_2Solr8GB_DocCount.png
Using updateDocument: 
https://dl.dropboxusercontent.com/u/25718039/UpdateDocument_2Solr8GB_DocCount.png
We are not too happy about having to use addDocument, because that 
allows for duplicates, and we would really like to avoid that (at the 
Solr/Lucene level).
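
For reference, a minimal plain-Lucene sketch of the two calls being 
compared (this is not the actual DirectUpdateHandler2 code - the field 
name, analyzer, index path and Lucene 4.0 version constants are just 
assumptions for illustration):

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class AddVsUpdate {
  public static void main(String[] args) throws Exception {
    IndexWriterConfig cfg = new IndexWriterConfig(
        Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40));
    try (IndexWriter writer = new IndexWriter(
        FSDirectory.open(new File("/tmp/testindex")), cfg)) {

      Document doc = new Document();
      doc.add(new StringField("id", "doc-1", Field.Store.YES));

      // addDocument (the overwrite=false path): the doc is simply appended.
      // Nothing is looked up in the existing index, so the cost stays
      // roughly flat as the index grows - but duplicate ids are possible.
      writer.addDocument(doc);

      // updateDocument (the overwrite=true/default path): delete-by-term on
      // the id, then add. The delete needs term lookups against all existing
      // segments, which becomes I/O-bound once the index outgrows RAM.
      writer.updateDocument(new Term("id", "doc-1"), doc);

      writer.commit();
    }
  }
}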

We have confirmed that doubling the total amount of RAM doubles the 
number of documents in the index at which indexing speed starts 
dropping (when we use updateDocument).
On 
https://dl.dropboxusercontent.com/u/25718039/UpdateDocument_2Solr8GB_DocCount.png 
you can see that the speed drops at around 120M documents. Running the 
same test, but with the Solr machines having 16GB RAM (instead of 8GB), 
the speed drops at around 240M documents.

Any comments on why indexing speed drops with IndexWriter.updateDocument 
but not with IndexWriter.addDocument?

Regards, Per Steffensen



Re: Storing/indexing speed drops quickly

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Fri, 2013-09-13 at 17:32 +0200, Shawn Heisey wrote:
> Put your OS and Solr itself on regular disks in RAID1 and your Solr data 
> on the SSD.  Due to the eventual decay caused by writes, SSD will 
> eventually die, so be ready for SSD failures to take out shard replicas. 

One of the very useful properties of wear-levelling on SSDs is that the 
wear status of the drive can be queried. When the drive nears its EOL, 
replace it.

As Lucene mainly uses bulk writes when updating the index, I will add 
that it is pretty hard to wear out an SSD by using it primarily for 
Lucene/Solr, unless one constructs a pathological setup.

Your failure argument is thus really a claim that SSDs are not reliable 
technology. That is a fair argument, as there have been some really rotten 
apples among the offerings. This is coupled with the fact that it is 
still a very rapidly changing technology, which makes it hard to pick an 
older, proven drive that is not markedly surpassed by the bleeding edge.

> So far I'm not aware of any RAID solutions that offer TRIM support, 
> and without TRIM support, an SSD eventually has performance problems. 

Search speed is not affected, as "only" write performance suffers without 
TRIM, but index update speed will be. Also, while it is 
possible to get TRIM in RAID, there is currently only a single hardware 
option:

http://www.anandtech.com/show/6161/intel-brings-trim-to-raid0-ssd-arrays-on-7series-motherboards-we-test-it

Regards,
- Toke Eskildsen, State and University Library, Denmark



Re: Storing/indexing speed drops quickly

Posted by Shawn Heisey <so...@elyograg.org>.
On 9/13/2013 12:03 AM, Per Steffensen wrote:
> What is it that will fill my heap? I am trying to avoid the FieldCache.
> For now, I am actually not doing any searches - focus on indexing for
> now - and certainly not group/facet/sort searches that will use the
> FieldCache.

I don't know what makes up the heap when you have lots of documents.  I 
am not really using any RAM-hungry features, and I wouldn't be able to 
get away with a 4GB heap on my Solr servers.  Uncollectable (and 
collectable) RAM usage is heaviest during indexing.  I sort on one or 
two fields and we don't use facets.

Here's a screenshot of my index status page showing how big my indexes 
are on each machine; it's a couple of months old now.  These machines 
have a 6GB heap, and I don't dare make it any smaller, or I'll get OOM 
errors during indexing.  They have 64GB total RAM.

https://dl.dropboxusercontent.com/u/97770508/statuspagescreenshot.png

> More RAM will probably help, but only for a while. I want billions of
> documents in my collections - and also on each machine. Currently we are
> aiming 15 billion documents per month (500 million per day) and keep at
> least two years of data in the system. Currently we use one collection
> for each month, so when the system has been running for two years it
> will be 24 collections with 15 billion documents each. Indexing will
> only go on in the collection corresponding to the "current" month, but
> searching will (potentially) be across all 24 collections. The documents
> are very small. I know that 6 machines will not do in the long run -
> currently this is only testing - but number of machines should not be
> higher than about 20-40. In general it is a problem if Solr/Lucene will
> not perform fairly well if data does not fit RAM - then it cannot really
> be used for "big data". I would have to buy hundreds or even thousands
> of machines with 64GB+ RAM. That is not realistic.

To lower your overall RAM requirements, use SSD, and store as little 
data as possible - ideally only the id used to retrieve the full record 
from another data source.  You'll probably still want disk cache equal 
to 10-25% of your index size.  With regular disks, that's 50-100%.
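
In plain Lucene terms, a minimal sketch of that idea (the Solr equivalent 
is simply stored="false" on every field except the id in schema.xml; the 
field names below are made up):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class MinimalStoredDoc {
  // Build a document that is searchable but stores almost nothing, so the
  // on-disk index (and the RAM needed to cache it) stays small.
  static Document build(String id, String body) {
    Document doc = new Document();
    // Stored: the only thing returned from a search, used to fetch the
    // full record from the external data source.
    doc.add(new StringField("id", id, Field.Store.YES));
    // Indexed but not stored: searchable, but adds nothing to stored data.
    doc.add(new TextField("body", body, Field.Store.NO));
    return doc;
  }
}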

Put your OS and Solr itself on regular disks in RAID1 and your Solr data 
on the SSD.  Due to the decay caused by writes, an SSD will eventually 
die, so be ready for SSD failures to take out shard replicas.  So far I'm 
not aware of any RAID solutions that offer TRIM support, and without TRIM 
support, an SSD eventually has performance problems.  Without RAID, a 
failure will take out that replica.  That's one of the points of 
SolrCloud - having replicas so single failures don't bring down your 
index.

If you can't use SSD or get tons of RAM, you're going to have 
performance problems.  Solr (and any other Lucene-based search product) 
does really well with super-large indexes if you have the system 
resources available.  If you don't, it sucks.

Thanks,
Shawn


Re: Storing/indexing speed drops quickly

Posted by Per Steffensen <st...@designware.dk>.
On 9/12/13 4:26 PM, Shawn Heisey wrote:
> On 9/12/2013 2:14 AM, Per Steffensen wrote:
>>> Starting from an empty collection. Things are fine wrt
>>> storing/indexing speed for the first two-three hours (100M docs per
>>> hour), then speed goes down dramatically, to an, for us, unacceptable
>>> level (max 10M per hour). At the same time as speed goes down, we see
>>> that I/O wait increases dramatically. I am not 100% sure, but quick
>>> investigation has shown that this is due to almost constant merging.
> While constant merging is contributing to the slowdown, I would guess
> that your index is simply too big for the amount of RAM that you have.
> Let's ignore for a minute that you're distributed and just concentrate
> on one machine.
>
> After three hours of indexing, you have nearly 300 million documents.
> If you have a replicationFactor of 1, that's still 50 million documents
> per machine.  If your replicationFactor is 2, you've got 100 million
> documents per machine.  Let's focus on the smaller number for a minute.
replicationFactor is 1, so that is about 50 million docs per machine at 
this point
>
> 50 million documents in an index, even if they are small documents, is
> probably going to result in an index size of at least 20GB, and quite
> possibly larger.  In order to make Solr function with that many
> documents, I would guess that you have a heap that's at least 4GB in size.
Currently I have a 2.5GB heap on the 8GB machines - to leave something 
for the OS disk cache.
>
> With only 8GB on the machine, this doesn't leave much RAM for the OS
> disk cache.  If we assume that you have 4GB left for caching, then I
> would expect to see problems about the time your per-machine indexes hit
> 15GB in size.  If you are making it beyond that with a total of 300
> million documents, then I am impressed.
>
> Two things are going to happen when you have enough documents:  1) You
> are going to fill up your Java heap and Java will need to do frequent
> collections to free up enough RAM for normal operation.  When this
> problem gets bad enough, the frequent collections will be *full* GCs,
> which are REALLY slow.
What is it that will fill my heap? I am trying to avoid the FieldCache. 
For now, I am actually not doing any searches - focus on indexing for 
now - and certainly not group/facet/sort searches that will use the 
FieldCache.
>    2) The index will be so big that the OS disk
> cache cannot effectively cache it.  I suspect that the latter is more of
> the problem, but both might be happening at nearly the same time.

>
> When dealing with an index of this size, you want as much RAM as you can
> possibly afford.  I don't think I would try what you are doing without
> at least 64GB per machine, and I would probably use at least an 8GB heap
> on each one, quite possibly larger.  With a heap that large, extreme GC
> tuning becomes a necessity.
More RAM will probably help, but only for a while. I want billions of 
documents in my collections - and also on each machine. Currently we are 
aiming at 15 billion documents per month (500 million per day) and keeping 
at least two years of data in the system. Currently we use one collection 
for each month, so when the system has been running for two years it 
will be 24 collections with 15 billion documents each. Indexing will 
only go on in the collection corresponding to the "current" month, but 
searching will (potentially) be across all 24 collections. The documents 
are very small. I know that 6 machines will not do in the long run - 
currently this is only testing - but the number of machines should not be 
higher than about 20-40. In general it is a problem if Solr/Lucene does 
not perform fairly well when data does not fit in RAM - then it cannot 
really be used for "big data". I would have to buy hundreds or even 
thousands of machines with 64GB+ RAM. That is not realistic.
>
> To cut down on the amount of merging, I go with a fairly large
> mergeFactor, but mergeFactor is basically deprecated for
> TieredMergePolicy, there's a new way to configure it now.  Here's the
> indexConfig settings that I use on my dev server:
>
> <indexConfig>
>    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>      <int name="maxMergeAtOnce">35</int>
>      <int name="segmentsPerTier">35</int>
>      <int name="maxMergeAtOnceExplicit">105</int>
>    </mergePolicy>
>    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>      <int name="maxThreadCount">1</int>
>      <int name="maxMergeCount">6</int>
>    </mergeScheduler>
>    <ramBufferSizeMB>48</ramBufferSizeMB>
>    <infoStream file="INFOSTREAM-${solr.core.name}.txt">false</infoStream>
> </indexConfig>
>
> Thanks,
> Shawn
>
>
Thanks!

Re: Storing/indexing speed drops quickly

Posted by Shawn Heisey <so...@elyograg.org>.
On 9/12/2013 2:14 AM, Per Steffensen wrote:
>> Starting from an empty collection. Things are fine wrt
>> storing/indexing speed for the first two-three hours (100M docs per
>> hour), then speed goes down dramatically, to an, for us, unacceptable
>> level (max 10M per hour). At the same time as speed goes down, we see
>> that I/O wait increases dramatically. I am not 100% sure, but quick
>> investigation has shown that this is due to almost constant merging.

While constant merging is contributing to the slowdown, I would guess
that your index is simply too big for the amount of RAM that you have.
Let's ignore for a minute that you're distributed and just concentrate
on one machine.

After three hours of indexing, you have nearly 300 million documents.
If you have a replicationFactor of 1, that's still 50 million documents
per machine.  If your replicationFactor is 2, you've got 100 million
documents per machine.  Let's focus on the smaller number for a minute.

50 million documents in an index, even if they are small documents, is
probably going to result in an index size of at least 20GB, and quite
possibly larger.  In order to make Solr function with that many
documents, I would guess that you have a heap that's at least 4GB in size.

With only 8GB on the machine, this doesn't leave much RAM for the OS
disk cache.  If we assume that you have 4GB left for caching, then I
would expect to see problems about the time your per-machine indexes hit
15GB in size.  If you are making it beyond that with a total of 300
million documents, then I am impressed.

Two things are going to happen when you have enough documents:  1) You
are going to fill up your Java heap and Java will need to do frequent
collections to free up enough RAM for normal operation.  When this
problem gets bad enough, the frequent collections will be *full* GCs,
which are REALLY slow.  2) The index will be so big that the OS disk
cache cannot effectively cache it.  I suspect that the latter is more of
the problem, but both might be happening at nearly the same time.

When dealing with an index of this size, you want as much RAM as you can
possibly afford.  I don't think I would try what you are doing without
at least 64GB per machine, and I would probably use at least an 8GB heap
on each one, quite possibly larger.  With a heap that large, extreme GC
tuning becomes a necessity.

To cut down on the amount of merging, I go with a fairly large
mergeFactor - but mergeFactor is basically deprecated for
TieredMergePolicy; there's a new way to configure it now.  Here are the
indexConfig settings that I use on my dev server:

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
  <ramBufferSizeMB>48</ramBufferSizeMB>
  <infoStream file="INFOSTREAM-${solr.core.name}.txt">false</infoStream>
</indexConfig>
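
For what it's worth, a rough sketch of what those settings map to in the 
underlying Lucene API - an illustration only, not how Solr wires it up 
internally (Solr builds the equivalent from solrconfig.xml for you):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

public class MergeSettingsSketch {
  static IndexWriterConfig build() {
    // Equivalent of the <mergePolicy> block: allow more segments per tier
    // before a merge kicks in, and merge more segments at once.
    TieredMergePolicy mergePolicy = new TieredMergePolicy();
    mergePolicy.setMaxMergeAtOnce(35);
    mergePolicy.setSegmentsPerTier(35);
    mergePolicy.setMaxMergeAtOnceExplicit(105);

    // Equivalent of the <mergeScheduler> block: one merge thread, up to six
    // pending merges before incoming indexing threads are stalled.
    ConcurrentMergeScheduler mergeScheduler = new ConcurrentMergeScheduler();
    mergeScheduler.setMaxMergesAndThreads(6, 1); // (maxMergeCount, maxThreadCount)

    IndexWriterConfig cfg = new IndexWriterConfig(
        Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40));
    cfg.setMergePolicy(mergePolicy);
    cfg.setMergeScheduler(mergeScheduler);
    cfg.setRAMBufferSizeMB(48); // <ramBufferSizeMB>48</ramBufferSizeMB>
    return cfg;
  }
}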

Thanks,
Shawn


Re: Storing/indexing speed drops quickly

Posted by Per Steffensen <st...@designware.dk>.
Seems like the attachments didn't make it through to this mailing list.

https://dl.dropboxusercontent.com/u/25718039/doccount.png
https://dl.dropboxusercontent.com/u/25718039/iowait.png

