Posted to solr-user@lucene.apache.org by Per Steffensen <st...@designware.dk> on 2013/09/12 08:25:50 UTC
Storing/indexing speed drops quickly
Hi
SolrCloud 4.0: 6 machines, quadcore, 8GB RAM, 1TB disk, one Solr node on
each, one collection across the 6 nodes, 4 shards per node.
Storing/indexing from 100 threads on external machines, each thread one
doc at a time, at full speed (they always have a new doc to store/index).
See attached images
* iowait.png: Measured I/O wait on the Solr machines
* doccount.png: Measured number of doc in Solr collection
Starting from an empty collection, things are fine wrt storing/indexing
speed for the first two to three hours (100M docs per hour); then speed
drops dramatically to a level that is unacceptable for us (at most 10M
per hour). At the same time as the speed drops, we see that I/O wait
increases dramatically. I am not 100% sure, but a quick investigation
has shown that this is due to almost constant merging.
What to do about this problem?
I know that you can play around with mergeFactor and commit rate, but
earlier tests show that this does not really do the job - it might
postpone the point where the problem occurs, but basically it is just a
matter of time before merging exhausts the system.
Is there a way to avoid merging entirely, and keep indexing speed at a
high level, while still making sure that searches will perform fairly
well when data amounts become big? (I guess that without merging you
end up with lots and lots of "small" files, and that this is not good
for search response time.)
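To put rough numbers on that worry, here is a sketch (my assumptions, not measurements: one segment per RAM-buffer flush, and ~100k docs per flush):

```java
// Rough sketch: segment count if merging is disabled entirely.
// Assumptions (illustrative only): each RAM-buffer flush creates one
// segment, and a flush happens every `docsPerFlush` documents.
public class NoMergeSegments {
    static long segmentsAfter(long totalDocs, long docsPerFlush) {
        // Without merging, every flushed segment stays forever.
        return (totalDocs + docsPerFlush - 1) / docsPerFlush; // ceiling division
    }

    public static void main(String[] args) {
        // 300M docs at an assumed 100k docs per flush -> 3000 segments,
        // every one of which each search must visit.
        System.out.println(segmentsAfter(300_000_000L, 100_000L));
    }
}
```

With thousands of live segments every query fans out across all of them, which is why some merging is usually kept even for write-heavy loads.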
Regards, Per Steffensen
Re: Storing/indexing speed drops quickly
Posted by Per Steffensen <st...@designware.dk>.
Maybe the fact that we are never ever going to delete or update
documents can be used for something. If we delete, we will delete
entire collections.
Regards, Per Steffensen
Re: Storing/indexing speed drops quickly
Posted by Per Steffensen <st...@designware.dk>.
Now running the tests on a slightly reduced setup (2 machines, quadcore,
8GB RAM, ...), but that doesn't matter.
We see that storing/indexing speed drops when using
IndexWriter.updateDocument in DirectUpdateHandler2.addDoc, but it does
not drop when just using IndexWriter.addDocument (update requests with
overwrite=false).
Using addDocument:
https://dl.dropboxusercontent.com/u/25718039/AddDocument_2Solr8GB_DocCount.png
Using updateDocument:
https://dl.dropboxusercontent.com/u/25718039/UpdateDocument_2Solr8GB_DocCount.png
We are not too happy about having to use addDocument, because that
allows for duplicates, and we would really like to avoid that (at the
Solr/Lucene level).
We have confirmed that doubling the total amount of RAM doubles the
number of documents in the index before indexing speed starts dropping
(when we use updateDocument).
On
https://dl.dropboxusercontent.com/u/25718039/UpdateDocument_2Solr8GB_DocCount.png
you can see that the speed drops at around 120M documents. Running the
same test with the Solr machines having 16GB RAM (instead of 8GB), the
speed drops at around 240M documents.
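Purely back-of-the-envelope arithmetic on those two data points suggests the slowdown hits when some per-document structure (around 70 bytes per doc) outgrows memory:

```java
// Back-of-the-envelope from the two observed data points:
// 8GB machine -> slowdown at ~120M docs, 16GB -> ~240M docs.
// The implied per-document memory cost is the same in both cases.
public class RamPerDoc {
    static double bytesPerDoc(double ramGb, double docsAtSlowdownMillions) {
        return ramGb * 1024 * 1024 * 1024 / (docsAtSlowdownMillions * 1_000_000);
    }

    public static void main(String[] args) {
        System.out.printf("%.0f%n", bytesPerDoc(8, 120));  // ~72 bytes/doc
        System.out.printf("%.0f%n", bytesPerDoc(16, 240)); // same ratio
    }
}
```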
Any comments on why indexing speed drops with IndexWriter.updateDocument
but not with IndexWriter.addDocument?
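For reference, my understanding of the semantic difference, sketched with a plain list rather than actual Lucene code (the Doc class below is illustrative only): addDocument appends blindly, while updateDocument is effectively a delete-by-id-term followed by an add, so each call pays a lookup/delete cost that grows with the index.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the semantic difference (NOT Lucene internals):
// addDocument appends blindly; updateDocument must first remove any
// earlier document with the same id.
public class UpdateVsAdd {
    static final class Doc {
        final String id;
        final String body;
        Doc(String id, String body) { this.id = id; this.body = body; }
    }

    final List<Doc> index = new ArrayList<>();

    void addDocument(Doc d) {
        index.add(d);                                 // duplicates possible
    }

    void updateDocument(Doc d) {
        index.removeIf(old -> old.id.equals(d.id));   // delete-by-term, then add
        index.add(d);
    }

    public static void main(String[] args) {
        UpdateVsAdd w = new UpdateVsAdd();
        w.addDocument(new Doc("1", "a"));
        w.addDocument(new Doc("1", "b"));
        System.out.println(w.index.size()); // 2 -> duplicates allowed
        w.updateDocument(new Doc("1", "c"));
        System.out.println(w.index.size()); // 1 -> deduplicated by id
    }
}
```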
Regards, Per Steffensen
On 9/12/13 10:14 AM, Per Steffensen wrote:
> Seems like the attachments didn't make it through to this mailing list
>
> https://dl.dropboxusercontent.com/u/25718039/doccount.png
> https://dl.dropboxusercontent.com/u/25718039/iowait.png
Re: Storing/indexing speed drops quickly
Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Fri, 2013-09-13 at 17:32 +0200, Shawn Heisey wrote:
> Put your OS and Solr itself on regular disks in RAID1 and your Solr data
> on the SSD. Due to the eventual decay caused by writes, SSD will
> eventually die, so be ready for SSD failures to take out shard replicas.
One of the very useful properties of wear-levelling on SSDs is that the
wear status of the drive can be queried; when the drive nears its EOL,
replace it.
As Lucene mainly uses bulk writes when updating the index, I will add
that wearing out an SSD by using it primarily for Lucene/Solr is pretty
hard to do, unless one constructs a pathological setup.
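To make that concrete, a rough endurance estimate - all three input numbers below (drive endurance rating, ingest rate, write amplification from merging) are assumptions for illustration, not measurements:

```java
// Rough SSD endurance estimate. Inputs are illustrative assumptions:
// a drive rated for 200TB written, an index ingesting 50GB/day of new
// data, and 10x write amplification from segment merging.
public class SsdLifetime {
    static double lifetimeDays(double enduranceTb, double ingestGbPerDay,
                               double writeAmplification) {
        double writtenGbPerDay = ingestGbPerDay * writeAmplification;
        return enduranceTb * 1024 / writtenGbPerDay;
    }

    public static void main(String[] args) {
        // 200TB / (50GB/day * 10) -> over a year even under this heavy load.
        System.out.printf("%.0f%n", lifetimeDays(200, 50, 10)); // ~410 days
    }
}
```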
Your failure argument is thus really a claim that SSDs are not reliable
technology. That is a fair argument, as there have been some really
rotten apples among the offerings. This is coupled with the fact that
it is still a very rapidly changing technology, which makes it hard to
pick an older, proven drive that is not markedly surpassed by the
bleeding edge.
> So far I'm not aware of any RAID solutions that offer TRIM support,
> and without TRIM support, an SSD eventually has performance problems.
Search speed is not affected, as "only" write performance suffers
without TRIM, but index update speed will be affected. Also, while it
is possible to get TRIM in RAID, there is currently only a single
hardware option:
http://www.anandtech.com/show/6161/intel-brings-trim-to-raid0-ssd-arrays-on-7series-motherboards-we-test-it
Regards,
- Toke Eskildsen, State and University Library, Denmark
Re: Storing/indexing speed drops quickly
Posted by Shawn Heisey <so...@elyograg.org>.
On 9/13/2013 12:03 AM, Per Steffensen wrote:
> What is it that will fill my heap? I am trying to avoid the FieldCache.
> For now, I am actually not doing any searches - focus on indexing for
> now - and certainly not group/facet/sort searches that will use the
> FieldCache.
I don't know exactly what makes up the heap when you have lots of
documents. I am not really using any RAM-hungry features, and I still
wouldn't be able to get away with a 4GB heap on my Solr servers.
Uncollectable (and collectable) RAM usage is heaviest during indexing.
I sort on one or two fields and we don't use facets.
Here's a screenshot of my index status page showing how big my indexes
are on each machine; it's a couple of months old now. These machines
have a 6GB heap, and I don't dare make it any smaller or I'll get OOM
errors during indexing. They have 64GB total RAM.
https://dl.dropboxusercontent.com/u/97770508/statuspagescreenshot.png
> More RAM will probably help, but only for a while. I want billions of
> documents in my collections - and also on each machine. Currently we are
> aiming 15 billion documents per month (500 million per day) and keep at
> least two years of data in the system. Currently we use one collection
> for each month, so when the system has been running for two years it
> will be 24 collections with 15 billion documents each. Indexing will
> only go on in the collection corresponding to the "current" month, but
> searching will (potentially) be across all 24 collections. The documents
> are very small. I know that 6 machines will not do in the long run -
> currently this is only testing - but number of machines should not be
> higher than about 20-40. In general it is a problem if Solr/Lucene will
> not perform fairly well if data does not fit RAM - then it cannot really
> be used for "big data". I would have to buy hundreds or even thousands
> of machines with 64GB+ RAM. That is not realistic.
To lower your overall RAM requirements, use SSDs, and store as little
data as possible - ideally only the id used to retrieve the document
from another source. You'll probably still want 10-25% of your index
size available for the OS disk cache. With regular disks, that's
50-100%.
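As arithmetic (the 100GB index size below is just an example value; the percentages are the rule of thumb above):

```java
// The disk-cache rule of thumb as arithmetic. The 100GB index size is
// an example value only.
public class DiskCacheSizing {
    static double cacheGbNeeded(double indexGb, int percent) {
        return indexGb * percent / 100.0;
    }

    public static void main(String[] args) {
        double indexGb = 100;
        // SSD: 10-25% of index size should fit in the OS disk cache.
        System.out.println(cacheGbNeeded(indexGb, 10) + " - "
                         + cacheGbNeeded(indexGb, 25));   // 10.0 - 25.0
        // Spinning disks: 50-100%.
        System.out.println(cacheGbNeeded(indexGb, 50) + " - "
                         + cacheGbNeeded(indexGb, 100));  // 50.0 - 100.0
    }
}
```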
Put your OS and Solr itself on regular disks in RAID1 and your Solr
data on the SSD. Due to the decay caused by writes, an SSD will
eventually die, so be ready for SSD failures to take out shard replicas.
So far I'm not aware of any RAID solutions that offer TRIM support,
and without TRIM support, an SSD eventually has performance problems.
Without RAID, a failure will take out that replica. That's one of the
points of SolrCloud - having replicas so single failures don't bring
down your index.
If you can't use SSD or get tons of RAM, you're going to have
performance problems. Solr (and any other Lucene-based search product)
does really well with super-large indexes if you have the system
resources available. If you don't, it sucks.
Thanks,
Shawn
Re: Storing/indexing speed drops quickly
Posted by Per Steffensen <st...@designware.dk>.
On 9/12/13 4:26 PM, Shawn Heisey wrote:
> On 9/12/2013 2:14 AM, Per Steffensen wrote:
>>> Starting from an empty collection. Things are fine wrt
>>> storing/indexing speed for the first two-three hours (100M docs per
>>> hour), then speed goes down dramatically, to an, for us, unacceptable
>>> level (max 10M per hour). At the same time as speed goes down, we see
>>> that I/O wait increases dramatically. I am not 100% sure, but quick
>>> investigation has shown that this is due to almost constant merging.
> While constant merging is contributing to the slowdown, I would guess
> that your index is simply too big for the amount of RAM that you have.
> Let's ignore for a minute that you're distributed and just concentrate
> on one machine.
>
> After three hours of indexing, you have nearly 300 million documents.
> If you have a replicationFactor of 1, that's still 50 million documents
> per machine. If your replicationFactor is 2, you've got 100 million
> documents per machine. Let's focus on the smaller number for a minute.
replicationFactor is 1, so that is about 50 million docs per machine at
this point
>
> 50 million documents in an index, even if they are small documents, is
> probably going to result in an index size of at least 20GB, and quite
> possibly larger. In order to make Solr function with that many
> documents, I would guess that you have a heap that's at least 4GB in size.
Currently I have a 2.5GB heap on the 8GB machines - to leave something
for the OS cache.
>
> With only 8GB on the machine, this doesn't leave much RAM for the OS
> disk cache. If we assume that you have 4GB left for caching, then I
> would expect to see problems about the time your per-machine indexes hit
> 15GB in size. If you are making it beyond that with a total of 300
> million documents, then I am impressed.
>
> Two things are going to happen when you have enough documents: 1) You
> are going to fill up your Java heap and Java will need to do frequent
> collections to free up enough RAM for normal operation. When this
> problem gets bad enough, the frequent collections will be *full* GCs,
> which are REALLY slow.
What is it that will fill my heap? I am trying to avoid the FieldCache.
For now, I am actually not doing any searches - focus on indexing for
now - and certainly not group/facet/sort searches that will use the
FieldCache.
> 2) The index will be so big that the OS disk
> cache cannot effectively cache it. I suspect that the latter is more of
> the problem, but both might be happening at nearly the same time.
>
> When dealing with an index of this size, you want as much RAM as you can
> possibly afford. I don't think I would try what you are doing without
> at least 64GB per machine, and I would probably use at least an 8GB heap
> on each one, quite possibly larger. With a heap that large, extreme GC
> tuning becomes a necessity.
More RAM will probably help, but only for a while. I want billions of
documents in my collections - also on each machine. Currently we are
aiming at 15 billion documents per month (500 million per day), keeping
at least two years of data in the system. We use one collection for
each month, so when the system has been running for two years it will
be 24 collections with 15 billion documents each. Indexing will only go
on in the collection corresponding to the "current" month, but
searching will (potentially) be across all 24 collections. The
documents are very small. I know that 6 machines will not do in the
long run - currently this is only testing - but the number of machines
should not be higher than about 20-40. In general it is a problem if
Solr/Lucene does not perform fairly well when data does not fit in RAM -
then it cannot really be used for "big data". I would have to buy
hundreds or even thousands of machines with 64GB+ RAM. That is not
realistic.
Thanks!
Re: Storing/indexing speed drops quickly
Posted by Shawn Heisey <so...@elyograg.org>.
On 9/12/2013 2:14 AM, Per Steffensen wrote:
>> Starting from an empty collection. Things are fine wrt
>> storing/indexing speed for the first two-three hours (100M docs per
>> hour), then speed goes down dramatically, to an, for us, unacceptable
>> level (max 10M per hour). At the same time as speed goes down, we see
>> that I/O wait increases dramatically. I am not 100% sure, but quick
>> investigation has shown that this is due to almost constant merging.
While constant merging is contributing to the slowdown, I would guess
that your index is simply too big for the amount of RAM that you have.
Let's ignore for a minute that you're distributed and just concentrate
on one machine.
After three hours of indexing, you have nearly 300 million documents.
If you have a replicationFactor of 1, that's still 50 million documents
per machine. If your replicationFactor is 2, you've got 100 million
documents per machine. Let's focus on the smaller number for a minute.
50 million documents in an index, even if they are small documents, is
probably going to result in an index size of at least 20GB, and quite
possibly larger. In order to make Solr function with that many
documents, I would guess that you have a heap that's at least 4GB in size.
With only 8GB on the machine, this doesn't leave much RAM for the OS
disk cache. If we assume that you have 4GB left for caching, then I
would expect to see problems about the time your per-machine indexes hit
15GB in size. If you are making it beyond that with a total of 300
million documents, then I am impressed.
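The sizing arithmetic above, spelled out (the heap size and cache split are the guesses from the text, not measured values):

```java
// The per-machine sizing arithmetic, spelled out. Heap size and cache
// split are guesses from the discussion, not measurements.
public class PerMachineMath {
    static long docsPerMachine(long totalDocs, int machines, int replicationFactor) {
        return totalDocs / machines * replicationFactor;
    }

    public static void main(String[] args) {
        // ~3 hours of indexing at 100M docs/hour, replicationFactor 1:
        System.out.println(docsPerMachine(300_000_000L, 6, 1)); // 50,000,000

        int totalRamGb = 8, heapGb = 4;     // heap size is a guess
        int cacheGb = totalRamGb - heapGb;  // what's left for the OS disk cache
        // Trouble is expected once the index far outgrows this cache;
        // the estimate above puts that near a 15GB per-machine index.
        System.out.println(cacheGb);        // 4
    }
}
```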
Two things are going to happen when you have enough documents: 1) You
are going to fill up your Java heap and Java will need to do frequent
collections to free up enough RAM for normal operation. When this
problem gets bad enough, the frequent collections will be *full* GCs,
which are REALLY slow. 2) The index will be so big that the OS disk
cache cannot effectively cache it. I suspect that the latter is more of
the problem, but both might be happening at nearly the same time.
When dealing with an index of this size, you want as much RAM as you can
possibly afford. I don't think I would try what you are doing without
at least 64GB per machine, and I would probably use at least an 8GB heap
on each one, quite possibly larger. With a heap that large, extreme GC
tuning becomes a necessity.
To cut down on the amount of merging, I go with a fairly large
mergeFactor - but mergeFactor is basically deprecated for
TieredMergePolicy; there's a new way to configure it now. Here are the
indexConfig settings that I use on my dev server:
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
  <ramBufferSizeMB>48</ramBufferSizeMB>
  <infoStream file="INFOSTREAM-${solr.core.name}.txt">false</infoStream>
</indexConfig>
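One way to see why larger tiers reduce merge I/O: in a logarithmic merge scheme each document is rewritten roughly log_m(N/flushSize) times before the index stabilizes. A rough sketch (my own simplified model, not TieredMergePolicy's exact algorithm; the flush size is an assumption):

```java
// Simplified model (not TieredMergePolicy's exact algorithm): in a
// logarithmic merge scheme each doc is rewritten about
// log_m(totalDocs / docsPerFlush) times, so a larger merge width m
// means fewer rewrite passes (less merge I/O) but more live segments.
public class MergePasses {
    static double rewritePasses(double totalDocs, double docsPerFlush, double m) {
        return Math.log(totalDocs / docsPerFlush) / Math.log(m);
    }

    public static void main(String[] args) {
        // 300M docs, assumed 100k docs per flush:
        System.out.printf("%.1f%n", rewritePasses(300e6, 100e3, 10)); // ~3.5 passes
        System.out.printf("%.1f%n", rewritePasses(300e6, 100e3, 35)); // ~2.3 passes
    }
}
```

This is the trade-off the settings above make: 35-wide tiers cut merge traffic at the cost of more segments per search.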
Thanks,
Shawn
Re: Storing/indexing speed drops quickly
Posted by Per Steffensen <st...@designware.dk>.
Seems like the attachments didn't make it through to this mailing list:
https://dl.dropboxusercontent.com/u/25718039/doccount.png
https://dl.dropboxusercontent.com/u/25718039/iowait.png