Posted to solr-user@lucene.apache.org by Prasad Tendulkar <pr...@cumulus-systems.com> on 2017/12/27 10:18:52 UTC

Scaling issue with Solr

Hello All,

We have been building a Solr-based solution to hold a large amount of data (approx. 4 TB/day, or more than 24 billion documents per day). We are developing a small-scale prototype to evaluate Solr performance gradually. Here is our setup configuration:

Solr cloud:
node1: 16 GB RAM, 8 Core CPU, 1TB disk
node2: 16 GB RAM, 8 Core CPU, 1TB disk

Zookeeper is also installed on the above 2 machines in cluster mode.
Solr commit intervals: soft commit 3 minutes, hard commit 15 seconds
Schema: basic configuration; 5 fields indexed (one of which is text_general), 6 fields stored
Collection: 12 shards (6 per node)
Heap memory: 4 GB per node
Disk cache: 12 GB per node
Each document is a syslog message.

Documents are ingested into Solr from different nodes; 12 SolrJ clients push data into the Solr cloud.
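As an illustration of such a client (a hypothetical sketch; the collection name, field names, and ZooKeeper addresses are assumptions, not taken from the original post), each SolrJ ingester might batch documents like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SyslogIngester {
    public static void main(String[] args) throws Exception {
        // ZooKeeper ensemble on the two nodes described above (ports assumed)
        List<String> zkHosts = List.of("node1:2181", "node2:2181");
        try (CloudSolrClient client =
                 new CloudSolrClient.Builder(zkHosts, Optional.empty()).build()) {
            client.setDefaultCollection("syslog");  // hypothetical collection name
            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "msg-" + i);
                doc.addField("message", "example syslog line " + i);
                batch.add(doc);
            }
            // one add per batch; rely on server-side autoCommit rather than
            // committing explicitly from the client
            client.add(batch);
        }
    }
}
```

Batching adds and leaving commits to the server avoids per-document round trips and client-side commit storms; it requires a running SolrCloud cluster to execute.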

We experience issues when we keep the setup running for a long time, after the index grows to around 100 GB (i.e. around 600 million documents). Note that we are only indexing the data, not querying it, so there should be no query overhead. From VM analysis we found that over time the disk operations start declining, and so do the CPU, RAM, and network usage of the Solr nodes. We concluded that Solr is unable to handle one big collection due to index read/write overhead, and that most of the time it ends up doing only commits (evident in the Solr logs), which hampers indexing.

So we tried creating several small collections instead of one big collection, anticipating that commit performance might improve. But eventually performance degrades even then, and we observe more or less similar charts for CPU, memory, disk, and network.

To put forth some stats, here are the numbers of documents processed each hour:

1st hour: 250 million
2nd hour: 250 million
3rd hour: 240 million
4th hour: 200 million
.
.
11th hour: 80 million

Could you please help us identify the root cause of the performance degradation? Are we doing something wrong with the Solr configuration, or with the collections/sharding? Due to this degradation we are currently stuck with Solr.

Thank you very much in advance.

Prasad Tendulkar



Re: Scaling issue with Solr

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/27/2017 3:15 PM, Damien Kamerman wrote:
> You seem to have the soft and hard commits the wrong way around. Hard
> commit is more expensive.

That is only the case if the hard commit has openSearcher set to true.

It is strongly recommended that ALL users enable autoCommit with
openSearcher set to false.  The example
configs in recent versions have it enabled, set to 15 seconds, with
openSearcher set to false.  I personally would set it to 60-120 seconds,
just because I would like the server to be doing less work.  My servers
are not usually index-heavy, though -- so my restarts are still fast
even with a 60 second autoCommit.

The default for openSearcher is true, so I think that if autoCommit
doesn't specify it, then it is enabled.  The default for autoCommit
should likely change in 8.0, though I do not think that the default for
the commit operation itself should change.

In some situations, soft commits are just as expensive as hard commits
when openSearcher=true.  The really expensive part of a commit is
opening a new searcher, which soft commit does, so in the recommended
config, autoCommit will be much faster than autoSoftCommit.
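As a sketch of that recommended configuration (the intervals are illustrative), the relevant solrconfig.xml fragment would look something like this:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes and fsyncs, but does not open a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: opens the new searcher, so keep it much less frequent -->
  <autoSoftCommit>
    <maxTime>180000</maxTime>
  </autoSoftCommit>
</updateHandler>
```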

Thanks,
Shawn


Re: Scaling issue with Solr

Posted by Damien Kamerman <da...@gmail.com>.
You seem to have the soft and hard commits the wrong way around. Hard
commit is more expensive.


Re: Scaling issue with Solr

Posted by Walter Underwood <wu...@wunderwood.org>.
Why are you using Solr for log search? Elasticsearch is widely used for log search and has the best infrastructure for that.

For the past few years, it looks like a natural market segmentation is happening, with Solr used for product search and ES for log search. By now, I would not expect Solr to keep up with ES in log search features. Likewise, I would not expect ES to keep up with Solr for product and text search features.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: Scaling issue with Solr

Posted by Erick Erickson <er...@gmail.com>.
You are probably hitting more and more background merging, which will
slow things down. Your system looks to be severely undersized for this
scale.

One thing you can try (and I emphasize I haven't prototyped this) is
to increase your ramBufferSizeMB setting in solrconfig.xml significantly.
By default, Solr won't merge segments to greater than 5G, so
theoretically you could just set your ramBufferSizeMB to that figure
and avoid merging altogether. Or you could try configuring the
NoMergePolicy in solrconfig.xml (but beware that you're going to
create a lot of segments unless you set ramBufferSizeMB higher).
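As a sketch of those suggestions (the values are illustrative and untested at this scale), both knobs live in the <indexConfig> section of solrconfig.xml:

```xml
<indexConfig>
  <!-- flush segments near the 5GB default maximum merged segment size -->
  <ramBufferSizeMB>5120</ramBufferSizeMB>

  <!-- alternatively, disable background merging entirely -->
  <!-- <mergePolicyFactory class="org.apache.solr.index.NoMergePolicyFactory"/> -->
</indexConfig>
```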

I frankly have no data on how this will affect your indexing
throughput. You can see from numbers like these, though, that a 4G
heap is much too small.

Best,
Erick
