Posted to solr-user@lucene.apache.org by Bhaumik Joshi <bj...@asite.com> on 2016/04/11 13:23:06 UTC

Solr Sharding Strategy

Hi,



We are using Solr 5.2.0 and our scenario is both index-heavy (100 index updates per sec) and query-heavy (100 queries per sec).

Index stats: 10 million documents and 16 GB index size



Which sharding strategy is best suited to the above scenario?

Please share any reference resources that give a detailed comparison of single-shard versus multi-shard setups.



Meanwhile we ran some tests with SolrMeter (a standalone Java tool for stress-testing Solr) on a single shard and on two shards.

Index stats of the test SolrCloud: 0.7 million documents and 1 GB index size.

As observed in the tests, the average query time with 2 shards is much higher than with a single shard.

Please find the detailed readings below:
2 Shards

Metric                      Run 1      Run 2
--------------------------  ---------  ---------
Intended queries per sec    10         25
Actual queries per min      198        168
Actual queries per sec      3.3        2.8
Intended updates per sec    10         25
Actual updates per min      600        1314
Actual updates per sec      10         21.9
Total Queries               302        301
Total Q time (ms)           662176     2019735
Avg Q Time (ms)             2192       6710
Avg Q Time (sec)            2.192      6.71
Total Client time (ms)      756603     2370018
Avg Client time (ms)        2505       7873

1 Shard

Metric                      Run 1      Run 2
--------------------------  ---------  ---------
Intended queries per sec    10         25
Actual queries per min      582        1026
Actual queries per sec      9.7        17.1
Intended updates per sec    10         25
Actual updates per min      618        834
Actual updates per sec      10.3       13.9
Total Queries               302        306
Total Q time (ms)           25081      33366
Avg Q Time (ms)             83         109
Avg Q Time (sec)            0.083      0.109
Total Client time (ms)      55612      259392
Avg Client time (ms)        184        847

Note: The query returns 250 rows and matches 57880 documents.





Thanks & Regards,



Bhaumik Joshi
Developer




Re: Solr Sharding Strategy

Posted by Bhaumik Joshi <bh...@outlook.com>.
OK, I will try pausing the indexing completely and check the impact.

In the performance test, the queries were issued sequentially.

Thanks & Regards,
Bhaumik Joshi

Re: Solr Sharding Strategy

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Tue, 2016-04-12 at 05:57 +0000, Bhaumik Joshi wrote:

> //Insert Document
> UpdateResponse resp = cloudServer.add(doc, 1000);
> 
Don't insert documents one at a time, if it can be avoided:
https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/
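
As a rough illustration of the batching advice, here is a minimal sketch that splits a list of pending documents into fixed-size batches. The class name BatchSketch, the helper partition and the batch size of 1000 are hypothetical and not from the thread, and plain integers stand in for documents so the sketch runs without SolrJ. With SolrJ, each batch would be sent in one request, for example through the add overload that takes a collection of SolrInputDocument plus a commitWithin value.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Integers stand in for documents (assumption made for this sketch).
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);
        }
        for (List<Integer> batch : partition(docs, 1000)) {
            // With SolrJ, one request per batch would go here instead of
            // one request per document, e.g. cloudServer.add(batch, 1000)
            // for a Collection<SolrInputDocument>.
            System.out.println("would send batch of " + batch.size());
        }
    }
}
```

The batch size is a tuning knob rather than a recommendation; the linked article discusses how throughput changes with it.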


Try pausing the indexing fully when you do your query test, to check how
big the impact of indexing is.

When you run your query performance test, are the queries issued
sequentially or in parallel?
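
To make the distinction concrete, here is a minimal sketch of the two ways a test client can issue queries. Everything in it is hypothetical: QueryLoadSketch, runSequentially, runInParallel and fakeQuery are illustration names, and fakeQuery only sleeps for 50 ms instead of calling Solr. It is not how SolrMeter works internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class QueryLoadSketch {
    /** Runs the queries one after another on the calling thread. */
    public static <T> List<T> runSequentially(List<Supplier<T>> queries) {
        List<T> results = new ArrayList<>();
        for (Supplier<T> query : queries) {
            results.add(query.get());
        }
        return results;
    }

    /** Runs the queries on a fixed pool of worker threads; results keep input order. */
    public static <T> List<T> runInParallel(List<Supplier<T>> queries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<T>> futures = new ArrayList<>();
            for (Supplier<T> query : queries) {
                futures.add(CompletableFuture.supplyAsync(query, pool));
            }
            List<T> results = new ArrayList<>();
            for (CompletableFuture<T> future : futures) {
                results.add(future.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    /** Stand-in for a real query: sleeps 50 ms and returns its id. */
    static int fakeQuery(int id) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return id;
    }

    public static void main(String[] args) {
        List<Supplier<Integer>> queries = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final int id = i;
            queries.add(() -> fakeQuery(id));
        }
        long t0 = System.nanoTime();
        runSequentially(queries);
        long t1 = System.nanoTime();
        runInParallel(queries, 4);
        long t2 = System.nanoTime();
        System.out.println("sequential ms: " + (t1 - t0) / 1_000_000);
        System.out.println("parallel ms:   " + (t2 - t1) / 1_000_000);
    }
}
```

With sequential issuing, throughput is capped by per-query latency, so the two modes can give very different "actual queries per sec" figures for the same server.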


- Toke Eskildsen, State and University Library, Denmark



Re: Solr Sharding Strategy

Posted by Bhaumik Joshi <bh...@outlook.com>.
Please note that all caches are disabled in the mentioned test.


With 2 shards: intended queries and updates = 10 per sec each; actual queries per sec = 3.3; actual updates per sec = 10. For 302 queries, the average query time is 2192 ms.

With 1 shard: intended queries and updates = 10 per sec each; actual queries per sec = 9.7; actual updates per sec = 10.3. For 302 queries, the average query time is 83 ms.

We do a soft commit whenever we insert or update a document.

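import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.response.UpdateResponse;

// Insert document: the second argument is commitWithin in milliseconds.
UpdateResponse resp = cloudServer.add(doc, 1000);
if (resp.getStatus() == 0) {
    success = true;
}

// Update documents (separate snippet): the same 1000 ms commitWithin,
// set on the request instead of passed to add().
UpdateRequest req = new UpdateRequest();
req.setCommitWithin(1000);
req.add(docs);
UpdateResponse resp = req.process(cloudServer);
if (resp.getStatus() == 0) {
    success = true;
}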

Here are the commit settings in solrconfig.xml:

<autoCommit>
  <maxTime>600000</maxTime>
  <maxDocs>20000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>



Thanks & Regards,

Bhaumik Joshi

Re: Solr Sharding Strategy

Posted by Daniel Collins <da...@gmail.com>.
I'd also ask about your indexing times: what QTime do you see for indexing
(in both scenarios), and what commit times are you using (which Toke
already asked)?

Not entirely sure how to read your table, but looking at the indexing side
of things, with 2 shards there is inherently more work to do, so you would
expect indexing latency to increase (we have to index in 1 shard, and then
index in the 2nd shard, so logically it's twice the workload).

Your table suggests you managed 10 updates per second, but you never
managed 25 updates per second with either 1 shard or 2 shards. The numbers
don't quite make sense, though: you managed 13.9 updates per sec on 1 shard
and 21.9 updates per sec on 2 shards. That suggests to me that in the single
shard case your searches are causing your indexing to throttle; maybe the
resourcing is favoring searches, so the indexing threads aren't getting
a look in. In the 2 shard case, it seems clear (as Toke said) that
search isn't really hitting the index much. I'm not sure where the
bottleneck is, but it's not on the index, which is why your indexing load
can get more requests through.


Re: Solr Sharding Strategy

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2016-04-11 at 11:23 +0000, Bhaumik Joshi wrote:
> We are using solr 5.2.0 and we have Index-heavy (100 index updates per
> sec) and Query-heavy (100 queries per sec) scenario.

> Index stats: 10 million documents and 16 GB index size

> Which sharding strategy is best suited in above scenario?

Sharding reduces query throughput and can improve query latency as well
as indexing speed. For small indexes, the overhead of sharding is likely
to worsen query latency. So as always, it depends.

Qualified guess: Don't use multiple shards, but consider using replicas.

> Please share reference resources which states detailed comparison of
> single shard over multi shard if any.

Sorry, could not find the one I had in mind.
> 
> Meanwhile we did some tests with SolrMeter (Standalone java tool for
> stress tests with Solr) for single shard and two shards.
> 
> Index stats of test solr cloud: 0.7 million documents and 1 GB index
> size.
> 
> As observed in test average query time with 2 shards is much higher
> than single shard.

Makes sense: Your shards are so small that the actual time spent on the
queries is very low. So relatively, the overhead of distributed (aka
multi-shard) searching is high, negating any search-gain you got by
sharding. I would not have expected the performance drop-off to be that
large (factor 20-60) though.
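
For reference, the drop-off factor can be recomputed from the "Avg Q Time (ms)" readings in the original post: 2192 ms against 83 ms at the lower load, and 6710 ms against 109 ms at the higher load, which works out to roughly 26 and roughly 62. A minimal sketch of that arithmetic follows; the class and method names are hypothetical.

```java
import java.util.Locale;

public class SlowdownFactor {
    /** Ratio of multi-shard to single-shard average query time. */
    public static double factor(double multiShardAvgMs, double singleShardAvgMs) {
        return multiShardAvgMs / singleShardAvgMs;
    }

    public static void main(String[] args) {
        // Avg Q Time (ms) values taken from the readings in the original post.
        System.out.println(String.format(Locale.ROOT,
                "lower load (10/sec intended):  %.1fx", factor(2192, 83)));
        System.out.println(String.format(Locale.ROOT,
                "higher load (25/sec intended): %.1fx", factor(6710, 109)));
    }
}
```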

Your query speed is unusually low for an index of your size, which leads
me to believe that your indexing is slowing everything down. This is
often due to too frequent commits and/or too many warm up queries.

There is a bit about it at 
https://wiki.apache.org/solr/SolrPerformanceFactors


- Toke Eskildsen, State and University Library, Denmark