Posted to solr-user@lucene.apache.org by Tri Nguyen <tr...@yahoo.com> on 2010/12/19 10:07:33 UTC

shard versus core

Hi,

I was wondering about the pros and cons of using sharding versus cores.

An index can be split up into multiple cores or multiple shards.

So why one over the other?

Thanks,


tri

Re: shard versus core

Posted by Lance Norskog <go...@gmail.com>.
Optimizing requires 2x the index size in disk space.

Things that increase with index size: indexing time, query time and
disk index size. My 500GB index at a previous job worked. Indexing was
a little slow; queries were much slower. What finally made us split it
up was that one binary blob of 500GB was too much to manage: backup,
optimize, etc. It was the IT side that made it impossible. Lucene & Solr
worked fine.
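The 2x rule of thumb mentioned here can be turned into a quick pre-optimize sanity check. This is a minimal sketch, assuming the rule means the disk's total capacity should be at least twice the current index size (the thread doesn't say whether that means total or free space). `index_size_bytes`, `has_optimize_headroom`, and `check_index_dir` are hypothetical helper names, and the headroom an optimize actually needs may differ.

```python
import os
import shutil


def index_size_bytes(index_dir: str) -> int:
    """Sum the sizes of all files under an index directory."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def has_optimize_headroom(index_bytes: int, disk_bytes: int, factor: float = 2.0) -> bool:
    """Assumed rule of thumb: disk capacity >= factor x index size."""
    return disk_bytes >= factor * index_bytes


def check_index_dir(index_dir: str) -> bool:
    """Compare the index size with the total capacity of the disk holding it."""
    usage = shutil.disk_usage(index_dir)
    return has_optimize_headroom(index_size_bytes(index_dir), usage.total)


if __name__ == "__main__":
    gb = 1024 ** 3
    # A 500GB index would need roughly 1000GB of disk under this rule.
    print(has_optimize_headroom(500 * gb, 1000 * gb))  # True
    print(has_optimize_headroom(500 * gb, 750 * gb))   # False
```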

On Mon, Dec 20, 2010 at 4:53 AM, Tri Nguyen <tr...@yahoo.com> wrote:
> I thought about it some more and did some reading. I suppose the answer depends on what response time is considered good enough.
>
> I can do some stress testing and see whether disk I/O becomes the bottleneck as the index grows. I can also look into optimizing/configuring Solr parameters to help performance. One thing I've read is that my disk should be at least twice the size of the index.
>
>
>
>
> --- On Mon, 12/20/10, Tri Nguyen <tr...@yahoo.com> wrote:
>
>
> From: Tri Nguyen <tr...@yahoo.com>
> Subject: Re: shard versus core
> To: solr-user@lucene.apache.org
> Date: Monday, December 20, 2010, 4:04 AM
>
>
> Hi Erick,
>
> Thanks for the explanation.
>
> At what point does the index get big enough to affect performance, so that sharding becomes appropriate?
>
> Tri
>
> --- On Sun, 12/19/10, Erick Erickson <er...@gmail.com> wrote:
>
>
> From: Erick Erickson <er...@gmail.com>
> Subject: Re: shard versus core
> To: solr-user@lucene.apache.org
> Date: Sunday, December 19, 2010, 7:36 AM
>
>
> Well, they can be different beasts. First of all, different cores can have
> different schemas, which is not true of shards. Also, shards are almost
> always assumed to be running on different machines as a scaling technique,
> whereas multiple cores run on a single Solr instance.
>
> So using multiple cores is very similar to running multiple "virtual" Solr
> servers on a single machine, each independent of the others. This can make
> sense if, for instance, you want to have a bunch of small indexes all
> on one machine. You could use multiple cores rather than multiple
> instances of Solr. These indexes may or may not have anything to do with
> each other.
>
> Sharding, on the other hand, is almost always used to split a single logical
> index up amongst multiple machines in order to improve performance. The
> assumption usually is that the index is too big to give satisfactory performance
> on a single machine, so you'll split it into parts. That assumption really
> implies that it makes no sense to put multiple shards on the #same# machine.
>
> So really, the answer to your question is that you choose the right
> technique for the problem you're trying to solve. They aren't really
> different solutions to the same problem...
>
> Hope this helps.
> Erick
>
> On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen <tr...@yahoo.com> wrote:
>
>> Hi,
>>
>> I was wondering about the pros and cons of using sharding versus cores.
>>
>> An index can be split up into multiple cores or multiple shards.
>>
>> So why one over the other?
>>
>> Thanks,
>>
>>
>> tri
>



-- 
Lance Norskog
goksron@gmail.com

Re: shard versus core

Posted by Tri Nguyen <tr...@yahoo.com>.
I thought about it some more and did some reading. I suppose the answer depends on what response time is considered good enough.

I can do some stress testing and see whether disk I/O becomes the bottleneck as the index grows. I can also look into optimizing/configuring Solr parameters to help performance. One thing I've read is that my disk should be at least twice the size of the index.
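One way to run that kind of test is to record query latency percentiles at several index sizes and watch how they move. A minimal sketch, assuming you supply a `run_query` callable (hypothetical; for example, something that sends one HTTP request to your Solr select handler). It only measures end-to-end latency, so deciding whether disk I/O is the cause still means watching the operating system's own disk statistics while it runs.

```python
import math
import time
from typing import Callable, Dict, List


def measure_latencies(run_query: Callable[[], object], n: int) -> List[float]:
    """Call run_query n times; return each call's wall-clock latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile, for p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]


def summarize(samples: List[float]) -> Dict[str, float]:
    """p50/p95/max: record these at each index size and compare the runs."""
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "max": max(samples),
    }


# The lambda is a stand-in workload; replace it with a real query against Solr.
print(summarize(measure_latencies(lambda: sum(range(10_000)), n=200)))
```

Repeating the same run after each batch of documents is indexed gives a rough curve of p95 latency against index size, which is one way to turn "good enough" into a number.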
 
 



Re: shard versus core

Posted by Tri Nguyen <tr...@yahoo.com>.
Hi Erick,
 
Thanks for the explanation.
 
At what point does the index get big enough to affect performance, so that sharding becomes appropriate?
 
Tri


Re: shard versus core

Posted by Erick Erickson <er...@gmail.com>.
Well, they can be different beasts. First of all, different cores can have
different schemas, which is not true of shards. Also, shards are almost
always assumed to be running on different machines as a scaling technique,
whereas multiple cores run on a single Solr instance.

So using multiple cores is very similar to running multiple "virtual" Solr
servers on a single machine, each independent of the others. This can make
sense if, for instance, you want to have a bunch of small indexes all
on one machine. You could use multiple cores rather than multiple
instances of Solr. These indexes may or may not have anything to do with
each other.
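Because each core is addressed under its own path on the same Solr instance, talking to a core is much like talking to a separate server. A minimal sketch that builds per-core request URLs, assuming a Solr 1.4-style multicore setup at `http://localhost:8983/solr` with the CoreAdmin handler at the default `adminPath` of `/admin/cores`. The core names `products` and `articles` are hypothetical.

```python
from urllib.parse import urlencode


def core_select_url(base: str, core: str, query: str) -> str:
    """Each core answers under its own path: <base>/<core>/select."""
    return f"{base.rstrip('/')}/{core}/select?{urlencode({'q': query})}"


def core_status_url(base: str) -> str:
    """CoreAdmin STATUS request listing all cores (assumes adminPath="/admin/cores")."""
    return f"{base.rstrip('/')}/admin/cores?{urlencode({'action': 'STATUS'})}"


base = "http://localhost:8983/solr"
# Two unrelated indexes, possibly with different schemas, on one instance.
print(core_select_url(base, "products", "name:ipod"))
# http://localhost:8983/solr/products/select?q=name%3Aipod
print(core_select_url(base, "articles", "body:solr"))
# http://localhost:8983/solr/articles/select?q=body%3Asolr
print(core_status_url(base))
# http://localhost:8983/solr/admin/cores?action=STATUS
```

A query sent to `products` never sees documents in `articles`; combining them requires an explicit distributed request.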

Sharding, on the other hand, is almost always used to split a single logical
index up amongst multiple machines in order to improve performance. The
assumption usually is that the index is too big to give satisfactory
performance
on a single machine, so you'll split it into parts. That assumption really
implies that it makes no sense to put multiple shards on the #same# machine.
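Conceptually, a distributed query goes to every shard, and the per-shard top hits are merged into one ranked list. The merge step can be sketched in plain Python as shown here. This is an illustration, not Solr's implementation: Solr's distributed search runs in several phases and fetches stored fields for the winning documents afterwards. Also, scores are computed per shard, so term statistics can differ between shards. The hit lists are made-up (score, doc_id) pairs.

```python
import heapq
import itertools
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def merge_shard_results(per_shard: List[List[Hit]], rows: int) -> List[Hit]:
    """Merge per-shard hit lists into one global top-`rows` list.

    Each input list must already be sorted by descending score, and each shard
    must have been asked for `rows` hits, because the global top `rows` could
    all come from a single shard.
    """
    merged = heapq.merge(*per_shard, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, rows))


shard_a = [(9.1, "a1"), (4.0, "a2")]
shard_b = [(7.5, "b1"), (6.2, "b2"), (1.0, "b3")]
print(merge_shard_results([shard_a, shard_b], rows=3))
# [(9.1, 'a1'), (7.5, 'b1'), (6.2, 'b2')]
```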

So really, the answer to your question is that you choose the right
technique for the problem you're trying to solve. They aren't really
different solutions to the same problem...

Hope this helps.
Erick


Re: shard versus core

Posted by Shawn Heisey <so...@elyograg.org>.
On 12/19/2010 2:07 AM, Tri Nguyen wrote:
> I was wondering about the pros and cons of using sharding versus cores.
>
> An index can be split up into multiple cores or multiple shards.
>
> So why one over the other?

If you split your index into multiple cores, you still have to use the 
shards parameter to tell Solr where to find the parts.  You can use 
multiple servers, multiple cores, or even both.  Which method to use 
depends on why you've decided to split your index into multiple pieces.
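Querying a split index through the `shards` parameter, whether the pieces are cores, servers, or both, can be sketched as a URL builder. A minimal sketch, assuming the legacy (pre-SolrCloud) distributed search parameter, whose entries are `host:port/path` without an `http://` prefix. The host and core names are hypothetical. The request can be sent to any one of the cores, which then queries every listed shard and merges the results.

```python
from typing import List
from urllib.parse import urlencode


def distributed_select_url(entry: str, shards: List[str], query: str, rows: int = 10) -> str:
    """Build a select URL that fans the query out to every listed shard.

    Shard entries are host:port/path with no http:// prefix.
    """
    params = {"q": query, "rows": rows, "shards": ",".join(shards)}
    return f"{entry.rstrip('/')}/select?{urlencode(params)}"


# Mixing both approaches: two cores on solr1 plus one core on solr2.
shards = [
    "solr1:8983/solr/core0",
    "solr1:8983/solr/core1",
    "solr2:8983/solr/core0",
]
print(distributed_select_url("http://solr1:8983/solr/core0", shards, "title:lucene"))
```

`urlencode` percent-encodes the colons, slashes, and commas in the shard list; Solr decodes them as it would any other request parameter.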

If the primary motivating factor is index size, you'll probably want to 
use separate servers.  Unless the only reason for distributed search is 
making the build process easier (or possible), I personally would not have 
multiple "live" cores on a single machine.  An example where multiple 
cores per server is entirely appropriate (creating a new core every five 
minutes):

http://www.loggly.com/2010/08/our-solr-system/

I went to this guy's talk at Lucene Revolution.  Amazing stuff.
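A time-partitioned setup like the one linked above needs a rule for naming each new core and a way to create it. This sketch is hypothetical and is not Loggly's actual scheme (the linked post describes theirs). It names a core after the UTC five-minute bucket a timestamp falls into and builds a CoreAdmin `CREATE` request for it. It assumes the default `/admin/cores` admin path and a shared instance directory (`events_template`, hypothetical) that already contains `conf/`, with a separate `dataDir` per core.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def bucket_core_name(ts: datetime, minutes: int = 5, prefix: str = "events") -> str:
    """Name a core after the start of the `minutes`-wide bucket containing ts.

    ts must be timezone-aware; it is converted to UTC first. minutes should divide 60.
    """
    ts = ts.astimezone(timezone.utc)
    start = ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
    return f"{prefix}_{start:%Y%m%d%H%M}"


def create_core_url(base: str, name: str, instance_dir: str) -> str:
    """CoreAdmin CREATE request (assumes adminPath="/admin/cores").

    All cores share one instance directory but get their own dataDir,
    so each five-minute index stays separate on disk.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "instanceDir": instance_dir,
        "dataDir": f"data_{name}",
    }
    return f"{base.rstrip('/')}/admin/cores?{urlencode(params)}"


name = bucket_core_name(datetime(2010, 12, 19, 10, 7, 33, tzinfo=timezone.utc))
print(name)  # events_201012191005
print(create_core_url("http://localhost:8983/solr", name, "events_template"))
```

Each newly created core would then be added to the `shards` list used for searching.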

Shawn