Posted to solr-user@lucene.apache.org by vidya <vi...@tcs.com> on 2016/01/06 12:30:23 UTC

core,Collection,Shard,Replication

Hi,

I am new to Solr and am having trouble understanding the difference between
a core and a collection.
>As far as I understand, cores are created when Solr runs in local mode,
and collections when it runs as SolrCloud.
Please correct me if I am wrong.
>And why do we shard a collection? I read something like this:
"When your data is too large for one node, you can break it up and store it
in sections by creating one or more shards. Each is a portion of the logical
index, or core, and it's the set of all nodes containing that section of the
index."
But when a document is indexed in one shard, it gets reflected in every shard
of that collection, even though the main intention of creating shards is to
break up the data.
>Why do we replicate a collection?

Thanks in advance



--
View this message in context: http://lucene.472066.n3.nabble.com/core-Collection-Shard-Replication-tp4248850.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: core,Collection,Shard,Replication

Posted by Erick Erickson <er...@gmail.com>.

bq: But when indexing a document in one shard, it gets reflected in every shard
of that collection

This is a misunderstanding (and I'm being a bit pedantic here). Each shard
contains a portion of the entire corpus. Say you have 1M docs and 2 shards.
Each shard will have very close to 500K documents.

If a shard has multiple _replicas_, each replica has a copy of the doc.
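
To make the shard/replica distinction concrete, here is a toy Python model
(not Solr code). The replica count and the round-robin split are assumptions
chosen for illustration; Solr itself routes documents by a hash of the
document id, so real shards end up close to, not exactly, equal in size.

```python
# Toy model, not Solr code: shards partition the corpus, replicas copy a shard.
NUM_DOCS = 1_000_000
NUM_SHARDS = 2
REPLICAS_PER_SHARD = 2  # assumed value, purely for illustration

# Deal documents out round-robin (a stand-in for Solr's hash-based routing).
docs_per_shard = [len(range(s, NUM_DOCS, NUM_SHARDS)) for s in range(NUM_SHARDS)]

# Every replica of a shard holds the same documents as that shard.
docs_per_replica = {
    (shard, replica): docs_per_shard[shard]
    for shard in range(NUM_SHARDS)
    for replica in range(REPLICAS_PER_SHARD)
}

print(docs_per_shard)                   # [500000, 500000]
print(sum(docs_per_shard))              # 1000000 distinct docs in the collection
print(sum(docs_per_replica.values()))   # 2000000 stored copies across replicas
```

Adding replicas multiplies the stored copies but never changes how many
distinct documents each shard is responsible for.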

Please take the time to work through the Solr tutorials; much will become
clearer. You don't need any kind of extensive setup, and you can see how
things run on any machine you have.

Best,
Erick


Re: core,Collection,Shard,Replication

Posted by Binoy Dalal <bi...@gmail.com>.

The machines part may have been a bit misleading. I am sorry for that. What
I actually meant was shards. You can have multiple shards hosted on a
single machine, or spread over multiple machines as in the example I gave.

"I have to make sure that all those machines have solr server or gateway
should be deployed ?"

Yes, you do need a Solr process running on every machine across which you
plan to distribute your index.

"And what multiple JVM processes run behind a solr server running?"

If you mean how many JVMs are running for a Solr server, the answer is 1.

"then what is a solr instance?"

One Solr process on your machine.
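
A rough way to picture this is the Python sketch below. It is only a mental
model, not a Solr API: the class names, host names and the "books" collection
are all made up for illustration. Each SolrNode stands for one Solr process
(one JVM), and each process can host one or more cores.

```python
from dataclasses import dataclass, field

@dataclass
class Core:
    """One physical index: here, one shard of a collection (hypothetical model)."""
    collection: str
    shard: str

@dataclass
class SolrNode:
    """One Solr process (a single JVM) running on some machine."""
    host: str
    cores: list = field(default_factory=list)

def hosts_for(nodes, collection):
    """Map each shard of a collection to the hosts that hold a core for it."""
    layout = {}
    for node in nodes:
        for core in node.cores:
            if core.collection == collection:
                layout.setdefault(core.shard, []).append(node.host)
    return layout

# Both shards hosted by a single Solr process on one machine.
single = [SolrNode("machine-a", [Core("books", "shard1"), Core("books", "shard2")])]

# The same two shards spread over two machines, one Solr process each.
spread = [
    SolrNode("machine-a", [Core("books", "shard1")]),
    SolrNode("machine-b", [Core("books", "shard2")]),
]

print(hosts_for(single, "books"))  # {'shard1': ['machine-a'], 'shard2': ['machine-a']}
print(hosts_for(spread, "books"))  # {'shard1': ['machine-a'], 'shard2': ['machine-b']}
```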

-- 
Regards,
Binoy Dalal

Re: core,Collection,Shard,Replication

Posted by vidya <vi...@tcs.com>.

Hi,
You described sharding as distributing data over multiple machines. Do
I have to make sure that a Solr server or gateway is deployed on all of
those machines?
And what JVM processes run behind a running Solr server?
I wanted to know what a node is. I understood it to be a machine with a Solr
server deployed.
Then what is a Solr instance?

Am I correct? If not, please help me.

Thanks in advance




Re: core,Collection,Shard,Replication

Posted by Binoy Dalal <bi...@gmail.com>.

1) A collection is simply a logical group and can consist of multiple
cores. A core represents a single physical index, or one part of a larger
index. Both cores and collections can be created in local as well as
cloud modes.
2) Sharding distributes your index over multiple machines when the index
becomes too big for one machine.
So if you have a 10TB index and 10 machines, each with 1TB of disk space,
you'll divide your index into 10 shards and put one shard on each machine.
"But when indexing a document in one shard, it gets reflected in every shard
of that collection"
This is true only logically: you can query any one shard for a doc that
lives on another and still get the proper results.
Physically, a doc is present on only one shard, which is determined by
the hash value of the doc id at index time.
3) The main purpose of replication is to provide redundancy. When you're
running Solr in cloud mode with multiple shards and one of your shards goes
down, your entire cluster will stop responding. In such a case, a replica
of that shard serves as a backup and takes over its responsibilities.
This keeps your app running.
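
The logical-versus-physical point in 2) can be sketched in a few lines of
Python. This is an illustration under stated assumptions, not Solr's
implementation: CRC32 with a modulo stands in for the real hash (as far as I
know, Solr's default router uses MurmurHash3 and assigns each shard a range
of hash values), and the helper names shard_for and get are made up.

```python
import zlib

NUM_SHARDS = 10  # matches the 10-machine example above

def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from a hash of the doc id (CRC32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

# Physical placement: each doc is stored on exactly one shard.
shards = [dict() for _ in range(NUM_SHARDS)]
for i in range(10_000):
    doc_id = f"doc-{i}"
    shards[shard_for(doc_id)][doc_id] = {"id": doc_id}

def get(doc_id: str) -> list:
    """Logical view: ask every shard and merge whatever comes back."""
    return [shard[doc_id] for shard in shards if doc_id in shard]

sizes = [len(shard) for shard in shards]
print(sum(sizes))          # 10000: every doc was stored exactly once
print(get("doc-42"))       # [{'id': 'doc-42'}]: found wherever it lives
```

The caller of get never needs to know which shard holds the doc, which is
why it can look as though every shard "has" every document.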

What I've written above is a very coarse-grained view of all these concepts.
You should take a look at the wiki pages to gain a fuller understanding of
them.

-- 
Regards,
Binoy Dalal