
Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2013/05/01 00:39:29 UTC

Re: Master - Slave File Sizes are not Same even after "command=abortfetch"

On 4/30/2013 8:33 AM, Furkan KAMACI wrote:
> I think that replication occurs after commit by default. It has been a long
> time, however there is still a mismatch between leader and replica
> (approximately 5 MB). I tried to pull indexes from the leader but it is
> still the same.

My mail server has been down most of the day, and the Apache mail 
infrastructure hasn't noticed yet that I'm back up.  I don't have copies 
of the newest messages on this thread.  I checked the web archive to see 
what else has been said.  I'll be repeating some of what has been said 
before.

On SolrCloud terminology: SolrCloud divides your index into one or more 
shards, each of which has a different piece of the index.  Each shard is 
made up of replicas.  One replica in each shard is designated leader. 
Note: a leader is still a replica, it is just the winner of the latest 
leader election.  Summary: shards, replicas, leader.

One term that you are using is "follower" ... this is not a valid 
SolrCloud term.  It might make sense to use this term for a replica that 
is not a leader, but I have never seen it used in anything official. 
Any replica can become leader, if the conditions are just right.
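A minimal Python sketch of the terminology above, assuming a simplified model: a shard is just a list of replicas, and the leader is whichever live replica currently wins the election. The `election_seq` field and the lowest-number-wins rule are assumptions loosely modeled on the ZooKeeper leader-election recipe. This is not Solr's actual code, and the core names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    # Position in the election queue (hypothetical field): lower joined earlier.
    election_seq: int
    live: bool = True


@dataclass
class Shard:
    name: str
    replicas: list[Replica] = field(default_factory=list)

    def leader(self) -> Replica:
        # The leader is not a separate kind of node: it is simply the live
        # replica that currently wins the election (lowest sequence number
        # in this model).
        candidates = [r for r in self.replicas if r.live]
        if not candidates:
            raise RuntimeError(f"shard {self.name} has no live replica")
        return min(candidates, key=lambda r: r.election_seq)


# Hypothetical names: one shard with two replicas.
shard1 = Shard("shard1", [Replica("core_a", 0), Replica("core_b", 1)])
print(shard1.leader().name)      # core_a
shard1.replicas[0].live = False  # the current leader goes down
print(shard1.leader().name)      # core_b takes over
```

In this model a replica that comes back after going down would rejoin with a new, higher sequence number, so it does not automatically reclaim leadership.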

There are only two times that the leader replica has special 
significance - when you are indexing and when a replica starts 
operation, either as an existing replica that went down or as a new replica.

In SolrCloud, replication is *NOT* used when you index new data.  The
*ONLY* time that replication happens in SolrCloud is when a replica
starts up, and even then it will only happen if the leader cannot figure
out how to use its transaction log to sync the replica.
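The decision described above can be sketched as follows. This is a simplified model, not Solr's implementation: update versions are plain increasing integers, and `tlog_window` (100 here) is an assumed stand-in for how many recent updates the leader's transaction log still holds. If every update the replica missed is still in that window, the missed updates are replayed from the log; otherwise the index files are copied over with replication.

```python
def plan_recovery(leader_history, replica_history, tlog_window=100):
    """Decide how a restarting replica catches up with its leader.

    leader_history / replica_history: increasing update versions each core
    has applied. tlog_window: how many recent updates the leader's
    transaction log still holds (an assumption of this sketch).
    Returns (strategy, updates_to_replay).
    """
    if not replica_history:
        # Nothing to compare against: copy the whole index from the leader.
        return "replication", []
    last_seen = replica_history[-1]
    # The missed updates are the newest ones, so if there are no more of them
    # than the window size, all of them are still in the leader's log.
    missed = [v for v in leader_history if v > last_seen]
    if len(missed) <= tlog_window:
        return "log sync", missed
    # Too far behind for the log: fall back to copying index files.
    return "replication", []


leader = list(range(1, 301))
print(plan_recovery(leader, list(range(1, 251)))[0])  # log sync (50 missed)
print(plan_recovery(leader, list(range(1, 101)))[0])  # replication (200 missed)
```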

SolrCloud does distributed indexing.  This means that when an update 
comes in, SolrCloud determines which shard needs that update.  If the 
core that received the request is not the leader of that shard, the 
request is forwarded to the correct leader.  That leader will index the 
update and send it to all of the replicas for that shard, each of which 
will index the update independently.
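The routing and forwarding steps above can be sketched in Python under a few stated assumptions: `zlib.crc32` stands in for the MurmurHash3-based hashing that Solr's default router uses, the 32-bit hash space is split into equal ranges (one per shard), and the `cluster` layout (a list of core names per shard, leader first) is hypothetical.

```python
import zlib

HASH_SPACE = 2**32  # zlib.crc32 returns an unsigned 32-bit value in Python 3


def shard_for(doc_id: str, num_shards: int) -> int:
    # Stand-in hash: Solr's default router uses MurmurHash3, not CRC32.
    h = zlib.crc32(doc_id.encode("utf-8"))
    # Equal hash ranges per shard: pick the range that h falls into.
    return h * num_shards // HASH_SPACE


def handle_update(receiving_core: str, doc_id: str, cluster: list[list[str]]) -> list[str]:
    """cluster[i] lists the cores of shard i, leader first (hypothetical layout)."""
    shard = shard_for(doc_id, len(cluster))
    leader, *others = cluster[shard]
    steps = []
    if receiving_core != leader:
        steps.append(f"{receiving_core} forwards {doc_id} to {leader}")
    steps.append(f"{leader} indexes {doc_id}")
    for core in others:
        # Each replica indexes the same update on its own, so segment
        # layouts (and sizes on disk) can diverge between replicas.
        steps.append(f"{leader} sends {doc_id} to {core}")
        steps.append(f"{core} indexes {doc_id}")
    return steps


# Hypothetical layout: 5 shards, two cores per shard.
cluster = [[f"shard{i}_replica1", f"shard{i}_replica2"] for i in range(1, 6)]
for step in handle_update("shard1_replica1", "doc-42", cluster):
    print(step)
```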

Because each replica indexes independently, the replicas can end up with
different index sizes.  The actual search results should be the same,
although scoring can sometimes be a little bit different between replicas
because deleted documents that still exist in one replica's index, but
have already been merged away in another, still contribute to the score.
SolrCloud does not attempt to make the replicas byte-for-byte identical;
what matters is that they contain the same non-deleted documents.
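To illustrate why deleted documents can shift scores, here is a sketch of the classic TF-IDF inverse document frequency, `1 + ln(maxDoc / (docFreq + 1))`, which is what Lucene's default similarity used at the time of this thread. Both `maxDoc` and `docFreq` include deleted documents that have not yet been merged away. The document counts below are made up for illustration.

```python
import math


def classic_idf(doc_freq: int, max_doc: int) -> float:
    # Classic TF-IDF idf: 1 + ln(maxDoc / (docFreq + 1)).
    # Both statistics still count deleted documents until a merge purges them.
    return 1.0 + math.log(max_doc / (doc_freq + 1))


# Hypothetical numbers: both replicas hold the same 1000 live documents,
# 30 of which contain the term.
live_docs, live_with_term = 1000, 30

# Replica A has already merged away its deleted documents.
idf_a = classic_idf(live_with_term, live_docs)

# Replica B still carries 50 deleted (overwritten) versions containing the term.
deleted_with_term = 50
idf_b = classic_idf(live_with_term + deleted_with_term, live_docs + deleted_with_term)

print(f"replica A idf={idf_a:.3f}, replica B idf={idf_b:.3f}")
```

Replica B reports a lower idf for the same term even though both replicas return the same live documents, so the same query can score slightly differently on each.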

Thanks,
Shawn

Re: Master - Slave File Sizes are not Same even after "command=abortfetch"

Posted by Furkan KAMACI <fu...@gmail.com>.

Shawn, thanks for the detailed answer. I have 5 shards, each with 1 leader
and 1 replica, so I have 10 Solr nodes in total. When I look at the admin
GUI of one of the shard leaders, I see that its replica has a smaller index
(in MB) than the leader. I am not updating the data or indexing new
documents. I expected that after some time the leader would sync its
replica to itself, but nothing has changed.
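Because on-disk size is expected to differ, a more direct consistency check is to compare the number of live documents each core reports. A minimal sketch, assuming Solr's standard `/select` handler and JSON response format; the core URLs are hypothetical. `distrib=false` keeps the query on the single core instead of the whole collection, and `fetch` is injectable so the logic can be exercised without a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get(url: str) -> str:
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def live_doc_count(core_url: str, fetch=http_get) -> int:
    # distrib=false keeps the query on this one core; rows=0 asks only for the count.
    params = urlencode({"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"})
    body = json.loads(fetch(f"{core_url}/select?{params}"))
    # numFound counts non-deleted documents only, regardless of index size on disk.
    return body["response"]["numFound"]


def replicas_agree(core_urls, fetch=http_get):
    counts = {url: live_doc_count(url, fetch) for url in core_urls}
    return len(set(counts.values())) == 1, counts


# Hypothetical core URLs for the leader and replica of one shard:
# ok, counts = replicas_agree([
#     "http://solr1:8983/solr/collection1_shard1_replica1",
#     "http://solr2:8983/solr/collection1_shard1_replica2",
# ])
```

If the counts match, the leader and replica hold the same non-deleted documents, and a difference of a few megabytes is most likely deleted documents or a different segment layout.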
