Posted to dev@lucene.apache.org by "Timothy Potter (JIRA)" <ji...@apache.org> on 2014/01/08 23:08:55 UTC

[jira] [Comment Edited] (SOLR-4260) Inconsistent numDocs between leader and replica

    [ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865963#comment-13865963 ] 

Timothy Potter edited comment on SOLR-4260 at 1/8/14 10:07 PM:
---------------------------------------------------------------

Still digging into it ... I'm curious why a batch of 34 adds on the leader gets processed as several sub-batches on the replica. Here's what I'm seeing in the logs around the documents that are missing from the replica. Basically, there are 34 docs on the leader, but only 25 of them get processed on the replica, in 4 separate batches (from my counting of the logs). Why wouldn't it just be one for one? The docs are all roughly the same size ... so what's breaking it up? Having trouble seeing that in the logs / code ;-)

On the leader:

2014-01-08 12:23:21,501 [qtp604104855-17] INFO  update.processor.LogUpdateProcessor  - 
[demo_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[82900 (1456683668174012416), 82901 (1456683668181352448), 82903 (1456683668181352449), 82904 (1456683668181352450), 82912 (1456683668187643904), 82913 (1456683668188692480), 82914 (1456683668188692481), 82916 (1456683668188692482), 82917 (1456683668188692483), 82918 (1456683668188692484), ... (34 adds)]} 0 34

2014-01-08 12:23:21,600 [qtp604104855-17] INFO  update.processor.LogUpdateProcessor  - 
[demo_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[83002 (1456683668280967168), 83005 (1456683668286210048), 83008 (1456683668286210049), 83011 (1456683668286210050), 83012 (1456683668286210051), 83013 (1456683668287258624), 83018 (1456683668287258625), 83019 (1456683668289355776), 83023 (1456683668289355777), 83024 (1456683668289355778), ... (43 adds)]} 0 32


On the replica:

2014-01-08 12:23:21,126 [qtp604104855-22] INFO  update.processor.LogUpdateProcessor  - 
[demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/&update.distrib=FROMLEADER&wt=javabin&version=2} 
{add=[82900 (1456683668174012416), 82901 (1456683668181352448), 82903 (1456683668181352449)]} 0 1

2014-01-08 12:23:21,134 [qtp604104855-22] INFO  update.processor.LogUpdateProcessor  - 
[demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/&update.distrib=FROMLEADER&wt=javabin&version=2} 
{add=[82904 (1456683668181352450), 82912 (1456683668187643904), 82913 (1456683668188692480), 82914 (1456683668188692481), 82916 (1456683668188692482), 82917 (1456683668188692483), 82918 (1456683668188692484), 82919 (1456683668188692485), 82922 (1456683668188692486)]} 0 2

2014-01-08 12:23:21,139 [qtp604104855-22] INFO  update.processor.LogUpdateProcessor  - 
[demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/&update.distrib=FROMLEADER&wt=javabin&version=2} 
{add=[82923 (1456683668188692487), 82926 (1456683668190789632), 82928 (1456683668190789633), 82932 (1456683668190789634), 82939 (1456683668192886784), 82945 (1456683668192886785), 82946 (1456683668192886786), 82947 (1456683668193935360), 82952 (1456683668193935361), 82962 (1456683668193935362), ... (12 adds)]} 0 3

2014-01-08 12:23:21,144 [qtp604104855-22] INFO  update.processor.LogUpdateProcessor  - 
[demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/&update.distrib=FROMLEADER&wt=javabin&version=2} 
{add=[82967 (1456683668199178240)]} 0 0


**** 9 Docs Missing here ****

2014-01-08 12:23:21,227 [qtp604104855-22] INFO  update.processor.LogUpdateProcessor  - 
[demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/&update.distrib=FROMLEADER&wt=javabin&version=2} 
{add=[83002 (1456683668280967168), 83005 (1456683668286210048), 83008 (1456683668286210049), 83011 (1456683668286210050), 83012 (1456683668286210051), 83013 (1456683668287258624)]} 0 2
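
The counting above (34 adds on the leader vs. 3 + 9 + 12 + 1 = 25 on the replica) can be automated with a small script. This is only a rough sketch, not anything from Solr itself: the function name adds_per_entry is made up for illustration, and it assumes the LogUpdateProcessor format shown above, where a truncated list ends in "... (N adds)" and an untruncated list shows every "id (version)" pair.

```python
import re

# One "{add=[ ... ]}" payload per LogUpdateProcessor entry (format assumed from the logs above).
ADD_LIST = re.compile(r"\{add=\[(.*?)\]\}", re.DOTALL)
# One "id (version)" pair inside the payload.
PAIR = re.compile(r"(\d+) \((\d+)\)")
# Suffix the log appends when the id list is truncated, e.g. "... (34 adds)".
TOTAL = re.compile(r"\.\.\. \((\d+) adds\)")


def adds_per_entry(log_text):
    """Return the number of adds reported by each log entry, in order."""
    counts = []
    for match in ADD_LIST.finditer(log_text):
        body = match.group(1)
        total = TOTAL.search(body)
        if total:
            # Truncated list: trust the total printed by the log.
            counts.append(int(total.group(1)))
        else:
            # Full list: count the id/version pairs directly.
            counts.append(len(PAIR.findall(body)))
    return counts


def missing_adds(leader_log, replica_log):
    """Difference between adds logged on the leader and on the replica."""
    return sum(adds_per_entry(leader_log)) - sum(adds_per_entry(replica_log))
```

Fed the first leader entry and the four replica entries before the gap, adds_per_entry should give [34] and [3, 9, 12, 1], so missing_adds comes out at 9. It only counts; it does not say which ids were dropped, because the truncated entries do not list them all.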




> Inconsistent numDocs between leader and replica
> -----------------------------------------------
>
>                 Key: SOLR-4260
>                 URL: https://issues.apache.org/jira/browse/SOLR-4260
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>         Environment: 5.0.0.2013.01.04.15.31.51
>            Reporter: Markus Jelsma
>            Assignee: Mark Miller
>            Priority: Critical
>             Fix For: 5.0, 4.7
>
>         Attachments: 192.168.20.102-replica1.png, 192.168.20.104-replica2.png, clusterstate.png, demo_shard1_replicas_out_of_sync.tgz
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using CloudSolrServer, we see inconsistencies between the leader and replica for some shards.
> Each core holds about 3.3k documents. For some reason, 5 out of 10 shards have a small deviation in the number of documents. The leader and slave deviate by roughly 10-20 documents, not more.
> Results hopping ranks in the result set for identical queries got my attention: there were small IDF differences for exactly the same record, causing the record to shift positions in the result set. During those tests no records were indexed. Consecutive catch-all queries also return different numDocs values.
> We're running a 10-node test cluster with 10 shards and a replication factor of two, and we frequently reindex using a fresh build from trunk. I had not seen this issue for quite some time until a few days ago.
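
One way to check the per-core deviation described in the issue is to ask each core for its own document count with a non-distributed query (distrib=false, so the request is answered by that core alone). The sketch below is an assumption about how that could be scripted, not part of the original report: the helper names and the host placeholders are hypothetical, and only the core names come from the logs in this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(core_url):
    """Build a match-all, zero-row query that stays on the given core."""
    params = {"q": "*:*", "rows": 0, "wt": "json", "distrib": "false"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def num_found(response_body):
    """Extract response.numFound from a Solr JSON response body."""
    return json.loads(response_body)["response"]["numFound"]


def core_doc_count(core_url, timeout=10.0):
    """Query one core and return how many documents it reports."""
    with urlopen(count_url(core_url), timeout=timeout) as resp:
        return num_found(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical hosts; substitute the real leader and replica addresses.
    leader = "http://leader-host:8984/solr/demo_shard1_replica1"
    replica = "http://replica-host:8984/solr/demo_shard1_replica2"
    print(core_doc_count(leader) - core_doc_count(replica))
```

A non-zero difference while no indexing is running, and after a commit on both sides, would point at the same kind of leader/replica drift reported here.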


