
Posted to solr-user@lucene.apache.org by Jerome Yang <je...@pivotal.io> on 2016/10/11 09:27:45 UTC

Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

Hi all,

I'm facing a strange problem.

I have a SolrCloud cluster on a single machine with 2 Solr nodes, running
Solr 6.1.

I created a collection called "test_collection" with 2 shards, a
replication factor of 3, and the default router.
I indexed some documents and committed, then backed up this collection.
After that, I restored from the backup and named the restored collection
"restore_test_collection".
Queries against "restore_test_collection" work fine, and the data is consistent.
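For reference, the create/backup/restore steps above map onto Collections API calls roughly as follows. This is a sketch, not the exact commands used in the post: SOLR_BASE, the backup name "test_backup", the location /tmp/solr-backups and maxShardsPerNode=3 are assumptions (6 replicas on 2 nodes need at least 3 per node). The functions only build URLs, so nothing is sent until a URL is passed to curl.

```shell
#!/usr/bin/env bash
# Sketch only: builds Collections API URLs for the steps described above.
# Hypothetical values: SOLR_BASE, BACKUP_NAME, BACKUP_LOCATION, maxShardsPerNode=3.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"
BACKUP_NAME="test_backup"            # hypothetical backup name
BACKUP_LOCATION="/tmp/solr-backups"  # hypothetical; must be reachable by every node

# collections_url ACTION PARAMS -> full Collections API URL with a JSON response.
collections_url() {
  local action="$1" params="$2"
  printf '%s/admin/collections?action=%s&%s&wt=json\n' "$SOLR_BASE" "$action" "$params"
}

# 2 shards x replicationFactor 3 = 6 cores on 2 nodes, hence maxShardsPerNode=3.
# router.name=compositeId is the default and is only spelled out for clarity.
# collection.configName=... may be needed if ZooKeeper holds more than one configset.
create_url() {
  collections_url CREATE "name=test_collection&numShards=2&replicationFactor=3&maxShardsPerNode=3&router.name=compositeId"
}

backup_url() {
  collections_url BACKUP "name=${BACKUP_NAME}&collection=test_collection&location=${BACKUP_LOCATION}"
}

restore_url() {
  collections_url RESTORE "name=${BACKUP_NAME}&collection=restore_test_collection&location=${BACKUP_LOCATION}"
}
```

Usage: run `curl -sS "$(create_url)"`, index and commit some documents, then `curl -sS "$(backup_url)"` followed by `curl -sS "$(restore_url)"`.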

Then I indexed some new documents and committed.
I found that the new documents were all indexed into shard1, and that the
leader of shard1 does not have them, although the other replicas of shard1 do.

Has anyone else run into this issue?
I really need your help.

Regards,
Jerome

Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

Posted by Jerome Yang <je...@pivotal.io>.

@Mark Miller Please help~

On Tue, Oct 11, 2016 at 5:32 PM, Jerome Yang <je...@pivotal.io> wrote:

Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

Posted by Jerome Yang <je...@pivotal.io>.

I ran some tests using curl.

curl 'http://localhost:8983/solr/restore_test_collection/update?commit=true&wt=json' \
  --data-binary @test.json -H 'Content-type:application/json'

The leader does not have the new documents, but the other replicas do.

curl 'http://localhost:8983/solr/restore_test_collection/update?commitWithin=1000&wt=json' \
  --data-binary @test.json -H 'Content-type:application/json'

All replicas in shard1, including the leader, have the new documents, and all the new
documents are routed to shard1.
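To see exactly which replicas of shard1 hold the new documents, each core can be queried on its own with distrib=false, which keeps Solr from fanning the request out to the rest of the collection. A minimal sketch follows; the core URLs passed in are placeholders (take the real ones from the Cloud view), and the sed-based numFound extraction is a simplification that assumes Solr's JSON response layout rather than using a real JSON parser.

```shell
#!/usr/bin/env bash
# Sketch: count documents held by individual cores, bypassing distributed search.
# Core URLs are supplied by the caller; none are hard-coded here.

# Extract numFound from a JSON select response read on stdin.
num_found() {
  sed -n 's/.*"numFound"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# core_count CORE_URL -> number of documents visible to that core's open searcher.
core_count() {
  curl -sS "$1/select?q=*:*&rows=0&distrib=false&wt=json" | num_found
}

# check_replicas CORE_URL... -> one "url<TAB>count" line per core.
check_replicas() {
  local url
  for url in "$@"; do
    printf '%s\t%s\n' "$url" "$(core_count "$url")"
  done
}
```

Usage: `check_replicas http://localhost:8983/solr/<core1> http://localhost:8983/solr/<core2>`. Because the count comes from each core's open searcher, a core whose searcher was never reopened after the commit will report the old count even if the documents were written to its index.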

RE: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

Posted by "Marquiss, John" <Jo...@wolterskluwer.com>.

Thanks, I have done that. For those following this on the mailing list or coming across it in the archives, the JIRA is SOLR-10242:

https://issues.apache.org/jira/browse/SOLR-10242 Cores created by Solr RESTORE end up with stale searches after indexing.


Also, we do not see any warnings or errors in any of our logs after the restore has finished.

John Marquiss

>-----Original Message-----
>From: Erick Erickson [mailto:erickerickson@gmail.com] 
>Sent: Tuesday, March 7, 2017 9:53 AM
>To: solr-user <so...@lucene.apache.org>
>Subject: Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

Posted by Erick Erickson <er...@gmail.com>.

John:

Just skimming, but this certainly seems like it merits a JIRA. Please
feel free to create one (you may have to create your own logon first).
Please include the steps for the test you did where new replicas "see"
the restored index. This last test, where you hand-edited things, is
also important.

The only other question I'd have is whether you saw anything odd in
the logs. I'm no expert in this functionality; I'm just covering the
possibility that for some reason the restore didn't finish
successfully even though all the files appear to have been copied back.
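One way to rule out an unfinished restore is to check the async request itself. The test steps posted in this thread submit BACKUP and RESTORE with an async id, so the Collections API REQUESTSTATUS action can report whether each request ended as completed or failed. This is a hedged sketch: SOLR_BASE, the 2-second poll interval, the 150-poll limit and the sed-based parsing are assumptions; the state names are the ones REQUESTSTATUS normally returns (submitted, running, completed, failed, notfound).

```shell
#!/usr/bin/env bash
# Sketch: wait for an async Collections API request (e.g. BACKUP or RESTORE
# submitted with async=<id>) and report its final state.
# Assumptions: SOLR_BASE, the 2 s poll interval and the 150-poll limit.
SOLR_BASE="${SOLR_BASE:-http://localhost:8181/solr}"

# Print the "state" value from a REQUESTSTATUS JSON response read on stdin.
request_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# wait_for_async ID -> prints the final state; succeeds only on "completed".
wait_for_async() {
  local id="$1" state tries=0
  while [ "$tries" -lt 150 ]; do
    state=$(curl -sS "${SOLR_BASE}/admin/collections?action=REQUESTSTATUS&requestid=${id}&wt=json" | request_state)
    case "$state" in
      submitted|running) tries=$((tries + 1)); sleep 2 ;;
      *) printf '%s\n' "$state"; [ "$state" = "completed" ]; return ;;
    esac
  done
  echo "timed out waiting for request ${id}" >&2
  return 1
}
```

Usage: `wait_for_async 1234` after submitting the RESTORE. Anything other than "completed" (including an empty state when curl fails) makes the function return non-zero.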

I don't have any bandwidth to tackle this, but a JIRA will preserve it
for others to look at.

Thanks for all your research on this!

Erick

On Tue, Mar 7, 2017 at 7:44 AM, Marquiss, John
<Jo...@wolterskluwer.com> wrote:

RE: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

Posted by "Marquiss, John" <Jo...@wolterskluwer.com>.

Just another bit of information supporting the idea that this has to do with recycling the searcher when changes are made to an index directory that is named something other than "index".

Running our tests again, this time after restoring the content I shut down Solr, renamed the two "restore.#############" directories to "index", and updated index.properties to reflect this. After restarting Solr, the collection searched correctly and immediately reflected index updates in search results following a commit.
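The manual workaround described above can be sketched as a small shell function applied to one core's data directory (here, once for each of the two restored cores). It is only an illustration of these steps: the function name and the index.before-rename backup directory are hypothetical, it assumes index.properties holds a plain index=<dirname> line, and it must only be run while Solr is stopped.

```shell
#!/usr/bin/env bash
# Sketch of the manual rename workaround for ONE core's data directory.
# Run only while Solr is stopped. The function name and the
# "index.before-rename" backup directory are hypothetical, not part of Solr.
rename_restored_index() {
  local data_dir="$1"
  local props="${data_dir}/index.properties"
  local current

  [ -f "$props" ] || { echo "no index.properties in $data_dir" >&2; return 1; }
  # The "index" key names the directory (relative to data_dir) holding the live index.
  current=$(sed -n 's/^index=//p' "$props" | tr -d '\r')
  [ -n "$current" ] && [ "$current" != "index" ] || { echo "nothing to do" >&2; return 0; }
  [ -d "${data_dir}/${current}" ] || { echo "missing ${data_dir}/${current}" >&2; return 1; }

  # Keep any pre-existing "index" directory instead of deleting it.
  if [ -e "${data_dir}/index" ]; then
    [ -e "${data_dir}/index.before-rename" ] && { echo "backup dir already exists" >&2; return 1; }
    mv "${data_dir}/index" "${data_dir}/index.before-rename" || return 1
  fi
  mv "${data_dir}/${current}" "${data_dir}/index" || return 1
  # Point index.properties at the renamed directory (original kept as .bak).
  sed -i.bak 's/^index=.*/index=index/' "$props"
}
```

Keeping the old "index" directory aside rather than deleting it is a deliberately cautious choice; it can be removed by hand once the core is confirmed to search correctly after restart.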

I see two possible solutions for this:

1) Modify the restore process so that it copies index files into a directory named "index" instead of "restore.#############". This is probably easy, but it doesn't fix the root problem: something isn't respecting the path in index.properties when recycling the searcher after a commit.

2) Find and fix the code that creates a new searcher so that it watches the path in index.properties instead of specifically looking for "index". This may be harder to find, but it fixes the root problem.

We are more than willing to try to fix this if someone could suggest where we could start looking into the source to find this.

John Marquiss

>-----Original Message-----
>From: Marquiss, John [mailto:John.Marquiss@wolterskluwer.com] 
>Sent: Monday, March 6, 2017 9:39 PM
>To: solr-user@lucene.apache.org
>Subject: RE: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

RE: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

Posted by "Marquiss, John" <Jo...@wolterskluwer.com>.

I couldn't find an issue for this in JIRA, so I thought I would add some of our own findings here. We are seeing the same problem with the Solr 6 Restore functionality. Although I do not think it matters, it happens both in our Linux environments and in our local Windows development environments. Also, from our testing, I do not think it has anything to do with the indexing itself: as the order of the test steps below shows, documents appear in newly created replicas without re-indexing.

Test Environment:
•	Windows 10 (we see the same behavior on Linux as well)
•	Java 1.8.0_121
•	Solr 6.3.0 with the patch for SOLR-9527 (to fix RESTORE shard distribution and add createNodeSet to RESTORE)
•	1 ZooKeeper node running on localhost:2181
•	3 Solr nodes running on localhost:8171, localhost:8181 and localhost:8191 (hostname NY07LP521696)

Test and observations:
1)	Create a 2-shard collection 'test'
	http://localhost:8181/solr/admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=1&maxShardsPerNode=1&collection.configName=testconf&createNodeSet=NY07LP521696:8171_solr,NY07LP521696:8181_solr

2)	Index 7 documents to 'test'
3)	Search 'test' - result count 7
4)	Backup collection 'test'
	http://localhost:8181/solr/admin/collections?action=BACKUP&collection=test&name=copy&location=%2FData%2Fsolr%2Fbkp&async=1234

5)	Restore 'test' to collection 'test2'
	http://localhost:8191/solr/admin/collections?action=RESTORE&name=copy&location=%2FData%2Fsolr%2Fbkp&collection=test2&async=1234&maxShardsPerNode=1&createNodeSet=NY07LP521696:8181_solr,NY07LP521696:8191_solr

6)	Search 'test2' - result count 7
7)	Index 2 new documents to 'test2'
8)	Search 'test2' - result count 7 (new documents do not appear in results)
9)	Create a replica for each of the shards of 'test2'
	http://localhost:8191/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=NY07LP521696:8181_solr
	http://localhost:8191/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard2&node=NY07LP521696:8171_solr

*** Note that it is not necessary to try to re-index the 2 new documents before this step; just create the replicas and query. ***
10)	Repeatedly query 'test2': the result count changes randomly between 7, 8 and 9. This is because Solr randomly selects a replica of each shard of 'test2', and one of the two new documents was added to each shard of the collection. If replica0 is selected for both shards, the result is 7; if replica0 is selected for one shard and replica1 for the other, the result is 8; and if replica1 is selected for both shards, the result is 9. The exact counts depend on the data, because we do not know ahead of time which shards the new documents will be routed to or whether they will be split evenly.

	Query 'test2' with the shards parameter set to the original restored cores - result count 7
	http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica0,localhost:8181/solr/test2_shard2_replica0

	Query 'test2' with the shards parameter set to one original restored core and one new replica core - result count 8
	http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica0,localhost:8181/solr/test2_shard2_replica1
	http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica1,localhost:8181/solr/test2_shard2_replica0
	
	Query 'test2' with the shards parameter set to the new replica cores - result count 9
	http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica1,localhost:8181/solr/test2_shard2_replica1

13)	Note that the core statistics in the Solr admin UI show the restored cores as not current: the Searching master is Gen 2 and the Replicable master is Gen 3, whereas on the replicated cores both the Searching and the Replicable master are Gen 3.
14)	Restarting Solr corrects the issue.
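The shards-parameter queries in step 10 can be scripted over all four replica combinations. Assuming each shard received exactly one of the two new documents, a stale restored core reports its original count and a new replica reports one more, so each total is 7 plus the number of new-replica cores in the combination: 7, 8, 8 or 9. In this sketch the core locations in SHARD1_CORES and SHARD2_CORES are copied from the example URLs above and may need adjusting to the node that actually hosts each core.

```shell
#!/usr/bin/env bash
# Sketch: run the shards-parameter query for every shard1/shard2 core pair.
# Core locations are copied from the example URLs in the steps; adjust the
# host:port of each entry to the node that actually hosts that core.
SHARD1_CORES="localhost:8181/solr/test2_shard1_replica0 localhost:8181/solr/test2_shard1_replica1"
SHARD2_CORES="localhost:8181/solr/test2_shard2_replica0 localhost:8181/solr/test2_shard2_replica1"

# shards_url QUERY_HOST CORE... -> select URL on QUERY_HOST restricted to CORE...
shards_url() {
  local query_host="$1"
  shift
  local IFS=,   # "$*" below joins the remaining arguments with commas
  printf 'http://%s/solr/test2/select?q=*:*&shards=%s\n' "$query_host" "$*"
}

# Extract numFound from a JSON select response read on stdin.
num_found() {
  sed -n 's/.*"numFound"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

query_all_combinations() {
  local c1 c2
  # The core lists are intentionally unquoted so they split on spaces.
  for c1 in $SHARD1_CORES; do
    for c2 in $SHARD2_CORES; do
      printf '%s + %s: ' "${c1##*/}" "${c2##*/}"
      curl -sS "$(shards_url localhost:8181 "$c1" "$c2")&rows=0&wt=json" | num_found
    done
  done
}
```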

Thoughts:
•	Solr is backing up and restoring correctly
•	The restored collection data is stored under a path like: …/node8181/test2_shard1_replica0/restore.20170307005909295 instead of …/node8181/test2_shard1_replica0/index
•	Indexing is actually behaving correctly (documents are available in replicas even without re-indexing)
•	When asked about the state of the searcher through the admin page's core details, Solr does know that the searcher is not current
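Since Solr already knows the restored searcher is not current, the same flag can be read per core from the CoreAdmin STATUS action, whose index section carries a "current" field alongside numDocs and version. A rough sketch; the node URL and core name in the usage line are taken from the test steps, and the sed parsing is a simplification that assumes one core per request.

```shell
#!/usr/bin/env bash
# Sketch: read the "current" flag of one core via CoreAdmin STATUS.
# The node URL and core name in the usage example come from the test steps.

# Print the "current" value (true/false) from a STATUS JSON response on stdin.
index_current() {
  sed -n 's/.*"current"[[:space:]]*:[[:space:]]*\([a-z][a-z]*\).*/\1/p'
}

# core_is_current NODE_URL CORE -> prints true or false for that core.
core_is_current() {
  local node_url="$1" core="$2"
  curl -sS "${node_url}/admin/cores?action=STATUS&core=${core}&wt=json" | index_current
}
```

Usage: `core_is_current http://localhost:8181/solr test2_shard1_replica0`. While the problem is present, this should report false for the restored cores, matching what the admin UI shows.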

I was looking in the source but haven't found the root cause yet. My gut feeling is that because the index data dir is …/restore.20170307005909295 instead of …/index, Solr isn't seeing the index changes and recycling the searcher for the restored cores. Neither committing the collection nor forcing an optimize fixes the issue. Restarting Solr fixes it, but this will not be viable for us in production.

John Marquiss

-----Original Message-----
>From: Jerome Yang [mailto:jeyang@pivotal.io] 
>Sent: Tuesday, October 11, 2016 9:23 PM
>To: solr-user@lucene.apache.org; erickerickson@gmail.com
>Subject: Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

Posted by Jerome Yang <je...@pivotal.io>.

@Erick Please help😂

On Wed, Oct 12, 2016 at 10:21 AM, Jerome Yang <je...@pivotal.io> wrote:

Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

Posted by Jerome Yang <je...@pivotal.io>.

Hi Shawn,

I just checked the clusterstate.json
<http://192.168.33.10:18983/solr/admin/zookeeper?detail=true&path=%2Fclusterstate.json>
that was restored for "restore_test_collection".
The router is "router":{"name":"compositeId"}, not implicit.

So I think it's a very serious bug.
Should this bug go into JIRA?

Please help!

Regards,
Jerome


On Tue, Oct 11, 2016 at 8:34 PM, Shawn Heisey <ap...@elyograg.org> wrote:

Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

Posted by Shawn Heisey <ap...@elyograg.org>.

On 10/11/2016 3:27 AM, Jerome Yang wrote:
> Then, I index some new documents, and commit. I find that the
> documents are all indexed in shard1 and the leader of shard1 don't
> have these new documents but other replicas do have these new documents. 

I'm not sure why the leader would be missing the documents while other
replicas have them, but I do have a theory about why they are only in
shard1.  Testing that theory will involve obtaining some information
from your system:

What is the router on the restored collection? You can see this in the
admin UI by going to Cloud->Tree, opening "collections", and clicking on
the collection.  On the right-hand side, there will be some info from
ZooKeeper, with some JSON below it that should mention the router.  I
suspect that the router on the new collection may have been configured
as implicit instead of compositeId.
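The same router check can be done from the command line with the Collections API CLUSTERSTATUS action instead of browsing Cloud->Tree. A minimal sketch, assuming SOLR_BASE points at any node, and assuming "name" is the first key inside the router object, as in the "router":{"name":"compositeId"} output quoted in this thread; the sed pattern is a simplification, not a JSON parser.

```shell
#!/usr/bin/env bash
# Sketch: read a collection's router from CLUSTERSTATUS.
# SOLR_BASE is an assumption; point it at any node in the cluster.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# Print the router name from CLUSTERSTATUS (or clusterstate.json) JSON on stdin.
# Newlines are removed first so pretty-printed JSON also matches.
router_name() {
  tr -d '\n' | sed -n 's/.*"router"[[:space:]]*:[[:space:]]*{[[:space:]]*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# collection_router COLLECTION -> prints e.g. compositeId or implicit.
collection_router() {
  curl -sS "${SOLR_BASE}/admin/collections?action=CLUSTERSTATUS&collection=$1&wt=json" | router_name
}
```

Usage: `collection_router restore_test_collection`.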

Thanks,
Shawn