Posted to solr-user@lucene.apache.org by Hendrik Haddorp <he...@gmx.net> on 2017/02/21 21:12:25 UTC

Re: Solr on HDFS: AutoAddReplica does not add a replica

Hi,

I had opened SOLR-10092 
(https://issues.apache.org/jira/browse/SOLR-10092) for this a while ago. 
I was now able to get this feature working with a very small code change. 
After a few seconds Solr reassigns the replica to a different Solr 
instance as long as one replica is still up. Not really sure why one 
replica needs to be up though. I added the patch based on Solr 6.3 to 
the bug report. Would be great if it could be merged soon.
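
For anyone trying to reproduce this: autoAddReplicas is switched on per 
collection when the collection is created, and in Solr 6.x it is only 
honored on shared filesystems such as HDFS. A minimal sketch of that 
create call in Python; the Solr address and collection name are 
placeholders, not taken from my setup:

import requests

SOLR = "http://localhost:8983/solr"   # placeholder node address

params = {
    "action": "CREATE",
    "name": "test1.collection-0",      # placeholder collection name
    "numShards": 1,
    "replicationFactor": 3,
    "autoAddReplicas": "true",         # Solr 6.x: only effective on shared storage like HDFS
    "wt": "json",
}
resp = requests.get(SOLR + "/admin/collections", params=params, timeout=60)
resp.raise_for_status()
print(resp.json())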

regards,
Hendrik

On 19.01.2017 17:08, Hendrik Haddorp wrote:
> HDFS is like a shared filesystem so every Solr Cloud instance can 
> access the data using the same path or URL. The clusterstate.json 
> looks like this:
>
> "shards":{"shard1":{
>         "range":"80000000-7fffffff",
>         "state":"active",
>         "replicas":{
>           "core_node1":{
>             "core":"test1.collection-0_shard1_replica1",
> "dataDir":"hdfs://master...:8000/test1.collection-0/core_node1/data/",
>             "base_url":"http://slave3....:9000/solr",
>             "node_name":"slave3....:9000_solr",
>             "state":"active",
> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node1/data/tlog"}, 
>
>           "core_node2":{
>             "core":"test1.collection-0_shard1_replica2",
> "dataDir":"hdfs://master....:8000/test1.collection-0/core_node2/data/",
>             "base_url":"http://slave2....:9000/solr",
>             "node_name":"slave2....:9000_solr",
>             "state":"active",
> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node2/data/tlog", 
>
>             "leader":"true"},
>           "core_node3":{
>             "core":"test1.collection-0_shard1_replica3",
> "dataDir":"hdfs://master....:8000/test1.collection-0/core_node3/data/",
>             "base_url":"http://slave4....:9005/solr",
>             "node_name":"slave4....:9005_solr",
>             "state":"active",
> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node3/data/tlog"}}}} 
>
>
> So every replica is always assigned to one node and this is being 
> stored in ZK, pretty much the same as for non-HDFS setups. But since 
> the data is not stored locally but on the network, and the path does 
> not contain any node information, you can of course easily hand the 
> work over to a different Solr node. You should just need to update the 
> owner of the replica in ZK and you should basically be done, I assume. 
> That's why the documentation states that an advantage of using HDFS is 
> that a failing node can be replaced by a different one. The Overseer 
> just has to move the ownership of the replica, which seems like what 
> the code is trying to do. There just seems to be a bug in the code so 
> that the core does not get created on the target node.
>
> Each data directory also contains a lock file. The documentation 
> states that one should use the HdfsLockFactory, which unfortunately 
> can easily lead to SOLR-8335, which hopefully will be fixed by 
> SOLR-8169. A manual cleanup is however also easily done but seems to 
> require a node restart to take effect. But I'm also only recently 
> playing around with all this ;-)
>
> regards,
> Hendrik
>
> On 19.01.2017 16:40, Shawn Heisey wrote:
>> On 1/19/2017 4:09 AM, Hendrik Haddorp wrote:
>>> Given that the data is on HDFS it shouldn't matter if any active
>>> replica is left as the data does not need to get transferred from
>>> another instance but the new core will just take over the existing
>>> data. Thus a replication factor of 1 should also work; it is just that
>>> in that case the shard would be down until the new core is up. Anyhow, it
>>> looks like the above call fails to set the shard id, I guess, or
>>> some code is doing the wrong check.
>> I know very little about how SolrCloud interacts with HDFS, so although
>> I'm reasonably certain about what comes below, I could be wrong.
>>
>> I have not ever heard of SolrCloud being able to automatically take over
>> an existing index directory when it creates a replica, or even share
>> index directories unless the admin fools it into doing so without its
>> knowledge.  Sharing an index directory for replicas with SolrCloud would
>> NOT work correctly.  Solr must be able to update all replicas
>> independently, which means that each of them will lock its index
>> directory and write to it.
>>
>> It is my understanding (from reading messages on mailing lists) that
>> when using HDFS, Solr replicas are all separate and consume additional
>> disk space, just like on a regular filesystem.
>>
>> I found the code that generates the "No shard id" exception, but my
>> knowledge of how the zookeeper code in Solr works is not deep enough to
>> understand what it means or how to fix it.
>>
>> Thanks,
>> Shawn
>>
>
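
On the lock file point in the quoted message above: the manual cleanup 
usually just means deleting the stale write.lock under the replica's HDFS 
index directory and then restarting the node. A rough sketch; the path is 
a placeholder and assumes the default lock file name, so double check it 
against the replica's dataDir before deleting anything:

import subprocess

# Placeholder path; the real one follows from the replica's dataDir in clusterstate.json.
lock_path = "hdfs://master.example.com:8000/test1.collection-0/core_node1/data/index/write.lock"

# 'hdfs dfs -rm' is the standard HDFS shell command; check=True raises if the removal fails.
subprocess.run(["hdfs", "dfs", "-rm", lock_path], check=True)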


Re: Solr on HDFS: AutoAddReplica does not add a replica

Posted by Hendrik Haddorp <he...@gmx.net>.
I'm also not really an HDFS expert but I believe it is slightly different:

The HDFS data is replicated, let's say 3 times, between the HDFS data 
nodes, but to an HDFS client it looks like one directory; the fact that 
the data is replicated is hidden. Every client should see the same 
data, just like every client should see the same data in ZooKeeper 
(every ZK node also has a full replica). So with 2 Solr replicas there 
should only be two disjoint data sets. Thus it should not matter which 
Solr node claims the replica and then continues where things were left 
off. Solr should only be concerned with the replication between the 
Solr replicas, not with the replication between the HDFS data nodes, 
just as it does not have to deal with the replication between the ZK nodes.
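
To make that concrete: claiming the replica from another node basically 
means creating a core on that node which points at the index that is 
already in HDFS. A rough sketch via the Collections API, assuming 
ADDREPLICA accepts an explicit dataDir in this version; the target node 
name and the HDFS path are placeholders:

import requests

SOLR = "http://localhost:8983/solr"   # placeholder node address

params = {
    "action": "ADDREPLICA",
    "collection": "test1.collection-0",
    "shard": "shard1",
    "node": "slave5.example.com:9000_solr",   # hypothetical target node
    # Reuse the index that already exists in HDFS (placeholder path):
    "dataDir": "hdfs://master.example.com:8000/test1.collection-0/core_node1/data/",
    "wt": "json",
}
resp = requests.get(SOLR + "/admin/collections", params=params, timeout=120)
resp.raise_for_status()
print(resp.json())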

Anyhow, for now I would be happy if my patch for SOLR-10092 could get 
included soon, as the autoAddReplica feature does not work at all for me 
without it :-)

On 22.02.2017 16:15, Erick Erickson wrote:
> bq: in the non-HDFS case that sounds logical but in the HDFS case all
> the index data is in the shared HDFS file system
>
> That's not really the point, and it's not quite true. The Solr index is
> unique _per replica_. So replica1 points to an HDFS directory (that's
> triply replicated to be sure). replica2 points to a totally different
> set of index files. So with the default replication of 3 your two
> replicas will have 6 copies of the index that are totally disjoint in
> two sets of three. From Solr's point of view, the fact that HDFS
> replicates the data doesn't really alter much.
>
> Autoaddreplica will indeed be able to re-use the HDFS data if a
> Solr node goes away. But that doesn't change the replication issue I
> described.
>
> At least that's my understanding, I admit I'm not an HDFS guy and it
> may be out of date.
>
> Erick
>
> On Tue, Feb 21, 2017 at 10:30 PM, Hendrik Haddorp
> <he...@gmx.net> wrote:
>> Hi Erick,
>>
>> in the non-HDFS case that sounds logical but in the HDFS case all the index
>> data is in the shared HDFS file system. Even the transaction logs should be
>> in there. So the node that once had the replica should not really have more
>> information than any other node, especially if legacyCloud is set to false
>> so that ZooKeeper holds the truth.
>>
>> regards,
>> Hendrik
>>
>> On 22.02.2017 02:28, Erick Erickson wrote:
>>> Hendrik:
>>>
>>> bq: Not really sure why one replica needs to be up though.
>>>
>>> I didn't write the code so I'm guessing a bit, but consider the
>>> situation where you have no replicas for a shard up and add a new one.
>>> Eventually it could become the leader but there would have been no
>>> chance for it to check if its version of the index was up to date.
>>> But since it would be the leader, when other replicas for that shard
>>> _do_ come on line they'd replicate the index down from the newly added
>>> replica, possibly using very old data.
>>>
>>> FWIW,
>>> Erick
>>>
>>> On Tue, Feb 21, 2017 at 1:12 PM, Hendrik Haddorp
>>> <he...@gmx.net> wrote:
>>>> Hi,
>>>>
>>>> I had opened SOLR-10092
>>>> (https://issues.apache.org/jira/browse/SOLR-10092)
>>>> for this a while ago. I was now able to get this feature working with a
>>>> very
>>>> small code change. After a few seconds Solr reassigns the replica to a
>>>> different Solr instance as long as one replica is still up. Not really
>>>> sure
>>>> why one replica needs to be up though. I added the patch based on Solr
>>>> 6.3
>>>> to the bug report. Would be great if it could be merged soon.
>>>>
>>>> regards,
>>>> Hendrik
>>>>
>>>> On 19.01.2017 17:08, Hendrik Haddorp wrote:
>>>>> HDFS is like a shared filesystem so every Solr Cloud instance can access
>>>>> the data using the same path or URL. The clusterstate.json looks like
>>>>> this:
>>>>>
>>>>> "shards":{"shard1":{
>>>>>           "range":"80000000-7fffffff",
>>>>>           "state":"active",
>>>>>           "replicas":{
>>>>>             "core_node1":{
>>>>>               "core":"test1.collection-0_shard1_replica1",
>>>>> "dataDir":"hdfs://master...:8000/test1.collection-0/core_node1/data/",
>>>>>               "base_url":"http://slave3....:9000/solr",
>>>>>               "node_name":"slave3....:9000_solr",
>>>>>               "state":"active",
>>>>>
>>>>>
>>>>> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node1/data/tlog"},
>>>>>             "core_node2":{
>>>>>               "core":"test1.collection-0_shard1_replica2",
>>>>> "dataDir":"hdfs://master....:8000/test1.collection-0/core_node2/data/",
>>>>>               "base_url":"http://slave2....:9000/solr",
>>>>>               "node_name":"slave2....:9000_solr",
>>>>>               "state":"active",
>>>>>
>>>>>
>>>>> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node2/data/tlog",
>>>>>               "leader":"true"},
>>>>>             "core_node3":{
>>>>>               "core":"test1.collection-0_shard1_replica3",
>>>>> "dataDir":"hdfs://master....:8000/test1.collection-0/core_node3/data/",
>>>>>               "base_url":"http://slave4....:9005/solr",
>>>>>               "node_name":"slave4....:9005_solr",
>>>>>               "state":"active",
>>>>>
>>>>>
>>>>> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node3/data/tlog"}}}}
>>>>>
>>>>> So every replica is always assigned to one node and this is being stored
>>>>> in ZK, pretty much the same as for non-HDFS setups. Just as the data is
>>>>> not
>>>>> stored locally but on the network and as the path does not contain any
>>>>> node
>>>>> information you can of course easily take over the work to a different
>>>>> Solr
>>>>> node. You should just need to update the owner of the replica in ZK and
>>>>> you
>>>>> should basically be done, I assume. That's why the documentation states
>>>>> that
>>>>> an advantage of using HDFS is that a failing node can be replaced by a
>>>>> different one. The Overseer just has to move the ownership of the
>>>>> replica,
>>>>> which seems like what the code is trying to do. There just seems to be a
>>>>> bug
>>>>> in the code so that the core does not get created on the target node.
>>>>>
>>>>> Each data directory also contains a lock file. The documentation states
>>>>> that one should use the HdfsLockFactory, which unfortunately can easily
>>>>> lead
>>>>> to SOLR-8335, which hopefully will be fixed by SOLR-8169. A manual
>>>>> cleanup
>>>>> is however also easily done but seems to require a node restart to take
>>>>> effect. But I'm also only recently playing around with all this ;-)
>>>>>
>>>>> regards,
>>>>> Hendrik
>>>>>
>>>>> On 19.01.2017 16:40, Shawn Heisey wrote:
>>>>>> On 1/19/2017 4:09 AM, Hendrik Haddorp wrote:
>>>>>>> Given that the data is on HDFS it shouldn't matter if any active
>>>>>>> replica is left as the data does not need to get transferred from
>>>>>>> another instance but the new core will just take over the existing
>>>>>>> data. Thus a replication factor of 1 should also work; it is just that
>>>>>>> in that case the shard would be down until the new core is up. Anyhow, it
>>>>>>> looks like the above call fails to set the shard id, I guess, or
>>>>>>> some code is doing the wrong check.
>>>>>> I know very little about how SolrCloud interacts with HDFS, so although
>>>>>> I'm reasonably certain about what comes below, I could be wrong.
>>>>>>
>>>>>> I have not ever heard of SolrCloud being able to automatically take
>>>>>> over
>>>>>> an existing index directory when it creates a replica, or even share
>>>>>> index directories unless the admin fools it into doing so without its
>>>>>> knowledge.  Sharing an index directory for replicas with SolrCloud
>>>>>> would
>>>>>> NOT work correctly.  Solr must be able to update all replicas
>>>>>> independently, which means that each of them will lock its index
>>>>>> directory and write to it.
>>>>>>
>>>>>> It is my understanding (from reading messages on mailing lists) that
>>>>>> when using HDFS, Solr replicas are all separate and consume additional
>>>>>> disk space, just like on a regular filesystem.
>>>>>>
>>>>>> I found the code that generates the "No shard id" exception, but my
>>>>>> knowledge of how the zookeeper code in Solr works is not deep enough to
>>>>>> understand what it means or how to fix it.
>>>>>>
>>>>>> Thanks,
>>>>>> Shawn
>>>>>>


Re: Solr on HDFS: AutoAddReplica does not add a replica

Posted by Erick Erickson <er...@gmail.com>.
bq: in the non-HDFS case that sounds logical but in the HDFS case all
the index data is in the shared HDFS file system

That's not really the point, and it's not quite true. The Solr index is
unique _per replica_. So replica1 points to an HDFS directory (that's
triply replicated to be sure). replica2 points to a totally different
set of index files. So with the default replication of 3 your two
replicas will have 6 copies of the index that are totally disjoint in
two sets of three. From Solr's point of view, the fact that HDFS
replicates the data doesn't really alter much.
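
Just to spell the arithmetic out: since the HDFS replication sits 
underneath the Solr replication, the physical copies multiply. A tiny 
sketch with made-up numbers:

solr_replicas = 2        # SolrCloud replicas of the shard
hdfs_replication = 3     # HDFS block replication factor
index_size_gb = 50       # size of one replica's index (illustrative figure)

total_copies = solr_replicas * hdfs_replication
raw_storage_gb = index_size_gb * total_copies
print(total_copies, "physical copies,", raw_storage_gb, "GB of raw HDFS storage")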

Autoaddreplica will indeed be able to re-use the HDFS data if a
Solr node goes away. But that doesn't change the replication issue I
described.

At least that's my understanding, I admit I'm not an HDFS guy and it
may be out of date.

Erick

On Tue, Feb 21, 2017 at 10:30 PM, Hendrik Haddorp
<he...@gmx.net> wrote:
> Hi Erick,
>
> in the non-HDFS case that sounds logical but in the HDFS case all the index
> data is in the shared HDFS file system. Even the transaction logs should be
> in there. So the node that once had the replica should not really have more
> information than any other node, especially if legacyCloud is set to false
> so that ZooKeeper holds the truth.
>
> regards,
> Hendrik
>
> On 22.02.2017 02:28, Erick Erickson wrote:
>>
>> Hendrik:
>>
>> bq: Not really sure why one replica needs to be up though.
>>
>> I didn't write the code so I'm guessing a bit, but consider the
>> situation where you have no replicas for a shard up and add a new one.
>> Eventually it could become the leader but there would have been no
>> chance for it to check if its version of the index was up to date.
>> But since it would be the leader, when other replicas for that shard
>> _do_ come on line they'd replicate the index down from the newly added
>> replica, possibly using very old data.
>>
>> FWIW,
>> Erick
>>
>> On Tue, Feb 21, 2017 at 1:12 PM, Hendrik Haddorp
>> <he...@gmx.net> wrote:
>>>
>>> Hi,
>>>
>>> I had opened SOLR-10092
>>> (https://issues.apache.org/jira/browse/SOLR-10092)
>>> for this a while ago. I was now able to get this feature working with a
>>> very
>>> small code change. After a few seconds Solr reassigns the replica to a
>>> different Solr instance as long as one replica is still up. Not really
>>> sure
>>> why one replica needs to be up though. I added the patch based on Solr
>>> 6.3
>>> to the bug report. Would be great if it could be merged soon.
>>>
>>> regards,
>>> Hendrik
>>>
>>> On 19.01.2017 17:08, Hendrik Haddorp wrote:
>>>>
>>>> HDFS is like a shared filesystem so every Solr Cloud instance can access
>>>> the data using the same path or URL. The clusterstate.json looks like
>>>> this:
>>>>
>>>> "shards":{"shard1":{
>>>>          "range":"80000000-7fffffff",
>>>>          "state":"active",
>>>>          "replicas":{
>>>>            "core_node1":{
>>>>              "core":"test1.collection-0_shard1_replica1",
>>>> "dataDir":"hdfs://master...:8000/test1.collection-0/core_node1/data/",
>>>>              "base_url":"http://slave3....:9000/solr",
>>>>              "node_name":"slave3....:9000_solr",
>>>>              "state":"active",
>>>>
>>>>
>>>> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node1/data/tlog"},
>>>>            "core_node2":{
>>>>              "core":"test1.collection-0_shard1_replica2",
>>>> "dataDir":"hdfs://master....:8000/test1.collection-0/core_node2/data/",
>>>>              "base_url":"http://slave2....:9000/solr",
>>>>              "node_name":"slave2....:9000_solr",
>>>>              "state":"active",
>>>>
>>>>
>>>> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node2/data/tlog",
>>>>              "leader":"true"},
>>>>            "core_node3":{
>>>>              "core":"test1.collection-0_shard1_replica3",
>>>> "dataDir":"hdfs://master....:8000/test1.collection-0/core_node3/data/",
>>>>              "base_url":"http://slave4....:9005/solr",
>>>>              "node_name":"slave4....:9005_solr",
>>>>              "state":"active",
>>>>
>>>>
>>>> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node3/data/tlog"}}}}
>>>>
>>>> So every replica is always assigned to one node and this is being stored
>>>> in ZK, pretty much the same as for non-HDFS setups. Just as the data is
>>>> not
>>>> stored locally but on the network and as the path does not contain any
>>>> node
>>>> information you can of course easily take over the work to a different
>>>> Solr
>>>> node. You should just need to update the owner of the replica in ZK and
>>>> you
>>>> should basically be done, I assume. That's why the documentation states
>>>> that
>>>> an advantage of using HDFS is that a failing node can be replaced by a
>>>> different one. The Overseer just has to move the ownership of the
>>>> replica,
>>>> which seems like what the code is trying to do. There just seems to be a
>>>> bug
>>>> in the code so that the core does not get created on the target node.
>>>>
>>>> Each data directory also contains a lock file. The documentation states
>>>> that one should use the HdfsLockFactory, which unfortunately can easily
>>>> lead
>>>> to SOLR-8335, which hopefully will be fixed by SOLR-8169. A manual
>>>> cleanup
>>>> is however also easily done but seems to require a node restart to take
>>>> effect. But I'm also only recently playing around with all this ;-)
>>>>
>>>> regards,
>>>> Hendrik
>>>>
>>>> On 19.01.2017 16:40, Shawn Heisey wrote:
>>>>>
>>>>> On 1/19/2017 4:09 AM, Hendrik Haddorp wrote:
>>>>>>
>>>>>> Given that the data is on HDFS it shouldn't matter if any active
>>>>>> replica is left as the data does not need to get transferred from
>>>>>> another instance but the new core will just take over the existing
>>>>>> data. Thus a replication factor of 1 should also work; it is just that
>>>>>> in that case the shard would be down until the new core is up. Anyhow, it
>>>>>> looks like the above call fails to set the shard id, I guess, or
>>>>>> some code is doing the wrong check.
>>>>>
>>>>> I know very little about how SolrCloud interacts with HDFS, so although
>>>>> I'm reasonably certain about what comes below, I could be wrong.
>>>>>
>>>>> I have not ever heard of SolrCloud being able to automatically take
>>>>> over
>>>>> an existing index directory when it creates a replica, or even share
>>>>> index directories unless the admin fools it into doing so without its
>>>>> knowledge.  Sharing an index directory for replicas with SolrCloud
>>>>> would
>>>>> NOT work correctly.  Solr must be able to update all replicas
>>>>> independently, which means that each of them will lock its index
>>>>> directory and write to it.
>>>>>
>>>>> It is my understanding (from reading messages on mailing lists) that
>>>>> when using HDFS, Solr replicas are all separate and consume additional
>>>>> disk space, just like on a regular filesystem.
>>>>>
>>>>> I found the code that generates the "No shard id" exception, but my
>>>>> knowledge of how the zookeeper code in Solr works is not deep enough to
>>>>> understand what it means or how to fix it.
>>>>>
>>>>> Thanks,
>>>>> Shawn
>>>>>
>

Re: Solr on HDFS: AutoAddReplica does not add a replica

Posted by Hendrik Haddorp <he...@gmx.net>.
Hi Erick,

in the non-HDFS case that sounds logical but in the HDFS case all the 
index data is in the shared HDFS file system. Even the transaction logs 
should be in there. So the node that once had the replica should not 
really have more information than any other node, especially if 
legacyCloud is set to false so that ZooKeeper holds the truth.
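
For reference, legacyCloud is a cluster-wide property and can be switched 
off through the Collections API. A minimal sketch, with the Solr address 
as a placeholder:

import requests

SOLR = "http://localhost:8983/solr"   # placeholder node address

params = {"action": "CLUSTERPROP", "name": "legacyCloud", "val": "false", "wt": "json"}
resp = requests.get(SOLR + "/admin/collections", params=params, timeout=30)
resp.raise_for_status()
print(resp.json())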

regards,
Hendrik

On 22.02.2017 02:28, Erick Erickson wrote:
> Hendrik:
>
> bq: Not really sure why one replica needs to be up though.
>
> I didn't write the code so I'm guessing a bit, but consider the
> situation where you have no replicas for a shard up and add a new one.
> Eventually it could become the leader but there would have been no
> chance for it to check if its version of the index was up to date.
> But since it would be the leader, when other replicas for that shard
> _do_ come on line they'd replicate the index down from the newly added
> replica, possibly using very old data.
>
> FWIW,
> Erick
>
> On Tue, Feb 21, 2017 at 1:12 PM, Hendrik Haddorp
> <he...@gmx.net> wrote:
>> Hi,
>>
>> I had opened SOLR-10092 (https://issues.apache.org/jira/browse/SOLR-10092)
>> for this a while ago. I was now able to get this feature working with a very
>> small code change. After a few seconds Solr reassigns the replica to a
>> different Solr instance as long as one replica is still up. Not really sure
>> why one replica needs to be up though. I added the patch based on Solr 6.3
>> to the bug report. Would be great if it could be merged soon.
>>
>> regards,
>> Hendrik
>>
>> On 19.01.2017 17:08, Hendrik Haddorp wrote:
>>> HDFS is like a shared filesystem so every Solr Cloud instance can access
>>> the data using the same path or URL. The clusterstate.json looks like this:
>>>
>>> "shards":{"shard1":{
>>>          "range":"80000000-7fffffff",
>>>          "state":"active",
>>>          "replicas":{
>>>            "core_node1":{
>>>              "core":"test1.collection-0_shard1_replica1",
>>> "dataDir":"hdfs://master...:8000/test1.collection-0/core_node1/data/",
>>>              "base_url":"http://slave3....:9000/solr",
>>>              "node_name":"slave3....:9000_solr",
>>>              "state":"active",
>>>
>>> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node1/data/tlog"},
>>>            "core_node2":{
>>>              "core":"test1.collection-0_shard1_replica2",
>>> "dataDir":"hdfs://master....:8000/test1.collection-0/core_node2/data/",
>>>              "base_url":"http://slave2....:9000/solr",
>>>              "node_name":"slave2....:9000_solr",
>>>              "state":"active",
>>>
>>> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node2/data/tlog",
>>>              "leader":"true"},
>>>            "core_node3":{
>>>              "core":"test1.collection-0_shard1_replica3",
>>> "dataDir":"hdfs://master....:8000/test1.collection-0/core_node3/data/",
>>>              "base_url":"http://slave4....:9005/solr",
>>>              "node_name":"slave4....:9005_solr",
>>>              "state":"active",
>>>
>>> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node3/data/tlog"}}}}
>>>
>>> So every replica is always assigned to one node and this is being stored
>>> in ZK, pretty much the same as for non-HDFS setups. Just as the data is not
>>> stored locally but on the network and as the path does not contain any node
>>> information you can of course easily take over the work to a different Solr
>>> node. You should just need to update the owner of the replica in ZK and you
>>> should basically be done, I assume. That's why the documentation states that
>>> an advantage of using HDFS is that a failing node can be replaced by a
>>> different one. The Overseer just has to move the ownership of the replica,
>>> which seems like what the code is trying to do. There just seems to be a bug
>>> in the code so that the core does not get created on the target node.
>>>
>>> Each data directory also contains a lock file. The documentation states
>>> that one should use the HdfsLockFactory, which unfortunately can easily lead
>>> to SOLR-8335, which hopefully will be fixed by SOLR-8169. A manual cleanup
>>> is however also easily done but seems to require a node restart to take
>>> effect. But I'm also only recently playing around with all this ;-)
>>>
>>> regards,
>>> Hendrik
>>>
>>> On 19.01.2017 16:40, Shawn Heisey wrote:
>>>> On 1/19/2017 4:09 AM, Hendrik Haddorp wrote:
>>>>> Given that the data is on HDFS it shouldn't matter if any active
>>>>> replica is left as the data does not need to get transferred from
>>>>> another instance but the new core will just take over the existing
>>>>> data. Thus a replication factor of 1 should also work; it is just that
>>>>> in that case the shard would be down until the new core is up. Anyhow, it
>>>>> looks like the above call fails to set the shard id, I guess, or
>>>>> some code is doing the wrong check.
>>>> I know very little about how SolrCloud interacts with HDFS, so although
>>>> I'm reasonably certain about what comes below, I could be wrong.
>>>>
>>>> I have not ever heard of SolrCloud being able to automatically take over
>>>> an existing index directory when it creates a replica, or even share
>>>> index directories unless the admin fools it into doing so without its
>>>> knowledge.  Sharing an index directory for replicas with SolrCloud would
>>>> NOT work correctly.  Solr must be able to update all replicas
>>>> independently, which means that each of them will lock its index
>>>> directory and write to it.
>>>>
>>>> It is my understanding (from reading messages on mailing lists) that
>>>> when using HDFS, Solr replicas are all separate and consume additional
>>>> disk space, just like on a regular filesystem.
>>>>
>>>> I found the code that generates the "No shard id" exception, but my
>>>> knowledge of how the zookeeper code in Solr works is not deep enough to
>>>> understand what it means or how to fix it.
>>>>
>>>> Thanks,
>>>> Shawn
>>>>


Re: Solr on HDFS: AutoAddReplica does not add a replica

Posted by Erick Erickson <er...@gmail.com>.
Hendrik:

bq: Not really sure why one replica needs to be up though.

I didn't write the code so I'm guessing a bit, but consider the
situation where you have no replicas for a shard up and add a new one.
Eventually it could become the leader but there would have been no
chance for it to check if its version of the index was up to date.
But since it would be the leader, when other replicas for that shard
_do_ come on line they'd replicate the index down from the newly added
replica, possibly using very old data.
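
A quick way to check that precondition, i.e. whether any replica of the 
shard is still active, is a CLUSTERSTATUS call. A sketch in Python; the 
Solr address is a placeholder and the collection name is taken from the 
example earlier in the thread:

import requests

SOLR = "http://localhost:8983/solr"   # placeholder node address

resp = requests.get(
    SOLR + "/admin/collections",
    params={"action": "CLUSTERSTATUS", "collection": "test1.collection-0", "wt": "json"},
    timeout=30,
)
resp.raise_for_status()

shards = resp.json()["cluster"]["collections"]["test1.collection-0"]["shards"]
replicas = shards["shard1"]["replicas"]
active = [name for name, info in replicas.items() if info.get("state") == "active"]
print("active replicas in shard1:", active or "none")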

FWIW,
Erick

On Tue, Feb 21, 2017 at 1:12 PM, Hendrik Haddorp
<he...@gmx.net> wrote:
> Hi,
>
> I had opened SOLR-10092 (https://issues.apache.org/jira/browse/SOLR-10092)
> for this a while ago. I was now able to get this feature working with a very
> small code change. After a few seconds Solr reassigns the replica to a
> different Solr instance as long as one replica is still up. Not really sure
> why one replica needs to be up though. I added the patch based on Solr 6.3
> to the bug report. Would be great if it could be merged soon.
>
> regards,
> Hendrik
>
> On 19.01.2017 17:08, Hendrik Haddorp wrote:
>>
>> HDFS is like a shared filesystem so every Solr Cloud instance can access
>> the data using the same path or URL. The clusterstate.json looks like this:
>>
>> "shards":{"shard1":{
>>         "range":"80000000-7fffffff",
>>         "state":"active",
>>         "replicas":{
>>           "core_node1":{
>>             "core":"test1.collection-0_shard1_replica1",
>> "dataDir":"hdfs://master...:8000/test1.collection-0/core_node1/data/",
>>             "base_url":"http://slave3....:9000/solr",
>>             "node_name":"slave3....:9000_solr",
>>             "state":"active",
>>
>> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node1/data/tlog"},
>>           "core_node2":{
>>             "core":"test1.collection-0_shard1_replica2",
>> "dataDir":"hdfs://master....:8000/test1.collection-0/core_node2/data/",
>>             "base_url":"http://slave2....:9000/solr",
>>             "node_name":"slave2....:9000_solr",
>>             "state":"active",
>>
>> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node2/data/tlog",
>>             "leader":"true"},
>>           "core_node3":{
>>             "core":"test1.collection-0_shard1_replica3",
>> "dataDir":"hdfs://master....:8000/test1.collection-0/core_node3/data/",
>>             "base_url":"http://slave4....:9005/solr",
>>             "node_name":"slave4....:9005_solr",
>>             "state":"active",
>>
>> "ulogDir":"hdfs://master....:8000/test1.collection-0/core_node3/data/tlog"}}}}
>>
>> So every replica is always assigned to one node and this is being stored
>> in ZK, pretty much the same as for non-HDFS setups. Just as the data is not
>> stored locally but on the network and as the path does not contain any node
>> information you can of course easily take over the work to a different Solr
>> node. You should just need to update the owner of the replica in ZK and you
>> should basically be done, I assume. That's why the documentation states that
>> an advantage of using HDFS is that a failing node can be replaced by a
>> different one. The Overseer just has to move the ownership of the replica,
>> which seems like what the code is trying to do. There just seems to be a bug
>> in the code so that the core does not get created on the target node.
>>
>> Each data directory also contains a lock file. The documentation states
>> that one should use the HdfsLockFactory, which unfortunately can easily lead
>> to SOLR-8335, which hopefully will be fixed by SOLR-8169. A manual cleanup
>> is however also easily done but seems to require a node restart to take
>> effect. But I'm also only recently playing around with all this ;-)
>>
>> regards,
>> Hendrik
>>
>> On 19.01.2017 16:40, Shawn Heisey wrote:
>>>
>>> On 1/19/2017 4:09 AM, Hendrik Haddorp wrote:
>>>>
>>>> Given that the data is on HDFS it shouldn't matter if any active
>>>> replica is left as the data does not need to get transferred from
>>>> another instance but the new core will just take over the existing
>>>> data. Thus a replication factor of 1 should also work; it is just that
>>>> in that case the shard would be down until the new core is up. Anyhow, it
>>>> looks like the above call fails to set the shard id, I guess, or
>>>> some code is doing the wrong check.
>>>
>>> I know very little about how SolrCloud interacts with HDFS, so although
>>> I'm reasonably certain about what comes below, I could be wrong.
>>>
>>> I have not ever heard of SolrCloud being able to automatically take over
>>> an existing index directory when it creates a replica, or even share
>>> index directories unless the admin fools it into doing so without its
>>> knowledge.  Sharing an index directory for replicas with SolrCloud would
>>> NOT work correctly.  Solr must be able to update all replicas
>>> independently, which means that each of them will lock its index
>>> directory and write to it.
>>>
>>> It is my understanding (from reading messages on mailing lists) that
>>> when using HDFS, Solr replicas are all separate and consume additional
>>> disk space, just like on a regular filesystem.
>>>
>>> I found the code that generates the "No shard id" exception, but my
>>> knowledge of how the zookeeper code in Solr works is not deep enough to
>>> understand what it means or how to fix it.
>>>
>>> Thanks,
>>> Shawn
>>>
>>
>