Posted to solr-user@lucene.apache.org by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov.INVALID> on 2021/01/28 15:56:44 UTC

Cores renamed

We have recently had a few occasions when cores for one specific collection were renamed (or, more likely, dropped and recreated, thus ending up with a different core name).

Is this a known phenomenon? Is there any explanation?

It may be relevant that we only recently started running this SolrCloud on version 8.5.2, although the collection was created under Solr 7.4. Also, this collection seems to experience heavy updates, such that the non-leader replica has trouble keeping up. One of these renames occurred at 4:33am, so I strongly suspect that the rename (or drop and recreate) was done by some internal Solr thread rather than by any of my coworkers. One other potential clue is that /solr/admin/cores?action=REQUESTRECOVERY was usually run on the new core a moment after it was created.

Does anyone have any insights?

Re: Cores renamed

Posted by Shawn Heisey <el...@elyograg.org>.
On 2/27/23 09:03, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> This has happened yet again.
> 
> Does anyone yet have any input on the idea of using the Leader's collection name in Leader/Follower replication (or pre-Solr8.7 Master/Slave replication), rather than the core name?

If you're in cloud mode, you should not try to configure or initiate 
replication.  SolrCloud takes over the replication handler and uses it 
for its own purposes.  SolrCloud handles replicating the index when you 
have multiple replicas and you do not need to do anything.

In recent Solr versions, I wouldn't even include the replication handler 
in solrconfig.xml ... it is implicitly defined and does not need to be 
there.

Thanks,
Shawn
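
One way to check whether a config still declares the replication handler explicitly is to scan solrconfig.xml for it. The following is a minimal sketch, not part of Solr itself: the function name and the sample XML are hypothetical, and it assumes the handler would be registered under the name "/replication".

```python
import xml.etree.ElementTree as ET
from typing import List


def explicit_replication_handlers(solrconfig_xml: str) -> List[str]:
    """Return the class attribute of every explicitly declared /replication handler."""
    root = ET.fromstring(solrconfig_xml)
    return [
        handler.get("class", "")
        for handler in root.iter("requestHandler")
        if handler.get("name") == "/replication"
    ]


# Hypothetical solrconfig.xml fragment, used only for illustration.
SAMPLE = """
<config>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <requestHandler name="/replication" class="solr.ReplicationHandler"/>
</config>
"""

# A non-empty result means the config declares the handler explicitly.
print(explicit_replication_handlers(SAMPLE))
```

An empty list would suggest the config relies on the implicit definition.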

Re: Cores renamed

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/15/23 12:30, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> Once again, Solr has taken upon itself to rename the core. Does anyone yet have any input on the idea of using the Leader's collection name in Leader/Follower replication?

If you are in Cloud mode, let SolrCloud handle replication.  Do not try 
to do it yourself.

If you actually configure a replication handler, you risk causing a 
situation where SolrCloud cannot function properly.

Thanks,
Shawn

RE: Cores renamed

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov.INVALID>.
Once again, Solr has taken upon itself to rename the core. Does anyone yet have any input on the idea of using the Leader's collection name in Leader/Follower replication?

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <cr...@nih.gov.INVALID> 
Sent: Monday, February 27, 2023 11:03 AM
To: users@solr.apache.org
Subject: RE: Cores renamed

This has happened yet again.

Does anyone yet have any input on the idea of using the Leader's collection name in Leader/Follower replication (or pre-Solr8.7 Master/Slave replication), rather than the core name?

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <cr...@nih.gov.INVALID> 
Sent: Thursday, June 3, 2021 10:30 AM
To: users@solr.apache.org
Subject: RE: Cores renamed

As a potential solution, I was wondering about implementing Master/Slave replication using the collection name of the Master rather than the core name. My initial experiment with this in a test environment seemed to work. Does anyone have any input on the idea of using the Master's collection name in Master/Slave replication, rather than the core name?

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <cr...@nih.gov.INVALID> 
Sent: Wednesday, June 02, 2021 5:46 PM
To: users@solr.apache.org
Subject: RE: Cores renamed

It happened again this morning.

Attached is an excerpt from solr.log (with port #s & IP addresses redacted) and below is the current CLUSTERSTATUS (with port #s redacted)

Is there yet any explanation?

{
  "responseHeader":{
    "status":0,
    "QTime":10},
  "cluster":{
    "collections":{
      "ipg_report_large":{
        "pullReplicas":"0",
        "replicationFactor":"1",
        "shards":{"shard1":{
            "range":"80000000-7fffffff",
            "state":"active",
            "replicas":{
              "core_node8":{
                "core":"ipg_report_large_shard1_replica_n7",
                "base_url":"http://solrdbprod26.be-md:####/solr",
                "node_name":"solrdbprod26.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false",
                "leader":"true"},
              "core_node10":{
                "core":"ipg_report_large_shard1_replica_n9",
                "base_url":"http://solrdbprod25.be-md:####/solr",
                "node_name":"solrdbprod25.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false"}}}},
        "router":{"name":"compositeId"},
        "maxShardsPerNode":"1",
        "autoAddReplicas":"false",
        "nrtReplicas":"1",
        "tlogReplicas":"0",
        "znodeVersion":741,
        "configName":"ipg_report_large"}},
    "live_nodes":["solrdbprod26.be-md:####_solr",
      "solrdbprod25.be-md:####_solr"]}}
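
To track which core name each node currently hosts, the CLUSTERSTATUS response can be reduced to a node-to-core mapping. This is a rough sketch under the assumption that each node hosts at most one replica of the collection (as with maxShardsPerNode 1 here); the function name is hypothetical and the sample is a trimmed copy of the response above.

```python
import json
from typing import Dict


def cores_by_node(clusterstatus: dict, collection: str) -> Dict[str, str]:
    """Map each node_name to the core name it hosts for the given collection."""
    shards = clusterstatus["cluster"]["collections"][collection]["shards"]
    mapping = {}
    for shard in shards.values():
        for replica in shard["replicas"].values():
            # Assumes one replica of this collection per node.
            mapping[replica["node_name"]] = replica["core"]
    return mapping


# Trimmed copy of the CLUSTERSTATUS response quoted above.
RESPONSE = json.loads("""
{
  "cluster": {
    "collections": {
      "ipg_report_large": {
        "shards": {
          "shard1": {
            "replicas": {
              "core_node8": {
                "core": "ipg_report_large_shard1_replica_n7",
                "node_name": "solrdbprod26.be-md:####_solr"
              },
              "core_node10": {
                "core": "ipg_report_large_shard1_replica_n9",
                "node_name": "solrdbprod25.be-md:####_solr"
              }
            }
          }
        }
      }
    }
  }
}
""")

for node, core in cores_by_node(RESPONSE, "ipg_report_large").items():
    print(node, core)
```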

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <cr...@nih.gov.INVALID> 
Sent: Monday, May 17, 2021 5:01 PM
To: users@solr.apache.org
Subject: RE: Cores renamed

The entire directory for the old core gets removed.

Here is CLUSTERSTATUS (again with port numbers redacted). I ran CLUSTERSTATUS on both nodes, and the only difference was QTime (that is, there was no real difference):

{
  "responseHeader":{
    "status":0,
    "QTime":5},
  "cluster":{
    "collections":{
      "ipg_report_large":{
        "pullReplicas":"0",
        "replicationFactor":"1",
        "shards":{"shard1":{
            "range":"80000000-7fffffff",
            "state":"active",
            "replicas":{
              "core_node4":{
                "core":"ipg_report_large_shard1_replica_n3",
                "base_url":"http://solrdbprod26.be-md:####/solr",
                "node_name":"solrdbprod26.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false"},
              "core_node6":{
                "core":"ipg_report_large_shard1_replica_n5",
                "base_url":"http://solrdbprod25.be-md:####/solr",
                "node_name":"solrdbprod25.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false",
                "leader":"true"}}}},
        "router":{"name":"compositeId"},
        "maxShardsPerNode":"1",
        "autoAddReplicas":"false",
        "nrtReplicas":"1",
        "tlogReplicas":"0",
        "znodeVersion":710,
        "configName":"ipg_report_large"}},
    "live_nodes":["solrdbprod26.be-md:####_solr",
      "solrdbprod25.be-md:####_solr"]}}
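
Comparing the core names in the May 17 and June 2 CLUSTERSTATUS responses quoted in this message chain makes the renames explicit. The sketch below is illustrative only: the function name is hypothetical, and the two dictionaries are copied by hand from those responses rather than fetched from a live cluster.

```python
from typing import Dict, List, Tuple


def renamed_cores(before: Dict[str, str], after: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (node, old_core, new_core) for every node whose core name changed."""
    return [
        (node, before[node], after[node])
        for node in sorted(before.keys() & after.keys())
        if before[node] != after[node]
    ]


# Core names per host, copied from the two CLUSTERSTATUS responses in this chain.
MAY_17 = {
    "solrdbprod26.be-md": "ipg_report_large_shard1_replica_n3",
    "solrdbprod25.be-md": "ipg_report_large_shard1_replica_n5",
}
JUNE_2 = {
    "solrdbprod26.be-md": "ipg_report_large_shard1_replica_n7",
    "solrdbprod25.be-md": "ipg_report_large_shard1_replica_n9",
}

for node, old, new in renamed_cores(MAY_17, JUNE_2):
    print(f"{node}: {old} -> {new}")
```

Running such a comparison on a schedule would give a timestamped record of each rename, which could then be matched against solr.log.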

-----Original Message-----
From: matthew sporleder <ms...@gmail.com> 
Sent: Monday, May 17, 2021 4:34 PM
To: users@solr.apache.org
Subject: Re: Cores renamed

Can you verify all of your zkHost connection params across the entire
cluster, and share the replicationFactor, autoAddReplicas, etc for the
collection?

My theory is that you have two ZooKeeper configs conflicting as master
elections happen, causing new replicas to get created on the fly.

Also -- do these cores get deleted from the filesystem or left around?
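
The zkHost check suggested above could be sketched as follows. This assumes the usual zkHost form of comma-separated host:port entries followed by an optional chroot path; the node names and zkHost values are invented for illustration, and how the values are collected from each node is left out.

```python
from typing import Dict, FrozenSet, List, Tuple

ZkKey = Tuple[FrozenSet[str], str]


def normalize_zk_host(zk_host: str) -> ZkKey:
    """Split a zkHost string into (set of host:port entries, chroot path)."""
    hosts_part, slash, chroot = zk_host.strip().partition("/")
    hosts = frozenset(h.strip() for h in hosts_part.split(",") if h.strip())
    return hosts, (slash + chroot).rstrip("/")


def group_by_zk_host(zk_hosts: Dict[str, str]) -> Dict[ZkKey, List[str]]:
    """Group node names by their normalized zkHost value."""
    groups: Dict[ZkKey, List[str]] = {}
    for node, value in sorted(zk_hosts.items()):
        groups.setdefault(normalize_zk_host(value), []).append(node)
    return groups


# Hypothetical per-node zkHost settings, for illustration only.
SETTINGS = {
    "node-a": "zk1:2181,zk2:2181,zk3:2181/solr",
    "node-b": "zk3:2181,zk1:2181,zk2:2181/solr",
    "node-c": "zk1:2181,zk2:2181,zk3:2181/solr_old",
}

groups = group_by_zk_host(SETTINGS)
# More than one group means the nodes do not all agree on ensemble and chroot.
print(len(groups), sorted(groups.values()))
```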

On Mon, May 17, 2021 at 4:11 PM Oakley, Craig (NIH/NLM/NCBI) [C]
<cr...@nih.gov.invalid> wrote:
>
> > What does the core rename itself to? That would probably be the biggest hint.
>
> At 4:01pm 1/14/21, Solr decided on its own to drop the core ipg_report_large_shard1_replica_n1 and to create the core ipg_report_large_shard1_replica_n7 in its place
>
> At 4:33am 1/16/21, Solr decided on its own to drop the core ipg_report_large_shard1_replica_n5 (on another node of the same SolrCloud) and to create the core ipg_report_large_shard1_replica_n9 in its place
>
> At about 4:10pm 1/26/21, Solr decided on its own to drop this core ipg_report_large_shard1_replica_n9 and to create the core ipg_report_large_shard1_replica_n13 in its place
>
> In March, we created a new SolrCloud for the same collection, and reloaded the data
>
> At 7:59am 5/12/21, Solr decided on its own to drop the core ipg_report_large_shard1_replica_n1 and to create the core ipg_report_large_shard1_replica_n5 in its place
>
> I am attaching an excerpt from solr.log for the most recent problem (with IP addresses and port numbers redacted)
>
> Please note that Master/Slave replication breaks when a core is renamed, so this can be a major problem.
>
>
> Any ideas?
>
> -----Original Message-----
> From: Alexandre Rafalovitch <ar...@gmail.com>
> Sent: Wednesday, May 12, 2021 2:10 PM
> To: users@solr.apache.org
> Subject: Re: Cores renamed
>
> This is truly a shot in the dark, but is it possible you have
> something in the core.properties file (which is where the core name is
> set in a non-Cloud setup)?
>
> What does the core rename itself to? That would probably be the biggest hint.
>
> Regards,
>    Alex.
>
> On Wed, 12 May 2021 at 14:00, Oakley, Craig (NIH/NLM/NCBI) [C]
> <cr...@nih.gov.invalid> wrote:
> >
> > This phenomenon has happened again (this time without any REQUESTRECOVERY)
> >
> > Does anyone yet have any explanation of this?
> >
> > -----Original Message-----
> > From: Oakley, Craig (NIH/NLM/NCBI) [C] <cr...@nih.gov.INVALID>
> > Sent: Thursday, January 28, 2021 10:57 AM
> > To: solr-user@lucene.apache.org
> > Subject: Cores renamed
> >
> > We recently have had a few occasions when cores for one specific collection were renamed (or more likely dropped and recreated, and thus ended up with a different core name).
> >
> > Is this a known phenomenon? Is there any explanation?
> >
> > It may be relevant that we just recently started running this SolrCloud on version 8.5.2, although the collection was created under Solr7.4. Also, this collection seems to experience some heavy updates such that the non-Leader replica has trouble keeping up. One of these renames occurred at 4:33am, so I highly suspect that the rename (or drop and recreate) was done by some internal Solr thread rather than by any of my coworkers. One other potential clue is that I can see that /solr/admin/cores?action=REQUESTRECOVERY was usually run on the new core a moment after it was created.
> >
> > Does anyone have any insights?

RE: Cores renamed

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov.INVALID>.
This has happened yet again.

Does anyone yet have any input on the idea of using the Leader's collection name in Leader/Follower replication (or pre-Solr8.7 Master/Slave replication), rather than the core name?

RE: Cores renamed

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov.INVALID>.
As a potential solution, I was wondering about implementing Master/Slave replication using the collection name of the Master rather than the core name. My initial experiment with this in a test environment seemed to work. Does anyone have any input on the idea of using the Master's collection name in Master/Slave replication, rather than the core name?
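
The difference between the two addressing schemes can be sketched as below. This is only an illustration of the idea, not a recommendation: the helper name and the port are placeholders (the thread redacts the real port), and it assumes that a request path using the collection name is routed by SolrCloud to a replica of that collection. Replies elsewhere in this thread advise against configuring replication by hand in cloud mode.

```python
def replication_url(base_url: str, name: str) -> str:
    """Build the /replication endpoint URL for a core or collection name."""
    return f"{base_url.rstrip('/')}/{name}/replication"


# The port is a placeholder; the thread redacts the real one.
BASE = "http://solrdbprod26.be-md:8983/solr"

# Addressed by core name: breaks once the core is dropped and recreated as n9, n13, ...
by_core = replication_url(BASE, "ipg_report_large_shard1_replica_n7")

# Addressed by collection name: unaffected by core renames.
by_collection = replication_url(BASE, "ipg_report_large")

print(by_core)
print(by_collection)
```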

RE: Cores renamed

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov.INVALID>.
It happened again this morning.

Attached is an excerpt from solr.log (with port #s & IP addresses redacted) and below is the current CLUSTERSTATUS (with port #s redacted)

Is there yet any explanation?

{
  "responseHeader":{
    "status":0,
    "QTime":10},
  "cluster":{
    "collections":{
      "ipg_report_large":{
        "pullReplicas":"0",
        "replicationFactor":"1",
        "shards":{"shard1":{
            "range":"80000000-7fffffff",
            "state":"active",
            "replicas":{
              "core_node8":{
                "core":"ipg_report_large_shard1_replica_n7",
                "base_url":"http://solrdbprod26.be-md:####/solr",
                "node_name":"solrdbprod26.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false",
                "leader":"true"},
              "core_node10":{
                "core":"ipg_report_large_shard1_replica_n9",
                "base_url":"http://solrdbprod25.be-md:####/solr",
                "node_name":"solrdbprod25.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false"}}}},
        "router":{"name":"compositeId"},
        "maxShardsPerNode":"1",
        "autoAddReplicas":"false",
        "nrtReplicas":"1",
        "tlogReplicas":"0",
        "znodeVersion":741,
        "configName":"ipg_report_large"}},
    "live_nodes":["solrdbprod26.be-md:####_solr",
      "solrdbprod25.be-md:####_solr"]}}

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <cr...@nih.gov.INVALID> 
Sent: Monday, May 17, 2021 5:01 PM
To: users@solr.apache.org
Subject: RE: Cores renamed

The entire directory for the old core gets removed

Here is CLUSTERSTATUS (again with port numbers redacted). I ran CLUSTERSTATUS on both nodes, and the only difference was QTime (that is, there was no real difference):

{
  "responseHeader":{
    "status":0,
    "QTime":5},
  "cluster":{
    "collections":{
      "ipg_report_large":{
        "pullReplicas":"0",
        "replicationFactor":"1",
        "shards":{"shard1":{
            "range":"80000000-7fffffff",
            "state":"active",
            "replicas":{
              "core_node4":{
                "core":"ipg_report_large_shard1_replica_n3",
                "base_url":"http://solrdbprod26.be-md:####/solr",
                "node_name":"solrdbprod26.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false"},
              "core_node6":{
                "core":"ipg_report_large_shard1_replica_n5",
                "base_url":"http://solrdbprod25.be-md:####/solr",
                "node_name":"solrdbprod25.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false",
                "leader":"true"}}}},
        "router":{"name":"compositeId"},
        "maxShardsPerNode":"1",
        "autoAddReplicas":"false",
        "nrtReplicas":"1",
        "tlogReplicas":"0",
        "znodeVersion":710,
        "configName":"ipg_report_large"}},
    "live_nodes":["solrdbprod26.be-md:####_solr",
      "solrdbprod25.be-md:####_solr"]}}


RE: Cores renamed

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov.INVALID>.
The entire directory for the old core gets removed.

Here is CLUSTERSTATUS (again with port numbers redacted). I ran CLUSTERSTATUS on both nodes, and the only difference was QTime (that is, there was no real difference):

{
  "responseHeader":{
    "status":0,
    "QTime":5},
  "cluster":{
    "collections":{
      "ipg_report_large":{
        "pullReplicas":"0",
        "replicationFactor":"1",
        "shards":{"shard1":{
            "range":"80000000-7fffffff",
            "state":"active",
            "replicas":{
              "core_node4":{
                "core":"ipg_report_large_shard1_replica_n3",
                "base_url":"http://solrdbprod26.be-md:####/solr",
                "node_name":"solrdbprod26.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false"},
              "core_node6":{
                "core":"ipg_report_large_shard1_replica_n5",
                "base_url":"http://solrdbprod25.be-md:####/solr",
                "node_name":"solrdbprod25.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false",
                "leader":"true"}}}},
        "router":{"name":"compositeId"},
        "maxShardsPerNode":"1",
        "autoAddReplicas":"false",
        "nrtReplicas":"1",
        "tlogReplicas":"0",
        "znodeVersion":710,
        "configName":"ipg_report_large"}},
    "live_nodes":["solrdbprod26.be-md:####_solr",
      "solrdbprod25.be-md:####_solr"]}}
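
A minimal Python sketch of how the core names could be pulled out of a CLUSTERSTATUS response like the one above, so that two snapshots taken at different times can be compared to spot a dropped-and-recreated core. The function names and the trimmed sample data are illustrative assumptions, not anything provided by Solr.

```python
import json

def core_names(cluster_status, collection):
    """Map each replica's core_node name to its core name."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    return {
        node: replica["core"]
        for shard in shards.values()
        for node, replica in shard["replicas"].items()
    }

def diff_cores(before, after):
    """Return (cores that vanished, cores that appeared) between snapshots."""
    old, new = set(before.values()), set(after.values())
    return sorted(old - new), sorted(new - old)

# Trimmed copy of the response above; only the fields the sketch reads.
snapshot = json.loads("""
{"cluster": {"collections": {"ipg_report_large": {"shards": {"shard1": {
  "replicas": {
    "core_node4": {"core": "ipg_report_large_shard1_replica_n3"},
    "core_node6": {"core": "ipg_report_large_shard1_replica_n5",
                   "leader": "true"}
  }}}}}}}
""")

current = core_names(snapshot, "ipg_report_large")
print(current)
```

Saving the output of core_names() on a schedule and passing two saved snapshots to diff_cores() would give a timestamped record of when a core name changes, which could then be matched against solr.log.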

-----Original Message-----
From: matthew sporleder <ms...@gmail.com> 
Sent: Monday, May 17, 2021 4:34 PM
To: users@solr.apache.org
Subject: Re: Cores renamed

Can you verify all of your zkHost connection params across the entire
cluster, and share the replicationFactor, autoAddReplicas, etc for the
collection?

My theory is that you have two ZooKeeper configs conflicting as master
elections happen, causing new replicas to get created on the fly.

Also -- do these cores get deleted from the filesystem or left around?


Re: Cores renamed

Posted by matthew sporleder <ms...@gmail.com>.
Can you verify all of your zkHost connection params across the entire
cluster, and share the replicationFactor, autoAddReplicas, etc for the
collection?

My theory is that you have two ZooKeeper configs conflicting as master
elections happen, causing new replicas to get created on the fly.

Also -- do these cores get deleted from the filesystem or left around?
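
A rough Python sketch of how the zkHost comparison could be done once each node's connection string has been collected by hand (for example from ZK_HOST in solr.in.sh or a -DzkHost startup option). The node names, the sample strings, and both function names are hypothetical; the sketch only assumes the usual host:port,host:port/chroot layout.

```python
def normalize_zk_host(zk_host):
    """Split a zkHost string into (sorted host:port tuple, chroot)."""
    zk_host = zk_host.strip()
    slash = zk_host.find("/")
    if slash == -1:
        ensemble, chroot = zk_host, ""
    else:
        ensemble, chroot = zk_host[:slash], zk_host[slash:].rstrip("/")
    hosts = tuple(
        sorted(h.strip().lower() for h in ensemble.split(",") if h.strip())
    )
    return hosts, chroot

def group_by_zk_host(zk_hosts_by_node):
    """Group Solr nodes by normalized zkHost; more than one group is a mismatch."""
    groups = {}
    for node, zk_host in zk_hosts_by_node.items():
        groups.setdefault(normalize_zk_host(zk_host), []).append(node)
    return groups

# Hypothetical values, collected by hand from each node's configuration.
collected = {
    "solr-node-a": "zk1:2181,zk2:2181,zk3:2181/solr",
    "solr-node-b": "zk3:2181,zk1:2181,zk2:2181/solr",
    "solr-node-c": "zk1:2181,zk2:2181,zk3:2181",
}
groups = group_by_zk_host(collected)
for (hosts, chroot), nodes in groups.items():
    print(",".join(hosts) + chroot, "->", nodes)
```

Host order is ignored on purpose, while a missing or different chroot counts as a different configuration, since that would point the node at a different part of the ZooKeeper tree.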

On Mon, May 17, 2021 at 4:11 PM Oakley, Craig (NIH/NLM/NCBI) [C]
<cr...@nih.gov.invalid> wrote:
>
> > What does the core rename itself to? That would probably be the biggest hint.
>
> At 4:01pm 1/14/21, Solr decided on its own to drop the core ipg_report_large_shard1_replica_n1 and to create the core ipg_report_large_shard1_replica_n7 in its place.
>
> At 4:33am 1/16/21, Solr decided on its own to drop the core ipg_report_large_shard1_replica_n5 (on another node of the same SolrCloud) and to create the core ipg_report_large_shard1_replica_n9 in its place.
>
> At about 4:10pm 1/26/21, Solr decided on its own to drop this core ipg_report_large_shard1_replica_n9 and to create the core ipg_report_large_shard1_replica_n13 in its place.
>
> In March, we created a new SolrCloud for the same collection, and reloaded the data.
>
> At 7:59am 5/12/21, Solr decided on its own to drop the core ipg_report_large_shard1_replica_n1 and to create the core ipg_report_large_shard1_replica_n5 in its place.
>
> I am attaching an excerpt from solr.log for the most recent problem (with IP addresses and port numbers redacted).
>
> Please note that Master/Slave replication breaks when a core is renamed, so this can be a major problem.
>
>
> Any ideas?

RE: Cores renamed

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov.INVALID>.
> What does the core rename itself to? That would probably be the biggest hint.

At 4:01pm 1/14/21, Solr decided on its own to drop the core ipg_report_large_shard1_replica_n1 and to create the core ipg_report_large_shard1_replica_n7 in its place.

At 4:33am 1/16/21, Solr decided on its own to drop the core ipg_report_large_shard1_replica_n5 (on another node of the same SolrCloud) and to create the core ipg_report_large_shard1_replica_n9 in its place.

At about 4:10pm 1/26/21, Solr decided on its own to drop this core ipg_report_large_shard1_replica_n9 and to create the core ipg_report_large_shard1_replica_n13 in its place.

In March, we created a new SolrCloud for the same collection, and reloaded the data.

At 7:59am 5/12/21, Solr decided on its own to drop the core ipg_report_large_shard1_replica_n1 and to create the core ipg_report_large_shard1_replica_n5 in its place.

I am attaching an excerpt from solr.log for the most recent problem (with IP addresses and port numbers redacted).

Please note that Master/Slave replication breaks when a core is renamed, so this can be a major problem.


Any ideas?

-----Original Message-----
From: Alexandre Rafalovitch <ar...@gmail.com> 
Sent: Wednesday, May 12, 2021 2:10 PM
To: users@solr.apache.org
Subject: Re: Cores renamed

This is truly a shot in the dark, but is it possible you have
something in the core.properties file (which is where the core name is for
a non-Cloud setup)?

What does the core rename itself to? That would probably be the biggest hint.

Regards,
   Alex.


Re: Cores renamed

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
This is truly a shot in the dark, but is it possible you have
something in the core.properties file (which is where the core name is for
a non-Cloud setup)?

What does the core rename itself to? That would probably be the biggest hint.

Regards,
   Alex.
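
A small Python sketch of one way to check what the core.properties files on a node actually contain. It assumes the files use plain key=value lines; the sample contents and both function names are hypothetical, and the parser deliberately ignores the escaping rules of full Java properties files.

```python
from pathlib import Path

def read_core_properties(text):
    """Parse the simple key=value lines of a core.properties file."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and anything that is not key=value
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def find_core_properties(solr_home):
    """Yield (path, properties) for every core.properties under solr_home."""
    for path in sorted(Path(solr_home).rglob("core.properties")):
        yield path, read_core_properties(path.read_text())

# Hypothetical file contents, for illustration only.
sample = """\
#Written by CorePropertiesLocator
name=ipg_report_large_shard1_replica_n5
shard=shard1
collection=ipg_report_large
coreNodeName=core_node6
"""
props = read_core_properties(sample)
print(props["name"], props.get("coreNodeName"))
```

Running find_core_properties() against each node's Solr home directory and comparing the name values with what CLUSTERSTATUS reports would show whether anything on disk disagrees with the cluster state.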

On Wed, 12 May 2021 at 14:00, Oakley, Craig (NIH/NLM/NCBI) [C]
<cr...@nih.gov.invalid> wrote:
>
> This phenomenon has happened again (this time without any REQUESTRECOVERY)
>
> Does anyone yet have any explanation of this?

RE: Cores renamed

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov.INVALID>.
This phenomenon has happened again (this time without any REQUESTRECOVERY).

Does anyone yet have any explanation of this?

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <cr...@nih.gov.INVALID> 
Sent: Thursday, January 28, 2021 10:57 AM
To: solr-user@lucene.apache.org
Subject: Cores renamed

We recently have had a few occasions when cores for one specific collection were renamed (or more likely dropped and recreated, and thus ended up with a different core name).

Is this a known phenomenon? Is there any explanation?

It may be relevant that we just recently started running this SolrCloud on version 8.5.2, although the collection was created under Solr7.4. Also, this collection seems to experience some heavy updates such that the non-Leader replica has trouble keeping up. One of these renames occurred at 4:33am, so I highly suspect that the rename (or drop and recreate) was done by some internal Solr thread rather than by any of my coworkers. One other potential clue is that I can see that /solr/admin/cores?action=REQUESTRECOVERY was usually run on the new core a moment after it was created.

Does anyone have any insights?