Posted to solr-user@lucene.apache.org by cdatta <ch...@gmail.com> on 2018/08/02 18:36:11 UTC

SolrCloud CDCR issue

Hello,

Thanks for reading my post!

We have the following environment setup:
SolrCloud
Solr version: 7.3.1
9 nodes per DC
2 DCs
2 separate ZK ensembles (one for each Solr DC)
Bidirectional CDCR enabled.
2 collections.
3 shards per collection, replication factor 3.
Basic auth enabled. (We are aware of the CDCR basic auth issues, so we added
the other DC's node information as part of live_nodes.)
ZK ACL enabled.
Solr node JVM heap = 64 GB, with G1GC enabled and tuned.

#########################
solrConfig settings for CDCR

  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">512</str>
  </lst>

  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>

#########################

-Dsolr.autoCommit.maxTime=60000 -Dsolr.autoSoftCommit.maxTime=1000

#########################

Now, we are seeing the following issues:

1. Data inserted into one DC is not forwarded to the other DC when the insert
is not followed by a hard commit.
2. Data inserted into one DC is not forwarded to the other DC even when the
insert is followed by a hard commit. Verified with /get as well.
3. After a hard commit on the target DC and a RELOAD, data started showing
up, but numFound does not match across DCs.

Errors:
Each shard leader's queueSize was either -1 or 0, and it was showing
bad_request.

8983/solr/collection_name_shard2_replica_n6/cdcr?action=QUEUES

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "queues":[
    "abc.com:2181,abc1.com:2181,abc2.com:2181",[
      "collection_name",[
        "queueSize",0,
        "lastTimestamp","2018-08-01T17:21:29.990Z"]]],
  "tlogTotalSize":16545113,
  "tlogTotalCount":5,
  "updateLogSynchronizer":"stopped"}

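The QUEUES response above is a flattened name/value list, which is awkward to read once there are several targets or collections. Below is a minimal Python sketch that turns it into one row per target collection. The function name parse_cdcr_queues and the "suspect" flag are illustrative assumptions, not part of Solr; treating a negative queueSize as a sign of an uninitialised log reader is also an assumption, based only on the WARN line in this thread.

```python
import json


def parse_cdcr_queues(response: dict) -> list:
    """Flatten the nested "queues" list of a CDCR QUEUES response into rows."""
    rows = []
    queues = response.get("queues", [])
    # Top level alternates: target ZK host string, then a list of collections.
    for zk_host, collections in zip(queues[0::2], queues[1::2]):
        # Collection level alternates: collection name, then name/value pairs.
        for name, stats in zip(collections[0::2], collections[1::2]):
            fields = dict(zip(stats[0::2], stats[1::2]))
            row = {"zkHost": zk_host, "collection": name, **fields}
            # Assumption: a negative queueSize deserves a closer look.
            row["suspect"] = fields.get("queueSize", 0) < 0
            rows.append(row)
    return rows


# The response body posted above, used as sample input.
sample = json.loads("""
{
  "responseHeader": {"status": 0, "QTime": 1},
  "queues": [
    "abc.com:2181,abc1.com:2181,abc2.com:2181", [
      "collection_name", [
        "queueSize", 0,
        "lastTimestamp", "2018-08-01T17:21:29.990Z"]]],
  "tlogTotalSize": 16545113,
  "tlogTotalCount": 5,
  "updateLogSynchronizer": "stopped"}
""")

for row in parse_cdcr_queues(sample):
    print(row)
```

Running the same parser against the QUEUES output of every shard leader gives a quick per-shard view of which targets are lagging or not initialised.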


ERROR from log:


INFO  - 2018-07-31 17:54:46.722; [   ]
org.apache.solr.handler.CdcrReplicatorManager$BootstrapStatusRunnable; CDCR
bootstrap successful in 5 seconds
INFO  - 2018-07-31 17:54:46.889; [   ]
org.apache.solr.handler.CdcrReplicatorManager$BootstrapStatusRunnable;
Create new update log reader for target collection_name with checkpoint
1607545724212346885 @ collection_name:shard2
ERROR - 2018-07-31 17:54:47.052; [   ]
org.apache.solr.handler.CdcrReplicatorManager$BootstrapStatusRunnable;
Unable to bootstrap the target collection collection_name shard: shard2


WARN : [c:collection_name s:shard2 r:core_node11
x:collection_name_shard2_replica_n8]
org.apache.solr.handler.CdcrRequestHandler; The log reader for target
collection collection_name is not initialised @ collection_name:shard2

So we are wondering how to proceed. Thanks in advance.








--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: SolrCloud CDCR issue

Posted by Amrit Sarkar <sa...@gmail.com>.
Hi,

Yes, if you look above, I cited the same JIRA. I see your question about
3 DCs in an Active-Active scenario; I will respond there.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2



Re: SolrCloud CDCR issue

Posted by cdatta <ch...@gmail.com>.
And I was thinking about this one:
https://issues.apache.org/jira/browse/SOLR-11959.




Re: SolrCloud CDCR issue

Posted by cdatta <ch...@gmail.com>.
I am following the auth-related workaround described here:
https://stackoverflow.com/questions/48790621/solr-cdcr-doesnt-work-if-the-authentication-is-enabled

My question is: why are not all documents getting forwarded? Is there
something else that we are missing here?
Also, is there any restriction from the CDCR standpoint on running 3 DCs in
an ACTIVE/ACTIVE/ACTIVE scenario?

Regards,
Chandi




Re: SolrCloud CDCR issue

Posted by Amrit Sarkar <sa...@gmail.com>.
To the concerned,

I am afraid authentication is not supported between Solr clusters:
https://issues.apache.org/jira/browse/SOLR-11959.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2



Re: SolrCloud CDCR issue

Posted by cdatta <ch...@gmail.com>.
I followed the exact steps you suggested. Now I am not seeing that error. 

INFO  - 2018-08-10 15:23:58.159; [c:collection_name s:shard2 r:core_node13
x:collection_name_shard2_replica_n10]
org.apache.solr.handler.CdcrReplicator; Forwarded 10 updates to target
collection_name

However, in the destination DC I am seeing a different numFound on each
retry. Even after a core reload it does not show a consistent number.

Source:      Total Doc: 1310
Destination: Total Doc: 1310, 908, 457 (varies per retry)

I stopped the indexing and waited for the max autocommit interval for that
collection to expire. Even after that, I did not get consistent results. Do I
have to send an explicit hard commit?
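
One way to narrow down a numFound that changes between retries is to ask each replica core for its own count with distrib=false, so the query is not fanned out to whichever replicas happen to serve it. The Python sketch below assumes the variation comes from replicas of a shard disagreeing, which this thread does not confirm. The helper names, core URLs, and credentials are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def count_url(core_url: str) -> str:
    """Build a match-all query against one core that is not distributed."""
    params = urllib.parse.urlencode(
        {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
    )
    return f"{core_url.rstrip('/')}/select?{params}"


def fetch_num_found(core_url: str, user: str, password: str) -> int:
    """Return numFound for a single core, sending Basic auth credentials."""
    request = urllib.request.Request(count_url(core_url))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["response"]["numFound"]


def out_of_sync(counts: dict) -> bool:
    """True when the replicas of one shard report different counts."""
    return len(set(counts.values())) > 1


# Example usage (hypothetical core URLs and credentials, not run here):
# cores = ["http://host:8983/solr/collection_name_shard1_replica_n2", ...]
# counts = {core: fetch_num_found(core, "user", "secret") for core in cores}
# print(counts, out_of_sync(counts))
```

If the replicas of one shard report different counts after the commit interval has passed, the inconsistency is inside the target cluster rather than in the forwarding itself.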

Source/Destination DC: I am now seeing the following error, though. Not sure if
this is related to an existing CDCR JIRA I saw.

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://host:8983/solr/collection_name_shard1_replica_n2:
Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 require authentication</title>
</head>
<body>
HTTP ERROR 401

<p>Problem accessing /solr/collection_name_shard1_replica_n2/cdcr. Reason:
<pre>    require authentication</pre></p>
</body>
</html>

  at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:607)
  at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
  at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
  at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
  at org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronis

org.apache.solr.common.SolrException: Unable to locate core
collection_name_shard1_replica_n2
  at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$5(CoreAdminOperation.java:149)
  at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:358)
  at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:389)


Here is our security.json

{
  "authentication":{
    "blockUnknown":true,
    "class":"solr.BasicAuthPlugin",
    "credentials":{
      "solr":"--REDACTED--",
      "admin":"--REDACTED--",
      "solr_dev":"--REDACTED--",
      "app_2_user":"--REDACTED--",
      "app_1_user":"--REDACTED--"},
    "":{"v":6}},
  "authorization":{
    "class":"solr.RuleBasedAuthorizationPlugin",
    "permissions":[
      {
        "name":"security-edit",
        "role":"admin",
        "index":1},
      {
        "name":"collection-admin-read",
        "role":[
          "read",
          "read_write",
          "admin"],
        "index":2},
      {
        "name":"read",
        "role":[
          "read",
          "read_write",
          "admin"],
        "index":3},
      {
        "name":"core-admin-read",
        "role":[
          "read",
          "read_write",
          "admin"],
        "index":4},
      {
        "name":"schema-read",
        "role":[
          "read",
          "read_write",
          "admin"],
        "index":5},
      {
        "name":"config-read",
        "role":[
          "read",
          "read_write",
          "admin"],
        "index":6},
      {
        "name":"admin-ui",
        "path":"/",
        "role":[
          "read",
          "read_write",
          "admin"],
        "index":7},
      {
        "collection":null,
        "path":"/admin/zookeeper",
        "role":["admin"],
        "index":8},
      {
        "collection":"*",
        "path":"/admin/file",
        "role":["admin"],
        "index":9},
      {
        "collection":"*",
        "path":"/admin/files",
        "role":"admin",
        "index":10},
      {
        "collection":"*",
        "path":"/dataimport",
        "role":["admin"],
        "index":11},
      {
        "name":"collection-admin-edit",
        "role":["admin"],
        "index":12},
      {
        "name":"update",
        "role":[
          "admin",
          "read_write"],
        "index":13},
      {
        "name":"schema-edit",
        "role":["admin"],
        "index":14},
      {
        "name":"config-edit",
        "role":["admin"],
        "index":15},
      {
        "name":"core-admin-edit",
        "role":["admin"],
        "index":16}],
    "user-role":{
      "solr":"admin",
      "app_1_user":"read_write",
      "solr_dev":"read",
      "app_2_user":"read_write",
      "admin":["admin"]},
    "":{"v":19}}}





Re: SolrCloud CDCR issue

Posted by Amrit Sarkar <sa...@gmail.com>.
Honestly, either of them in that case. Please follow these steps:

1. Stop CDCR on cluster-1
2. Stop CDCR on cluster-2
Both of the above steps are critical.
3. Shut down all nodes of cluster-1
4. Shut down all nodes of cluster-2
5. Start all nodes of cluster-1
6. Start all nodes of cluster-2
7. Start CDCR on cluster-1
Go to the logs and verify "forwarding has been started"
8. Start CDCR on cluster-2
Do the same sanity check.

I understand this is unnecessarily complex, but this is how CDCR was
originally designed. Please give it a shot and let us know.
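
The sequence above can be written down as an ordered checklist so that nothing is run out of order. This is a minimal Python sketch: it only builds the CDCR API URLs (action=STOP and action=START on the collection's /cdcr handler) and interleaves them with the manual node restarts. The host names, the restart_plan helper, and the "call"/"manual" labels are hypothetical.

```python
import urllib.parse


def cdcr_url(base_url: str, collection: str, action: str) -> str:
    """Build a CDCR API URL such as .../solr/<collection>/cdcr?action=STOP."""
    query = urllib.parse.urlencode({"action": action})
    return f"{base_url.rstrip('/')}/solr/{collection}/cdcr?{query}"


def restart_plan(cluster1: str, cluster2: str, collection: str) -> list:
    """Return the steps in order; "manual" steps are done by an operator."""
    return [
        ("call", cdcr_url(cluster1, collection, "STOP")),
        ("call", cdcr_url(cluster2, collection, "STOP")),
        ("manual", "shut down all nodes of cluster-1"),
        ("manual", "shut down all nodes of cluster-2"),
        ("manual", "start all nodes of cluster-1"),
        ("manual", "start all nodes of cluster-2"),
        ("call", cdcr_url(cluster1, collection, "START")),
        ("manual", "check cluster-1 logs that forwarding has started"),
        ("call", cdcr_url(cluster2, collection, "START")),
        ("manual", "check cluster-2 logs that forwarding has started"),
    ]


# Hypothetical hosts; with basic auth enabled each call also needs credentials.
plan = restart_plan("http://dc1-host:8983", "http://dc2-host:8983", "collection_name")
for number, (kind, step) in enumerate(plan, start=1):
    print(f"{number:2d}. [{kind}] {step}")
```

With two collections, as in this setup, the plan would be generated once per collection, with the node restarts done only once.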

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2



Re: SolrCloud CDCR issue

Posted by cdatta <ch...@gmail.com>.
Really appreciate your response.
I saw this information in some of your earlier posts related to CDCR. We are
running our Cloud cluster in an Active/Active setup with bidirectional
CDCR.
In that case, which one should we start first?




Re: SolrCloud CDCR issue

Posted by Amrit Sarkar <sa...@gmail.com>.
To the concerned,

> WARN : [c:collection_name s:shard2 r:core_node11
> x:collection_name_shard2_replica_n8]
> org.apache.solr.handler.CdcrRequestHandler; The log reader for target
> collection collection_name is not initialised @ collection_name:shard2
>

This means the source cluster was started first and then the target. You need
to shut down all the nodes at both source and target. Bring up all of the
target nodes before starting the source ones. The log readers should then
initialize properly.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2



Re: SolrCloud CDCR issue

Posted by cdatta <ch...@gmail.com>.
Any pointers?


