Posted to solr-user@lucene.apache.org by Andrew Kettmann <an...@evolve24.com> on 2019/06/20 17:58:10 UTC

Solr 7.7.2 - SolrCloud - Autoscale Triggers - indexSize trigger - Failure isn't sending listener a FAILED message, but a SUCCEEDED message

First, pardon the copy/pasted examples of my policies, triggers, etc.: they are shown as Python dicts rather than JSON, since Python is my language of choice when working with APIs and the like. The APIs themselves are receiving proper JSON.


Issue summary: I have a collection with strict autoscaling rules that cannot be satisfied. When an indexSize trigger fires to split the core, the split fails, the trigger fires over and over, and each time the configured HTTP listener receives a SUCCEEDED message instead of a FAILED one.


Solr 7.7.2, SolrCloud. Collection with the following policy:


{'set-policy': {'othersolr7': [{'node': '#ANY',
                                'replica': '<2',
                                'strict': 'true'},
                               {'replica': '#ALL',
                                'shard': '#ANY',
                                'sysprop.HELM_CHART': 'othersolr7'}]}}


So one core per node, with strict set to true. There are TWO total nodes that satisfy this.


Collection is 1 shard with 2 total NRT replicas.


Configured a trigger to split at 9999 docs:


{'aboveDocs': '9999',
 'event': 'indexSize',
 'name': 'index_size_trigger_9999_docs',
 'splitMethod': 'link',
 'waitFor': '5s'}
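
For anyone wanting to reproduce this, here is a minimal sketch of how a trigger dict like the one above could be wrapped and sent. The endpoint path (/solr/admin/autoscaling), the port 8983, and the helper name build_command_request are my assumptions for illustration and are not taken from the original setup; the 'set-trigger' wrapper mirrors the 'set-policy' and 'set-listener' commands shown elsewhere in this post.

```python
import json
import urllib.request

# Assumption: autoscaling write endpoint on any Solr node; adjust HOST and port.
SOLR_AUTOSCALING_URL = "http://HOST:8983/solr/admin/autoscaling"

# The trigger exactly as shown above, as a plain Python dict.
trigger = {
    "name": "index_size_trigger_9999_docs",
    "event": "indexSize",
    "aboveDocs": "9999",
    "splitMethod": "link",
    "waitFor": "5s",
}


def build_command_request(command, body, url=SOLR_AUTOSCALING_URL):
    """Wrap `body` as {command: body} and build (but do not send) a JSON POST."""
    payload = json.dumps({command: body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_command_request("set-trigger", trigger)
# urllib.request.urlopen(request)  # uncomment to actually send it to Solr
```

The same helper would work for the policy and listener commands by passing 'set-policy' or 'set-listener' as the command name.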


Also a listener configured to send HTTP posts:


{'set-listener': {'afterAction': ['execute_plan'],
                  'class': 'solr.HttpTriggerListener',
                  'header.X-Trigger': '${config.trigger}',
                  'name': 'test-to-flask',
                  'stage': ['ABORTED', 'SUCCEEDED', 'FAILED'],
                  'trigger': 'index_size_trigger_9999_docs',
                  'url': 'http://HOST:5000/post/${config.name:invalidName}/${config.trigger}/${event.id}?STAGE=${stage}'}}
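
To make the URL template above concrete, here is a small sketch of how the receiving side might pull the listener name, trigger name, event ID and stage back out of the request URL. The function name parse_listener_callback is hypothetical, and it assumes the placeholders are substituted into the path and query exactly as written in the template.

```python
from urllib.parse import parse_qs, urlsplit


def parse_listener_callback(url):
    """Split a callback URL of the form
    /post/<listener name>/<trigger name>/<event id>?STAGE=<stage>
    into its parts. Raises ValueError if the path has a different shape."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    if len(segments) != 4 or segments[0] != "post":
        raise ValueError("unexpected callback path: " + parts.path)
    _, listener, trigger, event_id = segments
    # parse_qs returns a list per key; take the first STAGE value if present.
    stage = parse_qs(parts.query).get("STAGE", [None])[0]
    return {
        "listener": listener,
        "trigger": trigger,
        "event_id": event_id,
        "stage": stage,
    }
```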


I put 10K docs into the collection to fire the indexSize trigger, and it fires over and over, sending a POST to my listener each time and a SUCCEEDED message after each attempt. Each round gets a new event ID. The message received for the "afterAction" of execute_plan shows an error:


 'context.operations': '[{\n'
                       '  '
                       '"class":"org.apache.solr.client.solrj.request.CollectionAdminRequest$SplitShard",\n'
                       '  "method":"GET",\n'
                       '  "params.action":"SPLITSHARD",\n'
                       '  '
                       '"params.async":"index_size_trigger_9999_docs/2bbd7de63de68T2eupg9aq3fuuy2lnyi9s1ha0h/0",\n'
                       '  "params.waitForFinalState":"true",\n'
                       '  "params.collection":"othersolr7",\n'
                       '  "params.shard":"shard1",\n'
                       '  "params.splitMethod":"link"}]',
 'context.responses': '[{responseHeader={status=0,QTime=2},Operation '
                      'splitshard caused '
                      'exception:=org.apache.solr.common.SolrException:org.apache.solr.common.SolrException,exception={msg=null,rspCode=500},status={state=failed,msg=found '
                      '[index_size_trigger_9999_docs/2bbd7de63de68T2eupg9aq3fuuy2lnyi9s1ha0h/0] '
                      'in failed tasks}}]',
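
As a possible stopgap on the receiving side, the failure can be spotted in the text of 'context.responses' and remembered per event ID, so that a later SUCCEEDED stage for the same event is not trusted. This is only a sketch: the class name EventOutcomeTracker is hypothetical, the 'state=failed' marker is taken from the one response shown above and may not cover other failure shapes, and it assumes the afterAction payload carries 'event.id' alongside 'context.responses'.

```python
# Assumption: this substring marks a failed operation, based on the single
# response shown above ("status={state=failed,msg=found [...] in failed tasks}").
FAILURE_MARKERS = ("state=failed",)


class EventOutcomeTracker:
    """Remembers which event IDs reported a failed operation in an
    afterAction payload and corrects a later SUCCEEDED stage for them."""

    def __init__(self):
        self._failed_event_ids = set()

    def record(self, payload):
        """Take one flattened listener payload (a dict of strings) and return
        the stage to act on: FAILED if this event is known to have failed,
        otherwise the stage reported by Solr."""
        event_id = payload.get("event.id")
        responses = payload.get("context.responses", "")
        if any(marker in responses for marker in FAILURE_MARKERS):
            self._failed_event_ids.add(event_id)
        stage = payload.get("stage", "")
        if stage == "SUCCEEDED" and event_id in self._failed_event_ids:
            return "FAILED"
        return stage
```

This does not fix the trigger loop itself; it only keeps downstream automation from acting on a success report that follows a failed split.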


But after I get that, I still receive a SUCCEEDED message:


{'actionName': '',
 'config.afterActions': 'execute_plan',
 'config.beforeActions': '',
 'config.listenerClass': 'solr.HttpTriggerListener',
 'config.name': 'test-to-flask',
 'config.properties.afterAction': '[execute_plan]',
 'config.properties.beforeAction': '[]',
 'config.properties.class': 'solr.HttpTriggerListener',
 'config.properties.header.X-Trigger': '${config.trigger}',
 'config.properties.stage': '[ABORTED, SUCCEEDED, FAILED]',
 'config.properties.trigger': 'index_size_trigger_9999_docs',
 'config.properties.url': 'http://HOST:5000/post/${config.name:invalidName}/${config.trigger}/${event.id}?STAGE=${stage}',
 'config.stages': 'ABORTED,SUCCEEDED,FAILED',
 'config.trigger': 'index_size_trigger_9999_docs',
 'error': '',
 'event.eventTime': '769485776871016',
 'event.eventType': 'INDEXSIZE',
 'event.id': '2bbd7de63de68T2eupg9aq3fuuy2lnyi9s1ha0h',
 'event.properties.__start__': '1',
 'event.properties._enqueue_time_': '769495912359525',
 'event.properties.aboveSize': '{othersolr7_shard1_replica_n2=docs=10000, '
                               'bytes=9708660}',
 'event.properties.belowSize': '{}',
 'event.properties.requestedOps': '[Op{action=SPLITSHARD, '
                                  'hints={COLL_SHARD=[{\n'
                                  '  "first":"othersolr7",\n'
                                  '  "second":"shard1"}], '
                                  'PARAMS={splitMethod=link}}}]',
 'event.source': 'index_size_trigger_9999_docs',
 'message': '',
 'stage': 'SUCCEEDED'}



And then it loops continually, sending a SUCCEEDED message after each failed attempt. I understand the failure itself: this is an unfixable situation for Solr, since it can't both meet my policies AND execute the trigger. The problem is that the listener reports success each time. Is anyone able to shed some light on this? I am working on automation so that when we split cores, we automatically create new containers for Solr to use and shuffle cores onto them; I was testing failure cases and found this issue. Is this just a ticket I need to open in Jira, or is there something I am missing?



Andrew Kettmann
DevOps Engineer