You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Varun Thacker (JIRA)" <ji...@apache.org> on 2018/06/08 18:16:06 UTC

[jira] [Created] (SOLR-12470) Search Rate Trigger created more than 3 replicas

Varun Thacker created SOLR-12470:
------------------------------------

             Summary: Search Rate Trigger created more than 3 replicas
                 Key: SOLR-12470
                 URL: https://issues.apache.org/jira/browse/SOLR-12470
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: AutoScaling
            Reporter: Varun Thacker
         Attachments: 4_docs.json, bug_report.txt, graph_view.png, multiple_replicas.zip, system_docs.json

Here's the trigger that I created . At this point the collection was one shard and one replica ( on node3 )
{code:java}
curl -X POST -H 'Content-type:application/json' --data-binary '{
"set-trigger": {
"name" : "search_rate_trigger",
"event" : "searchRate",
"collection" : "test_rate_trigger",
"rate" : 1.0,
"waitFor" : "1m",
"enabled" : true,
"actions" : [
{
"name" : "compute_plan",
"class": "solr.ComputePlanAction"
},
{
"name" : "execute_plan",
"class": "solr.ExecutePlanAction"
}
]
}
}' http://localhost:8983/solr/admin/autoscaling{code}
 

I also had a trigger listener setup as I was testing the listener feature
{code:java}
curl -X POST -H 'Content-type:application/json' --data-binary '{
"set-listener": {
"name": "search_rate_listener",
"trigger": "search_rate_trigger",
"stage": ["STARTED", "ABORTED", "SUCCEEDED", "FAILED"],
"class": "solr.SystemLogListener"
}
}' http://localhost:8983/solr/admin/autoscaling{code}
 

I ran a script to fire queries at every 100ms . The index didn't have any docs so it's a simple match all query
{code:java}
while [ 1 ]
do
curl -s "http://localhost:8984/solr/test_rate_trigger/select/?q=*:*" > /dev/null
sleep .1
done{code}

After a few minutes I see 4 replicas being created.

Attaching logs from all 4 nodes. It should be fairly easy to reproduce with the above mentioned steps

Also attaching all the docs from the .system collection for reference

Here's another interesting this I noticed. I re-created the setup but this time removed the execute_plan part

Now every 1 min the compute plan action tries to create 3 replicas . Why I found this interesting is that it was trying to create two replicas on the same node
Does this look like a separate bug?
{code:java}
INFO - 2018-06-08 03:41:32.586; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=0
INFO - 2018-06-08 03:41:40.909; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core} status=0 QTime=1
INFO - 2018-06-08 03:41:40.932; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT
INFO - 2018-06-08 03:41:40.933; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
INFO - 2018-06-08 03:41:40.934; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
INFO - 2018-06-08 03:41:40.934; [ ] org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession, curr-time 9184331 sessionWrapper.createTime 9184324085271, this.sessionWrapper.createTime 9184324085271
INFO - 2018-06-08 03:42:32.604; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=0
INFO - 2018-06-08 03:42:41.525; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core} status=0 QTime=0
INFO - 2018-06-08 03:42:41.559; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT
INFO - 2018-06-08 03:42:41.560; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
INFO - 2018-06-08 03:42:41.560; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
INFO - 2018-06-08 03:42:41.561; [ ] org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession, curr-time 9244959 sessionWrapper.createTime 9244956725861, this.sessionWrapper.createTime 9244956725861
INFO - 2018-06-08 03:43:32.622; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=0
INFO - 2018-06-08 03:43:42.158; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core} status=0 QTime=1
INFO - 2018-06-08 03:43:42.178; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT
INFO - 2018-06-08 03:43:42.180; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
INFO - 2018-06-08 03:43:42.181; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
INFO - 2018-06-08 03:43:42.181; [ ] org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession, curr-time 9305581 sessionWrapper.createTime 9305577119413, this.sessionWrapper.createTime 9305577119413
INFO - 2018-06-08 03:44:32.642; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=0
INFO - 2018-06-08 03:44:42.759; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core} status=0 QTime=0
INFO - 2018-06-08 03:44:42.778; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT
INFO - 2018-06-08 03:44:42.779; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
INFO - 2018-06-08 03:44:42.779; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
INFO - 2018-06-08 03:44:42.779; [ ] org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession, curr-time 9366182 sessionWrapper.createTime 9366178796748, this.sessionWrapper.createTime 9366178796748{code}
 

Thirdly, for this above mentioned test I started observing the .system collection . Here is a query that I used to get the documents created from the first time the listener kicked in
{code:java}
http://localhost:8983/solr/.system/select?fq=source_s:SystemLogListener&q=*:*&rows=4&sort=timestamp%20asc{code}
My first expectation was that I'd see 3 docs but I saw 4 docs. Curious why it's 4 ( the docs are attached as 4_docs.json )

My intention here is to remove the system log listener with an http listener here I wanted to understand should I be looking out for 4 events or 3
The first reaction here is it's a minor bug hence I'm putting it as part of this jira

Happy to break it up into smaller Jiras once I hear back if these are valid issues. This test was run against master



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org