Posted to dev@lucene.apache.org by "Mikhail Khludnev (JIRA)" <ji...@apache.org> on 2016/10/19 14:04:58 UTC

[jira] [Updated] (SOLR-9647) CollectionsAPIDistributedZkTest got stuck, reproduces failure

     [ https://issues.apache.org/jira/browse/SOLR-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-9647:
-----------------------------------
    Attachment: SOLR-9647.patch

[^SOLR-9647.patch] addresses the case where CollectionsAPIDistributedZkTest.testCollectionsAPIAddRemoveStress() tries to spawn too many cores, hangs, and floods the heap until it hits an OOME. The reasons are:
* when many cores register with the MBean server, all of them hang on a synchronized policy check inside JMX.
* version buckets are huge by default, even though that method doesn't even send updates.
This patch introduces {{solrconfig-slim.xml}} in the {{stressconf}} cloud configset, without JMX and with trimmed version buckets. It doesn't address the speculations in the comment above.
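
For readers without the attachment at hand, a slimmed-down config along these lines might look roughly like the fragment below. This is only a sketch, not the contents of the patch: it assumes JMX is left off simply by omitting the {{<jmx />}} element, and the bucket count of 256 is an illustrative value chosen here, not one taken from the patch.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Hypothetical sketch of a "slim" solrconfig; NOT the actual patch contents. -->
<config>
  <luceneMatchVersion>${tests.luceneMatchVersion:LATEST}</luceneMatchVersion>

  <!-- No <jmx /> element here, so cores do not register with the MBean server. -->

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <!-- Assumed value: far below the 65536 default, to keep per-core heap use small. -->
      <int name="numVersionBuckets">256</int>
    </updateLog>
  </updateHandler>

  <requestHandler name="/select" class="solr.SearchHandler" />
</config>
```
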
One more change was required. There is a code branch that picks up the only existing configSet when none is specified explicitly. But testCollectionsAPIAddRemoveStress now requires an alternative configSet, which is why it is skipped with 50% probability.

Is it worth committing?

> CollectionsAPIDistributedZkTest got stuck, reproduces failure
> -------------------------------------------------------------
>
>                 Key: SOLR-9647
>                 URL: https://issues.apache.org/jira/browse/SOLR-9647
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Mikhail Khludnev
>         Attachments: SOLR-9647.patch
>
>
>  I had to kill https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1129/ just because it "Took 1 day 12 hr on lucene".
>    [junit4] HEARTBEAT J0 PID(30506@lucene1-us-west): 2016-10-15T00:08:30, stalled for 48990s at: CollectionsAPIDistributedZkTest.test
>    [junit4] HEARTBEAT J0 PID(30506@lucene1-us-west): 2016-10-15T00:09:30, stalled for 49050s at: CollectionsAPIDistributedZkTest.test
>  It just got stuck. When I run it locally, it passes from Eclipse but fails when I run it with ant from the command line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org