Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2015/07/09 07:24:05 UTC

[jira] [Updated] (SOLR-7172) addreplica API fails with incorrect error msg "cannot create collection"

     [ https://issues.apache.org/jira/browse/SOLR-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-7172:
---------------------------------
    Attachment: SOLR-7172.patch

[~shalinmangar] [~noble.paul] Pinging you two since you've been in Assign more recently than I have, and this looks bogus. Then again, it's late and I suspect I'm missing something obvious, so before I dive in: am I off base?

Anyway, Assign.getNodesForNewShard doesn't make sense to me. First of all, it's only called from CREATESHARD and ADDREPLICA. It looks like copy/paste from CREATE, though, and some of its assumptions just don't seem to hold. The error message about "cannot create collection" is totally bogus, since this is never called from CREATE. (Thus this JIRA, which I think is much more serious than a wonky error message.)

Part of the problem is in the calculations around line 208:

{code}
    int maxCoresAllowedToCreate = maxShardsPerNode * nodeList.size();
    int requestedCoresToCreate = numSlices * repFactor;
    int minCoresToCreate = requestedCoresToCreate;
    if (maxCoresAllowedToCreate < minCoresToCreate) {
      // throws a long, complex error here (details elided)
    }
{code}

For these two operations (CREATESHARD and ADDREPLICA), this doesn't take into account the replicas of the collection that are already on the nodes in nodeList. It seems to me that nodeList is the wrong thing to be looking at anyway: we've already collected a list of nodes we could put additional replicas on, along with the counts of replicas belonging to the collection in question already on those nodes, in nodeNameVsShardCount. Shouldn't we be using those? And shouldn't the error be thrown if the number of available slots < numberOfNodes? I don't think the number of available slots is calculated correctly.
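
To make the question concrete, here is a minimal standalone sketch of the check I have in mind. It is not the attached patch and not the real Assign code. The class name, the plain Map<String, Integer> standing in for nodeNameVsShardCount, and the use of IllegalStateException in place of SolrException are all assumptions made for illustration. It also assumes numberOfNodes means the number of new replicas being requested.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Solr itself.
final class AvailableSlots {

  /**
   * Counts free slots across the candidate nodes.
   * replicasPerNode maps a node name to the number of replicas of this
   * collection already hosted on that node (simplified stand-in for
   * nodeNameVsShardCount).
   */
  static int availableSlots(Map<String, Integer> replicasPerNode, int maxShardsPerNode) {
    int slots = 0;
    for (int existing : replicasPerNode.values()) {
      // A node that is already at or over the limit contributes nothing.
      slots += Math.max(0, maxShardsPerNode - existing);
    }
    return slots;
  }

  /** Throws if the requested number of new replicas does not fit. */
  static void checkCapacity(String collection,
                            Map<String, Integer> replicasPerNode,
                            int maxShardsPerNode,
                            int numberOfNodes) {
    int slots = availableSlots(replicasPerNode, maxShardsPerNode);
    if (slots < numberOfNodes) {
      throw new IllegalStateException(
          "Cannot add " + numberOfNodes + " replica(s) to collection " + collection
              + ": only " + slots + " slot(s) available with maxShardsPerNode="
              + maxShardsPerNode);
    }
  }

  public static void main(String[] args) {
    // The scenario from the issue description: one node, one existing
    // replica of 'test', maxShardsPerNode=1, one more replica requested.
    Map<String, Integer> replicasPerNode = new HashMap<>();
    replicasPerNode.put("node1", 1);
    try {
      checkCapacity("test", replicasPerNode, 1, 1);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the reproduction steps from the description, this check fails because the single node is already full, and the message talks about adding replicas rather than creating a collection. How the rules feature should feed into the per-node counts is left out entirely.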

How this interacts with rules is a mystery to me, though, so I don't want to wade around in this without a check. The attached patch is full of nocommits but shows what I had in mind. It's late, so don't look too closely; if you two think this is on track, I'll make it a _real_ patch.

> addreplica API fails with incorrect error msg "cannot create collection"
> ------------------------------------------------------------------------
>
>                 Key: SOLR-7172
>                 URL: https://issues.apache.org/jira/browse/SOLR-7172
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.10.3, 5.0
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 5.2, Trunk
>
>         Attachments: SOLR-7172.patch
>
>
> Steps to reproduce:
> # Create 1 node solr cloud cluster
> # Create collection 'test' with numShards=1&replicationFactor=1&maxShardsPerNode=1
> # Call addreplica API:
> {code}
> http://localhost:8983/solr/admin/collections?action=addreplica&collection=test&shard=shard1&wt=json 
> {code}
> API fails with the following response:
> {code}
> {
> responseHeader: {
> status: 400,
> QTime: 9
> },
> Operation ADDREPLICA caused exception:: "org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Cannot create collection test. No live Solr-instances",
> exception: {
> msg: "Cannot create collection test. No live Solr-instances",
> rspCode: 400
> },
> error: {
> msg: "Cannot create collection test. No live Solr-instances",
> code: 400
> }
> }
> {code}


