Posted to issues@solr.apache.org by "Mark Robert Miller (Jira)" <ji...@apache.org> on 2021/07/20 02:49:00 UTC

[jira] [Comment Edited] (SOLR-12386) Test fails for "Can't find resource" for files in the _default configset

    [ https://issues.apache.org/jira/browse/SOLR-12386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383710#comment-17383710 ] 

Mark Robert Miller edited comment on SOLR-12386 at 7/20/21, 2:48 AM:
---------------------------------------------------------------------

{quote}Retries and attempts from everyone to create core zk nodes
{quote}
I should illustrate that a bit.

Let's say you start up 100 Solr servers for a nice new cluster. Or an existing cluster.

Generally, what is going to happen is that every Solr instance is going to hit ZK and do something like ensure /configs exists, and makePath on /path1/path2/path3, which may be 3 calls, just in case path1 and path2 don't yet exist. So what essentially happens is that, all the time, we have 100 Solr servers trying and retrying to make the same paths, the same existing or about-to-exist path parts, etc., racing each other to get those same path parts in for a path.

So we do something like boot up a new cluster, and the ZK base layout could maybe be created with, let's say, 15 ZK calls. And maybe we make 900. And hundreds more on a restart, for nodes that were created on day 1, instant 1. Maybe we recreate nodes that someone or some process just tried to delete in the process. Since independent things in lots of random places are trying to ensure nodes exist (which should be one and done on first startup, or collection create, etc.), maybe you end up with all kinds of ZK calls from all these servers, even at random times outside startup, restart, or collection create. When you could simply have a one-time instance of: create these dozen paths, one client says it, it's done, case closed forevermore.
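
To put rough numbers on that, here is a toy sketch in Python. It assumes an in-memory stand-in for ZK (the hypothetical FakeZk class below), not the real ZooKeeper or SolrZkClient API, and it runs the servers one after another instead of actually racing them. The make_path helper is also hypothetical: it spends one create attempt per path part, which is only one possible way a makePath could behave. The two paths are the ones mentioned above.

```python
class NodeExists(Exception):
    """Raised when a create targets a path that is already present."""


class FakeZk:
    """In-memory stand-in for ZK that only counts create attempts."""

    def __init__(self):
        self.nodes = {"/"}
        self.calls = 0  # every create attempt counts as one call

    def create(self, path):
        self.calls += 1
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes.add(path)


def make_path(zk, path):
    """Ensure every part of `path` exists, one create attempt per part."""
    current = ""
    for part in path.strip("/").split("/"):
        current += "/" + part
        try:
            zk.create(current)
        except NodeExists:
            pass  # already there, but the call was still spent


# Paths taken from the comment above; a real base layout has more.
LAYOUT = ["/configs", "/path1/path2/path3"]


def everyone_ensures(servers):
    """Every server ensures the whole layout on its own."""
    zk = FakeZk()
    for _ in range(servers):
        for path in LAYOUT:
            make_path(zk, path)
    return zk


def one_client_bootstraps():
    """A single client creates the layout once."""
    zk = FakeZk()
    for path in LAYOUT:
        make_path(zk, path)
    return zk


racing = everyone_ensures(100)
once = one_client_bootstraps()
print("everyone ensures:", racing.calls, "calls")
print("one client, once:", once.calls, "calls")
print("same resulting nodes:", racing.nodes == once.nodes)
```

Both runs end with the same nodes; the only difference is how many calls were spent getting there. With real concurrency, retries and deletes in the mix, the picture would only get noisier than this sequential model shows.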


> Test fails for "Can't find resource" for files in the _default configset
> ------------------------------------------------------------------------
>
>                 Key: SOLR-12386
>                 URL: https://issues.apache.org/jira/browse/SOLR-12386
>             Project: Solr
>          Issue Type: Test
>          Components: SolrCloud
>            Reporter: David Smiley
>            Priority: Minor
>         Attachments: cant find resource, stacktrace.txt
>
>
> Some tests, especially ConcurrentCreateRoutedAliasTest, have failed sporadically with the message "Can't find resource" pertaining to a file that is in the default ConfigSet yet mysteriously can't be found.  This happens while a collection is being created, and the creation ultimately fails for this reason.


