Posted to dev@solr.apache.org by Mike Drob <md...@mdrob.com> on 2021/12/01 19:24:00 UTC

Re: First class support for node roles

Noble wrote:
> We are not modifying the way the "overseer role" works today. We are just
changing the definition and standardizing the configuration &
discoverability
Ishan wrote:
> As of this SIP, we're not planning to modify the OVERSEER role (which
currently stands for preferred overseer). We can take a stab at refactoring
it later.

Grouping these two comments together, since I think they are saying
the same thing. I think this is part of my confusion. We have an old system
that doesn't work the way we want the new system to work. There may be
people already using the old system. What path do we offer for folks using
the old system to migrate to the new system? What happens if somebody
accidentally tries to use both systems at the same time?

Ishan wrote:
> When I wrote "When one or more such nodes [with OVERSEER role] are live,
Solr guarantees that one of those nodes becomes the overseer.", I meant to
somewhat capture the current behaviour as the OVERSEER role performs today.
Do you see any inconsistency with this statement vs. what it does today?

This doesn't really address my concern around what happens if all of our
existing OVERSEER candidates are down. When at least one of them is up, the
overseer will go there, and that is good and expected. But what happens if
all of the overseer eligible nodes are down? Your comment, and the old
system, would imply that the overseer election goes to some other
unrelated, untagged node. I disagree with this implementation choice. This
sounds like something role specific to determine, but I would like to see
us be more strict about it. I don't want cores leaking out of my data
roles, I don't want query processing to leak out of my "query" nodes or
whatever. Overseer shouldn't be special in this regard.

Noble wrote:
> If we do that how do we know if xyz is a role or a node in the following
request?

You're absolutely correct, thanks for pointing this out. Let's leave it as
is.
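
To make Noble's point concrete, here is a minimal sketch of the ambiguity. It assumes a regex-based route table, which is purely illustrative and not how Solr's API layer is implemented; only the path shapes (`/api/cluster/roles/${rolename}` and `/api/cluster/roles/nodes/${nodename}`) come from this thread, and the handler names are hypothetical.

```python
import re

# Hypothetical route tables; only the path shapes are taken from the thread.
ROUTES_WITH_NODES_SEGMENT = [
    ("roles_of_node", re.compile(r"^/api/cluster/roles/nodes/(?P<nodename>[^/]+)$")),
    ("nodes_with_role", re.compile(r"^/api/cluster/roles/(?P<rolename>[^/]+)$")),
]

# Same table after dropping the intermediate "nodes" segment.
ROUTES_WITHOUT_NODES_SEGMENT = [
    ("roles_of_node", re.compile(r"^/api/cluster/roles/(?P<nodename>[^/]+)$")),
    ("nodes_with_role", re.compile(r"^/api/cluster/roles/(?P<rolename>[^/]+)$")),
]


def matching_handlers(routes, path):
    """Return the name of every handler whose pattern matches the path."""
    return [name for name, pattern in routes if pattern.match(path)]


# With the "nodes" segment, each path resolves to exactly one handler.
print(matching_handlers(ROUTES_WITH_NODES_SEGMENT, "/api/cluster/roles/xyz"))
# Without it, "xyz" could be either a role name or a node name.
print(matching_handlers(ROUTES_WITHOUT_NODES_SEGMENT, "/api/cluster/roles/xyz"))
```

With the intermediate segment removed, both handlers match the same path, so the server would have no way to tell a role lookup from a node lookup.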



On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
ichattopadhyaya@gmail.com> wrote:

>
>
> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:
>
>> Replying to the top post in this thread because there has been a lot of
>> discussion and I don't want to look like I'm continuing any of those
>> particular threads.
>>
>> I finally had time to sit down and think about this with the attention it
>> deserves and am generally happy with how the conversation has shaped the
>> current proposal.
>>
>> GOOD: I think using system properties to define node roles is fine and I
>> like that data is the default role when not defined. I think it is
>> important to hold on to the guarantee that an active overseer will land on
>> an overseer node role.
>> CHANGE REQUEST: I would like to see a migration path for folks using the
>> current OVERSEER role. I am not sure that something can be done
>> automatically since they need to now specify new properties at startup.
>> Maybe we need to include loud warnings or support both approaches for a
>> time?
>> CHANGE REQUEST: I do not like that if all of the overseer nodes fail,
>> then it is implied the overseer will go to one of the data nodes. The
>> specific wording in the SIP - "When one or more such nodes are live, Solr
>> guarantees that one of those nodes become the overseer." implies to me that
>> failover could go from overseer1 to overseer2 to overseerN to random node.
>> I feel like we need to have some recording that there were dedicated
>> overseer nodes and stop the cascading failure instead of churning through
>> our data nodes.
>>
>> CLARIFICATION: I am slightly confused by the proposed scope of
>> "coordinator" roles from a split query/indexing standpoint. I understand
>> that these are used as examples, but would like stronger language that new
>> roles should also go through their own SIP discussions.
>>
>> CLARIFICATION: I do not like that we are storing node liveness in two
>> different places now. We have the live nodes and we have the node roles
>> stored in two different places in zookeeper and it feels like this would
>> lead to race conditions or split brain or other hard to diagnose bugs when
>> those two lists don't agree with each other. This also feels like it
>> contradicts the "single source of truth" idea later stated in the proposal.
>> I see Gus's arguments for decoupling these and am not strongly opposed, I
>> just get a lurking feeling about it. Even if we don't do this, I would like
>> this called out explicitly in the alternative approaches section as
>> something that we considered and rejected, with details why,
>>
>> GOOD: The API looks pretty clear. I would like an additional call out
>> here that all operations are GET because nodes cannot be changed at runtime.
>> CLARIFICATION: How does this interact with the previous OVERSEER
>> preference role?
>> CHANGE REQUEST: An additional API to get the list of available roles for
>> a cluster. I _think_ this could be based on the version that the cluster is
>> running? Would be useful to be able to interrogate a cluster in the
>> future... we're seeing OOM issues on queries, can we add some query nodes?
>> When were they introduced? I don't know what path this API should exist at.
>>
>
> Added a *GET /api/cluster/roles/supported* API, updated the SIP document.
> Not sure if there's a better path that we could go for.
>
>
>> CLARIFICATION: Can we list the APIs to clearly show which parts are
>> string literals and which parts are meant to be substituted by the
>> operator? *GET **/api/cluster/roles/data *would become *GET **/api/cluster/roles/${rolename}
>> *in our SIP/documentation.
>> CHANGE REQUEST: I think *GET /api/cluster/roles/nodes/node1* should be *GET
>> /api/cluster/roles/${nodename}* dropping the intermediate "nodes"
>> CHANGE REQUEST: The ZK structure also might not need that intermediate
>> "nodes" node.
>>
>> CLARIFICATION: Should listing roles require some permissions? Maybe this
>> requirement is too fundamental to the operation of a cluster and everybody
>> would have to be able to do it.
>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat roles?
>> Implementation detail that the servers will figure out? Or strict guidance
>> where the client needs to check where specific roles are before sending any
>> further communication to the server?
>> CLARIFICATION: What happens when a node gets a request that it can't
>> fulfil? An overseer node gets a query or an update. A data node gets a
>> collection creation request. Do they forward it on to an appropriate node,
>> or do they reject it? Should this be configurable? If not, then it seems
>> like lazy or poorly configured clients will defeat this isolation system
>> quite easily.
>>
>> GOOD: Testing the API is very important, yes.
>> CLARIFICATION: What does testing for how nodes behave when roles are
>> added mean? I thought we established that they are not dynamic.
>>
>>
>> Thanks,
>> Mike
>>
>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>> ichattopadhyaya@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Here's an SIP for introducing the concept of node roles:
>>> https://issues.apache.org/jira/browse/SOLR-15694
>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>
>>> We also wish to add first class support for Query nodes that are used to
>>> process user queries by forwarding to data nodes, merging/aggregating them
>>> and presenting to users. This concept exists as first class citizens in
>>> most other search engines. This is a chance for Solr to catch up.
>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>
>>> Regards,
>>> Ishan / Noble / Hitesh
>>>
>>

Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

On Thu, Dec 2, 2021 at 12:54 AM Mike Drob <md...@mdrob.com> wrote:

> Noble wrote:
> > We are not modifying the way the "overseer role" works today. We are
> just changing the definition and standardizing the configuration &
> discoverability
> Ishan wrote:
> > As of this SIP, we're not planning to modify the OVERSEER role (which
> currently stands for preferred overseer). We can take a stab at refactoring
> it later.
>
> Grouping these two comments together, since I think they are saying
> the same thing. I think this is part of my confusion. We have an old system
> that doesn't work the way we want the new system to work. There may be
> people already using the old system. What path do we offer for folks using
> the old system to migrate to the new system?
>

The old system only supported the OVERSEER role. Users can continue
using the old system (the ADDROLE/REMOVEROLE commands), although those
commands are deprecated. Alternatively, they can start their nodes with a
sysprop under the new roles implementation.


> What happens if somebody accidentally tries to use both systems at the
> same time?
>

When a node starts up with -Dsolr.node.roles=overseer,<..> defined, it is
registered as a preferred overseer, exactly as ADDROLE behaves today.
Someone can use the REMOVEROLE API to remove the overseer role at runtime
(not recommended), but when the node restarts, the sysprop will make it a
preferred overseer again.


>
> Ishan wrote:
> > When I wrote "When one or more such nodes [with OVERSEER role] are
> live, Solr guarantees that one of those nodes becomes the overseer.", I
> meant to somewhat capture the current behaviour as the OVERSEER role performs
> today. Do you see any inconsistency with this statement vs. what it does
> today?
>
> This doesn't really address my concern around what happens if all of our
> existing OVERSEER candidates are down.
>

If all preferred overseer nodes are down, some other node becomes the
overseer. This is exactly as the OVERSEER role works today; we aren't
changing that behaviour at all.
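
As a simplified illustration of that fallback (not the actual election code), assume `live_nodes` is an ordered list of live node names and `preferred_overseers` is the set of nodes carrying the OVERSEER role; both names are hypothetical.

```python
def elect_overseer(live_nodes, preferred_overseers):
    """Pick a live preferred overseer if one exists, else any live node."""
    live_preferred = [node for node in live_nodes if node in preferred_overseers]
    candidates = live_preferred or list(live_nodes)
    return candidates[0] if candidates else None


# A preferred overseer is live, so it wins even though data1 is listed first.
print(elect_overseer(["data1", "ov2", "data2"], {"ov1", "ov2"}))
# All preferred overseers are down, so the role falls to some other live node.
print(elect_overseer(["data1", "data2"], {"ov1", "ov2"}))
```

The second call is the case under discussion: the cluster always ends up with an overseer, at the cost of placing it on an untagged node.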


> When at least one of them is up, the overseer will go there, and that is
> good and expected. But what happens if all of the overseer eligible nodes
> are down.
>

One of the other nodes will become the overseer, exactly as one would
expect the system to work today.


> Your comment, and the old system, would imply that the overseer election
> goes to some other unrelated, untagged node. I disagree with this
> implementation choice.
>

This choice has already been made, and we're not attempting to change that
behaviour in this SIP. We can discuss an overhaul of the OVERSEER role in a
separate SIP/JIRA/thread.


> This sounds like something role specific to determine, but I would like to
> see us be more strict about it. I don't want cores leaking out of my data
> roles, I don't want query processing to leak out of my "query" nodes or
> whatever. Overseer shouldn't be special in this regard.
>

I think it is very difficult to define such a concept upfront. Different
roles will have different ways of interpreting these aspects. For OVERSEER
role, one might want the functionality to be performed by non-OVERSEER role
nodes too. For a future QUERY role, one might want data nodes to serve that
role as well. For DATA role, one might not want any other node to host the
data. I think it would be better to use the ref guide documentation for new
or existing roles to clearly specify how the system will behave in such
circumstances.


>
> Noble wrote:
> > If we do that how do we know if xyz is a role or a node in the
> following request?
>
> You're absolutely correct, thanks for pointing this out. Let's leave it as
> is.
>
>
>
> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
>
>>
>>
>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:
>>
>>> Replying to the top post in this thread because there has been a lot of
>>> discussion and I don't want to look like I'm continuing any of those
>>> particular threads.
>>>
>>> I finally had time to sit down and think about this with the attention
>>> it deserves and am generally happy with how the conversation has shaped the
>>> current proposal.
>>>
>>> GOOD: I think using system properties to define node roles is fine and I
>>> like that data is the default role when not defined. I think it is
>>> important to hold on to the guarantee that an active overseer will land on
>>> an overseer node role.
>>> CHANGE REQUEST: I would like to see a migration path for folks using the
>>> current OVERSEER role. I am not sure that something can be done
>>> automatically since they need to now specify new properties at startup.
>>> Maybe we need to include loud warnings or support both approaches for a
>>> time?
>>> CHANGE REQUEST: I do not like that if all of the overseer nodes fail,
>>> then it is implied the overseer will go to one of the data nodes. The
>>> specific wording in the SIP - "When one or more such nodes are live, Solr
>>> guarantees that one of those nodes become the overseer." implies to me that
>>> failover could go from overseer1 to overseer2 to overseerN to random node.
>>> I feel like we need to have some recording that there were dedicated
>>> overseer nodes and stop the cascading failure instead of churning through
>>> our data nodes.
>>>
>>> CLARIFICATION: I am slightly confused by the proposed scope of
>>> "coordinator" roles from a split query/indexing standpoint. I understand
>>> that these are used as examples, but would like stronger language that new
>>> roles should also go through their own SIP discussions.
>>>
>>> CLARIFICATION: I do not like that we are storing node liveness in two
>>> different places now. We have the live nodes and we have the node roles
>>> stored in two different places in zookeeper and it feels like this would
>>> lead to race conditions or split brain or other hard to diagnose bugs when
>>> those two lists don't agree with each other. This also feels like it
>>> contradicts the "single source of truth" idea later stated in the proposal.
>>> I see Gus's arguments for decoupling these and am not strongly opposed, I
>>> just get a lurking feeling about it. Even if we don't do this, I would like
>>> this called out explicitly in the alternative approaches section as
>>> something that we considered and rejected, with details why,
>>>
>>> GOOD: The API looks pretty clear. I would like an additional call out
>>> here that all operations are GET because nodes cannot be changed at runtime.
>>> CLARIFICATION: How does this interact with the previous OVERSEER
>>> preference role?
>>> CHANGE REQUEST: An additional API to get the list of available roles for
>>> a cluster. I _think_ this could be based on the version that the cluster is
>>> running? Would be useful to be able to interrogate a cluster in the
>>> future... we're seeing OOM issues on queries, can we add some query nodes?
>>> When were they introduced? I don't know what path this API should exist at.
>>>
>>
>> Added a *GET /api/cluster/roles/supported* API, updated the SIP
>> document. Not sure if there's a better path that we could go for.
>>
>>
>>> CLARIFICATION: Can we list the APIs to clearly show which parts are
>>> string literals and which parts are meant to be substituted by the
>>> operator? *GET **/api/cluster/roles/data *would become *GET **/api/cluster/roles/${rolename}
>>> *in our SIP/documentation.
>>> CHANGE REQUEST: I think *GET /api/cluster/roles/nodes/node1* should be *GET
>>> /api/cluster/roles/${nodename}* dropping the intermediate "nodes"
>>> CHANGE REQUEST: The ZK structure also might not need that intermediate
>>> "nodes" node.
>>>
>>> CLARIFICATION: Should listing roles require some permissions? Maybe this
>>> requirement is too fundamental to the operation of a cluster and everybody
>>> would have to be able to do it.
>>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat
>>> roles? Implementation detail that the servers will figure out? Or strict
>>> guidance where the client needs to check where specific roles are before
>>> sending any further communication to the server?
>>> CLARIFICATION: What happens when a node gets a request that it can't
>>> fulfil? An overseer node gets a query or an update. A data node gets a
>>> collection creation request. Do they forward it on to an appropriate node,
>>> or do they reject it? Should this be configurable? If not, then it seems
>>> like lazy or poorly configured clients will defeat this isolation system
>>> quite easily.
>>>
>>> GOOD: Testing the API is very important, yes.
>>> CLARIFICATION: What does testing for how nodes behave when roles are
>>> added mean? I thought we established that they are not dynamic.
>>>
>>>
>>> Thanks,
>>> Mike
>>>
>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>> ichattopadhyaya@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Here's an SIP for introducing the concept of node roles:
>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>
>>>> We also wish to add first class support for Query nodes that are used
>>>> to process user queries by forwarding to data nodes, merging/aggregating
>>>> them and presenting to users. This concept exists as first class citizens
>>>> in most other search engines. This is a chance for Solr to catch up.
>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>
>>>> Regards,
>>>> Ishan / Noble / Hitesh
>>>>
>>>

Re: First class support for node roles

Posted by Mark Miller <ma...@gmail.com>.

> I'm very strongly in favor of not letting users design a system in which the cluster can be "live" without an overseer.
> I understand that the overseer can be taxing to the cluster,

That is really just an implementation choice. Bluntly, it doesn't do anything a smartwatch could not handle on a huge cluster.

> but honestly, what is the point of having an untaxed cluster that doesn't have an overseer?

If the choice of implementation can tax a modern beefy machine with a smartwatch-feasible load and that implementation can jump to every node ... a node that stays standing can serve a query.

Which is not entirely intended to counter your point. I think we would agree on most things at rubber meets road O'clock if we were pair developing something together. I would just offer that the same person can easily come down on both sides of any design discussion in many of these situations.

In many cases, I might approach such a thing from the perspective of philosophy and idealism or perhaps intention and/or expectation. In a different wind or barometric pressure, I might instead follow a more practical line, feeling cards have been dealt, which is quite likely where I have to play from.

It's often unfortunate; it usually means all kinds of punting to the user, but that's very Apache. Herding cats means I get to say "honestly, what is the point" as much as I'd like, a favorite, but I don't often get to see it cleave through bone like it should. Honestly, what is the point of X11 still pretending it's some client / server design of value and elegance? In another 5 or 10 years tacked onto the last 5 or 10, Wayland is gonna laugh and tell you there isn't one, hasn't been one for eons, and doesn't the damn thing look silly and restrictive and broken next to his sensible glory now.

On December 1, 2021 at 19:50 GMT, Houston Putman <ho...@gmail.com> wrote:

This doesn't really address my concern around what happens if all of our existing OVERSEER candidates are down. When at least one of them is up, the overseer will go there, and that is good and expected. But what happens if all of the overseer eligible nodes are down. Your comment, and the old system, would imply that the overseer election goes to some other unrelated, untagged node. I disagree with this implementation choice. This sounds like something role specific to determine, but I would like to see us be more strict about it. I don't want cores leaking out of my data roles, I don't want query processing to leak out of my "query" nodes or whatever. Overseer shouldn't be special in this regard.

I'm very strongly in favor of not letting users design a system in which the cluster can be "live" without an overseer. I understand that the overseer can be taxing to the cluster, but honestly what is the point of having an untaxed cluster that doesn't have an overseer? I can see arguments for the other roles to be stricter about this, but there are also a lot of users who wouldn't want those to be strict either (like "query" nodes).

Maybe we just put in stronger guarantees that if a non-overseer role node HAS to be selected to become overseer, it will try to migrate the overseer job to a node with the overseer role whenever one becomes live.

So maybe we don't have special rules per role, but instead roles can either be defined as "Strict" or "Loose" (better names likely exist), and the roles come with a default (Overseer -> Loose, Data -> Strict, Query -> Loose, etc.). And it is up to each role to define how to behave when running in LOOSE mode and a non-role node is used then a role node comes online (like the overseer example given above).

With the Strict/Loose option and sensible defaults, users cannot trip themselves up by default, but the option is there for people to tinker and have an iron grip over their cluster.
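
One way to picture the Strict/Loose idea is the sketch below. It is only an illustration of the proposal as worded here: the default modes are the ones listed above, while the `Mode` enum, the `eligible_nodes` function, and the node-to-roles mapping are hypothetical names that do not exist in Solr.

```python
from enum import Enum


class Mode(Enum):
    STRICT = "strict"
    LOOSE = "loose"


# Defaults as suggested in the proposal above.
DEFAULT_MODES = {"overseer": Mode.LOOSE, "data": Mode.STRICT, "query": Mode.LOOSE}


def eligible_nodes(role, live_nodes_to_roles, modes=DEFAULT_MODES):
    """Return the live nodes allowed to perform the given role."""
    tagged = [node for node, roles in live_nodes_to_roles.items() if role in roles]
    if tagged or modes[role] is Mode.STRICT:
        # Tagged nodes always win; a strict role never spills onto other nodes.
        return tagged
    # Loose role with no tagged node live: fall back to every live node.
    return list(live_nodes_to_roles)


cluster = {"n1": {"data"}, "n2": {"data"}}
print(eligible_nodes("overseer", cluster))                # loose: falls back to n1, n2
print(eligible_nodes("data", {"n3": {"overseer"}}))       # strict: nobody is eligible
```

The sketch leaves out the harder part raised above, namely how a loose role hands work back once a tagged node comes online.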

On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:

Noble wrote:
> We are not modifying the way the "overseer role" works today. We are just changing the definition and standardizing the configuration & discoverability
Ishan wrote:
> As of this SIP, we're not planning to modify the OVERSEER role (which currently stands for preferred overseer). We can take a stab at refactoring it later.

Grouping these two comments together, since I think they are saying the same thing. I think this is part of my confusion. We have an old system that doesn't work the way we want the new system to work. There may be people already using the old system. What path do we offer for folks using the old system to migrate to the new system? What happens if somebody accidentally tries to use both systems at the same time?

Ishan wrote:
> When I wrote "When one or more such nodes [with OVERSEER role] are live, Solr guarantees that one of those nodes becomes the overseer.", I meant to somewhat capture the current behaviour as the OVERSEER role performs today. Do you see any inconsistency with this statement vs. what it does today?

Noble wrote:
> If we do that how do we know if xyz is a role or a node in the following request?

You're absolutely correct, thanks for pointing this out. Let's leave it as is.

On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <ic...@gmail.com> wrote:

On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:

Replying to the top post in this thread because there has been a lot of discussion and I don't want to look like I'm continuing any of those particular threads.

I finally had time to sit down and think about this with the attention it deserves and am generally happy with how the conversation has shaped the current proposal.

GOOD: I think using system properties to define node roles is fine and I like that data is the default role when not defined. I think it is important to hold on to the guarantee that an active overseer will land on an overseer node role.
CHANGE REQUEST: I would like to see a migration path for folks using the current OVERSEER role. I am not sure that something can be done automatically since they need to now specify new properties at startup. Maybe we need to include loud warnings or support both approaches for a time?
CHANGE REQUEST: I do not like that if all of the overseer nodes fail, then it is implied the overseer will go to one of the data nodes. The specific wording in the SIP - "When one or more such nodes are live, Solr guarantees that one of those nodes become the overseer." implies to me that failover could go from overseer1 to overseer2 to overseerN to random node. I feel like we need to have some recording that there were dedicated overseer nodes and stop the cascading failure instead of churning through our data nodes.

CLARIFICATION: I am slightly confused by the proposed scope of "coordinator" roles from a split query/indexing standpoint. I understand that these are used as examples, but would like stronger language that new roles should also go through their own SIP discussions.

CLARIFICATION: I do not like that we are storing node liveness in two different places now. We have the live nodes and we have the node roles stored in two different places in zookeeper and it feels like this would lead to race conditions or split brain or other hard to diagnose bugs when those two lists don't agree with each other. This also feels like it contradicts the "single source of truth" idea later stated in the proposal. I see Gus's arguments for decoupling these and am not strongly opposed, I just get a lurking feeling about it. Even if we don't do this, I would like this called out explicitly in the alternative approaches section as something that we considered and rejected, with details why,

GOOD: The API looks pretty clear. I would like an additional call out here that all operations are GET because nodes cannot be changed at runtime.
CLARIFICATION: How does this interact with the previous OVERSEER preference role?
CHANGE REQUEST: An additional API to get the list of available roles for a cluster. I _think_ this could be based on the version that the cluster is running? Would be useful to be able to interrogate a cluster in the future... we're seeing OOM issues on queries, can we add some query nodes? When were they introduced? I don't know what path this API should exist at.

Added a GET /api/cluster/roles/supported API, updated the SIP document. Not sure if there's a better path that we could go for.

CLARIFICATION: Can we list the APIs to clearly show which parts are string literals and which parts are meant to be substituted by the operator? GET /api/cluster/roles/data would become GET /api/cluster/roles/${rolename} in our SIP/documentation.
CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename} dropping the intermediate "nodes"
CHANGE REQUEST: The ZK structure also might not need that intermediate "nodes" node.

CLARIFICATION: Should listing roles require some permissions? Maybe this requirement is too fundamental to the operation of a cluster and everybody would have to be able to do it.
CLARIFICATION: How do we expect SolrJ (and other clients) to treat roles? Implementation detail that the servers will figure out? Or strict guidance where the client needs to check where specific roles are before sending any further communication to the server?
CLARIFICATION: What happens when a node gets a request that it can't fulfil? An overseer node gets a query or an update. A data node gets a collection creation request. Do they forward it on to an appropriate node, or do they reject it? Should this be configurable? If not, then it seems like lazy or poorly configured clients will defeat this isolation system quite easily.

GOOD: Testing the API is very important, yes.
CLARIFICATION: What does testing for how nodes behave when roles are added mean? I thought we established that they are not dynamic.

Thanks,
Mike

On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <ic...@gmail.com> wrote:

Hi,

Here's an SIP for introducing the concept of node roles:
https://issues.apache.org/jira/browse/SOLR-15694
https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles

We also wish to add first class support for Query nodes that are used to process user queries by forwarding to data nodes, merging/aggregating them and presenting to users. This concept exists as first class citizens in most other search engines. This is a chance for Solr to catch up.

https://issues.apache.org/jira/browse/SOLR-15715

Regards,
Ishan / Noble / Hitesh

Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

On Thu, Dec 2, 2021 at 1:20 AM Houston Putman <ho...@gmail.com>
wrote:

> This doesn't really address my concern around what happens if all of our
>> existing OVERSEER candidates are down. When at least one of them is up, the
>> overseer will go there, and that is good and expected. But what happens if
>> all of the overseer eligible nodes are down. Your comment, and the old
>> system, would imply that the overseer election goes to some other
>> unrelated, untagged node. I disagree with this implementation choice. This
>> sounds like something role specific to determine, but I would like to see
>> us be more strict about it. I don't want cores leaking out of my data
>> roles, I don't want query processing to leak out of my "query" nodes or
>> whatever. Overseer shouldn't be special in this regard.
>>
>
> I'm very strongly in favor of not letting users design a system in which
> the cluster can be "live" without an overseer. I understand that the
> overseer can be taxing to the cluster, but honestly what is the point of
> having an untaxed cluster that doesn't have an overseer? I can see
> arguments for the other roles to be stricter about this, but there are also
> a lot of users who wouldn't want those to be strict either (like "query"
> nodes).
>

+1, there must never be a system that doesn't have an active overseer.


>
> Maybe we just put in stronger guarantees that if a non-overseer role node
> HAS to be selected to become overseer, it will try to migrate the overseer
> job to a node with the overseer role whenever one becomes live.
>
> So maybe we don't have special rules per role, but instead roles can
> either be defined as "Strict" or "Loose" (better names likely exist), and
> the roles come with a default (Overseer -> Loose, Data -> Strict, Query ->
> Loose, etc.). And it is up to each role to define how to behave when
> running in LOOSE mode and a non-role node is used then a role node comes
> online (like the overseer example given above).
>

I am wary of introducing such modes as LOOSE/STRICT in this SIP, as there
would be various concerns related to transitions (from STRICT to LOOSE, or
back) that are difficult to predict and anticipate for all future roles. If
such flexibility is needed, beyond what a role behaves like by default,
then we can deal with it on a per role basis, or as a future enhancement to
the roles framework.


>
> With the Strict/Loose option and sensible defaults, users cannot trip
> themselves up by default, but the option is there for people to tinker and
> have an iron grip over their cluster.
>

+1 to sensible defaults so users don't trip themselves up. The option to
tinker for a tighter grip can be tackled later, either on a per-role basis
or as a generic concept.


>
> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:
>
>> Noble wrote:
>> > We are not modifying the way the "overseer role" works today. We are
>> just changing the definition and standardizing the configuration &
>> discoverability
>> Ishan wrote:
>> > As of this SIP, we're not planning to modify the OVERSEER role (which
>> currently stands for preferred overseer). We can take a stab at refactoring
>> it later.
>>
>> Grouping these two comments together, since I think they are saying
>> the same thing. I think this is part of my confusion. We have an old system
>> that doesn't work the way we want the new system to work. There may be
>> people already using the old system. What path do we offer for folks using
>> the old system to migrate to the new system? What happens if somebody
>> accidentally tries to use both systems at the same time?
>>
>> Ishan wrote:
>> > When I wrote "When one or more such nodes [with OVERSEER role] are
>> live, Solr guarantees that one of those nodes becomes the overseer.", I
>> meant to somewhat capture the current behaviour as the OVERSEER role performs
>> today. Do you see any inconsistency with this statement vs. what it does
>> today?
>>
>> This doesn't really address my concern around what happens if all of our
>> existing OVERSEER candidates are down. When at least one of them is up, the
>> overseer will go there, and that is good and expected. But what happens if
>> all of the overseer eligible nodes are down? Your comment, and the old
>> system, would imply that the overseer election goes to some other
>> unrelated, untagged node. I disagree with this implementation choice. This
>> sounds like something role specific to determine, but I would like to see
>> us be more strict about it. I don't want cores leaking out of my data
>> roles, I don't want query processing to leak out of my "query" nodes or
>> whatever. Overseer shouldn't be special in this regard.
>>
>> Noble wrote:
>> > If we do that how do we know if xyz is a role or a node in the
>> following request?
>>
>> You're absolutely correct, thanks for pointing this out. Let's leave it
>> as is.
>>
>>
>>
>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>> ichattopadhyaya@gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:
>>>
>>>> Replying to the top post in this thread because there has been a lot of
>>>> discussion and I don't want to look like I'm continuing any of those
>>>> particular threads.
>>>>
>>>> I finally had time to sit down and think about this with the attention
>>>> it deserves and am generally happy with how the conversation has shaped the
>>>> current proposal.
>>>>
>>>> GOOD: I think using system properties to define node roles is fine and
>>>> I like that data is the default role when not defined. I think it is
>>>> important to hold on to the guarantee that an active overseer will land on
>>>> an overseer node role.
>>>> CHANGE REQUEST: I would like to see a migration path for folks using
>>>> the current OVERSEER role. I am not sure that something can be done
>>>> automatically since they need to now specify new properties at startup.
>>>> Maybe we need to include loud warnings or support both approaches for a
>>>> time?
>>>> CHANGE REQUEST: I do not like that if all of the overseer nodes fail,
>>>> then it is implied the overseer will go to one of the data nodes. The
>>>> specific wording in the SIP - "When one or more such nodes are live, Solr
>>>> guarantees that one of those nodes become the overseer." implies to me that
>>>> failover could go from overseer1 to overseer2 to overseerN to random node.
>>>> I feel like we need to have some recording that there were dedicated
>>>> overseer nodes and stop the cascading failure instead of churning through
>>>> our data nodes.
>>>>
>>>> CLARIFICATION: I am slightly confused by the proposed scope of
>>>> "coordinator" roles from a split query/indexing standpoint. I understand
>>>> that these are used as examples, but would like stronger language that new
>>>> roles should also go through their own SIP discussions.
>>>>
>>>> CLARIFICATION: I do not like that we are storing node liveness in two
>>>> different places now. We have the live nodes and we have the node roles
>>>> stored in two different places in zookeeper and it feels like this would
>>>> lead to race conditions or split brain or other hard to diagnose bugs when
>>>> those two lists don't agree with each other. This also feels like it
>>>> contradicts the "single source of truth" idea later stated in the proposal.
>>>> I see Gus's arguments for decoupling these and am not strongly opposed, I
>>>> just get a lurking feeling about it. Even if we don't do this, I would like
>>>> this called out explicitly in the alternative approaches section as
>>>> something that we considered and rejected, with details why.
>>>>
>>>> GOOD: The API looks pretty clear. I would like an additional call out
>>>> here that all operations are GET because nodes cannot be changed at runtime.
>>>> CLARIFICATION: How does this interact with the previous OVERSEER
>>>> preference role?
>>>> CHANGE REQUEST: An additional API to get the list of available roles
>>>> for a cluster. I _think_ this could be based on the version that the
>>>> cluster is running? Would be useful to be able to interrogate a cluster in
>>>> the future... we're seeing OOM issues on queries, can we add some query
>>>> nodes? When were they introduced? I don't know what path this API should
>>>> exist at.
>>>>
>>>
>>> Added a *GET /api/cluster/roles/supported* API, updated the SIP
>>> document. Not sure if there's a better path that we could go for.
>>>
>>>
>>>> CLARIFICATION: Can we list the APIs to clearly show which parts are
>>>> string literals and which parts are meant to be substituted by the
>>>> operator? *GET /api/cluster/roles/data* would become
>>>> *GET /api/cluster/roles/${rolename}* in our SIP/documentation.
>>>> CHANGE REQUEST: I think *GET /api/cluster/roles/nodes/node1* should be
>>>> *GET /api/cluster/roles/${nodename}*, dropping the intermediate "nodes".
>>>> CHANGE REQUEST: The ZK structure also might not need that intermediate
>>>> "nodes" node.
>>>>
>>>> CLARIFICATION: Should listing roles require some permissions? Maybe
>>>> this requirement is too fundamental to the operation of a cluster and
>>>> everybody would have to be able to do it.
>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat
>>>> roles? Implementation detail that the servers will figure out? Or strict
>>>> guidance where the client needs to check where specific roles are before
>>>> sending any further communication to the server?
>>>> CLARIFICATION: What happens when a node gets a request that it can't
>>>> fulfil? An overseer node gets a query or an update. A data node gets a
>>>> collection creation request. Do they forward it on to an appropriate node,
>>>> or do they reject it? Should this be configurable? If not, then it seems
>>>> like lazy or poorly configured clients will defeat this isolation system
>>>> quite easily.
>>>>
>>>> GOOD: Testing the API is very important, yes.
>>>> CLARIFICATION: What does testing for how nodes behave when roles are
>>>> added mean? I thought we established that they are not dynamic.
>>>>
>>>>
>>>> Thanks,
>>>> Mike
>>>>
>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>> ichattopadhyaya@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Here's an SIP for introducing the concept of node roles:
>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>
>>>>> We also wish to add first class support for Query nodes that are used
>>>>> to process user queries by forwarding to data nodes, merging/aggregating
>>>>> them and presenting to users. This concept exists as first class citizens
>>>>> in most other search engines. This is a chance for Solr to catch up.
>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>
>>>>> Regards,
>>>>> Ishan / Noble / Hitesh
>>>>>
>>>>

Re: First class support for node roles

Posted by Noble Paul <no...@gmail.com>.

typo




On Sun, Dec 5, 2021 at 2:37 PM Noble Paul <no...@gmail.com> wrote:

> I recommend the following format for the role spec
>
> roles=<role-name>:<role-value>
>
> each role will have an enum of allowed values and a default value
>
>
>    - role name: *data*
>       - values: [*on*, *off*]
>       - default: *allowed*
>
>

   - default: *on*


>    - role name: *overseer*
>       - values: [*allowed*, *disallowed*, *preferred*]
>       - default: *allowed*
>    - role name: *coordinator*
>       - values: [*on*, *off*]
>       - default: *off*
>
>
> examples
> roles=data:on,overseer:allowed (This is redundant because it uses all the
> default values. If a node is started without any roles value this is the
> default behavior)
> roles=data:off,overseer:preferred (do not allow data, join overseer
> election at head)
> roles=coordinator:on,data:on (role as coordinator, but allow data; it's
> the same as roles=coordinator:on)
> roles=coordinator:on,data:off (role as coordinator, disallow data)
>
>
> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <il...@gmail.com> wrote:
>
>> If we go with no negative node roles and overseer node role is not strict
>> (i.e. it’s a "preferred overseer"), then one would need to define a second
>> node role "no_overseer" to explicitly exclude a node from ever becoming
>> overseer (which I think is a useful feature until we switch the cluster
>> default to not using the overseer), plus the implementation of these two
>> node roles will obviously be coupled (and what if a node has both defined?).
>>
>> I prefer strict node roles.
>> Maybe we could have node roles with [optional] parameters to let the node
>> role implementation decide ?
>> The overseer node role for example could have one of 3 values defined for
>> each node: “preferred” (default, equivalent to the existing overseer role),
>> "accepted" (equivalent to currently not defining the overseer role) and
>> "no_way" (does not exist today).
>>
>> This could be useful in other contexts. A node role “data” could be
>> “fast” or “slow” depending on type of local persistent storage for example…
>>
>> Ilan
>>
>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:
>>
>>> I really don't think we should have types of roles. Not
>>> negative/positive and not strict/non-strict. You have a role or you don't.
>>> What that means is up to the code implementing the role.
>>>
>>> Roles should be free to configure a preference order (binary, or n-ary
>>> or whatever, strict or loose), prohibit behavior, or enable behavior. In
>>> this SIP I feel we should focus on How to identify what node has what role,
>>> How to designate what roles a node has via config/params, and the API's for
>>> interacting with roles.
>>>
>>> We should for example be able to support roles such as
>>>
>>> PREFERRED_OVERSEER
>>> DATA
>>> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
>>>
>>> Details about role implementation should probably be discussed in a
>>> thread about that role.  Obviously we should think about the name carefully
>>> to leave options open should we want to enhance things later so maybe
>>>
>>> OVERSEER_PREF  or just  OVERSEER
>>>
>>> would be better since it merely reads that the node implements some
>>> sort of preference or config regarding overseer... but all this can be
>>> decided on a per role basis
>>>
>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com> wrote:
>>>
>>>> Negative roles have a place
>>>>
>>>> Example is overseer
>>>>
>>>> There are 3 possible choices for that role
>>>>
>>>> a) preferred: always be in front of the election queue
>>>> b) on: not preferred, but can be an overseer if no preferred overseer
>>>> nodes are available
>>>> c) off: never become an overseer
>>>>
>>>> Today we only have options 'a' and 'b'. In a future ticket, we may
>>>> implement 'c'.
>>>>
>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
>>>>
>>>>> Negative roles add a lot of complexity, I would really want to stay
>>>>> away from them. That’s why I want strict roles up front. It’s maybe ok to
>>>>> push this decision out, but it also seems like the sort of thing we should
>>>>> consider at the start.
>>>>>
>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes. Negative roles are not a bad idea. If I start a node for
>>>>>> machine learning purposes, I wouldn't want that node to ever participate in
>>>>>> overseer election
>>>>>>
>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> If we have non strict roles (like overseer), then it does make sense
>>>>>>> to have negative roles.
>>>>>>> That way I can define which are the two nodes that I'd prefer the
>>>>>>> overseer to run on, and a few other nodes on which it should
>>>>>>> definitely never run for various reasons. And in case these
>>>>>>> "!overseer" are the only nodes left in the cluster, let the cluster
>>>>>>> fail the same way it would if there were no data nodes available.
>>>>>>>
>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>>>>>>> houstonputman@gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>> With the Strict/Loose option and sensible defaults, users cannot
>>>>>>> trip themselves up by default, but the option is there for people to tinker
>>>>>>> and have an iron grip over their cluster.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> +1 to sensible defaults so users don't trip themselves. The
>>>>>>> option to tinker for tighter grip can be tackled later, either on a per
>>>>>>> role basis or as a generic concept later.
>>>>>>> >
>>>>>>> >
>>>>>>> > +1 - Can definitely be added later if we so desire, not needed for
>>>>>>> this SIP
>>>>>>> >
>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com>
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>> I think the key  is to let the roles have full control of the
>>>>>>> implications of having/not having that role. No need for even a
>>>>>>> strict/loose designation. The question of do you have the role is yes/no
>>>>>>> with no logic to guess if the role is implied or not. The question of will
>>>>>>> it come up with the role is "have_explicit ? use_explicit : use_defaults".
>>>>>>> >>>
>>>>>>> >>> Once you figure out who has a role (or not) what that means is
>>>>>>> up to the role code.
>>>>>>> >>>
>>>>>>> >>> Corollary: we don't have to change the way overseer works in
>>>>>>> this SIP. We can rework it or not as we see fit separately.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> +1
>>>>>>> >>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Only thing we need to do is find a wording that makes the above
>>>>>>> clear on first read through the SIP :)
>>>>>>> >>>
>>>>>>> >>> -Gus
>>>>>>> >>>
>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>>>>> houstonputman@gmail.com> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> This doesn't really address my concern around what happens if
>>>>>>> all of our existing OVERSEER candidates are down. When at least one of them
>>>>>>> is up, the overseer will go there, and that is good and expected. But what
>>>>>>> happens if all of the overseer eligible nodes are down? Your comment, and
>>>>>>> the old system, would imply that the overseer election goes to some other
>>>>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>>>>> sounds like something role specific to determine, but I would like to see
>>>>>>> us be more strict about it. I don't want cores leaking out of my data
>>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> I'm very strongly in favor of not letting users design a system
>>>>>>> in which the cluster can be "live" without an overseer. I understand that
>>>>>>> the overseer can be taxing to the cluster, but honestly what is the point
>>>>>>> of having an untaxed cluster that doesn't have an overseer? I can see
>>>>>>> arguments for the other roles to be stricter about this, but there are also
>>>>>>> a lot of users who wouldn't want those to be strict either (like "query"
>>>>>>> nodes).
>>>>>>> >>>>
>>>>>>> >>>> Maybe we just put in stronger guarantees that if a non-overseer
>>>>>>> role node HAS to be selected to become overseer, it will try to migrate the
>>>>>>> overseer job to a node with the overseer role whenever one becomes live.
>>>>>>> >>>>
>>>>>>> >>>> So maybe we don't have special rules per role, but instead
>>>>>>> roles can either be defined as "Strict" or "Loose" (better names likely
>>>>>>> exist), and the roles come with a default (Overseer -> Loose, Data ->
>>>>>>> Strict, Query -> Loose, etc.). And it is up to each role to define how to
>>>>>>> behave when running in LOOSE mode and a non-role node is used then a role
>>>>>>> node comes online (like the overseer example given above).
>>>>>>> >>>>
>>>>>>> >>>> With the Strict/Loose option and sensible defaults, users
>>>>>>> cannot trip themselves up by default, but the option is there for people to
>>>>>>> tinker and have an iron grip over their cluster.
>>>>>>> >>>>
>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com>
>>>>>>> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> Noble wrote:
>>>>>>> >>>>> > We are not modifying the way the "overseer role" works
>>>>>>> today. We are just changing the definition and standardizing the
>>>>>>> configuration & discoverability
>>>>>>> >>>>> Ishan wrote:
>>>>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER
>>>>>>> role (which currently stands for preferred overseer). We can take a stab at
>>>>>>> refactoring it later.
>>>>>>> >>>>>
>>>>>>> >>>>> Grouping these two comments together, since I think they are
>>>>>>> saying the same thing. I think this is part of my confusion. We have an old
>>>>>>> system that doesn't work the way we want the new system to work. There may
>>>>>>> be people already using the old system. What path do we offer for folks
>>>>>>> using the old system to migrate to the new system? What happens if somebody
>>>>>>> accidentally tries to use both systems at the same time?
>>>>>>> >>>>>
>>>>>>> >>>>> Ishan wrote:
>>>>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER
>>>>>>> role] are live, Solr guarantees that one of those nodes becomes the
>>>>>>> overseer.", I meant to somewhat capture the current behaviour as the
>>>>>>> OVERSEER role performs today. Do you see any inconsistency with this
>>>>>>> statement vs. what it does today?
>>>>>>> >>>>>
>>>>>>> >>>>> This doesn't really address my concern around what happens if
>>>>>>> all of our existing OVERSEER candidates are down. When at least one of them
>>>>>>> is up, the overseer will go there, and that is good and expected. But what
>>>>>>> happens if all of the overseer eligible nodes are down? Your comment, and
>>>>>>> the old system, would imply that the overseer election goes to some other
>>>>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>>>>> sounds like something role specific to determine, but I would like to see
>>>>>>> us be more strict about it. I don't want cores leaking out of my data
>>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>>> >>>>>
>>>>>>> >>>>> Noble wrote:
>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a node in
>>>>>>> the following request?
>>>>>>> >>>>>
>>>>>>> >>>>> You're absolutely correct, thanks for pointing this out. Let's
>>>>>>> leave it as is.
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com>
>>>>>>> wrote:
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Replying to the top post in this thread because there has
>>>>>>> been a lot of discussion and I don't want to look like I'm continuing any
>>>>>>> of those particular threads.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> I finally had time to sit down and think about this with the
>>>>>>> attention it deserves and am generally happy with how the conversation has
>>>>>>> shaped the current proposal.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> GOOD: I think using system properties to define node roles
>>>>>>> is fine and I like that data is the default role when not defined. I think
>>>>>>> it is important to hold on to the guarantee that an active overseer will
>>>>>>> land on an overseer node role.
>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for
>>>>>>> folks using the current OVERSEER role. I am not sure that something can be
>>>>>>> done automatically since they need to now specify new properties at
>>>>>>> startup. Maybe we need to include loud warnings or support both approaches
>>>>>>> for a time?
>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer
>>>>>>> nodes fail, then it is implied the overseer will go to one of the data
>>>>>>> nodes. The specific wording in the SIP - "When one or more such nodes are
>>>>>>> live, Solr guarantees that one of those nodes become the overseer." implies
>>>>>>> to me that failover could go from overseer1 to overseer2 to overseerN to
>>>>>>> random node. I feel like we need to have some recording that there were
>>>>>>> dedicated overseer nodes and stop the cascading failure instead of churning
>>>>>>> through our data nodes.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed scope
>>>>>>> of "coordinator" roles from a split query/indexing standpoint. I understand
>>>>>>> that these are used as examples, but would like stronger language that new
>>>>>>> roles should also go through their own SIP discussions.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node
>>>>>>> liveness in two different places now. We have the live nodes and we have
>>>>>>> the node roles stored in two different places in zookeeper and it feels
>>>>>>> like this would lead to race conditions or split brain or other hard to
>>>>>>> diagnose bugs when those two lists don't agree with each other. This also
>>>>>>> feels like it contradicts the "single source of truth" idea later stated in
>>>>>>> the proposal. I see Gus's arguments for decoupling these and am not
>>>>>>> strongly opposed, I just get a lurking feeling about it. Even if we don't
>>>>>>> do this, I would like this called out explicitly in the alternative
>>>>>>> approaches section as something that we considered and rejected, with
>>>>>>> details why.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an additional
>>>>>>> call out here that all operations are GET because nodes cannot be changed
>>>>>>> at runtime.
>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the previous
>>>>>>> OVERSEER preference role?
>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of
>>>>>>> available roles for a cluster. I _think_ this could be based on the version
>>>>>>> that the cluster is running? Would be useful to be able to interrogate a
>>>>>>> cluster in the future... we're seeing OOM issues on queries, can we add
>>>>>>> some query nodes? When were they introduced? I don't know what path this
>>>>>>> API should exist at.
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the SIP
>>>>>>> document. Not sure if there's a better path that we could go for.
>>>>>>> >>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which
>>>>>>> parts are string literals and which parts are meant to be substituted by
>>>>>>> the operator? GET /api/cluster/roles/data would become GET
>>>>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>>>> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1
>>>>>>> should be GET /api/cluster/roles/${nodename} dropping the intermediate
>>>>>>> "nodes"
>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that
>>>>>>> intermediate "nodes" node.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
>>>>>>> permissions? Maybe this requirement is too fundamental to the operation of
>>>>>>> a cluster and everybody would have to be able to do it.
>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to
>>>>>>> treat roles? Implementation detail that the servers will figure out? Or
>>>>>>> strict guidance where the client needs to check where specific roles are
>>>>>>> before sending any further communication to the server?
>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request that
>>>>>>> it can't fulfil? An overseer node gets a query or an update. A data node
>>>>>>> gets a collection creation request. Do they forward it on to an appropriate
>>>>>>> node, or do they reject it? Should this be configurable? If not, then it
>>>>>>> seems like lazy or poorly configured clients will defeat this isolation
>>>>>>> system quite easily.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave when
>>>>>>> roles are added mean? I thought we established that they are not dynamic.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Thanks,
>>>>>>> >>>>>>> Mike
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Hi,
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>> >>>>>>>>
>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> We also wish to add first class support for Query nodes
>>>>>>> that are used to process user queries by forwarding to data nodes,
>>>>>>> merging/aggregating them and presenting to users. This concept exists as
>>>>>>> first class citizens in most other search engines. This is a chance for
>>>>>>> Solr to catch up.
>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Regards,
>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> http://www.needhamsoftware.com (work)
>>>>>>> >>> http://www.the111shift.com (play)
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
>
> --
> -----------------------------------------------------
> Noble Paul
>


-- 
-----------------------------------------------------
Noble Paul

Re: First class support for node roles

Posted by Noble Paul <no...@gmail.com>.

On Mon, Dec 6, 2021, 10:21 AM Ilan Ginzburg <il...@gmail.com> wrote:

> If we go with optional role params, we need two defaults:
> 1. the param value to use when the role is specified without a parameter,
> and
> 2. the param value to use for the role on a node for which the role is
> not specified at all.
>
> I don't know how to sensibly name these defaults, but the actual
> values would be:
> overseer: default1=preferred, default2=allowed
> data: default1=on, default2=on
> coordinator: default1=on, default2=off
>
> If we do not allow specifying a role without a parameter, then
> default1 does not exist and the example Noble posted earlier covers
> us. But simple roles will be easier to use without parameters (and the
> transition from existing overseer role would be trivial).
>

For the sake of simplicity, we will always require roles to have explicit values.
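
To make the proposed format concrete, here is a minimal sketch (in Python, not Solr code) of how a roles string could be parsed under these rules. The role names, allowed values and defaults come from the proposal quoted in this thread; the names ROLE_SPECS and parse_roles, and the choice to raise ValueError on bad input, are hypothetical and only for illustration. The actual implementation belongs in a PR.

```python
# Hypothetical sketch of the roles=<role-name>:<role-value> format.
# ROLE_SPECS mirrors the values/defaults listed in the proposal; the
# structure and names here are assumptions, not Solr's implementation.
ROLE_SPECS = {
    "data": {"values": {"on", "off"}, "default": "on"},
    "overseer": {
        "values": {"allowed", "disallowed", "preferred"},
        "default": "allowed",
    },
    "coordinator": {"values": {"on", "off"}, "default": "off"},
}


def parse_roles(spec):
    """Return a role -> value mapping for a string like 'data:off,overseer:preferred'.

    Roles that are not mentioned fall back to their default value.
    A role that is mentioned must carry an explicit value.
    """
    # Start from the defaults, so an empty spec yields the default behavior.
    roles = {name: info["default"] for name, info in ROLE_SPECS.items()}
    if spec is None or not spec.strip():
        return roles

    seen = set()
    for entry in spec.split(","):
        name, sep, value = entry.strip().partition(":")
        if not sep or not value:
            # Mentioned roles always need an explicit value.
            raise ValueError(f"role {name!r} needs an explicit value")
        if name not in ROLE_SPECS:
            raise ValueError(f"unknown role {name!r}")
        if value not in ROLE_SPECS[name]["values"]:
            raise ValueError(f"value {value!r} is not allowed for role {name!r}")
        if name in seen:
            raise ValueError(f"role {name!r} is specified more than once")
        seen.add(name)
        roles[name] = value
    return roles
```

With this sketch, parse_roles("data:off,overseer:preferred") leaves coordinator at its default of off, and a bare "overseer" with no value is rejected rather than silently mapped to some second default.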

>
> On Sun, Dec 5, 2021 at 7:17 AM Ishan Chattopadhyaya
> <ic...@gmail.com> wrote:
> >
> > I'm +1 on this. It "looks" complicated at first, but simplifies all
> headaches going forward.
> >
> > On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <no...@gmail.com> wrote:
> >>
> >> I shall update the SIP proposal if we have a consensus on this
> configuration
> >>
> >> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <no...@gmail.com> wrote:
> >>>
> >>>
> >>>
> >>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gu...@gmail.com> wrote:
> >>>>
> >>>> I like this in that it's an example of how the overseer might be
> extended without creating a new role :)
> >>>>
> >>>> Not entirely sure if I'm for or against an enum implementation here,
> but it makes me a bit nervous. Enums with complexity can quickly get into
> difficulty for unit tests (especially if one wanted to write a mock object
> based test, something I think we maybe should use a bit more than we do).
> >>>>
> >>>>
> >>>>
> >>>> I would tend to think of a class to represent and collect role
> related functionality, one that perhaps has methods that receive the
> request, or other key objects and thus could be tested without standing up
> an entire server. (Not against also having them exercised in a few
> integrated tests, but the more we can avoid interleaving logic directly
> within DispatchFilter and HttpSolrCall etc., the better.)
> >>>>
> >>>>
> >>>> So I guess I'm somewhat biased against any enum with more than a
> couple properties, and definitely don't want to wind up hanging lots of
> methods off of one. Better to use them to consume a configuration value and
> then instantiate a class that really holds the logic and data. I like them
> for constraining values and easy string value conversion but the more they
> look like classes the more I'd rather have a class.
> >>>
> >>>
> >>>  I just meant it is a set of values. Please let us not discuss the
> actual impl here. We should stick to discussing the high level design here
> and specifics should be dealt with in a PR.
> >>>>
> >>>>
> >>>> -Gus
> >>>>
> >>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <no...@gmail.com>
> wrote:
> >>>>>
> >>>>> I recommend the following format for the role spec
> >>>>>
> >>>>> roles=<role-name>:<role-value>
> >>>>>
> >>>>> each role will have an enum of allowed values and a default value
> >>>>>
> >>>>> role name: data
> >>>>>
> >>>>> values: [on, off]
> >>>>> default: on
> >>>>>
> >>>>> role name: overseer
> >>>>>
> >>>>> values: [allowed, disallowed, preferred]
> >>>>> default : allowed
> >>>>>
> >>>>> role name: coordinator
> >>>>>
> >>>>> values : [on, off]
> >>>>> default: off
> >>>>>
> >>>>>
> >>>>> examples
> >>>>> roles=data:on,overseer:allowed (This is redundant because it uses
> all the default values. If a node is started without any roles value this
> is the default behavior)
> >>>>> roles=data:off,overseer:preferred (do not allow data, join overseer
> election at head)
> >>>>> roles=coordinator:on,data:on (role as coordinator, but allow data;
> it's the same as roles=coordinator:on)
> >>>>> roles=coordinator:on,data:off (role as coordinator, disallow data)
> >>>>>
> >>>>>
> >>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <il...@gmail.com>
> wrote:
> >>>>>>
> >>>>>> If we go with no negative node roles and overseer node role is not
> strict (i.e. it’s a "preferred overseer"), then one would need to define a
> second node role "no_overseer" to explicitly exclude a node from ever
> becoming overseer (which I think is a useful feature until we switch the
> cluster default to not using the overseer), plus the implementation of
> these two node roles will obviously be coupled (and what if a node has both
> defined?).
> >>>>>>
> >>>>>> I prefer strict node roles.
> >>>>>> Maybe we could have node roles with [optional] parameters to let
> the node role implementation decide ?
> >>>>>> The overseer node role for example could have one of 3 values
> defined for each node: “preferred” (default, equivalent to the existing
> overseer role), "accepted" (equivalent to currently not defining the
> overseer role) and "no_way" (does not exist today).
> >>>>>>
> >>>>>> This could be useful in other contexts. A node role “data” could be
> “fast” or “slow” depending on type of local persistent storage for example…
> >>>>>>
> >>>>>> Ilan
> >>>>>>
> >>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> I really don't think we should have types of roles. Not
> negative/positive and not strict/non-strict. You have a role or you don't.
> What that means is up to the code implementing the role.
> >>>>>>>
> >>>>>>> Roles should be free to configure a preference order (binary, or
> n-ary or whatever, strict or loose), prohibit behavior, or enable behavior.
> In this SIP I feel we should focus on How to identify what node has what
> role, How to designate what roles a node has via config/params, and the
> API's for interacting with roles.
> >>>>>>>
> >>>>>>> We should for example be able to support roles such as
> >>>>>>>
> >>>>>>> PREFERRED_OVERSEER
> >>>>>>> DATA
> >>>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
> >>>>>>>
> >>>>>>> Details about role implementation should probably be discussed in
> a thread about that role.  Obviously we should think about the name
> carefully to leave options open should we want to enhance things later so
> maybe
> >>>>>>>
> >>>>>>> OVERSEER_PREF  or just  OVERSEER
> >>>>>>>
> >>>>>>> would be better since it merely reads that the node implements
> some sort of preference or config regarding overseer... but all this can be
> decided on a per role basis
> >>>>>>>
> >>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>> Negative roles have a place
> >>>>>>>>
> >>>>>>>> Example is overseer
> >>>>>>>>
> >>>>>>>> There are 3 possible choices for that role
> >>>>>>>>
> >>>>>>>> a) preferred: always be in front of the election queue
> >>>>>>>> b) on: not preferred, but can be an overseer if no preferred
> overseer nodes are available
> >>>>>>>> c) off: never become an overseer
> >>>>>>>>
> >>>>>>>> Today we only have options 'a' and 'b' . In a future ticket, we
> may implement C
> >>>>>>>>
> >>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Negative roles add a lot of complexity, I would really want to
> stay away from them. That’s why I want strict roles up front. It’s maybe ok
> to push this decision out, but it also seems like the sort of thing we
> should consider at the start.
> >>>>>>>>>
> >>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com>
> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Yes. Negative roles is not a bad idea. If I start a node for
> machine learning purposes, I wouldn't want that node to ever participate in
> overseer election
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com>
> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> If we have non strict roles (like overseer), then it does make
> sense
> >>>>>>>>>>> to have negative roles.
> >>>>>>>>>>> That way I can define which are the two nodes that I'd prefer
> the
> >>>>>>>>>>> overseer to run on, and a few other nodes on which it should
> >>>>>>>>>>> definitely never run for various reasons. And in case these
> >>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let the
> cluster
> >>>>>>>>>>> fail the same way it would if there were no data nodes
> available.
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
> houstonputman@gmail.com> wrote:
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults, users
> cannot trip themselves up by default, but the option is there for people to
> tinker and have an iron grip over their cluster.
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> +1 to sensible defaults so users don't trip themselves. The
> option to tinker for tighter grip can be tackled later, either on a per
> role basis or as a generic concept later.
> >>>>>>>>>>> >
> >>>>>>>>>>> >
> >>>>>>>>>>> > +1 - Can definitely be added later if we so desire, not
> needed for this SIP
> >>>>>>>>>>> >
> >>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com>
> wrote:
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> I think the key  is to let the roles have full control of
> the implications of having/not having that role. No need for even a
> strict/loose designation. The question of do you have the role is yes/no
> with no logic to guess if the role is implied or not, The question of will
> it come up with the role is "have_explicit ? use_defaults : use_defaults.
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> Once you figure out who has a role (or not) what that
> means is up to the role code.
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> Corollary: we don't have to change the way overseer works
> in this SIP. We can rework it or not as we see fit separately.
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> +1
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> Only thing we need to do is find a wording that makes the
> above clear on first read through the SIP :)
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> -Gus
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
> houstonputman@gmail.com> wrote:
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
> happens if all of our existing OVERSEER candidates are down. When at least
> one of them is up, the overseer will go there, and that is good and
> expected. But what happens if all of the overseer eligible nodes are down.
> Your comment, and the old system, would imply that the overseer election
> goes to some other unrelated, untagged node. I disagree with this
> implementation choice. This sounds like something role specific to
> determine, but I would like to see us be more strict about it. I don't want
> cores leaking out of my data roles, I don't want query processing to leak
> out of my "query" nodes or whatever. Overseer shouldn't be special in this
> regard.
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> I'm very strongly in favor of not letting users design a
> system in which the cluster can be "live" without an overseer. I understand
> that the overseer can be taxing to the cluster, but honestly what is the
> point of having an untaxed cluster that doesn't have an overseer? I can see
> arguments for the other roles to be stricter about this, but there are also
> a lot of users who wouldn't want those to be strict either (like "query"
> nodes).
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a
> non-overseer role node HAS to be selected to become overseer, it will try
> to migrate the overseer job to a node with the overseer role whenever one
> becomes live.
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> So maybe we don't have special rules per role, but
> instead roles can either be defined as "Strict" or "Loose" (better names
> likely exist), and the roles come with a default (Overseer -> Loose, Data
> -> Strict, Query -> Loose, etc.). And it is up to each role to define how
> to behave when running in LOOSE mode and a non-role node is used then a
> role node comes online (like the overseer example given above).
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults, users
> cannot trip themselves up by default, but the option is there for people to
> tinker and have an iron grip over their cluster.
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com>
> wrote:
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Noble wrote:
> >>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role" works
> today. We are just changing the definition and standardizing the
> configuration & discoverability
> >>>>>>>>>>> >>>>> Ishan wrote:
> >>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the
> OVERSEER role (which currently stands for preferred overseer). We can take
> a stab at refactoring it later.
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Grouping these two comments together, since I think they
> are saying the same thing. I think this is part of my confusion. We have an
> old system that doesn't work the way we want the new system to work. There
> may be people already using the old system. What path do we offer for folks
> using the old system to migrate to the new system? What happens if somebody
> accidentally tries to use both systems at the same time?
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Ishan wrote:
> >>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with
> OVERSEER role] are live, Solr guarantees that one of those nodes becomes
> the overseer.", I meant to somewhat capture the current behaviour as the
> OVERSEER role performs today. Do you see any inconsistency with this
> statement vs. what it does today?
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
> happens if all of our existing OVERSEER candidates are down. When at least
> one of them is up, the overseer will go there, and that is good and
> expected. But what happens if all of the overseer eligible nodes are down.
> Your comment, and the old system, would imply that the overseer election
> goes to some other unrelated, untagged node. I disagree with this
> implementation choice. This sounds like something role specific to
> determine, but I would like to see us be more strict about it. I don't want
> cores leaking out of my data roles, I don't want query processing to leak
> out of my "query" nodes or whatever. Overseer shouldn't be special in this
> regard.
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Noble wrote:
> >>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a
> node in the following request?
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this out.
> Let's leave it as is.
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <
> mdrob@mdrob.com> wrote:
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> Replying to the top post in this thread because there
> has been a lot of discussion and I don't want to look like I'm continuing
> any of those particular threads.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> I finally had time to sit down and think about this
> with the attention it deserves and am generally happy with how the
> conversation has shaped the current proposal.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define node
> roles is fine and I like that data is the default role when not defined. I
> think it is important to hold on to the guarantee that an active overseer
> will land on an overseer node role.
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path
> for folks using the current OVERSEER role. I am not sure that something can
> be done automatically since they need to now specify new properties at
> startup. Maybe we need to include loud warnings or support both approaches
> for a time?
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the
> overseer nodes fail, then it is implied the overseer will go to one of the
> data nodes. The specific wording in the SIP - "When one or more such nodes
> are live, Solr guarantees that one of those nodes become the overseer."
> implies to me that failover could go from overseer1 to overseer2 to
> overseerN to random node. I feel like we need to have some recording that
> there were dedicated overseer nodes and stop the cascading failure instead
> of churning through our data nodes.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed
> scope of "coordinator" roles from a split query/indexing standpoint. I
> understand that these are used as examples, but would like stronger
> language that new roles should also go through their own SIP discussions.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node
> liveness in two different places now. We have the live nodes and we have
> the node roles stored in two different places in zookeeper and it feels
> like this would lead to race conditions or split brain or other hard to
> diagnose bugs when those two lists don't agree with each other. This also
> feels like it contradicts the "single source of truth" idea later stated in
> the proposal. I see Gus's arguments for decoupling these and am not
> strongly opposed, I just get a lurking feeling about it. Even if we don't
> do this, I would like this called out explicitly in the alternative
> approaches section as something that we considered and rejected, with
> details why,
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an
> additional call out here that all operations are GET because nodes cannot
> be changed at runtime.
> >>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the
> previous OVERSEER preference role?
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of
> available roles for a cluster. I _think_ this could be based on the version
> that the cluster is running? Would be useful to be able to interrogate a
> cluster in the future... we're seeing OOM issues on queries, can we add
> some query nodes? When were they introduced? I don't know what path this
> API should exist at.
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated
> the SIP document. Not sure if there's a better path that we could go for.
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show
> which parts are string literals and which parts are meant to be substituted
> by the operator? GET /api/cluster/roles/data would become GET
> /api/cluster/roles/${rolename} in our SIP/documentation.
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET
> /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename}
> dropping the intermediate "nodes"
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need
> that intermediate "nodes" node.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
> permissions? Maybe this requirement is too fundamental to the operation of
> a cluster and everybody would have to be able to do it.
> >>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other
> clients) to treat roles? Implementation detail that the servers will figure
> out? Or strict guidance where the client needs to check where specific
> roles are before sending any further communication to the server?
> >>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request
> that it can't fulfil? An overseer node gets a query or an update. A data
> node gets a collection creation request. Do they forward it on to an
> appropriate node, or do they reject it? Should this be configurable? If
> not, then it seems like lazy or poorly configured clients will defeat this
> isolation system quite easily.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
> >>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave
> when roles are added mean? I thought we established that they are not
> dynamic.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> Thanks,
> >>>>>>>>>>> >>>>>>> Mike
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> Hi,
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node
> roles:
> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
> >>>>>>>>>>> >>>>>>>>
> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> We also wish to add first class support for Query
> nodes that are used to process user queries by forwarding to data nodes,
> merging/aggregating them and presenting to users. This concept exists as
> first class citizens in most other search engines. This is a chance for
> Solr to catch up.
> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> Regards,
> >>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> --
> >>>>>>>>>>> >>> http://www.needhamsoftware.com (work)
> >>>>>>>>>>> >>> http://www.the111shift.com (play)
> >>>>>>>>>>>
> >>>>>>>>>>>
> ---------------------------------------------------------------------
> >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
> >>>>>>>>>>> For additional commands, e-mail: dev-help@solr.apache.org
> >>>>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> http://www.needhamsoftware.com (work)
> >>>>>>> http://www.the111shift.com (play)
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> -----------------------------------------------------
> >>>>> Noble Paul
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> http://www.needhamsoftware.com (work)
> >>>> http://www.the111shift.com (play)
> >>>
> >>>
> >>>
> >>> --
> >>> -----------------------------------------------------
> >>> Noble Paul
> >>
> >>
> >>
> >> --
> >> -----------------------------------------------------
> >> Noble Paul
>
>
>

Re: First class support for node roles

Posted by Noble Paul <no...@gmail.com>.

There is no complexity here. You just think something is complex because
you haven't spent any time to understand this. A role is a tag for a node.
Period.

How complex is that?

We just added a standard mechanism to declare roles and discover the nodes
that have them. There is no extra cost or complexity.





On Wed, Dec 8, 2021, 8:27 AM Jan Høydahl <ja...@cominvent.com> wrote:

>
>> Another user with a 100 node cluster who today has three overseer nodes
>> that they have shielded from having data by specifying createNodeSet
>> manually or by other means, can choose to adopt the role system, and define
>> three dedicated nodes with the overseer role but without the data role, and
>> they will get exactly what they tried to achieve originally. Should they
>> later wish to start using role XYZ released in 9.x, then they will prepare
>> for that during the upgrade by starting a few nodes with role=XYZ and
>> everything is explicit and no magic.
>>
>
> My proposal will work exactly how you describe here.
>
>
>
> If it works exactly as I describe, then there is no use in adding the
> complexity of role modes and different default for different roles. My
> proposal is that ALL roles are always ALLOW if not specified explicitly.
> Keep it simple.
>
> Jan
>
>

Re: First class support for node roles

Posted by Noble Paul <no...@gmail.com>.

Jan,
We are actually saying the same thing, just differently. Today we have
only 2 roles:

   - data
   - overseer

We have set default values for them. A user who starts a 9.0 Solr will get
the default behavior, which is exactly the same behavior as in 8.x.

Imagine you introduce a new role "xyz". As a dev building this feature,
you may choose to not enable it by default if

   - The feature alters the default behavior of Solr
   - It's expensive (consumes disk, threads, memory etc.)
   - It requires some local data which is not available on all nodes

In that case the role writer will decide what a sensible default should be.
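
A minimal Python sketch of this idea, in which each role declares its own default mode and a node started without explicit roles picks up those defaults. The names ROLE_DEFAULTS and default_roles, the mode strings, and the "xyz" role are hypothetical illustrations, not Solr's actual implementation.

```python
# Hypothetical per-role defaults; the author of each role picks its default.
ROLE_DEFAULTS = {
    "data": "on",           # assumed: nodes host data unless told otherwise
    "overseer": "allowed",  # assumed: any node may become overseer
    "xyz": "off",           # assumed: an expensive new role stays off by default
}


def default_roles(role_defaults=ROLE_DEFAULTS):
    """Return the roles (with modes) active on a node started with no explicit roles."""
    return {role: mode for role, mode in role_defaults.items() if mode != "off"}


print(default_roles())
```

Under these assumptions, upgrading to a version that adds "xyz" would not switch the new role on for nodes that never configured roles.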

On Thu, Dec 9, 2021 at 1:55 AM Jan Høydahl <ja...@cominvent.com> wrote:

> >
> > > My proposal is that ALL roles are always ALLOW if not specified
> explicitly.
> >
> > As explained several times before, this is a problem for new roles
> introduced in future.
> > Those roles will get turned on on all nodes after an upgrade, whether a
> user wants or not.
>
> But you're not hearing me. This is a constructed problem assuming that we
> recommend users with huge clusters to start without explicitly specifying
> roles.
> Small clusters with limited load will run happily without specifying roles.
> Thus a 3-node cluster will have ALL features available on ANY node without
> doing anything. Exactly like in 8.x
> And large clusters with the need for specialized nodes will specify roles
> on EVERY node.
>
> > A user explicitly mentions which roles his nodes want to assume, but
> after an upgrade he/she sees that node performing a new role. This is
> confusing.
>
>
> You are misunderstanding. The premise for the roles feature from the
> beginning was to embrace a transparent and super simple notion that once
> you start specifying solr.node.roles=foo,bar explicitly, then the nodes
> will ALLOW exactly that set of roles, nothing more, nothing less.
>
> So imagine a cluster running 9.0 with each node specifying roles
> explicitly. And you upgrade to 9.1 which introduces a new role 'sql'. Then,
> when planning the upgrade to 9.1 the user asks himself whether they need
> the SQL feature or not. They can then go about the upgrade along three paths
> A) No need for SQL, do nothing
> B) Using SQL, but want all existing nodes to run sql: Add the 'sql' role
> to all nodes during upgrade
> C) Using SQL, wish 3 dedicated nodes for it: Add the 'sql' role to three
> dedicated nodes during upgrade
>
> I think the extra complexity of "data:on", "data:off", "overseer:allow"
> etc comes from your assumption that large clusters will run without
> explicitly specifying roles, or that some of the nodes will not specify
> roles. Or that newly added roles somehow should become active on a new
> version even if you have explicitly specified roles on all nodes.
>
> I agree that some roles may need additional configuration, but I'm not
> sure that forcing that configuration into something you have to parse from
> the role-property itself is the best way to go. Perhaps each role can
> decide its mode of operation depending on whatever factors and
> configuration that that role needs. For the overseer, it may in the first
> phase work as today, that it interprets solr.node.role=overseer as a
> preference. And in some future version it may read an additional property
> solr.overseer.priority=3 to give some priority to it, or it could read some
> solr.overseer.strict=true to refuse to place overseer strictly on nodes
> with the role. I think it is premature to tackle any future role needs in
> the first version of the framework.
>
> Jan
>
>

-- 
-----------------------------------------------------
Noble Paul

Re: First class support for node roles

Posted by Jan Høydahl <ja...@cominvent.com>.

> And in some future version it may read an additional property solr.overseer.priority=3 to give some priority to it, or it could read some solr.overseer.strict=true to refuse to place overseer strictly on nodes with the role

This sysProp example was just an example, and maybe a bad one for this particular case, since a local sysprop would not automatically be available to the election code. So the node-roles/<role> hierarchy in zk would be a better fit. However, my point was to let the feature behind a role handle this for now, and gradually evolve the roles framework as needed.

Jan

Re: First class support for node roles

Posted by Jan Høydahl <ja...@cominvent.com>.

> 
> > My proposal is that ALL roles are always ALLOW if not specified explicitly.
> 
> As explained several times before, this is a problem for new roles introduced in future.
> Those roles will get turned on on all nodes after an upgrade, whether a user wants or not.

But you're not hearing me. This is a constructed problem that assumes we recommend that users with huge clusters start without explicitly specifying roles.
Small clusters with limited load will run happily without specifying roles. Thus a 3-node cluster will have ALL features available on ANY node without doing anything, exactly like in 8.x.
And large clusters with the need for specialized nodes will specify roles on EVERY node.

> A user explicitly mentions which roles his nodes want to assume, but after an upgrade he/she sees that node performing a new role. This is confusing.


You are misunderstanding. The premise for the roles feature from the beginning was to embrace a transparent and super simple notion that once you start specifying solr.node.roles=foo,bar explicitly, then the nodes will ALLOW exactly that set of roles, nothing more, nothing less.

So imagine a cluster running 9.0 with each node specifying roles explicitly. And you upgrade to 9.1, which introduces a new role 'sql'. Then, when planning the upgrade to 9.1, the user asks themselves whether they need the SQL feature or not. They can then go about the upgrade along one of three paths:
A) No need for SQL, do nothing
B) Using SQL, but want all existing nodes to run sql: Add the 'sql' role to all nodes during upgrade
C) Using SQL, wish 3 dedicated nodes for it: Add the 'sql' role to three dedicated nodes during upgrade

I think the extra complexity of "data:on", "data:off", "overseer:allow" etc comes from your assumption that large clusters will run without explicitly specifying roles, or that some of the nodes will not specify roles. Or that newly added roles somehow should become active on a new version even if you have explicitly specified roles on all nodes.

I agree that some roles may need additional configuration, but I'm not sure that forcing that configuration into something you have to parse from the role-property itself is the best way to go. Perhaps each role can decide its mode of operation depending on whatever factors and configuration that role needs. For the overseer, it may in the first phase work as today, that it interprets solr.node.role=overseer as a preference. And in some future version it may read an additional property solr.overseer.priority=3 to give some priority to it, or it could read some solr.overseer.strict=true to refuse to place the overseer anywhere except on nodes with the role. I think it is premature to tackle any future role needs in the first version of the framework.
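
The "explicit means exact" rule and the 9.0 to 9.1 upgrade paths described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function allowed_roles, the role sets, and the 'sql' role in 9.1 are hypothetical and do not describe real Solr code or releases.

```python
def allowed_roles(explicit_roles, known_roles):
    """Roles a node may assume under the 'explicit means exact' rule.

    explicit_roles: value of the roles property (e.g. "foo,bar"), or None if unset.
    known_roles: all roles the running version knows about.
    """
    if explicit_roles is None:
        # No roles specified: every known role is allowed, as in 8.x.
        return set(known_roles)
    # Roles specified: exactly that set, nothing more, nothing less.
    return {r.strip() for r in explicit_roles.split(",") if r.strip()}


roles_9_0 = {"data", "overseer"}
roles_9_1 = roles_9_0 | {"sql"}  # hypothetical: 9.1 introduces 'sql'

# Path A: explicit roles unchanged across the upgrade, so 'sql' stays off.
print(sorted(allowed_roles("data,overseer", roles_9_1)))
# Paths B and C: add 'sql' explicitly to the nodes that should run it.
print(sorted(allowed_roles("data,sql", roles_9_1)))
# A node with no roles property gets everything the version supports.
print(sorted(allowed_roles(None, roles_9_1)))
```

With this rule, a newly introduced role never becomes active on a node that lists its roles explicitly unless the operator adds it.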

Jan

Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

> My proposal is that ALL roles are always ALLOW if not specified
explicitly.

As explained several times before, this is a problem for new roles
introduced in the future. Those roles will get turned on for all nodes after
an upgrade, whether a user wants them or not. A user explicitly specifies
which roles their nodes should assume, but after an upgrade they see a node
performing a new role. This is confusing.

On Wed, Dec 8, 2021 at 2:57 AM Jan Høydahl <ja...@cominvent.com> wrote:

>
>> Another user with a 100 node cluster who today has three overseer nodes
>> that they have shielded from having data by specifying createNodeSet
>> manually or by other means, can choose to adopt the role system, and define
>> three dedicated nodes with the overseer role but without the data role, and
>> they will get exactly what they tried to achieve originally. Should they
>> later wish to start using role XYZ released in 9.x, then they will prepare
>> for that during the upgrade by starting a few nodes with role=XYZ and
>> everything is explicit and no magic.
>>
>
> My proposal will work exactly how you describe here.
>
>
>
> If it works exactly as I describe, then there is no use in adding the
> complexity of role modes and different default for different roles. My
> proposal is that ALL roles are always ALLOW if not specified explicitly.
> Keep it simple.
>
> Jan
>
>

Re: First class support for node roles

Posted by Jan Høydahl <ja...@cominvent.com>.

> 
> Another user with a 100 node cluster who today has three overseer nodes that they have shielded from having data by specifying createNodeSet manually or by other means, can choose to adopt the role system, and define three dedicated nodes with the overseer role but without the data role, and they will get exactly what they tried to achieve originally. Should they later wish to start using role XYZ released in 9.x, then they will prepare for that during the upgrade by starting a few nodes with role=XYZ and everything is explicit and no magic.
> 
> My proposal will work exactly how you describe here.
>  

If it works exactly as I describe, then there is no use in adding the complexity of role modes and different defaults for different roles. My proposal is that ALL roles are always ALLOW if not specified explicitly. Keep it simple.

Jan

Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

On Mon, Dec 6, 2021 at 6:38 PM Jan Høydahl <ja...@cominvent.com> wrote:

> > > Also, we should not put so much emphasis on "nodes without roles
> defined" as if that should be a common way of starting nodes in a huge
> cluster.
> >
> > Jan, the need to tackle "nodes without roles defined" separately is to
> cater to those users who do not use the roles functionality; we need to
> provide a logical way for such users to opt into the roles feature. Hence,
> it is important to assume implicit defaults of roles for such nodes.
>
> I disagree.


I'm having a hard time understanding what you disagree with.


> If I'm an 8.x user with a 5-node cluster, with no roles, then all my nodes
> are eligible to take on any role, such as index, search, aggregate,
> streaming, sql, embedded zk, overseer etc (although roles are not a concept
> in 8.x).
> When that user upgrades to 9.0, without considering roles, they start all
> five nodes without roles, and every node will be ALLOWed to assume all
> roles, so there are no surprises with overseers not starting or anything.
>

This scenario is going to be supported exactly as you describe in my
proposed design. Keep in mind, I propose that the default roles for nodes
that don't have explicitly specified roles be "data:on, overseer:allowed".
Hence, all nodes will get that functionality by default for a user who
never even bothered to fiddle with roles (in 8x or 9x).
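
As a rough Python sketch of how such an implicit default could be applied, assuming a comma-separated "role:mode" syntax like the examples earlier in this thread. The names DEFAULT_ROLES and parse_roles are hypothetical and the parsing rules are an assumption, not the SIP's specification.

```python
DEFAULT_ROLES = "data:on,overseer:allowed"  # default string proposed above


def parse_roles(spec):
    """Parse 'role:mode,role:mode' into a dict; use the default string if unset."""
    if spec is None or not spec.strip():
        spec = DEFAULT_ROLES
    roles = {}
    for item in spec.split(","):
        role, sep, mode = item.strip().partition(":")
        if not role or not sep or not mode:
            raise ValueError(f"malformed role entry: {item!r}")
        roles[role.strip()] = mode.strip()
    return roles


# A node started without any roles property falls back to the defaults.
print(parse_roles(None))
# A dedicated overseer node, using a role string from earlier in the thread.
print(parse_roles("data:off,overseer:preferred"))
```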


>
> Another user with a 100 node cluster who today has three overseer nodes
> that they have shielded from having data by specifying createNodeSet
> manually or by other means, can choose to adopt the role system, and define
> three dedicated nodes with the overseer role but without the data role, and
> they will get exactly what they tried to achieve originally. Should they
> later wish to start using role XYZ released in 9.x, then they will prepare
> for that during the upgrade by starting a few nodes with role=XYZ and
> everything is explicit and no magic.
>

My proposal will work exactly how you describe here.


>
> Jan
>
>

Re: First class support for node roles

Posted by Jan Høydahl <ja...@cominvent.com>.

> > Also, we should not put so much emphasis on "nodes without roles defined" as if that should be a common way of starting nodes in a huge cluster.
> 
> Jan, the need to tackle "nodes without roles defined" separately is to cater to those users who do not use the roles functionality; we need to provide a logical way for such users to opt into the roles feature. Hence, it is important to assume implicit defaults of roles for such nodes.

I disagree. If I'm an 8.x user with a 5-node cluster, with no roles, then all my nodes are eligible to take on any role, such as index, search, aggregate, streaming, sql, embedded zk, overseer etc (although roles are not a concept in 8.x).
When that user upgrades to 9.0, without considering roles, they start all five nodes without roles, and every node will be ALLOWed to assume all roles, so there are no surprises with overseers not starting or anything.

Another user with a 100 node cluster who today has three overseer nodes that they have shielded from having data by specifying createNodeSet manually or by other means, can choose to adopt the role system, and define three dedicated nodes with the overseer role but without the data role, and they will get exactly what they tried to achieve originally. Should they later wish to start using role XYZ released in 9.x, then they will prepare for that during the upgrade by starting a few nodes with role=XYZ and everything is explicit and no magic.

Jan

Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

Ilan,
I missed addressing one aspect in your mail.

On Mon, Dec 6, 2021 at 8:37 PM Ilan Ginzburg <il...@gmail.com> wrote:

> Ishan,
>
> > > Using a string separate from the role definitions (Ishan) makes it too
> easy to have roles for which the default configuration is unknown.
> >
> > Ilan, can you please elaborate (perhaps with an example) as to what you
> mean here?
>
> If the default string for all roles for nodes with no roles configured
> got customized in a specific cluster (say it got changed to "data:on,
> overseer:disallowed"),


A user won't be able to "customize" that default string. It can change
only after a Solr upgrade (where the new Solr version can introduce a new
default string).



> when a new version of Solr with a new role gets
> deployed, that new role will not have a default defined in the string,
> which might not be the intent (if the new role for example is "ui" and
> the default expected to be "on", "ui" not being defined in the string
> makes the default likely be "off" - more about that default below).
>
> > As per my proposal, a node that was started with explicit roles, but
> without a particular role defined for it will have no functionality
> associated with that role running on it at any point in its lifetime. For
> example if a node was started with "-Dnodes.roles=data:on" will never have
> anything to do with overseer functionality. There is no concept of defaults
> in that case.
>
> That's the point. The code dealing with the overseer role will have to
> make different decisions based on the role being "on" or "off", or if
> we go with parameters, the role being "allowed", "disallowed",
> "preferred".
>
> When you say "will never have anything to do with overseer
> functionality" what does that mean when roles are defined but not the
> overseer role? I assume that should mean that for this node, overseer
> is "disallowed" (or pick any other value of the 3 possible ones).
> Therefore, there is a default when the role is not defined (and other
> roles are defined). If  "will never have anything to do with overseer
> functionality" is an option different from the 3 other ones, then we
> end up with 4 different overseer related configuration options (but I
> think it's better to be able to explicitly specify all configuration
> options).
>
> So that's my point. A per role default for when a role is not defined.
> And if we have this for all roles, this is the node default roles
> config when no roles are defined at all. When introducing role "ui"
> with a role default of "on", then all existing nodes that do not
> specify a configuration for this role (regardless of if they specify a
> configuration for other roles) get the sound default.
>
> Ilan
>

Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

Hi all,
I think we've captured all recent feedback in the updated proposal. Thanks
to Mike and others for helping us streamline the proposal to better suit
the current overseer role. Thanks to Ilan and others for the
defaultIfAbsent concept.
I think the proposal is not only complete, but also provides the simplicity
for users to opt into roles, while also allowing room for developers to
introduce new roles with ease.
If there are no further objections, we'd like to proceed with the
implementation.
Thanks,
Ishan

On Mon, Dec 6, 2021 at 9:28 PM Ishan Chattopadhyaya <
ichattopadhyaya@gmail.com> wrote:

>
>
> On Mon, Dec 6, 2021 at 9:18 PM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
>
>> I've updated the SIP document with the recent changes. Also, added a new
>> section to provide guidance for adding new roles.
>>
>> On Mon, Dec 6, 2021 at 8:37 PM Ilan Ginzburg <il...@gmail.com> wrote:
>>
>>> Ishan,
>>>
>>> > > Using a string separate from the role definitions (Ishan) makes it
>>> too easy to have roles for which the default configuration is unknown.
>>> >
>>> > Ilan, can you please elaborate (perhaps with an example) as to what
>>> you mean here?
>>>
>>> If the default string for all roles for nodes with no roles configured
>>> got customized in a specific cluster (say it got changed to "data:on,
>>> overseer:disallowed"), when a new version of Solr with a new role gets
>>> deployed, that new role will not have a default defined in the string,
>>> which might not be the intent (if the new role for example is "ui" and
>>> the default expected to be "on", "ui" not being defined in the string
>>> makes the default likely be "off" - more about that default below).
>>>
>>
>> That new version (which introduces the "ui" role) will modify the assumed
>> default value for solr.node.roles from "data:on,overseer:allowed" to
>> "data:on,overseer:allowed,ui:on". That means after upgrade, the nodes
>> will have it turned on, since the user has not defined any roles. If the
>> user is already using roles on their nodes, then they would need to
>> append "ui:on" to those nodes where they want this new functionality to
>> run. I've added a new section to the SIP document (guidance for adding a
>> new role) to clarify this.
>>
>>
>>>
>>> > As per my proposal, a node that was started with explicit roles, but
>>> without a particular role defined for it will have no functionality
>>> associated with that role running on it at any point in its lifetime. For
>>> example if a node was started with "-Dnodes.roles=data:on" will never have
>>> anything to do with overseer functionality. There is no concept of defaults
>>> in that case.
>>>
>>> That's the point. The code dealing with the overseer role will have to
>>> make different decisions based on the role being "on" or "off", or if
>>> we go with parameters, the role being "allowed", "disallowed",
>>> "preferred".
>>>
>>> When you say "will never have anything to do with overseer
>>> functionality" what does that mean when roles are defined but not the
>>> overseer role? I assume that should mean that for this node, overseer
>>> is "disallowed" (or pick any other value of the 3 possible ones).
>>> Therefore, there is a default when the role is not defined (and other
>>> roles are defined). If  "will never have anything to do with overseer
>>> functionality" is an option different from the 3 other ones, then we
>>> end up with 4 different overseer related configuration options (but I
>>> think it's better to be able to explicitly specify all configuration
>>> options).
>>>
>>> So that's my point. A per role default for when a role is not defined.
>>>
>>
>> Ah, I now see. Yes, there will be such a default for all roles that is
>> assumed when no role is applied on that node. For "data" role, it is "off".
>> For "overseer", it is "disallowed".
>> Does that make sense?
>>
>
> Ilan,
> I've updated the SIP document to reflect this with a concept called
> "defaultIfAbsent". Please review the SIP and let me know if it accurately
> reflects your intention.
>
>
>>
>>
>>
>>> And if we have this for all roles, this is the node default roles
>>> config when no roles are defined at all. When introducing role "ui"
>>> with a role default of "on", then all existing nodes that do not
>>> specify a configuration for this role (regardless of if they specify a
>>> configuration for other roles) get the sound default.
>>>
>>> Ilan
>>>

Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

On Mon, Dec 6, 2021 at 9:18 PM Ishan Chattopadhyaya <
ichattopadhyaya@gmail.com> wrote:

> I've updated the SIP document with the recent changes. Also, added a new
> section to provide guidance for adding new roles.
>
> On Mon, Dec 6, 2021 at 8:37 PM Ilan Ginzburg <il...@gmail.com> wrote:
>
>> Ishan,
>>
>> > > Using a string separate from the role definitions (Ishan) makes it
>> too easy to have roles for which the default configuration is unknown.
>> >
>> > Ilan, can you please elaborate (perhaps with an example) as to what you
>> mean here?
>>
>> If the default string for all roles for nodes with no roles configured
>> got customized in a specific cluster (say it got changed to "data:on,
>> overseer:disallowed"), when a new version of Solr with a new role gets
>> deployed, that new role will not have a default defined in the string,
>> which might not be the intent (if the new role for example is "ui" and
>> the default expected to be "on", "ui" not being defined in the string
>> makes the default likely be "off" - more about that default below).
>>
>
> That new version (which introduces the "ui" role) will modify the assumed
> default value for solr.node.roles from "data:on,overseer:allowed" to
> "data:on,overseer:allowed,ui:on". That means after upgrade, the nodes will
> have it turned on, since the user has not defined any roles. If the user
> is already using roles on their nodes, then they would need to append
> "ui:on" to those nodes where they want this new functionality to run.
> I've added a new section to the SIP document (guidance for adding a new
> role) to clarify this.
>
>
>>
>> > As per my proposal, a node that was started with explicit roles, but
>> without a particular role defined for it will have no functionality
>> associated with that role running on it at any point in its lifetime. For
>> example if a node was started with "-Dnodes.roles=data:on" will never have
>> anything to do with overseer functionality. There is no concept of defaults
>> in that case.
>>
>> That's the point. The code dealing with the overseer role will have to
>> make different decisions based on the role being "on" or "off", or if
>> we go with parameters, the role being "allowed", "disallowed",
>> "preferred".
>>
>> When you say "will never have anything to do with overseer
>> functionality" what does that mean when roles are defined but not the
>> overseer role? I assume that should mean that for this node, overseer
>> is "disallowed" (or pick any other value of the 3 possible ones).
>> Therefore, there is a default when the role is not defined (and other
>> roles are defined). If  "will never have anything to do with overseer
>> functionality" is an option different from the 3 other ones, then we
>> end up with 4 different overseer related configuration options (but I
>> think it's better to be able to explicitly specify all configuration
>> options).
>>
>> So that's my point. A per role default for when a role is not defined.
>>
>
> Ah, I now see. Yes, there will be such a default for all roles that is
> assumed when no role is applied on that node. For "data" role, it is "off".
> For "overseer", it is "disallowed".
> Does that make sense?
>

Ilan,
I've updated the SIP document to reflect this with a concept called
"defaultIfAbsent". Please review the SIP and let me know if it accurately
reflects your intention.


>
>
>
>> And if we have this for all roles, this is the node default roles
>> config when no roles are defined at all. When introducing role "ui"
>> with a role default of "on", then all existing nodes that do not
>> specify a configuration for this role (regardless of if they specify a
>> configuration for other roles) get the sound default.
>>
>> Ilan
>>

Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

I've updated the SIP document with the recent changes. Also, added a new
section to provide guidance for adding new roles.

On Mon, Dec 6, 2021 at 8:37 PM Ilan Ginzburg <il...@gmail.com> wrote:

> Ishan,
>
> > > Using a string separate from the role definitions (Ishan) makes it too
> easy to have roles for which the default configuration is unknown.
> >
> > Ilan, can you please elaborate (perhaps with an example) as to what you
> mean here?
>
> If the default string for all roles for nodes with no roles configured
> got customized in a specific cluster (say it got changed to "data:on,
> overseer:disallowed"), when a new version of Solr with a new role gets
> deployed, that new role will not have a default defined in the string,
> which might not be the intent (if the new role for example is "ui" and
> the default expected to be "on", "ui" not being defined in the string
> makes the default likely be "off" - more about that default below).
>

That new version (which introduces the "ui" role) will modify the assumed
default value for solr.node.roles from "data:on,overseer:allowed" to
"data:on,overseer:allowed,ui:on". That means after upgrade, the nodes will
have it turned on, since the user has not defined any roles. If the user
is already using roles on their nodes, then they would need to append
"ui:on" to those nodes where they want this new functionality to run.
I've added a new section to the SIP document (guidance for adding a new
role) to clarify this.
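
To make the upgrade behaviour described above concrete, here is a minimal
sketch in Python. It is an illustration only: the function name node_roles
and the two constants are hypothetical, and a real implementation may well
look different. It assumes that a node started without solr.node.roles gets
the implicit default string of the running version, while a node started
with explicit roles gets exactly the roles it lists.

```python
# Implicit defaults assumed for nodes started without any roles.
# The two strings are the ones quoted in this thread; the constant
# names are hypothetical.
IMPLICIT_ROLES_BEFORE_UI = "data:on,overseer:allowed"
IMPLICIT_ROLES_WITH_UI = "data:on,overseer:allowed,ui:on"


def node_roles(configured, implicit_default):
    """Return the role -> mode mapping a node would run with.

    configured: value of solr.node.roles, or None if the property is unset.
    implicit_default: default string assumed by the running Solr version.
    Every entry is expected to be of the form "name:mode".
    """
    # Only a node with no roles at all falls back to the implicit default.
    spec = implicit_default if configured is None else configured
    pairs = (item.split(":", 1) for item in spec.split(",") if item.strip())
    return {name.strip(): mode.strip() for name, mode in pairs}


# A node without roles picks up "ui" after the upgrade.
print(node_roles(None, IMPLICIT_ROLES_WITH_UI))
# A dedicated overseer node keeps exactly what it asked for.
print(node_roles("overseer:preferred", IMPLICIT_ROLES_WITH_UI))
```

In this model a node started with "overseer:preferred" does not run "ui"
after the upgrade unless "ui:on" is appended to its roles.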


>
> > As per my proposal, a node that was started with explicit roles, but
> without a particular role defined for it will have no functionality
> associated with that role running on it at any point in its lifetime. For
> example if a node was started with "-Dnodes.roles=data:on" will never have
> anything to do with overseer functionality. There is no concept of defaults
> in that case.
>
> That's the point. The code dealing with the overseer role will have to
> make different decisions based on the role being "on" or "off", or if
> we go with parameters, the role being "allowed", "disallowed",
> "preferred".
>
> When you say "will never have anything to do with overseer
> functionality" what does that mean when roles are defined but not the
> overseer role? I assume that should mean that for this node, overseer
> is "disallowed" (or pick any other value of the 3 possible ones).
> Therefore, there is a default when the role is not defined (and other
> roles are defined). If  "will never have anything to do with overseer
> functionality" is an option different from the 3 other ones, then we
> end up with 4 different overseer related configuration options (but I
> think it's better to be able to explicitly specify all configuration
> options).
>
> So that's my point. A per role default for when a role is not defined.
>

Ah, I now see. Yes, there will be such a default for each role, assumed
when that role is not specified on a node that does specify other roles.
For the "data" role, it is "off". For "overseer", it is "disallowed".
Does that make sense?



> And if we have this for all roles, this is the node default roles
> config when no roles are defined at all. When introducing role "ui"
> with a role default of "on", then all existing nodes that do not
> specify a configuration for this role (regardless of if they specify a
> configuration for other roles) get the sound default.
>
> Ilan
>

Re: First class support for node roles

Posted by Ilan Ginzburg <il...@gmail.com>.

Ishan,

> > Using a string separate from the role definitions (Ishan) makes it too easy to have roles for which the default configuration is unknown.
>
> Ilan, can you please elaborate (perhaps with an example) as to what you mean here?

If the default string for all roles for nodes with no roles configured
got customized in a specific cluster (say it got changed to "data:on,
overseer:disallowed"), when a new version of Solr with a new role gets
deployed, that new role will not have a default defined in the string,
which might not be the intent (if the new role for example is "ui" and
the default expected to be "on", "ui" not being defined in the string
makes the default likely be "off" - more about that default below).

> As per my proposal, a node that was started with explicit roles, but without a particular role defined for it will have no functionality associated with that role running on it at any point in its lifetime. For example if a node was started with "-Dnodes.roles=data:on" will never have anything to do with overseer functionality. There is no concept of defaults in that case.

That's the point. The code dealing with the overseer role will have to
make different decisions based on the role being "on" or "off", or if
we go with parameters, the role being "allowed", "disallowed",
"preferred".

When you say "will never have anything to do with overseer
functionality" what does that mean when roles are defined but not the
overseer role? I assume that should mean that for this node, overseer
is "disallowed" (or pick any other value of the 3 possible ones).
Therefore, there is a default when the role is not defined (and other
roles are defined). If "will never have anything to do with overseer
functionality" is an option different from the 3 other ones, then we
end up with 4 different overseer related configuration options (but I
think it's better to be able to explicitly specify all configuration
options).

So that's my point. A per role default for when a role is not defined.
And if we have this for all roles, this is the node default roles
config when no roles are defined at all. When introducing role "ui"
with a role default of "on", then all existing nodes that do not
specify a configuration for this role (regardless of whether they specify a
configuration for other roles) get the sound default.
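
A minimal sketch of this per-role default, in Python. The names
(DEFAULT_IF_ABSENT, parse_roles, effective_roles) and the table of defaults
are hypothetical and only illustrate the idea. Bare role names without a
mode are rejected here just to keep the sketch short.

```python
# Hypothetical per-role defaults, applied when a node does not mention a role.
# The values are illustrative assumptions, not a final decision.
DEFAULT_IF_ABSENT = {
    "data": "on",
    "overseer": "allowed",
    "ui": "on",
}


def parse_roles(spec):
    """Parse a string such as "data:on,overseer:preferred" into a dict."""
    roles = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, mode = item.partition(":")
        if not sep or not mode.strip():
            raise ValueError(f"role {name!r} needs an explicit mode")
        roles[name.strip()] = mode.strip()
    return roles


def effective_roles(spec, defaults=DEFAULT_IF_ABSENT):
    """Fill in every role the node did not specify with its per-role default."""
    resolved = dict(defaults)
    resolved.update(parse_roles(spec or ""))
    return resolved


# A node that only configures the overseer role still gets "ui" and "data".
print(effective_roles("overseer:preferred"))
```

With a table like this, introducing "ui" only means adding one entry, and
nodes with no roles at all and nodes with some roles are handled by the
same lookup.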

Ilan


Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

> Using a string separate from the role definitions (Ishan) makes it too
easy to have roles for which the default configuration is unknown.

Ilan, can you please elaborate (perhaps with an example) as to what you
mean here?
As per my proposal, a node that was started with explicit roles, but
without a particular role defined for it, will have no functionality
associated with that role running on it at any point in its lifetime. For
example, a node started with "-Dnodes.roles=data:on" will never have
anything to do with overseer functionality. There is no concept of defaults
in that case.

> Also, we should not put so much emphasis on "nodes without roles defined"
as if that should be a common way of starting nodes in a huge cluster.

Jan, the need to tackle "nodes without roles defined" separately is to
cater to those users who do not use the roles functionality; we need to
provide a logical way for such users to opt into the roles feature. Hence,
it is important to assume implicit defaults of roles for such nodes.



On Mon, Dec 6, 2021 at 1:43 PM Ilan Ginzburg <il...@gmail.com> wrote:

> Noble got my intention correctly.
>
> I think role specific code should only have to deal with the various
> configuration options for the role. When configuration was binary (role
> defined or not), then the default is one of the two values, but even then
> we saw that for data we wanted default true and for Overseer default false.
>
> If we introduce non boolean roles (i.e. role parameters), absence of a
> role has to be mapped to one of the values (otherwise it acts as yet
> another value - confusing). Role config in absence of explicit role
> definition for a node has to be defined for each role (data by default on,
> Overseer by default allowed... ) in some way.
> Using a string separate from the role definitions (Ishan) makes it too
> easy to have roles for which the default configuration is unknown.
>
> Ilan
>
>
>
> On Mon, Dec 6, 2021 at 08:58, Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
>
>> Role specific configurations can go into the /node_roles/${rolename} znode,
>> and that is outside the scope of this SIP. The concept of role specific
>> modes (e.g. allowed, preferred for overseer) is a welcome addition to the
>> original proposal to model the overseer functionality properly without any
>> confusion for the user. On top of that, nodes that don't have any roles
>> defined can be assumed to have default roles (data:on,
>> overseer:allowed).
>>
>> Isn't that simple and generic at the same time? Why overcomplicate
>> everything all over again?
>>
>> On Mon, 6 Dec, 2021, 12:54 pm Gus Heck, <gu...@gmail.com> wrote:
>>
>>> So I think we're losing sight of the original concept of "default" and
>>> conflating it with role configuration.
>>>
>>> When we started talking about "default roles" the idea was "default" was
>>> a flag that indicated if the role was active on a Solr Node where no roles
>>> had been specified. Plain and simple. Full stop.
>>>
>>> Secondarily any given role might or might not have some configuration
>>> associated with it. Optionally a role that accepts configuration may define
>>> default configuration values but this has nothing to do with "default role"
>>>
>>> Default should be an intrinsic binary property of the role as a whole
>>> (not specific to a cluster or a node).
>>>
>>> There are 3 levels to think about
>>>
>>>    1. Intrinsic Attributes of the role as a whole (Example  -->
>>>    default: yes/no)
>>>    2. Configurable attributes for the role across the cluster (Example
>>>    --> Strict: yes/no) (concept mentioned previously affecting how presence of
>>>    a role is interpreted by role related code)
>>>    3. Configurable attributes for the node that relate to the role
>>>    (Example --> Election_priority_adjust: integer ) (Hypothetical way of
>>>    influencing who gets elected first in a more fine grained fashion)
>>>
>>> Maybe use the following terminology?
>>>
>>>    1. Role Intrinsic Property
>>>    2. Role Cluster Config
>>>    3. Role Node Config
>>>
>>> We almost certainly have to determine what Role Intrinsic Properties we
>>> want to support as these are likely to be coded into the role
>>> implementation directly, and implementors of roles should specify these.
>>> (I'm not presently seeing a need for more than "default".)
>>>
>>> The config levels I think we want to mostly identify where that
>>> information can be communicated and stored. The Role Cluster Config level
>>> is tricky since there's no "cluster" until you start the first "Node" ...
>>> so a bit of a chicken/egg there. The Role Node Config, however, seems to
>>> make sense as a file that gets read and then reflected in zk as appropriate
>>> during node startup (config that specified the local directory for
>>> something would not need to show up in zk of course, just stuff that
>>> another node/overseer/query router/whatever might need to know).
>>>
>>> Definitely let's reword anything that involves the phrase "Two Defaults"
>>> since by definition only one value can be the "default" value (I suppose
>>> theoretically you could have a mapping of defaults conditional on some
>>> other value but that's definitely the opposite of simple).
>>>
>>> -Gus
>>>
>>> On Mon, Dec 6, 2021 at 12:36 AM Ishan Chattopadhyaya <
>>> ichattopadhyaya@gmail.com> wrote:
>>>
>>>> I think I understand Ilan's motivation for two defaults. Here's a
>>>> summary of how I understand Ilan's proposal, and a follow-up proposal that
>>>> achieves a similar effect with less perceived complexity for the user.
>>>>
>>>> *Ilan's proposal (as I understand it):*
>>>>
>>>> 1. Every role to have two defaults. Example:
>>>> data: {modes: [on, off], default1: on, *default2: on*}
>>>> overseer: {modes: [allowed, disallowed, preferred], default1:
>>>> preferred, *default2: disallowed*}
>>>> ui: {modes: [on, off], default1: on, *default2: on*}
>>>>
>>>> 2. Here, default1 is for lazy users for whom "-Dnode.roles=<rolename>"
>>>> will be interpreted as "-Dnode.roles=<rolename>:<default1 of rolename>".
>>>> 3. Here, default2 for any role *role1* is for users who either (a)
>>>> never specified any roles for a node, or (b) specified other roles, but not
>>>> *role1*. In both cases, the behaviour of that node would implicitly
>>>> assume "role1:<default2 of role1>".
>>>>
>>>> *My alternate proposal:*
>>>> 1. There are no role specific defaults. Example:
>>>> data: {modes: [on, off]}
>>>> overseer: {modes: [allowed, disallowed, preferred]}
>>>> ui: {modes: [on, off]}
>>>>
>>>> 2. There is a node specific default roles string *if no *-Dnode.roles
>>>> was specified. Example:
>>>> "data:on, overseer:allowed" (Today's system)
>>>> "data:on, overseer:allowed, ui:on" (When a future role, say "ui" is
>>>> introduced)
>>>>
>>>> 3. If a node was started with explicitly specified roles, that node
>>>> will have exactly those roles (in the specified modes) and nothing else (no
>>>> assumptions about other non-specified roles, i.e. those roles not specified
>>>> will not run).
>>>>
>>>> *Benefits of my proposal:*
>>>> 1. Easier to understand for users.
>>>> 2. Here's a scenario where user will be happier in my proposal vs.
>>>> Ilan's proposal:
>>>>    * 10 nodes with *-Droles=data:on,overseer:allowed*. (Regular data
>>>> nodes)
>>>>    * 2 nodes with *-Droles=overseer:preferred*. (Two dedicated
>>>> overseer nodes)
>>>>    * User upgrades from Solr 9.0 to 9.1, where "ui" role has been
>>>> introduced. Developers of "ui" role want it to be available for most users.
>>>>         - In Ilan's proposal, the developer chooses this in 9.1: ui:
>>>> {modes: [on, off], default1: on, *default2: on*}. Now, user upgrading
>>>> will see that UI is running on his two overseer nodes, and he's confused
>>>> (because he explicitly specified what he wants)
>>>>         - In my proposal, the developer chooses ui: {modes: [on, off]};
>>>> default roles for those users who don't specify roles: "data:on,
>>>> overseer:allowed, ui:on". Now, there are no surprises of implicit
>>>> default. Users who don't use roles at all will get this functionality
>>>> turned on, just as the developer wanted. Users who use roles will have to
>>>> explicitly append "ui:on" to their roles string on their nodes during the
>>>> upgrade (this tip will come from the upgrade notes).
>>>>
>>>> What do you think, Ilan/Noble/Mike/Gus/Houston?
>>>>
>>>> On Mon, Dec 6, 2021 at 8:10 AM Noble Paul <no...@gmail.com> wrote:
>>>>
>>>>> Ilan was asking what the overseer role should be in the following
>>>>> situations
>>>>>
>>>>> a) role=overseer,data:on
>>>>> b) role=overseer: preferred,data:on
>>>>> c) role=data:on
>>>>>
>>>>> I'm saying (a) shouldn't be valid. Only (b) and (c) are valid.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Dec 6, 2021, 12:44 PM Mike Drob <md...@mdrob.com> wrote:
>>>>>
>>>>>> Ilan,
>>>>>>
>>>>>> Can you provide a more detailed concrete example? I’m having a lot of
>>>>>> trouble understanding what you are proposing, beyond that it is somehow
>>>>>> contraindicated with what Ishan/Noble suggest.
>>>>>>
>>>>>> Apologies for my failure to understand.
>>>>>>
>>>>>> Thanks,
>>>>>> Mike
>>>>>>
>>>>>> On Sun, Dec 5, 2021 at 5:21 PM Ilan Ginzburg <il...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> If we go with optional role params, we need two defaults:
>>>>>>> 1. the param value to use when the role is specified without a
>>>>>>> parameter, and
>>>>>>> 2. the param value to use for the role on a node for which the role
>>>>>>> is
>>>>>>> not specified at all.
>>>>>>>
>>>>>>> I don't know how to sensibly name these defaults, but the actual
>>>>>>> values would be:
>>>>>>> overseer: default1=preferred, default2=allowed
>>>>>>> data: default1=on, default2=on
>>>>>>> coordinator: default1=on, default2=off
>>>>>>>
>>>>>>> If we do not allow specifying a role without a parameter, then
>>>>>>> default1 does not exist and the example Noble posted earlier covers
>>>>>>> us. But simple roles will be easier to use without parameters (and
>>>>>>> the
>>>>>>> transition from existing overseer role would be trivial).
>>>>>>>
>>>>>>> On Sun, Dec 5, 2021 at 7:17 AM Ishan Chattopadhyaya
>>>>>>> <ic...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > I'm +1 on this. It "looks" complicated at first, but simplifies
>>>>>>> all headaches going forward.
>>>>>>> >
>>>>>>> > On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <no...@gmail.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> I shall update the SIP proposal if we have a consensus on this
>>>>>>> configuration
>>>>>>> >>
>>>>>>> >> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <no...@gmail.com>
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gu...@gmail.com>
>>>>>>> wrote:
>>>>>>> >>>>
>>>>>>> >>>> I like this in that it's an example of how the overseer might
>>>>>>> be extended without creating a new role :)
>>>>>>> >>>>
>>>>>>> >>>> Not entirely sure if I'm for or against an enum implementation
>>>>>>> here, but it makes me a bit nervous. Enums with complexity can quickly get
>>>>>>> into difficulty for unit tests (especially if one wanted to write a mock
>>>>>>> object based test, something I think we maybe should use a bit more than we
>>>>>>> do).
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> I would tend to think of a class to represent and collect role
>>>>>>> related functionality, one that perhaps has methods that receive the
>>>>>>> request, or other key objects and thus could be tested without standing up
>>>>>>> an entire server. (Not against also having them exercised in a few
>>>>>>> integrated tests, but the more we can avoid interleaving logic directly
>>>>>>> within DispatchFilter and HttpSolrCall etc. the better.)
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> So I guess I'm somewhat biased against any enum with more than
>>>>>>> a couple properties, and definitely don't want to wind up hanging lots of
>>>>>>> methods off of one. Better to use them to consume a configuration value and
>>>>>>> then instantiate a class that really holds the logic and data. I like them
>>>>>>> for constraining values and easy string value conversion but the more they
>>>>>>> look like classes the more I'd rather have a class.
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>  I just meant it is a set of values. Please let us not discuss
>>>>>>> the actual impl here. We should stick to discussing the high level design
>>>>>>> here and specifics should be dealt with in a PR
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> -Gus
>>>>>>> >>>>
>>>>>>> >>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <
>>>>>>> noble.paul@gmail.com> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> I recommend the following format for the role spec
>>>>>>> >>>>>
>>>>>>> >>>>> roles=<role-name>:<role-value>
>>>>>>> >>>>>
>>>>>>> >>>>> each role will have an enum of allowed values and a default
>>>>>>> value
>>>>>>> >>>>>
>>>>>>> >>>>> role name: data
>>>>>>> >>>>>
>>>>>>> >>>>> values: [on, off]
>>>>>>> >>>>> default: on
>>>>>>> >>>>>
>>>>>>> >>>>> role name: overseer
>>>>>>> >>>>>
>>>>>>> >>>>> values: [allowed, disallowed, preferred]
>>>>>>> >>>>> default : allowed
>>>>>>> >>>>>
>>>>>>> >>>>> role name: coordinator
>>>>>>> >>>>>
>>>>>>> >>>>> values : [on, off]
>>>>>>> >>>>> default: off
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> examples
>>>>>>> >>>>> roles=data:on,overseer:allowed (This is redundant because it
>>>>>>> uses all the default values. If a node is started without any roles value
>>>>>>> this is the default behavior)
>>>>>>> >>>>> roles=data:off,overseer:preferred ( do not allow data, join
>>>>>>> overseer election at head)
>>>>>>> >>>>> roles=coordinator:on,data:on (role as coordinator, but allow
>>>>>>> data, it's same as roles=coordinator:on)
>>>>>>> >>>>> roles=coordinator:on,data:off (role as coordinator, disallow
>>>>>>> data)
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <
>>>>>>> ilansolr@gmail.com> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>> If we go with no negative node roles and overseer node role
>>>>>>> is not strict (i.e. it’s a "preferred overseer"), then one would need to
>>>>>>> define a second node role "no_overseer" to explicitly exclude a node from
>>>>>>> ever becoming overseer (which I think is a useful feature until we switch
>>>>>>> the cluster default to not using the overseer), plus the implementation of
>>>>>>> these two node roles will obviously be coupled (and what if a node has both
>>>>>>> defined?).
>>>>>>> >>>>>>
>>>>>>> >>>>>> I prefer strict node roles.
>>>>>>> >>>>>> Maybe we could have node roles with [optional] parameters to
>>>>>>> let the node role implementation decide ?
>>>>>>> >>>>>> The overseer node role for example could have one of 3 values
>>>>>>> defined for each node: “preferred” (default, equivalent to the existing
>>>>>>> overseer role), "accepted" (equivalent to currently not defining the
>>>>>>> overseer role) and "no_way" (does not exist today).
>>>>>>> >>>>>>
>>>>>>> >>>>>> This could be useful in other contexts. A node role “data”
>>>>>>> could be “fast” or “slow” depending on type of local persistent storage for
>>>>>>> example…
>>>>>>> >>>>>>
>>>>>>> >>>>>> Ilan
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com>
>>>>>>> wrote:
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> I really don't think we should have types of roles. Not
>>>>>>> negative/positive and not strict/non-strict. You have a role or you don't.
>>>>>>> What that means is up to the code implementing the role.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Roles should be free to configure a preference order
>>>>>>> (binary, or n-ary or whatever, strict or loose), prohibit behavior, or
>>>>>>> enable behavior. In this SIP I feel we should focus on How to identify what
>>>>>>> node has what role, How to designate what roles a node has via
>>>>>>> config/params, and the API's for interacting with roles.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> We should for example be able to support roles such as
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> PREFERRED_OVERSEER
>>>>>>> >>>>>>> DATA
>>>>>>> >>>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to
>>>>>>> suggest)
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Details about role implementation should probably be
>>>>>>> discussed in a thread about that role.  Obviously we should think about the
>>>>>>> name carefully to leave options open should we want to enhance things later
>>>>>>> so maybe
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> OVERSEER_PREF  or just  OVERSEER
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> would be better since it merely reads that the node
>>>>>>> implements some sort of preference or config regarding overseer... but all
>>>>>>> this can be decided on a per role basis
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <
>>>>>>> noble.paul@gmail.com> wrote:
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Negative roles have a place
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Example is overseer
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> There are 3 possible choices for that role
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> a) preferred: always be in front of the election queue
>>>>>>> >>>>>>>> b) on: not preferred, but can be an overseer if no
>>>>>>> preferred overseer nodes are available
>>>>>>> >>>>>>>> c) off: never become an overseer
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Today we only have options 'a' and 'b'. In a future
>>>>>>> ticket, we may implement 'c'.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com>
>>>>>>> wrote:
>>>>>>> >>>>>>>>>
>>>>>>> >>>>>>>>> Negative roles add a lot of complexity, I would really
>>>>>>> want to stay away from them. That’s why I want strict roles up front. It’s
>>>>>>> maybe ok to push this decision out, but it also seems like the sort of
>>>>>>> thing we should consider at the start.
>>>>>>> >>>>>>>>>
>>>>>>> >>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <
>>>>>>> noble.paul@gmail.com> wrote:
>>>>>>> >>>>>>>>>>
>>>>>>> >>>>>>>>>> Yes. Negative roles are not a bad idea. If I start a node
>>>>>>> for machine learning purposes, I wouldn't want that node to ever
>>>>>>> participate in overseer election
>>>>>>> >>>>>>>>>>
>>>>>>> >>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <
>>>>>>> ilansolr@gmail.com> wrote:
>>>>>>> >>>>>>>>>>>
>>>>>>> >>>>>>>>>>> If we have non strict roles (like overseer), then it
>>>>>>> does make sense
>>>>>>> >>>>>>>>>>> to have negative roles.
>>>>>>> >>>>>>>>>>> That way I can define which are the two nodes that I'd
>>>>>>> prefer the
>>>>>>> >>>>>>>>>>> overseer to run on, and a few other nodes on which it
>>>>>>> should
>>>>>>> >>>>>>>>>>> definitely never run for various reasons. And in case
>>>>>>> these
>>>>>>> >>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let
>>>>>>> the cluster
>>>>>>> >>>>>>>>>>> fail the same way it would if there were no data nodes
>>>>>>> available.
>>>>>>> >>>>>>>>>>>
>>>>>>> >>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>>>>>>> houstonputman@gmail.com> wrote:
>>>>>>> >>>>>>>>>>> >>>
>>>>>>> >>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults,
>>>>>>> users cannot trip themselves up by default, but the option is there for
>>>>>>> people to tinker and have an iron grip over their cluster.
>>>>>>> >>>>>>>>>>> >>
>>>>>>> >>>>>>>>>>> >>
>>>>>>> >>>>>>>>>>> >> +1 to sensible defaults so users don't trip
>>>>>>> themselves. The option to tinker for tighter grip can be tackled later,
>>>>>>> either on a per role basis or as a generic concept later.
>>>>>>> >>>>>>>>>>> >
>>>>>>> >>>>>>>>>>> >
>>>>>>> >>>>>>>>>>> > +1 - Can definitely be added later if we so desire,
>>>>>>> not needed for this SIP
>>>>>>> >>>>>>>>>>> >
>>>>>>> >>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>> >>>>>>>>>>> >>
>>>>>>> >>>>>>>>>>> >>
>>>>>>> >>>>>>>>>>> >>
>>>>>>> >>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <
>>>>>>> gus.heck@gmail.com> wrote:
>>>>>>> >>>>>>>>>>> >>>
>>>>>>> >>>>>>>>>>> >>> I think the key  is to let the roles have full
>>>>>>> control of the implications of having/not having that role. No need for
>>>>>>> even a strict/loose designation. The question of do you have the role is
>>>>>>> yes/no with no logic to guess if the role is implied or not. The question
>>>>>>> of will it come up with the role is "have_explicit ? use_explicit :
>>>>>>> use_defaults".
>>>>>>> >>>>>>>>>>> >>>
>>>>>>> >>>>>>>>>>> >>> Once you figure out who has a role (or not) what
>>>>>>> that means is up to the role code.
>>>>>>> >>>>>>>>>>> >>>
>>>>>>> >>>>>>>>>>> >>> Corollary: we don't have to change the way overseer
>>>>>>> works in this SIP. We can rework it or not as we see fit separately.
>>>>>>> >>>>>>>>>>> >>
>>>>>>> >>>>>>>>>>> >>
>>>>>>> >>>>>>>>>>> >> +1
>>>>>>> >>>>>>>>>>> >>
>>>>>>> >>>>>>>>>>> >>>
>>>>>>> >>>>>>>>>>> >>>
>>>>>>> >>>>>>>>>>> >>> Only thing we need to do is find a wording that
>>>>>>> makes the above clear on first read through the SIP :)
>>>>>>> >>>>>>>>>>> >>>
>>>>>>> >>>>>>>>>>> >>> -Gus
>>>>>>> >>>>>>>>>>> >>>
>>>>>>> >>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>>>>> houstonputman@gmail.com> wrote:
>>>>>>> >>>>>>>>>>> >>>>>
>>>>>>> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
>>>>>>> happens if all of our existing OVERSEER candidates are down. When at least
>>>>>>> one of them is up, the overseer will go there, and that is good and
>>>>>>> expected. But what happens if all of the overseer eligible nodes are down?
>>>>>>> Your comment, and the old system, would imply that the overseer election
>>>>>>> goes to some other unrelated, untagged node. I disagree with this
>>>>>>> implementation choice. This sounds like something role specific to
>>>>>>> determine, but I would like to see us be more strict about it. I don't want
>>>>>>> cores leaking out of my data roles, I don't want query processing to leak
>>>>>>> out of my "query" nodes or whatever. Overseer shouldn't be special in this
>>>>>>> regard.
>>>>>>> >>>>>>>>>>> >>>>
>>>>>>> >>>>>>>>>>> >>>>
>>>>>>> >>>>>>>>>>> >>>> I'm very strongly in favor of not letting users
>>>>>>> design a system in which the cluster can be "live" without an overseer. I
>>>>>>> understand that the overseer can be taxing to the cluster, but honestly
>>>>>>> what is the point of having an untaxed cluster that doesn't have an
>>>>>>> overseer? I can see arguments for the other roles to be stricter about
>>>>>>> this, but there are also a lot of users who wouldn't want those to be
>>>>>>> strict either (like "query" nodes).
>>>>>>> >>>>>>>>>>> >>>>
>>>>>>> >>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a
>>>>>>> non-overseer role node HAS to be selected to become overseer, it will try
>>>>>>> to migrate the overseer job to a node with the overseer role whenever one
>>>>>>> becomes live.
>>>>>>> >>>>>>>>>>> >>>>
>>>>>>> >>>>>>>>>>> >>>> So maybe we don't have special rules per role, but
>>>>>>> instead roles can either be defined as "Strict" or "Loose" (better names
>>>>>>> likely exist), and the roles come with a default (Overseer -> Loose, Data
>>>>>>> -> Strict, Query -> Loose, etc.). And it is up to each role to define how
>>>>>>> to behave when running in LOOSE mode and a non-role node is used then a
>>>>>>> role node comes online (like the overseer example given above).
>>>>>>> >>>>>>>>>>> >>>>
>>>>>>> >>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults,
>>>>>>> users cannot trip themselves up by default, but the option is there for
>>>>>>> people to tinker and have an iron grip over their cluster.
>>>>>>> >>>>>>>>>>> >>>>
>>>>>>> >>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <
>>>>>>> mdrob@mdrob.com> wrote:
>>>>>>> >>>>>>>>>>> >>>>>
>>>>>>> >>>>>>>>>>> >>>>> Noble wrote:
>>>>>>> >>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role"
>>>>>>> works today. We are just changing the definition and standardizing the
>>>>>>> configuration & discoverability
>>>>>>> >>>>>>>>>>> >>>>> Ishan wrote:
>>>>>>> >>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the
>>>>>>> OVERSEER role (which currently stands for preferred overseer). We can take
>>>>>>> a stab at refactoring it later.
>>>>>>> >>>>>>>>>>> >>>>>
>>>>>>> >>>>>>>>>>> >>>>> Grouping these two comments together, since I
>>>>>>> think they are saying the same thing. I think this is part of my confusion.
>>>>>>> We have an old system that doesn't work the way we want the new system to
>>>>>>> work. There may be people already using the old system. What path do we
>>>>>>> offer for folks using the old system to migrate to the new system? What
>>>>>>> happens if somebody accidentally tries to use both systems at the same time?
>>>>>>> >>>>>>>>>>> >>>>>
>>>>>>> >>>>>>>>>>> >>>>> Ishan wrote:
>>>>>>> >>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with
>>>>>>> OVERSEER role] are live, Solr guarantees that one of those nodes becomes
>>>>>>> the overseer.", I meant to somewhat capture the current behaviour as the
>>>>>>> OVERSEER role performs today. Do you see any inconsistency with this
>>>>>>> statement vs. what it does today?
>>>>>>> >>>>>>>>>>> >>>>>
>>>>>>> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
>>>>>>> happens if all of our existing OVERSEER candidates are down. When at least
>>>>>>> one of them is up, the overseer will go there, and that is good and
>>>>>>> expected. But what happens if all of the overseer eligible nodes are down?
>>>>>>> Your comment, and the old system, would imply that the overseer election
>>>>>>> goes to some other unrelated, untagged node. I disagree with this
>>>>>>> implementation choice. This sounds like something role specific to
>>>>>>> determine, but I would like to see us be more strict about it. I don't want
>>>>>>> cores leaking out of my data roles, I don't want query processing to leak
>>>>>>> out of my "query" nodes or whatever. Overseer shouldn't be special in this
>>>>>>> regard.
>>>>>>> >>>>>>>>>>> >>>>>
>>>>>>> >>>>>>>>>>> >>>>> Noble wrote:
>>>>>>> >>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or
>>>>>>> a node in the following request?
>>>>>>> >>>>>>>>>>> >>>>>
>>>>>>> >>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing
>>>>>>> this out. Let's leave it as is.
>>>>>>> >>>>>>>>>>> >>>>>
>>>>>>> >>>>>>>>>>> >>>>>
>>>>>>> >>>>>>>>>>> >>>>>
>>>>>>> >>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan
>>>>>>> Chattopadhyaya <ic...@gmail.com> wrote:
>>>>>>> >>>>>>>>>>> >>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>
>>>>>>> >>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <
>>>>>>> mdrob@mdrob.com> wrote:
>>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>> Replying to the top post in this thread because
>>>>>>> there has been a lot of discussion and I don't want to look like I'm
>>>>>>> continuing any of those particular threads.
>>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>> I finally had time to sit down and think about
>>>>>>> this with the attention it deserves and am generally happy with how the
>>>>>>> conversation has shaped the current proposal.
>>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define
>>>>>>> node roles is fine and I like that data is the default role when not
>>>>>>> defined. I think it is important to hold on to the guarantee that an active
>>>>>>> overseer will land on an overseer node role.
>>>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration
>>>>>>> path for folks using the current OVERSEER role. I am not sure that
>>>>>>> something can be done automatically since they need to now specify new
>>>>>>> properties at startup. Maybe we need to include loud warnings or support
>>>>>>> both approaches for a time?
>>>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the
>>>>>>> overseer nodes fail, then it is implied the overseer will go to one of the
>>>>>>> data nodes. The specific wording in the SIP - "When one or more such nodes
>>>>>>> are live, Solr guarantees that one of those nodes become the overseer."
>>>>>>> implies to me that failover could go from overseer1 to overseer2 to
>>>>>>> overseerN to random node. I feel like we need to have some recording that
>>>>>>> there were dedicated overseer nodes and stop the cascading failure instead
>>>>>>> of churning through our data nodes.
>>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the
>>>>>>> proposed scope of "coordinator" roles from a split query/indexing
>>>>>>> standpoint. I understand that these are used as examples, but would like
>>>>>>> stronger language that new roles should also go through their own SIP
>>>>>>> discussions.
>>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing
>>>>>>> node liveness in two different places now. We have the live nodes and we
>>>>>>> have the node roles stored in two different places in zookeeper and it
>>>>>>> feels like this would lead to race conditions or split brain or other hard
>>>>>>> to diagnose bugs when those two lists don't agree with each other. This
>>>>>>> also feels like it contradicts the "single source of truth" idea later
>>>>>>> stated in the proposal. I see Gus's arguments for decoupling these and am
>>>>>>> not strongly opposed, I just get a lurking feeling about it. Even if we
>>>>>>> don't do this, I would like this called out explicitly in the alternative
>>>>>>> approaches section as something that we considered and rejected, with
>>>>>>> details why.
>>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like
>>>>>>> an additional call out here that all operations are GET because nodes
>>>>>>> cannot be changed at runtime.
>>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the
>>>>>>> previous OVERSEER preference role?
>>>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the
>>>>>>> list of available roles for a cluster. I _think_ this could be based on the
>>>>>>> version that the cluster is running? Would be useful to be able to
>>>>>>> interrogate a cluster in the future... we're seeing OOM issues on queries,
>>>>>>> can we add some query nodes? When were they introduced? I don't know what
>>>>>>> path this API should exist at.
>>>>>>> >>>>>>>>>>> >>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>
>>>>>>> >>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API,
>>>>>>> updated the SIP document. Not sure if there's a better path that we could
>>>>>>> go for.
>>>>>>> >>>>>>>>>>> >>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly
>>>>>>> show which parts are string literals and which parts are meant to be
>>>>>>> substituted by the operator? GET /api/cluster/roles/data would become GET
>>>>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET
>>>>>>> /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename}
>>>>>>> dropping the intermediate "nodes"
>>>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not
>>>>>>> need that intermediate "nodes" node.
>>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
>>>>>>> permissions? Maybe this requirement is too fundamental to the operation of
>>>>>>> a cluster and everybody would have to be able to do it.
>>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other
>>>>>>> clients) to treat roles? Implementation detail that the servers will figure
>>>>>>> out? Or strict guidance where the client needs to check where specific
>>>>>>> roles are before sending any further communication to the server?
>>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a
>>>>>>> request that it can't fulfil? An overseer node gets a query or an update. A
>>>>>>> data node gets a collection creation request. Do they forward it on to an
>>>>>>> appropriate node, or do they reject it? Should this be configurable? If
>>>>>>> not, then it seems like lazy or poorly configured clients will defeat this
>>>>>>> isolation system quite easily.
>>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes
>>>>>>> behave when roles are added mean? I thought we established that they are
>>>>>>> not dynamic.
>>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>> Thanks,
>>>>>>> >>>>>>>>>>> >>>>>>> Mike
>>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan
>>>>>>> Chattopadhyaya <ic...@gmail.com> wrote:
>>>>>>> >>>>>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>>> Hi,
>>>>>>> >>>>>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of
>>>>>>> node roles:
>>>>>>> >>>>>>>>>>> >>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>> >>>>>>>>>>> >>>>>>>>
>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>> >>>>>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>>> We also wish to add first class support for
>>>>>>> Query nodes that are used to process user queries by forwarding to data
>>>>>>> nodes, merging/aggregating them and presenting to users. This concept
>>>>>>> exists as first class citizens in most other search engines. This is a
>>>>>>> chance for Solr to catch up.
>>>>>>> >>>>>>>>>>> >>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>> >>>>>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>>>>> >>>>>>>> Regards,
>>>>>>> >>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>>>>> >>>>>>>>>>> >>>
>>>>>>> >>>>>>>>>>> >>>
>>>>>>> >>>>>>>>>>> >>>
>>>>>>> >>>>>>>>>>> >>> --
>>>>>>> >>>>>>>>>>> >>> http://www.needhamsoftware.com (work)
>>>>>>> >>>>>>>>>>> >>> http://www.the111shift.com (play)
>>>>>>> >>>>>>>>>>>
>>>>>>> >>>>>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>>>>>> >>>>>>>>>>> For additional commands, e-mail:
>>>>>>> dev-help@solr.apache.org
>>>>>>> >>>>>>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> --
>>>>>>> >>>>>>> http://www.needhamsoftware.com (work)
>>>>>>> >>>>>>> http://www.the111shift.com (play)
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> --
>>>>>>> >>>>> -----------------------------------------------------
>>>>>>> >>>>> Noble Paul
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> --
>>>>>>> >>>> http://www.needhamsoftware.com (work)
>>>>>>> >>>> http://www.the111shift.com (play)
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> -----------------------------------------------------
>>>>>>> >>> Noble Paul
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> -----------------------------------------------------
>>>>>>> >> Noble Paul
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>

Re: First class support for node roles

Posted by Jan Høydahl <ja...@cominvent.com>.

Are we making a non-issue into a configuration mess?

The overseer's job is diminishing with every version, and we should not fool ourselves into believing that a stray overseer will kill an upgrade, and therefore complicate the whole role system. Also, we should not put so much emphasis on "nodes without roles defined" as if that should be a common way of starting nodes in a huge cluster. In huge clusters users should be explicit about roles on every single node.

So my proposal stands:
- Roles are binary (optional role config can be added, but not part of the role)
- Nodes started without explicit roles get ALL roles, interpreted as ALLOW
- Nodes started with explicit roles get exactly those roles, interpreted as ALLOW

I.e. any future role will be ALLOWED on nodes started without explicit roles. Which means any future "ui" or "zk" role or whatever will be ALLOWED to run on those nodes if that feature is enabled. We have to distinguish between a role which ALLOWS a feature to run and the feature itself, which is enabled e.g. by creating a collection (selects nodes from data role), configuring zookeeper (selects nodes from zk role) etc.

In large clusters where you specify roles explicitly, you will need to carve out what nodes should run new roles (such as UI), and start/restart those nodes with that role before enabling the new feature.
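
The three rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `allowed_roles`, the `KNOWN_ROLES` set and the comma-separated role string are invented for the example and are not Solr APIs.

```python
# Hypothetical sketch of the "binary roles" rule; none of these names exist in Solr.
KNOWN_ROLES = frozenset({"data", "overseer"})


def allowed_roles(explicit, known_roles=KNOWN_ROLES):
    """Return the set of roles a node ALLOWS.

    explicit: the node's roles as a comma-separated string, or None when the
    node was started without any explicit roles.
    """
    if explicit is None:
        # No explicit roles: every role the running version knows is allowed.
        return set(known_roles)
    requested = {name.strip() for name in explicit.split(",") if name.strip()}
    unknown = requested - set(known_roles)
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    # Explicit roles: exactly those roles, nothing implied.
    return requested


# A later release adds a "ui" role (the role name is taken from the example above).
NEXT_RELEASE_ROLES = KNOWN_ROLES | {"ui"}

print(sorted(allowed_roles(None, NEXT_RELEASE_ROLES)))        # role-less node also allows "ui"
print(sorted(allowed_roles("overseer", NEXT_RELEASE_ROLES)))  # explicit node stays overseer-only
```

Whether a feature actually runs on an allowed node is a separate decision, as described above; the sketch only answers "is this role allowed here?".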

Jan

> On 6 Dec 2021, at 09:12, Ilan Ginzburg <il...@gmail.com> wrote:
> 
> Noble got my intention correctly.
> 
> I think role specific code should only have to deal with the various configuration options for the role. When configuration was binary (role defined or not), then the default is one of the two values, but even then we saw that for data we wanted default true and for Overseer default false.
> 
> If we introduce non-boolean roles (i.e. role parameters), absence of a role has to be mapped to one of the values (otherwise it acts as yet another value, which is confusing). Role config in absence of explicit role definition for a node has to be defined for each role (data by default on, Overseer by default allowed...) in some way.
> Using a string separate from the role definitions (Ishan) makes it too easy to have roles for which the default configuration is unknown.
> 
> Ilan
> 
> 
> 
> On Mon, 6 Dec 2021 at 08:58, Ishan Chattopadhyaya <ichattopadhyaya@gmail.com <ma...@gmail.com>> wrote:
> Role specific configurations can go into /node_roles/${rolename} znode, and that is outside the scope of this SIP. The concept of role-specific modes (e.g. allowed, preferred for overseer) is a welcome addition to the original proposal to model the overseer functionality properly without confusing the user. On top of that, default roles for nodes that don't have any roles defined can be assumed (data:on, overseer:allowed).
> 
> Isn't that simple and generic at the same time? Why overcomplicate everything all over again?
> 
> On Mon, 6 Dec, 2021, 12:54 pm Gus Heck, <gus.heck@gmail.com <ma...@gmail.com>> wrote:
> So I think we're losing sight of the original concept of "default" and conflating it with role configuration.
> 
> When we started talking about "default roles" the idea was "default" was a flag that indicated if the role was active on a Solr Node where no roles had been specified. Plain and simple. Full stop.
> 
> Secondarily any given role might or might not have some configuration associated with it. Optionally a role that accepts configuration may define default configuration values but this has nothing to do with "default role" 
> 
> Default should be an intrinsic binary property of the role as a whole (not specific to a cluster or a node). 
> 
> There are 3 levels to think about:
> 1. Intrinsic Attributes of the role as a whole (Example --> default: yes/no)
> 2. Configurable attributes for the role across the cluster (Example --> Strict: yes/no) (concept mentioned previously affecting how presence of a role is interpreted by role related code)
> 3. Configurable attributes for the node that relate to the role (Example --> Election_priority_adjust: integer) (Hypothetical way of influencing who gets elected first in a more fine grained fashion)
> Maybe use the following terminology?
> 1. Role Intrinsic Property
> 2. Role Cluster Config
> 3. Role Node Config
> We almost certainly have to determine what Role Intrinsic Properties we want to support as these are likely to be coded into the role implementation directly, and implementors of roles should specify these. (I'm not presently seeing need for more than "default".)
> 
> The config levels I think we want to mostly identify where that information can be communicated and stored. The Role Cluster Config level is tricky since there's no "cluster" until you start the first "Node" ... so a bit of a chicken/egg there. The Role Node Config, however, seems to make sense as a file that gets read and then reflected in zk as appropriate during node startup (config that specified the local directory for something would not need to show up in zk of course, just stuff that another node/overseer/query router/whatever might need to know).
> 
> Definitely let's reword anything that involves the phrase "Two Defaults" since by definition only one value can be the "default" value (I suppose theoretically you could have a mapping of defaults conditional on some other value but that's definitely the opposite of simple). 
> 
> -Gus
> 
> On Mon, Dec 6, 2021 at 12:36 AM Ishan Chattopadhyaya <ichattopadhyaya@gmail.com <ma...@gmail.com>> wrote:
> I think I understand Ilan's motivation for two defaults. Here's a summary of Ilan's proposal as I understand it, and a follow-up proposal that achieves a similar effect with less perceived complexity for the user.
> 
> Ilan's proposal (as I understand it):
> 
> 1. Every role to have two defaults. Example:
> data: {modes: [on, off], default1: on, default2: on}
> overseer: {modes: [allowed, disallowed, preferred], default1: preferred, default2: disallowed}
> ui: {modes: [on, off], default1: on, default2: on}
> 
> 2. Here, default1 is for lazy users for whom "-Dnode.roles=<rolename>" will be interpreted as "-Dnode.roles=<rolename>:<default1 of rolename>".
> 3. Here, default2 for any role role1 is for users who either (a) never specified any roles for a node, or (b) specified other roles, but not role1. In both cases, the behaviour of that node would implicitly assume "role1:<default2 of role1>".
> 
> My alternate proposal:
> 1. There are no role specific defaults. Example:
> data: {modes: [on, off]}
> overseer: {modes: [allowed, disallowed, preferred]}
> ui: {modes: [on, off]}
> 
> 2. There is a node specific default roles string if no -Dnode.roles was specified. Example:
> "data:on, overseer:allowed" (Today's system)
> "data:on, overseer:allowed, ui:on" (When a future role, say "ui" is introduced)
> 
> 3. If a node was started with explicitly specified roles, that node will have exactly those roles (in the specified modes) and nothing else (no assumptions about other non-specified roles, i.e. those roles not specified will not run).
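
A rough Python sketch of this alternate proposal, for illustration only. The names `node_roles`, `DEFAULT_ROLES_9_0` and `DEFAULT_ROLES_9_1` are made up for the example; the role strings are the ones quoted in the proposal. No validation is attempted: every entry is assumed to be a well-formed `name:mode` pair.

```python
# Hypothetical sketch; these names are illustrative and not part of Solr.
# Node-wide default roles string, used only when no roles were specified at all.
DEFAULT_ROLES_9_0 = "data:on, overseer:allowed"
# The same default after a future "ui" role is introduced.
DEFAULT_ROLES_9_1 = "data:on, overseer:allowed, ui:on"


def node_roles(roles_property=None, default_roles=DEFAULT_ROLES_9_0):
    """Map role name -> mode for one node.

    roles_property is the value given via -Dnode.roles, or None if absent.
    Roles that are not listed are simply missing from the result, meaning
    they do not run on this node.
    """
    spec = default_roles if roles_property is None else roles_property
    return dict(entry.strip().split(":", 1) for entry in spec.split(","))


print(node_roles())                      # {'data': 'on', 'overseer': 'allowed'}
print(node_roles("overseer:preferred"))  # {'overseer': 'preferred'}
# After the upgrade, only nodes without explicit roles pick up "ui".
print(node_roles(None, DEFAULT_ROLES_9_1))
print(node_roles("overseer:preferred", DEFAULT_ROLES_9_1))
```

The point of the sketch is that the default lives in one node-wide string rather than in each role definition, so an explicitly configured node never gains a role it did not ask for.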
> 
> Benefits of my proposal:
> 1. Easier to understand for users.
> 2. Here's a scenario where user will be happier in my proposal vs. Ilan's proposal:
>    * 10 nodes with -Droles=data:on,overseer:allowed. (Regular data nodes)
>    * 2 nodes with -Droles=overseer:preferred. (Two dedicated overseer nodes)
>    * User upgrades from Solr 9.0 to 9.1, where "ui" role has been introduced. Developers of "ui" role want it to be available for most users.
>         - In Ilan's proposal, the developer chooses this in 9.1: ui: {modes: [on, off], default1: on, default2: on}. Now, user upgrading will see that UI is running on his two overseer nodes, and he's confused (because he explicitly specified what he wants)
>         - In my proposal, the developer chooses ui: {modes: [on, off]}; default roles for those users who don't specify roles: "data:on, overseer:allowed, ui:on". Now, there are no surprises of implicit default. Users who don't use roles at all will get this functionality turned on, just as the developer wanted. Users who use roles will have to explicitly append "ui:on" to their roles string on their nodes during the upgrade (this tip will come from the upgrade notes).
> 
> What do you think, Ilan/Noble/Mike/Gus/Houston?
> 
> On Mon, Dec 6, 2021 at 8:10 AM Noble Paul <noble.paul@gmail.com <ma...@gmail.com>> wrote:
> Ilan was asking what the overseer role should be in the following situations:
> 
> a) role=overseer,data:on
> b) role=overseer: preferred,data:on
> c) role=data:on
> 
> I'm saying (a) shouldn't be valid. Only (b) and (c) are valid.
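
To make the validity rule concrete, here is a small Python sketch that checks a roles value and reports a role given without a mode as an error. The helper `validate_roles` and the `ALLOWED_MODES` table are hypothetical; the allowed modes mirror the lists quoted elsewhere in this thread.

```python
# Hypothetical validator; nothing here is an existing Solr API.
ALLOWED_MODES = {
    "data": ("on", "off"),
    "overseer": ("allowed", "disallowed", "preferred"),
}


def validate_roles(value):
    """Return a list of problems found in a roles value; empty means valid."""
    problems = []
    for entry in value.split(","):
        # partition() always yields three parts; sep is "" when ":" is missing.
        name, sep, mode = (part.strip() for part in entry.partition(":"))
        if name not in ALLOWED_MODES:
            problems.append(f"unknown role {name!r}")
        elif not sep:
            problems.append(f"role {name!r} has no mode")
        elif mode not in ALLOWED_MODES[name]:
            problems.append(f"role {name!r} does not accept mode {mode!r}")
    return problems


# The three situations listed above: (a), (b) and (c).
for candidate in ("overseer,data:on", "overseer: preferred,data:on", "data:on"):
    print(candidate, "->", validate_roles(candidate) or "valid")
```

Surrounding whitespace is tolerated in this sketch, so the stray space in (b) does not make it invalid; whether a real parser should be that lenient is an open choice.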
> 
> 
> 
> 
> 
> 
> On Mon, Dec 6, 2021, 12:44 PM Mike Drob <mdrob@mdrob.com <ma...@mdrob.com>> wrote:
> Ilan,
> 
> Can you provide a more detailed concrete example? I’m having a lot of trouble understanding what you are proposing, beyond that it is somehow contraindicated with what Ishan/Noble suggest.
> 
> Apologies for my failure to understand.
> 
> Thanks,
> Mike
> 
> On Sun, Dec 5, 2021 at 5:21 PM Ilan Ginzburg <ilansolr@gmail.com <ma...@gmail.com>> wrote:
> If we go with optional role params, we need two defaults:
> 1. the param value to use when the role is specified without a parameter, and
> 2. the param value to use for the role on a node for which the role is
> not specified at all.
> 
> I don't know how to sensibly name these defaults, but the actual
> values would be:
> overseer: default1=preferred, default2=allowed
> data: default1=on, default2=on
> coordinator: default1=on, default2=off
> 
> If we do not allow specifying a role without a parameter, then
> default1 does not exist and the example Noble posted earlier covers
> us. But simple roles will be easier to use without parameters (and the
> transition from existing overseer role would be trivial).
> 
> On Sun, Dec 5, 2021 at 7:17 AM Ishan Chattopadhyaya
> <ichattopadhyaya@gmail.com <ma...@gmail.com>> wrote:
> >
> > I'm +1 on this. It "looks" complicated at first, but simplifies all headaches going forward.
> >
> > On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <noble.paul@gmail.com <ma...@gmail.com>> wrote:
> >>
> >> I shall update the SIP proposal if we have a consensus on this configuration
> >>
> >> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <noble.paul@gmail.com <ma...@gmail.com>> wrote:
> >>>
> >>>
> >>>
> >>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gus.heck@gmail.com <ma...@gmail.com>> wrote:
> >>>>
> >>>> I like this in that it's an example of how the overseer might be extended without creating a new role :)
> >>>>
> >>>> Not entirely sure if I'm for or against an enum implementation here, but it makes me a bit nervous. Enums with complexity can quickly get into difficulty for unit tests (especially if one wanted to write a mock object based test, something I think we maybe should use a bit more than we do).
> >>>>
> >>>>
> >>>>
> >>>> I would tend to think of a class to represent and collect role related functionality, one that perhaps has methods that receive the request, or other key objects, and thus could be tested without standing up an entire server. (Not against also having them exercised in a few integrated tests, but the more we can avoid interleaving logic directly within DispatchFilter and HttpSolrCall etc., the better.)
> >>>>
> >>>>
> >>>> So I guess I'm somewhat biased against any enum with more than a couple properties, and definitely don't want to wind up hanging lots of methods off of one. Better to use them to consume a configuration value and then instantiate a class that really holds the logic and data. I like them for constraining values and easy string value conversion but the more they look like classes the more I'd rather have a class.
> >>>
> >>>
> >>>  I just meant it is a set of values. Please let us not discuss the actual impl here . We should stick to discussing the high level design here and specifics should be dealt with in a PR
> >>>>
> >>>>
> >>>> -Gus
> >>>>
> >>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <noble.paul@gmail.com <ma...@gmail.com>> wrote:
> >>>>>
> >>>>> I recommend the following format for the role spec
> >>>>>
> >>>>> roles=<role-name>:<role-value>
> >>>>>
> >>>>> each role will have an enum of allowed values and a default value
> >>>>>
> >>>>> role name: data
> >>>>>
> >>>>> values: [on, off]
> >>>>> default: on
> >>>>>
> >>>>> role name: overseer
> >>>>>
> >>>>> values: [allowed, disallowed, preferred]
> >>>>> default : allowed
> >>>>>
> >>>>> role name: coordinator
> >>>>>
> >>>>> values : [on, off]
> >>>>> default: off
> >>>>>
> >>>>>
> >>>>> examples
> >>>>> roles=data:on,overseer:allowed (This is redundant because it uses all the default values. If a node is started without any roles value this is the default behavior)
> >>>>> roles=data:off,overseer:preferred ( do not allow data, join overseer election at head)
> >>>>> roles=coordinator:on,data:on (role as coordinator, but allow data, it's same as roles=coordinator:on)
> >>>>> roles=coordinator:on,data:off (role as coordinator, disallow data)
> >>>>>
> >>>>>
> >>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <ilansolr@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>
> >>>>>> If we go with no negative node roles and overseer node role is not strict (i.e. it’s a "preferred overseer"), then one would need to define a second node role "no_overseer" to explicitly exclude a node from ever becoming overseer (which I think is a useful feature until we switch the cluster default to not using the overseer), plus the implementation of these two node roles will obviously be coupled (and what if a node has both defined?).
> >>>>>>
> >>>>>> I prefer strict node roles.
> >>>>>> Maybe we could have node roles with [optional] parameters to let the node role implementation decide ?
> >>>>>> The overseer node role for example could have one of 3 values defined for each node: “preferred” (default, equivalent to the existing overseer role), "accepted" (equivalent to currently not defining the overseer role) and "no_way" (does not exist today).
> >>>>>>
> >>>>>> This could be useful in other contexts. A node role “data” could be “fast” or “slow” depending on type of local persistent storage for example…
> >>>>>>
> >>>>>> Ilan
> >>>>>>
> >>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gus.heck@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>
> >>>>>>> I really don't think we should have types of roles. Not negative/positive and not strict/non-strict. You have a role or you don't. What that means is up to the code implementing the role.
> >>>>>>>
> >>>>>>> Roles should be free to configure a preference order (binary, or n-ary or whatever, strict or loose), prohibit behavior, or enable behavior. In this SIP I feel we should focus on How to identify what node has what role, How to designate what roles a node has via config/params, and the API's for interacting with roles.
> >>>>>>>
> >>>>>>> We should for example be able to support roles such as
> >>>>>>>
> >>>>>>> PREFERRED_OVERSEER
> >>>>>>> DATA
> >>>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
> >>>>>>>
> >>>>>>> Details about role implementation should probably be discussed in a thread about that role.  Obviously we should think about the name carefully to leave options open should we want to enhance things later so maybe
> >>>>>>>
> >>>>>>> OVERSEER_PREF  or just  OVERSEER
> >>>>>>>
> >>>>>>> would be better since it merely reads that the node implements some sort of preference or config regarding overseer... but all this can be decided on a per role basis
> >>>>>>>
> >>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <noble.paul@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>
> >>>>>>>> Negative roles have a place
> >>>>>>>>
> >>>>>>>> Example is overseer
> >>>>>>>>
> >>>>>>>> There are 3 possible choices for that role
> >>>>>>>>
> >>>>>>>> a) preferred: always be in front of the election queue
> >>>>>>>> b) on: not preferred, but can be an overseer if no preferred overseer nodes are available
> >>>>>>>> c) off: never become an overseer
> >>>>>>>>
> >>>>>>>> Today we only have options 'a' and 'b' . In a future ticket, we may implement C
> >>>>>>>>
> >>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <mdrob@mdrob.com <ma...@mdrob.com>> wrote:
> >>>>>>>>>
> >>>>>>>>> Negative roles add a lot of complexity, I would really want to stay away from them. That’s why I want strict roles up front. It’s maybe ok to push this decision out, but it also seems like the sort of thing we should consider at the start.
> >>>>>>>>>
> >>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <noble.paul@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Yes. Negative roles is not a bad idea. If I start a node for machine learning purposes, I wouldn't want that node to ever participate in overseer election
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <ilansolr@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> If we have non strict roles (like overseer), then it does make sense
> >>>>>>>>>>> to have negative roles.
> >>>>>>>>>>> That way I can define which are the two nodes that I'd prefer the
> >>>>>>>>>>> overseer to run on, and a few other nodes on which it should
> >>>>>>>>>>> definitely never run for various reasons. And in case these
> >>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let the cluster
> >>>>>>>>>>> fail the same way it would if there were no data nodes available.
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <houstonputman@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults, users cannot trip themselves up by default, but the option is there for people to tinker and have an iron grip over their cluster.
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> +1 to sensible defaults so users don't trip themselves. The option to tinker for tighter grip can be tackled later, either on a per role basis or as a generic concept later.
> >>>>>>>>>>> >
> >>>>>>>>>>> >
> >>>>>>>>>>> > +1 - Can definitely be added later if we so desire, not needed for this SIP
> >>>>>>>>>>> >
> >>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <ichattopadhyaya@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> I think the key is to let the roles have full control of the implications of having/not having that role. No need for even a strict/loose designation. The question of "do you have the role" is yes/no, with no logic to guess whether the role is implied or not. The question of "will it come up with the role" is "have_explicit ? use_explicit : use_defaults".
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> Once you figure out who has a role (or not) what that means is up to the role code.
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> Corollary: we don't have to change the way overseer works in this SIP. We can rework it or not as we see fit separately.
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> +1
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> Only thing we need to do is find a wording that makes the above clear on first read through the SIP :)
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> -Gus
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <houstonputman@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> This doesn't really address my concern around what happens if all of our existing OVERSEER candidates are down. When at least one of them is up, the overseer will go there, and that is good and expected. But what happens if all of the overseer eligible nodes are down. Your comment, and the old system, would imply that the overseer election goes to some other unrelated, untagged node. I disagree with this implementation choice. This sounds like something role specific to determine, but I would like to see us be more strict about it. I don't want cores leaking out of my data roles, I don't want query processing to leak out of my "query" nodes or whatever. Overseer shouldn't be special in this regard.
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> I'm very strongly in favor of not letting users design a system in which the cluster can be "live" without an overseer. I understand that the overseer can be taxing to the cluster, but honestly what is the point of having an untaxed cluster that doesn't have an overseer? I can see arguments for the other roles to be stricter about this, but there are also a lot of users who wouldn't want those to be strict either (like "query" nodes).
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a non-overseer role node HAS to be selected to become overseer, it will try to migrate the overseer job to a node with the overseer role whenever one becomes live.
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> So maybe we don't have special rules per role, but instead roles can either be defined as "Strict" or "Loose" (better names likely exist), and the roles come with a default (Overseer -> Loose, Data -> Strict, Query -> Loose, etc.). And it is up to each role to define how to behave when running in LOOSE mode and a non-role node is used then a role node comes online (like the overseer example given above).
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults, users cannot trip themselves up by default, but the option is there for people to tinker and have an iron grip over their cluster.
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <mdrob@mdrob.com <ma...@mdrob.com>> wrote:
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Noble wrote:
> >>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role" works today. We are just changing the definition and standardizing the configuration & discoverability
> >>>>>>>>>>> >>>>> Ishan wrote:
> >>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER role (which currently stands for preferred overseer). We can take a stab at refactoring it later.
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Grouping these two comments together, since I think they are saying the same thing. I think this is part of my confusion. We have an old system that doesn't work the way we want the new system to work. There may be people already using the old system. What path do we offer for folks using the old system to migrate to the new system? What happens if somebody accidentally tries to use both systems at the same time?
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Ishan wrote:
> >>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER role] are live, Solr guarantees that one of those nodes becomes the overseer.", I meant to somewhat capture the current behaviour as the OVERSEER role performs today. Do you see any inconsistency with this statement vs. what it does today?
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> This doesn't really address my concern around what happens if all of our existing OVERSEER candidates are down. When at least one of them is up, the overseer will go there, and that is good and expected. But what happens if all of the overseer eligible nodes are down. Your comment, and the old system, would imply that the overseer election goes to some other unrelated, untagged node. I disagree with this implementation choice. This sounds like something role specific to determine, but I would like to see us be more strict about it. I don't want cores leaking out of my data roles, I don't want query processing to leak out of my "query" nodes or whatever. Overseer shouldn't be special in this regard.
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Noble wrote:
> >>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a node in the following request?
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this out. Let's leave it as is.
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <ichattopadhyaya@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <mdrob@mdrob.com <ma...@mdrob.com>> wrote:
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> Replying to the top post in this thread because there has been a lot of discussion and I don't want to look like I'm continuing any of those particular threads.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> I finally had time to sit down and think about this with the attention it deserves and am generally happy with how the conversation has shaped the current proposal.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define node roles is fine and I like that data is the default role when not defined. I think it is important to hold on to the guarantee that an active overseer will land on an overseer node role.
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for folks using the current OVERSEER role. I am not sure that something can be done automatically since they need to now specify new properties at startup. Maybe we need to include loud warnings or support both approaches for a time?
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer nodes fail, then it is implied the overseer will go to one of the data nodes. The specific wording in the SIP - "When one or more such nodes are live, Solr guarantees that one of those nodes become the overseer." implies to me that failover could go from overseer1 to overseer2 to overseerN to random node. I feel like we need to have some recording that there were dedicated overseer nodes and stop the cascading failure instead of churning through our data nodes.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed scope of "coordinator" roles from a split query/indexing standpoint. I understand that these are used as examples, but would like stronger language that new roles should also go through their own SIP discussions.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node liveness in two different places now. We have the live nodes and we have the node roles stored in two different places in zookeeper and it feels like this would lead to race conditions or split brain or other hard to diagnose bugs when those two lists don't agree with each other. This also feels like it contradicts the "single source of truth" idea later stated in the proposal. I see Gus's arguments for decoupling these and am not strongly opposed, I just get a lurking feeling about it. Even if we don't do this, I would like this called out explicitly in the alternative approaches section as something that we considered and rejected, with details why.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an additional call out here that all operations are GET because nodes cannot be changed at runtime.
> >>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the previous OVERSEER preference role?
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of available roles for a cluster. I _think_ this could be based on the version that the cluster is running? Would be useful to be able to interrogate a cluster in the future... we're seeing OOM issues on queries, can we add some query nodes? When were they introduced? I don't know what path this API should exist at.
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the SIP document. Not sure if there's a better path that we could go for.
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which parts are string literals and which parts are meant to be substituted by the operator? GET /api/cluster/roles/data would become GET /api/cluster/roles/${rolename} in our SIP/documentation.
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename} dropping the intermediate "nodes"
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that intermediate "nodes" node.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some permissions? Maybe this requirement is too fundamental to the operation of a cluster and everybody would have to be able to do it.
> >>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat roles? Implementation detail that the servers will figure out? Or strict guidance where the client needs to check where specific roles are before sending any further communication to the server?
> >>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request that it can't fulfil? An overseer node gets a query or an update. A data node gets a collection creation request. Do they forward it on to an appropriate node, or do they reject it? Should this be configurable? If not, then it seems like lazy or poorly configured clients will defeat this isolation system quite easily.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
> >>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave when roles are added mean? I thought we established that they are not dynamic.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> Thanks,
> >>>>>>>>>>> >>>>>>> Mike
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <ichattopadhyaya@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> Hi,
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694 <https://issues.apache.org/jira/browse/SOLR-15694>
> >>>>>>>>>>> >>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles <https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles>
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> We also wish to add first class support for Query nodes that are used to process user queries by forwarding to data nodes, merging/aggregating them and presenting to users. This concept exists as first class citizens in most other search engines. This is a chance for Solr to catch up.
> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715 <https://issues.apache.org/jira/browse/SOLR-15715>
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> Regards,
> >>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> --
> >>>>>>>>>>> >>> http://www.needhamsoftware.com <http://www.needhamsoftware.com/> (work)
> >>>>>>>>>>> >>> http://www.the111shift.com <http://www.the111shift.com/> (play)
> >>>>>>>>>>>
> >>>>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org <ma...@solr.apache.org>
> >>>>>>>>>>> For additional commands, e-mail: dev-help@solr.apache.org <ma...@solr.apache.org>
> >>>>>>>>>>>

Re: First class support for node roles

Posted by Ilan Ginzburg <il...@gmail.com>.

Noble understood my intention correctly.

I think role-specific code should only have to deal with the various
configuration options for the role. When configuration was binary (role
defined or not), the default was one of the two values, but even then
we saw that for data we wanted default true and for Overseer default false.

If we introduce non-boolean roles (i.e. role parameters), absence of a role
has to be mapped to one of the values (otherwise it acts as yet another
value, which is confusing). The role config to use in the absence of an
explicit role definition for a node has to be defined for each role (data
on by default, Overseer allowed by default...) in some way.
Using a string separate from the role definitions (Ishan's proposal) makes
it too easy to have roles for which the default configuration is unknown.
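
To make the "two defaults" idea concrete, here is a rough sketch (Python for
brevity, not a proposed implementation). The names RoleDef, when_bare,
when_absent and resolve_roles are made up for illustration; the values are
the ones from my earlier mail (overseer: preferred/allowed, data: on/on,
coordinator: on/off). It assumes a role may be given without a value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleDef:
    """Hypothetical role definition that carries both of its defaults."""
    modes: tuple       # values the role accepts
    when_bare: str     # "default1": role named without a value
    when_absent: str   # "default2": role not mentioned on the node at all


# Values taken from the thread; the structure itself is an assumption.
ROLE_DEFS = {
    "overseer": RoleDef(("allowed", "disallowed", "preferred"), "preferred", "allowed"),
    "data": RoleDef(("on", "off"), "on", "on"),
    "coordinator": RoleDef(("on", "off"), "on", "off"),
}


def resolve_roles(spec):
    """Map a roles string such as 'overseer,data:on' to a mode for every known role."""
    # Start from the "absent" default of every role, so no role is ever undefined.
    resolved = {name: role.when_absent for name, role in ROLE_DEFS.items()}
    for item in (part.strip() for part in (spec or "").split(",")):
        if not item:
            continue
        name, sep, value = item.partition(":")
        name, value = name.strip(), value.strip()
        if name not in ROLE_DEFS:
            raise ValueError(f"unknown role: {name!r}")
        role = ROLE_DEFS[name]
        # A bare role name falls back to the role's own "bare" default.
        mode = value if sep else role.when_bare
        if mode not in role.modes:
            raise ValueError(f"invalid mode {mode!r} for role {name!r}")
        resolved[name] = mode
    return resolved
```

With this, resolve_roles("overseer") gives overseer=preferred, data=on,
coordinator=off, and a role added in a later version always has a known mode
on every node because its definition carries its own defaults.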

Ilan



On Mon, Dec 6, 2021 at 08:58, Ishan Chattopadhyaya <ic...@gmail.com>
wrote:

> Role specific configurations can go into /node_roles/${rolename} znode,
> and that is outside the scope of this SIP. The concept of role specific
> modes (eg allowed, preferred for overseer) is a welcome addition to
> original proposal to model the overseer functionality properly without any
> confusion to user. On top of that, default roles for nodes that don't have
> any roles defined for it can be assumed by default (data:on,
> overseer:allowed).
>
> Isn't that simple and generic at the same time? Why overcomplicate
> everything all over again?
>
> On Mon, 6 Dec, 2021, 12:54 pm Gus Heck, <gu...@gmail.com> wrote:
>
>> So I think we're losing sight of the original concept of "default" and
>> conflating it with role configuration.
>>
>> When we started talking about "default roles" the idea was "default" was
>> a flag that indicated if the role was active on a Solr Node where no roles
>> had been specified. Plain and simple. Full stop.
>>
>> Secondarily any given role might or might not have some configuration
>> associated with it. Optionally a role that accepts configuration may define
>> default configuration values but this has nothing to do with "default role"
>>
>> Default should be an intrinsic binary property of the role as a whole
>> (not specific to a cluster or a node).
>>
>> There are 3 levels to think about
>>
>>    1. Intrinsic Attributes of the role as a whole (Example  --> default:
>>    yes/no)
>>    2. Configurable attributes for the role across the cluster (Example
>>    --> Strict: yes/no) (concept mentioned previously affecting how presence of
>>    a role is interpreted by role related code)
>>    3. Configurable attributes for the node that relate to the role
>>    (Example --> Election_priority_adjust: integer ) (Hypothetical way of
>>    influencing who gets elected first in a more fine grained fashion)
>>
>> Maybe use the following terminology?
>>
>>    1. Role Intrinsic Property
>>    2. Role Cluster Config
>>    3. Role Node Config
>>
>> We almost certainly have to determine what Role Intrinsic Properties we
>> want to support as these are likely to be coded into the role
>> implementation directly, and implementors of roles should specify these.
>> (I'm not presently seeing a need for more than "default".)
>>
>> The config levels I think we want to mostly identify where that
>> information can be communicated and stored. The Role Cluster Config level
>> is tricky since there's no "cluster" until you start the first "Node" ...
>> so a bit of a chicken/egg there. The Role Node Config  however seems to
>> make sense as a file that gets read and then reflected in zk as appropriate
>> during node startup (config that specified the local directory for
>> something would not need to show up in zk of course, just stuff that
>> another node/overseer/query router/whatever might need to know).
>>
>> Definitely let's reword anything that involves the phrase "Two Defaults"
>> since by definition only one value can be the "default" value (I suppose
>> theoretically you could have a mapping of defaults conditional on some
>> other value but that's definitely the opposite of simple).
>>
>> -Gus
>>
>> On Mon, Dec 6, 2021 at 12:36 AM Ishan Chattopadhyaya <
>> ichattopadhyaya@gmail.com> wrote:
>>
>>> I think I understand Ilan's motivation for two defaults. Here's a
>>> summary of how I understand Ilan's proposal, and a follow-up proposal that
>>> achieves a similar effect with less perceived complexity for the user.
>>>
>>> *Ilan's proposal (as I understand it):*
>>>
>>> 1. Every role to have two defaults. Example:
>>> data: {modes: [on, off], default1: on, *default2: on*}
>>> overseer: {modes: [allowed, disallowed, preferred], default1: preferred,
>>> *default2: disallowed*}
>>> ui: {modes: [on, off], default1: on, *default2: on*}
>>>
>>> 2. Here, default1 is for lazy users for whom "-Dnode.roles=<rolename>"
>>> will be interpreted as "-Dnode.roles=<rolename>:<default1 of rolename>".
>>> 3. Here, default2 for any role *role1* is for users who either (a)
>>> never specified any roles for a node, or (b) specified other roles, but not
>>> *role1*. In both cases, the behaviour of that node would implicitly
>>> assume "role1:<default2 of role1>".
>>>
>>> *My alternate proposal:*
>>> 1. There are no role specific defaults. Example:
>>> data: {modes: [on, off]}
>>> overseer: {modes: [allowed, disallowed, preferred]}
>>> ui: {modes: [on, off]}
>>>
>>> 2. There is a node specific default roles string *if no *-Dnode.roles
>>> was specified. Example:
>>> "data:on, overseer:allowed" (Today's system)
>>> "data:on, overseer:allowed, ui:on" (When a future role, say "ui" is
>>> introduced)
>>>
>>> 3. If a node was started with explicitly specified roles, that node will
>>> have exactly those roles (in the specified modes) and nothing else (no
>>> assumptions about other non-specified roles, i.e. those roles not specified
>>> will not run).
>>>
>>> *Benefits of my proposal:*
>>> 1. Easier to understand for users.
>>> 2. Here's a scenario where user will be happier in my proposal vs.
>>> Ilan's proposal:
>>>    * 10 nodes with *-Droles=data:on,overseer:allowed*. (Regular data
>>> nodes)
>>>    * 2 nodes with *-Droles=overseer:preferred*. (Two dedicated overseer
>>> nodes)
>>>    * User upgrades from Solr 9.0 to 9.1, where "ui" role has been
>>> introduced. Developers of "ui" role want it to be available for most users.
>>>         - In Ilan's proposal, the developer chooses this in 9.1: ui:
>>> {modes: [on, off], default1: on, *default2: on*}. Now, user upgrading
>>> will see that UI is running on his two overseer nodes, and he's confused
>>> (because he explicitly specified what he wants)
>>>         - In my proposal, the developer chooses ui: {modes: [on, off]};
>>> default roles for those users who don't specify roles: "data:on,
>>> overseer:allowed, ui:on". Now, there are no surprises of implicit
>>> default. Users who don't use roles at all will get this functionality
>>> turned on, just as the developer wanted. Users who use roles will have to
>>> explicitly append "ui:on" to their roles string on their nodes during the
>>> upgrade (this tip will come from the upgrade notes).
>>>
>>> What do you think, Ilan/Noble/Mike/Gus/Houston?
>>>
>>> On Mon, Dec 6, 2021 at 8:10 AM Noble Paul <no...@gmail.com> wrote:
>>>
>>>> Ilan was asking what the overseer role should be in the following
>>>> situations:
>>>>
>>>> a) role=overseer,data:on
>>>> b) role=overseer:preferred,data:on
>>>> c) role=data:on
>>>>
>>>> I'm saying (a) shouldn't be valid. Only (b) and (c) are valid.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Dec 6, 2021, 12:44 PM Mike Drob <md...@mdrob.com> wrote:
>>>>
>>>>> Ilan,
>>>>>
>>>>> Can you provide a more detailed concrete example? I’m having a lot of
>>>>> trouble understanding what you are proposing, beyond that it is somehow
>>>>> contraindicated with what Ishan/Noble suggest.
>>>>>
>>>>> Apologies for my failure to understand.
>>>>>
>>>>> Thanks,
>>>>> Mike
>>>>>
>>>>> On Sun, Dec 5, 2021 at 5:21 PM Ilan Ginzburg <il...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> If we go with optional role params, we need two defaults:
>>>>>> 1. the param value to use when the role is specified without a
>>>>>> parameter, and
>>>>>> 2. the param value to use for the role on a node for which the role is
>>>>>> not specified at all.
>>>>>>
>>>>>> I don't know how to sensibly name these defaults, but the actual
>>>>>> values would be:
>>>>>> overseer: default1=preferred, default2=allowed
>>>>>> data: default1=on, default2=on
>>>>>> coordinator: default1=on, default2=off
>>>>>>
>>>>>> If we do not allow specifying a role without a parameter, then
>>>>>> default1 does not exist and the example Noble posted earlier covers
>>>>>> us. But simple roles will be easier to use without parameters (and the
>>>>>> transition from existing overseer role would be trivial).
>>>>>>
>>>>>> On Sun, Dec 5, 2021 at 7:17 AM Ishan Chattopadhyaya
>>>>>> <ic...@gmail.com> wrote:
>>>>>> >
>>>>>> > I'm +1 on this. It "looks" complicated at first, but simplifies all
>>>>>> headaches going forward.
>>>>>> >
>>>>>> > On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <no...@gmail.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> I shall update the SIP proposal if we have a consensus on this
>>>>>> configuration
>>>>>> >>
>>>>>> >> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <no...@gmail.com>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gu...@gmail.com>
>>>>>> wrote:
>>>>>> >>>>
>>>>>> >>>> I like this in that it's an example of how the overseer might be
>>>>>> extended without creating a new role :)
>>>>>> >>>>
>>>>>> >>>> Not entirely sure if I'm for or against an enum implementation
>>>>>> here, but it makes me a bit nervous. Enums with complexity can quickly get
>>>>>> into difficulty for unit tests (especially if one wanted to write a mock
>>>>>> object based test, something I think we maybe should use a bit more than we
>>>>>> do).
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> I would tend to think of a class to represent and collect role
>>>>>> related functionality, one that perhaps has methods that receive the
>>>>>> request, or other key objects and thus could be tested without standing up
>>>>>> an entire server. (Not against also having them exercised in a few
>>>>>> integrated tests, but the more we can avoid interleaving logic directly
>>>>>> within DispatchFilter and HttpSolrCall etc. the better).
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> So I guess I'm somewhat biased against any enum with more than a
>>>>>> couple properties, and definitely don't want to wind up hanging lots of
>>>>>> methods off of one. Better to use them to consume a configuration value and
>>>>>> then instantiate a class that really holds the logic and data. I like them
>>>>>> for constraining values and easy string value conversion but the more they
>>>>>> look like classes the more I'd rather have a class.
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>  I just meant it is a set of values. Please let us not discuss
>>>>>> the actual impl here. We should stick to discussing the high-level design
>>>>>> here, and specifics should be dealt with in a PR.
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> -Gus
>>>>>> >>>>
>>>>>> >>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <no...@gmail.com>
>>>>>> wrote:
>>>>>> >>>>>
>>>>>> >>>>> I recommend the following format for the role spec
>>>>>> >>>>>
>>>>>> >>>>> roles=<role-name>:<role-value>
>>>>>> >>>>>
>>>>>> >>>>> each role will have an enum of allowed values and a default
>>>>>> value
>>>>>> >>>>>
>>>>>> >>>>> role name: data
>>>>>> >>>>>
>>>>>> >>>>> values: [on, off]
>>>>>> >>>>> default: on
>>>>>> >>>>>
>>>>>> >>>>> role name: overseer
>>>>>> >>>>>
>>>>>> >>>>> values: [allowed, disallowed, preferred]
>>>>>> >>>>> default : allowed
>>>>>> >>>>>
>>>>>> >>>>> role name: coordinator
>>>>>> >>>>>
>>>>>> >>>>> values : [on, off]
>>>>>> >>>>> default: off
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> examples
>>>>>> >>>>> roles=data:on,overseer:allowed (This is redundant because it
>>>>>> uses all the default values. If a node is started without any roles value
>>>>>> this is the default behavior)
>>>>>> >>>>> roles=data:off,overseer:preferred ( do not allow data, join
>>>>>> overseer election at head)
>>>>>> >>>>> roles=coordinator:on,data:on (role as coordinator, but allow
>>>>>> data, it's same as roles=coordinator:on)
>>>>>> >>>>> roles=coordinator:on,data:off (role as coordinator, disallow
>>>>>> data)
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <
>>>>>> ilansolr@gmail.com> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>> If we go with no negative node roles and overseer node role is
>>>>>> not strict (i.e. it’s a "preferred overseer"), then one would need to
>>>>>> define a second node role "no_overseer" to explicitly exclude a node from
>>>>>> ever becoming overseer (which I think is a useful feature until we switch
>>>>>> the cluster default to not using the overseer), plus the implementation of
>>>>>> these two node roles will obviously be coupled (and what if a node has both
>>>>>> defined?).
>>>>>> >>>>>>
>>>>>> >>>>>> I prefer strict node roles.
>>>>>> >>>>>> Maybe we could have node roles with [optional] parameters to
>>>>>> let the node role implementation decide ?
>>>>>> >>>>>> The overseer node role for example could have one of 3 values
>>>>>> defined for each node: “preferred” (default, equivalent to the existing
>>>>>> overseer role), "accepted" (equivalent to currently not defining the
>>>>>> overseer role) and "no_way" (does not exist today).
>>>>>> >>>>>>
>>>>>> >>>>>> This could be useful in other contexts. A node role “data”
>>>>>> could be “fast” or “slow” depending on type of local persistent storage for
>>>>>> example…
>>>>>> >>>>>>
>>>>>> >>>>>> Ilan
>>>>>> >>>>>>
>>>>>> >>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com>
>>>>>> wrote:
>>>>>> >>>>>>>
>>>>>> >>>>>>> I really don't think we should have types of roles. Not
>>>>>> negative/positive and not strict/non-strict. You have a role or you don't.
>>>>>> What that means is up to the code implementing the role.
>>>>>> >>>>>>>
>>>>>> >>>>>>> Roles should be free to configure a preference order (binary,
>>>>>> or n-ary or whatever, strict or loose), prohibit behavior, or enable
>>>>>> behavior. In this SIP I feel we should focus on How to identify what node
>>>>>> has what role, How to designate what roles a node has via config/params,
>>>>>> and the API's for interacting with roles.
>>>>>> >>>>>>>
>>>>>> >>>>>>> We should for example be able to support roles such as
>>>>>> >>>>>>>
>>>>>> >>>>>>> PREFERRED_OVERSEER
>>>>>> >>>>>>> DATA
>>>>>> >>>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to
>>>>>> suggest)
>>>>>> >>>>>>>
>>>>>> >>>>>>> Details about role implementation should probably be
>>>>>> discussed in a thread about that role.  Obviously we should think about the
>>>>>> name carefully to leave options open should we want to enhance things later
>>>>>> so maybe
>>>>>> >>>>>>>
>>>>>> >>>>>>> OVERSEER_PREF  or just  OVERSEER
>>>>>> >>>>>>>
>>>>>> >>>>>>> would be better since it merely reads that the node
>>>>>> implements some sort of preference or config regarding overseer... but all
>>>>>> this can be decided on a per role basis
>>>>>> >>>>>>>
>>>>>> >>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <
>>>>>> noble.paul@gmail.com> wrote:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Negative roles have a place
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Example is overseer
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> There are 3 possible choices for that role
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> a) preferred: always be in front of the election queue
>>>>>> >>>>>>>> b) on: not preferred, but can be an overseer if no preferred
>>>>>> overseer nodes are available
>>>>>> >>>>>>>> c) off: never become an overseer
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Today we only have options (a) and (b). In a future ticket,
>>>>>> we may implement (c).
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com>
>>>>>> wrote:
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>> Negative roles add a lot of complexity, I would really want
>>>>>> to stay away from them. That’s why I want strict roles up front. It’s maybe
>>>>>> ok to push this decision out, but it also seems like the sort of thing we
>>>>>> should consider at the start.
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <
>>>>>> noble.paul@gmail.com> wrote:
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> Yes. Negative roles are not a bad idea. If I start a node
>>>>>> for machine learning purposes, I wouldn't want that node to ever
>>>>>> participate in overseer election
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <
>>>>>> ilansolr@gmail.com> wrote:
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> If we have non strict roles (like overseer), then it does
>>>>>> make sense
>>>>>> >>>>>>>>>>> to have negative roles.
>>>>>> >>>>>>>>>>> That way I can define which are the two nodes that I'd
>>>>>> prefer the
>>>>>> >>>>>>>>>>> overseer to run on, and a few other nodes on which it
>>>>>> should
>>>>>> >>>>>>>>>>> definitely never run for various reasons. And in case
>>>>>> these
>>>>>> >>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let
>>>>>> the cluster
>>>>>> >>>>>>>>>>> fail the same way it would if there were no data nodes
>>>>>> available.
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>>>>>> houstonputman@gmail.com> wrote:
>>>>>> >>>>>>>>>>> >>>
>>>>>> >>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults,
>>>>>> users cannot trip themselves up by default, but the option is there for
>>>>>> people to tinker and have an iron grip over their cluster.
>>>>>> >>>>>>>>>>> >>
>>>>>> >>>>>>>>>>> >>
>>>>>> >>>>>>>>>>> >> +1 to sensible defaults so users don't trip
>>>>>> themselves. The option to tinker for tighter grip can be tackled later,
>>>>>> either on a per role basis or as a generic concept later.
>>>>>> >>>>>>>>>>> >
>>>>>> >>>>>>>>>>> >
>>>>>> >>>>>>>>>>> > +1 - Can definitely be added later if we so desire, not
>>>>>> needed for this SIP
>>>>>> >>>>>>>>>>> >
>>>>>> >>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>> >>>>>>>>>>> >>
>>>>>> >>>>>>>>>>> >>
>>>>>> >>>>>>>>>>> >>
>>>>>> >>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <
>>>>>> gus.heck@gmail.com> wrote:
>>>>>> >>>>>>>>>>> >>>
>>>>>> >>>>>>>>>>> >>> I think the key  is to let the roles have full
>>>>>> control of the implications of having/not having that role. No need for
>>>>>> even a strict/loose designation. The question of do you have the role is
>>>>>> yes/no with no logic to guess if the role is implied or not. The question
>>>>>> of will it come up with the role is "have_explicit ? use_explicit :
>>>>>> use_defaults".
>>>>>> >>>>>>>>>>> >>>
>>>>>> >>>>>>>>>>> >>> Once you figure out who has a role (or not) what that
>>>>>> means is up to the role code.
>>>>>> >>>>>>>>>>> >>>
>>>>>> >>>>>>>>>>> >>> Corollary: we don't have to change the way overseer
>>>>>> works in this SIP. We can rework it or not as we see fit separately.
>>>>>> >>>>>>>>>>> >>
>>>>>> >>>>>>>>>>> >>
>>>>>> >>>>>>>>>>> >> +1
>>>>>> >>>>>>>>>>> >>
>>>>>> >>>>>>>>>>> >>>
>>>>>> >>>>>>>>>>> >>>
>>>>>> >>>>>>>>>>> >>> Only thing we need to do is find a wording that makes
>>>>>> the above clear on first read through the SIP :)
>>>>>> >>>>>>>>>>> >>>
>>>>>> >>>>>>>>>>> >>> -Gus
>>>>>> >>>>>>>>>>> >>>
>>>>>> >>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>>>> houstonputman@gmail.com> wrote:
>>>>>> >>>>>>>>>>> >>>>>
>>>>>> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
>>>>>> happens if all of our existing OVERSEER candidates are down. When at least
>>>>>> one of them is up, the overseer will go there, and that is good and
>>>>>> expected. But what happens if all of the overseer eligible nodes are down?
>>>>>> Your comment, and the old system, would imply that the overseer election
>>>>>> goes to some other unrelated, untagged node. I disagree with this
>>>>>> implementation choice. This sounds like something role specific to
>>>>>> determine, but I would like to see us be more strict about it. I don't want
>>>>>> cores leaking out of my data roles, I don't want query processing to leak
>>>>>> out of my "query" nodes or whatever. Overseer shouldn't be special in this
>>>>>> regard.
>>>>>> >>>>>>>>>>> >>>>
>>>>>> >>>>>>>>>>> >>>>
>>>>>> >>>>>>>>>>> >>>> I'm very strongly in favor of not letting users
>>>>>> design a system in which the cluster can be "live" without an overseer. I
>>>>>> understand that the overseer can be taxing to the cluster, but honestly
>>>>>> what is the point of having an untaxed cluster that doesn't have an
>>>>>> overseer? I can see arguments for the other roles to be stricter about
>>>>>> this, but there are also a lot of users who wouldn't want those to be
>>>>>> strict either (like "query" nodes).
>>>>>> >>>>>>>>>>> >>>>
>>>>>> >>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a
>>>>>> non-overseer role node HAS to be selected to become overseer, it will try
>>>>>> to migrate the overseer job to a node with the overseer role whenever one
>>>>>> becomes live.
>>>>>> >>>>>>>>>>> >>>>
>>>>>> >>>>>>>>>>> >>>> So maybe we don't have special rules per role, but
>>>>>> instead roles can either be defined as "Strict" or "Loose" (better names
>>>>>> likely exist), and the roles come with a default (Overseer -> Loose, Data
>>>>>> -> Strict, Query -> Loose, etc.). And it is up to each role to define how
>>>>>> to behave when, in LOOSE mode, a non-role node is in use and then a
>>>>>> role node comes online (like the overseer example given above).
>>>>>> >>>>>>>>>>> >>>>
>>>>>> >>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults,
>>>>>> users cannot trip themselves up by default, but the option is there for
>>>>>> people to tinker and have an iron grip over their cluster.
>>>>>> >>>>>>>>>>> >>>>
>>>>>> >>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <
>>>>>> mdrob@mdrob.com> wrote:
>>>>>> >>>>>>>>>>> >>>>>
>>>>>> >>>>>>>>>>> >>>>> Noble wrote:
>>>>>> >>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role"
>>>>>> works today. We are just changing the definition and standardizing the
>>>>>> configuration & discoverability
>>>>>> >>>>>>>>>>> >>>>> Ishan wrote:
>>>>>> >>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the
>>>>>> OVERSEER role (which currently stands for preferred overseer). We can take
>>>>>> a stab at refactoring it later.
>>>>>> >>>>>>>>>>> >>>>>
>>>>>> >>>>>>>>>>> >>>>> Grouping these two comments together, since I think
>>>>>> they are saying the same thing. I think this is part of my confusion. We
>>>>>> have an old system that doesn't work the way we want the new system to
>>>>>> work. There may be people already using the old system. What path do we
>>>>>> offer for folks using the old system to migrate to the new system? What
>>>>>> happens if somebody accidentally tries to use both systems at the same time?
>>>>>> >>>>>>>>>>> >>>>>
>>>>>> >>>>>>>>>>> >>>>> Ishan wrote:
>>>>>> >>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with
>>>>>> OVERSEER role] are live, Solr guarantees that one of those nodes becomes
>>>>>> the overseer.", I meant to somewhat capture the current behaviour as the
>>>>>> OVERSEER role performs today. Do you see any inconsistency with this
>>>>>> statement vs. what it does today?
>>>>>> >>>>>>>>>>> >>>>>
>>>>>> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
>>>>>> happens if all of our existing OVERSEER candidates are down. When at least
>>>>>> one of them is up, the overseer will go there, and that is good and
>>>>>> expected. But what happens if all of the overseer eligible nodes are down?
>>>>>> Your comment, and the old system, would imply that the overseer election
>>>>>> goes to some other unrelated, untagged node. I disagree with this
>>>>>> implementation choice. This sounds like something role specific to
>>>>>> determine, but I would like to see us be more strict about it. I don't want
>>>>>> cores leaking out of my data roles, I don't want query processing to leak
>>>>>> out of my "query" nodes or whatever. Overseer shouldn't be special in this
>>>>>> regard.
>>>>>> >>>>>>>>>>> >>>>>
>>>>>> >>>>>>>>>>> >>>>> Noble wrote:
>>>>>> >>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or
>>>>>> a node in the following request?
>>>>>> >>>>>>>>>>> >>>>>
>>>>>> >>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this
>>>>>> out. Let's leave it as is.
>>>>>> >>>>>>>>>>> >>>>>
>>>>>> >>>>>>>>>>> >>>>>
>>>>>> >>>>>>>>>>> >>>>>
>>>>>> >>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan
>>>>>> Chattopadhyaya <ic...@gmail.com> wrote:
>>>>>> >>>>>>>>>>> >>>>>>
>>>>>> >>>>>>>>>>> >>>>>>
>>>>>> >>>>>>>>>>> >>>>>>
>>>>>> >>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <
>>>>>> mdrob@mdrob.com> wrote:
>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>> Replying to the top post in this thread because
>>>>>> there has been a lot of discussion and I don't want to look like I'm
>>>>>> continuing any of those particular threads.
>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>> I finally had time to sit down and think about
>>>>>> this with the attention it deserves and am generally happy with how the
>>>>>> conversation has shaped the current proposal.
>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define
>>>>>> node roles is fine and I like that data is the default role when not
>>>>>> defined. I think it is important to hold on to the guarantee that an active
>>>>>> overseer will land on an overseer node role.
>>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration
>>>>>> path for folks using the current OVERSEER role. I am not sure that
>>>>>> something can be done automatically since they need to now specify new
>>>>>> properties at startup. Maybe we need to include loud warnings or support
>>>>>> both approaches for a time?
>>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the
>>>>>> overseer nodes fail, then it is implied the overseer will go to one of the
>>>>>> data nodes. The specific wording in the SIP - "When one or more such nodes
>>>>>> are live, Solr guarantees that one of those nodes become the overseer."
>>>>>> implies to me that failover could go from overseer1 to overseer2 to
>>>>>> overseerN to random node. I feel like we need to have some recording that
>>>>>> there were dedicated overseer nodes and stop the cascading failure instead
>>>>>> of churning through our data nodes.
>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the
>>>>>> proposed scope of "coordinator" roles from a split query/indexing
>>>>>> standpoint. I understand that these are used as examples, but would like
>>>>>> stronger language that new roles should also go through their own SIP
>>>>>> discussions.
>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing
>>>>>> node liveness in two different places now. We have the live nodes and we
>>>>>> have the node roles stored in two different places in zookeeper and it
>>>>>> feels like this would lead to race conditions or split brain or other hard
>>>>>> to diagnose bugs when those two lists don't agree with each other. This
>>>>>> also feels like it contradicts the "single source of truth" idea later
>>>>>> stated in the proposal. I see Gus's arguments for decoupling these and am
>>>>>> not strongly opposed, I just get a lurking feeling about it. Even if we
>>>>>> don't do this, I would like this called out explicitly in the alternative
>>>>>> approaches section as something that we considered and rejected, with
>>>>>> details why.
>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an
>>>>>> additional call out here that all operations are GET because nodes cannot
>>>>>> be changed at runtime.
>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the
>>>>>> previous OVERSEER preference role?
>>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list
>>>>>> of available roles for a cluster. I _think_ this could be based on the
>>>>>> version that the cluster is running? Would be useful to be able to
>>>>>> interrogate a cluster in the future... we're seeing OOM issues on queries,
>>>>>> can we add some query nodes? When were they introduced? I don't know what
>>>>>> path this API should exist at.
>>>>>> >>>>>>>>>>> >>>>>>
>>>>>> >>>>>>>>>>> >>>>>>
>>>>>> >>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API,
>>>>>> updated the SIP document. Not sure if there's a better path that we could
>>>>>> go for.
>>>>>> >>>>>>>>>>> >>>>>>
>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly
>>>>>> show which parts are string literals and which parts are meant to be
>>>>>> substituted by the operator? GET /api/cluster/roles/data would become GET
>>>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET
>>>>>> /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename}
>>>>>> dropping the intermediate "nodes"
>>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not
>>>>>> need that intermediate "nodes" node.
>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
>>>>>> permissions? Maybe this requirement is too fundamental to the operation of
>>>>>> a cluster and everybody would have to be able to do it.
>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other
>>>>>> clients) to treat roles? Implementation detail that the servers will figure
>>>>>> out? Or strict guidance where the client needs to check where specific
>>>>>> roles are before sending any further communication to the server?
>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a
>>>>>> request that it can't fulfil? An overseer node gets a query or an update. A
>>>>>> data node gets a collection creation request. Do they forward it on to an
>>>>>> appropriate node, or do they reject it? Should this be configurable? If
>>>>>> not, then it seems like lazy or poorly configured clients will defeat this
>>>>>> isolation system quite easily.
>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes
>>>>>> behave when roles are added mean? I thought we established that they are
>>>>>> not dynamic.
>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>> Thanks,
>>>>>> >>>>>>>>>>> >>>>>>> Mike
>>>>>> >>>>>>>>>>> >>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan
>>>>>> Chattopadhyaya <ic...@gmail.com> wrote:
>>>>>> >>>>>>>>>>> >>>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>>> Hi,
>>>>>> >>>>>>>>>>> >>>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of
>>>>>> node roles:
>>>>>> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>> >>>>>>>>>>> >>>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>> >>>>>>>>>>> >>>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>>> We also wish to add first class support for
>>>>>> Query nodes that are used to process user queries by forwarding to data
>>>>>> nodes, merging/aggregating them and presenting to users. This concept
>>>>>> exists as first class citizens in most other search engines. This is a
>>>>>> chance for Solr to catch up.
>>>>>> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>> >>>>>>>>>>> >>>>>>>>
>>>>>> >>>>>>>>>>> >>>>>>>> Regards,
>>>>>> >>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>>>> >>>>>>>>>>> >>>
>>>>>> >>>>>>>>>>> >>>
>>>>>> >>>>>>>>>>> >>>
>>>>>> >>>>>>>>>>> >>> --
>>>>>> >>>>>>>>>>> >>> http://www.needhamsoftware.com (work)
>>>>>> >>>>>>>>>>> >>> http://www.the111shift.com (play)
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>>>>> >>>>>>>>>>> For additional commands, e-mail: dev-help@solr.apache.org
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> --
>>>>>> >>>>>>> http://www.needhamsoftware.com (work)
>>>>>> >>>>>>> http://www.the111shift.com (play)
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> --
>>>>>> >>>>> -----------------------------------------------------
>>>>>> >>>>> Noble Paul
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> --
>>>>>> >>>> http://www.needhamsoftware.com (work)
>>>>>> >>>> http://www.the111shift.com (play)
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> --
>>>>>> >>> -----------------------------------------------------
>>>>>> >>> Noble Paul
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> -----------------------------------------------------
>>>>>> >> Noble Paul
>>>>>>
>>>>>>
>>>>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

Role-specific configurations can go into the /node_roles/${rolename} znode, and
that is outside the scope of this SIP. The concept of role-specific modes
(e.g. allowed, preferred for overseer) is a welcome addition to the original
proposal, since it models the overseer functionality properly without confusing
the user. On top of that, nodes that don't have any roles defined can be
assumed to have the default roles (data:on, overseer:allowed).

Isn't that simple and generic at the same time? Why overcomplicate
everything all over again?
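
As a rough illustration of the model described above, here is a minimal
Python sketch (not Solr code). The names ROLE_MODES, DEFAULT_ROLES and
parse_roles are hypothetical. It assumes three things taken from this
thread rather than from any implementation: every role must be given with
an explicit mode, a node started without a roles string gets
"data:on,overseer:allowed", and a node with an explicit roles string gets
exactly the roles listed and nothing else.

```python
# Sketch only: hypothetical names, not Solr's actual implementation.

# Allowed modes per role, as discussed in this thread.
ROLE_MODES = {
    "data": {"on", "off"},
    "overseer": {"allowed", "disallowed", "preferred"},
}

# Roles assumed for a node started without any roles string (assumption).
DEFAULT_ROLES = "data:on,overseer:allowed"


def parse_roles(spec=None):
    """Parse a 'name:mode,name:mode' roles string into a dict."""
    if spec is None or not spec.strip():
        # No roles given: fall back to the node-level default string.
        spec = DEFAULT_ROLES
    roles = {}
    for entry in spec.split(","):
        name, sep, mode = entry.strip().partition(":")
        name, mode = name.strip(), mode.strip()
        if not sep or not mode:
            # A bare role name such as "overseer" is rejected here.
            raise ValueError(f"role '{name}' needs an explicit mode")
        if name not in ROLE_MODES:
            raise ValueError(f"unknown role: {name}")
        if mode not in ROLE_MODES[name]:
            raise ValueError(f"invalid mode '{mode}' for role '{name}'")
        if name in roles:
            raise ValueError(f"role '{name}' specified twice")
        roles[name] = mode
    return roles
```

With this sketch, parse_roles() yields the default data and overseer
roles, while parse_roles("overseer:preferred") yields only the overseer
role, so a dedicated overseer node would not implicitly pick up data or
any role introduced later.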

On Mon, 6 Dec, 2021, 12:54 pm Gus Heck, <gu...@gmail.com> wrote:

> So I think we're losing sight of the original concept of "default" and
> conflating it with role configuration.
>
> When we started talking about "default roles" the idea was "default" was a
> flag that indicated if the role was active on a Solr Node where no roles
> had been specified. Plain and simple. Full stop.
>
> Secondarily any given role might or might not have some configuration
> associated with it. Optionally a role that accepts configuration may define
> default configuration values but this has nothing to do with "default role"
>
> Default should be an intrinsic binary property of the role as a whole (not
> specific to a cluster or a node).
>
> There are 3 levels to think about
>
>    1. Intrinsic Attributes of the role as a whole (Example  --> default:
>    yes/no)
>    2. Configurable attributes for the role across the cluster (Example
>    --> Strict: yes/no) (concept mentioned previously affecting how presence of
>    a role is interpreted by role related code)
>    3. Configurable attributes for the node that relate to the role
>    (Example --> Election_priority_adjust: integer ) (Hypothetical way of
>    influencing who gets elected first in a more fine grained fashion)
>
> Maybe use the following terminology?
>
>    1. Role Intrinsic Property
>    2. Role Cluster Config
>    3. Role Node Config
>
> We almost certainly have to determine what Role Intrinsic Properties we
> want to support as these are likely to be coded into the role
> implementation directly, and implementors of roles should specify these.
> (I'm not presently seeing a need for more than "default".)
>
> The config levels I think we want to mostly identify where that
> information can be communicated and stored. The Role Cluster Config level
> is tricky since there's no "cluster" until you start the first "Node" ...
> so a bit of a chicken/egg there. The Role Node Config  however seems to
> make sense as a file that gets read and then reflected in zk as appropriate
> during node startup (config that specified the local directory for
> something would not need to show up in zk of course, just stuff that
> another node/overseer/query router/whatever might need to know).
>
> Definitely let's reword anything that involves the phrase "Two Defaults"
> since by definition only one value can be the "default" value (I suppose
> theoretically you could have a mapping of defaults conditional on some
> other value but that's definitely the opposite of simple).
>
> -Gus
>
> On Mon, Dec 6, 2021 at 12:36 AM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
>
>> I think I understand Ilan's motivation for two defaults. Here's a summary
>> of how I understand Ilan's proposal, and a follow-up proposal that
>> achieves a similar effect with less perceived complexity for the user.
>>
>> *Ilan's proposal (as I understand it):*
>>
>> 1. Every role to have two defaults. Example:
>> data: {modes: [on, off], default1: on, *default2: on*}
>> overseer: {modes: [allowed, disallowed, preferred], default1: preferred, *default2: disallowed*}
>> ui: {modes: [on, off], default1: on, *default2: on*}
>>
>> 2. Here, default1 is for lazy users for whom "-Dnode.roles=<rolename>"
>> will be interpreted as "-Dnode.roles=<rolename>:<default1 of rolename>".
>> 3. Here, default2 for any role *role1* is for users who either (a)
>> never specified any roles for a node, or (b) specified other roles, but not
>> *role1*. In both cases, the behaviour of that node would implicitly
>> assume "role1:<default2 of role1>".
>>
>> *My alternate proposal:*
>> 1. There are no role specific defaults. Example:
>> data: {modes: [on, off]}
>> overseer: {modes: [allowed, disallowed, preferred]}
>> ui: {modes: [on, off]}
>>
>> 2. There is a node specific default roles string *if no* -Dnode.roles
>> was specified. Example:
>> "data:on, overseer:allowed" (Today's system)
>> "data:on, overseer:allowed, ui:on" (When a future role, say "ui" is
>> introduced)
>>
>> 3. If a node was started with explicitly specified roles, that node will
>> have exactly those roles (in the specified modes) and nothing else (no
>> assumptions about other non-specified roles, i.e. those roles not specified
>> will not run).
>>
>> *Benefits of my proposal:*
>> 1. Easier to understand for users.
>> 2. Here's a scenario where user will be happier in my proposal vs. Ilan's
>> proposal:
>>    * 10 nodes with *-Dnode.roles=data:on,overseer:allowed*. (Regular data
>> nodes)
>>    * 2 nodes with *-Dnode.roles=overseer:preferred*. (Two dedicated overseer
>> nodes)
>>    * User upgrades from Solr 9.0 to 9.1, where "ui" role has been
>> introduced. Developers of "ui" role want it to be available for most users.
>>         - In Ilan's proposal, the developer chooses this in 9.1: ui:
>> {modes: [on, off], default1: on, *default2: on*}. Now, user upgrading
>> will see that UI is running on his two overseer nodes, and he's confused
>> (because he explicitly specified what he wants)
>>         - In my proposal, the developer chooses ui: {modes: [on, off]};
>> default roles for those users who don't specify roles: "data:on,
>> overseer:allowed, ui:on". Now, there are no surprises of implicit
>> default. Users who don't use roles at all will get this functionality
>> turned on, just as the developer wanted. Users who use roles will have to
>> explicitly append "ui:on" to their roles string on their nodes during the
>> upgrade (this tip will come from the upgrade notes).
>>
>> What do you think, Ilan/Noble/Mike/Gus/Houston?
>>
>> On Mon, Dec 6, 2021 at 8:10 AM Noble Paul <no...@gmail.com> wrote:
>>
>>> Ilan was asking what the overseer role should be in the following
>>> situations:
>>>
>>> a) role=overseer,data:on
>>> b) role=overseer:preferred,data:on
>>> c) role=data:on
>>>
>>> I'm saying (a) shouldn't be valid. Only (b) and (c) are valid.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Dec 6, 2021, 12:44 PM Mike Drob <md...@mdrob.com> wrote:
>>>
>>>> Ilan,
>>>>
>>>> Can you provide a more detailed concrete example? I’m having a lot of
>>>> trouble understanding what you are proposing, beyond that it is somehow
>>>> contraindicated with what Ishan/Noble suggest.
>>>>
>>>> Apologies for my failure to understand.
>>>>
>>>> Thanks,
>>>> Mike
>>>>
>>>> On Sun, Dec 5, 2021 at 5:21 PM Ilan Ginzburg <il...@gmail.com>
>>>> wrote:
>>>>
>>>>> If we go with optional role params, we need two defaults:
>>>>> 1. the param value to use when the role is specified without a
>>>>> parameter, and
>>>>> 2. the param value to use for the role on a node for which the role is
>>>>> not specified at all.
>>>>>
>>>>> I don't know how to sensibly name these defaults, but the actual
>>>>> values would be:
>>>>> overseer: default1=preferred, default2=allowed
>>>>> data: default1=on, default2=on
>>>>> coordinator: default1=on, default2=off
>>>>>
>>>>> If we do not allow specifying a role without a parameter, then
>>>>> default1 does not exist and the example Noble posted earlier covers
>>>>> us. But simple roles will be easier to use without parameters (and the
>>>>> transition from existing overseer role would be trivial).
>>>>>
>>>>> On Sun, Dec 5, 2021 at 7:17 AM Ishan Chattopadhyaya
>>>>> <ic...@gmail.com> wrote:
>>>>> >
>>>>> > I'm +1 on this. It "looks" complicated at first, but simplifies all
>>>>> headaches going forward.
>>>>> >
>>>>> > On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <no...@gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >> I shall update the SIP proposal if we have a consensus on this
>>>>> configuration
>>>>> >>
>>>>> >> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <no...@gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gu...@gmail.com>
>>>>> wrote:
>>>>> >>>>
>>>>> >>>> I like this in that it's an example of how the overseer might be
>>>>> extended without creating a new role :)
>>>>> >>>>
>>>>> >>>> Not entirely sure if I'm for or against an enum implementation
>>>>> here, but it makes me a bit nervous. Enums with complexity can quickly get
>>>>> into difficulty for unit tests (especially if one wanted to write a mock
>>>>> object based test, something I think we maybe should use a bit more than we
>>>>> do).
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> I would tend to think of a class to represent and collect role
>>>>> related functionality, one that perhaps has methods that receive the
>>>>> request, or other key objects and thus could be tested without standing up
>>>>> an entire server. (Not against also having them exercised in a few
>>>>> integrated tests, but the more we can avoid interleaving logic directly
>>>>> within DispatchFilter and HttpSolrCall etc. the better.)
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> So I guess I'm somewhat biased against any enum with more than a
>>>>> couple properties, and definitely don't want to wind up hanging lots of
>>>>> methods off of one. Better to use them to consume a configuration value and
>>>>> then instantiate a class that really holds the logic and data. I like them
>>>>> for constraining values and easy string value conversion but the more they
>>>>> look like classes the more I'd rather have a class.
>>>>> >>>
>>>>> >>>
>>>>> >>>  I just meant it is a set of values. Please let us not discuss the
>>>>> actual impl here . We should stick to discussing the high level design here
>>>>> and specifics should be dealt with in a PR
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> -Gus
>>>>> >>>>
>>>>> >>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <no...@gmail.com>
>>>>> wrote:
>>>>> >>>>>
>>>>> >>>>> I recommend the following format for the role spec
>>>>> >>>>>
>>>>> >>>>> roles=<role-name>:<role-value>
>>>>> >>>>>
>>>>> >>>>> each role will have an enum of allowed values and a default value
>>>>> >>>>>
>>>>> >>>>> role name: data
>>>>> >>>>>
>>>>> >>>>> values: [on, off]
>>>>> >>>>> default: on
>>>>> >>>>>
>>>>> >>>>> role name: overseer
>>>>> >>>>>
>>>>> >>>>> values: [allowed, disallowed, preferred]
>>>>> >>>>> default : allowed
>>>>> >>>>>
>>>>> >>>>> role name: coordinator
>>>>> >>>>>
>>>>> >>>>> values : [on, off]
>>>>> >>>>> default: off
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> examples
>>>>> >>>>> roles=data:on,overseer:allowed (This is redundant because it
>>>>> uses all the default values. If a node is started without any roles value
>>>>> this is the default behavior)
>>>>> >>>>> roles=data:off,overseer:preferred ( do not allow data, join
>>>>> overseer election at head)
>>>>> >>>>> roles=coordinator:on,data:on (role as coordinator, but allow
>>>>> data, it's same as roles=coordinator:on)
>>>>> >>>>> roles=coordinator:on,data:off (role as coordinator, disallow
>>>>> data)
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <
>>>>> ilansolr@gmail.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> If we go with no negative node roles and overseer node role is
>>>>> not strict (i.e. it’s a "preferred overseer"), then one would need to
>>>>> define a second node role "no_overseer" to explicitly exclude a node from
>>>>> ever becoming overseer (which I think is a useful feature until we switch
>>>>> the cluster default to not using the overseer), plus the implementation of
>>>>> these two node roles will obviously be coupled (and what if a node has both
>>>>> defined?).
>>>>> >>>>>>
>>>>> >>>>>> I prefer strict node roles.
>>>>> >>>>>> Maybe we could have node roles with [optional] parameters to
>>>>> let the node role implementation decide ?
>>>>> >>>>>> The overseer node role for example could have one of 3 values
>>>>> defined for each node: “preferred” (default, equivalent to the existing
>>>>> overseer role), "accepted" (equivalent to currently not defining the
>>>>> overseer role) and "no_way" (does not exist today).
>>>>> >>>>>>
>>>>> >>>>>> This could be useful in other contexts. A node role “data”
>>>>> could be “fast” or “slow” depending on type of local persistent storage for
>>>>> example…
>>>>> >>>>>>
>>>>> >>>>>> Ilan
>>>>> >>>>>>
>>>>> >>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com>
>>>>> wrote:
>>>>> >>>>>>>
>>>>> >>>>>>> I really don't think we should have types of roles. Not
>>>>> negative/positive and not strict/non-strict. You have a role or you don't.
>>>>> What that means is up to the code implementing the role.
>>>>> >>>>>>>
>>>>> >>>>>>> Roles should be free to configure a preference order (binary,
>>>>> or n-ary or whatever, strict or loose), prohibit behavior, or enable
>>>>> behavior. In this SIP I feel we should focus on How to identify what node
>>>>> has what role, How to designate what roles a node has via config/params,
>>>>> and the API's for interacting with roles.
>>>>> >>>>>>>
>>>>> >>>>>>> We should for example be able to support roles such as
>>>>> >>>>>>>
>>>>> >>>>>>> PREFERRED_OVERSEER
>>>>> >>>>>>> DATA
>>>>> >>>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to
>>>>> suggest)
>>>>> >>>>>>>
>>>>> >>>>>>> Details about role implementation should probably be discussed
>>>>> in a thread about that role.  Obviously we should think about the name
>>>>> carefully to leave options open should we want to enhance things later so
>>>>> maybe
>>>>> >>>>>>>
>>>>> >>>>>>> OVERSEER_PREF  or just  OVERSEER
>>>>> >>>>>>>
>>>>> >>>>>>> would be better since it merely reads that the node
>>>>> implements some sort of preference or config regarding overseer... but all
>>>>> this can be decided on a per role basis
>>>>> >>>>>>>
>>>>> >>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <
>>>>> noble.paul@gmail.com> wrote:
>>>>> >>>>>>>>
>>>>> >>>>>>>> Negative roles have a place
>>>>> >>>>>>>>
>>>>> >>>>>>>> Example is overseer
>>>>> >>>>>>>>
>>>>> >>>>>>>> There are 3 possible choices for that role
>>>>> >>>>>>>>
>>>>> >>>>>>>> a) preferred: always be in front of the election queue
>>>>> >>>>>>>> b) on: not preferred, but can be an overseer if no preferred
>>>>> overseer nodes are available
>>>>> >>>>>>>> c) off: never become an overseer
>>>>> >>>>>>>>
>>>>> >>>>>>>> Today we only have options 'a' and 'b' . In a future ticket,
>>>>> we may implement C
>>>>> >>>>>>>>
>>>>> >>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com>
>>>>> wrote:
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> Negative roles add a lot of complexity, I would really want
>>>>> to stay away from them. That’s why I want strict roles up front. It’s maybe
>>>>> ok to push this decision out, but it also seems like the sort of thing we
>>>>> should consider at the start.
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <
>>>>> noble.paul@gmail.com> wrote:
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Yes. Negative roles is not a bad idea. If I start a node
>>>>> for machine learning purposes, I wouldn't want that node to ever
>>>>> participate in overseer election
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <
>>>>> ilansolr@gmail.com> wrote:
>>>>> >>>>>>>>>>>
>>>>> >>>>>>>>>>> If we have non strict roles (like overseer), then it does
>>>>> make sense
>>>>> >>>>>>>>>>> to have negative roles.
>>>>> >>>>>>>>>>> That way I can define which are the two nodes that I'd
>>>>> prefer the
>>>>> >>>>>>>>>>> overseer to run on, and a few other nodes on which it
>>>>> should
>>>>> >>>>>>>>>>> definitely never run for various reasons. And in case these
>>>>> >>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let
>>>>> the cluster
>>>>> >>>>>>>>>>> fail the same way it would if there were no data nodes
>>>>> available.
>>>>> >>>>>>>>>>>
>>>>> >>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>>>>> houstonputman@gmail.com> wrote:
>>>>> >>>>>>>>>>> >>>
>>>>> >>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults,
>>>>> users cannot trip themselves up by default, but the option is there for
>>>>> people to tinker and have an iron grip over their cluster.
>>>>> >>>>>>>>>>> >>
>>>>> >>>>>>>>>>> >>
>>>>> >>>>>>>>>>> >> +1 to sensible defaults so users don't trip themselves.
>>>>> The option to tinker for tighter grip can be tackled later, either on a per
>>>>> role basis or as a generic concept later.
>>>>> >>>>>>>>>>> >
>>>>> >>>>>>>>>>> >
>>>>> >>>>>>>>>>> > +1 - Can definitely be added later if we so desire, not
>>>>> needed for this SIP
>>>>> >>>>>>>>>>> >
>>>>> >>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>> >>>>>>>>>>> >>
>>>>> >>>>>>>>>>> >>
>>>>> >>>>>>>>>>> >>
>>>>> >>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <
>>>>> gus.heck@gmail.com> wrote:
>>>>> >>>>>>>>>>> >>>
>>>>> >>>>>>>>>>> >>> I think the key  is to let the roles have full control
>>>>> of the implications of having/not having that role. No need for even a
>>>>> strict/loose designation. The question of do you have the role is yes/no
>>>>> with no logic to guess if the role is implied or not, The question of will
>>>>> >>>>>>>>>>> >>> it come up with the role is "have_explicit ? use_explicit : use_defaults".
>>>>> >>>>>>>>>>> >>>
>>>>> >>>>>>>>>>> >>> Once you figure out who has a role (or not) what that
>>>>> means is up to the role code.
>>>>> >>>>>>>>>>> >>>
>>>>> >>>>>>>>>>> >>> Corollary: we don't have to change the way overseer
>>>>> works in this SIP. We can rework it or not as we see fit separately.
>>>>> >>>>>>>>>>> >>
>>>>> >>>>>>>>>>> >>
>>>>> >>>>>>>>>>> >> +1
>>>>> >>>>>>>>>>> >>
>>>>> >>>>>>>>>>> >>>
>>>>> >>>>>>>>>>> >>>
>>>>> >>>>>>>>>>> >>> Only thing we need to do is find a wording that makes
>>>>> the above clear on first read through the SIP :)
>>>>> >>>>>>>>>>> >>>
>>>>> >>>>>>>>>>> >>> -Gus
>>>>> >>>>>>>>>>> >>>
>>>>> >>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>>> houstonputman@gmail.com> wrote:
>>>>> >>>>>>>>>>> >>>>>
>>>>> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
>>>>> happens if all of our existing OVERSEER candidates are down. When at least
>>>>> one of them is up, the overseer will go there, and that is good and
>>>>> expected. But what happens if all of the overseer eligible nodes are down?
>>>>> Your comment, and the old system, would imply that the overseer election
>>>>> goes to some other unrelated, untagged node. I disagree with this
>>>>> implementation choice. This sounds like something role specific to
>>>>> determine, but I would like to see us be more strict about it. I don't want
>>>>> cores leaking out of my data roles, I don't want query processing to leak
>>>>> out of my "query" nodes or whatever. Overseer shouldn't be special in this
>>>>> regard.
>>>>> >>>>>>>>>>> >>>>
>>>>> >>>>>>>>>>> >>>>
>>>>> >>>>>>>>>>> >>>> I'm very strongly in favor of not letting users
>>>>> design a system in which the cluster can be "live" without an overseer. I
>>>>> understand that the overseer can be taxing to the cluster, but honestly
>>>>> what is the point of having an untaxed cluster that doesn't have an
>>>>> overseer? I can see arguments for the other roles to be stricter about
>>>>> this, but there are also a lot of users who wouldn't want those to be
>>>>> strict either (like "query" nodes).
>>>>> >>>>>>>>>>> >>>>
>>>>> >>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a
>>>>> non-overseer role node HAS to be selected to become overseer, it will try
>>>>> to migrate the overseer job to a node with the overseer role whenever one
>>>>> becomes live.
>>>>> >>>>>>>>>>> >>>>
>>>>> >>>>>>>>>>> >>>> So maybe we don't have special rules per role, but
>>>>> instead roles can either be defined as "Strict" or "Loose" (better names
>>>>> likely exist), and the roles come with a default (Overseer -> Loose, Data
>>>>> -> Strict, Query -> Loose, etc.). And it is up to each role to define how
>>>>> to behave when running in LOOSE mode and a non-role node is used then a
>>>>> role node comes online (like the overseer example given above).
>>>>> >>>>>>>>>>> >>>>
>>>>> >>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults,
>>>>> users cannot trip themselves up by default, but the option is there for
>>>>> people to tinker and have an iron grip over their cluster.
>>>>> >>>>>>>>>>> >>>>
>>>>> >>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <
>>>>> mdrob@mdrob.com> wrote:
>>>>> >>>>>>>>>>> >>>>>
>>>>> >>>>>>>>>>> >>>>> Noble wrote:
>>>>> >>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role"
>>>>> works today. We are just changing the definition and standardizing the
>>>>> configuration & discoverability
>>>>> >>>>>>>>>>> >>>>> Ishan wrote:
>>>>> >>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the
>>>>> OVERSEER role (which currently stands for preferred overseer). We can take
>>>>> a stab at refactoring it later.
>>>>> >>>>>>>>>>> >>>>>
>>>>> >>>>>>>>>>> >>>>> Grouping these two comments together, since I think
>>>>> they are saying the same thing. I think this is part of my confusion. We
>>>>> have an old system that doesn't work the way we want the new system to
>>>>> work. There may be people already using the old system. What path do we
>>>>> offer for folks using the old system to migrate to the new system? What
>>>>> happens if somebody accidentally tries to use both systems at the same time?
>>>>> >>>>>>>>>>> >>>>>
>>>>> >>>>>>>>>>> >>>>> Ishan wrote:
>>>>> >>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with
>>>>> OVERSEER role] are live, Solr guarantees that one of those nodes becomes
>>>>> the overseer.", I meant to somewhat capture the current behaviour as the
>>>>> OVERSEER role performs today. Do you see any inconsistency with this
>>>>> statement vs. what it does today?
>>>>> >>>>>>>>>>> >>>>>
>>>>> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
>>>>> happens if all of our existing OVERSEER candidates are down. When at least
>>>>> one of them is up, the overseer will go there, and that is good and
>>>>> expected. But what happens if all of the overseer eligible nodes are down?
>>>>> Your comment, and the old system, would imply that the overseer election
>>>>> goes to some other unrelated, untagged node. I disagree with this
>>>>> implementation choice. This sounds like something role specific to
>>>>> determine, but I would like to see us be more strict about it. I don't want
>>>>> cores leaking out of my data roles, I don't want query processing to leak
>>>>> out of my "query" nodes or whatever. Overseer shouldn't be special in this
>>>>> regard.
>>>>> >>>>>>>>>>> >>>>>
>>>>> >>>>>>>>>>> >>>>> Noble wrote:
>>>>> >>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a
>>>>> node in the following request?
>>>>> >>>>>>>>>>> >>>>>
>>>>> >>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this
>>>>> out. Let's leave it as is.
>>>>> >>>>>>>>>>> >>>>>
>>>>> >>>>>>>>>>> >>>>>
>>>>> >>>>>>>>>>> >>>>>
>>>>> >>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya
>>>>> <ic...@gmail.com> wrote:
>>>>> >>>>>>>>>>> >>>>>>
>>>>> >>>>>>>>>>> >>>>>>
>>>>> >>>>>>>>>>> >>>>>>
>>>>> >>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <
>>>>> mdrob@mdrob.com> wrote:
>>>>> >>>>>>>>>>> >>>>>>>
>>>>> >>>>>>>>>>> >>>>>>> Replying to the top post in this thread because
>>>>> there has been a lot of discussion and I don't want to look like I'm
>>>>> continuing any of those particular threads.
>>>>> >>>>>>>>>>> >>>>>>>
>>>>> >>>>>>>>>>> >>>>>>> I finally had time to sit down and think about
>>>>> this with the attention it deserves and am generally happy with how the
>>>>> conversation has shaped the current proposal.
>>>>> >>>>>>>>>>> >>>>>>>
>>>>> >>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define
>>>>> node roles is fine and I like that data is the default role when not
>>>>> defined. I think it is important to hold on to the guarantee that an active
>>>>> overseer will land on an overseer node role.
>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration
>>>>> path for folks using the current OVERSEER role. I am not sure that
>>>>> something can be done automatically since they need to now specify new
>>>>> properties at startup. Maybe we need to include loud warnings or support
>>>>> both approaches for a time?
>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the
>>>>> overseer nodes fail, then it is implied the overseer will go to one of the
>>>>> data nodes. The specific wording in the SIP - "When one or more such nodes
>>>>> are live, Solr guarantees that one of those nodes become the overseer."
>>>>> implies to me that failover could go from overseer1 to overseer2 to
>>>>> overseerN to random node. I feel like we need to have some recording that
>>>>> there were dedicated overseer nodes and stop the cascading failure instead
>>>>> of churning through our data nodes.
>>>>> >>>>>>>>>>> >>>>>>>
>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the
>>>>> proposed scope of "coordinator" roles from a split query/indexing
>>>>> standpoint. I understand that these are used as examples, but would like
>>>>> stronger language that new roles should also go through their own SIP
>>>>> discussions.
>>>>> >>>>>>>>>>> >>>>>>>
>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing
>>>>> node liveness in two different places now. We have the live nodes and we
>>>>> have the node roles stored in two different places in zookeeper and it
>>>>> feels like this would lead to race conditions or split brain or other hard
>>>>> to diagnose bugs when those two lists don't agree with each other. This
>>>>> also feels like it contradicts the "single source of truth" idea later
>>>>> stated in the proposal. I see Gus's arguments for decoupling these and am
>>>>> not strongly opposed, I just get a lurking feeling about it. Even if we
>>>>> don't do this, I would like this called out explicitly in the alternative
>>>>> approaches section as something that we considered and rejected, with
>>>>> details why.
>>>>> >>>>>>>>>>> >>>>>>>
>>>>> >>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an
>>>>> additional call out here that all operations are GET because nodes cannot
>>>>> be changed at runtime.
>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the
>>>>> previous OVERSEER preference role?
>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list
>>>>> of available roles for a cluster. I _think_ this could be based on the
>>>>> version that the cluster is running? Would be useful to be able to
>>>>> interrogate a cluster in the future... we're seeing OOM issues on queries,
>>>>> can we add some query nodes? When were they introduced? I don't know what
>>>>> path this API should exist at.
>>>>> >>>>>>>>>>> >>>>>>
>>>>> >>>>>>>>>>> >>>>>>
>>>>> >>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API,
>>>>> updated the SIP document. Not sure if there's a better path that we could
>>>>> go for.
>>>>> >>>>>>>>>>> >>>>>>
>>>>> >>>>>>>>>>> >>>>>>>
>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly
>>>>> show which parts are string literals and which parts are meant to be
>>>>> substituted by the operator? GET /api/cluster/roles/data would become GET
>>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET
>>>>> /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename}
>>>>> dropping the intermediate "nodes"
>>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not
>>>>> need that intermediate "nodes" node.
>>>>> >>>>>>>>>>> >>>>>>>
>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
>>>>> permissions? Maybe this requirement is too fundamental to the operation of
>>>>> a cluster and everybody would have to be able to do it.
>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other
>>>>> clients) to treat roles? Implementation detail that the servers will figure
>>>>> out? Or strict guidance where the client needs to check where specific
>>>>> roles are before sending any further communication to the server?
>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a
>>>>> request that it can't fulfil? An overseer node gets a query or an update. A
>>>>> data node gets a collection creation request. Do they forward it on to an
>>>>> appropriate node, or do they reject it? Should this be configurable? If
>>>>> not, then it seems like lazy or poorly configured clients will defeat this
>>>>> isolation system quite easily.
>>>>> >>>>>>>>>>> >>>>>>>
>>>>> >>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes
>>>>> behave when roles are added mean? I thought we established that they are
>>>>> not dynamic.
>>>>> >>>>>>>>>>> >>>>>>>
>>>>> >>>>>>>>>>> >>>>>>>
>>>>> >>>>>>>>>>> >>>>>>> Thanks,
>>>>> >>>>>>>>>>> >>>>>>> Mike
>>>>> >>>>>>>>>>> >>>>>>>
>>>>> >>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan
>>>>> Chattopadhyaya <ic...@gmail.com> wrote:
>>>>> >>>>>>>>>>> >>>>>>>>
>>>>> >>>>>>>>>>> >>>>>>>> Hi,
>>>>> >>>>>>>>>>> >>>>>>>>
>>>>> >>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node
>>>>> roles:
>>>>> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>> >>>>>>>>>>> >>>>>>>>
>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>> >>>>>>>>>>> >>>>>>>>
>>>>> >>>>>>>>>>> >>>>>>>> We also wish to add first class support for Query
>>>>> nodes that are used to process user queries by forwarding to data nodes,
>>>>> merging/aggregating them and presenting to users. This concept exists as
>>>>> first class citizens in most other search engines. This is a chance for
>>>>> Solr to catch up.
>>>>> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>> >>>>>>>>>>> >>>>>>>>
>>>>> >>>>>>>>>>> >>>>>>>> Regards,
>>>>> >>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>>> >>>>>>>>>>> >>>
>>>>> >>>>>>>>>>> >>>
>>>>> >>>>>>>>>>> >>>
>>>>> >>>>>>>>>>> >>> --
>>>>> >>>>>>>>>>> >>> http://www.needhamsoftware.com (work)
>>>>> >>>>>>>>>>> >>> http://www.the111shift.com (play)
>>>>> >>>>>>>>>>>
>>>>> >>>>>>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>>>> >>>>>>>>>>> For additional commands, e-mail: dev-help@solr.apache.org
>>>>> >>>>>>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> --
>>>>> >>>>>>> http://www.needhamsoftware.com (work)
>>>>> >>>>>>> http://www.the111shift.com (play)
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> --
>>>>> >>>>> -----------------------------------------------------
>>>>> >>>>> Noble Paul
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> --
>>>>> >>>> http://www.needhamsoftware.com (work)
>>>>> >>>> http://www.the111shift.com (play)
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> --
>>>>> >>> -----------------------------------------------------
>>>>> >>> Noble Paul
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> -----------------------------------------------------
>>>>> >> Noble Paul
>>>>>
>>>>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>

Re: First class support for node roles

Posted by Gus Heck <gu...@gmail.com>.

So I think we're losing sight of the original concept of "default" and
conflating it with role configuration.

When we started talking about "default roles" the idea was "default" was a
flag that indicated if the role was active on a Solr Node where no roles
had been specified. Plain and simple. Full stop.

Secondarily, any given role might or might not have some configuration
associated with it. Optionally, a role that accepts configuration may define
default configuration values, but this has nothing to do with "default role".

Default should be an intrinsic binary property of the role as a whole (not
specific to a cluster or a node).
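
As a rough illustration of "default" as an intrinsic yes/no flag, here is a
minimal sketch of how a node's active roles could be resolved. The
ROLE_IS_DEFAULT table and the active_roles function are hypothetical names,
and the role values are taken only from the examples in this thread; none of
this is Solr code.

```python
# Hypothetical table: role name -> is it a "default" role?
# Values follow the examples discussed in this thread, not any Solr API.
ROLE_IS_DEFAULT = {"data": True, "overseer": True, "coordinator": False}


def active_roles(specified=None):
    """Return the set of roles active on a node.

    specified: iterable of role names given at startup, or None when the
    node was started without any roles at all.
    """
    if specified is None:
        # No roles specified: every role flagged as default is active.
        return {name for name, is_default in ROLE_IS_DEFAULT.items() if is_default}
    unknown = set(specified) - set(ROLE_IS_DEFAULT)
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    # Roles were specified explicitly: the default flag plays no part.
    return set(specified)
```

In this sketch the flag is consulted only when nothing was specified, so a
node started with just "coordinator" does not silently pick up "data".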

There are 3 levels to think about

   1. Intrinsic Attributes of the role as a whole (Example  --> default:
   yes/no)
   2. Configurable attributes for the role across the cluster (Example -->
   Strict: yes/no) (concept mentioned previously affecting how presence of a
   role is interpreted by role related code)
   3. Configurable attributes for the node that relate to the role (Example
   --> Election_priority_adjust: integer ) (Hypothetical way of influencing
   who gets elected first in a more fine grained fashion)

Maybe use the following terminology?

   1. Role Intrinsic Property
   2. Role Cluster Config
   3. Role Node Config
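
To make the three levels concrete, here is a small sketch that keeps each
level in its own type. All class and field names are hypothetical and simply
mirror the examples above (default, Strict, Election_priority_adjust); they
are assumptions for illustration, not a proposed implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleIntrinsic:
    """Level 1: fixed by the role implementation, same everywhere."""
    name: str
    default: bool  # active on nodes where no roles were specified?


@dataclass
class RoleClusterConfig:
    """Level 2: configured once for the role across the cluster."""
    strict: bool = False


@dataclass
class RoleNodeConfig:
    """Level 3: configured per node for the role."""
    election_priority_adjust: int = 0


# Example values only; nothing here reflects actual Solr settings.
overseer = RoleIntrinsic(name="overseer", default=True)
cluster_cfg = RoleClusterConfig(strict=True)
node_cfg = RoleNodeConfig(election_priority_adjust=5)
```

Making the intrinsic level immutable (frozen) reflects the idea that it is
coded into the role itself, while the two config levels stay editable.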

We almost certainly have to determine what Role Intrinsic Properties we
want to support, as these are likely to be coded into the role
implementation directly, and implementors of roles should specify these.
(I'm not presently seeing a need for more than "default".)

For the config levels, I think we mostly want to identify where that
information can be communicated and stored. The Role Cluster Config level is
tricky since there's no "cluster" until you start the first "Node", so a bit
of a chicken/egg there. The Role Node Config, however, seems to make sense as
a file that gets read and then reflected in zk as appropriate during node
startup (config that specified the local directory for something would not
need to show up in zk of course, just stuff that another
node/overseer/query router/whatever might need to know).

Definitely let's reword anything that involves the phrase "Two Defaults"
since by definition only one value can be the "default" value (I suppose
theoretically you could have a mapping of defaults conditional on some
other value but that's definitely the opposite of simple).

-Gus

On Mon, Dec 6, 2021 at 12:36 AM Ishan Chattopadhyaya <
ichattopadhyaya@gmail.com> wrote:

> I think I understand Ilan's motivation for two defaults. Here's a summary
> of how I understand Ilan's proposal, and a follow-up proposal that
> achieves a similar effect with less perceived complexity for the user.
>
> *Ilan's proposal (as I understand it):*
>
> 1. Every role to have two defaults. Example:
> data: {modes: [on, off], default1: on, *default2: on*}
> overseer: {modes: [allowed, disallowed, preferred], default1: preferred, *default2:
> disallowed*}
> ui: {modes: [on, off], default1: on, *default2: on*}
>
> 2. Here, default1 is for lazy users for whom "-Dnode.roles=<rolename>"
> will be interpreted as "-Dnode.roles=<rolename>:<default1 of rolename>".
> 3. Here, default2 for any role *role1* is for users who either (a) never
> specified any roles for a node, or (b) specified other roles, but not
> *role1*. In both cases, the behaviour of that node would implicitly
> assume "role1:<default2 of role1>".
>
> *My alternate proposal:*
> 1. There are no role specific defaults. Example:
> data: {modes: [on, off]}
> overseer: {modes: [allowed, disallowed, preferred]}
> ui: {modes: [on, off]}
>
> 2. There is a node specific default roles string *if no *-Dnode.roles was
> specified. Example:
> "data:on, overseer:allowed" (Today's system)
> "data:on, overseer:allowed, ui:on" (When a future role, say "ui" is
> introduced)
>
> 3. If a node was started with explicitly specified roles, that node will
> have exactly those roles (in the specified modes) and nothing else (no
> assumptions about other non-specified roles, i.e. those roles not specified
> will not run).
>
> *Benefits of my proposal:*
> 1. Easier to understand for users.
> 2. Here's a scenario where user will be happier in my proposal vs. Ilan's
> proposal:
>    * 10 nodes with *-Droles=data:on,overseer:allowed*. (Regular data
> nodes)
>    * 2 nodes with *-Droles=overseer:preferred*. (Two dedicated overseer
> nodes)
>    * User upgrades from Solr 9.0 to 9.1, where "ui" role has been
> introduced. Developers of "ui" role want it to be available for most users.
>         - In Ilan's proposal, the developer chooses this in 9.1: ui:
> {modes: [on, off], default1: on, *default2: on*}. Now, user upgrading
> will see that UI is running on his two overseer nodes, and he's confused
> (because he explicitly specified what he wants)
>         - In my proposal, the developer chooses ui: {modes: [on, off]};
> default roles for those users who don't specify roles: "data:on,
> overseer:allowed, ui:on". Now, there are no surprises of implicit
> default. Users who don't use roles at all will get this functionality
> turned on, just as the developer wanted. Users who use roles will have to
> explicitly append "ui:on" to their roles string on their nodes during the
> upgrade (this tip will come from the upgrade notes).
>
> What do you think, Ilan/Noble/Mike/Gus/Houston?
>
> On Mon, Dec 6, 2021 at 8:10 AM Noble Paul <no...@gmail.com> wrote:
>
>> Ilan was asking what the overseer role should be in the following
>> situations:
>>
>> a) role=overseer,data:on
>> b) role=overseer: preferred,data:on
>> c) role=data:on
>>
>> I'm saying (a) shouldn't be valid. Only (b) and (c) are valid.
>>
>>
>>
>>
>>
>>
>> On Mon, Dec 6, 2021, 12:44 PM Mike Drob <md...@mdrob.com> wrote:
>>
>>> Ilan,
>>>
>>> Can you provide a more detailed concrete example? I’m having a lot of
>>> trouble understanding what you are proposing, beyond that it is somehow
>>> contraindicated with what Ishan/Noble suggest.
>>>
>>> Apologies for my failure to understand.
>>>
>>> Thanks,
>>> Mike
>>>
>>> On Sun, Dec 5, 2021 at 5:21 PM Ilan Ginzburg <il...@gmail.com> wrote:
>>>
>>>> If we go with optional role params, we need two defaults:
>>>> 1. the param value to use when the role is specified without a
>>>> parameter, and
>>>> 2. the param value to use for the role on a node for which the role is
>>>> not specified at all.
>>>>
>>>> I don't know how to sensibly name these defaults, but the actual
>>>> values would be:
>>>> overseer: default1=preferred, default2=allowed
>>>> data: default1=on, default2=on
>>>> coordinator: default1=on, default2=off
>>>>
>>>> If we do not allow specifying a role without a parameter, then
>>>> default1 does not exist and the example Noble posted earlier covers
>>>> us. But simple roles will be easier to use without parameters (and the
>>>> transition from existing overseer role would be trivial).
>>>>
>>>> On Sun, Dec 5, 2021 at 7:17 AM Ishan Chattopadhyaya
>>>> <ic...@gmail.com> wrote:
>>>> >
>>>> > I'm +1 on this. It "looks" complicated at first, but simplifies all
>>>> headaches going forward.
>>>> >
>>>> > On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <no...@gmail.com>
>>>> wrote:
>>>> >>
>>>> >> I shall update the SIP proposal if we have a consensus on this
>>>> configuration
>>>> >>
>>>> >> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <no...@gmail.com>
>>>> wrote:
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gu...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> I like this in that it's an example of how the overseer might be
>>>> extended without creating a new role :)
>>>> >>>>
>>>> >>>> Not entirely sure if I'm for or against an enum implementation
>>>> here, but it makes me a bit nervous. Enums with complexity can quickly get
>>>> into difficulty for unit tests (especially if one wanted to write a mock
>>>> object based test, something I think we maybe should use a bit more than we
>>>> do).
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> I would tend to think of a class to represent and collect role
>>>> related functionality, one that perhaps has methods that receive the
>>>> request, or other key objects and thus could be tested without standing up
>>>> an entire server. (Not against also having them exercised in a few
>>>> integrated tests, but the more we can avoid interleaving logic directly
>>>> >>>> within DispatchFilter and HttpSolrCall etc. the better.)
>>>> >>>>
>>>> >>>>
>>>> >>>> So I guess I'm somewhat biased against any enum with more than a
>>>> couple properties, and definitely don't want to wind up hanging lots of
>>>> methods off of one. Better to use them to consume a configuration value and
>>>> then instantiate a class that really holds the logic and data. I like them
>>>> for constraining values and easy string value conversion but the more they
>>>> look like classes the more I'd rather have a class.
>>>> >>>
>>>> >>>
>>>> >>>  I just meant it is a set of values. Please let us not discuss the
>>>> actual impl here . We should stick to discussing the high level design here
>>>> and specifics should be dealt with in a PR
>>>> >>>>
>>>> >>>>
>>>> >>>> -Gus
>>>> >>>>
>>>> >>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <no...@gmail.com>
>>>> wrote:
>>>> >>>>>
>>>> >>>>> I recommend the following format for the role spec
>>>> >>>>>
>>>> >>>>> roles=<role-name>:<role-value>
>>>> >>>>>
>>>> >>>>> each role will have an enum of allowed values and a default value
>>>> >>>>>
>>>> >>>>> role name: data
>>>> >>>>>
>>>> >>>>> values: [on, off]
>>>> >>>>> default: on
>>>> >>>>>
>>>> >>>>> role name: overseer
>>>> >>>>>
>>>> >>>>> values: [allowed, disallowed, preferred]
>>>> >>>>> default : allowed
>>>> >>>>>
>>>> >>>>> role name: coordinator
>>>> >>>>>
>>>> >>>>> values : [on, off]
>>>> >>>>> default: off
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> examples
>>>> >>>>> roles=data:on,overseer:allowed (This is redundant because it uses
>>>> all the default values. If a node is started without any roles value this
>>>> is the default behavior)
>>>> >>>>> roles=data:off,overseer:preferred ( do not allow data, join
>>>> overseer election at head)
>>>> >>>>> roles=coordinator:on,data:on (role as coordinator, but allow
>>>> data, it's same as roles=coordinator:on)
>>>> >>>>> roles=coordinator:on,data:off (role as coordinator, disallow data)
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <il...@gmail.com>
>>>> wrote:
>>>> >>>>>>
>>>> >>>>>> If we go with no negative node roles and overseer node role is
>>>> not strict (i.e. it’s a "preferred overseer"), then one would need to
>>>> define a second node role "no_overseer" to explicitly exclude a node from
>>>> ever becoming overseer (which I think is a useful feature until we switch
>>>> the cluster default to not using the overseer), plus the implementation of
>>>> these two node roles will obviously be coupled (and what if a node has both
>>>> defined?).
>>>> >>>>>>
>>>> >>>>>> I prefer strict node roles.
>>>> >>>>>> Maybe we could have node roles with [optional] parameters to let
>>>> the node role implementation decide ?
>>>> >>>>>> The overseer node role for example could have one of 3 values
>>>> defined for each node: “preferred” (default, equivalent to the existing
>>>> overseer role), "accepted" (equivalent to currently not defining the
>>>> overseer role) and "no_way" (does not exist today).
>>>> >>>>>>
>>>> >>>>>> This could be useful in other contexts. A node role “data” could
>>>> be “fast” or “slow” depending on type of local persistent storage for
>>>> example…
>>>> >>>>>>
>>>> >>>>>> Ilan
>>>> >>>>>>
>>>> >>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:
>>>> >>>>>>>
>>>> >>>>>>> I really don't think we should have types of roles. Not
>>>> negative/positive and not strict/non-strict. You have a role or you don't.
>>>> What that means is up to the code implementing the role.
>>>> >>>>>>>
>>>> >>>>>>> Roles should be free to configure a preference order (binary,
>>>> or n-ary or whatever, strict or loose), prohibit behavior, or enable
>>>> behavior. In this SIP I feel we should focus on How to identify what node
>>>> has what role, How to designate what roles a node has via config/params,
>>>> and the API's for interacting with roles.
>>>> >>>>>>>
>>>> >>>>>>> We should for example be able to support roles such as
>>>> >>>>>>>
>>>> >>>>>>> PREFERRED_OVERSEER
>>>> >>>>>>> DATA
>>>> >>>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to
>>>> suggest)
>>>> >>>>>>>
>>>> >>>>>>> Details about role implementation should probably be discussed
>>>> in a thread about that role.  Obviously we should think about the name
>>>> carefully to leave options open should we want to enhance things later so
>>>> maybe
>>>> >>>>>>>
>>>> >>>>>>> OVERSEER_PREF  or just  OVERSEER
>>>> >>>>>>>
>>>> >>>>>>> would be better since it merely reads that the node implements
>>>> some sort of preference or config regarding overseer... but all this can be
>>>> decided on a per role basis
>>>> >>>>>>>
>>>> >>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <
>>>> noble.paul@gmail.com> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>> Negative roles have a place
>>>> >>>>>>>>
>>>> >>>>>>>> Example is overseer
>>>> >>>>>>>>
>>>> >>>>>>>> There are 3 possible choices for that role
>>>> >>>>>>>>
>>>> >>>>>>>> a) preferred: always be in front of the election queue
>>>> >>>>>>>> b) on: not preferred, but can be an overseer if no preferred
>>>> overseer nodes are available
>>>> >>>>>>>> c) off: never become an overseer
>>>> >>>>>>>>
>>>> >>>>>>>> Today we only have options 'a' and 'b' . In a future ticket,
>>>> we may implement C
>>>> >>>>>>>>
>>>> >>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com>
>>>> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>> Negative roles add a lot of complexity, I would really want
>>>> to stay away from them. That’s why I want strict roles up front. It’s maybe
>>>> ok to push this decision out, but it also seems like the sort of thing we
>>>> should consider at the start.
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <
>>>> noble.paul@gmail.com> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Yes. Negative roles is not a bad idea. If I start a node for
>>>> machine learning purposes, I wouldn't want that node to ever participate in
>>>> overseer election
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <
>>>> ilansolr@gmail.com> wrote:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> If we have non strict roles (like overseer), then it does
>>>> make sense
>>>> >>>>>>>>>>> to have negative roles.
>>>> >>>>>>>>>>> That way I can define which are the two nodes that I'd
>>>> prefer the
>>>> >>>>>>>>>>> overseer to run on, and a few other nodes on which it should
>>>> >>>>>>>>>>> definitely never run for various reasons. And in case these
>>>> >>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let the
>>>> cluster
>>>> >>>>>>>>>>> fail the same way it would if there were no data nodes
>>>> available.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>>>> houstonputman@gmail.com> wrote:
>>>> >>>>>>>>>>> >>>
>>>> >>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults,
>>>> users cannot trip themselves up by default, but the option is there for
>>>> people to tinker and have an iron grip over their cluster.
>>>> >>>>>>>>>>> >>
>>>> >>>>>>>>>>> >>
>>>> >>>>>>>>>>> >> +1 to sensible defaults so users don't trip themselves.
>>>> The option to tinker for tighter grip can be tackled later, either on a per
>>>> role basis or as a generic concept later.
>>>> >>>>>>>>>>> >
>>>> >>>>>>>>>>> >
>>>> >>>>>>>>>>> > +1 - Can definitely be added later if we so desire, not
>>>> needed for this SIP
>>>> >>>>>>>>>>> >
>>>> >>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>> ichattopadhyaya@gmail.com> wrote:
>>>> >>>>>>>>>>> >>
>>>> >>>>>>>>>>> >>
>>>> >>>>>>>>>>> >>
>>>> >>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <
>>>> gus.heck@gmail.com> wrote:
>>>> >>>>>>>>>>> >>>
>>>> >>>>>>>>>>> >>> I think the key  is to let the roles have full control
>>>> of the implications of having/not having that role. No need for even a
>>>> strict/loose designation. The question of do you have the role is yes/no
>>>> with no logic to guess if the role is implied or not, The question of will
>>>> it come up with the role is "have_explicit ? use_defaults : use_defaults.
>>>> >>>>>>>>>>> >>>
>>>> >>>>>>>>>>> >>> Once you figure out who has a role (or not) what that
>>>> means is up to the role code.
>>>> >>>>>>>>>>> >>>
>>>> >>>>>>>>>>> >>> Corollary: we don't have to change the way overseer
>>>> works in this SIP. We can rework it or not as we see fit separately.
>>>> >>>>>>>>>>> >>
>>>> >>>>>>>>>>> >>
>>>> >>>>>>>>>>> >> +1
>>>> >>>>>>>>>>> >>
>>>> >>>>>>>>>>> >>>
>>>> >>>>>>>>>>> >>>
>>>> >>>>>>>>>>> >>> Only thing we need to do is find a wording that makes
>>>> the above clear on first read through the SIP :)
>>>> >>>>>>>>>>> >>>
>>>> >>>>>>>>>>> >>> -Gus
>>>> >>>>>>>>>>> >>>
>>>> >>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>> houstonputman@gmail.com> wrote:
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
>>>> happens if all of our existing OVERSEER candidates are down. When at least
>>>> one of them is up, the overseer will go there, and that is good and
>>>> expected. But what happens if all of the overseer eligible nodes are down.
>>>> Your comment, and the old system, would imply that the overseer election
>>>> goes to some other unrelated, untagged node. I disagree with this
>>>> implementation choice. This sounds like something role specific to
>>>> determine, but I would like to see us be more strict about it. I don't want
>>>> cores leaking out of my data roles, I don't want query processing to leak
>>>> out of my "query" nodes or whatever. Overseer shouldn't be special in this
>>>> regard.
>>>> >>>>>>>>>>> >>>>
>>>> >>>>>>>>>>> >>>>
>>>> >>>>>>>>>>> >>>> I'm very strongly in favor of not letting users design
>>>> a system in which the cluster can be "live" without an overseer. I
>>>> understand that the overseer can be taxing to the cluster, but honestly
>>>> what is the point of having an untaxed cluster that doesn't have an
>>>> overseer? I can see arguments for the other roles to be stricter about
>>>> this, but there are also a lot of users who wouldn't want those to be
>>>> strict either (like "query" nodes).
>>>> >>>>>>>>>>> >>>>
>>>> >>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a
>>>> non-overseer role node HAS to be selected to become overseer, it will try
>>>> to migrate the overseer job to a node with the overseer role whenever one
>>>> becomes live.
>>>> >>>>>>>>>>> >>>>
>>>> >>>>>>>>>>> >>>> So maybe we don't have special rules per role, but
>>>> instead roles can either be defined as "Strict" or "Loose" (better names
>>>> likely exist), and the roles come with a default (Overseer -> Loose, Data
>>>> -> Strict, Query -> Loose, etc.). And it is up to each role to define how
>>>> to behave when running in LOOSE mode and a non-role node is used then a
>>>> role node comes online (like the overseer example given above).
>>>> >>>>>>>>>>> >>>>
>>>> >>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults,
>>>> users cannot trip themselves up by default, but the option is there for
>>>> people to tinker and have an iron grip over their cluster.
>>>> >>>>>>>>>>> >>>>
>>>> >>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <
>>>> mdrob@mdrob.com> wrote:
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> Noble wrote:
>>>> >>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role"
>>>> works today. We are just changing the definition and standardizing the
>>>> configuration & discoverability
>>>> >>>>>>>>>>> >>>>> Ishan wrote:
>>>> >>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the
>>>> OVERSEER role (which currently stands for preferred overseer). We can take
>>>> a stab at refactoring it later.
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> Grouping these two comments together, since I think
>>>> they are saying the same thing. I think this is part of my confusion. We
>>>> have an old system that doesn't work the way we want the new system to
>>>> work. There may be people already using the old system. What path do we
>>>> offer for folks using the old system to migrate to the new system? What
>>>> happens if somebody accidentally tries to use both systems at the same time?
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> Ishan wrote:
>>>> >>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with
>>>> OVERSEER role] are live, Solr guarantees that one of those nodes becomes
>>>> the overseer.", I meant to somewhat capture the current behaviour as the
>>>> OVERSEER role performs today. Do you see any inconsistency with this
>>>> statement vs. what it does today?
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
>>>> happens if all of our existing OVERSEER candidates are down. When at least
>>>> one of them is up, the overseer will go there, and that is good and
>>>> expected. But what happens if all of the overseer eligible nodes are down.
>>>> Your comment, and the old system, would imply that the overseer election
>>>> goes to some other unrelated, untagged node. I disagree with this
>>>> implementation choice. This sounds like something role specific to
>>>> determine, but I would like to see us be more strict about it. I don't want
>>>> cores leaking out of my data roles, I don't want query processing to leak
>>>> out of my "query" nodes or whatever. Overseer shouldn't be special in this
>>>> regard.
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> Noble wrote:
>>>> >>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a
>>>> node in the following request?
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this
>>>> out. Let's leave it as is.
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>>> ichattopadhyaya@gmail.com> wrote:
>>>> >>>>>>>>>>> >>>>>>
>>>> >>>>>>>>>>> >>>>>>
>>>> >>>>>>>>>>> >>>>>>
>>>> >>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <
>>>> mdrob@mdrob.com> wrote:
>>>> >>>>>>>>>>> >>>>>>>
>>>> >>>>>>>>>>> >>>>>>> Replying to the top post in this thread because
>>>> there has been a lot of discussion and I don't want to look like I'm
>>>> continuing any of those particular threads.
>>>> >>>>>>>>>>> >>>>>>>
>>>> >>>>>>>>>>> >>>>>>> I finally had time to sit down and think about this
>>>> with the attention it deserves and am generally happy with how the
>>>> conversation has shaped the current proposal.
>>>> >>>>>>>>>>> >>>>>>>
>>>> >>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define
>>>> node roles is fine and I like that data is the default role when not
>>>> defined. I think it is important to hold on to the guarantee that an active
>>>> overseer will land on an overseer node role.
>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration
>>>> path for folks using the current OVERSEER role. I am not sure that
>>>> something can be done automatically since they need to now specify new
>>>> properties at startup. Maybe we need to include loud warnings or support
>>>> both approaches for a time?
>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the
>>>> overseer nodes fail, then it is implied the overseer will go to one of the
>>>> data nodes. The specific wording in the SIP - "When one or more such nodes
>>>> are live, Solr guarantees that one of those nodes become the overseer."
>>>> implies to me that failover could go from overseer1 to overseer2 to
>>>> overseerN to random node. I feel like we need to have some recording that
>>>> there were dedicated overseer nodes and stop the cascading failure instead
>>>> of churning through our data nodes.
>>>> >>>>>>>>>>> >>>>>>>
>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the
>>>> proposed scope of "coordinator" roles from a split query/indexing
>>>> standpoint. I understand that these are used as examples, but would like
>>>> stronger language that new roles should also go through their own SIP
>>>> discussions.
>>>> >>>>>>>>>>> >>>>>>>
>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing
>>>> node liveness in two different places now. We have the live nodes and we
>>>> have the node roles stored in two different places in zookeeper and it
>>>> feels like this would lead to race conditions or split brain or other hard
>>>> to diagnose bugs when those two lists don't agree with each other. This
>>>> also feels like it contradicts the "single source of truth" idea later
>>>> stated in the proposal. I see Gus's arguments for decoupling these and am
>>>> not strongly opposed, I just get a lurking feeling about it. Even if we
>>>> don't do this, I would like this called out explicitly in the alternative
>>>> approaches section as something that we considered and rejected, with
>>>> details why,
>>>> >>>>>>>>>>> >>>>>>>
>>>> >>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an
>>>> additional call out here that all operations are GET because nodes cannot
>>>> be changed at runtime.
>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the
>>>> previous OVERSEER preference role?
>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list
>>>> of available roles for a cluster. I _think_ this could be based on the
>>>> version that the cluster is running? Would be useful to be able to
>>>> interrogate a cluster in the future... we're seeing OOM issues on queries,
>>>> can we add some query nodes? When were they introduced? I don't know what
>>>> path this API should exist at.
>>>> >>>>>>>>>>> >>>>>>
>>>> >>>>>>>>>>> >>>>>>
>>>> >>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API,
>>>> updated the SIP document. Not sure if there's a better path that we could
>>>> go for.
>>>> >>>>>>>>>>> >>>>>>
>>>> >>>>>>>>>>> >>>>>>>
>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show
>>>> which parts are string literals and which parts are meant to be substituted
>>>> by the operator? GET /api/cluster/roles/data would become GET
>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET
>>>> /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename}
>>>> dropping the intermediate "nodes"
>>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not
>>>> need that intermediate "nodes" node.
>>>> >>>>>>>>>>> >>>>>>>
>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
>>>> permissions? Maybe this requirement is too fundamental to the operation of
>>>> a cluster and everybody would have to be able to do it.
>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other
>>>> clients) to treat roles? Implementation detail that the servers will figure
>>>> out? Or strict guidance where the client needs to check where specific
>>>> roles are before sending any further communication to the server?
>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a
>>>> request that it can't fulfil? An overseer node gets a query or an update. A
>>>> data node gets a collection creation request. Do they forward it on to an
>>>> appropriate node, or do they reject it? Should this be configurable? If
>>>> not, then it seems like lazy or poorly configured clients will defeat this
>>>> isolation system quite easily.
>>>> >>>>>>>>>>> >>>>>>>
>>>> >>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes
>>>> behave when roles are added mean? I thought we established that they are
>>>> not dynamic.
>>>> >>>>>>>>>>> >>>>>>>
>>>> >>>>>>>>>>> >>>>>>>
>>>> >>>>>>>>>>> >>>>>>> Thanks,
>>>> >>>>>>>>>>> >>>>>>> Mike
>>>> >>>>>>>>>>> >>>>>>>
>>>> >>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan
>>>> Chattopadhyaya <ic...@gmail.com> wrote:
>>>> >>>>>>>>>>> >>>>>>>>
>>>> >>>>>>>>>>> >>>>>>>> Hi,
>>>> >>>>>>>>>>> >>>>>>>>
>>>> >>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node
>>>> roles:
>>>> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>> >>>>>>>>>>> >>>>>>>>
>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>> >>>>>>>>>>> >>>>>>>>
>>>> >>>>>>>>>>> >>>>>>>> We also wish to add first class support for Query
>>>> nodes that are used to process user queries by forwarding to data nodes,
>>>> merging/aggregating them and presenting to users. This concept exists as
>>>> first class citizens in most other search engines. This is a chance for
>>>> Solr to catch up.
>>>> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>> >>>>>>>>>>> >>>>>>>>
>>>> >>>>>>>>>>> >>>>>>>> Regards,
>>>> >>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>> >>>>>>>>>>> >>>
>>>> >>>>>>>>>>> >>>
>>>> >>>>>>>>>>> >>>
>>>> >>>>>>>>>>> >>> --
>>>> >>>>>>>>>>> >>> http://www.needhamsoftware.com (work)
>>>> >>>>>>>>>>> >>> http://www.the111shift.com (play)
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> ---------------------------------------------------------------------
>>>> >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>>> >>>>>>>>>>> For additional commands, e-mail: dev-help@solr.apache.org
>>>> >>>>>>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> --
>>>> >>>>>>> http://www.needhamsoftware.com (work)
>>>> >>>>>>> http://www.the111shift.com (play)
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> --
>>>> >>>>> -----------------------------------------------------
>>>> >>>>> Noble Paul
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> --
>>>> >>>> http://www.needhamsoftware.com (work)
>>>> >>>> http://www.the111shift.com (play)
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> -----------------------------------------------------
>>>> >>> Noble Paul
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> -----------------------------------------------------
>>>> >> Noble Paul
>>>>
>>>>
>>>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

I think I understand Ilan's motivation for two defaults. Here's a summary
of Ilan's proposal as I understand it, and a follow-up proposal that
achieves a similar effect with less perceived complexity for the user.

*Ilan's proposal (as I understand it):*

1. Every role has two defaults. Example:
data: {modes: [on, off], default1: on, *default2: on*}
overseer: {modes: [allowed, disallowed, preferred], default1: preferred, *default2: disallowed*}
ui: {modes: [on, off], default1: on, *default2: on*}

2. Here, default1 is for lazy users, for whom "-Dnode.roles=<rolename>" will
be interpreted as "-Dnode.roles=<rolename>:<default1 of rolename>".
3. Here, default2 for any role *role1* is for users who either (a) never
specified any roles for a node, or (b) specified other roles, but not
*role1*. In both cases, that node would implicitly behave as if
"role1:<default2 of role1>" had been specified.
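
To make the two-defaults rule concrete, here is a minimal sketch in Python
(not Solr code). The table copies the example values from point 1 above; the
names ROLE_SPECS and resolve_two_defaults are hypothetical and exist only for
illustration.

```python
# Hypothetical role table, copied from the example values in point 1.
ROLE_SPECS = {
    "data": {"modes": ["on", "off"], "default1": "on", "default2": "on"},
    "overseer": {
        "modes": ["allowed", "disallowed", "preferred"],
        "default1": "preferred",
        "default2": "disallowed",
    },
    "ui": {"modes": ["on", "off"], "default1": "on", "default2": "on"},
}


def resolve_two_defaults(roles_string):
    """Resolve a roles string (or None) under the two-defaults scheme."""
    resolved = {}
    for token in (roles_string or "").split(","):
        token = token.strip()
        if not token:
            continue
        name, sep, mode = token.partition(":")
        name, mode = name.strip(), mode.strip()
        if name not in ROLE_SPECS:
            raise ValueError(f"unknown role: {name!r}")
        if not sep:
            # Role named without a mode: default1 applies.
            mode = ROLE_SPECS[name]["default1"]
        if mode not in ROLE_SPECS[name]["modes"]:
            raise ValueError(f"invalid mode {mode!r} for role {name!r}")
        resolved[name] = mode
    # Roles not mentioned at all: default2 applies.
    for name, spec in ROLE_SPECS.items():
        resolved.setdefault(name, spec["default2"])
    return resolved
```

With this table, a node started with only "overseer" would resolve to
overseer:preferred (default1) plus data:on and ui:on (default2), which is
the implicit behaviour discussed below.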

*My alternate proposal:*
1. There are no role specific defaults. Example:
data: {modes: [on, off]}
overseer: {modes: [allowed, disallowed, preferred]}
ui: {modes: [on, off]}

2. There is a node-specific default roles string that applies *only if no*
-Dnode.roles was specified. Example:
"data:on, overseer:allowed" (Today's system)
"data:on, overseer:allowed, ui:on" (When a future role, say "ui" is
introduced)

3. If a node was started with explicitly specified roles, that node will
have exactly those roles (in the specified modes) and nothing else (no
assumptions about other non-specified roles, i.e. those roles not specified
will not run).
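
As a rough sketch of these three points in Python (not Solr code): the names
ROLE_MODES, DEFAULT_ROLES_STRING and resolve_explicit_only are hypothetical.
Rejecting a role given without a mode is my assumption here, since without
role-specific defaults there is no mode to fall back on.

```python
# Hypothetical table of allowed modes per role; no per-role defaults.
ROLE_MODES = {
    "data": ["on", "off"],
    "overseer": ["allowed", "disallowed", "preferred"],
    "ui": ["on", "off"],
}

# Node-level default, used only when no roles string was given at all.
DEFAULT_ROLES_STRING = "data:on, overseer:allowed, ui:on"


def resolve_explicit_only(roles_string=None):
    """Return exactly the roles that were specified, or the node default."""
    spec = DEFAULT_ROLES_STRING if roles_string is None else roles_string
    resolved = {}
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        name, sep, mode = token.partition(":")
        name, mode = name.strip(), mode.strip()
        if name not in ROLE_MODES:
            raise ValueError(f"unknown role: {name!r}")
        if not sep:
            # Assumption: a bare role name is rejected, as there is no
            # role-specific default mode to apply.
            raise ValueError(f"role {name!r} needs an explicit mode")
        if mode not in ROLE_MODES[name]:
            raise ValueError(f"invalid mode {mode!r} for role {name!r}")
        resolved[name] = mode
    # Roles that were not listed are simply absent, i.e. they do not run.
    return resolved
```

A node started with "overseer:preferred" resolves to just that one role; a
node started with no roles string gets the full default string.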

*Benefits of my proposal:*
1. Easier to understand for users.
2. Here's a scenario where the user will be happier with my proposal than
with Ilan's proposal:
   * 10 nodes with *-Droles=data:on,overseer:allowed*. (Regular data nodes)
   * 2 nodes with *-Droles=overseer:preferred*. (Two dedicated overseer
nodes)
   * User upgrades from Solr 9.0 to 9.1, where the "ui" role has been
introduced. Developers of the "ui" role want it to be available for most users.
        - In Ilan's proposal, the developer chooses this in 9.1: ui:
{modes: [on, off], default1: on, *default2: on*}. Now, the upgrading user will
see that the UI is running on his two overseer nodes, and he's confused
(because he explicitly specified what he wants).
        - In my proposal, the developer chooses ui: {modes: [on, off]};
default roles for those users who don't specify roles: "data:on,
overseer:allowed, ui:on". Now, there are no surprises from implicit defaults.
Users who don't use roles at all will get this functionality turned on,
just as the developer wanted. Users who use roles will have to explicitly
append "ui:on" to their roles string on their nodes during the upgrade
(this tip will come from the upgrade notes).
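
The upgrade scenario can be checked with a small Python sketch (not Solr
code). UI_DEFAULT2, NODE_DEFAULT_ROLES and the function names are
hypothetical, and the "ui" role itself is only the example used in this
mail.

```python
# Hypothetical 9.1 settings for the example "ui" role.
UI_DEFAULT2 = "on"  # per-role implicit default (Ilan's proposal)
NODE_DEFAULT_ROLES = "data:on,overseer:allowed,ui:on"  # node default (my proposal)


def parse(roles_string):
    """Parse "name:mode,name:mode" into a dict; None or empty gives {}."""
    if not roles_string:
        return {}
    pairs = (token.split(":", 1) for token in roles_string.split(","))
    return {name.strip(): mode.strip() for name, mode in pairs}


def ui_mode_per_role_default(roles_string):
    # Unspecified roles silently pick up their default2 value.
    return parse(roles_string).get("ui", UI_DEFAULT2)


def ui_mode_node_default(roles_string):
    # The default string is used only when no roles were given at all.
    effective = NODE_DEFAULT_ROLES if roles_string is None else roles_string
    return parse(effective).get("ui")  # None means the role does not run


dedicated_overseer = "overseer:preferred"
print(ui_mode_per_role_default(dedicated_overseer))  # UI appears implicitly
print(ui_mode_node_default(dedicated_overseer))      # UI does not run
print(ui_mode_node_default(None))                    # no roles given: UI on
```

The dedicated overseer node gets the UI under the per-role default and
does not get it under the node-level default, unless "ui:on" is appended
explicitly.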

What do you think, Ilan/Noble/Mike/Gus/Houston?

On Mon, Dec 6, 2021 at 8:10 AM Noble Paul <no...@gmail.com> wrote:

> Ilan was asking what the overseer role should be in the following
> situations
>
> a) role=overseer,data:on
> b) role=overseer: preferred,data:on
> c) role=data:on
>
> I'm saying a shouldn't be valid. Only b & c are valid
>
>
>
>
>
>
> On Mon, Dec 6, 2021, 12:44 PM Mike Drob <md...@mdrob.com> wrote:
>
>> Ilan,
>>
>> Can you provide a more detailed concrete example? I’m having a lot of
>> trouble understanding what you are proposing, beyond that it is somehow
>> contraindicated with what Ishan/Noble suggest.
>>
>> Apologies for my failure to understand.
>>
>> Thanks,
>> Mike
>>
>> On Sun, Dec 5, 2021 at 5:21 PM Ilan Ginzburg <il...@gmail.com> wrote:
>>
>>> If we go with optional role params, we need two defaults:
>>> 1. the param value to use when the role is specified without a
>>> parameter, and
>>> 2. the param value to use for the role on a node for which the role is
>>> not specified at all.
>>>
>>> I don't know how to sensibly name these defaults, but the actual
>>> values would be:
>>> overseer: default1=preferred, default2=allowed
>>> data: default1=on, default2=on
>>> coordinator: default1=on, default2=off
>>>
>>> If we do not allow specifying a role without a parameter, then
>>> default1 does not exist and the example Noble posted earlier covers
>>> us. But simple roles will be easier to use without parameters (and the
>>> transition from existing overseer role would be trivial).
>>>
>>> On Sun, Dec 5, 2021 at 7:17 AM Ishan Chattopadhyaya
>>> <ic...@gmail.com> wrote:
>>> >
>>> > I'm +1 on this. It "looks" complicated at first, but simplifies all
>>> headaches going forward.
>>> >
>>> > On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <no...@gmail.com>
>>> wrote:
>>> >>
>>> >> I shall update the SIP proposal if we have a consensus on this
>>> configuration
>>> >>
>>> >> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <no...@gmail.com>
>>> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gu...@gmail.com> wrote:
>>> >>>>
>>> >>>> I like this in that it's an example of how the overseer might be
>>> extended without creating a new role :)
>>> >>>>
>>> >>>> Not entirely sure if I'm for or against an enum implementation
>>> here, but it makes me a bit nervous. Enums with complexity can quickly get
>>> into difficulty for unit tests (especially if one wanted to write a mock
>>> object based test, something I think we maybe should use a bit more than we
>>> do).
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> I would tend to think of a class to represent and collect role
>>> related functionality, one that perhaps has methods that receive the
>>> request, or other key objects and thus could be tested without standing up
>>> an entire server. (Not against also having them exercised in a few
>>> integrated tests, but the more we can avoid interleaving logic directly
>>> within DispatchFilter and HttpSolrCall etc. the better.)
>>> >>>>
>>> >>>>
>>> >>>> So I guess I'm somewhat biased against any enum with more than a
>>> couple properties, and definitely don't want to wind up hanging lots of
>>> methods off of one. Better to use them to consume a configuration value and
>>> then instantiate a class that really holds the logic and data. I like them
>>> for constraining values and easy string value conversion but the more they
>>> look like classes the more I'd rather have a class.
>>> >>>
>>> >>>
>>> >>>  I just meant it is a set of values. Please let us not discuss the
>>> actual impl here . We should stick to discussing the high level design here
>>> and specifics should be dealt with in a PR
>>> >>>>
>>> >>>>
>>> >>>> -Gus
>>> >>>>
>>> >>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <no...@gmail.com>
>>> wrote:
>>> >>>>>
>>> >>>>> I recommend the following format for the role spec
>>> >>>>>
>>> >>>>> roles=<role-name>:<role-value>
>>> >>>>>
>>> >>>>> each role will have an enum of allowed values and a default value
>>> >>>>>
>>> >>>>> role name: data
>>> >>>>>
>>> >>>>> values: [on, off]
>>> >>>>> default: on
>>> >>>>>
>>> >>>>> role name: overseer
>>> >>>>>
>>> >>>>> values: [allowed, disallowed, preferred]
>>> >>>>> default : allowed
>>> >>>>>
>>> >>>>> role name: coordinator
>>> >>>>>
>>> >>>>> values : [on, off]
>>> >>>>> default: off
>>> >>>>>
>>> >>>>>
>>> >>>>> examples
>>> >>>>> roles=data:on,overseer:allowed (This is redundant because it uses
>>> all the default values. If a node is started without any roles value this
>>> is the default behavior)
>>> >>>>> roles=data:off,overseer:preferred ( do not allow data, join
>>> overseer election at head)
>>> >>>>> roles=coordinator:on,data:on (role as coordinator, but allow data,
>>> it's same as roles=coordinator:on)
>>> >>>>> roles=coordinator:on,data:off (role as coordinator, disallow data)
>>> >>>>>
>>> >>>>>
>>> >>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <il...@gmail.com>
>>> wrote:
>>> >>>>>>
>>> >>>>>> If we go with no negative node roles and overseer node role is
>>> not strict (i.e. it’s a "preferred overseer"), then one would need to
>>> define a second node role "no_overseer" to explicitly exclude a node from
>>> ever becoming overseer (which I think is a useful feature until we switch
>>> the cluster default to not using the overseer), plus the implementation of
>>> these two node roles will obviously be coupled (and what if a node has both
>>> defined?).
>>> >>>>>>
>>> >>>>>> I prefer strict node roles.
>>> >>>>>> Maybe we could have node roles with [optional] parameters to let
>>> the node role implementation decide ?
>>> >>>>>> The overseer node role for example could have one of 3 values
>>> defined for each node: “preferred” (default, equivalent to the existing
>>> overseer role), "accepted" (equivalent to currently not defining the
>>> overseer role) and "no_way" (does not exist today).
>>> >>>>>>
>>> >>>>>> This could be useful in other contexts. A node role “data” could
>>> be “fast” or “slow” depending on type of local persistent storage for
>>> example…
>>> >>>>>>
>>> >>>>>> Ilan
>>> >>>>>>
>>> >>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:
>>> >>>>>>>
>>> >>>>>>> I really don't think we should have types of roles. Not
>>> negative/positive and not strict/non-strict. You have a role or you don't.
>>> What that means is up to the code implementing the role.
>>> >>>>>>>
>>> >>>>>>> Roles should be free to configure a preference order (binary, or
>>> n-ary or whatever, strict or loose), prohibit behavior, or enable behavior.
>>> In this SIP I feel we should focus on How to identify what node has what
>>> role, How to designate what roles a node has via config/params, and the
>>> API's for interacting with roles.
>>> >>>>>>>
>>> >>>>>>> We should for example be able to support roles such as
>>> >>>>>>>
>>> >>>>>>> PREFERRED_OVERSEER
>>> >>>>>>> DATA
>>> >>>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to
>>> suggest)
>>> >>>>>>>
>>> >>>>>>> Details about role implementation should probably be discussed
>>> in a thread about that role.  Obviously we should think about the name
>>> carefully to leave options open should we want to enhance things later so
>>> maybe
>>> >>>>>>>
>>> >>>>>>> OVERSEER_PREF  or just  OVERSEER
>>> >>>>>>>
>>> >>>>>>> would be better since it merely reads that the node implements
>>> some sort of preference or config regarding overseer... but all this can be
>>> decided on a per role basis
>>> >>>>>>>
>>> >>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com>
>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> Negative roles have a place
>>> >>>>>>>>
>>> >>>>>>>> Example is overseer
>>> >>>>>>>>
>>> >>>>>>>> There are 3 possible choices for that role
>>> >>>>>>>>
>>> >>>>>>>> a) preferred: always be in front of the election queue
>>> >>>>>>>> b) on: not preferred, but can be an overseer if no preferred
>>> overseer nodes are available
>>> >>>>>>>> c) off: never become an overseer
>>> >>>>>>>>
>>> >>>>>>>> Today we only have options 'a' and 'b' . In a future ticket, we
>>> may implement 'c'.
>>> >>>>>>>>
>>> >>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com>
>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> Negative roles add a lot of complexity, I would really want to
>>> stay away from them. That’s why I want strict roles up front. It’s maybe ok
>>> to push this decision out, but it also seems like the sort of thing we
>>> should consider at the start.
>>> >>>>>>>>>
>>> >>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <
>>> noble.paul@gmail.com> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Yes. Negative roles are not a bad idea. If I start a node for
>>> machine learning purposes, I wouldn't want that node to ever participate in
>>> overseer election
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <
>>> ilansolr@gmail.com> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> If we have non strict roles (like overseer), then it does
>>> make sense
>>> >>>>>>>>>>> to have negative roles.
>>> >>>>>>>>>>> That way I can define which are the two nodes that I'd
>>> prefer the
>>> >>>>>>>>>>> overseer to run on, and a few other nodes on which it should
>>> >>>>>>>>>>> definitely never run for various reasons. And in case these
>>> >>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let the
>>> cluster
>>> >>>>>>>>>>> fail the same way it would if there were no data nodes
>>> available.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>>> houstonputman@gmail.com> wrote:
>>> >>>>>>>>>>> >>>
>>> >>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults,
>>> users cannot trip themselves up by default, but the option is there for
>>> people to tinker and have an iron grip over their cluster.
>>> >>>>>>>>>>> >>
>>> >>>>>>>>>>> >>
>>> >>>>>>>>>>> >> +1 to sensible defaults so users don't trip themselves.
>>> The option to tinker for tighter grip can be tackled later, either on a per
>>> role basis or as a generic concept later.
>>> >>>>>>>>>>> >
>>> >>>>>>>>>>> >
>>> >>>>>>>>>>> > +1 - Can definitely be added later if we so desire, not
>>> needed for this SIP
>>> >>>>>>>>>>> >
>>> >>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>> ichattopadhyaya@gmail.com> wrote:
>>> >>>>>>>>>>> >>
>>> >>>>>>>>>>> >>
>>> >>>>>>>>>>> >>
>>> >>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <
>>> gus.heck@gmail.com> wrote:
>>> >>>>>>>>>>> >>>
>>> >>>>>>>>>>> >>> I think the key  is to let the roles have full control
>>> of the implications of having/not having that role. No need for even a
>>> strict/loose designation. The question of do you have the role is yes/no
>>> with no logic to guess if the role is implied or not. The question of will
>>> it come up with the role is "have_explicit ? use_explicit : use_defaults".
>>> >>>>>>>>>>> >>>
>>> >>>>>>>>>>> >>> Once you figure out who has a role (or not) what that
>>> means is up to the role code.
>>> >>>>>>>>>>> >>>
>>> >>>>>>>>>>> >>> Corollary: we don't have to change the way overseer
>>> works in this SIP. We can rework it or not as we see fit separately.
>>> >>>>>>>>>>> >>
>>> >>>>>>>>>>> >>
>>> >>>>>>>>>>> >> +1
>>> >>>>>>>>>>> >>
>>> >>>>>>>>>>> >>>
>>> >>>>>>>>>>> >>>
>>> >>>>>>>>>>> >>> Only thing we need to do is find a wording that makes
>>> the above clear on first read through the SIP :)
>>> >>>>>>>>>>> >>>
>>> >>>>>>>>>>> >>> -Gus
>>> >>>>>>>>>>> >>>
>>> >>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>> houstonputman@gmail.com> wrote:
>>> >>>>>>>>>>> >>>>>
>>> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
>>> happens if all of our existing OVERSEER candidates are down. When at least
>>> one of them is up, the overseer will go there, and that is good and
>>> expected. But what happens if all of the overseer-eligible nodes are down?
>>> Your comment, and the old system, would imply that the overseer election
>>> goes to some other unrelated, untagged node. I disagree with this
>>> implementation choice. This sounds like something role specific to
>>> determine, but I would like to see us be more strict about it. I don't want
>>> cores leaking out of my data roles, I don't want query processing to leak
>>> out of my "query" nodes or whatever. Overseer shouldn't be special in this
>>> regard.
>>> >>>>>>>>>>> >>>>
>>> >>>>>>>>>>> >>>>
>>> >>>>>>>>>>> >>>> I'm very strongly in favor of not letting users design
>>> a system in which the cluster can be "live" without an overseer. I
>>> understand that the overseer can be taxing to the cluster, but honestly
>>> what is the point of having an untaxed cluster that doesn't have an
>>> overseer? I can see arguments for the other roles to be stricter about
>>> this, but there are also a lot of users who wouldn't want those to be
>>> strict either (like "query" nodes).
>>> >>>>>>>>>>> >>>>
>>> >>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a
>>> non-overseer role node HAS to be selected to become overseer, it will try
>>> to migrate the overseer job to a node with the overseer role whenever one
>>> becomes live.
>>> >>>>>>>>>>> >>>>
>>> >>>>>>>>>>> >>>> So maybe we don't have special rules per role, but
>>> instead roles can either be defined as "Strict" or "Loose" (better names
>>> likely exist), and the roles come with a default (Overseer -> Loose, Data
>>> -> Strict, Query -> Loose, etc.). And it is up to each role to define how
>>> to behave when running in LOOSE mode and a non-role node is used and then a
>>> role node comes online (like the overseer example given above).
>>> >>>>>>>>>>> >>>>
>>> >>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults,
>>> users cannot trip themselves up by default, but the option is there for
>>> people to tinker and have an iron grip over their cluster.
>>> >>>>>>>>>>> >>>>
>>> >>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <
>>> mdrob@mdrob.com> wrote:
>>> >>>>>>>>>>> >>>>>
>>> >>>>>>>>>>> >>>>> Noble wrote:
>>> >>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role"
>>> works today. We are just changing the definition and standardizing the
>>> configuration & discoverability
>>> >>>>>>>>>>> >>>>> Ishan wrote:
>>> >>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the
>>> OVERSEER role (which currently stands for preferred overseer). We can take
>>> a stab at refactoring it later.
>>> >>>>>>>>>>> >>>>>
>>> >>>>>>>>>>> >>>>> Grouping these two comments together, since I think
>>> they are saying the same thing. I think this is part of my confusion. We
>>> have an old system that doesn't work the way we want the new system to
>>> work. There may be people already using the old system. What path do we
>>> offer for folks using the old system to migrate to the new system? What
>>> happens if somebody accidentally tries to use both systems at the same time?
>>> >>>>>>>>>>> >>>>>
>>> >>>>>>>>>>> >>>>> Ishan wrote:
>>> >>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with
>>> OVERSEER role] are live, Solr guarantees that one of those nodes becomes
>>> the overseer.", I meant to somewhat capture the current behaviour as the
>>> OVERSEER role performs today. Do you see any inconsistency with this
>>> statement vs. what it does today?
>>> >>>>>>>>>>> >>>>>
>>> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
>>> happens if all of our existing OVERSEER candidates are down. When at least
>>> one of them is up, the overseer will go there, and that is good and
>>> expected. But what happens if all of the overseer-eligible nodes are down?
>>> Your comment, and the old system, would imply that the overseer election
>>> goes to some other unrelated, untagged node. I disagree with this
>>> implementation choice. This sounds like something role specific to
>>> determine, but I would like to see us be more strict about it. I don't want
>>> cores leaking out of my data roles, I don't want query processing to leak
>>> out of my "query" nodes or whatever. Overseer shouldn't be special in this
>>> regard.
>>> >>>>>>>>>>> >>>>>
>>> >>>>>>>>>>> >>>>> Noble wrote:
>>> >>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a
>>> node in the following request?
>>> >>>>>>>>>>> >>>>>
>>> >>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this
>>> out. Let's leave it as is.
>>> >>>>>>>>>>> >>>>>
>>> >>>>>>>>>>> >>>>>
>>> >>>>>>>>>>> >>>>>
>>> >>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>> ichattopadhyaya@gmail.com> wrote:
>>> >>>>>>>>>>> >>>>>>
>>> >>>>>>>>>>> >>>>>>
>>> >>>>>>>>>>> >>>>>>
>>> >>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <
>>> mdrob@mdrob.com> wrote:
>>> >>>>>>>>>>> >>>>>>>
>>> >>>>>>>>>>> >>>>>>> Replying to the top post in this thread because
>>> there has been a lot of discussion and I don't want to look like I'm
>>> continuing any of those particular threads.
>>> >>>>>>>>>>> >>>>>>>
>>> >>>>>>>>>>> >>>>>>> I finally had time to sit down and think about this
>>> with the attention it deserves and am generally happy with how the
>>> conversation has shaped the current proposal.
>>> >>>>>>>>>>> >>>>>>>
>>> >>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define node
>>> roles is fine and I like that data is the default role when not defined. I
>>> think it is important to hold on to the guarantee that an active overseer
>>> will land on an overseer node role.
>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path
>>> for folks using the current OVERSEER role. I am not sure that something can
>>> be done automatically since they need to now specify new properties at
>>> startup. Maybe we need to include loud warnings or support both approaches
>>> for a time?
>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the
>>> overseer nodes fail, then it is implied the overseer will go to one of the
>>> data nodes. The specific wording in the SIP - "When one or more such nodes
>>> are live, Solr guarantees that one of those nodes become the overseer."
>>> implies to me that failover could go from overseer1 to overseer2 to
>>> overseerN to random node. I feel like we need to have some recording that
>>> there were dedicated overseer nodes and stop the cascading failure instead
>>> of churning through our data nodes.
>>> >>>>>>>>>>> >>>>>>>
>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the
>>> proposed scope of "coordinator" roles from a split query/indexing
>>> standpoint. I understand that these are used as examples, but would like
>>> stronger language that new roles should also go through their own SIP
>>> discussions.
>>> >>>>>>>>>>> >>>>>>>
>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing
>>> node liveness in two different places now. We have the live nodes and we
>>> have the node roles stored in two different places in zookeeper and it
>>> feels like this would lead to race conditions or split brain or other hard
>>> to diagnose bugs when those two lists don't agree with each other. This
>>> also feels like it contradicts the "single source of truth" idea later
>>> stated in the proposal. I see Gus's arguments for decoupling these and am
>>> not strongly opposed, I just get a lurking feeling about it. Even if we
>>> don't do this, I would like this called out explicitly in the alternative
>>> approaches section as something that we considered and rejected, with
>>> details why.
>>> >>>>>>>>>>> >>>>>>>
>>> >>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an
>>> additional call out here that all operations are GET because nodes cannot
>>> be changed at runtime.
>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the
>>> previous OVERSEER preference role?
>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of
>>> available roles for a cluster. I _think_ this could be based on the version
>>> that the cluster is running? Would be useful to be able to interrogate a
>>> cluster in the future... we're seeing OOM issues on queries, can we add
>>> some query nodes? When were they introduced? I don't know what path this
>>> API should exist at.
>>> >>>>>>>>>>> >>>>>>
>>> >>>>>>>>>>> >>>>>>
>>> >>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated
>>> the SIP document. Not sure if there's a better path that we could go for.
>>> >>>>>>>>>>> >>>>>>
>>> >>>>>>>>>>> >>>>>>>
>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show
>>> which parts are string literals and which parts are meant to be substituted
>>> by the operator? GET /api/cluster/roles/data would become GET
>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET
>>> /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename}
>>> dropping the intermediate "nodes"
>>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need
>>> that intermediate "nodes" node.
>>> >>>>>>>>>>> >>>>>>>
>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
>>> permissions? Maybe this requirement is too fundamental to the operation of
>>> a cluster and everybody would have to be able to do it.
>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other
>>> clients) to treat roles? Implementation detail that the servers will figure
>>> out? Or strict guidance where the client needs to check where specific
>>> roles are before sending any further communication to the server?
>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a
>>> request that it can't fulfil? An overseer node gets a query or an update. A
>>> data node gets a collection creation request. Do they forward it on to an
>>> appropriate node, or do they reject it? Should this be configurable? If
>>> not, then it seems like lazy or poorly configured clients will defeat this
>>> isolation system quite easily.
>>> >>>>>>>>>>> >>>>>>>
>>> >>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>> >>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes
>>> behave when roles are added mean? I thought we established that they are
>>> not dynamic.
>>> >>>>>>>>>>> >>>>>>>
>>> >>>>>>>>>>> >>>>>>>
>>> >>>>>>>>>>> >>>>>>> Thanks,
>>> >>>>>>>>>>> >>>>>>> Mike
>>> >>>>>>>>>>> >>>>>>>
>>> >>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya
>>> <ic...@gmail.com> wrote:
>>> >>>>>>>>>>> >>>>>>>>
>>> >>>>>>>>>>> >>>>>>>> Hi,
>>> >>>>>>>>>>> >>>>>>>>
>>> >>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node
>>> roles:
>>> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>> >>>>>>>>>>> >>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>> >>>>>>>>>>> >>>>>>>>
>>> >>>>>>>>>>> >>>>>>>> We also wish to add first class support for Query
>>> nodes that are used to process user queries by forwarding to data nodes,
>>> merging/aggregating them and presenting to users. This concept exists as
>>> first class citizens in most other search engines. This is a chance for
>>> Solr to catch up.
>>> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>> >>>>>>>>>>> >>>>>>>>
>>> >>>>>>>>>>> >>>>>>>> Regards,
>>> >>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>> >>>>>>>>>>> >>>
>>> >>>>>>>>>>> >>>
>>> >>>>>>>>>>> >>>
>>> >>>>>>>>>>> >>> --
>>> >>>>>>>>>>> >>> http://www.needhamsoftware.com (work)
>>> >>>>>>>>>>> >>> http://www.the111shift.com (play)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> ---------------------------------------------------------------------
>>> >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>> >>>>>>>>>>> For additional commands, e-mail: dev-help@solr.apache.org
>>> >>>>>>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> http://www.needhamsoftware.com (work)
>>> >>>>>>> http://www.the111shift.com (play)
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> -----------------------------------------------------
>>> >>>>> Noble Paul
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> http://www.needhamsoftware.com (work)
>>> >>>> http://www.the111shift.com (play)
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> -----------------------------------------------------
>>> >>> Noble Paul
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> -----------------------------------------------------
>>> >> Noble Paul

Re: First class support for node roles

Posted by Noble Paul <no...@gmail.com>.

Ilan was asking what the overseer role should be in the following
situations:

a) roles=overseer,data:on
b) roles=overseer:preferred,data:on
c) roles=data:on

I'm saying (a) shouldn't be valid. Only (b) and (c) are valid.
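
To make the three cases above concrete, here is a minimal sketch of how a
role spec could be validated under the rule that every listed role must carry
an explicit value. It is illustrative only and is not Solr code: the function
name parse_roles and the two tables are hypothetical, the allowed values come
from the format proposed earlier in this thread, and the "on" default for
data is an assumption based on the examples given there.

```python
# Hypothetical sketch of the proposed role-spec rules; not Solr's actual API.
# Allowed values per role, as proposed earlier in the thread.
ROLE_VALUES = {
    "data": ("on", "off"),
    "overseer": ("allowed", "disallowed", "preferred"),
    "coordinator": ("on", "off"),
}

# Value used when a role is not mentioned at all (data default assumed "on").
ROLE_DEFAULTS = {"data": "on", "overseer": "allowed", "coordinator": "off"}


def parse_roles(spec):
    """Parse 'name:value,name:value' into a dict of role -> value.

    Roles missing from the spec get their default value. A role listed
    without a value (case a) or with an unknown value is rejected.
    """
    roles = dict(ROLE_DEFAULTS)
    if not spec:
        return roles
    for entry in spec.split(","):
        name, sep, value = entry.partition(":")
        name, value = name.strip(), value.strip()
        if name not in ROLE_VALUES:
            raise ValueError(f"unknown role: {name!r}")
        if not sep or not value:
            raise ValueError(f"role {name!r} must be given an explicit value")
        if value not in ROLE_VALUES[name]:
            raise ValueError(f"invalid value {value!r} for role {name!r}")
        roles[name] = value
    return roles
```

With this sketch, the spec in (b) yields overseer=preferred with data on, the
spec in (c) yields the all-defaults configuration, and the spec in (a) raises
a ValueError because overseer has no value.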

On Mon, Dec 6, 2021, 12:44 PM Mike Drob <md...@mdrob.com> wrote:

> Ilan,
>
> Can you provide a more detailed concrete example? I’m having a lot of
> trouble understanding what you are proposing, beyond that it is somehow
> contraindicated with what Ishan/Noble suggest.
>
> Apologies for my failure to understand.
>
> Thanks,
> Mike
>
> On Sun, Dec 5, 2021 at 5:21 PM Ilan Ginzburg <il...@gmail.com> wrote:
>
>> If we go with optional role params, we need two defaults:
>> 1. the param value to use when the role is specified without a parameter,
>> and
>> 2. the param value to use for the role on a node for which the role is
>> not specified at all.
>>
>> I don't know how to sensibly name these defaults, but the actual
>> values would be:
>> overseer: default1=preferred, default2=allowed
>> data: default1=on, default2=on
>> coordinator: default1=on, default2=off
>>
>> If we do not allow specifying a role without a parameter, then
>> default1 does not exist and the example Noble posted earlier covers
>> us. But simple roles will be easier to use without parameters (and the
>> transition from existing overseer role would be trivial).
>>
>> On Sun, Dec 5, 2021 at 7:17 AM Ishan Chattopadhyaya
>> <ic...@gmail.com> wrote:
>> >
>> > I'm +1 on this. It "looks" complicated at first, but simplifies all
>> headaches going forward.
>> >
>> > On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <no...@gmail.com>
>> wrote:
>> >>
>> >> I shall update the SIP proposal if we have a consensus on this
>> configuration
>> >>
>> >> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <no...@gmail.com>
>> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gu...@gmail.com> wrote:
>> >>>>
>> >>>> I like this in that it's an example of how the overseer might be
>> extended without creating a new role :)
>> >>>>
>> >>>> Not entirely sure if I'm for or against an enum implementation here,
>> but it makes me a bit nervous. Enums with complexity can quickly get into
>> difficulty for unit tests (especially if one wanted to write a mock object
>> based test, something I think we maybe should use a bit more than we do).
>> >>>>
>> >>>>
>> >>>>
>> >>>> I would tend to think of a class to represent and collect role
>> related functionality, one that perhaps has methods that receive the
>> request, or other key objects and thus could be tested without standing up
>> an entire server. (Not against also having them exercised in a few
>> integrated tests, but the more we can avoid interleaving logic directly
>> within DispatchFilter and HttpSolrCall etc. the better.)
>> >>>>
>> >>>>
>> >>>> So I guess I'm somewhat biased against any enum with more than a
>> couple properties, and definitely don't want to wind up hanging lots of
>> methods off of one. Better to use them to consume a configuration value and
>> then instantiate a class that really holds the logic and data. I like them
>> for constraining values and easy string value conversion but the more they
>> look like classes the more I'd rather have a class.
>> >>>
>> >>>
>> >>>  I just meant it is a set of values. Please let us not discuss the
>> actual impl here. We should stick to discussing the high level design here
>> and specifics should be dealt with in a PR
>> >>>>
>> >>>>
>> >>>> -Gus
>> >>>>
>> >>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <no...@gmail.com>
>> wrote:
>> >>>>>
>> >>>>> I recommend the following format for the role spec
>> >>>>>
>> >>>>> roles=<role-name>:<role-value>
>> >>>>>
>> >>>>> each role will have an enum of allowed values and a default value
>> >>>>>
>> >>>>> role name: data
>> >>>>>
>> >>>>> values: [on, off]
>> >>>>> default: on
>> >>>>>
>> >>>>> role name: overseer
>> >>>>>
>> >>>>> values: [allowed, disallowed, preferred]
>> >>>>> default : allowed
>> >>>>>
>> >>>>> role name: coordinator
>> >>>>>
>> >>>>> values : [on, off]
>> >>>>> default: off
>> >>>>>
>> >>>>>
>> >>>>> examples
>> >>>>> roles=data:on,overseer:allowed (This is redundant because it uses
>> all the default values. If a node is started without any roles value this
>> is the default behavior)
>> >>>>> roles=data:off,overseer:preferred ( do not allow data, join
>> overseer election at head)
>> >>>>> roles=coordinator:on,data:on (role as coordinator, but allow data,
>> it's same as roles=coordinator:on)
>> >>>>> roles=coordinator:on,data:off (role as coordinator, disallow data)
>> >>>>>
>> >>>>>
>> >>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <il...@gmail.com>
>> wrote:
>> >>>>>>
>> >>>>>> If we go with no negative node roles and overseer node role is not
>> strict (i.e. it’s a "preferred overseer"), then one would need to define a
>> second node role "no_overseer" to explicitly exclude a node from ever
>> becoming overseer (which I think is a useful feature until we switch the
>> cluster default to not using the overseer), plus the implementation of
>> these two node roles will obviously be coupled (and what if a node has both
>> defined?).
>> >>>>>>
>> >>>>>> I prefer strict node roles.
>> >>>>>> Maybe we could have node roles with [optional] parameters to let
>> the node role implementation decide ?
>> >>>>>> The overseer node role for example could have one of 3 values
>> defined for each node: “preferred” (default, equivalent to the existing
>> overseer role), "accepted" (equivalent to currently not defining the
>> overseer role) and "no_way" (does not exist today).
>> >>>>>>
>> >>>>>> This could be useful in other contexts. A node role “data” could
>> be “fast” or “slow” depending on type of local persistent storage for
>> example…
>> >>>>>>
>> >>>>>> Ilan
>> >>>>>>
>> >>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> I really don't think we should have types of roles. Not
>> negative/positive and not strict/non-strict. You have a role or you don't.
>> What that means is up to the code implementing the role.
>> >>>>>>>
>> >>>>>>> Roles should be free to configure a preference order (binary, or
>> n-ary or whatever, strict or loose), prohibit behavior, or enable behavior.
>> In this SIP I feel we should focus on How to identify what node has what
>> role, How to designate what roles a node has via config/params, and the
>> API's for interacting with roles.
>> >>>>>>>
>> >>>>>>> We should for example be able to support roles such as
>> >>>>>>>
>> >>>>>>> PREFERRED_OVERSEER
>> >>>>>>> DATA
>> >>>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to
>> suggest)
>> >>>>>>>
>> >>>>>>> Details about role implementation should probably be discussed in
>> a thread about that role.  Obviously we should think about the name
>> carefully to leave options open should we want to enhance things later so
>> maybe
>> >>>>>>>
>> >>>>>>> OVERSEER_PREF  or just  OVERSEER
>> >>>>>>>
>> >>>>>>> would be better since it merely reads that the node implements
>> some sort of preference or config regarding overseer... but all this can be
>> decided on a per role basis
>> >>>>>>>
>> >>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com>
>> wrote:
>> >>>>>>>>
>> >>>>>>>> Negative roles have a place
>> >>>>>>>>
>> >>>>>>>> Example is overseer
>> >>>>>>>>
>> >>>>>>>> There are 3 possible choices for that role
>> >>>>>>>>
>> >>>>>>>> a) preferred: always be in front of the election queue
>> >>>>>>>> b) on: not preferred, but can be an overseer if no preferred
>> overseer nodes are available
>> >>>>>>>> c) off: never become an overseer
>> >>>>>>>>
>> >>>>>>>> Today we only have options 'a' and 'b' . In a future ticket, we
>> may implement 'c'.
>> >>>>>>>>
>> >>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Negative roles add a lot of complexity, I would really want to
>> stay away from them. That’s why I want strict roles up front. It’s maybe ok
>> to push this decision out, but it also seems like the sort of thing we
>> should consider at the start.
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com>
>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> Yes. Negative roles are not a bad idea. If I start a node for
>> machine learning purposes, I wouldn't want that node to ever participate in
>> overseer election
>> >>>>>>>>>>
>> >>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com>
>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> If we have non strict roles (like overseer), then it does
>> make sense
>> >>>>>>>>>>> to have negative roles.
>> >>>>>>>>>>> That way I can define which are the two nodes that I'd prefer
>> the
>> >>>>>>>>>>> overseer to run on, and a few other nodes on which it should
>> >>>>>>>>>>> definitely never run for various reasons. And in case these
>> >>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let the
>> cluster
>> >>>>>>>>>>> fail the same way it would if there were no data nodes
>> available.
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>> houstonputman@gmail.com> wrote:
>> >>>>>>>>>>> >>>
>> >>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults, users
>> cannot trip themselves up by default, but the option is there for people to
>> tinker and have an iron grip over their cluster.
>> >>>>>>>>>>> >>
>> >>>>>>>>>>> >>
>> >>>>>>>>>>> >> +1 to sensible defaults so users don't trip themselves.
>> The option to tinker for tighter grip can be tackled later, either on a per
>> role basis or as a generic concept later.
>> >>>>>>>>>>> >
>> >>>>>>>>>>> >
>> >>>>>>>>>>> > +1 - Can definitely be added later if we so desire, not
>> needed for this SIP
>> >>>>>>>>>>> >
>> >>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>> ichattopadhyaya@gmail.com> wrote:
>> >>>>>>>>>>> >>
>> >>>>>>>>>>> >>
>> >>>>>>>>>>> >>
>> >>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <
>> gus.heck@gmail.com> wrote:
>> >>>>>>>>>>> >>>
>> >>>>>>>>>>> >>> I think the key  is to let the roles have full control of
>> the implications of having/not having that role. No need for even a
>> strict/loose designation. The question of do you have the role is yes/no
>> with no logic to guess if the role is implied or not. The question of will
>> it come up with the role is "have_explicit ? use_explicit : use_defaults".
>> >>>>>>>>>>> >>>
>> >>>>>>>>>>> >>> Once you figure out who has a role (or not) what that
>> means is up to the role code.
>> >>>>>>>>>>> >>>
>> >>>>>>>>>>> >>> Corollary: we don't have to change the way overseer works
>> in this SIP. We can rework it or not as we see fit separately.
>> >>>>>>>>>>> >>
>> >>>>>>>>>>> >>
>> >>>>>>>>>>> >> +1
>> >>>>>>>>>>> >>
>> >>>>>>>>>>> >>>
>> >>>>>>>>>>> >>>
>> >>>>>>>>>>> >>> Only thing we need to do is find a wording that makes the
>> above clear on first read through the SIP :)
>> >>>>>>>>>>> >>>
>> >>>>>>>>>>> >>> -Gus
>> >>>>>>>>>>> >>>
>> >>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>> houstonputman@gmail.com> wrote:
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
>> happens if all of our existing OVERSEER candidates are down. When at least
>> one of them is up, the overseer will go there, and that is good and
>> expected. But what happens if all of the overseer-eligible nodes are down?
>> Your comment, and the old system, would imply that the overseer election
>> goes to some other unrelated, untagged node. I disagree with this
>> implementation choice. This sounds like something role specific to
>> determine, but I would like to see us be more strict about it. I don't want
>> cores leaking out of my data roles, I don't want query processing to leak
>> out of my "query" nodes or whatever. Overseer shouldn't be special in this
>> regard.
>> >>>>>>>>>>> >>>>
>> >>>>>>>>>>> >>>>
>> >>>>>>>>>>> >>>> I'm very strongly in favor of not letting users design a
>> system in which the cluster can be "live" without an overseer. I understand
>> that the overseer can be taxing to the cluster, but honestly what is the
>> point of having an untaxed cluster that doesn't have an overseer? I can see
>> arguments for the other roles to be stricter about this, but there are also
>> a lot of users who wouldn't want those to be strict either (like "query"
>> nodes).
>> >>>>>>>>>>> >>>>
>> >>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a
>> non-overseer role node HAS to be selected to become overseer, it will try
>> to migrate the overseer job to a node with the overseer role whenever one
>> becomes live.
>> >>>>>>>>>>> >>>>
>> >>>>>>>>>>> >>>> So maybe we don't have special rules per role, but
>> instead roles can either be defined as "Strict" or "Loose" (better names
>> likely exist), and the roles come with a default (Overseer -> Loose, Data
>> -> Strict, Query -> Loose, etc.). And it is up to each role to define how
>> to behave when running in LOOSE mode and a non-role node is used and then a
>> role node comes online (like the overseer example given above).
>> >>>>>>>>>>> >>>>
>> >>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults,
>> users cannot trip themselves up by default, but the option is there for
>> people to tinker and have an iron grip over their cluster.
>> >>>>>>>>>>> >>>>
>> >>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <
>> mdrob@mdrob.com> wrote:
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> Noble wrote:
>> >>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role"
>> works today. We are just changing the definition and standardizing the
>> configuration & discoverability
>> >>>>>>>>>>> >>>>> Ishan wrote:
>> >>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the
>> OVERSEER role (which currently stands for preferred overseer). We can take
>> a stab at refactoring it later.
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> Grouping these two comments together, since I think
>> they are saying the same thing. I think this is part of my confusion. We
>> have an old system that doesn't work the way we want the new system to
>> work. There may be people already using the old system. What path do we
>> offer for folks using the old system to migrate to the new system? What
>> happens if somebody accidentally tries to use both systems at the same time?
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> Ishan wrote:
>> >>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with
>> OVERSEER role] are live, Solr guarantees that one of those nodes becomes
>> the overseer.", I meant to somewhat capture the current behaviour as the
>> OVERSEER role performs today. Do you see any inconsistency with this
>> statement vs. what it does today?
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
>> happens if all of our existing OVERSEER candidates are down. When at least
>> one of them is up, the overseer will go there, and that is good and
>> expected. But what happens if all of the overseer eligible nodes are down?
>> Your comment, and the old system, would imply that the overseer election
>> goes to some other unrelated, untagged node. I disagree with this
>> implementation choice. This sounds like something role specific to
>> determine, but I would like to see us be more strict about it. I don't want
>> cores leaking out of my data roles, I don't want query processing to leak
>> out of my "query" nodes or whatever. Overseer shouldn't be special in this
>> regard.
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> Noble wrote:
>> >>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a
>> node in the following request?
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this
>> out. Let's leave it as is.
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>> ichattopadhyaya@gmail.com> wrote:
>> >>>>>>>>>>> >>>>>>
>> >>>>>>>>>>> >>>>>>
>> >>>>>>>>>>> >>>>>>
>> >>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <
>> mdrob@mdrob.com> wrote:
>> >>>>>>>>>>> >>>>>>>
>> >>>>>>>>>>> >>>>>>> Replying to the top post in this thread because there
>> has been a lot of discussion and I don't want to look like I'm continuing
>> any of those particular threads.
>> >>>>>>>>>>> >>>>>>>
>> >>>>>>>>>>> >>>>>>> I finally had time to sit down and think about this
>> with the attention it deserves and am generally happy with how the
>> conversation has shaped the current proposal.
>> >>>>>>>>>>> >>>>>>>
>> >>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define node
>> roles is fine and I like that data is the default role when not defined. I
>> think it is important to hold on to the guarantee that an active overseer
>> will land on an overseer node role.
>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path
>> for folks using the current OVERSEER role. I am not sure that something can
>> be done automatically since they need to now specify new properties at
>> startup. Maybe we need to include loud warnings or support both approaches
>> for a time?
>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the
>> overseer nodes fail, then it is implied the overseer will go to one of the
>> data nodes. The specific wording in the SIP - "When one or more such nodes
>> are live, Solr guarantees that one of those nodes become the overseer."
>> implies to me that failover could go from overseer1 to overseer2 to
>> overseerN to random node. I feel like we need to have some recording that
>> there were dedicated overseer nodes and stop the cascading failure instead
>> of churning through our data nodes.
>> >>>>>>>>>>> >>>>>>>
>> >>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed
>> scope of "coordinator" roles from a split query/indexing standpoint. I
>> understand that these are used as examples, but would like stronger
>> language that new roles should also go through their own SIP discussions.
>> >>>>>>>>>>> >>>>>>>
>> >>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node
>> liveness in two different places now. We have the live nodes and we have
>> the node roles stored in two different places in zookeeper and it feels
>> like this would lead to race conditions or split brain or other hard to
>> diagnose bugs when those two lists don't agree with each other. This also
>> feels like it contradicts the "single source of truth" idea later stated in
>> the proposal. I see Gus's arguments for decoupling these and am not
>> strongly opposed, I just get a lurking feeling about it. Even if we don't
>> do this, I would like this called out explicitly in the alternative
>> approaches section as something that we considered and rejected, with
>> details why.
>> >>>>>>>>>>> >>>>>>>
>> >>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an
>> additional call out here that all operations are GET because nodes cannot
>> be changed at runtime.
>> >>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the
>> previous OVERSEER preference role?
>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of
>> available roles for a cluster. I _think_ this could be based on the version
>> that the cluster is running? Would be useful to be able to interrogate a
>> cluster in the future... we're seeing OOM issues on queries, can we add
>> some query nodes? When were they introduced? I don't know what path this
>> API should exist at.
>> >>>>>>>>>>> >>>>>>
>> >>>>>>>>>>> >>>>>>
>> >>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated
>> the SIP document. Not sure if there's a better path that we could go for.
>> >>>>>>>>>>> >>>>>>
>> >>>>>>>>>>> >>>>>>>
>> >>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show
>> which parts are string literals and which parts are meant to be substituted
>> by the operator? GET /api/cluster/roles/data would become GET
>> /api/cluster/roles/${rolename} in our SIP/documentation.
>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET
>> /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename}
>> dropping the intermediate "nodes"
>> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need
>> that intermediate "nodes" node.
>> >>>>>>>>>>> >>>>>>>
>> >>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
>> permissions? Maybe this requirement is too fundamental to the operation of
>> a cluster and everybody would have to be able to do it.
>> >>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other
>> clients) to treat roles? Implementation detail that the servers will figure
>> out? Or strict guidance where the client needs to check where specific
>> roles are before sending any further communication to the server?
>> >>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a
>> request that it can't fulfil? An overseer node gets a query or an update. A
>> data node gets a collection creation request. Do they forward it on to an
>> appropriate node, or do they reject it? Should this be configurable? If
>> not, then it seems like lazy or poorly configured clients will defeat this
>> isolation system quite easily.
>> >>>>>>>>>>> >>>>>>>
>> >>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>> >>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave
>> when roles are added mean? I thought we established that they are not
>> dynamic.
>> >>>>>>>>>>> >>>>>>>
>> >>>>>>>>>>> >>>>>>>
>> >>>>>>>>>>> >>>>>>> Thanks,
>> >>>>>>>>>>> >>>>>>> Mike
>> >>>>>>>>>>> >>>>>>>
>> >>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>> ichattopadhyaya@gmail.com> wrote:
>> >>>>>>>>>>> >>>>>>>>
>> >>>>>>>>>>> >>>>>>>> Hi,
>> >>>>>>>>>>> >>>>>>>>
>> >>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node
>> roles:
>> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>> >>>>>>>>>>> >>>>>>>>
>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>> >>>>>>>>>>> >>>>>>>>
>> >>>>>>>>>>> >>>>>>>> We also wish to add first class support for Query
>> nodes that are used to process user queries by forwarding to data nodes,
>> merging/aggregating them and presenting to users. This concept exists as
>> first class citizens in most other search engines. This is a chance for
>> Solr to catch up.
>> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>> >>>>>>>>>>> >>>>>>>>
>> >>>>>>>>>>> >>>>>>>> Regards,
>> >>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh

Re: First class support for node roles

Posted by Mike Drob <md...@mdrob.com>.

Ilan,

Can you provide a more detailed concrete example? I’m having a lot of
trouble understanding what you are proposing, beyond that it is somehow
contraindicated with what Ishan/Noble suggest.

Apologies for my failure to understand.

Thanks,
Mike

On Sun, Dec 5, 2021 at 5:21 PM Ilan Ginzburg <il...@gmail.com> wrote:

> If we go with optional role params, we need two defaults:
> 1. the param value to use when the role is specified without a parameter,
> and
> 2. the param value to use for the role on a node for which the role is
> not specified at all.
>
> I don't know how to sensibly name these defaults, but the actual
> values would be:
> overseer: default1=preferred, default2=allowed
> data: default1=on, default2=on
> coordinator: default1=on, default2=off
>
> If we do not allow specifying a role without a parameter, then
> default1 does not exist and the example Noble posted earlier covers
> us. But simple roles will be easier to use without parameters (and the
> transition from existing overseer role would be trivial).
>
> On Sun, Dec 5, 2021 at 7:17 AM Ishan Chattopadhyaya
> <ic...@gmail.com> wrote:
> >
> > I'm +1 on this. It "looks" complicated at first, but simplifies all
> headaches going forward.
> >
> > On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <no...@gmail.com> wrote:
> >>
> >> I shall update the SIP proposal if we have a consensus on this
> configuration
> >>
> >> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <no...@gmail.com> wrote:
> >>>
> >>>
> >>>
> >>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gu...@gmail.com> wrote:
> >>>>
> >>>> I like this in that it's an example of how the overseer might be
> extended without creating a new role :)
> >>>>
> >>>> Not entirely sure if I'm for or against an enum implementation here,
> but it makes me a bit nervous. Enums with complexity can quickly get into
> difficulty for unit tests (especially if one wanted to write a mock object
> based test, something I think we maybe should use a bit more than we do).
> >>>>
> >>>>
> >>>>
> >>>> I would tend to think of a class to represent and collect role
> related functionality, one that perhaps has methods that receive the
> request, or other key objects and thus could be tested without standing up
> an entire server. (Not against also having them exercised in a few
> integrated tests, but the more we can avoid interleaving logic directly
> within DispatchFilter and HttpSolrCall etc. the better.)
> >>>>
> >>>>
> >>>> So I guess I'm somewhat biased against any enum with more than a
> couple properties, and definitely don't want to wind up hanging lots of
> methods off of one. Better to use them to consume a configuration value and
> then instantiate a class that really holds the logic and data. I like them
> for constraining values and easy string value conversion but the more they
> look like classes the more I'd rather have a class.
> >>>
> >>>
> >>>  I just meant it is a set of values. Please let us not discuss the
> actual impl here. We should stick to discussing the high-level design here,
> and specifics should be dealt with in a PR.
> >>>>
> >>>>
> >>>> -Gus
> >>>>
> >>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <no...@gmail.com>
> wrote:
> >>>>>
> >>>>> I recommend the following format for the role spec
> >>>>>
> >>>>> roles=<role-name>:<role-value>
> >>>>>
> >>>>> each role will have an enum of allowed values and a default value
> >>>>>
> >>>>> role name: data
> >>>>>
> >>>>> values: [on, off]
> >>>>> default: on
> >>>>>
> >>>>> role name: overseer
> >>>>>
> >>>>> values: [allowed, disallowed, preferred]
> >>>>> default : allowed
> >>>>>
> >>>>> role name: coordinator
> >>>>>
> >>>>> values : [on, off]
> >>>>> default: off
> >>>>>
> >>>>>
> >>>>> examples
> >>>>> roles=data:on,overseer:allowed (This is redundant because it uses
> all the default values. If a node is started without any roles value this
> is the default behavior)
> >>>>> roles=data:off,overseer:preferred (do not allow data, join the overseer
> election at head)
> >>>>> roles=coordinator:on,data:on (role as coordinator, but allow data,
> it's the same as roles=coordinator:on)
> >>>>> roles=coordinator:on,data:off (role as coordinator, disallow data)
> >>>>>
> >>>>>
> >>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <il...@gmail.com>
> wrote:
> >>>>>>
> >>>>>> If we go with no negative node roles and overseer node role is not
> strict (i.e. it’s a "preferred overseer"), then one would need to define a
> second node role "no_overseer" to explicitly exclude a node from ever
> becoming overseer (which I think is a useful feature until we switch the
> cluster default to not using the overseer), plus the implementation of
> these two node roles will obviously be coupled (and what if a node has both
> defined?).
> >>>>>>
> >>>>>> I prefer strict node roles.
> >>>>>> Maybe we could have node roles with [optional] parameters to let
> the node role implementation decide ?
> >>>>>> The overseer node role for example could have one of 3 values
> defined for each node: “preferred” (default, equivalent to the existing
> overseer role), "accepted" (equivalent to currently not defining the
> overseer role) and "no_way" (does not exist today).
> >>>>>>
> >>>>>> This could be useful in other contexts. A node role “data” could be
> “fast” or “slow” depending on type of local persistent storage for example…
> >>>>>>
> >>>>>> Ilan
> >>>>>>
> >>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> I really don't think we should have types of roles. Not
> negative/positive and not strict/non-strict. You have a role or you don't.
> What that means is up to the code implementing the role.
> >>>>>>>
> >>>>>>> Roles should be free to configure a preference order (binary, or
> n-ary or whatever, strict or loose), prohibit behavior, or enable behavior.
> In this SIP I feel we should focus on how to identify what node has what
> role, how to designate what roles a node has via config/params, and the
> APIs for interacting with roles.
> >>>>>>>
> >>>>>>> We should for example be able to support roles such as
> >>>>>>>
> >>>>>>> PREFERRED_OVERSEER
> >>>>>>> DATA
> >>>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
> >>>>>>>
> >>>>>>> Details about role implementation should probably be discussed in
> a thread about that role.  Obviously we should think about the name
> carefully to leave options open should we want to enhance things later so
> maybe
> >>>>>>>
> >>>>>>> OVERSEER_PREF  or just  OVERSEER
> >>>>>>>
> >>>>>>> would be better since it merely reads that the node implements
> some sort of preference or config regarding overseer... but all this can be
> decided on a per role basis
> >>>>>>>
> >>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>> Negative roles have a place
> >>>>>>>>
> >>>>>>>> Example is overseer
> >>>>>>>>
> >>>>>>>> There are 3 possible choices for that role
> >>>>>>>>
> >>>>>>>> a) preferred: always be in front of the election queue
> >>>>>>>> b) on: not preferred, but can be an overseer if no preferred
> overseer nodes are available
> >>>>>>>> c) off: never become an overseer
> >>>>>>>>
> >>>>>>>> Today we only have options 'a' and 'b'. In a future ticket, we
> may implement 'c'.
> >>>>>>>>
> >>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Negative roles add a lot of complexity, I would really want to
> stay away from them. That’s why I want strict roles up front. It’s maybe ok
> to push this decision out, but it also seems like the sort of thing we
> should consider at the start.
> >>>>>>>>>
> >>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com>
> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Yes. Negative roles are not a bad idea. If I start a node for
> machine learning purposes, I wouldn't want that node to ever participate in
> overseer election
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com>
> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> If we have non strict roles (like overseer), then it does make
> sense
> >>>>>>>>>>> to have negative roles.
> >>>>>>>>>>> That way I can define which are the two nodes that I'd prefer
> the
> >>>>>>>>>>> overseer to run on, and a few other nodes on which it should
> >>>>>>>>>>> definitely never run for various reasons. And in case these
> >>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let the
> cluster
> >>>>>>>>>>> fail the same way it would if there were no data nodes
> available.
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
> houstonputman@gmail.com> wrote:
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults, users
> cannot trip themselves up by default, but the option is there for people to
> tinker and have an iron grip over their cluster.
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> +1 to sensible defaults so users don't trip themselves. The
> option to tinker for tighter grip can be tackled later, either on a per
> role basis or as a generic concept later.
> >>>>>>>>>>> >
> >>>>>>>>>>> >
> >>>>>>>>>>> > +1 - Can definitely be added later if we so desire, not
> needed for this SIP
> >>>>>>>>>>> >
> >>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com>
> wrote:
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> I think the key is to let the roles have full control of
> the implications of having/not having that role. No need for even a
> strict/loose designation. The question of "do you have the role" is yes/no,
> with no logic to guess if the role is implied or not. The question of "will
> it come up with the role" is "have_explicit ? use_explicit : use_defaults".
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> Once you figure out who has a role (or not) what that
> means is up to the role code.
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> Corollary: we don't have to change the way overseer works
> in this SIP. We can rework it or not as we see fit separately.
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> +1
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> Only thing we need to do is find a wording that makes the
> above clear on first read through the SIP :)
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> -Gus
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
> houstonputman@gmail.com> wrote:
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
> happens if all of our existing OVERSEER candidates are down. When at least
> one of them is up, the overseer will go there, and that is good and
> expected. But what happens if all of the overseer eligible nodes are down?
> Your comment, and the old system, would imply that the overseer election
> goes to some other unrelated, untagged node. I disagree with this
> implementation choice. This sounds like something role specific to
> determine, but I would like to see us be more strict about it. I don't want
> cores leaking out of my data roles, I don't want query processing to leak
> out of my "query" nodes or whatever. Overseer shouldn't be special in this
> regard.
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> I'm very strongly in favor of not letting users design a
> system in which the cluster can be "live" without an overseer. I understand
> that the overseer can be taxing to the cluster, but honestly what is the
> point of having an untaxed cluster that doesn't have an overseer? I can see
> arguments for the other roles to be stricter about this, but there are also
> a lot of users who wouldn't want those to be strict either (like "query"
> nodes).
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a
> non-overseer role node HAS to be selected to become overseer, it will try
> to migrate the overseer job to a node with the overseer role whenever one
> becomes live.
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> So maybe we don't have special rules per role, but
> instead roles can either be defined as "Strict" or "Loose" (better names
> likely exist), and the roles come with a default (Overseer -> Loose, Data
> -> Strict, Query -> Loose, etc.). And it is up to each role to define how
> to behave when running in LOOSE mode and a non-role node is used then a
> role node comes online (like the overseer example given above).
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults, users
> cannot trip themselves up by default, but the option is there for people to
> tinker and have an iron grip over their cluster.
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com>
> wrote:
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Noble wrote:
> >>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role" works
> today. We are just changing the definition and standardizing the
> configuration & discoverability
> >>>>>>>>>>> >>>>> Ishan wrote:
> >>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the
> OVERSEER role (which currently stands for preferred overseer). We can take
> a stab at refactoring it later.
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Grouping these two comments together, since I think they
> are saying the same thing. I think this is part of my confusion. We have an
> old system that doesn't work the way we want the new system to work. There
> may be people already using the old system. What path do we offer for folks
> using the old system to migrate to the new system? What happens if somebody
> accidentally tries to use both systems at the same time?
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Ishan wrote:
> >>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with
> OVERSEER role] are live, Solr guarantees that one of those nodes becomes
> the overseer.", I meant to somewhat capture the current behaviour as the
> OVERSEER role performs today. Do you see any inconsistency with this
> statement vs. what it does today?
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> This doesn't really address my concern around what
> happens if all of our existing OVERSEER candidates are down. When at least
> one of them is up, the overseer will go there, and that is good and
> expected. But what happens if all of the overseer eligible nodes are down?
> Your comment, and the old system, would imply that the overseer election
> goes to some other unrelated, untagged node. I disagree with this
> implementation choice. This sounds like something role specific to
> determine, but I would like to see us be more strict about it. I don't want
> cores leaking out of my data roles, I don't want query processing to leak
> out of my "query" nodes or whatever. Overseer shouldn't be special in this
> regard.
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Noble wrote:
> >>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a
> node in the following request?
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this out.
> Let's leave it as is.
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <
> mdrob@mdrob.com> wrote:
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> Replying to the top post in this thread because there
> has been a lot of discussion and I don't want to look like I'm continuing
> any of those particular threads.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> I finally had time to sit down and think about this
> with the attention it deserves and am generally happy with how the
> conversation has shaped the current proposal.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define node
> roles is fine and I like that data is the default role when not defined. I
> think it is important to hold on to the guarantee that an active overseer
> will land on an overseer node role.
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path
> for folks using the current OVERSEER role. I am not sure that something can
> be done automatically since they need to now specify new properties at
> startup. Maybe we need to include loud warnings or support both approaches
> for a time?
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the
> overseer nodes fail, then it is implied the overseer will go to one of the
> data nodes. The specific wording in the SIP - "When one or more such nodes
> are live, Solr guarantees that one of those nodes become the overseer."
> implies to me that failover could go from overseer1 to overseer2 to
> overseerN to random node. I feel like we need to have some recording that
> there were dedicated overseer nodes and stop the cascading failure instead
> of churning through our data nodes.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed
> scope of "coordinator" roles from a split query/indexing standpoint. I
> understand that these are used as examples, but would like stronger
> language that new roles should also go through their own SIP discussions.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node
> liveness in two different places now. We have the live nodes and we have
> the node roles stored in two different places in zookeeper and it feels
> like this would lead to race conditions or split brain or other hard to
> diagnose bugs when those two lists don't agree with each other. This also
> feels like it contradicts the "single source of truth" idea later stated in
> the proposal. I see Gus's arguments for decoupling these and am not
> strongly opposed, I just get a lurking feeling about it. Even if we don't
> do this, I would like this called out explicitly in the alternative
> approaches section as something that we considered and rejected, with
> details why.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an
> additional call out here that all operations are GET because nodes cannot
> be changed at runtime.
> >>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the
> previous OVERSEER preference role?
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of
> available roles for a cluster. I _think_ this could be based on the version
> that the cluster is running? Would be useful to be able to interrogate a
> cluster in the future... we're seeing OOM issues on queries, can we add
> some query nodes? When were they introduced? I don't know what path this
> API should exist at.
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated
> the SIP document. Not sure if there's a better path that we could go for.
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show
> which parts are string literals and which parts are meant to be substituted
> by the operator? GET /api/cluster/roles/data would become GET
> /api/cluster/roles/${rolename} in our SIP/documentation.
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET
> /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename}
> dropping the intermediate "nodes"
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need
> that intermediate "nodes" node.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
> permissions? Maybe this requirement is too fundamental to the operation of
> a cluster and everybody would have to be able to do it.
> >>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other
> clients) to treat roles? Implementation detail that the servers will figure
> out? Or strict guidance where the client needs to check where specific
> roles are before sending any further communication to the server?
> >>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request
> that it can't fulfil? An overseer node gets a query or an update. A data
> node gets a collection creation request. Do they forward it on to an
> appropriate node, or do they reject it? Should this be configurable? If
> not, then it seems like lazy or poorly configured clients will defeat this
> isolation system quite easily.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
> >>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave
> when roles are added mean? I thought we established that they are not
> dynamic.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> Thanks,
> >>>>>>>>>>> >>>>>>> Mike
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> Hi,
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node
> roles:
> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
> >>>>>>>>>>> >>>>>>>>
> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> We also wish to add first class support for Query
> nodes that are used to process user queries by forwarding to data nodes,
> merging/aggregating them and presenting to users. This concept exists as
> first class citizens in most other search engines. This is a chance for
> Solr to catch up.
> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> Regards,
> >>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh

Re: First class support for node roles

Posted by Ilan Ginzburg <il...@gmail.com>.

If we go with optional role params, we need two defaults:
1. the param value to use when the role is specified without a parameter, and
2. the param value to use for the role on a node for which the role is
not specified at all.

I don't know how to sensibly name these defaults, but the actual
values would be:
overseer: default1=preferred, default2=allowed
data: default1=on, default2=on
coordinator: default1=on, default2=off

If we do not allow specifying a role without a parameter, then
default1 does not exist and the example Noble posted earlier covers
us. But simple roles will be easier to use without parameters (and the
transition from the existing overseer role would be trivial).
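
To make the two defaults concrete, here is a minimal sketch of how a node
might resolve its roles, assuming the roles=<role-name>:<role-value> format
Noble proposed and additionally allowing a bare role name. The names
ROLE_SPECS, when_bare, when_absent and resolve_roles are made up for
illustration and are not Solr APIs; the values are the ones listed above.

```python
# Hypothetical sketch only: none of these names exist in Solr.
# Each role lists its allowed values plus the two defaults discussed above:
#   when_bare   -> value used when the role is named without a parameter
#   when_absent -> value used when the role is not mentioned at all
ROLE_SPECS = {
    "overseer": {
        "values": {"allowed", "disallowed", "preferred"},
        "when_bare": "preferred",
        "when_absent": "allowed",
    },
    "data": {"values": {"on", "off"}, "when_bare": "on", "when_absent": "on"},
    "coordinator": {"values": {"on", "off"}, "when_bare": "on", "when_absent": "off"},
}


def resolve_roles(spec):
    """Turn a string such as 'overseer,data:off' into a {role: value} dict."""
    # Start from the "role not specified at all" default for every known role.
    resolved = {name: role["when_absent"] for name, role in ROLE_SPECS.items()}
    for item in filter(None, (part.strip() for part in spec.split(","))):
        name, sep, value = item.partition(":")
        name, value = name.strip(), value.strip()
        if name not in ROLE_SPECS:
            raise ValueError(f"unknown role: {name!r}")
        if not sep:
            # Role named without a parameter: fall back to the first default.
            value = ROLE_SPECS[name]["when_bare"]
        if value not in ROLE_SPECS[name]["values"]:
            raise ValueError(f"invalid value {value!r} for role {name!r}")
        resolved[name] = value
    return resolved
```

With this shape, a node started with roles=overseer would resolve to a
preferred overseer that still hosts data, which is why the transition from
the existing overseer role looks trivial to me.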

On Sun, Dec 5, 2021 at 7:17 AM Ishan Chattopadhyaya
<ic...@gmail.com> wrote:
>
> I'm +1 on this. It "looks" complicated at first, but simplifies all headaches going forward.
>
> On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <no...@gmail.com> wrote:
>>
>> I shall update the SIP proposal if we have a consensus on this configuration
>>
>> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <no...@gmail.com> wrote:
>>>
>>>
>>>
>>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gu...@gmail.com> wrote:
>>>>
>>>> I like this in that it's an example of how the overseer might be extended without creating a new role :)
>>>>
>>>> Not entirely sure if I'm for or against an enum implementation here, but it makes me a bit nervous. Enums with complexity can quickly get into difficulty for unit tests (especially if one wanted to write a mock object based test, something I think we maybe should use a bit more than we do).
>>>>
>>>>
>>>>
>>>> I would tend to think of a class to represent and collect role related functionality, one that perhaps has methods that receive the request, or other key objects and thus could be tested without standing up an entire server. (Not against also having them exercised in a few integrated tests, but the more we can avoid interleaving logic directly within DispatchFilter and HttpSolrCall etc. the better.)
>>>>
>>>>
>>>> So I guess I'm somewhat biased against any enum with more than a couple properties, and definitely don't want to wind up hanging lots of methods off of one. Better to use them to consume a configuration value and then instantiate a class that really holds the logic and data. I like them for constraining values and easy string value conversion but the more they look like classes the more I'd rather have a class.
>>>
>>>
>>> I just meant it is a set of values. Please let us not discuss the actual impl here. We should stick to discussing the high level design here and specifics should be dealt with in a PR
>>>>
>>>>
>>>> -Gus
>>>>
>>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <no...@gmail.com> wrote:
>>>>>
>>>>> I recommend the following format for the role spec
>>>>>
>>>>> roles=<role-name>:<role-value>
>>>>>
>>>>> each role will have an enum of allowed values and a default value
>>>>>
>>>>> role name: data
>>>>>
>>>>> values: [on, off]
>>>>> default: on
>>>>>
>>>>> role name: overseer
>>>>>
>>>>> values: [allowed, disallowed, preferred]
>>>>> default : allowed
>>>>>
>>>>> role name: coordinator
>>>>>
>>>>> values : [on, off]
>>>>> default: off
>>>>>
>>>>>
>>>>> examples
>>>>> roles=data:on,overseer:allowed (This is redundant because it uses all the default values. If a node is started without any roles value this is the default behavior)
>>>>> roles=data:off,overseer:preferred ( do not allow data, join overseer election at head)
>>>>> roles=coordinator:on,data:on (role as coordinator, but allow data, it's same as roles=coordinator:on)
>>>>> roles=coordinator:on,data:off (role as coordinator, disallow data)
>>>>>
>>>>>
>>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <il...@gmail.com> wrote:
>>>>>>
>>>>>> If we go with no negative node roles and overseer node role is not strict (i.e. it’s a "preferred overseer"), then one would need to define a second node role "no_overseer" to explicitly exclude a node from ever becoming overseer (which I think is a useful feature until we switch the cluster default to not using the overseer), plus the implementation of these two node roles will obviously be coupled (and what if a node has both defined?).
>>>>>>
>>>>>> I prefer strict node roles.
>>>>>> Maybe we could have node roles with [optional] parameters to let the node role implementation decide ?
>>>>>> The overseer node role for example could have one of 3 values defined for each node: “preferred” (default, equivalent to the existing overseer role), "accepted" (equivalent to currently not defining the overseer role) and "no_way" (does not exist today).
>>>>>>
>>>>>> This could be useful in other contexts. A node role “data” could be “fast” or “slow” depending on type of local persistent storage for example…
>>>>>>
>>>>>> Ilan
>>>>>>
>>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:
>>>>>>>
>>>>>>> I really don't think we should have types of roles. Not negative/positive and not strict/non-strict. You have a role or you don't. What that means is up to the code implementing the role.
>>>>>>>
>>>>>>> Roles should be free to configure a preference order (binary, or n-ary or whatever, strict or loose), prohibit behavior, or enable behavior. In this SIP I feel we should focus on How to identify what node has what role, How to designate what roles a node has via config/params, and the API's for interacting with roles.
>>>>>>>
>>>>>>> We should for example be able to support roles such as
>>>>>>>
>>>>>>> PREFERRED_OVERSEER
>>>>>>> DATA
>>>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
>>>>>>>
>>>>>>> Details about role implementation should probably be discussed in a thread about that role.  Obviously we should think about the name carefully to leave options open should we want to enhance things later so maybe
>>>>>>>
>>>>>>> OVERSEER_PREF  or just  OVERSEER
>>>>>>>
>>>>>>> would be better since it merely reads that the node implements some sort of preference or config regarding overseer... but all this can be decided on a per role basis
>>>>>>>
>>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Negative roles have a place
>>>>>>>>
>>>>>>>> Example is overseer
>>>>>>>>
>>>>>>>> There are 3 possible choices for that role
>>>>>>>>
>>>>>>>> a) preferred: always be in front of the election queue
>>>>>>>> b) on: not preferred, but can be an overseer if no preferred overseer nodes are available
>>>>>>>> c) off: never become an overseer
>>>>>>>>
>>>>>>>> Today we only have options 'a' and 'b' . In a future ticket, we may implement C
>>>>>>>>
>>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
>>>>>>>>>
>>>>>>>>> Negative roles add a lot of complexity, I would really want to stay away from them. That’s why I want strict roles up front. It’s maybe ok to push this decision out, but it also seems like the sort of thing we should consider at the start.
>>>>>>>>>
>>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Yes. Negative roles is not a bad idea. If I start a node for machine learning purposes, I wouldn't want that node to ever participate in overseer election
>>>>>>>>>>
>>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> If we have non strict roles (like overseer), then it does make sense
>>>>>>>>>>> to have negative roles.
>>>>>>>>>>> That way I can define which are the two nodes that I'd prefer the
>>>>>>>>>>> overseer to run on, and a few other nodes on which it should
>>>>>>>>>>> definitely never run for various reasons. And in case these
>>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let the cluster
>>>>>>>>>>> fail the same way it would if there were no data nodes available.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <ho...@gmail.com> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults, users cannot trip themselves up by default, but the option is there for people to tinker and have an iron grip over their cluster.
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> +1 to sensible defaults so users don't trip themselves. The option to tinker for tighter grip can be tackled later, either on a per role basis or as a generic concept later.
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > +1 - Can definitely be added later if we so desire, not needed for this SIP
>>>>>>>>>>> >
>>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <ic...@gmail.com> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I think the key is to let the roles have full control of the implications of having/not having that role. No need for even a strict/loose designation. The question of do you have the role is yes/no with no logic to guess if the role is implied or not. The question of will it come up with the role is "have_explicit ? use_explicit : use_defaults".
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Once you figure out who has a role (or not) what that means is up to the role code.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Corollary: we don't have to change the way overseer works in this SIP. We can rework it or not as we see fit separately.
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> +1
>>>>>>>>>>> >>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Only thing we need to do is find a wording that makes the above clear on first read through the SIP :)
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> -Gus
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <ho...@gmail.com> wrote:
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> This doesn't really address my concern around what happens if all of our existing OVERSEER candidates are down. When at least one of them is up, the overseer will go there, and that is good and expected. But what happens if all of the overseer eligible nodes are down. Your comment, and the old system, would imply that the overseer election goes to some other unrelated, untagged node. I disagree with this implementation choice. This sounds like something role specific to determine, but I would like to see us be more strict about it. I don't want cores leaking out of my data roles, I don't want query processing to leak out of my "query" nodes or whatever. Overseer shouldn't be special in this regard.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> I'm very strongly in favor of not letting users design a system in which the cluster can be "live" without an overseer. I understand that the overseer can be taxing to the cluster, but honestly what is the point of having an untaxed cluster that doesn't have an overseer? I can see arguments for the other roles to be stricter about this, but there are also a lot of users who wouldn't want those to be strict either (like "query" nodes).
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a non-overseer role node HAS to be selected to become overseer, it will try to migrate the overseer job to a node with the overseer role whenever one becomes live.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> So maybe we don't have special rules per role, but instead roles can either be defined as "Strict" or "Loose" (better names likely exist), and the roles come with a default (Overseer -> Loose, Data -> Strict, Query -> Loose, etc.). And it is up to each role to define how to behave when running in LOOSE mode and a non-role node is used then a role node comes online (like the overseer example given above).
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults, users cannot trip themselves up by default, but the option is there for people to tinker and have an iron grip over their cluster.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Noble wrote:
>>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role" works today. We are just changing the definition and standardizing the configuration & discoverability
>>>>>>>>>>> >>>>> Ishan wrote:
>>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER role (which currently stands for preferred overseer). We can take a stab at refactoring it later.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Grouping these two comments together, since I think they are saying the same thing. I think this is part of my confusion. We have an old system that doesn't work the way we want the new system to work. There may be people already using the old system. What path do we offer for folks using the old system to migrate to the new system? What happens if somebody accidentally tries to use both systems at the same time?
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Ishan wrote:
>>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER role] are live, Solr guarantees that one of those nodes becomes the overseer.", I meant to somewhat capture the current behaviour as the OVERSEER role performs today. Do you see any inconsistency with this statement vs. what it does today?
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> This doesn't really address my concern around what happens if all of our existing OVERSEER candidates are down. When at least one of them is up, the overseer will go there, and that is good and expected. But what happens if all of the overseer eligible nodes are down. Your comment, and the old system, would imply that the overseer election goes to some other unrelated, untagged node. I disagree with this implementation choice. This sounds like something role specific to determine, but I would like to see us be more strict about it. I don't want cores leaking out of my data roles, I don't want query processing to leak out of my "query" nodes or whatever. Overseer shouldn't be special in this regard.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Noble wrote:
>>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a node in the following request?
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this out. Let's leave it as is.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <ic...@gmail.com> wrote:
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> Replying to the top post in this thread because there has been a lot of discussion and I don't want to look like I'm continuing any of those particular threads.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> I finally had time to sit down and think about this with the attention it deserves and am generally happy with how the conversation has shaped the current proposal.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define node roles is fine and I like that data is the default role when not defined. I think it is important to hold on to the guarantee that an active overseer will land on an overseer node role.
>>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for folks using the current OVERSEER role. I am not sure that something can be done automatically since they need to now specify new properties at startup. Maybe we need to include loud warnings or support both approaches for a time?
>>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer nodes fail, then it is implied the overseer will go to one of the data nodes. The specific wording in the SIP - "When one or more such nodes are live, Solr guarantees that one of those nodes become the overseer." implies to me that failover could go from overseer1 to overseer2 to overseerN to random node. I feel like we need to have some recording that there were dedicated overseer nodes and stop the cascading failure instead of churning through our data nodes.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed scope of "coordinator" roles from a split query/indexing standpoint. I understand that these are used as examples, but would like stronger language that new roles should also go through their own SIP discussions.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node liveness in two different places now. We have the live nodes and we have the node roles stored in two different places in zookeeper and it feels like this would lead to race conditions or split brain or other hard to diagnose bugs when those two lists don't agree with each other. This also feels like it contradicts the "single source of truth" idea later stated in the proposal. I see Gus's arguments for decoupling these and am not strongly opposed, I just get a lurking feeling about it. Even if we don't do this, I would like this called out explicitly in the alternative approaches section as something that we considered and rejected, with details why.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an additional call out here that all operations are GET because nodes cannot be changed at runtime.
>>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the previous OVERSEER preference role?
>>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of available roles for a cluster. I _think_ this could be based on the version that the cluster is running? Would be useful to be able to interrogate a cluster in the future... we're seeing OOM issues on queries, can we add some query nodes? When were they introduced? I don't know what path this API should exist at.
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the SIP document. Not sure if there's a better path that we could go for.
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which parts are string literals and which parts are meant to be substituted by the operator? GET /api/cluster/roles/data would become GET /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename} dropping the intermediate "nodes"
>>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that intermediate "nodes" node.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some permissions? Maybe this requirement is too fundamental to the operation of a cluster and everybody would have to be able to do it.
>>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat roles? Implementation detail that the servers will figure out? Or strict guidance where the client needs to check where specific roles are before sending any further communication to the server?
>>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request that it can't fulfil? An overseer node gets a query or an update. A data node gets a collection creation request. Do they forward it on to an appropriate node, or do they reject it? Should this be configurable? If not, then it seems like lazy or poorly configured clients will defeat this isolation system quite easily.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave when roles are added mean? I thought we established that they are not dynamic.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> Thanks,
>>>>>>>>>>> >>>>>>> Mike
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <ic...@gmail.com> wrote:
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> Hi,
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>>>>>> >>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> We also wish to add first class support for Query nodes that are used to process user queries by forwarding to data nodes, merging/aggregating them and presenting to users. This concept exists as first class citizens in most other search engines. This is a chance for Solr to catch up.
>>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> Regards,
>>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh

Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

I'm +1 on this. It "looks" complicated at first, but simplifies all
headaches going forward.

On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <no...@gmail.com> wrote:

> I shall update the SIP proposal if we have a consensus on this
> configuration
>
> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <no...@gmail.com> wrote:
>
>>
>>
>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gu...@gmail.com> wrote:
>>
>>> I like this in that it's an example of how the overseer might be
>>> extended without creating a new role :)
>>>
>>> Not entirely sure if I'm for or against an enum implementation here, but
>>> it makes me a bit nervous. Enums with complexity can quickly get into
>>> difficulty for unit tests (especially if one wanted to write a mock object
>>> based test, something I think we maybe should use a bit more than we do).
>>>
>>
>>>
>>> I would tend to think of a class to represent and collect role related
>>> functionality, one that perhaps has methods that receive the request, or
>>> other key objects and thus could be tested without standing up an entire
>>> server. (Not against also having them exercised in a few integrated tests,
>>> but the more we can avoid interleaving logic directly within DispatchFilter
>>> and HttpSolrCall etc. the better.)
>>>
>>
>>> So I guess I'm somewhat biased against any enum with more than a couple
>>> properties, and definitely don't want to wind up hanging lots of methods
>>> off of one. Better to use them to consume a configuration value and then
>>> instantiate a class that really holds the logic and data. I like them for
>>> constraining values and easy string value conversion but the more they look
>>> like classes the more I'd rather have a class.
>>>
>>
>> I just meant it is a set of values. Please let us not discuss the actual
>> impl here. We should stick to discussing the high level design here
>> and specifics should be dealt with in a PR
>>
>>>
>>> -Gus
>>>
>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <no...@gmail.com> wrote:
>>>
>>>> I recommend the following format for the role spec
>>>>
>>>> roles=<role-name>:<role-value>
>>>>
>>>> each role will have an enum of allowed values and a default value
>>>>
>>>>
>>>>    - role name: *data*
>>>>       - values: [*on*, *off]*
>>>>       - default: *on*
>>>>    - role name: *overseer*
>>>>       - values: [*allowed*, *disallowed*, *preferred]*
>>>>       - default : *allowed*
>>>>    - role name:* coordinator*
>>>>       - values : [*on*, *off]*
>>>>       - default: *off*
>>>>
>>>>
>>>> examples
>>>> roles=data:on,overseer:allowed (This is redundant because it uses all
>>>> the default values. If a node is started without any roles value this is
>>>> the default behavior)
>>>> roles=data:off,overseer:preferred ( do not allow data, join overseer
>>>> election at head)
>>>> roles=coordinator:on,data:on (role as coordinator, but allow data,
>>>> it's same as roles=coordinator:on)
>>>> roles=coordinator:on,data:off (role as coordinator, disallow data)
>>>>
>>>>
>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <il...@gmail.com>
>>>> wrote:
>>>>
>>>>> If we go with no negative node roles and overseer node role is not
>>>>> strict (i.e. it’s a "preferred overseer"), then one would need to define a
>>>>> second node role "no_overseer" to explicitly exclude a node from ever
>>>>> becoming overseer (which I think is a useful feature until we switch the
>>>>> cluster default to not using the overseer), plus the implementation of
>>>>> these two node roles will obviously be coupled (and what if a node has both
>>>>> defined?).
>>>>>
>>>>> I prefer strict node roles.
>>>>> Maybe we could have node roles with [optional] parameters to let the
>>>>> node role implementation decide ?
>>>>> The overseer node role for example could have one of 3 values defined
>>>>> for each node: “preferred” (default, equivalent to the existing overseer
>>>>> role), "accepted" (equivalent to currently not defining the overseer role)
>>>>> and "no_way" (does not exist today).
>>>>>
>>>>> This could be useful in other contexts. A node role “data” could be
>>>>> “fast” or “slow” depending on type of local persistent storage for example…
>>>>>
>>>>> Ilan
>>>>>
>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:
>>>>>
>>>>>> I really don't think we should have types of roles. Not
>>>>>> negative/positive and not strict/non-strict. You have a role or you don't.
>>>>>> What that means is up to the code implementing the role.
>>>>>>
>>>>>> Roles should be free to configure a preference order (binary, or
>>>>>> n-ary or whatever, strict or loose), prohibit behavior, or enable behavior.
>>>>>> In this SIP I feel we should focus on How to identify what node has what
>>>>>> role, How to designate what roles a node has via config/params, and the
>>>>>> API's for interacting with roles.
>>>>>>
>>>>>> We should for example be able to support roles such as
>>>>>>
>>>>>> PREFERRED_OVERSEER
>>>>>> DATA
>>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
>>>>>>
>>>>>> Details about role implementation should probably be discussed in a
>>>>>> thread about that role.  Obviously we should think about the name carefully
>>>>>> to leave options open should we want to enhance things later so maybe
>>>>>>
>>>>>> OVERSEER_PREF  or just  OVERSEER
>>>>>>
>>>>>> would be better since it merely reads that the node implements some
>>>>>> sort of preference or config regarding overseer... but all this can be
>>>>>> decided on a per role basis
>>>>>>
>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Negative roles have a place
>>>>>>>
>>>>>>> Example is overseer
>>>>>>>
>>>>>>> There are 3 possible choices for that role
>>>>>>>
>>>>>>> a) preferred: always be in front of the election queue
>>>>>>> b) on: not preferred, but can be an overseer if no preferred
>>>>>>> overseer nodes are available
>>>>>>> c) off: never become an overseer
>>>>>>>
>>>>>>> Today we only have options 'a' and 'b' . In a future ticket, we may
>>>>>>> implement C
>>>>>>>
>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
>>>>>>>
>>>>>>>> Negative roles add a lot of complexity, I would really want to stay
>>>>>>>> away from them. That’s why I want strict roles up front. It’s maybe ok to
>>>>>>>> push this decision out, but it also seems like the sort of thing we should
>>>>>>>> consider at the start.
>>>>>>>>
>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Yes. Negative roles is not a bad idea. If I start a node for
>>>>>>>>> machine learning purposes, I wouldn't want that node to ever participate in
>>>>>>>>> overseer election
>>>>>>>>>
>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> If we have non strict roles (like overseer), then it does make
>>>>>>>>>> sense
>>>>>>>>>> to have negative roles.
>>>>>>>>>> That way I can define which are the two nodes that I'd prefer the
>>>>>>>>>> overseer to run on, and a few other nodes on which it should
>>>>>>>>>> definitely never run for various reasons. And in case these
>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let the
>>>>>>>>>> cluster
>>>>>>>>>> fail the same way it would if there were no data nodes available.
>>>>>>>>>>
>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>>>>>>>>>> houstonputman@gmail.com> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults, users
>>>>>>>>>> cannot trip themselves up by default, but the option is there for people to
>>>>>>>>>> tinker and have an iron grip over their cluster.
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> +1 to sensible defaults so users don't trip themselves. The
>>>>>>>>>> option to tinker for tighter grip can be tackled later, either on a per
>>>>>>>>>> role basis or as a generic concept later.
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > +1 - Can definitely be added later if we so desire, not needed
>>>>>>>>>> for this SIP
>>>>>>>>>> >
>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> I think the key is to let the roles have full control of the
>>>>>>>>>> implications of having/not having that role. No need for even a
>>>>>>>>>> strict/loose designation. The question of do you have the role is yes/no
>>>>>>>>>> with no logic to guess if the role is implied or not. The question of will
>>>>>>>>>> it come up with the role is "have_explicit ? use_explicit : use_defaults".
>>>>>>>>>> >>>
>>>>>>>>>> >>> Once you figure out who has a role (or not) what that means
>>>>>>>>>> is up to the role code.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Corollary: we don't have to change the way overseer works in
>>>>>>>>>> this SIP. We can rework it or not as we see fit separately.
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> +1
>>>>>>>>>> >>
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>> Only thing we need to do is find a wording that makes the
>>>>>>>>>> above clear on first read through the SIP :)
>>>>>>>>>> >>>
>>>>>>>>>> >>> -Gus
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>>>>>>>> houstonputman@gmail.com> wrote:
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> This doesn't really address my concern around what happens
>>>>>>>>>> if all of our existing OVERSEER candidates are down. When at least one of
>>>>>>>>>> them is up, the overseer will go there, and that is good and expected. But
>>>>>>>>>> what happens if all of the overseer eligible nodes are down. Your comment,
>>>>>>>>>> and the old system, would imply that the overseer election goes to some
>>>>>>>>>> other unrelated, untagged node. I disagree with this implementation choice.
>>>>>>>>>> This sounds like something role specific to determine, but I would like to
>>>>>>>>>> see us be more strict about it. I don't want cores leaking out of my data
>>>>>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> I'm very strongly in favor of not letting users design a
>>>>>>>>>> system in which the cluster can be "live" without an overseer. I understand
>>>>>>>>>> that the overseer can be taxing to the cluster, but honestly what is the
>>>>>>>>>> point of having an untaxed cluster that doesn't have an overseer? I can see
>>>>>>>>>> arguments for the other roles to be stricter about this, but there are also
>>>>>>>>>> a lot of users who wouldn't want those to be strict either (like "query"
>>>>>>>>>> nodes).
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a
>>>>>>>>>> non-overseer role node HAS to be selected to become overseer, it will try
>>>>>>>>>> to migrate the overseer job to a node with the overseer role whenever one
>>>>>>>>>> becomes live.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> So maybe we don't have special rules per role, but instead
>>>>>>>>>> roles can either be defined as "Strict" or "Loose" (better names likely
>>>>>>>>>> exist), and the roles come with a default (Overseer -> Loose, Data ->
>>>>>>>>>> Strict, Query -> Loose, etc.). And it is up to each role to define how to
>>>>>>>>>> behave when running in LOOSE mode and a non-role node is used then a role
>>>>>>>>>> node comes online (like the overseer example given above).
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults, users
>>>>>>>>>> cannot trip themselves up by default, but the option is there for people to
>>>>>>>>>> tinker and have an iron grip over their cluster.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Noble wrote:
>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role" works
>>>>>>>>>> today. We are just changing the definition and standardizing the
>>>>>>>>>> configuration & discoverability
>>>>>>>>>> >>>>> Ishan wrote:
>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER
>>>>>>>>>> role (which currently stands for preferred overseer). We can take a stab at
>>>>>>>>>> refactoring it later.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Grouping these two comments together, since I think they
>>>>>>>>>> are saying the same thing. I think this is part of my confusion. We have an
>>>>>>>>>> old system that doesn't work the way we want the new system to work. There
>>>>>>>>>> may be people already using the old system. What path do we offer for folks
>>>>>>>>>> using the old system to migrate to the new system? What happens if somebody
>>>>>>>>>> accidentally tries to use both systems at the same time?
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Ishan wrote:
>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER
>>>>>>>>>> role] are live, Solr guarantees that one of those nodes becomes the
>>>>>>>>>> overseer.", I meant to somewhat capture the current behaviour as the
>>>>>>>>>> OVERSEER role performs today. Do you see any inconsistency with this
>>>>>>>>>> statement vs. what it does today?
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> This doesn't really address my concern around what happens
>>>>>>>>>> if all of our existing OVERSEER candidates are down. When at least one of
>>>>>>>>>> them is up, the overseer will go there, and that is good and expected. But
>>>>>>>>>> what happens if all of the overseer eligible nodes are down. Your comment,
>>>>>>>>>> and the old system, would imply that the overseer election goes to some
>>>>>>>>>> other unrelated, untagged node. I disagree with this implementation choice.
>>>>>>>>>> This sounds like something role specific to determine, but I would like to
>>>>>>>>>> see us be more strict about it. I don't want cores leaking out of my data
>>>>>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Noble wrote:
>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a node
>>>>>>>>>> in the following request?
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this out.
>>>>>>>>>> Let's leave it as is.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>>>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <
>>>>>>>>>> mdrob@mdrob.com> wrote:
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> Replying to the top post in this thread because there has
>>>>>>>>>> been a lot of discussion and I don't want to look like I'm continuing any
>>>>>>>>>> of those particular threads.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> I finally had time to sit down and think about this with
>>>>>>>>>> the attention it deserves and am generally happy with how the conversation
>>>>>>>>>> has shaped the current proposal.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define node
>>>>>>>>>> roles is fine and I like that data is the default role when not defined. I
>>>>>>>>>> think it is important to hold on to the guarantee that an active overseer
>>>>>>>>>> will land on an overseer node role.
>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for
>>>>>>>>>> folks using the current OVERSEER role. I am not sure that something can be
>>>>>>>>>> done automatically since they need to now specify new properties at
>>>>>>>>>> startup. Maybe we need to include loud warnings or support both approaches
>>>>>>>>>> for a time?
>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer
>>>>>>>>>> nodes fail, then it is implied the overseer will go to one of the data
>>>>>>>>>> nodes. The specific wording in the SIP - "When one or more such nodes are
>>>>>>>>>> live, Solr guarantees that one of those nodes become the overseer." implies
>>>>>>>>>> to me that failover could go from overseer1 to overseer2 to overseerN to
>>>>>>>>>> random node. I feel like we need to have some recording that there were
>>>>>>>>>> dedicated overseer nodes and stop the cascading failure instead of churning
>>>>>>>>>> through our data nodes.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed
>>>>>>>>>> scope of "coordinator" roles from a split query/indexing standpoint. I
>>>>>>>>>> understand that these are used as examples, but would like stronger
>>>>>>>>>> language that new roles should also go through their own SIP discussions.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node
>>>>>>>>>> liveness in two different places now. We have the live nodes and we have
>>>>>>>>>> the node roles stored in two different places in zookeeper and it feels
>>>>>>>>>> like this would lead to race conditions or split brain or other hard to
>>>>>>>>>> diagnose bugs when those two lists don't agree with each other. This also
>>>>>>>>>> feels like it contradicts the "single source of truth" idea later stated in
>>>>>>>>>> the proposal. I see Gus's arguments for decoupling these and am not
>>>>>>>>>> strongly opposed, I just get a lurking feeling about it. Even if we don't
>>>>>>>>>> do this, I would like this called out explicitly in the alternative
>>>>>>>>>> approaches section as something that we considered and rejected, with
>>>>>>>>>> details why.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an
>>>>>>>>>> additional call out here that all operations are GET because nodes cannot
>>>>>>>>>> be changed at runtime.
>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the previous
>>>>>>>>>> OVERSEER preference role?
>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of
>>>>>>>>>> available roles for a cluster. I _think_ this could be based on the version
>>>>>>>>>> that the cluster is running? Would be useful to be able to interrogate a
>>>>>>>>>> cluster in the future... we're seeing OOM issues on queries, can we add
>>>>>>>>>> some query nodes? When were they introduced? I don't know what path this
>>>>>>>>>> API should exist at.
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the
>>>>>>>>>> SIP document. Not sure if there's a better path that we could go for.
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which
>>>>>>>>>> parts are string literals and which parts are meant to be substituted by
>>>>>>>>>> the operator? GET /api/cluster/roles/data would become GET
>>>>>>>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET
>>>>>>>>>> /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename}
>>>>>>>>>> dropping the intermediate "nodes"
>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that
>>>>>>>>>> intermediate "nodes" node.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
>>>>>>>>>> permissions? Maybe this requirement is too fundamental to the operation of
>>>>>>>>>> a cluster and everybody would have to be able to do it.
>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients)
>>>>>>>>>> to treat roles? Implementation detail that the servers will figure out? Or
>>>>>>>>>> strict guidance where the client needs to check where specific roles are
>>>>>>>>>> before sending any further communication to the server?
>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request
>>>>>>>>>> that it can't fulfil? An overseer node gets a query or an update. A data
>>>>>>>>>> node gets a collection creation request. Do they forward it on to an
>>>>>>>>>> appropriate node, or do they reject it? Should this be configurable? If
>>>>>>>>>> not, then it seems like lazy or poorly configured clients will defeat this
>>>>>>>>>> isolation system quite easily.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave
>>>>>>>>>> when roles are added mean? I thought we established that they are not
>>>>>>>>>> dynamic.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> Thanks,
>>>>>>>>>> >>>>>>> Mike
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Hi,
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> We also wish to add first class support for Query nodes
>>>>>>>>>> that are used to process user queries by forwarding to data nodes,
>>>>>>>>>> merging/aggregating them and presenting to users. This concept exists as
>>>>>>>>>> first class citizens in most other search engines. This is a chance for
>>>>>>>>>> Solr to catch up.
>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Regards,
>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>> --
>>>>>>>>>> >>> http://www.needhamsoftware.com (work)
>>>>>>>>>> >>> http://www.the111shift.com (play)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> http://www.needhamsoftware.com (work)
>>>>>> http://www.the111shift.com (play)
>>>>>>
>>>>>
>>>>
>>>> --
>>>> -----------------------------------------------------
>>>> Noble Paul
>>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
>>
>> --
>> -----------------------------------------------------
>> Noble Paul
>>
>
>
> --
> -----------------------------------------------------
> Noble Paul
>

Re: First class support for node roles

Posted by Noble Paul <no...@gmail.com>.

I shall update the SIP if we reach consensus on this configuration.
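
For readers following along, here is a minimal sketch of how the roles=<role-name>:<role-value> string quoted further down might be parsed and then used to order overseer candidates. It is only an illustration under stated assumptions, not Solr's implementation: the names ROLE_SPEC, parse_roles and overseer_candidates are hypothetical, the default for the data role is assumed to be "on" (as the quoted examples imply), and the ordering follows the preferred / allowed / disallowed semantics described in the thread.

```python
# Hypothetical sketch: role names, allowed values and defaults are taken from
# the proposal quoted in this thread; this is not Solr's actual implementation.
ROLE_SPEC = {
    "data": (("on", "off"), "on"),  # default assumed to be "on", per the examples
    "overseer": (("allowed", "disallowed", "preferred"), "allowed"),
    "coordinator": (("on", "off"), "off"),
}


def parse_roles(spec=""):
    """Parse 'name:value,name:value' into a dict, filling in defaults."""
    roles = {name: default for name, (_, default) in ROLE_SPEC.items()}
    for item in filter(None, (part.strip() for part in spec.split(","))):
        name, sep, value = item.partition(":")
        if not sep or name not in ROLE_SPEC:
            raise ValueError(f"unknown or malformed role: {item!r}")
        if value not in ROLE_SPEC[name][0]:
            raise ValueError(f"invalid value {value!r} for role {name!r}")
        roles[name] = value
    return roles


def overseer_candidates(nodes):
    """Order node names for overseer election: preferred first, then allowed.

    Nodes whose overseer value is 'disallowed' are never candidates.
    """
    rank = {"preferred": 0, "allowed": 1}
    eligible = [
        (rank[roles["overseer"]], name)
        for name, roles in nodes.items()
        if roles["overseer"] in rank
    ]
    return [name for _, name in sorted(eligible)]
```

With this sketch, a node started without any roles value gets every default, and a node started with roles=overseer:disallowed never appears in the candidate list, even if it is the only node left. How the real election should behave in that last case is exactly what the thread is still debating.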

On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <no...@gmail.com> wrote:

>
>
> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gu...@gmail.com> wrote:
>
>> I like this in that it's an example of how the overseer might be extended
>> without creating a new role :)
>>
>> Not entirely sure if I'm for or against an enum implementation here, but
>> it makes me a bit nervous. Enums with complexity can quickly get into
>> difficulty for unit tests (especially if one wanted to write a mock object
>> based test, something I think we maybe should use a bit more than we do).
>>
>
>>
>> I would tend to think of a class to represent and collect role related
>> functionality, one that perhaps has methods that receive the request, or
>> other key objects and thus could be tested without standing up an entire
>> server. (Not against also having them exercised in a few integrated tests,
>> but the more we can avoid interleaving logic directly within DispatchFilter
>> and HttpSolrCall etc., the better.)
>>
>
>> So I guess I'm somewhat biased against any enum with more than a couple
>> properties, and definitely don't want to wind up hanging lots of methods
>> off of one. Better to use them to consume a configuration value and then
>> instantiate a class that really holds the logic and data. I like them for
>> constraining values and easy string value conversion but the more they look
>> like classes the more I'd rather have a class.
>>
>
> I just meant it is a set of values. Let us not discuss the actual
> implementation here. We should stick to discussing the high-level design
> here; specifics should be dealt with in a PR.
>
>>
>> -Gus
>>
>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <no...@gmail.com> wrote:
>>
>>> I recommend the following format for the role spec
>>>
>>> roles=<role-name>:<role-value>
>>>
>>> each role will have an enum of allowed values and a default value
>>>
>>>
>>>    - role name: *data*
>>>       - values: [*on*, *off*]
>>>       - default: *on*
>>>    - role name: *overseer*
>>>       - values: [*allowed*, *disallowed*, *preferred*]
>>>       - default: *allowed*
>>>    - role name: *coordinator*
>>>       - values: [*on*, *off*]
>>>       - default: *off*
>>>
>>>
>>> examples
>>> roles=data:on,overseer:allowed (This is redundant because it uses all
>>> the default values. If a node is started without any roles value, this is
>>> the default behavior.)
>>> roles=data:off,overseer:preferred (do not allow data, join overseer
>>> election at head)
>>> roles=coordinator:on,data:on (role as coordinator, but allow data; it is
>>> the same as roles=coordinator:on)
>>> roles=coordinator:on,data:off (role as coordinator, disallow data)
>>>
>>>
>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <il...@gmail.com>
>>> wrote:
>>>
>>>> If we go with no negative node roles and overseer node role is not
>>>> strict (i.e. it’s a "preferred overseer"), then one would need to define a
>>>> second node role "no_overseer" to explicitly exclude a node from ever
>>>> becoming overseer (which I think is a useful feature until we switch the
>>>> cluster default to not using the overseer), plus the implementation of
>>>> these two node roles will obviously be coupled (and what if a node has both
>>>> defined?).
>>>>
>>>> I prefer strict node roles.
>>>> Maybe we could have node roles with [optional] parameters to let the
>>>> node role implementation decide?
>>>> The overseer node role for example could have one of 3 values defined
>>>> for each node: “preferred” (default, equivalent to the existing overseer
>>>> role), "accepted" (equivalent to currently not defining the overseer role)
>>>> and "no_way" (does not exist today).
>>>>
>>>> This could be useful in other contexts. A node role “data” could be
>>>> “fast” or “slow” depending on type of local persistent storage for example…
>>>>
>>>> Ilan
>>>>
>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:
>>>>
>>>>> I really don't think we should have types of roles. Not
>>>>> negative/positive and not strict/non-strict. You have a role or you don't.
>>>>> What that means is up to the code implementing the role.
>>>>>
>>>>> Roles should be free to configure a preference order (binary, or n-ary
>>>>> or whatever, strict or loose), prohibit behavior, or enable behavior. In
>>>>> this SIP I feel we should focus on how to identify what node has what role,
>>>>> how to designate what roles a node has via config/params, and the APIs for
>>>>> interacting with roles.
>>>>>
>>>>> We should for example be able to support roles such as
>>>>>
>>>>> PREFERRED_OVERSEER
>>>>> DATA
>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
>>>>>
>>>>> Details about role implementation should probably be discussed in a
>>>>> thread about that role.  Obviously we should think about the name carefully
>>>>> to leave options open should we want to enhance things later so maybe
>>>>>
>>>>> OVERSEER_PREF  or just  OVERSEER
>>>>>
>>>>> would be better since it merely states that the node implements some
>>>>> sort of preference or config regarding overseer... but all this can be
>>>>> decided on a per-role basis.
>>>>>
>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Negative roles have a place
>>>>>>
>>>>>> Example is overseer
>>>>>>
>>>>>> There are 3 possible choices for that role
>>>>>>
>>>>>> a) preferred: always be in front of the election queue
>>>>>> b) on: not preferred, but can be an overseer if no preferred overseer
>>>>>> nodes are available
>>>>>> c) off: never become an overseer
>>>>>>
>>>>>> Today we only have options 'a' and 'b'. In a future ticket, we may
>>>>>> implement 'c'.
>>>>>>
>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
>>>>>>
>>>>>>> Negative roles add a lot of complexity, I would really want to stay
>>>>>>> away from them. That’s why I want strict roles up front. It’s maybe ok to
>>>>>>> push this decision out, but it also seems like the sort of thing we should
>>>>>>> consider at the start.
>>>>>>>
>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yes. Negative roles is not a bad idea. If I start a node for
>>>>>>>> machine learning purposes, I wouldn't want that node to ever participate in
>>>>>>>> overseer election
>>>>>>>>
>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> If we have non strict roles (like overseer), then it does make
>>>>>>>>> sense
>>>>>>>>> to have negative roles.
>>>>>>>>> That way I can define which are the two nodes that I'd prefer the
>>>>>>>>> overseer to run on, and a few other nodes on which it should
>>>>>>>>> definitely never run for various reasons. And in case these
>>>>>>>>> "!overseer" are the only nodes left in the cluster, let the cluster
>>>>>>>>> fail the same way it would if there were no data nodes available.
>>>>>>>>>
>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>>>>>>>>> houstonputman@gmail.com> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> With the Strict/Loose option and sensible defaults, users
>>>>>>>>> cannot trip themselves up by default, but the option is there for people to
>>>>>>>>> tinker and have an iron grip over their cluster.
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> +1 to sensible defaults so users don't trip themselves. The
>>>>>>>>> option to tinker for tighter grip can be tackled later, either on a per
>>>>>>>>> role basis or as a generic concept later.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > +1 - Can definitely be added later if we so desire, not needed
>>>>>>>>> for this SIP
>>>>>>>>> >
>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> I think the key is to let the roles have full control of the
>>>>>>>>> implications of having/not having that role. No need for even a
>>>>>>>>> strict/loose designation. The question of do you have the role is yes/no
>>>>>>>>> with no logic to guess if the role is implied or not. The question of will
>>>>>>>>> it come up with the role is "have_explicit ? use_explicit : use_defaults".
>>>>>>>>> >>>
>>>>>>>>> >>> Once you figure out who has a role (or not) what that means is
>>>>>>>>> up to the role code.
>>>>>>>>> >>>
>>>>>>>>> >>> Corollary: we don't have to change the way overseer works in
>>>>>>>>> this SIP. We can rework it or not as we see fit separately.
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> +1
>>>>>>>>> >>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> Only thing we need to do is find a wording that makes the
>>>>>>>>> above clear on first read through the SIP :)
>>>>>>>>> >>>
>>>>>>>>> >>> -Gus
>>>>>>>>> >>>
>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>>>>>>> houstonputman@gmail.com> wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> This doesn't really address my concern around what happens
>>>>>>>>> if all of our existing OVERSEER candidates are down. When at least one of
>>>>>>>>> them is up, the overseer will go there, and that is good and expected. But
>>>>>>>>> what happens if all of the overseer eligible nodes are down. Your comment,
>>>>>>>>> and the old system, would imply that the overseer election goes to some
>>>>>>>>> other unrelated, untagged node. I disagree with this implementation choice.
>>>>>>>>> This sounds like something role specific to determine, but I would like to
>>>>>>>>> see us be more strict about it. I don't want cores leaking out of my data
>>>>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>>>>> >>>>
>>>>>>>>> >>>>
>>>>>>>>> >>>> I'm very strongly in favor of not letting users design a
>>>>>>>>> system in which the cluster can be "live" without an overseer. I understand
>>>>>>>>> that the overseer can be taxing to the cluster, but honestly what is the
>>>>>>>>> point of having an untaxed cluster that doesn't have an overseer? I can see
>>>>>>>>> arguments for the other roles to be stricter about this, but there are also
>>>>>>>>> a lot of users who wouldn't want those to be strict either (like "query"
>>>>>>>>> nodes).
>>>>>>>>> >>>>
>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a
>>>>>>>>> non-overseer role node HAS to be selected to become overseer, it will try
>>>>>>>>> to migrate the overseer job to a node with the overseer role whenever one
>>>>>>>>> becomes live.
>>>>>>>>> >>>>
>>>>>>>>> >>>> So maybe we don't have special rules per role, but instead
>>>>>>>>> roles can either be defined as "Strict" or "Loose" (better names likely
>>>>>>>>> exist), and the roles come with a default (Overseer -> Loose, Data ->
>>>>>>>>> Strict, Query -> Loose, etc.). And it is up to each role to define how to
>>>>>>>>> behave when running in LOOSE mode and a non-role node is used then a role
>>>>>>>>> node comes online (like the overseer example given above).
>>>>>>>>> >>>>
>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults, users
>>>>>>>>> cannot trip themselves up by default, but the option is there for people to
>>>>>>>>> tinker and have an iron grip over their cluster.
>>>>>>>>> >>>>
>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com>
>>>>>>>>> wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Noble wrote:
>>>>>>>>> >>>>> > We are not modifying the way the "overseer role" works
>>>>>>>>> today. We are just changing the definition and standardizing the
>>>>>>>>> configuration & discoverability
>>>>>>>>> >>>>> Ishan wrote:
>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER
>>>>>>>>> role (which currently stands for preferred overseer). We can take a stab at
>>>>>>>>> refactoring it later.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Grouping these two comments together, since I think they are
>>>>>>>>> saying the same thing. I think this is part of my confusion. We have an old
>>>>>>>>> system that doesn't work the way we want the new system to work. There may
>>>>>>>>> be people already using the old system. What path do we offer for folks
>>>>>>>>> using the old system to migrate to the new system? What happens if somebody
>>>>>>>>> accidentally tries to use both systems at the same time?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Ishan wrote:
>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER
>>>>>>>>> role] are live, Solr guarantees that one of those nodes becomes the
>>>>>>>>> overseer.", I meant to somewhat capture the current behaviour as the
>>>>>>>>> OVERSEER role performs today. Do you see any inconsistency with this
>>>>>>>>> statement vs. what it does today?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> This doesn't really address my concern around what happens
>>>>>>>>> if all of our existing OVERSEER candidates are down. When at least one of
>>>>>>>>> them is up, the overseer will go there, and that is good and expected. But
>>>>>>>>> what happens if all of the overseer eligible nodes are down. Your comment,
>>>>>>>>> and the old system, would imply that the overseer election goes to some
>>>>>>>>> other unrelated, untagged node. I disagree with this implementation choice.
>>>>>>>>> This sounds like something role specific to determine, but I would like to
>>>>>>>>> see us be more strict about it. I don't want cores leaking out of my data
>>>>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Noble wrote:
>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a node in
>>>>>>>>> the following request?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this out.
>>>>>>>>> Let's leave it as is.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com>
>>>>>>>>> wrote:
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> Replying to the top post in this thread because there has
>>>>>>>>> been a lot of discussion and I don't want to look like I'm continuing any
>>>>>>>>> of those particular threads.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> I finally had time to sit down and think about this with
>>>>>>>>> the attention it deserves and am generally happy with how the conversation
>>>>>>>>> has shaped the current proposal.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> GOOD: I think using system properties to define node roles
>>>>>>>>> is fine and I like that data is the default role when not defined. I think
>>>>>>>>> it is important to hold on to the guarantee that an active overseer will
>>>>>>>>> land on an overseer node role.
>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for
>>>>>>>>> folks using the current OVERSEER role. I am not sure that something can be
>>>>>>>>> done automatically since they need to now specify new properties at
>>>>>>>>> startup. Maybe we need to include loud warnings or support both approaches
>>>>>>>>> for a time?
>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer
>>>>>>>>> nodes fail, then it is implied the overseer will go to one of the data
>>>>>>>>> nodes. The specific wording in the SIP - "When one or more such nodes are
>>>>>>>>> live, Solr guarantees that one of those nodes become the overseer." implies
>>>>>>>>> to me that failover could go from overseer1 to overseer2 to overseerN to
>>>>>>>>> random node. I feel like we need to have some recording that there were
>>>>>>>>> dedicated overseer nodes and stop the cascading failure instead of churning
>>>>>>>>> through our data nodes.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed
>>>>>>>>> scope of "coordinator" roles from a split query/indexing standpoint. I
>>>>>>>>> understand that these are used as examples, but would like stronger
>>>>>>>>> language that new roles should also go through their own SIP discussions.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node
>>>>>>>>> liveness in two different places now. We have the live nodes and we have
>>>>>>>>> the node roles stored in two different places in zookeeper and it feels
>>>>>>>>> like this would lead to race conditions or split brain or other hard to
>>>>>>>>> diagnose bugs when those two lists don't agree with each other. This also
>>>>>>>>> feels like it contradicts the "single source of truth" idea later stated in
>>>>>>>>> the proposal. I see Gus's arguments for decoupling these and am not
>>>>>>>>> strongly opposed, I just get a lurking feeling about it. Even if we don't
>>>>>>>>> do this, I would like this called out explicitly in the alternative
>>>>>>>>> approaches section as something that we considered and rejected, with
>>>>>>>>> details why.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an
>>>>>>>>> additional call out here that all operations are GET because nodes cannot
>>>>>>>>> be changed at runtime.
>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the previous
>>>>>>>>> OVERSEER preference role?
>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of
>>>>>>>>> available roles for a cluster. I _think_ this could be based on the version
>>>>>>>>> that the cluster is running? Would be useful to be able to interrogate a
>>>>>>>>> cluster in the future... we're seeing OOM issues on queries, can we add
>>>>>>>>> some query nodes? When were they introduced? I don't know what path this
>>>>>>>>> API should exist at.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the
>>>>>>>>> SIP document. Not sure if there's a better path that we could go for.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which
>>>>>>>>> parts are string literals and which parts are meant to be substituted by
>>>>>>>>> the operator? GET /api/cluster/roles/data would become GET
>>>>>>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1
>>>>>>>>> should be GET /api/cluster/roles/${nodename} dropping the intermediate
>>>>>>>>> "nodes"
>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that
>>>>>>>>> intermediate "nodes" node.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
>>>>>>>>> permissions? Maybe this requirement is too fundamental to the operation of
>>>>>>>>> a cluster and everybody would have to be able to do it.
>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients)
>>>>>>>>> to treat roles? Implementation detail that the servers will figure out? Or
>>>>>>>>> strict guidance where the client needs to check where specific roles are
>>>>>>>>> before sending any further communication to the server?
>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request
>>>>>>>>> that it can't fulfil? An overseer node gets a query or an update. A data
>>>>>>>>> node gets a collection creation request. Do they forward it on to an
>>>>>>>>> appropriate node, or do they reject it? Should this be configurable? If
>>>>>>>>> not, then it seems like lazy or poorly configured clients will defeat this
>>>>>>>>> isolation system quite easily.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave when
>>>>>>>>> roles are added mean? I thought we established that they are not dynamic.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> Thanks,
>>>>>>>>> >>>>>>> Mike
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Hi,
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>>>> >>>>>>>>
>>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> We also wish to add first class support for Query nodes
>>>>>>>>> that are used to process user queries by forwarding to data nodes,
>>>>>>>>> merging/aggregating them and presenting to users. This concept exists as
>>>>>>>>> first class citizens in most other search engines. This is a chance for
>>>>>>>>> Solr to catch up.
>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Regards,
>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> --
>>>>>>>>> >>> http://www.needhamsoftware.com (work)
>>>>>>>>> >>> http://www.the111shift.com (play)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>>>>>>>> For additional commands, e-mail: dev-help@solr.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> --
>>>>> http://www.needhamsoftware.com (work)
>>>>> http://www.the111shift.com (play)
>>>>>
>>>>
>>>
>>> --
>>> -----------------------------------------------------
>>> Noble Paul
>>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
>
> --
> -----------------------------------------------------
> Noble Paul
>


-- 
-----------------------------------------------------
Noble Paul

Re: First class support for node roles

Posted by Noble Paul <no...@gmail.com>.

On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <gu...@gmail.com> wrote:

> I like this in that it's an example of how the overseer might be extended
> without creating a new role :)
>
> Not entirely sure if I'm for or against an enum implementation here, but
> it makes me a bit nervous. Enums with complexity can quickly get into
> difficulty for unit tests (especially if one wanted to write a mock object
> based test, something I think we maybe should use a bit more than we do).
>

>
> I would tend to think of a class to represent and collect role related
> functionality, one that perhaps has methods that receive the request, or
> other key objects and thus could be tested without standing up an entire
> server. (Not against also having them exercised in a few integrated tests,
> but the more we can avoid interleaving logic directly within DispatchFilter
> and HttpSolrCall etc. the better.)
>

> So I guess I'm somewhat biased against any enum with more than a couple
> properties, and definitely don't want to wind up hanging lots of methods
> off of one. Better to use them to consume a configuration value and then
> instantiate a class that really holds the logic and data. I like them for
> constraining values and easy string value conversion but the more they look
> like classes the more I'd rather have a class.
>

I just meant it is a set of values. Please let us not discuss the actual
impl here. We should stick to discussing the high-level design here,
and specifics should be dealt with in a PR.

>
> -Gus
>
> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <no...@gmail.com> wrote:
>
>> I recommend the following format for the role spec
>>
>> roles=<role-name>:<role-value>
>>
>> each role will have an enum of allowed values and a default value
>>
>>
>>    - role name: *data*
>>       - values: [*on*, *off*]
>>       - default: *on*
>>    - role name: *overseer*
>>       - values: [*allowed*, *disallowed*, *preferred*]
>>       - default: *allowed*
>>    - role name: *coordinator*
>>       - values: [*on*, *off*]
>>       - default: *off*
>>
>>
>> examples
>> roles=data:on,overseer:allowed (This is redundant because it uses all
>> the default values. If a node is started without any roles value this is
>> the default behavior)
>> roles=data:off,overseer:preferred (do not allow data, join overseer
>> election at head)
>> roles=coordinator:on,data:on (role as coordinator, but allow data; it's
>> the same as roles=coordinator:on)
>> roles=coordinator:on,data:off (role as coordinator, disallow data)
>>
>>
>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <il...@gmail.com> wrote:
>>
>>> If we go with no negative node roles and overseer node role is not
>>> strict (i.e. it’s a "preferred overseer"), then one would need to define a
>>> second node role "no_overseer" to explicitly exclude a node from ever
>>> becoming overseer (which I think is a useful feature until we switch the
>>> cluster default to not using the overseer), plus the implementation of
>>> these two node roles will obviously be coupled (and what if a node has both
>>> defined?).
>>>
>>> I prefer strict node roles.
>>> Maybe we could have node roles with [optional] parameters to let the
>>> node role implementation decide?
>>> The overseer node role for example could have one of 3 values defined
>>> for each node: “preferred” (default, equivalent to the existing overseer
>>> role), "accepted" (equivalent to currently not defining the overseer role)
>>> and "no_way" (does not exist today).
>>>
>>> This could be useful in other contexts. A node role “data” could be
>>> “fast” or “slow” depending on type of local persistent storage for example…
>>>
>>> Ilan
>>>
>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:
>>>
>>>> I really don't think we should have types of roles. Not
>>>> negative/positive and not strict/non-strict. You have a role or you don't.
>>>> What that means is up to the code implementing the role.
>>>>
>>>> Roles should be free to configure a preference order (binary, or n-ary
>>>> or whatever, strict or loose), prohibit behavior, or enable behavior. In
>>>> this SIP I feel we should focus on how to identify what node has what role,
>>>> how to designate what roles a node has via config/params, and the APIs for
>>>> interacting with roles.
>>>>
>>>> We should for example be able to support roles such as
>>>>
>>>> PREFERRED_OVERSEER
>>>> DATA
>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
>>>>
>>>> Details about role implementation should probably be discussed in a
>>>> thread about that role. Obviously we should think about the name carefully
>>>> to leave options open should we want to enhance things later, so maybe
>>>>
>>>> OVERSEER_PREF  or just  OVERSEER
>>>>
>>>> would be better since it merely reads that the node implements some
>>>> sort of preference or config regarding overseer... but all this can be
>>>> decided on a per-role basis.
>>>>
>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com>
>>>> wrote:
>>>>
>>>>> Negative roles have a place
>>>>>
>>>>> Example is overseer
>>>>>
>>>>> There are 3 possible choices for that role
>>>>>
>>>>> a) preferred: always be in front of the election queue
>>>>> b) on: not preferred, but can be an overseer if no preferred overseer
>>>>> nodes are available
>>>>> c) off: never become an overseer
>>>>>
>>>>> Today we only have options 'a' and 'b'. In a future ticket, we may
>>>>> implement 'c'.
>>>>>
>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
>>>>>
>>>>>> Negative roles add a lot of complexity, I would really want to stay
>>>>>> away from them. That’s why I want strict roles up front. It’s maybe ok to
>>>>>> push this decision out, but it also seems like the sort of thing we should
>>>>>> consider at the start.
>>>>>>
>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Yes. Negative roles are not a bad idea. If I start a node for
>>>>>>> machine learning purposes, I wouldn't want that node to ever participate in
>>>>>>> overseer election.
>>>>>>>
>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> If we have non strict roles (like overseer), then it does make sense
>>>>>>>> to have negative roles.
>>>>>>>> That way I can define which are the two nodes that I'd prefer the
>>>>>>>> overseer to run on, and a few other nodes on which it should
>>>>>>>> definitely never run for various reasons. And in case these
>>>>>>>> "!overseer" are the only nodes left in the cluster, let the cluster
>>>>>>>> fail the same way it would if there were no data nodes available.
>>>>>>>>
>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>>>>>>>> houstonputman@gmail.com> wrote:
>>>>>>>> >>>
>>>>>>>> >>> With the Strict/Loose option and sensible defaults, users
>>>>>>>> cannot trip themselves up by default, but the option is there for people to
>>>>>>>> tinker and have an iron grip over their cluster.
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> +1 to sensible defaults so users don't trip themselves. The
>>>>>>>> option to tinker for tighter grip can be tackled later, either on a per
>>>>>>>> role basis or as a generic concept later.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > +1 - Can definitely be added later if we so desire, not needed
>>>>>>>> for this SIP
>>>>>>>> >
>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >>>
>>>>>>>> >>> I think the key is to let the roles have full control of the
>>>>>>>> implications of having/not having that role. No need for even a
>>>>>>>> strict/loose designation. The question of do you have the role is yes/no
>>>>>>>> with no logic to guess if the role is implied or not. The question of will
>>>>>>>> it come up with the role is "have_explicit ? use_defaults : use_defaults".
>>>>>>>> >>>
>>>>>>>> >>> Once you figure out who has a role (or not) what that means is
>>>>>>>> up to the role code.
>>>>>>>> >>>
>>>>>>>> >>> Corollary: we don't have to change the way overseer works in
>>>>>>>> this SIP. We can rework it or not as we see fit separately.
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> +1
>>>>>>>> >>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> Only thing we need to do is find a wording that makes the above
>>>>>>>> clear on first read through the SIP :)
>>>>>>>> >>>
>>>>>>>> >>> -Gus
>>>>>>>> >>>
>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>>>>>> houstonputman@gmail.com> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>> This doesn't really address my concern around what happens if
>>>>>>>> all of our existing OVERSEER candidates are down. When at least one of them
>>>>>>>> is up, the overseer will go there, and that is good and expected. But what
>>>>>>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>>>>>>> the old system, would imply that the overseer election goes to some other
>>>>>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>>>>>> sounds like something role specific to determine, but I would like to see
>>>>>>>> us be more strict about it. I don't want cores leaking out of my data
>>>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> I'm very strongly in favor of not letting users design a
>>>>>>>> system in which the cluster can be "live" without an overseer. I understand
>>>>>>>> that the overseer can be taxing to the cluster, but honestly what is the
>>>>>>>> point of having an untaxed cluster that doesn't have an overseer? I can see
>>>>>>>> arguments for the other roles to be stricter about this, but there are also
>>>>>>>> a lot of users who wouldn't want those to be strict either (like "query"
>>>>>>>> nodes).
>>>>>>>> >>>>
>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a
>>>>>>>> non-overseer role node HAS to be selected to become overseer, it will try
>>>>>>>> to migrate the overseer job to a node with the overseer role whenever one
>>>>>>>> becomes live.
>>>>>>>> >>>>
>>>>>>>> >>>> So maybe we don't have special rules per role, but instead
>>>>>>>> roles can either be defined as "Strict" or "Loose" (better names likely
>>>>>>>> exist), and the roles come with a default (Overseer -> Loose, Data ->
>>>>>>>> Strict, Query -> Loose, etc.). And it is up to each role to define how to
>>>>>>>> behave when running in LOOSE mode and a non-role node is used then a role
>>>>>>>> node comes online (like the overseer example given above).
>>>>>>>> >>>>
>>>>>>>> >>>> With the Strict/Loose option and sensible defaults, users
>>>>>>>> cannot trip themselves up by default, but the option is there for people to
>>>>>>>> tinker and have an iron grip over their cluster.
>>>>>>>> >>>>
>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com>
>>>>>>>> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>> Noble wrote:
>>>>>>>> >>>>> > We are not modifying the way the "overseer role" works
>>>>>>>> today. We are just changing the definition and standardizing the
>>>>>>>> configuration & discoverability
>>>>>>>> >>>>> Ishan wrote:
>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER
>>>>>>>> role (which currently stands for preferred overseer). We can take a stab at
>>>>>>>> refactoring it later.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Grouping these two comments together, since I think they are
>>>>>>>> saying the same thing. I think this is part of my confusion. We have an old
>>>>>>>> system that doesn't work the way we want the new system to work. There may
>>>>>>>> be people already using the old system. What path do we offer for folks
>>>>>>>> using the old system to migrate to the new system? What happens if somebody
>>>>>>>> accidentally tries to use both systems at the same time?
>>>>>>>> >>>>>
>>>>>>>> >>>>> Ishan wrote:
>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER
>>>>>>>> role] are live, Solr guarantees that one of those nodes becomes the
>>>>>>>> overseer.", I meant to somewhat capture the current behaviour as the
>>>>>>>> OVERSEER role performs today. Do you see any inconsistency with this
>>>>>>>> statement vs. what it does today?
>>>>>>>> >>>>>
>>>>>>>> >>>>> This doesn't really address my concern around what happens if
>>>>>>>> all of our existing OVERSEER candidates are down. When at least one of them
>>>>>>>> is up, the overseer will go there, and that is good and expected. But what
>>>>>>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>>>>>>> the old system, would imply that the overseer election goes to some other
>>>>>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>>>>>> sounds like something role specific to determine, but I would like to see
>>>>>>>> us be more strict about it. I don't want cores leaking out of my data
>>>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Noble wrote:
>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a node in
>>>>>>>> the following request?
>>>>>>>> >>>>>
>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this out.
>>>>>>>> Let's leave it as is.
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com>
>>>>>>>> wrote:
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> Replying to the top post in this thread because there has
>>>>>>>> been a lot of discussion and I don't want to look like I'm continuing any
>>>>>>>> of those particular threads.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> I finally had time to sit down and think about this with
>>>>>>>> the attention it deserves and am generally happy with how the conversation
>>>>>>>> has shaped the current proposal.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> GOOD: I think using system properties to define node roles
>>>>>>>> is fine and I like that data is the default role when not defined. I think
>>>>>>>> it is important to hold on to the guarantee that an active overseer will
>>>>>>>> land on an overseer node role.
>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for
>>>>>>>> folks using the current OVERSEER role. I am not sure that something can be
>>>>>>>> done automatically since they need to now specify new properties at
>>>>>>>> startup. Maybe we need to include loud warnings or support both approaches
>>>>>>>> for a time?
>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer
>>>>>>>> nodes fail, then it is implied the overseer will go to one of the data
>>>>>>>> nodes. The specific wording in the SIP - "When one or more such nodes are
>>>>>>>> live, Solr guarantees that one of those nodes become the overseer." implies
>>>>>>>> to me that failover could go from overseer1 to overseer2 to overseerN to
>>>>>>>> random node. I feel like we need to have some recording that there were
>>>>>>>> dedicated overseer nodes and stop the cascading failure instead of churning
>>>>>>>> through our data nodes.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed scope
>>>>>>>> of "coordinator" roles from a split query/indexing standpoint. I understand
>>>>>>>> that these are used as examples, but would like stronger language that new
>>>>>>>> roles should also go through their own SIP discussions.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node
>>>>>>>> liveness in two different places now. We have the live nodes and we have
>>>>>>>> the node roles stored in two different places in zookeeper and it feels
>>>>>>>> like this would lead to race conditions or split brain or other hard to
>>>>>>>> diagnose bugs when those two lists don't agree with each other. This also
>>>>>>>> feels like it contradicts the "single source of truth" idea later stated in
>>>>>>>> the proposal. I see Gus's arguments for decoupling these and am not
>>>>>>>> strongly opposed, I just get a lurking feeling about it. Even if we don't
>>>>>>>> do this, I would like this called out explicitly in the alternative
>>>>>>>> approaches section as something that we considered and rejected, with
>>>>>>>> details why.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an
>>>>>>>> additional call out here that all operations are GET because nodes cannot
>>>>>>>> be changed at runtime.
>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the previous
>>>>>>>> OVERSEER preference role?
>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of
>>>>>>>> available roles for a cluster. I _think_ this could be based on the version
>>>>>>>> that the cluster is running? Would be useful to be able to interrogate a
>>>>>>>> cluster in the future... we're seeing OOM issues on queries, can we add
>>>>>>>> some query nodes? When were they introduced? I don't know what path this
>>>>>>>> API should exist at.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the
>>>>>>>> SIP document. Not sure if there's a better path that we could go for.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which
>>>>>>>> parts are string literals and which parts are meant to be substituted by
>>>>>>>> the operator? GET /api/cluster/roles/data would become GET
>>>>>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1
>>>>>>>> should be GET /api/cluster/roles/${nodename} dropping the intermediate
>>>>>>>> "nodes"
>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that
>>>>>>>> intermediate "nodes" node.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
>>>>>>>> permissions? Maybe this requirement is too fundamental to the operation of
>>>>>>>> a cluster and everybody would have to be able to do it.
>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients)
>>>>>>>> to treat roles? Implementation detail that the servers will figure out? Or
>>>>>>>> strict guidance where the client needs to check where specific roles are
>>>>>>>> before sending any further communication to the server?
>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request that
>>>>>>>> it can't fulfil? An overseer node gets a query or an update. A data node
>>>>>>>> gets a collection creation request. Do they forward it on to an appropriate
>>>>>>>> node, or do they reject it? Should this be configurable? If not, then it
>>>>>>>> seems like lazy or poorly configured clients will defeat this isolation
>>>>>>>> system quite easily.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave when
>>>>>>>> roles are added mean? I thought we established that they are not dynamic.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> Thanks,
>>>>>>>> >>>>>>> Mike
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Hi,
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>>> >>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> We also wish to add first class support for Query nodes
>>>>>>>> that are used to process user queries by forwarding to data nodes,
>>>>>>>> merging/aggregating them and presenting to users. This concept exists as
>>>>>>>> first class citizens in most other search engines. This is a chance for
>>>>>>>> Solr to catch up.
>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Regards,
>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> --
>>>>>>>> >>> http://www.needhamsoftware.com (work)
>>>>>>>> >>> http://www.the111shift.com (play)
>>>>>>>>
>>>>>>>>
>>>>
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
>>>>
>>>
>>
>> --
>> -----------------------------------------------------
>> Noble Paul
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


-- 
-----------------------------------------------------
Noble Paul

Re: First class support for node roles

Posted by Gus Heck <gu...@gmail.com>.

I like this in that it's an example of how the overseer might be extended
without creating a new role :)

Not entirely sure if I'm for or against an enum implementation here, but it
makes me a bit nervous. Enums with complexity can quickly get into
difficulty for unit tests (especially if one wanted to write a mock object
based test, something I think we maybe should use a bit more than we do).

I would tend to think of a class to represent and collect role related
functionality, one that perhaps has methods that receive the request, or
other key objects and thus could be tested without standing up an entire
server. (Not against also having them exercised in a few integrated tests,
but the more we can avoid interleaving logic directly within DispatchFilter
and HttpSolrCall etc. the better.)

So I guess I'm somewhat biased against any enum with more than a couple
properties, and definitely don't want to wind up hanging lots of methods
off of one. Better to use them to consume a configuration value and then
instantiate a class that really holds the logic and data. I like them for
constraining values and easy string value conversion but the more they look
like classes the more I'd rather have a class.
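
A minimal sketch of that split, with made-up names (OverseerMode and
OverseerRole are hypothetical, not existing Solr classes): the enum only
constrains and converts the configured string, and a plain class holds the
behavior so it can be unit tested without standing up a server.

```java
import java.util.Locale;

// Hypothetical names throughout; none of these are existing Solr classes.
enum OverseerMode {
  ALLOWED, DISALLOWED, PREFERRED;

  // Convert a configured string such as "preferred" into a constrained value.
  // Throws IllegalArgumentException for anything outside the enum.
  static OverseerMode fromConfig(String value) {
    return valueOf(value.trim().toUpperCase(Locale.ROOT));
  }
}

// Plain class that holds the role logic; a unit test can instantiate it directly.
final class OverseerRole {
  private final OverseerMode mode;

  OverseerRole(OverseerMode mode) {
    this.mode = mode;
  }

  // A node may take part in the election unless it is explicitly disallowed.
  boolean mayJoinElection() {
    return mode != OverseerMode.DISALLOWED;
  }

  // Only preferred nodes ask to be placed at the head of the election.
  boolean joinsAtHead() {
    return mode == OverseerMode.PREFERRED;
  }
}

final class OverseerRoleDemo {
  public static void main(String[] args) {
    OverseerRole role = new OverseerRole(OverseerMode.fromConfig("preferred"));
    System.out.println(role.mayJoinElection() + " " + role.joinsAtHead());
  }
}
```

The enum stays at a couple of constants and one conversion method; anything
that needs a request or other key objects would go on the class instead.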

-Gus

On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <no...@gmail.com> wrote:

> I recommend the following format for the role spec
>
> roles=<role-name>:<role-value>
>
> each role will have an enum of allowed values and a default value
>
>
>    - role name: *data*
>       - values: [*on*, *off*]
>       - default: *on*
>    - role name: *overseer*
>       - values: [*allowed*, *disallowed*, *preferred*]
>       - default: *allowed*
>    - role name: *coordinator*
>       - values: [*on*, *off*]
>       - default: *off*
>
>
> examples
> roles=data:on,overseer:allowed (This is redundant because it uses all the
> default values. If a node is started without any roles value this is the
> default behavior)
> roles=data:off,overseer:preferred (do not allow data, join overseer
> election at head)
> roles=coordinator:on,data:on (role as coordinator, but allow data; it's
> the same as roles=coordinator:on)
> roles=coordinator:on,data:off (role as coordinator, disallow data)
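
As a rough illustration of the format above (not part of the proposal), the
roles value could be parsed along these lines. RoleSpecSketch is a
hypothetical helper, and the defaults are assumed from the first example
(data:on, overseer:allowed, coordinator:off).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not an existing Solr class.
final class RoleSpecSketch {
  // Allowed values per role, as listed in the quoted proposal.
  static final Map<String, List<String>> ALLOWED = Map.of(
      "data", List.of("on", "off"),
      "overseer", List.of("allowed", "disallowed", "preferred"),
      "coordinator", List.of("on", "off"));

  // Parse the part after "roles=", e.g. "data:off,overseer:preferred".
  static Map<String, String> parse(String spec) {
    Map<String, String> roles = new LinkedHashMap<>();
    // Assumed defaults, taken from the first example in the proposal.
    roles.put("data", "on");
    roles.put("overseer", "allowed");
    roles.put("coordinator", "off");
    if (spec == null || spec.trim().isEmpty()) {
      return roles; // no roles value given: defaults only
    }
    for (String entry : spec.split(",")) {
      String[] parts = entry.trim().split(":", 2);
      if (parts.length != 2) {
        throw new IllegalArgumentException(
            "Expected <role-name>:<role-value>, got: " + entry);
      }
      String name = parts[0].trim();
      String value = parts[1].trim();
      List<String> allowed = ALLOWED.get(name);
      if (allowed == null || !allowed.contains(value)) {
        throw new IllegalArgumentException("Unsupported role or value: " + entry);
      }
      roles.put(name, value); // explicit value overrides the default
    }
    return roles;
  }

  public static void main(String[] args) {
    System.out.println(parse("data:off,overseer:preferred"));
  }
}
```

Rejecting unknown role names and values at startup is one possible choice
here; the thread does not say how invalid input should be handled.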
>
>
> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <il...@gmail.com> wrote:
>
>> If we go with no negative node roles and overseer node role is not strict
>> (i.e. it’s a "preferred overseer"), then one would need to define a second
>> node role "no_overseer" to explicitly exclude a node from ever becoming
>> overseer (which I think is a useful feature until we switch the cluster
>> default to not using the overseer), plus the implementation of these two
>> node roles will obviously be coupled (and what if a node has both defined?).
>>
>> I prefer strict node roles.
>> Maybe we could have node roles with [optional] parameters to let the node
>> role implementation decide?
>> The overseer node role for example could have one of 3 values defined for
>> each node: “preferred” (default, equivalent to the existing overseer role),
>> "accepted" (equivalent to currently not defining the overseer role) and
>> "no_way" (does not exist today).
>>
>> This could be useful in other contexts. A node role “data” could be
>> “fast” or “slow” depending on type of local persistent storage for example…
>>
>> Ilan
>>
>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:
>>
>>> I really don't think we should have types of roles. Not
>>> negative/positive and not strict/non-strict. You have a role or you don't.
>>> What that means is up to the code implementing the role.
>>>
>>> Roles should be free to configure a preference order (binary, or n-ary
>>> or whatever, strict or loose), prohibit behavior, or enable behavior. In
>>> this SIP I feel we should focus on how to identify what node has what role,
>>> how to designate what roles a node has via config/params, and the APIs for
>>> interacting with roles.
>>>
>>> We should for example be able to support roles such as
>>>
>>> PREFERRED_OVERSEER
>>> DATA
>>> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
>>>
>>> Details about role implementation should probably be discussed in a
>>> thread about that role. Obviously we should think about the name carefully
>>> to leave options open should we want to enhance things later, so maybe
>>>
>>> OVERSEER_PREF  or just  OVERSEER
>>>
>>> would be better since it merely reads that the node implements some
>>> sort of preference or config regarding overseer... but all this can be
>>> decided on a per-role basis.
>>>
>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com> wrote:
>>>
>>>> Negative roles have a place
>>>>
>>>> Example is overseer
>>>>
>>>> There are 3 possible choices for that role
>>>>
>>>> a) preferred: always be in front of the election queue
>>>> b) on: not preferred, but can be an overseer if no preferred overseer
>>>> nodes are available
>>>> c) off: never become an overseer
>>>>
>>>> Today we only have options 'a' and 'b'. In a future ticket, we may
>>>> implement 'c'.
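
The three choices above can be sketched as a simple selection rule. This is
a simplification that scans a list of live nodes rather than using the real
ZooKeeper election queue, and OverseerPickSketch, Node and Mode are
hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; the real election is queue based and lives elsewhere.
final class OverseerPickSketch {
  enum Mode { PREFERRED, ON, OFF }

  // Hypothetical value type: a node name plus its overseer setting.
  static final class Node {
    final String name;
    final Mode mode;

    Node(String name, Mode mode) {
      this.name = name;
      this.mode = mode;
    }
  }

  static Optional<String> pick(List<Node> liveNodes) {
    // (a) preferred nodes are considered first
    for (Node n : liveNodes) {
      if (n.mode == Mode.PREFERRED) {
        return Optional.of(n.name);
      }
    }
    // (b) otherwise any node that is "on" may become overseer
    for (Node n : liveNodes) {
      if (n.mode == Mode.ON) {
        return Optional.of(n.name);
      }
    }
    // (c) nodes that are "off" are never chosen, so there may be no overseer
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<Node> live = List.of(
        new Node("node1", Mode.ON),
        new Node("node2", Mode.PREFERRED),
        new Node("node3", Mode.OFF));
    System.out.println(pick(live).orElse("(none)"));
  }
}
```

With only "off" nodes left, pick returns empty, which corresponds to the
cluster having no overseer at all.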
>>>>
>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
>>>>
>>>>> Negative roles add a lot of complexity, I would really want to stay
>>>>> away from them. That’s why I want strict roles up front. It’s maybe ok to
>>>>> push this decision out, but it also seems like the sort of thing we should
>>>>> consider at the start.
>>>>>
>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes. Negative roles are not a bad idea. If I start a node for
>>>>>> machine learning purposes, I wouldn't want that node to ever participate in
>>>>>> overseer election.
>>>>>>
>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> If we have non strict roles (like overseer), then it does make sense
>>>>>>> to have negative roles.
>>>>>>> That way I can define which are the two nodes that I'd prefer the
>>>>>>> overseer to run on, and a few other nodes on which it should
>>>>>>> definitely never run for various reasons. And in case these
>>>>>>> "!overseer" are the only nodes left in the cluster, let the cluster
>>>>>>> fail the same way it would if there were no data nodes available.
>>>>>>>
>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>>>>>>> houstonputman@gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>> With the Strict/Loose option and sensible defaults, users cannot
>>>>>>> trip themselves up by default, but the option is there for people to tinker
>>>>>>> and have an iron grip over their cluster.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> +1 to sensible defaults so users don't trip themselves. The
>>>>>>> option to tinker for tighter grip can be tackled later, either on a per
>>>>>>> role basis or as a generic concept later.
>>>>>>> >
>>>>>>> >
>>>>>>> > +1 - Can definitely be added later if we so desire, not needed for
>>>>>>> this SIP
>>>>>>> >
>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com>
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>> I think the key  is to let the roles have full control of the
>>>>>>> implications of having/not having that role. No need for even a
>>>>>>> strict/loose designation. The question of do you have the role is yes/no
>>>>>>> with no logic to guess if the role is implied or not. The question of will
>>>>>>> it come up with the role is "have_explicit ? use_explicit : use_defaults".
>>>>>>> >>>
>>>>>>> >>> Once you figure out who has a role (or not) what that means is
>>>>>>> up to the role code.
>>>>>>> >>>
>>>>>>> >>> Corollary: we don't have to change the way overseer works in
>>>>>>> this SIP. We can rework it or not as we see fit separately.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> +1
>>>>>>> >>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Only thing we need to do is find a wording that makes the above
>>>>>>> clear on first read through the SIP :)
>>>>>>> >>>
>>>>>>> >>> -Gus
>>>>>>> >>>
>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>>>>> houstonputman@gmail.com> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> This doesn't really address my concern around what happens if
>>>>>>> all of our existing OVERSEER candidates are down. When at least one of them
>>>>>>> is up, the overseer will go there, and that is good and expected. But what
>>>>>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>>>>>> the old system, would imply that the overseer election goes to some other
>>>>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>>>>> sounds like something role specific to determine, but I would like to see
>>>>>>> us be more strict about it. I don't want cores leaking out of my data
>>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> I'm very strongly in favor of not letting users design a system
>>>>>>> in which the cluster can be "live" without an overseer. I understand that
>>>>>>> the overseer can be taxing to the cluster, but honestly what is the point
>>>>>>> of having an untaxed cluster that doesn't have an overseer? I can see
>>>>>>> arguments for the other roles to be stricter about this, but there are also
>>>>>>> a lot of users who wouldn't want those to be strict either (like "query"
>>>>>>> nodes).
>>>>>>> >>>>
>>>>>>> >>>> Maybe we just put in stronger guarantees that if a non-overseer
>>>>>>> role node HAS to be selected to become overseer, it will try to migrate the
>>>>>>> overseer job to a node with the overseer role whenever one becomes live.
>>>>>>> >>>>
>>>>>>> >>>> So maybe we don't have special rules per role, but instead
>>>>>>> roles can either be defined as "Strict" or "Loose" (better names likely
>>>>>>> exist), and the roles come with a default (Overseer -> Loose, Data ->
>>>>>>> Strict, Query -> Loose, etc.). And it is up to each role to define how to
>>>>>>> behave when running in LOOSE mode and a non-role node is used then a role
>>>>>>> node comes online (like the overseer example given above).
>>>>>>> >>>>
>>>>>>> >>>> With the Strict/Loose option and sensible defaults, users
>>>>>>> cannot trip themselves up by default, but the option is there for people to
>>>>>>> tinker and have an iron grip over their cluster.
>>>>>>> >>>>
>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com>
>>>>>>> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> Noble wrote:
>>>>>>> >>>>> > We are not modifying the way the "overseer role" works
>>>>>>> today. We are just changing the definition and standardizing the
>>>>>>> configuration & discoverability
>>>>>>> >>>>> Ishan wrote:
>>>>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER
>>>>>>> role (which currently stands for preferred overseer). We can take a stab at
>>>>>>> refactoring it later.
>>>>>>> >>>>>
>>>>>>> >>>>> Grouping these two comments together, since I think they are
>>>>>>> saying the same thing. I think this is part of my confusion. We have an old
>>>>>>> system that doesn't work the way we want the new system to work. There may
>>>>>>> be people already using the old system. What path do we offer for folks
>>>>>>> using the old system to migrate to the new system? What happens if somebody
>>>>>>> accidentally tries to use both systems at the same time?
>>>>>>> >>>>>
>>>>>>> >>>>> Ishan wrote:
>>>>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER
>>>>>>> role] are live, Solr guarantees that one of those nodes becomes the
>>>>>>> overseer.", I meant to somewhat capture the current behaviour as the
>>>>>>> OVERSEER role performs today. Do you see any inconsistency with this
>>>>>>> statement vs. what it does today?
>>>>>>> >>>>>
>>>>>>> >>>>> This doesn't really address my concern around what happens if
>>>>>>> all of our existing OVERSEER candidates are down. When at least one of them
>>>>>>> is up, the overseer will go there, and that is good and expected. But what
>>>>>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>>>>>> the old system, would imply that the overseer election goes to some other
>>>>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>>>>> sounds like something role specific to determine, but I would like to see
>>>>>>> us be more strict about it. I don't want cores leaking out of my data
>>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>>> >>>>>
>>>>>>> >>>>> Noble wrote:
>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a node in
>>>>>>> the following request?
>>>>>>> >>>>>
>>>>>>> >>>>> You're absolutely correct, thanks for pointing this out. Let's
>>>>>>> leave it as is.
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com>
>>>>>>> wrote:
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Replying to the top post in this thread because there has
>>>>>>> been a lot of discussion and I don't want to look like I'm continuing any
>>>>>>> of those particular threads.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> I finally had time to sit down and think about this with the
>>>>>>> attention it deserves and am generally happy with how the conversation has
>>>>>>> shaped the current proposal.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> GOOD: I think using system properties to define node roles
>>>>>>> is fine and I like that data is the default role when not defined. I think
>>>>>>> it is important to hold on to the guarantee that an active overseer will
>>>>>>> land on an overseer node role.
>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for
>>>>>>> folks using the current OVERSEER role. I am not sure that something can be
>>>>>>> done automatically since they need to now specify new properties at
>>>>>>> startup. Maybe we need to include loud warnings or support both approaches
>>>>>>> for a time?
>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer
>>>>>>> nodes fail, then it is implied the overseer will go to one of the data
>>>>>>> nodes. The specific wording in the SIP - "When one or more such nodes are
>>>>>>> live, Solr guarantees that one of those nodes become the overseer." implies
>>>>>>> to me that failover could go from overseer1 to overseer2 to overseerN to
>>>>>>> random node. I feel like we need to have some recording that there were
>>>>>>> dedicated overseer nodes and stop the cascading failure instead of churning
>>>>>>> through our data nodes.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed scope
>>>>>>> of "coordinator" roles from a split query/indexing standpoint. I understand
>>>>>>> that these are used as examples, but would like stronger language that new
>>>>>>> roles should also go through their own SIP discussions.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node
>>>>>>> liveness in two different places now. We have the live nodes and we have
>>>>>>> the node roles stored in two different places in zookeeper and it feels
>>>>>>> like this would lead to race conditions or split brain or other hard to
>>>>>>> diagnose bugs when those two lists don't agree with each other. This also
>>>>>>> feels like it contradicts the "single source of truth" idea later stated in
>>>>>>> the proposal. I see Gus's arguments for decoupling these and am not
>>>>>>> strongly opposed, I just get a lurking feeling about it. Even if we don't
>>>>>>> do this, I would like this called out explicitly in the alternative
>>>>>>> approaches section as something that we considered and rejected, with
>>>>>>> details why,
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an additional
>>>>>>> call out here that all operations are GET because nodes cannot be changed
>>>>>>> at runtime.
>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the previous
>>>>>>> OVERSEER preference role?
>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of
>>>>>>> available roles for a cluster. I _think_ this could be based on the version
>>>>>>> that the cluster is running? Would be useful to be able to interrogate a
>>>>>>> cluster in the future... we're seeing OOM issues on queries, can we add
>>>>>>> some query nodes? When were they introduced? I don't know what path this
>>>>>>> API should exist at.
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the SIP
>>>>>>> document. Not sure if there's a better path that we could go for.
>>>>>>> >>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which
>>>>>>> parts are string literals and which parts are meant to be substituted by
>>>>>>> the operator? GET /api/cluster/roles/data would become GET
>>>>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>>>> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1
>>>>>>> should be GET /api/cluster/roles/${nodename} dropping the intermediate
>>>>>>> "nodes"
>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that
>>>>>>> intermediate "nodes" node.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some
>>>>>>> permissions? Maybe this requirement is too fundamental to the operation of
>>>>>>> a cluster and everybody would have to be able to do it.
>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to
>>>>>>> treat roles? Implementation detail that the servers will figure out? Or
>>>>>>> strict guidance where the client needs to check where specific roles are
>>>>>>> before sending any further communication to the server?
>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request that
>>>>>>> it can't fulfil? An overseer node gets a query or an update. A data node
>>>>>>> gets a collection creation request. Do they forward it on to an appropriate
>>>>>>> node, or do they reject it? Should this be configurable? If not, then it
>>>>>>> seems like lazy or poorly configured clients will defeat this isolation
>>>>>>> system quite easily.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave when
>>>>>>> roles are added mean? I thought we established that they are not dynamic.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Thanks,
>>>>>>> >>>>>>> Mike
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Hi,
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>> >>>>>>>>
>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> We also wish to add first class support for Query nodes
>>>>>>> that are used to process user queries by forwarding to data nodes,
>>>>>>> merging/aggregating them and presenting to users. This concept exists as
>>>>>>> first class citizens in most other search engines. This is a chance for
>>>>>>> Solr to catch up.
>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Regards,
>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> http://www.needhamsoftware.com (work)
>>>>>>> >>> http://www.the111shift.com (play)
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>>>>>> For additional commands, e-mail: dev-help@solr.apache.org
>>>>>>>
>>>>>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
>
> --
> -----------------------------------------------------
> Noble Paul
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: First class support for node roles

Posted by Noble Paul <no...@gmail.com>.

I recommend the following format for the role spec:

roles=<role-name>:<role-value>

Each role will have an enum of allowed values and a default value:


   - role name: *data*
      - values: [*on*, *off*]
      - default: *on*
   - role name: *overseer*
      - values: [*allowed*, *disallowed*, *preferred*]
      - default: *allowed*
   - role name: *coordinator*
      - values: [*on*, *off*]
      - default: *off*


Examples:
roles=data:on,overseer:allowed (This is redundant because it uses all the
default values. If a node is started without any roles value, this is the
default behavior.)
roles=data:off,overseer:preferred (do not allow data; join the overseer
election at the head of the queue)
roles=coordinator:on,data:on (act as coordinator but allow data; this is
the same as roles=coordinator:on)
roles=coordinator:on,data:off (act as coordinator, disallow data)
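
To make the proposed format concrete, here is a minimal sketch of how a
roles value could be resolved against per-role defaults. It is not Solr
code: ROLE_SPECS and parse_roles are hypothetical names, the input is the
part after "roles=", and the default for data is assumed to be "on",
which is what the first example above implies.

```python
# Hypothetical sketch: ROLE_SPECS and parse_roles are illustrative, not Solr APIs.
ROLE_SPECS = {
    "data": {"values": {"on", "off"}, "default": "on"},  # default assumed from the examples
    "overseer": {"values": {"allowed", "disallowed", "preferred"}, "default": "allowed"},
    "coordinator": {"values": {"on", "off"}, "default": "off"},
}


def parse_roles(spec=""):
    """Parse 'name:value,name:value' and fill in defaults for unspecified roles."""
    resolved = {name: info["default"] for name, info in ROLE_SPECS.items()}
    # Skip empty entries so that an empty spec yields only the defaults.
    for item in filter(None, (part.strip() for part in spec.split(","))):
        name, sep, value = item.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or name not in ROLE_SPECS:
            raise ValueError(f"unknown or malformed role entry: {item!r}")
        if value not in ROLE_SPECS[name]["values"]:
            raise ValueError(f"role {name!r} does not allow value {value!r}")
        resolved[name] = value
    return resolved
```

With this sketch, parse_roles("coordinator:on") and
parse_roles("coordinator:on,data:on") resolve to the same mapping, which
matches the third example above.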


On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <il...@gmail.com> wrote:

> If we go with no negative node roles and overseer node role is not strict
> (i.e. it’s a "preferred overseer"), then one would need to define a second
> node role "no_overseer" to explicitly exclude a node from ever becoming
> overseer (which I think is a useful feature until we switch the cluster
> default to not using the overseer), plus the implementation of these two
> node roles will obviously be coupled (and what if a node has both defined?).
>
> I prefer strict node roles.
> Maybe we could have node roles with [optional] parameters to let the node
> role implementation decide ?
> The overseer node role for example could have one of 3 values defined for
> each node: “preferred” (default, equivalent to the existing overseer role),
> "accepted" (equivalent to currently not defining the overseer role) and
> "no_way" (does not exist today).
>
> This could be useful in other contexts. A node role “data” could be “fast”
> or “slow” depending on type of local persistent storage for example…
>
> Ilan
>
> On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:
>
>> I really don't think we should have types of roles. Not negative/positive
>> and not strict/non-strict. You have a role or you don't. What that means is
>> up to the code implementing the role.
>>
>> Roles should be free to configure a preference order (binary, or n-ary or
>> whatever, strict or loose), prohibit behavior, or enable behavior. In this
>> SIP I feel we should focus on How to identify what node has what role, How
>> to designate what roles a node has via config/params, and the API's for
>> interacting with roles.
>>
>> We should for example be able to support roles such as
>>
>> PREFERRED_OVERSEER
>> DATA
>> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
>>
>> Details about role implementation should probably be discussed in a
>> thread about that role.  Obviously we should think about the name carefully
>> to leave options open should we want to enhance things later so maybe
>>
>> OVERSEER_PREF  or just  OVERSEER
>>
>> would be better since it merely reads that the node implements some sort
>> of preference or config regarding overseer... but all this can be decided
>> on a per role basis
>>
>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com> wrote:
>>
>>> Negative roles have a place
>>>
>>> Example is overseer
>>>
>>> There are 3 possible choices for that role
>>>
>>> a) preferred: always be in front of the election queue
>>> b) on: not preferred, but can be an overseer if no preferred overseer
>>> nodes are available
>>> c) off: never become an overseer
>>>
>>> Today we only have options 'a' and 'b' . In a future ticket, we may
>>> implement C
>>>
>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
>>>
>>>> Negative roles add a lot of complexity, I would really want to stay
>>>> away from them. That’s why I want strict roles up front. It’s maybe ok to
>>>> push this decision out, but it also seems like the sort of thing we should
>>>> consider at the start.
>>>>
>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com> wrote:
>>>>
>>>>> Yes. Negative roles is not a bad idea. If I start a node for
>>>>> machine learning purposes, I wouldn't want that node to ever participate in
>>>>> overseer election
>>>>>
>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com> wrote:
>>>>>
>>>>>> If we have non strict roles (like overseer), then it does make sense
>>>>>> to have negative roles.
>>>>>> That way I can define which are the two nodes that I'd prefer the
>>>>>> overseer to run on, and a few other nodes on which it should
>>>>>> definitely never run for various reasons. And in case these
>>>>>> "!overseer" are the only nodes left in the cluster, let the cluster
>>>>>> fail the same way it would if there were no data nodes available.
>>>>>>
>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>>>>>> houstonputman@gmail.com> wrote:
>>>>>> >>>
>>>>>> >>> With the Strict/Loose option and sensible defaults, users cannot
>>>>>> trip themselves up by default, but the option is there for people to tinker
>>>>>> and have an iron grip over their cluster.
>>>>>> >>
>>>>>> >>
>>>>>> >> +1 to sensible defaults so users don't trip themselves. The option
>>>>>> to tinker for tighter grip can be tackled later, either on a per role basis
>>>>>> or as a generic concept later.
>>>>>> >
>>>>>> >
>>>>>> > +1 - Can definitely be added later if we so desire, not needed for
>>>>>> this SIP
>>>>>> >
>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>> I think the key  is to let the roles have full control of the
>>>>>> implications of having/not having that role. No need for even a
>>>>>> strict/loose designation. The question of do you have the role is yes/no
>>>>>> with no logic to guess if the role is implied or not. The question of will
>>>>>> it come up with the role is "have_explicit ? use_explicit : use_defaults".
>>>>>> >>>
>>>>>> >>> Once you figure out who has a role (or not) what that means is up
>>>>>> to the role code.
>>>>>> >>>
>>>>>> >>> Corollary: we don't have to change the way overseer works in this
>>>>>> SIP. We can rework it or not as we see fit separately.
>>>>>> >>
>>>>>> >>
>>>>>> >> +1
>>>>>> >>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Only thing we need to do is find a wording that makes the above
>>>>>> clear on first read through the SIP :)
>>>>>> >>>
>>>>>> >>> -Gus
>>>>>> >>>
>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>>>> houstonputman@gmail.com> wrote:
>>>>>> >>>>>
>>>>>> >>>>> This doesn't really address my concern around what happens if
>>>>>> all of our existing OVERSEER candidates are down. When at least one of them
>>>>>> is up, the overseer will go there, and that is good and expected. But what
>>>>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>>>>> the old system, would imply that the overseer election goes to some other
>>>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>>>> sounds like something role specific to determine, but I would like to see
>>>>>> us be more strict about it. I don't want cores leaking out of my data
>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> I'm very strongly in favor of not letting users design a system
>>>>>> in which the cluster can be "live" without an overseer. I understand that
>>>>>> the overseer can be taxing to the cluster, but honestly what is the point
>>>>>> of having an untaxed cluster that doesn't have an overseer? I can see
>>>>>> arguments for the other roles to be stricter about this, but there are also
>>>>>> a lot of users who wouldn't want those to be strict either (like "query"
>>>>>> nodes).
>>>>>> >>>>
>>>>>> >>>> Maybe we just put in stronger guarantees that if a non-overseer
>>>>>> role node HAS to be selected to become overseer, it will try to migrate the
>>>>>> overseer job to a node with the overseer role whenever one becomes live.
>>>>>> >>>>
>>>>>> >>>> So maybe we don't have special rules per role, but instead roles
>>>>>> can either be defined as "Strict" or "Loose" (better names likely exist),
>>>>>> and the roles come with a default (Overseer -> Loose, Data -> Strict, Query
>>>>>> -> Loose, etc.). And it is up to each role to define how to behave when
>>>>>> running in LOOSE mode and a non-role node is used then a role node comes
>>>>>> online (like the overseer example given above).
>>>>>> >>>>
>>>>>> >>>> With the Strict/Loose option and sensible defaults, users cannot
>>>>>> trip themselves up by default, but the option is there for people to tinker
>>>>>> and have an iron grip over their cluster.
>>>>>> >>>>
>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com>
>>>>>> wrote:
>>>>>> >>>>>
>>>>>> >>>>> Noble wrote:
>>>>>> >>>>> > We are not modifying the way the "overseer role" works today.
>>>>>> We are just changing the definition and standardizing the configuration &
>>>>>> discoverability
>>>>>> >>>>> Ishan wrote:
>>>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER
>>>>>> role (which currently stands for preferred overseer). We can take a stab at
>>>>>> refactoring it later.
>>>>>> >>>>>
>>>>>> >>>>> Grouping these two comments together, since I think they are
>>>>>> saying the same thing. I think this is part of my confusion. We have an old
>>>>>> system that doesn't work the way we want the new system to work. There may
>>>>>> be people already using the old system. What path do we offer for folks
>>>>>> using the old system to migrate to the new system? What happens if somebody
>>>>>> accidentally tries to use both systems at the same time?
>>>>>> >>>>>
>>>>>> >>>>> Ishan wrote:
>>>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER
>>>>>> role] are live, Solr guarantees that one of those nodes becomes the
>>>>>> overseer.", I meant to somewhat capture the current behaviour as the
>>>>>> OVERSEER role performs today. Do you see any inconsistency with this
>>>>>> statement vs. what it does today?
>>>>>> >>>>>
>>>>>> >>>>> This doesn't really address my concern around what happens if
>>>>>> all of our existing OVERSEER candidates are down. When at least one of them
>>>>>> is up, the overseer will go there, and that is good and expected. But what
>>>>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>>>>> the old system, would imply that the overseer election goes to some other
>>>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>>>> sounds like something role specific to determine, but I would like to see
>>>>>> us be more strict about it. I don't want cores leaking out of my data
>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>> >>>>>
>>>>>> >>>>> Noble wrote:
>>>>>> >>>>> > If we do that how do we know if xyz is a role or a node in
>>>>>> the following request?
>>>>>> >>>>>
>>>>>> >>>>> You're absolutely correct, thanks for pointing this out. Let's
>>>>>> leave it as is.
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com>
>>>>>> wrote:
>>>>>> >>>>>>>
>>>>>> >>>>>>> Replying to the top post in this thread because there has
>>>>>> been a lot of discussion and I don't want to look like I'm continuing any
>>>>>> of those particular threads.
>>>>>> >>>>>>>
>>>>>> >>>>>>> I finally had time to sit down and think about this with the
>>>>>> attention it deserves and am generally happy with how the conversation has
>>>>>> shaped the current proposal.
>>>>>> >>>>>>>
>>>>>> >>>>>>> GOOD: I think using system properties to define node roles is
>>>>>> fine and I like that data is the default role when not defined. I think it
>>>>>> is important to hold on to the guarantee that an active overseer will land
>>>>>> on an overseer node role.
>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for
>>>>>> folks using the current OVERSEER role. I am not sure that something can be
>>>>>> done automatically since they need to now specify new properties at
>>>>>> startup. Maybe we need to include loud warnings or support both approaches
>>>>>> for a time?
>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer
>>>>>> nodes fail, then it is implied the overseer will go to one of the data
>>>>>> nodes. The specific wording in the SIP - "When one or more such nodes are
>>>>>> live, Solr guarantees that one of those nodes become the overseer." implies
>>>>>> to me that failover could go from overseer1 to overseer2 to overseerN to
>>>>>> random node. I feel like we need to have some recording that there were
>>>>>> dedicated overseer nodes and stop the cascading failure instead of churning
>>>>>> through our data nodes.
>>>>>> >>>>>>>
>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed scope
>>>>>> of "coordinator" roles from a split query/indexing standpoint. I understand
>>>>>> that these are used as examples, but would like stronger language that new
>>>>>> roles should also go through their own SIP discussions.
>>>>>> >>>>>>>
>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node
>>>>>> liveness in two different places now. We have the live nodes and we have
>>>>>> the node roles stored in two different places in zookeeper and it feels
>>>>>> like this would lead to race conditions or split brain or other hard to
>>>>>> diagnose bugs when those two lists don't agree with each other. This also
>>>>>> feels like it contradicts the "single source of truth" idea later stated in
>>>>>> the proposal. I see Gus's arguments for decoupling these and am not
>>>>>> strongly opposed, I just get a lurking feeling about it. Even if we don't
>>>>>> do this, I would like this called out explicitly in the alternative
>>>>>> approaches section as something that we considered and rejected, with
>>>>>> details why,
>>>>>> >>>>>>>
>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an additional
>>>>>> call out here that all operations are GET because nodes cannot be changed
>>>>>> at runtime.
>>>>>> >>>>>>> CLARIFICATION: How does this interact with the previous
>>>>>> OVERSEER preference role?
>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of
>>>>>> available roles for a cluster. I _think_ this could be based on the version
>>>>>> that the cluster is running? Would be useful to be able to interrogate a
>>>>>> cluster in the future... we're seeing OOM issues on queries, can we add
>>>>>> some query nodes? When were they introduced? I don't know what path this
>>>>>> API should exist at.
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the SIP
>>>>>> document. Not sure if there's a better path that we could go for.
>>>>>> >>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which
>>>>>> parts are string literals and which parts are meant to be substituted by
>>>>>> the operator? GET /api/cluster/roles/data would become GET
>>>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>>> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1
>>>>>> should be GET /api/cluster/roles/${nodename} dropping the intermediate
>>>>>> "nodes"
>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that
>>>>>> intermediate "nodes" node.
>>>>>> >>>>>>>
>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some permissions?
>>>>>> Maybe this requirement is too fundamental to the operation of a cluster and
>>>>>> everybody would have to be able to do it.
>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to
>>>>>> treat roles? Implementation detail that the servers will figure out? Or
>>>>>> strict guidance where the client needs to check where specific roles are
>>>>>> before sending any further communication to the server?
>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request that
>>>>>> it can't fulfil? An overseer node gets a query or an update. A data node
>>>>>> gets a collection creation request. Do they forward it on to an appropriate
>>>>>> node, or do they reject it? Should this be configurable? If not, then it
>>>>>> seems like lazy or poorly configured clients will defeat this isolation
>>>>>> system quite easily.
>>>>>> >>>>>>>
>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave when
>>>>>> roles are added mean? I thought we established that they are not dynamic.
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> Thanks,
>>>>>> >>>>>>> Mike
>>>>>> >>>>>>>
>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Hi,
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>> >>>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> We also wish to add first class support for Query nodes that
>>>>>> are used to process user queries by forwarding to data nodes,
>>>>>> merging/aggregating them and presenting to users. This concept exists as
>>>>>> first class citizens in most other search engines. This is a chance for
>>>>>> Solr to catch up.
>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Regards,
>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> --
>>>>>> >>> http://www.needhamsoftware.com (work)
>>>>>> >>> http://www.the111shift.com (play)
>>>>>>
>>>>>>
>>>>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

-- 
-----------------------------------------------------
Noble Paul

Re: First class support for node roles

Posted by Gus Heck <gu...@gmail.com>.

On Sat, Dec 4, 2021 at 7:01 PM Ilan Ginzburg <il...@gmail.com> wrote:

> If we go with no negative node roles and overseer node role is not strict
> (i.e. it’s a "preferred overseer"), then one would need to define a second
> node role "no_overseer" to explicitly exclude a node from ever becoming
> overseer (which I think is a useful feature until we switch the cluster
> default to not using the overseer), plus the implementation of these two
> node roles will obviously be coupled (and what if a node has both defined?).
>
>
Or add configuration options to the existing role? It's only an issue if you
want to change the current behavior away from the default install, and even
then, if it's easily put back with config, that's not too big of a deal.

Re: First class support for node roles

Posted by Ilan Ginzburg <il...@gmail.com>.

If we go with no negative node roles and overseer node role is not strict
(i.e. it’s a "preferred overseer"), then one would need to define a second
node role "no_overseer" to explicitly exclude a node from ever becoming
overseer (which I think is a useful feature until we switch the cluster
default to not using the overseer), plus the implementation of these two
node roles will obviously be coupled (and what if a node has both defined?).

I prefer strict node roles.
Maybe we could have node roles with [optional] parameters to let the node
role implementation decide?
The overseer node role for example could have one of 3 values defined for
each node: “preferred” (default, equivalent to the existing overseer role),
"accepted" (equivalent to currently not defining the overseer role) and
"no_way" (does not exist today).

This could be useful in other contexts. A node role “data” could be “fast”
or “slow” depending on type of local persistent storage for example…
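
A minimal Java sketch of how the three overseer values above could drive
selection. The enum, class, and method names are hypothetical and are not
Solr APIs; the sketch only assumes that a "preferred" node wins over an
"accepted" one and that a "no_way" node is never chosen.

```java
import java.util.Map;
import java.util.Optional;

final class OverseerModeSketch {
    /** Hypothetical per-node setting for the overseer role (not a Solr type). */
    enum OverseerMode { PREFERRED, ACCEPTED, NO_WAY }

    /**
     * Picks an overseer among live nodes: the first PREFERRED node if any,
     * otherwise the first ACCEPTED node. NO_WAY nodes are never returned,
     * so the result is empty when only such nodes are live.
     */
    static Optional<String> pickOverseer(Map<String, OverseerMode> liveNodes) {
        Optional<String> fallback = Optional.empty();
        for (Map.Entry<String, OverseerMode> entry : liveNodes.entrySet()) {
            if (entry.getValue() == OverseerMode.PREFERRED) {
                return Optional.of(entry.getKey());
            }
            if (entry.getValue() == OverseerMode.ACCEPTED && !fallback.isPresent()) {
                // Remember the first acceptable node in case no preferred one exists.
                fallback = Optional.of(entry.getKey());
            }
        }
        return fallback;
    }
}
```

With only "no_way" nodes live the result is empty, which corresponds to the
cluster having no overseer rather than falling back to an excluded node.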

Ilan

On Fri 3 Dec 2021 at 16:10, Gus Heck <gu...@gmail.com> wrote:

> I really don't think we should have types of roles. Not negative/positive
> and not strict/non-strict. You have a role or you don't. What that means is
> up to the code implementing the role.
>
> Roles should be free to configure a preference order (binary, or n-ary or
> whatever, strict or loose), prohibit behavior, or enable behavior. In this
> SIP I feel we should focus on How to identify what node has what role, How
> to designate what roles a node has via config/params, and the API's for
> interacting with roles.
>
> We should for example be able to support roles such as
>
> PREFERRED_OVERSEER
> DATA
> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
>
> Details about role implementation should probably be discussed in a thread
> about that role.  Obviously we should think about the name carefully to
> leave options open should we want to enhance things later so maybe
>
> OVERSEER_PREF  or just  OVERSEER
>
> would be better since it merely reades that the node implements some sort
> of preference or config regarding overseer... but all this can be decided
> on a per role basis
>
> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com> wrote:
>
>> Negative roles have a place
>>
>> Example is overseer
>>
>> There are 3 possible choices for that role
>>
>> a) preferred: always be in front of the election queue
>> b) on: not preferred, but can be an overseer if no preferred overseer
>> nodes are available
>> c) off: never become an overseer
>>
>> Today we only have options 'a' and 'b' . In a future ticket, we may
>> implement C
>>
>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
>>
>>> Negative roles add a lot of complexity, I would really want to stay away
>>> from them. That’s why I want strict roles up front. It’s maybe ok to push
>>> this decision out, but it also seems like the sort of thing we should
>>> consider at the start.
>>>
>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com> wrote:
>>>
>>>> Yes. Negative roles is not a bad idea. If I start a node for
>>>> machine learning purposes, I wouldn't want that node to ever participate in
>>>> overseer election
>>>>
>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com> wrote:
>>>>
>>>>> If we have non strict roles (like overseer), then it does make sense
>>>>> to have negative roles.
>>>>> That way I can define which are the two nodes that I'd prefer the
>>>>> overseer to run on, and a few other nodes on which it should
>>>>> definitely never run for various reasons. And in case these
>>>>> "!overseer" are the only nodes left in the cluster, let the cluster
>>>>> fail the same way it would if there were no data nodes available.
>>>>>
>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <ho...@gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>> With the Strict/Loose option and sensible defaults, users cannot
>>>>> trip themselves up by default, but the option is there for people to tinker
>>>>> and have an iron grip over their cluster.
>>>>> >>
>>>>> >>
>>>>> >> +1 to sensible defaults so users don't trip themselves. The option
>>>>> to tinker for tighter grip can be tackled later, either on a per role basis
>>>>> or as a generic concept later.
>>>>> >
>>>>> >
>>>>> > +1 - Can definitely be added later if we so desire, not needed for
>>>>> this SIP
>>>>> >
>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com> wrote:
>>>>> >>>
>>>>> >>> I think the key  is to let the roles have full control of the
>>>>> implications of having/not having that role. No need for even a
>>>>> strict/loose designation. The question of do you have the role is yes/no
>>>>> with no logic to guess if the role is implied or not, The question of will
>>>>> it come up with the role is "have_explicit ? use_defaults : use_defaults.
>>>>> >>>
>>>>> >>> Once you figure out who has a role (or not) what that means is up
>>>>> to the role code.
>>>>> >>>
>>>>> >>> Corollary: we don't have to change the way overseer works in this
>>>>> SIP. We can rework it or not as we see fit separately.
>>>>> >>
>>>>> >>
>>>>> >> +1
>>>>> >>
>>>>> >>>
>>>>> >>>
>>>>> >>> Only thing we need to do is find a wording that makes the above
>>>>> clear on first read through the SIP :)
>>>>> >>>
>>>>> >>> -Gus
>>>>> >>>
>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>>> houstonputman@gmail.com> wrote:
>>>>> >>>>>
>>>>> >>>>> This doesn't really address my concern around what happens if
>>>>> all of our existing OVERSEER candidates are down. When at least one of them
>>>>> is up, the overseer will go there, and that is good and expected. But what
>>>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>>>> the old system, would imply that the overseer election goes to some other
>>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>>> sounds like something role specific to determine, but I would like to see
>>>>> us be more strict about it. I don't want cores leaking out of my data
>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> I'm very strongly in favor of not letting users design a system
>>>>> in which the cluster can be "live" without an overseer. I understand that
>>>>> the overseer can be taxing to the cluster, but honestly what is the point
>>>>> of having an untaxed cluster that doesn't have an overseer? I can see
>>>>> arguments for the other roles to be stricter about this, but there are also
>>>>> a lot of users who wouldn't want those to be strict either (like "query"
>>>>> nodes).
>>>>> >>>>
>>>>> >>>> Maybe we just put in stronger guarantees that if a non-overseer
>>>>> role node HAS to be selected to become overseer, it will try to migrate the
>>>>> overseer job to a node with the overseer role whenever one becomes live.
>>>>> >>>>
>>>>> >>>> So maybe we don't have special rules per role, but instead roles
>>>>> can either be defined as "Strict" or "Loose" (better names likely exist),
>>>>> and the roles come with a default (Overseer -> Loose, Data -> Strict, Query
>>>>> -> Loose, etc.). And it is up to each role to define how to behave when
>>>>> running in LOOSE mode and a non-role node is used then a role node comes
>>>>> online (like the overseer example given above).
>>>>> >>>>
>>>>> >>>> With the Strict/Loose option and sensible defaults, users cannot
>>>>> trip themselves up by default, but the option is there for people to tinker
>>>>> and have an iron grip over their cluster.
>>>>> >>>>
>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:
>>>>> >>>>>
>>>>> >>>>> Noble wrote:
>>>>> >>>>> > We are not modifying the way the "overseer role" works today.
>>>>> We are just changing the definition and standardizing the configuration &
>>>>> discoverability
>>>>> >>>>> Ishan wrote:
>>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER role
>>>>> (which currently stands for preferred overseer). We can take a stab at
>>>>> refactoring it later.
>>>>> >>>>>
>>>>> >>>>> Grouping these two comments together, since I think they are
>>>>> saying the same thing. I think this is part of my confusion. We have an old
>>>>> system that doesn't work the way we want the new system to work. There may
>>>>> be people already using the old system. What path do we offer for folks
>>>>> using the old system to migrate to the new system? What happens if somebody
>>>>> accidentally tries to use both systems at the same time?
>>>>> >>>>>
>>>>> >>>>> Ishan wrote:
>>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER role]
>>>>> are live, Solr guarantees that one of those nodes becomes the overseer.", I
>>>>> meant to somewhat capture the current behaviour as the OVERSEER role
>>>>> performs today. Do you see any inconsistency with this statement vs. what
>>>>> it does today?
>>>>> >>>>>
>>>>> >>>>> This doesn't really address my concern around what happens if
>>>>> all of our existing OVERSEER candidates are down. When at least one of them
>>>>> is up, the overseer will go there, and that is good and expected. But what
>>>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>>>> the old system, would imply that the overseer election goes to some other
>>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>>> sounds like something role specific to determine, but I would like to see
>>>>> us be more strict about it. I don't want cores leaking out of my data
>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>> >>>>>
>>>>> >>>>> Noble wrote:
>>>>> >>>>> > If we do that how do we know if xyz is a role or a node in the
>>>>> following request?
>>>>> >>>>>
>>>>> >>>>> You're absolutely correct, thanks for pointing this out. Let's
>>>>> leave it as is.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com>
>>>>> wrote:
>>>>> >>>>>>>
>>>>> >>>>>>> Replying to the top post in this thread because there has been
>>>>> a lot of discussion and I don't want to look like I'm continuing any of
>>>>> those particular threads.
>>>>> >>>>>>>
>>>>> >>>>>>> I finally had time to sit down and think about this with the
>>>>> attention it deserves and am generally happy with how the conversation has
>>>>> shaped the current proposal.
>>>>> >>>>>>>
>>>>> >>>>>>> GOOD: I think using system properties to define node roles is
>>>>> fine and I like that data is the default role when not defined. I think it
>>>>> is important to hold on to the guarantee that an active overseer will land
>>>>> on an overseer node role.
>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for folks
>>>>> using the current OVERSEER role. I am not sure that something can be done
>>>>> automatically since they need to now specify new properties at startup.
>>>>> Maybe we need to include loud warnings or support both approaches for a
>>>>> time?
>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer
>>>>> nodes fail, then it is implied the overseer will go to one of the data
>>>>> nodes. The specific wording in the SIP - "When one or more such nodes are
>>>>> live, Solr guarantees that one of those nodes become the overseer." implies
>>>>> to me that failover could go from overseer1 to overseer2 to overseerN to
>>>>> random node. I feel like we need to have some recording that there were
>>>>> dedicated overseer nodes and stop the cascading failure instead of churning
>>>>> through our data nodes.
>>>>> >>>>>>>
>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed scope of
>>>>> "coordinator" roles from a split query/indexing standpoint. I understand
>>>>> that these are used as examples, but would like stronger language that new
>>>>> roles should also go through their own SIP discussions.
>>>>> >>>>>>>
>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node liveness
>>>>> in two different places now. We have the live nodes and we have the node
>>>>> roles stored in two different places in zookeeper and it feels like this
>>>>> would lead to race conditions or split brain or other hard to diagnose bugs
>>>>> when those two lists don't agree with each other. This also feels like it
>>>>> contradicts the "single source of truth" idea later stated in the proposal.
>>>>> I see Gus's arguments for decoupling these and am not strongly opposed, I
>>>>> just get a lurking feeling about it. Even if we don't do this, I would like
>>>>> this called out explicitly in the alternative approaches section as
>>>>> something that we considered and rejected, with details why,
>>>>> >>>>>>>
>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an additional
>>>>> call out here that all operations are GET because nodes cannot be changed
>>>>> at runtime.
>>>>> >>>>>>> CLARIFICATION: How does this interact with the previous
>>>>> OVERSEER preference role?
>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of available
>>>>> roles for a cluster. I _think_ this could be based on the version that the
>>>>> cluster is running? Would be useful to be able to interrogate a cluster in
>>>>> the future... we're seeing OOM issues on queries, can we add some query
>>>>> nodes? When were they introduced? I don't know what path this API should
>>>>> exist at.
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the SIP
>>>>> document. Not sure if there's a better path that we could go for.
>>>>> >>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which
>>>>> parts are string literals and which parts are meant to be substituted by
>>>>> the operator? GET /api/cluster/roles/data would become GET
>>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1
>>>>> should be GET /api/cluster/roles/${nodename} dropping the intermediate
>>>>> "nodes"
>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that
>>>>> intermediate "nodes" node.
>>>>> >>>>>>>
>>>>> >>>>>>> CLARIFICATION: Should listing roles require some permissions?
>>>>> Maybe this requirement is too fundamental to the operation of a cluster and
>>>>> everybody would have to be able to do it.
>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to
>>>>> treat roles? Implementation detail that the servers will figure out? Or
>>>>> strict guidance where the client needs to check where specific roles are
>>>>> before sending any further communication to the server?
>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request that it
>>>>> can't fulfil? An overseer node gets a query or an update. A data node gets
>>>>> a collection creation request. Do they forward it on to an appropriate
>>>>> node, or do they reject it? Should this be configurable? If not, then it
>>>>> seems like lazy or poorly configured clients will defeat this isolation
>>>>> system quite easily.
>>>>> >>>>>>>
>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave when
>>>>> roles are added mean? I thought we established that they are not dynamic.
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> Thanks,
>>>>> >>>>>>> Mike
>>>>> >>>>>>>
>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>> >>>>>>>>
>>>>> >>>>>>>> Hi,
>>>>> >>>>>>>>
>>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>> >>>>>>>>
>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>> >>>>>>>>
>>>>> >>>>>>>> We also wish to add first class support for Query nodes that
>>>>> are used to process user queries by forwarding to data nodes,
>>>>> merging/aggregating them and presenting to users. This concept exists as
>>>>> first class citizens in most other search engines. This is a chance for
>>>>> Solr to catch up.
>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>> >>>>>>>>
>>>>> >>>>>>>> Regards,
>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> --
>>>>> >>> http://www.needhamsoftware.com (work)
>>>>> >>> http://www.the111shift.com (play)
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>>>> For additional commands, e-mail: dev-help@solr.apache.org
>>>>>
>>>>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>

Re: First class support for node roles

Posted by Gus Heck <gu...@gmail.com>.

I really don't think we should have types of roles. Not negative/positive
and not strict/non-strict. You have a role or you don't. What that means is
up to the code implementing the role.

Roles should be free to configure a preference order (binary, or n-ary or
whatever, strict or loose), prohibit behavior, or enable behavior. In this
SIP I feel we should focus on how to identify what node has what role, how
to designate what roles a node has via config/params, and the APIs for
interacting with roles.
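
As a rough illustration of "you have a role or you don't", here is a Java
sketch that treats a node's roles as a flat set of names. The comma-separated
input format and all class and method names are assumptions made for this
example, not Solr's configuration syntax; the DATA default follows the earlier
remark in this thread that data is the default role when none is defined.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

final class NodeRolesSketch {
    /**
     * Parses a comma-separated list of role names (hypothetical format).
     * Names are trimmed and upper-cased; blank entries are ignored.
     * A node with no configured roles gets the single role DATA.
     */
    static Set<String> parseRoles(String raw) {
        Set<String> roles = new TreeSet<>();
        if (raw != null) {
            for (String token : raw.split(",")) {
                String name = token.trim().toUpperCase(Locale.ROOT);
                if (!name.isEmpty()) {
                    roles.add(name);
                }
            }
        }
        if (roles.isEmpty()) {
            roles.add("DATA");
        }
        return Collections.unmodifiableSet(roles);
    }

    /** Plain membership test; what having the role means is left to the role's own code. */
    static boolean hasRole(Set<String> roles, String role) {
        return roles.contains(role.toUpperCase(Locale.ROOT));
    }
}
```

Nothing here encodes strict, loose, or negative semantics; a role such as
NO_ROUTED_ALIAS would simply be another name in the set.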

We should, for example, be able to support roles such as:

PREFERRED_OVERSEER
DATA
NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)

Details about role implementation should probably be discussed in a thread
about that role.  Obviously we should think about the name carefully to
leave options open should we want to enhance things later so maybe

OVERSEER_PREF  or just  OVERSEER

would be better since it merely reads that the node implements some sort
of preference or config regarding overseer... but all this can be decided
on a per-role basis.

On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <no...@gmail.com> wrote:

> Negative roles have a place
>
> Example is overseer
>
> There are 3 possible choices for that role
>
> a) preferred: always be in front of the election queue
> b) on: not preferred, but can be an overseer if no preferred overseer
> nodes are available
> c) off: never become an overseer
>
> Today we only have options 'a' and 'b' . In a future ticket, we may
> implement C
>
> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
>
>> Negative roles add a lot of complexity, I would really want to stay away
>> from them. That’s why I want strict roles up front. It’s maybe ok to push
>> this decision out, but it also seems like the sort of thing we should
>> consider at the start.
>>
>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com> wrote:
>>
>>> Yes. Negative roles is not a bad idea. If I start a node for
>>> machine learning purposes, I wouldn't want that node to ever participate in
>>> overseer election
>>>
>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com> wrote:
>>>
>>>> If we have non strict roles (like overseer), then it does make sense
>>>> to have negative roles.
>>>> That way I can define which are the two nodes that I'd prefer the
>>>> overseer to run on, and a few other nodes on which it should
>>>> definitely never run for various reasons. And in case these
>>>> "!overseer" are the only nodes left in the cluster, let the cluster
>>>> fail the same way it would if there were no data nodes available.
>>>>
>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <ho...@gmail.com>
>>>> wrote:
>>>> >>>
>>>> >>> With the Strict/Loose option and sensible defaults, users cannot
>>>> trip themselves up by default, but the option is there for people to tinker
>>>> and have an iron grip over their cluster.
>>>> >>
>>>> >>
>>>> >> +1 to sensible defaults so users don't trip themselves. The option
>>>> to tinker for tighter grip can be tackled later, either on a per role basis
>>>> or as a generic concept later.
>>>> >
>>>> >
>>>> > +1 - Can definitely be added later if we so desire, not needed for
>>>> this SIP
>>>> >
>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>> ichattopadhyaya@gmail.com> wrote:
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com> wrote:
>>>> >>>
>>>> >>> I think the key  is to let the roles have full control of the
>>>> implications of having/not having that role. No need for even a
>>>> strict/loose designation. The question of do you have the role is yes/no
>>>> with no logic to guess if the role is implied or not, The question of will
>>>> it come up with the role is "have_explicit ? use_defaults : use_defaults.
>>>> >>>
>>>> >>> Once you figure out who has a role (or not) what that means is up
>>>> to the role code.
>>>> >>>
>>>> >>> Corollary: we don't have to change the way overseer works in this
>>>> SIP. We can rework it or not as we see fit separately.
>>>> >>
>>>> >>
>>>> >> +1
>>>> >>
>>>> >>>
>>>> >>>
>>>> >>> Only thing we need to do is find a wording that makes the above
>>>> clear on first read through the SIP :)
>>>> >>>
>>>> >>> -Gus
>>>> >>>
>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>> houstonputman@gmail.com> wrote:
>>>> >>>>>
>>>> >>>>> This doesn't really address my concern around what happens if all
>>>> of our existing OVERSEER candidates are down. When at least one of them is
>>>> up, the overseer will go there, and that is good and expected. But what
>>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>>> the old system, would imply that the overseer election goes to some other
>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>> sounds like something role specific to determine, but I would like to see
>>>> us be more strict about it. I don't want cores leaking out of my data
>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>> whatever. Overseer shouldn't be special in this regard.
>>>> >>>>
>>>> >>>>
>>>> >>>> I'm very strongly in favor of not letting users design a system in
>>>> which the cluster can be "live" without an overseer. I understand that the
>>>> overseer can be taxing to the cluster, but honestly what is the point of
>>>> having an untaxed cluster that doesn't have an overseer? I can see
>>>> arguments for the other roles to be stricter about this, but there are also
>>>> a lot of users who wouldn't want those to be strict either (like "query"
>>>> nodes).
>>>> >>>>
>>>> >>>> Maybe we just put in stronger guarantees that if a non-overseer
>>>> role node HAS to be selected to become overseer, it will try to migrate the
>>>> overseer job to a node with the overseer role whenever one becomes live.
>>>> >>>>
>>>> >>>> So maybe we don't have special rules per role, but instead roles
>>>> can either be defined as "Strict" or "Loose" (better names likely exist),
>>>> and the roles come with a default (Overseer -> Loose, Data -> Strict, Query
>>>> -> Loose, etc.). And it is up to each role to define how to behave when
>>>> running in LOOSE mode and a non-role node is used then a role node comes
>>>> online (like the overseer example given above).
>>>> >>>>
>>>> >>>> With the Strict/Loose option and sensible defaults, users cannot
>>>> trip themselves up by default, but the option is there for people to tinker
>>>> and have an iron grip over their cluster.
>>>> >>>>
>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:
>>>> >>>>>
>>>> >>>>> Noble wrote:
>>>> >>>>> > We are not modifying the way the "overseer role" works today.
>>>> We are just changing the definition and standardizing the configuration &
>>>> discoverability
>>>> >>>>> Ishan wrote:
>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER role
>>>> (which currently stands for preferred overseer). We can take a stab at
>>>> refactoring it later.
>>>> >>>>>
>>>> >>>>> Grouping these two comments together, since I think they are
>>>> saying the same thing. I think this is part of my confusion. We have an old
>>>> system that doesn't work the way we want the new system to work. There may
>>>> be people already using the old system. What path do we offer for folks
>>>> using the old system to migrate to the new system? What happens if somebody
>>>> accidentally tries to use both systems at the same time?
>>>> >>>>>
>>>> >>>>> Ishan wrote:
>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER role]
>>>> are live, Solr guarantees that one of those nodes becomes the overseer.", I
>>>> meant to somewhat capture the current behaviour as the OVERSEER role
>>>> performs today. Do you see any inconsistency with this statement vs. what
>>>> it does today?
>>>> >>>>>
>>>> >>>>> This doesn't really address my concern around what happens if all
>>>> of our existing OVERSEER candidates are down. When at least one of them is
>>>> up, the overseer will go there, and that is good and expected. But what
>>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>>> the old system, would imply that the overseer election goes to some other
>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>> sounds like something role specific to determine, but I would like to see
>>>> us be more strict about it. I don't want cores leaking out of my data
>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>> whatever. Overseer shouldn't be special in this regard.
>>>> >>>>>
>>>> >>>>> Noble wrote:
>>>> >>>>> > If we do that how do we know if xyz is a role or a node in the
>>>> following request?
>>>> >>>>>
>>>> >>>>> You're absolutely correct, thanks for pointing this out. Let's
>>>> leave it as is.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>>> ichattopadhyaya@gmail.com> wrote:
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com>
>>>> wrote:
>>>> >>>>>>>
>>>> >>>>>>> Replying to the top post in this thread because there has been
>>>> a lot of discussion and I don't want to look like I'm continuing any of
>>>> those particular threads.
>>>> >>>>>>>
>>>> >>>>>>> I finally had time to sit down and think about this with the
>>>> attention it deserves and am generally happy with how the conversation has
>>>> shaped the current proposal.
>>>> >>>>>>>
>>>> >>>>>>> GOOD: I think using system properties to define node roles is
>>>> fine and I like that data is the default role when not defined. I think it
>>>> is important to hold on to the guarantee that an active overseer will land
>>>> on an overseer node role.
>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for folks
>>>> using the current OVERSEER role. I am not sure that something can be done
>>>> automatically since they need to now specify new properties at startup.
>>>> Maybe we need to include loud warnings or support both approaches for a
>>>> time?
>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer nodes
>>>> fail, then it is implied the overseer will go to one of the data nodes. The
>>>> specific wording in the SIP - "When one or more such nodes are live, Solr
>>>> guarantees that one of those nodes become the overseer." implies to me that
>>>> failover could go from overseer1 to overseer2 to overseerN to random node.
>>>> I feel like we need to have some recording that there were dedicated
>>>> overseer nodes and stop the cascading failure instead of churning through
>>>> our data nodes.
>>>> >>>>>>>
>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed scope of
>>>> "coordinator" roles from a split query/indexing standpoint. I understand
>>>> that these are used as examples, but would like stronger language that new
>>>> roles should also go through their own SIP discussions.
>>>> >>>>>>>
>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node liveness
>>>> in two different places now. We have the live nodes and we have the node
>>>> roles stored in two different places in zookeeper and it feels like this
>>>> would lead to race conditions or split brain or other hard to diagnose bugs
>>>> when those two lists don't agree with each other. This also feels like it
>>>> contradicts the "single source of truth" idea later stated in the proposal.
>>>> I see Gus's arguments for decoupling these and am not strongly opposed, I
>>>> just get a lurking feeling about it. Even if we don't do this, I would like
>>>> this called out explicitly in the alternative approaches section as
>>>> something that we considered and rejected, with details why,
>>>> >>>>>>>
>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an additional
>>>> call out here that all operations are GET because nodes cannot be changed
>>>> at runtime.
>>>> >>>>>>> CLARIFICATION: How does this interact with the previous
>>>> OVERSEER preference role?
>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of available
>>>> roles for a cluster. I _think_ this could be based on the version that the
>>>> cluster is running? Would be useful to be able to interrogate a cluster in
>>>> the future... we're seeing OOM issues on queries, can we add some query
>>>> nodes? When were they introduced? I don't know what path this API should
>>>> exist at.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the SIP
>>>> document. Not sure if there's a better path that we could go for.
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which parts
>>>> are string literals and which parts are meant to be substituted by the
>>>> operator? GET /api/cluster/roles/data would become GET
>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1
>>>> should be GET /api/cluster/roles/${nodename} dropping the intermediate
>>>> "nodes"
>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that
>>>> intermediate "nodes" node.
>>>> >>>>>>>
>>>> >>>>>>> CLARIFICATION: Should listing roles require some permissions?
>>>> Maybe this requirement is too fundamental to the operation of a cluster and
>>>> everybody would have to be able to do it.
>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to
>>>> treat roles? Implementation detail that the servers will figure out? Or
>>>> strict guidance where the client needs to check where specific roles are
>>>> before sending any further communication to the server?
>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request that it
>>>> can't fulfil? An overseer node gets a query or an update. A data node gets
>>>> a collection creation request. Do they forward it on to an appropriate
>>>> node, or do they reject it? Should this be configurable? If not, then it
>>>> seems like lazy or poorly configured clients will defeat this isolation
>>>> system quite easily.
>>>> >>>>>>>
>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave when
>>>> roles are added mean? I thought we established that they are not dynamic.
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> Thanks,
>>>> >>>>>>> Mike
>>>> >>>>>>>
>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>> ichattopadhyaya@gmail.com> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>> Hi,
>>>> >>>>>>>>
>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>> >>>>>>>>
>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>> >>>>>>>>
>>>> >>>>>>>> We also wish to add first class support for Query nodes that
>>>> are used to process user queries by forwarding to data nodes,
>>>> merging/aggregating them and presenting to users. This concept exists as
>>>> first class citizens in most other search engines. This is a chance for
>>>> Solr to catch up.
>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>> >>>>>>>>
>>>> >>>>>>>> Regards,
>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> http://www.needhamsoftware.com (work)
>>>> >>> http://www.the111shift.com (play)
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>>> For additional commands, e-mail: dev-help@solr.apache.org
>>>>
>>>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: First class support for node roles

Posted by Noble Paul <no...@gmail.com>.

Negative roles have a place.

The overseer is an example.

There are three possible settings for that role:

a) preferred: always at the front of the election queue
b) on: not preferred, but can become the overseer if no preferred overseer
nodes are available
c) off: never become the overseer

Today we only have options (a) and (b). In a future ticket, we may
implement (c).
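
The three settings above can be sketched as follows. This is only an illustration, not Solr code: the names OverseerMode, election_queue and pick_overseer are hypothetical, and sorting candidates by node name is an assumption made purely to keep the example deterministic. It says nothing about how the real election queue in ZooKeeper is ordered.

```python
from enum import Enum
from typing import Dict, List, Optional


class OverseerMode(Enum):
    PREFERRED = "preferred"  # (a) always at the front of the election queue
    ON = "on"                # (b) eligible only when no preferred node is live
    OFF = "off"              # (c) never eligible (hypothetical, not available today)


def election_queue(live_nodes: Dict[str, OverseerMode]) -> List[str]:
    """Return the live nodes in the order they would be tried as overseer.

    Preferred nodes come first, then "on" nodes; "off" nodes are left out.
    Sorting by name within each group is an assumption for this sketch.
    """
    preferred = sorted(n for n, m in live_nodes.items() if m is OverseerMode.PREFERRED)
    fallback = sorted(n for n, m in live_nodes.items() if m is OverseerMode.ON)
    return preferred + fallback


def pick_overseer(live_nodes: Dict[str, OverseerMode]) -> Optional[str]:
    """Return the node that would become overseer, or None if none is eligible."""
    queue = election_queue(live_nodes)
    return queue[0] if queue else None
```

With one node in each mode, pick_overseer returns the preferred node. If that node goes down, the "on" node takes over. If only "off" nodes remain, the function returns None, which corresponds to the cluster having no overseer.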

On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:

> Negative roles add a lot of complexity, I would really want to stay away
> from them. That’s why I want strict roles up front. It’s maybe ok to push
> this decision out, but it also seems like the sort of thing we should
> consider at the start.
>
> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com> wrote:
>
>> Yes. Negative roles is not a bad idea. If I start a node for
>> machine learning purposes, I wouldn't want that node to ever participate in
>> overseer election
>>
>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com> wrote:
>>
>>> If we have non strict roles (like overseer), then it does make sense
>>> to have negative roles.
>>> That way I can define which are the two nodes that I'd prefer the
>>> overseer to run on, and a few other nodes on which it should
>>> definitely never run for various reasons. And in case these
>>> "!overseer" are the only nodes left in the cluster, let the cluster
>>> fail the same way it would if there were no data nodes available.
>>>
>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <ho...@gmail.com>
>>> wrote:
>>> >>>
>>> >>> With the Strict/Loose option and sensible defaults, users cannot
>>> trip themselves up by default, but the option is there for people to tinker
>>> and have an iron grip over their cluster.
>>> >>
>>> >>
>>> >> +1 to sensible defaults so users don't trip themselves. The option to
>>> tinker for tighter grip can be tackled later, either on a per role basis or
>>> as a generic concept later.
>>> >
>>> >
>>> > +1 - Can definitely be added later if we so desire, not needed for
>>> this SIP
>>> >
>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>> ichattopadhyaya@gmail.com> wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com> wrote:
>>> >>>
>>> >>> I think the key  is to let the roles have full control of the
>>> implications of having/not having that role. No need for even a
>>> strict/loose designation. The question of do you have the role is yes/no
>>> with no logic to guess if the role is implied or not, The question of will
>>> it come up with the role is "have_explicit ? use_defaults : use_defaults.
>>> >>>
>>> >>> Once you figure out who has a role (or not) what that means is up to
>>> the role code.
>>> >>>
>>> >>> Corollary: we don't have to change the way overseer works in this
>>> SIP. We can rework it or not as we see fit separately.
>>> >>
>>> >>
>>> >> +1
>>> >>
>>> >>>
>>> >>>
>>> >>> Only thing we need to do is find a wording that makes the above
>>> clear on first read through the SIP :)
>>> >>>
>>> >>> -Gus
>>> >>>
>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>> houstonputman@gmail.com> wrote:
>>> >>>>>
>>> >>>>> This doesn't really address my concern around what happens if all
>>> of our existing OVERSEER candidates are down. When at least one of them is
>>> up, the overseer will go there, and that is good and expected. But what
>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>> the old system, would imply that the overseer election goes to some other
>>> unrelated, untagged node. I disagree with this implementation choice. This
>>> sounds like something role specific to determine, but I would like to see
>>> us be more strict about it. I don't want cores leaking out of my data
>>> roles, I don't want query processing to leak out of my "query" nodes or
>>> whatever. Overseer shouldn't be special in this regard.
>>> >>>>
>>> >>>>
>>> >>>> I'm very strongly in favor of not letting users design a system in
>>> which the cluster can be "live" without an overseer. I understand that the
>>> overseer can be taxing to the cluster, but honestly what is the point of
>>> having an untaxed cluster that doesn't have an overseer? I can see
>>> arguments for the other roles to be stricter about this, but there are also
>>> a lot of users who wouldn't want those to be strict either (like "query"
>>> nodes).
>>> >>>>
>>> >>>> Maybe we just put in stronger guarantees that if a non-overseer
>>> role node HAS to be selected to become overseer, it will try to migrate the
>>> overseer job to a node with the overseer role whenever one becomes live.
>>> >>>>
>>> >>>> So maybe we don't have special rules per role, but instead roles
>>> can either be defined as "Strict" or "Loose" (better names likely exist),
>>> and the roles come with a default (Overseer -> Loose, Data -> Strict, Query
>>> -> Loose, etc.). And it is up to each role to define how to behave when
>>> running in LOOSE mode and a non-role node is used then a role node comes
>>> online (like the overseer example given above).
>>> >>>>
>>> >>>> With the Strict/Loose option and sensible defaults, users cannot
>>> trip themselves up by default, but the option is there for people to tinker
>>> and have an iron grip over their cluster.
>>> >>>>
>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:
>>> >>>>>
>>> >>>>> Noble wrote:
>>> >>>>> > We are not modifying the way the "overseer role" works today. We
>>> are just changing the definition and standardizing the configuration &
>>> discoverability
>>> >>>>> Ishan wrote:
>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER role
>>> (which currently stands for preferred overseer). We can take a stab at
>>> refactoring it later.
>>> >>>>>
>>> >>>>> Grouping these two comments together, since I think they are
>>> saying the same thing. I think this is part of my confusion. We have an old
>>> system that doesn't work the way we want the new system to work. There may
>>> be people already using the old system. What path do we offer for folks
>>> using the old system to migrate to the new system? What happens if somebody
>>> accidentally tries to use both systems at the same time?
>>> >>>>>
>>> >>>>> Ishan wrote:
>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER role]
>>> are live, Solr guarantees that one of those nodes becomes the overseer.", I
>>> meant to somewhat capture the current behaviour as the OVERSEER role
>>> performs today. Do you see any inconsistency with this statement vs. what
>>> it does today?
>>> >>>>>
>>> >>>>> This doesn't really address my concern around what happens if all
>>> of our existing OVERSEER candidates are down. When at least one of them is
>>> up, the overseer will go there, and that is good and expected. But what
>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>> the old system, would imply that the overseer election goes to some other
>>> unrelated, untagged node. I disagree with this implementation choice. This
>>> sounds like something role specific to determine, but I would like to see
>>> us be more strict about it. I don't want cores leaking out of my data
>>> roles, I don't want query processing to leak out of my "query" nodes or
>>> whatever. Overseer shouldn't be special in this regard.
>>> >>>>>
>>> >>>>> Noble wrote:
>>> >>>>> > If we do that how do we know if xyz is a role or a node in the
>>> following request?
>>> >>>>>
>>> >>>>> You're absolutely correct, thanks for pointing this out. Let's
>>> leave it as is.
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>> ichattopadhyaya@gmail.com> wrote:
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com>
>>> wrote:
>>> >>>>>>>
>>> >>>>>>> Replying to the top post in this thread because there has been a
>>> lot of discussion and I don't want to look like I'm continuing any of those
>>> particular threads.
>>> >>>>>>>
>>> >>>>>>> I finally had time to sit down and think about this with the
>>> attention it deserves and am generally happy with how the conversation has
>>> shaped the current proposal.
>>> >>>>>>>
>>> >>>>>>> GOOD: I think using system properties to define node roles is
>>> fine and I like that data is the default role when not defined. I think it
>>> is important to hold on to the guarantee that an active overseer will land
>>> on an overseer node role.
>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for folks
>>> using the current OVERSEER role. I am not sure that something can be done
>>> automatically since they need to now specify new properties at startup.
>>> Maybe we need to include loud warnings or support both approaches for a
>>> time?
>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer nodes
>>> fail, then it is implied the overseer will go to one of the data nodes. The
>>> specific wording in the SIP - "When one or more such nodes are live, Solr
>>> guarantees that one of those nodes become the overseer." implies to me that
>>> failover could go from overseer1 to overseer2 to overseerN to random node.
>>> I feel like we need to have some recording that there were dedicated
>>> overseer nodes and stop the cascading failure instead of churning through
>>> our data nodes.
>>> >>>>>>>
>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed scope of
>>> "coordinator" roles from a split query/indexing standpoint. I understand
>>> that these are used as examples, but would like stronger language that new
>>> roles should also go through their own SIP discussions.
>>> >>>>>>>
>>> >>>>>>> CLARIFICATION: I do not like that we are storing node liveness
>>> in two different places now. We have the live nodes and we have the node
>>> roles stored in two different places in zookeeper and it feels like this
>>> would lead to race conditions or split brain or other hard to diagnose bugs
>>> when those two lists don't agree with each other. This also feels like it
>>> contradicts the "single source of truth" idea later stated in the proposal.
>>> I see Gus's arguments for decoupling these and am not strongly opposed, I
>>> just get a lurking feeling about it. Even if we don't do this, I would like
>>> this called out explicitly in the alternative approaches section as
>>> something that we considered and rejected, with details why,
>>> >>>>>>>
>>> >>>>>>> GOOD: The API looks pretty clear. I would like an additional
>>> call out here that all operations are GET because nodes cannot be changed
>>> at runtime.
>>> >>>>>>> CLARIFICATION: How does this interact with the previous OVERSEER
>>> preference role?
>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of available
>>> roles for a cluster. I _think_ this could be based on the version that the
>>> cluster is running? Would be useful to be able to interrogate a cluster in
>>> the future... we're seeing OOM issues on queries, can we add some query
>>> nodes? When were they introduced? I don't know what path this API should
>>> exist at.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the SIP
>>> document. Not sure if there's a better path that we could go for.
>>> >>>>>>
>>> >>>>>>>
>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which parts
>>> are string literals and which parts are meant to be substituted by the
>>> operator? GET /api/cluster/roles/data would become GET
>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1
>>> should be GET /api/cluster/roles/${nodename} dropping the intermediate
>>> "nodes"
>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that
>>> intermediate "nodes" node.
>>> >>>>>>>
>>> >>>>>>> CLARIFICATION: Should listing roles require some permissions?
>>> Maybe this requirement is too fundamental to the operation of a cluster and
>>> everybody would have to be able to do it.
>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to
>>> treat roles? Implementation detail that the servers will figure out? Or
>>> strict guidance where the client needs to check where specific roles are
>>> before sending any further communication to the server?
>>> >>>>>>> CLARIFICATION: What happens when a node gets a request that it
>>> can't fulfil? An overseer node gets a query or an update. A data node gets
>>> a collection creation request. Do they forward it on to an appropriate
>>> node, or do they reject it? Should this be configurable? If not, then it
>>> seems like lazy or poorly configured clients will defeat this isolation
>>> system quite easily.
>>> >>>>>>>
>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave when roles
>>> are added mean? I thought we established that they are not dynamic.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> Thanks,
>>> >>>>>>> Mike
>>> >>>>>>>
>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>> ichattopadhyaya@gmail.com> wrote:
>>> >>>>>>>>
>>> >>>>>>>> Hi,
>>> >>>>>>>>
>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>> >>>>>>>>
>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>> >>>>>>>>
>>> >>>>>>>> We also wish to add first class support for Query nodes that
>>> are used to process user queries by forwarding to data nodes,
>>> merging/aggregating them and presenting to users. This concept exists as
>>> first class citizens in most other search engines. This is a chance for
>>> Solr to catch up.
>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>> >>>>>>>>
>>> >>>>>>>> Regards,
>>> >>>>>>>> Ishan / Noble / Hitesh
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> http://www.needhamsoftware.com (work)
>>> >>> http://www.the111shift.com (play)

Re: First class support for node roles

Posted by Mike Drob <md...@mdrob.com>.

Negative roles add a lot of complexity; I would really like to stay away
from them. That’s why I want strict roles up front. It may be OK to push
this decision out, but it also seems like the sort of thing we should
consider at the start.

On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <no...@gmail.com> wrote:

> Yes. Negative roles is not a bad idea. If I start a node for
> machine learning purposes, I wouldn't want that node to ever participate in
> overseer election

Re: First class support for node roles

Posted by Noble Paul <no...@gmail.com>.

Yes. Negative roles are not a bad idea. If I start a node for
machine learning purposes, I wouldn't want that node to ever participate in
the overseer election.

On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <il...@gmail.com> wrote:

> If we have non strict roles (like overseer), then it does make sense
> to have negative roles.
> That way I can define which are the two nodes that I'd prefer the
> overseer to run on, and a few other nodes on which it should
> definitely never run for various reasons. And in case these
> "!overseer" are the only nodes left in the cluster, let the cluster
> fail the same way it would if there were no data nodes available.
>
> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <ho...@gmail.com>
> wrote:
> >>>
> >>> With the Strict/Loose option and sensible defaults, users cannot trip
> themselves up by default, but the option is there for people to tinker and
> have an iron grip over their cluster.
> >>
> >>
> >> +1 to sensible defaults so users don't trip themselves. The option to
> tinker for tighter grip can be tackled later, either on a per role basis or
> as a generic concept later.
> >
> >
> > +1 - Can definitely be added later if we so desire, not needed for this
> SIP
> >
> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
> >>
> >>
> >>
> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com> wrote:
> >>>
> >>> I think the key  is to let the roles have full control of the
> implications of having/not having that role. No need for even a
> strict/loose designation. The question of do you have the role is yes/no
> with no logic to guess if the role is implied or not, The question of will
> it come up with the role is "have_explicit ? use_defaults : use_defaults.
> >>>
> >>> Once you figure out who has a role (or not) what that means is up to
> the role code.
> >>>
> >>> Corollary: we don't have to change the way overseer works in this SIP.
> We can rework it or not as we see fit separately.
> >>
> >>
> >> +1
> >>
> >>>
> >>>
> >>> Only thing we need to do is find a wording that makes the above clear
> on first read through the SIP :)
> >>>
> >>> -Gus
> >>>
> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <ho...@gmail.com>
> wrote:
> >>>>>
> >>>>> This doesn't really address my concern around what happens if all of
> our existing OVERSEER candidates are down. When at least one of them is up,
> the overseer will go there, and that is good and expected. But what happens
> if all of the overseer eligible nodes are down. Your comment, and the old
> system, would imply that the overseer election goes to some other
> unrelated, untagged node. I disagree with this implementation choice. This
> sounds like something role specific to determine, but I would like to see
> us be more strict about it. I don't want cores leaking out of my data
> roles, I don't want query processing to leak out of my "query" nodes or
> whatever. Overseer shouldn't be special in this regard.
> >>>>
> >>>>
> >>>> I'm very strongly in favor of not letting users design a system in
> which the cluster can be "live" without an overseer. I understand that the
> overseer can be taxing to the cluster, but honestly what is the point of
> having an untaxed cluster that doesn't have an overseer? I can see
> arguments for the other roles to be stricter about this, but there are also
> a lot of users who wouldn't want those to be strict either (like "query"
> nodes).
> >>>>
> >>>> Maybe we just put in stronger guarantees that if a non-overseer role
> node HAS to be selected to become overseer, it will try to migrate the
> overseer job to a node with the overseer role whenever one becomes live.
> >>>>
> >>>> So maybe we don't have special rules per role, but instead roles can
> either be defined as "Strict" or "Loose" (better names likely exist), and
> the roles come with a default (Overseer -> Loose, Data -> Strict, Query ->
> Loose, etc.). And it is up to each role to define how to behave when
> running in LOOSE mode and a non-role node is used then a role node comes
> online (like the overseer example given above).
> >>>>
> >>>> With the Strict/Loose option and sensible defaults, users cannot trip
> themselves up by default, but the option is there for people to tinker and
> have an iron grip over their cluster.
> >>>>
> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:
> >>>>>
> >>>>> Noble wrote:
> >>>>> > We are not modifying the way the "overseer role" works today. We
> are just changing the definition and standardizing the configuration &
> discoverability
> >>>>> Ishan wrote:
> >>>>> > As of this SIP, we're not planning to modify the OVERSEER role
> (which currently stands for preferred overseer). We can take a stab at
> refactoring it later.
> >>>>>
> >>>>> Grouping these two comments together, since I think they are saying
> the same thing. I think this is part of my confusion. We have an old system
> that doesn't work the way we want the new system to work. There may be
> people already using the old system. What path do we offer for folks using
> the old system to migrate to the new system? What happens if somebody
> accidentally tries to use both systems at the same time?
> >>>>>
> >>>>> Ishan wrote:
> >>>>> > When I wrote "When one or more such nodes [with OVERSEER role] are
> live, Solr guarantees that one of those nodes becomes the overseer.", I
> meant to somewhat capture the current behaviour as the OVERSEER role
> performs today. Do you see any inconsistency with this statement vs. what
> it does today?
> >>>>>
> >>>>> This doesn't really address my concern around what happens if all of
> our existing OVERSEER candidates are down. When at least one of them is up,
> the overseer will go there, and that is good and expected. But what happens
> if all of the overseer eligible nodes are down. Your comment, and the old
> system, would imply that the overseer election goes to some other
> unrelated, untagged node. I disagree with this implementation choice. This
> sounds like something role specific to determine, but I would like to see
> us be more strict about it. I don't want cores leaking out of my data
> roles, I don't want query processing to leak out of my "query" nodes or
> whatever. Overseer shouldn't be special in this regard.
> >>>>>
> >>>>> Noble wrote:
> >>>>> > If we do that how do we know if xyz is a role or a node in the
> following request?
> >>>>>
> >>>>> You're absolutely correct, thanks for pointing this out. Let's leave
> it as is.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:
> >>>>>>>
> >>>>>>> Replying to the top post in this thread because there has been a
> lot of discussion and I don't want to look like I'm continuing any of those
> particular threads.
> >>>>>>>
> >>>>>>> I finally had time to sit down and think about this with the
> attention it deserves and am generally happy with how the conversation has
> shaped the current proposal.
> >>>>>>>
> >>>>>>> GOOD: I think using system properties to define node roles is fine
> and I like that data is the default role when not defined. I think it is
> important to hold on to the guarantee that an active overseer will land on
> an overseer node role.
> >>>>>>> CHANGE REQUEST: I would like to see a migration path for folks
> using the current OVERSEER role. I am not sure that something can be done
> automatically since they need to now specify new properties at startup.
> Maybe we need to include loud warnings or support both approaches for a
> time?
> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer nodes
> fail, then it is implied the overseer will go to one of the data nodes. The
> specific wording in the SIP - "When one or more such nodes are live, Solr
> guarantees that one of those nodes become the overseer." implies to me that
> failover could go from overseer1 to overseer2 to overseerN to random node.
> I feel like we need to have some recording that there were dedicated
> overseer nodes and stop the cascading failure instead of churning through
> our data nodes.
> >>>>>>>
> >>>>>>> CLARIFICATION: I am slightly confused by the proposed scope of
> "coordinator" roles from a split query/indexing standpoint. I understand
> that these are used as examples, but would like stronger language that new
> roles should also go through their own SIP discussions.
> >>>>>>>
> >>>>>>> CLARIFICATION: I do not like that we are storing node liveness in
> two different places now. We have the live nodes and we have the node roles
> stored in two different places in zookeeper and it feels like this would
> lead to race conditions or split brain or other hard to diagnose bugs when
> those two lists don't agree with each other. This also feels like it
> contradicts the "single source of truth" idea later stated in the proposal.
> I see Gus's arguments for decoupling these and am not strongly opposed, I
> just get a lurking feeling about it. Even if we don't do this, I would like
> this called out explicitly in the alternative approaches section as
> something that we considered and rejected, with details why,
> >>>>>>>
> >>>>>>> GOOD: The API looks pretty clear. I would like an additional call
> out here that all operations are GET because nodes cannot be changed at
> runtime.
> >>>>>>> CLARIFICATION: How does this interact with the previous OVERSEER
> preference role?
> >>>>>>> CHANGE REQUEST: An additional API to get the list of available
> roles for a cluster. I _think_ this could be based on the version that the
> cluster is running? Would be useful to be able to interrogate a cluster in
> the future... we're seeing OOM issues on queries, can we add some query
> nodes? When were they introduced? I don't know what path this API should
> exist at.
> >>>>>>
> >>>>>>
> >>>>>> Added a GET /api/cluster/roles/supported API, updated the SIP
> document. Not sure if there's a better path that we could go for.
> >>>>>>
> >>>>>>>
> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which parts
> are string literals and which parts are meant to be substituted by the
> operator? GET /api/cluster/roles/data would become GET
> /api/cluster/roles/${rolename} in our SIP/documentation.
> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1 should
> be GET /api/cluster/roles/${nodename} dropping the intermediate "nodes"
> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that
> intermediate "nodes" node.
> >>>>>>>
> >>>>>>> CLARIFICATION: Should listing roles require some permissions?
> Maybe this requirement is too fundamental to the operation of a cluster and
> everybody would have to be able to do it.
> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat
> roles? Implementation detail that the servers will figure out? Or strict
> guidance where the client needs to check where specific roles are before
> sending any further communication to the server?
> >>>>>>> CLARIFICATION: What happens when a node gets a request that it
> can't fulfil? An overseer node gets a query or an update. A data node gets
> a collection creation request. Do they forward it on to an appropriate
> node, or do they reject it? Should this be configurable? If not, then it
> seems like lazy or poorly configured clients will defeat this isolation
> system quite easily.
> >>>>>>>
> >>>>>>> GOOD: Testing the API is very important, yes.
> >>>>>>> CLARIFICATION: What does testing for how nodes behave when roles
> are added mean? I thought we established that they are not dynamic.
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Mike
> >>>>>>>
> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Here's an SIP for introducing the concept of node roles:
> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
> >>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
> >>>>>>>>
> >>>>>>>> We also wish to add first class support for Query nodes that are
> used to process user queries by forwarding to data nodes,
> merging/aggregating them and presenting to users. This concept exists as
> first class citizens in most other search engines. This is a chance for
> Solr to catch up.
> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Ishan / Noble / Hitesh
> >>>
> >>>
> >>>
> >>> --
> >>> http://www.needhamsoftware.com (work)
> >>> http://www.the111shift.com (play)

Re: First class support for node roles

Posted by Ilan Ginzburg <il...@gmail.com>.

If we have non-strict roles (like overseer), then it does make sense
to have negative roles.
That way I can define the two nodes that I'd prefer the
overseer to run on, and a few other nodes on which it should
definitely never run for various reasons. And in case these
"!overseer" nodes are the only nodes left in the cluster, let the cluster
fail the same way it would if there were no data nodes available.
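
A minimal Java sketch of the negative-role idea described above. All names here (OverseerPicker, Preference, pick) are hypothetical and are not part of Solr or the SIP; the sketch assumes every live node carries exactly one of three overseer preferences, and that the first matching node in iteration order wins.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only: none of these types exist in Solr.
final class OverseerPicker {

    // PREFERRED  = node tagged "overseer"
    // ALLOWED    = untagged node, usable as a fallback (non-strict role)
    // DISALLOWED = node tagged "!overseer", never eligible
    enum Preference { PREFERRED, ALLOWED, DISALLOWED }

    // Returns the live node that should host the overseer.
    // Returns empty when only DISALLOWED nodes remain, i.e. the cluster
    // is left without an overseer rather than using an excluded node.
    static Optional<String> pick(Map<String, Preference> liveNodes) {
        Optional<String> fallback = Optional.empty();
        for (Map.Entry<String, Preference> e : liveNodes.entrySet()) {
            if (e.getValue() == Preference.PREFERRED) {
                return Optional.of(e.getKey()); // a preferred node always wins
            }
            if (e.getValue() == Preference.ALLOWED && !fallback.isPresent()) {
                fallback = Optional.of(e.getKey()); // remember first fallback
            }
        }
        return fallback;
    }
}
```

With this shape, losing every preferred node falls back to an untagged node, while a cluster made up only of "!overseer" nodes gets no overseer at all.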

On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <ho...@gmail.com> wrote:
>>>
>>> With the Strict/Loose option and sensible defaults, users cannot trip themselves up by default, but the option is there for people to tinker and have an iron grip over their cluster.
>>
>>
>> +1 to sensible defaults so users don't trip themselves. The option to tinker for tighter grip can be tackled later, either on a per role basis or as a generic concept later.
>
>
> +1 - Can definitely be added later if we so desire, not needed for this SIP
>
> On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <ic...@gmail.com> wrote:
>>
>>
>>
>> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com> wrote:
>>>
>>> I think the key  is to let the roles have full control of the implications of having/not having that role. No need for even a strict/loose designation. The question of do you have the role is yes/no with no logic to guess if the role is implied or not, The question of will it come up with the role is "have_explicit ? use_defaults : use_defaults.
>>>
>>> Once you figure out who has a role (or not) what that means is up to the role code.
>>>
>>> Corollary: we don't have to change the way overseer works in this SIP. We can rework it or not as we see fit separately.
>>
>>
>> +1
>>
>>>
>>>
>>> Only thing we need to do is find a wording that makes the above clear on first read through the SIP :)
>>>
>>> -Gus
>>>
>>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <ho...@gmail.com> wrote:
>>>>>
>>>>> This doesn't really address my concern around what happens if all of our existing OVERSEER candidates are down. When at least one of them is up, the overseer will go there, and that is good and expected. But what happens if all of the overseer eligible nodes are down. Your comment, and the old system, would imply that the overseer election goes to some other unrelated, untagged node. I disagree with this implementation choice. This sounds like something role specific to determine, but I would like to see us be more strict about it. I don't want cores leaking out of my data roles, I don't want query processing to leak out of my "query" nodes or whatever. Overseer shouldn't be special in this regard.
>>>>
>>>>
>>>> I'm very strongly in favor of not letting users design a system in which the cluster can be "live" without an overseer. I understand that the overseer can be taxing to the cluster, but honestly what is the point of having an untaxed cluster that doesn't have an overseer? I can see arguments for the other roles to be stricter about this, but there are also a lot of users who wouldn't want those to be strict either (like "query" nodes).
>>>>
>>>> Maybe we just put in stronger guarantees that if a non-overseer role node HAS to be selected to become overseer, it will try to migrate the overseer job to a node with the overseer role whenever one becomes live.
>>>>
>>>> So maybe we don't have special rules per role, but instead roles can either be defined as "Strict" or "Loose" (better names likely exist), and the roles come with a default (Overseer -> Loose, Data -> Strict, Query -> Loose, etc.). And it is up to each role to define how to behave when running in LOOSE mode and a non-role node is used then a role node comes online (like the overseer example given above).
>>>>
>>>> With the Strict/Loose option and sensible defaults, users cannot trip themselves up by default, but the option is there for people to tinker and have an iron grip over their cluster.
>>>>
>>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:
>>>>>
>>>>> Noble wrote:
>>>>> > We are not modifying the way the "overseer role" works today. We are just changing the definition and standardizing the configuration & discoverability
>>>>> Ishan wrote:
>>>>> > As of this SIP, we're not planning to modify the OVERSEER role (which currently stands for preferred overseer). We can take a stab at refactoring it later.
>>>>>
>>>>> Grouping these two comments together, since I think they are saying the same thing. I think this is part of my confusion. We have an old system that doesn't work the way we want the new system to work. There may be people already using the old system. What path do we offer for folks using the old system to migrate to the new system? What happens if somebody accidentally tries to use both systems at the same time?
>>>>>
>>>>> Ishan wrote:
>>>>> > When I wrote "When one or more such nodes [with OVERSEER role] are live, Solr guarantees that one of those nodes becomes the overseer.", I meant to somewhat capture the current behaviour as the OVERSEER role performs today. Do you see any inconsistency with this statement vs. what it does today?
>>>>>
>>>>> This doesn't really address my concern around what happens if all of our existing OVERSEER candidates are down. When at least one of them is up, the overseer will go there, and that is good and expected. But what happens if all of the overseer eligible nodes are down. Your comment, and the old system, would imply that the overseer election goes to some other unrelated, untagged node. I disagree with this implementation choice. This sounds like something role specific to determine, but I would like to see us be more strict about it. I don't want cores leaking out of my data roles, I don't want query processing to leak out of my "query" nodes or whatever. Overseer shouldn't be special in this regard.
>>>>>
>>>>> Noble wrote:
>>>>> > If we do that how do we know if xyz is a role or a node in the following request?
>>>>>
>>>>> You're absolutely correct, thanks for pointing this out. Let's leave it as is.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <ic...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:
>>>>>>>
>>>>>>> Replying to the top post in this thread because there has been a lot of discussion and I don't want to look like I'm continuing any of those particular threads.
>>>>>>>
>>>>>>> I finally had time to sit down and think about this with the attention it deserves and am generally happy with how the conversation has shaped the current proposal.
>>>>>>>
>>>>>>> GOOD: I think using system properties to define node roles is fine and I like that data is the default role when not defined. I think it is important to hold on to the guarantee that an active overseer will land on an overseer node role.
>>>>>>> CHANGE REQUEST: I would like to see a migration path for folks using the current OVERSEER role. I am not sure that something can be done automatically since they need to now specify new properties at startup. Maybe we need to include loud warnings or support both approaches for a time?
>>>>>>> CHANGE REQUEST: I do not like that if all of the overseer nodes fail, then it is implied the overseer will go to one of the data nodes. The specific wording in the SIP - "When one or more such nodes are live, Solr guarantees that one of those nodes become the overseer." implies to me that failover could go from overseer1 to overseer2 to overseerN to random node. I feel like we need to have some recording that there were dedicated overseer nodes and stop the cascading failure instead of churning through our data nodes.
>>>>>>>
>>>>>>> CLARIFICATION: I am slightly confused by the proposed scope of "coordinator" roles from a split query/indexing standpoint. I understand that these are used as examples, but would like stronger language that new roles should also go through their own SIP discussions.
>>>>>>>
>>>>>>> CLARIFICATION: I do not like that we are storing node liveness in two different places now. We have the live nodes and we have the node roles stored in two different places in zookeeper and it feels like this would lead to race conditions or split brain or other hard to diagnose bugs when those two lists don't agree with each other. This also feels like it contradicts the "single source of truth" idea later stated in the proposal. I see Gus's arguments for decoupling these and am not strongly opposed, I just get a lurking feeling about it. Even if we don't do this, I would like this called out explicitly in the alternative approaches section as something that we considered and rejected, with details why,
>>>>>>>
>>>>>>> GOOD: The API looks pretty clear. I would like an additional call out here that all operations are GET because nodes cannot be changed at runtime.
>>>>>>> CLARIFICATION: How does this interact with the previous OVERSEER preference role?
>>>>>>> CHANGE REQUEST: An additional API to get the list of available roles for a cluster. I _think_ this could be based on the version that the cluster is running? Would be useful to be able to interrogate a cluster in the future... we're seeing OOM issues on queries, can we add some query nodes? When were they introduced? I don't know what path this API should exist at.
>>>>>>
>>>>>>
>>>>>> Added a GET /api/cluster/roles/supported API, updated the SIP document. Not sure if there's a better path that we could go for.
>>>>>>
>>>>>>>
>>>>>>> CLARIFICATION: Can we list the APIs to clearly show which parts are string literals and which parts are meant to be substituted by the operator? GET /api/cluster/roles/data would become GET /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename} dropping the intermediate "nodes"
>>>>>>> CHANGE REQUEST: The ZK structure also might not need that intermediate "nodes" node.
>>>>>>>
>>>>>>> CLARIFICATION: Should listing roles require some permissions? Maybe this requirement is too fundamental to the operation of a cluster and everybody would have to be able to do it.
>>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat roles? Implementation detail that the servers will figure out? Or strict guidance where the client needs to check where specific roles are before sending any further communication to the server?
>>>>>>> CLARIFICATION: What happens when a node gets a request that it can't fulfil? An overseer node gets a query or an update. A data node gets a collection creation request. Do they forward it on to an appropriate node, or do they reject it? Should this be configurable? If not, then it seems like lazy or poorly configured clients will defeat this isolation system quite easily.
>>>>>>>
>>>>>>> GOOD: Testing the API is very important, yes.
>>>>>>> CLARIFICATION: What does testing for how nodes behave when roles are added mean? I thought we established that they are not dynamic.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mike
>>>>>>>
>>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <ic...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>>>
>>>>>>>> We also wish to add first class support for Query nodes that are used to process user queries by forwarding to data nodes, merging/aggregating them and presenting to users. This concept exists as first class citizens in most other search engines. This is a chance for Solr to catch up.
>>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ishan / Noble / Hitesh
>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)


Re: First class support for node roles

Posted by Houston Putman <ho...@gmail.com>.

>
> With the Strict/Loose option and sensible defaults, users cannot trip
>> themselves up by default, but the option is there for people to tinker and
>> have an iron grip over their cluster.
>>
>
> +1 to sensible defaults so users don't trip themselves. The option to
> tinker for tighter grip can be tackled later, either on a per role basis or
> as a generic concept later.
>

+1 - Can definitely be added later if we so desire, not needed for this SIP

On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
ichattopadhyaya@gmail.com> wrote:

>
>
> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com> wrote:
>
>> I think the key  is to let the roles have full control of the
>> implications of having/not having that role. No need for even a
>> strict/loose designation. The question of do you have the role is yes/no
>> with no logic to guess if the role is implied or not, The question of will
>> it come up with the role is "have_explicit ? use_defaults : use_defaults.
>>
>> Once you figure out who has a role (or not) what that means is up to the
>> role code.
>>
>> Corollary: we don't have to change the way overseer works in this SIP. We
>> can rework it or not as we see fit separately.
>>
>
> +1
>
>
>>
>> Only thing we need to do is find a wording that makes the above clear on
>> first read through the SIP :)
>>
>> -Gus
>>
>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <ho...@gmail.com>
>> wrote:
>>
>>> This doesn't really address my concern around what happens if all of our
>>>> existing OVERSEER candidates are down. When at least one of them is up, the
>>>> overseer will go there, and that is good and expected. But what happens if
>>>> all of the overseer eligible nodes are down. Your comment, and the old
>>>> system, would imply that the overseer election goes to some other
>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>> sounds like something role specific to determine, but I would like to see
>>>> us be more strict about it. I don't want cores leaking out of my data
>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>> whatever. Overseer shouldn't be special in this regard.
>>>>
>>>
>>> I'm very strongly in favor of not letting users design a system in which
>>> the cluster can be "live" without an overseer. I understand that the
>>> overseer can be taxing to the cluster, but honestly what is the point of
>>> having an untaxed cluster that doesn't have an overseer? I can see
>>> arguments for the other roles to be stricter about this, but there are also
>>> a lot of users who wouldn't want those to be strict either (like "query"
>>> nodes).
>>>
>>> Maybe we just put in stronger guarantees that if a non-overseer role
>>> node HAS to be selected to become overseer, it will try to migrate the
>>> overseer job to a node with the overseer role whenever one becomes live.
>>>
>>> So maybe we don't have special rules per role, but instead roles can
>>> either be defined as "Strict" or "Loose" (better names likely exist), and
>>> the roles come with a default (Overseer -> Loose, Data -> Strict, Query ->
>>> Loose, etc.). And it is up to each role to define how to behave when
>>> running in LOOSE mode and a non-role node is used then a role node comes
>>> online (like the overseer example given above).
>>>
>>> With the Strict/Loose option and sensible defaults, users cannot trip
>>> themselves up by default, but the option is there for people to tinker and
>>> have an iron grip over their cluster.
>>>
>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:
>>>
>>>> Noble wrote:
>>>> > We are not modifying the way the "overseer role" works today. We are
>>>> just changing the definition and standardizing the configuration &
>>>> discoverability
>>>> Ishan wrote:
>>>> > As of this SIP, we're not planning to modify the OVERSEER role (which
>>>> currently stands for preferred overseer). We can take a stab at refactoring
>>>> it later.
>>>>
>>>> Grouping these two comments together, since I think they are saying
>>>> the same thing. I think this is part of my confusion. We have an old system
>>>> that doesn't work the way we want the new system to work. There may be
>>>> people already using the old system. What path do we offer for folks using
>>>> the old system to migrate to the new system? What happens if somebody
>>>> accidentally tries to use both systems at the same time?
>>>>
>>>> Ishan wrote:
>>>> > When I wrote "When one or more such nodes [with OVERSEER role] are
>>>> live, Solr guarantees that one of those nodes becomes the overseer.",
>>>> I meant to somewhat capture the current behaviour as the OVERSEER role performs
>>>> today. Do you see any inconsistency with this statement vs. what it does
>>>> today?
>>>>
>>>> This doesn't really address my concern around what happens if all of
>>>> our existing OVERSEER candidates are down. When at least one of them is up,
>>>> the overseer will go there, and that is good and expected. But what happens
>>>> if all of the overseer eligible nodes are down. Your comment, and the old
>>>> system, would imply that the overseer election goes to some other
>>>> unrelated, untagged node. I disagree with this implementation choice. This
>>>> sounds like something role specific to determine, but I would like to see
>>>> us be more strict about it. I don't want cores leaking out of my data
>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>> whatever. Overseer shouldn't be special in this regard.
>>>>
>>>> Noble wrote:
>>>> > If we do that how do we know if xyz is a role or a node in the
>>>> following request?
>>>>
>>>> You're absolutely correct, thanks for pointing this out. Let's leave it
>>>> as is.
>>>>
>>>>
>>>>
>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>>> ichattopadhyaya@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:
>>>>>
>>>>>> Replying to the top post in this thread because there has been a lot
>>>>>> of discussion and I don't want to look like I'm continuing any of those
>>>>>> particular threads.
>>>>>>
>>>>>> I finally had time to sit down and think about this with the
>>>>>> attention it deserves and am generally happy with how the conversation has
>>>>>> shaped the current proposal.
>>>>>>
>>>>>> GOOD: I think using system properties to define node roles is fine
>>>>>> and I like that data is the default role when not defined. I think it is
>>>>>> important to hold on to the guarantee that an active overseer will land on
>>>>>> an overseer node role.
>>>>>> CHANGE REQUEST: I would like to see a migration path for folks using
>>>>>> the current OVERSEER role. I am not sure that something can be done
>>>>>> automatically since they need to now specify new properties at startup.
>>>>>> Maybe we need to include loud warnings or support both approaches for a
>>>>>> time?
>>>>>> CHANGE REQUEST: I do not like that if all of the overseer nodes fail,
>>>>>> then it is implied the overseer will go to one of the data nodes. The
>>>>>> specific wording in the SIP - "When one or more such nodes are live, Solr
>>>>>> guarantees that one of those nodes become the overseer." implies to me that
>>>>>> failover could go from overseer1 to overseer2 to overseerN to random node.
>>>>>> I feel like we need to have some recording that there were dedicated
>>>>>> overseer nodes and stop the cascading failure instead of churning through
>>>>>> our data nodes.
>>>>>>
>>>>>> CLARIFICATION: I am slightly confused by the proposed scope of
>>>>>> "coordinator" roles from a split query/indexing standpoint. I understand
>>>>>> that these are used as examples, but would like stronger language that new
>>>>>> roles should also go through their own SIP discussions.
>>>>>>
>>>>>> CLARIFICATION: I do not like that we are storing node liveness in two
>>>>>> different places now. We have the live nodes and we have the node roles
>>>>>> stored in two different places in zookeeper and it feels like this would
>>>>>> lead to race conditions or split brain or other hard to diagnose bugs when
>>>>>> those two lists don't agree with each other. This also feels like it
>>>>>> contradicts the "single source of truth" idea later stated in the proposal.
>>>>>> I see Gus's arguments for decoupling these and am not strongly opposed, I
>>>>>> just get a lurking feeling about it. Even if we don't do this, I would like
>>>>>> this called out explicitly in the alternative approaches section as
>>>>>> something that we considered and rejected, with details why,
>>>>>>
>>>>>> GOOD: The API looks pretty clear. I would like an additional call out
>>>>>> here that all operations are GET because nodes cannot be changed at runtime.
>>>>>> CLARIFICATION: How does this interact with the previous OVERSEER
>>>>>> preference role?
>>>>>> CHANGE REQUEST: An additional API to get the list of available roles
>>>>>> for a cluster. I _think_ this could be based on the version that the
>>>>>> cluster is running? Would be useful to be able to interrogate a cluster in
>>>>>> the future... we're seeing OOM issues on queries, can we add some query
>>>>>> nodes? When were they introduced? I don't know what path this API should
>>>>>> exist at.
>>>>>>
>>>>>
>>>>> Added a *GET /api/cluster/roles/supported* API, updated the SIP
>>>>> document. Not sure if there's a better path that we could go for.
>>>>>
>>>>>
>>>>>> CLARIFICATION: Can we list the APIs to clearly show which parts are
>>>>>> string literals and which parts are meant to be substituted by the
>>>>>> operator? *GET **/api/cluster/roles/data *would become *GET **/api/cluster/roles/${rolename}
>>>>>> *in our SIP/documentation.
>>>>>> CHANGE REQUEST: I think *GET /api/cluster/roles/nodes/node1* should
>>>>>> be *GET /api/cluster/roles/${nodename}* dropping the intermediate
>>>>>> "nodes"
>>>>>> CHANGE REQUEST: The ZK structure also might not need that
>>>>>> intermediate "nodes" node.
>>>>>>
>>>>>> CLARIFICATION: Should listing roles require some permissions? Maybe
>>>>>> this requirement is too fundamental to the operation of a cluster and
>>>>>> everybody would have to be able to do it.
>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat
>>>>>> roles? Implementation detail that the servers will figure out? Or strict
>>>>>> guidance where the client needs to check where specific roles are before
>>>>>> sending any further communication to the server?
>>>>>> CLARIFICATION: What happens when a node gets a request that it can't
>>>>>> fulfil? An overseer node gets a query or an update. A data node gets a
>>>>>> collection creation request. Do they forward it on to an appropriate node,
>>>>>> or do they reject it? Should this be configurable? If not, then it seems
>>>>>> like lazy or poorly configured clients will defeat this isolation system
>>>>>> quite easily.
>>>>>>
>>>>>> GOOD: Testing the API is very important, yes.
>>>>>> CLARIFICATION: What does testing for how nodes behave when roles are
>>>>>> added mean? I thought we established that they are not dynamic.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Mike
>>>>>>
>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>>
>>>>>>> We also wish to add first class support for Query nodes that are
>>>>>>> used to process user queries by forwarding to data nodes,
>>>>>>> merging/aggregating them and presenting to users. This concept exists as
>>>>>>> first class citizens in most other search engines. This is a chance for
>>>>>>> Solr to catch up.
>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ishan / Noble / Hitesh
>>>>>>>
>>>>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

Re: First class support for node roles

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gu...@gmail.com> wrote:

> I think the key  is to let the roles have full control of the implications
> of having/not having that role. No need for even a strict/loose
> designation. The question of do you have the role is yes/no with no logic
> to guess if the role is implied or not, The question of will it come up
> with the role is "have_explicit ? use_defaults : use_defaults.
>
> Once you figure out who has a role (or not) what that means is up to the
> role code.
>
> Corollary: we don't have to change the way overseer works in this SIP. We
> can rework it or not as we see fit separately.
>

+1


>
> Only thing we need to do is find a wording that makes the above clear on
> first read through the SIP :)
>
> -Gus
>
> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <ho...@gmail.com>
> wrote:
>
>> This doesn't really address my concern around what happens if all of our
>>> existing OVERSEER candidates are down. When at least one of them is up, the
>>> overseer will go there, and that is good and expected. But what happens if
>>> all of the overseer eligible nodes are down. Your comment, and the old
>>> system, would imply that the overseer election goes to some other
>>> unrelated, untagged node. I disagree with this implementation choice. This
>>> sounds like something role specific to determine, but I would like to see
>>> us be more strict about it. I don't want cores leaking out of my data
>>> roles, I don't want query processing to leak out of my "query" nodes or
>>> whatever. Overseer shouldn't be special in this regard.
>>>
>>
>> I'm very strongly in favor of not letting users design a system in which
>> the cluster can be "live" without an overseer. I understand that the
>> overseer can be taxing to the cluster, but honestly what is the point of
>> having an untaxed cluster that doesn't have an overseer? I can see
>> arguments for the other roles to be stricter about this, but there are also
>> a lot of users who wouldn't want those to be strict either (like "query"
>> nodes).
>>
>> Maybe we just put in stronger guarantees that if a non-overseer role node
>> HAS to be selected to become overseer, it will try to migrate the overseer
>> job to a node with the overseer role whenever one becomes live.
>>
>> So maybe we don't have special rules per role, but instead roles can
>> either be defined as "Strict" or "Loose" (better names likely exist), and
>> the roles come with a default (Overseer -> Loose, Data -> Strict, Query ->
>> Loose, etc.). And it is up to each role to define how to behave when
>> running in LOOSE mode and a non-role node is used then a role node comes
>> online (like the overseer example given above).
>>
>> With the Strict/Loose option and sensible defaults, users cannot trip
>> themselves up by default, but the option is there for people to tinker and
>> have an iron grip over their cluster.
>>
>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:
>>
>>> Noble wrote:
>>> > We are not modifying the way the "overseer role" works today. We are
>>> just changing the definition and standardizing the configuration &
>>> discoverability
>>> Ishan wrote:
>>> > As of this SIP, we're not planning to modify the OVERSEER role (which
>>> currently stands for preferred overseer). We can take a stab at refactoring
>>> it later.
>>>
>>> Grouping these two comments together, since I think they are saying
>>> the same thing. I think this is part of my confusion. We have an old system
>>> that doesn't work the way we want the new system to work. There may be
>>> people already using the old system. What path do we offer for folks using
>>> the old system to migrate to the new system? What happens if somebody
>>> accidentally tries to use both systems at the same time?
>>>
>>> Ishan wrote:
>>> > When I wrote "When one or more such nodes [with OVERSEER role] are
>>> live, Solr guarantees that one of those nodes becomes the overseer.", I
>>> meant to somewhat capture the current behaviour as the OVERSEER role performs
>>> today. Do you see any inconsistency with this statement vs. what it does
>>> today?
>>>
>>> This doesn't really address my concern around what happens if all of our
>>> existing OVERSEER candidates are down. When at least one of them is up, the
>>> overseer will go there, and that is good and expected. But what happens if
>>> all of the overseer eligible nodes are down. Your comment, and the old
>>> system, would imply that the overseer election goes to some other
>>> unrelated, untagged node. I disagree with this implementation choice. This
>>> sounds like something role specific to determine, but I would like to see
>>> us be more strict about it. I don't want cores leaking out of my data
>>> roles, I don't want query processing to leak out of my "query" nodes or
>>> whatever. Overseer shouldn't be special in this regard.
>>>
>>> Noble wrote:
>>> > If we do that how do we know if xyz is a role or a node in the
>>> following request?
>>>
>>> You're absolutely correct, thanks for pointing this out. Let's leave it
>>> as is.
>>>
>>>
>>>
>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>> ichattopadhyaya@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:
>>>>
>>>>> Replying to the top post in this thread because there has been a lot
>>>>> of discussion and I don't want to look like I'm continuing any of those
>>>>> particular threads.
>>>>>
>>>>> I finally had time to sit down and think about this with the attention
>>>>> it deserves and am generally happy with how the conversation has shaped the
>>>>> current proposal.
>>>>>
>>>>> GOOD: I think using system properties to define node roles is fine and
>>>>> I like that data is the default role when not defined. I think it is
>>>>> important to hold on to the guarantee that an active overseer will land on
>>>>> an overseer node role.
>>>>> CHANGE REQUEST: I would like to see a migration path for folks using
>>>>> the current OVERSEER role. I am not sure that something can be done
>>>>> automatically since they need to now specify new properties at startup.
>>>>> Maybe we need to include loud warnings or support both approaches for a
>>>>> time?
>>>>> CHANGE REQUEST: I do not like that if all of the overseer nodes fail,
>>>>> then it is implied the overseer will go to one of the data nodes. The
>>>>> specific wording in the SIP - "When one or more such nodes are live, Solr
>>>>> guarantees that one of those nodes become the overseer." implies to me that
>>>>> failover could go from overseer1 to overseer2 to overseerN to random node.
>>>>> I feel like we need to have some recording that there were dedicated
>>>>> overseer nodes and stop the cascading failure instead of churning through
>>>>> our data nodes.
>>>>>
>>>>> CLARIFICATION: I am slightly confused by the proposed scope of
>>>>> "coordinator" roles from a split query/indexing standpoint. I understand
>>>>> that these are used as examples, but would like stronger language that new
>>>>> roles should also go through their own SIP discussions.
>>>>>
>>>>> CLARIFICATION: I do not like that we are storing node liveness in two
>>>>> different places now. We have the live nodes and we have the node roles
>>>>> stored in two different places in zookeeper and it feels like this would
>>>>> lead to race conditions or split brain or other hard to diagnose bugs when
>>>>> those two lists don't agree with each other. This also feels like it
>>>>> contradicts the "single source of truth" idea later stated in the proposal.
>>>>> I see Gus's arguments for decoupling these and am not strongly opposed, I
>>>>> just get a lurking feeling about it. Even if we don't do this, I would like
>>>>> this called out explicitly in the alternative approaches section as
>>>>> something that we considered and rejected, with details why,
>>>>>
>>>>> GOOD: The API looks pretty clear. I would like an additional call out
>>>>> here that all operations are GET because nodes cannot be changed at runtime.
>>>>> CLARIFICATION: How does this interact with the previous OVERSEER
>>>>> preference role?
>>>>> CHANGE REQUEST: An additional API to get the list of available roles
>>>>> for a cluster. I _think_ this could be based on the version that the
>>>>> cluster is running? Would be useful to be able to interrogate a cluster in
>>>>> the future... we're seeing OOM issues on queries, can we add some query
>>>>> nodes? When were they introduced? I don't know what path this API should
>>>>> exist at.
>>>>>
>>>>
>>>> Added a *GET /api/cluster/roles/supported* API, updated the SIP
>>>> document. Not sure if there's a better path that we could go for.
>>>>
>>>>
>>>>> CLARIFICATION: Can we list the APIs to clearly show which parts are
>>>>> string literals and which parts are meant to be substituted by the
>>>>> operator? *GET **/api/cluster/roles/data *would become *GET **/api/cluster/roles/${rolename}
>>>>> *in our SIP/documentation.
>>>>> CHANGE REQUEST: I think *GET /api/cluster/roles/nodes/node1* should be
>>>>>  *GET /api/cluster/roles/${nodename}* dropping the intermediate
>>>>> "nodes"
>>>>> CHANGE REQUEST: The ZK structure also might not need that intermediate
>>>>> "nodes" node.
>>>>>
>>>>> CLARIFICATION: Should listing roles require some permissions? Maybe
>>>>> this requirement is too fundamental to the operation of a cluster and
>>>>> everybody would have to be able to do it.
>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat
>>>>> roles? Implementation detail that the servers will figure out? Or strict
>>>>> guidance where the client needs to check where specific roles are before
>>>>> sending any further communication to the server?
>>>>> CLARIFICATION: What happens when a node gets a request that it can't
>>>>> fulfil? An overseer node gets a query or an update. A data node gets a
>>>>> collection creation request. Do they forward it on to an appropriate node,
>>>>> or do they reject it? Should this be configurable? If not, then it seems
>>>>> like lazy or poorly configured clients will defeat this isolation system
>>>>> quite easily.
>>>>>
>>>>> GOOD: Testing the API is very important, yes.
>>>>> CLARIFICATION: What does testing for how nodes behave when roles are
>>>>> added mean? I thought we established that they are not dynamic.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Mike
>>>>>
>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>
>>>>>> We also wish to add first class support for Query nodes that are used
>>>>>> to process user queries by forwarding to data nodes, merging/aggregating
>>>>>> them and presenting to users. This concept exists as first class citizens
>>>>>> in most other search engines. This is a chance for Solr to catch up.
>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>
>>>>>> Regards,
>>>>>> Ishan / Noble / Hitesh
>>>>>>
>>>>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>

Re: First class support for node roles

Posted by Gus Heck <gu...@gmail.com>.

Argh, bad edit... The question of whether it will come up with the role is
"have_explicit ? use_explicit : use_defaults".
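
A minimal sketch of that rule in Java, assuming the explicit roles arrive as an
optional comma-separated string (for example, read from a system property at
startup). The class and method names are hypothetical and not part of Solr; the
single default role "data" is an assumption based on the thread's statement that
data is the default role when none is defined.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not Solr's implementation. */
final class RoleResolution {
  // Assumed default: a node started without explicit roles is a data node.
  static final Set<String> DEFAULT_ROLES = Set.of("data");

  /** Implements "have_explicit ? use_explicit : use_defaults". */
  static Set<String> resolve(String explicitRoles) {
    boolean haveExplicit = explicitRoles != null && !explicitRoles.isBlank();
    if (!haveExplicit) {
      return DEFAULT_ROLES;
    }
    Set<String> roles = new TreeSet<>();
    for (String role : explicitRoles.split(",")) {
      String trimmed = role.trim();
      if (!trimmed.isEmpty()) {
        roles.add(trimmed);
      }
    }
    // A string containing only separators carries no explicit role.
    return roles.isEmpty() ? DEFAULT_ROLES : roles;
  }

  /** Plain yes/no membership check, with no guessing about implied roles. */
  static boolean hasRole(Set<String> resolvedRoles, String role) {
    return resolvedRoles.contains(role);
  }
}
```

With this shape, a node started with only "overseer" does not silently pick up
"data" as well; what that absence means is left to the role code.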

On Wed, Dec 1, 2021 at 3:01 PM Gus Heck <gu...@gmail.com> wrote:

> I think the key  is to let the roles have full control of the implications
> of having/not having that role. No need for even a strict/loose
> designation. The question of do you have the role is yes/no with no logic
> to guess if the role is implied or not, The question of will it come up
> with the role is "have_explicit ? use_defaults : use_defaults.
>
> Once you figure out who has a role (or not) what that means is up to the
> role code.
>
> Corollary: we don't have to change the way overseer works in this SIP. We
> can rework it or not as we see fit separately.
>
> Only thing we need to do is find a wording that makes the above clear on
> first read through the SIP :)
>
> -Gus
>
> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <ho...@gmail.com>
> wrote:
>
>> This doesn't really address my concern around what happens if all of our
>>> existing OVERSEER candidates are down. When at least one of them is up, the
>>> overseer will go there, and that is good and expected. But what happens if
>>> all of the overseer eligible nodes are down. Your comment, and the old
>>> system, would imply that the overseer election goes to some other
>>> unrelated, untagged node. I disagree with this implementation choice. This
>>> sounds like something role specific to determine, but I would like to see
>>> us be more strict about it. I don't want cores leaking out of my data
>>> roles, I don't want query processing to leak out of my "query" nodes or
>>> whatever. Overseer shouldn't be special in this regard.
>>>
>>
>> I'm very strongly in favor of not letting users design a system in which
>> the cluster can be "live" without an overseer. I understand that the
>> overseer can be taxing to the cluster, but honestly what is the point of
>> having an untaxed cluster that doesn't have an overseer? I can see
>> arguments for the other roles to be stricter about this, but there are also
>> a lot of users who wouldn't want those to be strict either (like "query"
>> nodes).
>>
>> Maybe we just put in stronger guarantees that if a non-overseer role node
>> HAS to be selected to become overseer, it will try to migrate the overseer
>> job to a node with the overseer role whenever one becomes live.
>>
>> So maybe we don't have special rules per role, but instead roles can
>> either be defined as "Strict" or "Loose" (better names likely exist), and
>> the roles come with a default (Overseer -> Loose, Data -> Strict, Query ->
>> Loose, etc.). And it is up to each role to define how to behave when
>> running in LOOSE mode and a non-role node is used then a role node comes
>> online (like the overseer example given above).
>>
>> With the Strict/Loose option and sensible defaults, users cannot trip
>> themselves up by default, but the option is there for people to tinker and
>> have an iron grip over their cluster.
>>
>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:
>>
>>> Noble wrote:
>>> > We are not modifying the way the "overseer role" works today. We are
>>> just changing the definition and standardizing the configuration &
>>> discoverability
>>> Ishan wrote:
>>> > As of this SIP, we're not planning to modify the OVERSEER role (which
>>> currently stands for preferred overseer). We can take a stab at refactoring
>>> it later.
>>>
>>> Grouping these two comments together, since I think they are saying
>>> the same thing. I think this is part of my confusion. We have an old system
>>> that doesn't work the way we want the new system to work. There may be
>>> people already using the old system. What path do we offer for folks using
>>> the old system to migrate to the new system? What happens if somebody
>>> accidentally tries to use both systems at the same time?
>>>
>>> Ishan wrote:
>>> > When I wrote "When one or more such nodes [with OVERSEER role] are
>>> live, Solr guarantees that one of those nodes becomes the overseer.", I
>>> meant to somewhat capture the current behaviour as the OVERSEER role performs
>>> today. Do you see any inconsistency with this statement vs. what it does
>>> today?
>>>
>>> This doesn't really address my concern around what happens if all of our
>>> existing OVERSEER candidates are down. When at least one of them is up, the
>>> overseer will go there, and that is good and expected. But what happens if
>>> all of the overseer eligible nodes are down. Your comment, and the old
>>> system, would imply that the overseer election goes to some other
>>> unrelated, untagged node. I disagree with this implementation choice. This
>>> sounds like something role specific to determine, but I would like to see
>>> us be more strict about it. I don't want cores leaking out of my data
>>> roles, I don't want query processing to leak out of my "query" nodes or
>>> whatever. Overseer shouldn't be special in this regard.
>>>
>>> Noble wrote:
>>> > If we do that how do we know if xyz is a role or a node in the
>>> following request?
>>>
>>> You're absolutely correct, thanks for pointing this out. Let's leave it
>>> as is.
>>>
>>>
>>>
>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>> ichattopadhyaya@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:
>>>>
>>>>> Replying to the top post in this thread because there has been a lot
>>>>> of discussion and I don't want to look like I'm continuing any of those
>>>>> particular threads.
>>>>>
>>>>> I finally had time to sit down and think about this with the attention
>>>>> it deserves and am generally happy with how the conversation has shaped the
>>>>> current proposal.
>>>>>
>>>>> GOOD: I think using system properties to define node roles is fine and
>>>>> I like that data is the default role when not defined. I think it is
>>>>> important to hold on to the guarantee that an active overseer will land on
>>>>> an overseer node role.
>>>>> CHANGE REQUEST: I would like to see a migration path for folks using
>>>>> the current OVERSEER role. I am not sure that something can be done
>>>>> automatically since they need to now specify new properties at startup.
>>>>> Maybe we need to include loud warnings or support both approaches for a
>>>>> time?
>>>>> CHANGE REQUEST: I do not like that if all of the overseer nodes fail,
>>>>> then it is implied the overseer will go to one of the data nodes. The
>>>>> specific wording in the SIP - "When one or more such nodes are live, Solr
>>>>> guarantees that one of those nodes become the overseer." implies to me that
>>>>> failover could go from overseer1 to overseer2 to overseerN to random node.
>>>>> I feel like we need to have some recording that there were dedicated
>>>>> overseer nodes and stop the cascading failure instead of churning through
>>>>> our data nodes.
>>>>>
>>>>> CLARIFICATION: I am slightly confused by the proposed scope of
>>>>> "coordinator" roles from a split query/indexing standpoint. I understand
>>>>> that these are used as examples, but would like stronger language that new
>>>>> roles should also go through their own SIP discussions.
>>>>>
>>>>> CLARIFICATION: I do not like that we are storing node liveness in two
>>>>> different places now. We have the live nodes and we have the node roles
>>>>> stored in two different places in zookeeper and it feels like this would
>>>>> lead to race conditions or split brain or other hard to diagnose bugs when
>>>>> those two lists don't agree with each other. This also feels like it
>>>>> contradicts the "single source of truth" idea later stated in the proposal.
>>>>> I see Gus's arguments for decoupling these and am not strongly opposed, I
>>>>> just get a lurking feeling about it. Even if we don't do this, I would like
>>>>> this called out explicitly in the alternative approaches section as
>>>>> something that we considered and rejected, with details why,
>>>>>
>>>>> GOOD: The API looks pretty clear. I would like an additional call out
>>>>> here that all operations are GET because nodes cannot be changed at runtime.
>>>>> CLARIFICATION: How does this interact with the previous OVERSEER
>>>>> preference role?
>>>>> CHANGE REQUEST: An additional API to get the list of available roles
>>>>> for a cluster. I _think_ this could be based on the version that the
>>>>> cluster is running? Would be useful to be able to interrogate a cluster in
>>>>> the future... we're seeing OOM issues on queries, can we add some query
>>>>> nodes? When were they introduced? I don't know what path this API should
>>>>> exist at.
>>>>>
>>>>
>>>> Added a *GET /api/cluster/roles/supported* API, updated the SIP
>>>> document. Not sure if there's a better path that we could go for.
>>>>
>>>>
>>>>> CLARIFICATION: Can we list the APIs to clearly show which parts are
>>>>> string literals and which parts are meant to be substituted by the
>>>>> operator? *GET **/api/cluster/roles/data *would become *GET **/api/cluster/roles/${rolename}
>>>>> *in our SIP/documentation.
>>>>> CHANGE REQUEST: I think *GET /api/cluster/roles/nodes/node1* should be
>>>>>  *GET /api/cluster/roles/${nodename}* dropping the intermediate
>>>>> "nodes"
>>>>> CHANGE REQUEST: The ZK structure also might not need that intermediate
>>>>> "nodes" node.
>>>>>
>>>>> CLARIFICATION: Should listing roles require some permissions? Maybe
>>>>> this requirement is too fundamental to the operation of a cluster and
>>>>> everybody would have to be able to do it.
>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat
>>>>> roles? Implementation detail that the servers will figure out? Or strict
>>>>> guidance where the client needs to check where specific roles are before
>>>>> sending any further communication to the server?
>>>>> CLARIFICATION: What happens when a node gets a request that it can't
>>>>> fulfil? An overseer node gets a query or an update. A data node gets a
>>>>> collection creation request. Do they forward it on to an appropriate node,
>>>>> or do they reject it? Should this be configurable? If not, then it seems
>>>>> like lazy or poorly configured clients will defeat this isolation system
>>>>> quite easily.
>>>>>
>>>>> GOOD: Testing the API is very important, yes.
>>>>> CLARIFICATION: What does testing for how nodes behave when roles are
>>>>> added mean? I thought we established that they are not dynamic.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Mike
>>>>>
>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>>> ichattopadhyaya@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>
>>>>>> We also wish to add first class support for Query nodes that are used
>>>>>> to process user queries by forwarding to data nodes, merging/aggregating
>>>>>> them and presenting to users. This concept exists as first class citizens
>>>>>> in most other search engines. This is a chance for Solr to catch up.
>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>
>>>>>> Regards,
>>>>>> Ishan / Noble / Hitesh
>>>>>>
>>>>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: First class support for node roles

Posted by Gus Heck <gu...@gmail.com>.

I think the key is to let the roles have full control of the implications
of having/not having that role. No need for even a strict/loose
designation. The question of do you have the role is yes/no with no logic
to guess if the role is implied or not. The question of will it come up
with the role is "have_explicit ? use_defaults : use_defaults".

Once you figure out who has a role (or not) what that means is up to the
role code.
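
One way to picture "up to the role code" is a per-role placement policy. The
sketch below is purely illustrative: the interface, the class, and both policy
names are hypothetical and do not exist in Solr. It assumes each role is handed
the live nodes that have the role and the live nodes that do not, and decides
for itself whether falling back is acceptable.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical per-role policy; each role supplies its own implementation. */
interface RolePlacement {
  Optional<String> choose(List<String> liveNodesWithRole, List<String> otherLiveNodes);
}

final class RolePlacements {
  /** Prefers a node with the role, but falls back to any other live node. */
  static final RolePlacement FALL_BACK_TO_ANY = (withRole, others) ->
      !withRole.isEmpty()
          ? Optional.of(withRole.get(0))
          : others.stream().findFirst();

  /** Only ever uses a node with the role; empty means nothing is placed. */
  static final RolePlacement ROLE_NODES_ONLY = (withRole, others) ->
      withRole.stream().findFirst();
}
```

The first policy resembles the "preferred overseer" behaviour described in this
thread; the second resembles the stricter behaviour Mike asked about. Which one
a given role uses would be decided by that role, not by the roles framework.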

Corollary: we don't have to change the way overseer works in this SIP. We
can rework it or not as we see fit separately.

Only thing we need to do is find a wording that makes the above clear on
first read through the SIP :)

-Gus

On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <ho...@gmail.com>
wrote:

> This doesn't really address my concern around what happens if all of our
>> existing OVERSEER candidates are down. When at least one of them is up, the
>> overseer will go there, and that is good and expected. But what happens if
>> all of the overseer eligible nodes are down. Your comment, and the old
>> system, would imply that the overseer election goes to some other
>> unrelated, untagged node. I disagree with this implementation choice. This
>> sounds like something role specific to determine, but I would like to see
>> us be more strict about it. I don't want cores leaking out of my data
>> roles, I don't want query processing to leak out of my "query" nodes or
>> whatever. Overseer shouldn't be special in this regard.
>>
>
> I'm very strongly in favor of not letting users design a system in which
> the cluster can be "live" without an overseer. I understand that the
> overseer can be taxing to the cluster, but honestly what is the point of
> having an untaxed cluster that doesn't have an overseer? I can see
> arguments for the other roles to be stricter about this, but there are also
> a lot of users who wouldn't want those to be strict either (like "query"
> nodes).
>
> Maybe we just put in stronger guarantees that if a non-overseer role node
> HAS to be selected to become overseer, it will try to migrate the overseer
> job to a node with the overseer role whenever one becomes live.
>
> So maybe we don't have special rules per role, but instead roles can
> either be defined as "Strict" or "Loose" (better names likely exist), and
> the roles come with a default (Overseer -> Loose, Data -> Strict, Query ->
> Loose, etc.). And it is up to each role to define how to behave when
> running in LOOSE mode and a non-role node is used then a role node comes
> online (like the overseer example given above).
>
> With the Strict/Loose option and sensible defaults, users cannot trip
> themselves up by default, but the option is there for people to tinker and
> have an iron grip over their cluster.
>
> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:
>
>> Noble wrote:
>> > We are not modifying the way the "overseer role" works today. We are
>> just changing the definition and standardizing the configuration &
>> discoverability
>> Ishan wrote:
>> > As of this SIP, we're not planning to modify the OVERSEER role (which
>> currently stands for preferred overseer). We can take a stab at refactoring
>> it later.
>>
>> Grouping these two comments together, since I think they are saying
>> the same thing. I think this is part of my confusion. We have an old system
>> that doesn't work the way we want the new system to work. There may be
>> people already using the old system. What path do we offer for folks using
>> the old system to migrate to the new system? What happens if somebody
>> accidentally tries to use both systems at the same time?
>>
>> Ishan wrote:
>> > When I wrote "When one or more such nodes [with OVERSEER role] are
>> live, Solr guarantees that one of those nodes becomes the overseer.", I
>> meant to somewhat capture the current behaviour as the OVERSEER role performs
>> today. Do you see any inconsistency with this statement vs. what it does
>> today?
>>
>> This doesn't really address my concern around what happens if all of our
>> existing OVERSEER candidates are down. When at least one of them is up, the
>> overseer will go there, and that is good and expected. But what happens if
>> all of the overseer eligible nodes are down. Your comment, and the old
>> system, would imply that the overseer election goes to some other
>> unrelated, untagged node. I disagree with this implementation choice. This
>> sounds like something role specific to determine, but I would like to see
>> us be more strict about it. I don't want cores leaking out of my data
>> roles, I don't want query processing to leak out of my "query" nodes or
>> whatever. Overseer shouldn't be special in this regard.
>>
>> Noble wrote:
>> > If we do that how do we know if xyz is a role or a node in the
>> following request?
>>
>> You're absolutely correct, thanks for pointing this out. Let's leave it
>> as is.
>>
>>
>>
>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>> ichattopadhyaya@gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:
>>>
>>>> Replying to the top post in this thread because there has been a lot of
>>>> discussion and I don't want to look like I'm continuing any of those
>>>> particular threads.
>>>>
>>>> I finally had time to sit down and think about this with the attention
>>>> it deserves and am generally happy with how the conversation has shaped the
>>>> current proposal.
>>>>
>>>> GOOD: I think using system properties to define node roles is fine and
>>>> I like that data is the default role when not defined. I think it is
>>>> important to hold on to the guarantee that an active overseer will land on
>>>> an overseer node role.
>>>> CHANGE REQUEST: I would like to see a migration path for folks using
>>>> the current OVERSEER role. I am not sure that something can be done
>>>> automatically since they need to now specify new properties at startup.
>>>> Maybe we need to include loud warnings or support both approaches for a
>>>> time?
>>>> CHANGE REQUEST: I do not like that if all of the overseer nodes fail,
>>>> then it is implied the overseer will go to one of the data nodes. The
>>>> specific wording in the SIP - "When one or more such nodes are live, Solr
>>>> guarantees that one of those nodes become the overseer." implies to me that
>>>> failover could go from overseer1 to overseer2 to overseerN to random node.
>>>> I feel like we need to have some recording that there were dedicated
>>>> overseer nodes and stop the cascading failure instead of churning through
>>>> our data nodes.
>>>>
>>>> CLARIFICATION: I am slightly confused by the proposed scope of
>>>> "coordinator" roles from a split query/indexing standpoint. I understand
>>>> that these are used as examples, but would like stronger language that new
>>>> roles should also go through their own SIP discussions.
>>>>
>>>> CLARIFICATION: I do not like that we are storing node liveness in two
>>>> different places now. We have the live nodes and we have the node roles
>>>> stored in two different places in zookeeper and it feels like this would
>>>> lead to race conditions or split brain or other hard to diagnose bugs when
>>>> those two lists don't agree with each other. This also feels like it
>>>> contradicts the "single source of truth" idea later stated in the proposal.
>>>> I see Gus's arguments for decoupling these and am not strongly opposed, I
>>>> just get a lurking feeling about it. Even if we don't do this, I would like
>>>> this called out explicitly in the alternative approaches section as
>>>> something that we considered and rejected, with details why.
>>>>
>>>> GOOD: The API looks pretty clear. I would like an additional call out
>>>> here that all operations are GET because nodes cannot be changed at runtime.
>>>> CLARIFICATION: How does this interact with the previous OVERSEER
>>>> preference role?
>>>> CHANGE REQUEST: An additional API to get the list of available roles
>>>> for a cluster. I _think_ this could be based on the version that the
>>>> cluster is running? Would be useful to be able to interrogate a cluster in
>>>> the future... we're seeing OOM issues on queries, can we add some query
>>>> nodes? When were they introduced? I don't know what path this API should
>>>> exist at.
>>>>
>>>
>>> Added a *GET /api/cluster/roles/supported* API, updated the SIP
>>> document. Not sure if there's a better path that we could go for.
>>>
>>>
>>>> CLARIFICATION: Can we list the APIs to clearly show which parts are
>>>> string literals and which parts are meant to be substituted by the
>>>> operator? *GET /api/cluster/roles/data* would become
>>>> *GET /api/cluster/roles/${rolename}* in our SIP/documentation.
>>>> CHANGE REQUEST: I think *GET /api/cluster/roles/nodes/node1* should be *GET
>>>> /api/cluster/roles/${nodename}* dropping the intermediate "nodes"
>>>> CHANGE REQUEST: The ZK structure also might not need that intermediate
>>>> "nodes" node.
>>>>
>>>> CLARIFICATION: Should listing roles require some permissions? Maybe
>>>> this requirement is too fundamental to the operation of a cluster and
>>>> everybody would have to be able to do it.
>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat
>>>> roles? Implementation detail that the servers will figure out? Or strict
>>>> guidance where the client needs to check where specific roles are before
>>>> sending any further communication to the server?
>>>> CLARIFICATION: What happens when a node gets a request that it can't
>>>> fulfil? An overseer node gets a query or an update. A data node gets a
>>>> collection creation request. Do they forward it on to an appropriate node,
>>>> or do they reject it? Should this be configurable? If not, then it seems
>>>> like lazy or poorly configured clients will defeat this isolation system
>>>> quite easily.
>>>>
>>>> GOOD: Testing the API is very important, yes.
>>>> CLARIFICATION: What does testing for how nodes behave when roles are
>>>> added mean? I thought we established that they are not dynamic.
>>>>
>>>>
>>>> Thanks,
>>>> Mike
>>>>
>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>> ichattopadhyaya@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Here's an SIP for introducing the concept of node roles:
>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>
>>>>> We also wish to add first class support for Query nodes that are used
>>>>> to process user queries by forwarding to data nodes, merging/aggregating
>>>>> them and presenting to users. This concept exists as first class citizens
>>>>> in most other search engines. This is a chance for Solr to catch up.
>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>
>>>>> Regards,
>>>>> Ishan / Noble / Hitesh
>>>>>
>>>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: First class support for node roles

Posted by Houston Putman <ho...@gmail.com>.

>
> This doesn't really address my concern around what happens if all of our
> existing OVERSEER candidates are down. When at least one of them is up, the
> overseer will go there, and that is good and expected. But what happens if
> all of the overseer eligible nodes are down. Your comment, and the old
> system, would imply that the overseer election goes to some other
> unrelated, untagged node. I disagree with this implementation choice. This
> sounds like something role specific to determine, but I would like to see
> us be more strict about it. I don't want cores leaking out of my data
> roles, I don't want query processing to leak out of my "query" nodes or
> whatever. Overseer shouldn't be special in this regard.
>

I'm very strongly in favor of not letting users design a system in which
the cluster can be "live" without an overseer. I understand that the
overseer can be taxing to the cluster, but honestly what is the point of
having an untaxed cluster that doesn't have an overseer? I can see
arguments for the other roles to be stricter about this, but there are also
a lot of users who wouldn't want those to be strict either (like "query"
nodes).

Maybe we just put in stronger guarantees that if a non-overseer role node
HAS to be selected to become overseer, it will try to migrate the overseer
job to a node with the overseer role whenever one becomes live.
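
A minimal sketch of that fallback-and-migrate rule, in Python rather than
Solr's own code. Everything here is an assumption made for illustration: the
pick_overseer function, the dict of live nodes to role names, and the
alphabetical tie-break are hypothetical and are not part of the SIP or of any
Solr API.

```python
from typing import Dict, Optional, Set


def pick_overseer(live_nodes: Dict[str, Set[str]],
                  current: Optional[str]) -> Optional[str]:
    """Return the node that should hold the overseer job.

    live_nodes maps a node name to its set of role names (hypothetical model).
    current is the node holding the job now, or None if nobody holds it.
    """
    # Nodes that were started with the overseer role, in a stable order.
    preferred = sorted(n for n, roles in live_nodes.items() if "overseer" in roles)

    if current in live_nodes:
        # Keep the current holder if it is a preferred node, or if there is
        # no preferred node to migrate to.
        if "overseer" in live_nodes[current] or not preferred:
            return current

    if preferred:
        # Either nobody holds the job, or a non-overseer node holds it and a
        # preferred node is live: hand the job to a preferred node.
        return preferred[0]

    # No preferred node is live: fall back to any live node so the cluster
    # is never left without an overseer.
    fallback = sorted(live_nodes)
    return fallback[0] if fallback else None
```

Calling this again every time the live node list changes is what gives the
"migrate back whenever one becomes live" behaviour: a data node that was
forced into the job loses it as soon as an overseer-role node shows up.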

So maybe we don't have special rules per role, but instead each role can be
defined as either "Strict" or "Loose" (better names likely exist), and each
role comes with a default (Overseer -> Loose, Data -> Strict, Query ->
Loose, etc.). It is then up to each role to define what happens when it is
running in Loose mode, a non-role node has been used, and a role node later
comes online (like the overseer example given above).

With the Strict/Loose option and sensible defaults, users cannot trip
themselves up by default, but the option is there for people to tinker and
have an iron grip over their cluster.
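
To make the Strict/Loose idea concrete, here is a rough sketch in Python. It
is only an illustration under stated assumptions: the Mode enum, the
DEFAULT_MODES table and the eligible_nodes function are hypothetical names,
and nothing like them exists in the SIP or in Solr today. The defaults are
the ones suggested above.

```python
from enum import Enum
from typing import Dict, List, Set


class Mode(Enum):
    STRICT = "strict"  # only nodes tagged with the role may do the work
    LOOSE = "loose"    # untagged nodes may step in when no tagged node is live


# Defaults as proposed in this mail; a user could override any of them.
DEFAULT_MODES: Dict[str, Mode] = {
    "overseer": Mode.LOOSE,
    "data": Mode.STRICT,
    "query": Mode.LOOSE,
}


def eligible_nodes(role: str,
                   live_nodes: Dict[str, Set[str]],
                   modes: Dict[str, Mode] = DEFAULT_MODES) -> List[str]:
    """Return the live nodes allowed to perform `role`, in sorted order.

    live_nodes maps a node name to its set of role names (hypothetical model).
    """
    tagged = sorted(n for n, roles in live_nodes.items() if role in roles)
    if tagged or modes[role] is Mode.STRICT:
        # Tagged nodes always win; a strict role never spills onto others,
        # even if that leaves nobody to do the work.
        return tagged
    # Loose role with no tagged node live: any live node may fill in.
    return sorted(live_nodes)
```

With the defaults, a cluster of only data nodes still gets an overseer, while
cores never land outside the data nodes. Somebody who wants the iron grip
would pass something like {**DEFAULT_MODES, "overseer": Mode.STRICT} and
accept that the overseer list can then be empty.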

On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:

> Noble wrote:
> > We are not modifying the way the "overseer role" works today. We are
> just changing the definition and standardizing the configuration &
> discoverability
> Ishan wrote:
> > As of this SIP, we're not planning to modify the OVERSEER role (which
> currently stands for preferred overseer). We can take a stab at refactoring
> it later.
>
> Grouping these two comments together, since I think they are saying
> the same thing. I think this is part of my confusion. We have an old system
> that doesn't work the way we want the new system to work. There may be
> people already using the old system. What path do we offer for folks using
> the old system to migrate to the new system? What happens if somebody
> accidentally tries to use both systems at the same time?
>
> Ishan wrote:
> > When I wrote "When one or more such nodes [with OVERSEER role] are
> live, Solr guarantees that one of those nodes becomes the overseer.", I
> meant to somewhat capture the current behaviour as the OVERSEER role performs
> today. Do you see any inconsistency with this statement vs. what it does
> today?
>
> This doesn't really address my concern around what happens if all of our
> existing OVERSEER candidates are down. When at least one of them is up, the
> overseer will go there, and that is good and expected. But what happens if
> all of the overseer eligible nodes are down. Your comment, and the old
> system, would imply that the overseer election goes to some other
> unrelated, untagged node. I disagree with this implementation choice. This
> sounds like something role specific to determine, but I would like to see
> us be more strict about it. I don't want cores leaking out of my data
> roles, I don't want query processing to leak out of my "query" nodes or
> whatever. Overseer shouldn't be special in this regard.
>
> Noble wrote:
> > If we do that how do we know if xyz is a role or a node in the
> following request?
>
> You're absolutely correct, thanks for pointing this out. Let's leave it as
> is.
>
>
>
> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> wrote:
>
>>
>>
>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:
>>
>>> Replying to the top post in this thread because there has been a lot of
>>> discussion and I don't want to look like I'm continuing any of those
>>> particular threads.
>>>
>>> I finally had time to sit down and think about this with the attention
>>> it deserves and am generally happy with how the conversation has shaped the
>>> current proposal.
>>>
>>> GOOD: I think using system properties to define node roles is fine and I
>>> like that data is the default role when not defined. I think it is
>>> important to hold on to the guarantee that an active overseer will land on
>>> an overseer node role.
>>> CHANGE REQUEST: I would like to see a migration path for folks using the
>>> current OVERSEER role. I am not sure that something can be done
>>> automatically since they need to now specify new properties at startup.
>>> Maybe we need to include loud warnings or support both approaches for a
>>> time?
>>> CHANGE REQUEST: I do not like that if all of the overseer nodes fail,
>>> then it is implied the overseer will go to one of the data nodes. The
>>> specific wording in the SIP - "When one or more such nodes are live, Solr
>>> guarantees that one of those nodes become the overseer." implies to me that
>>> failover could go from overseer1 to overseer2 to overseerN to random node.
>>> I feel like we need to have some recording that there were dedicated
>>> overseer nodes and stop the cascading failure instead of churning through
>>> our data nodes.
>>>
>>> CLARIFICATION: I am slightly confused by the proposed scope of
>>> "coordinator" roles from a split query/indexing standpoint. I understand
>>> that these are used as examples, but would like stronger language that new
>>> roles should also go through their own SIP discussions.
>>>
>>> CLARIFICATION: I do not like that we are storing node liveness in two
>>> different places now. We have the live nodes and we have the node roles
>>> stored in two different places in zookeeper and it feels like this would
>>> lead to race conditions or split brain or other hard to diagnose bugs when
>>> those two lists don't agree with each other. This also feels like it
>>> contradicts the "single source of truth" idea later stated in the proposal.
>>> I see Gus's arguments for decoupling these and am not strongly opposed, I
>>> just get a lurking feeling about it. Even if we don't do this, I would like
>>> this called out explicitly in the alternative approaches section as
>>> something that we considered and rejected, with details why.
>>>
>>> GOOD: The API looks pretty clear. I would like an additional call out
>>> here that all operations are GET because nodes cannot be changed at runtime.
>>> CLARIFICATION: How does this interact with the previous OVERSEER
>>> preference role?
>>> CHANGE REQUEST: An additional API to get the list of available roles for
>>> a cluster. I _think_ this could be based on the version that the cluster is
>>> running? Would be useful to be able to interrogate a cluster in the
>>> future... we're seeing OOM issues on queries, can we add some query nodes?
>>> When were they introduced? I don't know what path this API should exist at.
>>>
>>
>> Added a *GET /api/cluster/roles/supported* API, updated the SIP
>> document. Not sure if there's a better path that we could go for.
>>
>>
>>> CLARIFICATION: Can we list the APIs to clearly show which parts are
>>> string literals and which parts are meant to be substituted by the
>>> operator? *GET /api/cluster/roles/data* would become
>>> *GET /api/cluster/roles/${rolename}* in our SIP/documentation.
>>> CHANGE REQUEST: I think *GET /api/cluster/roles/nodes/node1* should be *GET
>>> /api/cluster/roles/${nodename}* dropping the intermediate "nodes"
>>> CHANGE REQUEST: The ZK structure also might not need that intermediate
>>> "nodes" node.
>>>
>>> CLARIFICATION: Should listing roles require some permissions? Maybe this
>>> requirement is too fundamental to the operation of a cluster and everybody
>>> would have to be able to do it.
>>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat
>>> roles? Implementation detail that the servers will figure out? Or strict
>>> guidance where the client needs to check where specific roles are before
>>> sending any further communication to the server?
>>> CLARIFICATION: What happens when a node gets a request that it can't
>>> fulfil? An overseer node gets a query or an update. A data node gets a
>>> collection creation request. Do they forward it on to an appropriate node,
>>> or do they reject it? Should this be configurable? If not, then it seems
>>> like lazy or poorly configured clients will defeat this isolation system
>>> quite easily.
>>>
>>> GOOD: Testing the API is very important, yes.
>>> CLARIFICATION: What does testing for how nodes behave when roles are
>>> added mean? I thought we established that they are not dynamic.
>>>
>>>
>>> Thanks,
>>> Mike
>>>
>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>> ichattopadhyaya@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Here's an SIP for introducing the concept of node roles:
>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>
>>>> We also wish to add first class support for Query nodes that are used
>>>> to process user queries by forwarding to data nodes, merging/aggregating
>>>> them and presenting to users. This concept exists as first class citizens
>>>> in most other search engines. This is a chance for Solr to catch up.
>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>
>>>> Regards,
>>>> Ishan / Noble / Hitesh
>>>>
>>>