You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "Ilan Ginzburg (Jira)" <ji...@apache.org> on 2020/08/03 14:29:00 UTC
[jira] [Comment Edited] (SOLR-14613) Provide a clean API for pluggable replica assignment implementations

    [ https://issues.apache.org/jira/browse/SOLR-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170070#comment-17170070 ] 

Ilan Ginzburg edited comment on SOLR-14613 at 8/3/20, 2:28 PM:
---------------------------------------------------------------

This is feedback on [~noble.paul]'s [PR for alternate proposal using the cluster APIs|https://github.com/apache/lucene-solr/pull/1714/].
 I'm putting it here because I find it hard to follow global discussions on very fragmented comments on files in a PR.

As a general note, I believe sample code of how these interfaces are to be used would be very useful (more so than how the interfaces can be implemented). Not only it guides the reader of that code in understanding the structure, it also helps the designer of that code understand which abstractions work etc. Even if the sample can't be run at this early stage, it's still a useful tool.
 I didn't really review the implementation changes for existing classes as I think these should come once we agree on the interfaces (otherwise it's throw away work).

Would be happy to get a bit more context on the value this proposal brings compared to the ongoing effort in [PR 1684|https://github.com/apache/lucene-solr/pull/1684] because I don't see it.

Here are my comments:

{{+*AssignContext*+}}:

Getting all the information needed for plugin operation is not going to be as efficient as it could because {{getCluster()}} with the hints is going to have to contact nodes to fetch relevant information, then {{fetchMetrics()}} might have to contact the same nodes again to fetch metrics. A system in which everything can be fetched by a minimal number of round trips to remote nodes (i.e. one per node) is preferable. We could mitigate by caching, but there would still be an underlying inefficiency.

Also are missing the fetching of system properties for example. If these have yet another method, that means a single node might be contacted three times to fetch all required information.

It seems unrealistic in {{getCluster()}} to provide very general hints such as {{PrefetchHint.COLLECTION_STATE}}. A placement implementation might care about the detailed collection state of one or two collections but not about the states of all collections in the cluster (might not care at all about other collections).

{{+*SimpleMap*+}}:

{{SimpleMap}} and {{LinkedSimpleHashMap}} use {{CharSequence}} as keys, this is inappropriate as +explicitly+ stated by the Javadoc for {{[CharSequence|https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/CharSequence.html]}}:
{quote}This interface does not refine the general contracts of the equals and hashCode methods. The result of testing two objects that implement CharSequence for equality is therefore, in general, undefined. Each object may be implemented by a different class, and there is no guarantee that each class will be capable of testing its instances for equality with those of the other. *It is therefore inappropriate to use arbitrary CharSequence instances as elements in a set or as keys in a map.*
{quote}
Also, unclear how the statement in the {{SimpleMap}} Javadoc _"It is designed to support large datasets without consuming lot of memory"_ is backed by reality. {{SimpleMap}} is a {{LinkedHashMap}} (in {{LinkedSimpleHashMap}} but this class is never used) or (in anonymous subclasses in {{ClusterState}}, {{Slice}} and {{DocCollection}}) delegates to other types of maps defined elsewhere (mostly/only {{HashMap}} or {{LinkedHashMap}}). It is actually less efficient (by a tiny amount) than directly using these other types directly.

{{SimpleMap}} does not implement {{Map}} and this makes it harder for new developers to approach (and its name is misleading). {{SimpleMap}} offers methods for iterating over all elements but these can be implemented as separate utility methods applicable to any map or collection, and existing methods such as {{Map.forEach()}} can also be used directly where needed.

I therefore suggest to not use {{SimpleMap}}.

{{+*NodeMetrics*+}}

This interface actually exposes a large part of the implementation. I would prefer the interfaces to hide implementation details. Also, I'd rather we move away from non typed methods that need to be cast all over the place and lead to a frustrating dev experience. We can hide both implementation and weakly typed implementations (if that's needed at some point - usually it is for communicating over the wire) in the hidden implementation classes and provide plugins with simpler and easier to consume interfaces.

{{+*AssignStrategy*+}}

The plugin is asked to return a {{WorkOrderResult}} that contains the {{WorkDecision}}'s as well as the final {{SolrCluster}} state. I believe it's not the responsibility of the plugin to return the final state (and would argue there might not be real use for that state in an ambitious implementation where {{WorkDecision}}'s from multiple placement computations can be combined to create the latest current known/expected cluster state for subsequent placement computations).

{{+*SolrCluster*+}}

{{collections()}} would return a map (currently {{SimpleMap}}) containing potentially hundreds of thousands collections. This map of collections will have to be built and will likely not be needed because each plugin computation will need a limited number of (solr) collection detailed info.

{{overseerNode()}} should likely return a {{SolrNode}} instance, like {{nodes()}} do.

{{+*Config*+}}

{{resources()}} likely better named {{getFileNames()}}.

{{resource()}} signature is wrong if a stream is to be returned to caller.

{{+*SolrCollection*+}}

A collection has a name so should likely have a {{getName()}} method.
 A collection has properties and should likely provide a way to retrieve these properties (for example {{withCollection}}).

{{+*ShardReplica*+}}

{{Replica}} would likely be a better and simpler name (otherwise shall we calls shards {{CollectionShard}}?).
 Exposing the internal {{org.apache.solr.common.cloud.Replica.Type}} class is not a good idea IMO.

{{+*WorkOrder*+}}

Rather than having a notion of {{WorkOrder}} {{Type}}, the code might be easier to read and understand and use if subclasses (or subinterfaces) of {{WorkOrder}} are used instead.

{{+*SolrNode*+}}

Why do placement code need a URL to the node? Are we planning to allow plugin code to go do whatever they want to do or are we targeting a controlled (and simpler) environment? I prefer the second option, I'm afraid things will quickly become unmanageable with the first option.


was (Author: murblanc):
This is feedback on [~noble.paul]'s [PR for alternate proposal using the cluster APIs|https://github.com/apache/lucene-solr/pull/1714/].
 I'm putting it here because I find it hard to follow global discussions on very fragmented comments on files in a PR.

As a general note, I believe sample code of how these interfaces are to be used would be very useful (more so than how the interfaces can be implemented). Note only it guides the reader of that code in understanding the structure, it also helps the designer of that code understand which abstractions work etc. Even if the sample can't be run at this early stage, it's still a useful tool.
 I didn't really review the implementation changes for existing classes as I think these should come once we agree on the interfaces (otherwise it's throw away work).

Would be happy to get a bit more context on the value this proposal brings compared to the ongoing effort in [PR 1684|https://github.com/apache/lucene-solr/pull/1684] because I don't see it.

Here are my comments:

{{+*AssignContext*+}}:

Getting all the information needed for plugin operation is not going to be as efficient as it could because {{getCluster()}} with the hints is going to have to contact nodes to fetch relevant information, then {{fetchMetrics()}} might have to contact the same nodes again to fetch metrics. A system in which everything can be fetched by a minimal number of round trips to remote nodes (i.e. one per node) is preferable. We could mitigate by caching, but there would still be an underlying inefficiency.

Also are missing the fetching of system properties for example. If these have yet another method, that means a single node might be contacted three times to fetch all required information.

It seems unrealistic in {{getCluster()}} to provide very general hints such as {{PrefetchHint.COLLECTION_STATE}}. A placement implementation might care about the detailed collection state of one or two collections but not about the states of all collections in the cluster (might not care at all about other collections).

{{+*SimpleMap*+}}:

{{SimpleMap}} and {{LinkedSimpleHashMap}} use {{CharSequence}} as keys, this is inappropriate as +explicitly+ stated by the Javadoc for {{[CharSequence|https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/CharSequence.html]}}:
{quote}This interface does not refine the general contracts of the equals and hashCode methods. The result of testing two objects that implement CharSequence for equality is therefore, in general, undefined. Each object may be implemented by a different class, and there is no guarantee that each class will be capable of testing its instances for equality with those of the other. *It is therefore inappropriate to use arbitrary CharSequence instances as elements in a set or as keys in a map.*
{quote}
Also, unclear how the statement in the {{SimpleMap}} Javadoc _"It is designed to support large datasets without consuming lot of memory"_ is backed by reality. {{SimpleMap}} is a {{LinkedHashMap}} (in {{LinkedSimpleHashMap}} but this class is never used) or (in anonymous subclasses in {{ClusterState}}, {{Slice}} and {{DocCollection}}) delegates to other types of maps defined elsewhere (mostly/only {{HashMap}} or {{LinkedHashMap}}). It is actually less efficient (by a tiny amount) than directly using these other types directly.

{{SimpleMap}} does not implement {{Map}} and this makes it harder for new developers to approach (and its name is misleading). {{SimpleMap}} offers methods for iterating over all elements but these can be implemented as separate utility methods applicable to any map or collection, and existing methods such as {{Map.forEach()}} can also be used directly where needed.

I therefore suggest to not use {{SimpleMap}}.

{{+*NodeMetrics*+}}

This interface actually exposes a large part of the implementation. I would prefer the interfaces to hide implementation details. Also, I'd rather we move away from non typed methods that need to be cast all over the place and lead to a frustrating dev experience. We can hide both implementation and weakly typed implementations (if that's needed at some point - usually it is for communicating over the wire) in the hidden implementation classes and provide plugins with simpler and easier to consume interfaces.

{{+*AssignStrategy*+}}

The plugin is asked to return a {{WorkOrderResult}} that contains the {{WorkDecision}}'s as well as the final {{SolrCluster}} state. I believe it's not the responsibility of the plugin to return the final state (and would argue there might not be real use for that state in an ambitious implementation where {{WorkDecision}}'s from multiple placement computations can be combined to create the latest current known/expected cluster state for subsequent placement computations).

{{+*SolrCluster*+}}

{{collections()}} would return a map (currently {{SimpleMap}}) containing potentially hundreds of thousands collections. This map of collections will have to be built and will likely not be needed because each plugin computation will need a limited number of (solr) collection detailed info.

{{overseerNode()}} should likely return a {{SolrNode}} instance, like {{nodes()}} do.

{{+*Config*+}}

{{resources()}} likely better named {{getFileNames()}}.

{{resource()}} signature is wrong if a stream is to be returned to caller.

{{+*SolrCollection*+}}

A collection has a name so should likely have a {{getName()}} method.
 A collection has properties and should likely provide a way to retrieve these properties (for example {{withCollection}}).

{{+*ShardReplica*+}}

{{Replica}} would likely be a better and simpler name (otherwise shall we calls shards {{CollectionShard}}?).
 Exposing the internal {{org.apache.solr.common.cloud.Replica.Type}} class is not a good idea IMO.

{{+*WorkOrder*+}}

Rather than having a notion of {{WorkOrder}} {{Type}}, the code might be easier to read and understand and use if subclasses (or subinterfaces) of {{WorkOrder}} are used instead.

{{+*SolrNode*+}}

Why do placement code need a URL to the node? Are we planning to allow plugin code to go do whatever they want to do or are we targeting a controlled (and simpler) environment? I prefer the second option, I'm afraid things will quickly become unmanageable with the first option.

> Provide a clean API for pluggable replica assignment implementations
> --------------------------------------------------------------------
>
>                 Key: SOLR-14613
>                 URL: https://issues.apache.org/jira/browse/SOLR-14613
>             Project: Solr
>          Issue Type: Improvement
>          Components: AutoScaling
>            Reporter: Andrzej Bialecki
>            Assignee: Ilan Ginzburg
>            Priority: Major
>          Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> As described in SIP-8 the current autoscaling Policy implementation has several limitations that make it difficult to use for very large clusters and very large collections. SIP-8 also mentions the possible migration path by providing alternative implementations of the placement strategies that are less complex but more efficient in these very large environments.
> We should review the existing APIs that the current autoscaling engine uses ({{SolrCloudManager}} , {{AssignStrategy}} , {{Suggester}} and related interfaces) to see if they provide a sufficient and minimal API for plugging in alternative autoscaling placement strategies, and if necessary refactor the existing APIs.
> Since these APIs are internal it should be possible to do this without breaking back-compat.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org