Posted to dev@solr.apache.org by David Smiley <ds...@apache.org> on 2023/06/19 20:41:46 UTC

SolrCloudManager... DistribStateManager...

I noticed that the SolrCloudManager concept was added some time ago to
abstract away SolrCloud in the context of doing simulated experiments on
auto-scaling.  Essentially -- the need was to simulate SolrCloud and not
actually use a real SolrCloud.  But that need and code went away in 9.0...
nonetheless SolrCloudManager and its friends (like DistribStateManager)
are still around.  I could imagine someone advocating for them
nonetheless.  But the present state is very half-implemented as there is
code all over the place that assumes ZooKeeper (e.g. uses SolrZkClient or
ZkStateReader) instead of some of these abstractions.  I think there is a
need to set a direction here -- do we embrace abstracting SolrCloud within
Solr or do we revert this stuff as needless indirection / concepts?

I think there's lots of room to debate / review the particulars of
SolrCloudManager and friends if we do want to keep it.
DistributedQueueFactory isn't even used anymore.  NodeStateProvider is only
for AttributeFactory; not very obvious.  DistribStateManager is essentially
SolrZkClient but nonetheless still references ZK classes.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

Re: SolrCloudManager... DistribStateManager...

Posted by David Smiley <ds...@apache.org>.

Gus, what do you mean by "local management"?

Anyway, I think I was misunderstood.  And besides, admittedly you two
aren't familiar with the classes in question.  But here goes a
clarification:

I'm concerned about internal SolrCloud code design/maintenance with respect
to these abstractions -- some of which *are* still used.  Maybe
SolrCloudManager isn't an issue -- it's just a facade / holder to other
stuff -- probably worth embracing further.  I think the main concern here
is DistribStateManager which is basically trying to abstract SolrZkClient.
The question is... should we bother?  To use it in some places and not
others adds a degree of complexity / confusion and I question if it's
serving any value.  There is exactly one implementation (SolrZkClient
based, of course).  In Solr 8, there were four plus many anonymous
subclasses.  I could hypothetically imagine unit tests using some trivial
in-memory impl -- in Solr 8 that's "SimDistribStateManager" -- but we
don't have tests that use such a thing.  And besides, using embedded ZK is
no big deal for a test; lots of our tests use ZK directly for this.  The
real value of DSM was to do fast simulations of scale.  And perhaps to open
a pathway to use something other than ZK... but I don't think that's
realistic (a poor use of limited contributor time, not to mention ZK works
pretty well).
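
To make the "trivial in-memory impl" idea concrete, here is a minimal sketch of what such a test double could look like.  The SimpleStateStore interface and its method names are hypothetical stand-ins invented for this illustration; they are not the real DistribStateManager signatures, and nothing here depends on Solr or ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical, simplified state abstraction (NOT Solr's real interface). */
interface SimpleStateStore {
  boolean hasData(String path);
  byte[] getData(String path);             // throws NoSuchElementException if absent
  void setData(String path, byte[] data);  // creates or overwrites the node
  List<String> listChildren(String path);  // names of explicitly stored direct children
}

/** Trivial in-memory implementation that a unit test could use instead of ZooKeeper. */
final class InMemoryStateStore implements SimpleStateStore {
  // Sorted map, so all paths below a given parent sit next to each other.
  private final ConcurrentSkipListMap<String, byte[]> nodes = new ConcurrentSkipListMap<>();

  @Override
  public boolean hasData(String path) {
    return nodes.containsKey(path);
  }

  @Override
  public byte[] getData(String path) {
    byte[] data = nodes.get(path);
    if (data == null) {
      throw new NoSuchElementException(path);
    }
    return data.clone(); // defensive copy
  }

  @Override
  public void setData(String path, byte[] data) {
    nodes.put(path, data.clone());
  }

  @Override
  public List<String> listChildren(String path) {
    String prefix = path.endsWith("/") ? path : path + "/";
    List<String> children = new ArrayList<>();
    // Keys that start with the prefix form one contiguous run right after it.
    for (String key : nodes.tailMap(prefix, false).keySet()) {
      if (!key.startsWith(prefix)) {
        break;
      }
      String rest = key.substring(prefix.length());
      if (!rest.isEmpty() && rest.indexOf('/') < 0) {
        children.add(rest); // direct child only; deeper descendants are skipped
      }
    }
    return children;
  }
}
```

A test would construct an InMemoryStateStore, seed a few paths with setData, and hand it to the code under test.  The sketch ignores watches, versions and multi-ops, which is part of why an embedded ZK is often the simpler choice.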

*If* we keep DistribStateManager, I think I would prefer that SolrZkClient
*be* a DistribStateManager without having to wrap/delegate.  What makes
that non-trivial is the semantic differences in exceptions -- DSM maps some
ZK specific exceptions (e.g. NoNodeException) into general ones (e.g.
NoSuchElementException).  Yet DSM actually fails to conceal ZK because it
depends on ZK classes like Op and Watcher and even KeeperException!  So
that attempt apparently failed.  Since DSM is not used *too* much, I
could see simply replacing its method signatures with that of SolrZkClient.
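
The exception-mapping difference described above can be sketched as follows.  This is only an illustration under stated assumptions: BackendNoNodeException and BackendClient are hypothetical stand-ins playing the roles of ZooKeeper's NoNodeException and SolrZkClient, and MappingStateManager is an invented name for a DSM-style wrapper; none of these are the real Solr or ZooKeeper classes.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in for a backend-specific checked exception (the role NoNodeException plays). */
class BackendNoNodeException extends Exception {
  BackendNoNodeException(String path) {
    super("no node: " + path);
  }
}

/** Stand-in for a ZK-flavoured client (the role SolrZkClient plays in this sketch). */
class BackendClient {
  private final Map<String, byte[]> nodes = new ConcurrentHashMap<>();

  void put(String path, byte[] data) {
    nodes.put(path, data);
  }

  byte[] getData(String path) throws BackendNoNodeException {
    byte[] data = nodes.get(path);
    if (data == null) {
      throw new BackendNoNodeException(path);
    }
    return data;
  }
}

/** DSM-style wrapper: delegates, and maps the backend exception to a general one. */
final class MappingStateManager {
  private final BackendClient client;

  MappingStateManager(BackendClient client) {
    this.client = client;
  }

  byte[] getData(String path) {
    try {
      return client.getData(path);
    } catch (BackendNoNodeException e) {
      // Callers see a general exception; the backend one survives only as the cause.
      NoSuchElementException mapped = new NoSuchElementException(path);
      mapped.initCause(e);
      throw mapped;
    }
  }
}
```

The point of the sketch is the semantic gap: a caller written against the wrapper catches NoSuchElementException, while a caller written against the client catches the backend exception, so making the client itself *be* the abstraction would mean changing one side's contract.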

Or just remove DSM.  This is my preference as it reduces complexity.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

Re: SolrCloudManager... DistribStateManager...

Posted by Eric Pugh <ep...@opensourceconnections.com>.

I dropped the notes that Gus made into a Miro board just to visualize it: https://miro.com/app/board/o9J_lm8BmXE=/?moveToWidget=3458764558006503531&cot=14

It would be interesting during one of our community calls to talk about “what do we have?” and “what are we missing?”  Feels to me like a number of our SIPs actually map to this progression.

Eric



_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>	
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.

Re: SolrCloudManager... DistribStateManager...

Posted by Gus Heck <gu...@gmail.com>.

Yeah, looking back at the original email, I read too quickly and
misunderstood the juxtaposition of "manager" and "extracting".  But the
current effort to make embedded zk a supported production config (not sure
where that stands), plus some form of, or tweak to, the server roles stuff
would get us through step 4 of the path I listed.  The (misunderstood) idea
of management code (possibly along with the UI) being extractable to a
management server for large installs (so that there's no chance of the
balancing calculations and other such needs bogging down the servers
serving/indexing) seems interesting... it also has the effect of
potentially isolating cluster-wide operations away from individual query
endpoints.  This is part of where I was going when I extracted the startup
mechanisms into a ServletListener.  Breaking out admin into its own servlet
(or two servlets, cluster-admin and node-admin, the latter only containing
endpoints for actions applicable to the current node) would be a way to
make this easy.  There's not really a good reason for the query and update
paths to be passing through an "if this is an admin request do something
else" check.  A precursor to that in my mind would be to pull out auth into
a separate filter that can be reused across contexts.

-Gus

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: SolrCloudManager... DistribStateManager...

Posted by Eric Pugh <ep...@opensourceconnections.com>.

I think this is a very interesting progression.  It’s a really nice mental model of “what tools should I reach for when?”



Re: SolrCloudManager... DistribStateManager...

Posted by Gus Heck <gu...@gmail.com>.

I'm not familiar with these classes, but I'm not particularly fond of
anything that leads us in a direction of requiring a *third* installation
for initial use (ZooKeeper being #2 already). That said, we really need a
good replacement for autoscaling, and large installs might reasonably want
to offload any non-query work. Ideally, we would have a smooth transition
so that users can easily follow this path:

   1. Single Cloud Node, embedded zookeeper, local management (mostly
   unused, maybe not loaded)
   2. A few Cloud nodes (2-5), embedded zookeeper, local management
   3. Moderate cloud (6-12 nodes), embedded zookeeper on subset of nodes,
   local management
   4. Large cloud (13-25 nodes), external zookeeper, local management
   5. > 25 nodes, external zookeeper, management local or external.
   6. >100 nodes, recommended external zk and management

Thus folks doing moderate stuff don't need to bother with installing
anything other than Solr. Somewhere along that scale they would likely
start using tlog and then tlog/pull setups as well. Ideally we would have a
clear path to make these transitions with minimal downtime.
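
The progression above can be read as a simple lookup from cluster size to a suggested setup.  The following is only a sketch of that reading: the class, enum and method names are made up for illustration, the node-count cut-offs are the ones from the list, and nothing like this exists in Solr today.

```java
/** Hypothetical encoding of the proposed progression; not an existing Solr API. */
final class DeploymentTierSketch {
  enum ZkMode { EMBEDDED, EMBEDDED_ON_SUBSET, EXTERNAL }

  enum Management { LOCAL, LOCAL_OR_EXTERNAL, EXTERNAL }

  private DeploymentTierSketch() {}

  /** Steps 1-2: embedded; step 3: embedded on a subset of nodes; steps 4-6: external. */
  static ZkMode zkModeFor(int nodes) {
    requirePositive(nodes);
    if (nodes <= 5) {
      return ZkMode.EMBEDDED;
    }
    if (nodes <= 12) {
      return ZkMode.EMBEDDED_ON_SUBSET;
    }
    return ZkMode.EXTERNAL;
  }

  /** Steps 1-4: local; step 5 (more than 25 nodes): either; step 6 (more than 100): external recommended. */
  static Management managementFor(int nodes) {
    requirePositive(nodes);
    if (nodes <= 25) {
      return Management.LOCAL;
    }
    if (nodes <= 100) {
      return Management.LOCAL_OR_EXTERNAL;
    }
    return Management.EXTERNAL;
  }

  private static void requirePositive(int nodes) {
    if (nodes < 1) {
      throw new IllegalArgumentException("nodes must be >= 1, got " + nodes);
    }
  }

  public static void main(String[] args) {
    for (int n : new int[] {1, 4, 10, 20, 50, 200}) {
      System.out.println(n + " nodes: zk=" + zkModeFor(n) + ", management=" + managementFor(n));
    }
  }
}
```

The cut-offs are rough guidance rather than hard limits, so a real version would more likely surface them as documentation or warnings than enforce them in code.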

So if we can fit what these classes do into that dream, great. If they
point elsewhere, meh.

Note: of course none of this has anything to do with "user-managed" Solr
(a.k.a. legacy solr) which is managed manually by users and doesn't have zk.
