Posted to dev@solr.apache.org by Houston Putman <ho...@apache.org> on 2023/04/05 16:45:26 UTC

SIP-18: A Solr Kubernetes Module for native integration

Hey everyone,

This is a new SIP, not a duplicate of SIP-17 (Autoscaling on Kubernetes),
and it is completely unrelated to it.

Basically, there is a lot of very messy logic in the Solr Operator to
bootstrap security and manage various other things. This logic must exist
because Solr has no idea that Kubernetes exists.
If we can use Kubernetes APIs to pull in information, instead of relying on
the Solr Operator to inject that information in hacky ways, the user
experience on Kubernetes is going to get many times better for users
wanting to secure their SolrClouds. This will also help us enable
authorization by default (which we always preach) via the Solr Operator.

This SIP is not very filled out yet because I'm still thinking through
various aspects. But in general, we can attack the different plugins one by
one, and the SIP can evolve throughout the process. This SIP is very easy to
break up, which is nice.

Please let me know if I can explain more, or how I can make the SIP page
better.

- Houston

Re: SIP-18: A Solr Kubernetes Module for native integration

Posted by Houston Putman <ho...@apache.org>.

Yeah, good question. Everything would be included in a single “kubernetes”
module. So while everything can be done independently, the first feature
will have to create the module, and then the other features can be added.

Luckily, adding a module is pretty straightforward, so it doesn't matter
which feature is added first.

- Houston

On Wed, May 3, 2023 at 10:45 AM Radu Gheorghe <ra...@sematext.com> wrote:

> Hello,
>
> Sorry for being late to the party. The SIP sounds good to me.
>
> Houston, you already mentioned that work for the module is easy to
> break down. You're referring to the fact that pretty much every major
> piece of functionality (Authentication, ConfigSets...) can be
> developed almost independently of the others, correct? I assume it's not
> worth having multiple modules, because you're likely to want
> everything, as a user, if you're using Kubernetes.
>
> Best regards,
> Radu

Re: SIP-18: A Solr Kubernetes Module for native integration

Posted by Radu Gheorghe <ra...@sematext.com>.

Hello,

Sorry for being late to the party. The SIP sounds good to me.

Houston, you already mentioned that work for the module is easy to
break down. You're referring to the fact that pretty much every major
piece of functionality (Authentication, ConfigSets...) can be
developed almost independently of the others, correct? I assume it's not
worth having multiple modules, because you're likely to want
everything, as a user, if you're using Kubernetes.

Best regards,
Radu
--
Elasticsearch/OpenSearch & Solr Consulting, Production Support & Training
Sematext Cloud - Full Stack Observability
https://sematext.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
For additional commands, e-mail: dev-help@solr.apache.org

Re: SIP-18: A Solr Kubernetes Module for native integration

Posted by "Arrieta, Alejandro" <aa...@perrinsoftware.com>.

Hello team,

Noob warning:
Today I learned what SIP means, with SIP-17 and SIP-18 being very
interesting reads.
https://cwiki.apache.org/confluence/display/SOLR/Solr+Improvement+Proposals
Too many telephone references.
Sorry for the interruption.
Alejandro Arrieta


Re: SIP-18: A Solr Kubernetes Module for native integration

Posted by Houston Putman <ho...@apache.org>.

Thanks for the questions, Jason!

> So the general idea is that we'd add a Solr contrib/module, and that
> module would have a dep on some sort of Kubernetes client so it could
> manage certain Solr entities (e.g. security.json, configsets, etc.) as
> Kubernetes resources (configmaps, etc.).  Am I understanding that
> right?
>

Yes, absolutely. And possibly other things, like leveraging Kubernetes'
secret management to manage credentials for users (auto-import BasicAuth
secrets with certain labels, integrate with Kubernetes ServiceAccounts, etc.).

But yeah, generally the idea is to use Kubernetes state instead of
ZooKeeper state for certain features.
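
To make the "auto-import BasicAuth secrets with certain labels" idea a bit
more concrete, here is a minimal Python sketch. It is an illustration only,
not the SIP's design: the label name "solr.apache.org/import-basic-auth" and
the function collect_basic_auth_users are hypothetical, and the Secrets are
plain dicts shaped like Kubernetes Secret manifests rather than objects
fetched from an API server. The "kubernetes.io/basic-auth" Secret type with
base64-encoded "username" and "password" keys is standard Kubernetes.

```python
import base64

# Hypothetical label a user would put on Secrets that Solr should import.
IMPORT_LABEL = "solr.apache.org/import-basic-auth"


def collect_basic_auth_users(secrets):
    """Return {username: password} for labelled basic-auth Secrets.

    `secrets` is a list of dicts shaped like Kubernetes Secret manifests.
    A real module would list them through a Kubernetes client and would
    store salted hashes, not plain passwords; both steps are omitted here.
    """
    users = {}
    for secret in secrets:
        labels = secret.get("metadata", {}).get("labels", {})
        if labels.get(IMPORT_LABEL) != "true":
            continue  # not opted in
        if secret.get("type") != "kubernetes.io/basic-auth":
            continue  # wrong kind of Secret
        data = secret.get("data", {})
        if "username" not in data or "password" not in data:
            continue  # malformed, skip rather than fail
        username = base64.b64decode(data["username"]).decode("utf-8")
        password = base64.b64decode(data["password"]).decode("utf-8")
        users[username] = password
    return users


def _b64(text):
    # Secret values are base64-encoded in the manifest's `data` field.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


example_secrets = [
    {
        "metadata": {"name": "admin-user", "labels": {IMPORT_LABEL: "true"}},
        "type": "kubernetes.io/basic-auth",
        "data": {"username": _b64("admin"), "password": _b64("s3cret")},
    },
    {
        # No label, so this Secret is ignored.
        "metadata": {"name": "unrelated", "labels": {}},
        "type": "kubernetes.io/basic-auth",
        "data": {"username": _b64("other"), "password": _b64("pw")},
    },
]

imported_users = collect_basic_auth_users(example_secrets)
```

Only the labelled Secret is imported; the opt-in label keeps unrelated
credentials in the same namespace out of Solr.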

> One place there might be room for improvement in the writeup so far is
> around the motivation/value-prop for some of these Solr->Kubernetes
> integrations.  The value in some integrations (e.g.
> KubernetesSSLCredentialsProvider) is relatively self-evident, I think,
> but others are a little less clear and could use being spelled out
> explicitly IMO.  For example, what's the benefit of storing security.json
> or configsets in Kubernetes configmaps over ZooKeeper?
>

This is a great question.

Generally, Solr has fairly good tool support for managing various things in
ZooKeeper.

The "zkCli.sh" script and various "bin/solr" commands allow users to easily
manage their ZooKeeper state to set up Solr to run the way they need it to.
This works very well for users running Solr on bare metal and manually
running these commands.

However, running these commands in Kubernetes is not very convenient, and it
does not really jibe with Kubernetes' idempotent model. Basically, there
isn't a good or easy way to run the solr/zk setup commands before a
SolrCloud is created. And when we do it in things like an "initContainer",
the commands have to be run every time a Solr process is started (or
restarted). This isn't really convenient, and it adds complexity that makes
running Solr on Kubernetes much less appealing.

Another thing is state management. Let's say that the Solr Operator
wants to enable auth by default when running Solr.
It has to create a security.json for Solr to use, and generate passwords
and secrets for users to use.
It also needs to set up a user and password for itself (the operator)
to use when interacting with the cluster.
That's OK: it does this, and it can easily upload the file to ZooKeeper
in the initContainer if no security.json already exists.

However, we need to allow users to update this file themselves to add more
users and do other stuff. So basically we can't let the Solr Operator make
any changes to this file once it exists. Even if a user decides that they
want to change the security.json secret they passed in the SolrCloud, the
operator can't make that change happen, since it can't overwrite what
already exists in ZooKeeper.
This will always be a problem when there are two "sources of truth". One
has to be prioritized.
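
The two-sources-of-truth problem can be sketched in a few lines of Python.
This is a simplified model, not the operator's actual code: plain dicts
stand in for ZooKeeper and for the Kubernetes Secret, and
init_container_bootstrap is a hypothetical name for the "upload only if
absent" step.

```python
def init_container_bootstrap(zookeeper, secret):
    """Mimic the 'upload only if absent' step run on every pod start."""
    if "/security.json" not in zookeeper:
        zookeeper["/security.json"] = secret["security.json"]


# Plain dicts stand in for ZooKeeper and for a Kubernetes Secret.
zookeeper = {}
secret = {"security.json": '{"users": ["operator"]}'}

# First start: ZooKeeper is empty, so the Secret's content is uploaded.
init_container_bootstrap(zookeeper, secret)

# A user adds an account through Solr, which writes to ZooKeeper only.
zookeeper["/security.json"] = '{"users": ["operator", "alice"]}'

# Later the user edits the Secret and the pods restart.
secret["security.json"] = '{"users": ["operator", "bob"]}'
init_container_bootstrap(zookeeper, secret)

# The Secret edit never reaches Solr, because overwriting would drop "alice".
in_sync = zookeeper["/security.json"] == secret["security.json"]
```

After the restart the two copies disagree, and neither choice (keep
ZooKeeper's copy or overwrite it) preserves both users' changes.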

If we allow the security.json to be loaded from a Kubernetes secret, then
the secret that the user provides is the single source of truth. Even if
the security.json is changed through the security UI, the changes will be
reflected in the Kubernetes secret. So users are free to overwrite that
secret if they want to, given that everyone knows it's the current accepted
state of the security.json file.
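
As a rough sketch of the single-source-of-truth idea, assume a store that
both reads and writes security.json in the Secret. The class name
SecretBackedSecurityConfig is hypothetical, and a plain dict stands in for
the Kubernetes Secret; a real plugin would go through the Kubernetes API.

```python
import json


class SecretBackedSecurityConfig:
    """Hypothetical store that reads and writes security.json in one place.

    `secret` is a plain dict standing in for a Kubernetes Secret; a real
    plugin would talk to the Kubernetes API instead.
    """

    def __init__(self, secret):
        self._secret = secret

    def load(self):
        return json.loads(self._secret["security.json"])

    def save(self, config):
        # Edits made through Solr (for example the security UI) are written
        # back to the same Secret, so there is no second copy to drift.
        self._secret["security.json"] = json.dumps(config, sort_keys=True)


secret = {"security.json": json.dumps({"users": ["operator"]})}
store = SecretBackedSecurityConfig(secret)

# An edit made through Solr lands in the Secret.
config = store.load()
config["users"].append("alice")
store.save(config)
users_after_ui_edit = json.loads(secret["security.json"])["users"]

# A user overwriting the Secret directly is picked up on the next load.
secret["security.json"] = json.dumps({"users": ["operator", "bob"]})
users_after_overwrite = store.load()["users"]
```

Both directions of change go through the same object, so there is nothing
to reconcile.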

The exact same issues exist with configSets. Many Solr Operator users want
to manage their configSets through Kubernetes, just like they manage their
SolrClouds. It makes sense: keep all of your Solr infra managed together.
However, the operator cannot keep the configSets managed in ZooKeeper
updated with the configSets managed via Kube ConfigMaps. It's two sources
of truth.
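
For configSets, the drift between the two copies can be pictured as a
simple diff. The sketch below is an assumption-laden illustration: the
function configset_drift is hypothetical, configmap_data mirrors the
file-name-to-text shape of a ConfigMap's "data" field, and zk_files is a
dict standing in for the files of a configSet stored in ZooKeeper.

```python
def configset_drift(configmap_data, zk_files):
    """Return sorted file names whose content differs between the two copies.

    `configmap_data` mirrors a ConfigMap's `data` field (file name -> text);
    `zk_files` is a dict standing in for a configSet's files in ZooKeeper.
    A file present on only one side also counts as drift.
    """
    names = set(configmap_data) | set(zk_files)
    return sorted(n for n in names if configmap_data.get(n) != zk_files.get(n))


configmap_data = {
    "solrconfig.xml": "<config>v2</config>",
    "managed-schema.xml": "<schema/>",
}
zk_files = {
    "solrconfig.xml": "<config>v1</config>",
    "managed-schema.xml": "<schema/>",
    "stopwords.txt": "a\nthe\n",
}

drifted = configset_drift(configmap_data, zk_files)
```

Detecting the drift is the easy part; deciding which side wins for each
file is the problem that a single source of truth avoids.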

*TLDR*: Solr has many command-line utilities that work well to set up Solr
when it's running on bare metal or a VM.
However, these solutions do not work well in a cloud system like
Kubernetes. If we try to make these things easier to set up in Kubernetes,
it ultimately results in 2 sources of truth (Kubernetes and ZooKeeper). If
we make plugins that allow these settings to be loaded from Kubernetes
instead of ZooKeeper, we are back down to 1 source of truth. And this single
source of truth (obviously) works well in Kubernetes, because the settings
are native Kubernetes resources.

- Houston


Re: SIP-18: A Solr Kubernetes Module for native integration

Posted by Jason Gerlowski <ge...@gmail.com>.

Hi Houston,

So the general idea is that we'd add a Solr contrib/module, and that
module would have a dep on some sort of Kubernetes client so it could
manage certain Solr entities (e.g. security.json, configsets, etc.) as
Kubernetes resources (configmaps, etc.).  Am I understanding that
right?

> Please let me know if I can explain more, or how I can make the SIP page better.

One place there might be room for improvement in the writeup so far is
around the motivation/value-prop for some of these Solr->Kubernetes
integrations.  The value in some integrations (e.g.
KubernetesSSLCredentialsProvider) is relatively self-evident, I think,
but others are a little less clear and could use being spelled out
explicitly IMO.  For example, what's the benefit of storing security.json
or configsets in Kubernetes configmaps over ZooKeeper?

Best,

Jason

