Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2021/08/05 15:17:14 UTC

[GitHub] [solr-operator] gerlowskija opened a new issue #301: Add support for GCS storage to 'solrbackup'

gerlowskija opened a new issue #301:
URL: https://github.com/apache/solr-operator/issues/301


   Currently the 'solrbackup' resource assumes that users want backups stored "locally" (i.e. stored on a PV or mounted drive using Solr's LocalFileSystemRepository).  These local backups can then optionally be "persisted" - which involves compressing them and shipping them to a different PV or S3 bucket.
   
   But no support exists for using other backup destinations that Solr supports natively, such as GCS (as of 8.9).
   
   We should add this support.  Users can configure their GCS-backup settings under solrcloud's `backupRestoreOptions` object, leaving the `solrbackup` object relatively untouched (except that any "persistence" section on 'solrbackup' would now be ignored, as we can only easily compress files that are stored locally). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr-operator] gerlowskija edited a comment on issue #301: Add support for GCS storage to 'solrbackup'

Posted by GitBox <gi...@apache.org>.
gerlowskija edited a comment on issue #301:
URL: https://github.com/apache/solr-operator/issues/301#issuecomment-893584633


   I've attached a rough PR that shows how this could be done.  Below are an example 'solrcloud' and 'solrbackup' that use the proposed functionality:
   
   **SolrCloud**
   ```
   apiVersion: solr.apache.org/v1beta1
   kind: SolrCloud
   metadata:
     name: jasons_cluster
   spec:
     dataStorage:
       persistent:
         reclaimPolicy: Delete
         pvcTemplate:
           spec:
             resources:
               requests:
                 storage: "5Gi"
       backupRestoreOptions:
         gcsStorage:
           bucket: "solr-log-test"
           gcsCredentialSecret: "my-gcs-secret"
           baseLocation: "logs"
       ...
   ```
   The most noteworthy addition in this snippet is `spec.dataStorage.backupRestoreOptions.gcsStorage.gcsCredentialSecret`.  This required property holds the name of a secret created by the user.  The secret must have a key "service-account-key.json" whose value is the user's [Google service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).
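
As a rough illustration of the secret described above, the sketch below builds a Kubernetes Secret manifest with the expected "service-account-key.json" key. The helper name `build_gcs_secret`, the `Opaque` type, and the placeholder key contents are assumptions made for this example; they are not part of the proposal or of the operator.

```python
import base64
import json

# Key name the proposal expects to find inside the secret.
GCS_KEY_NAME = "service-account-key.json"


def build_gcs_secret(name: str, service_account_key: str, namespace: str = "default") -> dict:
    """Return a Secret manifest (as a dict) holding a GCS service account key.

    Hypothetical helper for illustration only.
    """
    # Values under a Secret's "data" field must be base64-encoded.
    encoded = base64.b64encode(service_account_key.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {GCS_KEY_NAME: encoded},
    }


if __name__ == "__main__":
    # Placeholder contents; a real key file is downloaded from Google Cloud.
    placeholder_key = json.dumps({"type": "service_account", "project_id": "example-project"})
    print(json.dumps(build_gcs_secret("my-gcs-secret", placeholder_key), indent=2))
```

The printed manifest could be saved to a file and applied with `kubectl apply -f`, after which "my-gcs-secret" can be referenced from `gcsCredentialSecret`.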
   
   **SolrBackup**
   ```
   apiVersion: solr.apache.org/v1beta1
   kind: SolrBackup
   metadata:
     name: gcs_techproducts_backup
     namespace: default
   spec:
     solrCloud: jasons_cluster
     collections:
       - techproducts
   ```
   (Note that there's no new configuration in 'solrbackup', just the removal of the 'persistence' section for gcs-backups.)
   
   I'm not wedded to these syntaxes by any means - just wanted to get some examples up here as a concrete starting point for discussion.
   




[GitHub] [solr-operator] gerlowskija edited a comment on issue #301: Add support for GCS storage to 'solrbackup'

Posted by GitBox <gi...@apache.org>.
gerlowskija edited a comment on issue #301:
URL: https://github.com/apache/solr-operator/issues/301#issuecomment-896118995


   > The only thing the operator should do for these "native" backup options, is to call the Solr API right?
   
   That's what I'm proposing, yep - the operator wouldn't be doing any of the compression or relocation features for GCS that it currently supports for 'local' backups.  It's "just" calling the Solr API.  (Which, I'd contend, isn't "nothing".  That still saves Ops folks from crafting their own solr.xml, from needing to learn Solr's backup and async-polling APIs, etc.)
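
To make "just calling the Solr API" concrete, here is a minimal sketch of the two Collections API requests involved: an asynchronous BACKUP call and the REQUESTSTATUS call used to poll it. The function names, the base URL, and the injected `fetch_state` callable are assumptions for illustration; the sketch builds URLs and runs the polling loop without making network calls, and it is not the operator's implementation.

```python
import time
from typing import Callable, Optional
from urllib.parse import urlencode


def backup_url(base_url: str, collection: str, backup_name: str, request_id: str,
               repository: Optional[str] = None, location: Optional[str] = None) -> str:
    """Build the URL for an asynchronous Collections API BACKUP request."""
    params = {
        "action": "BACKUP",
        "name": backup_name,
        "collection": collection,
        "async": request_id,  # Solr returns immediately; progress is polled separately.
    }
    if repository:
        params["repository"] = repository
    if location:
        params["location"] = location
    return f"{base_url}/solr/admin/collections?{urlencode(params)}"


def request_status_url(base_url: str, request_id: str) -> str:
    """Build the URL that reports the state of an async request."""
    params = {"action": "REQUESTSTATUS", "requestid": request_id}
    return f"{base_url}/solr/admin/collections?{urlencode(params)}"


def wait_for_backup(fetch_state: Callable[[], str], attempts: int = 30,
                    delay_seconds: float = 2.0) -> str:
    """Poll until the request reaches a terminal state or attempts run out.

    `fetch_state` is a caller-supplied function that queries REQUESTSTATUS and
    returns the reported state string (for example "running" or "completed").
    """
    for _ in range(attempts):
        state = fetch_state()
        if state in ("completed", "failed"):
            return state
        time.sleep(delay_seconds)
    return "timeout"
```

In practice `fetch_state` would issue an HTTP GET against `request_status_url(...)` and read the state out of the JSON response.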
   
   > the only real benefit would be to have the operator be able to do this on a schedule
   
   Definitely agree.  As I said above, I think there's value in this ticket alone.  But GCS-support gets much more appealing as the operator's backup featureset generally gets more robust.  I love the idea of a "backupschedule" entity that creates individual solrbackup objects in turn.  I'll file an issue for that as a placeholder for discussion.
   
   > I think we could change the SolrBackup to do either "managed" or "remote" backups
   
   I think I agree with your suggestions here, but let me restate a few of them to make sure I'm understanding you correctly.  There's a point or two I'm unclear on.
   
   1. I see what you're getting at with the "managed" vs "remote" distinction, but I'm not sure whether and where you imagine that appearing in the yaml configs.  Are you suggesting an explicit setting on 'solrbackup'?  Or that it be implicit based on the value of the 'repository' setting you mention?
   2. Letting users specify a "backupRepositoryName" on their 'solrbackup' makes sense to me.  And further it implies that a user should be able to configure multiple sets of backup configs in their solrcloud's backupRestoreOptions setting. (i.e. configuring local backup settings and gcs backup settings aren't mutually exclusive - we'd support use of both within the same solrcloud.)
   3. It seems like you're on the fence about having the operator bootstrap required buckets/locations, and don't have a strong opinion there.  I lean towards skipping that because of the complexity of bringing S3, GCS, etc. clients into the operator, at least until we get feedback from users that it'd be worth it.  That said, I have mixed feelings about it too.
   
   So taking those suggestions, our new example CRDs would look something like:
   
   **solrcloud**
   ```
   ...
     dataStorage:
       ...
       backupRestoreOptions:
         - name: "my-gcs"
           type: "gcs"  # A new enum-ish field: either 'local' or 'gcs'
           bucket: "solr-log-test"
           gcsCredentialSecret: "my-gcs-secret"
           baseLocation: "logs"
         - name: "my-local"
           type: "local"
           default: true
       ...
   ```
   
   **solrbackup (gcs)**
   ```
   apiVersion: solr.apache.org/v1beta1
   kind: SolrBackup
   metadata:
     name: gcs_techproducts_backup
     namespace: default
   spec:
     solrCloud: jasons_cluster
     repository: "my-gcs"
     location: "logs_alt"
     collections:
       - techproducts
   ```
   
   **solrbackup (local)**
   ```
   apiVersion: solr.apache.org/v1beta1
   kind: SolrBackup
   metadata:
     name: local_techproducts_backup
     namespace: default
   spec:
     solrCloud: jasons_cluster
     repository: "my-local"
     location: "logs_alt"
     persistence:  # Ignored if the 'my-local' repository isn't of a type that supports "managed" backups (i.e. type=local).
       volume:
         source:
           persistentVolumeClaim:
             claimName: "pvc-test"
     collections:
       - techproducts
   ```
   
   Those could be off a bit based on what you meant regarding the "managed" v. "remote" flag.  Does this look closer to what you were thinking? @HoustonPutman 
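
One way to read the examples above: a 'solrbackup' names a repository, falls back to the one marked `default: true` when it names none, and only honours `persistence` for "local" repositories. The sketch below encodes that reading. The function names and the dict-based data model are assumptions for illustration, not the operator's actual types or behaviour.

```python
from typing import List, Optional


def resolve_repository(repositories: List[dict], name: Optional[str] = None) -> dict:
    """Pick the repository a backup should use.

    An explicit name must match exactly; otherwise the single entry marked
    `default: true` is used. Anything ambiguous raises ValueError.
    """
    if name is not None:
        for repo in repositories:
            if repo["name"] == name:
                return repo
        raise ValueError(f"no backup repository named {name!r}")
    defaults = [repo for repo in repositories if repo.get("default")]
    if len(defaults) == 1:
        return defaults[0]
    raise ValueError("no repository name given and no single default repository configured")


def persistence_applies(repository: dict) -> bool:
    """Only 'local' (managed) repositories support the persistence step."""
    return repository.get("type") == "local"


if __name__ == "__main__":
    # Mirrors the 'solrcloud' example in this comment.
    repos = [
        {"name": "my-gcs", "type": "gcs", "bucket": "solr-log-test",
         "gcsCredentialSecret": "my-gcs-secret", "baseLocation": "logs"},
        {"name": "my-local", "type": "local", "default": True},
    ]
    chosen = resolve_repository(repos, "my-gcs")
    print(chosen["name"], "persistence applies:", persistence_applies(chosen))
```

Raising on an unknown repository name, rather than silently using the default, is a design choice made for this sketch so that typos in a 'solrbackup' surface early.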




[GitHub] [solr-operator] gerlowskija commented on issue #301: Add support for GCS storage to 'solrbackup'

Posted by GitBox <gi...@apache.org>.
gerlowskija commented on issue #301:
URL: https://github.com/apache/solr-operator/issues/301#issuecomment-896239650


   Two additional notes:
   
   1. If solrcloud's `backupRestoreOptions` changes to allow configuring multiple backup repositories, how should backcompat be handled?  The README mentions that, being a 0.x release, we don't make any strict guarantees, but I wasn't sure whether we still try to avoid breaking changes in practice.
   2. I created a ticket for backupschedules here:  #303 




[GitHub] [solr-operator] gerlowskija commented on issue #301: Add support for GCS storage to 'solrbackup'

Posted by GitBox <gi...@apache.org>.
gerlowskija commented on issue #301:
URL: https://github.com/apache/solr-operator/issues/301#issuecomment-893547108


   Some questions this raises:
   
   - Should backup mechanisms be mutually exclusive in a given 'solrcloud' definition, or should we allow users to configure multiple and choose between them in each 'solrbackup'?
   - What's the best way to surface an error if a user attempts to configure GCS backups but is using a Solr version that doesn't support them?
   - Should the operator attempt to create any missing GCS paths/buckets using the provided credentials, or require everything to be created up front by users?
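
On the version question, one option would be a validation step that rejects GCS settings when the configured Solr version predates 8.9, the release in which the issue says GCS support arrived. The sketch below assumes the version is available as a string such as an image tag; the function names and the error wording are hypothetical.

```python
from typing import Optional, Tuple

# The issue description states Solr supports GCS backups natively "as of 8.9".
MIN_GCS_VERSION = (8, 9)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '8.9.0' or '9.0.0-SNAPSHOT'.

    Raises ValueError for non-numeric tags such as 'latest'.
    """
    numeric = version.split("-", 1)[0]
    parts = numeric.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def gcs_config_error(solr_version: str) -> Optional[str]:
    """Return an error message if GCS backups can't work, else None."""
    if parse_major_minor(solr_version) < MIN_GCS_VERSION:
        return (f"GCS backups require Solr {MIN_GCS_VERSION[0]}.{MIN_GCS_VERSION[1]} "
                f"or later, but this SolrCloud runs {solr_version}")
    return None


if __name__ == "__main__":
    for version in ("8.8.2", "8.9.0", "9.0.0-SNAPSHOT"):
        print(version, "->", gcs_config_error(version) or "ok")
```

Where such a message is surfaced (a status condition, an event, or a rejected resource) is left open here, as it is in the question above.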




[GitHub] [solr-operator] HoustonPutman commented on issue #301: Add support for GCS storage to 'solrbackup'

Posted by GitBox <gi...@apache.org>.
HoustonPutman commented on issue #301:
URL: https://github.com/apache/solr-operator/issues/301#issuecomment-894420587


   So I just want to make sure I understand correctly.
   
   The only thing the operator should do for these "native" backup options is call the Solr API, right?
   (And possibly set up paths at the resulting location if necessary.)
   
   I'll need to think on this a bit more, but to me it sounds like the only real benefit would be to have the operator be able to do this on a schedule. (and possibly delete old backups if necessary). So instead of facilitating the backup mechanism, it would just be in charge of managing the backups. The more I type this out, the more I'm starting to like it. It would also allow the Solr operator, in the future, to do automatic-rollbacks if it detects failures in a Collection.
   
   I think we could change the SolrBackup to do either "managed" or "remote" backups, and in the case of remote, let the user provide the `repository` and `location` arguments for the Backup command. I do like the idea of requiring users to manage the directories themselves, as we don't really want to build S3, GCS, HDFS, etc. behavior into the operator. But it would make sense if we wanted to support setting up the location to make sure it is ready initially.
   
   So in that case your example of the backupRestoreOptions in the SolrCloud object would be spot on, but the SolrBackup object would need the `backupRepositoryName` (I guess unless there is a default one specified...), and additional options such as `location`.




[GitHub] [solr-operator] gerlowskija edited a comment on issue #301: Add support for GCS storage to 'solrbackup'

Posted by GitBox <gi...@apache.org>.
gerlowskija edited a comment on issue #301:
URL: https://github.com/apache/solr-operator/issues/301#issuecomment-896118995


   > The only thing the operator should do for these "native" backup options, is to call the Solr API right?
   
   That's what I'm proposing, yep - the operator wouldn't be doing any of the compression or relocation features for GCS that it currently supports for 'local' backups.  It's "just" calling the Solr API.  (Which, I'd contend, isn't "nothing".  That still saves Ops folks from crafting their own solr.xml, from needing to learn Solr's backup and async-polling APIs, etc.)
   
   > the only real benefit would be to have the operator be able to do this on a schedule
   
   Definitely agree.  As I said above, I think there's value in this ticket alone.  But GCS-support gets much more appealing as the operator's backup featureset generally gets more robust.  I love the idea of a "backupschedule" entity that creates individual solrbackup objects in turn.  I'll file an issue for that as a placeholder for discussion.
   
   > I think we could change the SolrBackup to do either "managed" or "remote" backups
   
   I think I agree with your suggestions here, but let me restate a few of them to make sure I'm understanding you correctly.  There's a point or two I'm unclear on.
   
   1. I see what you're getting at with the "managed" vs "remote" distinction, but I'm not sure whether and where you imagine that appearing in the yaml configs.  Are you suggesting an explicit setting on 'solrbackup'?  Or that it be implicit based on the value of the 'repository' setting you mention?
   2. Letting users specify a "backupRepositoryName" on their 'solrbackup' makes sense to me.  And further it implies that a user should be able to configure multiple sets of backup configs in their solrcloud's backupRestoreOptions setting. (i.e. configuring local backup settings and gcs backup settings aren't mutually exclusive - we'd support use of both within the same solrcloud.)
   3. It seems like you're on the fence about having the operator bootstrap required buckets/locations, and don't have a strong opinion there.  I lean towards skipping that because of the complexity of bringing S3, GCS, etc. clients into the operator, at least until we get feedback from users that it'd be worth it.  That said, I have mixed feelings about it too.
   
   So taking those suggestions, our new example CRDs would look something like:
   
   **solrcloud**
   ```
   ...
     dataStorage:
       ...
       backupRestoreOptions:
         gcsRepositories:
           - name: "customer-data"
             bucket: "customer-data-bucket"
             gcsCredentialSecret: "my-gcs-secret"
             defaultLocation: "customer_data"
         managedRepositories:
           - name: "local-log-data"
             volume:
               persistentVolumeClaim:
                 claimName: "log-pvc"
       ...
   ```
   
   **solrbackup (gcs)**
   ```
   apiVersion: solr.apache.org/v1beta1
   kind: SolrBackup
   metadata:
     name: gcs_techproducts_backup
     namespace: default
   spec:
     solrCloud: jasons_cluster
     repository: "customer-data"
     location: "customer_data_alt"
     collections:
       - techproducts
   ```
   
   **solrbackup (local)**
   ```
   apiVersion: solr.apache.org/v1beta1
   kind: SolrBackup
   metadata:
     name: local_techproducts_backup
     namespace: default
   spec:
     solrCloud: jasons_cluster
     repository: "local-log-data"
     location: "logs_alt"
     persistence:  # Ignored if the 'local-log-data' repository isn't of a type that supports "managed" backups.
       volume:
         source:
           persistentVolumeClaim:
             claimName: "pvc-test"
     collections:
       - applicationlogs
   ```
   
   Those could be off a bit based on what you meant about exposing the "managed" v. "remote" flag.  Does this look closer to what you were thinking? @HoustonPutman 
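
For a sense of the solr.xml that users would otherwise craft by hand, the sketch below renders a `gcsRepositories` entry like the one above into a `<backup>` section. The class name `org.apache.solr.gcs.GCSBackupRepository` and the `gcsBucket`, `gcsCredentialPath`, and `location` parameters are taken from my recollection of the Solr Reference Guide and should be checked against the Solr version in use. The credential mount path and the function name are purely hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def gcs_repository_xml(name: str, bucket: str, credential_path: str,
                       default_location: Optional[str] = None) -> str:
    """Render a <backup> section declaring one GCS backup repository."""
    backup = ET.Element("backup")
    repo = ET.SubElement(backup, "repository", {
        "name": name,
        "class": "org.apache.solr.gcs.GCSBackupRepository",
        "default": "false",
    })
    settings = (
        ("gcsBucket", bucket),
        ("gcsCredentialPath", credential_path),
        ("location", default_location),
    )
    for key, value in settings:
        if value is not None:  # Skip optional settings that were not provided.
            element = ET.SubElement(repo, "str", {"name": key})
            element.text = value
    return ET.tostring(backup, encoding="unicode")


if __name__ == "__main__":
    print(gcs_repository_xml(
        name="customer-data",
        bucket="customer-data-bucket",
        # Hypothetical path where the secret's key file would be mounted.
        credential_path="/var/solr/gcs/service-account-key.json",
        default_location="customer_data",
    ))
```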




[GitHub] [solr-operator] HoustonPutman closed issue #301: Add support for GCS storage to 'solrbackup'

Posted by GitBox <gi...@apache.org>.
HoustonPutman closed issue #301:
URL: https://github.com/apache/solr-operator/issues/301


   



