Posted to dev@lucene.apache.org by "Richard (JIRA)" <ji...@apache.org> on 2019/05/30 19:43:00 UTC
[jira] [Commented] (SOLR-9952) S3BackupRepository

    [ https://issues.apache.org/jira/browse/SOLR-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852277#comment-16852277 ] 

Richard commented on SOLR-9952:
-------------------------------

Is anyone currently using this?
Some background: 
I'm currently running Solr 7.4 on bare metal, and we store our backups on an HDFS cluster that is also on bare metal. We want to move our backups to AWS S3. I tried applying these patches and following the attached PDF, and have run into a pile of problems.

I was getting many errors trying to connect via {{S3N}}, which makes sense given how things have changed since this ticket was created. I switched to {{S3A}} and started to get somewhere. I also had to add the following property to the XML configuration file to get rid of some connection errors:

{code:xml}
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.eu-west-2.amazonaws.com</value>
  </property>
{code}

I then ran into problems with the filesystem implementation, so I switched to {{S3AFileSystem}} instead of {{NativeS3FileSystem}}, which is also on its way to being deprecated. This started to give some positive results.
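For anyone following the same route, the S3N to S3A switch comes down to two things. First, backup locations move from the {{s3n://}} scheme to {{s3a://}}. Second, Hadoop is pointed at {{org.apache.hadoop.fs.s3a.S3AFileSystem}} (property {{fs.s3a.impl}}), with the regional endpoint set in {{fs.s3a.endpoint}}. The Java sketch below shows both steps. The class and method names ({{S3aMigration}}, {{toS3a}}, {{s3aProperties}}, {{toXml}}) are hypothetical, and the sketch assumes the {{s3.<region>.amazonaws.com}} endpoint form used in the property above.

{code:java}
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: S3aMigration and its methods are illustrative names only.
final class S3aMigration {

    /** Rewrites an s3n:// location to s3a://, keeping bucket and path unchanged. */
    static String toS3a(String location) {
        URI uri = URI.create(location);
        if (!"s3n".equals(uri.getScheme())) {
            return location; // not an S3N URI, leave it as-is
        }
        // The raw scheme-specific part is "//bucket/path"; only the scheme changes.
        String fragment = uri.getRawFragment();
        return "s3a:" + uri.getRawSchemeSpecificPart() + (fragment == null ? "" : "#" + fragment);
    }

    /** Properties selecting S3A and a regional endpoint (assumes the s3.<region>.amazonaws.com form). */
    static Map<String, String> s3aProperties(String region) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        props.put("fs.s3a.endpoint", "s3." + region + ".amazonaws.com");
        return props;
    }

    /** Renders the properties in the same <property> form as the XML snippet above. */
    static String toXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(e.getKey()).append("</name>\n")
              .append("    <value>").append(e.getValue()).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.toString();
    }
}
{code}

Setting {{fs.s3a.impl}} explicitly is usually redundant on Hadoop versions that already ship it in their defaults. It is included here only to make the switch away from {{NativeS3FileSystem}} explicit.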

Backing up to S3 now got further, but funnily enough I then hit the classic {noformat}Caused by: java.lang.IllegalStateException: Connection pool shut down{noformat} error.

It's really hacky, but I somehow got around this by swapping the order of what gets backed up. Originally {{BackupCmd}} backs up the index files first and then the ZooKeeper information. To get backups working, I swapped them round so that it backs up the ZooKeeper information first and then the index files. The change is shown below:
{code}
--- a/solr/core/src/java/org/apache/solr/cloud/api/collections/BackupCmd.java
+++ b/solr/core/src/java/org/apache/solr/cloud/api/collections/BackupCmd.java
@@ -89,17 +89,6 @@ public class BackupCmd implements OverseerCollectionMessageHandler.Cmd {
     // Create a directory to store backup details.
     repository.createDirectory(backupPath);

-    String strategy = message.getStr(CollectionAdminParams.INDEX_BACKUP_STRATEGY, CollectionAdminParams.COPY_FILES_STRATEGY);
-    switch (strategy) {
-      case CollectionAdminParams.COPY_FILES_STRATEGY: {
-        copyIndexFiles(backupPath, message, results);
-        break;
-      }
-      case CollectionAdminParams.NO_INDEX_BACKUP_STRATEGY: {
-        break;
-      }
-    }
-
     log.info("Starting to backup ZK data for backupName={}", backupName);

     //Download the configs
@@ -127,6 +116,18 @@ public class BackupCmd implements OverseerCollectionMessageHandler.Cmd {
     backupMgr.downloadCollectionProperties(location, backupName, collectionName);

     log.info("Completed backing up ZK data for backupName={}", backupName);
+
+    String strategy = message.getStr(CollectionAdminParams.INDEX_BACKUP_STRATEGY, CollectionAdminParams.COPY_FILES_STRATEGY);
+    switch (strategy) {
+      case CollectionAdminParams.COPY_FILES_STRATEGY: {
+        copyIndexFiles(backupPath, message, results);
+        break;
+      }
+      case CollectionAdminParams.NO_INDEX_BACKUP_STRATEGY: {
+        break;
+      }
+    }
+
   }
{code}
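Stripped of the Solr plumbing, the patched control flow is: ZooKeeper data first, then index files if the strategy asks for them. The sketch below models only that ordering so it can be checked in isolation. {{ReorderedBackup}} and the {{zk-data}}/{{index-files}} step labels are hypothetical. The two strategy string values are assumptions standing in for the {{CollectionAdminParams}} constants. Because the patch only changes the order, it may move the point where a closed connection pool is first noticed rather than remove the underlying cause.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Simplified model of the reordered BackupCmd flow; all names here are hypothetical stand-ins.
final class ReorderedBackup {
    // Assumed values standing in for the CollectionAdminParams constants.
    static final String COPY_FILES_STRATEGY = "copy-files";
    static final String NO_INDEX_BACKUP_STRATEGY = "none";

    /** Records the order in which the backup steps ran. */
    final List<String> steps = new ArrayList<>();

    void run(String strategy) {
        // 1. ZooKeeper data (configs, collection properties, etc.) now runs before any index copy.
        steps.add("zk-data");

        // 2. Index files last, and only for the copy-files strategy.
        switch (strategy) {
            case COPY_FILES_STRATEGY:
                steps.add("index-files");
                break;
            case NO_INDEX_BACKUP_STRATEGY:
                break;
            // As in the diff, there is no default branch: other values copy no index files.
        }
    }
}
{code}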

With the above change, I can successfully back up a collection to an S3 bucket. My next problem is restoring.

When restoring, I get the same {noformat}Caused by: java.lang.IllegalStateException: Connection pool shut down{noformat} error.

It appears to fetch the list of files it needs to restore from the S3 bucket and restores the first file successfully. By the time it restores the second file, however, something seems to have closed the connection.
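The symptom (the first file restores, the second fails with "Connection pool shut down") is consistent with a shared client being closed after its first use. One known way this happens with Hadoop is its {{FileSystem}} cache. {{FileSystem.get}} returns an instance cached per scheme, authority and user, so a {{close()}} by one caller closes it for every other holder. Whether that is what happens in this patch is an unverified assumption. Hadoop's generic {{fs.<scheme>.impl.disable.cache}} switch (here {{fs.s3a.impl.disable.cache}}) could be one way to test it. The model below uses hypothetical {{Pool}} and {{ClientCache}} classes, not Hadoop or AWS SDK types, to show the failure mode and the usual fix.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a cached, shared client being closed by one caller.
final class SharedPoolDemo {

    /** Stand-in for a pooled HTTP client; unusable once shut down. */
    static final class Pool implements AutoCloseable {
        private volatile boolean shutDown = false;

        String fetch(String file) {
            if (shutDown) {
                // Mirrors the message in the stack trace above.
                throw new IllegalStateException("Connection pool shut down");
            }
            return "contents of " + file;
        }

        @Override
        public void close() {
            shutDown = true;
        }
    }

    /** Stand-in for a shared cache similar in spirit to Hadoop's FileSystem cache. */
    static final class ClientCache {
        private final Map<String, Pool> cache = new ConcurrentHashMap<>();

        Pool get(String scheme) {
            return cache.computeIfAbsent(scheme, s -> new Pool());
        }
    }

    /** Buggy restore: each iteration closes the SHARED instance it got from the cache. */
    static int restoreClosingEachTime(ClientCache cache, List<String> files) {
        int restored = 0;
        for (String file : files) {
            try (Pool pool = cache.get("s3a")) {
                pool.fetch(file); // second iteration fails: the cached pool was closed
                restored++;
            }
        }
        return restored;
    }

    /** Fixed restore: obtain the client once and leave closing it to its owner. */
    static int restoreSharingClient(ClientCache cache, List<String> files) {
        Pool pool = cache.get("s3a");
        int restored = 0;
        for (String file : files) {
            pool.fetch(file);
            restored++;
        }
        return restored;
    }
}
{code}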

I have tried countless versions of many different packages, including the {{aws}} package and the {{http lib}} package, and even tried upgrading Hadoop, but I still get the same issue.

Many posts suggest adding the following when building the HTTP client:
{code:java}
.setConnectionManagerShared(true)
{code}
I forced this in every instance and still hit the same problem.
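Per the Apache HttpClient {{HttpClientBuilder}} Javadoc, {{setConnectionManagerShared(true)}} means closing the client no longer shuts down its connection manager; the caller owns the manager's lifecycle instead. The flag therefore only helps when the pool is being shut down by a client {{close()}}. It does not help when the owning code shuts the manager down directly, or when the pool belongs to a client the flag was never set on. A client the AWS SDK builds internally is a likely example. That might explain why forcing it changed nothing, though this is an assumption. The sketch below models those semantics with hypothetical {{Manager}}/{{Client}} classes, not the real HttpClient types.

{code:java}
// Hypothetical model of the "shared connection manager" semantics; not HttpClient classes.
final class SharedManagerDemo {

    /** Stand-in for a pooling connection manager. */
    static final class Manager {
        boolean shutDown = false;

        void shutdown() {
            shutDown = true;
        }

        void lease() {
            if (shutDown) {
                throw new IllegalStateException("Connection pool shut down");
            }
        }
    }

    /** Stand-in for a client built on top of a manager. */
    static final class Client implements AutoCloseable {
        private final Manager manager;
        private final boolean managerShared;

        Client(Manager manager, boolean managerShared) {
            this.manager = manager;
            this.managerShared = managerShared;
        }

        void execute() {
            manager.lease(); // fails once the manager has been shut down by anyone
        }

        @Override
        public void close() {
            // A shared manager's lifecycle belongs to the caller, so closing the client leaves it open.
            if (!managerShared) {
                manager.shutdown();
            }
        }
    }
}
{code}

In short, the flag protects against one specific closer (the client itself), not against every path that can shut the pool down.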

So my question is: has anyone got this working successfully more recently?

> S3BackupRepository
> ------------------
>
>                 Key: SOLR-9952
>                 URL: https://issues.apache.org/jira/browse/SOLR-9952
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Backup/Restore
>            Reporter: Mikhail Khludnev
>            Priority: Major
>         Attachments: 0001-SOLR-9952-Added-dependencies-for-hadoop-amazon-integ.patch, 0002-SOLR-9952-Added-integration-test-for-checking-backup.patch, Running Solr on S3.pdf, core-site.xml.template
>
>
> I'd like to have a backup repository implementation that allows snapshotting to AWS S3



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)