You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2021/04/13 19:38:17 UTC

[GitHub] [solr] dsmiley commented on a change in pull request #68: Use BackupRepository instead of BlobStore.

dsmiley commented on a change in pull request #68:
URL: https://github.com/apache/solr/pull/68#discussion_r612708013



##########
File path: solr/contrib/blob-directory/src/java/org/apache/solr/blob/BlobDirectoryFactory.java
##########
@@ -20,31 +20,49 @@
 import java.io.File;
 import java.io.IOException;
 import java.lang.invoke.MethodHandles;
+import java.net.URI;
+import java.util.Arrays;
 import java.util.List;
 import java.util.Locale;
 import java.util.regex.Pattern;
+import java.util.stream.Collectors;
 
 import org.apache.commons.io.FileUtils;
 import org.apache.lucene.store.*;
 import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.CoreAdminParams;
 import org.apache.solr.common.params.SolrParams;
 import org.apache.solr.common.util.IOUtils;
 import org.apache.solr.common.util.NamedList;
 import org.apache.solr.core.CachingDirectoryFactory;
 import org.apache.solr.core.CoreContainer;
 import org.apache.solr.core.CoreDescriptor;
 import org.apache.solr.core.DirectoryFactory;
+import org.apache.solr.core.backup.repository.BackupRepository;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 public class BlobDirectoryFactory extends CachingDirectoryFactory {
 
   private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
 
+  /**
+   * Default maximum number of threads that push files concurrently to the backup repository.
+   */
+  private static final int DEFAULT_MAX_THREADS = 20;
+
+  /**
+   * Default size of the buffer used to transfer input-output stream to the {@link BackupRepository}, in bytes.
+   * The appropriate size depends on the {@link BackupRepository}.
+   * There is one buffer per thread.
+   */
+  public static final int DEFAULT_STREAM_BUFFER_SIZE = 32_768;
+
   private static final Pattern INDEX_NAME_PATTERN = Pattern.compile("index(?:\\.[0-9]{17})?");
 
   private String localRootPath;
-  private BlobStore blobStore;
+  private BackupRepository backupRepository;

Review comment:
       In terms of naming, let's just use the word "repository" to refer to the remote place?  I know of course that the abstraction is "BackupRepository" but the "backup"-ness aspect could plausibly be renamed away so that it's clearer that it's more general than backups.  I mentioned this in JIRA somewhere.

##########
File path: solr/contrib/blob-directory/src/java/org/apache/solr/blob/BlobDirectory.java
##########
@@ -181,7 +182,12 @@ protected void ensureOpen() throws AlreadyClosedException {
     @Override
     public void close() throws IOException {
       // Free the reference to the IndexOutput.
-      blobFileSupplier.indexOutput = null;
+      blobFileSupplier.getBlobFile();
+      // TODO
+      //  It could be possible to start pushing the file asynchronously with BlobPusher at this time,

Review comment:
       Nice comment but I wouldn't expect this idea to be in close(); I'd expect it to be somewhere when we write files.

##########
File path: solr/contrib/blob-directory/src/java/org/apache/solr/blob/BlobPusher.java
##########
@@ -17,36 +17,64 @@
 
 package org.apache.solr.blob;
 
+import org.apache.lucene.util.IOUtils;
+import org.apache.solr.common.util.ExecutorUtil;
+import org.apache.solr.common.util.SolrNamedThreadFactory;
+import org.apache.solr.core.backup.repository.BackupRepository;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import java.io.Closeable;
 import java.io.IOException;
 import java.io.InputStream;
+import java.io.OutputStream;
 import java.lang.invoke.MethodHandles;
+import java.net.URI;
 import java.util.Collection;
 import java.util.List;
+import java.util.Objects;
 import java.util.concurrent.*;
+import java.util.regex.Pattern;
 import java.util.stream.Collectors;
 
-import org.apache.lucene.util.IOUtils;
-import org.apache.solr.common.util.ExecutorUtil;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-/** Pushes a set of files to Blob, and works with listings. */
+/**
+ * Pushes a set of files to Blob, and works with listings.
+ */
 public class BlobPusher implements Closeable {
 
   private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
 
-  // WORK IN PROGRESS!!
+  private static final String PATH_SEPARATOR = "/";
+  private static final Pattern PATH_SPLITTER = Pattern.compile(PATH_SEPARATOR);
 
-  private final BlobStore blobStore;
-  private final ExecutorService executor = ExecutorUtil.newMDCAwareCachedThreadPool("blobPusher");
+  // WORK IN PROGRESS!!
 
-  public BlobPusher(BlobStore blobStore) {
-    this.blobStore = blobStore;
+  private final BackupRepository backupRepository;
+  private final URI backupLocation;
+  private final ExecutorService executor;
+  // Pool of stream buffers, one per executor thread, to reuse them as the buffer size may be large
+  // to have efficient stream transfer to the remote backup repository.
+  private final ThreadLocal<byte[]> streamBuffers;

Review comment:
       I haven't seen this technique before, or I don't recall.  I wonder if it's worth the trouble?

##########
File path: solr/contrib/blob-directory/src/java/org/apache/solr/blob/BlobDirectory.java
##########
@@ -181,7 +182,12 @@ protected void ensureOpen() throws AlreadyClosedException {
     @Override
     public void close() throws IOException {
       // Free the reference to the IndexOutput.
-      blobFileSupplier.indexOutput = null;
+      blobFileSupplier.getBlobFile();

Review comment:
       Calling getBlobFile frees a reference?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org