Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2021/07/09 15:49:29 UTC

[GitHub] [solr] HoustonPutman commented on a change in pull request #120: SOLR-15089: Allow backup/restoration to Amazon's S3 blobstore

HoustonPutman commented on a change in pull request #120:
URL: https://github.com/apache/solr/pull/120#discussion_r667029697



##########
File path: solr/contrib/blob-repository/src/java/org/apache/solr/s3/S3BackupRepositoryConfig.java
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.s3;
+
+import com.google.common.base.Strings;
+import org.apache.solr.common.util.NamedList;
+
+import java.util.Locale;
+import java.util.Map;
+
+/**
+ * Class representing the {@code backup} blob config bundle specified in solr.xml. All user-provided config can be
+ * overridden via environment variables (use uppercase, with '_' instead of '.'), see {@link S3BackupRepositoryConfig#toEnvVar}.
+ */
+public class S3BackupRepositoryConfig {
+
+    public static final String BUCKET_NAME = "blob.s3.bucket.name";
+    public static final String REGION = "blob.s3.region";
+    public static final String PROXY_HOST = "blob.s3.proxy.host";
+    public static final String PROXY_PORT = "blob.s3.proxy.port";
+    public static final String S3MOCK = "blob.s3.mock";
+
+    private final String bucketName;
+    private final String region;
+    private final String proxyHost;
+    private final int proxyPort;
+    private final boolean s3mock;
+
+    public S3BackupRepositoryConfig(NamedList<String> args) {
+        NamedList<String> config = args.clone();
+
+        region = getStringConfig(config, REGION);
+        bucketName = getStringConfig(config, BUCKET_NAME);
+        proxyHost = getStringConfig(config, PROXY_HOST);
+        proxyPort = getIntConfig(config, PROXY_PORT);
+        s3mock = getBooleanConfig(config, S3MOCK);
+    }
+
+    /**
+     * Construct a {@link S3StorageClient} from the provided config.
+     */
+    public S3StorageClient buildClient() {
+
+        if (s3mock) {
+            return new AdobeMockS3StorageClient(bucketName);
+        } else {
+            return new S3StorageClient(bucketName, region, proxyHost, proxyPort);
+        }
+    }
+
+    private static String getStringConfig(NamedList<String> config, String property) {
+        Map<String, String> env = System.getenv();
+        return env.getOrDefault(toEnvVar(property), config.get(property));
+    }
+
+    private static int getIntConfig(NamedList<String> config, String property) {

Review comment:
       Seems like this should accept an int as well....
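       Since solr.xml can supply an `<int>` element, the value in the `NamedList` may already be an `Integer` rather than a `String`. One possible shape for the fix (a sketch only — `IntConfigSketch` is a hypothetical name, and a plain `Map` stands in for `NamedList` so the snippet is self-contained):

       ```java
       import java.util.Map;

       public class IntConfigSketch {
           // Accept either an Integer entry (an <int> element in solr.xml)
           // or a numeric string; fall back to 0 when the property is absent.
           static int getIntConfig(Map<String, ?> config, String property) {
               Object value = config.get(property);
               if (value instanceof Integer) {
                   return (Integer) value;
               }
               String str = value == null ? null : value.toString();
               if (str != null && !str.isEmpty()) {
                   // Still fails fast if the value is present but not an integer.
                   return Integer.parseInt(str);
               }
               return 0;
           }
       }
       ```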

##########
File path: solr/contrib/blob-repository/src/java/org/apache/solr/s3/S3BackupRepositoryConfig.java
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.s3;
+
+import com.google.common.base.Strings;
+import org.apache.solr.common.util.NamedList;
+
+import java.util.Locale;
+import java.util.Map;
+
+/**
+ * Class representing the {@code backup} blob config bundle specified in solr.xml. All user-provided config can be
+ * overridden via environment variables (use uppercase, with '_' instead of '.'), see {@link S3BackupRepositoryConfig#toEnvVar}.
+ */
+public class S3BackupRepositoryConfig {
+
+    public static final String BUCKET_NAME = "blob.s3.bucket.name";
+    public static final String REGION = "blob.s3.region";
+    public static final String PROXY_HOST = "blob.s3.proxy.host";
+    public static final String PROXY_PORT = "blob.s3.proxy.port";
+    public static final String S3MOCK = "blob.s3.mock";
+
+    private final String bucketName;
+    private final String region;
+    private final String proxyHost;
+    private final int proxyPort;
+    private final boolean s3mock;
+
+    @SuppressWarnings({"rawtypes", "unchecked"})
+    public S3BackupRepositoryConfig(NamedList args) {
+        NamedList<String> config = args.clone();
+
+        region = getStringConfig(config, REGION);
+        bucketName = getStringConfig(config, BUCKET_NAME);
+        proxyHost = getStringConfig(config, PROXY_HOST);
+        proxyPort = getIntConfig(config, PROXY_PORT);
+        s3mock = getBooleanConfig(config, S3MOCK);
+    }
+
+    /**
+     * Construct a {@link S3StorageClient} from the provided config.
+     */
+    public S3StorageClient buildClient() {
+
+        if (s3mock) {
+            return new AdobeMockS3StorageClient(bucketName);
+        } else {
+            return new S3StorageClient(bucketName, region, proxyHost, proxyPort);
+        }
+    }
+
+    private static String getStringConfig(NamedList<String> config, String property) {
+        Map<String, String> env = System.getenv();
+        return env.getOrDefault(toEnvVar(property), config.get(property));
+    }
+
+    private static int getIntConfig(NamedList<String> config, String property) {
+        String stringConfig = getStringConfig(config, property);
+
+        if (!Strings.isNullOrEmpty(stringConfig)) {
+            // Backup/restore cmd will fail if present but not an integer.
+            return Integer.parseInt(stringConfig);
+        } else {
+            return 0;
+        }
+    }
+
+    /**
+     * If the property has any value other than 'true' or 'TRUE', this defaults to false.
+     */
+    private static boolean getBooleanConfig(NamedList<String> config, String property) {

Review comment:
       Why does this not support a boolean XML entry?
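       For comparison, a version tolerating a `Boolean` entry (a `<bool>` element in solr.xml) as well as a string — again a self-contained sketch with a hypothetical class name and a plain `Map` in place of `NamedList`:

       ```java
       import java.util.Map;

       public class BooleanConfigSketch {
           // Accept either a Boolean entry (a <bool> element in solr.xml) or a
           // string; anything other than "true" (case-insensitive) is false.
           static boolean getBooleanConfig(Map<String, ?> config, String property) {
               Object value = config.get(property);
               if (value instanceof Boolean) {
                   return (Boolean) value;
               }
               return Boolean.parseBoolean(value == null ? null : value.toString());
           }
       }
       ```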

##########
File path: solr/contrib/blob-repository/src/java/org/apache/solr/s3/AdobeMockS3StorageClient.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.s3;
+
+import com.amazonaws.client.builder.AwsClientBuilder;
+import com.amazonaws.regions.Regions;
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.AmazonS3ClientBuilder;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.base.Strings;
+
+/**
+ * This storage client exists to work around some of the incongruencies Adobe S3Mock has with the S3 API.
+ * The main difference is that S3Mock does not support paths with a leading '/', but S3 does, and our code
+ * in {@link S3StorageClient} requires all paths to have a leading '/'.
+ */
+class AdobeMockS3StorageClient extends S3StorageClient {
+
+    static final int DEFAULT_MOCK_S3_PORT = 9090;
+    private static final String DEFAULT_MOCK_S3_ENDPOINT = "http://localhost:" + DEFAULT_MOCK_S3_PORT;
+
+    AdobeMockS3StorageClient(String bucketName) {
+        super(createInternalClient(), bucketName);
+    }
+
+    @VisibleForTesting
+    AdobeMockS3StorageClient(AmazonS3 s3client, String bucketName) {
+        super(s3client, bucketName);
+    }
+
+    private static AmazonS3 createInternalClient() {
+        String s3MockEndpoint = System.getenv().getOrDefault("MOCK_S3_ENDPOINT", DEFAULT_MOCK_S3_ENDPOINT);
+
+        return AmazonS3ClientBuilder.standard()
+            .enablePathStyleAccess()
+            .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(s3MockEndpoint, Regions.US_EAST_1.name()))
+            .build();
+    }
+
+    /**
+     * Ensures path adheres to some rules (different than the rules that S3 cares about):
+     * -Trims leading slash, if given
+     * -If it's a file, throw an error if it ends with a trailing slash
+     */
+    @Override
+    String sanitizedPath(String path, boolean isFile) throws S3Exception {
+        Preconditions.checkArgument(!Strings.isNullOrEmpty(path));

Review comment:
       I've seen a lot of errors trying to get this working, and the precondition failures do not produce useful messages for users. I would much prefer that we return human-intelligible messages instead of relying on the errors that Preconditions throws.
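       For example, the check could be replaced with an explicit exception that carries the offending path (a sketch only; `S3Exception` here is a hypothetical stand-in for the PR's class, simplified to an unchecked exception so the snippet stands alone):

       ```java
       public class PathValidationSketch {
           // Hypothetical stand-in for the PR's S3Exception, simplified
           // to an unchecked exception for this sketch.
           static class S3Exception extends RuntimeException {
               S3Exception(String message) { super(message); }
           }

           // Instead of Preconditions.checkArgument(...), say which path
           // was rejected and why.
           static String validatePath(String path, boolean isFile) {
               if (path == null || path.isEmpty()) {
                   throw new S3Exception("Path must not be null or empty");
               }
               if (isFile && path.endsWith("/")) {
                   throw new S3Exception("Path to a file must not end with '/': " + path);
               }
               return path;
           }
       }
       ```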

##########
File path: solr/contrib/blob-repository/src/java/org/apache/solr/s3/S3StorageClient.java
##########
@@ -0,0 +1,486 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.s3;
+
+import com.amazonaws.AmazonClientException;
+import com.amazonaws.AmazonServiceException;
+import com.amazonaws.ClientConfiguration;
+import com.amazonaws.Protocol;
+import com.amazonaws.regions.Regions;
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.AmazonS3ClientBuilder;
+import com.amazonaws.services.s3.model.DeleteObjectsRequest;
+import com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion;
+import com.amazonaws.services.s3.model.DeleteObjectsResult;
+import com.amazonaws.services.s3.model.ListObjectsRequest;
+import com.amazonaws.services.s3.model.ObjectListing;
+import com.amazonaws.services.s3.model.ObjectMetadata;
+import com.amazonaws.services.s3.model.PutObjectRequest;
+import com.amazonaws.services.s3.model.S3Object;
+import com.amazonaws.services.s3.model.S3ObjectSummary;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.base.Strings;
+import com.google.common.collect.Lists;
+import org.apache.commons.io.input.ClosedInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+
+/**
+ * Creates a {@link AmazonS3} for communicating with AWS S3. Utilizes the default credential provider chain;
+ * reference <a href="https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html">AWS SDK docs</a> for
+ * details on where this client will fetch credentials from, and the order of precedence.
+ */
+class S3StorageClient {
+
+    private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+    static final String BLOB_FILE_PATH_DELIMITER = "/";
+
+    // S3 has a hard limit of 1000 keys per batch delete request
+    private static final int MAX_KEYS_PER_BATCH_DELETE = 1000;
+
+    // Metadata name used to identify flag directory entries in S3
+    private static final String BLOB_DIR_HEADER = "x_is_directory";
+
+    // Error messages returned by S3 for a key not found.
+    private static final Set<String> NOT_FOUND_CODES = Set.of("NoSuchKey", "404 Not Found");
+
+    private final AmazonS3 s3Client;
+
+    /**
+     * The S3 bucket where we write all of our blobs to.
+     */
+    private final String bucketName;
+
+    S3StorageClient(String bucketName, String region, String proxyHost, int proxyPort) {
+        this(createInternalClient(region, proxyHost, proxyPort), bucketName);
+    }
+
+    @VisibleForTesting
+    S3StorageClient(AmazonS3 s3Client, String bucketName) {
+        this.s3Client = s3Client;
+        this.bucketName = bucketName;
+    }
+
+    private static AmazonS3 createInternalClient(String region, String proxyHost, int proxyPort) {
+        ClientConfiguration clientConfig = new ClientConfiguration()
+            .withProtocol(Protocol.HTTPS);
+
+        // If configured, add proxy
+        if (!Strings.isNullOrEmpty(proxyHost)) {
+            clientConfig.setProxyHost(proxyHost);
+            if (proxyPort > 0) {
+                clientConfig.setProxyPort(proxyPort);
+            }
+        }
+
+        /*
+         * Default s3 client builder loads credentials from disk and handles token refreshes
+         */
+        return AmazonS3ClientBuilder.standard()
+            .enablePathStyleAccess()
+            .withClientConfiguration(clientConfig)
+            .withRegion(Regions.fromName(region))
+            .build();
+    }
+
+    /**
+     * Create Directory in S3 Blob Store.
+     *
+     * @param path Directory Path in Blob Store.
+     */
+    void createDirectory(String path) throws S3Exception {
+        path = sanitizedPath(path, false);
+
+        if (!parentDirectoryExist(path)) {
+            createDirectory(path.substring(0, path.lastIndexOf(BLOB_FILE_PATH_DELIMITER)));
+            //TODO see https://issues.apache.org/jira/browse/SOLR-15359
+//            throw new BlobException("Parent directory doesn't exist, path=" + path);
+        }
+
+        ObjectMetadata objectMetadata = new ObjectMetadata();
+        objectMetadata.addUserMetadata(BLOB_DIR_HEADER, "true");
+        objectMetadata.setContentLength(0);
+
+        // Create empty blob object with header
+        final InputStream im = ClosedInputStream.CLOSED_INPUT_STREAM;
+
+        try {
+            PutObjectRequest putRequest = new PutObjectRequest(bucketName, path, im, objectMetadata);
+            s3Client.putObject(putRequest);
+        } catch (AmazonClientException ase) {
+            throw handleAmazonException(ase);
+        }
+    }
+
+    /**
+     * Delete files from S3 Blob Store. Deletion order is not guaranteed.
+     *
+     * @param paths Paths to files or blobs.
+     */
+    void delete(Collection<String> paths) throws S3Exception {
+        Set<String> entries = new HashSet<>();
+        for (String path : paths) {
+            entries.add(sanitizedPath(path, true));
+        }
+
+        deleteBlobs(entries);
+    }
+
+    /**
+     * Delete directory, all the files and sub-directories from S3.
+     *
+     * @param path Path to directory in S3.
+     */
+    void deleteDirectory(String path) throws S3Exception {
+        path = sanitizedPath(path, false);
+
+        List<String> entries = new ArrayList<>();
+        entries.add(path);
+
+        // Get all the files and subdirectories
+        entries.addAll(listAll(path));
+
+        deleteObjects(entries);
+    }
+
+    /**
+     * List all the files and sub-directories directly under given path.
+     *
+     * @param path Path to directory in S3.
+     * @return Files and sub-directories in path.
+     */
+    String[] listDir(String path) throws S3Exception {
+        path = sanitizedPath(path, false);
+
+        String prefix = path.equals("/") ? path : path + BLOB_FILE_PATH_DELIMITER;
+        ListObjectsRequest listRequest = new ListObjectsRequest()
+            .withBucketName(bucketName)
+            .withPrefix(prefix)
+            .withDelimiter(BLOB_FILE_PATH_DELIMITER);
+
+        List<String> entries = new ArrayList<>();
+        try {
+            ObjectListing objectListing = s3Client.listObjects(listRequest);
+
+            while (true) {
+                List<String> files = objectListing.getObjectSummaries().stream()
+                        .map(S3ObjectSummary::getKey)
+                        // This filtering is needed only for S3mock. Real S3 does not ignore the trailing '/' in the prefix.
+                        .filter(s -> s.startsWith(prefix))
+                        .map(s -> s.substring(prefix.length()))
+                        .collect(Collectors.toList());
+
+                entries.addAll(files);
+
+                if (objectListing.isTruncated()) {
+                    objectListing = s3Client.listNextBatchOfObjects(objectListing);
+                } else {
+                    break;
+                }
+            }
+            return entries.toArray(new String[0]);
+        } catch (AmazonClientException ase) {
+            throw handleAmazonException(ase);
+        }
+    }
+
+    /**
+     * Check if path exists.
+     *
+     * @param path to File/Directory in S3.
+     * @return true if path exists, otherwise false.
+     */
+    boolean pathExists(String path) throws S3Exception {
+        path = sanitizedPath(path, false);
+
+        // for root return true
+        if ("/".equals(path)) {

Review comment:
       This should accept empty strings as well for the mock test case.
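       A small helper could centralize that check (a sketch with a hypothetical name; the mock client strips the leading slash, so the root can arrive as either "/" or the empty string):

       ```java
       public class RootPathSketch {
           // Treat "/", "", and null all as the root, which always exists;
           // the empty string shows up after the mock client strips the
           // leading slash from "/".
           static boolean isRoot(String path) {
               return path == null || path.isEmpty() || "/".equals(path);
           }
       }
       ```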

##########
File path: solr/contrib/blob-repository/src/java/org/apache/solr/s3/AdobeMockS3StorageClient.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.s3;
+
+import com.amazonaws.client.builder.AwsClientBuilder;
+import com.amazonaws.regions.Regions;
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.AmazonS3ClientBuilder;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.base.Strings;
+
+/**
+ * This storage client exists to work around some of the incongruencies Adobe S3Mock has with the S3 API.
+ * The main difference is that S3Mock does not support paths with a leading '/', but S3 does, and our code
+ * in {@link S3StorageClient} requires all paths to have a leading '/'.
+ */
+class AdobeMockS3StorageClient extends S3StorageClient {
+
+    static final int DEFAULT_MOCK_S3_PORT = 9090;
+    private static final String DEFAULT_MOCK_S3_ENDPOINT = "http://localhost:" + DEFAULT_MOCK_S3_PORT;
+
+    AdobeMockS3StorageClient(String bucketName) {
+        super(createInternalClient(), bucketName);
+    }
+
+    @VisibleForTesting
+    AdobeMockS3StorageClient(AmazonS3 s3client, String bucketName) {
+        super(s3client, bucketName);
+    }
+
+    private static AmazonS3 createInternalClient() {
+        String s3MockEndpoint = System.getenv().getOrDefault("MOCK_S3_ENDPOINT", DEFAULT_MOCK_S3_ENDPOINT);
+
+        return AmazonS3ClientBuilder.standard()
+            .enablePathStyleAccess()
+            .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(s3MockEndpoint, Regions.US_EAST_1.name()))
+            .build();
+    }
+
+    /**
+     * Ensures path adheres to some rules (different than the rules that S3 cares about):
+     * -Trims leading slash, if given
+     * -If it's a file, throw an error if it ends with a trailing slash
+     */
+    @Override
+    String sanitizedPath(String path, boolean isFile) throws S3Exception {
+        Preconditions.checkArgument(!Strings.isNullOrEmpty(path));

Review comment:
       This line should also not be here; it's causing the example in `solr/contrib/blob-repository/README.md` to fail, because the `location` path gets sanitized to an empty string, which fails. The failure should occur when someone actually tries to write to an empty path, such as when `location` and `name` are both empty, but that can be caught much earlier (for example, in the collections API request handler), or when the S3MockClient actually makes the request.
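       One way to express that (a sketch with hypothetical names): the mock override simply trims the leading slash and lets an empty result through, deferring failure to the point where a write is actually attempted:

       ```java
       public class MockPathSketch {
           // Strip the leading slash (S3Mock rejects keys that start with '/')
           // and allow an empty result; validation of empty write targets
           // would happen later, e.g. in the collections API handler.
           static String sanitizedPath(String path) {
               String sanitized = path == null ? "" : path.trim();
               if (sanitized.startsWith("/")) {
                   sanitized = sanitized.substring(1);
               }
               return sanitized;
           }
       }
       ```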




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


