You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/02/18 20:31:54 UTC

[GitHub] [druid] maytasm3 opened a new pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

maytasm3 opened a new pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375
 
 
   Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
   
   ### Description
   
   Cloud InputSource (such as S3, etc.) should be able to support taking in credentials optionally in order to be able to read from multiple buckets/location. This is in order to use different credentials than the default (i.e. the one provided in the common runtime property) and also to be able to use different credentials for each ingestion.
   
   This address the proposal https://github.com/apache/druid/issues/9305
   
   Motivation
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r384170055
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -42,32 +46,66 @@
 import java.net.URI;
 import java.util.Iterator;
 import java.util.List;
+import java.util.Objects;
 import java.util.stream.Collectors;
 import java.util.stream.Stream;
 
 public class S3InputSource extends CloudObjectInputSource
 {
-  private final ServerSideEncryptingAmazonS3 s3Client;
+  // We lazily initialize ServerSideEncryptingAmazonS3 to avoid costly s3 operation when we only need S3InputSource
+  // for stored information (such as for task logs) and not for ingestion.
+  // (This cost only applies for new ServerSideEncryptingAmazonS3 created with s3InputSourceProperties given).
+  private final Supplier<ServerSideEncryptingAmazonS3> s3ClientSupplier;
+  @JsonProperty("properties")
+  private final S3InputSourceProperties s3InputSourceProperties;
   private final S3InputDataConfig inputDataConfig;
 
   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,
 
 Review comment:
   Yea..we still need it to use the singleton client for when the properties is not present in the JSON. This will save the cost of initializing the s3client using the s3ClientSupplier since we can reuse the  injected singleton

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] sthetland commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
sthetland commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383525868
 
 

 ##########
 File path: docs/development/extensions-core/s3.md
 ##########
 @@ -64,6 +64,7 @@ In addition to this you need to set additional configuration, specific for [deep
 ### S3 authentication methods
 
 To connect to your S3 bucket (whether deep storage bucket or source bucket), Druid use the following credentials providers chain
 
 Review comment:
   Not your change but..
    
   ```suggestion
   Druid uses the following credentials provider chain to connect to your S3 bucket (whether a deep storage bucket or source bucket). 
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r384169958
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSourceProperties.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.s3;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import org.apache.druid.metadata.PasswordProvider;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+/**
+ * Contains properties for s3 input source.
+ * Properties can be specified by ingestionSpec which will override system default.
+ */
+public class S3InputSourceProperties
 
 Review comment:
   Done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383598432
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -76,7 +114,41 @@ protected S3Entity createEntity(InputSplit<CloudObjectLocation> split)
   @Override
   public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
   {
-    return new S3InputSource(s3Client, inputDataConfig, null, null, ImmutableList.of(split.get()));
+    return new S3InputSource(
+        s3Client.get(),
 
 Review comment:
   oops. Totally didn't see the extra whitespace at the beginning of some lines. Fixed. Thanks

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r384170055
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -42,32 +46,66 @@
 import java.net.URI;
 import java.util.Iterator;
 import java.util.List;
+import java.util.Objects;
 import java.util.stream.Collectors;
 import java.util.stream.Stream;
 
 public class S3InputSource extends CloudObjectInputSource
 {
-  private final ServerSideEncryptingAmazonS3 s3Client;
+  // We lazily initialize ServerSideEncryptingAmazonS3 to avoid costly s3 operation when we only need S3InputSource
+  // for stored information (such as for task logs) and not for ingestion.
+  // (This cost only applies for new ServerSideEncryptingAmazonS3 created with s3InputSourceProperties given).
+  private final Supplier<ServerSideEncryptingAmazonS3> s3ClientSupplier;
+  @JsonProperty("properties")
+  private final S3InputSourceProperties s3InputSourceProperties;
   private final S3InputDataConfig inputDataConfig;
 
   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,
 
 Review comment:
   Yea..we still need it to use the singleton client for when the properties is not present in the JSON. This will save the cost of initializing the s3client using the s3ClientSupplier since we can reuse the  injected singleton

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383612190
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -19,50 +19,88 @@
 
 package org.apache.druid.data.input.s3;
 
+import com.amazonaws.auth.AWSStaticCredentialsProvider;
+import com.amazonaws.auth.BasicAWSCredentials;
+import com.amazonaws.services.s3.AmazonS3ClientBuilder;
 import com.amazonaws.services.s3.model.S3ObjectSummary;
 import com.fasterxml.jackson.annotation.JacksonInject;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
+import com.google.common.base.Supplier;
+import com.google.common.base.Suppliers;
 import com.google.common.collect.ImmutableList;
 import org.apache.druid.data.input.InputSplit;
 import org.apache.druid.data.input.impl.CloudObjectInputSource;
 import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.S3ConfigProperties;
 import org.apache.druid.data.input.impl.SplittableInputSource;
 import org.apache.druid.storage.s3.S3InputDataConfig;
+import org.apache.druid.storage.s3.S3StorageConfig;
 import org.apache.druid.storage.s3.S3StorageDruidModule;
 import org.apache.druid.storage.s3.S3Utils;
 import org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3;
 
 import javax.annotation.Nullable;
 import java.net.URI;
 import java.util.List;
+import java.util.Objects;
 import java.util.stream.Stream;
 import java.util.stream.StreamSupport;
 
 public class S3InputSource extends CloudObjectInputSource<S3Entity>
 {
-  private final ServerSideEncryptingAmazonS3 s3Client;
+  // We lazily initialize s3Client to avoid costly s3 operation when we only need S3InputSource for stored information
+  // (such as for task logs) and not for ingestion. (This cost only applies for new s3Client created with
+  // s3ConfigProperties given).
+  private final Supplier<ServerSideEncryptingAmazonS3> s3Client;
 
 Review comment:
   Done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383085500
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -46,13 +51,25 @@
   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,
+      @JacksonInject AmazonS3ClientBuilder amazonS3ClientBuilder,
+      @JacksonInject S3StorageConfig storageConfig,
       @JsonProperty("uris") @Nullable List<URI> uris,
       @JsonProperty("prefixes") @Nullable List<URI> prefixes,
-      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects,
+      @JsonProperty("properties") @Nullable CloudConfigProperties cloudConfigProperties
 
 Review comment:
   Yep. Will only modify docs for s3

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383599953
 
 

 ##########
 File path: docs/development/extensions-core/s3.md
 ##########
 @@ -64,6 +64,7 @@ In addition to this you need to set additional configuration, specific for [deep
 ### S3 authentication methods
 
 To connect to your S3 bucket (whether deep storage bucket or source bucket), Druid use the following credentials providers chain
 
 Review comment:
   Done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r382197640
 
 

 ##########
 File path: core/src/main/java/org/apache/druid/data/input/impl/CloudConfigProperties.java
 ##########
 @@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.impl;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import org.apache.druid.metadata.PasswordProvider;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+public class CloudConfigProperties
+{
+  @JsonCreator
+  public CloudConfigProperties(
+      @JsonProperty("accessKeyId") @Nullable PasswordProvider accessKeyId,
 
 Review comment:
   Do we want to support Azure with this too? If so Azure is a bit different; requires an account id, and secret key. Can this be extended for that easily?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383538927
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -19,50 +19,88 @@
 
 package org.apache.druid.data.input.s3;
 
+import com.amazonaws.auth.AWSStaticCredentialsProvider;
+import com.amazonaws.auth.BasicAWSCredentials;
+import com.amazonaws.services.s3.AmazonS3ClientBuilder;
 import com.amazonaws.services.s3.model.S3ObjectSummary;
 import com.fasterxml.jackson.annotation.JacksonInject;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
+import com.google.common.base.Supplier;
+import com.google.common.base.Suppliers;
 import com.google.common.collect.ImmutableList;
 import org.apache.druid.data.input.InputSplit;
 import org.apache.druid.data.input.impl.CloudObjectInputSource;
 import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.S3ConfigProperties;
 import org.apache.druid.data.input.impl.SplittableInputSource;
 import org.apache.druid.storage.s3.S3InputDataConfig;
+import org.apache.druid.storage.s3.S3StorageConfig;
 import org.apache.druid.storage.s3.S3StorageDruidModule;
 import org.apache.druid.storage.s3.S3Utils;
 import org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3;
 
 import javax.annotation.Nullable;
 import java.net.URI;
 import java.util.List;
+import java.util.Objects;
 import java.util.stream.Stream;
 import java.util.stream.StreamSupport;
 
 public class S3InputSource extends CloudObjectInputSource<S3Entity>
 {
-  private final ServerSideEncryptingAmazonS3 s3Client;
+  // We lazily initialize s3Client to avoid costly s3 operation when we only need S3InputSource for stored information
+  // (such as for task logs) and not for ingestion. (This cost only applies for new s3Client created with
+  // s3ConfigProperties given).
+  private final Supplier<ServerSideEncryptingAmazonS3> s3Client;
 
 Review comment:
   nit: suggest `s3ClientSupplier`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383597015
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -76,7 +114,41 @@ protected S3Entity createEntity(InputSplit<CloudObjectLocation> split)
   @Override
   public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
   {
-    return new S3InputSource(s3Client, inputDataConfig, null, null, ImmutableList.of(split.get()));
+    return new S3InputSource(
+        s3Client.get(),
+       null,
+       null,
+        inputDataConfig,
+       null,
+       null,
+       ImmutableList.of(split.get()),
+       getS3ConfigProperties()
+    );
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return Objects.hash(
+        super.hashCode(),
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383594349
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -76,7 +114,41 @@ protected S3Entity createEntity(InputSplit<CloudObjectLocation> split)
   @Override
   public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
   {
-    return new S3InputSource(s3Client, inputDataConfig, null, null, ImmutableList.of(split.get()));
+    return new S3InputSource(
+        s3Client.get(),
 
 Review comment:
   I think that's the same formatting.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383804965
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSourceProperties.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.s3;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import org.apache.druid.metadata.PasswordProvider;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+/**
+ * Contains properties for s3 input source.
+ * Properties can be specified by ingestionSpec which will override system default.
+ */
+public class S3InputSourceProperties
 
 Review comment:
   this maybe should be named `S3InputSourceConfig` or something to be more consistently named with other config-ish classes (there is not any types of the form `{name}Properties.java` in druid currently, and I associate the word to `java.util.Properties` which this isn't related to)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r382280703
 
 

 ##########
 File path: core/src/main/java/org/apache/druid/data/input/impl/CloudConfigProperties.java
 ##########
 @@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.impl;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import org.apache.druid.metadata.PasswordProvider;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+public class CloudConfigProperties
 
 Review comment:
   I wonder if this class is too synthetic, and instead should _not_ be a shared base type, and rather be input source specific. Like I imagine it could be useful in the future to support additional properties, but beyond the 'username' and 'password' model that is common to all clouds, further stuff probably starts to get really cloud specific and probably not a lot of benefit being down here.
   
   It might be better if in this PR you would add a `S3InputSourceConfig` or whatever directly to `S3InputSource` and _not_ pass it down to the base type, since it doesn't actually seem to be of any use there, and also not modify azure and google cloud since they can add their own specific implementations when support is added for credentials override there (and use their own terminology for username/password). @maytasm3 / @jihoonson what do you think?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson merged pull request #9375: Add support for optional aws credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
jihoonson merged pull request #9375: Add support for optional aws credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383604420
 
 

 ##########
 File path: docs/ingestion/native-batch.md
 ##########
 @@ -846,6 +847,15 @@ S3 Object:
 |bucket|Name of the S3 bucket|None|yes|
 |path|The path where data is located.|None|yes|
 
+Properties Object:
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|accessKeyId|S3 access key for this S3 InputSource|None|yes if secretAccessKey is given|
+|secretAccessKey|S3 secret key for this S3 InputSource|None|yes if accessKeyId is given|
+
+**Note :** *If any property is not given in the Properties Object, then the current default server value will be use.*
 
 Review comment:
   That is the idea for the future when we support more than just the access key and secret key. Will changed to "If you don't provide an access key, the default S3 credentials provider chain is used." for now and we can change it back in the future. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] sthetland commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
sthetland commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383530823
 
 

 ##########
 File path: docs/development/extensions-core/s3.md
 ##########
 @@ -64,6 +64,7 @@ In addition to this you need to set additional configuration, specific for [deep
 ### S3 authentication methods
 
 To connect to your S3 bucket (whether deep storage bucket or source bucket), Druid use the following credentials providers chain
+**Note :** *Below authentication chain can be overridden for connecting to source bucket by specifying the [properties](../../ingestion/native-batch.md#s3-input-source) parameter in the ingestionSpec.*
 
 Review comment:
   I think the doc normally uses a carrot to format note-style paragraphs, like the ones on https://druid.apache.org/docs/latest/tutorials/cluster.html#with-zookeeper-on-master for example. 
   
   I'd use that style, or this could just be a regular second sentence in the paragraph. In either case, I'd rephrase a bit: 
   
   > You can override the default credentials provider chain by specifying an access key using [Properties Object parameters](../../ingestion/native-batch.md#s3-input-source) in the ingestionSpec.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383612142
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -19,50 +19,88 @@
 
 package org.apache.druid.data.input.s3;
 
+import com.amazonaws.auth.AWSStaticCredentialsProvider;
+import com.amazonaws.auth.BasicAWSCredentials;
+import com.amazonaws.services.s3.AmazonS3ClientBuilder;
 import com.amazonaws.services.s3.model.S3ObjectSummary;
 import com.fasterxml.jackson.annotation.JacksonInject;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
+import com.google.common.base.Supplier;
+import com.google.common.base.Suppliers;
 import com.google.common.collect.ImmutableList;
 import org.apache.druid.data.input.InputSplit;
 import org.apache.druid.data.input.impl.CloudObjectInputSource;
 import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.S3ConfigProperties;
 import org.apache.druid.data.input.impl.SplittableInputSource;
 import org.apache.druid.storage.s3.S3InputDataConfig;
+import org.apache.druid.storage.s3.S3StorageConfig;
 import org.apache.druid.storage.s3.S3StorageDruidModule;
 import org.apache.druid.storage.s3.S3Utils;
 import org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3;
 
 import javax.annotation.Nullable;
 import java.net.URI;
 import java.util.List;
+import java.util.Objects;
 import java.util.stream.Stream;
 import java.util.stream.StreamSupport;
 
 public class S3InputSource extends CloudObjectInputSource<S3Entity>
 {
-  private final ServerSideEncryptingAmazonS3 s3Client;
+  // We lazily initialize s3Client to avoid costly s3 operation when we only need S3InputSource for stored information
+  // (such as for task logs) and not for ingestion. (This cost only applies for new s3Client created with
+  // s3ConfigProperties given).
+  private final Supplier<ServerSideEncryptingAmazonS3> s3Client;
+  @JsonProperty("properties")
+  private final S3ConfigProperties s3ConfigProperties;
   private final S3InputDataConfig inputDataConfig;
 
   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,
+      @JacksonInject AmazonS3ClientBuilder amazonS3ClientBuilder,
+      @JacksonInject S3StorageConfig storageConfig,
       @JacksonInject S3InputDataConfig inputDataConfig,
       @JsonProperty("uris") @Nullable List<URI> uris,
       @JsonProperty("prefixes") @Nullable List<URI> prefixes,
-      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects,
+      @JsonProperty("properties") @Nullable S3ConfigProperties s3ConfigProperties
   )
   {
     super(S3StorageDruidModule.SCHEME, uris, prefixes, objects);
-    this.s3Client = Preconditions.checkNotNull(s3Client, "s3Client");
     this.inputDataConfig = Preconditions.checkNotNull(inputDataConfig, "S3DataSegmentPusherConfig");
+    this.s3ConfigProperties = s3ConfigProperties;
+    this.s3Client = Suppliers.memoize(
+        () -> {
+          if (amazonS3ClientBuilder != null && storageConfig != null && s3ConfigProperties != null) {
+            if (s3ConfigProperties.isCredentialsConfigured()) {
+              BasicAWSCredentials creds = new BasicAWSCredentials(
+                  s3ConfigProperties.getAccessKeyId().getPassword(),
+                  s3ConfigProperties.getSecretAccessKey().getPassword());
+              amazonS3ClientBuilder.withCredentials(new AWSStaticCredentialsProvider(creds));
+            }
+            return new ServerSideEncryptingAmazonS3(amazonS3ClientBuilder.build(), storageConfig.getServerSideEncryption());
+          } else {
+            return Preconditions.checkNotNull(s3Client, "s3Client");
 
 Review comment:
   Moved all the Preconditions.checkNotNull outside

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] sthetland commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
sthetland commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383535664
 
 

 ##########
 File path: docs/ingestion/native-batch.md
 ##########
 @@ -846,6 +847,15 @@ S3 Object:
 |bucket|Name of the S3 bucket|None|yes|
 |path|The path where data is located.|None|yes|
 
+Properties Object:
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|accessKeyId|S3 access key for this S3 InputSource|None|yes if secretAccessKey is given|
+|secretAccessKey|S3 secret key for this S3 InputSource|None|yes if accessKeyId is given|
+
+**Note :** *If any property is not given in the Properties Object, then the current default server value will be use.*
 
 Review comment:
   ```suggestion
   Default values are used for any property not overridden in the Properties Object.
   ```
   Caveat to my edit suggestion: it says "any property" but really it's just S3 access key that are supported, no? To avoid the premature generalization, could say something like: 
   
   "If you don't provide an access key, the default S3 credentials provider chain is used." (With a link to the relevant section in s3.md).  
    

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383604907
 
 

 ##########
 File path: docs/ingestion/native-batch.md
 ##########
 @@ -846,6 +847,15 @@ S3 Object:
 |bucket|Name of the S3 bucket|None|yes|
 |path|The path where data is located.|None|yes|
 
+Properties Object:
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|accessKeyId|S3 access key for this S3 InputSource|None|yes if secretAccessKey is given|
+|secretAccessKey|S3 secret key for this S3 InputSource|None|yes if accessKeyId is given|
+
+**Note :** *If any property is not given in the Properties Object, then the current default server value will be use.*
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 edited a comment on issue #9375: Add support for optional aws credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 edited a comment on issue #9375: Add support for optional aws credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#issuecomment-591191540
 
 
   Manual tests looks good. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on issue #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#issuecomment-588433048
 
 
   This pull request **introduces 3 alerts** when merging dc6b252535e79c5b6b17e84f3466323b74d61ae3 into e7eb45e64875dcffe7a1e00672a1b0c78e35beb2 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-f8d7c2013c8512f3bddee6cde25588f0facce022)
   
   **new alerts:**
   
   * 3 for Dereferenced variable may be null

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r382199016
 
 

 ##########
 File path: core/src/main/java/org/apache/druid/data/input/impl/CloudConfigProperties.java
 ##########
 @@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.impl;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import org.apache.druid.metadata.PasswordProvider;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+public class CloudConfigProperties
+{
+  @JsonCreator
+  public CloudConfigProperties(
+      @JsonProperty("accessKeyId") @Nullable PasswordProvider accessKeyId,
 
 Review comment:
   Yes. Maybe we can rename the accessKeyId to something more generic. But all the cloud (s3, G, Azure) requires a "user" and a "password" which this should work for all

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383593983
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -76,7 +114,41 @@ protected S3Entity createEntity(InputSplit<CloudObjectLocation> split)
   @Override
   public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
   {
-    return new S3InputSource(s3Client, inputDataConfig, null, null, ImmutableList.of(split.get()));
+    return new S3InputSource(
+        s3Client.get(),
 
 Review comment:
   Not sure what is the correct formatting here.
   I do see something like below in the code
   `
               CloudFilesObject segmentData = new CloudFilesObject(
                   segmentPath,
                   outFile,
                   objectApi.getRegion(),
                   objectApi.getContainer()
               );
   `

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383537739
 
 

 ##########
 File path: docs/ingestion/native-batch.md
 ##########
 @@ -846,6 +847,15 @@ S3 Object:
 |bucket|Name of the S3 bucket|None|yes|
 |path|The path where data is located.|None|yes|
 
+Properties Object:
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|accessKeyId|S3 access key for this S3 InputSource|None|yes if secretAccessKey is given|
 
 Review comment:
   Oh i see. Wanted to emphasize that these configs will only be applied to this input source (not the deep storage, etc.)  

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383511182
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -76,7 +114,41 @@ protected S3Entity createEntity(InputSplit<CloudObjectLocation> split)
   @Override
   public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
   {
-    return new S3InputSource(s3Client, inputDataConfig, null, null, ImmutableList.of(split.get()));
+    return new S3InputSource(
+        s3Client.get(),
+       null,
+       null,
+        inputDataConfig,
+       null,
+       null,
+       ImmutableList.of(split.get()),
+       getS3ConfigProperties()
+    );
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return Objects.hash(
+        super.hashCode(),
 
 Review comment:
   nit: this could be one line

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383085591
 
 

 ##########
 File path: core/src/main/java/org/apache/druid/data/input/impl/CloudConfigProperties.java
 ##########
 @@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.impl;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import org.apache.druid.metadata.PasswordProvider;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+public class CloudConfigProperties
+{
+  @JsonCreator
+  public CloudConfigProperties(
+      @JsonProperty("accessKeyId") @Nullable PasswordProvider accessKeyId,
 
 Review comment:
   Making a specific configProperties class for each cloudInputSource without sharing inheritance 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383600319
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -19,50 +19,88 @@
 
 package org.apache.druid.data.input.s3;
 
+import com.amazonaws.auth.AWSStaticCredentialsProvider;
+import com.amazonaws.auth.BasicAWSCredentials;
+import com.amazonaws.services.s3.AmazonS3ClientBuilder;
 import com.amazonaws.services.s3.model.S3ObjectSummary;
 import com.fasterxml.jackson.annotation.JacksonInject;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
+import com.google.common.base.Supplier;
+import com.google.common.base.Suppliers;
 import com.google.common.collect.ImmutableList;
 import org.apache.druid.data.input.InputSplit;
 import org.apache.druid.data.input.impl.CloudObjectInputSource;
 import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.S3ConfigProperties;
 import org.apache.druid.data.input.impl.SplittableInputSource;
 import org.apache.druid.storage.s3.S3InputDataConfig;
+import org.apache.druid.storage.s3.S3StorageConfig;
 import org.apache.druid.storage.s3.S3StorageDruidModule;
 import org.apache.druid.storage.s3.S3Utils;
 import org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3;
 
 import javax.annotation.Nullable;
 import java.net.URI;
 import java.util.List;
+import java.util.Objects;
 import java.util.stream.Stream;
 import java.util.stream.StreamSupport;
 
 public class S3InputSource extends CloudObjectInputSource<S3Entity>
 {
-  private final ServerSideEncryptingAmazonS3 s3Client;
+  // We lazily initialize s3Client to avoid costly s3 operation when we only need S3InputSource for stored information
+  // (such as for task logs) and not for ingestion. (This cost only applies for new s3Client created with
+  // s3ConfigProperties given).
+  private final Supplier<ServerSideEncryptingAmazonS3> s3Client;
+  @JsonProperty("properties")
+  private final S3ConfigProperties s3ConfigProperties;
   private final S3InputDataConfig inputDataConfig;
 
   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,
+      @JacksonInject AmazonS3ClientBuilder amazonS3ClientBuilder,
+      @JacksonInject S3StorageConfig storageConfig,
       @JacksonInject S3InputDataConfig inputDataConfig,
       @JsonProperty("uris") @Nullable List<URI> uris,
       @JsonProperty("prefixes") @Nullable List<URI> prefixes,
-      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects,
+      @JsonProperty("properties") @Nullable S3ConfigProperties s3ConfigProperties
   )
   {
     super(S3StorageDruidModule.SCHEME, uris, prefixes, objects);
-    this.s3Client = Preconditions.checkNotNull(s3Client, "s3Client");
     this.inputDataConfig = Preconditions.checkNotNull(inputDataConfig, "S3DataSegmentPusherConfig");
+    this.s3ConfigProperties = s3ConfigProperties;
+    this.s3Client = Suppliers.memoize(
+        () -> {
+          if (amazonS3ClientBuilder != null && storageConfig != null && s3ConfigProperties != null) {
+            if (s3ConfigProperties.isCredentialsConfigured()) {
+              BasicAWSCredentials creds = new BasicAWSCredentials(
+                  s3ConfigProperties.getAccessKeyId().getPassword(),
+                  s3ConfigProperties.getSecretAccessKey().getPassword());
+              amazonS3ClientBuilder.withCredentials(new AWSStaticCredentialsProvider(creds));
+            }
+            return new ServerSideEncryptingAmazonS3(amazonS3ClientBuilder.build(), storageConfig.getServerSideEncryption());
+          } else {
+            return Preconditions.checkNotNull(s3Client, "s3Client");
 
 Review comment:
   I guess it may not matter much here since this is just a sanity check which shouldn't happen, but it would be better to fail early as possible. The check in the constructor will be called in the JSON deserialization while other methods will be called after the task is scheduled on a middleManager.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r384026183
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3StorageDruidModule.java
 ##########
 @@ -166,9 +166,11 @@ public void configure(Binder binder)
     binder.bind(S3TaskLogs.class).in(LazySingleton.class);
   }
 
+  // This provides ServerSideEncryptingAmazonS3.Builder with default configs from Guice injection initially set.
+  // However, this builder can then be modify and have configuration(s) inside
 
 Review comment:
   modify -> modified?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383085826
 
 

 ##########
 File path: core/src/main/java/org/apache/druid/data/input/impl/CloudConfigProperties.java
 ##########
 @@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.impl;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import org.apache.druid.metadata.PasswordProvider;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+public class CloudConfigProperties
 
 Review comment:
   Make sense to me. Making a specific configProperties class for each cloudInputSource without sharing inheritance. (For this PR, just the S3 stuff and Azure and G will comes in a later PR).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r382281205
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -46,13 +51,25 @@
   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,
+      @JacksonInject AmazonS3ClientBuilder amazonS3ClientBuilder,
+      @JacksonInject S3StorageConfig storageConfig,
       @JsonProperty("uris") @Nullable List<URI> uris,
       @JsonProperty("prefixes") @Nullable List<URI> prefixes,
-      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects,
+      @JsonProperty("properties") @Nullable CloudConfigProperties cloudConfigProperties
 
 Review comment:
   I don't think the other docs should be modified yet, only s3, since the others are not wired up to anything there like it is in `S3InputSource`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383599281
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -19,50 +19,88 @@
 
 package org.apache.druid.data.input.s3;
 
+import com.amazonaws.auth.AWSStaticCredentialsProvider;
+import com.amazonaws.auth.BasicAWSCredentials;
+import com.amazonaws.services.s3.AmazonS3ClientBuilder;
 import com.amazonaws.services.s3.model.S3ObjectSummary;
 import com.fasterxml.jackson.annotation.JacksonInject;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
+import com.google.common.base.Supplier;
+import com.google.common.base.Suppliers;
 import com.google.common.collect.ImmutableList;
 import org.apache.druid.data.input.InputSplit;
 import org.apache.druid.data.input.impl.CloudObjectInputSource;
 import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.S3ConfigProperties;
 import org.apache.druid.data.input.impl.SplittableInputSource;
 import org.apache.druid.storage.s3.S3InputDataConfig;
+import org.apache.druid.storage.s3.S3StorageConfig;
 import org.apache.druid.storage.s3.S3StorageDruidModule;
 import org.apache.druid.storage.s3.S3Utils;
 import org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3;
 
 import javax.annotation.Nullable;
 import java.net.URI;
 import java.util.List;
+import java.util.Objects;
 import java.util.stream.Stream;
 import java.util.stream.StreamSupport;
 
 public class S3InputSource extends CloudObjectInputSource<S3Entity>
 {
-  private final ServerSideEncryptingAmazonS3 s3Client;
+  // We lazily initialize s3Client to avoid costly s3 operation when we only need S3InputSource for stored information
+  // (such as for task logs) and not for ingestion. (This cost only applies for new s3Client created with
+  // s3ConfigProperties given).
+  private final Supplier<ServerSideEncryptingAmazonS3> s3Client;
+  @JsonProperty("properties")
+  private final S3ConfigProperties s3ConfigProperties;
   private final S3InputDataConfig inputDataConfig;
 
   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,
+      @JacksonInject AmazonS3ClientBuilder amazonS3ClientBuilder,
+      @JacksonInject S3StorageConfig storageConfig,
       @JacksonInject S3InputDataConfig inputDataConfig,
       @JsonProperty("uris") @Nullable List<URI> uris,
       @JsonProperty("prefixes") @Nullable List<URI> prefixes,
-      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects,
+      @JsonProperty("properties") @Nullable S3ConfigProperties s3ConfigProperties
   )
   {
     super(S3StorageDruidModule.SCHEME, uris, prefixes, objects);
-    this.s3Client = Preconditions.checkNotNull(s3Client, "s3Client");
     this.inputDataConfig = Preconditions.checkNotNull(inputDataConfig, "S3DataSegmentPusherConfig");
+    this.s3ConfigProperties = s3ConfigProperties;
+    this.s3Client = Suppliers.memoize(
+        () -> {
+          if (amazonS3ClientBuilder != null && storageConfig != null && s3ConfigProperties != null) {
+            if (s3ConfigProperties.isCredentialsConfigured()) {
+              BasicAWSCredentials creds = new BasicAWSCredentials(
+                  s3ConfigProperties.getAccessKeyId().getPassword(),
+                  s3ConfigProperties.getSecretAccessKey().getPassword());
+              amazonS3ClientBuilder.withCredentials(new AWSStaticCredentialsProvider(creds));
+            }
+            return new ServerSideEncryptingAmazonS3(amazonS3ClientBuilder.build(), storageConfig.getServerSideEncryption());
+          } else {
+            return Preconditions.checkNotNull(s3Client, "s3Client");
 
 Review comment:
   I think it will be the same either way. Had it this way to keep all relevant logic for determining which s3Client to use together (inside the Suppliers.memorize(). I'm fine either way though.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383517279
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -19,50 +19,88 @@
 
 package org.apache.druid.data.input.s3;
 
+import com.amazonaws.auth.AWSStaticCredentialsProvider;
+import com.amazonaws.auth.BasicAWSCredentials;
+import com.amazonaws.services.s3.AmazonS3ClientBuilder;
 import com.amazonaws.services.s3.model.S3ObjectSummary;
 import com.fasterxml.jackson.annotation.JacksonInject;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
+import com.google.common.base.Supplier;
+import com.google.common.base.Suppliers;
 import com.google.common.collect.ImmutableList;
 import org.apache.druid.data.input.InputSplit;
 import org.apache.druid.data.input.impl.CloudObjectInputSource;
 import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.S3ConfigProperties;
 import org.apache.druid.data.input.impl.SplittableInputSource;
 import org.apache.druid.storage.s3.S3InputDataConfig;
+import org.apache.druid.storage.s3.S3StorageConfig;
 import org.apache.druid.storage.s3.S3StorageDruidModule;
 import org.apache.druid.storage.s3.S3Utils;
 import org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3;
 
 import javax.annotation.Nullable;
 import java.net.URI;
 import java.util.List;
+import java.util.Objects;
 import java.util.stream.Stream;
 import java.util.stream.StreamSupport;
 
 public class S3InputSource extends CloudObjectInputSource<S3Entity>
 {
-  private final ServerSideEncryptingAmazonS3 s3Client;
+  // We lazily initialize s3Client to avoid costly s3 operation when we only need S3InputSource for stored information
+  // (such as for task logs) and not for ingestion. (This cost only applies for new s3Client created with
+  // s3ConfigProperties given).
+  private final Supplier<ServerSideEncryptingAmazonS3> s3Client;
+  @JsonProperty("properties")
+  private final S3ConfigProperties s3ConfigProperties;
   private final S3InputDataConfig inputDataConfig;
 
   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,
+      @JacksonInject AmazonS3ClientBuilder amazonS3ClientBuilder,
+      @JacksonInject S3StorageConfig storageConfig,
       @JacksonInject S3InputDataConfig inputDataConfig,
       @JsonProperty("uris") @Nullable List<URI> uris,
       @JsonProperty("prefixes") @Nullable List<URI> prefixes,
-      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects,
+      @JsonProperty("properties") @Nullable S3ConfigProperties s3ConfigProperties
   )
   {
     super(S3StorageDruidModule.SCHEME, uris, prefixes, objects);
-    this.s3Client = Preconditions.checkNotNull(s3Client, "s3Client");
     this.inputDataConfig = Preconditions.checkNotNull(inputDataConfig, "S3DataSegmentPusherConfig");
+    this.s3ConfigProperties = s3ConfigProperties;
+    this.s3Client = Suppliers.memoize(
+        () -> {
+          if (amazonS3ClientBuilder != null && storageConfig != null && s3ConfigProperties != null) {
+            if (s3ConfigProperties.isCredentialsConfigured()) {
+              BasicAWSCredentials creds = new BasicAWSCredentials(
+                  s3ConfigProperties.getAccessKeyId().getPassword(),
+                  s3ConfigProperties.getSecretAccessKey().getPassword());
+              amazonS3ClientBuilder.withCredentials(new AWSStaticCredentialsProvider(creds));
+            }
+            return new ServerSideEncryptingAmazonS3(amazonS3ClientBuilder.build(), storageConfig.getServerSideEncryption());
+          } else {
+            return Preconditions.checkNotNull(s3Client, "s3Client");
 
 Review comment:
   hmm, should the check that one of `s3Client` or `amazonS3ClientBuilder` and `storageConfig ` are not null be done eagerly instead of in the supplier? (i'm not certain either way, just thinking out loud)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383510962
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -76,7 +114,41 @@ protected S3Entity createEntity(InputSplit<CloudObjectLocation> split)
   @Override
   public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
   {
-    return new S3InputSource(s3Client, inputDataConfig, null, null, ImmutableList.of(split.get()));
+    return new S3InputSource(
+        s3Client.get(),
 
 Review comment:
   nit: formatting looks sort of funny here to me, unfortunately i think this is one of the few things style checks don't catch

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383593983
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -76,7 +114,41 @@ protected S3Entity createEntity(InputSplit<CloudObjectLocation> split)
   @Override
   public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
   {
-    return new S3InputSource(s3Client, inputDataConfig, null, null, ImmutableList.of(split.get()));
+    return new S3InputSource(
+        s3Client.get(),
 
 Review comment:
   Not sure what is the correct formatting here.
   I do see something like below in the code
   
               CloudFilesObject segmentData = new CloudFilesObject(
                   segmentPath,
                   outFile,
                   objectApi.getRegion(),
                   objectApi.getContainer()
               );
   
           return Iterables.transform(
               supervisorSpec.getSpec().getDataSources(),
               AuthorizationUtils.DATASOURCE_READ_RA_GENERATOR
           );

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383596877
 
 

 ##########
 File path: core/src/main/java/org/apache/druid/data/input/impl/S3ConfigProperties.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.impl;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import org.apache.druid.metadata.PasswordProvider;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+/**
+ * Contains properties for s3 access configuration.
+ * Properties can be specified by ingestionSpec and override system default.
+ */
+public class S3ConfigProperties
 
 Review comment:
   Oops forgot that. Done. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383519388
 
 

 ##########
 File path: docs/ingestion/native-batch.md
 ##########
 @@ -846,6 +847,15 @@ S3 Object:
 |bucket|Name of the S3 bucket|None|yes|
 |path|The path where data is located.|None|yes|
 
+Properties Object:
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|accessKeyId|S3 access key for this S3 InputSource|None|yes if secretAccessKey is given|
 
 Review comment:
   I think these can just be shortened to `S3 access key` and `S3 secret key`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383510309
 
 

 ##########
 File path: core/src/main/java/org/apache/druid/data/input/impl/S3ConfigProperties.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.impl;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import org.apache.druid.metadata.PasswordProvider;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+/**
+ * Contains properties for s3 access configuration.
+ * Properties can be specified by ingestionSpec and override system default.
+ */
+public class S3ConfigProperties
 
 Review comment:
   This should live in the s3 extension, not in druid-core

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383602565
 
 

 ##########
 File path: docs/development/extensions-core/s3.md
 ##########
 @@ -64,6 +64,7 @@ In addition to this you need to set additional configuration, specific for [deep
 ### S3 authentication methods
 
 To connect to your S3 bucket (whether deep storage bucket or source bucket), Druid use the following credentials providers chain
+**Note :** *Below authentication chain can be overridden for connecting to source bucket by specifying the [properties](../../ingestion/native-batch.md#s3-input-source) parameter in the ingestionSpec.*
 
 Review comment:
   You can override the default credentials provider chain for connecting to source bucket by specifying an access key and secret key using Properties Object parameters in the ingestionSpec.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r384169996
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -42,32 +46,66 @@
 import java.net.URI;
 import java.util.Iterator;
 import java.util.List;
+import java.util.Objects;
 import java.util.stream.Collectors;
 import java.util.stream.Stream;
 
 public class S3InputSource extends CloudObjectInputSource
 {
-  private final ServerSideEncryptingAmazonS3 s3Client;
+  // We lazily initialize ServerSideEncryptingAmazonS3 to avoid costly s3 operation when we only need S3InputSource
+  // for stored information (such as for task logs) and not for ingestion.
+  // (This cost only applies for new ServerSideEncryptingAmazonS3 created with s3InputSourceProperties given).
+  private final Supplier<ServerSideEncryptingAmazonS3> s3ClientSupplier;
+  @JsonProperty("properties")
+  private final S3InputSourceProperties s3InputSourceProperties;
   private final S3InputDataConfig inputDataConfig;
 
   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,
 
 Review comment:
   Yea..we still need it to use the singleton client for when the properties is not present in the JSON. This will save the cost of initializing the s3client using the s3ClientSupplier since we can reuse the  injected singleton

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r384169983
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3StorageDruidModule.java
 ##########
 @@ -166,9 +166,11 @@ public void configure(Binder binder)
     binder.bind(S3TaskLogs.class).in(LazySingleton.class);
   }
 
+  // This provides ServerSideEncryptingAmazonS3.Builder with default configs from Guice injection initially set.
+  // However, this builder can then be modify and have configuration(s) inside
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383524036
 
 

 ##########
 File path: docs/ingestion/native-batch.md
 ##########
 @@ -846,6 +847,15 @@ S3 Object:
 |bucket|Name of the S3 bucket|None|yes|
 |path|The path where data is located.|None|yes|
 
+Properties Object:
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|accessKeyId|S3 access key for this S3 InputSource|None|yes if secretAccessKey is given|
 
 Review comment:
   What do you mean? The description or the property itself?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383600319
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -19,50 +19,88 @@
 
 package org.apache.druid.data.input.s3;
 
+import com.amazonaws.auth.AWSStaticCredentialsProvider;
+import com.amazonaws.auth.BasicAWSCredentials;
+import com.amazonaws.services.s3.AmazonS3ClientBuilder;
 import com.amazonaws.services.s3.model.S3ObjectSummary;
 import com.fasterxml.jackson.annotation.JacksonInject;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
+import com.google.common.base.Supplier;
+import com.google.common.base.Suppliers;
 import com.google.common.collect.ImmutableList;
 import org.apache.druid.data.input.InputSplit;
 import org.apache.druid.data.input.impl.CloudObjectInputSource;
 import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.S3ConfigProperties;
 import org.apache.druid.data.input.impl.SplittableInputSource;
 import org.apache.druid.storage.s3.S3InputDataConfig;
+import org.apache.druid.storage.s3.S3StorageConfig;
 import org.apache.druid.storage.s3.S3StorageDruidModule;
 import org.apache.druid.storage.s3.S3Utils;
 import org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3;
 
 import javax.annotation.Nullable;
 import java.net.URI;
 import java.util.List;
+import java.util.Objects;
 import java.util.stream.Stream;
 import java.util.stream.StreamSupport;
 
 public class S3InputSource extends CloudObjectInputSource<S3Entity>
 {
-  private final ServerSideEncryptingAmazonS3 s3Client;
+  // We lazily initialize s3Client to avoid costly s3 operation when we only need S3InputSource for stored information
+  // (such as for task logs) and not for ingestion. (This cost only applies for new s3Client created with
+  // s3ConfigProperties given).
+  private final Supplier<ServerSideEncryptingAmazonS3> s3Client;
+  @JsonProperty("properties")
+  private final S3ConfigProperties s3ConfigProperties;
   private final S3InputDataConfig inputDataConfig;
 
   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,
+      @JacksonInject AmazonS3ClientBuilder amazonS3ClientBuilder,
+      @JacksonInject S3StorageConfig storageConfig,
       @JacksonInject S3InputDataConfig inputDataConfig,
       @JsonProperty("uris") @Nullable List<URI> uris,
       @JsonProperty("prefixes") @Nullable List<URI> prefixes,
-      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects,
+      @JsonProperty("properties") @Nullable S3ConfigProperties s3ConfigProperties
   )
   {
     super(S3StorageDruidModule.SCHEME, uris, prefixes, objects);
-    this.s3Client = Preconditions.checkNotNull(s3Client, "s3Client");
     this.inputDataConfig = Preconditions.checkNotNull(inputDataConfig, "S3DataSegmentPusherConfig");
+    this.s3ConfigProperties = s3ConfigProperties;
+    this.s3Client = Suppliers.memoize(
+        () -> {
+          if (amazonS3ClientBuilder != null && storageConfig != null && s3ConfigProperties != null) {
+            if (s3ConfigProperties.isCredentialsConfigured()) {
+              BasicAWSCredentials creds = new BasicAWSCredentials(
+                  s3ConfigProperties.getAccessKeyId().getPassword(),
+                  s3ConfigProperties.getSecretAccessKey().getPassword());
+              amazonS3ClientBuilder.withCredentials(new AWSStaticCredentialsProvider(creds));
+            }
+            return new ServerSideEncryptingAmazonS3(amazonS3ClientBuilder.build(), storageConfig.getServerSideEncryption());
+          } else {
+            return Preconditions.checkNotNull(s3Client, "s3Client");
 
 Review comment:
   I guess it may not matter much here since this is just a sanity check which shouldn't happen, but it would be better to fail as early as possible. The check in the constructor will be called in the JSON deserialization while other methods will be called after the task is scheduled on a middleManager.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r384027725
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -42,32 +46,66 @@
 import java.net.URI;
 import java.util.Iterator;
 import java.util.List;
+import java.util.Objects;
 import java.util.stream.Collectors;
 import java.util.stream.Stream;
 
 public class S3InputSource extends CloudObjectInputSource
 {
-  private final ServerSideEncryptingAmazonS3 s3Client;
+  // We lazily initialize ServerSideEncryptingAmazonS3 to avoid costly s3 operation when we only need S3InputSource
+  // for stored information (such as for task logs) and not for ingestion.
+  // (This cost only applies for new ServerSideEncryptingAmazonS3 created with s3InputSourceProperties given).
+  private final Supplier<ServerSideEncryptingAmazonS3> s3ClientSupplier;
+  @JsonProperty("properties")
+  private final S3InputSourceProperties s3InputSourceProperties;
   private final S3InputDataConfig inputDataConfig;
 
   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,
 
 Review comment:
   Do we still need this `s3Client` parameter to reuse the singleton client? If so, please leave a comment about it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ccaominh commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
ccaominh commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r381709202
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -46,13 +51,25 @@
   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,
+      @JacksonInject AmazonS3ClientBuilder amazonS3ClientBuilder,
+      @JacksonInject S3StorageConfig storageConfig,
       @JsonProperty("uris") @Nullable List<URI> uris,
       @JsonProperty("prefixes") @Nullable List<URI> prefixes,
-      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects,
+      @JsonProperty("properties") @Nullable CloudConfigProperties cloudConfigProperties
 
 Review comment:
   This should probably be added to https://druid.apache.org/docs/latest/ingestion/native-batch.html#s3-input-source. Similar comment for the other modified input sources.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ccaominh commented on issue #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
ccaominh commented on issue #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#issuecomment-588056687
 
 
   Jobs are failing with a compilation error:
   ```
   [ERROR] COMPILATION ERROR : 
   [ERROR] /home/travis/build/apache/druid/extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java:[70,5] constructor CloudObjectInputSource in class org.apache.druid.data.input.impl.CloudObjectInputSource<T> cannot be applied to given types;
     required: java.lang.String,java.util.List<java.net.URI>,java.util.List<java.net.URI>,java.util.List<org.apache.druid.data.input.impl.CloudObjectLocation>,org.apache.druid.data.input.impl.CloudConfigProperties
     found: java.lang.String,java.util.List<java.net.URI>,java.util.List<java.net.URI>,java.util.List<org.apache.druid.data.input.impl.CloudObjectLocation>
     reason: actual and formal argument lists differ in length
   [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile) on project druid-azure-extensions: Compilation failure
   [ERROR] /home/travis/build/apache/druid/extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java:[70,5] constructor CloudObjectInputSource in class org.apache.druid.data.input.impl.CloudObjectInputSource<T> cannot be applied to given types;
   [ERROR]   required: java.lang.String,java.util.List<java.net.URI>,java.util.List<java.net.URI>,java.util.List<org.apache.druid.data.input.impl.CloudObjectLocation>,org.apache.druid.data.input.impl.CloudConfigProperties
   [ERROR]   found: java.lang.String,java.util.List<java.net.URI>,java.util.List<java.net.URI>,java.util.List<org.apache.druid.data.input.impl.CloudObjectLocation>
   [ERROR]   reason: actual and formal argument lists differ in length
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on issue #9375: Add support for optional aws credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on issue #9375: Add support for optional aws credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#issuecomment-591191540
 
 
   Manual test looks good. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383593983
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -76,7 +114,41 @@ protected S3Entity createEntity(InputSplit<CloudObjectLocation> split)
   @Override
   public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
   {
-    return new S3InputSource(s3Client, inputDataConfig, null, null, ImmutableList.of(split.get()));
+    return new S3InputSource(
+        s3Client.get(),
 
 Review comment:
   Not sure what is the correct formatting here.
   I do see something like below in the code
   
               CloudFilesObject segmentData = new CloudFilesObject(
                   segmentPath,
                   outFile,
                   objectApi.getRegion(),
                   objectApi.getContainer()
               );
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on issue #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on issue #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#issuecomment-590220287
 
 
   @clintropolis 
   Addressed comments. Please review again when you have time.
   Main changes:
   - Made specific s3ConfigProperties class for S3 without sharing inheritance (hence, removed CloudConfigProperties).
   - Added s3ConfigProperties to the constructor for S3InputSource in the withSplit method (so that the peons can connect to s3 using the s3ConfigProperties from main worker)
   - Made s3Client lazy initialize inside the S3InputSource
   - Update docs  

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383593983
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
 ##########
 @@ -76,7 +114,41 @@ protected S3Entity createEntity(InputSplit<CloudObjectLocation> split)
   @Override
   public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
   {
-    return new S3InputSource(s3Client, inputDataConfig, null, null, ImmutableList.of(split.get()));
+    return new S3InputSource(
+        s3Client.get(),
 
 Review comment:
   Not sure what is the correct formatting here.
   I do see something like below in the code
   
               CloudFilesObject segmentData = new CloudFilesObject(
                   segmentPath,
                   outFile,
                   objectApi.getRegion(),
                   objectApi.getContainer()
               );
   and
   
           return Iterables.transform(
               supervisorSpec.getSpec().getDataSources(),
               AuthorizationUtils.DATASOURCE_READ_RA_GENERATOR
           );

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] sthetland commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
sthetland commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383532435
 
 

 ##########
 File path: docs/ingestion/native-batch.md
 ##########
 @@ -838,6 +838,7 @@ Sample specs:
 |uris|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or `objects` must be set|
 |prefixes|JSON array of URI prefixes for the locations of S3 objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
 |objects|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
+|properties|Properties Object for overriding default S3 configuration. See below for more information.|None|No (Default configurations will be use if not given)
 
 Review comment:
   ```suggestion
   |properties|Properties Object for overriding the default S3 configuration. See below for more information.|None|No (defaults will be used if not given)
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383603976
 
 

 ##########
 File path: docs/ingestion/native-batch.md
 ##########
 @@ -838,6 +838,7 @@ Sample specs:
 |uris|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or `objects` must be set|
 |prefixes|JSON array of URI prefixes for the locations of S3 objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
 |objects|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
+|properties|Properties Object for overriding default S3 configuration. See below for more information.|None|No (Default configurations will be use if not given)
 
 Review comment:
   Done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9375: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/pull/9375#discussion_r383601025
 
 

 ##########
 File path: docs/development/extensions-core/s3.md
 ##########
 @@ -64,6 +64,7 @@ In addition to this you need to set additional configuration, specific for [deep
 ### S3 authentication methods
 
 To connect to your S3 bucket (whether deep storage bucket or source bucket), Druid use the following credentials providers chain
+**Note :** *Below authentication chain can be overridden for connecting to source bucket by specifying the [properties](../../ingestion/native-batch.md#s3-input-source) parameter in the ingestionSpec.*
 
 Review comment:
   This is more for like a second sentence in the paragraph. Similar to other note on this page.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org