You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/02/03 21:35:37 UTC

[GitHub] [druid] zachjsh opened a new pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

zachjsh opened a new pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306
 
 
   ### Description
   This change implements the inputSource reader for druid, which is used for native batch ingestion. This implementation follows those for S3 and GoogleCloud. 
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist above are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
   
   <hr>
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374391961
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/firehose/azure/AzureBlob.java
 ##########
 @@ -26,6 +26,7 @@
 import java.util.Objects;
 
 
+@Deprecated
 
 Review comment:
   Add instructions about how to move away from this deprecated class and if there are any benefits, why the new way is preferred.
   
   Similar comments for other deprecated annotations

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376560349
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureByteSource.java
 ##########
 @@ -46,11 +53,19 @@ public AzureByteSource(
 
   @Override
   public InputStream openStream() throws IOException
+  {
+    return openStream(0L);
+  }
+
+  public InputStream openStream(long offset) throws IOException
   {
     try {
-      return azureStorage.getBlobInputStream(containerName, blobPath);
+      return azureStorage.getBlobInputStream(offset, containerName, blobPath);
     }
     catch (StorageException | URISyntaxException e) {
+      log.warn("Exception when opening stream to azure resource, containerName: %s, blobPath: %s, Error: %s",
 
 Review comment:
   As an operator - what should I do if I see this warning?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson merged pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
jihoonson merged pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374442327
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureCloudBlobDruidToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobDruid;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  static final String SCHEME = "azure";
+
+  private final Logger log = new Logger(AzureInputSource.class);
+  private final AzureStorage storage;
+  private final AzureEntityFactory entityFactory;
+  private final AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private final AzureCloudBlobDruidToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  @JsonCreator
+  public AzureInputSource(
+      @JacksonInject AzureStorage storage,
+      @JacksonInject AzureEntityFactory entityFactory,
+      @JacksonInject AzureCloudBlobIterableFactory azureCloudBlobIterableFactory,
+      @JacksonInject AzureCloudBlobDruidToCloudObjectLocationConverter azureCloudBlobToLocationConverter,
+      @JsonProperty("uris") @Nullable List<URI> uris,
+      @JsonProperty("prefixes") @Nullable List<URI> prefixes,
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+  )
+  {
+    super(SCHEME, uris, prefixes, objects);
+    this.storage = Preconditions.checkNotNull(storage, "AzureStorage");
+    this.entityFactory = Preconditions.checkNotNull(entityFactory, "AzureEntityFactory");
+    this.azureCloudBlobIterableFactory = Preconditions.checkNotNull(
+        azureCloudBlobIterableFactory,
+        "AzureCloudBlobIterableFactory"
+    );
+    this.azureCloudBlobToLocationConverter = Preconditions.checkNotNull(azureCloudBlobToLocationConverter, "AzureCloudBlobToLocationConverter");
+  }
+
+  @Override
+  protected AzureEntity createEntity(InputSplit<CloudObjectLocation> split)
+  {
+    return entityFactory.create(split.get());
 
 Review comment:
   Out of curiosity, what's the benefit of using the factory pattern here?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376535243
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobHolder;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+/**
+ * Abstracts the Azure storage system where input data is stored. Allows users to retrieve entities in
+ * the storage system that match either a particular uri, prefix, or object.
+ */
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  public static final String SCHEME = "azure";
+
+  private final AzureStorage storage;
+  private final AzureEntityFactory entityFactory;
+  private final AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private final AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  @JsonCreator
+  public AzureInputSource(
+      @JacksonInject AzureStorage storage,
+      @JacksonInject AzureEntityFactory entityFactory,
+      @JacksonInject AzureCloudBlobIterableFactory azureCloudBlobIterableFactory,
+      @JacksonInject AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter,
+      @JsonProperty("uris") @Nullable List<URI> uris,
+      @JsonProperty("prefixes") @Nullable List<URI> prefixes,
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+  )
+  {
+    super(SCHEME, uris, prefixes, objects);
+    this.storage = Preconditions.checkNotNull(storage, "AzureStorage");
+    this.entityFactory = Preconditions.checkNotNull(entityFactory, "AzureEntityFactory");
+    this.azureCloudBlobIterableFactory = Preconditions.checkNotNull(
+        azureCloudBlobIterableFactory,
+        "AzureCloudBlobIterableFactory"
+    );
+    this.azureCloudBlobToLocationConverter = Preconditions.checkNotNull(azureCloudBlobToLocationConverter, "AzureCloudBlobToLocationConverter");
+  }
+
+  @Override
+  public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
+  {
+    return new AzureInputSource(
+        storage,
+        entityFactory,
+        azureCloudBlobIterableFactory,
+        azureCloudBlobToLocationConverter,
+        null,
+        null,
+        ImmutableList.of(split.get())
+    );
+  }
+
+  @Override
+  public String toString()
+  {
+    return "AzureInputSource{" +
+           "uris=" + getUris() +
+           ", prefixes=" + getPrefixes() +
+           ", objects=" + getObjects() +
+           '}';
+  }
+
 
 Review comment:
   I can't tell if equals/ hashCode is broken for this class. It looks like all the implementations just care about the class being the same an the uris/ prefixes/ objects to match. 
   
   But can we ever have a situation where AzureStorage is different but the uris are the same? My guess is no, but if that ever changes in the future it would be really tough to debug this. Unclear to me what the correct behavior is here. What do you think?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374405209
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorageDruidModule.java
 ##########
 @@ -91,6 +96,16 @@ public void configure(Binder binder)
     Binders.taskLogsBinder(binder).addBinding(SCHEME).to(AzureTaskLogs.class);
     JsonConfigProvider.bind(binder, "druid.indexer.logs", AzureTaskLogsConfig.class);
     binder.bind(AzureTaskLogs.class).in(LazySingleton.class);
+    binder.install(new FactoryModuleBuilder()
+                       .build(AzureByteSourceFactory.class));
+    binder.install(new FactoryModuleBuilder()
+                       .build(AzureEntityFactory.class));
+    binder.install(new FactoryModuleBuilder()
+                       .build(AzureCloudBlobIteratorFactory.class));
+    binder.install(new FactoryModuleBuilder()
+                       .build(AzureCloudBlobIterableFactory.class));
+    binder.install(new FactoryModuleBuilder()
+                       .build(ListBlobItemDruidFactory.class));
 
 Review comment:
   ModuleTests to ensure the factories are injected and can create their objects?
   
   Look at this PR - #9279 I've added a bunch of module tests with mocks

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376561794
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureCloudBlobHolderToCloudObjectLocationConverter.java
 ##########
 @@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
+
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+
+/**
+ * Converts a {@link CloudBlobHolder} object to a {@link CloudObjectLocation} object
+ */
+public class AzureCloudBlobHolderToCloudObjectLocationConverter
+    implements ICloudSpecificObjectToCloudObjectLocationConverter<CloudBlobHolder>
+{
+  @Override
+  public CloudObjectLocation createCloudObjectLocation(CloudBlobHolder cloudBlob)
+  {
+    try {
+      return new CloudObjectLocation(cloudBlob.getContainerName(), cloudBlob.getName());
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
 
 Review comment:
   In Druid, I see us throw `RE` which extends from `RuntimeException`s why did you choose to throw that here instead of `RE`? I don't know which one is the right one to throw...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376675142
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureByteSource.java
 ##########
 @@ -46,11 +53,19 @@ public AzureByteSource(
 
   @Override
   public InputStream openStream() throws IOException
+  {
+    return openStream(0L);
+  }
+
+  public InputStream openStream(long offset) throws IOException
   {
     try {
-      return azureStorage.getBlobInputStream(containerName, blobPath);
+      return azureStorage.getBlobInputStream(offset, containerName, blobPath);
     }
     catch (StorageException | URISyntaxException e) {
+      log.warn("Exception when opening stream to azure resource, containerName: %s, blobPath: %s, Error: %s",
 
 Review comment:
   I would think that the contained Exception message (from Microsoft library code) would help explain the issue in more detail.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376533967
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobHolder;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+/**
+ * Abstracts the Azure storage system where input data is stored. Allows users to retrieve entities in
+ * the storage system that match either a particular uri, prefix, or object.
+ */
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  public static final String SCHEME = "azure";
+
+  private final AzureStorage storage;
+  private final AzureEntityFactory entityFactory;
+  private final AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private final AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  @JsonCreator
+  public AzureInputSource(
+      @JacksonInject AzureStorage storage,
+      @JacksonInject AzureEntityFactory entityFactory,
+      @JacksonInject AzureCloudBlobIterableFactory azureCloudBlobIterableFactory,
+      @JacksonInject AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter,
+      @JsonProperty("uris") @Nullable List<URI> uris,
+      @JsonProperty("prefixes") @Nullable List<URI> prefixes,
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+  )
+  {
+    super(SCHEME, uris, prefixes, objects);
+    this.storage = Preconditions.checkNotNull(storage, "AzureStorage");
+    this.entityFactory = Preconditions.checkNotNull(entityFactory, "AzureEntityFactory");
+    this.azureCloudBlobIterableFactory = Preconditions.checkNotNull(
+        azureCloudBlobIterableFactory,
+        "AzureCloudBlobIterableFactory"
+    );
+    this.azureCloudBlobToLocationConverter = Preconditions.checkNotNull(azureCloudBlobToLocationConverter, "AzureCloudBlobToLocationConverter");
+  }
+
+  @Override
+  public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
+  {
+    return new AzureInputSource(
+        storage,
+        entityFactory,
+        azureCloudBlobIterableFactory,
+        azureCloudBlobToLocationConverter,
+        null,
+        null,
+        ImmutableList.of(split.get())
+    );
+  }
+
+  @Override
+  public String toString()
 
 Review comment:
   Can we move the toString() definition in to the base class? Looks like all the InputSources are using the same format.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376674946
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobHolder;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+/**
+ * Abstracts the Azure storage system where input data is stored. Allows users to retrieve entities in
+ * the storage system that match either a particular uri, prefix, or object.
+ */
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  public static final String SCHEME = "azure";
+
+  private final AzureStorage storage;
+  private final AzureEntityFactory entityFactory;
+  private final AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private final AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  @JsonCreator
+  public AzureInputSource(
+      @JacksonInject AzureStorage storage,
+      @JacksonInject AzureEntityFactory entityFactory,
+      @JacksonInject AzureCloudBlobIterableFactory azureCloudBlobIterableFactory,
+      @JacksonInject AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter,
+      @JsonProperty("uris") @Nullable List<URI> uris,
+      @JsonProperty("prefixes") @Nullable List<URI> prefixes,
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+  )
+  {
+    super(SCHEME, uris, prefixes, objects);
+    this.storage = Preconditions.checkNotNull(storage, "AzureStorage");
+    this.entityFactory = Preconditions.checkNotNull(entityFactory, "AzureEntityFactory");
+    this.azureCloudBlobIterableFactory = Preconditions.checkNotNull(
+        azureCloudBlobIterableFactory,
+        "AzureCloudBlobIterableFactory"
+    );
+    this.azureCloudBlobToLocationConverter = Preconditions.checkNotNull(azureCloudBlobToLocationConverter, "AzureCloudBlobToLocationConverter");
+  }
+
+  @Override
+  public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
+  {
+    return new AzureInputSource(
+        storage,
+        entityFactory,
+        azureCloudBlobIterableFactory,
+        azureCloudBlobToLocationConverter,
+        null,
+        null,
+        ImmutableList.of(split.get())
+    );
+  }
+
+  @Override
+  public String toString()
 
 Review comment:
   I dont think this can be done. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376573123
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/data/input/azure/AzureInputSourceTest.java
 ##########
 @@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterable;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobHolder;
+import org.easymock.EasyMock;
+import org.easymock.EasyMockSupport;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.net.URI;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Spliterators;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class AzureInputSourceTest extends EasyMockSupport
+{
+  private static final String CONTAINER_NAME = "container";
+  private static final String BLOB_NAME = "blob";
+  private static final URI PREFIX_URI;
+  private final List<URI> EMPTY_URIS = ImmutableList.of();
+  private final List<URI> EMPTY_PREFIXES = ImmutableList.of();
+  private final List<CloudObjectLocation> EMPTY_OBJECTS = ImmutableList.of();
+  private static final String CONTAINER = "CONTAINER";
+  private static final String BLOB_PATH = "BLOB_PATH";
+  private static final CloudObjectLocation CLOUD_OBJECT_LOCATION_1 = new CloudObjectLocation(CONTAINER, BLOB_PATH);
+
+  private AzureStorage storage;
+  private AzureEntityFactory entityFactory;
+  private AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  private InputSplit<CloudObjectLocation> inputSplit;
+  private AzureEntity azureEntity1;
+  private CloudBlobHolder cloudBlobDruid1;
+  private AzureCloudBlobIterable azureCloudBlobIterable;
+
+  private AzureInputSource azureInputSource;
+
+  static {
+    try {
+      PREFIX_URI = new URI(AzureInputSource.SCHEME + "://" + CONTAINER_NAME + "/" + BLOB_NAME);
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  @Before
+  public void setup()
+  {
+    storage = createMock(AzureStorage.class);
+    entityFactory = createMock(AzureEntityFactory.class);
+    inputSplit = createMock(InputSplit.class);
+    azureEntity1 = createMock(AzureEntity.class);
+    azureCloudBlobIterableFactory = createMock(AzureCloudBlobIterableFactory.class);
+    azureCloudBlobToLocationConverter = createMock(AzureCloudBlobHolderToCloudObjectLocationConverter.class);
+    cloudBlobDruid1 = createMock(CloudBlobHolder.class);
+    azureCloudBlobIterable = createMock(AzureCloudBlobIterable.class);
+  }
+
+  @Test(expected = IllegalArgumentException.class)
+  public void test_Constructor_emptyUrisEmptyPrefixesEmptyObjects_throwsIllegalArgumentException()
+  {
+    replayAll();
+    azureInputSource = new AzureInputSource(
+        storage,
+        entityFactory,
+        azureCloudBlobIterableFactory,
+        azureCloudBlobToLocationConverter,
+        EMPTY_URIS,
+        EMPTY_PREFIXES,
+        EMPTY_OBJECTS
+    );
 
 Review comment:
   Do we want tests for uris and prefixes or uris and objects, etc. Or is that covered by another test

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376674994
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobHolder;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+/**
+ * Abstracts the Azure storage system where input data is stored. Allows users to retrieve entities in
+ * the storage system that match either a particular uri, prefix, or object.
+ */
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  public static final String SCHEME = "azure";
+
+  private final AzureStorage storage;
+  private final AzureEntityFactory entityFactory;
+  private final AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private final AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  @JsonCreator
+  public AzureInputSource(
+      @JacksonInject AzureStorage storage,
+      @JacksonInject AzureEntityFactory entityFactory,
+      @JacksonInject AzureCloudBlobIterableFactory azureCloudBlobIterableFactory,
+      @JacksonInject AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter,
+      @JsonProperty("uris") @Nullable List<URI> uris,
+      @JsonProperty("prefixes") @Nullable List<URI> prefixes,
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+  )
+  {
+    super(SCHEME, uris, prefixes, objects);
+    this.storage = Preconditions.checkNotNull(storage, "AzureStorage");
+    this.entityFactory = Preconditions.checkNotNull(entityFactory, "AzureEntityFactory");
+    this.azureCloudBlobIterableFactory = Preconditions.checkNotNull(
+        azureCloudBlobIterableFactory,
+        "AzureCloudBlobIterableFactory"
+    );
+    this.azureCloudBlobToLocationConverter = Preconditions.checkNotNull(azureCloudBlobToLocationConverter, "AzureCloudBlobToLocationConverter");
+  }
+
+  @Override
+  public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
+  {
+    return new AzureInputSource(
+        storage,
+        entityFactory,
+        azureCloudBlobIterableFactory,
+        azureCloudBlobToLocationConverter,
+        null,
+        null,
+        ImmutableList.of(split.get())
+    );
+  }
+
+  @Override
+  public String toString()
+  {
+    return "AzureInputSource{" +
+           "uris=" + getUris() +
+           ", prefixes=" + getPrefixes() +
+           ", objects=" + getObjects() +
+           '}';
+  }
+
+  @Override
+  protected AzureEntity createEntity(InputSplit<CloudObjectLocation> split)
+  {
+    return entityFactory.create(split.get());
+  }
+
+  @Override
+  protected Stream<InputSplit<CloudObjectLocation>> getPrefixesSplitStream()
+  {
+    return StreamSupport.stream(getIterableObjectsFromPrefixes().spliterator(), false)
+                        .map(o -> azureCloudBlobToLocationConverter.createCloudObjectLocation(o))
 
 Review comment:
   This didnt work with Easymock

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376567796
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorage.java
 ##########
 @@ -117,4 +125,26 @@ private CloudBlobContainer getOrCreateCloudBlobContainer(final String containerN
 
     return cloudBlobContainer;
   }
+
+  public ResultSegment<ListBlobItem> listBlobsWithPrefixInContainerSegmented(
 
 Review comment:
   package private
   
   `@VisibleForTesting`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374390633
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureCloudBlobDruidToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobDruid;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  static final String SCHEME = "azure";
+
+  private final Logger log = new Logger(AzureInputSource.class);
+  private final AzureStorage storage;
+  private final AzureEntityFactory entityFactory;
+  private final AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private final AzureCloudBlobDruidToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  @JsonCreator
+  public AzureInputSource(
+      @JacksonInject AzureStorage storage,
+      @JacksonInject AzureEntityFactory entityFactory,
+      @JacksonInject AzureCloudBlobIterableFactory azureCloudBlobIterableFactory,
+      @JacksonInject AzureCloudBlobDruidToCloudObjectLocationConverter azureCloudBlobToLocationConverter,
+      @JsonProperty("uris") @Nullable List<URI> uris,
+      @JsonProperty("prefixes") @Nullable List<URI> prefixes,
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+  )
+  {
+    super(SCHEME, uris, prefixes, objects);
+    this.storage = Preconditions.checkNotNull(storage, "AzureStorage");
+    this.entityFactory = Preconditions.checkNotNull(entityFactory, "AzureEntityFactory");
+    this.azureCloudBlobIterableFactory = Preconditions.checkNotNull(
+        azureCloudBlobIterableFactory,
+        "AzureCloudBlobIterableFactory"
+    );
+    this.azureCloudBlobToLocationConverter = Preconditions.checkNotNull(azureCloudBlobToLocationConverter, "AzureCloudBlobToLocationConverter");
+  }
+
+  @Override
+  protected AzureEntity createEntity(InputSplit<CloudObjectLocation> split)
 
 Review comment:
   The javadocs for `InputSplit` say this is used for `FiniteFirehoseFactory` maybe we should update those docs?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374386283
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureEntity.java
 ##########
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.google.common.base.Predicate;
+import com.google.inject.assistedinject.Assisted;
+import com.google.inject.assistedinject.AssistedInject;
+import org.apache.druid.data.input.RetryingInputEntity;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureByteSource;
+import org.apache.druid.storage.azure.AzureByteSourceFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.AzureUtils;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+
+public class AzureEntity extends RetryingInputEntity
+{
+  private final Logger log = new Logger(AzureEntity.class);
+  private final CloudObjectLocation location;
+  private final AzureByteSource byteSource;
+
+  @AssistedInject
+  AzureEntity(
+      AzureStorage storage,
+      @Assisted CloudObjectLocation location,
 
 Review comment:
   Add `@NotNull` or is `EverythingNotNullByDefault` in this package? I didn't see that annotation used in this package

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376675974
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/data/input/azure/AzureInputSourceTest.java
 ##########
 @@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterable;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobHolder;
+import org.easymock.EasyMock;
+import org.easymock.EasyMockSupport;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.net.URI;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Spliterators;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class AzureInputSourceTest extends EasyMockSupport
+{
+  private static final String CONTAINER_NAME = "container";
+  private static final String BLOB_NAME = "blob";
+  private static final URI PREFIX_URI;
+  private final List<URI> EMPTY_URIS = ImmutableList.of();
+  private final List<URI> EMPTY_PREFIXES = ImmutableList.of();
+  private final List<CloudObjectLocation> EMPTY_OBJECTS = ImmutableList.of();
+  private static final String CONTAINER = "CONTAINER";
+  private static final String BLOB_PATH = "BLOB_PATH";
+  private static final CloudObjectLocation CLOUD_OBJECT_LOCATION_1 = new CloudObjectLocation(CONTAINER, BLOB_PATH);
+
+  private AzureStorage storage;
+  private AzureEntityFactory entityFactory;
+  private AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  private InputSplit<CloudObjectLocation> inputSplit;
+  private AzureEntity azureEntity1;
+  private CloudBlobHolder cloudBlobDruid1;
+  private AzureCloudBlobIterable azureCloudBlobIterable;
+
+  private AzureInputSource azureInputSource;
+
+  static {
+    try {
+      PREFIX_URI = new URI(AzureInputSource.SCHEME + "://" + CONTAINER_NAME + "/" + BLOB_NAME);
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  @Before
+  public void setup()
+  {
+    storage = createMock(AzureStorage.class);
+    entityFactory = createMock(AzureEntityFactory.class);
+    inputSplit = createMock(InputSplit.class);
+    azureEntity1 = createMock(AzureEntity.class);
+    azureCloudBlobIterableFactory = createMock(AzureCloudBlobIterableFactory.class);
+    azureCloudBlobToLocationConverter = createMock(AzureCloudBlobHolderToCloudObjectLocationConverter.class);
+    cloudBlobDruid1 = createMock(CloudBlobHolder.class);
+    azureCloudBlobIterable = createMock(AzureCloudBlobIterable.class);
+  }
+
+  @Test(expected = IllegalArgumentException.class)
+  public void test_Constructor_emptyUrisEmptyPrefixesEmptyObjects_throwsIllegalArgumentException()
 
 Review comment:
   fixed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376574080
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/storage/azure/AzureCloudBlobHolderToCloudObjectLocationConverterTest.java
 ##########
 @@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
+
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.easymock.EasyMock;
+import org.easymock.EasyMockSupport;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+public class AzureCloudBlobHolderToCloudObjectLocationConverterTest extends EasyMockSupport
+{
+  private static final String CONTAINER1 = "container1";
+  private static final String BLOB1 = "blob1";
+
+  private CloudBlobHolder cloudBlob;
+
+  private AzureCloudBlobHolderToCloudObjectLocationConverter converter;
+
+  @Before
+  public void setup()
+  {
+    cloudBlob = createMock(CloudBlobHolder.class);
 
 Review comment:
   just FYI - you can use `@RunWith(EasyMockRunner.class)` and then just annotate the variables in the class so you don't need to call `createMock` in the setUp method

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r377475837
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorage.java
 ##########
 @@ -117,4 +125,26 @@ private CloudBlobContainer getOrCreateCloudBlobContainer(final String containerN
 
     return cloudBlobContainer;
   }
+
+  public ResultSegment<ListBlobItem> listBlobsWithPrefixInContainerSegmented(
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r375527197
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorage.java
 ##########
 @@ -117,4 +123,26 @@ private CloudBlobContainer getOrCreateCloudBlobContainer(final String containerN
 
     return cloudBlobContainer;
   }
+
+  public ResultSegmentDruid<ListBlobItem> listBlobsWithPrefixInContainerSegmented(
 
 Review comment:
   Very difficult to do this since the CloudBlobContainer class is final. How do I mock in this case?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374387947
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureCloudBlobDruidToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobDruid;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
 
 Review comment:
   `@VisibleForTesting`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376828301
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/data/input/azure/AzureInputSourceTest.java
 ##########
 @@ -199,6 +201,19 @@ public void test_toString_returnsExpectedString()
     Assert.assertEquals("AzureInputSource{uris=[], prefixes=[azure://container/blob], objects=[]}", actualToString);
   }
 
+  @Test
+  public void abidesEqualsContract()
+  {
+    EqualsVerifier.forClass(AzureInputSource.class)
+                  .usingGetClass()
+                  .withPrefabValues(Logger.class, new Logger(AzureStorage.class), new Logger(AzureStorage.class))
 
 Review comment:
   Shouldn't `AzureStorage#log` be a static member?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376676204
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/storage/azure/AzureCloudBlobIteratorTest.java
 ##########
 @@ -0,0 +1,291 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
+
+import com.google.common.collect.ImmutableList;
+import com.microsoft.azure.storage.ResultContinuation;
+import com.microsoft.azure.storage.ResultSegment;
+import com.microsoft.azure.storage.blob.ListBlobItem;
+import org.apache.druid.java.util.common.RE;
+import org.easymock.EasyMock;
+import org.easymock.EasyMockSupport;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.NoSuchElementException;
+
+public class AzureCloudBlobIteratorTest extends EasyMockSupport
+{
+  private static final String AZURE = "azure";
+  private static final String CONTAINER1 = "container1";
+  private static final String PREFIX_ONLY_CLOUD_BLOBS = "prefixOnlyCloudBlobs";
+  private static final String PREFIX_WITH_NO_BLOBS = "prefixWithNoBlobs";
+  private static final String PREFIX_WITH_CLOUD_BLOBS_AND_DIRECTORIES = "prefixWithCloudBlobsAndDirectories";
+  private static final URI PREFIX_ONLY_CLOUD_BLOBS_URI;
+  private static final URI PREFIX_WITH_NO_BLOBS_URI;
+  private static final URI PREFIX_WITH_CLOUD_BLOBS_AND_DIRECTORIES_URI;
+  private static final List<URI> EMPTY_URI_PREFIXES = ImmutableList.of();
+  private static final List<URI> PREFIXES;
+  private static final int MAX_LISTING_LENGTH = 10;
+
+  private AzureStorage storage;
+  private ListBlobItemHolderFactory blobItemDruidFactory;
+  private ResultSegment<ListBlobItem> resultSegmentPrefixOnlyCloudBlobs1;
+  private ResultSegment<ListBlobItem> resultSegmentPrefixOnlyCloudBlobs2;
+  private ResultSegment<ListBlobItem> resultSegmentPrefixWithNoBlobs;
+  private ResultSegment<ListBlobItem> resultSegmentPrefixWithCloudBlobsAndDirectories;
+
+  private ResultContinuation resultContinuationPrefixOnlyCloudBlobs = new ResultContinuation();
+  private ResultContinuation nullResultContinuationToken = null;
+
+  private ListBlobItem blobItemPrefixWithOnlyCloudBlobs1;
+  private ListBlobItemHolder cloudBlobItemPrefixWithOnlyCloudBlobs1;
+  private CloudBlobHolder cloudBlobDruidPrefixWithOnlyCloudBlobs1;
+
+  private ListBlobItem blobItemPrefixWithOnlyCloudBlobs2;
+  private ListBlobItemHolder cloudBlobItemPrefixWithOnlyCloudBlobs2;
+  private CloudBlobHolder cloudBlobDruidPrefixWithOnlyCloudBlobs2;
+
+  private ListBlobItem blobItemPrefixWithCloudBlobsAndDirectories1;
+  private ListBlobItemHolder directoryItemPrefixWithCloudBlobsAndDirectories;
+
+  private ListBlobItem blobItemPrefixWithCloudBlobsAndDirectories2;
+  private ListBlobItemHolder cloudBlobItemPrefixWithCloudBlobsAndDirectories;
+  private CloudBlobHolder cloudBlobDruidPrefixWithCloudBlobsAndDirectories;
+
+  private ListBlobItem blobItemPrefixWithCloudBlobsAndDirectories3;
+  private ListBlobItemHolder directoryItemPrefixWithCloudBlobsAndDirectories3;
+
+
+  private AzureCloudBlobIterator azureCloudBlobIterator;
+
+  static {
+    try {
+      PREFIX_ONLY_CLOUD_BLOBS_URI = new URI(AZURE + "://" + CONTAINER1 + "/" + PREFIX_ONLY_CLOUD_BLOBS);
+      PREFIX_WITH_NO_BLOBS_URI = new URI(AZURE + "://" + CONTAINER1 + "/" + PREFIX_WITH_NO_BLOBS);
+      PREFIX_WITH_CLOUD_BLOBS_AND_DIRECTORIES_URI = new URI(AZURE
+                                                            + "://"
+                                                            + CONTAINER1
+                                                            + "/"
+                                                            + PREFIX_WITH_CLOUD_BLOBS_AND_DIRECTORIES);
+      PREFIXES = ImmutableList.of(
+          PREFIX_ONLY_CLOUD_BLOBS_URI,
+          PREFIX_WITH_NO_BLOBS_URI,
+          PREFIX_WITH_CLOUD_BLOBS_AND_DIRECTORIES_URI
+      );
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  @Before
+  public void setup()
+  {
+    storage = createMock(AzureStorage.class);
+    resultSegmentPrefixOnlyCloudBlobs1 = createMock(ResultSegment.class);
+    resultSegmentPrefixOnlyCloudBlobs2 = createMock(ResultSegment.class);
+    resultSegmentPrefixWithNoBlobs = createMock(ResultSegment.class);
+    resultSegmentPrefixWithCloudBlobsAndDirectories = createMock(ResultSegment.class);
+    cloudBlobItemPrefixWithOnlyCloudBlobs1 = createMock(ListBlobItemHolder.class);
+
+    blobItemPrefixWithOnlyCloudBlobs1 = createMock(ListBlobItem.class);
+    cloudBlobItemPrefixWithOnlyCloudBlobs1 = createMock(ListBlobItemHolder.class);
+    cloudBlobDruidPrefixWithOnlyCloudBlobs1 = createMock(CloudBlobHolder.class);
+
+    blobItemPrefixWithOnlyCloudBlobs2 = createMock(ListBlobItem.class);
+    cloudBlobItemPrefixWithOnlyCloudBlobs2 = createMock(ListBlobItemHolder.class);
+    cloudBlobDruidPrefixWithOnlyCloudBlobs2 = createMock(CloudBlobHolder.class);
+
+    blobItemPrefixWithCloudBlobsAndDirectories1 = createMock(ListBlobItem.class);
+    directoryItemPrefixWithCloudBlobsAndDirectories = createMock(ListBlobItemHolder.class);
+
+    blobItemPrefixWithCloudBlobsAndDirectories2 = createMock(ListBlobItem.class);
+    cloudBlobItemPrefixWithCloudBlobsAndDirectories = createMock(ListBlobItemHolder.class);
+    cloudBlobDruidPrefixWithCloudBlobsAndDirectories = createMock(CloudBlobHolder.class);
+
+    blobItemPrefixWithCloudBlobsAndDirectories3 = createMock(ListBlobItem.class);
+    directoryItemPrefixWithCloudBlobsAndDirectories3 = createMock(ListBlobItemHolder.class);
+
+
+    blobItemDruidFactory = createMock(ListBlobItemHolderFactory.class);
+  }
+
+  @Test
+  public void test_hasNext_noBlobs_returnsFalse()
+  {
+    azureCloudBlobIterator = new AzureCloudBlobIterator(
+        storage,
+        blobItemDruidFactory,
+        EMPTY_URI_PREFIXES,
+        MAX_LISTING_LENGTH
+    );
+    boolean hasNext = azureCloudBlobIterator.hasNext();
+    Assert.assertFalse(hasNext);
+  }
+
+  @Test
+  public void test_next_prefixesWithMultipleBlobsAndSomeDirectories_returnsExpectedBlobs() throws Exception
+  {
+    EasyMock.expect(cloudBlobItemPrefixWithOnlyCloudBlobs1.isCloudBlob()).andReturn(true);
+    EasyMock.expect(cloudBlobItemPrefixWithOnlyCloudBlobs1.getCloudBlob()).andReturn(
+        cloudBlobDruidPrefixWithOnlyCloudBlobs1);
+    EasyMock.expect(blobItemDruidFactory.create(blobItemPrefixWithOnlyCloudBlobs1)).andReturn(
+        cloudBlobItemPrefixWithOnlyCloudBlobs1);
+
+    EasyMock.expect(cloudBlobItemPrefixWithOnlyCloudBlobs2.isCloudBlob()).andReturn(true);
+    EasyMock.expect(cloudBlobItemPrefixWithOnlyCloudBlobs2.getCloudBlob()).andReturn(
+        cloudBlobDruidPrefixWithOnlyCloudBlobs2);
+    EasyMock.expect(blobItemDruidFactory.create(blobItemPrefixWithOnlyCloudBlobs2)).andReturn(
+        cloudBlobItemPrefixWithOnlyCloudBlobs2);
+
+    EasyMock.expect(directoryItemPrefixWithCloudBlobsAndDirectories.isCloudBlob()).andReturn(false);
+    EasyMock.expect(blobItemDruidFactory.create(blobItemPrefixWithCloudBlobsAndDirectories1)).andReturn(
+        directoryItemPrefixWithCloudBlobsAndDirectories);
+
+    EasyMock.expect(cloudBlobItemPrefixWithCloudBlobsAndDirectories.isCloudBlob()).andReturn(true);
+    EasyMock.expect(cloudBlobItemPrefixWithCloudBlobsAndDirectories.getCloudBlob()).andReturn(
+        cloudBlobDruidPrefixWithCloudBlobsAndDirectories);
+    EasyMock.expect(blobItemDruidFactory.create(blobItemPrefixWithCloudBlobsAndDirectories2)).andReturn(
+        cloudBlobItemPrefixWithCloudBlobsAndDirectories);
+
+    EasyMock.expect(directoryItemPrefixWithCloudBlobsAndDirectories3.isCloudBlob()).andReturn(false);
+    EasyMock.expect(blobItemDruidFactory.create(blobItemPrefixWithCloudBlobsAndDirectories3)).andReturn(
+        directoryItemPrefixWithCloudBlobsAndDirectories3);
+
+    ArrayList<ListBlobItem> resultBlobItemsPrefixWithOnlyCloudBlobs1 = new ArrayList<>();
+    resultBlobItemsPrefixWithOnlyCloudBlobs1.add(blobItemPrefixWithOnlyCloudBlobs1);
+    ArrayList<ListBlobItem> resultBlobItemsPrefixWithOnlyCloudBlobs2 = new ArrayList<>();
+    resultBlobItemsPrefixWithOnlyCloudBlobs2.add(blobItemPrefixWithOnlyCloudBlobs2);
+    ArrayList<ListBlobItem> resultBlobItemsPrefixWithNoBlobs = new ArrayList<>();
+    ArrayList<ListBlobItem> resultBlobItemsPrefixWithCloudBlobsAndDirectories = new ArrayList<>();
+    resultBlobItemsPrefixWithCloudBlobsAndDirectories.add(blobItemPrefixWithCloudBlobsAndDirectories1);
+    resultBlobItemsPrefixWithCloudBlobsAndDirectories.add(blobItemPrefixWithCloudBlobsAndDirectories2);
+    resultBlobItemsPrefixWithCloudBlobsAndDirectories.add(blobItemPrefixWithCloudBlobsAndDirectories3);
+    EasyMock.expect(resultSegmentPrefixOnlyCloudBlobs1.getContinuationToken())
+            .andReturn(resultContinuationPrefixOnlyCloudBlobs);
+    EasyMock.expect(resultSegmentPrefixOnlyCloudBlobs1.getResults())
+            .andReturn(resultBlobItemsPrefixWithOnlyCloudBlobs1);
+
+    EasyMock.expect(resultSegmentPrefixOnlyCloudBlobs2.getContinuationToken()).andReturn(nullResultContinuationToken);
+    EasyMock.expect(resultSegmentPrefixOnlyCloudBlobs2.getResults())
+            .andReturn(resultBlobItemsPrefixWithOnlyCloudBlobs2);
+
+    EasyMock.expect(resultSegmentPrefixWithNoBlobs.getContinuationToken()).andReturn(nullResultContinuationToken);
+    EasyMock.expect(resultSegmentPrefixWithNoBlobs.getResults()).andReturn(resultBlobItemsPrefixWithNoBlobs);
+
+    EasyMock.expect(resultSegmentPrefixWithCloudBlobsAndDirectories.getContinuationToken())
+            .andReturn(nullResultContinuationToken);
+    EasyMock.expect(resultSegmentPrefixWithCloudBlobsAndDirectories.getResults())
+            .andReturn(resultBlobItemsPrefixWithCloudBlobsAndDirectories);
+
+    EasyMock.expect(storage.listBlobsWithPrefixInContainerSegmented(
+        CONTAINER1,
+        PREFIX_ONLY_CLOUD_BLOBS,
+        nullResultContinuationToken,
+        MAX_LISTING_LENGTH
+    )).andReturn(resultSegmentPrefixOnlyCloudBlobs1);
+
+
+    EasyMock.expect(storage.listBlobsWithPrefixInContainerSegmented(
+        CONTAINER1,
+        PREFIX_ONLY_CLOUD_BLOBS,
+        resultContinuationPrefixOnlyCloudBlobs,
+        MAX_LISTING_LENGTH
+    )).andReturn(resultSegmentPrefixOnlyCloudBlobs2);
+
+    EasyMock.expect(storage.listBlobsWithPrefixInContainerSegmented(
+        CONTAINER1,
+        PREFIX_WITH_NO_BLOBS,
+        nullResultContinuationToken,
+        MAX_LISTING_LENGTH
+    )).andReturn(resultSegmentPrefixWithNoBlobs);
+
+    EasyMock.expect(storage.listBlobsWithPrefixInContainerSegmented(
+        CONTAINER1,
+        PREFIX_WITH_CLOUD_BLOBS_AND_DIRECTORIES,
+        nullResultContinuationToken,
+        MAX_LISTING_LENGTH
+    )).andReturn(resultSegmentPrefixWithCloudBlobsAndDirectories);
+
+    replayAll();
+
+    azureCloudBlobIterator = new AzureCloudBlobIterator(
+        storage,
+        blobItemDruidFactory,
+        PREFIXES,
+        MAX_LISTING_LENGTH
+    );
+
+    List<CloudBlobHolder> expectedBlobItems = ImmutableList.of(
+        cloudBlobDruidPrefixWithOnlyCloudBlobs1,
+        cloudBlobDruidPrefixWithOnlyCloudBlobs2,
+        cloudBlobDruidPrefixWithCloudBlobsAndDirectories
+    );
+    List<CloudBlobHolder> actualBlobItems = new ArrayList<>();
+    while (azureCloudBlobIterator.hasNext()) {
+      actualBlobItems.add(azureCloudBlobIterator.next());
+    }
+    Assert.assertEquals(expectedBlobItems.size(), actualBlobItems.size());
+    Assert.assertTrue(expectedBlobItems.containsAll(actualBlobItems));
+    verifyAll();
+  }
+
+  @Test(expected = NoSuchElementException.class)
+  public void test_next_emptyPrefixes_throwsNoSuchElementException()
+  {
+    azureCloudBlobIterator = new AzureCloudBlobIterator(
+        storage,
+        blobItemDruidFactory,
+        EMPTY_URI_PREFIXES,
+        MAX_LISTING_LENGTH
+    );
+    azureCloudBlobIterator.next();
+  }
+
+  @Test(expected = RE.class)
+  public void test_fetchNextBatch_exceptionThrownInStorage_throwsREException() throws Exception
+  {
+    EasyMock.expect(storage.listBlobsWithPrefixInContainerSegmented(
+        EasyMock.anyString(),
+        EasyMock.anyString(),
+        EasyMock.anyObject(),
+        EasyMock.anyInt()
+    )).andThrow(new URISyntaxException("", ""));
+    azureCloudBlobIterator = new AzureCloudBlobIterator(
+        storage,
+        blobItemDruidFactory,
+        PREFIXES,
+        MAX_LISTING_LENGTH
+    );
+  }
+
 
 Review comment:
   There is a test that verifies we've read all the blobs from a prefix if there is a continuation token, that verifies a prefix that has nothing in it, and a test that verified a prefix that contains both directories and blobs.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r377994641
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorage.java
 ##########
 @@ -33,10 +36,15 @@
 import java.io.InputStream;
 import java.net.URISyntaxException;
 import java.util.ArrayList;
+import java.util.EnumSet;
 import java.util.List;
 
+/**
+ * Abstracts the Azure storage layer. Makes direct calls to Azure file system.
+ */
 public class AzureStorage
 {
+  private static final boolean USE_FLAT_BLOB_LISTING = true;
 
   private final Logger log = new Logger(AzureStorage.class);
 
 Review comment:
   Will fix this as part of a subsequent change to this area of the code.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374386332
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureEntity.java
 ##########
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.google.common.base.Predicate;
+import com.google.inject.assistedinject.Assisted;
+import com.google.inject.assistedinject.AssistedInject;
+import org.apache.druid.data.input.RetryingInputEntity;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureByteSource;
+import org.apache.druid.storage.azure.AzureByteSourceFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.AzureUtils;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+
+public class AzureEntity extends RetryingInputEntity
+{
+  private final Logger log = new Logger(AzureEntity.class);
+  private final CloudObjectLocation location;
+  private final AzureByteSource byteSource;
+
+  @AssistedInject
+  AzureEntity(
+      AzureStorage storage,
 
 Review comment:
   storage isn't used

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376567470
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorage.java
 ##########
 @@ -49,7 +57,7 @@ public AzureStorage(
     this.cloudBlobClient = cloudBlobClient;
   }
 
-  public List<String> emptyCloudBlobDirectory(final String containerName, final String virtualDirPath)
+  public List<String> emptyCloudBlobDirectory(String containerName, final String virtualDirPath)
 
 Review comment:
   we're not changing the containerName in this function are we? I think it's confusing if one parameter is marked as final and the other isn't when both are in fact final

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on issue #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on issue #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#issuecomment-582590927
 
 
   > Can we also add a new tile for Azure data lake in Druid's web console?
   
   @fanjieqi thanks, I've just opened an internal jira issue to track this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376570724
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/ListBlobItemHolder.java
 ##########
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
 
 Review comment:
   nit: Move `CloudBlobHolder` and `ListBlobItemHolder` and their Factories to ...`stoarge.azure.blob` so that it's out of this package which has other classes that are relevant to how the extension works

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on issue #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#issuecomment-582607815
 
 
   Opened https://github.com/apache/druid/issues/9316.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374384041
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureEntity.java
 ##########
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.google.common.base.Predicate;
+import com.google.inject.assistedinject.Assisted;
+import com.google.inject.assistedinject.AssistedInject;
+import org.apache.druid.data.input.RetryingInputEntity;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureByteSource;
+import org.apache.druid.storage.azure.AzureByteSourceFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.AzureUtils;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+
+public class AzureEntity extends RetryingInputEntity
 
 Review comment:
   javadocs please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376557502
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobHolder;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+/**
+ * Abstracts the Azure storage system where input data is stored. Allows users to retrieve entities in
+ * the storage system that match either a particular uri, prefix, or object.
+ */
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  public static final String SCHEME = "azure";
+
+  private final AzureStorage storage;
+  private final AzureEntityFactory entityFactory;
+  private final AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private final AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  @JsonCreator
+  public AzureInputSource(
+      @JacksonInject AzureStorage storage,
+      @JacksonInject AzureEntityFactory entityFactory,
+      @JacksonInject AzureCloudBlobIterableFactory azureCloudBlobIterableFactory,
+      @JacksonInject AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter,
+      @JsonProperty("uris") @Nullable List<URI> uris,
+      @JsonProperty("prefixes") @Nullable List<URI> prefixes,
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+  )
+  {
+    super(SCHEME, uris, prefixes, objects);
+    this.storage = Preconditions.checkNotNull(storage, "AzureStorage");
+    this.entityFactory = Preconditions.checkNotNull(entityFactory, "AzureEntityFactory");
+    this.azureCloudBlobIterableFactory = Preconditions.checkNotNull(
+        azureCloudBlobIterableFactory,
+        "AzureCloudBlobIterableFactory"
+    );
+    this.azureCloudBlobToLocationConverter = Preconditions.checkNotNull(azureCloudBlobToLocationConverter, "AzureCloudBlobToLocationConverter");
+  }
+
+  @Override
+  public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
+  {
+    return new AzureInputSource(
+        storage,
+        entityFactory,
+        azureCloudBlobIterableFactory,
+        azureCloudBlobToLocationConverter,
+        null,
+        null,
+        ImmutableList.of(split.get())
+    );
+  }
+
+  @Override
+  public String toString()
+  {
+    return "AzureInputSource{" +
+           "uris=" + getUris() +
+           ", prefixes=" + getPrefixes() +
+           ", objects=" + getObjects() +
+           '}';
+  }
+
+  @Override
+  protected AzureEntity createEntity(InputSplit<CloudObjectLocation> split)
+  {
+    return entityFactory.create(split.get());
+  }
+
+  @Override
+  protected Stream<InputSplit<CloudObjectLocation>> getPrefixesSplitStream()
+  {
+    return StreamSupport.stream(getIterableObjectsFromPrefixes().spliterator(), false)
+                        .map(o -> azureCloudBlobToLocationConverter.createCloudObjectLocation(o))
 
 Review comment:
   nit: intelliJ recommends using a method reference `azureCloudBlobToLocationConverter::createCloudObjectLocation`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374388544
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureCloudBlobDruidToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobDruid;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
 
 Review comment:
   javadocs please - I'll stop asking for the rest of this review

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r375534420
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorage.java
 ##########
 @@ -117,4 +123,26 @@ private CloudBlobContainer getOrCreateCloudBlobContainer(final String containerN
 
     return cloudBlobContainer;
   }
+
+  public ResultSegmentDruid<ListBlobItem> listBlobsWithPrefixInContainerSegmented(
 
 Review comment:
   :( so many final classes in azure - maybe one day we can test with PowerMock 💪 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374405474
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureUtils.java
 ##########
 @@ -46,6 +48,11 @@
     return false;
   };
 
+  public static String extractAzureKey(URI uri)
 
 Review comment:
   javadoc for utility function

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376866366
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/storage/azure/AzureStorageDruidModuleTest.java
 ##########
 @@ -101,6 +126,108 @@ public void test_getAzureStorageContainer_expectedClient()
     Assert.assertSame(cloudBlobClient, azureStorage.getCloudBlobClient());
   }
 
+  @Test
+  public void test_getAzureCloudBlobToLocationConverter_expectedConverted()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobLocationConverter1 = injector.getInstance(
+        AzureCloudBlobHolderToCloudObjectLocationConverter.class);
+    AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobLocationConverter2 = injector.getInstance(
+        AzureCloudBlobHolderToCloudObjectLocationConverter.class);
+    Assert.assertSame(azureCloudBlobLocationConverter1, azureCloudBlobLocationConverter2);
+  }
+
+  @Test
+  public void test_getAzureByteSourceFactory_canCreateAzureByteSource()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    AzureByteSourceFactory factory = injector.getInstance(AzureByteSourceFactory.class);
+    Object object1 = factory.create("container1", "blob1");
+    Object object2 = factory.create("container2", "blob2");
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
+  @Test
+  public void test_getAzureEntityFactory_canCreateAzureEntity()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+
+    EasyMock.expect(cloudObjectLocation1.getBucket()).andReturn(AZURE_CONTAINER);
+    EasyMock.expect(cloudObjectLocation2.getBucket()).andReturn(AZURE_CONTAINER);
+    EasyMock.expect(cloudObjectLocation1.getPath()).andReturn(PATH);
+    EasyMock.expect(cloudObjectLocation2.getPath()).andReturn(PATH);
+    replayAll();
+
+    injector = makeInjectorWithProperties(props);
+    AzureEntityFactory factory = injector.getInstance(AzureEntityFactory.class);
+    Object object1 = factory.create(cloudObjectLocation1);
+    Object object2 = factory.create(cloudObjectLocation2);
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
+  @Test
+  public void test_getAzureCloudBlobIteratorFactory_canCreateAzureCloudBlobIterator()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    AzureCloudBlobIteratorFactory factory = injector.getInstance(AzureCloudBlobIteratorFactory.class);
+    Object object1 = factory.create(EMPTY_PREFIXES_ITERABLE, 10);
+    Object object2 = factory.create(EMPTY_PREFIXES_ITERABLE, 10);
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
+  @Test
+  public void test_getAzureCloudBlobIterableFactory_canCreateAzureCloudBlobIterable()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    AzureCloudBlobIterableFactory factory = injector.getInstance(AzureCloudBlobIterableFactory.class);
+    AzureCloudBlobIterable object1 = factory.create(EMPTY_PREFIXES_ITERABLE, 10);
+    AzureCloudBlobIterable object2 = factory.create(EMPTY_PREFIXES_ITERABLE, 10);
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
+  @Test
+  public void test_getListBlobItemDruidFactory_canCreateListBlobItemDruid()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    ListBlobItemHolderFactory factory = injector.getInstance(ListBlobItemHolderFactory.class);
+    ListBlobItemHolder object1 = factory.create(blobItem1);
+    ListBlobItemHolder object2 = factory.create(blobItem2);
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
 
 Review comment:
   How to do this? There is no property in the class or subclass for this. Should I add one just for this test?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r375996122
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureCloudBlobDruidToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobDruid;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  static final String SCHEME = "azure";
+
+  private final Logger log = new Logger(AzureInputSource.class);
+  private final AzureStorage storage;
+  private final AzureEntityFactory entityFactory;
+  private final AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private final AzureCloudBlobDruidToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  @JsonCreator
+  public AzureInputSource(
+      @JacksonInject AzureStorage storage,
+      @JacksonInject AzureEntityFactory entityFactory,
+      @JacksonInject AzureCloudBlobIterableFactory azureCloudBlobIterableFactory,
+      @JacksonInject AzureCloudBlobDruidToCloudObjectLocationConverter azureCloudBlobToLocationConverter,
+      @JsonProperty("uris") @Nullable List<URI> uris,
+      @JsonProperty("prefixes") @Nullable List<URI> prefixes,
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+  )
+  {
+    super(SCHEME, uris, prefixes, objects);
+    this.storage = Preconditions.checkNotNull(storage, "AzureStorage");
+    this.entityFactory = Preconditions.checkNotNull(entityFactory, "AzureEntityFactory");
+    this.azureCloudBlobIterableFactory = Preconditions.checkNotNull(
+        azureCloudBlobIterableFactory,
+        "AzureCloudBlobIterableFactory"
+    );
+    this.azureCloudBlobToLocationConverter = Preconditions.checkNotNull(azureCloudBlobToLocationConverter, "AzureCloudBlobToLocationConverter");
+  }
+
+  @Override
+  protected AzureEntity createEntity(InputSplit<CloudObjectLocation> split)
+  {
+    return entityFactory.create(split.get());
 
 Review comment:
   It makes it a lot easier to test

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r375995142
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureEntity.java
 ##########
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.google.common.base.Predicate;
+import com.google.inject.assistedinject.Assisted;
+import com.google.inject.assistedinject.AssistedInject;
+import org.apache.druid.data.input.RetryingInputEntity;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureByteSource;
+import org.apache.druid.storage.azure.AzureByteSourceFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.AzureUtils;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+
+public class AzureEntity extends RetryingInputEntity
+{
+  private final Logger log = new Logger(AzureEntity.class);
+  private final CloudObjectLocation location;
+  private final AzureByteSource byteSource;
+
+  @AssistedInject
+  AzureEntity(
+      AzureStorage storage,
+      @Assisted CloudObjectLocation location,
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376532892
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobHolder;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+/**
+ * Abstracts the Azure storage system where input data is stored. Allows users to retrieve entities in
+ * the storage system that match either a particular uri, prefix, or object.
+ */
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  public static final String SCHEME = "azure";
+
+  private final AzureStorage storage;
+  private final AzureEntityFactory entityFactory;
+  private final AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private final AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  @JsonCreator
+  public AzureInputSource(
+      @JacksonInject AzureStorage storage,
+      @JacksonInject AzureEntityFactory entityFactory,
+      @JacksonInject AzureCloudBlobIterableFactory azureCloudBlobIterableFactory,
+      @JacksonInject AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter,
+      @JsonProperty("uris") @Nullable List<URI> uris,
+      @JsonProperty("prefixes") @Nullable List<URI> prefixes,
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+  )
+  {
+    super(SCHEME, uris, prefixes, objects);
+    this.storage = Preconditions.checkNotNull(storage, "AzureStorage");
+    this.entityFactory = Preconditions.checkNotNull(entityFactory, "AzureEntityFactory");
+    this.azureCloudBlobIterableFactory = Preconditions.checkNotNull(
+        azureCloudBlobIterableFactory,
+        "AzureCloudBlobIterableFactory"
+    );
+    this.azureCloudBlobToLocationConverter = Preconditions.checkNotNull(azureCloudBlobToLocationConverter, "AzureCloudBlobToLocationConverter");
+  }
+
+  @Override
+  public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
+  {
+    return new AzureInputSource(
+        storage,
+        entityFactory,
+        azureCloudBlobIterableFactory,
+        azureCloudBlobToLocationConverter,
+        null,
+        null,
+        ImmutableList.of(split.get())
+    );
+  }
+
+  @Override
+  public String toString()
+  {
+    return "AzureInputSource{" +
+           "uris=" + getUris() +
+           ", prefixes=" + getPrefixes() +
+           ", objects=" + getObjects() +
+           '}';
+  }
+
+  @Override
+  protected AzureEntity createEntity(InputSplit<CloudObjectLocation> split)
+  {
+    return entityFactory.create(split.get());
+  }
+
+  @Override
+  protected Stream<InputSplit<CloudObjectLocation>> getPrefixesSplitStream()
+  {
+    return StreamSupport.stream(getIterableObjectsFromPrefixes().spliterator(), false)
+                        .map(o -> azureCloudBlobToLocationConverter.createCloudObjectLocation(o))
+                        .map(InputSplit::new);
+  }
+
+  private Iterable<CloudBlobHolder> getIterableObjectsFromPrefixes()
+  {
+    return azureCloudBlobIterableFactory.create(getPrefixes(), MAX_LISTING_LENGTH);
+  }
+}
 
 Review comment:
   missing serdeTest?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r377475772
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/ListBlobItemHolder.java
 ##########
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376018274
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/CloudBlobDruid.java
 ##########
 @@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
+
+import com.microsoft.azure.storage.StorageException;
+import com.microsoft.azure.storage.blob.CloudBlob;
+
+import java.net.URISyntaxException;
+
+/**
+ * Wrapper for {@link CloudBlob}. Used to make testing easier, since {@link CloudBlob}
+ * is a final class and so is difficult to mock in unit tests.
+ */
+public class CloudBlobDruid
 
 Review comment:
   nit: the suffix of `Druid` seems weird to me. Maybe because.. everything in Druid repo are druid? I suggest `CloudBlobHolder`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376675212
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureByteSourceFactory.java
 ##########
 @@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
+
+import com.google.inject.assistedinject.Assisted;
+
+public interface AzureByteSourceFactory
+{
+  AzureByteSource create(@Assisted("containerName") String containerName, @Assisted("blobPath") String blobPath);
 
 Review comment:
   Does it hurt? Also I think you do need since both parameters are the same type.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374401256
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureByteSourceFactory.java
 ##########
 @@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
+
+import com.google.inject.assistedinject.Assisted;
+
+public interface AzureByteSourceFactory
+{
+  AzureByteSource create(@Assisted("containerName") String containerName, @Assisted("blobPath") String blobPath);
 
 Review comment:
   You don't need the assisted annotations here

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r377475867
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorage.java
 ##########
 @@ -49,7 +57,7 @@ public AzureStorage(
     this.cloudBlobClient = cloudBlobClient;
   }
 
-  public List<String> emptyCloudBlobDirectory(final String containerName, final String virtualDirPath)
+  public List<String> emptyCloudBlobDirectory(String containerName, final String virtualDirPath)
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376039538
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureCloudBlobDruidToCloudObjectLocationConverter.java
 ##########
 @@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
+
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+
+public class AzureCloudBlobDruidToCloudObjectLocationConverter
+    implements ICloudSpecificObjectToCloudObjectLocationConverter<CloudBlobDruid>
+{
+  @Override
+  public CloudObjectLocation createCloudObjectLocation(CloudBlobDruid cloudBlob)
+  {
+    try {
+      return new CloudObjectLocation(cloudBlob.getContainerName(), cloudBlob.getName());
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
 
 Review comment:
   hmm, I think those are handled at higher level.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh edited a comment on issue #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh edited a comment on issue #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#issuecomment-582590927
 
 
   > Can we also add a new tile for Azure data lake in Druid's web console?
   
   @fjy thanks, I've just opened an internal jira issue to track this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374403687
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorage.java
 ##########
 @@ -117,4 +123,26 @@ private CloudBlobContainer getOrCreateCloudBlobContainer(final String containerN
 
     return cloudBlobContainer;
   }
+
+  public ResultSegmentDruid<ListBlobItem> listBlobsWithPrefixInContainerSegmented(
 
 Review comment:
   Do we want a unit test to make sure we're using flat blob listing?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374403992
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorage.java
 ##########
 @@ -117,4 +123,26 @@ private CloudBlobContainer getOrCreateCloudBlobContainer(final String containerN
 
     return cloudBlobContainer;
   }
+
+  public ResultSegmentDruid<ListBlobItem> listBlobsWithPrefixInContainerSegmented(
+      final String containerName,
+      final String prefix,
+      ResultContinuation continuationToken,
+      int maxResults
+  ) throws StorageException, URISyntaxException
+  {
+    CloudBlobContainer cloudBlobContainer = cloudBlobClient.getContainerReference(containerName);
+    return new ResultSegmentDruid<ListBlobItem>(cloudBlobContainer
+                                                    .listBlobsSegmented(
+                                                        prefix,
+                                                        /* Use flat blob listing here so that we get only blob types and not directories.*/
+                                                        USE_FLAT_BLOB_LISTING,
+                                                        EnumSet
+                                                            .noneOf(BlobListingDetails.class),
+                                                        maxResults,
+                                                        (ResultContinuation) null,
+                                                        (BlobRequestOptions) null,
+                                                        (OperationContext) null
 
 Review comment:
   I don't think you need to cast nulls

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r377993114
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureByteSource.java
 ##########
 @@ -46,11 +53,19 @@ public AzureByteSource(
 
   @Override
   public InputStream openStream() throws IOException
+  {
+    return openStream(0L);
+  }
+
+  public InputStream openStream(long offset) throws IOException
   {
     try {
-      return azureStorage.getBlobInputStream(containerName, blobPath);
+      return azureStorage.getBlobInputStream(offset, containerName, blobPath);
     }
     catch (StorageException | URISyntaxException e) {
+      log.warn("Exception when opening stream to azure resource, containerName: %s, blobPath: %s, Error: %s",
 
 Review comment:
   The log in RetryingInputStream does not have information about the container or blob

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376566089
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/storage/azure/AzureCloudBlobIteratorTest.java
 ##########
 @@ -0,0 +1,291 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
+
+import com.google.common.collect.ImmutableList;
+import com.microsoft.azure.storage.ResultContinuation;
+import com.microsoft.azure.storage.ResultSegment;
+import com.microsoft.azure.storage.blob.ListBlobItem;
+import org.apache.druid.java.util.common.RE;
+import org.easymock.EasyMock;
+import org.easymock.EasyMockSupport;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.NoSuchElementException;
+
+public class AzureCloudBlobIteratorTest extends EasyMockSupport
+{
+  private static final String AZURE = "azure";
+  private static final String CONTAINER1 = "container1";
+  private static final String PREFIX_ONLY_CLOUD_BLOBS = "prefixOnlyCloudBlobs";
+  private static final String PREFIX_WITH_NO_BLOBS = "prefixWithNoBlobs";
+  private static final String PREFIX_WITH_CLOUD_BLOBS_AND_DIRECTORIES = "prefixWithCloudBlobsAndDirectories";
+  private static final URI PREFIX_ONLY_CLOUD_BLOBS_URI;
+  private static final URI PREFIX_WITH_NO_BLOBS_URI;
+  private static final URI PREFIX_WITH_CLOUD_BLOBS_AND_DIRECTORIES_URI;
+  private static final List<URI> EMPTY_URI_PREFIXES = ImmutableList.of();
+  private static final List<URI> PREFIXES;
+  private static final int MAX_LISTING_LENGTH = 10;
+
+  private AzureStorage storage;
+  private ListBlobItemHolderFactory blobItemDruidFactory;
+  private ResultSegment<ListBlobItem> resultSegmentPrefixOnlyCloudBlobs1;
+  private ResultSegment<ListBlobItem> resultSegmentPrefixOnlyCloudBlobs2;
+  private ResultSegment<ListBlobItem> resultSegmentPrefixWithNoBlobs;
+  private ResultSegment<ListBlobItem> resultSegmentPrefixWithCloudBlobsAndDirectories;
+
+  private ResultContinuation resultContinuationPrefixOnlyCloudBlobs = new ResultContinuation();
+  private ResultContinuation nullResultContinuationToken = null;
+
+  private ListBlobItem blobItemPrefixWithOnlyCloudBlobs1;
+  private ListBlobItemHolder cloudBlobItemPrefixWithOnlyCloudBlobs1;
+  private CloudBlobHolder cloudBlobDruidPrefixWithOnlyCloudBlobs1;
+
+  private ListBlobItem blobItemPrefixWithOnlyCloudBlobs2;
+  private ListBlobItemHolder cloudBlobItemPrefixWithOnlyCloudBlobs2;
+  private CloudBlobHolder cloudBlobDruidPrefixWithOnlyCloudBlobs2;
+
+  private ListBlobItem blobItemPrefixWithCloudBlobsAndDirectories1;
+  private ListBlobItemHolder directoryItemPrefixWithCloudBlobsAndDirectories;
+
+  private ListBlobItem blobItemPrefixWithCloudBlobsAndDirectories2;
+  private ListBlobItemHolder cloudBlobItemPrefixWithCloudBlobsAndDirectories;
+  private CloudBlobHolder cloudBlobDruidPrefixWithCloudBlobsAndDirectories;
+
+  private ListBlobItem blobItemPrefixWithCloudBlobsAndDirectories3;
+  private ListBlobItemHolder directoryItemPrefixWithCloudBlobsAndDirectories3;
+
+
+  private AzureCloudBlobIterator azureCloudBlobIterator;
+
+  static {
+    try {
+      PREFIX_ONLY_CLOUD_BLOBS_URI = new URI(AZURE + "://" + CONTAINER1 + "/" + PREFIX_ONLY_CLOUD_BLOBS);
+      PREFIX_WITH_NO_BLOBS_URI = new URI(AZURE + "://" + CONTAINER1 + "/" + PREFIX_WITH_NO_BLOBS);
+      PREFIX_WITH_CLOUD_BLOBS_AND_DIRECTORIES_URI = new URI(AZURE
+                                                            + "://"
+                                                            + CONTAINER1
+                                                            + "/"
+                                                            + PREFIX_WITH_CLOUD_BLOBS_AND_DIRECTORIES);
+      PREFIXES = ImmutableList.of(
+          PREFIX_ONLY_CLOUD_BLOBS_URI,
+          PREFIX_WITH_NO_BLOBS_URI,
+          PREFIX_WITH_CLOUD_BLOBS_AND_DIRECTORIES_URI
+      );
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  @Before
+  public void setup()
+  {
+    storage = createMock(AzureStorage.class);
+    resultSegmentPrefixOnlyCloudBlobs1 = createMock(ResultSegment.class);
+    resultSegmentPrefixOnlyCloudBlobs2 = createMock(ResultSegment.class);
+    resultSegmentPrefixWithNoBlobs = createMock(ResultSegment.class);
+    resultSegmentPrefixWithCloudBlobsAndDirectories = createMock(ResultSegment.class);
+    cloudBlobItemPrefixWithOnlyCloudBlobs1 = createMock(ListBlobItemHolder.class);
+
+    blobItemPrefixWithOnlyCloudBlobs1 = createMock(ListBlobItem.class);
+    cloudBlobItemPrefixWithOnlyCloudBlobs1 = createMock(ListBlobItemHolder.class);
+    cloudBlobDruidPrefixWithOnlyCloudBlobs1 = createMock(CloudBlobHolder.class);
+
+    blobItemPrefixWithOnlyCloudBlobs2 = createMock(ListBlobItem.class);
+    cloudBlobItemPrefixWithOnlyCloudBlobs2 = createMock(ListBlobItemHolder.class);
+    cloudBlobDruidPrefixWithOnlyCloudBlobs2 = createMock(CloudBlobHolder.class);
+
+    blobItemPrefixWithCloudBlobsAndDirectories1 = createMock(ListBlobItem.class);
+    directoryItemPrefixWithCloudBlobsAndDirectories = createMock(ListBlobItemHolder.class);
+
+    blobItemPrefixWithCloudBlobsAndDirectories2 = createMock(ListBlobItem.class);
+    cloudBlobItemPrefixWithCloudBlobsAndDirectories = createMock(ListBlobItemHolder.class);
+    cloudBlobDruidPrefixWithCloudBlobsAndDirectories = createMock(CloudBlobHolder.class);
+
+    blobItemPrefixWithCloudBlobsAndDirectories3 = createMock(ListBlobItem.class);
+    directoryItemPrefixWithCloudBlobsAndDirectories3 = createMock(ListBlobItemHolder.class);
+
+
+    blobItemDruidFactory = createMock(ListBlobItemHolderFactory.class);
+  }
+
+  @Test
+  public void test_hasNext_noBlobs_returnsFalse()
+  {
+    azureCloudBlobIterator = new AzureCloudBlobIterator(
+        storage,
+        blobItemDruidFactory,
+        EMPTY_URI_PREFIXES,
+        MAX_LISTING_LENGTH
+    );
+    boolean hasNext = azureCloudBlobIterator.hasNext();
+    Assert.assertFalse(hasNext);
+  }
+
+  @Test
+  public void test_next_prefixesWithMultipleBlobsAndSomeDirectories_returnsExpectedBlobs() throws Exception
+  {
+    EasyMock.expect(cloudBlobItemPrefixWithOnlyCloudBlobs1.isCloudBlob()).andReturn(true);
+    EasyMock.expect(cloudBlobItemPrefixWithOnlyCloudBlobs1.getCloudBlob()).andReturn(
+        cloudBlobDruidPrefixWithOnlyCloudBlobs1);
+    EasyMock.expect(blobItemDruidFactory.create(blobItemPrefixWithOnlyCloudBlobs1)).andReturn(
+        cloudBlobItemPrefixWithOnlyCloudBlobs1);
+
+    EasyMock.expect(cloudBlobItemPrefixWithOnlyCloudBlobs2.isCloudBlob()).andReturn(true);
+    EasyMock.expect(cloudBlobItemPrefixWithOnlyCloudBlobs2.getCloudBlob()).andReturn(
+        cloudBlobDruidPrefixWithOnlyCloudBlobs2);
+    EasyMock.expect(blobItemDruidFactory.create(blobItemPrefixWithOnlyCloudBlobs2)).andReturn(
+        cloudBlobItemPrefixWithOnlyCloudBlobs2);
+
+    EasyMock.expect(directoryItemPrefixWithCloudBlobsAndDirectories.isCloudBlob()).andReturn(false);
+    EasyMock.expect(blobItemDruidFactory.create(blobItemPrefixWithCloudBlobsAndDirectories1)).andReturn(
+        directoryItemPrefixWithCloudBlobsAndDirectories);
+
+    EasyMock.expect(cloudBlobItemPrefixWithCloudBlobsAndDirectories.isCloudBlob()).andReturn(true);
+    EasyMock.expect(cloudBlobItemPrefixWithCloudBlobsAndDirectories.getCloudBlob()).andReturn(
+        cloudBlobDruidPrefixWithCloudBlobsAndDirectories);
+    EasyMock.expect(blobItemDruidFactory.create(blobItemPrefixWithCloudBlobsAndDirectories2)).andReturn(
+        cloudBlobItemPrefixWithCloudBlobsAndDirectories);
+
+    EasyMock.expect(directoryItemPrefixWithCloudBlobsAndDirectories3.isCloudBlob()).andReturn(false);
+    EasyMock.expect(blobItemDruidFactory.create(blobItemPrefixWithCloudBlobsAndDirectories3)).andReturn(
+        directoryItemPrefixWithCloudBlobsAndDirectories3);
+
+    ArrayList<ListBlobItem> resultBlobItemsPrefixWithOnlyCloudBlobs1 = new ArrayList<>();
+    resultBlobItemsPrefixWithOnlyCloudBlobs1.add(blobItemPrefixWithOnlyCloudBlobs1);
+    ArrayList<ListBlobItem> resultBlobItemsPrefixWithOnlyCloudBlobs2 = new ArrayList<>();
+    resultBlobItemsPrefixWithOnlyCloudBlobs2.add(blobItemPrefixWithOnlyCloudBlobs2);
+    ArrayList<ListBlobItem> resultBlobItemsPrefixWithNoBlobs = new ArrayList<>();
+    ArrayList<ListBlobItem> resultBlobItemsPrefixWithCloudBlobsAndDirectories = new ArrayList<>();
+    resultBlobItemsPrefixWithCloudBlobsAndDirectories.add(blobItemPrefixWithCloudBlobsAndDirectories1);
+    resultBlobItemsPrefixWithCloudBlobsAndDirectories.add(blobItemPrefixWithCloudBlobsAndDirectories2);
+    resultBlobItemsPrefixWithCloudBlobsAndDirectories.add(blobItemPrefixWithCloudBlobsAndDirectories3);
+    EasyMock.expect(resultSegmentPrefixOnlyCloudBlobs1.getContinuationToken())
+            .andReturn(resultContinuationPrefixOnlyCloudBlobs);
+    EasyMock.expect(resultSegmentPrefixOnlyCloudBlobs1.getResults())
+            .andReturn(resultBlobItemsPrefixWithOnlyCloudBlobs1);
+
+    EasyMock.expect(resultSegmentPrefixOnlyCloudBlobs2.getContinuationToken()).andReturn(nullResultContinuationToken);
+    EasyMock.expect(resultSegmentPrefixOnlyCloudBlobs2.getResults())
+            .andReturn(resultBlobItemsPrefixWithOnlyCloudBlobs2);
+
+    EasyMock.expect(resultSegmentPrefixWithNoBlobs.getContinuationToken()).andReturn(nullResultContinuationToken);
+    EasyMock.expect(resultSegmentPrefixWithNoBlobs.getResults()).andReturn(resultBlobItemsPrefixWithNoBlobs);
+
+    EasyMock.expect(resultSegmentPrefixWithCloudBlobsAndDirectories.getContinuationToken())
+            .andReturn(nullResultContinuationToken);
+    EasyMock.expect(resultSegmentPrefixWithCloudBlobsAndDirectories.getResults())
+            .andReturn(resultBlobItemsPrefixWithCloudBlobsAndDirectories);
+
+    EasyMock.expect(storage.listBlobsWithPrefixInContainerSegmented(
+        CONTAINER1,
+        PREFIX_ONLY_CLOUD_BLOBS,
+        nullResultContinuationToken,
+        MAX_LISTING_LENGTH
+    )).andReturn(resultSegmentPrefixOnlyCloudBlobs1);
+
+
+    EasyMock.expect(storage.listBlobsWithPrefixInContainerSegmented(
+        CONTAINER1,
+        PREFIX_ONLY_CLOUD_BLOBS,
+        resultContinuationPrefixOnlyCloudBlobs,
+        MAX_LISTING_LENGTH
+    )).andReturn(resultSegmentPrefixOnlyCloudBlobs2);
+
+    EasyMock.expect(storage.listBlobsWithPrefixInContainerSegmented(
+        CONTAINER1,
+        PREFIX_WITH_NO_BLOBS,
+        nullResultContinuationToken,
+        MAX_LISTING_LENGTH
+    )).andReturn(resultSegmentPrefixWithNoBlobs);
+
+    EasyMock.expect(storage.listBlobsWithPrefixInContainerSegmented(
+        CONTAINER1,
+        PREFIX_WITH_CLOUD_BLOBS_AND_DIRECTORIES,
+        nullResultContinuationToken,
+        MAX_LISTING_LENGTH
+    )).andReturn(resultSegmentPrefixWithCloudBlobsAndDirectories);
+
+    replayAll();
+
+    azureCloudBlobIterator = new AzureCloudBlobIterator(
+        storage,
+        blobItemDruidFactory,
+        PREFIXES,
+        MAX_LISTING_LENGTH
+    );
+
+    List<CloudBlobHolder> expectedBlobItems = ImmutableList.of(
+        cloudBlobDruidPrefixWithOnlyCloudBlobs1,
+        cloudBlobDruidPrefixWithOnlyCloudBlobs2,
+        cloudBlobDruidPrefixWithCloudBlobsAndDirectories
+    );
+    List<CloudBlobHolder> actualBlobItems = new ArrayList<>();
+    while (azureCloudBlobIterator.hasNext()) {
+      actualBlobItems.add(azureCloudBlobIterator.next());
+    }
+    Assert.assertEquals(expectedBlobItems.size(), actualBlobItems.size());
+    Assert.assertTrue(expectedBlobItems.containsAll(actualBlobItems));
+    verifyAll();
+  }
+
+  @Test(expected = NoSuchElementException.class)
+  public void test_next_emptyPrefixes_throwsNoSuchElementException()
+  {
+    azureCloudBlobIterator = new AzureCloudBlobIterator(
+        storage,
+        blobItemDruidFactory,
+        EMPTY_URI_PREFIXES,
+        MAX_LISTING_LENGTH
+    );
+    azureCloudBlobIterator.next();
+  }
+
+  @Test(expected = RE.class)
+  public void test_fetchNextBatch_exceptionThrownInStorage_throwsREException() throws Exception
+  {
+    EasyMock.expect(storage.listBlobsWithPrefixInContainerSegmented(
+        EasyMock.anyString(),
+        EasyMock.anyString(),
+        EasyMock.anyObject(),
+        EasyMock.anyInt()
+    )).andThrow(new URISyntaxException("", ""));
+    azureCloudBlobIterator = new AzureCloudBlobIterator(
+        storage,
+        blobItemDruidFactory,
+        PREFIXES,
+        MAX_LISTING_LENGTH
+    );
+  }
+
 
 Review comment:
   Is there a test that verifies we've read all the blobs from a prefix if there is a continuation token?
   Is there a test for a prefix that has nothing in it? A prefix that has only directories?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on issue #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#issuecomment-581655170
 
 
   This pull request **introduces 2 alerts** when merging 6dd498ea5575fb96813ae1814f2d5de2ff00affa into a085685182d62e5dd1b716f1bbb9bcbbaeb1c661 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-88e050fac2ba08e8bc555fc78736f092356b6760)
   
   **new alerts:**
   
   * 1 for Unused format argument
   * 1 for Iterable wrapping an iterator

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r377989681
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.blob.CloudBlobHolder;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+/**
+ * Abstracts the Azure storage system where input data is stored. Allows users to retrieve entities in
+ * the storage system that match either a particular uri, prefix, or object.
+ */
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  @VisibleForTesting
+  static final int MAX_LISTING_LENGTH = 1024;
 
 Review comment:
   Why not using `CloudObjectInputSource.MAX_LISTING_LENGTH`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r377993019
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.blob.CloudBlobHolder;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+/**
+ * Abstracts the Azure storage system where input data is stored. Allows users to retrieve entities in
+ * the storage system that match either a particular uri, prefix, or object.
+ */
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  @VisibleForTesting
+  static final int MAX_LISTING_LENGTH = 1024;
 
 Review comment:
   intelliJ was producing a warning when doing this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376675427
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureCloudBlobDruidToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobDruid;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  static final String SCHEME = "azure";
+
+  private final Logger log = new Logger(AzureInputSource.class);
 
 Review comment:
   fixed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r377989038
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorage.java
 ##########
 @@ -33,10 +36,15 @@
 import java.io.InputStream;
 import java.net.URISyntaxException;
 import java.util.ArrayList;
+import java.util.EnumSet;
 import java.util.List;
 
+/**
+ * Abstracts the Azure storage layer. Makes direct calls to Azure file system.
+ */
 public class AzureStorage
 {
+  private static final boolean USE_FLAT_BLOB_LISTING = true;
 
   private final Logger log = new Logger(AzureStorage.class);
 
 Review comment:
   Please make this a static final variable.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374401608
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureCloudBlobDruidToCloudObjectLocationConverter.java
 ##########
 @@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
+
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+
+public class AzureCloudBlobDruidToCloudObjectLocationConverter
+    implements ICloudSpecificObjectToCloudObjectLocationConverter<CloudBlobDruid>
+{
+  @Override
+  public CloudObjectLocation createCloudObjectLocation(CloudBlobDruid cloudBlob)
+  {
+    try {
+      return new CloudObjectLocation(cloudBlob.getContainerName(), cloudBlob.getName());
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
 
 Review comment:
   Should this handle retryable exceptions?
   
   `AzureUtils.AZURE_RETRY.apply(e)`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374442146
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureCloudBlobDruidToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobDruid;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  static final String SCHEME = "azure";
+
+  private final Logger log = new Logger(AzureInputSource.class);
+  private final AzureStorage storage;
+  private final AzureEntityFactory entityFactory;
+  private final AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private final AzureCloudBlobDruidToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  @JsonCreator
+  public AzureInputSource(
+      @JacksonInject AzureStorage storage,
+      @JacksonInject AzureEntityFactory entityFactory,
+      @JacksonInject AzureCloudBlobIterableFactory azureCloudBlobIterableFactory,
+      @JacksonInject AzureCloudBlobDruidToCloudObjectLocationConverter azureCloudBlobToLocationConverter,
+      @JsonProperty("uris") @Nullable List<URI> uris,
+      @JsonProperty("prefixes") @Nullable List<URI> prefixes,
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+  )
+  {
+    super(SCHEME, uris, prefixes, objects);
+    this.storage = Preconditions.checkNotNull(storage, "AzureStorage");
+    this.entityFactory = Preconditions.checkNotNull(entityFactory, "AzureEntityFactory");
+    this.azureCloudBlobIterableFactory = Preconditions.checkNotNull(
+        azureCloudBlobIterableFactory,
+        "AzureCloudBlobIterableFactory"
+    );
+    this.azureCloudBlobToLocationConverter = Preconditions.checkNotNull(azureCloudBlobToLocationConverter, "AzureCloudBlobToLocationConverter");
+  }
+
+  @Override
+  protected AzureEntity createEntity(InputSplit<CloudObjectLocation> split)
 
 Review comment:
   Good point. We should fix the javaodc.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374383411
 
 

 ##########
 File path: extensions-contrib/azure-extensions/pom.xml
 ##########
 @@ -91,6 +91,11 @@
             <artifactId>guice</artifactId>
             <scope>provided</scope>
         </dependency>
+        <dependency>
+            <groupId>com.google.inject.extensions</groupId>
+            <artifactId>guice-assistedinject</artifactId>
 
 Review comment:
   🤘

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376675037
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/firehose/azure/AzureBlob.java
 ##########
 @@ -26,6 +26,7 @@
 import java.util.Objects;
 
 
+@Deprecated
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376572129
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/data/input/azure/AzureEntityTest.java
 ##########
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.google.common.base.Predicate;
+import org.apache.commons.io.input.NullInputStream;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.storage.azure.AzureByteSource;
+import org.apache.druid.storage.azure.AzureByteSourceFactory;
+import org.apache.druid.storage.azure.AzureUtils;
+import org.easymock.EasyMock;
+import org.easymock.EasyMockSupport;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+
+public class AzureEntityTest extends EasyMockSupport
+{
+  private static final String CONTAINER_NAME = "container";
+  private static final String BLOB_NAME = "blob";
+  private static final int OFFSET = 20;
+  private static final InputStream INPUT_STREAM = new NullInputStream(OFFSET);
+  private static final IOException IO_EXCEPTION = new IOException();
+  private static final URI ENTITY_URI;
+
+  private CloudObjectLocation location;
+  private AzureByteSourceFactory byteSourceFactory;
+  private AzureByteSource byteSource;
+
+  private AzureEntity azureEntity;
+
+  static {
+    try {
+      ENTITY_URI = new URI(AzureInputSource.SCHEME + "://" + CONTAINER_NAME + "/" + BLOB_NAME);
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  @Before
+  public void setup()
+  {
+    location = createMock(CloudObjectLocation.class);
+    byteSourceFactory = createMock(AzureByteSourceFactory.class);
+    byteSource = createMock(AzureByteSource.class);
+  }
+
+  @Test
+  public void test_getUri_returnsLocationUri()
+  {
+    EasyMock.expect(location.getBucket()).andReturn(CONTAINER_NAME);
+    EasyMock.expect(location.getPath()).andReturn(BLOB_NAME);
+    EasyMock.expect(byteSourceFactory.create(CONTAINER_NAME, BLOB_NAME)).andReturn(byteSource);
+    EasyMock.expect(location.toUri(AzureInputSource.SCHEME)).andReturn(ENTITY_URI);
+    replayAll();
+
+    azureEntity = new AzureEntity(location, byteSourceFactory);
+
+    URI actualUri = azureEntity.getUri();
+    Assert.assertEquals(ENTITY_URI, actualUri);
+
+    verifyAll();
+
+  }
+
+  @Test
+  public void test_readFrom_returnsExpectedStream() throws Exception
+  {
+    EasyMock.expect(location.getBucket()).andReturn(CONTAINER_NAME);
+    EasyMock.expect(location.getPath()).andReturn(BLOB_NAME);
+    EasyMock.expect(byteSource.openStream(OFFSET)).andReturn(INPUT_STREAM);
+    EasyMock.expect(byteSourceFactory.create(CONTAINER_NAME, BLOB_NAME)).andReturn(byteSource);
+    replayAll();
+
+    azureEntity = new AzureEntity(location, byteSourceFactory);
+
+    InputStream actualInputStream = azureEntity.readFrom(OFFSET);
+    Assert.assertSame(INPUT_STREAM, actualInputStream);
+  }
+
+  @Test
+  public void test_readFrom_throwsIOException_propogatesError()
+  {
+    try {
+      EasyMock.expect(location.getBucket()).andReturn(CONTAINER_NAME);
+      EasyMock.expect(location.getPath()).andReturn(BLOB_NAME);
+      EasyMock.expect(byteSource.openStream(OFFSET)).andThrow(IO_EXCEPTION);
+      EasyMock.expect(byteSourceFactory.create(CONTAINER_NAME, BLOB_NAME)).andReturn(byteSource);
+      replayAll();
+
+      azureEntity = new AzureEntity(location, byteSourceFactory);
+      azureEntity.readFrom(OFFSET);
+    }
+    catch (IOException e) {
+      verifyAll();
+    }
+  }
+
+  @Test
+  public void test_getPath_returnsLocationPath()
+  {
+    EasyMock.expect(location.getBucket()).andReturn(CONTAINER_NAME);
+    EasyMock.expect(location.getPath()).andReturn(BLOB_NAME).atLeastOnce();
+    EasyMock.expect(byteSourceFactory.create(CONTAINER_NAME, BLOB_NAME)).andReturn(byteSource);
+    replayAll();
+
+    azureEntity = new AzureEntity(location, byteSourceFactory);
+    String actualPath = azureEntity.getPath();
+
+    Assert.assertEquals(BLOB_NAME, actualPath);
+    verifyAll();
+  }
+
+  @Test
+  public void test_getRetryCondition_returnsExpectedRetryCondition()
+  {
+    EasyMock.expect(location.getBucket()).andReturn(CONTAINER_NAME);
+    EasyMock.expect(location.getPath()).andReturn(BLOB_NAME).atLeastOnce();
+    EasyMock.expect(byteSourceFactory.create(CONTAINER_NAME, BLOB_NAME)).andReturn(byteSource);
+    replayAll();
+
+    azureEntity = new AzureEntity(location, byteSourceFactory);
+    Predicate<Throwable> actualRetryCondition = azureEntity.getRetryCondition();
+    Assert.assertSame(AzureUtils.AZURE_RETRY, actualRetryCondition);
+  }
+}
 
 Review comment:
   nit: Add a test for readFromStart

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] fjy commented on issue #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
fjy commented on issue #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#issuecomment-582199803
 
 
   Can we also add a new tile for Azure data lake in Druid's web console?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374382373
 
 

 ##########
 File path: core/src/main/java/org/apache/druid/data/input/impl/CloudObjectInputSource.java
 ##########
 @@ -40,6 +40,7 @@
 public abstract class CloudObjectInputSource<T extends InputEntity> extends AbstractInputSource
     implements SplittableInputSource<CloudObjectLocation>
 {
+  protected static final int MAX_LISTING_LENGTH = 1024;
 
 Review comment:
   I don't think it's good to have the base class decide the max length for all the implementations. It would be better to expose this through a function. Should this be made configurable? 
   
   I see the `GoogleCloudStorageInputSource` used to have a max of 10. This makes it 1024. Also the AzureInputSource uses it's own definition for `MAX_LISTING_LENGTH`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374404624
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureStorageDruidModule.java
 ##########
 @@ -118,4 +133,11 @@ public AzureStorage getAzureStorageContainer(
   {
     return new AzureStorage(cloudBlobClient);
   }
+
+  @Provides
+  @LazySingleton
+  public AzureCloudBlobDruidToCloudObjectLocationConverter getAzureCloudBlobToLocationConverter()
+  {
+    return new AzureCloudBlobDruidToCloudObjectLocationConverter();
+  }
 
 Review comment:
   why not `bind(AzureCloudBlobDruidToCloudObjectLocationConverter.class).in(Scopes.LazySingleton)`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374406693
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/ResultSegmentDruid.java
 ##########
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
+
+import com.microsoft.azure.storage.ResultContinuation;
+import com.microsoft.azure.storage.ResultSegment;
+
+import java.util.ArrayList;
+
+public class ResultSegmentDruid<T>
 
 Review comment:
   What is the purpose of all these Druid classes that delegate to another class? Why not justuse the delegate class directly?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376674962
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobHolder;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+/**
+ * Abstracts the Azure storage system where input data is stored. Allows users to retrieve entities in
+ * the storage system that match either a particular uri, prefix, or object.
+ */
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  public static final String SCHEME = "azure";
+
+  private final AzureStorage storage;
+  private final AzureEntityFactory entityFactory;
+  private final AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private final AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  @JsonCreator
+  public AzureInputSource(
+      @JacksonInject AzureStorage storage,
+      @JacksonInject AzureEntityFactory entityFactory,
+      @JacksonInject AzureCloudBlobIterableFactory azureCloudBlobIterableFactory,
+      @JacksonInject AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter,
+      @JsonProperty("uris") @Nullable List<URI> uris,
+      @JsonProperty("prefixes") @Nullable List<URI> prefixes,
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+  )
+  {
+    super(SCHEME, uris, prefixes, objects);
+    this.storage = Preconditions.checkNotNull(storage, "AzureStorage");
+    this.entityFactory = Preconditions.checkNotNull(entityFactory, "AzureEntityFactory");
+    this.azureCloudBlobIterableFactory = Preconditions.checkNotNull(
+        azureCloudBlobIterableFactory,
+        "AzureCloudBlobIterableFactory"
+    );
+    this.azureCloudBlobToLocationConverter = Preconditions.checkNotNull(azureCloudBlobToLocationConverter, "AzureCloudBlobToLocationConverter");
+  }
+
+  @Override
+  public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
+  {
+    return new AzureInputSource(
+        storage,
+        entityFactory,
+        azureCloudBlobIterableFactory,
+        azureCloudBlobToLocationConverter,
+        null,
+        null,
+        ImmutableList.of(split.get())
+    );
+  }
+
+  @Override
+  public String toString()
+  {
+    return "AzureInputSource{" +
+           "uris=" + getUris() +
+           ", prefixes=" + getPrefixes() +
+           ", objects=" + getObjects() +
+           '}';
+  }
+
 
 Review comment:
   fixed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r377984038
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureByteSource.java
 ##########
 @@ -46,11 +53,19 @@ public AzureByteSource(
 
   @Override
   public InputStream openStream() throws IOException
+  {
+    return openStream(0L);
+  }
+
+  public InputStream openStream(long offset) throws IOException
   {
     try {
-      return azureStorage.getBlobInputStream(containerName, blobPath);
+      return azureStorage.getBlobInputStream(offset, containerName, blobPath);
     }
     catch (StorageException | URISyntaxException e) {
+      log.warn("Exception when opening stream to azure resource, containerName: %s, blobPath: %s, Error: %s",
 
 Review comment:
   I think this log is duplicate with that in `RetryingInputStream` which calls this method.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on issue #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#issuecomment-582702747
 
 
   This pull request **introduces 2 alerts** when merging 7536255bbbf144e44955d111dd2985e2d65f25b3 into 2e1dbe598ce26a4668dffcbd996ec2b57cf3761a - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-9b1d1076b040584c7abeca54072a1f597ef27298)
   
   **new alerts:**
   
   * 1 for Unused format argument
   * 1 for Iterable wrapping an iterator

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374387631
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureEntityFactory.java
 ##########
 @@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+
+public interface AzureEntityFactory
 
 Review comment:
   javadocs please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376828842
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/data/input/azure/AzureEntityTest.java
 ##########
 @@ -85,6 +85,21 @@ public void test_getUri_returnsLocationUri()
 
   }
 
+  @Test
+  public void test_readFromStart_returnsExpectedStream() throws Exception
+  {
+    EasyMock.expect(location.getBucket()).andReturn(CONTAINER_NAME);
+    EasyMock.expect(location.getPath()).andReturn(BLOB_NAME);
+    EasyMock.expect(byteSource.openStream(0)).andReturn(INPUT_STREAM);
+    EasyMock.expect(byteSourceFactory.create(CONTAINER_NAME, BLOB_NAME)).andReturn(byteSource);
+    replayAll();
+
+    azureEntity = new AzureEntity(location, byteSourceFactory);
+
+    InputStream actualInputStream = azureEntity.readFrom(0);
 
 Review comment:
   hmm... strange that `readFromStart` isn't visible in this test 🤔 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r377990059
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/data/input/azure/AzureEntityTest.java
 ##########
 @@ -85,6 +85,21 @@ public void test_getUri_returnsLocationUri()
 
   }
 
+  @Test
+  public void test_readFromStart_returnsExpectedStream() throws Exception
+  {
+    EasyMock.expect(location.getBucket()).andReturn(CONTAINER_NAME);
+    EasyMock.expect(location.getPath()).andReturn(BLOB_NAME);
+    EasyMock.expect(byteSource.openStream(0)).andReturn(INPUT_STREAM);
+    EasyMock.expect(byteSourceFactory.create(CONTAINER_NAME, BLOB_NAME)).andReturn(byteSource);
+    replayAll();
+
+    azureEntity = new AzureEntity(location, byteSourceFactory);
+
+    InputStream actualInputStream = azureEntity.readFrom(0);
 
 Review comment:
   `readFromStart` is a protected method.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376572741
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/data/input/azure/AzureInputSourceTest.java
 ##########
 @@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterable;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobHolder;
+import org.easymock.EasyMock;
+import org.easymock.EasyMockSupport;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.net.URI;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Spliterators;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class AzureInputSourceTest extends EasyMockSupport
+{
+  private static final String CONTAINER_NAME = "container";
+  private static final String BLOB_NAME = "blob";
+  private static final URI PREFIX_URI;
+  private final List<URI> EMPTY_URIS = ImmutableList.of();
+  private final List<URI> EMPTY_PREFIXES = ImmutableList.of();
+  private final List<CloudObjectLocation> EMPTY_OBJECTS = ImmutableList.of();
+  private static final String CONTAINER = "CONTAINER";
+  private static final String BLOB_PATH = "BLOB_PATH";
+  private static final CloudObjectLocation CLOUD_OBJECT_LOCATION_1 = new CloudObjectLocation(CONTAINER, BLOB_PATH);
+
+  private AzureStorage storage;
+  private AzureEntityFactory entityFactory;
+  private AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  private InputSplit<CloudObjectLocation> inputSplit;
+  private AzureEntity azureEntity1;
+  private CloudBlobHolder cloudBlobDruid1;
+  private AzureCloudBlobIterable azureCloudBlobIterable;
+
+  private AzureInputSource azureInputSource;
+
+  static {
+    try {
+      PREFIX_URI = new URI(AzureInputSource.SCHEME + "://" + CONTAINER_NAME + "/" + BLOB_NAME);
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  @Before
+  public void setup()
+  {
+    storage = createMock(AzureStorage.class);
+    entityFactory = createMock(AzureEntityFactory.class);
+    inputSplit = createMock(InputSplit.class);
+    azureEntity1 = createMock(AzureEntity.class);
+    azureCloudBlobIterableFactory = createMock(AzureCloudBlobIterableFactory.class);
+    azureCloudBlobToLocationConverter = createMock(AzureCloudBlobHolderToCloudObjectLocationConverter.class);
+    cloudBlobDruid1 = createMock(CloudBlobHolder.class);
+    azureCloudBlobIterable = createMock(AzureCloudBlobIterable.class);
+  }
+
+  @Test(expected = IllegalArgumentException.class)
+  public void test_Constructor_emptyUrisEmptyPrefixesEmptyObjects_throwsIllegalArgumentException()
 
 Review comment:
   nit: I prefere `test_init_...` but maybe `Constructor` should be `constructor` since it's the first word after `_`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376675999
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/data/input/azure/AzureInputSourceTest.java
 ##########
 @@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterable;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobHolder;
+import org.easymock.EasyMock;
+import org.easymock.EasyMockSupport;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.net.URI;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Spliterators;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class AzureInputSourceTest extends EasyMockSupport
+{
+  private static final String CONTAINER_NAME = "container";
+  private static final String BLOB_NAME = "blob";
+  private static final URI PREFIX_URI;
+  private final List<URI> EMPTY_URIS = ImmutableList.of();
+  private final List<URI> EMPTY_PREFIXES = ImmutableList.of();
+  private final List<CloudObjectLocation> EMPTY_OBJECTS = ImmutableList.of();
+  private static final String CONTAINER = "CONTAINER";
+  private static final String BLOB_PATH = "BLOB_PATH";
+  private static final CloudObjectLocation CLOUD_OBJECT_LOCATION_1 = new CloudObjectLocation(CONTAINER, BLOB_PATH);
+
+  private AzureStorage storage;
+  private AzureEntityFactory entityFactory;
+  private AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  private InputSplit<CloudObjectLocation> inputSplit;
+  private AzureEntity azureEntity1;
+  private CloudBlobHolder cloudBlobDruid1;
+  private AzureCloudBlobIterable azureCloudBlobIterable;
+
+  private AzureInputSource azureInputSource;
+
+  static {
+    try {
+      PREFIX_URI = new URI(AzureInputSource.SCHEME + "://" + CONTAINER_NAME + "/" + BLOB_NAME);
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  @Before
+  public void setup()
+  {
+    storage = createMock(AzureStorage.class);
+    entityFactory = createMock(AzureEntityFactory.class);
+    inputSplit = createMock(InputSplit.class);
+    azureEntity1 = createMock(AzureEntity.class);
+    azureCloudBlobIterableFactory = createMock(AzureCloudBlobIterableFactory.class);
+    azureCloudBlobToLocationConverter = createMock(AzureCloudBlobHolderToCloudObjectLocationConverter.class);
+    cloudBlobDruid1 = createMock(CloudBlobHolder.class);
+    azureCloudBlobIterable = createMock(AzureCloudBlobIterable.class);
+  }
+
+  @Test(expected = IllegalArgumentException.class)
+  public void test_Constructor_emptyUrisEmptyPrefixesEmptyObjects_throwsIllegalArgumentException()
+  {
+    replayAll();
+    azureInputSource = new AzureInputSource(
+        storage,
+        entityFactory,
+        azureCloudBlobIterableFactory,
+        azureCloudBlobToLocationConverter,
+        EMPTY_URIS,
+        EMPTY_PREFIXES,
+        EMPTY_OBJECTS
+    );
 
 Review comment:
   cant have both

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376018725
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/CloudBlobDruid.java
 ##########
 @@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
+
+import com.microsoft.azure.storage.StorageException;
+import com.microsoft.azure.storage.blob.CloudBlob;
+
+import java.net.URISyntaxException;
+
+/**
+ * Wrapper for {@link CloudBlob}. Used to make testing easier, since {@link CloudBlob}
+ * is a final class and so is difficult to mock in unit tests.
+ */
+public class CloudBlobDruid
 
 Review comment:
   Same for other classes ending with `Druid`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374428991
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/storage/azure/ResultSegmentDruid.java
 ##########
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.azure;
+
+import com.microsoft.azure.storage.ResultContinuation;
+import com.microsoft.azure.storage.ResultSegment;
+
+import java.util.ArrayList;
+
+public class ResultSegmentDruid<T>
 
 Review comment:
   All the delegate classes are finals, so I cant mock them

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376675876
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/data/input/azure/AzureEntityTest.java
 ##########
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.google.common.base.Predicate;
+import org.apache.commons.io.input.NullInputStream;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.storage.azure.AzureByteSource;
+import org.apache.druid.storage.azure.AzureByteSourceFactory;
+import org.apache.druid.storage.azure.AzureUtils;
+import org.easymock.EasyMock;
+import org.easymock.EasyMockSupport;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+
+public class AzureEntityTest extends EasyMockSupport
+{
+  private static final String CONTAINER_NAME = "container";
+  private static final String BLOB_NAME = "blob";
+  private static final int OFFSET = 20;
+  private static final InputStream INPUT_STREAM = new NullInputStream(OFFSET);
+  private static final IOException IO_EXCEPTION = new IOException();
+  private static final URI ENTITY_URI;
+
+  private CloudObjectLocation location;
+  private AzureByteSourceFactory byteSourceFactory;
+  private AzureByteSource byteSource;
+
+  private AzureEntity azureEntity;
+
+  static {
+    try {
+      ENTITY_URI = new URI(AzureInputSource.SCHEME + "://" + CONTAINER_NAME + "/" + BLOB_NAME);
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  @Before
+  public void setup()
+  {
+    location = createMock(CloudObjectLocation.class);
+    byteSourceFactory = createMock(AzureByteSourceFactory.class);
+    byteSource = createMock(AzureByteSource.class);
+  }
+
+  @Test
+  public void test_getUri_returnsLocationUri()
+  {
+    EasyMock.expect(location.getBucket()).andReturn(CONTAINER_NAME);
+    EasyMock.expect(location.getPath()).andReturn(BLOB_NAME);
+    EasyMock.expect(byteSourceFactory.create(CONTAINER_NAME, BLOB_NAME)).andReturn(byteSource);
+    EasyMock.expect(location.toUri(AzureInputSource.SCHEME)).andReturn(ENTITY_URI);
+    replayAll();
+
+    azureEntity = new AzureEntity(location, byteSourceFactory);
+
+    URI actualUri = azureEntity.getUri();
+    Assert.assertEquals(ENTITY_URI, actualUri);
+
+    verifyAll();
+
+  }
+
+  @Test
+  public void test_readFrom_returnsExpectedStream() throws Exception
+  {
+    EasyMock.expect(location.getBucket()).andReturn(CONTAINER_NAME);
+    EasyMock.expect(location.getPath()).andReturn(BLOB_NAME);
+    EasyMock.expect(byteSource.openStream(OFFSET)).andReturn(INPUT_STREAM);
+    EasyMock.expect(byteSourceFactory.create(CONTAINER_NAME, BLOB_NAME)).andReturn(byteSource);
+    replayAll();
+
+    azureEntity = new AzureEntity(location, byteSourceFactory);
+
+    InputStream actualInputStream = azureEntity.readFrom(OFFSET);
+    Assert.assertSame(INPUT_STREAM, actualInputStream);
+  }
+
+  @Test
+  public void test_readFrom_throwsIOException_propogatesError()
+  {
+    try {
+      EasyMock.expect(location.getBucket()).andReturn(CONTAINER_NAME);
+      EasyMock.expect(location.getPath()).andReturn(BLOB_NAME);
+      EasyMock.expect(byteSource.openStream(OFFSET)).andThrow(IO_EXCEPTION);
+      EasyMock.expect(byteSourceFactory.create(CONTAINER_NAME, BLOB_NAME)).andReturn(byteSource);
+      replayAll();
+
+      azureEntity = new AzureEntity(location, byteSourceFactory);
+      azureEntity.readFrom(OFFSET);
+    }
+    catch (IOException e) {
+      verifyAll();
+    }
+  }
+
+  @Test
+  public void test_getPath_returnsLocationPath()
+  {
+    EasyMock.expect(location.getBucket()).andReturn(CONTAINER_NAME);
+    EasyMock.expect(location.getPath()).andReturn(BLOB_NAME).atLeastOnce();
+    EasyMock.expect(byteSourceFactory.create(CONTAINER_NAME, BLOB_NAME)).andReturn(byteSource);
+    replayAll();
+
+    azureEntity = new AzureEntity(location, byteSourceFactory);
+    String actualPath = azureEntity.getPath();
+
+    Assert.assertEquals(BLOB_NAME, actualPath);
+    verifyAll();
+  }
+
+  @Test
+  public void test_getRetryCondition_returnsExpectedRetryCondition()
+  {
+    EasyMock.expect(location.getBucket()).andReturn(CONTAINER_NAME);
+    EasyMock.expect(location.getPath()).andReturn(BLOB_NAME).atLeastOnce();
+    EasyMock.expect(byteSourceFactory.create(CONTAINER_NAME, BLOB_NAME)).andReturn(byteSource);
+    replayAll();
+
+    azureEntity = new AzureEntity(location, byteSourceFactory);
+    Predicate<Throwable> actualRetryCondition = azureEntity.getRetryCondition();
+    Assert.assertSame(AzureUtils.AZURE_RETRY, actualRetryCondition);
+  }
+}
 
 Review comment:
   added

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376574658
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/storage/azure/AzureStorageDruidModuleTest.java
 ##########
 @@ -101,6 +126,108 @@ public void test_getAzureStorageContainer_expectedClient()
     Assert.assertSame(cloudBlobClient, azureStorage.getCloudBlobClient());
   }
 
+  @Test
+  public void test_getAzureCloudBlobToLocationConverter_expectedConverted()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobLocationConverter1 = injector.getInstance(
+        AzureCloudBlobHolderToCloudObjectLocationConverter.class);
+    AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobLocationConverter2 = injector.getInstance(
+        AzureCloudBlobHolderToCloudObjectLocationConverter.class);
+    Assert.assertSame(azureCloudBlobLocationConverter1, azureCloudBlobLocationConverter2);
+  }
+
+  @Test
+  public void test_getAzureByteSourceFactory_canCreateAzureByteSource()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    AzureByteSourceFactory factory = injector.getInstance(AzureByteSourceFactory.class);
+    Object object1 = factory.create("container1", "blob1");
+    Object object2 = factory.create("container2", "blob2");
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
+  @Test
+  public void test_getAzureEntityFactory_canCreateAzureEntity()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+
+    EasyMock.expect(cloudObjectLocation1.getBucket()).andReturn(AZURE_CONTAINER);
+    EasyMock.expect(cloudObjectLocation2.getBucket()).andReturn(AZURE_CONTAINER);
+    EasyMock.expect(cloudObjectLocation1.getPath()).andReturn(PATH);
+    EasyMock.expect(cloudObjectLocation2.getPath()).andReturn(PATH);
+    replayAll();
+
+    injector = makeInjectorWithProperties(props);
+    AzureEntityFactory factory = injector.getInstance(AzureEntityFactory.class);
+    Object object1 = factory.create(cloudObjectLocation1);
+    Object object2 = factory.create(cloudObjectLocation2);
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
+  @Test
+  public void test_getAzureCloudBlobIteratorFactory_canCreateAzureCloudBlobIterator()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    AzureCloudBlobIteratorFactory factory = injector.getInstance(AzureCloudBlobIteratorFactory.class);
+    Object object1 = factory.create(EMPTY_PREFIXES_ITERABLE, 10);
+    Object object2 = factory.create(EMPTY_PREFIXES_ITERABLE, 10);
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
+  @Test
+  public void test_getAzureCloudBlobIterableFactory_canCreateAzureCloudBlobIterable()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    AzureCloudBlobIterableFactory factory = injector.getInstance(AzureCloudBlobIterableFactory.class);
+    AzureCloudBlobIterable object1 = factory.create(EMPTY_PREFIXES_ITERABLE, 10);
+    AzureCloudBlobIterable object2 = factory.create(EMPTY_PREFIXES_ITERABLE, 10);
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
+  @Test
+  public void test_getListBlobItemDruidFactory_canCreateListBlobItemDruid()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
 
 Review comment:
   I see this duplicated in a lot of tests, maybe we should put it in it's own utility function?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376675013
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.storage.azure.AzureCloudBlobHolderToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobHolder;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+/**
+ * Abstracts the Azure storage system where input data is stored. Allows users to retrieve entities in
+ * the storage system that match either a particular uri, prefix, or object.
+ */
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  public static final String SCHEME = "azure";
+
+  private final AzureStorage storage;
+  private final AzureEntityFactory entityFactory;
+  private final AzureCloudBlobIterableFactory azureCloudBlobIterableFactory;
+  private final AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter;
+
+  @JsonCreator
+  public AzureInputSource(
+      @JacksonInject AzureStorage storage,
+      @JacksonInject AzureEntityFactory entityFactory,
+      @JacksonInject AzureCloudBlobIterableFactory azureCloudBlobIterableFactory,
+      @JacksonInject AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobToLocationConverter,
+      @JsonProperty("uris") @Nullable List<URI> uris,
+      @JsonProperty("prefixes") @Nullable List<URI> prefixes,
+      @JsonProperty("objects") @Nullable List<CloudObjectLocation> objects
+  )
+  {
+    super(SCHEME, uris, prefixes, objects);
+    this.storage = Preconditions.checkNotNull(storage, "AzureStorage");
+    this.entityFactory = Preconditions.checkNotNull(entityFactory, "AzureEntityFactory");
+    this.azureCloudBlobIterableFactory = Preconditions.checkNotNull(
+        azureCloudBlobIterableFactory,
+        "AzureCloudBlobIterableFactory"
+    );
+    this.azureCloudBlobToLocationConverter = Preconditions.checkNotNull(azureCloudBlobToLocationConverter, "AzureCloudBlobToLocationConverter");
+  }
+
+  @Override
+  public SplittableInputSource<CloudObjectLocation> withSplit(InputSplit<CloudObjectLocation> split)
+  {
+    return new AzureInputSource(
+        storage,
+        entityFactory,
+        azureCloudBlobIterableFactory,
+        azureCloudBlobToLocationConverter,
+        null,
+        null,
+        ImmutableList.of(split.get())
+    );
+  }
+
+  @Override
+  public String toString()
+  {
+    return "AzureInputSource{" +
+           "uris=" + getUris() +
+           ", prefixes=" + getPrefixes() +
+           ", objects=" + getObjects() +
+           '}';
+  }
+
+  @Override
+  protected AzureEntity createEntity(InputSplit<CloudObjectLocation> split)
+  {
+    return entityFactory.create(split.get());
+  }
+
+  @Override
+  protected Stream<InputSplit<CloudObjectLocation>> getPrefixesSplitStream()
+  {
+    return StreamSupport.stream(getIterableObjectsFromPrefixes().spliterator(), false)
+                        .map(o -> azureCloudBlobToLocationConverter.createCloudObjectLocation(o))
+                        .map(InputSplit::new);
+  }
+
+  private Iterable<CloudBlobHolder> getIterableObjectsFromPrefixes()
+  {
+    return azureCloudBlobIterableFactory.create(getPrefixes(), MAX_LISTING_LENGTH);
+  }
+}
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r376575486
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/test/java/org/apache/druid/storage/azure/AzureStorageDruidModuleTest.java
 ##########
 @@ -101,6 +126,108 @@ public void test_getAzureStorageContainer_expectedClient()
     Assert.assertSame(cloudBlobClient, azureStorage.getCloudBlobClient());
   }
 
+  @Test
+  public void test_getAzureCloudBlobToLocationConverter_expectedConverted()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobLocationConverter1 = injector.getInstance(
+        AzureCloudBlobHolderToCloudObjectLocationConverter.class);
+    AzureCloudBlobHolderToCloudObjectLocationConverter azureCloudBlobLocationConverter2 = injector.getInstance(
+        AzureCloudBlobHolderToCloudObjectLocationConverter.class);
+    Assert.assertSame(azureCloudBlobLocationConverter1, azureCloudBlobLocationConverter2);
+  }
+
+  @Test
+  public void test_getAzureByteSourceFactory_canCreateAzureByteSource()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    AzureByteSourceFactory factory = injector.getInstance(AzureByteSourceFactory.class);
+    Object object1 = factory.create("container1", "blob1");
+    Object object2 = factory.create("container2", "blob2");
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
+  @Test
+  public void test_getAzureEntityFactory_canCreateAzureEntity()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+
+    EasyMock.expect(cloudObjectLocation1.getBucket()).andReturn(AZURE_CONTAINER);
+    EasyMock.expect(cloudObjectLocation2.getBucket()).andReturn(AZURE_CONTAINER);
+    EasyMock.expect(cloudObjectLocation1.getPath()).andReturn(PATH);
+    EasyMock.expect(cloudObjectLocation2.getPath()).andReturn(PATH);
+    replayAll();
+
+    injector = makeInjectorWithProperties(props);
+    AzureEntityFactory factory = injector.getInstance(AzureEntityFactory.class);
+    Object object1 = factory.create(cloudObjectLocation1);
+    Object object2 = factory.create(cloudObjectLocation2);
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
+  @Test
+  public void test_getAzureCloudBlobIteratorFactory_canCreateAzureCloudBlobIterator()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    AzureCloudBlobIteratorFactory factory = injector.getInstance(AzureCloudBlobIteratorFactory.class);
+    Object object1 = factory.create(EMPTY_PREFIXES_ITERABLE, 10);
+    Object object2 = factory.create(EMPTY_PREFIXES_ITERABLE, 10);
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
+  @Test
+  public void test_getAzureCloudBlobIterableFactory_canCreateAzureCloudBlobIterable()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    AzureCloudBlobIterableFactory factory = injector.getInstance(AzureCloudBlobIterableFactory.class);
+    AzureCloudBlobIterable object1 = factory.create(EMPTY_PREFIXES_ITERABLE, 10);
+    AzureCloudBlobIterable object2 = factory.create(EMPTY_PREFIXES_ITERABLE, 10);
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
+  @Test
+  public void test_getListBlobItemDruidFactory_canCreateListBlobItemDruid()
+  {
+    final Properties props = new Properties();
+    props.put("druid.azure.account", AZURE_ACCOUNT_NAME);
+    props.put("druid.azure.key", AZURE_ACCOUNT_KEY);
+    props.put("druid.azure.container", AZURE_CONTAINER);
+    injector = makeInjectorWithProperties(props);
+    ListBlobItemHolderFactory factory = injector.getInstance(ListBlobItemHolderFactory.class);
+    ListBlobItemHolder object1 = factory.create(blobItem1);
+    ListBlobItemHolder object2 = factory.create(blobItem2);
+    Assert.assertNotNull(object1);
+    Assert.assertNotNull(object2);
+    Assert.assertNotSame(object1, object2);
+  }
+
 
 Review comment:
   Test to verify the AzureInputSource is registered to the scheme `azure` 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374385289
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureEntity.java
 ##########
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.google.common.base.Predicate;
+import com.google.inject.assistedinject.Assisted;
+import com.google.inject.assistedinject.AssistedInject;
+import org.apache.druid.data.input.RetryingInputEntity;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureByteSource;
+import org.apache.druid.storage.azure.AzureByteSourceFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.AzureUtils;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+
+public class AzureEntity extends RetryingInputEntity
+{
+  private final Logger log = new Logger(AzureEntity.class);
 
 Review comment:
   private static final Logger LOG = new Logger(AzureEntity.class);

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374384794
 
 

 ##########
 File path: core/src/main/java/org/apache/druid/data/input/impl/CloudObjectInputSource.java
 ##########
 @@ -40,6 +40,7 @@
 public abstract class CloudObjectInputSource<T extends InputEntity> extends AbstractInputSource
     implements SplittableInputSource<CloudObjectLocation>
 {
+  protected static final int MAX_LISTING_LENGTH = 1024;
 
 Review comment:
   I talked to @clintropolis  about the Google limit and that was a mistake, was meant to be 1024 as well. I figured since they are all 1024 now, we could abstract that to base class, and then override in the specific implementation if necessary, which right now it doesn't seem to be so. I agree that this should be configurable.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374387206
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureEntity.java
 ##########
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.google.common.base.Predicate;
+import com.google.inject.assistedinject.Assisted;
+import com.google.inject.assistedinject.AssistedInject;
+import org.apache.druid.data.input.RetryingInputEntity;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureByteSource;
+import org.apache.druid.storage.azure.AzureByteSourceFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.AzureUtils;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+
+public class AzureEntity extends RetryingInputEntity
+{
+  private final Logger log = new Logger(AzureEntity.class);
+  private final CloudObjectLocation location;
+  private final AzureByteSource byteSource;
+
+  @AssistedInject
+  AzureEntity(
+      AzureStorage storage,
+      @Assisted CloudObjectLocation location,
+      AzureByteSourceFactory byteSourceFactory
+  )
+  {
+    this.location = location;
+    this.byteSource = byteSourceFactory.create(location.getBucket(), location.getPath());
+  }
+
+  @Nullable
+  @Override
+  public URI getUri()
+  {
+    return location.toUri(AzureInputSource.SCHEME);
 
 Review comment:
   How can this ever return null? If location is null, this will throw an NPE, and it doesn't look like toUri ever returns a null

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9306: implement Azure InputSource reader and deprecate Azure FireHose
URL: https://github.com/apache/druid/pull/9306#discussion_r374388718
 
 

 ##########
 File path: extensions-contrib/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java
 ##########
 @@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.azure;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.impl.CloudObjectInputSource;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.impl.SplittableInputSource;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.storage.azure.AzureCloudBlobDruidToCloudObjectLocationConverter;
+import org.apache.druid.storage.azure.AzureCloudBlobIterableFactory;
+import org.apache.druid.storage.azure.AzureStorage;
+import org.apache.druid.storage.azure.CloudBlobDruid;
+
+import javax.annotation.Nullable;
+import java.net.URI;
+import java.util.List;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+public class AzureInputSource extends CloudObjectInputSource<AzureEntity>
+{
+  static final int MAX_LISTING_LENGTH = 1024;
+  static final String SCHEME = "azure";
+
+  private final Logger log = new Logger(AzureInputSource.class);
 
 Review comment:
   `private static final` - I'll stop making this comment for the rest of this review

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org