Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/04/23 00:03:35 UTC

[GitHub] [druid] sthetland opened a new pull request #11153: Azure data lake input source

sthetland opened a new pull request #11153:
URL: https://github.com/apache/druid/pull/11153


   We added a tile for Azure Data Lake in the data loader here: https://github.com/apache/druid/pull/9437. However, the doc doesn't mention Azure Data Lake at all. 
   
   The ingestion spec is the same as for Azure blobs, so I think the only change needed is to have the existing Azure Blob doc refer to Azure Data Lake ingestion as well: https://docs.imply.io/latest/druid/ingestion/native-batch/#azure-input-source
   
   This PR makes that change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] sthetland commented on pull request #11153: Azure data lake input source

Posted by GitBox <gi...@apache.org>.

sthetland commented on pull request #11153:
URL: https://github.com/apache/druid/pull/11153#issuecomment-825839429


   > Should we list data lake in line 61:
   > 
   > ```
   > - [`azure`](#azure-input-source) reads data from Azure Blob Storage.
   > ```
   
   Good catch. Added.



[GitHub] [druid] techdocsmith commented on a change in pull request #11153: Azure data lake input source

Posted by GitBox <gi...@apache.org>.

techdocsmith commented on a change in pull request #11153:
URL: https://github.com/apache/druid/pull/11153#discussion_r621687801



##########
File path: docs/ingestion/native-batch.md
##########
@@ -1066,17 +1064,17 @@ Sample specs:
 |property|description|default|required?|
 |--------|-----------|-------|---------|
 |type|This should be `azure`.|None|yes|
-|uris|JSON array of URIs where Azure Blob objects to be ingested are located. Should be in form "azure://\<container>/\<path-to-file\>"|None|`uris` or `prefixes` or `objects` must be set|
-|prefixes|JSON array of URI prefixes for the locations of Azure Blob objects to be ingested. Should be in the form "azure://\<container>/\<prefix\>". Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or `objects` must be set|
-|objects|JSON array of Azure Blob objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
+|uris|JSON array of URIs where the Azure objects to be ingested are located, in the form "azure://\<container>/\<path-to-file\>"|None|`uris` or `prefixes` or `objects` must be set|
+|prefixes|JSON array of URI prefixes for the locations of Azure objects to be ingested, in the form "azure://\<container>/\<prefix\>". Empty objects starting with one of the given prefixes are skipped.|None|`uris` or `prefixes` or `objects` must be set|

Review comment:
       ```suggestion
   |prefixes|JSON array of URI prefixes for the locations of Azure objects to ingest, in the form "azure://\<container>/\<prefix\>". Empty objects starting with one of the given prefixes are skipped.|None|`uris` or `prefixes` or `objects` must be set|
   ```
   nit

##########
File path: docs/ingestion/native-batch.md
##########
@@ -1066,17 +1064,17 @@ Sample specs:
 |property|description|default|required?|
 |--------|-----------|-------|---------|
 |type|This should be `azure`.|None|yes|
-|uris|JSON array of URIs where Azure Blob objects to be ingested are located. Should be in form "azure://\<container>/\<path-to-file\>"|None|`uris` or `prefixes` or `objects` must be set|
-|prefixes|JSON array of URI prefixes for the locations of Azure Blob objects to be ingested. Should be in the form "azure://\<container>/\<prefix\>". Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or `objects` must be set|
-|objects|JSON array of Azure Blob objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
+|uris|JSON array of URIs where the Azure objects to be ingested are located, in the form "azure://\<container>/\<path-to-file\>"|None|`uris` or `prefixes` or `objects` must be set|
+|prefixes|JSON array of URI prefixes for the locations of Azure objects to be ingested, in the form "azure://\<container>/\<prefix\>". Empty objects starting with one of the given prefixes are skipped.|None|`uris` or `prefixes` or `objects` must be set|
+|objects|JSON array of Azure objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|

Review comment:
       ```suggestion
   |objects|JSON array of Azure objects to ingest.|None|`uris` or `prefixes` or `objects` must be set|
   ```
   nit
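
   For readers following the table in the diff above, here is a minimal sketch of how an `azure` input source fragment could be assembled and checked against the rule that `uris` or `prefixes` or `objects` must be set. The helper name `azure_input_source` and the container and path values are hypothetical and are not part of Druid; the sketch only assumes the property names and the "azure://" URI form shown in the table.

```python
import json


def azure_input_source(uris=None, prefixes=None, objects=None):
    """Build an `azure` inputSource fragment (hypothetical helper, not a Druid API)."""
    candidates = {"uris": uris, "prefixes": prefixes, "objects": objects}
    # Keep only the properties that were actually supplied and are non-empty.
    given = {key: value for key, value in candidates.items() if value}
    if not given:
        raise ValueError("one of `uris`, `prefixes`, or `objects` must be set")
    # The table gives the form "azure://<container>/<path-to-file>" for `uris`
    # and "azure://<container>/<prefix>" for `prefixes`.
    for key in ("uris", "prefixes"):
        for uri in given.get(key, []):
            if not uri.startswith("azure://"):
                raise ValueError(f"{key} entry must start with 'azure://': {uri}")
    return {"type": "azure", **given}


# Illustrative values only; the container and prefix names are made up.
spec = azure_input_source(prefixes=["azure://container/prefix1/"])
print(json.dumps(spec, indent=2))
```

   The same fragment applies whether the container lives in Azure Blob Storage or Azure Data Lake, which is the point of this PR: the spec does not change, only the wording of the doc.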





[GitHub] [druid] techdocsmith commented on pull request #11153: Azure data lake input source

Posted by GitBox <gi...@apache.org>.

techdocsmith commented on pull request #11153:
URL: https://github.com/apache/druid/pull/11153#issuecomment-856346220


   @sthetland is this one done?



[GitHub] [druid] sthetland merged pull request #11153: Azure data lake input source

Posted by GitBox <gi...@apache.org>.

sthetland merged pull request #11153:
URL: https://github.com/apache/druid/pull/11153


   



[GitHub] [druid] techdocsmith commented on a change in pull request #11153: Azure data lake input source

Posted by GitBox <gi...@apache.org>.

techdocsmith commented on a change in pull request #11153:
URL: https://github.com/apache/druid/pull/11153#discussion_r619340293



##########
File path: docs/ingestion/native-batch.md
##########
@@ -1004,10 +1004,8 @@ Google Cloud Storage object:
 
 > You need to include the [`druid-azure-extensions`](../development/extensions-core/azure.md) as an extension to use the Azure input source.
 
-The Azure input source is to support reading objects directly from Azure Blob store. Objects can be
-specified as list of Azure Blob store URI strings. The Azure input source is splittable and can be used
-by the [Parallel task](#parallel-task), where each worker task of `index_parallel` will read
-a single object.
+The Azure input source is used to read objects directly from Azure Blob store or Azure Data Lake sources. Objects can be

Review comment:
       ```suggestion
   The Azure input source reads objects directly from Azure Blob store or Azure Data Lake sources. You can
   ```
   nit

##########
File path: docs/ingestion/native-batch.md
##########
@@ -1004,10 +1004,8 @@ Google Cloud Storage object:
 
 > You need to include the [`druid-azure-extensions`](../development/extensions-core/azure.md) as an extension to use the Azure input source.
 
-The Azure input source is to support reading objects directly from Azure Blob store. Objects can be
-specified as list of Azure Blob store URI strings. The Azure input source is splittable and can be used
-by the [Parallel task](#parallel-task), where each worker task of `index_parallel` will read
-a single object.
+The Azure input source is used to read objects directly from Azure Blob store or Azure Data Lake sources. Objects can be
+specified as a list of file URI strings or prefixes. The Azure input source is splittable and can be used by the [Parallel task](#parallel-task), where each worker task reads a single object.

Review comment:
       ```suggestion
   specify objects as a list of file URI strings or prefixes. You can split the Azure input source for use with [Parallel task](#parallel-task) indexing and each worker task reads one chunk of the split data.
   ```
   I think we should differentiate between the `single object` and the sections of a split-out object, since we're using `object` to refer to the whole.
   



