Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/07/23 07:54:56 UTC

[GitHub] [druid] petermarshallio opened a new pull request #11490: Docs - S3 masking and nav update to S3 page

petermarshallio opened a new pull request #11490:
URL: https://github.com/apache/druid/pull/11490


   - On the back of community discussion at https://groups.google.com/g/druid-user/c/FydcpFrA688, added a tip on masking credentials in logs
   - "Note" sections changed to ">" to mimic the style of notes on other pages
   - `[click here]` nav replaced with the name of the resource that will open (Amazon Developer Guide)
   - Nav changes on the S3 page with links to appropriate sections
   - "Basically" removed
   - "Loaded" changed to "Pulled" for term consistency
   - Introductory text added to each ingestion spec example
   - Explicit link to the S3 extension doc added
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] been tested in a test Druid cluster.
   
   cc @sthetland @techdocsmith for checks on how I've structured the nav / stylistics




[GitHub] [druid] techdocsmith commented on a change in pull request #11490: Docs - S3 masking and nav update to S3 page

Posted by GitBox <gi...@apache.org>.
techdocsmith commented on a change in pull request #11490:
URL: https://github.com/apache/druid/pull/11490#discussion_r734892456



##########
File path: docs/development/extensions-core/s3.md
##########
@@ -36,7 +36,7 @@ The [S3 input source](../../ingestion/native-batch.md#s3-input-source) is suppor
 to read objects directly from S3. If you use the [Hadoop task](../../ingestion/hadoop.md),
 you can read data from S3 by specifying the S3 paths in your [`inputSpec`](../../ingestion/hadoop.md#inputspec).
 
-To configure the extension to read objects from S3 you need to configure how to [connect to S3](#configuration).
+To configure the extension to read objects from S3 you need to configure Druid to [connect to S3](#configuration).

Review comment:
       ```suggestion
   To configure the extension to read objects from S3, supply the S3 [connection information](#configuration).
   ```
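
For illustration, a minimal sketch of the `runtime.properties` entries this connection setup involves — the property names are from the Druid S3 extension docs, while the key values are placeholders:

```properties
# Load the S3 extension so Druid can read objects from S3
druid.extensions.loadList=["druid-s3-extensions"]

# Optional static credentials; omit these to fall back on the default
# credentials provider chain described further down the page
druid.s3.accessKey=YOUR_ACCESS_KEY
druid.s3.secretKey=YOUR_SECRET_KEY
```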

##########
File path: docs/development/extensions-core/s3.md
##########
@@ -76,14 +77,15 @@ Druid uses the following credentials provider chain to connect to your S3 bucket
 |6|ECS container credentials|Based on environment variables available on AWS ECS (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or AWS_CONTAINER_CREDENTIALS_FULL_URI) as described in the [EC2ContainerCredentialsProviderWrapper documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EC2ContainerCredentialsProviderWrapper.html)|
 |7|Instance profile information|Based on the instance profile you may have attached to your druid instance|
 
-You can find more information about authentication method [here](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials)<br/>
-**Note :** *Order is important here as it indicates the precedence of authentication methods.<br/>
-So if you are trying to use Instance profile information, you **must not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties*
+> You can find more information about authentication methods in the [Amazon Developer Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).
+
+> Order is important here as it indicates the precedence of authentication methods. If you are trying to use Instance profile information, you **must not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties.
 
+> You can use the property [`druid.startup.logging.maskProperties`](../../configuration/index.html#startup-logging) to mask credentials information in Druid logs.  For example, `["password", "secretKey", "awsSecretAccessKey"]`.

Review comment:
       ```suggestion
   You can use the property [`druid.startup.logging.maskProperties`](../../configuration/index.html#startup-logging) to mask credentials information in Druid logs.  For example, `["password", "secretKey", "awsSecretAccessKey"]`.
   ```
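
As a sketch of the masking tip above in `runtime.properties` form — the property names are real; the mask list simply mirrors the example values already given:

```properties
# Log startup properties, but mask any property whose name contains these strings
druid.startup.logging.logProperties=true
druid.startup.logging.maskProperties=["password", "secretKey", "awsSecretAccessKey"]
```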

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|

Review comment:
       ```suggestion
   |`type`|Set value to `s3`.|None|yes|
   ```
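
A minimal sketch of an input source with `type` set to `s3`, per the table above (the bucket and object path are hypothetical):

```json
"inputSource": {
  "type": "s3",
  "uris": ["s3://example-bucket/path/to/data.json"]
}
```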

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:

Review comment:
       ```suggestion
   Specify objects to ingest as either:
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how Druid has divided up the work of ingestion.
+
+> The S3 input source will skip all empty objects only when `prefixes` is specified.
+
+#### S3 Input Objects
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`bucket`|Name of the S3 bucket|None|yes|
+|`path`|The path where data is located.|None|yes|
+
+#### S3 Input Properties Object
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`accessKeyId`|The [Password Provider](../operations/password-provider.md) or plain text string of this S3 InputSource's access key|None|yes if secretAccessKey is given|
+|`secretAccessKey`|The [Password Provider](../operations/password-provider.md) or plain text string of this S3 InputSource's secret key|None|yes if accessKeyId is given|
+|`assumeRoleArn`|AWS ARN of the role to assume.  See the [AWS User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html). `assumeRoleArn` can be used either with the ingestion spec AWS credentials or with the default S3 credentials|None|no|
+|`assumeRoleExternalId`|A unique identifier that might be required when you assume a role in another account.  See the [AWS User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html).|None|no|
+
+> If `accessKeyId` and `secretAccessKey` are not given, then the default [S3 credentials provider chain](../development/extensions-core/s3.md#s3-authentication-methods) is used.

Review comment:
       ```suggestion
   If you do not supply an `accessKeyId` and `secretAccessKey`, Druid uses the default [S3 credentials provider chain](../development/extensions-core/s3.md#s3-authentication-methods).
   ```
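
Building on the table above, a hedged sketch of `assumeRoleArn` used together with the default S3 credentials — no `accessKeyId` or `secretAccessKey` is supplied, so the default provider chain supplies the base credentials; the ARN and external ID shown are hypothetical:

```json
"inputSource": {
  "type": "s3",
  "uris": ["s3://example-bucket/path/to/data.json"],
  "properties": {
    "assumeRoleArn": "arn:aws:iam::123456789012:role/example-druid-ingest",
    "assumeRoleExternalId": "example-external-id"
  }
}
```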

##########
File path: docs/development/extensions-core/s3.md
##########
@@ -64,7 +64,8 @@ In addition to this you need to set additional configuration, specific for [deep
 ### S3 authentication methods
 
 Druid uses the following credentials provider chain to connect to your S3 bucket (whether a deep storage bucket or source bucket).
-**Note :** *You can override the default credentials provider chain for connecting to source bucket by specifying an access key and secret key using [Properties Object](../../ingestion/native-batch.md#s3-input-source) parameters in the ingestionSpec.*
+
+> You can override the default credentials provider chain for connecting to the source bucket by specifying an access key and secret key using [Properties Object](../../ingestion/native-batch.md#s3-input-source) parameters in the ingestion specification.

Review comment:
       ```suggestion
   > To override the default credentials provider chain for connecting to the source bucket, specify an access key and secret key using [Properties Object](../../ingestion/native-batch.md#s3-input-source) parameters in the ingestion specification.
   ```
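
A hedged sketch of such an override in the ingestion specification — the `properties` parameter names come from the native-batch docs; the bucket and key values are placeholders:

```json
"inputSource": {
  "type": "s3",
  "prefixes": ["s3://example-bucket/prefix/"],
  "properties": {
    "accessKeyId": "YOUR_ACCESS_KEY",
    "secretAccessKey": "YOUR_SECRET_KEY"
  }
}
```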

##########
File path: docs/development/extensions-core/s3.md
##########
@@ -76,14 +77,15 @@ Druid uses the following credentials provider chain to connect to your S3 bucket
 |6|ECS container credentials|Based on environment variables available on AWS ECS (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or AWS_CONTAINER_CREDENTIALS_FULL_URI) as described in the [EC2ContainerCredentialsProviderWrapper documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EC2ContainerCredentialsProviderWrapper.html)|
 |7|Instance profile information|Based on the instance profile you may have attached to your druid instance|
 
-You can find more information about authentication method [here](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials)<br/>
-**Note :** *Order is important here as it indicates the precedence of authentication methods.<br/>
-So if you are trying to use Instance profile information, you **must not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties*
+> You can find more information about authentication methods in the [Amazon Developer Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).

Review comment:
       ```suggestion
   For more information, refer to the [Amazon Developer Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).
   ```

##########
File path: docs/development/extensions-core/s3.md
##########
@@ -76,14 +77,15 @@ Druid uses the following credentials provider chain to connect to your S3 bucket
 |6|ECS container credentials|Based on environment variables available on AWS ECS (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or AWS_CONTAINER_CREDENTIALS_FULL_URI) as described in the [EC2ContainerCredentialsProviderWrapper documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EC2ContainerCredentialsProviderWrapper.html)|
 |7|Instance profile information|Based on the instance profile you may have attached to your druid instance|
 
-You can find more information about authentication method [here](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials)<br/>
-**Note :** *Order is important here as it indicates the precedence of authentication methods.<br/>
-So if you are trying to use Instance profile information, you **must not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties*
+> You can find more information about authentication methods in the [Amazon Developer Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).
+
+> Order is important here as it indicates the precedence of authentication methods. If you are trying to use Instance profile information, you **must not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties.

Review comment:
       ```suggestion
   The order of configuration parameters is important here because it indicates the precedence of authentication methods. If you are trying to use Instance profile information, do not set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties.
   ```
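
Sketched as `runtime.properties`, relying on instance profile information (method 7 in the chain) means leaving the static keys unset entirely:

```properties
druid.extensions.loadList=["druid-s3-extensions"]
# Deliberately NOT set, so the chain falls through to the instance profile:
# druid.s3.accessKey=...
# druid.s3.secretKey=...
```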

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|

Review comment:
       ```suggestion
   |`uris`| JSON array of URIs defining the location of S3 objects to ingest |None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.

Review comment:
       ```suggestion
   The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  In this case each `index_parallel` task reads one or more objects.
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how Druid has divided up the work of ingestion.
+
+> The S3 input source will skip all empty objects only when `prefixes` is specified.

Review comment:
       ```suggestion
   The S3 input source skips all empty objects only when `prefixes` is specified.
   ```

##########
File path: docs/development/extensions-core/s3.md
##########
@@ -76,14 +77,15 @@ Druid uses the following credentials provider chain to connect to your S3 bucket
 |6|ECS container credentials|Based on environment variables available on AWS ECS (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or AWS_CONTAINER_CREDENTIALS_FULL_URI) as described in the [EC2ContainerCredentialsProviderWrapper documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EC2ContainerCredentialsProviderWrapper.html)|
 |7|Instance profile information|Based on the instance profile you may have attached to your druid instance|
 
-You can find more information about authentication method [here](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials)<br/>
-**Note :** *Order is important here as it indicates the precedence of authentication methods.<br/>
-So if you are trying to use Instance profile information, you **must not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties*
+> You can find more information about authentication methods in the [Amazon Developer Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).
+
+> Order is important here as it indicates the precedence of authentication methods. If you are trying to use Instance profile information, you **must not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties.
 
+> You can use the property [`druid.startup.logging.maskProperties`](../../configuration/index.html#startup-logging) to mask credentials information in Druid logs.  For example, `["password", "secretKey", "awsSecretAccessKey"]`.
 
 ### S3 permissions settings
 
-`s3:GetObject` and `s3:PutObject` are basically required for pushing/loading segments to/from S3.
+`s3:GetObject` and `s3:PutObject` are required for pushing / pulling segments to / from S3.

Review comment:
       ```suggestion
   `s3:GetObject` and `s3:PutObject` are required for pushing or pulling segments to or from S3.
   ```
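
A minimal IAM policy sketch granting those two actions — the bucket name is hypothetical, and a given deployment may need additional actions beyond these two (for example, for deleting segments):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```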

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding the default S3 configuration.|None|No (defaults will be used if not given)

Review comment:
       ```suggestion
   |[`properties`](#s3-input-properties-object)|Properties Object to override the default S3 configuration.|None|No (defaults will be used if not given)
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or

Review comment:
       ```suggestion
   - a list of S3 URI strings
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how Druid has divided up the work of ingestion.
+
+> The S3 input source will skip all empty objects only when `prefixes` is specified.
+
+#### S3 Input Objects
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`bucket`|Name of the S3 bucket|None|yes|
+|`path`|The path where data is located.|None|yes|
+
+#### S3 Input Properties Object
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`accessKeyId`|The [Password Provider](../operations/password-provider.md) or plain text string of this S3 InputSource's access key|None|yes if secretAccessKey is given|
+|`secretAccessKey`|The [Password Provider](../operations/password-provider.md) or plain text string of this S3 InputSource's secret key|None|yes if accessKeyId is given|
+|`assumeRoleArn`|AWS ARN of the role to assume.  See the [AWS User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html). `assumeRoleArn` can be used either with the ingestion spec AWS credentials or with the default S3 credentials|None|no|
+|`assumeRoleExternalId`|A unique identifier that might be required when you assume a role in another account.  See the [AWS User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html).|None|no|
+
+> If `accessKeyId` and `secretAccessKey` are not given, then the default [S3 credentials provider chain](../development/extensions-core/s3.md#s3-authentication-methods) is used.
+
+#### S3 Input Examples

Review comment:
       ```suggestion
   #### S3 input examples
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then ingest
+*all* objects contained in the `prefixes` you specify.

Review comment:
       ```suggestion
   all objects contained in the specified prefixes.
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how Druid has divided up the work of ingestion.
+
+> The S3 input source will skip all empty objects only when `prefixes` is specified.
+
+#### S3 Input Objects
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`bucket`|Name of the S3 bucket|None|yes|
+|`path`|The path where data is located.|None|yes|

Review comment:
       ```suggestion
   |`path`|The path to the data|None|yes|
   ```
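
A sketch of an `objects` entry using the `bucket` and `path` properties from this table (names hypothetical):

```json
"objects": [
  { "bucket": "example-bucket", "path": "path/to/data.json" }
]
```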

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how Druid has divided up the work of ingestion.
+
+> The S3 input source will skip all empty objects only when `prefixes` is specified.
+
+#### S3 Input Objects
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`bucket`|Name of the S3 bucket|None|yes|
+|`path`|The path where data is located.|None|yes|
+
+#### S3 Input Properties Object
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`accessKeyId`|The [Password Provider](../operations/password-provider.md) or plain text string of this S3 InputSource's access key|None|yes if secretAccessKey is given|
+|`secretAccessKey`|The [Password Provider](../operations/password-provider.md) or plain text string of this S3 InputSource's secret key|None|yes if accessKeyId is given|
+|`assumeRoleArn`|AWS ARN of the role to assume.  See the [AWS User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html). `assumeRoleArn` can be used either with the ingestion spec AWS credentials or with the default S3 credentials|None|no|
+|`assumeRoleExternalId`|A unique identifier that might be required when you assume a role in another account.  See the [AWS User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html).|None|no|
+
+> If `accessKeyId` and `secretAccessKey` are not given, then the default [S3 credentials provider chain](../development/extensions-core/s3.md#s3-authentication-methods) is used.
+
+#### S3 Input Examples
+
+Using URIs, this ingestion specification will ingest two specific objects:

Review comment:
       ```suggestion
   Using URIs, the following ingestion specification ingests two specific objects:
   ```
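
Assuming hypothetical bucket and object names, such a URI-based specification might look like:

```json
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "s3",
    "uris": ["s3://example-bucket/data1.json", "s3://example-bucket/data2.json"]
  },
  "inputFormat": { "type": "json" }
}
```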

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|

Review comment:
       ```suggestion
   |`prefixes`|JSON array of URI prefixes for the locations of S3 objects to ingest. Druid skips empty objects starting with one of the given prefixes.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how Druid has divided up the work of ingestion.
+
+> The S3 input source will skip all empty objects only when `prefixes` is specified.
+
+#### S3 Input Objects
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`bucket`|Name of the S3 bucket|None|yes|
+|`path`|The path where data is located.|None|yes|
+
+#### S3 Input Properties Object

Review comment:
       ```suggestion
   #### S3 input properties object
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -941,33 +985,7 @@ Sample specs:
 ...
 ```
 
-|property|description|default|required?|
-|--------|-----------|-------|---------|
-|type|This should be `s3`.|None|yes|
-|uris|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or `objects` must be set|
-|prefixes|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or `objects` must be set|
-|objects|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
-|properties|Properties Object for overriding the default S3 configuration. See below for more information.|None|No (defaults will be used if not given)
-
-Note that the S3 input source will skip all empty objects only when `prefixes` is specified.
-
-S3 Object:
-
-|property|description|default|required?|
-|--------|-----------|-------|---------|
-|bucket|Name of the S3 bucket|None|yes|
-|path|The path where data is located.|None|yes|
-
-Properties Object:
-
-|property|description|default|required?|
-|--------|-----------|-------|---------|
-|accessKeyId|The [Password Provider](../operations/password-provider.md) or plain text string of this S3 InputSource's access key|None|yes if secretAccessKey is given|
-|secretAccessKey|The [Password Provider](../operations/password-provider.md) or plain text string of this S3 InputSource's secret key|None|yes if accessKeyId is given|
-|assumeRoleArn|AWS ARN of the role to assume [see](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html). **assumeRoleArn** can be used either with the ingestion spec AWS credentials or with the default S3 credentials|None|no|
-|assumeRoleExternalId|A unique identifier that might be required when you assume a role in another account [see](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html)|None|no|
-
-**Note :** *If accessKeyId and secretAccessKey are not given, the default [S3 credentials provider chain](../development/extensions-core/s3.md#s3-authentication-methods) is used.*
+> Read more about S3 and Druid on the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension page, including using S3-like for [Deep Storage](../dependencies/deep-storage.html), more about authentication, and additional configuration options.

Review comment:
       ```suggestion
   Learn more about S3 and Druid on the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension page, including using S3-like for [Deep Storage](../dependencies/deep-storage.html), more about authentication, and additional configuration options.
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -900,6 +940,8 @@ Sample specs:
 ...
 ```
 
+This ingestion specification provides task-specific credentials to ingest two specific objects:

Review comment:
       ```suggestion
   The following ingestion specification provides task-specific credentials to ingest two specific objects:
   ```
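
A hedged sketch of task-specific credentials; rather than plain text, this variant reads the keys from environment variables via Druid's `environment` Password Provider (the variable names are hypothetical):

```json
"inputSource": {
  "type": "s3",
  "uris": ["s3://example-bucket/data1.json", "s3://example-bucket/data2.json"],
  "properties": {
    "accessKeyId": { "type": "environment", "variable": "S3_INGEST_ACCESS_KEY" },
    "secretAccessKey": { "type": "environment", "variable": "S3_INGEST_SECRET_KEY" }
  }
}
```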

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|

Review comment:
       ```suggestion
   |[`objects`](#s3-input-objects)|JSON array of S3 Objects to ingest.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -880,6 +919,7 @@ Sample specs:
 ...
 ```
 
+This time using `objects`, this specification will ingest two specific objects, one from the `foo` bucket, one from the `bar` bucket:

Review comment:
       ```suggestion
   The following example uses `objects` to ingest two specific objects, one from the `foo` bucket, one from the `bar` bucket:
   ```
   When possible, opt for "real world" examples over "foo" and "bar".

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how Druid has divided up the work of ingestion.
+
+> The S3 input source will skip all empty objects only when `prefixes` is specified.
+
+#### S3 Input Objects
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`bucket`|Name of the S3 bucket|None|yes|
+|`path`|The path where data is located.|None|yes|
+
+#### S3 Input Properties Object
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`accessKeyId`|The [Password Provider](../operations/password-provider.md) or plain text string of this S3 InputSource's access key|None|yes if secretAccessKey is given|
+|`secretAccessKey`|The [Password Provider](../operations/password-provider.md) or plain text string of this S3 InputSource's secret key|None|yes if accessKeyId is given|

Review comment:
       ```suggestion
   |`secretAccessKey`|The [Password Provider](../operations/password-provider.md) or plain text string of the S3 InputSource's secret key|None|yes if accessKeyId is given|
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how Druid has divided up the work of ingestion.

Review comment:
       ```suggestion
   You can view the payload of individual `index_parallel` tasks to see how Druid has divided up the work of ingestion.
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how Druid has divided up the work of ingestion.
+
+> The S3 input source will skip all empty objects only when `prefixes` is specified.
+
+#### S3 Input Objects
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`bucket`|Name of the S3 bucket|None|yes|
+|`path`|The path where data is located.|None|yes|
+
+#### S3 Input Properties Object
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`accessKeyId`|The [Password Provider](../operations/password-provider.md) or plain text string of this S3 InputSource's access key|None|yes if secretAccessKey is given|

Review comment:
       ```suggestion
   |`accessKeyId`|The [Password Provider](../operations/password-provider.md) or plain text string of the S3 InputSource's access key|None|yes if secretAccessKey is given|
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -864,6 +901,8 @@ Sample specs:
 ...
 ```
 
+This specification will ingest all the objects in two locations given in `prefixes`:
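+
+As a sketch (bucket names are placeholders), the relevant portion of such a spec is:
+
+```json
+"inputSource": {
+  "type": "s3",
+  "prefixes": ["s3://foo/bar/", "s3://bar/foo/"]
+}
+```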

Review comment:
       ```suggestion
   The following specification ingests all the objects in two locations given in `prefixes`:
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then ingest

Review comment:
       ```suggestion
   When you supply a list of `prefixes`, Druid lists the contents and then ingests
   ```





[GitHub] [druid] petermarshallio commented on pull request #11490: Docs - S3 masking and nav update to S3 page

Posted by GitBox <gi...@apache.org>.
petermarshallio commented on pull request #11490:
URL: https://github.com/apache/druid/pull/11490#issuecomment-1077859774


   @techdocsmith I fixed some broken links and also tidied up some other areas of the text - including that para on the precedence of credentials. I'd welcome you checking that I haven't changed the meaning too much there.



[GitHub] [druid] techdocsmith commented on a change in pull request #11490: Docs - S3 masking and nav update to S3 page

Posted by GitBox <gi...@apache.org>.
techdocsmith commented on a change in pull request #11490:
URL: https://github.com/apache/druid/pull/11490#discussion_r835525675



##########
File path: docs/development/extensions-core/s3.md
##########
@@ -63,8 +63,9 @@ In addition to this you need to set additional configuration, specific for [deep
 
 ### S3 authentication methods
 
-Druid uses the following credentials provider chain to connect to your S3 bucket (whether a deep storage bucket or source bucket).
-**Note :** *You can override the default credentials provider chain for connecting to source bucket by specifying an access key and secret key using [Properties Object](../../ingestion/native-batch-input-source.md#s3-input-source) parameters in the ingestionSpec.*
+You can provide credentials to connect to S3 in a number of ways, whether for [deep storage](#deep-storage) or as an [ingestion source](#reading-data-from-s3).
+
+There is a defined order of precedence, as given below.  This means, for example, if you would like to use profile information given in `~/.aws/credentials`, do not set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid config file as this would take precedence.
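+
+For example (a sketch; the key values are AWS's standard placeholders), setting these two properties in your Druid configuration means the credentials file is never consulted:
+
+```properties
+druid.s3.accessKey=AKIAIOSFODNN7EXAMPLE
+druid.s3.secretKey=wJalrXUtnFtG/K7MDENG/bPxRfiCYEXAMPLEKEY
+```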

Review comment:
       ```suggestion
   The configuration options are listed in order of precedence.  For example, if you would like to use profile information given in `~/.aws/credentials`, do not set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid config file because they would take precedence.
   ```





[GitHub] [druid] petermarshallio commented on a change in pull request #11490: Docs - S3 masking and nav update to S3 page

Posted by GitBox <gi...@apache.org>.
petermarshallio commented on a change in pull request #11490:
URL: https://github.com/apache/druid/pull/11490#discussion_r738338327



##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|

Review comment:
       (See line 864)

##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|

Review comment:
       Revisiting the wording on this @techdocsmith – I'm not sure what "Empty objects starting with one of the given prefixes will be skipped." means here.  Maybe we revert this bit?





[GitHub] [druid] petermarshallio commented on pull request #11490: Docs - S3 masking and nav update to S3 page

Posted by GitBox <gi...@apache.org>.
petermarshallio commented on pull request #11490:
URL: https://github.com/apache/druid/pull/11490#issuecomment-938516057


   @techdocsmith another one that I'd welcome your review on?



[GitHub] [druid] techdocsmith merged pull request #11490: Docs - S3 masking and nav update to S3 page

Posted by GitBox <gi...@apache.org>.
techdocsmith merged pull request #11490:
URL: https://github.com/apache/druid/pull/11490


   

