Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/01 22:05:18 UTC

[GitHub] [iceberg] JonasJ-ap opened a new pull request, #5902: AWS: Add doc for assume role session name and several http client configurations

JonasJ-ap opened a new pull request, #5902:
URL: https://github.com/apache/iceberg/pull/5902

   Adds documentation for the following features:
    1. User-specified assume-role session name, added in #5765
    2. Optional configurations for the URL Connection HTTP Client, to be added in #5900
    3. Optional configurations for the Apache HTTP Client. Two of them were added in #5787; the rest will be added in #5899


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #5902: Docs: Add doc for HTTP client configurations

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #5902:
URL: https://github.com/apache/iceberg/pull/5902#discussion_r985299777


##########
docs/aws.md:
##########
@@ -573,6 +573,55 @@ spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersio
     --conf spark.sql.catalog.my_catalog.client.assume-role.region=ap-northeast-1
 ```
 
+### URL Connection HTTP Client

Review Comment:
   Can we make this level 4, and add a dedicated level 3 section for `HTTP client configurations`?



##########
docs/aws.md:
##########
@@ -573,6 +573,55 @@ spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersio
     --conf spark.sql.catalog.my_catalog.client.assume-role.region=ap-northeast-1
 ```
 
+### URL Connection HTTP Client

Review Comment:
   Although it is just one line at the moment, can we also have a table for choosing the client type under the level 3 section? We can also talk a bit about the pros and cons of the different HTTP client implementations.





[GitHub] [iceberg] jackye1995 merged pull request #5902: Docs: Add doc for HTTP client configurations

Posted by GitBox <gi...@apache.org>.
jackye1995 merged PR #5902:
URL: https://github.com/apache/iceberg/pull/5902




[GitHub] [iceberg] jackye1995 commented on a diff in pull request #5902: Docs: Add doc for HTTP client configurations

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #5902:
URL: https://github.com/apache/iceberg/pull/5902#discussion_r990813762


##########
docs/aws.md:
##########
@@ -573,6 +573,56 @@ spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersio
     --conf spark.sql.catalog.my_catalog.client.assume-role.region=ap-northeast-1
 ```
 
+### HTTP Client Configurations
+AWS clients support two types of HTTP Client, [URL Connection HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/url-connection-client) 
+and [Apache HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/apache-client).
+By default, AWS clients use **URL Connection** HTTP Client to communicate with the service. 
+This HTTP client optimizes for minimum dependencies and startup latency but support less functionality than other implementations. 
+In contrast, Apache HTTP Client supports more functionalities and more customized settings, such as expect-continue handshake and TCP KeepAlive, at cost of extra dependency and additional startup latency. 
+
+For more details of configuration, see sections [URL Connection HTTP Client Configurations](#url-connection-http-client-configurations) and [Apache HTTP Client Configurations](#apache-http-client-configurations).
+
+Configure the following property to set the type of HTTP client:
+
+| Property         | Default       | Description                                                                                                |
+|------------------|---------------|------------------------------------------------------------------------------------------------------------|
+| http-client.type | urlconnection | Types of HTTP Client. <br/> `urlconnection`: URL Connection HTTP Client <br/> `apache`: Apache HTTP Client |
+
+#### URL Connection HTTP Client Configurations
+
+URL Connection HTTP Client has the following configurable properties:
+
+| Property                                        | Default | Description                                                                                                                                                                                                      |
+|-------------------------------------------------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| http-client.urlconnection.socket-timeout-ms     | null    | An optional [socket timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.Builder.html#socketTimeout(java.time.Duration)) in milliseconds         |
+| http-client.urlconnection.connection-timeout-ms | null    | An optional [connection timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.Builder.html#connectionTimeout(java.time.Duration)) in milliseconds |
+
+Users can use catalog properties to override the defaults. For example, to configure the socket timeout for URL Connection HTTP Client when starting a spark shell, one can add:
+```shell
+--conf spark.sql.catalog.my_catalog.http-client.urlconnection.socket-timeout-ms=80
+```
+
+#### Apache HTTP Client Configurations
+
+Apache HTTP Client has the following configurable properties:
+
+| Property                                              | Default                   | Description                                                                                                                                                                                                                                 |
+|-------------------------------------------------------|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| http-client.apache.socket-timeout-ms                  | null                      | An optional [socket timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#socketTimeout(java.time.Duration)) in milliseconds                                                  |
+| http-client.apache.connection-timeout-ms              | null                      | An optional [connection timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#connectionTimeout(java.time.Duration)) in milliseconds                                          |
+| http-client.apache.connection-acquisition-timeout-ms  | null                      | An optional [connection acquisition timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#connectionAcquisitionTimeout(java.time.Duration)) in milliseconds                   |
+| http-client.apache.connection-max-idle-time-ms        | null                      | An optional [connection max idle timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#connectionMaxIdleTime(java.time.Duration)) in milliseconds                             |
+| http-client.apache.connection-time-to-live-ms         | null                      | An optional [connection time to live](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#connectionTimeToLive(java.time.Duration)) in milliseconds                                  |
+| http-client.apache.expect-continue-enabled            | null, disabled by default | An optional `true/false` setting that decide whether to enable [expect continue](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#expectContinueEnabled(java.lang.Boolean))       |
+| http-client.apache.max-connections                    | null                      | An optional [max connections](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#maxConnections(java.lang.Integer))  in integer                                                     |
+| http-client.apache.tcp-keep-alive-enabled             | null, disabled by default | An optional `true/false` setting that decide whether to enable [tcp keep alive](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#tcpKeepAlive(java.lang.Boolean))                 |
+| http-client.apache.use-idle-connection-reaper-enabled | null, enabled by default  | An optional `true/false` setting that decide whether to [use idle connection reaper](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#useIdleConnectionReaper(java.lang.Boolean)) |
+
+Users can use catalog properties to override the defaults. For example, to configure the max connections for Apache HTTP Client when starting a spark shell, one can add:
+```shell
+--conf spark.sql.catalog.my_catalog.http-client.apache.max-connections=5
+```

Review Comment:
   We probably want one more PR to update all the Spark examples in `aws.md`, so for this one we can keep it as is.
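
A minimal sketch of how the catalog properties in the tables above map onto Spark `--conf` flags, assuming the catalog is named `my_catalog` as in the existing examples. The `conf_flags` helper is hypothetical and exists only for illustration; it is not part of Iceberg or Spark. It prints the flags rather than launching Spark.

```shell
#!/bin/sh
# Sketch only: conf_flags is a hypothetical helper, not part of Iceberg or Spark.
# It prefixes each catalog property with spark.sql.catalog.<catalog>. and
# prints one --conf flag per line.
conf_flags() {
  catalog_name="$1"
  shift
  for property in "$@"; do
    printf '%s\n' "--conf spark.sql.catalog.${catalog_name}.${property}"
  done
}

# Select the Apache HTTP client and set its max connections to 5,
# using the property names from the tables in the diff above.
conf_flags my_catalog \
  http-client.type=apache \
  http-client.apache.max-connections=5
```

The printed flags can be appended to whichever Spark launch command is already in use. Other engines pass catalog properties differently, so only the property names carry over.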





[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #5902: Docs: Add doc for AWS assume role session name and several http client configurations

Posted by GitBox <gi...@apache.org>.
JonasJ-ap commented on code in PR #5902:
URL: https://github.com/apache/iceberg/pull/5902#discussion_r985268374


##########
docs/aws.md:
##########
@@ -552,12 +552,13 @@ This also serves as an example for users who would like to implement their own A
 
 This client factory has the following configurable catalog properties:
 
-| Property                          | Default                                  | Description                                            |
-| --------------------------------- | ---------------------------------------- | ------------------------------------------------------ |
-| client.assume-role.arn            | null, requires user input                | ARN of the role to assume, e.g. arn:aws:iam::123456789:role/myRoleToAssume  |
-| client.assume-role.region         | null, requires user input                | All AWS clients except the STS client will use the given region instead of the default region chain  |
-| client.assume-role.external-id    | null                                     | An optional [external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html)  |
-| client.assume-role.timeout-sec    | 1 hour                                   | Timeout of each assume role session. At the end of the timeout, a new set of role session credentials will be fetched through a STS client.  |
+| Property                          | Default                   | Description                                                                                                                                 |

Review Comment:
   Thank you for your review. I think the "change all the lines" effect here is due to the Markdown table being reformatted after I added a new row. The actual contents of the previously existing lines remain unchanged. In this case, do I need to roll back to the previous format?



##########
docs/aws.md:
##########
@@ -552,12 +552,13 @@ This also serves as an example for users who would like to implement their own A
 
 This client factory has the following configurable catalog properties:
 
-| Property                          | Default                                  | Description                                            |
-| --------------------------------- | ---------------------------------------- | ------------------------------------------------------ |
-| client.assume-role.arn            | null, requires user input                | ARN of the role to assume, e.g. arn:aws:iam::123456789:role/myRoleToAssume  |
-| client.assume-role.region         | null, requires user input                | All AWS clients except the STS client will use the given region instead of the default region chain  |
-| client.assume-role.external-id    | null                                     | An optional [external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html)  |
-| client.assume-role.timeout-sec    | 1 hour                                   | Timeout of each assume role session. At the end of the timeout, a new set of role session credentials will be fetched through a STS client.  |
+| Property                          | Default                   | Description                                                                                                                                 |
+|-----------------------------------|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
+| client.assume-role.arn            | null, requires user input | ARN of the role to assume, e.g. arn:aws:iam::123456789:role/myRoleToAssume                                                                  |
+| client.assume-role.region         | null, requires user input | All AWS clients except the STS client will use the given region instead of the default region chain                                         |
+| client.assume-role.external-id    | null                      | An optional [external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html)                        |
+| client.assume-role.timeout-sec    | 1 hour                    | Timeout of each assume role session. At the end of the timeout, a new set of role session credentials will be fetched through a STS client. |
+| client.assume-role.session-name   | null                      | An optional [session name](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_iam-condition-keys.html#ck_rolesessionname)  |

Review Comment:
   Thank you for your suggestion. In this case, I'll remove the added content about `assume_role` and maybe open another PR for it.
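
For reference, a rough sketch of how the `client.assume-role.session-name` row in the table above could be passed alongside the existing assume-role properties. The ARN and region are the example values already used in `docs/aws.md`; the session name `iceberg-nightly-etl` is a made-up placeholder. These flags are not a complete configuration on their own, since the client factory that the table belongs to must also be selected.

```shell
#!/bin/sh
# Sketch only: the session name below is a hypothetical placeholder value.
prefix="spark.sql.catalog.my_catalog.client.assume-role"
session_name="iceberg-nightly-etl"

# ARN and region reuse the example values from docs/aws.md.
arn_flag="--conf ${prefix}.arn=arn:aws:iam::123456789:role/myRoleToAssume"
region_flag="--conf ${prefix}.region=ap-northeast-1"
session_flag="--conf ${prefix}.session-name=${session_name}"

# Print the flags, one per line, instead of launching Spark.
printf '%s\n' "$arn_flag" "$region_flag" "$session_flag"
```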





[GitHub] [iceberg] singhpk234 commented on a diff in pull request #5902: Docs: Add doc for HTTP client configurations

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #5902:
URL: https://github.com/apache/iceberg/pull/5902#discussion_r985974667


##########
docs/aws.md:
##########
@@ -573,6 +573,68 @@ spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersio
     --conf spark.sql.catalog.my_catalog.client.assume-role.region=ap-northeast-1
 ```
 
+### HTTP Client Configurations
+AWS clients support two types of HTTP Client, [URL Connection HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/url-connection-client) 
+and [Apache HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/apache-client).
+In default, AWS clients use URL Connection HTTP Client to communicate with the service. 

Review Comment:
   [nit]
   ```suggestion
   By default, AWS clients use **URL Connection** HTTP Client to communicate with the service. 
   ```



##########
docs/aws.md:
##########
@@ -573,6 +573,68 @@ spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersio
     --conf spark.sql.catalog.my_catalog.client.assume-role.region=ap-northeast-1
 ```
 
+### HTTP Client Configurations
+AWS clients support two types of HTTP Client, [URL Connection HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/url-connection-client) 
+and [Apache HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/apache-client).
+In default, AWS clients use URL Connection HTTP Client to communicate with the service. 
+This HTTP client optimizes for minimum dependencies and startup latency but support less functionality than other implementations. 
+In contrast, Apache HTTP Client supports more functionalities and more customized settings, such as expect-continue handshake and TCP KeepAlive, at cost of extra dependency and additional startup latency. 
+
+For more details of configuration, see sections [URL Connection HTTP Client Configurations](#url-connection-http-client-configurations) and [Apache HTTP Client Configurations](#apache-http-client-configurations).
+
+Configure the following property to set the type of HTTP client:
+
+| Property         | Default       | Description                                                                                                |
+|------------------|---------------|------------------------------------------------------------------------------------------------------------|
+| http-client.type | urlconnection | Types of HTTP Client. <br/> `urlconnection`: URL Connection HTTP Client <br/> `apache`: Apache HTTP Client |
+
+#### URL Connection HTTP Client Configurations
+
+URL Connection HTTP Client has the following configurable properties:
+
+| Property                                        | Default | Description                                                                                                                                                                                                      |
+|-------------------------------------------------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| http-client.urlconnection.socket-timeout-ms     | null    | An optional [socket timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.Builder.html#socketTimeout(java.time.Duration)) in milliseconds         |
+| http-client.urlconnection.connection-timeout-ms | null    | An optional [connection timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.Builder.html#connectionTimeout(java.time.Duration)) in milliseconds |
+
+Here is an example to start Spark shell with URL Connection HTTP client and some optional settings:

Review Comment:
   +1, this is getting quite redundant. Thinking out loud here: should we just say that catalog properties can be used to override the defaults, and let end users figure out how to pass these properties in their respective engine implementations?





[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #5902: Docs: Add doc for HTTP client configurations

Posted by GitBox <gi...@apache.org>.
JonasJ-ap commented on code in PR #5902:
URL: https://github.com/apache/iceberg/pull/5902#discussion_r985351548


##########
docs/aws.md:
##########
@@ -573,6 +573,55 @@ spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersio
     --conf spark.sql.catalog.my_catalog.client.assume-role.region=ap-northeast-1
 ```
 
+### URL Connection HTTP Client

Review Comment:
   Thank you for your suggestions. I refactored the doc to include a paragraph discussing how to choose between these two HTTP clients.





[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #5902: Docs: Add doc for HTTP client configurations

Posted by GitBox <gi...@apache.org>.
JonasJ-ap commented on code in PR #5902:
URL: https://github.com/apache/iceberg/pull/5902#discussion_r990553676


##########
docs/aws.md:
##########
@@ -573,6 +573,56 @@ spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersio
     --conf spark.sql.catalog.my_catalog.client.assume-role.region=ap-northeast-1
 ```
 
+### HTTP Client Configurations
+AWS clients support two types of HTTP Client, [URL Connection HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/url-connection-client) 
+and [Apache HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/apache-client).
+By default, AWS clients use **URL Connection** HTTP Client to communicate with the service. 
+This HTTP client optimizes for minimum dependencies and startup latency but support less functionality than other implementations. 
+In contrast, Apache HTTP Client supports more functionalities and more customized settings, such as expect-continue handshake and TCP KeepAlive, at cost of extra dependency and additional startup latency. 
+
+For more details of configuration, see sections [URL Connection HTTP Client Configurations](#url-connection-http-client-configurations) and [Apache HTTP Client Configurations](#apache-http-client-configurations).
+
+Configure the following property to set the type of HTTP client:
+
+| Property         | Default       | Description                                                                                                |
+|------------------|---------------|------------------------------------------------------------------------------------------------------------|
+| http-client.type | urlconnection | Types of HTTP Client. <br/> `urlconnection`: URL Connection HTTP Client <br/> `apache`: Apache HTTP Client |
+
+#### URL Connection HTTP Client Configurations
+
+URL Connection HTTP Client has the following configurable properties:
+
+| Property                                        | Default | Description                                                                                                                                                                                                      |
+|-------------------------------------------------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| http-client.urlconnection.socket-timeout-ms     | null    | An optional [socket timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.Builder.html#socketTimeout(java.time.Duration)) in milliseconds         |
+| http-client.urlconnection.connection-timeout-ms | null    | An optional [connection timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.Builder.html#connectionTimeout(java.time.Duration)) in milliseconds |
+
+Users can use catalog properties to override the defaults. For example, to configure the socket timeout for URL Connection HTTP Client when starting a spark shell, one can add:
+```shell
+--conf spark.sql.catalog.my_catalog.http-client.urlconnection.socket-timeout-ms=80
+```
+
+#### Apache HTTP Client Configurations
+
+Apache HTTP Client has the following configurable properties:
+
+| Property                                              | Default                   | Description                                                                                                                                                                                                                                 |
+|-------------------------------------------------------|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| http-client.apache.socket-timeout-ms                  | null                      | An optional [socket timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#socketTimeout(java.time.Duration)) in milliseconds                                                  |
+| http-client.apache.connection-timeout-ms              | null                      | An optional [connection timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#connectionTimeout(java.time.Duration)) in milliseconds                                          |
+| http-client.apache.connection-acquisition-timeout-ms  | null                      | An optional [connection acquisition timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#connectionAcquisitionTimeout(java.time.Duration)) in milliseconds                   |
+| http-client.apache.connection-max-idle-time-ms        | null                      | An optional [connection max idle timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#connectionMaxIdleTime(java.time.Duration)) in milliseconds                             |
+| http-client.apache.connection-time-to-live-ms         | null                      | An optional [connection time to live](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#connectionTimeToLive(java.time.Duration)) in milliseconds                                  |
+| http-client.apache.expect-continue-enabled            | null, disabled by default | An optional `true/false` setting that decide whether to enable [expect continue](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#expectContinueEnabled(java.lang.Boolean))       |
+| http-client.apache.max-connections                    | null                      | An optional [max connections](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#maxConnections(java.lang.Integer))  in integer                                                     |
+| http-client.apache.tcp-keep-alive-enabled             | null, disabled by default | An optional `true/false` setting that decide whether to enable [tcp keep alive](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#tcpKeepAlive(java.lang.Boolean))                 |
+| http-client.apache.use-idle-connection-reaper-enabled | null, enabled by default  | An optional `true/false` setting that decide whether to [use idle connection reaper](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#useIdleConnectionReaper(java.lang.Boolean)) |
+
+Users can use catalog properties to override the defaults. For example, to configure the max connections for Apache HTTP Client when starting a spark shell, one can add:
+```shell
+--conf spark.sql.catalog.my_catalog.http-client.apache.max-connections=5
+```

Review Comment:
   @singhpk234 @jackye1995 Thank you for your suggestions. I replaced the shell example with `Users can use catalog properties to override the defaults.` plus a one-line example showing how to configure the property in spark-shell. I kept the one-line example because I think it shows users the format for setting the properties mentioned without occupying too much space. Do you think it is OK to keep the text after "For example..."?
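
Since the timeout properties in the URL Connection table are expressed in milliseconds, a small sketch of the unit conversion may help. The 30-second and 5-second values are illustrative assumptions, not recommended settings, and `my_catalog` is the catalog name from the existing examples.

```shell
#!/bin/sh
# Sketch only: the 30 s and 5 s values are illustrative assumptions.
socket_timeout_ms=$((30 * 1000))      # 30 seconds expressed in milliseconds
connection_timeout_ms=$((5 * 1000))   # 5 seconds expressed in milliseconds

prefix="spark.sql.catalog.my_catalog.http-client.urlconnection"
socket_flag="--conf ${prefix}.socket-timeout-ms=${socket_timeout_ms}"
connection_flag="--conf ${prefix}.connection-timeout-ms=${connection_timeout_ms}"

# Print the flags, one per line, instead of launching Spark.
printf '%s\n' "$socket_flag" "$connection_flag"
```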





[GitHub] [iceberg] jackye1995 commented on pull request #5902: Docs: Add doc for HTTP client configurations

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on PR #5902:
URL: https://github.com/apache/iceberg/pull/5902#issuecomment-1272644332

   The related PRs are merged, so I think we are good to merge this one as well. @JonasJ-ap, let me know if you are interested in refactoring the redundant Spark examples a bit; otherwise I will find someone to work on that.




[GitHub] [iceberg] jackye1995 commented on a diff in pull request #5902: Docs: Add doc for AWS assume role session name and several http client configurations

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #5902:
URL: https://github.com/apache/iceberg/pull/5902#discussion_r985267316


##########
docs/aws.md:
##########
@@ -552,12 +552,13 @@ This also serves as an example for users who would like to implement their own A
 
 This client factory has the following configurable catalog properties:
 
-| Property                          | Default                                  | Description                                            |
-| --------------------------------- | ---------------------------------------- | ------------------------------------------------------ |
-| client.assume-role.arn            | null, requires user input                | ARN of the role to assume, e.g. arn:aws:iam::123456789:role/myRoleToAssume  |
-| client.assume-role.region         | null, requires user input                | All AWS clients except the STS client will use the given region instead of the default region chain  |
-| client.assume-role.external-id    | null                                     | An optional [external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html)  |
-| client.assume-role.timeout-sec    | 1 hour                                   | Timeout of each assume role session. At the end of the timeout, a new set of role session credentials will be fetched through a STS client.  |
+| Property                          | Default                   | Description                                                                                                                                 |

Review Comment:
   No need to change all the lines; just add one more line for the new config.



##########
docs/aws.md:
##########
@@ -573,6 +574,55 @@ spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersio
     --conf spark.sql.catalog.my_catalog.client.assume-role.region=ap-northeast-1
 ```
 
+### URL Connection HTTP Client
+In default, AWS clients use the [URL Connection HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/url-connection-client) for HTTP connection management.
+
+This HTTP Client has the following configurable properties:
+
+| Property                                        | Default | Description                                                                                                                                                                                                      |
+|-------------------------------------------------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| http-client.urlconnection.socket-timeout-ms     | null    | An optional [socket timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.Builder.html#socketTimeout(java.time.Duration)) in milliseconds         |
+| http-client.urlconnection.connection-timeout-ms | null    | An optional [connection timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.Builder.html#connectionTimeout(java.time.Duration)) in milliseconds |
+
+Here is an example to start Spark shell with URL Connection HTTP client and some optional settings:
+```shell
+spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersion %}},software.amazon.awssdk:bundle:2.17.257 \
+    --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
+    --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \    
+    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
+    --conf spark.sql.catalog.my_catalog.http-client.urlconnection.socket-timeout-ms=80 \
+    --conf spark.sql.catalog.my_catalog.http-client.urlconnection.connection-timeout-ms=90
+```
+
+### Apache HTTP Client

Review Comment:
   I think we want a few more levels: one general level for HTTP client configuration, an intro paragraph about choosing the type of HTTP client, and two subsections for configuring the UrlConnection and Apache clients.



##########
docs/aws.md:
##########
@@ -552,12 +552,13 @@ This also serves as an example for users who would like to implement their own A
 
 This client factory has the following configurable catalog properties:
 
-| Property                          | Default                                  | Description                                            |
-| --------------------------------- | ---------------------------------------- | ------------------------------------------------------ |
-| client.assume-role.arn            | null, requires user input                | ARN of the role to assume, e.g. arn:aws:iam::123456789:role/myRoleToAssume  |
-| client.assume-role.region         | null, requires user input                | All AWS clients except the STS client will use the given region instead of the default region chain  |
-| client.assume-role.external-id    | null                                     | An optional [external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html)  |
-| client.assume-role.timeout-sec    | 1 hour                                   | Timeout of each assume role session. At the end of the timeout, a new set of role session credentials will be fetched through a STS client.  |
+| Property                          | Default                   | Description                                                                                                                                 |
+|-----------------------------------|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
+| client.assume-role.arn            | null, requires user input | ARN of the role to assume, e.g. arn:aws:iam::123456789:role/myRoleToAssume                                                                  |
+| client.assume-role.region         | null, requires user input | All AWS clients except the STS client will use the given region instead of the default region chain                                         |
+| client.assume-role.external-id    | null                      | An optional [external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html)                        |
+| client.assume-role.timeout-sec    | 1 hour                    | Timeout of each assume role session. At the end of the timeout, a new set of role session credentials will be fetched through a STS client. |
+| client.assume-role.session-name   | null                      | An optional [session name](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_iam-condition-keys.html#ck_rolesessionname)  |

Review Comment:
   I know you recently added changes to both the assume role factory and the HTTP client, but they are largely unrelated. Can we focus this PR only on the HTTP client features? That is already a good amount of changes.





[GitHub] [iceberg] jackye1995 commented on a diff in pull request #5902: Docs: Add doc for HTTP client configurations

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #5902:
URL: https://github.com/apache/iceberg/pull/5902#discussion_r985872557


##########
docs/aws.md:
##########
@@ -573,6 +573,68 @@ spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersio
     --conf spark.sql.catalog.my_catalog.client.assume-role.region=ap-northeast-1
 ```
 
+### HTTP Client Configurations
+AWS clients support two types of HTTP Client, [URL Connection HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/url-connection-client) 
+and [Apache HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/apache-client).
+In default, AWS clients use URL Connection HTTP Client to communicate with the service. 
+This HTTP client optimizes for minimum dependencies and startup latency but support less functionality than other implementations. 
+In contrast, Apache HTTP Client supports more functionalities and more customized settings, such as expect-continue handshake and TCP KeepAlive, at cost of extra dependency and additional startup latency. 
+
+For more details of configuration, see sections [URL Connection HTTP Client Configurations](#url-connection-http-client-configurations) and [Apache HTTP Client Configurations](#apache-http-client-configurations).
+
+Configure the following property to set the type of HTTP client:
+
+| Property         | Default       | Description                                                                                                |
+|------------------|---------------|------------------------------------------------------------------------------------------------------------|
+| http-client.type | urlconnection | Types of HTTP Client. <br/> `urlconnection`: URL Connection HTTP Client <br/> `apache`: Apache HTTP Client |
+
+#### URL Connection HTTP Client Configurations
+
+URL Connection HTTP Client has the following configurable properties:
+
+| Property                                        | Default | Description                                                                                                                                                                                                      |
+|-------------------------------------------------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| http-client.urlconnection.socket-timeout-ms     | null    | An optional [socket timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.Builder.html#socketTimeout(java.time.Duration)) in milliseconds         |
+| http-client.urlconnection.connection-timeout-ms | null    | An optional [connection timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.Builder.html#connectionTimeout(java.time.Duration)) in milliseconds |
+
+Here is an example to start Spark shell with URL Connection HTTP client and some optional settings:

Review Comment:
   I think this is getting redundant, as we are adding a Spark shell example for every doc section. We should consolidate these in a better way. Any thoughts? @JonasJ-ap @rajarshisarkar @singhpk234 @amogh-jahagirdar






[GitHub] [iceberg] jackye1995 commented on a diff in pull request #5902: Docs: Add doc for HTTP client configurations

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #5902:
URL: https://github.com/apache/iceberg/pull/5902#discussion_r990813684


##########
docs/aws.md:
##########
@@ -573,6 +573,56 @@ spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersio
     --conf spark.sql.catalog.my_catalog.client.assume-role.region=ap-northeast-1
 ```
 
+### HTTP Client Configurations
+AWS clients support two types of HTTP Client, [URL Connection HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/url-connection-client) 
+and [Apache HTTP Client](https://mvnrepository.com/artifact/software.amazon.awssdk/apache-client).
+By default, AWS clients use **URL Connection** HTTP Client to communicate with the service. 
+This HTTP client optimizes for minimum dependencies and startup latency but support less functionality than other implementations. 
+In contrast, Apache HTTP Client supports more functionalities and more customized settings, such as expect-continue handshake and TCP KeepAlive, at cost of extra dependency and additional startup latency. 
+
+For more details of configuration, see sections [URL Connection HTTP Client Configurations](#url-connection-http-client-configurations) and [Apache HTTP Client Configurations](#apache-http-client-configurations).
+
+Configure the following property to set the type of HTTP client:
+
+| Property         | Default       | Description                                                                                                |
+|------------------|---------------|------------------------------------------------------------------------------------------------------------|
+| http-client.type | urlconnection | Types of HTTP Client. <br/> `urlconnection`: URL Connection HTTP Client <br/> `apache`: Apache HTTP Client |
+
+#### URL Connection HTTP Client Configurations
+
+URL Connection HTTP Client has the following configurable properties:
+
+| Property                                        | Default | Description                                                                                                                                                                                                      |
+|-------------------------------------------------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| http-client.urlconnection.socket-timeout-ms     | null    | An optional [socket timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.Builder.html#socketTimeout(java.time.Duration)) in milliseconds         |
+| http-client.urlconnection.connection-timeout-ms | null    | An optional [connection timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.Builder.html#connectionTimeout(java.time.Duration)) in milliseconds |
+
+Users can use catalog properties to override the defaults. For example, to configure the socket timeout for URL Connection HTTP Client when starting a spark shell, one can add:
+```shell
+--conf spark.sql.catalog.my_catalog.http-client.urlconnection.socket-timeout-ms=80
+```
+
+#### Apache HTTP Client Configurations
+
+Apache HTTP Client has the following configurable properties:
+
+| Property                                              | Default                   | Description                                                                                                                                                                                                                                 |
+|-------------------------------------------------------|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| http-client.apache.socket-timeout-ms                  | null                      | An optional [socket timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#socketTimeout(java.time.Duration)) in milliseconds                                                  |
+| http-client.apache.connection-timeout-ms              | null                      | An optional [connection timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#connectionTimeout(java.time.Duration)) in milliseconds                                          |
+| http-client.apache.connection-acquisition-timeout-ms  | null                      | An optional [connection acquisition timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#connectionAcquisitionTimeout(java.time.Duration)) in milliseconds                   |
+| http-client.apache.connection-max-idle-time-ms        | null                      | An optional [connection max idle timeout](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#connectionMaxIdleTime(java.time.Duration)) in milliseconds                             |
+| http-client.apache.connection-time-to-live-ms         | null                      | An optional [connection time to live](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#connectionTimeToLive(java.time.Duration)) in milliseconds                                  |
+| http-client.apache.expect-continue-enabled            | null, disabled by default | An optional `true/false` setting that decide whether to enable [expect continue](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#expectContinueEnabled(java.lang.Boolean))       |
+| http-client.apache.max-connections                    | null                      | An optional [max connections](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#maxConnections(java.lang.Integer))  in integer                                                     |
+| http-client.apache.tcp-keep-alive-enabled             | null, disabled by default | An optional `true/false` setting that decide whether to enable [tcp keep alive](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#tcpKeepAlive(java.lang.Boolean))                 |
+| http-client.apache.use-idle-connection-reaper-enabled | null, enabled by default  | An optional `true/false` setting that decide whether to [use idle connection reaper](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/apache/ApacheHttpClient.Builder.html#useIdleConnectionReaper(java.lang.Boolean)) |
+
+Users can use catalog properties to override the defaults. For example, to configure the max connections for Apache HTTP Client when starting a spark shell, one can add:
+```shell
+--conf spark.sql.catalog.my_catalog.http-client.apache.max-connections=5
+```

Review Comment:
   Yeah, I agree with @singhpk234: we just need one sentence at the very beginning saying that these are all catalog properties, with a pointer to https://iceberg.apache.org/docs/latest/configuration/#catalog-properties for how to use and configure them in different engines.
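
   To make the "these are all catalog properties" point concrete, here is a minimal sketch (not part of the PR) of how a single map of properties could be rendered as Spark `--conf` flags. It assumes the `spark.sql.catalog.<catalog_name>.` prefix seen in the examples in this diff; the helper name `spark_conf_flags` and the sample values are hypothetical.

```python
def spark_conf_flags(catalog_name, properties):
    """Render catalog properties as spark-sql / spark-shell --conf flags.

    Assumes the `spark.sql.catalog.<catalog_name>.` prefix used in the
    examples quoted in this thread.
    """
    prefix = f"spark.sql.catalog.{catalog_name}."
    return [f"--conf {prefix}{key}={value}" for key, value in properties.items()]


# Property names come from the tables in this PR; the values are arbitrary.
http_client_properties = {
    "http-client.type": "apache",
    "http-client.apache.max-connections": 5,
    "http-client.apache.socket-timeout-ms": 80,
}

flags = spark_conf_flags("my_catalog", http_client_properties)

# Print the flags one per line, ready to paste after `spark-sql`.
print(" \\\n    ".join(flags))
```

   Keeping the properties in one map and deriving the engine-specific flags from it is one way to avoid repeating a full Spark shell command in every doc section.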





[GitHub] [iceberg] JonasJ-ap commented on pull request #5902: Docs: Add doc for HTTP client configurations

Posted by GitBox <gi...@apache.org>.
JonasJ-ap commented on PR #5902:
URL: https://github.com/apache/iceberg/pull/5902#issuecomment-1272650111

   > The related PRs are merged, so I think we are good to merge this one as well. @JonasJ-ap, let me know if you are interested in refactoring the redundant Spark examples a bit; otherwise I will find someone to work on that.
   
   Thank you for your review. I'm happy to do this refactoring task.

