You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/17 00:52:07 UTC

[GitHub] [iceberg] price-qian opened a new pull request, #5555: Add s3.acceleration-enbled flag to AwsProperties

price-qian opened a new pull request, #5555:
URL: https://github.com/apache/iceberg/pull/5555

   Compiled iceberg-spark-runtime-3.1 and tested on Glue 3.0. The job is able to read from a DEMO S3 bucket into an S3 bucket that lives in my account. The DEMO bucket is from this blog: https://aws.amazon.com/blogs/big-data/build-an-apache-iceberg-data-lake-using-amazon-athena-amazon-emr-and-aws-glue/
   
   My Spark job used during the testing is listed below.
   ```
   import com.amazonaws.services.glue.GlueContext
   
   import org.apache.spark.SparkContext
   import org.apache.spark.sql.SparkSession
   
   object IcebergSparkSQL {
     def main(sysArgs: Array[String]) {
           val sparkContext: SparkContext = new SparkContext()
           val spark: SparkSession = SparkSession.builder.
             config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog").
             config("spark.sql.catalog.demo.warehouse", "s3://iceberg-test-puzhen/glueiceberg").
             config("spark.sql.catalog.demo.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog").
             config("spark.sql.catalog.demo.s3.acceleration-enabled", "true").
             getOrCreate()
           
           // create database
           spark.sql("CREATE DATABASE IF NOT EXISTS demo.reviews")
           
           // load data (Amazon reviews public dataset)
           val book_reviews_location = "s3://amazon-reviews-pds/parquet/product_category=Books/*.parquet"
           val book_reviews = spark.read.parquet(book_reviews_location)
           book_reviews.writeTo("demo.reviews.book_reviews2").
             tableProperty("format-version", "2").
             createOrReplace()
           
           // read using SQL
           spark.sql("SELECT * FROM demo.reviews.book_reviews2").show()
     }
   }
   ```
   
   Job runtime duration:
   
   Workload run without S3 acceleration: 3 minutes 21 seconds
   Workload run with S3 acceleration: 3 minutes 14 seconds


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #5555: Add s3.acceleration-enbled flag to AwsProperties

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on PR #5555:
URL: https://github.com/apache/iceberg/pull/5555#issuecomment-1219681831

   This is a straight forward change so I will directly merge it, thanks for the contribution! And thanks for the review @amogh-jahagirdar 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] price-qian commented on pull request #5555: Add s3.acceleration-enbled flag to AwsProperties

Posted by GitBox <gi...@apache.org>.
price-qian commented on PR #5555:
URL: https://github.com/apache/iceberg/pull/5555#issuecomment-1221635316

   Great thanks @jackye1995 for the opportunity to make this PR. Also great thanks to both @jackye1995 and @amogh-jahagirdar for the help in the development and testing of this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #5555: Add s3.acceleration-enbled flag to AwsProperties

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #5555:
URL: https://github.com/apache/iceberg/pull/5555#discussion_r948291535


##########
aws/src/main/java/org/apache/iceberg/aws/AwsClientFactories.java:
##########
@@ -183,10 +190,11 @@ public static <T extends SdkClientBuilder> void configureEndpoint(T builder, Str
   }
 
   public static S3Configuration s3Configuration(
-      Boolean pathStyleAccess, Boolean s3UseArnRegionEnabled) {
+      Boolean pathStyleAccess, Boolean s3UseArnRegionEnabled, Boolean s3AccelerationEnabled) {

Review Comment:
   Another task we need to do is to make sure for all these configs we add, all the client factories in open source are applying those configs. For now there are 3: default, assume role, LakeFormation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] price-qian commented on a diff in pull request #5555: Add s3.acceleration-enbled flag to AwsProperties

Posted by GitBox <gi...@apache.org>.
price-qian commented on code in PR #5555:
URL: https://github.com/apache/iceberg/pull/5555#discussion_r948297684


##########
aws/src/main/java/org/apache/iceberg/aws/AwsClientFactories.java:
##########
@@ -183,10 +190,11 @@ public static <T extends SdkClientBuilder> void configureEndpoint(T builder, Str
   }
 
   public static S3Configuration s3Configuration(
-      Boolean pathStyleAccess, Boolean s3UseArnRegionEnabled) {
+      Boolean pathStyleAccess, Boolean s3UseArnRegionEnabled, Boolean s3AccelerationEnabled) {

Review Comment:
   I think it's fair to move directly to s3(). One question about deprecation, why can't we just make it private?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 merged pull request #5555: Add s3.acceleration-enbled flag to AwsProperties

Posted by GitBox <gi...@apache.org>.
jackye1995 merged PR #5555:
URL: https://github.com/apache/iceberg/pull/5555


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #5555: Add s3.acceleration-enbled flag to AwsProperties

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on PR #5555:
URL: https://github.com/apache/iceberg/pull/5555#issuecomment-1218224207

   Thanks for the contribution, this is definitely super useful feature flag to add. Just one comment regarding the public method we are modifying. Let's see if it breaks CI.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on pull request #5555: Add s3.acceleration-enbled flag to AwsProperties

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on PR #5555:
URL: https://github.com/apache/iceberg/pull/5555#issuecomment-1221642523

   Great to have this PR in Price, this is really awesome! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5555: Add s3.acceleration-enbled flag to AwsProperties

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5555:
URL: https://github.com/apache/iceberg/pull/5555#discussion_r948512759


##########
aws/src/main/java/org/apache/iceberg/aws/AwsClientFactories.java:
##########
@@ -183,10 +190,11 @@ public static <T extends SdkClientBuilder> void configureEndpoint(T builder, Str
   }
 
   public static S3Configuration s3Configuration(
-      Boolean pathStyleAccess, Boolean s3UseArnRegionEnabled) {
+      Boolean pathStyleAccess, Boolean s3UseArnRegionEnabled, Boolean s3AccelerationEnabled) {

Review Comment:
   Agree, let's deprecate s3Configuration, it shouldn't be exposed. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #5555: Add s3.acceleration-enbled flag to AwsProperties

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #5555:
URL: https://github.com/apache/iceberg/pull/5555#discussion_r948324667


##########
aws/src/main/java/org/apache/iceberg/aws/AwsClientFactories.java:
##########
@@ -183,10 +190,11 @@ public static <T extends SdkClientBuilder> void configureEndpoint(T builder, Str
   }
 
   public static S3Configuration s3Configuration(
-      Boolean pathStyleAccess, Boolean s3UseArnRegionEnabled) {
+      Boolean pathStyleAccess, Boolean s3UseArnRegionEnabled, Boolean s3AccelerationEnabled) {

Review Comment:
   for any public method that is already released, there might be people dependent on it, so if we want to remove it we deprecate it, keep it for a few releases and then remove.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #5555: Add s3.acceleration-enbled flag to AwsProperties

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #5555:
URL: https://github.com/apache/iceberg/pull/5555#discussion_r948155475


##########
aws/src/main/java/org/apache/iceberg/aws/AwsClientFactories.java:
##########
@@ -183,10 +190,11 @@ public static <T extends SdkClientBuilder> void configureEndpoint(T builder, Str
   }
 
   public static S3Configuration s3Configuration(
-      Boolean pathStyleAccess, Boolean s3UseArnRegionEnabled) {
+      Boolean pathStyleAccess, Boolean s3UseArnRegionEnabled, Boolean s3AccelerationEnabled) {

Review Comment:
   somehow we marked this as public, which is definitely a miss. Let me enable CI first and see if it detects it as incompatible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #5555: Add s3.acceleration-enbled flag to AwsProperties

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #5555:
URL: https://github.com/apache/iceberg/pull/5555#discussion_r948289776


##########
aws/src/main/java/org/apache/iceberg/aws/AwsClientFactories.java:
##########
@@ -183,10 +190,11 @@ public static <T extends SdkClientBuilder> void configureEndpoint(T builder, Str
   }
 
   public static S3Configuration s3Configuration(
-      Boolean pathStyleAccess, Boolean s3UseArnRegionEnabled) {
+      Boolean pathStyleAccess, Boolean s3UseArnRegionEnabled, Boolean s3AccelerationEnabled) {

Review Comment:
   The checks passed. For dealing with this public method, I see this is currently only called in `s3()`,  so I think:
   1. for this PR, let's move the contents of creating `S3Configuration` directly to s3(), I don't think we need a dedicated method for this 1 liner. 
   2. we should have another PR to deprecate this method, before people take dependency on it.
   
   Any thoughts? @price-qian @amogh-jahagirdar @xiaoxuandev 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on pull request #5555: Add s3.acceleration-enbled flag to AwsProperties

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on PR #5555:
URL: https://github.com/apache/iceberg/pull/5555#issuecomment-1218605037

   This is great! Overall LGTM but I think we should integrate this to all the AwsClientFactory implementations provided by Iceberg.  
   
   https://github.com/apache/iceberg/blob/master/aws/src/main/java/org/apache/iceberg/aws/lakeformation/LakeFormationAwsClientFactory.java 
   
   and 
   
   https://github.com/apache/iceberg/blob/master/aws/src/main/java/org/apache/iceberg/aws/AssumeRoleAwsClientFactory.java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org