Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/17 01:45:15 UTC

[GitHub] [iceberg] JonasJ-ap opened a new pull request, #5994: Doc: add assume role session name doc and remove redundant spark shell examples

JonasJ-ap opened a new pull request, #5994:
URL: https://github.com/apache/iceberg/pull/5994

   1. Add documentation for the optional assume role session name, which was added in #5765.
   2. Remove redundant Spark shell examples and refactor some of the descriptions, as suggested in #5902.
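
   For context, the new property could be set like any other catalog property; a minimal, hypothetical Spark config sketch (the catalog name and session name below are illustrative only):
   ```
   spark.sql.catalog.my_catalog.client.assume-role.session-name=my-session-name
   ```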



[GitHub] [iceberg] rdblue commented on a diff in pull request #5994: Doc: add assume role session name doc and remove redundant spark shell examples

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5994:
URL: https://github.com/apache/iceberg/pull/5994#discussion_r1002767398


##########
docs/aws.md:
##########
@@ -435,48 +437,23 @@ This is turned off by default.
 ### S3 Tags
 
 Custom [tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) can be added to S3 objects while writing and deleting.
-For example, to write S3 tags with Spark 3.0, you can start the Spark SQL shell with:
-```
-spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
-    --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
-    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
-    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
-    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key1=my_val1 \
-    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key2=my_val2
-```
-For the above example, the objects in S3 will be saved with tags: `my_key1=my_val1` and `my_key2=my_val2`. Do note that the specified write tags will be saved only while object creation.
+
+To write S3 tags, `s3.write.tags` need to be configured. For example, if set `s3.write.tags.my_key1=my_val1` and `s3.write.tags.my_key2=my_val2`, 

Review Comment:
   Grammar: "if set" has no subject.
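
   For illustration, the reworded sentence could be paired with a properties-style block (the same format suggested below for the delete tags), reusing the example keys from the removed shell command:
   ```
   spark.sql.catalog.my_catalog.s3.write.tags.my_key1=my_val1
   spark.sql.catalog.my_catalog.s3.write.tags.my_key2=my_val2
   ```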




[GitHub] [iceberg] rdblue commented on a diff in pull request #5994: Doc: add assume role session name doc and remove redundant spark shell examples

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5994:
URL: https://github.com/apache/iceberg/pull/5994#discussion_r1002767756


##########
docs/aws.md:
##########
@@ -435,48 +437,23 @@ This is turned off by default.
 ### S3 Tags
 
 Custom [tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) can be added to S3 objects while writing and deleting.
-For example, to write S3 tags with Spark 3.0, you can start the Spark SQL shell with:
-```
-spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
-    --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
-    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
-    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
-    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key1=my_val1 \
-    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key2=my_val2
-```
-For the above example, the objects in S3 will be saved with tags: `my_key1=my_val1` and `my_key2=my_val2`. Do note that the specified write tags will be saved only while object creation.
+
+To write S3 tags, `s3.write.tags` need to be configured. For example, if set `s3.write.tags.my_key1=my_val1` and `s3.write.tags.my_key2=my_val2`, 
+the objects in S3 will be saved with tags: `my_key1=my_val1` and `my_key2=my_val2`. Do note that the specified write tags will be saved only while object creation.
 
 When the catalog property `s3.delete-enabled` is set to `false`, the objects are not hard-deleted from S3.
 This is expected to be used in combination with S3 delete tagging, so objects are tagged and removed using [S3 lifecycle policy](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html).
 The property is set to `true` by default.
 
 With the `s3.delete.tags` config, objects are tagged with the configured key-value pairs before deletion.
 Users can configure tag-based object lifecycle policy at bucket level to transition objects to different tiers.
-For example, to add S3 delete tags with Spark 3.0, you can start the Spark SQL shell with: 
-
-```
-sh spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
-    --conf spark.sql.catalog.my_catalog.warehouse=s3://iceberg-warehouse/s3-tagging \
-    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
-    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
-    --conf spark.sql.catalog.my_catalog.s3.delete.tags.my_key3=my_val3 \
-    --conf spark.sql.catalog.my_catalog.s3.delete-enabled=false
-```

Review Comment:
   I think it is fine to remove the Spark command, but I think these should still use a code block that sets the relevant Spark properties.
   
   Basically, change this so that it talks about settings in your Spark config file and then have code blocks like this:
   
   ```
   spark.sql.catalog.my_catalog.s3.delete.tags.my_key3=my_val3
   spark.sql.catalog.my_catalog.s3.delete-enabled=false
   ```
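
   In Spark, these properties would typically live in the Spark config file (e.g. `spark-defaults.conf`) or be passed individually with `--conf`, as the removed shell examples did.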




[GitHub] [iceberg] rdblue commented on a diff in pull request #5994: Doc: add assume role session name doc and remove redundant spark shell examples

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5994:
URL: https://github.com/apache/iceberg/pull/5994#discussion_r1002767844


##########
docs/aws.md:
##########
@@ -558,20 +509,9 @@ This client factory has the following configurable catalog properties:
 | client.assume-role.region         | null, requires user input                | All AWS clients except the STS client will use the given region instead of the default region chain  |
 | client.assume-role.external-id    | null                                     | An optional [external ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html)  |
 | client.assume-role.timeout-sec    | 1 hour                                   | Timeout of each assume role session. At the end of the timeout, a new set of role session credentials will be fetched through a STS client.  |
-
+| client.assume-role.session-name | null                      | An optional [session name](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_iam-condition-keys.html#ck_rolesessionname)  |

Review Comment:
   Please align the new row with the existing columns.
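
   For example, an aligned version of the new row (padded to match the existing columns) might look like:
   ```
   | client.assume-role.session-name   | null                                     | An optional [session name](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_iam-condition-keys.html#ck_rolesessionname)  |
   ```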




[GitHub] [iceberg] rdblue commented on a diff in pull request #5994: Doc: add assume role session name doc and remove redundant spark shell examples

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5994:
URL: https://github.com/apache/iceberg/pull/5994#discussion_r1002767286


##########
docs/aws.md:
##########
@@ -326,6 +326,8 @@ For more details, please refer to [Lock catalog properties](../configuration/#lo
 Iceberg allows users to write data to S3 through `S3FileIO`.
 `GlueCatalog` by default uses this `FileIO`, and other catalogs can load this `FileIO` using the `io-impl` catalog property.
 
+Users can use catalog properties to override the defaults. Please see https://iceberg.apache.org/docs/latest/configuration/#catalog-properties for how to use these catalog properties and configure in different engines.

Review Comment:
   Should not use bare link.
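
   For example, this could use the relative-link style already used earlier in the file, something like `[catalog properties](../configuration/#catalog-properties)`, instead of the full URL.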




[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #5994: Doc: add assume role session name doc and remove redundant spark shell examples

Posted by GitBox <gi...@apache.org>.
JonasJ-ap commented on code in PR #5994:
URL: https://github.com/apache/iceberg/pull/5994#discussion_r1003915535


##########
docs/aws.md:
##########
@@ -435,48 +437,23 @@ This is turned off by default.
 ### S3 Tags
 
 Custom [tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) can be added to S3 objects while writing and deleting.
-For example, to write S3 tags with Spark 3.0, you can start the Spark SQL shell with:
-```
-spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
-    --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
-    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
-    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
-    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key1=my_val1 \
-    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key2=my_val2
-```
-For the above example, the objects in S3 will be saved with tags: `my_key1=my_val1` and `my_key2=my_val2`. Do note that the specified write tags will be saved only while object creation.
+
+To write S3 tags, `s3.write.tags` need to be configured. For example, if set `s3.write.tags.my_key1=my_val1` and `s3.write.tags.my_key2=my_val2`, 
+the objects in S3 will be saved with tags: `my_key1=my_val1` and `my_key2=my_val2`. Do note that the specified write tags will be saved only while object creation.
 
 When the catalog property `s3.delete-enabled` is set to `false`, the objects are not hard-deleted from S3.
 This is expected to be used in combination with S3 delete tagging, so objects are tagged and removed using [S3 lifecycle policy](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html).
 The property is set to `true` by default.
 
 With the `s3.delete.tags` config, objects are tagged with the configured key-value pairs before deletion.
 Users can configure tag-based object lifecycle policy at bucket level to transition objects to different tiers.
-For example, to add S3 delete tags with Spark 3.0, you can start the Spark SQL shell with: 
-
-```
-sh spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
-    --conf spark.sql.catalog.my_catalog.warehouse=s3://iceberg-warehouse/s3-tagging \
-    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
-    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
-    --conf spark.sql.catalog.my_catalog.s3.delete.tags.my_key3=my_val3 \
-    --conf spark.sql.catalog.my_catalog.s3.delete-enabled=false
-```

Review Comment:
   Thank you for your suggestions. I refactored and added code blocks for every deleted Spark command example.


