You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/13 09:10:38 UTC

[GitHub] [iceberg] flyrain opened a new pull request #2330: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

flyrain opened a new pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330


   https://github.com/apache/iceberg/issues/2309
   A unit test is added. Besides here is the integration test with Apache Spark 3.0.2:
   ```
   spark-sql> describe table extended local.db.table;
   21/03/13 00:22:33 INFO BaseMetastoreCatalog: Table loaded by catalog: local.db.table
   21/03/13 00:22:35 INFO CodeGenerator: Code generated in 242.060932 ms
   21/03/13 00:22:35 INFO CodeGenerator: Code generated in 11.30419 ms
   id	bigint
   data	string
   
   # Partitioning
   Not partitioned
   
   # Detailed Table Information
   Name	local.db.table
   Location	/Users/yufei/Downloads/spark-3.0.2-bin-hadoop2.7/warehouse/db/table
   Provider	iceberg
   Owner	yufei
   Table Properties	[current-snapshot-id=none,format=iceberg/parquet]
   Time taken: 3.503 seconds, Fetched 12 row(s)
   21/03/13 00:22:35 INFO SparkSQLCLIDriver: Time taken: 3.503 seconds, Fetched 12 row(s)
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#discussion_r600070709



##########
File path: spark3/src/test/java/org/apache/iceberg/actions/TestCreateActions.java
##########
@@ -447,6 +447,30 @@ public void testProperties() throws Exception {
     }
   }
 
+  @Test
+  public void testSparkTableReservedProperties() throws Exception {
+    String destTableName = "iceberg_reserved_properties";
+    String source = sourceName("test_reserved_properties_table");
+    String dest = destName(destTableName);
+    createSourceTable(CREATE_PARQUET, source);
+    assertMigratedFileCount(Actions.snapshot(source, dest), source, dest);
+    SparkTable table = loadTable(dest);
+
+    String[] keys = {"provider", "format", "current-snapshot-id", "location"};
+
+    for (String entry : keys) {
+      Assert.assertTrue(
+              "Created table missing reserved property " + entry, table.properties().containsKey(entry));
+    }
+
+    Assert.assertEquals("Property value is not the expected value", "iceberg", table.properties().get("provider"));
+    Assert.assertEquals("Property value is not the expected value", "iceberg/parquet",
+        table.properties().get("format"));
+    Assert.assertTrue("Current-snapshot-id doesn't exist",

Review comment:
       nit: would `assertNotEquals` be more appropriate?

##########
File path: spark3/src/main/java/org/apache/iceberg/spark/source/SparkTable.java
##########
@@ -63,7 +63,8 @@
 
   private static final Logger LOG = LoggerFactory.getLogger(SparkTable.class);
 
-  private static final Set<String> RESERVED_PROPERTIES = Sets.newHashSet("provider", "format", "current-snapshot-id");
+  private static final Set<String> RESERVED_PROPERTIES =
+      Sets.newHashSet("provider", "format", "current-snapshot-id", "location");

Review comment:
       nit: not directly related to this PR, but shall we use `ImmutableSet` for consistency?

##########
File path: spark3/src/test/java/org/apache/iceberg/actions/TestCreateActions.java
##########
@@ -447,6 +447,30 @@ public void testProperties() throws Exception {
     }
   }
 
+  @Test
+  public void testSparkTableReservedProperties() throws Exception {
+    String destTableName = "iceberg_reserved_properties";
+    String source = sourceName("test_reserved_properties_table");
+    String dest = destName(destTableName);
+    createSourceTable(CREATE_PARQUET, source);
+    assertMigratedFileCount(Actions.snapshot(source, dest), source, dest);
+    SparkTable table = loadTable(dest);
+
+    String[] keys = {"provider", "format", "current-snapshot-id", "location"};
+
+    for (String entry : keys) {
+      Assert.assertTrue(

Review comment:
       I think this would fit on one line. We have our max line length as 120.

##########
File path: spark3/src/test/java/org/apache/iceberg/actions/TestCreateActions.java
##########
@@ -447,6 +447,30 @@ public void testProperties() throws Exception {
     }
   }
 
+  @Test
+  public void testSparkTableReservedProperties() throws Exception {
+    String destTableName = "iceberg_reserved_properties";
+    String source = sourceName("test_reserved_properties_table");
+    String dest = destName(destTableName);
+    createSourceTable(CREATE_PARQUET, source);
+    assertMigratedFileCount(Actions.snapshot(source, dest), source, dest);
+    SparkTable table = loadTable(dest);
+
+    String[] keys = {"provider", "format", "current-snapshot-id", "location"};
+
+    for (String entry : keys) {
+      Assert.assertTrue(
+              "Created table missing reserved property " + entry, table.properties().containsKey(entry));
+    }
+
+    Assert.assertEquals("Property value is not the expected value", "iceberg", table.properties().get("provider"));
+    Assert.assertEquals("Property value is not the expected value", "iceberg/parquet",

Review comment:
       I think if we make the error message shorter, this would fit on one line.
   
   ```
       Assert.assertEquals("Unexpected provider", "iceberg", table.properties().get("provider"));
       Assert.assertEquals("Unexpected format", "iceberg/parquet", table.properties().get("format"));
       ...
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#issuecomment-805389466


   What do you also think, @rdblue @shardulm94? 
   
   We could also show the current sort order and spec using `sort-order` and `spec` table properties. Would that make sense?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi edited a comment on pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
aokolnychyi edited a comment on pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#issuecomment-805393966


   I am also thinking about the best way to show ALL specs and sort orders to the user. Maybe, we can offer procedures for that? Any thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
flyrain commented on pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#issuecomment-806408355


   Hi @aokolnychyi, thanks for the review and nice suggestion. I have made the changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
flyrain commented on pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#issuecomment-810473752


   Thanks @RussellSpitzer, @aokolnychyi and @rdblue for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#issuecomment-799507745


   > I would just add one more test to verify the location is set correctly in the table properties on read.
   
   Ah I missed it as one of the asserts! LGTM +1


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
flyrain commented on pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#issuecomment-809950961


   Hi @aokolnychyi @rdblue @RussellSpitzer, is this PR ready for merging?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
flyrain commented on pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#issuecomment-799627934


   Thanks @RussellSpitzer for the review. Hi @aokolnychyi, can you take a look?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#issuecomment-807231637


   @aokolnychyi, I think we can add more reserved properties in parallel. It's a good idea to add sort order, but spec can already be viewed in Spark using `DESCRIBE FORMATTED` so I don't think we need to add it. We should also consider the fact that users will try to set these properties directly, and make sure we have a good error message when it doesn't work.
   
   This PR looks good to me. Any other comments before merging?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#issuecomment-805393966


   I am also thinking about the best way to show specs and sort orders to the user. Maybe, we can offer procedures for that? Any thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on a change in pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
flyrain commented on a change in pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#discussion_r601109220



##########
File path: spark3/src/main/java/org/apache/iceberg/spark/source/SparkTable.java
##########
@@ -63,7 +63,8 @@
 
   private static final Logger LOG = LoggerFactory.getLogger(SparkTable.class);
 
-  private static final Set<String> RESERVED_PROPERTIES = Sets.newHashSet("provider", "format", "current-snapshot-id");
+  private static final Set<String> RESERVED_PROPERTIES =
+      Sets.newHashSet("provider", "format", "current-snapshot-id", "location");

Review comment:
       Fair enough. Made the change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#issuecomment-810457680


   Since there were no more comments, I merged this. Thanks @flyrain!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330#issuecomment-799507074


   I would just add one more test to verify the location is set correctly in the table properties on read.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #2330: Spark: Add table location to reserved properties so that the table location shows in Spark SQL Properties.

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #2330:
URL: https://github.com/apache/iceberg/pull/2330


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org