Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/15 02:40:58 UTC

[GitHub] [iceberg] singhpk234 opened a new pull request, #6190: Spark: Backport #5860 to spark 3.x

singhpk234 opened a new pull request, #6190:
URL: https://github.com/apache/iceberg/pull/6190

   ### About change
   
   Backport https://github.com/apache/iceberg/pull/5860 to spark 3.x
   
   cc @aokolnychyi @rdblue @ajantha-bhat 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] singhpk234 commented on a diff in pull request #6190: Spark: Backport #5860 to spark 3.x

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #6190:
URL: https://github.com/apache/iceberg/pull/6190#discussion_r1022261543


##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/SparkValueConverter.java:
##########
@@ -70,9 +72,25 @@ public static Object convert(Type type, Object object) {
         return convertedMap;
 
       case DATE:
-        return DateTimeUtils.fromJavaDate((Date) object);
+        // If spark.sql.datetime.java8API.enabled is true, java.time.LocalDate is
+        // returned for the Spark SQL DATE type; otherwise java.sql.Date is returned.
+        if (object instanceof Date) {
+          return DateTimeUtils.fromJavaDate((Date) object);
+        } else if (object instanceof LocalDate) {
+          return DateTimeUtils.localDateToDays((LocalDate) object);
+        } else {
+          throw new UnsupportedOperationException("Not a supported date class: " + object);
+        }

Review Comment:
   DateTimeUtils#anyToDays & DateTimeUtils#anyToMicros were introduced as part of this [commit](anyToMicros) in Spark 3.3.0, so they are not present in lower versions; the checks are added here instead.
   
   Since those two utils do a direct Scala `match` with no fallback when none of the cases match (which would result in an exception), an UnsupportedOperationException is thrown here as well.
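The DATE branch above can be sketched with plain JDK types. This is a hypothetical stand-in (not Iceberg code): it normalizes both date classes to days since the Unix epoch, which matches what Spark's `DateTimeUtils.localDateToDays` returns (`fromJavaDate`'s real implementation additionally accounts for the session time zone):

```java
import java.sql.Date;
import java.time.LocalDate;

public class DateToEpochDays {
  // Hypothetical analogue of the converter's DATE branch: dispatch on the
  // runtime date class and normalize to days since 1970-01-01.
  static int toEpochDays(Object object) {
    if (object instanceof Date) {
      // java.sql.Date (java8API flag off); toLocalDate() drops the time zone.
      return (int) ((Date) object).toLocalDate().toEpochDay();
    } else if (object instanceof LocalDate) {
      // java.time.LocalDate (java8API flag on).
      return (int) ((LocalDate) object).toEpochDay();
    } else {
      throw new UnsupportedOperationException("Not a supported date class: " + object);
    }
  }

  public static void main(String[] args) {
    System.out.println(toEpochDays(Date.valueOf("1970-01-02")));   // 1
    System.out.println(toEpochDays(LocalDate.of(2021, 1, 1)));     // 18628
  }
}
```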





[GitHub] [iceberg] rdblue merged pull request #6190: Spark[3.1 | 3.2]: Support Java 8 time API in SparkValueConverter

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #6190:
URL: https://github.com/apache/iceberg/pull/6190




[GitHub] [iceberg] singhpk234 commented on a diff in pull request #6190: Spark: Backport #5860 to spark 3.x

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #6190:
URL: https://github.com/apache/iceberg/pull/6190#discussion_r1022262621


##########
spark/v3.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteManifestsProcedure.java:
##########
@@ -73,6 +78,93 @@ public void testRewriteLargeManifests() {
         "Must have 4 manifests", 4, table.currentSnapshot().allManifests(table.io()).size());
   }
 
+  @Test
+  public void testRewriteLargeManifestsOnDatePartitionedTableWithJava8APIEnabled() {
+    sql(
+        "CREATE TABLE %s (id INTEGER, name STRING, dept STRING, ts DATE) USING iceberg PARTITIONED BY (ts)",
+        tableName);
+    try {
+      spark
+          .createDataFrame(
+              ImmutableList.of(
+                  RowFactory.create(1, "John Doe", "hr", Date.valueOf("2021-01-01")),
+                  RowFactory.create(2, "Jane Doe", "hr", Date.valueOf("2021-01-02")),
+                  RowFactory.create(3, "Matt Doe", "hr", Date.valueOf("2021-01-03")),
+                  RowFactory.create(4, "Will Doe", "facilities", Date.valueOf("2021-01-04"))),
+              spark.table(tableName).schema())
+          .writeTo(tableName)
+          .append();
+    } catch (NoSuchTableException e) {
+      // not possible as we already created the table above.
+      throw new RuntimeException(e);
+    }

Review Comment:
   Moved this outside of the `spark.sql.datetime.java8API.enabled` conf setting, as the dataframe write was failing because it expected a LocalDate object rather than a Date object.
   
   Note: this was not an issue for 3.3 & 3.2; something seems to have changed between 3.1 and 3.2.
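For context on why the write had to move: with the Java 8 API flag on, Spark's external representation of DATE values switches from java.sql.Date to java.time.LocalDate, so rows built with Date literals would need translating. A minimal JDK-only illustration of that translation (class and method names here are hypothetical, not from the PR):

```java
import java.sql.Date;
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

public class Java8ApiRows {
  // Translate legacy java.sql.Date values into the java.time.LocalDate
  // values that Spark expects externally when
  // spark.sql.datetime.java8API.enabled=true.
  static List<LocalDate> toJava8(List<Date> legacy) {
    return legacy.stream()
        .map(Date::toLocalDate)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Date> legacy = List.of(
        Date.valueOf("2021-01-01"), Date.valueOf("2021-01-02"));
    System.out.println(toJava8(legacy));  // [2021-01-01, 2021-01-02]
  }
}
```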






[GitHub] [iceberg] rdblue commented on pull request #6190: Spark[3.1 | 3.2]: Support Java 8 time API in SparkValueConverter

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #6190:
URL: https://github.com/apache/iceberg/pull/6190#issuecomment-1356907875

   Thanks, @singhpk234!




[GitHub] [iceberg] aokolnychyi commented on pull request #6190: Spark: Backport #5860 to spark 3.x

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on PR #6190:
URL: https://github.com/apache/iceberg/pull/6190#issuecomment-1314809321

   Thanks, @singhpk234! I'll check with fresh eyes tomorrow since it is not a clean cherry-pick.




[GitHub] [iceberg] zinking commented on pull request #6190: Spark: Backport #5860 to spark 3.x

Posted by GitBox <gi...@apache.org>.
zinking commented on PR #6190:
URL: https://github.com/apache/iceberg/pull/6190#issuecomment-1315281689

   Can we include a short description of what is backported in the title? That would make things more direct.



