Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/09/17 07:52:20 UTC

[GitHub] [iceberg] wangyum opened a new pull request, #5783: [WIP] Build: Update Spark to 3.3.1

wangyum opened a new pull request, #5783:
URL: https://github.com/apache/iceberg/pull/5783

   Spark 3.3.1 is currently being voted on. This PR is used to test the Spark 3.3.1 release candidate.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wangyum commented on a diff in pull request #5783: [WIP] Build: Update Spark to 3.3.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #5783:
URL: https://github.com/apache/iceberg/pull/5783#discussion_r973556180


##########
build.gradle:
##########
@@ -96,6 +96,7 @@ allprojects {
   group = "org.apache.iceberg"
   version = projectVersion
   repositories {
+    maven { url  "https://repository.apache.org/content/repositories/orgapachespark-1418/" }

Review Comment:
   I will remove this staging repository once Spark 3.3.1 is officially released.
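   A possible variation, sketched here as an assumption and not what this PR does: Gradle's repository content filtering (Gradle 5.1+) can restrict the temporary staging repository to Spark artifacts, so that Gradle looks up only `org.apache.spark` modules there while the release candidate is being tested. Maven Central stands in for the remaining repositories.

   ```groovy
   // Sketch only: the staging repository from the diff, limited to Spark's group.
   // Assumes Gradle 5.1+, where repository content filtering is available.
   allprojects {
     repositories {
       maven {
         url "https://repository.apache.org/content/repositories/orgapachespark-1418/"
         content {
           // Resolve only org.apache.spark modules from the RC staging repository;
           // all other dependencies are looked up in the remaining repositories.
           includeGroup "org.apache.spark"
         }
       }
       // Remaining repositories (assumed here: Maven Central).
       mavenCentral()
     }
   }
   ```

   Once Spark 3.3.1 is published to Maven Central, the whole `maven { ... }` entry can be dropped, as the comment says.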





[GitHub] [iceberg] bryanck commented on pull request #5783: [WIP] Build: Update Spark to 3.3.1

Posted by GitBox <gi...@apache.org>.
bryanck commented on PR #5783:
URL: https://github.com/apache/iceberg/pull/5783#issuecomment-1271705171

   I thought I'd mention that I uncovered a performance regression related to join distribution in Spark 3.3.0 vs Spark 3.2.x that impacts DSv2 sources like Iceberg. It would be nice if we could get a fix into Spark 3.3.1, but if not, we can work around it in the Iceberg Spark runtime. (The Spark ticket is [here](https://issues.apache.org/jira/browse/SPARK-40703) if interested.)




[GitHub] [iceberg] aokolnychyi merged pull request #5783: Build: Update Spark to 3.3.1

Posted by GitBox <gi...@apache.org>.
aokolnychyi merged PR #5783:
URL: https://github.com/apache/iceberg/pull/5783




[GitHub] [iceberg] aokolnychyi commented on pull request #5783: Build: Update Spark to 3.3.1

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on PR #5783:
URL: https://github.com/apache/iceberg/pull/5783#issuecomment-1295614949

   Thanks, @wangyum! Thanks for reviewing, @singhpk234!




[GitHub] [iceberg] flyrain commented on a diff in pull request #5783: Build: Update Spark to 3.3.1

Posted by GitBox <gi...@apache.org>.
flyrain commented on code in PR #5783:
URL: https://github.com/apache/iceberg/pull/5783#discussion_r1008594453


##########
spark/v3.3/build.gradle:
##########
@@ -27,6 +27,16 @@ def sparkProjects = [
     project(":iceberg-spark:iceberg-spark-runtime-${sparkMajorVersion}_${scalaVersion}"),
 ]
 
+configure(sparkProjects) {
+  configurations {
+    all {
+      resolutionStrategy {
+        force 'com.fasterxml.jackson.core:jackson-databind:2.13.4.2'
+      }
+    }
+  }
+}

Review Comment:
   Thanks for the fix. The issue is that jackson-databind 2.13.4.1 depends on a nonexistent artifact, jackson-bom:2.13.4.1. I guess the Jackson team fixed that in jackson-databind 2.13.4.2.
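   An alternative to an unconditional `force`, sketched under the assumption that only the 2.13.4.1 release is broken: a dependency resolve rule can redirect just that version to 2.13.4.2 and record the reason, leaving any other requested jackson-databind version untouched. This is not what the PR merged; `sparkProjects` is the list defined in spark/v3.3/build.gradle.

   ```groovy
   // Sketch only: a narrower replacement for the `force` block in the diff.
   configure(sparkProjects) {
     configurations.all {
       resolutionStrategy.eachDependency { DependencyResolveDetails details ->
         // Match only the jackson-databind release whose metadata points at
         // the unpublished com.fasterxml.jackson:jackson-bom:2.13.4.1.
         if (details.requested.group == 'com.fasterxml.jackson.core' &&
             details.requested.name == 'jackson-databind' &&
             details.requested.version == '2.13.4.1') {
           details.useVersion '2.13.4.2'
           details.because 'jackson-databind 2.13.4.1 depends on jackson-bom 2.13.4.1, which does not exist'
         }
       }
     }
   }
   ```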





[GitHub] [iceberg] Fokko commented on a diff in pull request #5783: [WIP] Build: Update Spark to 3.3.1

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5783:
URL: https://github.com/apache/iceberg/pull/5783#discussion_r985836978


##########
build.gradle:
##########
@@ -96,6 +96,7 @@ allprojects {
   group = "org.apache.iceberg"
   version = projectVersion
   repositories {
+    maven { url  "https://repository.apache.org/content/repositories/orgapachespark-1418/" }

Review Comment:
   @wangyum The easiest way is to use IntelliJ and just run a single test from the UI:
   ![image](https://user-images.githubusercontent.com/1134248/193600165-46b40dfe-b352-4c67-87c8-a86d5afbfc52.png)
   





[GitHub] [iceberg] wangyum commented on a diff in pull request #5783: [WIP] Build: Update Spark to 3.3.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #5783:
URL: https://github.com/apache/iceberg/pull/5783#discussion_r985161505


##########
build.gradle:
##########
@@ -96,6 +96,7 @@ allprojects {
   group = "org.apache.iceberg"
   version = projectVersion
   repositories {
+    maven { url  "https://repository.apache.org/content/repositories/orgapachespark-1418/" }

Review Comment:
   @Fokko Do you know how to run a single test? I'd like to run these tests locally:
   ```
   org.apache.iceberg.spark.extensions.TestRewriteDataFilesProcedure > testRewriteDataFilesOnNonPartitionTable[catalogName = testhive, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hive, default-namespace=default}] FAILED
       java.lang.AssertionError: Action should rewrite 10 data files and add 1 data files: row 1 col 1 contents should match expected:<10> but was:<7>
           at org.junit.Assert.fail(Assert.java:89)
           at org.junit.Assert.failNotEquals(Assert.java:835)
           at org.junit.Assert.assertEquals(Assert.java:120)
           at org.apache.iceberg.spark.SparkTestBase.assertEquals(SparkTestBase.java:181)
           at org.apache.iceberg.spark.SparkTestBase.assertEquals(SparkTestBase.java:163)
           at org.apache.iceberg.spark.extensions.TestRewriteDataFilesProcedure.testRewriteDataFilesOnNonPartitionTable(TestRewriteDataFilesProcedure.java:104)
   
   org.apache.iceberg.spark.extensions.TestRewriteDataFilesProcedure > testRewriteDataFilesWithZOrder[catalogName = testhive, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hive, default-namespace=default}] FAILED
       java.lang.AssertionError: Action should rewrite 10 data files and add 1 data files: row 1 col 1 contents should match expected:<10> but was:<7>
           at org.junit.Assert.fail(Assert.java:89)
           at org.junit.Assert.failNotEquals(Assert.java:835)
           at org.junit.Assert.assertEquals(Assert.java:120)
           at org.apache.iceberg.spark.SparkTestBase.assertEquals(SparkTestBase.java:181)
           at org.apache.iceberg.spark.SparkTestBase.assertEquals(SparkTestBase.java:163)
           at org.apache.iceberg.spark.extensions.TestRewriteDataFilesProcedure.testRewriteDataFilesWithZOrder(TestRewriteDataFilesProcedure.java:171)
   ```
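   Besides running a test from IntelliJ, Gradle's test filtering can limit a run to the failing methods. From the command line, the test task's `--tests` option applies such a filter without editing any file; the following sketch expresses the same filter in the build script. It assumes the snippet is placed temporarily in spark/v3.3/build.gradle, where `sparkMajorVersion` and `scalaVersion` are defined, and the extensions project path is an assumption.

   ```groovy
   // Sketch only: temporary filter for local debugging; the project path is assumed.
   project(":iceberg-spark:iceberg-spark-extensions-${sparkMajorVersion}_${scalaVersion}") {
     tasks.named('test') {
       filter {
         // The trailing '*' should also match the JUnit 4 parameter suffix,
         // e.g. "[catalogName = testhive, ...]".
         includeTestsMatching 'org.apache.iceberg.spark.extensions.TestRewriteDataFilesProcedure.testRewriteDataFilesOnNonPartitionTable*'
         includeTestsMatching 'org.apache.iceberg.spark.extensions.TestRewriteDataFilesProcedure.testRewriteDataFilesWithZOrder*'
       }
     }
   }
   ```

   Filtering at class level (just the `TestRewriteDataFilesProcedure` class name) is a simpler fallback if the method patterns do not match as expected.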





[GitHub] [iceberg] wangyum commented on a diff in pull request #5783: Build: Update Spark to 3.3.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #5783:
URL: https://github.com/apache/iceberg/pull/5783#discussion_r1006285951


##########
spark/v3.3/build.gradle:
##########
@@ -27,6 +27,16 @@ def sparkProjects = [
     project(":iceberg-spark:iceberg-spark-runtime-${sparkMajorVersion}_${scalaVersion}"),
 ]
 
+configure(sparkProjects) {
+  configurations {
+    all {
+      resolutionStrategy {
+        force 'com.fasterxml.jackson.core:jackson-databind:2.13.4.2'
+      }
+    }
+  }
+}

Review Comment:
   This fixes the following build failure:
   ```
   > Could not resolve all files for configuration ':iceberg-spark:iceberg-spark-3.3_2.12:compileClasspath'.
   
      > Could not find com.fasterxml.jackson:jackson-bom:2.13.4.1.
        Searched in the following locations:
          - https://repository.apache.org/content/repositories/orgapachespark-1430/com/fasterxml/jackson/jackson-bom/2.13.4.1/jackson-bom-2.13.4.1.pom
          - https://repo.maven.apache.org/maven2/com/fasterxml/jackson/jackson-bom/2.13.4.1/jackson-bom-2.13.4.1.pom
          - file:/home/runner/.m2/repository/com/fasterxml/jackson/jackson-bom/2.13.4.1/jackson-bom-2.13.4.1.pom
        Required by:
            project :iceberg-spark:iceberg-spark-3.3_2.12 > org.apache.avro:avro:1.11.1 > com.fasterxml.jackson.core:jackson-databind:2.13.4.1
      > Could not find com.fasterxml.jackson:jackson-bom:2.13.4.1.
        Searched in the following locations:
          - https://repository.apache.org/content/repositories/orgapachespark-1430/com/fasterxml/jackson/jackson-bom/2.13.4.1/jackson-bom-2.13.4.1.pom
          - https://repo.maven.apache.org/maven2/com/fasterxml/jackson/jackson-bom/2.13.4.1/jackson-bom-2.13.4.1.pom
          - file:/home/runner/.m2/repository/com/fasterxml/jackson/jackson-bom/2.13.4.1/jackson-bom-2.13.4.1.pom
        Required by:
            project :iceberg-spark:iceberg-spark-3.3_2.12 > org.apache.arrow:arrow-vector:7.0.0 > com.fasterxml.jackson.core:jackson-annotations:2.13.4
            project :iceberg-spark:iceberg-spark-3.3_2.12 > org.apache.avro:avro:1.11.1 > com.fasterxml.jackson.core:jackson-core:2.13.4
   Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.
   ```





[GitHub] [iceberg] wangyum commented on pull request #5783: Build: Update Spark to 3.3.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on PR #5783:
URL: https://github.com/apache/iceberg/pull/5783#issuecomment-1292794181

   @bryanck The performance regression has been fixed.

