You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "alexeykudinkin (via GitHub)" <gi...@apache.org> on 2023/02/01 22:15:57 UTC

[GitHub] [hudi] alexeykudinkin opened a new pull request, #7821: [MINOR] Fixing Kryo being instantiated w/ invalid `SparkConf`

alexeykudinkin opened a new pull request, #7821:
URL: https://github.com/apache/hudi/pull/7821

   ### Change Logs
   
   TBA
   
   ### Impact
   
   TBA
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua commented on a diff in pull request #7821: [HUDI-5681] Fixing Kryo being instantiated w/ invalid `SparkConf`

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #7821:
URL: https://github.com/apache/hudi/pull/7821#discussion_r1093906025


##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:
##########
@@ -328,7 +328,7 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
       }).toMap
     // Serialize the Map[UpdateCondition, UpdateAssignments] to base64 string
     val serializedUpdateConditionAndExpressions = Base64.getEncoder
-      .encodeToString(SerDeUtils.toBytes(updateConditionToAssignments))
+      .encodeToString(Serializer.toBytes(updateConditionToAssignments))

Review Comment:
   Does this work for all Spark versions?



##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/payload/ExpressionPayload.scala:
##########
@@ -455,5 +456,50 @@ object ExpressionPayload {
             field.schema, field.doc, field.defaultVal, field.order))
     Schema.createRecord(a.getName, a.getDoc, a.getNamespace, a.isError, mergedFields.asJava)
   }
+
+
+  /**
+   * This object differs from Hudi's generic [[SerializationUtils]] in its ability to serialize
+   * Spark's internal structures (various [[Expression]]s)
+   *
+   * For that purpose we re-use Spark's [[KryoSerializer]] instance sharing configuration
+   * with enclosing [[SparkEnv]]. This is necessary to make sure that this particular instance of Kryo
+   * user for serialization of Spark's internal structures (like [[Expression]]s) is configured
+   * appropriately (class-loading, custom serializers, etc)
+   *
+   * TODO rebase on Spark's SerializerSupport
+   */
+  private[hudi] object Serializer {
+

Review Comment:
   Have you tested this on all Spark versions (Spark 2.4, 3.1, 3.2, 3.3) in cluster environment (multiple nodes)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7821: [HUDI-5681] Fixing Kryo being instantiated w/ invalid `SparkConf`

Posted by "alexeykudinkin (via GitHub)" <gi...@apache.org>.
alexeykudinkin commented on code in PR #7821:
URL: https://github.com/apache/hudi/pull/7821#discussion_r1093983668


##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/payload/ExpressionPayload.scala:
##########
@@ -455,5 +456,50 @@ object ExpressionPayload {
             field.schema, field.doc, field.defaultVal, field.order))
     Schema.createRecord(a.getName, a.getDoc, a.getNamespace, a.isError, mergedFields.asJava)
   }
+
+
+  /**
+   * This object differs from Hudi's generic [[SerializationUtils]] in its ability to serialize
+   * Spark's internal structures (various [[Expression]]s)
+   *
+   * For that purpose we re-use Spark's [[KryoSerializer]] instance sharing configuration
+   * with enclosing [[SparkEnv]]. This is necessary to make sure that this particular instance of Kryo
+   * user for serialization of Spark's internal structures (like [[Expression]]s) is configured
+   * appropriately (class-loading, custom serializers, etc)
+   *
+   * TODO rebase on Spark's SerializerSupport
+   */
+  private[hudi] object Serializer {
+

Review Comment:
   Will check



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7821: [HUDI-5681] Fixing Kryo being instantiated w/ invalid `SparkConf`

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7821:
URL: https://github.com/apache/hudi/pull/7821#issuecomment-1413044543

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e99119a0314f601d87668aeac8b048730415c919",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14853",
       "triggerID" : "e99119a0314f601d87668aeac8b048730415c919",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e99119a0314f601d87668aeac8b048730415c919 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14853) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7821: [HUDI-5681] Fixing Kryo being instantiated w/ invalid `SparkConf`

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7821:
URL: https://github.com/apache/hudi/pull/7821#issuecomment-1412892325

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e99119a0314f601d87668aeac8b048730415c919",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14853",
       "triggerID" : "e99119a0314f601d87668aeac8b048730415c919",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e99119a0314f601d87668aeac8b048730415c919 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14853) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] alexeykudinkin merged pull request #7821: [HUDI-5681] Fixing Kryo being instantiated w/ invalid `SparkConf`

Posted by "alexeykudinkin (via GitHub)" <gi...@apache.org>.
alexeykudinkin merged PR #7821:
URL: https://github.com/apache/hudi/pull/7821


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7821: [HUDI-5681] Fixing Kryo being instantiated w/ invalid `SparkConf`

Posted by "alexeykudinkin (via GitHub)" <gi...@apache.org>.
alexeykudinkin commented on code in PR #7821:
URL: https://github.com/apache/hudi/pull/7821#discussion_r1093798048


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -188,7 +188,6 @@ trait ProvidesHoodieConfig extends Logging {
         PRECOMBINE_FIELD.key -> preCombineField,
         PARTITIONPATH_FIELD.key -> partitionFieldsStr,
         PAYLOAD_CLASS_NAME.key -> payloadClassName,
-        HoodieWriteConfig.COMBINE_BEFORE_INSERT.key -> String.valueOf(hasPrecombineColumn),

Review Comment:
   Stacked on top of another (for testing), will be cleaned up



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7821: [HUDI-5681] Fixing Kryo being instantiated w/ invalid `SparkConf`

Posted by "alexeykudinkin (via GitHub)" <gi...@apache.org>.
alexeykudinkin commented on code in PR #7821:
URL: https://github.com/apache/hudi/pull/7821#discussion_r1094049841


##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/payload/ExpressionPayload.scala:
##########
@@ -455,5 +456,50 @@ object ExpressionPayload {
             field.schema, field.doc, field.defaultVal, field.order))
     Schema.createRecord(a.getName, a.getDoc, a.getNamespace, a.isError, mergedFields.asJava)
   }
+
+
+  /**
+   * This object differs from Hudi's generic [[SerializationUtils]] in its ability to serialize
+   * Spark's internal structures (various [[Expression]]s)
+   *
+   * For that purpose we re-use Spark's [[KryoSerializer]] instance sharing configuration
+   * with enclosing [[SparkEnv]]. This is necessary to make sure that this particular instance of Kryo
+   * user for serialization of Spark's internal structures (like [[Expression]]s) is configured
+   * appropriately (class-loading, custom serializers, etc)
+   *
+   * TODO rebase on Spark's SerializerSupport
+   */
+  private[hudi] object Serializer {
+

Review Comment:
   Checked Spark 3.1, 3.2 and 3.3, working fine



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] vinothchandar commented on a diff in pull request #7821: [MINOR] Fixing Kryo being instantiated w/ invalid `SparkConf`

Posted by "vinothchandar (via GitHub)" <gi...@apache.org>.
vinothchandar commented on code in PR #7821:
URL: https://github.com/apache/hudi/pull/7821#discussion_r1093790583


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -188,7 +188,6 @@ trait ProvidesHoodieConfig extends Logging {
         PRECOMBINE_FIELD.key -> preCombineField,
         PARTITIONPATH_FIELD.key -> partitionFieldsStr,
         PAYLOAD_CLASS_NAME.key -> payloadClassName,
-        HoodieWriteConfig.COMBINE_BEFORE_INSERT.key -> String.valueOf(hasPrecombineColumn),

Review Comment:
   why do we change this file for this PR?



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/SerDeUtils.scala:
##########
@@ -18,18 +18,28 @@
 package org.apache.spark.sql.hudi
 
 import org.apache.hudi.common.util.BinaryUtil
-import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.Kryo.KRYO_USE_POOL
+import org.apache.spark.{SparkConf, SparkEnv}
 import org.apache.spark.serializer.{KryoSerializer, SerializerInstance}
 
 import java.nio.ByteBuffer
 
 
+// TODO merge w/ SerializationUtils
 object SerDeUtils {
 
-  private val SERIALIZER_THREAD_LOCAL = new ThreadLocal[SerializerInstance] {
+  private lazy val conf = {
+    val conf = Option(SparkEnv.get)
+      // TODO elaborate

Review Comment:
   fix comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7821: [HUDI-5681] Fixing Kryo being instantiated w/ invalid `SparkConf`

Posted by "alexeykudinkin (via GitHub)" <gi...@apache.org>.
alexeykudinkin commented on code in PR #7821:
URL: https://github.com/apache/hudi/pull/7821#discussion_r1093983753


##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:
##########
@@ -328,7 +328,7 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
       }).toMap
     // Serialize the Map[UpdateCondition, UpdateAssignments] to base64 string
     val serializedUpdateConditionAndExpressions = Base64.getEncoder
-      .encodeToString(SerDeUtils.toBytes(updateConditionToAssignments))
+      .encodeToString(Serializer.toBytes(updateConditionToAssignments))

Review Comment:
   What exactly are you referring to?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7821: [HUDI-5681] Fixing Kryo being instantiated w/ invalid `SparkConf`

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7821:
URL: https://github.com/apache/hudi/pull/7821#issuecomment-1412877824

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e99119a0314f601d87668aeac8b048730415c919",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e99119a0314f601d87668aeac8b048730415c919",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e99119a0314f601d87668aeac8b048730415c919 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org