You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "zeruibao (via GitHub)" <gi...@apache.org> on 2023/10/25 16:59:42 UTC

[PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

zeruibao opened a new pull request, #43530:
URL: https://github.com/apache/spark/pull/43530

   ### What changes were proposed in this pull request?
   Fix slowdown in Avro read. There is a https://github.com/apache/spark/pull/42503 causes the performance regression. It seems that creating an `AvroOptions` inside `toSqlType` is very expensive. Try to pass this in the callstack.
   
   ### Why are the changes needed?
   Need to fix the performance regression of Avro read.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing UT test
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "zeruibao (via GitHub)" <gi...@apache.org>.
zeruibao commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1374001055


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -49,13 +49,13 @@ object SchemaConverters {
   /**
    * Converts an Avro schema to a corresponding Spark SQL schema.
    *
-   * @since 2.4.0
+   * @since 4.0.0
    */
-  def toSqlType(avroSchema: Schema): SchemaType = {

Review Comment:
   sure!



##########
project/MimaExcludes.scala:
##########
@@ -45,7 +45,9 @@ object MimaExcludes {
     // [SPARK-45427][CORE] Add RPC SSL settings to SSLOptions and SparkTransportConf
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.network.netty.SparkTransportConf.fromSparkConf"),
     // [SPARK-45136][CONNECT] Enhance ClosureCleaner with Ammonite support
-    ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.util.MethodIdentifier$")
+    ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.util.MethodIdentifier$"),
+    // [SPARK-43380][SQL] Fix slowdown in Avro read
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.avro.SchemaConverters.toSqlType")

Review Comment:
   sure!!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1373994295


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -49,13 +49,13 @@ object SchemaConverters {
   /**
    * Converts an Avro schema to a corresponding Spark SQL schema.
    *
-   * @since 2.4.0
+   * @since 2.5.0

Review Comment:
   ```suggestion
      * @since 4.0.0
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1372566366


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0

Review Comment:
   let's not mess up the API doc here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1373994997


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -49,13 +49,13 @@ object SchemaConverters {
   /**
    * Converts an Avro schema to a corresponding Spark SQL schema.
    *
-   * @since 2.4.0
+   * @since 4.0.0
    */
-  def toSqlType(avroSchema: Schema): SchemaType = {

Review Comment:
   actually this is a public API and we can't simply remove it. Shall we add it back and make it just call `toSqlType(avroSchema, false)`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #43530:
URL: https://github.com/apache/spark/pull/43530#issuecomment-1782383483

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1372757325


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0
    */
+  def toSqlType(avroSchema: Schema, useStableIdForUnionType: Boolean): SchemaType = {
+    toSqlTypeHelper(avroSchema, Set.empty, useStableIdForUnionType)
+  }
   def toSqlType(avroSchema: Schema): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()))
+    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()).useStableIdForUnionType)

Review Comment:
   Could we avoid the usage `AvroOptions(Map()).useStableIdForUnionType`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1372567060


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0
    */
+  def toSqlType(avroSchema: Schema, useStableIdForUnionType: Boolean): SchemaType = {
+    toSqlTypeHelper(avroSchema, Set.empty, useStableIdForUnionType)
+  }
   def toSqlType(avroSchema: Schema): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()))
+    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()).useStableIdForUnionType)
   }
   def toSqlType(avroSchema: Schema, options: Map[String, String]): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(options))

Review Comment:
   so this is the place that caused perf issue before?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "zeruibao (via GitHub)" <gi...@apache.org>.
zeruibao commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1372573571


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0
    */
+  def toSqlType(avroSchema: Schema, useStableIdForUnionType: Boolean): SchemaType = {
+    toSqlTypeHelper(avroSchema, Set.empty, useStableIdForUnionType)
+  }
   def toSqlType(avroSchema: Schema): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()))
+    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()).useStableIdForUnionType)
   }
   def toSqlType(avroSchema: Schema, options: Map[String, String]): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(options))

Review Comment:
   Yes, from the flame graph. We can know that create a `AvroOptions` is very expensive.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "zeruibao (via GitHub)" <gi...@apache.org>.
zeruibao commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1372688209


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0
    */
+  def toSqlType(avroSchema: Schema, useStableIdForUnionType: Boolean): SchemaType = {
+    toSqlTypeHelper(avroSchema, Set.empty, useStableIdForUnionType)
+  }
   def toSqlType(avroSchema: Schema): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()))
+    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()).useStableIdForUnionType)
   }
   def toSqlType(avroSchema: Schema, options: Map[String, String]): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(options))

Review Comment:
   Oh, It seems that there is no caller for this `def toSqlType` any more. Only in test suite. Shall I remove this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "zeruibao (via GitHub)" <gi...@apache.org>.
zeruibao commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1372683246


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0
    */
+  def toSqlType(avroSchema: Schema, useStableIdForUnionType: Boolean): SchemaType = {
+    toSqlTypeHelper(avroSchema, Set.empty, useStableIdForUnionType)
+  }
   def toSqlType(avroSchema: Schema): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()))
+    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()).useStableIdForUnionType)
   }
   def toSqlType(avroSchema: Schema, options: Map[String, String]): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(options))

Review Comment:
   There is a method called `toCatalystType` that call this `def toSqlType variant`. Shall I also change that one?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "RunyaoChen (via GitHub)" <gi...@apache.org>.
RunyaoChen commented on PR #43530:
URL: https://github.com/apache/spark/pull/43530#issuecomment-1779739884

   The PR description mentions  https://github.com/apache/spark/pull/42503 causes the performance regression, while the 42503 itself is a fix to another perf regression.
   
   I think we need some careful review here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "RunyaoChen (via GitHub)" <gi...@apache.org>.
RunyaoChen commented on PR #43530:
URL: https://github.com/apache/spark/pull/43530#issuecomment-1779741732

   @cloud-fan Could you help review given that you reviewed the https://github.com/apache/spark/pull/42503 that this PR is trying to fix


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "zeruibao (via GitHub)" <gi...@apache.org>.
zeruibao commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1372688525


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0

Review Comment:
   I see!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #43530: [SPARK-43380][SQL] Fix slowdown in Avro read
URL: https://github.com/apache/spark/pull/43530


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "zeruibao (via GitHub)" <gi...@apache.org>.
zeruibao commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1373499976


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0
    */
+  def toSqlType(avroSchema: Schema, useStableIdForUnionType: Boolean): SchemaType = {
+    toSqlTypeHelper(avroSchema, Set.empty, useStableIdForUnionType)
+  }
   def toSqlType(avroSchema: Schema): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()))
+    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()).useStableIdForUnionType)

Review Comment:
   sure, jut completely remove this method now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "zeruibao (via GitHub)" <gi...@apache.org>.
zeruibao commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1372573204


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0

Review Comment:
   You mean I need to change it to since 2.5.0?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1372716678


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0
    */
+  def toSqlType(avroSchema: Schema, useStableIdForUnionType: Boolean): SchemaType = {
+    toSqlTypeHelper(avroSchema, Set.empty, useStableIdForUnionType)
+  }
   def toSqlType(avroSchema: Schema): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()))
+    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()).useStableIdForUnionType)
   }
   def toSqlType(avroSchema: Schema, options: Map[String, String]): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(options))

Review Comment:
   yea let's remove



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1372662555


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0
    */
+  def toSqlType(avroSchema: Schema, useStableIdForUnionType: Boolean): SchemaType = {
+    toSqlTypeHelper(avroSchema, Set.empty, useStableIdForUnionType)
+  }
   def toSqlType(avroSchema: Schema): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()))
+    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()).useStableIdForUnionType)
   }
   def toSqlType(avroSchema: Schema, options: Map[String, String]): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(options))

Review Comment:
   Do we still have callers for this `def toSqlType` variant?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "zeruibao (via GitHub)" <gi...@apache.org>.
zeruibao commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1372683246


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0
    */
+  def toSqlType(avroSchema: Schema, useStableIdForUnionType: Boolean): SchemaType = {
+    toSqlTypeHelper(avroSchema, Set.empty, useStableIdForUnionType)
+  }
   def toSqlType(avroSchema: Schema): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()))
+    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(Map()).useStableIdForUnionType)
   }
   def toSqlType(avroSchema: Schema, options: Map[String, String]): SchemaType = {
-    toSqlTypeHelper(avroSchema, Set.empty, AvroOptions(options))

Review Comment:
   There is a method called `toCatalystType` that call this `def toSqlType variant`. Shall I also change that one?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1372662044


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##########
@@ -51,11 +51,14 @@ object SchemaConverters {
    *
    * @since 2.4.0

Review Comment:
   no, this API doc is for `def toSqlType(avroSchema: Schema)` but not the new `def toSqlType(avroSchema: Schema, useStableIdForUnionType: Boolean)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43530:
URL: https://github.com/apache/spark/pull/43530#discussion_r1373995261


##########
project/MimaExcludes.scala:
##########
@@ -45,7 +45,9 @@ object MimaExcludes {
     // [SPARK-45427][CORE] Add RPC SSL settings to SSLOptions and SparkTransportConf
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.network.netty.SparkTransportConf.fromSparkConf"),
     // [SPARK-45136][CONNECT] Enhance ClosureCleaner with Ammonite support
-    ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.util.MethodIdentifier$")
+    ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.util.MethodIdentifier$"),
+    // [SPARK-43380][SQL] Fix slowdown in Avro read
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.avro.SchemaConverters.toSqlType")

Review Comment:
   this change is no longer needed if we add it back.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43380][SQL] Fix slowdown in Avro read [spark]

Posted by "zeruibao (via GitHub)" <gi...@apache.org>.
zeruibao commented on PR #43530:
URL: https://github.com/apache/spark/pull/43530#issuecomment-1782367680

   All tests passed. Should be good to merge @cloud-fan.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org