Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/27 06:07:18 UTC

[GitHub] [spark] LuciferYang opened a new pull request, #38811: [SPARK-41276][SQL][MLLIB]

LuciferYang opened a new pull request, #38811:
URL: https://github.com/apache/spark/pull/38811

   ### What changes were proposed in this pull request?
   There are two main ways to construct `StructType`:
   
   - Primary constructor
   
   ```scala
   case class StructType(fields: Array[StructField])
   ```
   
   - Constructor that takes a `Seq` as input
   
   ```scala
   def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
   ```
   
   These two construction methods are widely used in Spark, but the latter requires an additional collection conversion.
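
   The extra conversion can be illustrated with a minimal sketch. `StructField` and `StructType` below are simplified stand-ins written for this example (the wrapper object name and the `String` data type field are assumptions), not Spark's real classes, which have more members:

   ```scala
   object StructTypeCopySketch {
     // Simplified stand-ins for illustration only; not Spark's real classes.
     case class StructField(name: String, dataType: String)
     case class StructType(fields: Array[StructField])

     object StructType {
       // Seq-based factory, mirroring the definition quoted above:
       // it converts the Seq into a new Array before delegating.
       def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
     }

     val id = StructField("id", "long")
     val name = StructField("name", "string")

     // Primary constructor: the given Array is stored as-is, with no conversion.
     val array = Array(id, name)
     val fromArray = StructType(array)

     // Seq factory: same fields, but one extra Seq-to-Array conversion.
     val fromSeq = StructType(Seq(id, name))
   }
   ```

   In this sketch `fromArray.fields` is the very same object as `array`, while `fromSeq.fields` is a newly allocated array holding the same elements.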
   
   This PR changes the following three scenarios to use the primary constructor, saving one collection conversion:
   
   1. Where the input `Seq` is created manually, create an `Array` manually instead, for example:
   
   https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala#L55-L63
   
   2. Where `toSeq` was added to build the input for Scala 2.13 compatibility, call `toArray` directly instead, for example:
   
   https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L108-L113
   
   3. Where the input is already an `Array`, remove the redundant `toSeq`, for example:
   
   https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L587-L592
   
   All the arrays involved are local variables; the input array does not escape and cannot be changed accidentally, so the change is safe.
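
   The three rewrites above can be sketched as follows. This is an illustration only: `StructField` and `StructType` are simplified stand-ins rather than Spark's real classes, and the field names, the `ArrayBuffer` input and the wrapper object name are assumptions made for the example, not code taken from the linked files:

   ```scala
   import scala.collection.mutable.ArrayBuffer

   object StructTypeRewriteSketch {
     // Simplified stand-ins for illustration only; not Spark's real classes.
     case class StructField(name: String, dataType: String)
     case class StructType(fields: Array[StructField])

     object StructType {
       // Seq-based factory: converts to an Array, then delegates.
       def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
     }

     // Scenario 1: write the literal as an Array instead of a Seq.
     val before1 = StructType(Seq(StructField("origin", "string"), StructField("height", "int")))
     val after1 = StructType(Array(StructField("origin", "string"), StructField("height", "int")))

     // Scenario 2: a mutable buffer that was adapted with toSeq is
     // converted with toArray directly.
     val buffer = ArrayBuffer(StructField("a", "int"), StructField("b", "int"))
     val before2 = StructType(buffer.toSeq)
     val after2 = StructType(buffer.toArray)

     // Scenario 3: the input is already an Array, so toSeq is dropped.
     val existing: Array[StructField] =
       after1.fields.map(f => f.copy(name = f.name.toUpperCase))
     val before3 = StructType(existing.toSeq)
     val after3 = StructType(existing)

     // Helper for comparing two results by their field contents.
     def sameFields(x: StructType, y: StructType): Boolean =
       x.fields.toSeq == y.fields.toSeq
   }
   ```

   Each before/after pair holds the same fields. Note that in scenario 3 `after3.fields` is the same object as `existing`, which is why the rewrite is only safe when nothing else mutates that array afterwards.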
   
   
   ### Why are the changes needed?
   Reduce unnecessary collection conversion.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   
   - Pass GitHub Actions
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38811:
URL: https://github.com/apache/spark/pull/38811#issuecomment-1332223630

   Thanks @srowen 




[GitHub] [spark] srowen commented on pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`

Posted by GitBox <gi...@apache.org>.
srowen commented on PR #38811:
URL: https://github.com/apache/spark/pull/38811#issuecomment-1330750710

   I know, I'm just pointing out that if there were 100 callers of Seq, and 10 of Array, then it'd be simpler to swap the constructor impl, but I don't think that's the case




[GitHub] [spark] srowen commented on pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`

Posted by GitBox <gi...@apache.org>.
srowen commented on PR #38811:
URL: https://github.com/apache/spark/pull/38811#issuecomment-1330715216

   I assume it's less of a change to update these usages than to swap the implementation of the constructors so that the Array one calls the Seq one? OK if so




[GitHub] [spark] srowen closed pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`

Posted by GitBox <gi...@apache.org>.
srowen closed pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`
URL: https://github.com/apache/spark/pull/38811




[GitHub] [spark] srowen commented on pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`

Posted by GitBox <gi...@apache.org>.
srowen commented on PR #38811:
URL: https://github.com/apache/spark/pull/38811#issuecomment-1332222466

   Merged to master




[GitHub] [spark] LuciferYang commented on pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38811:
URL: https://github.com/apache/spark/pull/38811#issuecomment-1330742255

   These `Array`s are local variables; they do not escape and will not be changed unexpectedly




[GitHub] [spark] LuciferYang commented on pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38811:
URL: https://github.com/apache/spark/pull/38811#issuecomment-1330766590

   In production code, the `Array` constructor is used in 75 places and the `Seq` one in 132 places before this PR
   
   




[GitHub] [spark] LuciferYang commented on pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38811:
URL: https://github.com/apache/spark/pull/38811#issuecomment-1330739227

   The goal is to call the `Array` constructor directly rather than the `Seq` one, because the `Seq` one also calls `seq.toArray`
   
   




[GitHub] [spark] LuciferYang commented on pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38811:
URL: https://github.com/apache/spark/pull/38811#issuecomment-1330754811

   Yes, not that case
   
   

