Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/12/14 18:18:00 UTC

[jira] [Commented] (SPARK-37646) Avoid touching Scala reflection APIs in the lit function

    [ https://issues.apache.org/jira/browse/SPARK-37646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459394#comment-17459394 ] 

Apache Spark commented on SPARK-37646:
--------------------------------------

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/34901

> Avoid touching Scala reflection APIs in the lit function
> --------------------------------------------------------
>
>                 Key: SPARK-37646
>                 URL: https://issues.apache.org/jira/browse/SPARK-37646
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>            Priority: Major
>
> Currently, lit is slow under high concurrency because it goes through Scala reflection code, which acquires global locks. For example, running the following test locally on Spark 3.2 shows the difference between constructing a Literal directly and calling lit:
> {code:java}
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
>
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.Column
> import org.apache.spark.sql.catalyst.expressions.Literal
>
> val parallelism = 50
>
> def testLiteral(): Unit = {
>   val ts = for (_ <- 0 until parallelism) yield {
>     new Thread() {
>       override def run() {
>          for (_ <- 0 until 50) {
>           new Column(Literal(0L))
>         }
>       }
>     }
>   }
>   ts.foreach(_.start())
>   ts.foreach(_.join())
> }
>
> def testLit(): Unit = {
>   val ts = for (_ <- 0 until parallelism) yield {
>     new Thread() {
>       override def run() {
>          for (_ <- 0 until 50) {
>           lit(0L)
>         }
>       }
>     }
>   }
>   ts.foreach(_.start())
>   ts.foreach(_.join())
> }
>
> println("warmup")
> testLiteral()
> testLit()
>
> println("lit: false")
> spark.time {
>   testLiteral()
> }
> println("lit: true")
> spark.time {
>   testLit()
> }
>
> // Exiting paste mode, now interpreting.
>
> warmup
> lit: false
> Time taken: 8 ms
> lit: true
> Time taken: 682 ms
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.Column
> import org.apache.spark.sql.catalyst.expressions.Literal
> parallelism: Int = 50
> testLiteral: ()Unit
> testLit: ()Unit
> {code}
>  
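
The two test functions in the description differ only in the expression evaluated inside the loop, so the comparison can be factored into one helper. The following is a minimal sketch of that idea, not code from the issue or the pull request: the object name ConcurrencyTiming and the method timeConcurrently are hypothetical, and the helper uses only the JDK so it can run outside spark-shell.

```scala
object ConcurrencyTiming {
  /** Runs `body` `iterations` times on each of `parallelism` threads and
    * returns the elapsed wall-clock time in milliseconds.
    * (Hypothetical helper; mirrors the thread structure of the test in the issue.)
    */
  def timeConcurrently(parallelism: Int, iterations: Int)(body: => Unit): Long = {
    // Create all threads up front, as the original test does, without starting them.
    val threads = (0 until parallelism).map { _ =>
      new Thread(new Runnable {
        override def run(): Unit = {
          var i = 0
          while (i < iterations) {
            body // by-name parameter: re-evaluated on every iteration
            i += 1
          }
        }
      })
    }
    val start = System.nanoTime()
    threads.foreach(_.start())
    threads.foreach(_.join())
    (System.nanoTime() - start) / 1000000L
  }
}

// Assumed usage in spark-shell, with the same imports as the test in the issue:
//   import ConcurrencyTiming.timeConcurrently
//   timeConcurrently(50, 50) { new Column(Literal(0L)) }  // direct Literal
//   timeConcurrently(50, 50) { lit(0L) }                  // via lit
```

As in the original test, a warm-up run of both variants before timing keeps class loading and JIT compilation out of the measurement.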



--
This message was sent by Atlassian Jira
(v8.20.1#820001)