Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/01 14:39:13 UTC

[GitHub] [spark] mgaido91 opened a new pull request #24505: [SPARK-27607][SQL] Improve Row.toString performance

URL: https://github.com/apache/spark/pull/24505
 
 
   ## What changes were proposed in this pull request?
   
   `Row.toString` currently creates an intermediate `Array` holding every value in the row, and only then builds the resulting string from it. This extra allocation and copy adds considerable overhead.
   
   This PR avoids the intermediate `Array` and builds the string directly from the row's values, which makes `toString` faster.
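   A minimal sketch of the idea, not the code from this PR: `SimpleRow` is a hypothetical stand-in for Spark's `Row` that only assumes positional accessors (`length` and `get(i)`). `toStringViaArray` copies every value into a fresh `Array` before formatting, as the old `Row.toString` is described as doing, while `toString` appends each value straight into a `StringBuilder`.
   
   ```scala
   // Hypothetical stand-in for Spark's Row: only positional access is assumed.
   final class SimpleRow(values: Any*) {
     def length: Int = values.length
     def get(i: Int): Any = values(i)
   
     // Old approach (sketch): copy every value into a new Array, then format it.
     def toStringViaArray: String = {
       val copy = new Array[Any](length)
       var i = 0
       while (i < length) {
         copy(i) = get(i)
         i += 1
       }
       copy.mkString("[", ",", "]")
     }
   
     // New approach (sketch): append each value directly to a StringBuilder,
     // so no intermediate collection of values is allocated.
     override def toString: String = {
       val sb = new StringBuilder("[")
       var i = 0
       while (i < length) {
         if (i > 0) sb.append(',')
         sb.append(get(i)) // append(Any) renders null as "null"
         i += 1
       }
       sb.append(']').toString
     }
   }
   ```
   
   Both methods produce the same bracketed, comma-separated string (for example `[1,2.0,a,null]` for `new SimpleRow(1, 2.0, "a", null)`); the direct version just skips one array allocation and one copy of the values per call. The `[`, `,`, `]` format is assumed here to mirror `Row`'s output.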
   
   ## How was this patch tested?
   
   Ran the following micro-benchmark:
   
   ```scala
   test("Row toString perf test") {
     val n = 100000
     val rows = (1 to n).map { i =>
       Row(i, i.toDouble, i.toString, i.toShort, true, null)
     }
     // warmup
     (1 to 10).foreach { _ => rows.foreach(_.toString) }
   
     // 100 timed passes; each records the nanoseconds spent on all n calls
     val times = (1 to 100).map { _ =>
       val t0 = System.nanoTime()
       rows.foreach(_.toString)
       val t1 = System.nanoTime()
       t1 - t0
     }
     // scalastyle:off println
     println(s"Avg time on ${times.length} iterations for $n toString:" +
       s" ${times.sum.toDouble / times.length / 1e6} ms")
     // scalastyle:on println
   }
   ```
   Before the PR:
   ```
   Avg time on 100 iterations for 100000 toString: 61.08408419 ms
   ```
   After the PR:
   ```
   Avg time on 100 iterations for 100000 toString: 48.18608 ms
   ```
   In other words, on this benchmark the new implementation is about 1.27x faster than the original one.
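   For reference, the derived figures follow from the two averages above, where each average is the time for one pass of 100,000 `toString` calls (values rounded):
   
   ```latex
   \begin{aligned}
   \text{speedup} &= \frac{61.084\ \text{ms}}{48.186\ \text{ms}} \approx 1.27 \\
   \text{time saved per pass} &= 61.084\ \text{ms} - 48.186\ \text{ms} = 12.898\ \text{ms} \;(\approx 21\%) \\
   \text{cost per call} &\approx \frac{61.084\ \text{ms}}{100\,000} \approx 611\ \text{ns}
     \;\longrightarrow\; \frac{48.186\ \text{ms}}{100\,000} \approx 482\ \text{ns}
   \end{aligned}
   ```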
   
   
