Posted to issues@spark.apache.org by "Hiroshi Inoue (JIRA)" <ji...@apache.org> on 2016/06/30 17:29:10 UTC
[jira] [Created] (SPARK-16331) [SQL] Reduce code generation time
Hiroshi Inoue created SPARK-16331:
-------------------------------------
Summary: [SQL] Reduce code generation time
Key: SPARK-16331
URL: https://issues.apache.org/jira/browse/SPARK-16331
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.0.0, 2.1.0
Reporter: Hiroshi Inoue
During code generation, a {{LocalRelation}} often holds a huge {{Vector}} object as its {{data}} field. In the simple example below, the {{LocalRelation}} holds a {{Vector}} of 1,000,000 {{UnsafeRow}} elements.
{code:scala}
val numRows = 1000000
val ds = (1 to numRows).toDS().persist()
benchmark.addCase("filter+reduce") { iter =>
  ds.filter(a => (a & 1) == 0).reduce(_ + _)
}
{code}
In {{TreeNode.transformChildren}}, all elements of the vector are unnecessarily iterated to check whether any children exist among them, because {{Vector}} is {{Traversable}}. This scan significantly increases code generation time.
This patch avoids the overhead by checking the number of children before iterating over all elements; {{LocalRelation}} has no children, since it extends {{LeafNode}}.
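The idea of the patch can be sketched as follows. This is a hypothetical, simplified model: {{Node}}, {{Leaf}}, {{Branch}}, and the {{transformChildren}} below are illustrative names, not Spark's actual Catalyst classes.

```scala
// A minimal sketch of the guard described above: skip scanning a node's
// fields entirely when the node declares no children.
object TransformDemo {
  sealed trait Node {
    def children: Seq[Node]   // structural children of the tree
    def fields: Seq[Any]      // all constructor fields, possibly huge
  }

  // Mirrors LocalRelation extends LeafNode: carries a large data Vector
  // but declares no children.
  case class Leaf(data: Vector[Int]) extends Node {
    def children: Seq[Node] = Nil
    def fields: Seq[Any] = Seq(data)
  }

  case class Branch(left: Node, right: Node) extends Node {
    def children: Seq[Node] = Seq(left, right)
    def fields: Seq[Any] = Seq(left, right)
  }

  var scanned = 0  // counts field elements visited, to expose the saving

  def transformChildren(node: Node, rule: Node => Node): Node = {
    // The fix: bail out before touching `fields` when there are no children,
    // so a leaf's million-element Vector is never iterated.
    if (node.children.isEmpty) return node
    node.fields.foreach {
      case v: Vector[_] => scanned += v.size        // the cost being avoided
      case n: Node      => scanned += 1; rule(n)    // simplified: result discarded
      case _            => scanned += 1
    }
    node
  }
}
```

Without the early return, transforming a {{Leaf}} holding a million-element {{Vector}} would walk every element just to discover that none of them is a child node.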
The performance of the above example is as follows:
{noformat}
without this patch
Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14 on Mac OS X 10.11.5
Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
compilationTime:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
filter+reduce                          4426 / 4533          0.2        4426.0       1.0X

with this patch
compilationTime:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
filter+reduce                          3117 / 3391          0.3        3116.6       1.0X
{noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)