Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/06/30 19:19:00 UTC
[jira] [Commented] (SPARK-24706) Support ByteType and ShortType pushdown to parquet
[ https://issues.apache.org/jira/browse/SPARK-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528866#comment-16528866 ]
Apache Spark commented on SPARK-24706:
--------------------------------------
User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/21682
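The issue quoted below asks Spark to push filters on ByteType (SQL `tinyint`) and ShortType (`smallint`) columns down into the Parquet reader. Parquet has no 8- or 16-bit physical type. Spark writes these columns as INT32 annotated `INT_8` / `INT_16`, so a pushed-down predicate has to be an int32 predicate whose literal is widened to an int. The Python sketch below models only that translation step, for a subset of comparisons. The names `to_parquet_predicate`, `Int32Predicate`, and the type table are hypothetical and are not Spark's or parquet-mr's API. The actual change is in the pull request linked above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed mapping, mirroring how Spark stores these types in Parquet:
# tinyint/smallint become INT32 with an INT_8 / INT_16 annotation.
# Each entry: (annotation, min literal, max literal).
PARQUET_INT32_COLUMNS = {
    "ByteType": ("INT_8", -2**7, 2**7 - 1),
    "ShortType": ("INT_16", -2**15, 2**15 - 1),
    "IntegerType": (None, -2**31, 2**31 - 1),
}

# Only a subset of comparison operators is modeled here.
SUPPORTED_OPS = {"=", "<", "<=", ">", ">="}


@dataclass(frozen=True)
class Int32Predicate:
    """Hypothetical stand-in for a Parquet int32 column predicate."""
    column: str
    op: str
    value: int


def to_parquet_predicate(column: str, spark_type: str, op: str,
                         literal: int) -> Optional[Int32Predicate]:
    """Translate a Spark comparison into an int32 Parquet predicate.

    Returns None when the filter is not pushed down; the filter is then
    evaluated by Spark on the rows the reader returns.
    """
    entry = PARQUET_INT32_COLUMNS.get(spark_type)
    if entry is None or op not in SUPPORTED_OPS:
        return None
    _annotation, lo, hi = entry
    if not lo <= literal <= hi:
        return None  # this sketch declines to push an out-of-range literal
    # Widen the byte/short literal to a plain int so it can be compared
    # against the INT32 values and min/max statistics stored in the file.
    return Int32Predicate(column, op, int(literal))


print(to_parquet_predicate("value", "ByteType", "=", 63))
print(to_parquet_predicate("value", "ShortType", "<", 1000))
print(to_parquet_predicate("value", "StringType", "=", 1))
```

Returning `None` instead of raising reflects the best-effort nature of pushdown. A filter that cannot be translated still gets applied, just later and at a higher cost.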
> Support ByteType and ShortType pushdown to parquet
> --------------------------------------------------
>
> Key: SPARK-24706
> URL: https://issues.apache.org/jira/browse/SPARK-24706
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Yuming Wang
> Priority: Major
>
> Benchmark result:
> {noformat}
> ###############################[ Pushdown benchmark for tinyint ]################################
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
> Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
> Select 1 tinyint row (value = CAST(63 AS tinyint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                            4307 / 4575          3.7         273.8       1.0X
> Parquet Vectorized (Pushdown)                   227 / 241         69.4          14.4      19.0X
> Native ORC Vectorized                         3646 / 3727          4.3         231.8       1.2X
> Native ORC Vectorized (Pushdown)                736 / 744         21.4          46.8       5.9X
>
> Select 10% tinyint rows (value < 12):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                            5209 / 5843          3.0         331.2       1.0X
> Parquet Vectorized (Pushdown)                 1296 / 1759         12.1          82.4       4.0X
> Native ORC Vectorized                         4455 / 4594          3.5         283.2       1.2X
> Native ORC Vectorized (Pushdown)              1736 / 1813          9.1         110.4       3.0X
>
> Select 50% tinyint rows (value < 63):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                            8362 / 8394          1.9         531.7       1.0X
> Parquet Vectorized (Pushdown)                 6303 / 6530          2.5         400.7       1.3X
> Native ORC Vectorized                         7962 / 8113          2.0         506.2       1.1X
> Native ORC Vectorized (Pushdown)              6680 / 7556          2.4         424.7       1.3X
>
> Select 90% tinyint rows (value < 114):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                          11572 / 11715          1.4         735.7       1.0X
> Parquet Vectorized (Pushdown)               11198 / 11326          1.4         712.0       1.0X
> Native ORC Vectorized                       11041 / 11209          1.4         702.0       1.0X
> Native ORC Vectorized (Pushdown)            11104 / 11472          1.4         706.0       1.0X
> {noformat}
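In the benchmark above, the pushdown gain shrinks as selectivity grows. For Parquet it is 19.0X for a single row, 4.0X at 10%, 1.3X at 50%, and 1.0X at 90%. One plausible reading is that pushed-down filters mainly let the reader skip whole row groups whose min/max statistics rule out a match, so the benefit tracks the fraction of row groups that can be skipped.

The Python sketch below models that idea under several assumptions:
- Values lie in 0..126, which is consistent with the thresholds 12, 63, and 114 mapping to roughly 10%, 50%, and 90%.
- The values are stored sorted. The benchmark's actual data layout is not shown here.
- Each row group holds 1,000 rows.
- `RowGroupStats` and the helper functions are hypothetical, not Parquet's API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical per-row-group min/max statistics for one int column."""
    min_value: int
    max_value: int


def build_row_groups(values: List[int], rows_per_group: int) -> List[RowGroupStats]:
    """Split values into consecutive row groups and record each group's min/max."""
    groups = []
    for start in range(0, len(values), rows_per_group):
        chunk = values[start:start + rows_per_group]
        groups.append(RowGroupStats(min(chunk), max(chunk)))
    return groups


def might_match_eq(stats: RowGroupStats, k: int) -> bool:
    # "value = k" can only match if k lies inside [min, max].
    return stats.min_value <= k <= stats.max_value


def might_match_lt(stats: RowGroupStats, k: int) -> bool:
    # "value < k" can only match if the group's smallest value is below k.
    return stats.min_value < k


def fraction_read(groups: List[RowGroupStats],
                  might_match: Callable[[RowGroupStats], bool]) -> float:
    """Fraction of row groups that survive statistics-based skipping."""
    return sum(1 for g in groups if might_match(g)) / len(groups)


# Assumed data: 127,000 sorted values in 0..126 (1,000 rows per value).
values = sorted(i % 127 for i in range(127_000))
groups = build_row_groups(values, rows_per_group=1_000)

for label, pred in [
    ("value = 63", lambda g: might_match_eq(g, 63)),
    ("value < 12", lambda g: might_match_lt(g, 12)),
    ("value < 63", lambda g: might_match_lt(g, 63)),
    ("value < 114", lambda g: might_match_lt(g, 114)),
]:
    print(f"{label:<12} reads {fraction_read(groups, pred):.1%} of row groups")
```

Under these assumptions the single-value query reads 1 of 127 row groups, while `value < 114` still reads 114 of them. That matches the direction of the measured speedups. The real numbers also depend on row-group size, data ordering, and per-row decoding cost, so the sketch does not model the exact ratios.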
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)