Posted to issues@hive.apache.org by "Ke Jia (JIRA)" <ji...@apache.org> on 2017/07/27 03:42:00 UTC

[jira] [Comment Edited] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

    [ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095796#comment-16095796 ] 

Ke Jia edited comment on HIVE-17139 at 7/27/17 3:41 AM:
--------------------------------------------------------

With this patch, I tested "select case when a=1 then trim(b) end from test_orc_5000" on my development machine. The table test_orc_5000(a int, b string) is stored as ORC and holds almost 50 million records. The execution engine is Spark. I ran the experiment three times; the averages are shown in the table below. The Spark execution time dropped from 35.76s to 32.57s, the time spent in VectorSelectOperator dropped from 3.12s to 0.89s, and the number of THEN expression evaluations dropped from 49999735 to 5000712.

|| ||Without optimization||With optimization||Improvement||
|Spark execution time|35.76s|32.57s|8.9%|
|VectorSelectOperator time|3.12s|0.89s|71.5%|
|THEN expression evaluations|49999735|5000712|90.0%|
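
For reference, the Improvement column can be recomputed from the raw figures as (before - after) / before. The sketch below is only a check of that arithmetic; the class and method names are made up for illustration and are not part of the patch.

```java
import java.util.Locale;

class ImprovementSketch {

    /** Relative reduction from {@code before} to {@code after}, in percent. */
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures taken from the table in this comment.
        double[][] rows = {
            {35.76, 32.57},        // Spark execution time (s)
            {3.12, 0.89},          // VectorSelectOperator time (s)
            {49999735, 5000712},   // THEN expression evaluations
        };
        for (double[] row : rows) {
            // Locale.ROOT keeps the decimal separator a dot on every machine.
            System.out.println(
                String.format(Locale.ROOT, "%.1f%%", reductionPercent(row[0], row[1])));
        }
    }
}
```

With the figures above this prints 8.9%, 71.5% and 90.0%.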

was (Author: jk_self):
With this patch, I test "select case when a=1 then trim(b) end from test_orc_5000" in my development machine. The data scale is almost 50 million records in table test_orc_5000(a int, b string) stored as ORC. The execution engine is spark. I do three experiments and the average value is as below table. The result shows the execution time of spark from 35.76s to 32.57s, the time cost of VectorSelectOperator from 3.12s to 0.89s and the count of then expression evaluation from 49999735 to 5000712.

||  ||Non-optimization||Optimization||Improvement||
|Hos|35.76s|32.57s|8.9%|
|VectorSelectOperator|3.12s|0.89s|7.15%|
|count|49999735|5000712|8.99%|

> Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17139
>                 URL: https://issues.apache.org/jira/browse/HIVE-17139
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ke Jia
>            Assignee: Ke Jia
>         Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, HIVE-17139.3.patch
>
>
> The execution of CASE WHEN and IF statements in Hive vectorization is not optimal: in the current implementation, all the conditional and else expressions are evaluated for every row. The optimized approach is to update the selected array of the batch parameter after the conditional expression has been executed. The else expression is then evaluated only for the selected rows instead of all rows.
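
The idea described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the code in the attached patches: the Batch class is a hypothetical stand-in whose selected, selectedInUse and size fields loosely mirror those of Hive's VectorizedRowBatch, and evaluateThen and evaluateCaseWhen are invented names. It models "case when a=1 then trim(b) end" and counts how often the THEN expression runs.

```java
import java.util.Arrays;

class ConditionalSkipSketch {

    /** Simplified stand-in for a vectorized row batch; not Hive's actual class. */
    static final class Batch {
        final int[] a;          // input column of the condition
        final String[] b;       // input column of the THEN expression (no nulls in this sketch)
        final String[] out;     // result column; null models SQL NULL
        int[] selected;         // indices of the rows that are currently active
        boolean selectedInUse;  // false means all rows 0..size-1 are active
        int size;               // number of active rows

        Batch(int[] a, String[] b) {
            this.a = a;
            this.b = b;
            this.out = new String[a.length];
            this.selected = new int[a.length];
            this.size = a.length;
        }
    }

    /** Counts THEN evaluations so the saving is visible. */
    static int thenEvaluations = 0;

    /** Evaluates trim(b) for the active rows only. */
    static void evaluateThen(Batch batch) {
        for (int j = 0; j < batch.size; j++) {
            int row = batch.selectedInUse ? batch.selected[j] : j;
            batch.out[row] = batch.b[row].trim();
            thenEvaluations++;
        }
    }

    /** CASE WHEN a = 1 THEN trim(b) END, skipping rows where the condition fails. */
    static void evaluateCaseWhen(Batch batch) {
        // Remember the batch state so it can be restored for later operators.
        int[] savedSelected = batch.selected;
        boolean savedInUse = batch.selectedInUse;
        int savedSize = batch.size;

        // Step 1: evaluate the condition and collect the rows that satisfy it.
        int[] matching = new int[savedSize];
        int count = 0;
        for (int j = 0; j < savedSize; j++) {
            int row = savedInUse ? savedSelected[j] : j;
            if (batch.a[row] == 1) {
                matching[count++] = row;
            }
        }

        // Step 2: narrow the selected array, then evaluate THEN on those rows only.
        batch.selected = matching;
        batch.selectedInUse = true;
        batch.size = count;
        evaluateThen(batch);

        // Step 3: restore the original selection.
        batch.selected = savedSelected;
        batch.selectedInUse = savedInUse;
        batch.size = savedSize;
    }

    public static void main(String[] args) {
        Batch batch = new Batch(new int[] {1, 2, 1, 3},
                                new String[] {" x ", " y ", "z  ", " w"});
        evaluateCaseWhen(batch);
        System.out.println(Arrays.toString(batch.out)); // [x, null, z, null]
        System.out.println(thenEvaluations);            // 2
    }
}
```

In this toy batch only two of the four rows satisfy the condition, so trim runs twice instead of four times. Rows that fail the condition keep a null result, which matches a CASE without an ELSE branch.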



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)