Posted to issues@hive.apache.org by "Rui Li (JIRA)" <ji...@apache.org> on 2016/05/05 01:24:12 UTC

[jira] [Commented] (HIVE-13634) Hive-on-Spark performed worse than Hive-on-MR, for queries with external scripts

    [ https://issues.apache.org/jira/browse/HIVE-13634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271760#comment-15271760 ] 

Rui Li commented on HIVE-13634:
-------------------------------

I'll look into this one.

> Hive-on-Spark performed worse than Hive-on-MR, for queries with external scripts
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-13634
>                 URL: https://issues.apache.org/jira/browse/HIVE-13634
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Xin Hao
>            Assignee: Rui Li
>
> Hive-on-Spark performed worse than Hive-on-MR, for queries with external scripts.
> TPCx-BB Q2/Q3/Q4 are Python streaming cases that call external scripts to handle reduce tasks. We found that for these 3 queries, Hive-on-Spark shows lower performance than Hive-on-MR when processing reduce tasks with external (Python) scripts, so ‘Improve HoS performance for queries with external scripts’ appears to be a performance optimization opportunity.
> The following table shows the Q2/Q3/Q4 test results on an 8-worker-node cluster with a TPCx-BB data size of 3TB.
> | TPCx-BB query | Engine        | Total query execution time (sec) | Execution time of external scripts (sec) |
> |---------------|---------------|----------------------------------|------------------------------------------|
> | Query 2       | Hive-on-MR    | 2172.180                         | 736                                      |
> | Query 2       | Hive-on-Spark | 2283.604                         | 1197                                     |
> | Query 3       | Hive-on-MR    | 1070.632                         | 513                                      |
> | Query 3       | Hive-on-Spark | 1287.679                         | 919                                      |
> | Query 4       | Hive-on-MR    | 1781.864                         | 1518                                     |
> | Query 4       | Hive-on-Spark | 2028.023                         | 1599                                     |
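As a quick cross-check of the reported figures, the following is an illustrative Python sketch that computes, per query, the relative increase of Hive-on-Spark over Hive-on-MR for total time and for external-script time. It also computes how many seconds of the total-time gap fall outside the scripts. It uses only the numbers from the issue description; the dictionary layout and the names `RESULTS`, `pct_change`, and `summarize` are hypothetical and chosen for illustration.

```python
# Hypothetical layout for illustration; values copied from the issue description
# (seconds, TPCx-BB 3TB, 8 worker nodes).
# query -> engine -> (total query execution time, external script execution time)
RESULTS = {
    "Q2": {"MR": (2172.180, 736.0), "Spark": (2283.604, 1197.0)},
    "Q3": {"MR": (1070.632, 513.0), "Spark": (1287.679, 919.0)},
    "Q4": {"MR": (1781.864, 1518.0), "Spark": (2028.023, 1599.0)},
}


def pct_change(baseline: float, value: float) -> float:
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def summarize(results):
    """Per query: Spark-vs-MR change in total and script time, plus the
    seconds of the total-time gap not explained by the scripts."""
    rows = []
    for query, by_engine in results.items():
        mr_total, mr_script = by_engine["MR"]
        hos_total, hos_script = by_engine["Spark"]
        rows.append({
            "query": query,
            "total_pct": pct_change(mr_total, hos_total),
            "script_pct": pct_change(mr_script, hos_script),
            # (total delta) - (script delta): time difference outside the scripts.
            "non_script_delta_s": (hos_total - mr_total) - (hos_script - mr_script),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RESULTS):
        print(f"{row['query']}: total {row['total_pct']:+.1f}%, "
              f"scripts {row['script_pct']:+.1f}%, "
              f"non-script delta {row['non_script_delta_s']:+.1f} s")
```

With these figures, the increase in script time exceeds the increase in total time for Q2 and Q3. For Q4, most of the gap (about 165 s) falls outside the external scripts.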



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)