Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2015/12/15 23:18:47 UTC

[jira] [Commented] (TEZ-3002) Does Tez run slower than hive on larger dataset (~2.5 TB)?

    [ https://issues.apache.org/jira/browse/TEZ-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058961#comment-15058961 ] 

Hitesh Shah commented on TEZ-3002:
----------------------------------

Moving this JIRA to Hive for now. It would be good to start the discussion there, as this pertains to Hive regardless of whether it uses MR or Tez as its internal execution engine.

FWIW, issues such as this are usually better raised on the Hive user mailing list for discussion/analysis, with a bug created later based on the findings.

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> ----------------------------------------------------------
>
>                 Key: TEZ-3002
>                 URL: https://issues.apache.org/jira/browse/TEZ-3002
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: rohit garg
>
> We have started testing the Tez query engine. In initial results, we get a 30% performance boost over Hive on smaller data sets (1-10 GB), but Hive starts to perform better than Tez as data size increases. For example, when we run a Hive query with Tez on about 2.3 TB of data, it performs about 20% worse than Hive alone. Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following properties:
> set tez.task.resource.memory.mb=10000;
> set tez.am.resource.memory.mb=59205;
> set tez.am.launch.cmd-opts =-Xmx47364m;
> set hive.tez.container.size=59205;
> set hive.tez.java.opts=-Xmx47364m;
> set tez.am.grouping.max-size=36700160000;
> Is this normal, or am I missing or misconfiguring some property? Also, I am using an older version of Tez for now. Could that be the issue too? I still have to bootstrap the latest version of Tez on EMR and test whether it does any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/
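For anyone following the discussion, the quoted settings can be sanity-checked with some simple arithmetic. The sketch below is only illustrative: the helper names are made up for this example, and it assumes that "1.3 TB RAM" means 1.3 * 1024 * 1024 MB and that all of it is available to YARN containers, which the report does not state.

```python
# Values copied from the settings quoted in the issue description.
CONTAINER_MB = 59205               # hive.tez.container.size
HEAP_MB = 47364                    # -Xmx value in hive.tez.java.opts
GROUPING_MAX_BYTES = 36700160000   # tez.am.grouping.max-size

# Assumption (not stated in the report): 1.3 TB of RAM read as
# 1.3 * 1024 * 1024 MB, with all of it usable for containers.
CLUSTER_MB = 1.3 * 1024 * 1024


def heap_fraction(heap_mb, container_mb):
    """Fraction of the container size given to the JVM heap."""
    return heap_mb / container_mb


def max_containers(cluster_mb, container_mb):
    """Upper bound on how many containers of this size fit in memory at once."""
    return int(cluster_mb // container_mb)


def bytes_to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    print("heap / container:", round(heap_fraction(HEAP_MB, CONTAINER_MB), 3))
    print("max concurrent containers:", max_containers(CLUSTER_MB, CONTAINER_MB))
    print("grouping max-size (MiB):", bytes_to_mib(GROUPING_MAX_BYTES))
```

Under these assumptions the heap is 80% of the container size, and only a couple of dozen containers of that size can run at the same time. This is arithmetic on the quoted values, not a diagnosis of the slowdown.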



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)