Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2014/10/16 06:12:34 UTC
[jira] [Updated] (TEZ-1177) HIve Tez job stalls out
[ https://issues.apache.org/jira/browse/TEZ-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hitesh Shah updated TEZ-1177:
-----------------------------
Fix Version/s: (was: 0.4.0)
> HIve Tez job stalls out
> -----------------------
>
> Key: TEZ-1177
> URL: https://issues.apache.org/jira/browse/TEZ-1177
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.4.0
> Environment: HDP 2.1, 21-data-node cluster, 128 GB RAM/node
> Reporter: Douglas Moore
> Attachments: task_failed.core-dump.1.txt, task_failed.spill-error.1, task_failed.spill-error.2, task_failed.spill-error.3, tez-jira-issue.tar.gz
>
>
> I'm on HDP 2.1, running a Hive 0.13 query that has been compiled into a 3-stage job:
> 338 Maps, 45 Reducers, then 1 Reducer
> The job churns through stage 1, pauses for a long time, churns through stage 2, and then pauses even longer, at which point I kill the job.
> Looking through the log, I notice a lot of ShuffleSchedule copy operations, and the copy rate slows to 9 MB/sec, down from over 1000 MB/sec for some copies. I'm on a very beefy cluster with a high-bandwidth network. What's going on here?
> My ultimate goal is to get `select count(distinct ss_ticket_number) from store_sales` to run at a reasonable pace.
> I will attach the log files, query, explain plan, etc.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)