Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2014/10/16 06:12:34 UTC

[jira] [Updated] (TEZ-1177) Hive Tez job stalls out

     [ https://issues.apache.org/jira/browse/TEZ-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated TEZ-1177:
-----------------------------
    Fix Version/s:     (was: 0.4.0)

> Hive Tez job stalls out
> -----------------------
>
>                 Key: TEZ-1177
>                 URL: https://issues.apache.org/jira/browse/TEZ-1177
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>         Environment: HDP 2.1, 21-data-node cluster, 128 GB RAM/node
>            Reporter: Douglas Moore
>         Attachments: task_failed.core-dump.1.txt, task_failed.spill-error.1, task_failed.spill-error.2, task_failed.spill-error.3, tez-jira-issue.tar.gz
>
>
> I'm on HDP 2.1, running a Hive 0.13 query that has created a 3-stage job:
> 338 Maps, 45 Reducers, then 1 Reducer.
> The job churns through stage 1, pauses for a long time, churns through stage 2, and then pauses even longer, at which point I kill the job.
> Looking through the log, I notice a lot of ShuffleSchedule copy operations, and the copy rate slows to 9 MB/sec, whereas some copies run at over 1000 MB/sec. I'm on a very beefy cluster with a high-bandwidth network. What's going on here?
> My ultimate goal is to get `select count(distinct ss_ticket_number) from store_sales` to run at a reasonable pace.
> I will attach the log files, query, explain plan, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)