Posted to user@hive.apache.org by Sean Curtis <se...@gmail.com> on 2010/12/21 03:11:29 UTC

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

Just running a simple select count(1) from a table (using MovieLens as an example) doesn't seem to work for me. Anyone know why this doesn't work? I'm using Hive trunk:

hive> select avg(rating) from movierating where movieid=43;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201012141048_0023, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201012141048_0023
Kill Command = /Users/Sean/dev/hadoop-0.20.2+737/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:8021 -kill job_201012141048_0023
2010-12-20 15:15:03,295 Stage-1 map = 0%,  reduce = 0%
2010-12-20 15:15:09,420 Stage-1 map = 50%,  reduce = 0%
... 
eventually fails after a couple of minutes with:

2010-12-20 17:33:01,113 Stage-1 map = 100%,  reduce = 0%
2010-12-20 17:33:32,182 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201012141048_0023 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
hive> 


It almost seems like the reduce task never starts. Any help would be appreciated.

sean

Re: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

Posted by Ted Yu <yu...@gmail.com>.
Have you found anything interesting in the Hive history file
(/tmp/hadoop/hive_job_log_hadoop_201012210353_775358406.txt)?
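For example (assuming the file really is at that path), a quick way to scan it for failed tasks would be something like:

  # grep the Hive job history log for failed tasks and error messages
  grep -i -e 'error' -e 'failed' \
    /tmp/hadoop/hive_job_log_hadoop_201012210353_775358406.txt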

Thanks

On Mon, Dec 20, 2010 at 6:11 PM, Sean Curtis <se...@gmail.com> wrote:

> Just running a simple select count(1) from a table (using MovieLens as an
> example) doesn't seem to work for me. Anyone know why this doesn't work?
> I'm using Hive trunk:
>
> hive> select avg(rating) from movierating where movieid=43;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>  set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>  set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>  set mapred.reduce.tasks=<number>
> Starting Job = job_201012141048_0023, Tracking URL =
> http://localhost:50030/jobdetails.jsp?jobid=job_201012141048_0023
> Kill Command = /Users/Sean/dev/hadoop-0.20.2+737/bin/../bin/hadoop job
>  -Dmapred.job.tracker=localhost:8021 -kill job_201012141048_0023
> 2010-12-20 15:15:03,295 Stage-1 map = 0%,  reduce = 0%
> 2010-12-20 15:15:09,420 Stage-1 map = 50%,  reduce = 0%
> ...
> eventually fails after a couple of minutes with:
>
> 2010-12-20 17:33:01,113 Stage-1 map = 100%,  reduce = 0%
> 2010-12-20 17:33:32,182 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201012141048_0023 with errors
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
> hive>
>
>
> It almost seems like the reduce task never starts. Any help would be
> appreciated.
>
> sean

Re: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

Posted by Adarsh Sharma <ad...@orkash.com>.
Sean Curtis wrote:
> in failed/killed task attempts, i see the following:
>
>
> attempt_201012141048_0023_m_000000_0  task_201012141048_0023_m_000000  172.24.10.91  FAILED
>   Too many fetch-failures
>   Logs: <http://172.24.10.91:50060/tasklog?attemptid=attempt_201012141048_0023_m_000000_0>
> attempt_201012141048_0023_m_000000_1  task_201012141048_0023_m_000000  172.24.10.91  FAILED
>   Too many fetch-failures
>   Logs: <http://172.24.10.91:50060/tasklog?attemptid=attempt_201012141048_0023_m_000000_1>
> attempt_201012141048_0023_m_000001_0  task_201012141048_0023_m_000001  172.24.10.91  FAILED
>   Too many fetch-failures
>   Logs: <http://172.24.10.91:50060/tasklog?attemptid=attempt_201012141048_0023_m_000001_0>
> attempt_201012141048_0023_m_000001_1  task_201012141048_0023_m_000001  172.24.10.91  FAILED
>   Too many fetch-failures
>   Logs: <http://172.24.10.91:50060/tasklog?attemptid=attempt_201012141048_0023_m_000001_1>
> attempt_201012141048_0023_r_000000_0  task_201012141048_0023_r_000000  172.24.10.91  FAILED
>   Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>   Logs: <http://172.24.10.91:50060/tasklog?attemptid=attempt_201012141048_0023_r_000000_0>


The value you have in your hadoop-site.xml file for hadoop.tmp.dir will
get you into trouble:

  /tmp/hadoop/tmp/dir/hadoop-${user.name}

Many systems remove items from /tmp that are older than some time
interval.
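One workaround is to point hadoop.tmp.dir at a location that is not cleaned automatically, in hadoop-site.xml (the path below is only an illustration):

  <property>
    <name>hadoop.tmp.dir</name>
    <!-- example value only: any persistent directory writable by the
         user running the daemons will do -->
    <value>/var/hadoop/tmp/hadoop-${user.name}</value>
  </property>

The daemons need to be restarted for the change to take effect.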

Are there any apparently relevant messages in the task tracker logs?

With two nodes and a small number of reducers, the 
tasktracker.http.threads change is unlikely to be part of the issue.

In general, the shuffle phase simply transfers the sorted map outputs
to the reducers and merge-sorts the results.

The errors tend to fall into two types: failed or blocked transfers,
and merge-sort failures.

Failed or blocked transfers tend to be due either to too many
simultaneous requests hitting a tasktracker (the number of requests
that may be serviced at once is controlled by tasktracker.http.threads)
or to firewall issues that block the actual transfer.

Merge-sort failures tend to be either out-of-memory or
out-of-disk-space issues.
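If you do want to raise that limit (though, as noted above, it is unlikely to matter with two nodes), the setting goes in hadoop-site.xml on each tasktracker; the value below is just an illustration (the 0.20 default is 40), and the tasktrackers must be restarted:

  <property>
    <!-- worker threads the tasktracker uses to serve map output
         to reducers during the shuffle -->
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>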

There are a couple of JIRAs open for shuffle errors in other cases:

http://issues.apache.org/jira/browse/HADOOP-3604
http://issues.apache.org/jira/browse/HADOOP-3155  <- likely cause fixed in the Cloudera 0.18.3 distribution
http://issues.apache.org/jira/browse/HADOOP-4115
http://issues.apache.org/jira/browse/HADOOP-3130
http://issues.apache.org/jira/browse/HADOOP-2095


> attempt_201012141048_0023_r_000000_1  task_201012141048_0023_r_000000  172.24.10.91  FAILED
>   Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>   Logs: <http://172.24.10.91:50060/tasklog?attemptid=attempt_201012141048_0023_r_000000_1>
> attempt_201012141048_0023_r_000000_2  task_201012141048_0023_r_000000  172.24.10.91  FAILED
>   Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>   Logs: <http://172.24.10.91:50060/tasklog?attemptid=attempt_201012141048_0023_r_000000_2>
> attempt_201012141048_0023_r_000000_3  task_201012141048_0023_r_000000  172.24.10.91  FAILED
>   Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>
>
>
> On Dec 20, 2010, at 11:01 PM, Adarsh Sharma wrote:
>
>> Sean Curtis wrote:
>>> Just running a simple select count(1) from a table (using MovieLens
>>> as an example) doesn't seem to work for me. Anyone know why this
>>> doesn't work? I'm using Hive trunk:
>>>
>>> hive> select avg(rating) from movierating where movieid=43;
>>> Total MapReduce jobs = 1
>>> Launching Job 1 out of 1
>>> Number of reduce tasks determined at compile time: 1
>>> In order to change the average load for a reducer (in bytes):
>>>  set hive.exec.reducers.bytes.per.reducer=<number>
>>> In order to limit the maximum number of reducers:
>>>  set hive.exec.reducers.max=<number>
>>> In order to set a constant number of reducers:
>>>  set mapred.reduce.tasks=<number>
>>> Starting Job = job_201012141048_0023, Tracking URL = 
>>> http://localhost:50030/jobdetails.jsp?jobid=job_201012141048_0023
>>> Kill Command = /Users/Sean/dev/hadoop-0.20.2+737/bin/../bin/hadoop 
>>> job  -Dmapred.job.tracker=localhost:8021 -kill job_201012141048_0023
>>> 2010-12-20 15:15:03,295 Stage-1 map = 0%,  reduce = 0%
>>> 2010-12-20 15:15:09,420 Stage-1 map = 50%,  reduce = 0%
>>> ... eventually fails after a couple of minutes with:
>>>
>>> 2010-12-20 17:33:01,113 Stage-1 map = 100%,  reduce = 0%
>>> 2010-12-20 17:33:32,182 Stage-1 map = 100%,  reduce = 100%
>>> Ended Job = job_201012141048_0023 with errors
>>> FAILED: Execution Error, return code 2 from 
>>> org.apache.hadoop.hive.ql.exec.MapRedTask
>>> hive>
>>>
>>> It almost seems like the reduce task never starts. Any help would be
>>> appreciated.
>>>
>>> sean
>> To find the root cause of the problem, go to the JobTracker web UI
>> (IP:50030) and check the Job Tracker History link at the bottom for
>> the entry corresponding to this job ID.
>>
>>
>> Best Regards
>>
>> Adarsh Sharma
>


Re: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

Posted by Sean Curtis <se...@gmail.com>.
in failed/killed task attempts, i see the following:


attempt_201012141048_0023_m_000000_0  task_201012141048_0023_m_000000  172.24.10.91  FAILED
  Too many fetch-failures
  Logs: Last 4KB | Last 8KB | All
attempt_201012141048_0023_m_000000_1  task_201012141048_0023_m_000000  172.24.10.91  FAILED
  Too many fetch-failures
  Logs: Last 4KB | Last 8KB | All
attempt_201012141048_0023_m_000001_0  task_201012141048_0023_m_000001  172.24.10.91  FAILED
  Too many fetch-failures
  Logs: Last 4KB | Last 8KB | All
attempt_201012141048_0023_m_000001_1  task_201012141048_0023_m_000001  172.24.10.91  FAILED
  Too many fetch-failures
  Logs: Last 4KB | Last 8KB | All
attempt_201012141048_0023_r_000000_0  task_201012141048_0023_r_000000  172.24.10.91  FAILED
  Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
  Logs: Last 4KB | Last 8KB | All
attempt_201012141048_0023_r_000000_1  task_201012141048_0023_r_000000  172.24.10.91  FAILED
  Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
  Logs: Last 4KB | Last 8KB | All
attempt_201012141048_0023_r_000000_2  task_201012141048_0023_r_000000  172.24.10.91  FAILED
  Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
  Logs: Last 4KB | Last 8KB | All
attempt_201012141048_0023_r_000000_3  task_201012141048_0023_r_000000  172.24.10.91  FAILED
  Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.



On Dec 20, 2010, at 11:01 PM, Adarsh Sharma wrote:

> Sean Curtis wrote:
>> Just running a simple select count(1) from a table (using MovieLens as an example) doesn't seem to work for me. Anyone know why this doesn't work? I'm using Hive trunk:
>> 
>> hive> select avg(rating) from movierating where movieid=43;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks determined at compile time: 1
>> In order to change the average load for a reducer (in bytes):
>>  set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>  set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>  set mapred.reduce.tasks=<number>
>> Starting Job = job_201012141048_0023, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201012141048_0023
>> Kill Command = /Users/Sean/dev/hadoop-0.20.2+737/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:8021 -kill job_201012141048_0023
>> 2010-12-20 15:15:03,295 Stage-1 map = 0%,  reduce = 0%
>> 2010-12-20 15:15:09,420 Stage-1 map = 50%,  reduce = 0%
>> ... eventually fails after a couple of minutes with:
>> 
>> 2010-12-20 17:33:01,113 Stage-1 map = 100%,  reduce = 0%
>> 2010-12-20 17:33:32,182 Stage-1 map = 100%,  reduce = 100%
>> Ended Job = job_201012141048_0023 with errors
>> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
>> hive> 
>> 
>> It almost seems like the reduce task never starts. Any help would be appreciated.
>> 
>> sean
> To find the root cause of the problem, go to the JobTracker web UI (IP:50030) and check the Job Tracker History link at the bottom for the entry corresponding to this job ID.
> 
> 
> Best Regards
> 
> Adarsh Sharma


Re: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

Posted by Adarsh Sharma <ad...@orkash.com>.
Sean Curtis wrote:
> Just running a simple select count(1) from a table (using MovieLens as an example) doesn't seem to work for me. Anyone know why this doesn't work? I'm using Hive trunk:
>
> hive> select avg(rating) from movierating where movieid=43;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201012141048_0023, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201012141048_0023
> Kill Command = /Users/Sean/dev/hadoop-0.20.2+737/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:8021 -kill job_201012141048_0023
> 2010-12-20 15:15:03,295 Stage-1 map = 0%,  reduce = 0%
> 2010-12-20 15:15:09,420 Stage-1 map = 50%,  reduce = 0%
> ... 
> eventually fails after a couple of minutes with:
>
> 2010-12-20 17:33:01,113 Stage-1 map = 100%,  reduce = 0%
> 2010-12-20 17:33:32,182 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201012141048_0023 with errors
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
> hive> 
>
>
> It almost seems like the reduce task never starts. Any help would be appreciated.
>
> sean
To find the root cause of the problem, go to the JobTracker web UI
(IP:50030) and check the Job Tracker History link at the bottom for
the entry corresponding to this job ID.
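You can also pull the same information from the command line, reusing the job ID and jobtracker address from your own output, e.g.:

  # ask the jobtracker for this job's status (same syntax as the Kill Command above)
  hadoop job -Dmapred.job.tracker=localhost:8021 -status job_201012141048_0023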


Best Regards

Adarsh Sharma