You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by Alexandre Fouche <al...@cleverscale.com> on 2012/11/07 12:30:17 UTC

Hive NR map progress inconsistent and regurlarly restart from 0%

I have a Yarn MR (with two ec2 instances to mapreduce) job on a dataset of approximately a thousand avro records, and the map phase is behaving erratically. See the progress below

What is going on there ?

            hive> select * from nikon where qs_cs_s_aid='VIEW' limit 10;

            Total MapReduce jobs = 1
            Launching Job 1 out of 1
            Number of reduce tasks is set to 0 since there's no reduce operator
            Starting Job = job_1352281315350_0020, Tracking URL = http://blabla.ec2.internal:8088/proxy/application_1352281315350_0020/
            Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=blabla.com:8032 -kill job_1352281315350_0020
            Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 0

            2012-11-07 11:14:40,976 Stage-1 map = 0%,  reduce = 0%
            2012-11-07 11:15:06,136 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 10.38 sec
            2012-11-07 11:15:07,253 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 12.18 sec
            2012-11-07 11:15:08,371 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 12.18 sec
            2012-11-07 11:15:09,491 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 12.18 sec
            2012-11-07 11:15:10,643 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 15.42 sec
            (...)
            2012-11-07 11:15:35,441 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 37.77 sec
            2012-11-07 11:15:36,486 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 37.77 sec

here restart at 16% ?

            2012-11-07 11:15:37,692 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 21.15 sec
            2012-11-07 11:15:38,815 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 21.15 sec
            2012-11-07 11:15:39,865 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 21.15 sec
            2012-11-07 11:15:41,064 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 22.4 sec
            2012-11-07 11:15:42,181 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 22.4 sec
            2012-11-07 11:15:43,299 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 22.4 sec

here restart at 0% ?

            2012-11-07 11:15:44,418 Stage-1 map = 0%,  reduce = 0%
            2012-11-07 11:16:02,076 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 6.86 sec
            2012-11-07 11:16:03,193 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 6.86 sec
            2012-11-07 11:16:04,259 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 8.45 sec
            (...)
            2012-11-07 11:16:31,291 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 35.34 sec
            2012-11-07 11:16:32,414 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 37.93 sec

here restart at 11% ?

            2012-11-07 11:16:33,459 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 19.53 sec
            2012-11-07 11:16:34,507 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 19.53 sec
            2012-11-07 11:16:35,731 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 21.47 sec
            (...)
            2012-11-07 11:16:46,839 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 24.14 sec

here restart at 0% ?

            2012-11-07 11:16:47,939 Stage-1 map = 0%,  reduce = 0%
            2012-11-07 11:16:56,653 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 7.54 sec
            2012-11-07 11:16:57,814 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 7.54 sec
            (...)

Needless to say the job crashes after some time with an Error: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: -56

Re: Hive NR map progress inconsistent and regurlarly restart from 0%

Posted by Alexandre Fouche <al...@cleverscale.com>.

Well, that is my problem here: i checked the logs on resourcemanager and nodemanagers and saw nothing suspicious, but these logs are too verbose.
Could it be the data or avro records which have issues ?



--
Alexandre Fouche
Lead operations engineer, cloud architect
http://www.cleverscale.com | @cleverscale
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Wednesday 7 November 2012 at 12:36, Jan Dolinár wrote:

> This usually happens when some task fail, their progress is then not
> counted, hence the 'restart'. Check your task logs for failures.
>  
> Jan
>  
>  
> On Wed, Nov 7, 2012 at 12:30 PM, Alexandre Fouche
> <alexandre.fouche@cleverscale.com (mailto:alexandre.fouche@cleverscale.com)> wrote:
> > I have a Yarn MR (with two ec2 instances to mapreduce) job on a dataset of
> > approximately a thousand avro records, and the map phase is behaving
> > erratically. See the progress below
> >  
> > What is going on there ?
> >  
> > hive> select * from nikon where qs_cs_s_aid='VIEW' limit 10;
> >  
> > Total MapReduce jobs = 1
> > Launching Job 1 out of 1
> > Number of reduce tasks is set to 0 since there's no reduce
> > operator
> > Starting Job = job_1352281315350_0020, Tracking URL =
> > http://blabla.ec2.internal:8088/proxy/application_1352281315350_0020/
> > Kill Command = /usr/lib/hadoop/bin/hadoop job
> > -Dmapred.job.tracker=blabla.com:8032 (http://blabla.com:8032) -kill job_1352281315350_0020
> > Hadoop job information for Stage-1: number of mappers: 4; number
> > of reducers: 0
> >  
> > 2012-11-07 11:14:40,976 Stage-1 map = 0%, reduce = 0%
> > 2012-11-07 11:15:06,136 Stage-1 map = 1%, reduce = 0%,
> > Cumulative CPU 10.38 sec
> > 2012-11-07 11:15:07,253 Stage-1 map = 1%, reduce = 0%,
> > Cumulative CPU 12.18 sec
> > 2012-11-07 11:15:08,371 Stage-1 map = 1%, reduce = 0%,
> > Cumulative CPU 12.18 sec
> > 2012-11-07 11:15:09,491 Stage-1 map = 1%, reduce = 0%,
> > Cumulative CPU 12.18 sec
> > 2012-11-07 11:15:10,643 Stage-1 map = 2%, reduce = 0%,
> > Cumulative CPU 15.42 sec
> > (...)
> > 2012-11-07 11:15:35,441 Stage-1 map = 28%, reduce = 0%,
> > Cumulative CPU 37.77 sec
> > 2012-11-07 11:15:36,486 Stage-1 map = 28%, reduce = 0%,
> > Cumulative CPU 37.77 sec
> >  
> > here restart at 16% ?
> >  
> > 2012-11-07 11:15:37,692 Stage-1 map = 16%, reduce = 0%,
> > Cumulative CPU 21.15 sec
> > 2012-11-07 11:15:38,815 Stage-1 map = 16%, reduce = 0%,
> > Cumulative CPU 21.15 sec
> > 2012-11-07 11:15:39,865 Stage-1 map = 16%, reduce = 0%,
> > Cumulative CPU 21.15 sec
> > 2012-11-07 11:15:41,064 Stage-1 map = 18%, reduce = 0%,
> > Cumulative CPU 22.4 sec
> > 2012-11-07 11:15:42,181 Stage-1 map = 18%, reduce = 0%,
> > Cumulative CPU 22.4 sec
> > 2012-11-07 11:15:43,299 Stage-1 map = 18%, reduce = 0%,
> > Cumulative CPU 22.4 sec
> >  
> > here restart at 0% ?
> >  
> > 2012-11-07 11:15:44,418 Stage-1 map = 0%, reduce = 0%
> > 2012-11-07 11:16:02,076 Stage-1 map = 1%, reduce = 0%,
> > Cumulative CPU 6.86 sec
> > 2012-11-07 11:16:03,193 Stage-1 map = 1%, reduce = 0%,
> > Cumulative CPU 6.86 sec
> > 2012-11-07 11:16:04,259 Stage-1 map = 2%, reduce = 0%,
> > Cumulative CPU 8.45 sec
> > (...)
> > 2012-11-07 11:16:31,291 Stage-1 map = 22%, reduce = 0%,
> > Cumulative CPU 35.34 sec
> > 2012-11-07 11:16:32,414 Stage-1 map = 26%, reduce = 0%,
> > Cumulative CPU 37.93 sec
> >  
> > here restart at 11% ?
> >  
> > 2012-11-07 11:16:33,459 Stage-1 map = 11%, reduce = 0%,
> > Cumulative CPU 19.53 sec
> > 2012-11-07 11:16:34,507 Stage-1 map = 11%, reduce = 0%,
> > Cumulative CPU 19.53 sec
> > 2012-11-07 11:16:35,731 Stage-1 map = 13%, reduce = 0%,
> > Cumulative CPU 21.47 sec
> > (...)
> > 2012-11-07 11:16:46,839 Stage-1 map = 17%, reduce = 0%,
> > Cumulative CPU 24.14 sec
> >  
> > here restart at 0% ?
> >  
> > 2012-11-07 11:16:47,939 Stage-1 map = 0%, reduce = 0%
> > 2012-11-07 11:16:56,653 Stage-1 map = 1%, reduce = 0%,
> > Cumulative CPU 7.54 sec
> > 2012-11-07 11:16:57,814 Stage-1 map = 1%, reduce = 0%,
> > Cumulative CPU 7.54 sec
> > (...)
> >  
> > Needless to say the job crashes after some time with an Error:
> > java.io.IOException: java.io.IOException:
> > java.lang.ArrayIndexOutOfBoundsException: -56
> >  
>  
>  
>

Re: Hive NR map progress inconsistent and regurlarly restart from 0%

Posted by Jan Dolinár <do...@gmail.com>.

This usually happens when some task fail, their progress is then not
counted, hence the 'restart'. Check your task logs for failures.

Jan


On Wed, Nov 7, 2012 at 12:30 PM, Alexandre Fouche
<al...@cleverscale.com> wrote:
> I have a Yarn MR (with two ec2 instances to mapreduce) job on a dataset of
> approximately a thousand avro records, and the map phase is behaving
> erratically. See the progress below
>
> What is going on there ?
>
>             hive> select * from nikon where qs_cs_s_aid='VIEW' limit 10;
>
>             Total MapReduce jobs = 1
>             Launching Job 1 out of 1
>             Number of reduce tasks is set to 0 since there's no reduce
> operator
>             Starting Job = job_1352281315350_0020, Tracking URL =
> http://blabla.ec2.internal:8088/proxy/application_1352281315350_0020/
>             Kill Command = /usr/lib/hadoop/bin/hadoop job
> -Dmapred.job.tracker=blabla.com:8032 -kill job_1352281315350_0020
>             Hadoop job information for Stage-1: number of mappers: 4; number
> of reducers: 0
>
>             2012-11-07 11:14:40,976 Stage-1 map = 0%,  reduce = 0%
>             2012-11-07 11:15:06,136 Stage-1 map = 1%,  reduce = 0%,
> Cumulative CPU 10.38 sec
>             2012-11-07 11:15:07,253 Stage-1 map = 1%,  reduce = 0%,
> Cumulative CPU 12.18 sec
>             2012-11-07 11:15:08,371 Stage-1 map = 1%,  reduce = 0%,
> Cumulative CPU 12.18 sec
>             2012-11-07 11:15:09,491 Stage-1 map = 1%,  reduce = 0%,
> Cumulative CPU 12.18 sec
>             2012-11-07 11:15:10,643 Stage-1 map = 2%,  reduce = 0%,
> Cumulative CPU 15.42 sec
>             (...)
>             2012-11-07 11:15:35,441 Stage-1 map = 28%,  reduce = 0%,
> Cumulative CPU 37.77 sec
>             2012-11-07 11:15:36,486 Stage-1 map = 28%,  reduce = 0%,
> Cumulative CPU 37.77 sec
>
> here restart at 16% ?
>
>             2012-11-07 11:15:37,692 Stage-1 map = 16%,  reduce = 0%,
> Cumulative CPU 21.15 sec
>             2012-11-07 11:15:38,815 Stage-1 map = 16%,  reduce = 0%,
> Cumulative CPU 21.15 sec
>             2012-11-07 11:15:39,865 Stage-1 map = 16%,  reduce = 0%,
> Cumulative CPU 21.15 sec
>             2012-11-07 11:15:41,064 Stage-1 map = 18%,  reduce = 0%,
> Cumulative CPU 22.4 sec
>             2012-11-07 11:15:42,181 Stage-1 map = 18%,  reduce = 0%,
> Cumulative CPU 22.4 sec
>             2012-11-07 11:15:43,299 Stage-1 map = 18%,  reduce = 0%,
> Cumulative CPU 22.4 sec
>
> here restart at 0% ?
>
>             2012-11-07 11:15:44,418 Stage-1 map = 0%,  reduce = 0%
>             2012-11-07 11:16:02,076 Stage-1 map = 1%,  reduce = 0%,
> Cumulative CPU 6.86 sec
>             2012-11-07 11:16:03,193 Stage-1 map = 1%,  reduce = 0%,
> Cumulative CPU 6.86 sec
>             2012-11-07 11:16:04,259 Stage-1 map = 2%,  reduce = 0%,
> Cumulative CPU 8.45 sec
>             (...)
>             2012-11-07 11:16:31,291 Stage-1 map = 22%,  reduce = 0%,
> Cumulative CPU 35.34 sec
>             2012-11-07 11:16:32,414 Stage-1 map = 26%,  reduce = 0%,
> Cumulative CPU 37.93 sec
>
> here restart at 11% ?
>
>             2012-11-07 11:16:33,459 Stage-1 map = 11%,  reduce = 0%,
> Cumulative CPU 19.53 sec
>             2012-11-07 11:16:34,507 Stage-1 map = 11%,  reduce = 0%,
> Cumulative CPU 19.53 sec
>             2012-11-07 11:16:35,731 Stage-1 map = 13%,  reduce = 0%,
> Cumulative CPU 21.47 sec
>             (...)
>             2012-11-07 11:16:46,839 Stage-1 map = 17%,  reduce = 0%,
> Cumulative CPU 24.14 sec
>
> here restart at 0% ?
>
>             2012-11-07 11:16:47,939 Stage-1 map = 0%,  reduce = 0%
>             2012-11-07 11:16:56,653 Stage-1 map = 1%,  reduce = 0%,
> Cumulative CPU 7.54 sec
>             2012-11-07 11:16:57,814 Stage-1 map = 1%,  reduce = 0%,
> Cumulative CPU 7.54 sec
>             (...)
>
> Needless to say the job crashes after some time with an Error:
> java.io.IOException: java.io.IOException:
> java.lang.ArrayIndexOutOfBoundsException: -56
>