Posted to yarn-dev@hadoop.apache.org by Azuryy <az...@gmail.com> on 2014/04/01 01:20:32 UTC

Re: RM HA issues

I will run an MR job to verify it.

By "stopped the RM" I mean I ran: yarn-daemon.sh stop resourcemanager

Thanks

> On Apr 1, 2014, at 0:38, Karthik Kambatla <ka...@cloudera.com> wrote:
> 
> Thanks for reporting this, Azuryy. Indeed, this is surprising.
> 
> I don't quite understand how Hive works; would you mind running a vanilla MR
> job to verify whether this is indeed the case? Also, when you say you
> stopped the Active RM, you mean only the RM process - correct?
> 
> 
>> On Mon, Mar 31, 2014 at 3:46 AM, Azuryy Yu <az...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I built from trunk and configured RM HA, then submitted a Hive job with
>> 11 maps in total. I stopped the active RM when 6 maps had finished.
>>
>> However, Hive shows all map tasks restarting from scratch. This conflicts
>> with the design description.
>> 
>> job progress:
>> 2014-03-31 18:44:14,088 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 713.84 sec
>> 2014-03-31 18:44:15,128 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 722.83 sec
>> 2014-03-31 18:44:16,160 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 731.95 sec
>> 2014-03-31 18:44:17,191 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 744.17 sec
>> 2014-03-31 18:44:18,220 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 756.22 sec
>> 2014-03-31 18:44:19,250 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 762.4 sec
>> 2014-03-31 18:44:20,281 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 774.64 sec
>> 2014-03-31 18:44:21,306 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 786.49 sec
>> 2014-03-31 18:44:22,334 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 792.59 sec
>> 2014-03-31 18:44:23,363 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 807.58 sec
>> 2014-03-31 18:44:24,392 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 815.96 sec
>> 2014-03-31 18:44:25,416 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 823.83 sec
>> 2014-03-31 18:44:26,443 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 826.84 sec
>> 2014-03-31 18:44:27,472 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 832.16 sec
>> 2014-03-31 18:44:28,501 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 839.73 sec
>> 2014-03-31 18:44:29,531 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 844.45 sec
>> 2014-03-31 18:44:30,564 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 760.34 sec
>> 2014-03-31 18:44:31,728 Stage-1 map = 0%,  reduce = 0%
>> 2014-03-31 18:45:06,918 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 213.81 sec
>> 2014-03-31 18:45:07,952 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 216.83 sec
>> 2014-03-31 18:45:08,979 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 229.15 sec
>> 2014-03-31 18:45:10,007 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 244.42 sec
>> 2014-03-31 18:45:11,040 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 247.31 sec
>> 2014-03-31 18:45:12,072 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 259.5 sec
>> 2014-03-31 18:45:13,105 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 274.72 sec
>> 2014-03-31 18:45:14,135 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 280.76 sec
>> 2014-03-31 18:45:15,170 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 292.9 sec
>> 2014-03-31 18:45:16,202 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 305.16 sec
>> 2014-03-31 18:45:17,233 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 314.21 sec
>> 2014-03-31 18:45:18,264 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 323.34 sec
>> 2014-03-31 18:45:19,294 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 335.6 sec
>> 2014-03-31 18:45:20,325 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 344.71 sec
>> 2014-03-31 18:45:21,355 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 353.8 sec
>> 2014-03-31 18:45:22,385 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 366.06 sec
>> 2014-03-31 18:45:23,415 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 375.2 sec
>> 2014-03-31 18:45:24,449 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 384.28 sec
>> 2014-03-31 18:45:25,481 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 396.54 sec
>> 2014-03-31 18:45:26,512 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 408.72 sec
>> 2014-03-31 18:45:27,549 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 414.69 sec
>> 2014-03-31 18:45:28,582 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 426.99 sec
>> 2014-03-31 18:45:29,614 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 439.25 sec
>> 2014-03-31 18:45:30,653 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 448.25 sec
>> 2014-03-31 18:45:31,683 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 460.5 sec
>> 2014-03-31 18:45:32,723 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 469.63 sec
>> 2014-03-31 18:45:33,754 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 478.67 sec
>> 
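
For anyone who wants to locate the restart in a longer Hive console log, the following is a minimal Python sketch, not part of the original report. It assumes progress lines look like the ones quoted above; the function name find_map_resets and the min_drop threshold are hypothetical choices made for illustration.

```python
import re

# Matches Hive console progress lines such as
# "2014-03-31 18:44:31,728 Stage-1 map = 0%,  reduce = 0%".
# search() is used, so a leading ">> " quote prefix does not matter.
PROGRESS = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ Stage-\d+ "
    r"map = (\d+)%,\s+reduce = (\d+)%"
)


def find_map_resets(log_text, min_drop=50):
    """Return (timestamp, previous_pct, new_pct) for each sharp drop in map progress.

    min_drop is an arbitrary threshold (in percentage points) chosen so that
    small fluctuations such as 86% -> 82% are not reported as a restart.
    """
    resets = []
    previous = None
    for line in log_text.splitlines():
        match = PROGRESS.search(line)
        if not match:
            continue  # e.g. a wrapped continuation line like "713.84 sec"
        timestamp = match.group(1)
        map_pct = int(match.group(2))
        if previous is not None and previous - map_pct >= min_drop:
            resets.append((timestamp, previous, map_pct))
        previous = map_pct
    return resets


# Four lines taken from the progress log quoted above.
sample = """\
2014-03-31 18:44:29,531 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 844.45 sec
2014-03-31 18:44:30,564 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 760.34 sec
2014-03-31 18:44:31,728 Stage-1 map = 0%,  reduce = 0%
2014-03-31 18:45:06,918 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 213.81 sec
"""

resets = find_map_resets(sample)
print(resets)
```

On the quoted log this flags the line at 18:44:31, where map progress falls from 82% to 0%, which is the point where the job appears to start over.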

Re: RM HA issues

Posted by Azuryy Yu <az...@gmail.com>.
I am using hive-0.12.0, with ZKRMStateStore as the RM state-store class. Another
step would be to compare the ZK data for the MR job and the Hive job.
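
For readers reproducing the setup, here is a rough Python sketch that renders the kind of yarn-site.xml properties such an RM HA configuration with a ZooKeeper state store typically involves. It is not the poster's actual configuration: the property names are standard YARN keys, but every value is a placeholder, the ZooKeeper host is hypothetical, rm1 is assumed (only rm2 appears in this thread), and a real HA setup needs further properties (for example per-RM hostnames) that are omitted here.

```python
from xml.etree import ElementTree as ET

# Standard YARN property names; all values are placeholders for illustration.
rm_ha_properties = {
    "yarn.resourcemanager.ha.enabled": "true",
    "yarn.resourcemanager.ha.rm-ids": "rm1,rm2",  # rm1 is an assumption
    "yarn.resourcemanager.recovery.enabled": "true",
    "yarn.resourcemanager.store.class":
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
    "yarn.resourcemanager.zk-address": "zk1.example.com:2181",  # hypothetical host
}


def render_configuration(properties):
    """Render a {name: value} dict as a Hadoop-style <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


yarn_site_fragment = render_configuration(rm_ha_properties)
print(yarn_site_fragment)
```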


On Tue, Apr 1, 2014 at 12:51 PM, Karthik Kambatla <ka...@cloudera.com> wrote:

> It might be a good first step to compare the configurations for the vanilla
> MR job and Hive MR job.

Re: RM HA issues

Posted by Karthik Kambatla <ka...@cloudera.com>.
It might be a good first step to compare the configurations for the vanilla
MR job and Hive MR job.
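
One way to do that comparison is sketched below in Python. It assumes each job's effective configuration is available as Hadoop-style XML (for example the job.xml that MapReduce writes for a submitted job). The helper names are hypothetical, and the sample values are invented purely to exercise the code; they are not a claim about what Hive or MapReduce actually set.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(left, right):
    """Return {name: (left_value, right_value)} for every property that differs.

    A property missing on one side is reported with None for that side.
    """
    diff = {}
    for name in sorted(set(left) | set(right)):
        if left.get(name) != right.get(name):
            diff[name] = (left.get(name), right.get(name))
    return diff


# Invented sample inputs; the values do not come from the jobs in this thread.
mr_job_xml = """<configuration>
  <property><name>mapreduce.job.maps</name><value>11</value></property>
  <property><name>yarn.app.mapreduce.am.job.recovery.enable</name><value>true</value></property>
</configuration>"""

hive_job_xml = """<configuration>
  <property><name>mapreduce.job.maps</name><value>11</value></property>
  <property><name>yarn.app.mapreduce.am.job.recovery.enable</name><value>false</value></property>
  <property><name>mapred.output.committer.class</name><value>example.HypotheticalCommitter</value></property>
</configuration>"""

differences = diff_properties(load_properties(mr_job_xml), load_properties(hive_job_xml))
for name, (mr_value, hive_value) in differences.items():
    print(f"{name}: MR={mr_value!r} Hive={hive_value!r}")
```

Properties that are identical in both jobs are dropped, so the output is limited to the settings worth inspecting by hand.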


On Mon, Mar 31, 2014 at 7:06 PM, Azuryy Yu <az...@gmail.com> wrote:

> Hi Karthik,
> I ran an ordinary MR job, and it works well across RM failover.
>
> job progress:
> (the failover line was marked in red font)
>
> 14/04/01 10:01:38 INFO mapreduce.Job:  map 61% reduce 8%
> 14/04/01 10:01:40 INFO mapreduce.Job:  map 61% reduce 10%
> 14/04/01 10:01:41 INFO mapreduce.Job:  map 62% reduce 10%
> 14/04/01 10:01:44 INFO mapreduce.Job:  map 63% reduce 10%
> 14/04/01 10:01:47 INFO mapreduce.Job:  map 64% reduce 10%
> 14/04/01 10:02:36 INFO mapreduce.Job:  map 60% reduce 0%
> 14/04/01 10:02:40 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> 14/04/01 10:03:00 INFO mapreduce.Job:  map 63% reduce 0%
> 14/04/01 10:03:02 INFO mapreduce.Job:  map 66% reduce 2%
> 14/04/01 10:03:04 INFO mapreduce.Job:  map 67% reduce 2%
> 14/04/01 10:03:06 INFO mapreduce.Job:  map 69% reduce 2%
> 14/04/01 10:03:08 INFO mapreduce.Job:  map 71% reduce 2%
> 14/04/01 10:03:10 INFO mapreduce.Job:  map 72% reduce 2%
>
> So it is only the Hive job whose tasks all restart during failover; please take a look.

Re: RM HA issues

Posted by Azuryy Yu <az...@gmail.com>.
Hi Karthik,
I ran an ordinary MR job, and it works well across RM failover.

job progress:
(the failover line was marked in red font)

14/04/01 10:01:38 INFO mapreduce.Job:  map 61% reduce 8%
14/04/01 10:01:40 INFO mapreduce.Job:  map 61% reduce 10%
14/04/01 10:01:41 INFO mapreduce.Job:  map 62% reduce 10%
14/04/01 10:01:44 INFO mapreduce.Job:  map 63% reduce 10%
14/04/01 10:01:47 INFO mapreduce.Job:  map 64% reduce 10%
14/04/01 10:02:36 INFO mapreduce.Job:  map 60% reduce 0%
14/04/01 10:02:40 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
14/04/01 10:03:00 INFO mapreduce.Job:  map 63% reduce 0%
14/04/01 10:03:02 INFO mapreduce.Job:  map 66% reduce 2%
14/04/01 10:03:04 INFO mapreduce.Job:  map 67% reduce 2%
14/04/01 10:03:06 INFO mapreduce.Job:  map 69% reduce 2%
14/04/01 10:03:08 INFO mapreduce.Job:  map 71% reduce 2%
14/04/01 10:03:10 INFO mapreduce.Job:  map 72% reduce 2%

So it is only the Hive job whose tasks all restart during failover; please take a look.



On Tue, Apr 1, 2014 at 7:20 AM, Azuryy <az...@gmail.com> wrote:

> I will run a MR job to verify it.
>
> Stop RM means yarn-daemon.sh stop resourcemanager
>
> Thanks