
Posted to user@hbase.apache.org by Flavio Pompermaier <po...@okkam.it> on 2014/05/20 09:16:49 UTC

Region servers crashing during mapreduce

Hi to all,

I'm using Cloudera CDH4 (4.5.0) with default parameters and HBase 0.94.6.
My MapReduce jobs are misbehaving: the region servers keep crashing. I
checked the logs, and the region servers seem to die without logging
anything. This tends to happen the 2nd or 3rd time I submit a job. Can
someone help me figure out what's happening?

Thanks in advance,
Flavio

Re: Region servers crashing during mapreduce

Posted by Marcos Ortiz <ml...@uci.cu>.

If you are using just two nodes, you should be aware of the resources that you
allocate to every process (JT, TT, MS, RS, etc.), particularly the memory that you are
using for region hosting, Java ops processes, sorting, map/reduce tasks, etc.
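
A minimal sketch of that bookkeeping, in case it helps: add up the heap of each daemon on a node plus the worst case for the task JVMs, then compare the total with the physical RAM. The daemon list, heap sizes, slot count and RAM figure below are all made-up placeholders, not CDH defaults and not values taken from this cluster. JVMs also use memory beyond their heap, so the real footprint would be somewhat higher.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Rough per-node memory budget. Every number used in main() is a placeholder. */
final class NodeMemoryBudget {

    /** Sums the daemon heaps plus the worst case for concurrently running task JVMs. */
    static long committedMb(Map<String, Long> daemonHeapsMb, int taskSlots, long childHeapMb) {
        long total = 0;
        for (long heap : daemonHeapsMb.values()) {
            total += heap;
        }
        return total + (long) taskSlots * childHeapMb;
    }

    public static void main(String[] args) {
        // Hypothetical heap sizes (MB) for the daemons sharing one node.
        Map<String, Long> daemons = new LinkedHashMap<>();
        daemons.put("DataNode", 1024L);
        daemons.put("TaskTracker", 1024L);
        daemons.put("RegionServer", 4096L);

        long physicalMb = 8192;   // hypothetical RAM on the node
        long osReserveMb = 1024;  // hypothetical headroom for the OS and page cache
        long availableMb = physicalMb - osReserveMb;

        // Hypothetical: 4 task slots, each child JVM with a 1 GB heap.
        long committed = committedMb(daemons, 4, 1024);

        System.out.printf("committed=%d MB, available=%d MB%n", committed, availableMb);
        if (committed > availableMb) {
            System.out.println("Over-committed: the node may swap, or the OS may kill a JVM.");
        }
    }
}
```

If the total comes out above what the node really has, reducing the number of task slots or the child heap is usually the first thing to try on a shared node.
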
-- 
Marcos Ortiz[1] (@marcosluis2186[2])
http://about.me/marcosortiz[3] 
VII International Summer School at UCI, June 30 to July 11, 2014. See www.uci.cu

Re: Region servers crashing during mapreduce

Posted by Flavio Pompermaier <po...@okkam.it>.

Thanks for the explanation, Marcos. For the moment we started this cluster
with 2 nodes, so I had to share almost everything. :)
Is there anything I should be careful about? Should I increase some timeout,
or maybe decrease the scan caching?

Best,
Flavio



Re: Region servers crashing during mapreduce

Posted by Marcos Ortiz <ml...@uci.cu>.

Based on your hbase-cmf-hbase1-MASTER.log, the problems start after the region
splitting process: when the SplitManager finishes its splitting tasks, the
regions on the myserver1 server are put offline and the Master throws a
NotServingRegionException.

Then the same thing happens on myserver2, after the same SplitManager step
finishes.

ZooKeeper seems to be working OK.

Are the RegionServers sharing the same resources as the TaskTrackers?
-- 
Marcos Ortiz[1] (@marcosluis2186[2])
http://about.me/marcosortiz[3] 

--------
[1] http://www.linkedin.com/in/mlortiz
[2] http://twitter.com/marcosluis2186
[3] http://about.me/marcosortiz


Re: Region servers crashing during mapreduce

Posted by Flavio Pompermaier <po...@okkam.it>.

The attached zip contains the config files generated by Cloudera. The
core-site and hdfs-site files are slightly different depending on whether I
download them from the mapreduce or the hbase service, and I don't know why.

Also attached are the logs of the HBase master and ZooKeeper (covering the
time range in which I experienced the region server problems).
Can you find anything useful for solving the issue?

When I set up the scanner I do:

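import org.apache.hadoop.hbase.client.Scan;

Scan scan = new Scan();
scan.setCacheBlocks(false);         // don't fill the block cache during a full scan
scan.addColumn(family, qualifier);  // family and qualifier are byte[] values
scan.setCaching(1000);              // rows fetched per scanner RPC
scan.setMaxVersions(1);             // only the latest version of each cell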

Best,
Flavio


Re: Region servers crashing during mapreduce

Posted by Geovanie Marquez <ge...@gmail.com>.

It's really not going to be useful to guess without more log
investigation. Check the master node logs to see when the first region
server went down, and correlate the ZooKeeper and region server logs for the
minute or two before it died.
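
One way to do that correlation, sketched under assumptions: keep only the lines of each log that fall within a couple of minutes before the time the server went down, then read the excerpts side by side. The `LogWindow` class is hypothetical, and it assumes each log line starts with a timestamp like `2014-05-20 09:14:58,123`. Check that against your own log4j layout before relying on it. The sample lines in `main` are invented.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Keeps only the log lines stamped within a window before a given crash time. */
final class LogWindow {

    // Assumed timestamp layout at the start of each line, e.g. "2014-05-20 09:14:58,123".
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final int STAMP_LENGTH = 23;

    static List<String> before(List<String> lines, LocalDateTime crash, Duration window) {
        LocalDateTime from = crash.minus(window);
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (line.length() < STAMP_LENGTH) {
                continue; // too short to carry a timestamp
            }
            LocalDateTime at;
            try {
                at = LocalDateTime.parse(line.substring(0, STAMP_LENGTH), STAMP);
            } catch (DateTimeParseException e) {
                continue; // continuation lines such as stack traces have no timestamp
            }
            if (!at.isBefore(from) && !at.isAfter(crash)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Invented sample lines; in practice read them from the master,
        // ZooKeeper and region server log files.
        List<String> lines = Arrays.asList(
                "2014-05-20 09:10:00,000 INFO something early",
                "2014-05-20 09:14:30,000 WARN something close to the crash");
        LocalDateTime crash = LocalDateTime.of(2014, 5, 20, 9, 15, 0);
        for (String line : before(lines, crash, Duration.ofMinutes(2))) {
            System.out.println(line);
        }
    }
}
```

Running the same filter over all three logs with the same crash time gives three short excerpts that are much easier to compare than the full files.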

It could be garbage collection or large scan batches occasionally killing
your servers.
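
To get a feel for how big a scan batch can be, here is a back-of-the-envelope sketch. Scan caching is the number of rows returned per scanner call, so the size of one batch is roughly that number times the average row size. The caching value of 1000 is the one used in this thread. The average row size and the number of concurrent scanners are invented figures, and `ScanBatchEstimate` is just an illustrative helper, not an HBase class.

```java
/** Back-of-the-envelope size of the scanner batches a region server has to build. */
final class ScanBatchEstimate {

    /** Approximate bytes returned by one scanner call: rows per call times average row size. */
    static long bytesPerCall(int caching, long avgRowBytes) {
        return caching * avgRowBytes;
    }

    /** Worst case when several map tasks scan the same region server at once. */
    static long bytesInFlight(int caching, long avgRowBytes, int concurrentScanners) {
        return bytesPerCall(caching, avgRowBytes) * concurrentScanners;
    }

    public static void main(String[] args) {
        int caching = 1000;         // scan.setCaching(1000), as in this thread
        long avgRowBytes = 50_000;  // hypothetical average row size
        int scanners = 8;           // hypothetical number of concurrent map tasks

        System.out.printf("per call: %d bytes, in flight: %d bytes%n",
                bytesPerCall(caching, avgRowBytes),
                bytesInFlight(caching, avgRowBytes, scanners));
    }
}
```

If the in-flight figure is a large fraction of the region server heap, lowering the caching value is a cheap experiment.
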

Re: Region servers crashing during mapreduce

Posted by Marcos Ortiz <ml...@uci.cu>.

Could you share your configuration files so we can take a look?
-- 
@marcosluis2186 <http://twitter.com/marcosluis2186>

