Posted to user@hbase.apache.org by Peter Veentjer <al...@gmail.com> on 2011/01/05 18:35:07 UTC

Scheduling map/reduce jobs

Hey guys,

Although this isn't strictly an HBase question: is there any support for
scheduling MapReduce jobs?

For example, I want to run a MapReduce job that automatically removes certain
elements from HBase and perhaps does some additional cleanup (I know that
there is support for lease times).

I could have every node in the cluster schedule this job every hour, but if
there are n nodes I don't want the job to run n times; running it once is
sufficient. I could write some checking logic so that a scheduled MapReduce
job that should not run is rejected or ignored, but I would rather use
something out of the box if it exists.
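The checking logic described above could look roughly like this Python sketch. It assumes the nodes share a store with an atomic create-if-absent operation. `ClaimStore` is a hypothetical in-memory stand-in for that store. In a real cluster, something like a ZooKeeper node could play that role, since creating a path that already exists fails. The `cleanup/<hour>` key naming is made up. Every node calls `maybe_run_cleanup` on the same hourly schedule, but only the first node to claim a given hour runs the job.

```python
import threading
from datetime import datetime, timezone


class ClaimStore:
    """Hypothetical in-memory stand-in for a shared store that supports an
    atomic create-if-absent; a real cluster would need something every node
    can reach (for example a ZooKeeper node)."""

    def __init__(self):
        self._claims = {}
        self._lock = threading.Lock()

    def claim(self, key, owner):
        """Record `owner` for `key`; return True only for the first caller."""
        with self._lock:
            if key in self._claims:
                return False
            self._claims[key] = owner
            return True


def hourly_slot(now):
    """Name the one-hour window that `now` falls into, e.g. '2011-01-05T17'."""
    return now.strftime("%Y-%m-%dT%H")


def maybe_run_cleanup(store, node, now, job):
    """Run `job` only if this node wins the claim for the current hour."""
    key = "cleanup/" + hourly_slot(now)  # hypothetical key naming
    if store.claim(key, node):
        job()
        return True
    return False  # another node already owns this hour: skip quietly


# Usage: three nodes fire on the same hourly schedule; only one runs the job.
store = ClaimStore()
runs = []
now = datetime(2011, 1, 5, 17, 35, tzinfo=timezone.utc)
results = [
    maybe_run_cleanup(store, node, now, lambda node=node: runs.append(node))
    for node in ("node1", "node2", "node3")
]
print(results)  # [True, False, False]
```

If the winning node crashes mid-job, that hour's run is simply lost. This sketch does no failover, and it never cleans up old claims.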

Another question:

We are building an environment where client-provided plugins can be
deployed and must obey some kind of SLA (e.g. the plugin must be running on
at least 2 nodes). So the old plugin needs to be undeployed, the jars of the
new plugin need to be copied to the nodes on demand, and the new plugin needs
to be deployed. Is there functionality for this out of the box, or do we
need to implement it ourselves?
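For the SLA part, most of the work in a rolling upgrade is ordering the steps so that too few nodes never run the plugin. This Python sketch assumes that a node can briefly run the old and new plugin side by side. The step names are made up. The sketch first brings up the new version on new nodes, then upgrades remaining nodes in place, and retires dropped nodes only at the end. `min_nodes_running` replays a plan so it can be checked against the SLA (2 nodes in the example). The sketch only plans the steps. Actually copying jars and deploying is left to whatever mechanism the environment provides.

```python
def rolling_upgrade_plan(old_nodes, new_nodes):
    """Order hypothetical deploy steps so the number of nodes running the
    plugin (either version) never drops below min(len(old), len(new)).
    Assumes old and new versions can briefly run side by side on one node."""
    old, new = list(old_nodes), list(new_nodes)
    steps = []
    # 1. Bring up the new version on nodes that do not run the plugin yet.
    for node in new:
        if node not in old:
            steps += [("copy-jars", node), ("deploy-new", node)]
    # 2. Upgrade in place on nodes that stay: start new before stopping old.
    for node in new:
        if node in old:
            steps += [("copy-jars", node), ("deploy-new", node), ("undeploy-old", node)]
    # 3. Only now retire nodes that should no longer run the plugin.
    for node in old:
        if node not in new:
            steps.append(("undeploy-old", node))
    return steps


def min_nodes_running(steps, old_nodes):
    """Replay a plan; return the fewest nodes running either version at any point."""
    versions = {node: {"old"} for node in old_nodes}
    lowest = len(versions)
    for action, node in steps:
        if action == "deploy-new":
            versions.setdefault(node, set()).add("new")
        elif action == "undeploy-old":
            versions[node].discard("old")
        lowest = min(lowest, sum(1 for v in versions.values() if v))
    return lowest


# Usage: move the plugin from node1+node2 to node2+node3 under a 2-node SLA.
plan = rolling_upgrade_plan(["node1", "node2"], ["node2", "node3"])
for action, node in plan:
    print(action, node)
print(min_nodes_running(plan, ["node1", "node2"]))  # 2
```

The replay counts distinct nodes rather than plugin copies. A node that runs both versions during its in-place upgrade still counts as only one node toward the SLA.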

Re: Scheduling map/reduce jobs

Posted by Bill Graham <bi...@gmail.com>.
Take a look at Oozie or Azkaban:

http://www.quora.com/What-are-the-differences-advantages-disadvantages-of-Azkaban-vs-Oozie

On Wed, Jan 5, 2011 at 9:35 AM, Peter Veentjer <al...@gmail.com> wrote:
> Hey guys,
>
> Although this isn't strictly an HBase question: is there any support for
> scheduling MapReduce jobs?
>
> For example, I want to run a MapReduce job that automatically removes certain
> elements from HBase and perhaps does some additional cleanup (I know that
> there is support for lease times).
>
> I could have every node in the cluster schedule this job every hour, but if
> there are n nodes I don't want the job to run n times; running it once is
> sufficient. I could write some checking logic so that a scheduled MapReduce
> job that should not run is rejected or ignored, but I would rather use
> something out of the box if it exists.
>
> Another question:
>
> We are building an environment where client-provided plugins can be
> deployed and must obey some kind of SLA (e.g. the plugin must be running on
> at least 2 nodes). So the old plugin needs to be undeployed, the jars of the
> new plugin need to be copied to the nodes on demand, and the new plugin needs
> to be deployed. Is there functionality for this out of the box, or do we
> need to implement it ourselves?
>

RE: perplexing HBase bug: looking for where to learn how to debug

Posted by Jonathan Gray <jg...@fb.com>.
The first step to debugging HBase is usually going through the Master and RegionServer logs.  Sometimes it can be more art than science but a majority of our debugging is done with log analysis.

If you can find specific offending regions, you can parse through the logs looking for mentions of that region and see where things went wrong.
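As a starting point for this kind of log analysis, a small script can pull every line that mentions a region out of the Master and RegionServer logs in one pass. This Python sketch uses a plain substring match. The file names and log lines are made-up placeholders, not real HBase output. The region name comes from the NoServerForRegionException quoted below.

```python
import tempfile
from pathlib import Path


def region_mentions(log_paths, region_name):
    """Yield (file name, line number, line) for each log line that mentions
    `region_name`, file by file in the order given."""
    for path in log_paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if region_name in line:  # plain substring match, no regex
                    yield Path(path).name, lineno, line.rstrip("\n")


# Usage with two throwaway files; the lines are placeholders, not real HBase output.
region = "usertable,,1294095537393"
with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp, "master.log")
    regionserver = Path(tmp, "regionserver.log")
    master.write_text(f"placeholder A about {region}\nunrelated line\n")
    regionserver.write_text(f"unrelated line\nplaceholder B about {region}\n")
    hits = list(region_mentions([master, regionserver], region))
for name, lineno, line in hits:
    print(f"{name}:{lineno}: {line}")
```

The results are grouped per file rather than merged by timestamp. When several servers are involved, interleaving the hits by time is usually the next step.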

If you're just getting started with HBase, I would also recommend working with the latest 0.90 RC, as issues like the ones you're seeing have been fixed since 0.20.6.

JG

> -----Original Message-----
> From: Chet Murthy [mailto:chet@watson.ibm.com]
> Sent: Wednesday, January 05, 2011 10:38 PM
> To: user@hbase.apache.org; dev@hbase.apache.org
> Subject: perplexing HBase bug: looking for where to learn how to debug
> 
> 
> I've just started using hbase, and have encountered a perplexing bug.
> The bug occurs on one set of Linux boxes, and not on another set, even
> though they're both x86_64 Linux, and both are running -identical- JVM
> releases.
> 
> I've attached a description of the problem below, but really, what I'm
> wondering is, if there's a description someplace of various places to turn on
> instrumentation in hbase, so I can figure out what's wrong.  I plan to do a lot
> of work with hbase in the future, so knowing how to debug it is in some
> sense more important than finding out the fix for this particular bug.
> 
> I really am looking to learn how to fish here.  I'm sure I can slowly dig around and
> find all the various tracing facilities and such, but I figured there might be a
> cheat-sheet someplace ....
> 
> Thanks,
> --chet--
> 
> ================================================================
> 
> Basically, I set up hadoop 0.20.0 + hbase 0.20.6, in a cluster with 1 namenode,
> and anywhere from 2-5 datanodes which are also regionservers.  I'm running
> a single zookeeper node, since this is just for testing.  Furthermore, all these
> machines are isolated, high-performance, SMP, with lots of memory.
> Modern Intel/AMD boxes.
> 
> The cluster which "works" runs Fedora 9 on Opteron, and the one that "fails"
> runs RHEL5 on Intel Xeon (something-or-other -- I forget).
> 
> The test I'm running is the Yahoo! Cloud Serving Benchmark (YCSB).  I'm just trying to
> load 1m records, and on the cluster that fails, I get,
> variously:
> 
> (a) a load will fail with an error like:
> 
> com.yahoo.ycsb.DBException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> contact region server  -- nothing found, no 'location' returned,
> tableName=usertable, reload=true -- for region , row 'user1000015788', but
> failed after 11 attempts.
> Exceptions:
> org.apache.hadoop.hbase.client.NoServerForRegionException: No server
> address listed in .META. for region usertable,,1294095537393
> org.apache.hadoop.hbase.client.NoServerForRegionException: No server
> address listed in .META. for region usertable,,1294095537393
> 
> (b) a load will succeed, but there won't be 1m rows (where I use the "count"
> command in "hbase shell" to count).
> 
> (c) sometimes, a "truncate" will fail with an error of the form above.  The
> step that fails is the "disable" step.
> 
> Java stack-dumps from the regionservers don't show any threads doing
> anything interesting.  I don't know how to interrogate Zookeeper; perhaps
> there's something messed-up in there ....

perplexing HBase bug: looking for where to learn how to debug

Posted by Chet Murthy <ch...@watson.ibm.com>.
I've just started using hbase, and have encountered a perplexing bug.
The bug occurs on one set of Linux boxes, and not on another set, even
though they're both x86_64 Linux, and both are running -identical- JVM
releases.

I've attached a description of the problem below, but really, what I'm
wondering is, if there's a description someplace of various places to
turn on instrumentation in hbase, so I can figure out what's wrong.  I
plan to do a lot of work with hbase in the future, so knowing how to
debug it is in some sense more important than finding out the fix for
this particular bug.

I really am looking to learn how to fish here.  I'm sure I can slowly
dig around and find all the various tracing facilities and such, but I
figured there might be a cheat-sheet someplace ....

Thanks,
--chet--

================================================================

Basically, I set up hadoop 0.20.0 + hbase 0.20.6, in a cluster with 1
namenode, and anywhere from 2-5 datanodes which are also
regionservers.  I'm running a single zookeeper node, since this is
just for testing.  Furthermore, all these machines are isolated,
high-performance, SMP, with lots of memory.  Modern Intel/AMD boxes.

The cluster which "works" runs Fedora 9 on Opteron, and the one that
"fails" runs RHEL5 on Intel Xeon (something-or-other -- I forget).

The test I'm running is the Yahoo! Cloud Serving Benchmark (YCSB).  I'm just
trying to load 1m records, and on the cluster that fails, I get,
variously:

(a) a load will fail with an error like:

com.yahoo.ycsb.DBException: 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact 
region server  -- nothing found, no 'location' returned, tableName=usertable, 
reload=true -- for region , row 'user1000015788', but failed after 11 
attempts.
Exceptions:
org.apache.hadoop.hbase.client.NoServerForRegionException: No server address
listed in .META. for region usertable,,1294095537393
org.apache.hadoop.hbase.client.NoServerForRegionException: No server address
listed in .META. for region usertable,,1294095537393

(b) a load will succeed, but there won't be 1m rows (where I use the
"count" command in "hbase shell" to count).

(c) sometimes, a "truncate" will fail with an error of the form
above.  The step that fails is the "disable" step.

Java stack-dumps from the regionservers don't show any threads doing
anything interesting.  I don't know how to interrogate Zookeeper;
perhaps there's something messed-up in there ....

Re: Scheduling map/reduce jobs

Posted by Hari Sreekumar <hs...@clickable.com>.
Hi Peter,

That part is taken care of by the Hadoop framework itself, or more
specifically, by the JobTracker.

Hari

On Thu, Jan 6, 2011 at 1:25 AM, Peter Veentjer <al...@gmail.com> wrote:

> Hi Tyler,
>
> I'm not worried about the scheduling itself, since there are tons of
> mechanisms available for that. What I want to prevent is every node in the
> cluster scheduling the same MapReduce jobs. One machine is sufficient, and
> when that machine fails, another one should take over.
>
> On Wed, Jan 5, 2011 at 7:31 PM, Tyler Coffin <tc...@rim.com> wrote:
>
> > Cron works great and is probably already on your systems.
> >
> > -----Original Message-----
> > From: Peter Veentjer [mailto:alarmnummer@gmail.com]
> > Sent: January 5, 2011 12:35
> > To: user@hbase.apache.org
> > Subject: Scheduling map/reduce jobs
> >
> > Hey guys,
> >
> > Although this isn't strictly an HBase question: is there any support for
> > scheduling MapReduce jobs?
> >
> > For example, I want to run a MapReduce job that automatically removes certain
> > elements from HBase and perhaps does some additional cleanup (I know that
> > there is support for lease times).
> >
> > I could have every node in the cluster schedule this job every hour, but if
> > there are n nodes I don't want the job to run n times; running it once is
> > sufficient. I could write some checking logic so that a scheduled MapReduce
> > job that should not run is rejected or ignored, but I would rather use
> > something out of the box if it exists.
> >
> > Another question:
> >
> > We are building an environment where client-provided plugins can be
> > deployed and must obey some kind of SLA (e.g. the plugin must be running on
> > at least 2 nodes). So the old plugin needs to be undeployed, the jars of the
> > new plugin need to be copied to the nodes on demand, and the new plugin needs
> > to be deployed. Is there functionality for this out of the box, or do we
> > need to implement it ourselves?
> >
> > ---------------------------------------------------------------------
> > This transmission (including any attachments) may contain confidential
> > information, privileged material (including material protected by the
> > solicitor-client or other applicable privileges), or constitute non-public
> > information. Any use of this information by anyone other than the intended
> > recipient is prohibited. If you have received this transmission in error,
> > please immediately reply to the sender and delete this information from your
> > system. Use, dissemination, distribution, or reproduction of this
> > transmission by unintended recipients is not authorized and may be unlawful.
> >
>

Re: Scheduling map/reduce jobs

Posted by Peter Veentjer <al...@gmail.com>.
Hi Tyler,

I'm not worried about the scheduling itself, since there are tons of mechanisms
available for that. What I want to prevent is every node in the cluster
scheduling the same MapReduce jobs. One machine is sufficient, and when that
machine fails, another one should take over.
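A common way to get "one runner, with takeover on failure" is a time-limited lease. The holder keeps renewing the lease. If it stops renewing because the machine died, another node can acquire the lease once it expires. This Python sketch simulates that with a hypothetical in-memory `LeaseStore` and explicit timestamps. In practice the lease would live in a coordination service that all nodes can reach. ZooKeeper ephemeral nodes, which disappear when their session expires, behave similarly.

```python
import threading


class LeaseStore:
    """Hypothetical in-memory stand-in for a lease record shared by all nodes.
    Timestamps are passed in explicitly (seconds) to keep the example
    deterministic."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._holder = None
        self._expires = 0.0
        self._lock = threading.Lock()

    def try_acquire(self, node, now):
        """Acquire or renew the lease; return True if `node` holds it afterwards."""
        with self._lock:
            if self._holder == node or now >= self._expires:
                self._holder = node
                self._expires = now + self.ttl
                return True
            return False


def tick(store, node, now, job):
    """Called periodically on every node; only the current lease holder runs `job`."""
    if store.try_acquire(node, now):
        job(node)
        return True
    return False


# Usage: node1 leads until it stops renewing, then node2 takes over.
store = LeaseStore(ttl=30.0)
ran = []
tick(store, "node1", 0.0, ran.append)   # node1 acquires the lease (valid until 30)
tick(store, "node2", 10.0, ran.append)  # lease held by node1: node2 skips
tick(store, "node1", 20.0, ran.append)  # node1 renews (valid until 50)
tick(store, "node2", 40.0, ran.append)  # node1 has gone quiet, but lease still valid
tick(store, "node2", 55.0, ran.append)  # lease expired: node2 takes over
print(ran)  # ['node1', 'node1', 'node2']
```

A real leader should also stop working when it cannot renew. A long pause, such as a GC pause, can let another node take over while the old one still believes it is in charge.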

On Wed, Jan 5, 2011 at 7:31 PM, Tyler Coffin <tc...@rim.com> wrote:

> Cron works great and is probably already on your systems.
>
> -----Original Message-----
> From: Peter Veentjer [mailto:alarmnummer@gmail.com]
> Sent: January 5, 2011 12:35
> To: user@hbase.apache.org
> Subject: Scheduling map/reduce jobs
>
> Hey guys,
>
> Although this isn't strictly an HBase question: is there any support for
> scheduling MapReduce jobs?
>
> For example, I want to run a MapReduce job that automatically removes certain
> elements from HBase and perhaps does some additional cleanup (I know that
> there is support for lease times).
>
> I could have every node in the cluster schedule this job every hour, but if
> there are n nodes I don't want the job to run n times; running it once is
> sufficient. I could write some checking logic so that a scheduled MapReduce
> job that should not run is rejected or ignored, but I would rather use
> something out of the box if it exists.
>
> Another question:
>
> We are building an environment where client-provided plugins can be
> deployed and must obey some kind of SLA (e.g. the plugin must be running on
> at least 2 nodes). So the old plugin needs to be undeployed, the jars of the
> new plugin need to be copied to the nodes on demand, and the new plugin needs
> to be deployed. Is there functionality for this out of the box, or do we
> need to implement it ourselves?
>

RE: Scheduling map/reduce jobs

Posted by Tyler Coffin <tc...@rim.com>.
Cron works great and is probably already on your systems.

-----Original Message-----
From: Peter Veentjer [mailto:alarmnummer@gmail.com] 
Sent: January 5, 2011 12:35
To: user@hbase.apache.org
Subject: Scheduling map/reduce jobs

Hey guys,

Although this isn't strictly an HBase question: is there any support for
scheduling MapReduce jobs?

For example, I want to run a MapReduce job that automatically removes certain
elements from HBase and perhaps does some additional cleanup (I know that
there is support for lease times).

I could have every node in the cluster schedule this job every hour, but if
there are n nodes I don't want the job to run n times; running it once is
sufficient. I could write some checking logic so that a scheduled MapReduce
job that should not run is rejected or ignored, but I would rather use
something out of the box if it exists.

Another question:

We are building an environment where client-provided plugins can be
deployed and must obey some kind of SLA (e.g. the plugin must be running on
at least 2 nodes). So the old plugin needs to be undeployed, the jars of the
new plugin need to be copied to the nodes on demand, and the new plugin needs
to be deployed. Is there functionality for this out of the box, or do we
need to implement it ourselves?
