Posted to dev@hbase.apache.org by Jason Rutherglen <ja...@gmail.com> on 2011/07/17 03:28:51 UTC

Starting the Hadoop DataNode inside the HBase process?

Would running the DataNode inside of an HBase process be a good option
to enable?

Specifically, it would reduce the number of processes on an HBase
instance.  E.g., I think one of the barriers to adoption for HBase in
general is having to manage multiple processes.  Are there any known
issues with doing this?

In addition to the DataNode, one could auto-specify which servers
should be running ZooKeeper and start ZK inside of the HBase
process(es).

Re: Starting the Hadoop DataNode inside the HBase process?

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Just for reference, there's this jira:
https://issues.apache.org/jira/browse/HBASE-2811

J-D

On Sat, Jul 16, 2011 at 6:28 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
> Running the DataNode inside of an HBase process seems like this could
> be a good option to enable?
>
> Specifically because it would reduce the number of processes on an
> HBase instance.  Eg, I think one of the barriers to adoption for HBase
> in general is the multiple processes management part.  Are there any
> known issues with doing this?
>
> In addition to the DataNode, one could auto-specify which servers
> should be running Zookeeper and start ZK inside of the HBase
> process(es).
>

Re: Starting the Hadoop DataNode inside the HBase process?

Posted by Ted Dunning <td...@maprtech.com>.

On Mon, Jul 18, 2011 at 9:32 AM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> > My gut is that this would be a maintenance headache
>
> What specifically do you think would cause a problem?
>

Tracking versions for one.  Everybody has a different favorite.  That is the
nice thing about standards.  There are so many to choose from.

Besides, how do you handle people who want the snapshots and higher
performance that you get from maprfs?

> > Internal management of ZK is already an option (and I don't recommend that
> > either, for different reasons)
>
> What are the reasons?
>

The basic issue is that it is nice to use ZK to determine which services are
up and to avoid race conditions as services come up.  If some of the
services are actually running ZK, how do you distinguish that process
getting hung from not being up?

Also, ZK is very reliable and that is the primary virtue we are trying to
capitalize on when we use it as a coordination service.  Given that, how is
it a good thing to incorporate it into software that is inevitably less
stable?  Isn't that tantamount to giving up ZK's primary virtue?
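
To make the liveness concern concrete, here is a minimal Python sketch.  It
is a toy model, not the ZooKeeper API: the Coordinator class, the observe
helper, the service names, and the timeout value are all hypothetical, and a
real ZK ensemble has several members rather than the single instance assumed
here.  It only illustrates why an observer gets a definite answer from a
standalone coordinator but no answer at all once the coordinator hangs
together with the process that hosts it.

```python
class Coordinator:
    """Toy stand-in for a coordination service with heartbeat-based liveness.

    Hypothetical model for illustration only; not the ZooKeeper client API.
    """

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}  # service name -> time of last heartbeat
        self.hung = False         # True once the hosting process stops responding

    def heartbeat(self, name, now):
        if self.hung:
            raise TimeoutError("coordinator not responding")
        self.last_heartbeat[name] = now

    def live_services(self, now):
        if self.hung:
            raise TimeoutError("coordinator not responding")
        return sorted(
            name
            for name, seen in self.last_heartbeat.items()
            if now - seen <= self.session_timeout
        )


def observe(coordinator, now):
    """Return the list of live services, or None if no answer is available."""
    try:
        return coordinator.live_services(now)
    except TimeoutError:
        return None  # cannot tell "hung" apart from "never came up"


# Case 1: the coordinator runs as its own process.
standalone = Coordinator(session_timeout=3)
standalone.heartbeat("regionserver-1", now=0)
standalone.heartbeat("regionserver-2", now=0)
standalone.heartbeat("regionserver-2", now=5)  # regionserver-1 hung after t=0
standalone_view = observe(standalone, now=5)

# Case 2: the coordinator is embedded in a process that hangs.
embedded = Coordinator(session_timeout=3)
embedded.heartbeat("regionserver-1", now=0)
embedded.heartbeat("regionserver-2", now=0)
embedded.hung = True  # the host process hangs and takes the coordinator with it
embedded_view = observe(embedded, now=5)

print("standalone view:", standalone_view)
print("embedded view:", embedded_view)
```

In the standalone case the hung server simply drops out of the live list
once its heartbeat expires.  In the embedded case the observer cannot get a
list at all, which is the ambiguity described above.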

Re: Starting the Hadoop DataNode inside the HBase process?

Posted by Jason Rutherglen <ja...@gmail.com>.

> My gut is that this would be a maintenance headache

What specifically do you think would cause a problem?

> Internal management of ZK is already an option (and I don't recommend that
> either, for different reasons)

What are the reasons?

On Sat, Jul 16, 2011 at 7:34 PM, Ted Dunning <td...@maprtech.com> wrote:
> On Sat, Jul 16, 2011 at 6:28 PM, Jason Rutherglen <
> jason.rutherglen@gmail.com> wrote:
>
>> Running the DataNode inside of an HBase process seems like this could
>> be a good option to enable?
>>
>
> My gut is that this would be a maintenance headache.
>
>
>> Specifically because it would reduce the number of processes on an
>> HBase instance.  Eg, I think one of the barriers to adoption for HBase
>> in general is the multiple processes management part.  Are there any
>> known issues with doing this?
>>
>
> Well, I think you are right about adoption.  To take Mongo as a straw man,
> the new user impression is that you untar a file and run a program.  Then
> you run another one on another machine.  Leaving aside the fact that Mongo
> has admin issues at scale, this style of installation definitely enhances
> the adoption for simple instances.
>
> I am not sure, however, whether this option is really available for HBase.
>  HDFS is not a simple animal no matter how you package it.
>
>> In addition to the DataNode, one could auto-specify which servers
>> should be running Zookeeper and start ZK inside of the HBase
>> process(es).
>>
>
> Internal management of ZK is already an option (and I don't recommend that
> either, for different reasons).
>

Re: Starting the Hadoop DataNode inside the HBase process?

Posted by Ted Dunning <td...@maprtech.com>.

On Sat, Jul 16, 2011 at 6:28 PM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> Running the DataNode inside of an HBase process seems like this could
> be a good option to enable?
>

My gut is that this would be a maintenance headache.

> Specifically because it would reduce the number of processes on an
> HBase instance.  Eg, I think one of the barriers to adoption for HBase
> in general is the multiple processes management part.  Are there any
> known issues with doing this?
>

Well, I think you are right about adoption.  To take Mongo as a straw man,
the new user impression is that you untar a file and run a program.  Then
you run another one on another machine.  Leaving aside the fact that Mongo
has admin issues at scale, this style of installation definitely enhances
the adoption for simple instances.

I am not sure, however, whether this option is really available for HBase.
HDFS is not a simple animal no matter how you package it.

> In addition to the DataNode, one could auto-specify which servers
> should be running Zookeeper and start ZK inside of the HBase
> process(es).
>

Internal management of ZK is already an option (and I don't recommend that
either, for different reasons).