Posted to issues@hbase.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2011/09/02 23:49:11 UTC

[jira] [Created] (HBASE-4329) Use NextGen Hadoop to deploy HBase

Use NextGen Hadoop to deploy HBase
----------------------------------

                 Key: HBASE-4329
                 URL: https://issues.apache.org/jira/browse/HBASE-4329
             Project: HBase
          Issue Type: Brainstorming
            Reporter: Arun C Murthy


Currently (circa 2011), with due respect, it's not practical to run shared, multi-tenant HBase clusters on the largest Hadoop installs (of 4000+ nodes).

As an interim, I'd like to brainstorm using NextGen Hadoop (MAPREDUCE-279) to deploy HBase for focussed sets of applications/users/organizations. Thus, one could deploy a smaller instance of HBase (100s of nodes) in a large Hadoop cluster and use it for a set of applications.

The other advantage is that the resource usage of HBase (master, region-server etc.) is accounted for in the overall utilization of the cluster, which could conceivably aid in resource tracking, capacity planning etc.

----

Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4329) Use NextGen Hadoop to deploy HBase

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096986#comment-13096986 ] 

Arun C Murthy commented on HBASE-4329:
--------------------------------------

bq. Why you say that? (I don't disagree but a list of why's would help figure what the fit criteria for closing this issue are).

Stack, first up, I didn't mean to start a flame war - I'm sure you know that. :)

FWIW, talking to folks around, isolation and support for prioritization to ensure a single user/application cannot *hog* an HBase cluster (or parts thereof) is something I've heard as a concern. This dovetails very well with our experience running both HDFS and MapReduce at scale, as a shared resource. Again, this isn't to claim it's a solved problem in Hadoop core, just something we've focussed on for a while now.

Hence, my thinking was we could use YARN as an intermediate solution. I discussed this idea with Andrew at the Summit and he didn't give me the impression that I was off my rocker; maybe he was just being polite and has a great poker face!

Thanks for pointing me to HBASE-4120, that seems related - I wasn't aware. It's a lot to digest, I'll try to spend some time on it. If the HBase community decides to focus on the multi-tenancy/isolation problem (via HBASE-4120 etc.) - great! We can close this discussion. If not, I'd like to brainstorm with you guys for an intermediate solution. 

It really depends where you guys want to focus your energies.

bq. Meantime, where I work, mapreduce is the problem (smile). We're messing with cgroup containing mapreduce so it doesn't steal resources from hdfs (and hbase).

I'm sure - MR needs more work, I'm painfully aware of this! :)

We plan to go the cgroups route sometime right after we ship 0.23; we could share notes and ideas.

bq. You want us to get into the nextgen mr container because then there is one place to go to do accounting?

The idea is that *iff* the HBase community wants to use this as an intermediate solution, using the RM will ensure the resource usage of HBase is accounted for w.r.t. the applications/queues/organizations etc.
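
To make the accounting point concrete, here is a minimal sketch of what "accounted for per queue" could look like. It is a toy model, not the YARN API: the `Container` class, the queue names, the role names, and the memory figures are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """One running process that a resource manager has granted resources to.

    Hypothetical type for illustration; not a YARN class.
    """
    queue: str      # hypothetical queue/organization name
    role: str       # e.g. "hbase-master", "hbase-regionserver", "mr-task"
    memory_mb: int  # memory granted to this process


def usage_by_queue(containers):
    """Sum granted memory per queue, so HBase daemons count like any other job."""
    totals = defaultdict(int)
    for c in containers:
        totals[c.queue] += c.memory_mb
    return dict(totals)


# Hypothetical cluster state: a small HBase instance plus one MapReduce task.
containers = [
    Container("analytics", "hbase-master", 4096),
    Container("analytics", "hbase-regionserver", 8192),
    Container("analytics", "hbase-regionserver", 8192),
    Container("etl", "mr-task", 2048),
]

totals = usage_by_queue(containers)
print(totals)
```

The only point of the sketch is that once the HBase master and region servers are launched through the same resource manager as everything else, their consumption shows up in the same per-queue totals used for capacity planning.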


[jira] [Commented] (HBASE-4329) Use NextGen Hadoop to deploy HBase

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104222#comment-13104222 ] 

Andrew Purtell commented on HBASE-4329:
---------------------------------------

bq. I discussed this idea with Andrew at the Summit and he didn't give me the impression that I was off my rocker

No, certainly not. With YARN, Hadoop generalizes resource management. It could well make sense to use YARN to partition resources for HBase or other components. It may not be the only story but it makes sense to look at certainly.


[jira] [Commented] (HBASE-4329) Use NextGen Hadoop to deploy HBase

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096961#comment-13096961 ] 

stack commented on HBASE-4329:
------------------------------

bq. Currently (circa 2011), with due respect, it's not practical to run shared, multi-tenant HBase clusters on the largest Hadoop installs (of 4000+ nodes).

Why you say that? (I don't disagree but a list of why's would help figure what the fit criteria for closing this issue are).

Up to now, in our ignorance, we've been thinking a fat hbase install w/ multitenancy enabled via hbase security acls.  Regards resources consumed by the running hbase, there is an interesting contribution over in HBASE-4120 that is provocative but I'm thinking needs a bit of work before it'd be committed.  Meantime, where I work, mapreduce is the problem (smile).  We're messing with cgroup containing mapreduce so it doesn't steal resources from hdfs (and hbase).

You want us to get into the nextgen mr container because then there is one place to go to do accounting?  I need to do some background reading over on mapreduce-279 to see what we're missing.

Good on you Arun.


[jira] [Commented] (HBASE-4329) Use NextGen Hadoop to deploy HBase

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099813#comment-13099813 ] 

stack commented on HBASE-4329:
------------------------------

bq. Hence, my thinking was we could use YARN as an intermediate solution.

Why would it be only an intermediate soln Arun?  What else needs to be done?

bq.  If the HBase community decides to focus on the multi-tenancy/isolation problem ... great! We can close this discussion. If not, I'd like to brainstorm with you guys for an intermediate solution.

Well, we want to play nice with the neighbours.

I saw the show a few times -- at the hadoop summit -- but I haven't yet read the book.  Will be back when I learn more about YARN container.


[jira] [Commented] (HBASE-4329) Use NextGen Hadoop to deploy HBase

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096348#comment-13096348 ] 

Arun C Murthy commented on HBASE-4329:
--------------------------------------

Potentially this is related to Andrew's ideas in HBASE-4047 for using NextGen Hadoop to run generic co-processors.


[jira] [Commented] (HBASE-4329) Use NextGen Hadoop to deploy HBase

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104253#comment-13104253 ] 

Arun C Murthy commented on HBASE-4329:
--------------------------------------

bq. It may not be the only story but it makes sense to look at certainly.

Agree completely. Thanks.
