
Posted to mapreduce-user@hadoop.apache.org by José Feiteirinha <j...@feiteira.org> on 2013/02/19 17:56:44 UTC

Newbie: HBase good for Tree like structure?

Dear all,

I hope this is the right place for this question.

I'm currently in the early stages of developing software that may
'explode' in terms of users and data. I'm considering a very basic
tree-like data structure and would like to know your thoughts regarding
HBase/Hadoop.

My reason is that I would like to be prepared from the get-go for large
data.

My structure is planned as such:

   - The data will be nodes of a huge multidimensional tree.
   - I'm planning on having each row contain the full node path, e.g.
   "root.grandparentX.parentY.babyZ" (or ? "babyZ.parentY.grandparentX.root" )
   - However, the data per node should be pretty much static.


While this is a very simple structure, it does seem beneficial to use
HBase/Hadoop for the scalability alone. I also understood that if I
get to billions of rows, only an HBase-like approach can sustain me?

My idea is to start with a simple standalone server and then expand the
cluster as the load & data grow.

If you may,
I would like your thoughts, mostly regarding whether I'm using a hammer to
kill ants, my proposed data structure, or any other advice you may have.


Kind regards,
José

--
José Feiteirinha

www.feiteira.org

Re: Newbie: HBase good for Tree like structure?

Posted by Wellington Chevreuil <we...@gmail.com>.

Hi José,

I think your structure is OK for defining HBase row keys. The main issue
you'll have then is how you'll build these keys so that you can
properly access your tree nodes.
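
To make the key-building question a bit more concrete, here is a rough
sketch in plain Python (no HBase client involved). It only relies on the
fact that HBase keeps rows sorted by row key, and it mimics a prefix scan
over a sorted list. The helper names (row_key, prefix_scan, subtree) and
the extra sample nodes are made up for illustration, and it assumes ASCII
node names that never contain the '.' separator.

```python
from bisect import bisect_left

SEP = "."  # separator from the example key in the original post


def row_key(path):
    """Join node names from the root down, e.g. ['root', 'a'] -> 'root.a'."""
    return SEP.join(path)


def prefix_scan(sorted_keys, prefix):
    """Return every key that starts with prefix.

    This stands in for a range scan over lexicographically sorted row keys;
    it is not an HBase API call.
    """
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        result.append(key)
    return result


def subtree(sorted_keys, path):
    """Return the keys of all descendants of the node at path."""
    # The trailing separator keeps 'root.a' from matching a sibling 'root.ab'.
    return prefix_scan(sorted_keys, row_key(path) + SEP)


# Hypothetical sample tree, stored the way a sorted table would hold it.
keys = sorted(row_key(p) for p in [
    ["root"],
    ["root", "grandparentX"],
    ["root", "grandparentX", "parentY"],
    ["root", "grandparentX", "parentY", "babyZ"],
    ["root", "grandparentX", "parentW"],
    ["root", "grandparentXL"],
])

descendants = subtree(keys, ["root", "grandparentX"])
```

With root-first keys, all descendants of a node sit next to each other in
sort order, so reading a subtree is one contiguous range. With the reversed
form ("babyZ.parentY.grandparentX.root") the descendants of a node share no
common prefix and end up scattered, so that layout would suit lookups by
leaf name rather than subtree reads.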

Regarding your scalability concerns, you should not worry about starting
with a small Hadoop/HBase cluster (even standalone) for development or
proof-of-concept purposes, but you will definitely need a more robust
environment if you get to a billion rows later. You'll have to start
thinking about read/write load patterns so that you can take the best
advantage of HBase as a solution to your problem.

Regards,
Wellington.

