Posted to user@hbase.apache.org by myhbase <my...@126.com> on 2013/06/22 16:21:26 UTC

how many servers in a hbase cluster

Hello All,

I have learned HBase mostly from papers and books. According to my
understanding, HBase is the kind of architecture that is more applicable
to a big cluster: we should have many HDFS nodes and many HBase (region
server) nodes. If we only have several servers (5-8), it seems HBase is
not a good choice; please correct me if I am wrong. In addition, at how
many nodes can we usually start to consider an HBase solution, and what
about the physical memory size and other hardware resources on each
node? Any reference documents or cases? Thanks.

--Ning


Re: how many servers in a hbase cluster

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Ning,

I'm personally running HBase in production with only 8 nodes.

As you will see here: http://wiki.apache.org/hadoop/Hbase/PoweredBy
some are also running small clusters.

So I would say it depends more on your needs than on the size.

I would say the minimum is 4, to make sure you keep your factor-3
replication and some stability if a node fails, but you might also be
fine with 3. And there is almost no maximum.
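For reference, the factor-3 replication JM mentions is the HDFS default,
set in hdfs-site.xml; a minimal sketch (3 is already the default value,
shown here only for illustration):

```xml
<!-- hdfs-site.xml: keep 3 copies of each block so the cluster can
     lose a node and still re-replicate from the surviving copies -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```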

Regarding memory, the more, the merrier... You also need to make sure
you have many disks per server. Forget it if you have just 1. I'm
able to run with 3, but that's the limit. 5 is a good number, and some
are running with 12...

Again, it depends on whether your application is more read-intensive,
CPU-intensive, etc. Can you tell us a bit more about what you want to
achieve?

Thanks,

JM


Re: how many servers in a hbase cluster

Posted by Kevin O'dell <ke...@cloudera.com>.
Mohammad,

  The NN is low-write and has pretty static memory usage. You will see
the NN memory usage go up as you add blocks/files. Since HBase has memory
limitations (GC's fault) and should have ~1 file per store, you will not
have a lot of memory pressure on the NN. The JT is the same way; it
scales its usage up with the number of MR jobs. In a sane HBase
environment you are not going to be running 1000s of MR jobs against
HBase. ZK also has pretty minimal requirements - 1GB of memory, a
dedicated CPU core, and a place to write to with low I/O wait. I have
always found the NN, SNN, and JT to be the next best place to put the ZK
if dedicated HW is not available. I have seen some strange behavior with
ZK running on DN/TT/RS nodes, from unexplained timeouts to corrupt znodes
causing failures (this one was real nasty).
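Those ZK requirements translate roughly into the following zoo.cfg
sketch; the path is hypothetical, and the essential point is that
dataDir lives on a drive no other daemon writes to:

```
# zoo.cfg sketch (hypothetical path): keep the transaction log on a
# dedicated low-latency disk so commits never wait on DN/TT/RS I/O
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/zk-disk/zookeeper
```

A ~1GB JVM heap for the ZK process matches the memory figure above.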


On Sat, Jun 22, 2013 at 7:21 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello Iain,
>
>          You would put a lot of pressure on the RAM if you do that. NN
> already has high memory requirement and then having JT+ZK on the same
> machine would be too heavy, IMHO.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Sun, Jun 23, 2013 at 4:07 AM, iain wright <ia...@gmail.com> wrote:
>
> > Hi Mohammad,
> >
> > I am curious why you chose not to put the third ZK on the NN+JT? I was
> > planning on doing that on a new cluster and want to confirm it would be
> > okay.
> >
> >
> > --
> > Iain Wright
> > Cell: (562) 852-5916
> >
> > <http://www.labctsi.org/>
> > This email message is confidential, intended only for the recipient(s)
> > named above and may contain information that is privileged, exempt from
> > disclosure under applicable law. If you are not the intended recipient,
> do
> > not disclose or disseminate the message to anyone except the intended
> > recipient. If you have received this message in error, or are not the
> named
> > recipient(s), please immediately notify the sender by return email, and
> > delete all copies of this message.
> >
> >
> > On Sat, Jun 22, 2013 at 10:05 AM, Mohammad Tariq <do...@gmail.com>
> > wrote:
> >
> > > Yeah, I forgot to mention that no. of ZKs should be odd. Perhaps those
> > > parentheses made that statement look like an optional statement. Just
> to
> > > clarify it was mandatory.
> > >
> > > Warm Regards,
> > > Tariq
> > > cloudfront.blogspot.com
> > >
> > >
> > > On Sat, Jun 22, 2013 at 9:45 PM, Kevin O'dell <
> kevin.odell@cloudera.com
> > > >wrote:
> > >
> > > > If you run ZK with a DN/TT/RS please make sure to dedicate a hard
> drive
> > > and
> > > > a core to the ZK process. I have seen many strange occurrences.
> > > > On Jun 22, 2013 12:10 PM, "Jean-Marc Spaggiari" <
> > jean-marc@spaggiari.org
> > > >
> > > > wrote:
> > > >
> > > > > You HAVE TO run a ZK3, or else you don't need to have ZK2 and any
> ZK
> > > > > failure will be an issue. You need to have an odd number of ZK
> > > > > servers...
> > > > >
> > > > > Also, if you don't run MR jobs, you don't need the TT and JT...
> Else,
> > > > > everything below is correct. But there is many other options, all
> > > > > depend on your needs and the hardware you have ;)
> > > > >
> > > > > JM
> > > > >
> > > > > 2013/6/22 Mohammad Tariq <do...@gmail.com>:
> > > > > > With 8 machines you can do something like this :
> > > > > >
> > > > > > Machine 1 - NN+JT
> > > > > > Machine 2 - SNN+ZK1
> > > > > > Machine 3 - HM+ZK2
> > > > > > Machine 4-8 - DN+TT+RS
> > > > > > (You can run ZK3 on a slave node with some additional memory).
> > > > > >
> > > > > > DN and RS run on the same machine. Although RSs are said to hold
> > > > > > the data, the data is actually stored in DNs. Replication is managed
> > > > > > at the HDFS level. You don't have to worry about that.
> > > > > >
> > > > > > You can visit this link
> > > > > > <http://hbase.apache.org/book/perf.writing.html> to see how to write
> > > > > > efficiently into HBase. With a small field there should not be any
> > > > > > problem except storage and increased metadata, as you'll have many
> > > > > > small cells. If possible, combine several small fields and put them
> > > > > > together in one cell.
> > > > > >
> > > > > > HTH
> > > > > >
> > > > > > Warm Regards,
> > > > > > Tariq
> > > > > > cloudfront.blogspot.com
> > > > > >
> > > > > >
> > > > > > On Sat, Jun 22, 2013 at 8:31 PM, myhbase <my...@126.com>
> wrote:
> > > > > >
> > > > > >> Thanks for your response.
> > > > > >>
> > > > > >> Now if 5 servers are enough, how can I install and configure my
> > > > > >> nodes? If I need 3 replicas to guard against data loss, I should
> > > > > >> have at least 3 datanodes; we still have the namenode, regionserver,
> > > > > >> HMaster, and zookeeper nodes, so some of them must be installed on
> > > > > >> the same machine. The datanode seems to be the disk-I/O-sensitive
> > > > > >> node while the region server is memory-sensitive; can I install
> > > > > >> them on the same machine? Any suggestion on the deployment plan?
> > > > > >>
> > > > > >> My business requirement is that writes far outnumber reads (7:3),
> > > > > >> and I have another concern: I have a field which will be 8~15KB in
> > > > > >> data size. I am not sure whether there will be any problem in HBase
> > > > > >> when it runs compactions and region splits.
> > > > > >>
> > > > > >>  Oh, you already have heavyweight's input :).
> > > > > >>>
> > > > > >>> Thanks JM.
> > > > > >>>
> > > > > >>> Warm Regards,
> > > > > >>> Tariq
> > > > > >>> cloudfront.blogspot.com
> > > > > >>>
> > > > > >>>
> > > > > >>> On Sat, Jun 22, 2013 at 8:05 PM, Mohammad Tariq <
> > > dontariq@gmail.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>  Hello there,
> > > > > >>>>
> > > > > >>>>          IMHO, 5-8 servers are sufficient to start with. But it's
> > > > > >>>> all relative to the data you have and the intensity of your
> > > > > >>>> reads/writes. You should have different strategies, though, based
> > > > > >>>> on whether it's 'read' or 'write'. You actually can't define 'big'
> > > > > >>>> in absolute terms. My cluster might be big for me, but for someone
> > > > > >>>> else it might still not be big enough, or it might be very big.
> > > > > >>>> Long story short, it depends on your needs. If you are able to
> > > > > >>>> achieve your goal with 5-8 RSs, then having more machines would be
> > > > > >>>> a waste, I think.
> > > > > >>>>
> > > > > >>>> But you should always keep in mind that HBase is kinda greedy when
> > > > > >>>> it comes to memory. For a decent load 4G is sufficient, IMHO. But
> > > > > >>>> it again depends on the operations you are going to perform. If
> > > > > >>>> you have large clusters where you plan to run MR jobs frequently,
> > > > > >>>> you are better off with an additional 2G.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Warm Regards,
> > > > > >>>> Tariq
> > > > > >>>> cloudfront.blogspot.com
> > > > > >>>>
> > > > > >>>>
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
>



-- 
Kevin O'Dell
Systems Engineer, Cloudera
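The quoted advice above about writing efficiently into HBase often comes
down to avoiding a hot region under a write-heavy (7:3) load. One common
technique is salting the row key so sequential writes spread across
region servers; a minimal Python sketch of the idea (the bucket count
and key format here are illustrative assumptions, not anything
prescribed in the thread):

```python
import hashlib

NUM_BUCKETS = 8  # assumed: roughly one bucket per region server

def salted_key(row_key: str) -> str:
    """Prefix the key with a stable hash-derived bucket so that
    monotonically increasing keys spread across regions."""
    bucket = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{bucket:02d}-{row_key}"

# Sequential keys land in different buckets instead of one hot region.
keys = [salted_key(f"event-{i:08d}") for i in range(4)]
```

To read a full range back you would issue one scan per bucket prefix;
that trade-off (faster writes, fan-out reads) is why salting suits
write-heavy workloads.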

Re: how many servers in a hbase cluster

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Iain,

         You would put a lot of pressure on the RAM if you do that. The
NN already has high memory requirements, and adding JT+ZK on the same
machine would be too heavy, IMHO.

Warm Regards,
Tariq
cloudfront.blogspot.com



Re: how many servers in a hbase cluster

Posted by iain wright <ia...@gmail.com>.
Hi Mohammad,

I am curious why you chose not to put the third ZK on the NN+JT? I was
planning on doing that on a new cluster and want to confirm it would be
okay.


-- 
Iain Wright
Cell: (562) 852-5916


Re: how many servers in a hbase cluster

Posted by Mohammad Tariq <do...@gmail.com>.
Yeah, I forgot to mention that the no. of ZKs should be odd. Perhaps
those parentheses made the statement look optional. Just to clarify, it
is mandatory.

Warm Regards,
Tariq
cloudfront.blogspot.com



Re: how many servers in a hbase cluster

Posted by Kevin O'dell <ke...@cloudera.com>.
If you run ZK with a DN/TT/RS please make sure to dedicate a hard drive and
a core to the ZK process. I have seen many strange occurrences.

Re: how many servers in a hbase cluster

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
You HAVE TO run ZK3; otherwise there is no point in having ZK2, and any
ZK failure will be an issue. You need to have an odd number of ZK
servers...
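A quick sketch (plain Python, not HBase code) of why the odd-size rule holds: ZooKeeper needs a strict majority of the ensemble to stay up, so an ensemble of n servers survives only floor((n-1)/2) failures.

```python
def zk_failure_tolerance(n):
    """Failures an n-server ZooKeeper ensemble can survive.

    A strict majority (n // 2 + 1) must remain up, so the
    tolerance is whatever is left over: (n - 1) // 2.
    """
    majority = n // 2 + 1
    return n - majority

for n in range(1, 6):
    print(n, "servers -> survives", zk_failure_tolerance(n), "failure(s)")
```

Note that 2 servers tolerate no more failures than 1, and 4 no more than 3, which is why even ensemble sizes buy you nothing.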

Also, if you don't run MR jobs, you don't need the TT and JT... Otherwise,
everything below is correct. But there are many other options; it all
depends on your needs and the hardware you have ;)

JM

2013/6/22 Mohammad Tariq <do...@gmail.com>:
> With 8 machines you can do something like this :
>
> Machine 1 - NN+JT
> Machine 2 - SNN+ZK1
> Machine 3 - HM+ZK2
> Machine 4-8 - DN+TT+RS
> (You can run ZK3 on a slave node with some additional memory).
>
> DN and RS run on the same machine. Although RSs are said to hold the data,
> the data is actually stored in DNs. Replication is managed at HDFS level.
> You don't have to worry about that.
>
> You can visit this link <http://hbase.apache.org/book/perf.writing.html> to
> see how to write efficiently into HBase. With a small field there should
> not be any problem except storage and increased metadata, as you'll have
> many small cells. If possible club several small fields into one and put
> them together in one cell.
>
> HTH
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Sat, Jun 22, 2013 at 8:31 PM, myhbase <my...@126.com> wrote:
>
>> Thanks for your response.
>>
>> Now if 5 servers are enough, how can I install  and configure my nodes? If
>> I need 3 replicas in case data loss, I should at least have 3 datanodes, we
>> still have namenode, regionserver and HMaster nodes, zookeeper nodes, some
>> of them must be installed in the same machine. The datanode seems the disk
>> IO sensitive node while region server is the mem sensitive, can I install
>> them in the same machine? Any suggestion on the deployment plan?
>>
>> My business requirement is that the write is much more than read(7:3), and
>> I have another concern that I have a field which will have the 8~15KB in
>>  data size, I am not sure, there will be any problem in hbase when it runs
>> compaction and split in regions.
>>
>>  Oh, you already have heavyweight's input :).
>>>
>>> Thanks JM.
>>>
>>> Warm Regards,
>>> Tariq
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Sat, Jun 22, 2013 at 8:05 PM, Mohammad Tariq <do...@gmail.com>
>>> wrote:
>>>
>>>  Hello there,
>>>>
>>>>          IMHO, 5-8 servers are sufficient enough to start with. But it's
>>>> all relative to the data you have and the intensity of your reads/writes.
>>>> You should have different strategies though, based on whether it's 'read'
>>>> or 'write'. You actually can't define 'big' in absolute terms. My cluster
>>>> might be big for me, but for someone else it might still be not big
>>>> enough
>>>> or for someone it might be very big. Long story short it depends on your
>>>> needs. If you are able to achieve your goal with 5-8 RSs, then having
>>>> more
>>>> machines will be a wastage, I think.
>>>>
>>>> But you should always keep in mind that HBase is kinda greedy when it
>>>> comes to memory. For a decent load 4G is sufficient, IMHO. But it again
>>>> depends on operations you are gonna perform. If you have large clusters
>>>> where you are planning to run MR jobs frequently you are better off with
>>>> additional 2G.
>>>>
>>>>
>>>> Warm Regards,
>>>> Tariq
>>>> cloudfront.blogspot.com
>>>>
>>>>
>>>> On Sat, Jun 22, 2013 at 7:51 PM, myhbase <my...@126.com> wrote:
>>>>
>>>>  Hello All,
>>>>>
>>>>> I learn hbase almost from papers and books, according to my
>>>>> understanding, HBase is the kind of architecture which is more appliable
>>>>> to a big cluster. We should have many HDFS nodes, and many HBase(region
>>>>> server) nodes. If we only have several severs(5-8), it seems hbase is
>>>>> not a good choice, please correct me if I am wrong. In addition, how
>>>>> many nodes usually we can start to consider the hbase solution and how
>>>>> about the physic mem size and other hardware resource in each node, any
>>>>> reference document or cases? Thanks.
>>>>>
>>>>> --Ning
>>>>>
>>>>>
>>>>>
>>
>>

Re: how many servers in a hbase cluster

Posted by Mohammad Tariq <do...@gmail.com>.
With 8 machines you can do something like this:

Machine 1 - NN+JT
Machine 2 - SNN+ZK1
Machine 3 - HM+ZK2
Machine 4-8 - DN+TT+RS
(You can run ZK3 on a slave node with some additional memory).

DN and RS run on the same machine. Although RSs are said to hold the data,
the data is actually stored in the DNs. Replication is managed at the HDFS
level, so you don't have to worry about it.
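The layout above can be sanity-checked mechanically. This is a hypothetical helper (the machine names and rules of thumb are assumptions, not an HBase tool): at least one DN per replica, every RS colocated with a DN for data locality, and an odd ZK count.

```python
# Role layout from the message above, with ZK3 on one slave node.
layout = {
    "machine1": ["NN", "JT"],
    "machine2": ["SNN", "ZK"],
    "machine3": ["HM", "ZK"],
    "machine4": ["DN", "TT", "RS", "ZK"],  # the slave carrying ZK3
    "machine5": ["DN", "TT", "RS"],
    "machine6": ["DN", "TT", "RS"],
    "machine7": ["DN", "TT", "RS"],
    "machine8": ["DN", "TT", "RS"],
}

def check_layout(layout, replication=3):
    datanodes = sum("DN" in roles for roles in layout.values())
    zk = sum("ZK" in roles for roles in layout.values())
    assert datanodes >= replication, "need at least one DN per replica"
    # Every RS should sit next to a DN so reads/writes stay local.
    assert all("DN" in r for r in layout.values() if "RS" in r)
    assert zk % 2 == 1, "run an odd number of ZooKeeper servers"
    return datanodes, zk

print(check_layout(layout))
```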

You can visit this link <http://hbase.apache.org/book/perf.writing.html> to
see how to write into HBase efficiently. With small fields there should not
be any problem beyond storage and increased metadata, as you'll end up with
many small cells. If possible, club several small fields together into one
cell.
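To see why clubbing helps, here is a rough per-cell size estimate. The byte counts are approximations based on the 0.94-era on-disk KeyValue layout, and the row/qualifier names are made up for illustration; every cell repeats the row key, family, qualifier, timestamp, and framing bytes.

```python
def cell_bytes(row, family, qualifier, value_len):
    """Approximate serialized size of one HBase KeyValue:
    4B key len + 4B value len + 2B row len + 1B family len
    + 8B timestamp + 1B key type, plus the row/family/qualifier
    bytes and the value itself."""
    fixed = 4 + 4 + 2 + 1 + 8 + 1
    return fixed + len(row) + len(family) + len(qualifier) + value_len

# Ten separate 20-byte fields vs. the same 200 bytes in one cell:
separate = sum(cell_bytes("row-0001", "f", "q%02d" % i, 20) for i in range(10))
clubbed = cell_bytes("row-0001", "f", "q00", 200)
print(separate, clubbed)  # the clubbed cell pays the key overhead only once
```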

HTH

Warm Regards,
Tariq
cloudfront.blogspot.com


On Sat, Jun 22, 2013 at 8:31 PM, myhbase <my...@126.com> wrote:

> Thanks for your response.
>
> Now if 5 servers are enough, how can I install  and configure my nodes? If
> I need 3 replicas in case data loss, I should at least have 3 datanodes, we
> still have namenode, regionserver and HMaster nodes, zookeeper nodes, some
> of them must be installed in the same machine. The datanode seems the disk
> IO sensitive node while region server is the mem sensitive, can I install
> them in the same machine? Any suggestion on the deployment plan?
>
> My business requirement is that the write is much more than read(7:3), and
> I have another concern that I have a field which will have the 8~15KB in
>  data size, I am not sure, there will be any problem in hbase when it runs
> compaction and split in regions.
>
>  Oh, you already have heavyweight's input :).
>>
>> Thanks JM.
>>
>> Warm Regards,
>> Tariq
>> cloudfront.blogspot.com
>>
>>
>> On Sat, Jun 22, 2013 at 8:05 PM, Mohammad Tariq <do...@gmail.com>
>> wrote:
>>
>>  Hello there,
>>>
>>>          IMHO, 5-8 servers are sufficient enough to start with. But it's
>>> all relative to the data you have and the intensity of your reads/writes.
>>> You should have different strategies though, based on whether it's 'read'
>>> or 'write'. You actually can't define 'big' in absolute terms. My cluster
>>> might be big for me, but for someone else it might still be not big
>>> enough
>>> or for someone it might be very big. Long story short it depends on your
>>> needs. If you are able to achieve your goal with 5-8 RSs, then having
>>> more
>>> machines will be a wastage, I think.
>>>
>>> But you should always keep in mind that HBase is kinda greedy when it
>>> comes to memory. For a decent load 4G is sufficient, IMHO. But it again
>>> depends on operations you are gonna perform. If you have large clusters
>>> where you are planning to run MR jobs frequently you are better off with
>>> additional 2G.
>>>
>>>
>>> Warm Regards,
>>> Tariq
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Sat, Jun 22, 2013 at 7:51 PM, myhbase <my...@126.com> wrote:
>>>
>>>  Hello All,
>>>>
>>>> I learn hbase almost from papers and books, according to my
>>>> understanding, HBase is the kind of architecture which is more appliable
>>>> to a big cluster. We should have many HDFS nodes, and many HBase(region
>>>> server) nodes. If we only have several severs(5-8), it seems hbase is
>>>> not a good choice, please correct me if I am wrong. In addition, how
>>>> many nodes usually we can start to consider the hbase solution and how
>>>> about the physic mem size and other hardware resource in each node, any
>>>> reference document or cases? Thanks.
>>>>
>>>> --Ning
>>>>
>>>>
>>>>
>
>

Re: how many servers in a hbase cluster

Posted by myhbase <my...@126.com>.
Thanks for your response.

Now if 5 servers are enough, how should I install and configure my nodes?
If I need 3 replicas to guard against data loss, I should have at least 3
datanodes, and we still have the namenode, regionserver, HMaster, and
zookeeper nodes, so some of them must be installed on the same machine.
The datanode seems to be the disk-IO-sensitive node while the region
server is memory sensitive; can I install them on the same machine? Any
suggestions on a deployment plan?

My business requirement is write-heavy (writes to reads about 7:3), and I
have another concern: I have a field which will be 8~15KB in size, and I
am not sure whether there will be any problem in HBase when it runs
compactions and region splits.
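A back-of-the-envelope check helps put the 8~15KB concern in perspective. The flush and region sizes below are assumed defaults (roughly a 128MB memstore flush and a 10GB max region size), not measured values:

```python
FLUSH_BYTES = 128 * 1024 * 1024          # assumed memstore flush size
REGION_BYTES = 10 * 1024 * 1024 * 1024   # assumed max region size

def cells_per(budget, cell_kb):
    """How many cells of cell_kb kilobytes fit in a byte budget."""
    return budget // (cell_kb * 1024)

for kb in (8, 15):
    print(kb, "KB ->", cells_per(FLUSH_BYTES, kb), "cells/flush,",
          cells_per(REGION_BYTES, kb), "cells/region")
```

Even at 15KB a region holds hundreds of thousands of cells before splitting, so values in this range are well within what compactions and splits normally handle; the usual trouble starts with cells in the multi-MB range.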
> Oh, you already have heavyweight's input :).
>
> Thanks JM.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Sat, Jun 22, 2013 at 8:05 PM, Mohammad Tariq <do...@gmail.com> wrote:
>
>> Hello there,
>>
>>          IMHO, 5-8 servers are sufficient enough to start with. But it's
>> all relative to the data you have and the intensity of your reads/writes.
>> You should have different strategies though, based on whether it's 'read'
>> or 'write'. You actually can't define 'big' in absolute terms. My cluster
>> might be big for me, but for someone else it might still be not big enough
>> or for someone it might be very big. Long story short it depends on your
>> needs. If you are able to achieve your goal with 5-8 RSs, then having more
>> machines will be a wastage, I think.
>>
>> But you should always keep in mind that HBase is kinda greedy when it
>> comes to memory. For a decent load 4G is sufficient, IMHO. But it again
>> depends on operations you are gonna perform. If you have large clusters
>> where you are planning to run MR jobs frequently you are better off with
>> additional 2G.
>>
>>
>> Warm Regards,
>> Tariq
>> cloudfront.blogspot.com
>>
>>
>> On Sat, Jun 22, 2013 at 7:51 PM, myhbase <my...@126.com> wrote:
>>
>>> Hello All,
>>>
>>> I learn hbase almost from papers and books, according to my
>>> understanding, HBase is the kind of architecture which is more appliable
>>> to a big cluster. We should have many HDFS nodes, and many HBase(region
>>> server) nodes. If we only have several severs(5-8), it seems hbase is
>>> not a good choice, please correct me if I am wrong. In addition, how
>>> many nodes usually we can start to consider the hbase solution and how
>>> about the physic mem size and other hardware resource in each node, any
>>> reference document or cases? Thanks.
>>>
>>> --Ning
>>>
>>>



Re: how many servers in a hbase cluster

Posted by Mohammad Tariq <do...@gmail.com>.
Oh, you already have heavyweight's input :).

Thanks JM.

Warm Regards,
Tariq
cloudfront.blogspot.com


On Sat, Jun 22, 2013 at 8:05 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello there,
>
>         IMHO, 5-8 servers are sufficient enough to start with. But it's
> all relative to the data you have and the intensity of your reads/writes.
> You should have different strategies though, based on whether it's 'read'
> or 'write'. You actually can't define 'big' in absolute terms. My cluster
> might be big for me, but for someone else it might still be not big enough
> or for someone it might be very big. Long story short it depends on your
> needs. If you are able to achieve your goal with 5-8 RSs, then having more
> machines will be a wastage, I think.
>
> But you should always keep in mind that HBase is kinda greedy when it
> comes to memory. For a decent load 4G is sufficient, IMHO. But it again
> depends on operations you are gonna perform. If you have large clusters
> where you are planning to run MR jobs frequently you are better off with
> additional 2G.
>
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Sat, Jun 22, 2013 at 7:51 PM, myhbase <my...@126.com> wrote:
>
>> Hello All,
>>
>> I learn hbase almost from papers and books, according to my
>> understanding, HBase is the kind of architecture which is more appliable
>> to a big cluster. We should have many HDFS nodes, and many HBase(region
>> server) nodes. If we only have several severs(5-8), it seems hbase is
>> not a good choice, please correct me if I am wrong. In addition, how
>> many nodes usually we can start to consider the hbase solution and how
>> about the physic mem size and other hardware resource in each node, any
>> reference document or cases? Thanks.
>>
>> --Ning
>>
>>
>

Re: how many servers in a hbase cluster

Posted by Mohammad Tariq <do...@gmail.com>.
Hello there,

        IMHO, 5-8 servers are sufficient to start with. But it's all
relative to the data you have and the intensity of your reads/writes. You
should have different strategies, though, based on whether the workload is
read-heavy or write-heavy. You can't really define 'big' in absolute terms:
my cluster might be big for me, but for someone else it might not be big
enough, or it might be very big. Long story short, it depends on your
needs. If you can achieve your goal with 5-8 RSs, then having more
machines would be wasteful, I think.

But you should always keep in mind that HBase is kinda greedy when it comes
to memory. For a decent load, a 4G heap is sufficient, IMHO, but it again
depends on the operations you are going to perform. If you have large
clusters where you plan to run MR jobs frequently, you are better off with
an additional 2G.
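Putting those numbers together gives a rough per-node RAM budget for a slave running DN + TT + RS. These figures are rules of thumb extrapolated from the advice above (the 4G RS heap plus ~2G for MR), not official sizing guidance:

```python
# Assumed per-role memory budget (GB) on one slave node.
budget_gb = {
    "os_and_page_cache": 1,
    "datanode_heap": 1,
    "tasktracker_and_tasks": 2,   # the extra ~2G when MR jobs run
    "regionserver_heap": 4,       # the 4G suggested for a decent load
}

def required_ram_gb(budget):
    """Minimum RAM the node should have to avoid swapping."""
    return sum(budget.values())

print(required_ram_gb(budget_gb), "GB minimum per slave node")
```

So under these assumptions an 8GB slave is about the floor; swapping on a region server is far worse than a smaller heap, so leave headroom rather than overcommit.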


Warm Regards,
Tariq
cloudfront.blogspot.com


On Sat, Jun 22, 2013 at 7:51 PM, myhbase <my...@126.com> wrote:

> Hello All,
>
> I learn hbase almost from papers and books, according to my
> understanding, HBase is the kind of architecture which is more appliable
> to a big cluster. We should have many HDFS nodes, and many HBase(region
> server) nodes. If we only have several severs(5-8), it seems hbase is
> not a good choice, please correct me if I am wrong. In addition, how
> many nodes usually we can start to consider the hbase solution and how
> about the physic mem size and other hardware resource in each node, any
> reference document or cases? Thanks.
>
> --Ning
>
>