Posted to user@hbase.apache.org by Slava Gorelik <sl...@gmail.com> on 2008/10/02 20:30:19 UTC

Hbase / Hadoop Tuning

Hi All.
Our environment: 8 datanodes (1 is also the namenode);
7 of them are also region servers and 1 is the master; default replication is 3.
We have an application that does heavy writes with relatively small rows (about
10Kb each).
Current performance is 100,000 rows in 580,000 ms, i.e. 5.8 ms per row.
Is there any way to improve this performance by tuning or tweaking HBase
or Hadoop?

Thank You and Best Regards.
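For reference, the reported numbers work out as follows (a quick arithmetic sanity check; note that a later reply in the thread computes 140,000,000 bytes total, which implies ~1,400 bytes per row rather than 10 KB):

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        long rows = 100_000L;     // rows written in the test
        long totalMs = 580_000L;  // reported wall-clock time
        double msPerRow = (double) totalMs / rows;  // 5.8 ms per row, as stated
        long totalBytes = rows * 1_400L;            // 140,000,000 bytes, matching a later reply
        System.out.println(msPerRow + " ms/row, " + totalBytes + " bytes total");
    }
}
```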

Re: Hbase / Hadoop Tuning

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Slava,

This is very slow. What kind of machines do you have? Anything else going
on? How do you push the data in?

Thx,

J-D

On Thu, Oct 2, 2008 at 2:30 PM, Slava Gorelik <sl...@gmail.com> wrote:

> Hi All.
> Our environment - 8 Datanodes (1 is also Namenode),
> 7 from them is also region servers and 1 is Master, default replication -
> 3.
> We have application that heavy writes with relative small rows - about
> 10Kb,
> current performance is 100000 rows in 580000 Milisec - 5.8 Milisec / row.
> Is there any way to improve this performance by some tuning / tweaking
> HBase
> or Hadoop ?
>
> Thank You and Best Regards.
>

RE: Hbase / Hadoop Tuning

Posted by Jonathan Gray <jl...@streamy.com>.
I believe he's referring to a Java process.

It turns out that, after more investigation by stack, there is no giant
lock in the RPC mechanism client-side.

Check out:  https://issues.apache.org/jira/browse/HBASE-576

To increase performance you'll want multiple threads, each with its own
HTable.

JG
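That pattern looks roughly like the sketch below. In the real 0.18-era client the per-thread object would be org.apache.hadoop.hbase.client.HTable committing a BatchUpdate; here a tiny stub stands in for it (and the table name "mytable" is made up) so the threading structure itself is runnable anywhere:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelWriters {
    // Stand-in for org.apache.hadoop.hbase.client.HTable: the real class is
    // constructed once per thread exactly like this stub, and its commit would
    // do the RPC. Here it just counts rows so the sketch runs without a cluster.
    static class Table {
        private final AtomicLong committed;
        Table(String name, AtomicLong committed) { this.committed = committed; }
        void commit(byte[] row, byte[] value) { committed.incrementAndGet(); }
    }

    static long writeAll(int threads, int rowsPerThread) throws InterruptedException {
        AtomicLong committed = new AtomicLong();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                // One table instance per thread -- never share one across threads.
                Table table = new Table("mytable", committed);
                for (int i = 0; i < rowsPerThread; i++) {
                    table.commit(("row-" + i).getBytes(), new byte[1400]);
                }
            });
            w.start();
            workers.add(w);
        }
        for (Thread w : workers) w.join();
        return committed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("committed " + writeAll(4, 25_000) + " rows");
    }
}
```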

-----Original Message-----
From: Krzysztof Szlapinski [mailto:krzysztof.szlapinski@starline.hk] 
Sent: Friday, October 10, 2008 3:53 AM
To: hbase-user@hadoop.apache.org
Subject: Re: Hbase / Hadoop Tuning

Jim Kellerman (POWERSET) wrote:
> Using a single client will also limit write performance.
> Even if the client is multi-threaded, there is a big giant lock
> in the RPC mechanism which prevents concurrent requests (This
> is something we plan to fix in the future).
>   
What do you mean by "HBase client"? Do you mean the instances of 
HBaseAdmin and HTable (or any others)?

krzysiek


Re: Hbase / Hadoop Tuning

Posted by Krzysztof Szlapinski <kr...@starline.hk>.
Jim Kellerman (POWERSET) wrote:
> Using a single client will also limit write performance.
> Even if the client is multi-threaded, there is a big giant lock
> in the RPC mechanism which prevents concurrent requests (This
> is something we plan to fix in the future).
>   
What do you mean by "HBase client"? Do you mean the instances of 
HBaseAdmin and HTable (or any others)?

krzysiek


Re: Hbase / Hadoop Tuning

Posted by Slava Gorelik <sl...@gmail.com>.
Thank you. I'll try to implement all your advice.

Thanks Again and Best Regards.


On Fri, Oct 3, 2008 at 12:27 AM, Jonathan Gray <jl...@streamy.com> wrote:

> If this is the case, then certainly what is hurting you is (repeating what
> has been said before, but maybe it's clearer to you now):
>
> - Serialized round-trip RPC calls for each insert (will eventually be
> handled with batched updates and/or parallelism in client, for now, you
> would need to have multiple processes doing the writing... you will see a
> major improvement if you have multiple writing processes)
>
> - Inserting to a single region.  As described before, you're only hitting a
> single server, so your writes are not at all being distributed.  Lower your
> region/filesize to get splits sooner.  Also, keep your eye on:
> https://issues.apache.org/jira/browse/HBASE-902  This feature is intended
> for situations like this.
>
>
> -----Original Message-----
> From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> Sent: Thursday, October 02, 2008 1:55 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Hbase / Hadoop Tuning
>
> Hi.My webapp is trying to simulate the row by row operation, it means that
> it's adding in the loop 100K Rows.
> And my time measurement is started a line before loop and finished a line
> after the loop, it means that no overhead of webapp.
> But, sure, i'll take in deep, that i'm not spending the 1 or 2  ms for
> some operation.
>
> Thank You and Best Regards.
>
>
>
>
> On Thu, Oct 2, 2008 at 11:36 PM, Jonathan Gray <jl...@streamy.com> wrote:
>
> > In this case, it would definitely hurt your performance.
> >
> > One question.  Have you done more detailed timings to determine where
> time
> > is spent?  With the overhead of your webapp, and it streaming insertions
> > one
> > row at a time, is it possible that a significant amount of time is being
> > spent before or after the hbase commit (significant in this case could be
> > 1-2 ms/row).
> >
> > JG
> >
> > -----Original Message-----
> > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > Sent: Thursday, October 02, 2008 1:12 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Hbase / Hadoop Tuning
> >
> > Thank You.
> > According to doing write in MR jobs, the problem is that rows are coming
> to
> > webapp one by one and i can't accumulate them into
> > one big batch update, it means i need to run MR job for each single row,
> in
> > this case will MR jobs help ?
> >
> > Best Regards.
> >
> > On Thu, Oct 2, 2008 at 10:58 PM, Jim Kellerman (POWERSET) <
> > Jim.Kellerman@microsoft.com> wrote:
> >
> > > Responses inline below.
> > > > -----Original Message-----
> > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > > Sent: Thursday, October 02, 2008 12:39 PM
> > > > To: hbase-user@hadoop.apache.org
> > > > Subject: Re: Hbase / Hadoop Tuning
> > > >
> > > > Thank You Jim for a quick answer.
> > > > 1) If i understand correct, using 2 clients should allow me improve
> > > > the performance twice (more or less) ?
> > >
> > > I don't know if you will get 2x performance, but it will be greater
> than
> > > 1x.
> > >
> > > > 2) Currently, our webapp is HBase client using Htable - is that what
> > you
> > > > meant, when you said "(HBase, not web) clients" ?
> > >
> > > If multiple requests come into your webapp, and your webapp is
> > > multithreaded, you will not see a performance increase.
> > >
> > > If your webapp runs a different process for each request, you will see
> > > a performance increase because the RPC connection will not be shared
> > > and consequently will not block on the giant lock. That is why I
> > > recommended splitting up your job using Map/Reduce.
> > >
> > > > 3) 64MB for single region server is a minimum size or could be less ?
> > >
> > > It could be less, but that is the default block size for the Hadoop
> DFS.
> > > If you make it smaller, you might want to change the default block size
> > > for Hadoop as well.
> > >
> > > > 4) When is planed to fix the RPC lock for concurrent operations
> > > > in single client ?
> > >
> > > This change is targeted for somewhere in the next 6 months according
> > > to the roadmap.
> > >
> > >
> > > > Thank You Again and Best Regards.
> > > >
> > > >
> > > > On Thu, Oct 2, 2008 at 10:30 PM, Jim Kellerman (POWERSET) <
> > > > Jim.Kellerman@microsoft.com> wrote:
> > > >
> > > > > What you are storing is 140,000,000 bytes, so having multiple
> > > > > region servers will not help you as a single region is only
> > > > > served by a single region server. By default, regions split
> > > > > when they reach 256MB. So until the region splits, all traffic
> > > > > will go to a single region server. You might try reducing the
> > > > > maximum file size to encourage region splitting by changing the
> > > > > value of hbase.hregion.max.filesize to 64MB.
> > > > >
> > > > > Using a single client will also limit write performance.
> > > > > Even if the client is multi-threaded, there is a big giant lock
> > > > > in the RPC mechanism which prevents concurrent requests (This
> > > > > is something we plan to fix in the future).
> > > > >
> > > > > Multiple clients do not block against one another the way multi-
> > > > > threaded clients do currently. So another way to increase
> > > > > write performance would be to run multiple (HBase, not web)
> clients,
> > > > > by either running multiple processes directly, or by utilizing
> > > > > a Map/Reduce job to do the writes.
> > > > >
> > > > > ---
> > > > > Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > > > > Sent: Thursday, October 02, 2008 12:07 PM
> > > > > > To: hbase-user@hadoop.apache.org
> > > > > > Subject: Re: Hbase / Hadoop Tuning
> > > > > >
> > > > > > Hi.Thank you for quick response.
> > > > > > We are using 7 machines (6 RedHat 5 and 1 is SuSe interprise 10).
> > > > > > Each machine is : 4 CPU with 4gb ram and 200gb HD, connected with
> > 1gb
> > > > > > network interface.
> > > > > > All machines in the same rec. On one machine (master) we are
> > running
> > > > > > Tomcat
> > > > > > with one webapp
> > > > > > that is adding 100000 rows. Nothing else is running. When no
> webapp
> > > > > > running
> > > > > > the CPU load is less the 1%.
> > > > > >
> > > > > > We are using Hbase 0.18.0 and Hadoop 0.18.0.
> > > > > > Hbase cluster is one master and 6 region servers.
> > > > > >
> > > > > > Row addition is done by BatchUpdate and commint into single
> column
> > > > > family.
> > > > > > The data is simple bytes array (1400 bytes each row).
> > > > > >
> > > > > >
> > > > > > Thank You and Best Regards.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Oct 2, 2008 at 9:39 PM, stack <st...@duboce.net> wrote:
> > > > > >
> > > > > > > Tell us more Slava.  HBase versions and how many regions you
> have
> > > in
> > > > > > your
> > > > > > > cluster?
> > > > > > >
> > > > > > > If small rows, your best boost will likely come when we support
> > > > > batching
> > > > > > of
> > > > > > > updates: HBASE-748.
> > > > > > >
> > > > > > > St.Ack
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Slava Gorelik wrote:
> > > > > > >
> > > > > > >> Hi All.
> > > > > > >> Our environment - 8 Datanodes (1 is also Namenode),
> > > > > > >> 7 from them is also region servers and 1 is Master, default
> > > > > replication
> > > > > > -
> > > > > > >> 3.
> > > > > > >> We have application that heavy writes with relative small rows
> -
> > > > about
> > > > > > >> 10Kb,
> > > > > > >> current performance is 100000 rows in 580000 Milisec - 5.8
> > Milisec
> > > > /
> > > > > > row.
> > > > > > >> Is there any way to improve this performance by some tuning /
> > > > tweaking
> > > > > > >> HBase
> > > > > > >> or Hadoop ?
> > > > > > >>
> > > > > > >> Thank You and Best Regards.
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > >
> > >
> >
> >
>
>

RE: Hbase / Hadoop Tuning

Posted by Jonathan Gray <jl...@streamy.com>.
If this is the case, then certainly what is hurting you is (repeating what
has been said before, but maybe it's clearer to you now):

- Serialized round-trip RPC calls for each insert (this will eventually be
handled with batched updates and/or parallelism in the client; for now, you
would need to have multiple processes doing the writing... you will see a
major improvement if you have multiple writing processes)

- Inserting into a single region.  As described before, you're only hitting a
single server, so your writes are not being distributed at all.  Lower your
region max filesize to get splits sooner.  Also, keep your eye on
https://issues.apache.org/jira/browse/HBASE-902 - this feature is intended
for situations like this.
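Concretely, lowering the split threshold means setting the property Jim named earlier, hbase.hregion.max.filesize, in hbase-site.xml (inside the <configuration> element; the property name comes from his message, and 67108864 bytes = 64 MB versus the 256 MB default he mentioned):

```xml
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- 64 MB instead of the 256 MB default, so regions split sooner -->
  <value>67108864</value>
</property>
```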


-----Original Message-----
From: Slava Gorelik [mailto:slava.gorelik@gmail.com] 
Sent: Thursday, October 02, 2008 1:55 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Hbase / Hadoop Tuning

Hi.My webapp is trying to simulate the row by row operation, it means that
it's adding in the loop 100K Rows.
And my time measurement is started a line before loop and finished a line
after the loop, it means that no overhead of webapp.
But, sure, i'll take in deep, that i'm not spending the 1 or 2  ms for
some operation.

Thank You and Best Regards.




On Thu, Oct 2, 2008 at 11:36 PM, Jonathan Gray <jl...@streamy.com> wrote:

> In this case, it would definitely hurt your performance.
>
> One question.  Have you done more detailed timings to determine where time
> is spent?  With the overhead of your webapp, and it streaming insertions
> one
> row at a time, is it possible that a significant amount of time is being
> spent before or after the hbase commit (significant in this case could be
> 1-2 ms/row).
>
> JG
>
> -----Original Message-----
> From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> Sent: Thursday, October 02, 2008 1:12 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Hbase / Hadoop Tuning
>
> Thank You.
> According to doing write in MR jobs, the problem is that rows are coming
to
> webapp one by one and i can't accumulate them into
> one big batch update, it means i need to run MR job for each single row,
in
> this case will MR jobs help ?
>
> Best Regards.
>
> On Thu, Oct 2, 2008 at 10:58 PM, Jim Kellerman (POWERSET) <
> Jim.Kellerman@microsoft.com> wrote:
>
> > Responses inline below.
> > > -----Original Message-----
> > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > Sent: Thursday, October 02, 2008 12:39 PM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: Re: Hbase / Hadoop Tuning
> > >
> > > Thank You Jim for a quick answer.
> > > 1) If i understand correct, using 2 clients should allow me improve
> > > the performance twice (more or less) ?
> >
> > I don't know if you will get 2x performance, but it will be greater than
> > 1x.
> >
> > > 2) Currently, our webapp is HBase client using Htable - is that what
> you
> > > meant, when you said "(HBase, not web) clients" ?
> >
> > If multiple requests come into your webapp, and your webapp is
> > multithreaded, you will not see a performance increase.
> >
> > If your webapp runs a different process for each request, you will see
> > a performance increase because the RPC connection will not be shared
> > and consequently will not block on the giant lock. That is why I
> > recommended splitting up your job using Map/Reduce.
> >
> > > 3) 64MB for single region server is a minimum size or could be less ?
> >
> > It could be less, but that is the default block size for the Hadoop DFS.
> > If you make it smaller, you might want to change the default block size
> > for Hadoop as well.
> >
> > > 4) When is planed to fix the RPC lock for concurrent operations
> > > in single client ?
> >
> > This change is targeted for somewhere in the next 6 months according
> > to the roadmap.
> >
> >
> > > Thank You Again and Best Regards.
> > >
> > >
> > > On Thu, Oct 2, 2008 at 10:30 PM, Jim Kellerman (POWERSET) <
> > > Jim.Kellerman@microsoft.com> wrote:
> > >
> > > > What you are storing is 140,000,000 bytes, so having multiple
> > > > region servers will not help you as a single region is only
> > > > served by a single region server. By default, regions split
> > > > when they reach 256MB. So until the region splits, all traffic
> > > > will go to a single region server. You might try reducing the
> > > > maximum file size to encourage region splitting by changing the
> > > > value of hbase.hregion.max.filesize to 64MB.
> > > >
> > > > Using a single client will also limit write performance.
> > > > Even if the client is multi-threaded, there is a big giant lock
> > > > in the RPC mechanism which prevents concurrent requests (This
> > > > is something we plan to fix in the future).
> > > >
> > > > Multiple clients do not block against one another the way multi-
> > > > threaded clients do currently. So another way to increase
> > > > write performance would be to run multiple (HBase, not web) clients,
> > > > by either running multiple processes directly, or by utilizing
> > > > a Map/Reduce job to do the writes.
> > > >
> > > > ---
> > > > Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > > > Sent: Thursday, October 02, 2008 12:07 PM
> > > > > To: hbase-user@hadoop.apache.org
> > > > > Subject: Re: Hbase / Hadoop Tuning
> > > > >
> > > > > Hi.Thank you for quick response.
> > > > > We are using 7 machines (6 RedHat 5 and 1 is SuSe interprise 10).
> > > > > Each machine is : 4 CPU with 4gb ram and 200gb HD, connected with
> 1gb
> > > > > network interface.
> > > > > All machines in the same rec. On one machine (master) we are
> running
> > > > > Tomcat
> > > > > with one webapp
> > > > > that is adding 100000 rows. Nothing else is running. When no
webapp
> > > > > running
> > > > > the CPU load is less the 1%.
> > > > >
> > > > > We are using Hbase 0.18.0 and Hadoop 0.18.0.
> > > > > Hbase cluster is one master and 6 region servers.
> > > > >
> > > > > Row addition is done by BatchUpdate and commint into single column
> > > > family.
> > > > > The data is simple bytes array (1400 bytes each row).
> > > > >
> > > > >
> > > > > Thank You and Best Regards.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Oct 2, 2008 at 9:39 PM, stack <st...@duboce.net> wrote:
> > > > >
> > > > > > Tell us more Slava.  HBase versions and how many regions you
have
> > in
> > > > > your
> > > > > > cluster?
> > > > > >
> > > > > > If small rows, your best boost will likely come when we support
> > > > batching
> > > > > of
> > > > > > updates: HBASE-748.
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > > >
> > > > > >
> > > > > > Slava Gorelik wrote:
> > > > > >
> > > > > >> Hi All.
> > > > > >> Our environment - 8 Datanodes (1 is also Namenode),
> > > > > >> 7 from them is also region servers and 1 is Master, default
> > > > replication
> > > > > -
> > > > > >> 3.
> > > > > >> We have application that heavy writes with relative small rows
-
> > > about
> > > > > >> 10Kb,
> > > > > >> current performance is 100000 rows in 580000 Milisec - 5.8
> Milisec
> > > /
> > > > > row.
> > > > > >> Is there any way to improve this performance by some tuning /
> > > tweaking
> > > > > >> HBase
> > > > > >> or Hadoop ?
> > > > > >>
> > > > > >> Thank You and Best Regards.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >
> > > > > >
> > > >
> >
>
>


Re: Hbase / Hadoop Tuning

Posted by Slava Gorelik <sl...@gmail.com>.
Hi. My webapp is simulating the row-by-row operation, meaning it adds 100K
rows in a loop.
My time measurement starts on the line before the loop and finishes on the
line after it, so no webapp overhead is included.
But, sure, I'll check in depth that I'm not spending 1-2 ms on some other
operation.

Thank You and Best Regards.




On Thu, Oct 2, 2008 at 11:36 PM, Jonathan Gray <jl...@streamy.com> wrote:

> In this case, it would definitely hurt your performance.
>
> One question.  Have you done more detailed timings to determine where time
> is spent?  With the overhead of your webapp, and it streaming insertions
> one
> row at a time, is it possible that a significant amount of time is being
> spent before or after the hbase commit (significant in this case could be
> 1-2 ms/row).
>
> JG
>
> -----Original Message-----
> From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> Sent: Thursday, October 02, 2008 1:12 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Hbase / Hadoop Tuning
>
> Thank You.
> According to doing write in MR jobs, the problem is that rows are coming to
> webapp one by one and i can't accumulate them into
> one big batch update, it means i need to run MR job for each single row, in
> this case will MR jobs help ?
>
> Best Regards.
>
> On Thu, Oct 2, 2008 at 10:58 PM, Jim Kellerman (POWERSET) <
> Jim.Kellerman@microsoft.com> wrote:
>
> > Responses inline below.
> > > -----Original Message-----
> > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > Sent: Thursday, October 02, 2008 12:39 PM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: Re: Hbase / Hadoop Tuning
> > >
> > > Thank You Jim for a quick answer.
> > > 1) If i understand correct, using 2 clients should allow me improve
> > > the performance twice (more or less) ?
> >
> > I don't know if you will get 2x performance, but it will be greater than
> > 1x.
> >
> > > 2) Currently, our webapp is HBase client using Htable - is that what
> you
> > > meant, when you said "(HBase, not web) clients" ?
> >
> > If multiple requests come into your webapp, and your webapp is
> > multithreaded, you will not see a performance increase.
> >
> > If your webapp runs a different process for each request, you will see
> > a performance increase because the RPC connection will not be shared
> > and consequently will not block on the giant lock. That is why I
> > recommended splitting up your job using Map/Reduce.
> >
> > > 3) 64MB for single region server is a minimum size or could be less ?
> >
> > It could be less, but that is the default block size for the Hadoop DFS.
> > If you make it smaller, you might want to change the default block size
> > for Hadoop as well.
> >
> > > 4) When is planed to fix the RPC lock for concurrent operations
> > > in single client ?
> >
> > This change is targeted for somewhere in the next 6 months according
> > to the roadmap.
> >
> >
> > > Thank You Again and Best Regards.
> > >
> > >
> > > On Thu, Oct 2, 2008 at 10:30 PM, Jim Kellerman (POWERSET) <
> > > Jim.Kellerman@microsoft.com> wrote:
> > >
> > > > What you are storing is 140,000,000 bytes, so having multiple
> > > > region servers will not help you as a single region is only
> > > > served by a single region server. By default, regions split
> > > > when they reach 256MB. So until the region splits, all traffic
> > > > will go to a single region server. You might try reducing the
> > > > maximum file size to encourage region splitting by changing the
> > > > value of hbase.hregion.max.filesize to 64MB.
> > > >
> > > > Using a single client will also limit write performance.
> > > > Even if the client is multi-threaded, there is a big giant lock
> > > > in the RPC mechanism which prevents concurrent requests (This
> > > > is something we plan to fix in the future).
> > > >
> > > > Multiple clients do not block against one another the way multi-
> > > > threaded clients do currently. So another way to increase
> > > > write performance would be to run multiple (HBase, not web) clients,
> > > > by either running multiple processes directly, or by utilizing
> > > > a Map/Reduce job to do the writes.
> > > >
> > > > ---
> > > > Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > > > Sent: Thursday, October 02, 2008 12:07 PM
> > > > > To: hbase-user@hadoop.apache.org
> > > > > Subject: Re: Hbase / Hadoop Tuning
> > > > >
> > > > > Hi.Thank you for quick response.
> > > > > We are using 7 machines (6 RedHat 5 and 1 is SuSe interprise 10).
> > > > > Each machine is : 4 CPU with 4gb ram and 200gb HD, connected with
> 1gb
> > > > > network interface.
> > > > > All machines in the same rec. On one machine (master) we are
> running
> > > > > Tomcat
> > > > > with one webapp
> > > > > that is adding 100000 rows. Nothing else is running. When no webapp
> > > > > running
> > > > > the CPU load is less the 1%.
> > > > >
> > > > > We are using Hbase 0.18.0 and Hadoop 0.18.0.
> > > > > Hbase cluster is one master and 6 region servers.
> > > > >
> > > > > Row addition is done by BatchUpdate and commint into single column
> > > > family.
> > > > > The data is simple bytes array (1400 bytes each row).
> > > > >
> > > > >
> > > > > Thank You and Best Regards.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Oct 2, 2008 at 9:39 PM, stack <st...@duboce.net> wrote:
> > > > >
> > > > > > Tell us more Slava.  HBase versions and how many regions you have
> > in
> > > > > your
> > > > > > cluster?
> > > > > >
> > > > > > If small rows, your best boost will likely come when we support
> > > > batching
> > > > > of
> > > > > > updates: HBASE-748.
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > > >
> > > > > >
> > > > > > Slava Gorelik wrote:
> > > > > >
> > > > > >> Hi All.
> > > > > >> Our environment - 8 Datanodes (1 is also Namenode),
> > > > > >> 7 from them is also region servers and 1 is Master, default
> > > > replication
> > > > > -
> > > > > >> 3.
> > > > > >> We have application that heavy writes with relative small rows -
> > > about
> > > > > >> 10Kb,
> > > > > >> current performance is 100000 rows in 580000 Milisec - 5.8
> Milisec
> > > /
> > > > > row.
> > > > > >> Is there any way to improve this performance by some tuning /
> > > tweaking
> > > > > >> HBase
> > > > > >> or Hadoop ?
> > > > > >>
> > > > > >> Thank You and Best Regards.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >
> > > > > >
> > > >
> >
>
>

RE: Hbase / Hadoop Tuning

Posted by Jonathan Gray <jl...@streamy.com>.
In this case, it would definitely hurt your performance.

One question.  Have you done more detailed timings to determine where time
is spent?  With the overhead of your webapp, and it streaming insertions one
row at a time, is it possible that a significant amount of time is being
spent before or after the HBase commit? (Significant in this case could be
1-2 ms/row.)

JG
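One way to get those per-phase timings is to bracket the row-building work and the commit separately inside the loop. A minimal sketch follows; the empty commit() is a stand-in for the real 0.18-era HTable.commit(BatchUpdate) call, which is where the RPC would happen:

```java
public class PhaseTimer {
    // Stand-in for HTable.commit(BatchUpdate); in the real webapp this is the RPC.
    static void commit(byte[] row) { }

    // Times n iterations, split into "build row" nanos and "commit" nanos.
    static long[] time(int n) {
        long buildNanos = 0, commitNanos = 0;
        for (int i = 0; i < n; i++) {
            long t0 = System.nanoTime();
            byte[] row = ("row-" + i).getBytes();  // everything done before the commit
            long t1 = System.nanoTime();
            commit(row);                           // the HBase call itself
            long t2 = System.nanoTime();
            buildNanos += t1 - t0;
            commitNanos += t2 - t1;
        }
        return new long[] { buildNanos, commitNanos };
    }

    public static void main(String[] args) {
        long[] t = time(100_000);
        System.out.printf("build %.1f ms, commit %.1f ms%n", t[0] / 1e6, t[1] / 1e6);
    }
}
```

If the commit bucket accounts for nearly all of the 5.8 ms/row, the overhead really is in the serialized RPC path rather than the webapp.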

-----Original Message-----
From: Slava Gorelik [mailto:slava.gorelik@gmail.com] 
Sent: Thursday, October 02, 2008 1:12 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Hbase / Hadoop Tuning

Thank You.
According to doing write in MR jobs, the problem is that rows are coming to
webapp one by one and i can't accumulate them into
one big batch update, it means i need to run MR job for each single row, in
this case will MR jobs help ?

Best Regards.

On Thu, Oct 2, 2008 at 10:58 PM, Jim Kellerman (POWERSET) <
Jim.Kellerman@microsoft.com> wrote:

> Responses inline below.
> > -----Original Message-----
> > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > Sent: Thursday, October 02, 2008 12:39 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Hbase / Hadoop Tuning
> >
> > Thank You Jim for a quick answer.
> > 1) If i understand correct, using 2 clients should allow me improve
> > the performance twice (more or less) ?
>
> I don't know if you will get 2x performance, but it will be greater than
> 1x.
>
> > 2) Currently, our webapp is HBase client using Htable - is that what you
> > meant, when you said "(HBase, not web) clients" ?
>
> If multiple requests come into your webapp, and your webapp is
> multithreaded, you will not see a performance increase.
>
> If your webapp runs a different process for each request, you will see
> a performance increase because the RPC connection will not be shared
> and consequently will not block on the giant lock. That is why I
> recommended splitting up your job using Map/Reduce.
>
> > 3) 64MB for single region server is a minimum size or could be less ?
>
> It could be less, but that is the default block size for the Hadoop DFS.
> If you make it smaller, you might want to change the default block size
> for Hadoop as well.
>
> > 4) When is planed to fix the RPC lock for concurrent operations
> > in single client ?
>
> This change is targeted for somewhere in the next 6 months according
> to the roadmap.
>
>
> > Thank You Again and Best Regards.
> >
> >
> > On Thu, Oct 2, 2008 at 10:30 PM, Jim Kellerman (POWERSET) <
> > Jim.Kellerman@microsoft.com> wrote:
> >
> > > What you are storing is 140,000,000 bytes, so having multiple
> > > region servers will not help you as a single region is only
> > > served by a single region server. By default, regions split
> > > when they reach 256MB. So until the region splits, all traffic
> > > will go to a single region server. You might try reducing the
> > > maximum file size to encourage region splitting by changing the
> > > value of hbase.hregion.max.filesize to 64MB.
> > >
> > > Using a single client will also limit write performance.
> > > Even if the client is multi-threaded, there is a big giant lock
> > > in the RPC mechanism which prevents concurrent requests (This
> > > is something we plan to fix in the future).
> > >
> > > Multiple clients do not block against one another the way multi-
> > > threaded clients do currently. So another way to increase
> > > write performance would be to run multiple (HBase, not web) clients,
> > > by either running multiple processes directly, or by utilizing
> > > a Map/Reduce job to do the writes.
> > >
> > > ---
> > > Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> > >
> > >
> > > > -----Original Message-----
> > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > > Sent: Thursday, October 02, 2008 12:07 PM
> > > > To: hbase-user@hadoop.apache.org
> > > > Subject: Re: Hbase / Hadoop Tuning
> > > >
> > > > Hi.Thank you for quick response.
> > > > We are using 7 machines (6 RedHat 5 and 1 is SuSe interprise 10).
> > > > Each machine is : 4 CPU with 4gb ram and 200gb HD, connected with
1gb
> > > > network interface.
> > > > All machines in the same rec. On one machine (master) we are running
> > > > Tomcat
> > > > with one webapp
> > > > that is adding 100000 rows. Nothing else is running. When no webapp
> > > > running
> > > > the CPU load is less the 1%.
> > > >
> > > > We are using Hbase 0.18.0 and Hadoop 0.18.0.
> > > > Hbase cluster is one master and 6 region servers.
> > > >
> > > > Row addition is done by BatchUpdate and commint into single column
> > > family.
> > > > The data is simple bytes array (1400 bytes each row).
> > > >
> > > >
> > > > Thank You and Best Regards.
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Oct 2, 2008 at 9:39 PM, stack <st...@duboce.net> wrote:
> > > >
> > > > > Tell us more Slava.  HBase versions and how many regions you have
> in
> > > > your
> > > > > cluster?
> > > > >
> > > > > If small rows, your best boost will likely come when we support
> > > batching
> > > > of
> > > > > updates: HBASE-748.
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > > Slava Gorelik wrote:
> > > > >
> > > > >> Hi All.
> > > > >> Our environment - 8 Datanodes (1 is also Namenode),
> > > > >> 7 from them is also region servers and 1 is Master, default
> > > replication
> > > > -
> > > > >> 3.
> > > > >> We have application that heavy writes with relative small rows -
> > about
> > > > >> 10Kb,
> > > > >> current performance is 100000 rows in 580000 Milisec - 5.8
Milisec
> > /
> > > > row.
> > > > >> Is there any way to improve this performance by some tuning /
> > tweaking
> > > > >> HBase
> > > > >> or Hadoop ?
> > > > >>
> > > > >> Thank You and Best Regards.
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > > >
> > >
>


Re: Hbase / Hadoop Tuning

Posted by Slava Gorelik <sl...@gmail.com>.
Thank You.
Regarding doing the writes in MR jobs: the problem is that rows come to the
webapp one by one and I can't accumulate them into one big batch update, so
I would need to run an MR job for each single row. In that case, will MR
jobs help?

Best Regards.

On Thu, Oct 2, 2008 at 10:58 PM, Jim Kellerman (POWERSET) <
Jim.Kellerman@microsoft.com> wrote:

> Responses inline below.
> > -----Original Message-----
> > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > Sent: Thursday, October 02, 2008 12:39 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Hbase / Hadoop Tuning
> >
> > Thank You Jim for a quick answer.
> > 1) If i understand correct, using 2 clients should allow me improve
> > the performance twice (more or less) ?
>
> I don't know if you will get 2x performance, but it will be greater than
> 1x.
>
> > 2) Currently, our webapp is HBase client using Htable - is that what you
> > meant, when you said "(HBase, not web) clients" ?
>
> If multiple requests come into your webapp, and your webapp is
> multithreaded, you will not see a performance increase.
>
> If your webapp runs a different process for each request, you will see
> a performance increase because the RPC connection will not be shared
> and consequently will not block on the giant lock. That is why I
> recommended splitting up your job using Map/Reduce.
>
> > 3) 64MB for single region server is a minimum size or could be less ?
>
> It could be less, but that is the default block size for the Hadoop DFS.
> If you make it smaller, you might want to change the default block size
> for Hadoop as well.
>
> > 4) When is planed to fix the RPC lock for concurrent operations
> > in single client ?
>
> This change is targeted for somewhere in the next 6 months according
> to the roadmap.
>
>
> > Thank You Again and Best Regards.
> >
> >
> > On Thu, Oct 2, 2008 at 10:30 PM, Jim Kellerman (POWERSET) <
> > Jim.Kellerman@microsoft.com> wrote:
> >
> > > What you are storing is 140,000,000 bytes, so having multiple
> > > region servers will not help you as a single region is only
> > > served by a single region server. By default, regions split
> > > when they reach 256MB. So until the region splits, all traffic
> > > will go to a single region server. You might try reducing the
> > > maximum file size to encourage region splitting by changing the
> > > value of hbase.hregion.max.filesize to 64MB.
> > >
> > > Using a single client will also limit write performance.
> > > Even if the client is multi-threaded, there is a big giant lock
> > > in the RPC mechanism which prevents concurrent requests (This
> > > is something we plan to fix in the future).
> > >
> > > Multiple clients do not block against one another the way multi-
> > > threaded clients do currently. So another way to increase
> > > write performance would be to run multiple (HBase, not web) clients,
> > > by either running multiple processes directly, or by utilizing
> > > a Map/Reduce job to do the writes.
> > >
> > > ---
> > > Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> > >
> > >
> > > > -----Original Message-----
> > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > > Sent: Thursday, October 02, 2008 12:07 PM
> > > > To: hbase-user@hadoop.apache.org
> > > > Subject: Re: Hbase / Hadoop Tuning
> > > >
> > > > Hi.Thank you for quick response.
> > > > We are using 7 machines (6 RedHat 5 and 1 is SuSe interprise 10).
> > > > Each machine is : 4 CPU with 4gb ram and 200gb HD, connected with 1gb
> > > > network interface.
> > > > All machines in the same rec. On one machine (master) we are running
> > > > Tomcat
> > > > with one webapp
> > > > that is adding 100000 rows. Nothing else is running. When no webapp
> > > > running
> > > > the CPU load is less the 1%.
> > > >
> > > > We are using Hbase 0.18.0 and Hadoop 0.18.0.
> > > > Hbase cluster is one master and 6 region servers.
> > > >
> > > > Row addition is done by BatchUpdate and commint into single column
> > > family.
> > > > The data is simple bytes array (1400 bytes each row).
> > > >
> > > >
> > > > Thank You and Best Regards.
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Oct 2, 2008 at 9:39 PM, stack <st...@duboce.net> wrote:
> > > >
> > > > > Tell us more Slava.  HBase versions and how many regions you have
> in
> > > > your
> > > > > cluster?
> > > > >
> > > > > If small rows, your best boost will likely come when we support
> > > batching
> > > > of
> > > > > updates: HBASE-748.
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > > Slava Gorelik wrote:
> > > > >
> > > > >> Hi All.
> > > > >> Our environment - 8 Datanodes (1 is also Namenode),
> > > > >> 7 from them is also region servers and 1 is Master, default
> > > replication
> > > > -
> > > > >> 3.
> > > > >> We have application that heavy writes with relative small rows -
> > about
> > > > >> 10Kb,
> > > > >> current performance is 100000 rows in 580000 Milisec - 5.8 Milisec
> > /
> > > > row.
> > > > >> Is there any way to improve this performance by some tuning /
> > tweaking
> > > > >> HBase
> > > > >> or Hadoop ?
> > > > >>
> > > > >> Thank You and Best Regards.
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > > >
> > >
>

RE: Hbase / Hadoop Tuning

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.
Responses inline below.
> -----Original Message-----
> From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> Sent: Thursday, October 02, 2008 12:39 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Hbase / Hadoop Tuning
>
> Thank You Jim for a quick answer.
> 1) If i understand correct, using 2 clients should allow me improve
> the performance twice (more or less) ?

I don't know if you will get 2x performance, but it will be greater than 1x.

> 2) Currently, our webapp is HBase client using Htable - is that what you
> meant, when you said "(HBase, not web) clients" ?

If multiple requests come into your webapp, and your webapp is multithreaded, you will not see a performance increase.

If your webapp runs a different process for each request, you will see
a performance increase because the RPC connection will not be shared
and consequently will not block on the giant lock. That is why I
recommended splitting up your job using Map/Reduce.

> 3) 64MB for single region server is a minimum size or could be less ?

It could be less, but that is the default block size for the Hadoop DFS.
If you make it smaller, you might want to change the default block size
for Hadoop as well.
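If you do lower the region size below the DFS block size, the matching Hadoop override would look something like this in hadoop-site.xml (0.18-era property name; the 32MB value here is purely illustrative, not a recommendation from this thread):

```xml
<!-- hadoop-site.xml: lower the DFS block size to match smaller regions (illustrative value) -->
<property>
  <name>dfs.block.size</name>
  <value>33554432</value> <!-- 32 * 1024 * 1024 bytes -->
</property>
```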

> 4) When is planed to fix the RPC lock for concurrent operations
> in single client ?

This change is targeted for somewhere in the next 6 months according
to the roadmap.


> Thank You Again and Best Regards.
>
>
> On Thu, Oct 2, 2008 at 10:30 PM, Jim Kellerman (POWERSET) <
> Jim.Kellerman@microsoft.com> wrote:
>
> > What you are storing is 140,000,000 bytes, so having multiple
> > region servers will not help you as a single region is only
> > served by a single region server. By default, regions split
> > when they reach 256MB. So until the region splits, all traffic
> > will go to a single region server. You might try reducing the
> > maximum file size to encourage region splitting by changing the
> > value of hbase.hregion.max.filesize to 64MB.
> >
> > Using a single client will also limit write performance.
> > Even if the client is multi-threaded, there is a big giant lock
> > in the RPC mechanism which prevents concurrent requests (This
> > is something we plan to fix in the future).
> >
> > Multiple clients do not block against one another the way multi-
> > threaded clients do currently. So another way to increase
> > write performance would be to run multiple (HBase, not web) clients,
> > by either running multiple processes directly, or by utilizing
> > a Map/Reduce job to do the writes.
> >
> > ---
> > Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> >
> >
> > > -----Original Message-----
> > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > Sent: Thursday, October 02, 2008 12:07 PM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: Re: Hbase / Hadoop Tuning
> > >
> > > Hi.Thank you for quick response.
> > > We are using 7 machines (6 RedHat 5 and 1 is SuSe interprise 10).
> > > Each machine is : 4 CPU with 4gb ram and 200gb HD, connected with 1gb
> > > network interface.
> > > All machines in the same rec. On one machine (master) we are running
> > > Tomcat
> > > with one webapp
> > > that is adding 100000 rows. Nothing else is running. When no webapp
> > > running
> > > the CPU load is less the 1%.
> > >
> > > We are using Hbase 0.18.0 and Hadoop 0.18.0.
> > > Hbase cluster is one master and 6 region servers.
> > >
> > > Row addition is done by BatchUpdate and commint into single column
> > family.
> > > The data is simple bytes array (1400 bytes each row).
> > >
> > >
> > > Thank You and Best Regards.
> > >
> > >
> > >
> > >
> > > On Thu, Oct 2, 2008 at 9:39 PM, stack <st...@duboce.net> wrote:
> > >
> > > > Tell us more Slava.  HBase versions and how many regions you have in
> > > your
> > > > cluster?
> > > >
> > > > If small rows, your best boost will likely come when we support
> > batching
> > > of
> > > > updates: HBASE-748.
> > > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > > Slava Gorelik wrote:
> > > >
> > > >> Hi All.
> > > >> Our environment - 8 Datanodes (1 is also Namenode),
> > > >> 7 from them is also region servers and 1 is Master, default
> > replication
> > > -
> > > >> 3.
> > > >> We have application that heavy writes with relative small rows -
> about
> > > >> 10Kb,
> > > >> current performance is 100000 rows in 580000 Milisec - 5.8 Milisec
> /
> > > row.
> > > >> Is there any way to improve this performance by some tuning /
> tweaking
> > > >> HBase
> > > >> or Hadoop ?
> > > >>
> > > >> Thank You and Best Regards.
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> >

Re: Hbase / Hadoop Tuning

Posted by Slava Gorelik <sl...@gmail.com>.
Thank You Jim for the quick answer.
1) If I understand correctly, using 2 clients should roughly double the
performance (more or less)?
2) Currently, our webapp is an HBase client using HTable - is that what you
meant when you said "(HBase, not web) clients"?
3) Is 64MB a minimum size for a single region server, or could it be less?
4) When is the fix planned for the RPC lock that prevents concurrent operations
from a single client?

Thank You Again and Best Regards.


On Thu, Oct 2, 2008 at 10:30 PM, Jim Kellerman (POWERSET) <
Jim.Kellerman@microsoft.com> wrote:

> What you are storing is 140,000,000 bytes, so having multiple
> region servers will not help you as a single region is only
> served by a single region server. By default, regions split
> when they reach 256MB. So until the region splits, all traffic
> will go to a single region server. You might try reducing the
> maximum file size to encourage region splitting by changing the
> value of hbase.hregion.max.filesize to 64MB.
>
> Using a single client will also limit write performance.
> Even if the client is multi-threaded, there is a big giant lock
> in the RPC mechanism which prevents concurrent requests (This
> is something we plan to fix in the future).
>
> Multiple clients do not block against one another the way multi-
> threaded clients do currently. So another way to increase
> write performance would be to run multiple (HBase, not web) clients,
> by either running multiple processes directly, or by utilizing
> a Map/Reduce job to do the writes.
>
> ---
> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>
>
> > -----Original Message-----
> > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > Sent: Thursday, October 02, 2008 12:07 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Hbase / Hadoop Tuning
> >
> > Hi.Thank you for quick response.
> > We are using 7 machines (6 RedHat 5 and 1 is SuSe interprise 10).
> > Each machine is : 4 CPU with 4gb ram and 200gb HD, connected with 1gb
> > network interface.
> > All machines in the same rec. On one machine (master) we are running
> > Tomcat
> > with one webapp
> > that is adding 100000 rows. Nothing else is running. When no webapp
> > running
> > the CPU load is less the 1%.
> >
> > We are using Hbase 0.18.0 and Hadoop 0.18.0.
> > Hbase cluster is one master and 6 region servers.
> >
> > Row addition is done by BatchUpdate and commint into single column
> family.
> > The data is simple bytes array (1400 bytes each row).
> >
> >
> > Thank You and Best Regards.
> >
> >
> >
> >
> > On Thu, Oct 2, 2008 at 9:39 PM, stack <st...@duboce.net> wrote:
> >
> > > Tell us more Slava.  HBase versions and how many regions you have in
> > your
> > > cluster?
> > >
> > > If small rows, your best boost will likely come when we support
> batching
> > of
> > > updates: HBASE-748.
> > >
> > > St.Ack
> > >
> > >
> > >
> > > Slava Gorelik wrote:
> > >
> > >> Hi All.
> > >> Our environment - 8 Datanodes (1 is also Namenode),
> > >> 7 from them is also region servers and 1 is Master, default
> replication
> > -
> > >> 3.
> > >> We have application that heavy writes with relative small rows - about
> > >> 10Kb,
> > >> current performance is 100000 rows in 580000 Milisec - 5.8 Milisec /
> > row.
> > >> Is there any way to improve this performance by some tuning / tweaking
> > >> HBase
> > >> or Hadoop ?
> > >>
> > >> Thank You and Best Regards.
> > >>
> > >>
> > >>
> > >
> > >
>

RE: Hbase / Hadoop Tuning

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.
What you are storing is 140,000,000 bytes, so having multiple
region servers will not help you as a single region is only
served by a single region server. By default, regions split
when they reach 256MB. So until the region splits, all traffic
will go to a single region server. You might try reducing the
maximum file size to encourage region splitting by changing the
value of hbase.hregion.max.filesize to 64MB.
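The override Jim describes would go in hbase-site.xml; a sketch (the property name is from his message, and the value is 64MB expressed in bytes, assuming the setting takes bytes):

```xml
<!-- hbase-site.xml: encourage earlier region splits (64MB instead of the 256MB default) -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>67108864</value> <!-- 64 * 1024 * 1024 bytes -->
</property>
```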

Using a single client will also limit write performance.
Even if the client is multi-threaded, there is a big giant lock
in the RPC mechanism which prevents concurrent requests (This
is something we plan to fix in the future).

Multiple clients do not block against one another the way multi-
threaded clients do currently. So another way to increase
write performance would be to run multiple (HBase, not web) clients,
by either running multiple processes directly, or by utilizing
a Map/Reduce job to do the writes.

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)


> -----Original Message-----
> From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> Sent: Thursday, October 02, 2008 12:07 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Hbase / Hadoop Tuning
>
> Hi.Thank you for quick response.
> We are using 7 machines (6 RedHat 5 and 1 is SuSe interprise 10).
> Each machine is : 4 CPU with 4gb ram and 200gb HD, connected with 1gb
> network interface.
> All machines in the same rec. On one machine (master) we are running
> Tomcat
> with one webapp
> that is adding 100000 rows. Nothing else is running. When no webapp
> running
> the CPU load is less the 1%.
>
> We are using Hbase 0.18.0 and Hadoop 0.18.0.
> Hbase cluster is one master and 6 region servers.
>
> Row addition is done by BatchUpdate and commint into single column family.
> The data is simple bytes array (1400 bytes each row).
>
>
> Thank You and Best Regards.
>
>
>
>
> On Thu, Oct 2, 2008 at 9:39 PM, stack <st...@duboce.net> wrote:
>
> > Tell us more Slava.  HBase versions and how many regions you have in
> your
> > cluster?
> >
> > If small rows, your best boost will likely come when we support batching
> of
> > updates: HBASE-748.
> >
> > St.Ack
> >
> >
> >
> > Slava Gorelik wrote:
> >
> >> Hi All.
> >> Our environment - 8 Datanodes (1 is also Namenode),
> >> 7 from them is also region servers and 1 is Master, default replication
> -
> >> 3.
> >> We have application that heavy writes with relative small rows - about
> >> 10Kb,
> >> current performance is 100000 rows in 580000 Milisec - 5.8 Milisec /
> row.
> >> Is there any way to improve this performance by some tuning / tweaking
> >> HBase
> >> or Hadoop ?
> >>
> >> Thank You and Best Regards.
> >>
> >>
> >>
> >
> >

Re: Hbase / Hadoop Tuning

Posted by Slava Gorelik <sl...@gmail.com>.
Hi. Thank you for the quick response.
We are using 7 machines (6 running RedHat 5 and 1 running SuSE Enterprise 10).
Each machine has 4 CPUs with 4GB RAM and a 200GB HD, connected via a 1Gb
network interface.
All machines are in the same rack. On one machine (the master) we are running
Tomcat with one webapp
that adds 100000 rows. Nothing else is running. When the webapp is not running,
the CPU load is less than 1%.

We are using HBase 0.18.0 and Hadoop 0.18.0.
The HBase cluster is one master and 6 region servers.

Row addition is done via BatchUpdate and commit into a single column family.
The data is a simple byte array (1400 bytes per row).
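The write path described above looks roughly like the following sketch against the HBase 0.18 client API; the table name "mytable" and column "data:payload" are hypothetical stand-ins, and a running cluster is required:

```java
// Minimal sketch of the per-row write path (HBase 0.18 API).
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class WriteSketch {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        byte[] payload = new byte[1400];                  // 1,400-byte row payload
        for (int i = 0; i < 100000; i++) {
            BatchUpdate bu = new BatchUpdate("row-" + i); // one BatchUpdate per row
            bu.put("data:payload", payload);              // single column family "data"
            table.commit(bu);                             // one RPC per row -- no cross-row batching in 0.18
        }
    }
}
```

Because each commit() is a separate round trip through the shared client connection, this loop serializes on the RPC lock discussed elsewhere in the thread.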


Thank You and Best Regards.
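The figures quoted in this thread can be cross-checked with a little arithmetic: 100000 rows of 1400 bytes each is the 140,000,000 bytes Jim mentions below, and 580000 ms over 100000 rows works out to 5.8 ms per row, i.e. roughly 172 rows/sec:

```java
// Cross-check of the throughput numbers discussed in this thread.
public class ThroughputMath {
    static double msPerRow(long rows, long totalMs) {
        return (double) totalMs / rows;
    }

    static double rowsPerSec(long rows, long totalMs) {
        return rows / (totalMs / 1000.0);
    }

    public static void main(String[] args) {
        long rows = 100000;       // rows written
        long totalMs = 580000;    // total wall-clock time in ms
        long rowBytes = 1400;     // payload per row

        System.out.println("total bytes : " + rows * rowBytes);         // 140000000
        System.out.println("ms per row  : " + msPerRow(rows, totalMs)); // 5.8
        System.out.printf("rows per sec: %.1f%n", rowsPerSec(rows, totalMs)); // ~172.4
    }
}
```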




On Thu, Oct 2, 2008 at 9:39 PM, stack <st...@duboce.net> wrote:

> Tell us more Slava.  HBase versions and how many regions you have in your
> cluster?
>
> If small rows, your best boost will likely come when we support batching of
> updates: HBASE-748.
>
> St.Ack
>
>
>
> Slava Gorelik wrote:
>
>> Hi All.
>> Our environment - 8 Datanodes (1 is also Namenode),
>> 7 from them is also region servers and 1 is Master, default replication -
>> 3.
>> We have application that heavy writes with relative small rows - about
>> 10Kb,
>> current performance is 100000 rows in 580000 Milisec - 5.8 Milisec / row.
>> Is there any way to improve this performance by some tuning / tweaking
>> HBase
>> or Hadoop ?
>>
>> Thank You and Best Regards.
>>
>>
>>
>
>

Re: Hbase / Hadoop Tuning

Posted by stack <st...@duboce.net>.
Tell us more, Slava.  Which HBase version, and how many regions do you have in
your cluster?

If small rows, your best boost will likely come when we support batching 
of updates: HBASE-748.

St.Ack


Slava Gorelik wrote:
> Hi All.
> Our environment - 8 Datanodes (1 is also Namenode),
> 7 from them is also region servers and 1 is Master, default replication - 3.
> We have application that heavy writes with relative small rows - about
> 10Kb,
> current performance is 100000 rows in 580000 Milisec - 5.8 Milisec / row.
> Is there any way to improve this performance by some tuning / tweaking HBase
> or Hadoop ?
>
> Thank You and Best Regards.
>
>