Posted to user@hbase.apache.org by Harold Lim <ro...@yahoo.com> on 2011/05/30 18:43:35 UTC

How to improve HBase throughput with YCSB?

Hi All,

I have an HBase cluster on EC2 m1.large instances (10 region servers). I'm trying to run a read-only YCSB workload, but I can't get good throughput: it saturates at around 600 operations per second.

My dataset is around 200GB (~1k+ regions). Running major compaction and also setting the handler count to 100 helped improve the performance a little bit. 

Are there settings or configurations that I need to change?

Thanks,
Harold


Re: How to improve HBase throughput with YCSB?

Posted by Harold Lim <ro...@yahoo.com>.
Hi Ted,

I increased the max client connections for ZooKeeper to a very large number. However, when I start YCSB with more than 30 threads, it continuously prints:
11/06/03 01:53:47 INFO zookeeper.ClientCnxn: Session establishment complete on server xxxxx:2181, sessionid = 0x13053f70e3536b1, negotiated timeout = 180000
11/06/03 01:53:47 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=xxxx:2181 sessionTimeout=180000 watcher=hconnection
11/06/03 01:53:47 INFO zookeeper.ClientCnxn: Opening socket connection to server xxxxx:2181



-Harold





--- On Thu, 6/2/11, Ted Dunning <td...@maprtech.com> wrote:

> From: Ted Dunning <td...@maprtech.com>
> Subject: Re: How to improve HBase throughput with YCSB?
> To: user@hbase.apache.org
> Date: Thursday, June 2, 2011, 2:48 AM
> Yeah.. there is a bug on that.
> 
> I am spacing the number right now.  And I have to run.
> 

Re: How to improve HBase throughput with YCSB?

Posted by Ted Dunning <td...@maprtech.com>.
Yeah.. there is a bug on that.

I am spacing the number right now.  And I have to run.

On Wed, Jun 1, 2011 at 11:42 PM, Harold Lim <ro...@yahoo.com> wrote:

> I'm running HBase 0.90.2.
>
>
> -Harold
>

Re: How to improve HBase throughput with YCSB?

Posted by Harold Lim <ro...@yahoo.com>.
Hi St.Ack,

In my setup, ZooKeeper is managed by HBase. I'll try increasing maxClientCnxns.




Thanks,
Harold
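
When ZooKeeper is managed by HBase, as described above, the connection limit is normally raised in hbase-site.xml rather than in zoo.cfg. The fragment below is a minimal sketch, assuming the standard pass-through property hbase.zookeeper.property.maxClientCnxns and the value of 1000 suggested in this thread. The right value for a given cluster is not something this thread establishes, and the cluster typically needs a restart before the change takes effect.

```xml
<!-- hbase-site.xml: sketch only; 1000 is the value suggested in this thread -->
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>1000</value>
</property>
```

The limit applies per client IP address, so a single YCSB host that opens one ZooKeeper connection per thread can plausibly reach a low limit with only a handful of threads.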

--- On Thu, 6/2/11, Stack <st...@duboce.net> wrote:

> From: Stack <st...@duboce.net>
> Subject: Re: How to improve HBase throughput with YCSB?
> To: user@hbase.apache.org
> Date: Thursday, June 2, 2011, 11:34 AM
> It looks like you are managing zk yourself?  By default zk only
> allows 10 connections.  Up it to 1000 for now.  The setting is maxClientCnxns.
> St.Ack
> 

Re: How to improve HBase throughput with YCSB?

Posted by Stack <st...@duboce.net>.
It looks like you are managing zk yourself?  By default zk only
allows 10 connections.  Up it to 1000 for now.  The setting is maxClientCnxns.
St.Ack

On Wed, Jun 1, 2011 at 11:42 PM, Harold Lim <ro...@yahoo.com> wrote:
> I'm running HBase 0.90.2.
>
>
> -Harold
>

Re: How to improve HBase throughput with YCSB?

Posted by Harold Lim <ro...@yahoo.com>.
I'm running HBase 0.90.2.


-Harold

--- On Thu, 6/2/11, Ted Dunning <td...@maprtech.com> wrote:

> From: Ted Dunning <td...@maprtech.com>
> Subject: Re: How to improve HBase throughput with YCSB?
> To: user@hbase.apache.org
> Date: Thursday, June 2, 2011, 2:34 AM
> Zookeeper has an internal limit on number of connections.
> 
> Which version of hbase are you running?
> 

Re: How to improve HBase throughput with YCSB?

Posted by Ted Dunning <td...@maprtech.com>.
Zookeeper has an internal limit on number of connections.

Which version of hbase are you running?

On Wed, Jun 1, 2011 at 11:20 PM, Harold Lim <ro...@yahoo.com> wrote:

> Hi Ted,
>
> For some reason, when I try the forked version of YCSB, I can't seem to
> launch more than 10 threads. I start getting the following errors:
>
>
>
> 11/06/02 02:17:35 INFO zookeeper.ZooKeeper: Initiating client connection,
> connectString=xxxxxx:2181 sessionTimeout=180000 watcher=hconnection
> 11/06/02 02:17:35 INFO zookeeper.ClientCnxn: Opening socket connection to
> server xxxxxx:2181
> 11/06/02 02:17:35 INFO zookeeper.ClientCnxn: Socket connection established
> to xxxxxx:2181, initiating session
> 11/06/02 02:17:35 WARN zookeeper.ClientCnxn: Session 0x0 for server
> xxxxxx:2181, unexpected error, closing socket connection and attempting
> reconnect
> java.io.IOException: Connection reset by peer
>        at sun.nio.ch.FileDispatcher.read0(Native Method)
>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>        at sun.nio.ch.IOUtil.read(IOUtil.java:169)
>        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:858)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1130)
>
>
> Thanks,
> Harold

Re: How to improve HBase throughput with YCSB?

Posted by Harold Lim <ro...@yahoo.com>.
Hi Ted,

For some reason, when I try the forked version of YCSB, I can't seem to launch more than 10 threads. I start getting the following errors:



11/06/02 02:17:35 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=xxxxxx:2181 sessionTimeout=180000 watcher=hconnection
11/06/02 02:17:35 INFO zookeeper.ClientCnxn: Opening socket connection to server xxxxxx:2181
11/06/02 02:17:35 INFO zookeeper.ClientCnxn: Socket connection established to xxxxxx:2181, initiating session
11/06/02 02:17:35 WARN zookeeper.ClientCnxn: Session 0x0 for server xxxxxx:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcher.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
	at sun.nio.ch.IOUtil.read(IOUtil.java:169)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
	at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:858)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1130)


Thanks,
Harold
--- On Tue, 5/31/11, Ted Dunning <td...@maprtech.com> wrote:

> From: Ted Dunning <td...@maprtech.com>
> Subject: Re: How to improve HBase throughput with YCSB?
> To: user@hbase.apache.org
> Date: Tuesday, May 31, 2011, 2:22 AM
> It may make it better.
> 
> We should have an update shortly that will allow multiple machines to
> participate in generating load.  A single YCSB is sufficient to stress a few
> nodes but once you get to 10 or more (especially with MapR underneath) you
> really need a cluster to generate the load.
> 
> The synchronization strategy is very simple.  We load up as usual and wait
> for a file to appear in Zookeeper.  When it appears, the load turns on.
> 

Re: How to improve HBase throughput with YCSB?

Posted by Ted Dunning <td...@maprtech.com>.
It may make it better.

We should have an update shortly that will allow multiple machines to
participate in generating load.  A single YCSB is sufficient to stress a few
nodes but once you get to 10 or more (especially with MapR underneath) you
really need a cluster to generate the load.

The synchronization strategy is very simple.  We load up as usual and wait
for a file to appear in Zookeeper.  When it appears, the load turns on.
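
The start-gate idea described above (set everything up, then wait for a marker to appear before generating load) can be sketched in a single JVM. This is only an analogue: a file in a local temporary directory stands in for the ZooKeeper node, and the class name, method names, and polling interval are invented for illustration. It is not the code from the YCSB fork.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a "wait for a marker, then start" gate using a local file.
final class StartGate {
    // Block until the marker file exists, polling at a fixed interval.
    static void awaitMarker(Path marker, long pollMillis) throws InterruptedException {
        while (!Files.exists(marker)) {
            Thread.sleep(pollMillis);
        }
    }

    // Start `workers` threads that wait on the marker, create the marker,
    // and return how many workers reached their load phase.
    static int runDemo(int workers) {
        try {
            Path dir = Files.createTempDirectory("startgate");
            Path marker = dir.resolve("go");
            AtomicInteger started = new AtomicInteger();
            List<Thread> threads = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                Thread t = new Thread(() -> {
                    try {
                        awaitMarker(marker, 10);
                        started.incrementAndGet(); // the load phase would begin here
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                t.start();
                threads.add(t);
            }
            int before = started.get(); // marker is still absent, so no worker has started
            Files.createFile(marker);   // the "go" signal releases every waiting worker
            for (Thread t : threads) {
                t.join();
            }
            Files.delete(marker);
            Files.delete(dir);
            return before == 0 ? started.get() : -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4));
    }
}
```

Polling is the simplest choice for a sketch. With a real ZooKeeper ensemble a watch on the node would avoid the polling delay, at the cost of a dependency on the ZooKeeper client.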

On Mon, May 30, 2011 at 10:08 PM, Harold Lim <ro...@yahoo.com> wrote:

> I also see that you have a forked version of YCSB, will that make my
> performance better?
>

Re: How to improve HBase throughput with YCSB?

Posted by Harold Lim <ro...@yahoo.com>.
Hi Ted,

I haven't tried with bigger instances yet. Those are my next steps.

I also see that you have a forked version of YCSB. Will that improve my performance?


Thanks,
Harold





--- On Tue, 5/31/11, Ted Dunning <td...@maprtech.com> wrote:

> From: Ted Dunning <td...@maprtech.com>
> Subject: Re: How to improve HBase throughput with YCSB?
> To: user@hbase.apache.org
> Date: Tuesday, May 31, 2011, 12:38 AM
> What happens if you increase heap space to 8GB on an m1.xlarge or
> m2.2xlarge?
> 
> 

Re: How to improve HBase throughput with YCSB?

Posted by Ted Dunning <td...@maprtech.com>.
What happens if you increase heap space to 8GB on an m1.xlarge or
m2.2xlarge?


On Mon, May 30, 2011 at 8:50 PM, Harold Lim <ro...@yahoo.com> wrote:

> Hi Lohit,
>
> I'm running HBase 0.90.2. 10 x ec2 m1.large instances. I set the heap size
> to 4GB and handler count for hbase, and dfs to 100. I also set the dfs max
> xcievers to 4096
>
> I'm running a pure random read YCSB workload.
>
> I also tried running multiple clients from multiple ec2 instances, but that
> just degrades the throughput of each client. I also tried increasing the
> number of threads and it doesn't seem to help.
>
> Below is the output I get from YCSB:
>
> YCSB Client 0.1
> Command line: -t -db com.yahoo.ycsb.db.HBaseClient -P
> workloads/workloadstar-100_0 -p columnfamily=data -p operationcount=120000
> -s -threads 50 -target 600
> [OVERALL], RunTime(ms), 246398.0
> [OVERALL], Throughput(ops/sec), 487.01694007256555
> [READ], Operations, 120000
> [READ], AverageLatency(ms), 70.07661666666667
> [READ], MinLatency(ms), 0
> [READ], MaxLatency(ms), 2779
> [READ], 95thPercentileLatency(ms), 393
> [READ], 99thPercentileLatency(ms), 855
> [READ], Return=0, 120000
>
> Thanks,
> Harold
>
>

Re: How to improve HBase throughput with YCSB?

Posted by Harold Lim <ro...@yahoo.com>.
Hi Lohit,

I'm running HBase 0.90.2 on 10 x EC2 m1.large instances. I set the heap size to 4GB and the handler count for both HBase and DFS to 100. I also set the DFS max xcievers to 4096.

I'm running a pure random read YCSB workload.

I also tried running multiple clients from multiple ec2 instances, but that just degrades the throughput of each client. I also tried increasing the number of threads and it doesn't seem to help.

Below is the output I get from YCSB:

YCSB Client 0.1
Command line: -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadstar-100_0 -p columnfamily=data -p operationcount=120000 -s -threads 50 -target 600
[OVERALL], RunTime(ms), 246398.0
[OVERALL], Throughput(ops/sec), 487.01694007256555
[READ], Operations, 120000
[READ], AverageLatency(ms), 70.07661666666667
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 2779
[READ], 95thPercentileLatency(ms), 393
[READ], 99thPercentileLatency(ms), 855
[READ], Return=0, 120000

Thanks,
Harold
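
As a rough sanity check on the figures above, Little's law relates throughput, average latency, and the number of requests in flight. The sketch below plugs in the reported throughput and average read latency. The class and method names are made up for illustration, and the calculation ignores the -target throttle and any client-side overhead.

```java
// Little's law: requests in flight = throughput (ops/s) * average latency (s).
final class LittlesLaw {
    // Average number of requests outstanding at any moment.
    static double inFlight(double opsPerSec, double avgLatencyMs) {
        return opsPerSec * (avgLatencyMs / 1000.0);
    }

    // Upper bound on throughput if every thread always has one request outstanding.
    static double maxOpsPerSec(int threads, double avgLatencyMs) {
        return threads / (avgLatencyMs / 1000.0);
    }

    public static void main(String[] args) {
        double ops = 487.01694007256555;      // [OVERALL] Throughput(ops/sec)
        double latencyMs = 70.07661666666667; // [READ] AverageLatency(ms)
        System.out.printf("in flight: %.1f%n", inFlight(ops, latencyMs));
        System.out.printf("ceiling with 50 threads: %.0f ops/s%n",
                maxOpsPerSec(50, latencyMs));
    }
}
```

With these inputs roughly 34 requests are in flight on average, and 50 always-busy threads at about 70 ms per read would top out a little above 700 ops/s. At this latency, adding client threads raises throughput only if latency does not rise along with it.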


--- On Mon, 5/30/11, lohit <lo...@gmail.com> wrote:

> From: lohit <lo...@gmail.com>
> Subject: Re: How to improve HBase throughput with YCSB?
> To: user@hbase.apache.org
> Date: Monday, May 30, 2011, 3:38 PM
> Hello Harold,
> 
> Can you share with us what kind of throughput you are seeing,
> i.e. the number of ops/sec and the read latency?
> Also, what version of HBase are you running?
> 
> Thanks,
> Lohit
> 

Re: How to improve HBase throughput with YCSB?

Posted by Jeff Whiting <je...@qualtrics.com>.
I saw large increases in performance with YCSB by enabling bloom filters. (Sorry, it has been too
long for me to remember how much, but it was significant.)

~Jeff
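
A bloom filter helps random reads because it lets the region server skip store files that cannot contain the requested row, which ties in with the point quoted below about a read touching several hfiles. The following is a toy illustration of the data structure only, not HBase's implementation. The class name, bit-array size, and hash scheme are arbitrary choices.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Toy bloom filter: answers "definitely absent" or "possibly present".
final class TinyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    TinyBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(String key, int i) {
        int h1 = 0x811C9DC5; // FNV-1a offset basis
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h1 ^= (b & 0xff);
            h1 *= 16777619;  // FNV-1a prime
        }
        int h2 = key.hashCode() | 1; // force an odd step so the positions vary with i
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(position(key, i));
        }
    }

    // false means the key was never added; true means it may have been.
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloomFilter filter = new TinyBloomFilter(1024, 3);
        filter.add("user1000");
        filter.add("user2000");
        // Added keys are never reported absent.
        System.out.println(filter.mightContain("user1000"));
        // Usually false for a key that was not added; true would be a false positive.
        System.out.println(filter.mightContain("user9999"));
    }
}
```

A file whose filter answers false for a row key does not need to be read for that key, which is where the saved disk seeks come from.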

On 6/1/2011 8:41 AM, Ted Dunning wrote:
> Potentially.  Hbase may need to read several locations to access your data
> since it effectively overlays multiple hfiles.

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com


Re: How to improve HBase throughput with YCSB?

Posted by Ted Dunning <td...@maprtech.com>.
Answers in-line.

On Wed, Jun 1, 2011 at 12:42 AM, Harold Lim <ro...@yahoo.com> wrote:

> Hi Ted,
>
> > You appear to be running on about 10 disks total.
> > Each disk should be
> > capable of about 100 ops per second but they appear to be
> > doing about 70.
> >  This is plausible overhead.
>
>
> Each c1.xlarge instance has 4 ephemeral disk. However, I forgot to modify
> my script to mount the other 2 ephemeral disk and add them to dfs.data.dir.
> So, it should be running on 20 disks total. That would make it 100 ops per
> second vs 35 ops per second? Is that still a plausible overhead?
>

Potentially.  Hbase may need to read several locations to access your data
since it effectively overlays multiple hfiles.


> Is there a performance difference between adding the 4 disks to
> dfs.data.dir and setting up a raid-0 of the 4 ephemeral disks with a single
> location for dfs.data.dir?
>

I would avoid raid-0.

> > Uniform random can be a reasonably good approximation if you are running
> > behind a cache large enough to cache all repeated accesses.  If you aren't
> > behind a cache, uniform access might be very unrealistic (and pessimistic).
> >
> > Do you have logs that you can use to model your actual read behaviors?
> >
>
> Right now, I'm just playing with completely uniformly random. However, I
> have also tried a Zipf distribution and the throughput seems to saturate at
> around 1.2k ops per second.
>

Harumph.

What about data that prefers recently accessed keys?

Re: How to improve HBase throughput with YCSB?

Posted by Harold Lim <ro...@yahoo.com>.
Hi Ted,

> You appear to be running on about 10 disks total.  Each disk should be
> capable of about 100 ops per second but they appear to be doing about 70.
> This is plausible overhead.


Each c1.xlarge instance has 4 ephemeral disks. However, I forgot to modify my script to mount the other 2 ephemeral disks and add them to dfs.data.dir. So, it is actually running on 20 disks total. That would make it 100 ops per second vs. 35 ops per second? Is that still plausible overhead?

Is there a performance difference between adding the 4 disks to dfs.data.dir and setting up a raid-0 of the 4 ephemeral disks with a single location for dfs.data.dir?


I'll also try your suggestion of using multiple EBS stores.


> 
> Is your actual load going to be completely uniformly random?  Or will there
> be a Zipf distribution?  Will there be bursts of repeated accesses?
> 
> Uniform random can be a reasonably good approximation if you are running
> behind a cache large enough to cache all repeated accesses.  If you aren't
> behind a cache, uniform access might be very unrealistic (and pessimistic).
> 
> Do you have logs that you can use to model your actual read behaviors?
> 

Right now, I'm just playing with completely uniformly random. However, I have also tried a Zipf distribution and the throughput seems to saturate at around 1.2k ops per second.

I actually don't have logs to model my read behaviors. I'm using HBase as part of my research project.

Thanks,
Harold  



Re: How to improve HBase throughput with YCSB?

Posted by Ted Dunning <td...@maprtech.com>.
Woof.

Of course.

Harold,

You appear to be running on about 10 disks total.  Each disk should be
capable of about 100 ops per second but they appear to be doing about 70.
 This is plausible overhead.
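
The back-of-the-envelope arithmetic here can be written out as a small sketch. It assumes the roughly 700 reads per second reported for the c1.xlarge cluster, 10 nodes, and the nominal 100 random ops per second per disk mentioned above; the function name is made up for illustration, and the two-disks-per-node case reflects the correction given elsewhere in this thread.

```python
def ops_per_disk(cluster_ops_per_sec, nodes, disks_per_node):
    """Average random-read operations per second that each disk is serving."""
    return cluster_ops_per_sec / (nodes * disks_per_node)


NOMINAL_OPS_PER_DISK = 100   # rough figure for one spinning disk, per the thread
OBSERVED_CLUSTER_OPS = 700   # reads per second reported on c1.xlarge
NODES = 10

for disks_per_node in (1, 2):
    per_disk = ops_per_disk(OBSERVED_CLUSTER_OPS, NODES, disks_per_node)
    share = per_disk / NOMINAL_OPS_PER_DISK
    print(f"{disks_per_node} disk(s) per node: {per_disk:.0f} ops/s per disk "
          f"({share:.0%} of nominal)")
```

If each client read costs more than one disk seek, for example because several files must be consulted, the per-disk figure understates how busy the disks really are.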

Try attaching 5 or 10 small EBS partitions to each of your nodes and use
them in HDFS.  That may substantially increase your maximum IOP rate and
thus your read rate.

Is your actual load going to be completely uniformly random?  Or will there
be a Zipf distribution?  Will there be bursts of repeated accesses?

Uniform random can be a reasonably good approximation if you are running
behind a cache large enough to cache all repeated accesses.  If you aren't
behind a cache, uniform access might be very unrealistic (and pessimistic).

Do you have logs that you can use to model your actual read behaviors?
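
To illustrate why the access distribution matters so much when a cache is involved, the following is a rough simulation sketch, not a model of HBase's block cache. It replays uniform and Zipf-like key sequences through a small LRU cache and compares hit rates. The key-space size, cache size, and Zipf exponent of 1 are arbitrary assumptions chosen only for illustration.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def lru_hit_rate(keys, cache_size):
    """Replay a key sequence through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for key in keys:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict the least recently used key
    return hits / len(keys)


rng = random.Random(42)   # fixed seed so runs are repeatable
NUM_KEYS = 10_000         # hypothetical key space
NUM_READS = 50_000
CACHE_SIZE = 500          # cache holds 5% of the keys

uniform_reads = [rng.randrange(NUM_KEYS) for _ in range(NUM_READS)]

# Zipf-like popularity: key k is drawn with weight 1 / (k + 1).
cum_weights = list(accumulate(1.0 / (k + 1) for k in range(NUM_KEYS)))
zipf_reads = rng.choices(range(NUM_KEYS), cum_weights=cum_weights, k=NUM_READS)

uniform_rate = lru_hit_rate(uniform_reads, CACHE_SIZE)
zipf_rate = lru_hit_rate(zipf_reads, CACHE_SIZE)
print(f"uniform hit rate: {uniform_rate:.2%}")
print(f"zipf hit rate:    {zipf_rate:.2%}")
```

Under uniform access the cache can serve only about as many reads as the fraction of keys it holds, so nearly every read goes to disk; a skewed workload keeps the hot keys cached and sends far fewer reads to disk.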


On Tue, May 31, 2011 at 10:00 PM, Harold Lim <ro...@yahoo.com> wrote:

> Hi Andrew,
>
> I tried running on c1.xlarge instances and the performance improved a
> little bit but the throughput is still low. I can now get throughput of 700+
> read operations per second (up from 400-500+). I was hoping to get
> throughput in the order of thousands.
>
> I was wondering if there is something wrong with my set-up or is it normal
> for HBase running on ec2 instances to get low throughput numbers?
>
>
> -Harold
>
>
> --- On Mon, 5/30/11, Andrew Purtell <ap...@apache.org> wrote:
>
> > From: Andrew Purtell <ap...@apache.org>
> > Subject: Re: How to improve HBase throughput with YCSB?
> > To: user@hbase.apache.org
> > Date: Monday, May 30, 2011, 8:33 PM
> > The hypervisor steals a lot of CPU
> > time from m1.large instances.  You should be using
> > c1.xlarge instances.
> >
> > Are you using local storage or EBS?
> >
> > Be aware that I/O performance on EC2 for any system is
> > lower than if you are using real hardware, significantly so
> > if not using one of the instance types with I/O performance
> > listed as "high".
> >
> > > 2011/5/30 Harold Lim <ro...@yahoo.com>
> > >
> > > > Hi All,
> > > >
> > > > I have an HBase cluster on ec2 m1.large instance (10
> > > > region servers).
> >
> >
>

Re: How to improve HBase throughput with YCSB?

Posted by Harold Lim <ro...@yahoo.com>.
Hi Andrew,

I tried running on c1.xlarge instances and the performance improved a little bit but the throughput is still low. I can now get throughput of 700+ read operations per second (up from 400-500+). I was hoping to get throughput in the order of thousands.

I was wondering if there is something wrong with my set-up or is it normal for HBase running on ec2 instances to get low throughput numbers?


-Harold


--- On Mon, 5/30/11, Andrew Purtell <ap...@apache.org> wrote:

> From: Andrew Purtell <ap...@apache.org>
> Subject: Re: How to improve HBase throughput with YCSB?
> To: user@hbase.apache.org
> Date: Monday, May 30, 2011, 8:33 PM
> The hypervisor steals a lot of CPU
> time from m1.large instances.  You should be using
> c1.xlarge instances.
> 
> Are you using local storage or EBS?
> 
> Be aware that I/O performance on EC2 for any system is
> lower than if you are using real hardware, significantly so
> if not using one of the instance types with I/O performance
> listed as "high". 
> 
> > 2011/5/30 Harold Lim <ro...@yahoo.com>
> > 
> > > Hi All,
> > >
> > > I have an HBase cluster on ec2 m1.large instance (10
> > > region servers).
> 
> 

Re: How to improve HBase throughput with YCSB?

Posted by Harold Lim <ro...@yahoo.com>.
Hi Andrew,

Is this normal behavior on m1.large instances?  Does m1.xlarge work?

I am using the local storage of the instances (ephemeral disk in EC2 terminology).
I picked m1.large because that was the "smallest" type of instance that has a high I/O performance listed.


Thanks,
Harold


--- On Mon, 5/30/11, Andrew Purtell <ap...@apache.org> wrote:

> From: Andrew Purtell <ap...@apache.org>
> Subject: Re: How to improve HBase throughput with YCSB?
> To: user@hbase.apache.org
> Date: Monday, May 30, 2011, 8:33 PM
> The hypervisor steals a lot of CPU
> time from m1.large instances.  You should be using
> c1.xlarge instances.
> 
> Are you using local storage or EBS?
> 
> Be aware that I/O performance on EC2 for any system is
> lower than if you are using real hardware, significantly so
> if not using one of the instance types with I/O performance
> listed as "high". 
> 
> > 2011/5/30 Harold Lim <ro...@yahoo.com>
> > 
> > > Hi All,
> > >
> > > I have an HBase cluster on ec2 m1.large instance (10
> > > region servers).
> 
> 

Re: How to improve HBase throughput with YCSB?

Posted by Andrew Purtell <ap...@apache.org>.
The hypervisor steals a lot of CPU time from m1.large instances.  You should be using c1.xlarge instances.

Are you using local storage or EBS?

Be aware that I/O performance on EC2 for any system is lower than if you are using real hardware, significantly so if not using one of the instance types with I/O performance listed as "high". 

> 2011/5/30 Harold Lim <ro...@yahoo.com>
> 
> > Hi All,
> >
> > I have an HBase cluster on ec2 m1.large instance (10
> > region servers).


Re: How to improve HBase throughput with YCSB?

Posted by lohit <lo...@gmail.com>.
Hello Harold,

Can you share with us what kind of throughput you are seeing?
In particular, the number of ops/sec and the read latency.
Also, what version of hbase are you running?

Thanks,
Lohit

2011/5/30 Harold Lim <ro...@yahoo.com>

> Hi All,
>
> I have an HBase cluster on ec2 m1.large instance (10 region servers). I'm
> trying to run a read-only YCSB workload. It seems that I can't get a good
> throughput. It saturates to around 600+ operations per second.
>
> My dataset is around 200GB (~1k+ regions). Running major compaction and
> also setting the handler count to 100 helped improve the performance a
> little bit.
>
> Are there setting or configurations that I need to set?
>
> Thanks,
> Harold
>
>


-- 
Have a Nice Day!
Lohit

Re: How to improve HBase throughput with YCSB?

Posted by Harold Lim <ro...@yahoo.com>.
Hi Ted,

I ran iostat on my region server, and it seems that the read requests are imbalanced across the disks.


Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvdap1            0.40         0.00         0.00          0          0
xvdb            429.00        11.65         0.07        116          0
xvdc            166.50         4.54         0.00         45          0
xvdd            395.20        10.78         0.00        107          0
xvde            168.10         4.53         0.00         45          0

I checked my other region servers and they are showing the same behavior.
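
For anyone who wants to quantify the imbalance rather than eyeball it, here is a small sketch that parses the iostat output above and reports each data disk's share of the read traffic. Treating xvdap1 as the root device and excluding it is an assumption, and the helper name is made up for illustration.

```python
IOSTAT_OUTPUT = """\
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvdap1            0.40         0.00         0.00          0          0
xvdb            429.00        11.65         0.07        116          0
xvdc            166.50         4.54         0.00         45          0
xvdd            395.20        10.78         0.00        107          0
xvde            168.10         4.53         0.00         45          0
"""


def read_rates(iostat_text):
    """Map device name to its MB_read/s value, skipping the header line."""
    rates = {}
    for line in iostat_text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0] == "Device:":
            continue
        rates[fields[0]] = float(fields[2])   # third column is MB_read/s
    return rates


rates = read_rates(IOSTAT_OUTPUT)
# Assumption: xvdap1 is the root device and holds no HDFS data.
data_disks = {dev: rate for dev, rate in rates.items() if dev != "xvdap1"}
total = sum(data_disks.values())

for dev, rate in sorted(data_disks.items()):
    print(f"{dev}: {rate:5.2f} MB/s ({rate / total:.0%} of reads)")

imbalance = max(data_disks.values()) / min(data_disks.values())
print(f"busiest disk reads {imbalance:.1f}x as much as the least busy one")
```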


Thanks,
Harold

--- On Tue, 5/31/11, Ted Dunning <td...@maprtech.com> wrote:

> From: Ted Dunning <td...@maprtech.com>
> Subject: Re: How to improve HBase throughput with YCSB?
> To: user@hbase.apache.org
> Date: Tuesday, May 31, 2011, 2:23 AM
> Try iostat or if you are running it, try ganglia.
> 
> On Mon, May 30, 2011 at 10:07 PM, Harold Lim <ro...@yahoo.com> wrote:
> 
> > How do I know how much data is moving from the disk?
> >
> 

Re: How to improve HBase throughput with YCSB?

Posted by Ted Dunning <td...@maprtech.com>.
Try iostat or if you are running it, try ganglia.

On Mon, May 30, 2011 at 10:07 PM, Harold Lim <ro...@yahoo.com> wrote:

> How do I know how much data is moving from the disk?
>

Re: How to improve HBase throughput with YCSB?

Posted by Harold Lim <ro...@yahoo.com>.
Hi Ted,

I read all fields in the record. I was trying to get performance similar to what is reported in the YCSB paper.

How do I know how much data is moving from the disk?

-Harold
--- On Tue, 5/31/11, Ted Dunning <td...@maprtech.com> wrote:

> From: Ted Dunning <td...@maprtech.com>
> Subject: Re: How to improve HBase throughput with YCSB?
> To: user@hbase.apache.org
> Date: Tuesday, May 31, 2011, 12:34 AM
> How large are the reads?
> 
> Have you tried this on a better instance type such as was suggested a bit
> ago?
> 
> How much data is moving from the disks?
> 
> 
> On Mon, May 30, 2011 at 8:46 PM, Harold Lim <ro...@yahoo.com> wrote:
> 
> > Hi Ted,
> >
> > It's a pure random read operation.
> >
> >
> > -Harold
> > --- On Mon, 5/30/11, Ted Dunning <td...@maprtech.com> wrote:
> >
> > > From: Ted Dunning <td...@maprtech.com>
> > > Subject: Re: How to improve HBase throughput with YCSB?
> > > To: user@hbase.apache.org
> > > Date: Monday, May 30, 2011, 3:07 PM
> > > What kind of operations?
> > >
> > > On Mon, May 30, 2011 at 9:43 AM, Harold Lim <ro...@yahoo.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I have an HBase cluster on ec2 m1.large instance (10 region servers). I'm
> > > > trying to run a read-only YCSB workload. It seems that I can't get a good
> > > > throughput. It saturates to around 600+ operations per second.
> > > >
> > > > My dataset is around 200GB (~1k+ regions). Running major compaction and
> > > > also setting the handler count to 100 helped improve the performance a
> > > > little bit.
> > > >
> > > > Are there setting or configurations that I need to set?
> > > > Thanks,
> > > > Harold
> > > >
> > > >
> > >
> >
> 

Re: How to improve HBase throughput with YCSB?

Posted by Ted Dunning <td...@maprtech.com>.
How large are the reads?

Have you tried this on a better instance type such as was suggested a bit
ago?

How much data is moving from the disks?


On Mon, May 30, 2011 at 8:46 PM, Harold Lim <ro...@yahoo.com> wrote:

> Hi Ted,
>
> It's a pure random read operation.
>
>
> -Harold
> --- On Mon, 5/30/11, Ted Dunning <td...@maprtech.com> wrote:
>
> > From: Ted Dunning <td...@maprtech.com>
> > Subject: Re: How to improve HBase throughput with YCSB?
> > To: user@hbase.apache.org
> > Date: Monday, May 30, 2011, 3:07 PM
> > What kind of operations?
> >
> > On Mon, May 30, 2011 at 9:43 AM, Harold Lim <ro...@yahoo.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > I have an HBase cluster on ec2 m1.large instance (10 region servers). I'm
> > > trying to run a read-only YCSB workload. It seems that I can't get a good
> > > throughput. It saturates to around 600+ operations per second.
> > >
> > > My dataset is around 200GB (~1k+ regions). Running major compaction and
> > > also setting the handler count to 100 helped improve the performance a
> > > little bit.
> > >
> > > Are there setting or configurations that I need to set?
> > >
> > > Thanks,
> > > Harold
> > >
> > >
> >
>

Re: How to improve HBase throughput with YCSB?

Posted by Harold Lim <ro...@yahoo.com>.
Hi Ted,

It's a pure random read operation.


-Harold
--- On Mon, 5/30/11, Ted Dunning <td...@maprtech.com> wrote:

> From: Ted Dunning <td...@maprtech.com>
> Subject: Re: How to improve HBase throughput with YCSB?
> To: user@hbase.apache.org
> Date: Monday, May 30, 2011, 3:07 PM
> What kind of operations?
> 
> On Mon, May 30, 2011 at 9:43 AM, Harold Lim <ro...@yahoo.com>
> wrote:
> 
> > Hi All,
> >
> > I have an HBase cluster on ec2 m1.large instance (10 region servers). I'm
> > trying to run a read-only YCSB workload. It seems that I can't get a good
> > throughput. It saturates to around 600+ operations per second.
> >
> > My dataset is around 200GB (~1k+ regions). Running major compaction and
> > also setting the handler count to 100 helped improve the performance a
> > little bit.
> >
> > Are there setting or configurations that I need to set?
> >
> > Thanks,
> > Harold
> >
> >
> 

Re: How to improve HBase throughput with YCSB?

Posted by Ted Dunning <td...@maprtech.com>.
What kind of operations?

On Mon, May 30, 2011 at 9:43 AM, Harold Lim <ro...@yahoo.com> wrote:

> Hi All,
>
> I have an HBase cluster on ec2 m1.large instance (10 region servers). I'm
> trying to run a read-only YCSB workload. It seems that I can't get a good
> throughput. It saturates to around 600+ operations per second.
>
> My dataset is around 200GB (~1k+ regions). Running major compaction and
> also setting the handler count to 100 helped improve the performance a
> little bit.
>
> Are there setting or configurations that I need to set?
>
> Thanks,
> Harold
>
>