Posted to user@hbase.apache.org by llpind <so...@hotmail.com> on 2009/08/11 18:21:21 UTC

HBase in a real world application

As some of you know, I've been playing with HBase on/off for the past few
months.

I'd like your take on cluster setup/configuration settings that you've
found successful.  Also, any other thoughts on how I can make the case for
using HBase here.

Assume:  Working with ~2 TB of data.  A few very tall tables. Hadoop/HBase
0.20.0.

1. What specs should a master box have (CPU speed, disk, RAM)?  Should slave
boxes be different?
2. Recommended size of cluster?  I realize this depends on what
load/performance requirements we have, but I'd like to know your thoughts
based on the specs from #1.
3. Should ZooKeeper quorum members run on different boxes than the
regionservers?


Basically, if you could give some example cluster configurations with the
amount of data you're working with, that would be a lot of help (or point me
to a place where this has been discussed for 0.20).  Currently I don't have
the funds to play around with a lot of boxes, but I hope to soon.  :)  Thanks.



Re: HBase in a real world application

Posted by llpind <so...@hotmail.com>.
Thanks, stack.  Will try applying the patch.



stack-3 wrote:
> 
> Our writes were off by a factor of 7 or 8.  Writes should be better now
> (HBASE-1771).
> Thanks,
> St.Ack
> 
> 
> On Thu, Aug 13, 2009 at 4:53 PM, stack <st...@duboce.net> wrote:
> 
>> I just tried it.  It seems slow to me writing too.  Let me take a
>> look....
>> St.Ack
>>
>>
>> On Thu, Aug 13, 2009 at 10:06 AM, llpind <so...@hotmail.com> wrote:
>>
>>>
>>> Okay, I changed replication to 2 and removed "-XX:NewSize=6m
>>> -XX:MaxNewSize=6m".
>>>
>>> Here are the results for randomWrite with 3 clients:
>>>
>>>
>>>
>>> RandomWrite =================================================
>>>
>>> hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar \
>>>   --nomapred randomWrite 3
>>>
>>>
>>> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-0 Start
>>> randomWrite at offset 0 for 1048576 rows
>>> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-1 Start
>>> randomWrite at offset 1048576 for 1048576 rows
>>> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-2 Start
>>> randomWrite at offset 2097152 for 1048576 rows
>>> 09/08/13 09:51:47 INFO hbase.PerformanceEvaluation: client-0
>>> 0/104857/1048576
>>> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-1
>>> 1048576/1153427/2097152
>>> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-2
>>> 2097152/2201997/3145728
>>> 09/08/13 09:52:22 INFO hbase.PerformanceEvaluation: client-1
>>> 1048576/1258284/2097152
>>> 09/08/13 09:52:23 INFO hbase.PerformanceEvaluation: client-0
>>> 0/209714/1048576
>>> 09/08/13 09:52:24 INFO hbase.PerformanceEvaluation: client-2
>>> 2097152/2306854/3145728
>>> 09/08/13 09:52:47 INFO hbase.PerformanceEvaluation: client-1
>>> 1048576/1363141/2097152
>>> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-0
>>> 0/314571/1048576
>>> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-2
>>> 2097152/2411711/3145728
>>> 09/08/13 09:53:24 INFO hbase.PerformanceEvaluation: client-1
>>> 1048576/1467998/2097152
>>> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-0
>>> 0/419428/1048576
>>> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-2
>>> 2097152/2516568/3145728
>>> 09/08/13 09:53:48 INFO hbase.PerformanceEvaluation: client-1
>>> 1048576/1572855/2097152
>>> 09/08/13 09:54:08 INFO hbase.PerformanceEvaluation: client-2
>>> 2097152/2621425/3145728
>>> 09/08/13 09:54:10 INFO hbase.PerformanceEvaluation: client-0
>>> 0/524285/1048576
>>> 09/08/13 09:54:40 INFO hbase.PerformanceEvaluation: client-1
>>> 1048576/1677712/2097152
>>> 09/08/13 09:54:49 INFO hbase.PerformanceEvaluation: client-2
>>> 2097152/2726282/3145728
>>> 09/08/13 09:54:52 INFO hbase.PerformanceEvaluation: client-0
>>> 0/629142/1048576
>>> 09/08/13 09:55:57 INFO hbase.PerformanceEvaluation: client-1
>>> 1048576/1782569/2097152
>>> 09/08/13 09:56:21 INFO hbase.PerformanceEvaluation: client-2
>>> 2097152/2831139/3145728
>>> 09/08/13 09:56:41 INFO hbase.PerformanceEvaluation: client-0
>>> 0/733999/1048576
>>> 09/08/13 09:57:23 INFO hbase.PerformanceEvaluation: client-1
>>> 1048576/1887426/2097152
>>> 09/08/13 09:58:40 INFO hbase.PerformanceEvaluation: client-2
>>> 2097152/2935996/3145728
>>> 09/08/13 09:58:54 INFO hbase.PerformanceEvaluation: client-0
>>> 0/838856/1048576
>>> 09/08/13 10:00:29 INFO hbase.PerformanceEvaluation: client-1
>>> 1048576/1992283/2097152
>>> 09/08/13 10:01:01 INFO hbase.PerformanceEvaluation: client-2
>>> 2097152/3040853/3145728
>>> 09/08/13 10:01:24 INFO hbase.PerformanceEvaluation: client-0
>>> 0/943713/1048576
>>> 09/08/13 10:02:36 INFO hbase.PerformanceEvaluation: client-1
>>> 1048576/2097140/2097152
>>> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: client-1 Finished
>>> randomWrite in 680674ms at offset 1048576 for 1048576 rows
>>> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: Finished 1 in 680674ms
>>> writing 1048576 rows
>>> 09/08/13 10:03:19 INFO hbase.PerformanceEvaluation: client-2
>>> 2097152/3145710/3145728
>>> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: client-2 Finished
>>> randomWrite in 723771ms at offset 2097152 for 1048576 rows
>>> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: Finished 2 in 723771ms
>>> writing 1048576 rows
>>> 09/08/13 10:03:41 INFO hbase.PerformanceEvaluation: client-0
>>> 0/1048570/1048576
>>> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: client-0 Finished
>>> randomWrite in 746054ms at offset 0 for 1048576 rows
>>> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: Finished 0 in 746054ms
>>> writing 1048576 rows
>>>
>>>
>>>
>>> ============================================================
>>>
>>> Still pretty slow.  Any other ideas?  I'm running the client from the
>>> master box, but it's not running any regionservers or datanodes.
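>>>
>>> (Back-of-the-envelope: 1,048,576 rows in ~680,674 ms is roughly 1,540
>>> rows/second per client, and all three clients finished 3,145,728 rows
>>> inside ~746 seconds, so about 4,200 rows/second aggregate.)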
>>>
>>> stack-3 wrote:
>>> >
>>> > Your config. looks fine.
>>> >
>>> > Only thing that gives me pause is:
>>> >
>>> > "-XX:NewSize=6m -XX:MaxNewSize=6m"
>>> >
>>> > Any reason for the above?
>>> >
>>> > If you study your GC logs, lots of pauses?
>>> >
>>> > Oh, and this: replication is set to 6.  Why 6?  Each write must commit
>>> > to 6 datanodes before it completes.  In the tests posted on the wiki,
>>> > we replicate to 3 nodes.
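>>> >
>>> > (Replication here means the dfs.replication setting in the Hadoop site
>>> > config; to match the wiki tests it would presumably look like:
>>> >
>>> >   <property>
>>> >     <name>dfs.replication</name>
>>> >     <value>3</value>
>>> >   </property>
>>> > )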
>>> >
>>> > At the end of this message you say you are doing gets?  The numbers you
>>> > posted were for writes?
>>> >
>>> > St.Ack
>>> >
>>> >
>>> > On Wed, Aug 12, 2009 at 1:15 PM, llpind <so...@hotmail.com> wrote:
>>> >
>>> >>
>>> >> Not sure why my performance is so slow.  Here is my configuration:
>>> >>
>>> >> box1:
>>> >> 10395 SecondaryNameNode
>>> >> 11628 Jps
>>> >> 10131 NameNode
>>> >> 10638 HQuorumPeer
>>> >> 10705 HMaster
>>> >>
>>> >> box 2-5:
>>> >> 6741 HQuorumPeer
>>> >> 6841 HRegionServer
>>> >> 7881 Jps
>>> >> 6610 DataNode
>>> >>
>>> >>
>>> >> hbase site: =======================
>>> >> <?xml version="1.0"?>
>>> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>> >> <!--
>>> >> /**
>>> >>  * Copyright 2007 The Apache Software Foundation
>>> >>  *
>>> >>  * Licensed to the Apache Software Foundation (ASF) under one
>>> >>  * or more contributor license agreements.  See the NOTICE file
>>> >>  * distributed with this work for additional information
>>> >>  * regarding copyright ownership.  The ASF licenses this file
>>> >>  * to you under the Apache License, Version 2.0 (the
>>> >>  * "License"); you may not use this file except in compliance
>>> >>  * with the License.  You may obtain a copy of the License at
>>> >>  *
>>> >>  *     http://www.apache.org/licenses/LICENSE-2.0
>>> >>  *
>>> >>  * Unless required by applicable law or agreed to in writing, software
>>> >>  * distributed under the License is distributed on an "AS IS" BASIS,
>>> >>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>> >>  * See the License for the specific language governing permissions and
>>> >>  * limitations under the License.
>>> >>  */
>>> >> -->
>>> >> <configuration>
>>> >>  <property>
>>> >>    <name>hbase.rootdir</name>
>>> >>    <value>hdfs://box1:9000/hbase</value>
>>> >>    <description>The directory shared by region servers.
>>> >>    </description>
>>> >>  </property>
>>> >>  <property>
>>> >>    <name>hbase.master.port</name>
>>> >>    <value>60000</value>
>>> >>    <description>The port that the HBase master runs at.
>>> >>    </description>
>>> >>  </property>
>>> >>  <property>
>>> >>    <name>hbase.cluster.distributed</name>
>>> >>    <value>true</value>
>>> >>    <description>The mode the cluster will be in. Possible values are
>>> >>      false: standalone and pseudo-distributed setups with managed Zookeeper
>>> >>      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
>>> >>    </description>
>>> >>  </property>
>>> >>  <property>
>>> >>    <name>hbase.regionserver.lease.period</name>
>>> >>    <value>120000</value>
>>> >>    <description>HRegion server lease period in milliseconds. Default is
>>> >>    60 seconds. Clients must report in within this period else they are
>>> >>    considered dead.</description>
>>> >>  </property>
>>> >>
>>> >>  <property>
>>> >>      <name>hbase.zookeeper.property.clientPort</name>
>>> >>      <value>2222</value>
>>> >>      <description>Property from ZooKeeper's config zoo.cfg.
>>> >>      The port at which the clients will connect.
>>> >>      </description>
>>> >>  </property>
>>> >>  <property>
>>> >>      <name>hbase.zookeeper.property.dataDir</name>
>>> >>      <value>/home/hadoop/zookeeper</value>
>>> >>  </property>
>>> >>  <property>
>>> >>      <name>hbase.zookeeper.property.syncLimit</name>
>>> >>      <value>5</value>
>>> >>  </property>
>>> >>  <property>
>>> >>      <name>hbase.zookeeper.property.tickTime</name>
>>> >>      <value>2000</value>
>>> >>  </property>
>>> >>  <property>
>>> >>      <name>hbase.zookeeper.property.initLimit</name>
>>> >>      <value>10</value>
>>> >>  </property>
>>> >>  <property>
>>> >>      <name>hbase.zookeeper.quorum</name>
>>> >>      <value>box1,box2,box3,box4</value>
>>> >>      <description>Comma separated list of servers in the ZooKeeper Quorum.
>>> >>      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>>> >>      By default this is set to localhost for local and pseudo-distributed
>>> >>      modes of operation. For a fully-distributed setup, this should be set
>>> >>      to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set
>>> >>      in hbase-env.sh this is the list of servers which we will start/stop
>>> >>      ZooKeeper on.
>>> >>      </description>
>>> >>  </property>
>>> >>  <property>
>>> >>    <name>hfile.block.cache.size</name>
>>> >>    <value>.5</value>
>>> >>    <description>text</description>
>>> >>  </property>
>>> >>
>>> >> </configuration>
>>> >>
>>> >>
>>> >> hbase env:====================================================
>>> >>
>>> >> export HBASE_CLASSPATH=${HADOOP_CONF_DIR}
>>> >>
>>> >> export HBASE_HEAPSIZE=3000
>>> >>
>>> >> export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m
>>> >> -XX:+UseConcMarkSweepGC
>>> >> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>>> >> -XX:+CMSIncrementalMode
>>> >> -Xloggc:/home/hadoop/hbase-0.20.0/logs/gc-hbase.log"
>>> >>
>>> >> export HBASE_MANAGES_ZK=true
>>> >>
>>> >> Hadoop core site: ===========================================================
>>> >>
>>> >> <?xml version="1.0"?>
>>> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>> >>
>>> >> <!-- Put site-specific property overrides in this file. -->
>>> >>
>>> >> <configuration>
>>> >> <property>
>>> >>   <name>fs.default.name</name>
>>> >>   <value>hdfs://box1:9000</value>
>>> >>   <description>The name of the default file system.  A URI whose
>>> >>   scheme and authority determine the FileSystem implementation.  The
>>> >>   uri's scheme determines the config property (fs.SCHEME.impl) naming
>>> >>   the FileSystem implementation class.  The uri's authority is used to
>>> >>   determine the host, port, etc. for a filesystem.</description>
>>> >> </property>
>>> >> <property>
>>> >>  <name>hadoop.tmp.dir</name>
>>> >>  <value>/data/hadoop-0.20.0-${user.name}</value>
>>> >>  <description>A base for other temporary directories.</description>
>>> >> </property>
>>> >> </configuration>
>>> >>
>>> >> ==============
>>> >>
>>> >> replication is set to 6.
>>> >>
>>> >> hadoop env=================
>>> >>
>>> >> export HADOOP_HEAPSIZE=3000
>>> >> export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
>>> >> $HADOOP_NAMENODE_OPTS"
>>> >> export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote
>>> >> $HADOOP_SECONDARYNAMENODE_OPTS"
>>> >> export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote
>>> >> $HADOOP_DATANODE_OPTS"
>>> >> export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote
>>> >> $HADOOP_BALANCER_OPTS"
>>> >> export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote
>>> >> $HADOOP_JOBTRACKER_OPTS"
>>> >>  ==================
>>> >>
>>> >>
>>> >> Very basic setup.  Then I start the cluster and do simple random Get
>>> >> operations on a tall table (~60M rows):
>>> >>
>>> >> {NAME => 'tallTable', FAMILIES => [{NAME => 'family1', COMPRESSION =>
>>> >> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
>>> >> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
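>>> >>
>>> >> For reference, the reads are plain client-side Gets, along these lines
>>> >> (a minimal sketch against the 0.20 client API; the row-key format and
>>> >> counts here are made up for illustration):
>>> >>
>>> >> import java.util.Random;
>>> >> import org.apache.hadoop.hbase.HBaseConfiguration;
>>> >> import org.apache.hadoop.hbase.client.Get;
>>> >> import org.apache.hadoop.hbase.client.HTable;
>>> >> import org.apache.hadoop.hbase.client.Result;
>>> >> import org.apache.hadoop.hbase.util.Bytes;
>>> >>
>>> >> public class RandomGets {
>>> >>   public static void main(String[] args) throws Exception {
>>> >>     HTable table = new HTable(new HBaseConfiguration(), "tallTable");
>>> >>     Random rnd = new Random();
>>> >>     for (int i = 0; i < 10000; i++) {
>>> >>       // hypothetical key format; the real table holds ~60M rows
>>> >>       byte[] row = Bytes.toBytes(String.format("row%08d", rnd.nextInt(60000000)));
>>> >>       Get get = new Get(row);
>>> >>       get.addFamily(Bytes.toBytes("family1"));
>>> >>       Result result = table.get(get);
>>> >>     }
>>> >>   }
>>> >> }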
>>> >>
>>> >> Are these fairly normal speeds?  I'm unsure if this is a result of
>>> >> having a small cluster.  Please advise...
>>> >>
>>> >> stack-3 wrote:
>>> >> >
>>> >> > Yeah, seems slow.  In old hbase, it could do 5-10k writes a second
>>> >> > going by the performance eval page up on the wiki.  SequentialWrite
>>> >> > was about the same as RandomWrite.  Check out the stats on hardware
>>> >> > up on that page and the description of how the test was set up.  Can
>>> >> > you figure out where it's slow?
>>> >> >
>>> >> > St.Ack
>>> >> >
>>> >> > On Wed, Aug 12, 2009 at 10:10 AM, llpind <so...@hotmail.com> wrote:
>>> >> >
>>> >> >>
>>> >> >> Thanks Stack.
>>> >> >>
>>> >> >> I will try mapred with more clients.  I tried it without mapred using
>>> >> >> 3 clients doing random write operations; here is the output:
>>> >> >>
>>> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-0 Start
>>> >> >> randomWrite at offset 0 for 1048576 rows
>>> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-1 Start
>>> >> >> randomWrite at offset 1048576 for 1048576 rows
>>> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-2 Start
>>> >> >> randomWrite at offset 2097152 for 1048576 rows
>>> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-1
>>> >> >> 1048576/1153427/2097152
>>> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-2
>>> >> >> 2097152/2201997/3145728
>>> >> >> 09/08/12 09:24:25 INFO hbase.PerformanceEvaluation: client-0
>>> >> >> 0/104857/1048576
>>> >> >> 09/08/12 09:27:42 INFO hbase.PerformanceEvaluation: client-0
>>> >> >> 0/209714/1048576
>>> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-1
>>> >> >> 1048576/1258284/2097152
>>> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-2
>>> >> >> 2097152/2306854/3145728
>>> >> >> 09/08/12 09:32:32 INFO hbase.PerformanceEvaluation: client-1
>>> >> >> 1048576/1363141/2097152
>>> >> >> 09/08/12 09:32:33 INFO hbase.PerformanceEvaluation: client-0
>>> >> >> 0/314571/1048576
>>> >> >> 09/08/12 09:32:41 INFO hbase.PerformanceEvaluation: client-2
>>> >> >> 2097152/2411711/3145728
>>> >> >> 09/08/12 09:35:31 INFO hbase.PerformanceEvaluation: client-0
>>> >> >> 0/419428/1048576
>>> >> >> 09/08/12 09:35:34 INFO hbase.PerformanceEvaluation: client-1
>>> >> >> 1048576/1467998/2097152
>>> >> >> 09/08/12 09:35:53 INFO hbase.PerformanceEvaluation: client-2
>>> >> >> 2097152/2516568/3145728
>>> >> >> 09/08/12 09:39:02 INFO hbase.PerformanceEvaluation: client-0
>>> >> >> 0/524285/1048576
>>> >> >> 09/08/12 09:39:03 INFO hbase.PerformanceEvaluation: client-2
>>> >> >> 2097152/2621425/3145728
>>> >> >> 09/08/12 09:40:07 INFO hbase.PerformanceEvaluation: client-1
>>> >> >> 1048576/1572855/2097152
>>> >> >> 09/08/12 09:42:53 INFO hbase.PerformanceEvaluation: client-0
>>> >> >> 0/629142/1048576
>>> >> >> 09/08/12 09:44:25 INFO hbase.PerformanceEvaluation: client-2
>>> >> >> 2097152/2726282/3145728
>>> >> >> 09/08/12 09:44:44 INFO hbase.PerformanceEvaluation: client-1
>>> >> >> 1048576/1677712/2097152
>>> >> >> 09/08/12 09:46:43 INFO hbase.PerformanceEvaluation: client-0
>>> >> >> 0/733999/1048576
>>> >> >> 09/08/12 09:48:11 INFO hbase.PerformanceEvaluation: client-2
>>> >> >> 2097152/2831139/3145728
>>> >> >> 09/08/12 09:48:29 INFO hbase.PerformanceEvaluation: client-1
>>> >> >> 1048576/1782569/2097152
>>> >> >> 09/08/12 09:50:12 INFO hbase.PerformanceEvaluation: client-0
>>> >> >> 0/838856/1048576
>>> >> >> 09/08/12 09:52:47 INFO hbase.PerformanceEvaluation: client-2
>>> >> >> 2097152/2935996/3145728
>>> >> >> 09/08/12 09:53:51 INFO hbase.PerformanceEvaluation: client-1
>>> >> >> 1048576/1887426/2097152
>>> >> >> 09/08/12 09:56:32 INFO hbase.PerformanceEvaluation: client-0
>>> >> >> 0/943713/1048576
>>> >> >> 09/08/12 09:58:32 INFO hbase.PerformanceEvaluation: client-2
>>> >> >> 2097152/3040853/3145728
>>> >> >> 09/08/12 09:59:14 INFO hbase.PerformanceEvaluation: client-1
>>> >> >> 1048576/1992283/2097152
>>> >> >> 09/08/12 10:02:28 INFO hbase.PerformanceEvaluation: client-0
>>> >> >> 0/1048570/1048576
>>> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: client-0 Finished
>>> >> >> randomWrite in 2376615ms at offset 0 for 1048576 rows
>>> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: Finished 0 in
>>> >> >> 2376615ms writing 1048576 rows
>>> >> >> 09/08/12 10:06:35 INFO hbase.PerformanceEvaluation: client-2
>>> >> >> 2097152/3145710/3145728
>>> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: client-2 Finished
>>> >> >> randomWrite in 2623395ms at offset 2097152 for 1048576 rows
>>> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: Finished 2 in
>>> >> >> 2623395ms writing 1048576 rows
>>> >> >> 09/08/12 10:06:42 INFO hbase.PerformanceEvaluation: client-1
>>> >> >> 1048576/2097140/2097152
>>> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: client-1 Finished
>>> >> >> randomWrite in 2630199ms at offset 1048576 for 1048576 rows
>>> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: Finished 1 in
>>> >> >> 2630199ms writing 1048576 rows
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> Seems kind of slow for ~3M records.  I have a 4-node cluster up at the
>>> >> >> moment.  HMaster & NameNode are running on the same box.
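>>> >> >>
>>> >> >> (Back-of-the-envelope: 1,048,576 rows in ~2,376,615 ms is roughly 440
>>> >> >> rows/second per client, about 1,200 rows/second aggregate for the
>>> >> >> three clients.)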
>>> >> >>
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >
>>> >
>>>
>>>
>>
> 
> 



Re: HBase in a real world application

Posted by stack <st...@duboce.net>.
Done (I added you as a contributor in JIRA).
Thanks,
St.Ack

On Tue, Aug 18, 2009 at 12:34 PM, Schubert Zhang <zs...@gmail.com> wrote:

> stack, please assign HBASE-1778
> <https://issues.apache.org/jira/browse/HBASE-1778> to me.
>
> On Wed, Aug 19, 2009 at 3:16 AM, Schubert Zhang <zs...@gmail.com> wrote:
>
> > OK, stack. I will get it done as soon as possible.
> >
> >
> > On Wed, Aug 19, 2009 at 2:47 AM, stack <st...@duboce.net> wrote:
> >
> >> Can you make an issue and a patch please Schubert?
> >> St.Ack
> >>
> >> On Tue, Aug 18, 2009 at 10:52 AM, Schubert Zhang <zs...@gmail.com>
> >> wrote:
> >>
> >> > We found two issues with the PerformanceEvaluation class:
> >> > - It does not match hadoop-0.20.0.
> >> > - The approach to splitting the work across maps is not strict; correct
> >> > InputSplit and InputFormat classes need to be provided (see the sketch
> >> > below).
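> >> >
> >> > Roughly, the idea is to give each client map task an explicit split
> >> > carrying its row offset and row count. A minimal sketch of the shape
> >> > (against the 0.20 "mapreduce" API; this is illustrative only, not the
> >> > actual patch):
> >> >
> >> > import java.io.DataInput;
> >> > import java.io.DataOutput;
> >> > import java.io.IOException;
> >> > import org.apache.hadoop.io.Writable;
> >> > import org.apache.hadoop.mapreduce.InputSplit;
> >> >
> >> > // One split per PE client: a starting row offset plus a row count.
> >> > public class PeSplit extends InputSplit implements Writable {
> >> >   private long offset;
> >> >   private long rows;
> >> >   public PeSplit() {}                       // required for deserialization
> >> >   public PeSplit(long offset, long rows) {
> >> >     this.offset = offset;
> >> >     this.rows = rows;
> >> >   }
> >> >   public long getLength() { return rows; }  // used by the framework to sort splits
> >> >   public String[] getLocations() { return new String[0]; }
> >> >   public void write(DataOutput out) throws IOException {
> >> >     out.writeLong(offset);
> >> >     out.writeLong(rows);
> >> >   }
> >> >   public void readFields(DataInput in) throws IOException {
> >> >     offset = in.readLong();
> >> >     rows = in.readLong();
> >> >   }
> >> > }
> >> >
> >> > A matching InputFormat's getSplits() would then return one PeSplit per
> >> > client, so each map writes exactly its own row range.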
> >> >
> >> > And we have just modified the org.apache.hadoop.hbase.PerformanceEvaluation
> >> > class for our evaluations. Please get our code at:
> >> >
> >> > http://dl.getdropbox.com/u/24074/code/PerformanceEvaluation.java
> >> >
> >> > Here are our evaluations of 0.20.0 RC1:
> >> > http://docloud.blogspot.com/2009/08/hbase-0200-performance-evaluation.html
> >> >
> >> > Schubert
> >> >
> >> > On Tue, Aug 18, 2009 at 8:37 AM, Jeff Hammerbacher <hammer@cloudera.com> wrote:
> >> >
> >> > > Thanks guys. For the lazy (e.g. me) and future searchers, here are some
> >> > > links. The benchmark is meant to simulate the same performance tests
> >> > > quoted in Google's BigTable paper.
> >> > >
> >> > > * PerformanceEvaluation wiki page:
> >> > > http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
> >> > > * PerformanceEvaluation.java:
> >> > > http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/test/org/apache/hadoop/hbase/PerformanceEvaluation.java?view=co
> >> > >
> >> > > Thanks,
> >> > > Jeff
> >> > >
> >> > > On Mon, Aug 17, 2009 at 5:09 PM, stack <st...@duboce.net> wrote:
> >> > >
> >> > > > On Mon, Aug 17, 2009 at 4:54 PM, Jeff Hammerbacher <hammer@cloudera.com> wrote:
> >> > > >
> >> > > > > Hey Stack,
> >> > > > >
> >> > > > > I notice that the patch for this issue doesn't include any sort of
> >> > > > > tests that might have caught this regression. Do you guys have an
> >> > > > > HBaseBench, HBaseMix, or similarly named tool for catching
> >> > > > > performance regressions?
> >> > > > >
> >> > > >
> >> > > > Not as part of our build.  The way it's currently done is that near
> >> > > > release, we run our little PerformanceEvaluation doohickey.  If it's
> >> > > > way off, crack the profiler.
> >> > > >
> >> > > > We have been trying to get some of the hadoop allotment of EC2 time
> >> > > > so we could set up a regular run up on AWS, but no luck so far.
> >> > > >
> >> > > > Good on you, Jeff,
> >> > > > St.Ack
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Jeff
> >> > > > >
> >> > > > > On Mon, Aug 17, 2009 at 4:51 PM, stack <st...@duboce.net> wrote:
> >> > > > >
> >> > > > > > Our writes were off by a factor of 7 or 8.  Writes should be
> >> > > > > > better now (HBASE-1771).
> >> > > > > > Thanks,
> >> > > > > > St.Ack

Re: HBase in a real world application

Posted by Schubert Zhang <zs...@gmail.com>.
stack, please assign HBASE-1778
<https://issues.apache.org/jira/browse/HBASE-1778> to me.

>> > > > configuration:
>> > > > > > >> >>
>> > > > > > >> >> box1:
>> > > > > > >> >> 10395 SecondaryNameNode
>> > > > > > >> >> 11628 Jps
>> > > > > > >> >> 10131 NameNode
>> > > > > > >> >> 10638 HQuorumPeer
>> > > > > > >> >> 10705 HMaster
>> > > > > > >> >>
>> > > > > > >> >> box 2-5:
>> > > > > > >> >> 6741 HQuorumPeer
>> > > > > > >> >> 6841 HRegionServer
>> > > > > > >> >> 7881 Jps
>> > > > > > >> >> 6610 DataNode
>> > > > > > >> >>
>> > > > > > >> >>
>> > > > > > >> >> hbase site: =======================
>> > > > > > >> >> <?xml version="1.0"?>
>> > > > > > >> >> <?xml-stylesheet type="text/xsl"
>> href="configuration.xsl"?>
>> > > > > > >> >> <!--
>> > > > > > >> >> /**
>> > > > > > >> >>  * Copyright 2007 The Apache Software Foundation
>> > > > > > >> >>  *
>> > > > > > >> >>  * Licensed to the Apache Software Foundation (ASF) under
>> one
>> > > > > > >> >>  * or more contributor license agreements.  See the NOTICE
>> > file
>> > > > > > >> >>  * distributed with this work for additional information
>> > > > > > >> >>  * regarding copyright ownership.  The ASF licenses this
>> file
>> > > > > > >> >>  * to you under the Apache License, Version 2.0 (the
>> > > > > > >> >>  * "License"); you may not use this file except in
>> compliance
>> > > > > > >> >>  * with the License.  You may obtain a copy of the License
>> at
>> > > > > > >> >>  *
>> > > > > > >> >>  *     http://www.apache.org/licenses/LICENSE-2.0
>> > > > > > >> >>  *
>> > > > > > >> >>  * Unless required by applicable law or agreed to in
>> writing,
>> > > > > > software
>> > > > > > >> >>  * distributed under the License is distributed on an "AS
>> IS"
>> > > > > BASIS,
>> > > > > > >> >>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
>> > express
>> > > > or
>> > > > > > >> >> implied.
>> > > > > > >> >>  * See the License for the specific language governing
>> > > > permissions
>> > > > > > and
>> > > > > > >> >>  * limitations under the License.
>> > > > > > >> >>  */
>> > > > > > >> >> -->
>> > > > > > >> >> <configuration>
>> > > > > > >> >>  <property>
>> > > > > > >> >>    <name>hbase.rootdir</name>
>> > > > > > >> >>    <value>hdfs://box1:9000/hbase</value>
>> > > > > > >> >>    <description>The directory shared by region servers.
>> > > > > > >> >>    </description>
>> > > > > > >> >>  </property>
>> > > > > > >> >>  <property>
>> > > > > > >> >>    <name>hbase.master.port</name>
>> > > > > > >> >>    <value>60000</value>
>> > > > > > >> >>    <description>The port that the HBase master runs at.
>> > > > > > >> >>    </description>
>> > > > > > >> >>  </property>
>> > > > > > >> >>  <property>
>> > > > > > >> >>    <name>hbase.cluster.distributed</name>
>> > > > > > >> >>    <value>true</value>
>> > > > > > >> >>    <description>The mode the cluster will be in. Possible
>> > > values
>> > > > > are
>> > > > > > >> >>      false: standalone and pseudo-distributed setups with
>> > > managed
>> > > > > > >> >> Zookeeper
>> > > > > > >> >>      true: fully-distributed with unmanaged Zookeeper
>> Quorum
>> > > (see
>> > > > > > >> >> hbase-env.sh)
>> > > > > > >> >>    </description>
>> > > > > > >> >>  </property>
>> > > > > > >> >>  <property>
>> > > > > > >> >>    <name>hbase.regionserver.lease.period</name>
>> > > > > > >> >>    <value>120000</value>
>> > > > > > >> >>    <description>HRegion server lease period in
>> milliseconds.
>> > > > > Default
>> > > > > > is
>> > > > > > >> >>    60 seconds. Clients must report in within this period
>> else
>> > > > they
>> > > > > > are
>> > > > > > >> >>    considered dead.</description>
>> > > > > > >> >>  </property>
>> > > > > > >> >>
>> > > > > > >> >>  <property>
>> > > > > > >> >>      <name>hbase.zookeeper.property.clientPort</name>
>> > > > > > >> >>      <value>2222</value>
>> > > > > > >> >>      <description>Property from ZooKeeper's config
>> zoo.cfg.
>> > > > > > >> >>      The port at which the clients will connect.
>> > > > > > >> >>      </description>
>> > > > > > >> >>  </property>
>> > > > > > >> >>  <property>
>> > > > > > >> >>      <name>hbase.zookeeper.property.dataDir</name>
>> > > > > > >> >>      <value>/home/hadoop/zookeeper</value>
>> > > > > > >> >>  </property>
>> > > > > > >> >>  <property>
>> > > > > > >> >>      <name>hbase.zookeeper.property.syncLimit</name>
>> > > > > > >> >>      <value>5</value>
>> > > > > > >> >>  </property>
>> > > > > > >> >>  <property>
>> > > > > > >> >>      <name>hbase.zookeeper.property.tickTime</name>
>> > > > > > >> >>      <value>2000</value>
>> > > > > > >> >>  </property>
>> > > > > > >> >>  <property>
>> > > > > > >> >>      <name>hbase.zookeeper.property.initLimit</name>
>> > > > > > >> >>      <value>10</value>
>> > > > > > >> >>  </property>
>> > > > > > >> >>  <property>
>> > > > > > >> >>      <name>hbase.zookeeper.quorum</name>
>> > > > > > >> >>      <value>box1,box2,box3,box4</value>
>> > > > > > >> >>      <description>Comma separated list of servers in the
>> > > > ZooKeeper
>> > > > > > >> >> Quorum.
>> > > > > > >> >>      For example,
>> > > > > > >> >> "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com
>> ".
>> > > > > > >> >>      By default this is set to localhost for local and
>> > > > > > >> pseudo-distributed
>> > > > > > >> >> modes
>> > > > > > >> >>      of operation. For a fully-distributed setup, this
>> should
>> > > be
>> > > > > set
>> > > > > > to
>> > > > > > >> a
>> > > > > > >> >> full
>> > > > > > >> >>      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK
>> is
>> > > set
>> > > > > in
>> > > > > > >> >> hbase-env.sh
>> > > > > > >> >>      this is the list of servers which we will start/stop
>> > > > ZooKeeper
>> > > > > > on.
>> > > > > > >> >>      </description>
>> > > > > > >> >>  </property>
>> > > > > > >> >>  <property>
>> > > > > > >> >>    <name>hfile.block.cache.size</name>
>> > > > > > >> >>    <value>.5</value>
>> > > > > > >> >>    <description>text</description>
>> > > > > > >> >>  </property>
>> > > > > > >> >>
>> > > > > > >> >> </configuration>
>> > > > > > >> >>
>> > > > > > >> >>
>> > > > > > >> >> hbase
>> > env:====================================================
>> > > > > > >> >>
>> > > > > > >> >> export HBASE_CLASSPATH=${HADOOP_CONF_DIR}
>> > > > > > >> >>
>> > > > > > >> >> export HBASE_HEAPSIZE=3000
>> > > > > > >> >>
>> > > > > > >> >> export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m
>> > > > > > >> >> -XX:+UseConcMarkSweepGC
>> > > > > > >> >> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>> > > > > > >> >> -XX:+CMSIncrementalMode
>> > > > > > >> >> -Xloggc:/home/hadoop/hbase-0.20.0/logs/gc-hbase.log"
>> > > > > > >> >>
>> > > > > > >> >> export HBASE_MANAGES_ZK=true
>> > > > > > >> >>
>> > > > > > >> >> Hadoop core
>> > > > > > >> >>
>> > site===========================================================
>> > > > > > >> >>
>> > > > > > >> >> <?xml version="1.0"?>
>> > > > > > >> >> <?xml-stylesheet type="text/xsl"
>> href="configuration.xsl"?>
>> > > > > > >> >>
>> > > > > > >> >> <!-- Put site-specific property overrides in this file.
>> -->
>> > > > > > >> >>
>> > > > > > >> >> <configuration>
>> > > > > > >> >> <property>
>> > > > > > >> >>   <name>fs.default.name</name>
>> > > > > > >> >>   <value>hdfs://box1:9000</value>
>> > > > > > >> >>   <description>The name of the default file system.  A URI
>> > > whose
>> > > > > > >> >>   scheme and authority determine the FileSystem
>> > implementation.
>> > > > >  The
>> > > > > > >> >>   uri's scheme determines the config property
>> > (fs.SCHEME.impl)
>> > > > > naming
>> > > > > > >> >>   the FileSystem implementation class.  The uri's
>> authority
>> > is
>> > > > used
>> > > > > > to
>> > > > > > >> >>   determine the host, port, etc. for a
>> > > filesystem.</description>
>> > > > > > >> >> </property>
>> > > > > > >> >> <property>
>> > > > > > >> >>  <name>hadoop.tmp.dir</name>
>> > > > > > >> >>  <value>/data/hadoop-0.20.0-${user.name}</value>
>> > > > > > >> >>  <description>A base for other temporary
>> > > > directories.</description>
>> > > > > > >> >> </property>
>> > > > > > >> >> </configuration>
>> > > > > > >> >>
>> > > > > > >> >> ==============
>> > > > > > >> >>
>> > > > > > >> >> replication is set to 6.
>> > > > > > >> >>
>> > > > > > >> >> hadoop env=================
>> > > > > > >> >>
>> > > > > > >> >> export HADOOP_HEAPSIZE=3000
>> > > > > > >> >> export
>> HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
>> > > > > > >> >> $HADOOP_NAMENODE_OPTS"
>> > > > > > >> >> export
>> > > > > HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote
>> > > > > > >> >> $HADOOP_SECONDARYNAMENODE_OPTS"
>> > > > > > >> >> export
>> HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote
>> > > > > > >> >> $HADOOP_DATANODE_OPTS"
>> > > > > > >> >> export
>> HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote
>> > > > > > >> >> $HADOOP_BALANCER_OPTS"
>> > > > > > >> >> export
>> HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote
>> > > > > > >> >> $HADOOP_JOBTRACKER_OPTS"
>> > > > > > >> >>  ==================
>> > > > > > >> >>
>> > > > > > >> >>
>> > > > > > >> >> Very basic setup.  then i start the cluster do simple
>> random
>> > > Get
>> > > > > > >> >> operations
>> > > > > > >> >> on a tall table (~60 M rows):
>> > > > > > >> >>
>> > > > > > >> >> {NAME => 'tallTable', FAMILIES => [{NAME => 'family1',
>> > > > COMPRESSION
>> > > > > =>
>> > > > > > >> >> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE =>
>> > > > '65536',
>> > > > > > >> >> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>> > > > > > >> >>
>> > > > > > >> >> Is this fairly normal speeds?  I'm unsure if this is a
>> result
>> > > of
>> > > > > > having
>> > > > > > >> a
>> > > > > > >> >> small cluster?  Please advise...
>> > > > > > >> >>
>> > > > > > >> >> stack-3 wrote:
>> > > > > > >> >> >
>> > > > > > >> >> > Yeah, seems slow.  In old hbase, it could do 5-10k
>> writes a
>> > > > > second
>> > > > > > >> >> going
>> > > > > > >> >> > by
>> > > > > > >> >> > performance eval page up on wiki.  SequentialWrite was
>> > about
>> > > > same
>> > > > > > as
>> > > > > > >> >> > RandomWrite.  Check out the stats on hw up on that page
>> and
>> > > > > > >> description
>> > > > > > >> >> of
>> > > > > > >> >> > how test was set up.  Can you figure where its slow?
>> > > > > > >> >> >
>> > > > > > >> >> > St.Ack
>> > > > > > >> >> >
>> > > > > > >> >> > On Wed, Aug 12, 2009 at 10:10 AM, llpind <
>> > > > sonny_heer@hotmail.com
>> > > > > >
>> > > > > > >> >> wrote:
>> > > > > > >> >> >
>> > > > > > >> >> >>
>> > > > > > >> >> >> Thanks Stack.
>> > > > > > >> >> >>
>> > > > > > >> >> >> I will try mapred with more clients.   I tried it
>> without
>> > > > mapred
>> > > > > > >> using
>> > > > > > >> >> 3
>> > > > > > >> >> >> clients Random Write operations here was the output:
>> > > > > > >> >> >>
>> > > > > > >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation:
>> > client-0
>> > > > > Start
>> > > > > > >> >> >> randomWrite at offset 0 for 1048576 rows
>> > > > > > >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation:
>> > client-1
>> > > > > Start
>> > > > > > >> >> >> randomWrite at offset 1048576 for 1048576 rows
>> > > > > > >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation:
>> > client-2
>> > > > > Start
>> > > > > > >> >> >> randomWrite at offset 2097152 for 1048576 rows
>> > > > > > >> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation:
>> > client-1
>> > > > > > >> >> >> 1048576/1153427/2097152
>> > > > > > >> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation:
>> > client-2
>> > > > > > >> >> >> 2097152/2201997/3145728
>> > > > > > >> >> >> 09/08/12 09:24:25 INFO hbase.PerformanceEvaluation:
>> > client-0
>> > > > > > >> >> >> 0/104857/1048576
>> > > > > > >> >> >> 09/08/12 09:27:42 INFO hbase.PerformanceEvaluation:
>> > client-0
>> > > > > > >> >> >> 0/209714/1048576
>> > > > > > >> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation:
>> > client-1
>> > > > > > >> >> >> 1048576/1258284/2097152
>> > > > > > >> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation:
>> > client-2
>> > > > > > >> >> >> 2097152/2306854/3145728
>> > > > > > >> >> >> 09/08/12 09:32:32 INFO hbase.PerformanceEvaluation:
>> > client-1
>> > > > > > >> >> >> 1048576/1363141/2097152
>> > > > > > >> >> >> 09/08/12 09:32:33 INFO hbase.PerformanceEvaluation:
>> > client-0
>> > > > > > >> >> >> 0/314571/1048576
>> > > > > > >> >> >> 09/08/12 09:32:41 INFO hbase.PerformanceEvaluation:
>> > client-2
>> > > > > > >> >> >> 2097152/2411711/3145728
>> > > > > > >> >> >> 09/08/12 09:35:31 INFO hbase.PerformanceEvaluation:
>> > client-0
>> > > > > > >> >> >> 0/419428/1048576
>> > > > > > >> >> >> 09/08/12 09:35:34 INFO hbase.PerformanceEvaluation:
>> > client-1
>> > > > > > >> >> >> 1048576/1467998/2097152
>> > > > > > >> >> >> 09/08/12 09:35:53 INFO hbase.PerformanceEvaluation:
>> > client-2
>> > > > > > >> >> >> 2097152/2516568/3145728
>> > > > > > >> >> >> 09/08/12 09:39:02 INFO hbase.PerformanceEvaluation:
>> > client-0
>> > > > > > >> >> >> 0/524285/1048576
>> > > > > > >> >> >> 09/08/12 09:39:03 INFO hbase.PerformanceEvaluation:
>> > client-2
>> > > > > > >> >> >> 2097152/2621425/3145728
>> > > > > > >> >> >> 09/08/12 09:40:07 INFO hbase.PerformanceEvaluation:
>> > client-1
>> > > > > > >> >> >> 1048576/1572855/2097152
>> > > > > > >> >> >> 09/08/12 09:42:53 INFO hbase.PerformanceEvaluation:
>> > client-0
>> > > > > > >> >> >> 0/629142/1048576
>> > > > > > >> >> >> 09/08/12 09:44:25 INFO hbase.PerformanceEvaluation:
>> > client-2
>> > > > > > >> >> >> 2097152/2726282/3145728
>> > > > > > >> >> >> 09/08/12 09:44:44 INFO hbase.PerformanceEvaluation:
>> > client-1
>> > > > > > >> >> >> 1048576/1677712/2097152
>> > > > > > >> >> >> 09/08/12 09:46:43 INFO hbase.PerformanceEvaluation:
>> > client-0
>> > > > > > >> >> >> 0/733999/1048576
>> > > > > > >> >> >> 09/08/12 09:48:11 INFO hbase.PerformanceEvaluation:
>> > client-2
>> > > > > > >> >> >> 2097152/2831139/3145728
>> > > > > > >> >> >> 09/08/12 09:48:29 INFO hbase.PerformanceEvaluation:
>> > client-1
>> > > > > > >> >> >> 1048576/1782569/2097152
>> > > > > > >> >> >> 09/08/12 09:50:12 INFO hbase.PerformanceEvaluation:
>> > client-0
>> > > > > > >> >> >> 0/838856/1048576
>> > > > > > >> >> >> 09/08/12 09:52:47 INFO hbase.PerformanceEvaluation:
>> > client-2
>> > > > > > >> >> >> 2097152/2935996/3145728
>> > > > > > >> >> >> 09/08/12 09:53:51 INFO hbase.PerformanceEvaluation:
>> > client-1
>> > > > > > >> >> >> 1048576/1887426/2097152
>> > > > > > >> >> >> 09/08/12 09:56:32 INFO hbase.PerformanceEvaluation:
>> > client-0
>> > > > > > >> >> >> 0/943713/1048576
>> > > > > > >> >> >> 09/08/12 09:58:32 INFO hbase.PerformanceEvaluation:
>> > client-2
>> > > > > > >> >> >> 2097152/3040853/3145728
>> > > > > > >> >> >> 09/08/12 09:59:14 INFO hbase.PerformanceEvaluation:
>> > client-1
>> > > > > > >> >> >> 1048576/1992283/2097152
>> > > > > > >> >> >> 09/08/12 10:02:28 INFO hbase.PerformanceEvaluation:
>> > client-0
>> > > > > > >> >> >> 0/1048570/1048576
>> > > > > > >> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation:
>> > client-0
>> > > > > > >> Finished
>> > > > > > >> >> >> randomWrite in 2376615ms at offset 0 for 1048576 rows
>> > > > > > >> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation:
>> > Finished
>> > > 0
>> > > > > in
>> > > > > > >> >> >> 2376615ms
>> > > > > > >> >> >> writing 1048576 rows
>> > > > > > >> >> >> 09/08/12 10:06:35 INFO hbase.PerformanceEvaluation:
>> > client-2
>> > > > > > >> >> >> 2097152/3145710/3145728
>> > > > > > >> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation:
>> > client-2
>> > > > > > >> Finished
>> > > > > > >> >> >> randomWrite in 2623395ms at offset 2097152 for 1048576
>> > rows
>> > > > > > >> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation:
>> > Finished
>> > > 2
>> > > > > in
>> > > > > > >> >> >> 2623395ms
>> > > > > > >> >> >> writing 1048576 rows
>> > > > > > >> >> >> 09/08/12 10:06:42 INFO hbase.PerformanceEvaluation:
>> > client-1
>> > > > > > >> >> >> 1048576/2097140/2097152
>> > > > > > >> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation:
>> > client-1
>> > > > > > >> Finished
>> > > > > > >> >> >> randomWrite in 2630199ms at offset 1048576 for 1048576
>> > rows
>> > > > > > >> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation:
>> > Finished
>> > > 1
>> > > > > in
>> > > > > > >> >> >> 2630199ms
>> > > > > > >> >> >> writing 1048576 rows
>> > > > > > >> >> >>
>> > > > > > >> >> >>
>> > > > > > >> >> >>
>> > > > > > >> >> >> Seems kind of slow for ~3M records.  I have a 4 node
>> > cluster
>> > > > up
>> > > > > at
>> > > > > > >> the
>> > > > > > >> >> >> moment.  HMaster & Namenode running on same box.
>> > > > > > >> >> >> --
>> > > > > > >> >> >> View this message in context:
>> > > > > > >> >> >>
>> > > > > > >> >>
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24940922.html
>> > > > > > >> >> >> Sent from the HBase User mailing list archive at
>> > Nabble.com.
>> > > > > > >> >> >>
>> > > > > > >> >> >>
>> > > > > > >> >> >
>> > > > > > >> >> >
>> > > > > > >> >>
>> > > > > > >> >> --
>> > > > > > >> >> View this message in context:
>> > > > > > >> >>
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24943406.html
>> > > > > > >> >> Sent from the HBase User mailing list archive at
>> Nabble.com.
>> > > > > > >> >>
>> > > > > > >> >>
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >>
>> > > > > > >> --
>> > > > > > >> View this message in context:
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24955595.html
>> > > > > > >> Sent from the HBase User mailing list archive at Nabble.com.
>> > > > > > >>
>> > > > > > >>
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: HBase in a real world application

Posted by Schubert Zhang <zs...@gmail.com>.
OK, stack. I will get it done as soon as possible.
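
For concreteness, a rough sketch of what a "strict" split could look like under
the new hadoop-0.20 mapreduce API follows. This is illustrative only, not the
actual patch: the class names and the pe.* configuration keys are made up. The
idea is one split per client, each carrying its own row offset and row count,
so the per-client ranges are contiguous and cannot overlap or go missing:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// One split per PE client, carrying that client's row offset and row count.
class PeSplit extends InputSplit implements Writable {
  private int startRow;
  private int rowCount;

  public PeSplit() {}                          // needed for deserialization
  public PeSplit(int startRow, int rowCount) {
    this.startRow = startRow;
    this.rowCount = rowCount;
  }
  int getStartRow() { return startRow; }
  int getRowCount() { return rowCount; }
  public long getLength() { return rowCount; }
  public String[] getLocations() { return new String[0]; }
  public void write(DataOutput out) throws IOException {
    out.writeInt(startRow);
    out.writeInt(rowCount);
  }
  public void readFields(DataInput in) throws IOException {
    startRow = in.readInt();
    rowCount = in.readInt();
  }
}

// Strict splitter: exactly pe.clients splits, contiguous and non-overlapping.
class PeInputFormat extends InputFormat<IntWritable, IntWritable> {
  public List<InputSplit> getSplits(JobContext ctx) {
    int clients = ctx.getConfiguration().getInt("pe.clients", 10);
    int perClient = ctx.getConfiguration().getInt("pe.rows.per.client", 1048576);
    List<InputSplit> splits = new ArrayList<InputSplit>(clients);
    for (int i = 0; i < clients; i++) {
      splits.add(new PeSplit(i * perClient, perClient));
    }
    return splits;
  }
  public RecordReader<IntWritable, IntWritable> createRecordReader(
      InputSplit split, TaskAttemptContext ctx) {
    return new RecordReader<IntWritable, IntWritable>() {
      private PeSplit s;
      private boolean consumed = false;
      public void initialize(InputSplit sp, TaskAttemptContext c) { s = (PeSplit) sp; }
      // Each map task sees exactly one record: (startRow, rowCount).
      public boolean nextKeyValue() {
        if (consumed) return false;
        consumed = true;
        return true;
      }
      public IntWritable getCurrentKey() { return new IntWritable(s.getStartRow()); }
      public IntWritable getCurrentValue() { return new IntWritable(s.getRowCount()); }
      public float getProgress() { return consumed ? 1.0f : 0.0f; }
      public void close() {}
    };
  }
}

Each map task then reads exactly one (offset, count) record and runs its share
of the benchmark against that range.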


On Wed, Aug 19, 2009 at 2:47 AM, stack <st...@duboce.net> wrote:

> Can you make an issue and a patch please, Schubert?
> St.Ack
>
> On Tue, Aug 18, 2009 at 10:52 AM, Schubert Zhang <zs...@gmail.com>
> wrote:
>
> > We found two issues with the PerformanceEvaluation class:
> > - It does not match hadoop-0.20.0.
> > - The approach to splitting the map tasks is not strict; correct
> > InputSplit and InputFormat classes need to be provided.
> >
> > And we have just modified the org.apache.hadoop.hbase.PerformanceEvaluation
> > for our evaluations. Please get our code at:
> >
> > http://dl.getdropbox.com/u/24074/code/PerformanceEvaluation.java
> >
> > Here are our evaluations of 0.20.0 RC1:
> > http://docloud.blogspot.com/2009/08/hbase-0200-performance-evaluation.html
> >
> > Schubert
> >
> > On Tue, Aug 18, 2009 at 8:37 AM, Jeff Hammerbacher <hammer@cloudera.com
> > >wrote:
> >
> > > Thanks guys. For the lazy (e.g. me) and future searchers, here are some
> > > links. The benchmark is meant to simulate the same performance tests
> > > quoted in Google's BigTable paper.
> > >
> > > * PerformanceEvaluation wiki page:
> > > http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
> > > * PerformanceEvaluation.java:
> > > http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/test/org/apache/hadoop/hbase/PerformanceEvaluation.java?view=co
> > >
> > > Thanks,
> > > Jeff
> > >
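
As a side note on what randomWrite in the links above actually does per client,
here is a stripped-down sketch, not PE's real code. Each client is assigned a
contiguous slice of the total key space only for accounting; the keys it writes
are drawn at random from the whole range and zero-padded to a fixed width. The
table, family, and qualifier names and the 1000-byte value size below follow my
reading of PE's defaults, so treat them as assumptions:

import java.util.Random;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class MiniRandomWrite {
  public static void main(String[] args) throws Exception {
    int totalRows = 3 * 1048576;   // R: whole key space across all clients
    int myRows = 1048576;          // this client's share of the row count
    HTable table = new HTable(new HBaseConfiguration(), "TestTable");
    Random rng = new Random();
    byte[] value = new byte[1000]; // PE-style ~1KB random values
    for (int i = 0; i < myRows; i++) {
      rng.nextBytes(value);
      // Random key in [0, R), zero-padded so keys sort as fixed-width strings.
      String key = String.format("%010d", rng.nextInt(totalRows));
      Put put = new Put(Bytes.toBytes(key));
      put.add(Bytes.toBytes("info"), Bytes.toBytes("data"), value);
      table.put(put);
    }
    table.flushCommits();          // drain anything still buffered client-side
  }
}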
> > > On Mon, Aug 17, 2009 at 5:09 PM, stack <st...@duboce.net> wrote:
> > >
> > > > On Mon, Aug 17, 2009 at 4:54 PM, Jeff Hammerbacher <
> > hammer@cloudera.com
> > > > >wrote:
> > > >
> > > > > Hey Stack,
> > > > >
> > > > > I notice that the patch for this issue doesn't include any sort of
> > > > > tests that might have caught this regression. Do you guys have an
> > > > > HBaseBench, HBaseMix, or similarly named tool for catching performance
> > > > > regressions?
> > > > >
> > > >
> > > > Not as part of our build.  The way it's currently done is that near
> > > > release, we run our little PerformanceEvaluation doohickey.  If it's
> > > > way off, crack the profiler.
> > > >
> > > > We have been trying to get some of the hadoop allotment of EC2 time so
> > > > we could set up a regular run up on AWS but no luck so far.
> > > >
> > > > Good on you, Jeff,
> > > > St.Ack
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > Thanks,
> > > > > Jeff
> > > > >
> > > > > On Mon, Aug 17, 2009 at 4:51 PM, stack <st...@duboce.net> wrote:
> > > > >
> > > > > > Our writes were off by a factor of 7 or 8.  Writes should be better
> > > > > > now (HBASE-1771).
> > > > > > Thanks,
> > > > > > St.Ack
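
For anyone chasing slow writes on the client side as well, one lever worth
checking in 0.20 (separate from the HBASE-1771 server-side fix) is the client
write buffer. A minimal sketch, assuming the 0.20 client API; the table name,
family, loop size, and buffer size are illustrative:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWrites {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "tallTable");
    table.setAutoFlush(false);                   // stop flushing on every put()
    table.setWriteBufferSize(12 * 1024 * 1024);  // flush roughly every 12MB
    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes(String.format("%010d", i)));
      put.add(Bytes.toBytes("family1"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
      table.put(put);                            // buffered until the buffer fills
    }
    table.flushCommits();                        // push whatever is left
  }
}

With autoFlush on (the default), every put() is a round trip to a
regionserver; buffering amortizes that cost across many rows.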
> > > > > > [snip: rest of quoted thread, test output and configs quoted in full earlier in the thread]

Re: HBase in a real world application

Posted by stack <st...@duboce.net>.
Can you make an issue and a patch please, Schubert?
St.Ack

On Tue, Aug 18, 2009 at 10:52 AM, Schubert Zhang <zs...@gmail.com> wrote:

> We found two issues with the PerformanceEvaluation class:
> - It does not match hadoop-0.20.0.
> - The approach to splitting the map tasks is not strict; correct
> InputSplit and InputFormat classes need to be provided.
>
> And we have just modified the org.apache.hadoop.hbase.PerformanceEvaluation
> for our evaluations. Please get our code at:
>
> http://dl.getdropbox.com/u/24074/code/PerformanceEvaluation.java
>
> Here are our evaluations of 0.20.0 RC1:
> http://docloud.blogspot.com/2009/08/hbase-0200-performance-evaluation.html
>
> Schubert
>
> On Tue, Aug 18, 2009 at 8:37 AM, Jeff Hammerbacher <hammer@cloudera.com
> >wrote:
>
> > Thanks guys. For the lazy (e.g. me) and future searchers, here are some
> > links. The benchmark is meant to simulate the same performance tests
> > quoted in Google's BigTable paper.
> >
> > * PerformanceEvaluation wiki page:
> > http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
> > * PerformanceEvaluation.java:
> > http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/test/org/apache/hadoop/hbase/PerformanceEvaluation.java?view=co
> >
> > Thanks,
> > Jeff
> >
> > On Mon, Aug 17, 2009 at 5:09 PM, stack <st...@duboce.net> wrote:
> >
> > > On Mon, Aug 17, 2009 at 4:54 PM, Jeff Hammerbacher <
> hammer@cloudera.com
> > > >wrote:
> > >
> > > > Hey Stack,
> > > >
> > > > I notice that the patch for this issue doesn't include any sort of
> > > > tests that might have caught this regression. Do you guys have an
> > > > HBaseBench, HBaseMix, or similarly named tool for catching performance
> > > > regressions?
> > > >
> > >
> > > Not as part of our build.  The way it's currently done is that near
> > > release, we run our little PerformanceEvaluation doohickey.  If it's
> > > way off, crack the profiler.
> > >
> > > We have been trying to get some of the hadoop allotment of EC2 time so
> > > we could set up a regular run up on AWS but no luck so far.
> > >
> > > Good on you, Jeff,
> > > St.Ack
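
A crude guardrail along those lines could time a fixed burst of puts and fail
if throughput drops below a floor recorded on a known-good build. A
hypothetical sketch, not an existing HBase test; the 5000 rows/sec floor is
just the low end of the "5-10k writes a second" quoted earlier in the thread:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteRegressionCheck {
  // Floor measured on a known-good build; tune for the test hardware.
  static final double MIN_ROWS_PER_SEC = 5000.0;

  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "TestTable");
    int rows = 100000;
    long start = System.currentTimeMillis();
    for (int i = 0; i < rows; i++) {
      Put put = new Put(Bytes.toBytes(String.format("%010d", i)));
      put.add(Bytes.toBytes("info"), Bytes.toBytes("data"), Bytes.toBytes("v"));
      table.put(put);
    }
    table.flushCommits();
    double elapsedSec = (System.currentTimeMillis() - start) / 1000.0;
    double rate = rows / elapsedSec;
    System.out.println(String.format("%.0f rows/sec", rate));
    if (rate < MIN_ROWS_PER_SEC) {
      throw new AssertionError("write throughput regression: " + rate + " rows/sec");
    }
  }
}

It would not catch every slowdown, but a 7-8x regression like this one would
trip it immediately.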
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > >
> > > > Thanks,
> > > > Jeff
> > > >
> > > > On Mon, Aug 17, 2009 at 4:51 PM, stack <st...@duboce.net> wrote:
> > > >
> > > > > Our writes were off by a factor of 7 or 8.  Writes should be better
> > > > > now (HBASE-1771).
> > > > > Thanks,
> > > > > St.Ack
> > > > >
> > > > >
> > > > > On Thu, Aug 13, 2009 at 4:53 PM, stack <st...@duboce.net> wrote:
> > > > >
> > > > > > I just tried it.  It seems slow to me writing too.  Let me take a
> > > > > look....
> > > > > > St.Ack
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 13, 2009 at 10:06 AM, llpind <sonny_heer@hotmail.com
> >
> > > > wrote:
> > > > > >
> > > > > >>
> > > > > >> Okay I changed replication to 2.  and removed "-XX:NewSize=6m
> > > > > >> -XX:MaxNewSize=6m"
> > > > > >>
> > > > > >> here is results for randomWrite 3 clients:
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> RandomWrite =================================================
> > > > > >>
> > > > > >> hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar
> > > > > >>  --nomapred
> > > > > >> randomWrite 3
> > > > > >>
> > > > > >>
> > > > > >> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-0
> Start
> > > > > >> randomWrite at offset 0 for 1048576 rows
> > > > > >> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-1
> Start
> > > > > >> randomWrite at offset 1048576 for 1048576 rows
> > > > > >> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-2
> Start
> > > > > >> randomWrite at offset 2097152 for 1048576 rows
> > > > > >> 09/08/13 09:51:47 INFO hbase.PerformanceEvaluation: client-0
> > > > > >> 0/104857/1048576
> > > > > >> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-1
> > > > > >> 1048576/1153427/2097152
> > > > > >> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-2
> > > > > >> 2097152/2201997/3145728
> > > > > >> 09/08/13 09:52:22 INFO hbase.PerformanceEvaluation: client-1
> > > > > >> 1048576/1258284/2097152
> > > > > >> 09/08/13 09:52:23 INFO hbase.PerformanceEvaluation: client-0
> > > > > >> 0/209714/1048576
> > > > > >> 09/08/13 09:52:24 INFO hbase.PerformanceEvaluation: client-2
> > > > > >> 2097152/2306854/3145728
> > > > > >> 09/08/13 09:52:47 INFO hbase.PerformanceEvaluation: client-1
> > > > > >> 1048576/1363141/2097152
> > > > > >> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-0
> > > > > >> 0/314571/1048576
> > > > > >> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-2
> > > > > >> 2097152/2411711/3145728
> > > > > >> 09/08/13 09:53:24 INFO hbase.PerformanceEvaluation: client-1
> > > > > >> 1048576/1467998/2097152
> > > > > >> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-0
> > > > > >> 0/419428/1048576
> > > > > >> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-2
> > > > > >> 2097152/2516568/3145728
> > > > > >> 09/08/13 09:53:48 INFO hbase.PerformanceEvaluation: client-1
> > > > > >> 1048576/1572855/2097152
> > > > > >> 09/08/13 09:54:08 INFO hbase.PerformanceEvaluation: client-2
> > > > > >> 2097152/2621425/3145728
> > > > > >> 09/08/13 09:54:10 INFO hbase.PerformanceEvaluation: client-0
> > > > > >> 0/524285/1048576
> > > > > >> 09/08/13 09:54:40 INFO hbase.PerformanceEvaluation: client-1
> > > > > >> 1048576/1677712/2097152
> > > > > >> 09/08/13 09:54:49 INFO hbase.PerformanceEvaluation: client-2
> > > > > >> 2097152/2726282/3145728
> > > > > >> 09/08/13 09:54:52 INFO hbase.PerformanceEvaluation: client-0
> > > > > >> 0/629142/1048576
> > > > > >> 09/08/13 09:55:57 INFO hbase.PerformanceEvaluation: client-1
> > > > > >> 1048576/1782569/2097152
> > > > > >> 09/08/13 09:56:21 INFO hbase.PerformanceEvaluation: client-2
> > > > > >> 2097152/2831139/3145728
> > > > > >> 09/08/13 09:56:41 INFO hbase.PerformanceEvaluation: client-0
> > > > > >> 0/733999/1048576
> > > > > >> 09/08/13 09:57:23 INFO hbase.PerformanceEvaluation: client-1
> > > > > >> 1048576/1887426/2097152
> > > > > >> 09/08/13 09:58:40 INFO hbase.PerformanceEvaluation: client-2
> > > > > >> 2097152/2935996/3145728
> > > > > >> 09/08/13 09:58:54 INFO hbase.PerformanceEvaluation: client-0
> > > > > >> 0/838856/1048576
> > > > > >> 09/08/13 10:00:29 INFO hbase.PerformanceEvaluation: client-1
> > > > > >> 1048576/1992283/2097152
> > > > > >> 09/08/13 10:01:01 INFO hbase.PerformanceEvaluation: client-2
> > > > > >> 2097152/3040853/3145728
> > > > > >> 09/08/13 10:01:24 INFO hbase.PerformanceEvaluation: client-0
> > > > > >> 0/943713/1048576
> > > > > >> 09/08/13 10:02:36 INFO hbase.PerformanceEvaluation: client-1
> > > > > >> 1048576/2097140/2097152
> > > > > >> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: client-1
> > > Finished
> > > > > >> randomWrite in 680674ms at offset 1048576 for 1048576 rows
> > > > > >> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: Finished 1
> in
> > > > > 680674ms
> > > > > >> writing 1048576 rows
> > > > > >> 09/08/13 10:03:19 INFO hbase.PerformanceEvaluation: client-2
> > > > > >> 2097152/3145710/3145728
> > > > > >> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: client-2
> > > Finished
> > > > > >> randomWrite in 723771ms at offset 2097152 for 1048576 rows
> > > > > >> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: Finished 2
> in
> > > > > 723771ms
> > > > > >> writing 1048576 rows
> > > > > >> 09/08/13 10:03:41 INFO hbase.PerformanceEvaluation: client-0
> > > > > >> 0/1048570/1048576
> > > > > >> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: client-0
> > > Finished
> > > > > >> randomWrite in 746054ms at offset 0 for 1048576 rows
> > > > > >> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: Finished 0
> in
> > > > > 746054ms
> > > > > >> writing 1048576 rows
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> ============================================================
> > > > > >>
> > > > > >> Still pretty slow.  Any other ideas?  I'm running the client from
> > > > > >> the master box, but it's not running any regionServers or datanodes.
> > > > > >>
> > > > > >> stack-3 wrote:
> > > > > >> >
> > > > > >> > Your config. looks fine.
> > > > > >> >
> > > > > >> > Only thing that gives me pause is:
> > > > > >> >
> > > > > >> > "-XX:NewSize=6m -XX:MaxNewSize=6m"
> > > > > >> >
> > > > > >> > Any reason for the above?
> > > > > >> >
> > > > > >> > If you study your GC logs, lots of pauses?
> > > > > >> >
> > > > > >> > Oh, and this: replication is set to 6.  Why 6?  Each write must
> > > > > >> > commit to 6 datanodes before it completes.  In the tests posted
> > > > > >> > on the wiki, we replicate to 3 nodes.
> > > > > >> >
> > > > > >> > At the end of this message you say you are doing gets?  The
> > > > > >> > numbers you posted were for writes?
> > > > > >> >
> > > > > >> > St.Ack
> > > > > >> >
> > > > > >> >
> > > > > >> > On Wed, Aug 12, 2009 at 1:15 PM, llpind <sonny_heer@hotmail.com>
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> >>
> > > > > >> >> Not sure why my performance is so slow.  Here is my
> > > configuration:
> > > > > >> >>
> > > > > >> >> box1:
> > > > > >> >> 10395 SecondaryNameNode
> > > > > >> >> 11628 Jps
> > > > > >> >> 10131 NameNode
> > > > > >> >> 10638 HQuorumPeer
> > > > > >> >> 10705 HMaster
> > > > > >> >>
> > > > > >> >> box 2-5:
> > > > > >> >> 6741 HQuorumPeer
> > > > > >> >> 6841 HRegionServer
> > > > > >> >> 7881 Jps
> > > > > >> >> 6610 DataNode
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> hbase site: =======================
> > > > > >> >> <?xml version="1.0"?>
> > > > > >> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > >> >> <!--
> > > > > >> >> /**
> > > > > >> >>  * Copyright 2007 The Apache Software Foundation
> > > > > >> >>  *
> > > > > >> >>  * Licensed to the Apache Software Foundation (ASF) under one
> > > > > >> >>  * or more contributor license agreements.  See the NOTICE
> file
> > > > > >> >>  * distributed with this work for additional information
> > > > > >> >>  * regarding copyright ownership.  The ASF licenses this file
> > > > > >> >>  * to you under the Apache License, Version 2.0 (the
> > > > > >> >>  * "License"); you may not use this file except in compliance
> > > > > >> >>  * with the License.  You may obtain a copy of the License at
> > > > > >> >>  *
> > > > > >> >>  *     http://www.apache.org/licenses/LICENSE-2.0
> > > > > >> >>  *
> > > > > >> >>  * Unless required by applicable law or agreed to in writing,
> > > > > software
> > > > > >> >>  * distributed under the License is distributed on an "AS IS"
> > > > BASIS,
> > > > > >> >>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
> express
> > > or
> > > > > >> >> implied.
> > > > > >> >>  * See the License for the specific language governing
> > > permissions
> > > > > and
> > > > > >> >>  * limitations under the License.
> > > > > >> >>  */
> > > > > >> >> -->
> > > > > >> >> <configuration>
> > > > > >> >>  <property>
> > > > > >> >>    <name>hbase.rootdir</name>
> > > > > >> >>    <value>hdfs://box1:9000/hbase</value>
> > > > > >> >>    <description>The directory shared by region servers.
> > > > > >> >>    </description>
> > > > > >> >>  </property>
> > > > > >> >>  <property>
> > > > > >> >>    <name>hbase.master.port</name>
> > > > > >> >>    <value>60000</value>
> > > > > >> >>    <description>The port that the HBase master runs at.
> > > > > >> >>    </description>
> > > > > >> >>  </property>
> > > > > >> >>  <property>
> > > > > >> >>    <name>hbase.cluster.distributed</name>
> > > > > >> >>    <value>true</value>
> > > > > >> >>    <description>The mode the cluster will be in. Possible
> > values
> > > > are
> > > > > >> >>      false: standalone and pseudo-distributed setups with
> > managed
> > > > > >> >> Zookeeper
> > > > > >> >>      true: fully-distributed with unmanaged Zookeeper Quorum
> > (see
> > > > > >> >> hbase-env.sh)
> > > > > >> >>    </description>
> > > > > >> >>  </property>
> > > > > >> >>  <property>
> > > > > >> >>    <name>hbase.regionserver.lease.period</name>
> > > > > >> >>    <value>120000</value>
> > > > > >> >>    <description>HRegion server lease period in milliseconds.
> > > > Default
> > > > > is
> > > > > >> >>    60 seconds. Clients must report in within this period else
> > > they
> > > > > are
> > > > > >> >>    considered dead.</description>
> > > > > >> >>  </property>
> > > > > >> >>
> > > > > >> >>  <property>
> > > > > >> >>      <name>hbase.zookeeper.property.clientPort</name>
> > > > > >> >>      <value>2222</value>
> > > > > >> >>      <description>Property from ZooKeeper's config zoo.cfg.
> > > > > >> >>      The port at which the clients will connect.
> > > > > >> >>      </description>
> > > > > >> >>  </property>
> > > > > >> >>  <property>
> > > > > >> >>      <name>hbase.zookeeper.property.dataDir</name>
> > > > > >> >>      <value>/home/hadoop/zookeeper</value>
> > > > > >> >>  </property>
> > > > > >> >>  <property>
> > > > > >> >>      <name>hbase.zookeeper.property.syncLimit</name>
> > > > > >> >>      <value>5</value>
> > > > > >> >>  </property>
> > > > > >> >>  <property>
> > > > > >> >>      <name>hbase.zookeeper.property.tickTime</name>
> > > > > >> >>      <value>2000</value>
> > > > > >> >>  </property>
> > > > > >> >>  <property>
> > > > > >> >>      <name>hbase.zookeeper.property.initLimit</name>
> > > > > >> >>      <value>10</value>
> > > > > >> >>  </property>
> > > > > >> >>  <property>
> > > > > >> >>      <name>hbase.zookeeper.quorum</name>
> > > > > >> >>      <value>box1,box2,box3,box4</value>
> > > > > >> >>      <description>Comma separated list of servers in the
> > > ZooKeeper
> > > > > >> >> Quorum.
> > > > > >> >>      For example,
> > > > > >> >> "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
> > > > > >> >>      By default this is set to localhost for local and
> > > > > >> pseudo-distributed
> > > > > >> >> modes
> > > > > >> >>      of operation. For a fully-distributed setup, this should
> > be
> > > > set
> > > > > to
> > > > > >> a
> > > > > >> >> full
> > > > > >> >>      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is
> > set
> > > > in
> > > > > >> >> hbase-env.sh
> > > > > >> >>      this is the list of servers which we will start/stop
> > > ZooKeeper
> > > > > on.
> > > > > >> >>      </description>
> > > > > >> >>  </property>
> > > > > >> >>  <property>
> > > > > >> >>    <name>hfile.block.cache.size</name>
> > > > > >> >>    <value>.5</value>
> > > > > >> >>    <description>text</description>
> > > > > >> >>  </property>
> > > > > >> >>
> > > > > >> >> </configuration>
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> hbase
> env:====================================================
> > > > > >> >>
> > > > > >> >> export HBASE_CLASSPATH=${HADOOP_CONF_DIR}
> > > > > >> >>
> > > > > >> >> export HBASE_HEAPSIZE=3000
> > > > > >> >>
> > > > > >> >> export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m
> > > > > >> >> -XX:+UseConcMarkSweepGC
> > > > > >> >> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> > > > > >> >> -XX:+CMSIncrementalMode
> > > > > >> >> -Xloggc:/home/hadoop/hbase-0.20.0/logs/gc-hbase.log"
> > > > > >> >>
> > > > > >> >> export HBASE_MANAGES_ZK=true
> > > > > >> >>
> > > > > >> >> Hadoop core
> > > > > >> >>
> site===========================================================
> > > > > >> >>
> > > > > >> >> <?xml version="1.0"?>
> > > > > >> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > >> >>
> > > > > >> >> <!-- Put site-specific property overrides in this file. -->
> > > > > >> >>
> > > > > >> >> <configuration>
> > > > > >> >> <property>
> > > > > >> >>   <name>fs.default.name</name>
> > > > > >> >>   <value>hdfs://box1:9000</value>
> > > > > >> >>   <description>The name of the default file system.  A URI
> > whose
> > > > > >> >>   scheme and authority determine the FileSystem
> implementation.
> > > >  The
> > > > > >> >>   uri's scheme determines the config property
> (fs.SCHEME.impl)
> > > > naming
> > > > > >> >>   the FileSystem implementation class.  The uri's authority
> is
> > > used
> > > > > to
> > > > > >> >>   determine the host, port, etc. for a
> > filesystem.</description>
> > > > > >> >> </property>
> > > > > >> >> <property>
> > > > > >> >>  <name>hadoop.tmp.dir</name>
> > > > > >> >>  <value>/data/hadoop-0.20.0-${user.name}</value>
> > > > > >> >>  <description>A base for other temporary
> > > directories.</description>
> > > > > >> >> </property>
> > > > > >> >> </configuration>
> > > > > >> >>
> > > > > >> >> ==============
> > > > > >> >>
> > > > > >> >> replication is set to 6.
> > > > > >> >>
> > > > > >> >> hadoop env=================
> > > > > >> >>
> > > > > >> >> export HADOOP_HEAPSIZE=3000
> > > > > >> >> export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
> > > > > >> >> $HADOOP_NAMENODE_OPTS"
> > > > > >> >> export
> > > > HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote
> > > > > >> >> $HADOOP_SECONDARYNAMENODE_OPTS"
> > > > > >> >> export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote
> > > > > >> >> $HADOOP_DATANODE_OPTS"
> > > > > >> >> export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote
> > > > > >> >> $HADOOP_BALANCER_OPTS"
> > > > > >> >> export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote
> > > > > >> >> $HADOOP_JOBTRACKER_OPTS"
> > > > > >> >>  ==================
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> Very basic setup.  Then I start the cluster and do simple random
> > > > > >> >> Get operations on a tall table (~60M rows):
> > > > > >> >>
> > > > > >> >> {NAME => 'tallTable', FAMILIES => [{NAME => 'family1',
> > > COMPRESSION
> > > > =>
> > > > > >> >> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE =>
> > > '65536',
> > > > > >> >> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> > > > > >> >>
> > > > > >> >> Are these fairly normal speeds?  I'm unsure if this is a result
> > > > > >> >> of having a small cluster.  Please advise...
> > > > > >> >>
> > > > > >> >> stack-3 wrote:
> > > > > >> >> >
> > > > > >> >> > Yeah, seems slow.  In old hbase, it could do 5-10k writes a
> > > > > >> >> > second going by the performance eval page up on the wiki.
> > > > > >> >> > SequentialWrite was about the same as RandomWrite.  Check out
> > > > > >> >> > the stats on hw up on that page and the description of how the
> > > > > >> >> > test was set up.  Can you figure out where it's slow?
> > > > > >> >> >
> > > > > >> >> > St.Ack
> > > > > >> >> >
> > > > > >> >> > On Wed, Aug 12, 2009 at 10:10 AM, llpind <sonny_heer@hotmail.com>
> > > > > >> >> > wrote:
> > > > > >> >> >
> > > > > >> >> >>
> > > > > >> >> >> Thanks Stack.
> > > > > >> >> >>
> > > > > >> >> >> I will try mapred with more clients.  I tried it without
> > > > > >> >> >> mapred using 3 clients doing Random Write operations; here is
> > > > > >> >> >> the output:
> > > > > >> >> >>
> > > > > >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation:
> client-0
> > > > Start
> > > > > >> >> >> randomWrite at offset 0 for 1048576 rows
> > > > > >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation:
> client-1
> > > > Start
> > > > > >> >> >> randomWrite at offset 1048576 for 1048576 rows
> > > > > >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation:
> client-2
> > > > Start
> > > > > >> >> >> randomWrite at offset 2097152 for 1048576 rows
> > > > > >> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation:
> client-1
> > > > > >> >> >> 1048576/1153427/2097152
> > > > > >> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation:
> client-2
> > > > > >> >> >> 2097152/2201997/3145728
> > > > > >> >> >> 09/08/12 09:24:25 INFO hbase.PerformanceEvaluation:
> client-0
> > > > > >> >> >> 0/104857/1048576
> > > > > >> >> >> 09/08/12 09:27:42 INFO hbase.PerformanceEvaluation:
> client-0
> > > > > >> >> >> 0/209714/1048576
> > > > > >> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation:
> client-1
> > > > > >> >> >> 1048576/1258284/2097152
> > > > > >> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation:
> client-2
> > > > > >> >> >> 2097152/2306854/3145728
> > > > > >> >> >> 09/08/12 09:32:32 INFO hbase.PerformanceEvaluation:
> client-1
> > > > > >> >> >> 1048576/1363141/2097152
> > > > > >> >> >> 09/08/12 09:32:33 INFO hbase.PerformanceEvaluation:
> client-0
> > > > > >> >> >> 0/314571/1048576
> > > > > >> >> >> 09/08/12 09:32:41 INFO hbase.PerformanceEvaluation:
> client-2
> > > > > >> >> >> 2097152/2411711/3145728
> > > > > >> >> >> 09/08/12 09:35:31 INFO hbase.PerformanceEvaluation:
> client-0
> > > > > >> >> >> 0/419428/1048576
> > > > > >> >> >> 09/08/12 09:35:34 INFO hbase.PerformanceEvaluation:
> client-1
> > > > > >> >> >> 1048576/1467998/2097152
> > > > > >> >> >> 09/08/12 09:35:53 INFO hbase.PerformanceEvaluation:
> client-2
> > > > > >> >> >> 2097152/2516568/3145728
> > > > > >> >> >> 09/08/12 09:39:02 INFO hbase.PerformanceEvaluation:
> client-0
> > > > > >> >> >> 0/524285/1048576
> > > > > >> >> >> 09/08/12 09:39:03 INFO hbase.PerformanceEvaluation:
> client-2
> > > > > >> >> >> 2097152/2621425/3145728
> > > > > >> >> >> 09/08/12 09:40:07 INFO hbase.PerformanceEvaluation:
> client-1
> > > > > >> >> >> 1048576/1572855/2097152
> > > > > >> >> >> 09/08/12 09:42:53 INFO hbase.PerformanceEvaluation:
> client-0
> > > > > >> >> >> 0/629142/1048576
> > > > > >> >> >> 09/08/12 09:44:25 INFO hbase.PerformanceEvaluation:
> client-2
> > > > > >> >> >> 2097152/2726282/3145728
> > > > > >> >> >> 09/08/12 09:44:44 INFO hbase.PerformanceEvaluation:
> client-1
> > > > > >> >> >> 1048576/1677712/2097152
> > > > > >> >> >> 09/08/12 09:46:43 INFO hbase.PerformanceEvaluation:
> client-0
> > > > > >> >> >> 0/733999/1048576
> > > > > >> >> >> 09/08/12 09:48:11 INFO hbase.PerformanceEvaluation:
> client-2
> > > > > >> >> >> 2097152/2831139/3145728
> > > > > >> >> >> 09/08/12 09:48:29 INFO hbase.PerformanceEvaluation:
> client-1
> > > > > >> >> >> 1048576/1782569/2097152
> > > > > >> >> >> 09/08/12 09:50:12 INFO hbase.PerformanceEvaluation:
> client-0
> > > > > >> >> >> 0/838856/1048576
> > > > > >> >> >> 09/08/12 09:52:47 INFO hbase.PerformanceEvaluation:
> client-2
> > > > > >> >> >> 2097152/2935996/3145728
> > > > > >> >> >> 09/08/12 09:53:51 INFO hbase.PerformanceEvaluation:
> client-1
> > > > > >> >> >> 1048576/1887426/2097152
> > > > > >> >> >> 09/08/12 09:56:32 INFO hbase.PerformanceEvaluation:
> client-0
> > > > > >> >> >> 0/943713/1048576
> > > > > >> >> >> 09/08/12 09:58:32 INFO hbase.PerformanceEvaluation:
> client-2
> > > > > >> >> >> 2097152/3040853/3145728
> > > > > >> >> >> 09/08/12 09:59:14 INFO hbase.PerformanceEvaluation:
> client-1
> > > > > >> >> >> 1048576/1992283/2097152
> > > > > >> >> >> 09/08/12 10:02:28 INFO hbase.PerformanceEvaluation:
> client-0
> > > > > >> >> >> 0/1048570/1048576
> > > > > >> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation:
> client-0
> > > > > >> Finished
> > > > > >> >> >> randomWrite in 2376615ms at offset 0 for 1048576 rows
> > > > > >> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation:
> Finished
> > 0
> > > > in
> > > > > >> >> >> 2376615ms
> > > > > >> >> >> writing 1048576 rows
> > > > > >> >> >> 09/08/12 10:06:35 INFO hbase.PerformanceEvaluation:
> client-2
> > > > > >> >> >> 2097152/3145710/3145728
> > > > > >> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation:
> client-2
> > > > > >> Finished
> > > > > >> >> >> randomWrite in 2623395ms at offset 2097152 for 1048576
> rows
> > > > > >> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation:
> Finished
> > 2
> > > > in
> > > > > >> >> >> 2623395ms
> > > > > >> >> >> writing 1048576 rows
> > > > > >> >> >> 09/08/12 10:06:42 INFO hbase.PerformanceEvaluation:
> client-1
> > > > > >> >> >> 1048576/2097140/2097152
> > > > > >> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation:
> client-1
> > > > > >> Finished
> > > > > >> >> >> randomWrite in 2630199ms at offset 1048576 for 1048576
> rows
> > > > > >> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation:
> Finished
> > 1
> > > > in
> > > > > >> >> >> 2630199ms
> > > > > >> >> >> writing 1048576 rows
> > > > > >> >> >>
> > > > > >> >> >>
> > > > > >> >> >>
> > > > > >> >> >> Seems kind of slow for ~3M records.  I have a 4 node cluster
> > > > > >> >> >> up at the moment.  HMaster & Namenode running on same box.
> > > > > >> >> >> --
> > > > > >> >> >> View this message in context:
> > > > > >> >> >>
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24940922.html
> > > > > >> >> >> Sent from the HBase User mailing list archive at
> Nabble.com.
> > > > > >> >> >>
> > > > > >> >> >>
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >>
> > > > > >> >> --
> > > > > >> >> View this message in context:
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24943406.html
> > > > > >> >> Sent from the HBase User mailing list archive at Nabble.com.
> > > > > >> >>
> > > > > >> >>
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > > >> --
> > > > > >> View this message in context:
> > > > > >>
> > > > >
> > > >
> > >
> >
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24955595.html
> > > > > >> Sent from the HBase User mailing list archive at Nabble.com.
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: HBase in a real world application

Posted by Schubert Zhang <zs...@gmail.com>.
We found two issues with the PerformanceEvaluation class:
- It does not match hadoop-0.20.0.
- The way it splits work across map tasks is not strict; correct InputSplit
and InputFormat classes need to be provided.

We have modified org.apache.hadoop.hbase.PerformanceEvaluation accordingly
for our evaluations. You can get our code at:

http://dl.getdropbox.com/u/24074/code/PerformanceEvaluation.java
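
For readers who just want the shape of the change, a minimal sketch of the
fixed-range split idea follows, written against the new
org.apache.hadoop.mapreduce API that hadoop-0.20.0 expects. The class and
field names are illustrative only (hypothetical, not the actual code at the
link above):

// Sketch only: one InputSplit per PE client, carrying an explicit,
// serializable row range so the framework can plan and re-create splits.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputSplit;

public class RowRangeSplit extends InputSplit implements Writable {
  private long startRow;   // first row assigned to this client
  private long rowCount;   // number of rows in this client's share

  public RowRangeSplit() {}               // no-arg ctor for deserialization

  public RowRangeSplit(long startRow, long rowCount) {
    this.startRow = startRow;
    this.rowCount = rowCount;
  }

  public long getStartRow() { return startRow; }

  @Override
  public long getLength() { return rowCount; }  // split size, used for ordering

  @Override
  public String[] getLocations() { return new String[0]; }  // no locality hint

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(startRow);
    out.writeLong(rowCount);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    startRow = in.readLong();
    rowCount = in.readLong();
  }
}

A matching InputFormat returns one such split per client from getSplits() and
a RecordReader hands each map task its row range. The job is launched the same
way as the stock test, e.g. bin/hadoop jar hbase-0.20.0-test.jar randomWrite 3
(leave off --nomapred to run it as a MapReduce job).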

Here are our evaluations of 0.20.0 RC1:
http://docloud.blogspot.com/2009/08/hbase-0200-performance-evaluation.html

Schubert

On Tue, Aug 18, 2009 at 8:37 AM, Jeff Hammerbacher <ha...@cloudera.com> wrote:

> Thanks guys. For the lazy (e.g. me) and future searchers, here are some
> links. The benchmark is meant to simulate the same performance tests quoted
> in Google's BigTable paper.
>
> * PerformanceEvaluation wiki page:
> http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
> * PerformanceEvaluation.java:
>
> http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/test/org/apache/hadoop/hbase/PerformanceEvaluation.java?view=co
>
> Thanks,
> Jeff
>
> On Mon, Aug 17, 2009 at 5:09 PM, stack <st...@duboce.net> wrote:
>
> > On Mon, Aug 17, 2009 at 4:54 PM, Jeff Hammerbacher <hammer@cloudera.com>
> > wrote:
> >
> > > Hey Stack,
> > >
> > > I notice that the patch for this issue doesn't include any sort of
> tests
> > > that might have caught this regression. Do you guys have an HBaseBench,
> > > HBaseMix, or similarly named tool for catching performance regressions?
> > >
> >
> > Not as part of our build.  The way it's currently done is that near
> > release, we run our little PerformanceEvaluation doohickey.  If it's way
> > off, crack the profiler.
> >
> > We have been trying to get some of the hadoop allotment of EC2 time so we
> > could set up a regular run up on AWS but no luck so far.
> >
> > Good on you, Jeff,
> > St.Ack
> >
> >
> >
> >
> >
> >
> >
> > >
> > > Thanks,
> > > Jeff
> > >
> > > On Mon, Aug 17, 2009 at 4:51 PM, stack <st...@duboce.net> wrote:
> > >
> > > > Our writes were off by a factor of 7 or 8.  Writes should be better
> now
> > > > (HBASE-1771).
> > > > Thanks,
> > > > St.Ack
> > > >
> > > >
> > > > On Thu, Aug 13, 2009 at 4:53 PM, stack <st...@duboce.net> wrote:
> > > >
> > > > > I just tried it.  It seems slow to me writing too.  Let me take a
> > > > look....
> > > > > St.Ack
> > > > >
> > > > >
> > > > > On Thu, Aug 13, 2009 at 10:06 AM, llpind <so...@hotmail.com>
> > > > > wrote:
> > > > >
> > > > >>
> > > > >> Okay, I changed replication to 2 and removed "-XX:NewSize=6m
> > > > >> -XX:MaxNewSize=6m"
> > > > >>
> > > > >> here are the results for randomWrite with 3 clients:
> > > > >>
> > > > >>
> > > > >>
> > > > >> RandomWrite =================================================
> > > > >>
> > > > >> hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar
> > > > >>  --nomapred
> > > > >> randomWrite 3
> > > > >>
> > > > >>
> > > > >> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-0 Start
> > > > >> randomWrite at offset 0 for 1048576 rows
> > > > >> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-1 Start
> > > > >> randomWrite at offset 1048576 for 1048576 rows
> > > > >> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-2 Start
> > > > >> randomWrite at offset 2097152 for 1048576 rows
> > > > >> 09/08/13 09:51:47 INFO hbase.PerformanceEvaluation: client-0
> > > > >> 0/104857/1048576
> > > > >> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-1
> > > > >> 1048576/1153427/2097152
> > > > >> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-2
> > > > >> 2097152/2201997/3145728
> > > > >> 09/08/13 09:52:22 INFO hbase.PerformanceEvaluation: client-1
> > > > >> 1048576/1258284/2097152
> > > > >> 09/08/13 09:52:23 INFO hbase.PerformanceEvaluation: client-0
> > > > >> 0/209714/1048576
> > > > >> 09/08/13 09:52:24 INFO hbase.PerformanceEvaluation: client-2
> > > > >> 2097152/2306854/3145728
> > > > >> 09/08/13 09:52:47 INFO hbase.PerformanceEvaluation: client-1
> > > > >> 1048576/1363141/2097152
> > > > >> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-0
> > > > >> 0/314571/1048576
> > > > >> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-2
> > > > >> 2097152/2411711/3145728
> > > > >> 09/08/13 09:53:24 INFO hbase.PerformanceEvaluation: client-1
> > > > >> 1048576/1467998/2097152
> > > > >> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-0
> > > > >> 0/419428/1048576
> > > > >> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-2
> > > > >> 2097152/2516568/3145728
> > > > >> 09/08/13 09:53:48 INFO hbase.PerformanceEvaluation: client-1
> > > > >> 1048576/1572855/2097152
> > > > >> 09/08/13 09:54:08 INFO hbase.PerformanceEvaluation: client-2
> > > > >> 2097152/2621425/3145728
> > > > >> 09/08/13 09:54:10 INFO hbase.PerformanceEvaluation: client-0
> > > > >> 0/524285/1048576
> > > > >> 09/08/13 09:54:40 INFO hbase.PerformanceEvaluation: client-1
> > > > >> 1048576/1677712/2097152
> > > > >> 09/08/13 09:54:49 INFO hbase.PerformanceEvaluation: client-2
> > > > >> 2097152/2726282/3145728
> > > > >> 09/08/13 09:54:52 INFO hbase.PerformanceEvaluation: client-0
> > > > >> 0/629142/1048576
> > > > >> 09/08/13 09:55:57 INFO hbase.PerformanceEvaluation: client-1
> > > > >> 1048576/1782569/2097152
> > > > >> 09/08/13 09:56:21 INFO hbase.PerformanceEvaluation: client-2
> > > > >> 2097152/2831139/3145728
> > > > >> 09/08/13 09:56:41 INFO hbase.PerformanceEvaluation: client-0
> > > > >> 0/733999/1048576
> > > > >> 09/08/13 09:57:23 INFO hbase.PerformanceEvaluation: client-1
> > > > >> 1048576/1887426/2097152
> > > > >> 09/08/13 09:58:40 INFO hbase.PerformanceEvaluation: client-2
> > > > >> 2097152/2935996/3145728
> > > > >> 09/08/13 09:58:54 INFO hbase.PerformanceEvaluation: client-0
> > > > >> 0/838856/1048576
> > > > >> 09/08/13 10:00:29 INFO hbase.PerformanceEvaluation: client-1
> > > > >> 1048576/1992283/2097152
> > > > >> 09/08/13 10:01:01 INFO hbase.PerformanceEvaluation: client-2
> > > > >> 2097152/3040853/3145728
> > > > >> 09/08/13 10:01:24 INFO hbase.PerformanceEvaluation: client-0
> > > > >> 0/943713/1048576
> > > > >> 09/08/13 10:02:36 INFO hbase.PerformanceEvaluation: client-1
> > > > >> 1048576/2097140/2097152
> > > > >> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: client-1
> > Finished
> > > > >> randomWrite in 680674ms at offset 1048576 for 1048576 rows
> > > > >> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: Finished 1 in
> > > > 680674ms
> > > > >> writing 1048576 rows
> > > > >> 09/08/13 10:03:19 INFO hbase.PerformanceEvaluation: client-2
> > > > >> 2097152/3145710/3145728
> > > > >> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: client-2
> > Finished
> > > > >> randomWrite in 723771ms at offset 2097152 for 1048576 rows
> > > > >> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: Finished 2 in
> > > > 723771ms
> > > > >> writing 1048576 rows
> > > > >> 09/08/13 10:03:41 INFO hbase.PerformanceEvaluation: client-0
> > > > >> 0/1048570/1048576
> > > > >> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: client-0
> > Finished
> > > > >> randomWrite in 746054ms at offset 0 for 1048576 rows
> > > > >> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: Finished 0 in
> > > > 746054ms
> > > > >> writing 1048576 rows
> > > > >>
> > > > >>
> > > > >>
> > > > >> ============================================================
> > > > >>
> > > > >> Still pretty slow.  Any other ideas?  I'm running the client from
> > > > >> the master box, but it's not running any regionServers or datanodes.
> > > > >>
> > > > >> stack-3 wrote:
> > > > >> >
> > > > >> > Your config. looks fine.
> > > > >> >
> > > > >> > Only thing that gives me pause is:
> > > > >> >
> > > > >> > "-XX:NewSize=6m -XX:MaxNewSize=6m"
> > > > >> >
> > > > >> > Any reason for the above?
> > > > >> >
> > > > >> > If you study your GC logs, lots of pauses?
> > > > >> >
> > > > >> > Oh, and this: replication is set to 6.  Why 6?  Each write must
> > > > >> > commit to 6 datanodes before it completes.  In the tests posted on
> > > > >> > the wiki, we replicate to 3 nodes.
> > > > >> >
> > > > >> > At the end of this message you say you are doing gets?  The numbers
> > > > >> > you posted were for writes?
> > > > >> >
> > > > >> > St.Ack
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Aug 12, 2009 at 1:15 PM, llpind <sonny_heer@hotmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> >>
> > > > >> >> Not sure why my performance is so slow.  Here is my
> > configuration:
> > > > >> >>
> > > > >> >> box1:
> > > > >> >> 10395 SecondaryNameNode
> > > > >> >> 11628 Jps
> > > > >> >> 10131 NameNode
> > > > >> >> 10638 HQuorumPeer
> > > > >> >> 10705 HMaster
> > > > >> >>
> > > > >> >> box 2-5:
> > > > >> >> 6741 HQuorumPeer
> > > > >> >> 6841 HRegionServer
> > > > >> >> 7881 Jps
> > > > >> >> 6610 DataNode
> > > > >> >>
> > > > >> >>
> > > > >> >> hbase site: =======================
> > > > >> >> <?xml version="1.0"?>
> > > > >> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > >> >> <!--
> > > > >> >> /**
> > > > >> >>  * Copyright 2007 The Apache Software Foundation
> > > > >> >>  *
> > > > >> >>  * Licensed to the Apache Software Foundation (ASF) under one
> > > > >> >>  * or more contributor license agreements.  See the NOTICE file
> > > > >> >>  * distributed with this work for additional information
> > > > >> >>  * regarding copyright ownership.  The ASF licenses this file
> > > > >> >>  * to you under the Apache License, Version 2.0 (the
> > > > >> >>  * "License"); you may not use this file except in compliance
> > > > >> >>  * with the License.  You may obtain a copy of the License at
> > > > >> >>  *
> > > > >> >>  *     http://www.apache.org/licenses/LICENSE-2.0
> > > > >> >>  *
> > > > >> >>  * Unless required by applicable law or agreed to in writing,
> > > > software
> > > > >> >>  * distributed under the License is distributed on an "AS IS"
> > > BASIS,
> > > > >> >>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
> > or
> > > > >> >> implied.
> > > > >> >>  * See the License for the specific language governing
> > permissions
> > > > and
> > > > >> >>  * limitations under the License.
> > > > >> >>  */
> > > > >> >> -->
> > > > >> >> <configuration>
> > > > >> >>  <property>
> > > > >> >>    <name>hbase.rootdir</name>
> > > > >> >>    <value>hdfs://box1:9000/hbase</value>
> > > > >> >>    <description>The directory shared by region servers.
> > > > >> >>    </description>
> > > > >> >>  </property>
> > > > >> >>  <property>
> > > > >> >>    <name>hbase.master.port</name>
> > > > >> >>    <value>60000</value>
> > > > >> >>    <description>The port that the HBase master runs at.
> > > > >> >>    </description>
> > > > >> >>  </property>
> > > > >> >>  <property>
> > > > >> >>    <name>hbase.cluster.distributed</name>
> > > > >> >>    <value>true</value>
> > > > >> >>    <description>The mode the cluster will be in. Possible
> values
> > > are
> > > > >> >>      false: standalone and pseudo-distributed setups with
> managed
> > > > >> >> Zookeeper
> > > > >> >>      true: fully-distributed with unmanaged Zookeeper Quorum
> (see
> > > > >> >> hbase-env.sh)
> > > > >> >>    </description>
> > > > >> >>  </property>
> > > > >> >>  <property>
> > > > >> >>    <name>hbase.regionserver.lease.period</name>
> > > > >> >>    <value>120000</value>
> > > > >> >>    <description>HRegion server lease period in milliseconds.
> > > Default
> > > > is
> > > > >> >>    60 seconds. Clients must report in within this period else
> > they
> > > > are
> > > > >> >>    considered dead.</description>
> > > > >> >>  </property>
> > > > >> >>
> > > > >> >>  <property>
> > > > >> >>      <name>hbase.zookeeper.property.clientPort</name>
> > > > >> >>      <value>2222</value>
> > > > >> >>      <description>Property from ZooKeeper's config zoo.cfg.
> > > > >> >>      The port at which the clients will connect.
> > > > >> >>      </description>
> > > > >> >>  </property>
> > > > >> >>  <property>
> > > > >> >>      <name>hbase.zookeeper.property.dataDir</name>
> > > > >> >>      <value>/home/hadoop/zookeeper</value>
> > > > >> >>  </property>
> > > > >> >>  <property>
> > > > >> >>      <name>hbase.zookeeper.property.syncLimit</name>
> > > > >> >>      <value>5</value>
> > > > >> >>  </property>
> > > > >> >>  <property>
> > > > >> >>      <name>hbase.zookeeper.property.tickTime</name>
> > > > >> >>      <value>2000</value>
> > > > >> >>  </property>
> > > > >> >>  <property>
> > > > >> >>      <name>hbase.zookeeper.property.initLimit</name>
> > > > >> >>      <value>10</value>
> > > > >> >>  </property>
> > > > >> >>  <property>
> > > > >> >>      <name>hbase.zookeeper.quorum</name>
> > > > >> >>      <value>box1,box2,box3,box4</value>
> > > > >> >>      <description>Comma separated list of servers in the
> > ZooKeeper
> > > > >> >> Quorum.
> > > > >> >>      For example,
> > > > >> >> "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
> > > > >> >>      By default this is set to localhost for local and
> > > > >> pseudo-distributed
> > > > >> >> modes
> > > > >> >>      of operation. For a fully-distributed setup, this should
> be
> > > set
> > > > to
> > > > >> a
> > > > >> >> full
> > > > >> >>      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is
> set
> > > in
> > > > >> >> hbase-env.sh
> > > > >> >>      this is the list of servers which we will start/stop
> > ZooKeeper
> > > > on.
> > > > >> >>      </description>
> > > > >> >>  </property>
> > > > >> >>  <property>
> > > > >> >>    <name>hfile.block.cache.size</name>
> > > > >> >>    <value>.5</value>
> > > > >> >>    <description>text</description>
> > > > >> >>  </property>
> > > > >> >>
> > > > >> >> </configuration>
> > > > >> >>
> > > > >> >>
> > > > >> >> hbase env:====================================================
> > > > >> >>
> > > > >> >> export HBASE_CLASSPATH=${HADOOP_CONF_DIR}
> > > > >> >>
> > > > >> >> export HBASE_HEAPSIZE=3000
> > > > >> >>
> > > > >> >> export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m
> > > > >> >> -XX:+UseConcMarkSweepGC
> > > > >> >> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> > > > >> >> -XX:+CMSIncrementalMode
> > > > >> >> -Xloggc:/home/hadoop/hbase-0.20.0/logs/gc-hbase.log"
> > > > >> >>
> > > > >> >> export HBASE_MANAGES_ZK=true
> > > > >> >>
> > > > >> >> Hadoop core
> > > > >> >> site===========================================================
> > > > >> >>
> > > > >> >> <?xml version="1.0"?>
> > > > >> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > >> >>
> > > > >> >> <!-- Put site-specific property overrides in this file. -->
> > > > >> >>
> > > > >> >> <configuration>
> > > > >> >> <property>
> > > > >> >>   <name>fs.default.name</name>
> > > > >> >>   <value>hdfs://box1:9000</value>
> > > > >> >>   <description>The name of the default file system.  A URI
> whose
> > > > >> >>   scheme and authority determine the FileSystem implementation.
> > >  The
> > > > >> >>   uri's scheme determines the config property (fs.SCHEME.impl)
> > > naming
> > > > >> >>   the FileSystem implementation class.  The uri's authority is
> > used
> > > > to
> > > > >> >>   determine the host, port, etc. for a
> filesystem.</description>
> > > > >> >> </property>
> > > > >> >> <property>
> > > > >> >>  <name>hadoop.tmp.dir</name>
> > > > >> >>  <value>/data/hadoop-0.20.0-${user.name}</value>
> > > > >> >>  <description>A base for other temporary
> > directories.</description>
> > > > >> >> </property>
> > > > >> >> </configuration>
> > > > >> >>
> > > > >> >> ==============
> > > > >> >>
> > > > >> >> replication is set to 6.
> > > > >> >>
> > > > >> >> hadoop env=================
> > > > >> >>
> > > > >> >> export HADOOP_HEAPSIZE=3000
> > > > >> >> export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
> > > > >> >> $HADOOP_NAMENODE_OPTS"
> > > > >> >> export
> > > HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote
> > > > >> >> $HADOOP_SECONDARYNAMENODE_OPTS"
> > > > >> >> export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote
> > > > >> >> $HADOOP_DATANODE_OPTS"
> > > > >> >> export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote
> > > > >> >> $HADOOP_BALANCER_OPTS"
> > > > >> >> export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote
> > > > >> >> $HADOOP_JOBTRACKER_OPTS"
> > > > >> >>  ==================
> > > > >> >>
> > > > >> >>
> > > > >> >> Very basic setup.  Then I start the cluster and do simple random
> > > > >> >> Get operations on a tall table (~60M rows):
> > > > >> >>
> > > > >> >> {NAME => 'tallTable', FAMILIES => [{NAME => 'family1',
> > COMPRESSION
> > > =>
> > > > >> >> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE =>
> > '65536',
> > > > >> >> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> > > > >> >>
> > > > >> >> Are these fairly normal speeds?  I'm unsure if this is a result
> > > > >> >> of having a small cluster.  Please advise...
> > > > >> >>
> > > > >> >> stack-3 wrote:
> > > > >> >> >
> > > > >> >> > Yeah, seems slow.  In old hbase, it could do 5-10k writes a
> > > > >> >> > second going by the performance eval page up on the wiki.
> > > > >> >> > SequentialWrite was about the same as RandomWrite.  Check out the
> > > > >> >> > stats on hw up on that page and the description of how the test
> > > > >> >> > was set up.  Can you figure out where it's slow?
> > > > >> >> >
> > > > >> >> > St.Ack
> > > > >> >> >
> > > > >> >> > On Wed, Aug 12, 2009 at 10:10 AM, llpind <sonny_heer@hotmail.com>
> > > > >> >> > wrote:
> > > > >> >> >
> > > > >> >> >>
> > > > >> >> >> Thanks Stack.
> > > > >> >> >>
> > > > >> >> >> I will try mapred with more clients.  I tried it without mapred
> > > > >> >> >> using 3 clients doing Random Write operations; here is the
> > > > >> >> >> output:
> > > > >> >> >>
> > > > >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-0
> > > Start
> > > > >> >> >> randomWrite at offset 0 for 1048576 rows
> > > > >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-1
> > > Start
> > > > >> >> >> randomWrite at offset 1048576 for 1048576 rows
> > > > >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-2
> > > Start
> > > > >> >> >> randomWrite at offset 2097152 for 1048576 rows
> > > > >> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-1
> > > > >> >> >> 1048576/1153427/2097152
> > > > >> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-2
> > > > >> >> >> 2097152/2201997/3145728
> > > > >> >> >> 09/08/12 09:24:25 INFO hbase.PerformanceEvaluation: client-0
> > > > >> >> >> 0/104857/1048576
> > > > >> >> >> 09/08/12 09:27:42 INFO hbase.PerformanceEvaluation: client-0
> > > > >> >> >> 0/209714/1048576
> > > > >> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-1
> > > > >> >> >> 1048576/1258284/2097152
> > > > >> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-2
> > > > >> >> >> 2097152/2306854/3145728
> > > > >> >> >> 09/08/12 09:32:32 INFO hbase.PerformanceEvaluation: client-1
> > > > >> >> >> 1048576/1363141/2097152
> > > > >> >> >> 09/08/12 09:32:33 INFO hbase.PerformanceEvaluation: client-0
> > > > >> >> >> 0/314571/1048576
> > > > >> >> >> 09/08/12 09:32:41 INFO hbase.PerformanceEvaluation: client-2
> > > > >> >> >> 2097152/2411711/3145728
> > > > >> >> >> 09/08/12 09:35:31 INFO hbase.PerformanceEvaluation: client-0
> > > > >> >> >> 0/419428/1048576
> > > > >> >> >> 09/08/12 09:35:34 INFO hbase.PerformanceEvaluation: client-1
> > > > >> >> >> 1048576/1467998/2097152
> > > > >> >> >> 09/08/12 09:35:53 INFO hbase.PerformanceEvaluation: client-2
> > > > >> >> >> 2097152/2516568/3145728
> > > > >> >> >> 09/08/12 09:39:02 INFO hbase.PerformanceEvaluation: client-0
> > > > >> >> >> 0/524285/1048576
> > > > >> >> >> 09/08/12 09:39:03 INFO hbase.PerformanceEvaluation: client-2
> > > > >> >> >> 2097152/2621425/3145728
> > > > >> >> >> 09/08/12 09:40:07 INFO hbase.PerformanceEvaluation: client-1
> > > > >> >> >> 1048576/1572855/2097152
> > > > >> >> >> 09/08/12 09:42:53 INFO hbase.PerformanceEvaluation: client-0
> > > > >> >> >> 0/629142/1048576
> > > > >> >> >> 09/08/12 09:44:25 INFO hbase.PerformanceEvaluation: client-2
> > > > >> >> >> 2097152/2726282/3145728
> > > > >> >> >> 09/08/12 09:44:44 INFO hbase.PerformanceEvaluation: client-1
> > > > >> >> >> 1048576/1677712/2097152
> > > > >> >> >> 09/08/12 09:46:43 INFO hbase.PerformanceEvaluation: client-0
> > > > >> >> >> 0/733999/1048576
> > > > >> >> >> 09/08/12 09:48:11 INFO hbase.PerformanceEvaluation: client-2
> > > > >> >> >> 2097152/2831139/3145728
> > > > >> >> >> 09/08/12 09:48:29 INFO hbase.PerformanceEvaluation: client-1
> > > > >> >> >> 1048576/1782569/2097152
> > > > >> >> >> 09/08/12 09:50:12 INFO hbase.PerformanceEvaluation: client-0
> > > > >> >> >> 0/838856/1048576
> > > > >> >> >> 09/08/12 09:52:47 INFO hbase.PerformanceEvaluation: client-2
> > > > >> >> >> 2097152/2935996/3145728
> > > > >> >> >> 09/08/12 09:53:51 INFO hbase.PerformanceEvaluation: client-1
> > > > >> >> >> 1048576/1887426/2097152
> > > > >> >> >> 09/08/12 09:56:32 INFO hbase.PerformanceEvaluation: client-0
> > > > >> >> >> 0/943713/1048576
> > > > >> >> >> 09/08/12 09:58:32 INFO hbase.PerformanceEvaluation: client-2
> > > > >> >> >> 2097152/3040853/3145728
> > > > >> >> >> 09/08/12 09:59:14 INFO hbase.PerformanceEvaluation: client-1
> > > > >> >> >> 1048576/1992283/2097152
> > > > >> >> >> 09/08/12 10:02:28 INFO hbase.PerformanceEvaluation: client-0
> > > > >> >> >> 0/1048570/1048576
> > > > >> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: client-0
> > > > >> Finished
> > > > >> >> >> randomWrite in 2376615ms at offset 0 for 1048576 rows
> > > > >> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: Finished
> 0
> > > in
> > > > >> >> >> 2376615ms
> > > > >> >> >> writing 1048576 rows
> > > > >> >> >> 09/08/12 10:06:35 INFO hbase.PerformanceEvaluation: client-2
> > > > >> >> >> 2097152/3145710/3145728
> > > > >> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: client-2
> > > > >> Finished
> > > > >> >> >> randomWrite in 2623395ms at offset 2097152 for 1048576 rows
> > > > >> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: Finished
> 2
> > > in
> > > > >> >> >> 2623395ms
> > > > >> >> >> writing 1048576 rows
> > > > >> >> >> 09/08/12 10:06:42 INFO hbase.PerformanceEvaluation: client-1
> > > > >> >> >> 1048576/2097140/2097152
> > > > >> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: client-1
> > > > >> Finished
> > > > >> >> >> randomWrite in 2630199ms at offset 1048576 for 1048576 rows
> > > > >> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: Finished
> 1
> > > in
> > > > >> >> >> 2630199ms
> > > > >> >> >> writing 1048576 rows
> > > > >> >> >>
> > > > >> >> >>
> > > > >> >> >>
> > > > >> >> >> Seems kind of slow for ~3M records.  I have a 4 node cluster up
> > > > >> >> >> at the moment.  HMaster & Namenode running on same box.
> > > > >> >> >> --
> > > > >> >> >> View this message in context:
> > > > >> >> >>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24940922.html
> > > > >> >> >> Sent from the HBase User mailing list archive at Nabble.com.
> > > > >> >> >>
> > > > >> >> >>
> > > > >> >> >
> > > > >> >> >
> > > > >> >>
> > > > >> >> --
> > > > >> >> View this message in context:
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24943406.html
> > > > >> >> Sent from the HBase User mailing list archive at Nabble.com.
> > > > >> >>
> > > > >> >>
> > > > >> >
> > > > >> >
> > > > >>
> > > > >> --
> > > > >> View this message in context:
> > > > >>
> > > >
> > >
> >
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24955595.html
> > > > >> Sent from the HBase User mailing list archive at Nabble.com.
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: HBase in a real world application

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
Thanks guys. For the lazy (e.g. me) and future searchers, here are some
links. The benchmark is meant to simulate the same performance tests quoted
in Google's BigTable paper.

* PerformanceEvaluation wiki page:
http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
* PerformanceEvaluation.java:
http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/test/org/apache/hadoop/hbase/PerformanceEvaluation.java?view=co
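
To make the write numbers upthread concrete, each randomWrite client in the
benchmark amounts to a loop like the sketch below against the 0.20 client API.
The table/family/qualifier names and the ~1KB value size follow the PE
defaults as I understand them; treat this as an illustration, not the test's
exact code:

import java.util.Random;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomWriteSketch {
  public static void main(String[] args) throws Exception {
    // PE writes to "TestTable", family "info", qualifier "data".
    HTable table = new HTable(new HBaseConfiguration(), "TestTable");
    table.setAutoFlush(false);        // buffer puts client-side, as PE does
    Random rng = new Random();
    int rows = 1048576;               // one client's share, as in the logs
    byte[] value = new byte[1000];    // roughly 1KB per value
    for (int i = 0; i < rows; i++) {
      rng.nextBytes(value);
      byte[] row = Bytes.toBytes(String.format("%010d", rng.nextInt(rows)));
      Put put = new Put(row);
      put.add(Bytes.toBytes("info"), Bytes.toBytes("data"), value);
      table.put(put);                 // queued until the write buffer fills
    }
    table.flushCommits();             // push any remaining buffered puts
  }
}

With autoflush off, the client mostly measures the regionservers' write path,
which is why a server-side write regression like HBASE-1771 shows up directly
in these numbers.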

Thanks,
Jeff

On Mon, Aug 17, 2009 at 5:09 PM, stack <st...@duboce.net> wrote:

> On Mon, Aug 17, 2009 at 4:54 PM, Jeff Hammerbacher <hammer@cloudera.com>
> wrote:
>
> > Hey Stack,
> >
> > I notice that the patch for this issue doesn't include any sort of tests
> > that might have caught this regression. Do you guys have an HBaseBench,
> > HBaseMix, or similarly named tool for catching performance regressions?
> >
>
> Not as part of our build.  The way it's currently done is that near release,
> we run our little PerformanceEvaluation doohickey.  If it's way off, crack
> the profiler.
>
> We have been trying to get some of the hadoop allotment of EC2 time so we
> could set up a regular run up on AWS but no luck so far.
>
> Good on you, Jeff,
> St.Ack
>
>
>
>
>
>
>
> >
> > Thanks,
> > Jeff
> >
> > On Mon, Aug 17, 2009 at 4:51 PM, stack <st...@duboce.net> wrote:
> >
> > > Our writes were off by a factor of 7 or 8.  Writes should be better now
> > > (HBASE-1771).
> > > Thanks,
> > > St.Ack
> > >
> > >
> > > On Thu, Aug 13, 2009 at 4:53 PM, stack <st...@duboce.net> wrote:
> > >
> > > > I just tried it.  It seems slow to me writing too.  Let me take a
> > > look....
> > > > St.Ack
> > > >
> > > >
> > > > On Thu, Aug 13, 2009 at 10:06 AM, llpind <so...@hotmail.com>
> > > > wrote:
> > > >
> > > >>
> > > >> Okay, I changed replication to 2 and removed "-XX:NewSize=6m
> > > >> -XX:MaxNewSize=6m"
> > > >>
> > > >> here are the results for randomWrite with 3 clients:
> > > >>
> > > >>
> > > >>
> > > >> RandomWrite =================================================
> > > >>
> > > >> hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar
> > > >>  --nomapred
> > > >> randomWrite 3
> > > >>
> > > >>
> > > >> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-0 Start
> > > >> randomWrite at offset 0 for 1048576 rows
> > > >> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-1 Start
> > > >> randomWrite at offset 1048576 for 1048576 rows
> > > >> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-2 Start
> > > >> randomWrite at offset 2097152 for 1048576 rows
> > > >> 09/08/13 09:51:47 INFO hbase.PerformanceEvaluation: client-0
> > > >> 0/104857/1048576
> > > >> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-1
> > > >> 1048576/1153427/2097152
> > > >> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-2
> > > >> 2097152/2201997/3145728
> > > >> 09/08/13 09:52:22 INFO hbase.PerformanceEvaluation: client-1
> > > >> 1048576/1258284/2097152
> > > >> 09/08/13 09:52:23 INFO hbase.PerformanceEvaluation: client-0
> > > >> 0/209714/1048576
> > > >> 09/08/13 09:52:24 INFO hbase.PerformanceEvaluation: client-2
> > > >> 2097152/2306854/3145728
> > > >> 09/08/13 09:52:47 INFO hbase.PerformanceEvaluation: client-1
> > > >> 1048576/1363141/2097152
> > > >> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-0
> > > >> 0/314571/1048576
> > > >> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-2
> > > >> 2097152/2411711/3145728
> > > >> 09/08/13 09:53:24 INFO hbase.PerformanceEvaluation: client-1
> > > >> 1048576/1467998/2097152
> > > >> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-0
> > > >> 0/419428/1048576
> > > >> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-2
> > > >> 2097152/2516568/3145728
> > > >> 09/08/13 09:53:48 INFO hbase.PerformanceEvaluation: client-1
> > > >> 1048576/1572855/2097152
> > > >> 09/08/13 09:54:08 INFO hbase.PerformanceEvaluation: client-2
> > > >> 2097152/2621425/3145728
> > > >> 09/08/13 09:54:10 INFO hbase.PerformanceEvaluation: client-0
> > > >> 0/524285/1048576
> > > >> 09/08/13 09:54:40 INFO hbase.PerformanceEvaluation: client-1
> > > >> 1048576/1677712/2097152
> > > >> 09/08/13 09:54:49 INFO hbase.PerformanceEvaluation: client-2
> > > >> 2097152/2726282/3145728
> > > >> 09/08/13 09:54:52 INFO hbase.PerformanceEvaluation: client-0
> > > >> 0/629142/1048576
> > > >> 09/08/13 09:55:57 INFO hbase.PerformanceEvaluation: client-1
> > > >> 1048576/1782569/2097152
> > > >> 09/08/13 09:56:21 INFO hbase.PerformanceEvaluation: client-2
> > > >> 2097152/2831139/3145728
> > > >> 09/08/13 09:56:41 INFO hbase.PerformanceEvaluation: client-0
> > > >> 0/733999/1048576
> > > >> 09/08/13 09:57:23 INFO hbase.PerformanceEvaluation: client-1
> > > >> 1048576/1887426/2097152
> > > >> 09/08/13 09:58:40 INFO hbase.PerformanceEvaluation: client-2
> > > >> 2097152/2935996/3145728
> > > >> 09/08/13 09:58:54 INFO hbase.PerformanceEvaluation: client-0
> > > >> 0/838856/1048576
> > > >> 09/08/13 10:00:29 INFO hbase.PerformanceEvaluation: client-1
> > > >> 1048576/1992283/2097152
> > > >> 09/08/13 10:01:01 INFO hbase.PerformanceEvaluation: client-2
> > > >> 2097152/3040853/3145728
> > > >> 09/08/13 10:01:24 INFO hbase.PerformanceEvaluation: client-0
> > > >> 0/943713/1048576
> > > >> 09/08/13 10:02:36 INFO hbase.PerformanceEvaluation: client-1
> > > >> 1048576/2097140/2097152
> > > >> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: client-1
> Finished
> > > >> randomWrite in 680674ms at offset 1048576 for 1048576 rows
> > > >> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: Finished 1 in
> > > 680674ms
> > > >> writing 1048576 rows
> > > >> 09/08/13 10:03:19 INFO hbase.PerformanceEvaluation: client-2
> > > >> 2097152/3145710/3145728
> > > >> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: client-2
> Finished
> > > >> randomWrite in 723771ms at offset 2097152 for 1048576 rows
> > > >> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: Finished 2 in
> > > 723771ms
> > > >> writing 1048576 rows
> > > >> 09/08/13 10:03:41 INFO hbase.PerformanceEvaluation: client-0
> > > >> 0/1048570/1048576
> > > >> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: client-0
> Finished
> > > >> randomWrite in 746054ms at offset 0 for 1048576 rows
> > > >> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: Finished 0 in
> > > 746054ms
> > > >> writing 1048576 rows
> > > >>
> > > >>
> > > >>
> > > >> ============================================================
> > > >>
> > > >> Still pretty slow.  Any other ideas?  I'm running the client from the
> > > >> master box, but it's not running any regionServers or datanodes.
> > > >>
> > > >> stack-3 wrote:
> > > >> >
> > > >> > Your config. looks fine.
> > > >> >
> > > >> > Only thing that gives me pause is:
> > > >> >
> > > >> > "-XX:NewSize=6m -XX:MaxNewSize=6m"
> > > >> >
> > > >> > Any reason for the above?
> > > >> >
> > > >> > If you study your GC logs, lots of pauses?
> > > >> >
> > > >> > Oh, and this: replication is set to 6.  Why 6?  Each write must
> > > >> > commit to 6 datanodes before it completes.  In the tests posted on
> > > >> > the wiki, we replicate to 3 nodes.
> > > >> >
> > > >> > At the end of this message you say you are doing gets?  The numbers
> > > >> > you posted were for writes?
> > > >> >
> > > >> > St.Ack
> > > >> >
> > > >> >
> > > >> > On Wed, Aug 12, 2009 at 1:15 PM, llpind <so...@hotmail.com>
> > > >> > wrote:
> > > >> >
> > > >> >>
> > > >> >> Not sure why my performance is so slow.  Here is my
> configuration:
> > > >> >>
> > > >> >> box1:
> > > >> >> 10395 SecondaryNameNode
> > > >> >> 11628 Jps
> > > >> >> 10131 NameNode
> > > >> >> 10638 HQuorumPeer
> > > >> >> 10705 HMaster
> > > >> >>
> > > >> >> box 2-5:
> > > >> >> 6741 HQuorumPeer
> > > >> >> 6841 HRegionServer
> > > >> >> 7881 Jps
> > > >> >> 6610 DataNode
> > > >> >>
> > > >> >>
> > > >> >> hbase site: =======================
> > > >> >> <?xml version="1.0"?>
> > > >> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > >> >> <!--
> > > >> >> /**
> > > >> >>  * Copyright 2007 The Apache Software Foundation
> > > >> >>  *
> > > >> >>  * Licensed to the Apache Software Foundation (ASF) under one
> > > >> >>  * or more contributor license agreements.  See the NOTICE file
> > > >> >>  * distributed with this work for additional information
> > > >> >>  * regarding copyright ownership.  The ASF licenses this file
> > > >> >>  * to you under the Apache License, Version 2.0 (the
> > > >> >>  * "License"); you may not use this file except in compliance
> > > >> >>  * with the License.  You may obtain a copy of the License at
> > > >> >>  *
> > > >> >>  *     http://www.apache.org/licenses/LICENSE-2.0
> > > >> >>  *
> > > >> >>  * Unless required by applicable law or agreed to in writing,
> > > software
> > > >> >>  * distributed under the License is distributed on an "AS IS"
> > BASIS,
> > > >> >>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
> or
> > > >> >> implied.
> > > >> >>  * See the License for the specific language governing
> permissions
> > > and
> > > >> >>  * limitations under the License.
> > > >> >>  */
> > > >> >> -->
> > > >> >> <configuration>
> > > >> >>  <property>
> > > >> >>    <name>hbase.rootdir</name>
> > > >> >>    <value>hdfs://box1:9000/hbase</value>
> > > >> >>    <description>The directory shared by region servers.
> > > >> >>    </description>
> > > >> >>  </property>
> > > >> >>  <property>
> > > >> >>    <name>hbase.master.port</name>
> > > >> >>    <value>60000</value>
> > > >> >>    <description>The port that the HBase master runs at.
> > > >> >>    </description>
> > > >> >>  </property>
> > > >> >>  <property>
> > > >> >>    <name>hbase.cluster.distributed</name>
> > > >> >>    <value>true</value>
> > > >> >>    <description>The mode the cluster will be in. Possible values
> > are
> > > >> >>      false: standalone and pseudo-distributed setups with managed
> > > >> >> Zookeeper
> > > >> >>      true: fully-distributed with unmanaged Zookeeper Quorum (see
> > > >> >> hbase-env.sh)
> > > >> >>    </description>
> > > >> >>  </property>
> > > >> >>  <property>
> > > >> >>    <name>hbase.regionserver.lease.period</name>
> > > >> >>    <value>120000</value>
> > > >> >>    <description>HRegion server lease period in milliseconds.
> > Default
> > > is
> > > >> >>    60 seconds. Clients must report in within this period else
> they
> > > are
> > > >> >>    considered dead.</description>
> > > >> >>  </property>
> > > >> >>
> > > >> >>  <property>
> > > >> >>      <name>hbase.zookeeper.property.clientPort</name>
> > > >> >>      <value>2222</value>
> > > >> >>      <description>Property from ZooKeeper's config zoo.cfg.
> > > >> >>      The port at which the clients will connect.
> > > >> >>      </description>
> > > >> >>  </property>
> > > >> >>  <property>
> > > >> >>      <name>hbase.zookeeper.property.dataDir</name>
> > > >> >>      <value>/home/hadoop/zookeeper</value>
> > > >> >>  </property>
> > > >> >>  <property>
> > > >> >>      <name>hbase.zookeeper.property.syncLimit</name>
> > > >> >>      <value>5</value>
> > > >> >>  </property>
> > > >> >>  <property>
> > > >> >>      <name>hbase.zookeeper.property.tickTime</name>
> > > >> >>      <value>2000</value>
> > > >> >>  </property>
> > > >> >>  <property>
> > > >> >>      <name>hbase.zookeeper.property.initLimit</name>
> > > >> >>      <value>10</value>
> > > >> >>  </property>
> > > >> >>  <property>
> > > >> >>      <name>hbase.zookeeper.quorum</name>
> > > >> >>      <value>box1,box2,box3,box4</value>
> > > >> >>      <description>Comma separated list of servers in the
> ZooKeeper
> > > >> >> Quorum.
> > > >> >>      For example,
> > > >> >> "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
> > > >> >>      By default this is set to localhost for local and
> > > >> pseudo-distributed
> > > >> >> modes
> > > >> >>      of operation. For a fully-distributed setup, this should be
> > set
> > > to
> > > >> a
> > > >> >> full
> > > >> >>      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set
> > in
> > > >> >> hbase-env.sh
> > > >> >>      this is the list of servers which we will start/stop
> ZooKeeper
> > > on.
> > > >> >>      </description>
> > > >> >>  </property>
> > > >> >>  <property>
> > > >> >>    <name>hfile.block.cache.size</name>
> > > >> >>    <value>.5</value>
> > > >> >>    <description>text</description>
> > > >> >>  </property>
> > > >> >>
> > > >> >> </configuration>
> > > >> >>
> > > >> >>
> > > >> >> hbase env:====================================================
> > > >> >>
> > > >> >> export HBASE_CLASSPATH=${HADOOP_CONF_DIR}
> > > >> >>
> > > >> >> export HBASE_HEAPSIZE=3000
> > > >> >>
> > > >> >> export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m
> > > >> >> -XX:+UseConcMarkSweepGC
> > > >> >> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> > > >> >> -XX:+CMSIncrementalMode
> > > >> >> -Xloggc:/home/hadoop/hbase-0.20.0/logs/gc-hbase.log"
> > > >> >>
> > > >> >> export HBASE_MANAGES_ZK=true
> > > >> >>
> > > >> >> Hadoop core
> > > >> >> site===========================================================
> > > >> >>
> > > >> >> <?xml version="1.0"?>
> > > >> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > >> >>
> > > >> >> <!-- Put site-specific property overrides in this file. -->
> > > >> >>
> > > >> >> <configuration>
> > > >> >> <property>
> > > >> >>   <name>fs.default.name</name>
> > > >> >>   <value>hdfs://box1:9000</value>
> > > >> >>   <description>The name of the default file system.  A URI whose
> > > >> >>   scheme and authority determine the FileSystem implementation.
> >  The
> > > >> >>   uri's scheme determines the config property (fs.SCHEME.impl)
> > naming
> > > >> >>   the FileSystem implementation class.  The uri's authority is
> used
> > > to
> > > >> >>   determine the host, port, etc. for a filesystem.</description>
> > > >> >> </property>
> > > >> >> <property>
> > > >> >>  <name>hadoop.tmp.dir</name>
> > > >> >>  <value>/data/hadoop-0.20.0-${user.name}</value>
> > > >> >>  <description>A base for other temporary
> directories.</description>
> > > >> >> </property>
> > > >> >> </configuration>
> > > >> >>
> > > >> >> ==============
> > > >> >>
> > > >> >> replication is set to 6.
> > > >> >>
> > > >> >> hadoop env=================
> > > >> >>
> > > >> >> export HADOOP_HEAPSIZE=3000
> > > >> >> export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
> > > >> >> $HADOOP_NAMENODE_OPTS"
> > > >> >> export
> > HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote
> > > >> >> $HADOOP_SECONDARYNAMENODE_OPTS"
> > > >> >> export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote
> > > >> >> $HADOOP_DATANODE_OPTS"
> > > >> >> export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote
> > > >> >> $HADOOP_BALANCER_OPTS"
> > > >> >> export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote
> > > >> >> $HADOOP_JOBTRACKER_OPTS"
> > > >> >>  ==================
> > > >> >>
> > > >> >>
> > > >> >> Very basic setup.  Then I start the cluster and do simple random Get
> > > >> >> operations on a tall table (~60 M rows):
> > > >> >>
> > > >> >> {NAME => 'tallTable', FAMILIES => [{NAME => 'family1', COMPRESSION =>
> > > >> >> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
> > > >> >> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> > > >> >>
> > > >> >> Are these fairly normal speeds?  I'm unsure if this is a result of
> > > >> >> having a small cluster.  Please advise...
> > > >> >>
> > > >> >> stack-3 wrote:
> > > >> >> >
> > > >> >> > Yeah, seems slow.  In old hbase, it could do 5-10k writes a second
> > > >> >> > going by the performance eval page up on the wiki.  SequentialWrite
> > > >> >> > was about the same as RandomWrite.  Check out the stats on hw up on
> > > >> >> > that page and the description of how the test was set up.  Can you
> > > >> >> > figure out where it's slow?
> > > >> >> >
> > > >> >> > St.Ack
> > > >> >> >
> > > >> >> > On Wed, Aug 12, 2009 at 10:10 AM, llpind <sonny_heer@hotmail.com> wrote:
> > > >> >> >
> > > >> >> >>
> > > >> >> >> Thanks Stack.
> > > >> >> >>
> > > >> >> >> I will try mapred with more clients.  I tried it without mapred,
> > > >> >> >> using 3 clients doing Random Write operations; here is the output:
> > > >> >> >>
> > > >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-0 Start randomWrite at offset 0 for 1048576 rows
> > > >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-1 Start randomWrite at offset 1048576 for 1048576 rows
> > > >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-2 Start randomWrite at offset 2097152 for 1048576 rows
> > > >> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-1 1048576/1153427/2097152
> > > >> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-2 2097152/2201997/3145728
> > > >> >> >> 09/08/12 09:24:25 INFO hbase.PerformanceEvaluation: client-0 0/104857/1048576
> > > >> >> >> 09/08/12 09:27:42 INFO hbase.PerformanceEvaluation: client-0 0/209714/1048576
> > > >> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-1 1048576/1258284/2097152
> > > >> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-2 2097152/2306854/3145728
> > > >> >> >> 09/08/12 09:32:32 INFO hbase.PerformanceEvaluation: client-1 1048576/1363141/2097152
> > > >> >> >> 09/08/12 09:32:33 INFO hbase.PerformanceEvaluation: client-0 0/314571/1048576
> > > >> >> >> 09/08/12 09:32:41 INFO hbase.PerformanceEvaluation: client-2 2097152/2411711/3145728
> > > >> >> >> 09/08/12 09:35:31 INFO hbase.PerformanceEvaluation: client-0 0/419428/1048576
> > > >> >> >> 09/08/12 09:35:34 INFO hbase.PerformanceEvaluation: client-1 1048576/1467998/2097152
> > > >> >> >> 09/08/12 09:35:53 INFO hbase.PerformanceEvaluation: client-2 2097152/2516568/3145728
> > > >> >> >> 09/08/12 09:39:02 INFO hbase.PerformanceEvaluation: client-0 0/524285/1048576
> > > >> >> >> 09/08/12 09:39:03 INFO hbase.PerformanceEvaluation: client-2 2097152/2621425/3145728
> > > >> >> >> 09/08/12 09:40:07 INFO hbase.PerformanceEvaluation: client-1 1048576/1572855/2097152
> > > >> >> >> 09/08/12 09:42:53 INFO hbase.PerformanceEvaluation: client-0 0/629142/1048576
> > > >> >> >> 09/08/12 09:44:25 INFO hbase.PerformanceEvaluation: client-2 2097152/2726282/3145728
> > > >> >> >> 09/08/12 09:44:44 INFO hbase.PerformanceEvaluation: client-1 1048576/1677712/2097152
> > > >> >> >> 09/08/12 09:46:43 INFO hbase.PerformanceEvaluation: client-0 0/733999/1048576
> > > >> >> >> 09/08/12 09:48:11 INFO hbase.PerformanceEvaluation: client-2 2097152/2831139/3145728
> > > >> >> >> 09/08/12 09:48:29 INFO hbase.PerformanceEvaluation: client-1 1048576/1782569/2097152
> > > >> >> >> 09/08/12 09:50:12 INFO hbase.PerformanceEvaluation: client-0 0/838856/1048576
> > > >> >> >> 09/08/12 09:52:47 INFO hbase.PerformanceEvaluation: client-2 2097152/2935996/3145728
> > > >> >> >> 09/08/12 09:53:51 INFO hbase.PerformanceEvaluation: client-1 1048576/1887426/2097152
> > > >> >> >> 09/08/12 09:56:32 INFO hbase.PerformanceEvaluation: client-0 0/943713/1048576
> > > >> >> >> 09/08/12 09:58:32 INFO hbase.PerformanceEvaluation: client-2 2097152/3040853/3145728
> > > >> >> >> 09/08/12 09:59:14 INFO hbase.PerformanceEvaluation: client-1 1048576/1992283/2097152
> > > >> >> >> 09/08/12 10:02:28 INFO hbase.PerformanceEvaluation: client-0 0/1048570/1048576
> > > >> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: client-0 Finished randomWrite in 2376615ms at offset 0 for 1048576 rows
> > > >> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: Finished 0 in 2376615ms writing 1048576 rows
> > > >> >> >> 09/08/12 10:06:35 INFO hbase.PerformanceEvaluation: client-2 2097152/3145710/3145728
> > > >> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: client-2 Finished randomWrite in 2623395ms at offset 2097152 for 1048576 rows
> > > >> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: Finished 2 in 2623395ms writing 1048576 rows
> > > >> >> >> 09/08/12 10:06:42 INFO hbase.PerformanceEvaluation: client-1 1048576/2097140/2097152
> > > >> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: client-1 Finished randomWrite in 2630199ms at offset 1048576 for 1048576 rows
> > > >> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: Finished 1 in 2630199ms writing 1048576 rows
> > > >> >> >>
> > > >> >> >>
> > > >> >> >>
> > > >> >> >> Seems kind of slow for ~3M records.  I have a 4 node cluster up at
> > > >> >> >> the moment.  HMaster & Namenode running on same box.

Re: HBase in a real world application

Posted by stack <st...@duboce.net>.
On Mon, Aug 17, 2009 at 4:54 PM, Jeff Hammerbacher <ha...@cloudera.com> wrote:

> Hey Stack,
>
> I notice that the patch for this issue doesn't include any sort of tests
> that might have caught this regression. Do you guys have an HBaseBench,
> HBaseMix, or similarly named tool for catching performance regressions?
>

Not as part of our build.  The way it's currently done is that near release,
we run our little PerformanceEvaluation doohickey.  If it's way off, we crack
open the profiler.
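
In practice that's just the test jar invocations from earlier in this thread
run against a release candidate and eyeballed against the previous release's
numbers, e.g. a write pass and then a read pass over the rows it loaded:

hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar --nomapred randomWrite 3
hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar --nomapred randomRead 3

Each client logs a "Finished <n> in <ms>ms writing 1048576 rows" line when it
is done.  The Aug 12 run quoted above works out to 1048576 rows in 2376615ms
for client-0, i.e. roughly 440 writes/second/client, against the 5-10k
writes/second the old performance eval wiki numbers suggest.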

We have been trying to get some of the Hadoop allotment of EC2 time so we
could set up a regular run on AWS, but no luck so far.
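
The run itself could be as thin as a cron'd wrapper over the test jar.  A
hypothetical sketch, nothing we have today -- the jar path, baseline number,
and log parsing are all illustrative:

  #!/bin/sh
  # Run 3 PerformanceEvaluation write clients and fail if the slowest
  # client's throughput regresses against a known-good baseline.
  BASELINE_ROWS_PER_SEC=1500
  OUT=$(hadoop jar hbase-0.20.0-test.jar --nomapred randomWrite 3 2>&1)
  # PE logs "Finished <n> in <ms>ms writing 1048576 rows" once per client;
  # keep the largest elapsed time (the slowest client).
  MS=$(echo "$OUT" | sed -n 's/.*Finished [0-9] in \([0-9]*\)ms.*/\1/p' | sort -n | tail -1)
  [ -n "$MS" ] || { echo "no PerformanceEvaluation summary found" >&2; exit 2; }
  ROWS_PER_SEC=$((1048576 * 1000 / MS))
  echo "slowest client: $ROWS_PER_SEC rows/sec (baseline $BASELINE_ROWS_PER_SEC)"
  [ "$ROWS_PER_SEC" -ge "$BASELINE_ROWS_PER_SEC" ] || exit 1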

Good on you, Jeff,
St.Ack

Re: HBase in a real world application

Posted by Jonathan Gray <jl...@streamy.com>.
That would be PerformanceEvaluation :)
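
It lives in the hbase test jar, so the invocations quoted earlier in the
thread are the whole tool.  Dropping --nomapred runs the clients as MapReduce
tasks instead, which is the "mapred with more clients" route mentioned above,
e.g.:

  hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar randomWrite 10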

Jeff Hammerbacher wrote:
> Hey Stack,
> 
> I notice that the patch for this issue doesn't include any sort of tests
> that might have caught this regression. Do you guys have an HBaseBench,
> HBaseMix, or similarly named tool for catching performance regressions?
> 
> Thanks,
> Jeff

Re: HBase in a real world application

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
Hey Stack,

I notice that the patch for this issue doesn't include any sort of tests
that might have caught this regression. Do you guys have an HBaseBench,
HBaseMix, or similarly named tool for catching performance regressions?

Thanks,
Jeff

On Mon, Aug 17, 2009 at 4:51 PM, stack <st...@duboce.net> wrote:

> Our writes were off by a factor of 7 or 8.  Writes should be better now
> (HBASE-1771).
> Thanks,
> St.Ack
>
>
> On Thu, Aug 13, 2009 at 4:53 PM, stack <st...@duboce.net> wrote:
>
> > I just tried it.  It seems slow to me writing too.  Let me take a
> look....
> > St.Ack
> >
> >
> > On Thu, Aug 13, 2009 at 10:06 AM, llpind <so...@hotmail.com> wrote:
> >
> >>
> >> Okay I changed replication to 2.  and removed "-XX:NewSize=6m
> >> -XX:MaxNewSize=6m"
> >>
> >> here is results for randomWrite 3 clients:
> >>
> >>
> >>
> >> RandomWrite =================================================
> >>
> >> hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar
> >>  --nomapred
> >> randomWrite 3
> >>
> >>
> >> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-0 Start
> >> randomWrite at offset 0 for 1048576 rows
> >> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-1 Start
> >> randomWrite at offset 1048576 for 1048576 rows
> >> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-2 Start
> >> randomWrite at offset 2097152 for 1048576 rows
> >> 09/08/13 09:51:47 INFO hbase.PerformanceEvaluation: client-0
> >> 0/104857/1048576
> >> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-1
> >> 1048576/1153427/2097152
> >> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-2
> >> 2097152/2201997/3145728
> >> 09/08/13 09:52:22 INFO hbase.PerformanceEvaluation: client-1
> >> 1048576/1258284/2097152
> >> 09/08/13 09:52:23 INFO hbase.PerformanceEvaluation: client-0
> >> 0/209714/1048576
> >> 09/08/13 09:52:24 INFO hbase.PerformanceEvaluation: client-2
> >> 2097152/2306854/3145728
> >> 09/08/13 09:52:47 INFO hbase.PerformanceEvaluation: client-1
> >> 1048576/1363141/2097152
> >> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-0
> >> 0/314571/1048576
> >> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-2
> >> 2097152/2411711/3145728
> >> 09/08/13 09:53:24 INFO hbase.PerformanceEvaluation: client-1
> >> 1048576/1467998/2097152
> >> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-0
> >> 0/419428/1048576
> >> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-2
> >> 2097152/2516568/3145728
> >> 09/08/13 09:53:48 INFO hbase.PerformanceEvaluation: client-1
> >> 1048576/1572855/2097152
> >> 09/08/13 09:54:08 INFO hbase.PerformanceEvaluation: client-2
> >> 2097152/2621425/3145728
> >> 09/08/13 09:54:10 INFO hbase.PerformanceEvaluation: client-0
> >> 0/524285/1048576
> >> 09/08/13 09:54:40 INFO hbase.PerformanceEvaluation: client-1
> >> 1048576/1677712/2097152
> >> 09/08/13 09:54:49 INFO hbase.PerformanceEvaluation: client-2
> >> 2097152/2726282/3145728
> >> 09/08/13 09:54:52 INFO hbase.PerformanceEvaluation: client-0
> >> 0/629142/1048576
> >> 09/08/13 09:55:57 INFO hbase.PerformanceEvaluation: client-1
> >> 1048576/1782569/2097152
> >> 09/08/13 09:56:21 INFO hbase.PerformanceEvaluation: client-2
> >> 2097152/2831139/3145728
> >> 09/08/13 09:56:41 INFO hbase.PerformanceEvaluation: client-0
> >> 0/733999/1048576
> >> 09/08/13 09:57:23 INFO hbase.PerformanceEvaluation: client-1
> >> 1048576/1887426/2097152
> >> 09/08/13 09:58:40 INFO hbase.PerformanceEvaluation: client-2
> >> 2097152/2935996/3145728
> >> 09/08/13 09:58:54 INFO hbase.PerformanceEvaluation: client-0
> >> 0/838856/1048576
> >> 09/08/13 10:00:29 INFO hbase.PerformanceEvaluation: client-1
> >> 1048576/1992283/2097152
> >> 09/08/13 10:01:01 INFO hbase.PerformanceEvaluation: client-2
> >> 2097152/3040853/3145728
> >> 09/08/13 10:01:24 INFO hbase.PerformanceEvaluation: client-0
> >> 0/943713/1048576
> >> 09/08/13 10:02:36 INFO hbase.PerformanceEvaluation: client-1
> >> 1048576/2097140/2097152
> >> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: client-1 Finished
> >> randomWrite in 680674ms at offset 1048576 for 1048576 rows
> >> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: Finished 1 in
> 680674ms
> >> writing 1048576 rows
> >> 09/08/13 10:03:19 INFO hbase.PerformanceEvaluation: client-2
> >> 2097152/3145710/3145728
> >> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: client-2 Finished
> >> randomWrite in 723771ms at offset 2097152 for 1048576 rows
> >> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: Finished 2 in
> 723771ms
> >> writing 1048576 rows
> >> 09/08/13 10:03:41 INFO hbase.PerformanceEvaluation: client-0
> >> 0/1048570/1048576
> >> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: client-0 Finished
> >> randomWrite in 746054ms at offset 0 for 1048576 rows
> >> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: Finished 0 in
> 746054ms
> >> writing 1048576 rows
> >>
> >>
> >>
> >> ============================================================
> >>
> >> Still pretty slow.  Any other ideas?  I'm running the client from the
> >> master
> >> box, but its not running any regionServers or datanodes.
> >>
> >> stack-3 wrote:
> >> >
> >> > Your config. looks fine.
> >> >
> >> > Only think that gives me pause is:
> >> >
> >> > "-XX:NewSize=6m -XX:MaxNewSize=6m"
> >> >
> >> > Any reason for the above?
> >> >
> >> > If you study your GC logs, lots of pauses?
> >> >
> >> > Oh, and this: replication is set to 6.  Why 6?  Each write must commit
> >> to
> >> > 6
> >> > datanodes before complete.  In the tests posted on wiki, we replicate
> to
> >> 3
> >> > nodes.
> >> >
> >> > In end of this message you say you are doing gets?  Numbers you posted
> >> > were
> >> > for writes?
> >> >
> >> > St.Ack
> >> >
> >> >
> >> > On Wed, Aug 12, 2009 at 1:15 PM, llpind <so...@hotmail.com>
> wrote:
> >> >
> >> >>
> >> >> Not sure why my performance is so slow.  Here is my configuration:
> >> >>
> >> >> box1:
> >> >> 10395 SecondaryNameNode
> >> >> 11628 Jps
> >> >> 10131 NameNode
> >> >> 10638 HQuorumPeer
> >> >> 10705 HMaster
> >> >>
> >> >> box 2-5:
> >> >> 6741 HQuorumPeer
> >> >> 6841 HRegionServer
> >> >> 7881 Jps
> >> >> 6610 DataNode
> >> >>
> >> >>
> >> >> hbase site: =======================
> >> >> <?xml version="1.0"?>
> >> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> >> >> <!--
> >> >> /**
> >> >>  * Copyright 2007 The Apache Software Foundation
> >> >>  *
> >> >>  * Licensed to the Apache Software Foundation (ASF) under one
> >> >>  * or more contributor license agreements.  See the NOTICE file
> >> >>  * distributed with this work for additional information
> >> >>  * regarding copyright ownership.  The ASF licenses this file
> >> >>  * to you under the Apache License, Version 2.0 (the
> >> >>  * "License"); you may not use this file except in compliance
> >> >>  * with the License.  You may obtain a copy of the License at
> >> >>  *
> >> >>  *     http://www.apache.org/licenses/LICENSE-2.0
> >> >>  *
> >> >>  * Unless required by applicable law or agreed to in writing,
> software
> >> >>  * distributed under the License is distributed on an "AS IS" BASIS,
> >> >>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> >> >> implied.
> >> >>  * See the License for the specific language governing permissions
> and
> >> >>  * limitations under the License.
> >> >>  */
> >> >> -->
> >> >> <configuration>
> >> >>  <property>
> >> >>    <name>hbase.rootdir</name>
> >> >>    <value>hdfs://box1:9000/hbase</value>
> >> >>    <description>The directory shared by region servers.
> >> >>    </description>
> >> >>  </property>
> >> >>  <property>
> >> >>    <name>hbase.master.port</name>
> >> >>    <value>60000</value>
> >> >>    <description>The port that the HBase master runs at.
> >> >>    </description>
> >> >>  </property>
> >> >>  <property>
> >> >>    <name>hbase.cluster.distributed</name>
> >> >>    <value>true</value>
> >> >>    <description>The mode the cluster will be in. Possible values are
> >> >>      false: standalone and pseudo-distributed setups with managed
> >> >> Zookeeper
> >> >>      true: fully-distributed with unmanaged Zookeeper Quorum (see
> >> >> hbase-env.sh)
> >> >>    </description>
> >> >>  </property>
> >> >>  <property>
> >> >>    <name>hbase.regionserver.lease.period</name>
> >> >>    <value>120000</value>
> >> >>    <description>HRegion server lease period in milliseconds. Default
> is
> >> >>    60 seconds. Clients must report in within this period else they
> are
> >> >>    considered dead.</description>
> >> >>  </property>
> >> >>
> >> >>  <property>
> >> >>      <name>hbase.zookeeper.property.clientPort</name>
> >> >>      <value>2222</value>
> >> >>      <description>Property from ZooKeeper's config zoo.cfg.
> >> >>      The port at which the clients will connect.
> >> >>      </description>
> >> >>  </property>
> >> >>  <property>
> >> >>      <name>hbase.zookeeper.property.dataDir</name>
> >> >>      <value>/home/hadoop/zookeeper</value>
> >> >>  </property>
> >> >>  <property>
> >> >>      <name>hbase.zookeeper.property.syncLimit</name>
> >> >>      <value>5</value>
> >> >>  </property>
> >> >>  <property>
> >> >>      <name>hbase.zookeeper.property.tickTime</name>
> >> >>      <value>2000</value>
> >> >>  </property>
> >> >>  <property>
> >> >>      <name>hbase.zookeeper.property.initLimit</name>
> >> >>      <value>10</value>
> >> >>  </property>
> >> >>  <property>
> >> >>      <name>hbase.zookeeper.quorum</name>
> >> >>      <value>box1,box2,box3,box4</value>
> >> >>      <description>Comma separated list of servers in the ZooKeeper
> >> >> Quorum.
> >> >>      For example,
> >> >> "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
> >> >>      By default this is set to localhost for local and
> >> >>      pseudo-distributed modes of operation. For a fully-distributed
> >> >>      setup, this should be set to a full list of ZooKeeper quorum
> >> >>      servers. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the
> >> >>      list of servers which we will start/stop ZooKeeper on.
> >> >>      </description>
> >> >>  </property>
> >> >>  <property>
> >> >>    <name>hfile.block.cache.size</name>
> >> >>    <value>.5</value>
> >> >>    <description>text</description>
> >> >>  </property>
> >> >>
> >> >> </configuration>
> >> >>
> >> >>
> >> >> hbase env:====================================================
> >> >>
> >> >> export HBASE_CLASSPATH=${HADOOP_CONF_DIR}
> >> >>
> >> >> export HBASE_HEAPSIZE=3000
> >> >>
> >> >> export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m
> >> >> -XX:+UseConcMarkSweepGC
> >> >> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> >> >> -XX:+CMSIncrementalMode
> >> >> -Xloggc:/home/hadoop/hbase-0.20.0/logs/gc-hbase.log"
> >> >>
> >> >> export HBASE_MANAGES_ZK=true
> >> >>
> >> >> Hadoop core
> >> >> site===========================================================
> >> >>
> >> >> <?xml version="1.0"?>
> >> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> >> >>
> >> >> <!-- Put site-specific property overrides in this file. -->
> >> >>
> >> >> <configuration>
> >> >> <property>
> >> >>   <name>fs.default.name</name>
> >> >>   <value>hdfs://box1:9000</value>
> >> >>   <description>The name of the default file system.  A URI whose
> >> >>   scheme and authority determine the FileSystem implementation.  The
> >> >>   uri's scheme determines the config property (fs.SCHEME.impl) naming
> >> >>   the FileSystem implementation class.  The uri's authority is used to
> >> >>   determine the host, port, etc. for a filesystem.</description>
> >> >> </property>
> >> >> <property>
> >> >>  <name>hadoop.tmp.dir</name>
> >> >>  <value>/data/hadoop-0.20.0-${user.name}</value>
> >> >>  <description>A base for other temporary directories.</description>
> >> >> </property>
> >> >> </configuration>
> >> >>
> >> >> ==============
> >> >>
> >> >> replication is set to 6.
> >> >>
> >> >> hadoop env=================
> >> >>
> >> >> export HADOOP_HEAPSIZE=3000
> >> >> export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
> >> >> $HADOOP_NAMENODE_OPTS"
> >> >> export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote
> >> >> $HADOOP_SECONDARYNAMENODE_OPTS"
> >> >> export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote
> >> >> $HADOOP_DATANODE_OPTS"
> >> >> export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote
> >> >> $HADOOP_BALANCER_OPTS"
> >> >> export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote
> >> >> $HADOOP_JOBTRACKER_OPTS"
> >> >>  ==================
> >> >>
> >> >>
> >> >> Very basic setup.  Then I start the cluster and do simple random Get
> >> >> operations
> >> >> on a tall table (~60 M rows):
> >> >>
> >> >> {NAME => 'tallTable', FAMILIES => [{NAME => 'family1', COMPRESSION =>
> >> >> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
> >> >> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> >> >>
> >> >> Are these fairly normal speeds?  I'm unsure if this is a result of
> >> >> having a small cluster.  Please advise...
> >> >>
> >> >> stack-3 wrote:
> >> >> >
> >> >> > Yeah, seems slow.  In old hbase, it could do 5-10k writes a second
> >> >> > going by the performance eval page up on the wiki.  SequentialWrite
> >> >> > was about the same as RandomWrite.  Check out the stats on hw up on
> >> >> > that page and the description of how the test was set up.  Can you
> >> >> > figure out where it's slow?
> >> >> >
> >> >> > St.Ack
> >> >> >
> >> >> > On Wed, Aug 12, 2009 at 10:10 AM, llpind <so...@hotmail.com>
> >> >> wrote:
> >> >> >
> >> >> >>
> >> >> >> Thanks Stack.
> >> >> >>
> >> >> >> I will try mapred with more clients.  I tried it without mapred
> >> >> >> using 3 clients doing random write operations; here is the output:
> >> >> >>
> >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-0 Start
> >> >> >> randomWrite at offset 0 for 1048576 rows
> >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-1 Start
> >> >> >> randomWrite at offset 1048576 for 1048576 rows
> >> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-2 Start
> >> >> >> randomWrite at offset 2097152 for 1048576 rows
> >> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-1
> >> >> >> 1048576/1153427/2097152
> >> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-2
> >> >> >> 2097152/2201997/3145728
> >> >> >> 09/08/12 09:24:25 INFO hbase.PerformanceEvaluation: client-0
> >> >> >> 0/104857/1048576
> >> >> >> 09/08/12 09:27:42 INFO hbase.PerformanceEvaluation: client-0
> >> >> >> 0/209714/1048576
> >> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-1
> >> >> >> 1048576/1258284/2097152
> >> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-2
> >> >> >> 2097152/2306854/3145728
> >> >> >> 09/08/12 09:32:32 INFO hbase.PerformanceEvaluation: client-1
> >> >> >> 1048576/1363141/2097152
> >> >> >> 09/08/12 09:32:33 INFO hbase.PerformanceEvaluation: client-0
> >> >> >> 0/314571/1048576
> >> >> >> 09/08/12 09:32:41 INFO hbase.PerformanceEvaluation: client-2
> >> >> >> 2097152/2411711/3145728
> >> >> >> 09/08/12 09:35:31 INFO hbase.PerformanceEvaluation: client-0
> >> >> >> 0/419428/1048576
> >> >> >> 09/08/12 09:35:34 INFO hbase.PerformanceEvaluation: client-1
> >> >> >> 1048576/1467998/2097152
> >> >> >> 09/08/12 09:35:53 INFO hbase.PerformanceEvaluation: client-2
> >> >> >> 2097152/2516568/3145728
> >> >> >> 09/08/12 09:39:02 INFO hbase.PerformanceEvaluation: client-0
> >> >> >> 0/524285/1048576
> >> >> >> 09/08/12 09:39:03 INFO hbase.PerformanceEvaluation: client-2
> >> >> >> 2097152/2621425/3145728
> >> >> >> 09/08/12 09:40:07 INFO hbase.PerformanceEvaluation: client-1
> >> >> >> 1048576/1572855/2097152
> >> >> >> 09/08/12 09:42:53 INFO hbase.PerformanceEvaluation: client-0
> >> >> >> 0/629142/1048576
> >> >> >> 09/08/12 09:44:25 INFO hbase.PerformanceEvaluation: client-2
> >> >> >> 2097152/2726282/3145728
> >> >> >> 09/08/12 09:44:44 INFO hbase.PerformanceEvaluation: client-1
> >> >> >> 1048576/1677712/2097152
> >> >> >> 09/08/12 09:46:43 INFO hbase.PerformanceEvaluation: client-0
> >> >> >> 0/733999/1048576
> >> >> >> 09/08/12 09:48:11 INFO hbase.PerformanceEvaluation: client-2
> >> >> >> 2097152/2831139/3145728
> >> >> >> 09/08/12 09:48:29 INFO hbase.PerformanceEvaluation: client-1
> >> >> >> 1048576/1782569/2097152
> >> >> >> 09/08/12 09:50:12 INFO hbase.PerformanceEvaluation: client-0
> >> >> >> 0/838856/1048576
> >> >> >> 09/08/12 09:52:47 INFO hbase.PerformanceEvaluation: client-2
> >> >> >> 2097152/2935996/3145728
> >> >> >> 09/08/12 09:53:51 INFO hbase.PerformanceEvaluation: client-1
> >> >> >> 1048576/1887426/2097152
> >> >> >> 09/08/12 09:56:32 INFO hbase.PerformanceEvaluation: client-0
> >> >> >> 0/943713/1048576
> >> >> >> 09/08/12 09:58:32 INFO hbase.PerformanceEvaluation: client-2
> >> >> >> 2097152/3040853/3145728
> >> >> >> 09/08/12 09:59:14 INFO hbase.PerformanceEvaluation: client-1
> >> >> >> 1048576/1992283/2097152
> >> >> >> 09/08/12 10:02:28 INFO hbase.PerformanceEvaluation: client-0
> >> >> >> 0/1048570/1048576
> >> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: client-0 Finished
> >> >> >> randomWrite in 2376615ms at offset 0 for 1048576 rows
> >> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: Finished 0 in
> >> >> >> 2376615ms
> >> >> >> writing 1048576 rows
> >> >> >> 09/08/12 10:06:35 INFO hbase.PerformanceEvaluation: client-2
> >> >> >> 2097152/3145710/3145728
> >> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: client-2 Finished
> >> >> >> randomWrite in 2623395ms at offset 2097152 for 1048576 rows
> >> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: Finished 2 in
> >> >> >> 2623395ms
> >> >> >> writing 1048576 rows
> >> >> >> 09/08/12 10:06:42 INFO hbase.PerformanceEvaluation: client-1
> >> >> >> 1048576/2097140/2097152
> >> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: client-1 Finished
> >> >> >> randomWrite in 2630199ms at offset 1048576 for 1048576 rows
> >> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: Finished 1 in
> >> >> >> 2630199ms
> >> >> >> writing 1048576 rows
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Seems kind of slow for ~3M records.  I have a 4-node cluster up at
> >> >> >> the moment.  HMaster & NameNode are running on the same box.
> >> >> >> --
> >> >> >> View this message in context:
> >> >> >>
> >> >>
> >>
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24940922.html
> >> >> >> Sent from the HBase User mailing list archive at Nabble.com.
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >>
> >>
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24943406.html
> >> >> Sent from the HBase User mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24955595.html
> >> Sent from the HBase User mailing list archive at Nabble.com.
> >>
> >>
> >
>

Re: HBase in a real world application

Posted by stack <st...@duboce.net>.
Our writes were off by a factor of 7 or 8.  Writes should be better now
(HBASE-1771).
Thanks,
St.Ack
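
For anyone applying the fix to a 0.20.0 install, the usual route is to apply
the JIRA patch to a source checkout and rebuild.  A minimal sketch, assuming
the attachment is named HBASE-1771.patch and applies cleanly at the source
root (file name and build target are assumptions; check the issue page):

  cd hbase-0.20.0
  # apply the patch attached to the JIRA issue (file name assumed)
  patch -p0 < HBASE-1771.patch
  # rebuild the hbase jar with the bundled Ant build
  ant jar
  # push the rebuilt jar to every node, then restart the cluster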


On Thu, Aug 13, 2009 at 4:53 PM, stack <st...@duboce.net> wrote:

> I just tried it.  It seems slow to me writing too.  Let me take a look....
> St.Ack
>
>
> On Thu, Aug 13, 2009 at 10:06 AM, llpind <so...@hotmail.com> wrote:
>
>>
>> Okay, I changed replication to 2 and removed "-XX:NewSize=6m
>> -XX:MaxNewSize=6m"
>>
>> Here are the results for randomWrite with 3 clients:
>>
>>
>>
>> RandomWrite =================================================
>>
>> hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar
>>  --nomapred
>> randomWrite 3
>>
>>
>> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-0 Start
>> randomWrite at offset 0 for 1048576 rows
>> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-1 Start
>> randomWrite at offset 1048576 for 1048576 rows
>> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-2 Start
>> randomWrite at offset 2097152 for 1048576 rows
>> 09/08/13 09:51:47 INFO hbase.PerformanceEvaluation: client-0
>> 0/104857/1048576
>> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-1
>> 1048576/1153427/2097152
>> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-2
>> 2097152/2201997/3145728
>> 09/08/13 09:52:22 INFO hbase.PerformanceEvaluation: client-1
>> 1048576/1258284/2097152
>> 09/08/13 09:52:23 INFO hbase.PerformanceEvaluation: client-0
>> 0/209714/1048576
>> 09/08/13 09:52:24 INFO hbase.PerformanceEvaluation: client-2
>> 2097152/2306854/3145728
>> 09/08/13 09:52:47 INFO hbase.PerformanceEvaluation: client-1
>> 1048576/1363141/2097152
>> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-0
>> 0/314571/1048576
>> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-2
>> 2097152/2411711/3145728
>> 09/08/13 09:53:24 INFO hbase.PerformanceEvaluation: client-1
>> 1048576/1467998/2097152
>> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-0
>> 0/419428/1048576
>> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-2
>> 2097152/2516568/3145728
>> 09/08/13 09:53:48 INFO hbase.PerformanceEvaluation: client-1
>> 1048576/1572855/2097152
>> 09/08/13 09:54:08 INFO hbase.PerformanceEvaluation: client-2
>> 2097152/2621425/3145728
>> 09/08/13 09:54:10 INFO hbase.PerformanceEvaluation: client-0
>> 0/524285/1048576
>> 09/08/13 09:54:40 INFO hbase.PerformanceEvaluation: client-1
>> 1048576/1677712/2097152
>> 09/08/13 09:54:49 INFO hbase.PerformanceEvaluation: client-2
>> 2097152/2726282/3145728
>> 09/08/13 09:54:52 INFO hbase.PerformanceEvaluation: client-0
>> 0/629142/1048576
>> 09/08/13 09:55:57 INFO hbase.PerformanceEvaluation: client-1
>> 1048576/1782569/2097152
>> 09/08/13 09:56:21 INFO hbase.PerformanceEvaluation: client-2
>> 2097152/2831139/3145728
>> 09/08/13 09:56:41 INFO hbase.PerformanceEvaluation: client-0
>> 0/733999/1048576
>> 09/08/13 09:57:23 INFO hbase.PerformanceEvaluation: client-1
>> 1048576/1887426/2097152
>> 09/08/13 09:58:40 INFO hbase.PerformanceEvaluation: client-2
>> 2097152/2935996/3145728
>> 09/08/13 09:58:54 INFO hbase.PerformanceEvaluation: client-0
>> 0/838856/1048576
>> 09/08/13 10:00:29 INFO hbase.PerformanceEvaluation: client-1
>> 1048576/1992283/2097152
>> 09/08/13 10:01:01 INFO hbase.PerformanceEvaluation: client-2
>> 2097152/3040853/3145728
>> 09/08/13 10:01:24 INFO hbase.PerformanceEvaluation: client-0
>> 0/943713/1048576
>> 09/08/13 10:02:36 INFO hbase.PerformanceEvaluation: client-1
>> 1048576/2097140/2097152
>> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: client-1 Finished
>> randomWrite in 680674ms at offset 1048576 for 1048576 rows
>> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: Finished 1 in 680674ms
>> writing 1048576 rows
>> 09/08/13 10:03:19 INFO hbase.PerformanceEvaluation: client-2
>> 2097152/3145710/3145728
>> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: client-2 Finished
>> randomWrite in 723771ms at offset 2097152 for 1048576 rows
>> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: Finished 2 in 723771ms
>> writing 1048576 rows
>> 09/08/13 10:03:41 INFO hbase.PerformanceEvaluation: client-0
>> 0/1048570/1048576
>> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: client-0 Finished
>> randomWrite in 746054ms at offset 0 for 1048576 rows
>> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: Finished 0 in 746054ms
>> writing 1048576 rows
>>
>>
>>
>> ============================================================
>>
>> Still pretty slow.  Any other ideas?  I'm running the client from the
>> master box, but it's not running any regionservers or datanodes.
>>
>> stack-3 wrote:
>> >
>> > Your config. looks fine.
>> >
>> > Only thing that gives me pause is:
>> >
>> > "-XX:NewSize=6m -XX:MaxNewSize=6m"
>> >
>> > Any reason for the above?
>> >
>> > If you study your GC logs, lots of pauses?
>> >
>> > Oh, and this: replication is set to 6.  Why 6?  Each write must commit
>> > to 6 datanodes before it completes.  In the tests posted on the wiki,
>> > we replicate to 3 nodes.
>> >
>> > At the end of this message you say you are doing gets?  The numbers you
>> > posted were for writes?
>> >
>> > St.Ack
>> >
>> >
>> > On Wed, Aug 12, 2009 at 1:15 PM, llpind <so...@hotmail.com> wrote:
>> >
>> >>
>> >> Not sure why my performance is so slow.  Here is my configuration:
>> >>
>> >> box1:
>> >> 10395 SecondaryNameNode
>> >> 11628 Jps
>> >> 10131 NameNode
>> >> 10638 HQuorumPeer
>> >> 10705 HMaster
>> >>
>> >> box 2-5:
>> >> 6741 HQuorumPeer
>> >> 6841 HRegionServer
>> >> 7881 Jps
>> >> 6610 DataNode
>> >>
>> >>
>> >> hbase site: =======================
>> >> <?xml version="1.0"?>
>> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> >> <!--
>> >> /**
>> >>  * Copyright 2007 The Apache Software Foundation
>> >>  *
>> >>  * Licensed to the Apache Software Foundation (ASF) under one
>> >>  * or more contributor license agreements.  See the NOTICE file
>> >>  * distributed with this work for additional information
>> >>  * regarding copyright ownership.  The ASF licenses this file
>> >>  * to you under the Apache License, Version 2.0 (the
>> >>  * "License"); you may not use this file except in compliance
>> >>  * with the License.  You may obtain a copy of the License at
>> >>  *
>> >>  *     http://www.apache.org/licenses/LICENSE-2.0
>> >>  *
>> >>  * Unless required by applicable law or agreed to in writing, software
>> >>  * distributed under the License is distributed on an "AS IS" BASIS,
>> >>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
>> >> implied.
>> >>  * See the License for the specific language governing permissions and
>> >>  * limitations under the License.
>> >>  */
>> >> -->
>> >> <configuration>
>> >>  <property>
>> >>    <name>hbase.rootdir</name>
>> >>    <value>hdfs://box1:9000/hbase</value>
>> >>    <description>The directory shared by region servers.
>> >>    </description>
>> >>  </property>
>> >>  <property>
>> >>    <name>hbase.master.port</name>
>> >>    <value>60000</value>
>> >>    <description>The port that the HBase master runs at.
>> >>    </description>
>> >>  </property>
>> >>  <property>
>> >>    <name>hbase.cluster.distributed</name>
>> >>    <value>true</value>
>> >>    <description>The mode the cluster will be in. Possible values are
>> >>      false: standalone and pseudo-distributed setups with managed
>> >> Zookeeper
>> >>      true: fully-distributed with unmanaged Zookeeper Quorum (see
>> >> hbase-env.sh)
>> >>    </description>
>> >>  </property>
>> >>  <property>
>> >>    <name>hbase.regionserver.lease.period</name>
>> >>    <value>120000</value>
>> >>    <description>HRegion server lease period in milliseconds. Default is
>> >>    60 seconds. Clients must report in within this period else they are
>> >>    considered dead.</description>
>> >>  </property>
>> >>
>> >>  <property>
>> >>      <name>hbase.zookeeper.property.clientPort</name>
>> >>      <value>2222</value>
>> >>      <description>Property from ZooKeeper's config zoo.cfg.
>> >>      The port at which the clients will connect.
>> >>      </description>
>> >>  </property>
>> >>  <property>
>> >>      <name>hbase.zookeeper.property.dataDir</name>
>> >>      <value>/home/hadoop/zookeeper</value>
>> >>  </property>
>> >>  <property>
>> >>      <name>hbase.zookeeper.property.syncLimit</name>
>> >>      <value>5</value>
>> >>  </property>
>> >>  <property>
>> >>      <name>hbase.zookeeper.property.tickTime</name>
>> >>      <value>2000</value>
>> >>  </property>
>> >>  <property>
>> >>      <name>hbase.zookeeper.property.initLimit</name>
>> >>      <value>10</value>
>> >>  </property>
>> >>  <property>
>> >>      <name>hbase.zookeeper.quorum</name>
>> >>      <value>box1,box2,box3,box4</value>
>> >>      <description>Comma separated list of servers in the ZooKeeper
>> >> Quorum.
>> >>      For example,
>> >> "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>> >>      By default this is set to localhost for local and pseudo-distributed
>> >>      modes of operation. For a fully-distributed setup, this should be
>> >>      set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK
>> >>      is set in hbase-env.sh this is the list of servers which we will
>> >>      start/stop ZooKeeper on.
>> >>      </description>
>> >>  </property>
>> >>  <property>
>> >>    <name>hfile.block.cache.size</name>
>> >>    <value>.5</value>
>> >>    <description>text</description>
>> >>  </property>
>> >>
>> >> </configuration>
>> >>
>> >>
>> >> hbase env:====================================================
>> >>
>> >> export HBASE_CLASSPATH=${HADOOP_CONF_DIR}
>> >>
>> >> export HBASE_HEAPSIZE=3000
>> >>
>> >> export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m
>> >> -XX:+UseConcMarkSweepGC
>> >> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>> >> -XX:+CMSIncrementalMode
>> >> -Xloggc:/home/hadoop/hbase-0.20.0/logs/gc-hbase.log"
>> >>
>> >> export HBASE_MANAGES_ZK=true
>> >>
>> >> Hadoop core
>> >> site===========================================================
>> >>
>> >> <?xml version="1.0"?>
>> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> >>
>> >> <!-- Put site-specific property overrides in this file. -->
>> >>
>> >> <configuration>
>> >> <property>
>> >>   <name>fs.default.name</name>
>> >>   <value>hdfs://box1:9000</value>
>> >>   <description>The name of the default file system.  A URI whose
>> >>   scheme and authority determine the FileSystem implementation.  The
>> >>   uri's scheme determines the config property (fs.SCHEME.impl) naming
>> >>   the FileSystem implementation class.  The uri's authority is used to
>> >>   determine the host, port, etc. for a filesystem.</description>
>> >> </property>
>> >> <property>
>> >>  <name>hadoop.tmp.dir</name>
>> >>  <value>/data/hadoop-0.20.0-${user.name}</value>
>> >>  <description>A base for other temporary directories.</description>
>> >> </property>
>> >> </configuration>
>> >>
>> >> ==============
>> >>
>> >> replication is set to 6.
>> >>
>> >> hadoop env=================
>> >>
>> >> export HADOOP_HEAPSIZE=3000
>> >> export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
>> >> $HADOOP_NAMENODE_OPTS"
>> >> export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote
>> >> $HADOOP_SECONDARYNAMENODE_OPTS"
>> >> export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote
>> >> $HADOOP_DATANODE_OPTS"
>> >> export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote
>> >> $HADOOP_BALANCER_OPTS"
>> >> export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote
>> >> $HADOOP_JOBTRACKER_OPTS"
>> >>  ==================
>> >>
>> >>
>> >> Very basic setup.  Then I start the cluster and do simple random Get
>> >> operations
>> >> on a tall table (~60 M rows):
>> >>
>> >> {NAME => 'tallTable', FAMILIES => [{NAME => 'family1', COMPRESSION =>
>> >> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
>> >> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>> >>
>> >> Are these fairly normal speeds?  I'm unsure if this is a result of
>> >> having a small cluster.  Please advise...
>> >>
>> >> stack-3 wrote:
>> >> >
>> >> > Yeah, seems slow.  In old hbase, it could do 5-10k writes a second
>> >> > going by the performance eval page up on the wiki.  SequentialWrite
>> >> > was about the same as RandomWrite.  Check out the stats on hw up on
>> >> > that page and the description of how the test was set up.  Can you
>> >> > figure out where it's slow?
>> >> >
>> >> > St.Ack
>> >> >
>> >> > On Wed, Aug 12, 2009 at 10:10 AM, llpind <so...@hotmail.com>
>> >> wrote:
>> >> >
>> >> >>
>> >> >> Thanks Stack.
>> >> >>
>> >> >> I will try mapred with more clients.  I tried it without mapred
>> >> >> using 3 clients doing random write operations; here is the output:
>> >> >>
>> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-0 Start
>> >> >> randomWrite at offset 0 for 1048576 rows
>> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-1 Start
>> >> >> randomWrite at offset 1048576 for 1048576 rows
>> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-2 Start
>> >> >> randomWrite at offset 2097152 for 1048576 rows
>> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-1
>> >> >> 1048576/1153427/2097152
>> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-2
>> >> >> 2097152/2201997/3145728
>> >> >> 09/08/12 09:24:25 INFO hbase.PerformanceEvaluation: client-0
>> >> >> 0/104857/1048576
>> >> >> 09/08/12 09:27:42 INFO hbase.PerformanceEvaluation: client-0
>> >> >> 0/209714/1048576
>> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-1
>> >> >> 1048576/1258284/2097152
>> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-2
>> >> >> 2097152/2306854/3145728
>> >> >> 09/08/12 09:32:32 INFO hbase.PerformanceEvaluation: client-1
>> >> >> 1048576/1363141/2097152
>> >> >> 09/08/12 09:32:33 INFO hbase.PerformanceEvaluation: client-0
>> >> >> 0/314571/1048576
>> >> >> 09/08/12 09:32:41 INFO hbase.PerformanceEvaluation: client-2
>> >> >> 2097152/2411711/3145728
>> >> >> 09/08/12 09:35:31 INFO hbase.PerformanceEvaluation: client-0
>> >> >> 0/419428/1048576
>> >> >> 09/08/12 09:35:34 INFO hbase.PerformanceEvaluation: client-1
>> >> >> 1048576/1467998/2097152
>> >> >> 09/08/12 09:35:53 INFO hbase.PerformanceEvaluation: client-2
>> >> >> 2097152/2516568/3145728
>> >> >> 09/08/12 09:39:02 INFO hbase.PerformanceEvaluation: client-0
>> >> >> 0/524285/1048576
>> >> >> 09/08/12 09:39:03 INFO hbase.PerformanceEvaluation: client-2
>> >> >> 2097152/2621425/3145728
>> >> >> 09/08/12 09:40:07 INFO hbase.PerformanceEvaluation: client-1
>> >> >> 1048576/1572855/2097152
>> >> >> 09/08/12 09:42:53 INFO hbase.PerformanceEvaluation: client-0
>> >> >> 0/629142/1048576
>> >> >> 09/08/12 09:44:25 INFO hbase.PerformanceEvaluation: client-2
>> >> >> 2097152/2726282/3145728
>> >> >> 09/08/12 09:44:44 INFO hbase.PerformanceEvaluation: client-1
>> >> >> 1048576/1677712/2097152
>> >> >> 09/08/12 09:46:43 INFO hbase.PerformanceEvaluation: client-0
>> >> >> 0/733999/1048576
>> >> >> 09/08/12 09:48:11 INFO hbase.PerformanceEvaluation: client-2
>> >> >> 2097152/2831139/3145728
>> >> >> 09/08/12 09:48:29 INFO hbase.PerformanceEvaluation: client-1
>> >> >> 1048576/1782569/2097152
>> >> >> 09/08/12 09:50:12 INFO hbase.PerformanceEvaluation: client-0
>> >> >> 0/838856/1048576
>> >> >> 09/08/12 09:52:47 INFO hbase.PerformanceEvaluation: client-2
>> >> >> 2097152/2935996/3145728
>> >> >> 09/08/12 09:53:51 INFO hbase.PerformanceEvaluation: client-1
>> >> >> 1048576/1887426/2097152
>> >> >> 09/08/12 09:56:32 INFO hbase.PerformanceEvaluation: client-0
>> >> >> 0/943713/1048576
>> >> >> 09/08/12 09:58:32 INFO hbase.PerformanceEvaluation: client-2
>> >> >> 2097152/3040853/3145728
>> >> >> 09/08/12 09:59:14 INFO hbase.PerformanceEvaluation: client-1
>> >> >> 1048576/1992283/2097152
>> >> >> 09/08/12 10:02:28 INFO hbase.PerformanceEvaluation: client-0
>> >> >> 0/1048570/1048576
>> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: client-0 Finished
>> >> >> randomWrite in 2376615ms at offset 0 for 1048576 rows
>> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: Finished 0 in
>> >> >> 2376615ms
>> >> >> writing 1048576 rows
>> >> >> 09/08/12 10:06:35 INFO hbase.PerformanceEvaluation: client-2
>> >> >> 2097152/3145710/3145728
>> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: client-2 Finished
>> >> >> randomWrite in 2623395ms at offset 2097152 for 1048576 rows
>> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: Finished 2 in
>> >> >> 2623395ms
>> >> >> writing 1048576 rows
>> >> >> 09/08/12 10:06:42 INFO hbase.PerformanceEvaluation: client-1
>> >> >> 1048576/2097140/2097152
>> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: client-1 Finished
>> >> >> randomWrite in 2630199ms at offset 1048576 for 1048576 rows
>> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: Finished 1 in
>> >> >> 2630199ms
>> >> >> writing 1048576 rows
>> >> >>
>> >> >>
>> >> >>
>> >> >> Seems kind of slow for ~3M records.  I have a 4-node cluster up at
>> >> >> the moment.  HMaster & NameNode are running on the same box.
>> >> >> --
>> >> >> View this message in context:
>> >> >>
>> >>
>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24940922.html
>> >> >> Sent from the HBase User mailing list archive at Nabble.com.
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24943406.html
>> >> Sent from the HBase User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24955595.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
>

Re: HBase in a real world application

Posted by stack <st...@duboce.net>.
I just tried it.  It seems slow to me writing too.  Let me take a look....
St.Ack

On Thu, Aug 13, 2009 at 10:06 AM, llpind <so...@hotmail.com> wrote:

>
> Okay, I changed replication to 2 and removed "-XX:NewSize=6m
> -XX:MaxNewSize=6m"
>
> Here are the results for randomWrite with 3 clients:
>
>
>
> RandomWrite =================================================
>
> hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar  --nomapred
> randomWrite 3
>
>
> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-0 Start
> randomWrite at offset 0 for 1048576 rows
> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-1 Start
> randomWrite at offset 1048576 for 1048576 rows
> 09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-2 Start
> randomWrite at offset 2097152 for 1048576 rows
> 09/08/13 09:51:47 INFO hbase.PerformanceEvaluation: client-0
> 0/104857/1048576
> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-1
> 1048576/1153427/2097152
> 09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-2
> 2097152/2201997/3145728
> 09/08/13 09:52:22 INFO hbase.PerformanceEvaluation: client-1
> 1048576/1258284/2097152
> 09/08/13 09:52:23 INFO hbase.PerformanceEvaluation: client-0
> 0/209714/1048576
> 09/08/13 09:52:24 INFO hbase.PerformanceEvaluation: client-2
> 2097152/2306854/3145728
> 09/08/13 09:52:47 INFO hbase.PerformanceEvaluation: client-1
> 1048576/1363141/2097152
> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-0
> 0/314571/1048576
> 09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-2
> 2097152/2411711/3145728
> 09/08/13 09:53:24 INFO hbase.PerformanceEvaluation: client-1
> 1048576/1467998/2097152
> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-0
> 0/419428/1048576
> 09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-2
> 2097152/2516568/3145728
> 09/08/13 09:53:48 INFO hbase.PerformanceEvaluation: client-1
> 1048576/1572855/2097152
> 09/08/13 09:54:08 INFO hbase.PerformanceEvaluation: client-2
> 2097152/2621425/3145728
> 09/08/13 09:54:10 INFO hbase.PerformanceEvaluation: client-0
> 0/524285/1048576
> 09/08/13 09:54:40 INFO hbase.PerformanceEvaluation: client-1
> 1048576/1677712/2097152
> 09/08/13 09:54:49 INFO hbase.PerformanceEvaluation: client-2
> 2097152/2726282/3145728
> 09/08/13 09:54:52 INFO hbase.PerformanceEvaluation: client-0
> 0/629142/1048576
> 09/08/13 09:55:57 INFO hbase.PerformanceEvaluation: client-1
> 1048576/1782569/2097152
> 09/08/13 09:56:21 INFO hbase.PerformanceEvaluation: client-2
> 2097152/2831139/3145728
> 09/08/13 09:56:41 INFO hbase.PerformanceEvaluation: client-0
> 0/733999/1048576
> 09/08/13 09:57:23 INFO hbase.PerformanceEvaluation: client-1
> 1048576/1887426/2097152
> 09/08/13 09:58:40 INFO hbase.PerformanceEvaluation: client-2
> 2097152/2935996/3145728
> 09/08/13 09:58:54 INFO hbase.PerformanceEvaluation: client-0
> 0/838856/1048576
> 09/08/13 10:00:29 INFO hbase.PerformanceEvaluation: client-1
> 1048576/1992283/2097152
> 09/08/13 10:01:01 INFO hbase.PerformanceEvaluation: client-2
> 2097152/3040853/3145728
> 09/08/13 10:01:24 INFO hbase.PerformanceEvaluation: client-0
> 0/943713/1048576
> 09/08/13 10:02:36 INFO hbase.PerformanceEvaluation: client-1
> 1048576/2097140/2097152
> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: client-1 Finished
> randomWrite in 680674ms at offset 1048576 for 1048576 rows
> 09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: Finished 1 in 680674ms
> writing 1048576 rows
> 09/08/13 10:03:19 INFO hbase.PerformanceEvaluation: client-2
> 2097152/3145710/3145728
> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: client-2 Finished
> randomWrite in 723771ms at offset 2097152 for 1048576 rows
> 09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: Finished 2 in 723771ms
> writing 1048576 rows
> 09/08/13 10:03:41 INFO hbase.PerformanceEvaluation: client-0
> 0/1048570/1048576
> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: client-0 Finished
> randomWrite in 746054ms at offset 0 for 1048576 rows
> 09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: Finished 0 in 746054ms
> writing 1048576 rows
>
>
>
> ============================================================
>
> Still pretty slow.  Any other ideas?  I'm running the client from the
> master box, but it's not running any regionservers or datanodes.
>
> stack-3 wrote:
> >
> > Your config. looks fine.
> >
> > Only thing that gives me pause is:
> >
> > "-XX:NewSize=6m -XX:MaxNewSize=6m"
> >
> > Any reason for the above?
> >
> > If you study your GC logs, lots of pauses?
> >
> > Oh, and this: replication is set to 6.  Why 6?  Each write must commit
> > to 6 datanodes before it completes.  In the tests posted on the wiki,
> > we replicate to 3 nodes.
> >
> > At the end of this message you say you are doing gets?  The numbers you
> > posted were for writes?
> >
> > St.Ack
> >
> >
> > On Wed, Aug 12, 2009 at 1:15 PM, llpind <so...@hotmail.com> wrote:
> >
> >>
> >> Not sure why my performance is so slow.  Here is my configuration:
> >>
> >> box1:
> >> 10395 SecondaryNameNode
> >> 11628 Jps
> >> 10131 NameNode
> >> 10638 HQuorumPeer
> >> 10705 HMaster
> >>
> >> box 2-5:
> >> 6741 HQuorumPeer
> >> 6841 HRegionServer
> >> 7881 Jps
> >> 6610 DataNode
> >>
> >>
> >> hbase site: =======================
> >> <?xml version="1.0"?>
> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> >> <!--
> >> /**
> >>  * Copyright 2007 The Apache Software Foundation
> >>  *
> >>  * Licensed to the Apache Software Foundation (ASF) under one
> >>  * or more contributor license agreements.  See the NOTICE file
> >>  * distributed with this work for additional information
> >>  * regarding copyright ownership.  The ASF licenses this file
> >>  * to you under the Apache License, Version 2.0 (the
> >>  * "License"); you may not use this file except in compliance
> >>  * with the License.  You may obtain a copy of the License at
> >>  *
> >>  *     http://www.apache.org/licenses/LICENSE-2.0
> >>  *
> >>  * Unless required by applicable law or agreed to in writing, software
> >>  * distributed under the License is distributed on an "AS IS" BASIS,
> >>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> >> implied.
> >>  * See the License for the specific language governing permissions and
> >>  * limitations under the License.
> >>  */
> >> -->
> >> <configuration>
> >>  <property>
> >>    <name>hbase.rootdir</name>
> >>    <value>hdfs://box1:9000/hbase</value>
> >>    <description>The directory shared by region servers.
> >>    </description>
> >>  </property>
> >>  <property>
> >>    <name>hbase.master.port</name>
> >>    <value>60000</value>
> >>    <description>The port that the HBase master runs at.
> >>    </description>
> >>  </property>
> >>  <property>
> >>    <name>hbase.cluster.distributed</name>
> >>    <value>true</value>
> >>    <description>The mode the cluster will be in. Possible values are
> >>      false: standalone and pseudo-distributed setups with managed
> >> Zookeeper
> >>      true: fully-distributed with unmanaged Zookeeper Quorum (see
> >> hbase-env.sh)
> >>    </description>
> >>  </property>
> >>  <property>
> >>    <name>hbase.regionserver.lease.period</name>
> >>    <value>120000</value>
> >>    <description>HRegion server lease period in milliseconds. Default is
> >>    60 seconds. Clients must report in within this period else they are
> >>    considered dead.</description>
> >>  </property>
> >>
> >>  <property>
> >>      <name>hbase.zookeeper.property.clientPort</name>
> >>      <value>2222</value>
> >>      <description>Property from ZooKeeper's config zoo.cfg.
> >>      The port at which the clients will connect.
> >>      </description>
> >>  </property>
> >>  <property>
> >>      <name>hbase.zookeeper.property.dataDir</name>
> >>      <value>/home/hadoop/zookeeper</value>
> >>  </property>
> >>  <property>
> >>      <name>hbase.zookeeper.property.syncLimit</name>
> >>      <value>5</value>
> >>  </property>
> >>  <property>
> >>      <name>hbase.zookeeper.property.tickTime</name>
> >>      <value>2000</value>
> >>  </property>
> >>  <property>
> >>      <name>hbase.zookeeper.property.initLimit</name>
> >>      <value>10</value>
> >>  </property>
> >>  <property>
> >>      <name>hbase.zookeeper.quorum</name>
> >>      <value>box1,box2,box3,box4</value>
> >>      <description>Comma separated list of servers in the ZooKeeper
> >> Quorum.
> >>      For example,
> >> "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
> >>      By default this is set to localhost for local and pseudo-distributed
> >>      modes of operation. For a fully-distributed setup, this should be
> >>      set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK
> >>      is set in hbase-env.sh this is the list of servers which we will
> >>      start/stop ZooKeeper on.
> >>      </description>
> >>  </property>
> >>  <property>
> >>    <name>hfile.block.cache.size</name>
> >>    <value>.5</value>
> >>    <description>text</description>
> >>  </property>
> >>
> >> </configuration>
> >>
> >>
> >> hbase env:====================================================
> >>
> >> export HBASE_CLASSPATH=${HADOOP_CONF_DIR}
> >>
> >> export HBASE_HEAPSIZE=3000
> >>
> >> export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m
> >> -XX:+UseConcMarkSweepGC
> >> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> >> -XX:+CMSIncrementalMode
> >> -Xloggc:/home/hadoop/hbase-0.20.0/logs/gc-hbase.log"
> >>
> >> export HBASE_MANAGES_ZK=true
> >>
> >> Hadoop core
> >> site===========================================================
> >>
> >> <?xml version="1.0"?>
> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> >>
> >> <!-- Put site-specific property overrides in this file. -->
> >>
> >> <configuration>
> >> <property>
> >>   <name>fs.default.name</name>
> >>   <value>hdfs://box1:9000</value>
> >>   <description>The name of the default file system.  A URI whose
> >>   scheme and authority determine the FileSystem implementation.  The
> >>   uri's scheme determines the config property (fs.SCHEME.impl) naming
> >>   the FileSystem implementation class.  The uri's authority is used to
> >>   determine the host, port, etc. for a filesystem.</description>
> >> </property>
> >> <property>
> >>  <name>hadoop.tmp.dir</name>
> >>  <value>/data/hadoop-0.20.0-${user.name}</value>
> >>  <description>A base for other temporary directories.</description>
> >> </property>
> >> </configuration>
> >>
> >> ==============
> >>
> >> replication is set to 6.
> >>
> >> hadoop env=================
> >>
> >> export HADOOP_HEAPSIZE=3000
> >> export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
> >> $HADOOP_NAMENODE_OPTS"
> >> export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote
> >> $HADOOP_SECONDARYNAMENODE_OPTS"
> >> export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote
> >> $HADOOP_DATANODE_OPTS"
> >> export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote
> >> $HADOOP_BALANCER_OPTS"
> >> export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote
> >> $HADOOP_JOBTRACKER_OPTS"
> >>  ==================
> >>
> >>
> >> Very basic setup.  Then I start the cluster and do simple random Get
> >> operations
> >> on a tall table (~60 M rows):
> >>
> >> {NAME => 'tallTable', FAMILIES => [{NAME => 'family1', COMPRESSION =>
> >> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
> >> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> >>
> >> Are these fairly normal speeds?  I'm unsure if this is a result of
> >> having a small cluster.  Please advise...
> >>
> >> stack-3 wrote:
> >> >
> >> > Yeah, seems slow.  In old hbase, it could do 5-10k writes a second
> >> > going by the performance eval page up on the wiki.  SequentialWrite
> >> > was about the same as RandomWrite.  Check out the stats on hw up on
> >> > that page and the description of how the test was set up.  Can you
> >> > figure out where it's slow?
> >> >
> >> > St.Ack
> >> >
> >> > On Wed, Aug 12, 2009 at 10:10 AM, llpind <so...@hotmail.com>
> >> wrote:
> >> >
> >> >>
> >> >> Thanks Stack.
> >> >>
> >> >> I will try mapred with more clients.  I tried it without mapred
> >> >> using 3 clients doing random write operations; here is the output:
> >> >>
> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-0 Start
> >> >> randomWrite at offset 0 for 1048576 rows
> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-1 Start
> >> >> randomWrite at offset 1048576 for 1048576 rows
> >> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-2 Start
> >> >> randomWrite at offset 2097152 for 1048576 rows
> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-1
> >> >> 1048576/1153427/2097152
> >> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-2
> >> >> 2097152/2201997/3145728
> >> >> 09/08/12 09:24:25 INFO hbase.PerformanceEvaluation: client-0
> >> >> 0/104857/1048576
> >> >> 09/08/12 09:27:42 INFO hbase.PerformanceEvaluation: client-0
> >> >> 0/209714/1048576
> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-1
> >> >> 1048576/1258284/2097152
> >> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-2
> >> >> 2097152/2306854/3145728
> >> >> 09/08/12 09:32:32 INFO hbase.PerformanceEvaluation: client-1
> >> >> 1048576/1363141/2097152
> >> >> 09/08/12 09:32:33 INFO hbase.PerformanceEvaluation: client-0
> >> >> 0/314571/1048576
> >> >> 09/08/12 09:32:41 INFO hbase.PerformanceEvaluation: client-2
> >> >> 2097152/2411711/3145728
> >> >> 09/08/12 09:35:31 INFO hbase.PerformanceEvaluation: client-0
> >> >> 0/419428/1048576
> >> >> 09/08/12 09:35:34 INFO hbase.PerformanceEvaluation: client-1
> >> >> 1048576/1467998/2097152
> >> >> 09/08/12 09:35:53 INFO hbase.PerformanceEvaluation: client-2
> >> >> 2097152/2516568/3145728
> >> >> 09/08/12 09:39:02 INFO hbase.PerformanceEvaluation: client-0
> >> >> 0/524285/1048576
> >> >> 09/08/12 09:39:03 INFO hbase.PerformanceEvaluation: client-2
> >> >> 2097152/2621425/3145728
> >> >> 09/08/12 09:40:07 INFO hbase.PerformanceEvaluation: client-1
> >> >> 1048576/1572855/2097152
> >> >> 09/08/12 09:42:53 INFO hbase.PerformanceEvaluation: client-0
> >> >> 0/629142/1048576
> >> >> 09/08/12 09:44:25 INFO hbase.PerformanceEvaluation: client-2
> >> >> 2097152/2726282/3145728
> >> >> 09/08/12 09:44:44 INFO hbase.PerformanceEvaluation: client-1
> >> >> 1048576/1677712/2097152
> >> >> 09/08/12 09:46:43 INFO hbase.PerformanceEvaluation: client-0
> >> >> 0/733999/1048576
> >> >> 09/08/12 09:48:11 INFO hbase.PerformanceEvaluation: client-2
> >> >> 2097152/2831139/3145728
> >> >> 09/08/12 09:48:29 INFO hbase.PerformanceEvaluation: client-1
> >> >> 1048576/1782569/2097152
> >> >> 09/08/12 09:50:12 INFO hbase.PerformanceEvaluation: client-0
> >> >> 0/838856/1048576
> >> >> 09/08/12 09:52:47 INFO hbase.PerformanceEvaluation: client-2
> >> >> 2097152/2935996/3145728
> >> >> 09/08/12 09:53:51 INFO hbase.PerformanceEvaluation: client-1
> >> >> 1048576/1887426/2097152
> >> >> 09/08/12 09:56:32 INFO hbase.PerformanceEvaluation: client-0
> >> >> 0/943713/1048576
> >> >> 09/08/12 09:58:32 INFO hbase.PerformanceEvaluation: client-2
> >> >> 2097152/3040853/3145728
> >> >> 09/08/12 09:59:14 INFO hbase.PerformanceEvaluation: client-1
> >> >> 1048576/1992283/2097152
> >> >> 09/08/12 10:02:28 INFO hbase.PerformanceEvaluation: client-0
> >> >> 0/1048570/1048576
> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: client-0 Finished
> >> >> randomWrite in 2376615ms at offset 0 for 1048576 rows
> >> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: Finished 0 in
> >> >> 2376615ms
> >> >> writing 1048576 rows
> >> >> 09/08/12 10:06:35 INFO hbase.PerformanceEvaluation: client-2
> >> >> 2097152/3145710/3145728
> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: client-2 Finished
> >> >> randomWrite in 2623395ms at offset 2097152 for 1048576 rows
> >> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: Finished 2 in
> >> >> 2623395ms
> >> >> writing 1048576 rows
> >> >> 09/08/12 10:06:42 INFO hbase.PerformanceEvaluation: client-1
> >> >> 1048576/2097140/2097152
> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: client-1 Finished
> >> >> randomWrite in 2630199ms at offset 1048576 for 1048576 rows
> >> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: Finished 1 in
> >> >> 2630199ms
> >> >> writing 1048576 rows
> >> >>
> >> >>
> >> >>
> >> >> Seems kind of slow for ~3M records.  I have a 4-node cluster up at
> >> >> the moment.  HMaster & NameNode are running on the same box.
> >> >> --
> >> >> View this message in context:
> >> >>
> >>
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24940922.html
> >> >> Sent from the HBase User mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24943406.html
> >> Sent from the HBase User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24955595.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>

Re: HBase in a real world application

Posted by llpind <so...@hotmail.com>.
Okay, I changed replication to 2 and removed "-XX:NewSize=6m
-XX:MaxNewSize=6m"
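
Both changes are plain config edits: the replication factor is the
dfs.replication property in Hadoop's hdfs-site.xml, and the NewSize flags come
out of HBASE_OPTS in hbase-env.sh.  A minimal sketch of the hdfs-site.xml side,
assuming the stock layout shown earlier in the thread:

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>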

Here are the results for randomWrite with 3 clients:



RandomWrite =================================================

hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar  --nomapred
randomWrite 3
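
(--nomapred runs the clients as threads inside a single JVM; dropping the flag
runs the same test as a MapReduce job, which is the way to push well past a
handful of clients.  A sketch, with the client count purely illustrative:

hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar randomWrite 10

That route needs the MapReduce daemons up, which the jps listings quoted below
show aren't running on this cluster yet.)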


09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-0 Start
randomWrite at offset 0 for 1048576 rows
09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-1 Start
randomWrite at offset 1048576 for 1048576 rows
09/08/13 09:51:15 INFO hbase.PerformanceEvaluation: client-2 Start
randomWrite at offset 2097152 for 1048576 rows
09/08/13 09:51:47 INFO hbase.PerformanceEvaluation: client-0
0/104857/1048576
09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-1
1048576/1153427/2097152
09/08/13 09:51:48 INFO hbase.PerformanceEvaluation: client-2
2097152/2201997/3145728
09/08/13 09:52:22 INFO hbase.PerformanceEvaluation: client-1
1048576/1258284/2097152
09/08/13 09:52:23 INFO hbase.PerformanceEvaluation: client-0
0/209714/1048576
09/08/13 09:52:24 INFO hbase.PerformanceEvaluation: client-2
2097152/2306854/3145728
09/08/13 09:52:47 INFO hbase.PerformanceEvaluation: client-1
1048576/1363141/2097152
09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-0
0/314571/1048576
09/08/13 09:52:58 INFO hbase.PerformanceEvaluation: client-2
2097152/2411711/3145728
09/08/13 09:53:24 INFO hbase.PerformanceEvaluation: client-1
1048576/1467998/2097152
09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-0
0/419428/1048576
09/08/13 09:53:27 INFO hbase.PerformanceEvaluation: client-2
2097152/2516568/3145728
09/08/13 09:53:48 INFO hbase.PerformanceEvaluation: client-1
1048576/1572855/2097152
09/08/13 09:54:08 INFO hbase.PerformanceEvaluation: client-2
2097152/2621425/3145728
09/08/13 09:54:10 INFO hbase.PerformanceEvaluation: client-0
0/524285/1048576
09/08/13 09:54:40 INFO hbase.PerformanceEvaluation: client-1
1048576/1677712/2097152
09/08/13 09:54:49 INFO hbase.PerformanceEvaluation: client-2
2097152/2726282/3145728
09/08/13 09:54:52 INFO hbase.PerformanceEvaluation: client-0
0/629142/1048576
09/08/13 09:55:57 INFO hbase.PerformanceEvaluation: client-1
1048576/1782569/2097152
09/08/13 09:56:21 INFO hbase.PerformanceEvaluation: client-2
2097152/2831139/3145728
09/08/13 09:56:41 INFO hbase.PerformanceEvaluation: client-0
0/733999/1048576
09/08/13 09:57:23 INFO hbase.PerformanceEvaluation: client-1
1048576/1887426/2097152
09/08/13 09:58:40 INFO hbase.PerformanceEvaluation: client-2
2097152/2935996/3145728
09/08/13 09:58:54 INFO hbase.PerformanceEvaluation: client-0
0/838856/1048576
09/08/13 10:00:29 INFO hbase.PerformanceEvaluation: client-1
1048576/1992283/2097152
09/08/13 10:01:01 INFO hbase.PerformanceEvaluation: client-2
2097152/3040853/3145728
09/08/13 10:01:24 INFO hbase.PerformanceEvaluation: client-0
0/943713/1048576
09/08/13 10:02:36 INFO hbase.PerformanceEvaluation: client-1
1048576/2097140/2097152
09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: client-1 Finished
randomWrite in 680674ms at offset 1048576 for 1048576 rows
09/08/13 10:02:37 INFO hbase.PerformanceEvaluation: Finished 1 in 680674ms
writing 1048576 rows
09/08/13 10:03:19 INFO hbase.PerformanceEvaluation: client-2
2097152/3145710/3145728
09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: client-2 Finished
randomWrite in 723771ms at offset 2097152 for 1048576 rows
09/08/13 10:03:20 INFO hbase.PerformanceEvaluation: Finished 2 in 723771ms
writing 1048576 rows
09/08/13 10:03:41 INFO hbase.PerformanceEvaluation: client-0
0/1048570/1048576
09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: client-0 Finished
randomWrite in 746054ms at offset 0 for 1048576 rows
09/08/13 10:03:42 INFO hbase.PerformanceEvaluation: Finished 0 in 746054ms
writing 1048576 rows



============================================================

Still pretty slow.  Any other ideas?  I'm running the client from the master
box, but it's not running any regionservers or datanodes.
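
For scale, those finish times work out to roughly 1.5k writes/sec per client
and ~4.2k/sec aggregate; a quick check with bc on the numbers above:

echo '1048576 * 1000 / 680674' | bc   # client-1: ~1540 rows/sec
echo '3145728 * 1000 / 746054' | bc   # all 3 clients over the slowest wall clock: ~4216 rows/sec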

stack-3 wrote:
> 
> Your config. looks fine.
> 
> Only thing that gives me pause is:
> 
> "-XX:NewSize=6m -XX:MaxNewSize=6m"
> 
> Any reason for the above?
> 
> If you study your GC logs, lots of pauses?
> 
> Oh, and this: replication is set to 6.  Why 6?  Each write must commit to
> 6 datanodes before it completes.  In the tests posted on the wiki, we
> replicate to 3 nodes.
> 
> At the end of this message you say you are doing gets?  The numbers you
> posted were for writes?
> 
> St.Ack
> 
> 
> On Wed, Aug 12, 2009 at 1:15 PM, llpind <so...@hotmail.com> wrote:
> 
>>
>> Not sure why my performance is so slow.  Here is my configuration:
>>
>> box1:
>> 10395 SecondaryNameNode
>> 11628 Jps
>> 10131 NameNode
>> 10638 HQuorumPeer
>> 10705 HMaster
>>
>> box 2-5:
>> 6741 HQuorumPeer
>> 6841 HRegionServer
>> 7881 Jps
>> 6610 DataNode
>>
>>
>> hbase site: =======================
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> <!--
>> /**
>>  * Copyright 2007 The Apache Software Foundation
>>  *
>>  * Licensed to the Apache Software Foundation (ASF) under one
>>  * or more contributor license agreements.  See the NOTICE file
>>  * distributed with this work for additional information
>>  * regarding copyright ownership.  The ASF licenses this file
>>  * to you under the Apache License, Version 2.0 (the
>>  * "License"); you may not use this file except in compliance
>>  * with the License.  You may obtain a copy of the License at
>>  *
>>  *     http://www.apache.org/licenses/LICENSE-2.0
>>  *
>>  * Unless required by applicable law or agreed to in writing, software
>>  * distributed under the License is distributed on an "AS IS" BASIS,
>>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
>> implied.
>>  * See the License for the specific language governing permissions and
>>  * limitations under the License.
>>  */
>> -->
>> <configuration>
>>  <property>
>>    <name>hbase.rootdir</name>
>>    <value>hdfs://box1:9000/hbase</value>
>>    <description>The directory shared by region servers.
>>    </description>
>>  </property>
>>  <property>
>>    <name>hbase.master.port</name>
>>    <value>60000</value>
>>    <description>The port that the HBase master runs at.
>>    </description>
>>  </property>
>>  <property>
>>    <name>hbase.cluster.distributed</name>
>>    <value>true</value>
>>    <description>The mode the cluster will be in. Possible values are
>>      false: standalone and pseudo-distributed setups with managed
>> Zookeeper
>>      true: fully-distributed with unmanaged Zookeeper Quorum (see
>> hbase-env.sh)
>>    </description>
>>  </property>
>>  <property>
>>    <name>hbase.regionserver.lease.period</name>
>>    <value>120000</value>
>>    <description>HRegion server lease period in milliseconds. Default is
>>    60 seconds. Clients must report in within this period else they are
>>    considered dead.</description>
>>  </property>
>>
>>  <property>
>>      <name>hbase.zookeeper.property.clientPort</name>
>>      <value>2222</value>
>>      <description>Property from ZooKeeper's config zoo.cfg.
>>      The port at which the clients will connect.
>>      </description>
>>  </property>
>>  <property>
>>      <name>hbase.zookeeper.property.dataDir</name>
>>      <value>/home/hadoop/zookeeper</value>
>>  </property>
>>  <property>
>>      <name>hbase.zookeeper.property.syncLimit</name>
>>      <value>5</value>
>>  </property>
>>  <property>
>>      <name>hbase.zookeeper.property.tickTime</name>
>>      <value>2000</value>
>>  </property>
>>  <property>
>>      <name>hbase.zookeeper.property.initLimit</name>
>>      <value>10</value>
>>  </property>
>>  <property>
>>      <name>hbase.zookeeper.quorum</name>
>>      <value>box1,box2,box3,box4</value>
>>      <description>Comma separated list of servers in the ZooKeeper
>> Quorum.
>>      For example,
>> "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>>      By default this is set to localhost for local and pseudo-distributed
>>      modes of operation. For a fully-distributed setup, this should be set
>>      to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set
>>      in hbase-env.sh this is the list of servers which we will start/stop
>>      ZooKeeper on.
>>      </description>
>>  </property>
>>  <property>
>>    <name>hfile.block.cache.size</name>
>>    <value>.5</value>
>>    <description>text</description>
>>  </property>
>>
>> </configuration>
>>
>>
>> hbase env:====================================================
>>
>> export HBASE_CLASSPATH=${HADOOP_CONF_DIR}
>>
>> export HBASE_HEAPSIZE=3000
>>
>> export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m
>> -XX:+UseConcMarkSweepGC
>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>> -XX:+CMSIncrementalMode
>> -Xloggc:/home/hadoop/hbase-0.20.0/logs/gc-hbase.log"
>>
>> export HBASE_MANAGES_ZK=true
>>
>> Hadoop core
>> site===========================================================
>>
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <!-- Put site-specific property overrides in this file. -->
>>
>> <configuration>
>> <property>
>>   <name>fs.default.name</name>
>>   <value>hdfs://box1:9000</value>
>>   <description>The name of the default file system.  A URI whose
>>   scheme and authority determine the FileSystem implementation.  The
>>   uri's scheme determines the config property (fs.SCHEME.impl) naming
>>   the FileSystem implementation class.  The uri's authority is used to
>>   determine the host, port, etc. for a filesystem.</description>
>> </property>
>> <property>
>>  <name>hadoop.tmp.dir</name>
>>  <value>/data/hadoop-0.20.0-${user.name}</value>
>>  <description>A base for other temporary directories.</description>
>> </property>
>> </configuration>
>>
>> ==============
>>
>> replication is set to 6.
>>
>> hadoop env=================
>>
>> export HADOOP_HEAPSIZE=3000
>> export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
>> $HADOOP_NAMENODE_OPTS"
>> export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote
>> $HADOOP_SECONDARYNAMENODE_OPTS"
>> export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote
>> $HADOOP_DATANODE_OPTS"
>> export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote
>> $HADOOP_BALANCER_OPTS"
>> export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote
>> $HADOOP_JOBTRACKER_OPTS"
>>  ==================
>>
>>
>> Very basic setup.  Then I start the cluster and do simple random Get
>> operations
>> on a tall table (~60 M rows):
>>
>> {NAME => 'tallTable', FAMILIES => [{NAME => 'family1', COMPRESSION =>
>> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
>> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>>
>> Are these fairly normal speeds?  I'm unsure if this is a result of having a
>> small cluster.  Please advise...
>>
>> stack-3 wrote:
>> >
>> > Yeah, seems slow.  In old hbase, it could do 5-10k writes a second
>> > going by the performance eval page up on the wiki.  SequentialWrite
>> > was about the same as RandomWrite.  Check out the stats on hw up on
>> > that page and the description of how the test was set up.  Can you
>> > figure out where it's slow?
>> >
>> > St.Ack
>> >
>> > On Wed, Aug 12, 2009 at 10:10 AM, llpind <so...@hotmail.com>
>> wrote:
>> >
>> >>
>> >> Thanks Stack.
>> >>
>> >> I will try mapred with more clients.  I tried it without mapred using
>> >> 3 clients doing random write operations; here is the output:
>> >>
>> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-0 Start
>> >> randomWrite at offset 0 for 1048576 rows
>> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-1 Start
>> >> randomWrite at offset 1048576 for 1048576 rows
>> >> 09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-2 Start
>> >> randomWrite at offset 2097152 for 1048576 rows
>> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-1
>> >> 1048576/1153427/2097152
>> >> 09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-2
>> >> 2097152/2201997/3145728
>> >> 09/08/12 09:24:25 INFO hbase.PerformanceEvaluation: client-0
>> >> 0/104857/1048576
>> >> 09/08/12 09:27:42 INFO hbase.PerformanceEvaluation: client-0
>> >> 0/209714/1048576
>> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-1
>> >> 1048576/1258284/2097152
>> >> 09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-2
>> >> 2097152/2306854/3145728
>> >> 09/08/12 09:32:32 INFO hbase.PerformanceEvaluation: client-1
>> >> 1048576/1363141/2097152
>> >> 09/08/12 09:32:33 INFO hbase.PerformanceEvaluation: client-0
>> >> 0/314571/1048576
>> >> 09/08/12 09:32:41 INFO hbase.PerformanceEvaluation: client-2
>> >> 2097152/2411711/3145728
>> >> 09/08/12 09:35:31 INFO hbase.PerformanceEvaluation: client-0
>> >> 0/419428/1048576
>> >> 09/08/12 09:35:34 INFO hbase.PerformanceEvaluation: client-1
>> >> 1048576/1467998/2097152
>> >> 09/08/12 09:35:53 INFO hbase.PerformanceEvaluation: client-2
>> >> 2097152/2516568/3145728
>> >> 09/08/12 09:39:02 INFO hbase.PerformanceEvaluation: client-0
>> >> 0/524285/1048576
>> >> 09/08/12 09:39:03 INFO hbase.PerformanceEvaluation: client-2
>> >> 2097152/2621425/3145728
>> >> 09/08/12 09:40:07 INFO hbase.PerformanceEvaluation: client-1
>> >> 1048576/1572855/2097152
>> >> 09/08/12 09:42:53 INFO hbase.PerformanceEvaluation: client-0
>> >> 0/629142/1048576
>> >> 09/08/12 09:44:25 INFO hbase.PerformanceEvaluation: client-2
>> >> 2097152/2726282/3145728
>> >> 09/08/12 09:44:44 INFO hbase.PerformanceEvaluation: client-1
>> >> 1048576/1677712/2097152
>> >> 09/08/12 09:46:43 INFO hbase.PerformanceEvaluation: client-0
>> >> 0/733999/1048576
>> >> 09/08/12 09:48:11 INFO hbase.PerformanceEvaluation: client-2
>> >> 2097152/2831139/3145728
>> >> 09/08/12 09:48:29 INFO hbase.PerformanceEvaluation: client-1
>> >> 1048576/1782569/2097152
>> >> 09/08/12 09:50:12 INFO hbase.PerformanceEvaluation: client-0
>> >> 0/838856/1048576
>> >> 09/08/12 09:52:47 INFO hbase.PerformanceEvaluation: client-2
>> >> 2097152/2935996/3145728
>> >> 09/08/12 09:53:51 INFO hbase.PerformanceEvaluation: client-1
>> >> 1048576/1887426/2097152
>> >> 09/08/12 09:56:32 INFO hbase.PerformanceEvaluation: client-0
>> >> 0/943713/1048576
>> >> 09/08/12 09:58:32 INFO hbase.PerformanceEvaluation: client-2
>> >> 2097152/3040853/3145728
>> >> 09/08/12 09:59:14 INFO hbase.PerformanceEvaluation: client-1
>> >> 1048576/1992283/2097152
>> >> 09/08/12 10:02:28 INFO hbase.PerformanceEvaluation: client-0
>> >> 0/1048570/1048576
>> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: client-0 Finished
>> >> randomWrite in 2376615ms at offset 0 for 1048576 rows
>> >> 09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: Finished 0 in
>> >> 2376615ms
>> >> writing 1048576 rows
>> >> 09/08/12 10:06:35 INFO hbase.PerformanceEvaluation: client-2
>> >> 2097152/3145710/3145728
>> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: client-2 Finished
>> >> randomWrite in 2623395ms at offset 2097152 for 1048576 rows
>> >> 09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: Finished 2 in
>> >> 2623395ms
>> >> writing 1048576 rows
>> >> 09/08/12 10:06:42 INFO hbase.PerformanceEvaluation: client-1
>> >> 1048576/2097140/2097152
>> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: client-1 Finished
>> >> randomWrite in 2630199ms at offset 1048576 for 1048576 rows
>> >> 09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: Finished 1 in
>> >> 2630199ms
>> >> writing 1048576 rows
>> >>
>> >>
>> >>
>> >> Seems kind of slow for ~3M records.  I have a 4 node cluster up at the
>> >> moment.  HMaster & Namenode running on same box.
>> >> --
>> >> View this message in context:
>> >>
>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24940922.html
>> >> Sent from the HBase User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24943406.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24955595.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: HBase in a real world application

Posted by stack <st...@duboce.net>.
Your config looks fine.

Only thing that gives me pause is:

"-XX:NewSize=6m -XX:MaxNewSize=6m"

Any reason for the above?

If you study your GC logs, do you see lots of pauses?
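
For comparison, a minimal sketch of the same HBASE_OPTS line with the fixed
6m new generation dropped (all other flags exactly as in your posted
hbase-env.sh):

export HBASE_OPTS="-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails \
 -XX:+PrintGCTimeStamps -XX:+CMSIncrementalMode \
 -Xloggc:/home/hadoop/hbase-0.20.0/logs/gc-hbase.log"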

Oh, and this: replication is set to 6.  Why 6?  Each write must commit to 6
datanodes before completing.  In the tests posted on the wiki, we replicate
to 3 nodes.

At the end of this message you say you are doing gets, but the numbers you
posted were for writes?

St.Ack


On Wed, Aug 12, 2009 at 1:15 PM, llpind <so...@hotmail.com> wrote:

> [...]

Re: HBase in a real world application

Posted by Andrew Purtell <ap...@apache.org>.
Your replication setting is 6? Why do you raise it above the
default (3)? I run with replication=2 on small clusters to
improve write performance. 
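
For reference, a minimal sketch of the matching hdfs-site.xml override
(dfs.replication is the stock Hadoop property name; the value 2 mirrors the
small-cluster setting above):

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication factor.</description>
</property>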

    - Andy




________________________________
From: llpind <so...@hotmail.com>
To: hbase-user@hadoop.apache.org
Sent: Wednesday, August 12, 2009 1:15:31 PM
Subject: Re: HBase in a real world application


[...]

Re: HBase in a real world application

Posted by llpind <so...@hotmail.com>.
Not sure why my performance is so slow.  Here is my configuration:

box1:
10395 SecondaryNameNode
11628 Jps
10131 NameNode
10638 HQuorumPeer
10705 HMaster

box 2-5:
6741 HQuorumPeer
6841 HRegionServer
7881 Jps
6610 DataNode


hbase site: =======================
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 * Copyright 2007 The Apache Software Foundation
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://box1:9000/hbase</value>
    <description>The directory shared by region servers.
    </description>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
    <description>The port that the HBase master runs at.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see
hbase-env.sh)
    </description>
  </property>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>120000</value>
    <description>HRegion server lease period in milliseconds. Default is
    60 seconds. Clients must report in within this period else they are
    considered dead.</description>
  </property>

  <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2222</value>
      <description>Property from ZooKeeper's config zoo.cfg.
      The port at which the clients will connect.
      </description>
  </property>
  <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/home/hadoop/zookeeper</value>
  </property>
  <property>
      <name>hbase.zookeeper.property.syncLimit</name>
      <value>5</value>
  </property>
  <property>
      <name>hbase.zookeeper.property.tickTime</name>
      <value>2000</value>
  </property>
  <property>
      <name>hbase.zookeeper.property.initLimit</name>
      <value>10</value>
  </property>
  <property>
      <name>hbase.zookeeper.quorum</name>
      <value>box1,box2,box3,box4</value>
      <description>Comma separated list of servers in the ZooKeeper Quorum.
      For example,
"host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      By default this is set to localhost for local and pseudo-distributed
modes
      of operation. For a fully-distributed setup, this should be set to a
full
      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in
hbase-env.sh
      this is the list of servers which we will start/stop ZooKeeper on.
      </description>
  </property>
  <property>
    <name>hfile.block.cache.size</name>
    <value>.5</value>
    <description>Percentage of maximum heap (-Xmx) to allocate to the
    HFile block cache; 0.5 of the 3000 MB heap here is roughly 1.5
    GB.</description>
  </property>

</configuration>


hbase env:====================================================

export HBASE_CLASSPATH=${HADOOP_CONF_DIR}

export HBASE_HEAPSIZE=3000

export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m -XX:+UseConcMarkSweepGC
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+CMSIncrementalMode -Xloggc:/home/hadoop/hbase-0.20.0/logs/gc-hbase.log"

export HBASE_MANAGES_ZK=true

Hadoop core site===========================================================

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
   <name>fs.default.name</name>
   <value>hdfs://box1:9000</value>
   <description>The name of the default file system.  A URI whose
   scheme and authority determine the FileSystem implementation.  The
   uri's scheme determines the config property (fs.SCHEME.impl) naming
   the FileSystem implementation class.  The uri's authority is used to
   determine the host, port, etc. for a filesystem.</description>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop-0.20.0-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
</configuration>

==============

replication is set to 6.

hadoop env=================

export HADOOP_HEAPSIZE=3000
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
$HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote
$HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote
$HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote
$HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote
$HADOOP_JOBTRACKER_OPTS"
 ==================


Very basic setup.  Then I start the cluster and do simple random Get
operations on a tall table (~60M rows):

{NAME => 'tallTable', FAMILIES => [{NAME => 'family1', COMPRESSION =>
'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

Are these fairly normal speeds?  I'm unsure if this is a result of having a
small cluster.  Please advise...

stack-3 wrote:
> [...]

-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24943406.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: HBase in a real world application

Posted by llpind <so...@hotmail.com>.
Yeah, that was my thinking.  Not sure what configuration they had.  Is it
this page?

http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation#0_19_0

I tried a simple test program doing reads, something like this:
			// row keys come from a JDBC ResultSet (rset); table is an HTable
			int n = 1000000;
			long start = System.currentTimeMillis();
			for (int i = 0; i < n && rset.next(); ++i) {
				byte[] row = Bytes.toBytes(rset.getString(1)); // key in column 1
				table.get(new Get(row));                       // one Get RPC per row
			}
			long elapsed = System.currentTimeMillis() - start;

For 10,000 I get 4750ms (~0.48ms per Get, roughly 2,100 gets/second).
For 1,000,000 I get 346242ms (~5 minutes; ~0.35ms per Get, roughly 2,900
gets/second).

Must be something with my cluster setup.
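
As a cross-check, a minimal scanner-based read of the same table (a sketch
against the 0.20 client API — Scan, ResultScanner, Result from
org.apache.hadoop.hbase.client; scanner caching is left at its default):

			Scan scan = new Scan();
			ResultScanner scanner = table.getScanner(scan);
			long t0 = System.currentTimeMillis();
			int count = 0;
			for (Result r : scanner) {         // iterate rows in key order
				if (++count == 1000000) break;   // stop after 1M rows
			}
			scanner.close();
			System.out.println(count + " rows in "
					+ (System.currentTimeMillis() - t0) + "ms");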




stack-3 wrote:
> [...]

-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24942799.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: HBase in a real world application

Posted by stack <st...@duboce.net>.
Yeah, that seems slow.  Old HBase could do 5-10k writes a second, going by
the performance evaluation page up on the wiki.  SequentialWrite was about
the same as RandomWrite.  Check out the hardware stats on that page and the
description of how the test was set up.  Can you figure out where it's slow?

St.Ack

On Wed, Aug 12, 2009 at 10:10 AM, llpind <so...@hotmail.com> wrote:

> [...]

Re: HBase in a real world application

Posted by llpind <so...@hotmail.com>.
Thanks Stack.

I will try mapred with more clients.  I tried it without mapred, using 3
clients doing random write operations; here is the output:

09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-0 Start
randomWrite at offset 0 for 1048576 rows
09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-1 Start
randomWrite at offset 1048576 for 1048576 rows
09/08/12 09:22:52 INFO hbase.PerformanceEvaluation: client-2 Start
randomWrite at offset 2097152 for 1048576 rows
09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-1
1048576/1153427/2097152
09/08/12 09:24:23 INFO hbase.PerformanceEvaluation: client-2
2097152/2201997/3145728
09/08/12 09:24:25 INFO hbase.PerformanceEvaluation: client-0
0/104857/1048576
09/08/12 09:27:42 INFO hbase.PerformanceEvaluation: client-0
0/209714/1048576
09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-1
1048576/1258284/2097152
09/08/12 09:27:46 INFO hbase.PerformanceEvaluation: client-2
2097152/2306854/3145728
09/08/12 09:32:32 INFO hbase.PerformanceEvaluation: client-1
1048576/1363141/2097152
09/08/12 09:32:33 INFO hbase.PerformanceEvaluation: client-0
0/314571/1048576
09/08/12 09:32:41 INFO hbase.PerformanceEvaluation: client-2
2097152/2411711/3145728
09/08/12 09:35:31 INFO hbase.PerformanceEvaluation: client-0
0/419428/1048576
09/08/12 09:35:34 INFO hbase.PerformanceEvaluation: client-1
1048576/1467998/2097152
09/08/12 09:35:53 INFO hbase.PerformanceEvaluation: client-2
2097152/2516568/3145728
09/08/12 09:39:02 INFO hbase.PerformanceEvaluation: client-0
0/524285/1048576
09/08/12 09:39:03 INFO hbase.PerformanceEvaluation: client-2
2097152/2621425/3145728
09/08/12 09:40:07 INFO hbase.PerformanceEvaluation: client-1
1048576/1572855/2097152
09/08/12 09:42:53 INFO hbase.PerformanceEvaluation: client-0
0/629142/1048576
09/08/12 09:44:25 INFO hbase.PerformanceEvaluation: client-2
2097152/2726282/3145728
09/08/12 09:44:44 INFO hbase.PerformanceEvaluation: client-1
1048576/1677712/2097152
09/08/12 09:46:43 INFO hbase.PerformanceEvaluation: client-0
0/733999/1048576
09/08/12 09:48:11 INFO hbase.PerformanceEvaluation: client-2
2097152/2831139/3145728
09/08/12 09:48:29 INFO hbase.PerformanceEvaluation: client-1
1048576/1782569/2097152
09/08/12 09:50:12 INFO hbase.PerformanceEvaluation: client-0
0/838856/1048576
09/08/12 09:52:47 INFO hbase.PerformanceEvaluation: client-2
2097152/2935996/3145728
09/08/12 09:53:51 INFO hbase.PerformanceEvaluation: client-1
1048576/1887426/2097152
09/08/12 09:56:32 INFO hbase.PerformanceEvaluation: client-0
0/943713/1048576
09/08/12 09:58:32 INFO hbase.PerformanceEvaluation: client-2
2097152/3040853/3145728
09/08/12 09:59:14 INFO hbase.PerformanceEvaluation: client-1
1048576/1992283/2097152
09/08/12 10:02:28 INFO hbase.PerformanceEvaluation: client-0
0/1048570/1048576
09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: client-0 Finished
randomWrite in 2376615ms at offset 0 for 1048576 rows
09/08/12 10:02:30 INFO hbase.PerformanceEvaluation: Finished 0 in 2376615ms
writing 1048576 rows
09/08/12 10:06:35 INFO hbase.PerformanceEvaluation: client-2
2097152/3145710/3145728
09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: client-2 Finished
randomWrite in 2623395ms at offset 2097152 for 1048576 rows
09/08/12 10:06:38 INFO hbase.PerformanceEvaluation: Finished 2 in 2623395ms
writing 1048576 rows
09/08/12 10:06:42 INFO hbase.PerformanceEvaluation: client-1
1048576/2097140/2097152
09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: client-1 Finished
randomWrite in 2630199ms at offset 1048576 for 1048576 rows
09/08/12 10:06:43 INFO hbase.PerformanceEvaluation: Finished 1 in 2630199ms
writing 1048576 rows



Seems kind of slow for ~3M records (3,145,728 rows in ~2,630 seconds
wall-clock works out to roughly 1,200 writes/second across the three
clients).  I have a 4-node cluster up at the moment.  HMaster & Namenode
running on the same box.
-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24940922.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: HBase in a real world application

Posted by stack <st...@duboce.net>.
Thanks for trying.  Looks like that region is now gone (a split is my guess).
Check the master log for mentions of this region to see its history.  Can
you correlate the client failure with an event on this region in the master
log?  It looks like the client was pig-headedly fixated on the parent of a
split.  Could you check that your table is healthy?  Run the rowcounter
program to make sure there are no holes in the table.
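
A sketch of one way to invoke it (my assumption — whether rowcounter is
registered as a driver program in the main hbase jar, and its exact
arguments, should be confirmed against the usage the jar prints when run
with no arguments):

hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0.jar rowcounter TestTable info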

St.Ack

On Fri, Aug 14, 2009 at 12:41 PM, llpind <so...@hotmail.com> wrote:

>
> hbase(main):003:0> get '.META.', 'TestTable,0001749889,1250092414985',
> {COLUMNS =>'info'}
> 09/08/14 12:28:10 DEBUG client.HConnectionManager$TableServers: Cache hit
> for row <> in tableName .META.: location server 192.168.0.196:60020,
> location region name .META.,,1
> NativeException: java.lang.NullPointerException: null
>        from org/apache/hadoop/hbase/client/HTable.java:789:in `get'
>        from org/apache/hadoop/hbase/client/HTable.java:769:in `get'
>        from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>        from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>        from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>        from java/lang/reflect/Method.java:597:in `invoke'
>        from org/jruby/javasupport/JavaMethod.java:298:in `invokeWithExceptionHandling'
>        from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>        from org/jruby/java/invokers/InstanceMethodInvoker.java:30:in `call'
>        from org/jruby/runtime/callsite/CachingCallSite.java:30:in `call'
>        from org/jruby/ast/CallManyArgsNode.java:59:in `interpret'
>        from org/jruby/ast/LocalAsgnNode.java:123:in `interpret'
>        from org/jruby/ast/NewlineNode.java:104:in `interpret'
>        from org/jruby/ast/IfNode.java:112:in `interpret'
>        from org/jruby/ast/NewlineNode.java:104:in `interpret'
>        from org/jruby/ast/IfNode.java:114:in `interpret'
> ... 115 levels...
>        from home/hadoop/hbase_minus_0_dot_20_dot_0/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
>        from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in `call'
>        from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in `call'
>        from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in `call'
>        from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
>        from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>        from home/hadoop/hbase_minus_0_dot_20_dot_0/bin/$_dot_dot_/bin/hirb.rb:487:in `__file__'
>        from home/hadoop/hbase_minus_0_dot_20_dot_0/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
>        from org/jruby/Ruby.java:577:in `runScript'
>        from org/jruby/Ruby.java:480:in `runNormally'
>        from org/jruby/Ruby.java:354:in `runFromMain'
>        from org/jruby/Main.java:229:in `run'
>        from org/jruby/Main.java:110:in `run'
>        from org/jruby/Main.java:94:in `main'
>        from /home/hadoop/hbase-0.20.0/bin/../bin/hirb.rb:384:in `get'
>        from (hbase):4
>
> hbase(main):004:0> get '.META.', 'TestTable,0001749889,1250092414985'
> 09/08/14 12:28:13 DEBUG client.HConnectionManager$TableServers: Cache hit
> for row <> in tableName .META.: location server 192.168.0.196:60020,
> location region name .META.,,1
> COLUMN                       CELL
>  historian:assignment        timestamp=1250108456441, value=Region assigned
> to server server195,60020,12501083767
>                             79
>  historian:compaction        timestamp=1250109313965, value=Region
> compaction completed in 35sec
>  historian:open              timestamp=1250108459484, value=Region opened
> on
> server : server195
>  historian:split             timestamp=1250092447915, value=Region split
> from: TestTable,0001634945,1250035163
>                             027
>  info:regioninfo             timestamp=1250109315260, value=REGION => {NAME
> => 'TestTable,0001749889,125009241
>                             4985', STARTKEY => '0001749889', ENDKEY =>
> '0001866010', ENCODED => 1707908074, O
>                             FFLINE => true, TABLE => {{NAME => 'TestTable',
> FAMILIES => [{NAME => 'info', VER
>                             SIONS => '3', COMPRESSION => 'NONE', TTL =>
> '2147483647', BLOCKSIZE => '65536', I
>                             N_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>
>
> =====================================
>
> stack-3 wrote:
> >
> > Is that region offline?
> >
> > Do a:
> >
> > hbase> get ".META.", "TestTable,0001749889,1250092414985", {COLUMNS =>
> > "info"}.
> >
> > If so, can you get its history so we can figure how it went offline? (See
> > region history in UI or grep it in master logs?)
> >
> > St.Ack
> >
> >
> > On Fri, Aug 14, 2009 at 9:55 AM, llpind <so...@hotmail.com> wrote:
> >
> >>
> >> Hey Stack,  I tried the following command:
> >>
> >> hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar
> >>  randomWrite
> >> 10
> >>
> >> running a map/reduce job, it failed with the following exceptions in
> each
> >> node:
> >>
> >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> >> region server Some server for region , row '0001753186', but failed after
> >> 11 attempts.
> >> Exceptions:
> >> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> >> TestTable,0001749889,1250092414985
> >> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> >> TestTable,0001749889,1250092414985
> >> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> >> TestTable,0001749889,1250092414985
> >> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> >> TestTable,0001749889,1250092414985
> >> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> >> TestTable,0001749889,1250092414985
> >> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> >> TestTable,0001749889,1250092414985
> >> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> >> TestTable,0001749889,1250092414985
> >> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> >> TestTable,0001749889,1250092414985
> >> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> >> TestTable,0001749889,1250092414985
> >> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> >> TestTable,0001749889,1250092414985
> >>
> >>        at
> >>
> >>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:995)
> >>        at
> >>
> >>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1025)
> >>        at
> >> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:584)
> >>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:450)
> >>        at
> >>
> >>
> org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest.testRow(PerformanceEvaluation.java:497)
> >>        at
> >>
> >>
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:406)
> >>        at
> >>
> >>
> org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:627)
> >>        at
> >>
> >>
> org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194)
> >>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >>        at
> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
> >>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >>
> >> ================================================================
> >>
> >> This appears to be this issue:
> >>
> >> http://issues.apache.org/jira/browse/HBASE-1603
> >>
> >>
> >> Has this been fixed in .20?  Thanks.
> >>
> >>
> >>
> >> stack-3 wrote:
> >> >
> >> > On Wed, Aug 12, 2009 at 8:58 AM, llpind <so...@hotmail.com>
> wrote:
> >> >
> >> >>
> >> >> Playing with the HBase PerformanceEvaluation class, but it seems to
> >> >> take a long time to run “sequentialWrite 2” (~20 minutes).  If I
> >> >> simply emulate 1 client in a simple program, I can do 1 million Puts
> >> >> in about 3 minutes (non-mapred).  The sequential write is writing 2
> >> >> million with 2 clients.  Please help in understanding how to use the
> >> >> PerformanceEvaluation class.
> >> >>
> >> >
> >> > If the number of clients is > 1, unless you add the '--nomapred' (sp?)
> >> > argument, PE launches a mapreduce program of N tasks.  Each task puts
> >> up
> >> a
> >> > client writing 1M rows (IIRC).  Try N where N == number_of_map_slots
> >> and
> >> > see
> >> > what that does?  N == 2 probably won't tell you much.  You could also
> >> set
> >> > an
> >> > N > 1 and use the '--nomapred'.  This will run PE clients in a
> distinct
> >> > thread.  For small numbers of N, this can put up heavier loading than
> >> MR
> >> > with its setup and teardown cost.
> >> >
> >> > St.Ack
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24975031.html
> >> Sent from the HBase User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24977400.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>

Re: HBase in a real world application

Posted by llpind <so...@hotmail.com>.
hbase(main):003:0> get '.META.', 'TestTable,0001749889,1250092414985',
{COLUMNS =>'info'}
09/08/14 12:28:10 DEBUG client.HConnectionManager$TableServers: Cache hit
for row <> in tableName .META.: location server 192.168.0.196:60020,
location region name .META.,,1
NativeException: java.lang.NullPointerException: null
        from org/apache/hadoop/hbase/client/HTable.java:789:in `get'
        from org/apache/hadoop/hbase/client/HTable.java:769:in `get'
        from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
        from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
        from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
        from java/lang/reflect/Method.java:597:in `invoke'
        from org/jruby/javasupport/JavaMethod.java:298:in
`invokeWithExceptionHandling'
        from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
        from org/jruby/java/invokers/InstanceMethodInvoker.java:30:in `call'
        from org/jruby/runtime/callsite/CachingCallSite.java:30:in `call'
        from org/jruby/ast/CallManyArgsNode.java:59:in `interpret'
        from org/jruby/ast/LocalAsgnNode.java:123:in `interpret'
        from org/jruby/ast/NewlineNode.java:104:in `interpret'
        from org/jruby/ast/IfNode.java:112:in `interpret'
        from org/jruby/ast/NewlineNode.java:104:in `interpret'
        from org/jruby/ast/IfNode.java:114:in `interpret'
... 115 levels...
        from
home/hadoop/hbase_minus_0_dot_20_dot_0/bin/$_dot_dot_/bin/hirb#start:-1:in
`call'
        from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in
`call'
        from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in
`call'
        from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in
`call'
        from org/jruby/runtime/callsite/CachingCallSite.java:253:in
`cacheAndCall'
        from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
        from
home/hadoop/hbase_minus_0_dot_20_dot_0/bin/$_dot_dot_/bin/hirb.rb:487:in
`__file__'
        from
home/hadoop/hbase_minus_0_dot_20_dot_0/bin/$_dot_dot_/bin/hirb.rb:-1:in
`load'
        from org/jruby/Ruby.java:577:in `runScript'
        from org/jruby/Ruby.java:480:in `runNormally'
        from org/jruby/Ruby.java:354:in `runFromMain'
        from org/jruby/Main.java:229:in `run'
        from org/jruby/Main.java:110:in `run'
        from org/jruby/Main.java:94:in `main'
        from /home/hadoop/hbase-0.20.0/bin/../bin/hirb.rb:384:in `get'
        from (hbase):4
hbase(main):004:0> get '.META.', 'TestTable,0001749889,1250092414985'
09/08/14 12:28:13 DEBUG client.HConnectionManager$TableServers: Cache hit
for row <> in tableName .META.: location server 192.168.0.196:60020,
location region name .META.,,1
COLUMN                       CELL
 historian:assignment        timestamp=1250108456441, value=Region assigned to server server195,60020,1250108376779
 historian:compaction        timestamp=1250109313965, value=Region compaction completed in 35sec
 historian:open              timestamp=1250108459484, value=Region opened on server : server195
 historian:split             timestamp=1250092447915, value=Region split from: TestTable,0001634945,1250035163027
 info:regioninfo             timestamp=1250109315260, value=REGION => {NAME => 'TestTable,0001749889,1250092414985', STARTKEY => '0001749889', ENDKEY => '0001866010', ENCODED => 1707908074, OFFLINE => true, TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}


=====================================

stack-3 wrote:
> 
> Is that region offline?
> 
> Do a:
> 
> hbase> get ".META.", "TestTable,0001749889,1250092414985", {COLUMNS =>
> "info"}.
> 
> If so, can you get its history so we can figure how it went offline? (See
> region history in UI or grep it in master logs?)
> 
> St.Ack
> 
> 
> On Fri, Aug 14, 2009 at 9:55 AM, llpind <so...@hotmail.com> wrote:
> 
>>
>> Hey Stack,  I tried the following command:
>>
>> hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar
>>  randomWrite
>> 10
>>
>> running a map/reduce job, it failed with the following exceptions in each
>> node:
>>
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>> contact
>> region server Some server for region , row '0001753186', but failed after
>> 11
>> attempts.
>> Exceptions:
>> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
>> TestTable,0001749889,1250092414985
>> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
>> TestTable,0001749889,1250092414985
>> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
>> TestTable,0001749889,1250092414985
>> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
>> TestTable,0001749889,1250092414985
>> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
>> TestTable,0001749889,1250092414985
>> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
>> TestTable,0001749889,1250092414985
>> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
>> TestTable,0001749889,1250092414985
>> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
>> TestTable,0001749889,1250092414985
>> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
>> TestTable,0001749889,1250092414985
>> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
>> TestTable,0001749889,1250092414985
>>
>>        at
>>
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:995)
>>        at
>>
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1025)
>>        at
>> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:584)
>>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:450)
>>        at
>>
>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest.testRow(PerformanceEvaluation.java:497)
>>        at
>>
>> org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:406)
>>        at
>>
>> org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:627)
>>        at
>>
>> org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194)
>>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> ================================================================
>>
>> This appears to be this issue:
>>
>> http://issues.apache.org/jira/browse/HBASE-1603
>>
>>
>> Has this been fixed in .20?  Thanks.
>>
>>
>>
>> stack-3 wrote:
>> >
>> > On Wed, Aug 12, 2009 at 8:58 AM, llpind <so...@hotmail.com> wrote:
>> >
>> >>
>> >> Playing with the HBase PerformanceEvaluation class, but it seems to
>> >> take a long time to run “sequentialWrite 2” (~20 minutes).  If I
>> >> simply emulate 1 client in a simple program, I can do 1 million Puts
>> >> in about 3 minutes (non-mapred).  The sequential write is writing 2
>> >> million with 2 clients.  Please help in understanding how to use the
>> >> PerformanceEvaluation class.
>> >>
>> >
>> > If the number of clients is > 1, unless you add the '--nomapred' (sp?)
>> > argument, PE launches a mapreduce program of N tasks.  Each task puts
>> up
>> a
>> > client writing 1M rows (IIRC).  Try N where N == number_of_map_slots
>> and
>> > see
>> > what that does?  N == 2 probably won't tell you much.  You could also
>> set
>> > an
>> > N > 1 and use the '--nomapred'.  This will run PE clients in a distinct
>> > thread.  For small numbers of N, this can put up heavier loading than
>> MR
>> > with its setup and teardown cost.
>> >
>> > St.Ack
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24975031.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24977400.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: HBase in a real world application

Posted by stack <st...@duboce.net>.
Is that region offline?

Do a:

hbase> get ".META.", "TestTable,0001749889,1250092414985", {COLUMNS =>
"info"}.

If so, can you get its history so we can figure how it went offline? (See
region history in UI or grep it in master logs?)
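
For the log grep, something along these lines should work (the log path is
illustrative; it depends on where your install writes its logs):

  grep 'TestTable,0001749889,1250092414985' logs/hbase-*-master-*.log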

St.Ack


On Fri, Aug 14, 2009 at 9:55 AM, llpind <so...@hotmail.com> wrote:

>
> Hey Stack,  I tried the following command:
>
> hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar
>  randomWrite
> 10
>
> running a map/reduce job, it failed with the following exceptions in each
> node:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server for region , row '0001753186', but failed after
> 11
> attempts.
> Exceptions:
> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> TestTable,0001749889,1250092414985
> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> TestTable,0001749889,1250092414985
> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> TestTable,0001749889,1250092414985
> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> TestTable,0001749889,1250092414985
> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> TestTable,0001749889,1250092414985
> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> TestTable,0001749889,1250092414985
> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> TestTable,0001749889,1250092414985
> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> TestTable,0001749889,1250092414985
> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> TestTable,0001749889,1250092414985
> org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
> TestTable,0001749889,1250092414985
>
>        at
>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:995)
>        at
>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1025)
>        at
> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:584)
>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:450)
>        at
>
> org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest.testRow(PerformanceEvaluation.java:497)
>        at
>
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:406)
>        at
>
> org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:627)
>        at
>
> org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> ================================================================
>
> This appears to be this issue:
>
> http://issues.apache.org/jira/browse/HBASE-1603
>
>
> Has this been fixed in .20?  Thanks.
>
>
>
> stack-3 wrote:
> >
> > On Wed, Aug 12, 2009 at 8:58 AM, llpind <so...@hotmail.com> wrote:
> >
> >>
> >> Playing with the HBase PerformanceEvaluation class, but it seems to take a
> >> long time to run “sequentialWrite 2” (~20 minutes).  If I simply emulate
> >> 1 client in a simple program, I can do 1 million Puts in about 3 minutes
> >> (non-mapred).  The sequential write is writing 2 million with 2 clients.
> >> Please help in understanding how to use the PerformanceEvaluation class.
> >>
> >
> > If the number of clients is > 1, unless you add the '--nomapred' (sp?)
> > argument, PE launches a mapreduce program of N tasks.  Each task puts up
> a
> > client writing 1M rows (IIRC).  Try N where N == number_of_map_slots and
> > see
> > what that does?  N == 2 probably won't tell you much.  You could also set
> > an
> > N > 1 and use the '--nomapred'.  This will run PE clients in a distinct
> > thread.  For small numbers of N, this can put up heavier loading than MR
> > with its setup and teardown cost.
> >
> > St.Ack
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24975031.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>

Re: HBase in a real world application

Posted by llpind <so...@hotmail.com>.
Hey Stack,  I tried the following command:

hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar  randomWrite
10

running a map/reduce job, it failed with the following exceptions in each
node:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server Some server for region , row '0001753186', but failed after 11
attempts.
Exceptions:
org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
TestTable,0001749889,1250092414985
org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
TestTable,0001749889,1250092414985
org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
TestTable,0001749889,1250092414985
org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
TestTable,0001749889,1250092414985
org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
TestTable,0001749889,1250092414985
org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
TestTable,0001749889,1250092414985
org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
TestTable,0001749889,1250092414985
org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
TestTable,0001749889,1250092414985
org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
TestTable,0001749889,1250092414985
org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
TestTable,0001749889,1250092414985

	at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:995)
	at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1025)
	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:584)
	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:450)
	at
org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest.testRow(PerformanceEvaluation.java:497)
	at
org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:406)
	at
org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:627)
	at
org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

================================================================

This appears to be this issue:

http://issues.apache.org/jira/browse/HBASE-1603


Has this been fixed in .20?  Thanks.



stack-3 wrote:
> 
> On Wed, Aug 12, 2009 at 8:58 AM, llpind <so...@hotmail.com> wrote:
> 
>>
>> Playing with the HBase PerformanceEvaluation class, but it seems to take a
>> long time to run “sequentialWrite 2” (~20 minutes).  If I simply emulate
>> 1 client in a simple program, I can do 1 million Puts in about 3 minutes
>> (non-mapred).  The sequential write is writing 2 million with 2 clients.
>> Please help in understanding how to use the PerformanceEvaluation class.
>>
> 
> If the number of clients is > 1, unless you add the '--nomapred' (sp?)
> argument, PE launches a mapreduce program of N tasks.  Each task puts up a
> client writing 1M rows (IIRC).  Try N where N == number_of_map_slots and
> see
> what that does?  N == 2 probably won't tell you much.  You could also set
> an
> N > 1 and use the '--nomapred'.  This will run PE clients in a distinct
> thread.  For small numbers of N, this can put up heavier loading than MR
> with its setup and teardown cost.
> 
> St.Ack
> 
> 

-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24975031.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: HBase in a real world application

Posted by stack <st...@duboce.net>.
On Wed, Aug 12, 2009 at 8:58 AM, llpind <so...@hotmail.com> wrote:

>
> Playing with the HBase PerformanceEvaluation class, but it seems to take a
> long time to run “sequentialWrite 2” (~20 minutes).  If I simply emulate
> 1 client in a simple program, I can do 1 million Puts in about 3 minutes
> (non-mapred).  The sequential write is writing 2 million with 2 clients.
> Please help in understanding how to use the PerformanceEvaluation class.
>

If the number of clients is > 1, unless you add the '--nomapred' (sp?)
argument, PE launches a mapreduce program of N tasks.  Each task puts up a
client writing 1M rows (IIRC).  Try N where N == number_of_map_slots and see
what that does?  N == 2 probably won't tell you much.  You could also set an
N > 1 and use the '--nomapred'.  This will run PE clients in a distinct
thread.  For small numbers of N, this can put up heavier loading than MR
with its setup and teardown cost.
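
For example, sketches reusing the jar invocation from elsewhere in this
thread (the client count of 8 is illustrative only):

  hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar randomWrite 8
  hadoop-0.20.0/bin/hadoop jar hbase-0.20.0/hbase-0.20.0-test.jar --nomapred randomWrite 8

The first launches 8 map tasks; the second runs 8 client threads in one JVM.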

St.Ack

Re: HBase in a real world application

Posted by llpind <so...@hotmail.com>.
Ryan & Eric thanks for your input.

Currently we're experimenting on a small (5-8 node) cluster.  I'm trying to
get statistics from a cluster of this size in order to estimate what impact
adding more nodes will have.

This is proving to be a hard task: with such a small number of nodes it's
hard to compare 2 nodes vs. 5, because the Hadoop DataNodes/RegionServers may
be starving with only 2, hence performance suffers.
  
Playing with the HBase PerformanceEvaluation class, but it seems to take a
long time to run “sequentialWrite 2” (~20 minutes).  If I simply emulate
1 client in a simple program, I can do 1 million Puts in about 3 minutes
(non-mapred).  The sequential write is writing 2 million with 2 clients.
Please help in understanding how to use the PerformanceEvaluation class.
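
For reference, the kind of simple single-client, non-mapred loader I mean is
roughly the sketch below (the table name, column family, and row key format
are made up for illustration; this is not the PE schema):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SimplePutLoader {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "TestTable");
    table.setAutoFlush(false);                   // buffer puts client-side
    table.setWriteBufferSize(12 * 1024 * 1024);  // flush roughly every 12MB
    long start = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
      Put put = new Put(Bytes.toBytes(String.format("%010d", i)));
      put.add(Bytes.toBytes("info"), Bytes.toBytes("data"),
          Bytes.toBytes("value-" + i));
      table.put(put);
    }
    table.flushCommits();                        // push whatever is still buffered
    System.out.println("1M puts in " + (System.currentTimeMillis() - start) + "ms");
  }
}

Turning off auto-flush and sizing the write buffer is what makes a loader
like this fast; with auto-flush on, every put is a round trip to the server.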



-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24939515.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: HBase in a real world application

Posted by Eric Sammer <er...@lifeless.net>.
Ryan Rawson wrote:
> we are using dell 1950s, 8cpu 16gb ram, dual 1tb disk.  you can get
> machines in this class in the $2k range.  I run hbase on 1tb of
> data on 20 of these.  You can probably look at doing 15+ machines.
> 
> The master machine doesn't do much work, but it has to be reliable.
> Raid, dual power supply, etc.  If it goes down, namenode takes your
> entire system down. I run them on a standard node, but with some of
> the dual power features enabled.  The regionservers do way more, so in
> theory you could have a smaller master, but not too small.  Probably
> best to stick to 1 node type, keep it cheap.

I'm actually surprised that in a production cluster with hardware like
this you'd want to make a strong (i.e. hardware) differentiation
between your namenode / datanodes, job / task trackers, etc. They're all
probably similar enough and the cost difference between a namenode 1950
and a datanode 1950 is probably only in the extra memory and redundancy
on the name node; it might not be all that different at all, really.
(It's possible that's what you're saying here and I'm reading it wrong,
in which case disregard this.)

Ideally, your namenode never goes down / away, but should it happen, you
would have a lot to gain in being able to know that any machine could
replace the namenode in terms of hardware capacity. If the fs image /
edit logs are made available for recovery, you could recover much
quicker than if you have to have a different hardware configuration for
the namenode (again, using namenode as an example). I've worked in a
large number of situations with hundreds of machines like this (1850s,
1950s, 2800 / 2900 series) and found that having a small number of
hardware configurations to be a huge benefit to rapid replacement in HA
situations. Of course, you're trading a bit of specialization for
consistency and "swapability" and that's a choice that might not apply
in all cases, although it paid off in mine.

Just to clarify, this wasn't specifically an hbase / hadoop cluster, but
the idea of limiting variability in a data center (I think) still
applies here.

Thanks!
-- 
Eric Sammer
eric@lifeless.net
http://esammer.blogspot.com

Re: HBase in a real world application

Posted by Andrew Purtell <ap...@apache.org>.
Some number of CPU instructions are always emulated when running in a VM: anything that would affect real processor state with respect to hardware or affect the integrity of other tasks. MMU functions are virtualized/shadowed and require an extra level of mediation. Emulation of privileged operations is usually done with a combination of dynamic code rewriting and traps up to the hypervisor. This has a measurable performance impact.

Furthermore, device I/O is virtualized. The best you can do here is run paravirtualized Linux guests on Xen boxes. This had the lowest measured performance impact of the available options when I last looked (about a year ago), but it still costs you -- longer instruction paths, possibly more data copies. Virtualization of I/O is the bit that I'd expect would have the largest impact on HDFS and HBase function. On Xen you have the option of running HDFS and HBase in dom0 at least, exporting HDFS as a cluster filesystem or a global Bigtable service to apps or whatever running in domUs. No point to do that for a dedicated HBase storage cluster or HBase + mapreduce platform.

   - Andy

--- On Wed, 8/19/09, llpind <so...@hotmail.com> wrote:

From: llpind <so...@hotmail.com>
Subject: Re: HBase in a real world application
To: hbase-user@hadoop.apache.org
Date: Wednesday, August 19, 2009, 2:33 PM



Ryan Rawson wrote:
> 
> Absolutely not.  VM = low performance, no good.
> 

If you have a box with a lot of RAM, and you split the box into VMs
allocating enough RAM for each. 

Lets say you have a box with 32GB of RAM, and you put two VMs on it
allocating 16GB each... will that be slow too?

please explain why you think VM = low performance.  thanks
-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p25052355.html
Sent from the HBase User mailing list archive at Nabble.com.




      

Re: HBase in a real world application

Posted by llpind <so...@hotmail.com>.
Makes sense that it's not as fast, and what Andrew said is what I was looking
for.  Space is an issue at times, and using virtualization with the Xen
hypervisor is a reasonable solution; this is especially true when
experimenting with HBase and data modeling.  My point here was to find
specifics on an optimal cluster setup for running Hadoop/HBase in the real
world, since I'm not yet that familiar with the internals of these systems.
We had an existing box and set up VMs on it, since that was an easy way to go
at the time; we are still in the early stages and attempting to determine if
HBase is a feasible solution.  Moving to production, if we do, will likely
require rethinking the setup, and that was the primary reason for my request
for clarification (on why VM = low performance) here, so people who have been
working in a cluster environment can weigh in.

Thanks.


Ryan Rawson wrote:
> 
> Why do I think VM is low performance?  I could ask you, why do you
> think that Virtualizing is as fast as native?
> 
> 
> On Wed, Aug 19, 2009 at 2:33 PM, llpind<so...@hotmail.com> wrote:
>>
>>
>> Ryan Rawson wrote:
>>>
>>> Absolutely not.  VM = low performance, no good.
>>>
>>
>> If you have a box with a lot of RAM, and you split the box into VMs
>> allocating enough RAM for each.
>>
>> Lets say you have a box with 32GB of RAM, and you put two VMs on it
>> allocating 16GB each... will that be slow too?
>>
>> please explain why you think VM = low performance.  thanks
>> --
>> View this message in context:
>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p25052355.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p25119754.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: HBase in a real world application

Posted by Ryan Rawson <ry...@gmail.com>.
Why do I think VM is low performance?  I could ask you, why do you
think that Virtualizing is as fast as native?


On Wed, Aug 19, 2009 at 2:33 PM, llpind<so...@hotmail.com> wrote:
>
>
> Ryan Rawson wrote:
>>
>> Absolutely not.  VM = low performance, no good.
>>
>
> If you have a box with a lot of RAM, and you split the box into VMs
> allocating enough RAM for each.
>
> Lets say you have a box with 32GB of RAM, and you put two VMs on it
> allocating 16GB each... will that be slow too?
>
> please explain why you think VM = low performance.  thanks
> --
> View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p25052355.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>

Re: HBase in a real world application

Posted by llpind <so...@hotmail.com>.

Ryan Rawson wrote:
> 
> Absolutely not.  VM = low performance, no good.
> 

If you have a box with a lot of RAM, and you split the box into VMs
allocating enough RAM for each. 

Lets say you have a box with 32GB of RAM, and you put two VMs on it
allocating 16GB each... will that be slow too?

please explain why you think VM = low performance.  thanks
-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p25052355.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: HBase in a real world application

Posted by Ryan Rawson <ry...@gmail.com>.
Absolutely not.  VM = low performance, no good.

While it seems that 16 GB of RAM is a lot, it really isn't.  I'd rather
have twice that, since Java sucks up RAM like no tomorrow, and also we want
a really really effective OS buffer cache.  This improves random reads
quite a bit.

In fact my newer machines are intel i7s, 8 core, HTT (16 cpus on
/proc/cpuinfo) and 24 gb ram.  I'd prefer to keep 2gb per core, but
due to the architecture needs, we'd have to go with slower ram.

It's all about the RAM.
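
For reference, heap is set via HBASE_HEAPSIZE in conf/hbase-env.sh (value in
MB); a sketch that leaves most of a 16GB box to the OS buffer cache, the
number being illustrative:

  export HBASE_HEAPSIZE=4000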

On Fri, Aug 14, 2009 at 8:23 AM, llpind<so...@hotmail.com> wrote:
>
> Hey Ryan,
>
> Do you mean you run multiple VMs on the 1950s using Xen or something?
> Isn't 16GB a lot for a single box?
>
>
>
>
> Ryan Rawson wrote:
>>
>> we are using dell 1950s, 8cpu 16gb ram, dual 1tb disk.  you can get
>> machines in this class in the $2k range.  I run hbase on 1tb of
>> data on 20 of these.  You can probably look at doing 15+ machines.
>>
>> The master machine doesn't do much work, but it has to be reliable.
>> Raid, dual power supply, etc.  If it goes down, namenode takes your
>> entire system down. I run them on a standard node, but with some of
>> the dual power features enabled.  The regionservers do way more, so in
>> theory you could have a smaller master, but not too small.  Probably
>> best to stick to 1 node type, keep it cheap.
>>
>> You can run ZK on those nodes, but if you run into IO wait issues, you
>> might see stalls that could hurt bad.  I'd avoid doing massive
>> map-reduces with a large intermediate output on these machines.
>>
>> -ryan
>>
>> On Tue, Aug 11, 2009 at 4:14 PM, llpind<so...@hotmail.com> wrote:
>>>
>>> Thanks for the link.  I will keep that in mind.
>>>
>>> Yeah 256MB isn't much.  Moving up to 3-4G for 10-15 boxes gets expensive.
>>>
>>>
>>>
>>>
>>>
>>> Alejandro Pérez-Linaza wrote:
>>>>
>>>> You might want to check out www.rackspacecloud.com where you can get
>>>> boxes
>>>> and pay by the hour (as cheap as $0.015 / hour for a 256Mb box).  We
>>>> used
>>>> it a couple of weeks ago to setup a MySQL Cluster test and ended up
>>>> having
>>>> around 18 boxes.  Memory can be changed from 256Mb to 16Gb in a couple
>>>> of
>>>> minutes.  They also have various flavors to choose from.
>>>>
>>>> The bottom line is that we love it and it solves the problem of the
>>>> "test
>>>> boxes" that you would need right away.
>>>>
>>>> Have fun,
>>>>
>>>> Alex
>>>>
>>>>
>>>> Alejandro Pérez-Linaza
>>>> CEO
>>>> Vertical Technologies, LLC
>>>> aperez@vertical-tech.com
>>>> www.vertical-tech.com
>>>> 9600 NW 25th Street, Suite 4A
>>>> Miami, FL 33172
>>>> Office: (786) 206-0554 x 108
>>>> Toll Free: (866) 382-8918
>>>> Fax: (305) 328-5063
>>>>
>>>> The information in this email is confidential and may be legally
>>>> privileged. It is intended solely for the addressee. Access to this
>>>> email
>>>> by anyone else is unauthorized. If you are not the intended recipient,
>>>> any
>>>> disclosure, copying, distribution or any action taken or omitted to be
>>>> taken in reliance on it, is prohibited and may be unlawful.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: llpind [mailto:sonny_heer@hotmail.com]
>>>> Sent: Tuesday, August 11, 2009 12:21 PM
>>>> To: hbase-user@hadoop.apache.org
>>>> Subject: HBase in a real world application
>>>>
>>>>
>>>> As some of you know, I've been playing with HBase on/off for the past
>>>> few
>>>> months.
>>>>
>>>> I'd like your take on some cluster setup/configuration setting that
>>>> you’ve
>>>> found successful.  Also, any other thoughts on how I can persuade usage
>>>> of
>>>> HBase.
>>>>
>>>> Assume:  Working with ~2 TB of data.  A few very tall tables.
>>>> Hadoop/HBase
>>>> 0.20.0.
>>>>
>>>> 1.    What specs should a master box have (speed, HD, RAM)?  Should
>>>> Slave
>>>> boxes
>>>> be different?
>>>> 2.    Recommended size of cluster?  I realize this depends on what
>>>> load/performance requirements we have, but I’d like to know your
>>>> thoughts
>>>> based on #1 specs.
>>>> 3.    Should zookeeper quorums run on different boxes than
>>>> regionservers?
>>>>
>>>>
>>>> Basically if you could give some example cluster configurations with the
>>>> amount of data your working with that would be a lot of help (or point
>>>> me
>>>> to
>>>> a place were this has been discussed for .20).  Currently I don’t have
>>>> the
>>>> funds to play around with a lot of boxes, but I hope to soon.  :)
>>>>  Thanks.
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24920888.html
>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24927386.html
>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24973523.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>

Re: HBase in a real world application

Posted by llpind <so...@hotmail.com>.
Hey Ryan,

Do you mean you run multiple VMs on the 1950s using Xen or something?
Isn't 16GB a lot for a single box?




Ryan Rawson wrote:
> 
> we are using dell 1950s, 8cpu 16gb ram, dual 1tb disk.  you can get
> machines in this class in the $2k range.  I run hbase on 1tb of
> data on 20 of these.  You can probably look at doing 15+ machines.
> 
> The master machine doesn't do much work, but it has to be reliable.
> Raid, dual power supply, etc.  If it goes down, namenode takes your
> entire system down. I run them on a standard node, but with some of
> the dual power features enabled.  The regionservers do way more, so in
> theory you could have a smaller master, but not too small.  Probably
> best to stick to 1 node type, keep it cheap.
> 
> You can run ZK on those nodes, but if you run into IO wait issues, you
> might see stalls that could hurt bad.  I'd avoid doing massive
> map-reduces with a large intermediate output on these machines.
> 
> -ryan
> 
> On Tue, Aug 11, 2009 at 4:14 PM, llpind<so...@hotmail.com> wrote:
>>
>> Thanks for the link.  I will keep that in mind.
>>
>> Yeah 256MB isn't much.  Moving up to 3-4G for 10-15 boxes gets expensive.
>>
>>
>>
>>
>>
>> Alejandro Pérez-Linaza wrote:
>>>
>>> You might want to check out www.rackspacecloud.com where you can get
>>> boxes
>>> and pay by the hour (as cheap as $0.015 / hour for a 256Mb box).  We
>>> used
>>> it a couple of weeks ago to setup a MySQL Cluster test and ended up
>>> having
>>> around 18 boxes.  Memory can be changed from 256Mb to 16Gb in a couple
>>> of
>>> minutes.  They also have various flavors to choose from.
>>>
>>> The bottom line is that we love it and it solves the problem of the
>>> "test
>>> boxes" that you would need right away.
>>>
>>> Have fun,
>>>
>>> Alex
>>>
>>>
>>> Alejandro Pérez-Linaza
>>> CEO
>>> Vertical Technologies, LLC
>>> aperez@vertical-tech.com
>>> www.vertical-tech.com
>>> 9600 NW 25th Street, Suite 4A
>>> Miami, FL 33172
>>> Office: (786) 206-0554 x 108
>>> Toll Free: (866) 382-8918
>>> Fax: (305) 328-5063
>>>
>>> The information in this email is confidential and may be legally
>>> privileged. It is intended solely for the addressee. Access to this
>>> email
>>> by anyone else is unauthorized. If you are not the intended recipient,
>>> any
>>> disclosure, copying, distribution or any action taken or omitted to be
>>> taken in reliance on it, is prohibited and may be unlawful.
>>>
>>>
>>> -----Original Message-----
>>> From: llpind [mailto:sonny_heer@hotmail.com]
>>> Sent: Tuesday, August 11, 2009 12:21 PM
>>> To: hbase-user@hadoop.apache.org
>>> Subject: HBase in a real world application
>>>
>>>
>>> As some of you know, I've been playing with HBase on/off for the past
>>> few
>>> months.
>>>
>>> I'd like your take on some cluster setup/configuration setting that
>>> you’ve
>>> found successful.  Also, any other thoughts on how I can persuade usage
>>> of
>>> HBase.
>>>
>>> Assume:  Working with ~2 TB of data.  A few very tall tables.
>>> Hadoop/HBase
>>> 0.20.0.
>>>
>>> 1.    What specs should a master box have (speed, HD, RAM)?  Should
>>> Slave
>>> boxes
>>> be different?
>>> 2.    Recommended size of cluster?  I realize this depends on what
>>> load/performance requirements we have, but I’d like to know your
>>> thoughts
>>> based on #1 specs.
>>> 3.    Should zookeeper quorums run on different boxes than
>>> regionservers?
>>>
>>>
>>> Basically if you could give some example cluster configurations with the
>>> amount of data your working with that would be a lot of help (or point
>>> me
>>> to
>>> a place were this has been discussed for .20).  Currently I don’t have
>>> the
>>> funds to play around with a lot of boxes, but I hope to soon.  :)
>>>  Thanks.
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24920888.html
>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24927386.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24973523.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: HBase in a real world application

Posted by Ryan Rawson <ry...@gmail.com>.
we are using dell 1950s, 8cpu 16gb ram, dual 1tb disk.  you can get
machines in this class in the $2k range.  I run hbase on 1tb of
data on 20 of these.  You can probably look at doing 15+ machines.

The master machine doesn't do much work, but it has to be reliable.
Raid, dual power supply, etc.  If it goes down, namenode takes your
entire system down. I run them on a standard node, but with some of
the dual power features enabled.  The regionservers do way more, so in
theory you could have a smaller master, but not too small.  Probably
best to stick to 1 node type, keep it cheap.

You can run ZK on those nodes, but if you run into IO wait issues, you
might see stalls that could hurt bad.  I'd avoid doing massive
map-reduces with a large intermediate output on these machines.
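
If you do put ZK on separate boxes, pointing HBase at the quorum is plain
hbase-site.xml config; a sketch with made-up hostnames:

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>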

-ryan

On Tue, Aug 11, 2009 at 4:14 PM, llpind<so...@hotmail.com> wrote:
>
> Thanks for the link.  I will keep that in mind.
>
> Yeah 256MB isn't much.  Moving up to 3-4G for 10-15 boxes gets expensive.
>
>
>
>
>
> Alejandro Pérez-Linaza wrote:
>>
>> You might want to check out www.rackspacecloud.com where you can get boxes
>> and pay by the hour (as cheap as $0.015 / hour for a 256Mb box).  We used
>> it a couple of weeks ago to setup a MySQL Cluster test and ended up having
>> around 18 boxes.  Memory can be changed from 256Mb to 16Gb in a couple of
>> minutes.  They also have various flavors to choose from.
>>
>> The bottom line is that we love it and it solves the problem of the "test
>> boxes" that you would need right away.
>>
>> Have fun,
>>
>> Alex
>>
>>
>> Alejandro Pérez-Linaza
>> CEO
>> Vertical Technologies, LLC
>> aperez@vertical-tech.com
>> www.vertical-tech.com
>> 9600 NW 25th Street, Suite 4A
>> Miami, FL 33172
>> Office: (786) 206-0554 x 108
>> Toll Free: (866) 382-8918
>> Fax: (305) 328-5063
>>
>> The information in this email is confidential and may be legally
>> privileged. It is intended solely for the addressee. Access to this email
>> by anyone else is unauthorized. If you are not the intended recipient, any
>> disclosure, copying, distribution or any action taken or omitted to be
>> taken in reliance on it, is prohibited and may be unlawful.
>>
>>
>> -----Original Message-----
>> From: llpind [mailto:sonny_heer@hotmail.com]
>> Sent: Tuesday, August 11, 2009 12:21 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: HBase in a real world application
>>
>>
>> As some of you know, I've been playing with HBase on/off for the past few
>> months.
>>
>> I'd like your take on some cluster setup/configuration setting that you’ve
>> found successful.  Also, any other thoughts on how I can persuade usage of
>> HBase.
>>
>> Assume:  Working with ~2 TB of data.  A few very tall tables. Hadoop/HBase
>> 0.20.0.
>>
>> 1.    What specs should a master box have (speed, HD, RAM)?  Should Slave
>> boxes
>> be different?
>> 2.    Recommended size of cluster?  I realize this depends on what
>> load/performance requirements we have, but I’d like to know your thoughts
>> based on #1 specs.
>> 3.    Should zookeeper quorums run on different boxes than regionservers?
>>
>>
>> Basically if you could give some example cluster configurations with the
>> amount of data your working with that would be a lot of help (or point me
>> to
>> a place were this has been discussed for .20).  Currently I don’t have the
>> funds to play around with a lot of boxes, but I hope to soon.  :)  Thanks.
>>
>> --
>> View this message in context:
>> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24920888.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24927386.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>

RE: HBase in a real world application

Posted by llpind <so...@hotmail.com>.
Thanks for the link.  I will keep that in mind. 

Yeah 256MB isn't much.  Moving up to 3-4G for 10-15 boxes gets expensive.





Alejandro Pérez-Linaza wrote:
> 
> You might want to check out www.rackspacecloud.com where you can get boxes
> and pay by the hour (as cheap as $0.015 / hour for a 256Mb box).  We used
> it a couple of weeks ago to setup a MySQL Cluster test and ended up having
> around 18 boxes.  Memory can be changed from 256Mb to 16Gb in a couple of
> minutes.  They also have various flavors to choose from.
> 
> The bottom line is that we love it and it solves the problem of the "test
> boxes" that you would need right away.
> 
> Have fun,
> 
> Alex
> 
> 
> Alejandro Pérez-Linaza
> CEO
> Vertical Technologies, LLC
> aperez@vertical-tech.com
> www.vertical-tech.com
> 9600 NW 25th Street, Suite 4A
> Miami, FL 33172
> Office: (786) 206-0554 x 108
> Toll Free: (866) 382-8918
> Fax: (305) 328-5063
> 
> The information in this email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful.
> 
> 
> -----Original Message-----
> From: llpind [mailto:sonny_heer@hotmail.com] 
> Sent: Tuesday, August 11, 2009 12:21 PM
> To: hbase-user@hadoop.apache.org
> Subject: HBase in a real world application
> 
> 
> As some of you know, I've been playing with HBase on/off for the past few
> months.
> 
> I'd like your take on some cluster setup/configuration setting that you’ve
> found successful.  Also, any other thoughts on how I can persuade usage of
> HBase.
> 
> Assume:  Working with ~2 TB of data.  A few very tall tables. Hadoop/HBase
> 0.20.0.
> 
> 1.	What specs should a master box have (speed, HD, RAM)?  Should Slave
> boxes
> be different?
> 2.	Recommended size of cluster?  I realize this depends on what
> load/performance requirements we have, but I’d like to know your thoughts
> based on #1 specs.
> 3.	Should zookeeper quorums run on different boxes than regionservers?
> 
> 
> Basically if you could give some example cluster configurations with the
> amount of data your working with that would be a lot of help (or point me
> to
> a place were this has been discussed for .20).  Currently I don’t have the
> funds to play around with a lot of boxes, but I hope to soon.  :)  Thanks.
> 
> -- 
> View this message in context:
> http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24920888.html
> Sent from the HBase User mailing list archive at Nabble.com.
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24927386.html
Sent from the HBase User mailing list archive at Nabble.com.


RE: HBase in a real world application

Posted by Alejandro Pérez-Linaza <ap...@vertical-tech.com>.
You might want to check out www.rackspacecloud.com where you can get boxes and pay by the hour (as cheap as $0.015 / hour for a 256Mb box).  We used it a couple of weeks ago to set up a MySQL Cluster test and ended up having around 18 boxes.  Memory can be changed from 256Mb to 16Gb in a couple of minutes.  They also have various flavors to choose from.

The bottom line is that we love it and it solves the problem of the "test boxes" that you would need right away.

Have fun,

Alex


Alejandro Pérez-Linaza
CEO
Vertical Technologies, LLC
aperez@vertical-tech.com
www.vertical-tech.com
9600 NW 25th Street, Suite 4A
Miami, FL 33172
Office: (786) 206-0554 x 108
Toll Free: (866) 382-8918
Fax: (305) 328-5063

The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful.


-----Original Message-----
From: llpind [mailto:sonny_heer@hotmail.com] 
Sent: Tuesday, August 11, 2009 12:21 PM
To: hbase-user@hadoop.apache.org
Subject: HBase in a real world application


As some of you know, I've been playing with HBase on/off for the past few
months.

I'd like your take on some cluster setup/configuration setting that you’ve
found successful.  Also, any other thoughts on how I can persuade usage of
HBase.

Assume:  Working with ~2 TB of data.  A few very tall tables. Hadoop/HBase
0.20.0.

1.	What specs should a master box have (speed, HD, RAM)?  Should Slave boxes
be different?
2.	Recommended size of cluster?  I realize this depends on what
load/performance requirements we have, but I’d like to know your thoughts
based on #1 specs.
3.	Should zookeeper quorums run on different boxes than regionservers?


Basically if you could give some example cluster configurations with the
amount of data your working with that would be a lot of help (or point me to
a place were this has been discussed for .20).  Currently I don’t have the
funds to play around with a lot of boxes, but I hope to soon.  :)  Thanks.

-- 
View this message in context: http://www.nabble.com/HBase-in-a-real-world-application-tp24920888p24920888.html
Sent from the HBase User mailing list archive at Nabble.com.