Posted to user@hbase.apache.org by "Liu, Ming (HPIT-GADSC)" <mi...@hp.com> on 2014/11/12 05:40:30 UTC

Is it possible that HBase update performance is much better than read in YCSB test?

Hi, all,

I am using YCSB to test our HBase 0.98.5 instance and got a strange result: update throughput is 6x higher than read throughput. It is just an exercise, so HBase is running on a workstation in standalone mode.
I modified the workloada shipped with YCSB into two new workloads, workloadr and workloadu, where workloadr does 100% read operations and workloadu does 100% update operations. The workloadr and workloadu config files are at the bottom for your reference.

I found that read performance is much worse than update performance. Read throughput is about 6000 ops/sec:

YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadr -p columnfamily=family -s -t
[OVERALL], RunTime(ms), 16565.0
[OVERALL], Throughput(ops/sec), 6036.824630244491

Update throughput is about 36000 ops/sec, 6x better than read:

YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadu -p columnfamily=family -s -t
[OVERALL], RunTime(ms), 2767.0
[OVERALL], Throughput(ops/sec), 36140.22406938923
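
As a sanity check on the figures above, the reported throughput values are consistent with operationcount divided by the run time in seconds. The short Python sketch below only redoes that arithmetic; the function and variable names are illustrative and are not part of YCSB.

```python
# Figures copied from the two YCSB runs above.
OPERATION_COUNT = 100000      # operationcount in both workload files
READ_RUNTIME_MS = 16565.0     # [OVERALL], RunTime(ms) for workloadr
UPDATE_RUNTIME_MS = 2767.0    # [OVERALL], RunTime(ms) for workloadu


def ops_per_sec(operations, runtime_ms):
    """Throughput as operations divided by elapsed seconds."""
    return operations / (runtime_ms / 1000.0)


read_tput = ops_per_sec(OPERATION_COUNT, READ_RUNTIME_MS)
update_tput = ops_per_sec(OPERATION_COUNT, UPDATE_RUNTIME_MS)

print("read ops/sec:     ", read_tput)
print("update ops/sec:   ", update_tput)
print("update/read ratio:", update_tput / read_tput)
```

The ratio comes out just under 6, which matches the "6x" quoted in the post.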

Is this possible? IMHO, read should be faster than update.
Maybe my workload files are wrong? Or can update really be faster than read? I could not find a YCSB mailing list; if anyone knows of one, please give me a link so I can also ask there. But is it possible that put is faster than get in HBase? If not, the result must be wrong and I need to debug the YCSB code to figure out what is going wrong.

workloadr:
recordcount=100000
operationcount=100000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=zipfian

workloadu:
recordcount=100000
operationcount=100000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=1
scanproportion=0
insertproportion=0
requestdistribution=zipfian
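
One way to rule out a typo in the workload files is to parse both and compare them. The Python sketch below is a minimal check, not a YCSB tool: the parser, the helper names, and the choice to treat a missing proportion as 0 are assumptions made for illustration (YCSB applies its own defaults to missing keys).

```python
# The two workload files from this post, copied verbatim.
WORKLOADR = """\
recordcount=100000
operationcount=100000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=zipfian
"""

WORKLOADU = WORKLOADR.replace("readproportion=1", "readproportion=0") \
                     .replace("updateproportion=0", "updateproportion=1")

PROPORTION_KEYS = ("readproportion", "updateproportion",
                   "scanproportion", "insertproportion")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def operation_mix(props):
    """Return the four proportions as floats (missing keys count as 0 here)."""
    return {key: float(props.get(key, 0)) for key in PROPORTION_KEYS}


def differences(a, b):
    """Keys whose values differ between two parsed workloads."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


workload_r = parse_properties(WORKLOADR)
workload_u = parse_properties(WORKLOADU)

print("workloadr mix:", operation_mix(workload_r))
print("workloadu mix:", operation_mix(workload_u))
print("differences:  ", differences(workload_r, workload_u))
```

Both mixes sum to 1 and the files differ only in readproportion and updateproportion, so the workload definitions themselves look as intended.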


Thanks,
Ming

Re: Is it possible that HBase update performance is much better than read in YCSB test?

Posted by ramkrishna vasudevan <ra...@gmail.com>.

Thanks Andrew. This is very useful information, along with the
GitHub link.

Regards
Ram


RE: Is it possible that HBase update performance is much better than read in YCSB test?

Posted by "Liu, Ming (HPIT-GADSC)" <mi...@hp.com>.

Thank you, Andrew. This is an excellent answer; I get it now. I will try your HBase client for a 'fair' test :-)

Best Regards,
Ming

-----Original Message-----
From: Andrew Purtell [mailto:apurtell@apache.org] 
Sent: Thursday, November 13, 2014 2:08 AM
To: user@hbase.apache.org
Cc: DeRoo, John
Subject: Re: Is it possible that HBase update performance is much better than read in YCSB test?


Re: Is it possible that HBase update performance is much better than read in YCSB test?

Posted by Andrew Purtell <ap...@apache.org>.

Try this HBase YCSB client instead:
https://github.com/apurtell/ycsb/tree/new_hbase_client

The HBase YCSB driver in the master repo holds on to one HTable instance
per driver thread. We accumulate writes into a 12MB write buffer before
flushing them en masse. This is why the behavior you are seeing confounds
your expectations. It's not correct behavior IMHO. YCSB wants to measure
the round trip of every op, not the non-cost of local caching. Worse, if we
have a lot of driver threads accumulating 12MB of edits more or less at the
same rate, then we will flush these buffers more or less at the same time
and stampede the cluster, which leads to deep valleys in observed write
performance of 30-60 seconds or longer.
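
The effect of that write buffer on measured throughput can be illustrated with a toy model. This is a sketch under made-up assumptions, not the YCSB driver or the HBase client: the cost constants, the 1000-byte record size, and the function names are all hypothetical. Only the 12MB buffer size and the operation count of 100000 come from this thread.

```python
# Toy model of a client-side write buffer. All costs are invented numbers
# chosen for illustration, not HBase measurements.
RTT_MS = 1.0                      # assumed cost of one round trip to the server
LOCAL_APPEND_MS = 0.01            # assumed cost of appending to a local buffer
RECORD_BYTES = 1000               # assumed size of one update
BUFFER_BYTES = 12 * 1024 * 1024   # the 12MB write buffer mentioned above


def unbuffered_latencies(n_ops):
    """Every operation pays a full round trip, as a read does."""
    return [RTT_MS] * n_ops


def buffered_latencies(n_ops):
    """Updates are appended locally; only the operation that fills the
    buffer pays for a flush (modelled here as a single round trip)."""
    latencies = []
    buffered = 0
    for _ in range(n_ops):
        cost = LOCAL_APPEND_MS
        buffered += RECORD_BYTES
        if buffered >= BUFFER_BYTES:
            cost += RTT_MS        # flush the whole buffer at once
            buffered = 0
        latencies.append(cost)
    return latencies


def throughput(latencies):
    """Ops per second as seen by a single driver thread."""
    return len(latencies) / (sum(latencies) / 1000.0)


n = 100000  # operationcount from the workload files
direct = unbuffered_latencies(n)
buffered = buffered_latencies(n)

# 1-based operation numbers at which the buffer was flushed.
flush_ops = [i for i, c in enumerate(buffered, 1) if c > LOCAL_APPEND_MS]

print("unbuffered ops/sec:", round(throughput(direct)))
print("buffered ops/sec:  ", round(throughput(buffered)))
print("flushes at ops:    ", flush_ops)
```

In this model the buffered run reports far higher ops/sec because almost every operation is timed against a local append rather than a round trip. Every driver thread that fills its buffer at the same rate also reaches its flush at the same operation numbers, which is the stampede pattern described above.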



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)