Posted to user@hbase.apache.org by Demai Ni <ni...@gmail.com> on 2015/03/07 01:46:59 UTC
Significant scan performance difference between Thrift (C++) and Java: 4X slower
Hi guys,
I am trying to get a rough idea of how the C++ and Java clients compare in
performance when accessing an HBase table, and I am surprised to find that
Thrift (C++) is 4X slower.
The performance results are:
C++: real *16m11.313s*; user 5m3.642s; sys 2m21.388s
Java: real *4m6.012s*; user 0m31.228s; sys 0m8.018s
I have a single-node HBase (98.6) cluster with 1X TPC-H loaded, and I scan the
largest table, lineitem, which has 6M rows and roughly 600MB of data.
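To put the numbers above side by side, here is a small sketch of the arithmetic. It only uses the figures quoted in this post; the 6M row count is the approximate value given above, and the helper names are made up for illustration.

```cpp
#include <cstdio>

// Converts a `time`-style "XmY.ZZZs" reading to seconds.
constexpr double toSeconds(int minutes, double seconds) {
  return minutes * 60.0 + seconds;
}

// Figures copied from the post (client-side `time` output).
constexpr double kCppReal  = toSeconds(16, 11.313);
constexpr double kCppUser  = toSeconds(5, 3.642);
constexpr double kCppSys   = toSeconds(2, 21.388);
constexpr double kJavaReal = toSeconds(4, 6.012);
constexpr double kJavaUser = toSeconds(0, 31.228);
constexpr double kJavaSys  = toSeconds(0, 8.018);
constexpr double kRows     = 6e6;  // approximate row count of lineitem

// Wall-clock slowdown of the C++/Thrift run relative to the Java run.
constexpr double wallRatio() { return kCppReal / kJavaReal; }

// Ratio of client-side CPU time (user + sys) between the two runs.
constexpr double cpuRatio() {
  return (kCppUser + kCppSys) / (kJavaUser + kJavaSys);
}

// Average wall-clock microseconds spent per scanned row.
constexpr double microsPerRow(double realSeconds) {
  return realSeconds / kRows * 1e6;
}

// Prints the derived figures; call from main() to see them.
void printSummary() {
  std::printf("wall-clock ratio: %.2f\n", wallRatio());
  std::printf("client CPU ratio: %.2f\n", cpuRatio());
  std::printf("C++  us/row: %.1f\n", microsPerRow(kCppReal));
  std::printf("Java us/row: %.1f\n", microsPerRow(kJavaReal));
}
```

The wall-clock ratio comes out just under 4, while the client-side CPU time (user + sys) differs by roughly 11X, so the C++ client process itself is doing noticeably more work per row, not just waiting longer.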
For the C++ client, I used the Thrift example provided by hbase-examples. The
C++ code looks like:
> std::string t("lineitem");
> int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
> int count = 0;
> ..
> while (true) {
>   std::vector<TRowResult> value;
>   // scannerGet returns at most one row per call; empty means end of scan
>   client.scannerGet(value, scanner);
>   if (value.empty()) break;
>   ++count;
> }
>
> std::cout << count << " rows scanned" << std::endl;
>
The Java client is the simplest possible one:
> HTable table = new HTable(conf, "lineitem");
>
> Scan scan = new Scan();
> ResultScanner resScanner = table.getScanner(scan);
> int count = 0;
> // iterate over every row; the Result itself is not inspected
> for (Result res : resScanner) {
>   count++;
> }
>
Since most of the time should be spent on I/O, I didn't expect any significant
difference between Thrift (C++) and Java. Any ideas? Many thanks,
Demai
Re: Significant scan performance difference between Thrift (C++) and Java: 4X slower
Posted by Stack <st...@duboce.net>.
Is it because of the extra 'hop'? The Java client goes directly against the
RegionServer (RS). The Thrift C++ client goes to a ThriftServer, which hosts a
Java client, and that client then goes to the RS?
St.Ack
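One way to probe the 'hop' theory is to cut the number of round trips to the ThriftServer. The loop in the original post calls scannerGet, which returns one row per call, so 6M rows means about 6M round trips. The Thrift1 Hbase service also offers scannerGetList(result, scannerId, nbRows), which fetches up to nbRows rows per call. Below is a minimal sketch of a batched counting loop. FakeRow and FakeClient are hypothetical stand-ins so the sketch compiles without the generated Thrift headers, and whether batching actually closes the gap on this cluster is an assumption, not something measured in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the Thrift-generated TRowResult.
struct FakeRow {};

// Hypothetical stand-in for the generated HbaseClient. It serves a fixed
// number of rows and counts how many calls (round trips) were made.
class FakeClient {
 public:
  explicit FakeClient(std::int64_t totalRows) : remaining_(totalRows) {}

  // Same shape as the Thrift1 call scannerGetList(result, scannerId, nbRows):
  // fills `out` with up to nbRows rows, or leaves it empty at end of scan.
  void scannerGetList(std::vector<FakeRow>& out, std::int32_t /*scannerId*/,
                      std::int32_t nbRows) {
    ++calls_;
    const std::int64_t n = std::min<std::int64_t>(nbRows, remaining_);
    out.assign(static_cast<std::size_t>(n), FakeRow{});
    remaining_ -= n;
  }

  std::int64_t calls() const { return calls_; }

 private:
  std::int64_t remaining_;
  std::int64_t calls_ = 0;
};

// Counts rows by fetching up to batchSize rows per call instead of one.
// Works with any client exposing a scannerGetList of the shape above.
template <typename Row, typename Client>
std::int64_t countRows(Client& client, std::int32_t scanner,
                       std::int32_t batchSize) {
  std::int64_t count = 0;
  while (true) {
    std::vector<Row> rows;
    client.scannerGetList(rows, scanner, batchSize);
    if (rows.empty()) break;  // an empty batch signals end of scan
    count += static_cast<std::int64_t>(rows.size());
  }
  return count;
}
```

With the real generated client, the call would presumably look like countRows<TRowResult>(client, scanner, 1000), where 1000 is an arbitrary batch size to tune. With the fake client, scanning 6000 rows takes 6001 calls at batch size 1 and 7 calls at batch size 1000 (the last call in each case returns the empty batch).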