
Posted to user@hbase.apache.org by Haijun Cao <ha...@ymail.com> on 2009/08/03 23:04:16 UTC

stargate performance evaluation

I am evaluating the performance of Stargate (which, by the way, is a
great contribution to HBase, thanks!). The evaluation program is a simple
modification of the existing PerformanceEvaluation (PE) program: it replaces the
Java client with the Stargate client and fetches values as protobuf.

All of the software (Hadoop, ZooKeeper, HBase, Jetty) is
installed on one box. The data set is small, so all data is served
from memory.

For the random read test, I get 19K reads/s with the Java client (the existing PE
program), but only 3-4K reads/s with the Stargate client.
In both cases the PE program runs with 100 threads. Increasing the number of threads
does not seem to help (it even hurts throughput).

I am wondering whether this is expected (in theory, I can't figure out
why the throughput drops). Any ideas for optimizations or configuration changes to increase the throughput?

Thanks!

Haijun Cao

Re: stargate performance evaluation

Posted by Andrew Purtell <ap...@apache.org>.

Haijun,

Because Stargate is itself a client of the HBase storage cluster, there
will be an extra round trip for each data transfer. There will always
be some performance penalty for this, although over time it may become
quite small. If one is using some future version of Stargate to provide
Bigtable-style structured storage for a large enterprise, then we expect
the benefits to outweigh this cost.
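One way to see why an extra hop lowers throughput: PE threads each wait for a response before issuing the next request, so with a fixed thread count Little's law gives throughput = threads / mean latency. The sketch below only applies that arithmetic to the figures reported in this thread (100 threads, about 19K reads/s direct, 3-4K reads/s through Stargate); the latencies it prints are derived, not measured.

```java
/** Little's law for a closed-loop benchmark: throughput = threads / mean latency. */
public class LittlesLaw {

    /** Mean per-request latency in milliseconds implied by a thread count and a throughput in ops/s. */
    static double impliedLatencyMs(int threads, double opsPerSecond) {
        return threads / opsPerSecond * 1000.0;
    }

    public static void main(String[] args) {
        int threads = 100; // PE thread count from the original post
        System.out.printf("direct Java client: %.1f ms%n", impliedLatencyMs(threads, 19_000));
        System.out.printf("Stargate at 4K/s:   %.1f ms%n", impliedLatencyMs(threads, 4_000));
        System.out.printf("Stargate at 3K/s:   %.1f ms%n", impliedLatencyMs(threads, 3_000));
    }
}
```

By this arithmetic, each Stargate read took roughly 5 to 6 times longer end to end than a direct read in that setup. With every process on one box, the extra hop plausibly also competes for the same CPUs, which would be consistent with more threads not helping; that is an inference, not something measured here.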

My personal goal for Stargate is to approximate an "internal S3" for
large enterprises.

There are currently many opportunities for performance tuning in
Stargate. For example:

  - Profile the code and look for (and re-engineer) bottlenecks in
    the resource methods.

  - Explore Jetty performance optimizations; the current simple config
    for standalone mode may be naive.

  - Explore Jersey framework performance optimizations. The two 
    contributors who worked on Stargate are not (yet) Jersey wizards.

  - Intelligent batching of client requests to the storage cluster.

  - LRU caching for good read performance if the clients' collective
    working set fits in the cache.
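The LRU caching idea in the last bullet could look roughly like the following: a bounded, in-process map from cell key to value, consulted before going to the region servers. This is a minimal sketch, not Stargate code; the key/value types and capacity are placeholders, and a real cache would also need invalidation when cells are written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal LRU cache: a LinkedHashMap kept in access order that evicts its
 * least-recently-used entry once it grows past a fixed capacity.
 * Not thread-safe; wrap it (e.g. Collections.synchronizedMap) before sharing
 * it across request-handling threads.
 */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // drop the head (least recently used) when over capacity
    }

    public static void main(String[] args) {
        LruCache<String, byte[]> cache = new LruCache<>(2);
        cache.put("row1", new byte[] {1});
        cache.put("row2", new byte[] {2});
        cache.get("row1");                  // row1 becomes most recently used
        cache.put("row3", new byte[] {3});  // over capacity: evicts row2
        System.out.println(cache.keySet()); // prints [row1, row3]
    }
}
```

LinkedHashMap's access-order mode and removeEldestEntry hook keep the whole policy to a few lines; a production cache would more likely bound memory by bytes rather than entry count.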

Also, please be aware that the o.a.h.h.stargate.client package is at
this time a simple and naive wrapper around Commons HttpClient and
could surely be improved. Its purpose for now is to support the test
suite. 
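Since the bundled client is a thin test-suite wrapper, one thing worth checking in any HTTP-based benchmark client is that all worker threads share a single connection-pooling client rather than opening connections per request. This is a minimal sketch assuming a modern JDK's java.net.http instead of Commons HttpClient; it is not the o.a.h.h.stargate.client API. The base URL, the /table/row/column path layout, the TestTable and info:data names, and the application/x-protobuf Accept value are all assumptions to verify against the Stargate documentation.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/**
 * Sketch of a benchmark client that shares ONE thread-safe HttpClient across
 * all worker threads, so keep-alive connections are reused between requests.
 */
public class SharedStargateClient {
    // Hypothetical Stargate endpoint; host and port are placeholders.
    private static final String BASE = "http://localhost:8080";

    // java.net.http.HttpClient is immutable and safe to share between threads.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    /** Builds a GET for one cell; the /table/row/column layout is an assumption. */
    static HttpRequest cellRequest(String table, String row, String column) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + table + "/" + row + "/" + column))
                .header("Accept", "application/x-protobuf") // ask for the protobuf representation
                .GET()
                .build();
    }

    /** Fetches the raw body; a caller would decode it with generated protobuf classes. */
    static byte[] fetch(String table, String row, String column) throws Exception {
        HttpResponse<byte[]> resp = CLIENT.send(cellRequest(table, row, column),
                HttpResponse.BodyHandlers.ofByteArray());
        if (resp.statusCode() != 200) {
            throw new IllegalStateException("HTTP " + resp.statusCode());
        }
        return resp.body();
    }
}
```

Each of the 100 PE worker threads would call fetch(...) on the same static CLIENT instead of constructing its own, which keeps connection setup out of the per-read cost being measured.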

Best regards,

   - Andy




________________________________
From: Haijun Cao <ha...@ymail.com>
To: hbase-user@hadoop.apache.org
Cc: apurtell@apache.org
Sent: Monday, August 3, 2009 10:36:42 PM
Subject: Re: stargate performance evaluation


Andrew,

Thanks for the reply. I am considering using Stargate in one of my projects; the design and implementation are quite elegant. In your opinion, is there any hard limitation preventing Stargate from achieving the same throughput as the HBase Java client, or is it just a matter of fine-tuning? I am not sure whether caching helps in the case of random reads. I agree that the all-local setup is naive; I will do a more realistic test and share my observations.

Haijun




________________________________
From: Andrew Purtell <ap...@apache.org>
To: hbase-user@hadoop.apache.org
Sent: Monday, August 3, 2009 5:25:09 PM
Subject: Re: stargate performance evaluation

Hi,

Thanks for the testing and performance report!

You said you used the Stargate client package? It is pretty basic, written mainly as a convenience for writing test cases in the test suite. 

Regarding Stargate quality in general, this is an alpha release. It seems to survive torture testing with PE, and it handles well-formed requests. But the implementation is untuned: for example, there is no caching (yet), and the code has not been profiled. 

I put up an issue for Stargate performance improvement: https://issues.apache.org/jira/browse/HBASE-1741

I'm not sure an all-localhost configuration is the best testing scenario. It would be interesting to see how the performance differs with the client remote from both the regionservers and the Stargate instance. 

  - Andy






Re: stargate performance evaluation

Posted by Haijun Cao <ha...@ymail.com>.

Andrew,

Thanks for the reply. I am considering using Stargate in one of my projects; the design and implementation are quite elegant. In your opinion, is there any hard limitation preventing Stargate from achieving the same throughput as the HBase Java client, or is it just a matter of fine-tuning? I am not sure whether caching helps in the case of random reads. I agree that the all-local setup is naive; I will do a more realistic test and share my observations.

Haijun
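On whether caching helps random reads: assuming rows are picked uniformly at random, a cache holding C of N rows can hit only about C/N of the time in steady state, whatever its eviction policy, so it pays off only when a sizable fraction of the working set fits. The small simulation below makes that concrete; the row count, cache size, read count, and seed are arbitrary illustration values, not figures from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Simulates uniformly random reads against an LRU cache and reports the observed hit rate. */
public class RandomReadHitRate {

    static double hitRate(int numKeys, int capacity, int reads, long seed) {
        // Access-ordered LinkedHashMap that evicts the least-recently-used key past capacity.
        Map<Integer, Boolean> lru = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacity;
            }
        };
        Random rnd = new Random(seed);
        int hits = 0;
        for (int i = 0; i < reads; i++) {
            int key = rnd.nextInt(numKeys); // uniform random row, as assumed above
            if (lru.get(key) != null) {
                hits++;
            } else {
                lru.put(key, Boolean.TRUE); // miss: would go to the region server, then be cached
            }
        }
        return (double) hits / reads;
    }

    public static void main(String[] args) {
        // Cache sized at 25% of the rows: expect a hit rate of roughly 0.25.
        System.out.println(hitRate(10_000, 2_500, 200_000, 42L));
    }
}
```

If the real access pattern is skewed (a small set of hot rows), the same cache would do much better than C/N; uniform random reads are the worst case for it.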





Re: stargate performance evaluation

Posted by Andrew Purtell <ap...@apache.org>.

Hi,

Thanks for the testing and performance report!

You said you used the Stargate client package? It is pretty basic, written mainly as a convenience for writing test cases in the test suite. 

Regarding Stargate quality in general, this is an alpha release. It seems to survive torture testing with PE, and it handles well-formed requests. But the implementation is untuned: for example, there is no caching (yet), and the code has not been profiled. 

I put up an issue for Stargate performance improvement: https://issues.apache.org/jira/browse/HBASE-1741

I'm not sure an all-localhost configuration is the best testing scenario. It would be interesting to see how the performance differs with the client remote from both the regionservers and the Stargate instance. 

  - Andy




