Posted to user@hbase.apache.org by yglin <yg...@gmail.com> on 2014/04/01 09:18:46 UTC

HBase/Stargate dataflow in I/O perspective

Hi~ 

I would like to know how data flows when you query it from HBase or
Stargate, especially from an I/O perspective.
Please point me in some directions to study.
I mean questions like these:
When is an HFile (StoreFile) loaded as a region into the region server's
memory?
Does a region stay in the region server's memory afterward? When is it
freed?
When Stargate uses a scan instance to obtain data, does its communication
with the region server add another connection's overhead?

Actually, I'm asking because I'm experimenting with Toad for Cloud Database
on HBase.
I ran into a performance issue: querying 400K data rows takes about 5 minutes,
which is an awkward number.
I installed HBase/HDFS on 7 VMs:
1 as ResourceManager, 1 as NameNode and HMaster, 5 as DataNodes and
RegionServers.
I have barely changed any configuration for performance tuning.
I drew myself a very simple chart to try to find where the bottlenecks are.
<http://apache-hbase.679495.n3.nabble.com/file/n4057719/Toad_Read_HBase_Process.png> 

I know I may be missing many details in this simple chart.
Please give me some clues.
Much appreciated,

yglin



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-Stargate-dataflow-in-I-O-perspective-tp4057719.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: HBase/Stargate dataflow in I/O perspective

Posted by yglin <yg...@gmail.com>.

Thank you, Andy.

Let me try to put it together as I understand it.
So ...

1. If I keep using only Stargate as the client, block caching will never happen.
2. Each time I send a query to Stargate and then on to a RegionServer, it loads
the store files again, because nothing is cached.
3. Stargate does build a connection (TCP?) with the RegionServer for each scan
operation (I didn't mention that I found numerous scan operations in
Stargate's log history, which bothers me a lot).

And to conclude ...
Installing HBase on VMs is a bad idea, and using Stargate is another one, right?

yglin




Re: HBase/Stargate dataflow in I/O perspective

Posted by Andrew Purtell <ap...@apache.org>.

Don't expect any kind of performance when running HBase on VMs, by
definition, although exactly how bad it is will depend on the VM host
environment, allocations per container, and such.

As for your questions:

> When is HFile(StoreFile) being loaded as a region into region server's
> memory?

Stargate is just another HBase client from the perspective of the
RegionServers. So, the RegionServers will service requests from the REST
gateway as needed, reading store files on demand.

> Does a region stay in region server's memory afterward? When is it being
> freed?

Depends, although the REST gateway pessimistically calls
Scan#setCacheBlocks(false), so scans from REST are unlikely to result in
HFile block caching if those blocks are not in cache already. If they
happen to be in cache from another request from another type of client,
then the RegionServers will use the cached blocks and update usage counts
for those blocks for LRU, etc.
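
As a rough illustration of the caching behavior described above, here is a
toy Java model. It is not HBase's actual block cache: the names
BlockCacheSketch, readBlock, and loadFromStoreFile are hypothetical, and it
assumes a plain LRU policy. A read with cacheBlocks=false never inserts a
block, but it still uses a block that is already cached and refreshes that
block's LRU position.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: NOT HBase's block cache, just the policy described above.
final class BlockCacheSketch {
    private final int capacity;
    // accessOrder=true keeps entries ordered least-recently-used first.
    private final LinkedHashMap<String, byte[]> cache;
    int diskReads = 0;  // reads that had to go to the (simulated) store file
    int cacheHits = 0;  // reads served from the cache

    BlockCacheSketch(int capacity) {
        this.capacity = capacity;
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > BlockCacheSketch.this.capacity;
            }
        };
    }

    // cacheBlocks mirrors the flag a client passes via Scan#setCacheBlocks.
    byte[] readBlock(String blockId, boolean cacheBlocks) {
        byte[] cached = cache.get(blockId);  // get() also refreshes LRU order
        if (cached != null) {
            cacheHits++;
            return cached;
        }
        diskReads++;
        byte[] block = loadFromStoreFile(blockId);
        if (cacheBlocks) {
            cache.put(blockId, block);  // only caching clients populate the cache
        }
        return block;
    }

    boolean isCached(String blockId) {
        return cache.containsKey(blockId);  // does not touch LRU order
    }

    // Hypothetical stand-in for reading an HFile block from HDFS.
    private static byte[] loadFromStoreFile(String blockId) {
        return blockId.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        BlockCacheSketch rs = new BlockCacheSketch(2);
        rs.readBlock("b1", false);  // REST-style scan: disk read, not cached
        rs.readBlock("b1", false);  // same again: another disk read
        rs.readBlock("b1", true);   // another client type: disk read, now cached
        rs.readBlock("b1", false);  // REST-style scan now hits the cache
        System.out.println("diskReads=" + rs.diskReads + " cacheHits=" + rs.cacheHits);
        // prints "diskReads=3 cacheHits=1"
    }
}
```

In this model, repeated REST-only scans over a cold block go to the store
file every time, which matches the "unlikely to result in HFile block
caching" point above.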

> When Stargate uses a scan instance to obtain data, does it communicate
> with region server with another connection overhead?

It looks like this: REST client <--> Stargate <--> RegionServers
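
The left-hand leg of that chain can be sketched from the REST client's side.
The sketch below assumes the Stargate scanner endpoints as described in the
HBase reference guide (PUT to /<table>/scanner to create a scanner, GET on
the returned Location to fetch each batch, DELETE to release it). The base
URL, table name, and the RestScanSketch class are hypothetical, the Location
header is assumed to be an absolute URL, and error handling is minimal. Each
batch costs one HTTP round trip to Stargate, on top of whatever Stargate
itself does against the RegionServers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch of a REST client driving a Stargate scanner; names are illustrative.
final class RestScanSketch {

    // XML body for scanner creation; "batch" caps what one GET returns.
    static String scannerSpec(int batch) {
        return "<Scanner batch=\"" + batch + "\"/>";
    }

    static String scannerUrl(String base, String table) {
        return base + "/" + table + "/scanner";
    }

    // Returns the number of batches fetched (one HTTP round trip each).
    static int scanAll(String base, String table, int batch) throws IOException {
        HttpURLConnection create =
            (HttpURLConnection) new URL(scannerUrl(base, table)).openConnection();
        create.setRequestMethod("PUT");
        create.setDoOutput(true);
        create.setRequestProperty("Content-Type", "text/xml");
        try (OutputStream out = create.getOutputStream()) {
            out.write(scannerSpec(batch).getBytes(StandardCharsets.UTF_8));
        }
        if (create.getResponseCode() != 201) {
            throw new IOException("scanner not created: HTTP " + create.getResponseCode());
        }
        // Assumption: the gateway returns an absolute scanner URL here.
        String location = create.getHeaderField("Location");

        int batches = 0;
        try {
            while (true) {
                HttpURLConnection next =
                    (HttpURLConnection) new URL(location).openConnection();
                next.setRequestProperty("Accept", "text/xml");
                if (next.getResponseCode() != 200) {
                    break;  // 204 No Content means the scanner is exhausted
                }
                try (InputStream in = next.getInputStream()) {
                    byte[] buf = new byte[8192];
                    while (in.read(buf) != -1) {
                        // a real client would parse the cells here
                    }
                }
                batches++;
            }
        } finally {
            // Release the scanner's server-side resources.
            HttpURLConnection delete =
                (HttpURLConnection) new URL(location).openConnection();
            delete.setRequestMethod("DELETE");
            delete.getResponseCode();
        }
        return batches;
    }
}
```

For example, RestScanSketch.scanAll("http://localhost:8080", "mytable", 100)
would return the number of batches fetched, assuming a gateway is listening
at that (made-up) address.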

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)