Posted to user@hbase.apache.org by Rupinder Singh <rs...@care.com> on 2013/04/12 07:49:14 UTC

Unexplained ganglia metric on hbase-hive cluster

Hi,

I am trying to tune the performance of my setup, which is hosted on Amazon EMR. There are 2 separate clusters: 1 for HBase and 1 for Hive. The Hive cluster holds no persistent data; it provides only processing power.
The test setup from which I generated the metrics has 4 large nodes for HBase (1 master + 3 core) and 4 large nodes for Hive (1 master + 3 core).
The process being monitored does the following:

1. New data files are received by the Hive cluster.

2. Hive inserts the new data into HBase.

3. The Hive cluster then executes a series of HQL queries on the HBase table to generate analytics.

Size of data: the HBase table has 10 million rows of about 1K each.

I have attached Ganglia snapshots for this process from both Hive and HBase clusters. What is puzzling is:

1. On the HBase Cluster Network graph, the In and Out lines follow each other closely. This is strange: after the initial insert, Hive only selects data from the HBase table, so I would expect a lot of Out traffic and almost no In traffic.

2. The HBase Cluster Network graph shows peaks of 80MB/s Out, but the corresponding peaks on Hive's Cluster Network graph show only 10MB/s In. Why is there such a large difference between the data sent out by HBase and the data received by Hive? Shouldn't they match?
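
To put the two rates in perspective, here is a rough back-of-the-envelope sketch of how long one full pass over the table would take at each observed peak rate. It assumes "about 1K" means 1024 bytes per row, that Ganglia's MB/s means 10^6 bytes per second, and that a query scans the whole table once. The function names are only illustrative; none of this is measured.

```python
# Rough estimate only: row size, unit convention, and the
# "one full table scan" model are assumptions, not measurements.

ROWS = 10_000_000      # from the post: 10 million rows
ROW_BYTES = 1024       # assumption: "about 1K" taken as 1024 bytes
MB = 1_000_000         # assumption: 1 MB/s = 10^6 bytes per second


def table_bytes(rows=ROWS, row_bytes=ROW_BYTES):
    """Approximate raw size of the table in bytes."""
    return rows * row_bytes


def full_scan_seconds(rate_mb_per_s, total_bytes=None):
    """Seconds needed to move the whole table once at a sustained rate."""
    if total_bytes is None:
        total_bytes = table_bytes()
    return total_bytes / (rate_mb_per_s * MB)


if __name__ == "__main__":
    print(f"table size: {table_bytes() / 1e9:.2f} GB")
    # 10 MB/s is the Hive In peak, 80 MB/s is the HBase Out peak.
    for rate in (10, 80):
        print(f"{rate} MB/s: {full_scan_seconds(rate):.0f} s per full scan")
```

Under these assumptions the table is roughly 10 GB, so the two peak rates imply very different durations for the same scan, which is part of why the mismatch puzzles me.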

Any help or pointers are highly appreciated.

I have also uploaded the metric graphs here:
http://s15.postimg.org/qrrma4asb/hbase.png
http://s22.postimg.org/x443guzsh/hive.png

Thanks
Rupinder



