Posted to dev@hbase.apache.org by Joarder KAMAL <jo...@gmail.com> on 2013/06/04 08:09:03 UTC

How to collect the real-time transaction request logs from HBase Master/Region Servers?

Dear All,

I am a newbie in HBase/Hadoop and have recently set up a small-scale cluster
in a research cloud:
------------------------------------------
1 Master Server (also Hadoop NameNode)
3 Region Servers (also Hadoop DataNodes)
1 Ganglia Monitoring Server
1 YCSB Workload Generation Server
------------------------------------------
HBase Version: 0.94.7, r1471806
Hadoop Version: 1.0.4, r1393290
Ganglia Version: gmond/gmetad - 3.6.0, gweb - 3.5.8
YCSB Version: 0.1.4
------------------------------------------

I have only one table in HBase, 'usertable', with a single column family
'cf1' holding 1,000,000 key-value records. The row keys are in
monotonically increasing order, and currently I have 6 regions distributed
across the 3 region servers, each holding 2 regions.

*Objective:* create region hotspots for some research experiments

*Observation:*
After running a workload consisting of a total of 10,000,000 operations (50%
read, 50% write), I observed the statistics below in the Web UI of the
master server. They suggest potential hotspots in the 3rd region (not sure
why!) and the 6th region (possibly because it was receiving a large number
of write requests).

Table Regions

1. Name:          usertable,,1369584948241.3061b90ff519c1bce5b3d867690a2b4a.
   Region Server: hdb1-02:60030
   Start Key:     (empty)
   End Key:       user2035146605813492656
   Requests:      127946

2. Name:          usertable,user2035146605813492656,1369584948241.00f8a51bab6d98ebd7c4db582579c3e7.
   Region Server: hdb1-03:60030
   Start Key:     user2035146605813492656
   End Key:       user30679275375621809
   Requests:      126700

3. Name:          usertable,user30679275375621809,1369584813037.d704a50802ec39982884e394d4ef05b7.
   Region Server: hdb1-04:60030
   Start Key:     user30679275375621809
   End Key:       user5136356049533495298
   Requests:      *284828*

4. Name:          usertable,user5136356049533495298,1369584928780.999b987d646462e21b8916a737619b39.
   Region Server: hdb1-02:60030
   Start Key:     user5136356049533495298
   End Key:       user617761656465008158
   Requests:      133108

5. Name:          usertable,user617761656465008158,1369584928780.9cfe288f48f987de7f93b800dcd4c964.
   Region Server: hdb1-04:60030
   Start Key:     user617761656465008158
   End Key:       user7218407885253116621
   Requests:      119008

6. Name:          usertable,user7218407885253116621,1369584832152.e3a9c4d35c91f06c18ed346886ff3306.
   Region Server: hdb1-03:60030
   Start Key:     user7218407885253116621
   End Key:       (empty)
   Requests:      *363234*
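
To make the "hotspot" reading of these numbers concrete, here is a minimal Python sketch that computes each region's share of the total requests and flags regions well above the average. The request counts and server names are copied from the listing above. The function names and the 1.25x-of-mean threshold are my own assumptions for illustration, not anything HBase defines.

```python
from collections import defaultdict

# (region number, region server, request count) from the Table Regions listing.
REGIONS = [
    (1, "hdb1-02", 127946),
    (2, "hdb1-03", 126700),
    (3, "hdb1-04", 284828),
    (4, "hdb1-02", 133108),
    (5, "hdb1-04", 119008),
    (6, "hdb1-03", 363234),
]


def hot_regions(regions, factor=1.25):
    """Return region numbers whose count exceeds factor * mean (arbitrary threshold)."""
    mean = sum(count for _, _, count in regions) / len(regions)
    return [num for num, _, count in regions if count > factor * mean]


def requests_per_server(regions):
    """Sum the request counts of all regions hosted on each region server."""
    totals = defaultdict(int)
    for _, server, count in regions:
        totals[server] += count
    return dict(totals)


if __name__ == "__main__":
    total = sum(count for _, _, count in REGIONS)
    for num, server, count in REGIONS:
        print(f"region {num} on {server}: {count} ({count / total:.1%})")
    print("hot regions:", hot_regions(REGIONS))
    print("per server:", requests_per_server(REGIONS))
```

With this threshold only the 3rd and 6th regions are flagged, which matches what I see by eye in the Web UI.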

*Questions:*

   1. Can the HBase developer community guide me on how to collect the *raw
   logs* (directly from the master/region servers) behind the above table,
   which I retrieved from the Master server?
   2. How does the master server get these logs from the region servers? As
   far as I understand the architecture, the client communicates directly
   with the region servers to read/write data, bypassing the master server
   (except the first time, or if the region server is not responding).
   3. How frequently does the master collect these logs? Is it real-time
   (within a 1-second interval)?
   4. Which HBase metrics are most helpful for spotting region hotspots
   in Ganglia?
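
For question 3, what I ultimately need is a request rate per region rather than a running total. Assuming the Requests column is a cumulative counter (that is my assumption, I have not confirmed it), two snapshots taken some seconds apart could be turned into rates roughly as in the Python sketch below. The function name and the snapshot values in the usage note are hypothetical, and how the snapshots are actually obtained is exactly what I am asking about.

```python
def request_rates(prev, curr, interval_seconds):
    """Per-region requests/second between two cumulative-count snapshots.

    prev and curr map a region name to its cumulative request count.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for region, count in curr.items():
        delta = count - prev.get(region, 0)
        if delta < 0:
            # Counter went backwards, e.g. after the region was reopened:
            # fall back to counting everything since the reset.
            delta = count
        rates[region] = delta / interval_seconds
    return rates
```

For example, if one region went from 284828 to 284928 requests over 10 seconds, its rate would be 10 requests/second.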


I want to know which transaction request (read/write) goes to which region
server, from raw log dumps in a form like:

No:12345 ---- Type:Write ---- Query ---- Region06
and so on ...
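
To show what I would do with such a dump, here is a small Python sketch that parses lines in the form above and counts requests per region and type. The line format is only my wished-for format, not something HBase emits as far as I know, and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the wished-for format: No:<n> ---- Type:<type> ---- <query> ---- Region<nn>
LINE_RE = re.compile(
    r"^No:(?P<no>\d+) ---- Type:(?P<type>\w+) ---- (?P<query>.*?) ---- (?P<region>Region\d+)$"
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "no": int(match["no"]),
        "type": match["type"],
        "query": match["query"],
        "region": match["region"],
    }


def count_by_region_and_type(lines):
    """Count parsed requests per (region, type) pair, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[(record["region"], record["type"])] += 1
    return counts
```

Feeding a live stream of such lines into count_by_region_and_type would give the per-region read/write breakdown that the Web UI's single Requests number does not show.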


Many thanks again...


Regards,
Joarder Kamal

Fwd: How to collect the real-time transaction request logs from HBase Master/Region Servers?

Posted by Joarder KAMAL <jo...@gmail.com>.
Many apologies for forwarding this email again.

Could you let me know how I can pull/export the real-time raw logs (the
number of requests, and their details, for a particular region) that appear
in the HBase Web UI, as shown in my original message above? I looked at
pp. 277-283 of Lars George's book and other sources but didn't get a clue :(

Any ideas?

I want to perform real-time data stream mining on those logs.


Regards,
Joarder Kamal
