
Posted to issues@hbase.apache.org by "Yu Li (JIRA)" <ji...@apache.org> on 2017/06/02 07:21:04 UTC

[jira] [Commented] (HBASE-15160) Put back HFile's HDFS op latency sampling code and add metrics for monitoring

    [ https://issues.apache.org/jira/browse/HBASE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034265#comment-16034265 ] 

Yu Li commented on HBASE-15160:
-------------------------------

Here is the performance testing result:
|| Case || Round || Throughput (ops/s) || Average latency (us) ||
| w/o patch | 1 | 120820.29 | 26309.65 |
| | 2 | 122079.26 | 26019.93 |
| w/ patch v6 | 1 | 85544.53 | 37222.54 |
| | 2 | 87071.61 | 36563.49 |
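
To put the table in relative terms, the per-case means of the two rounds can be compared directly. The sketch below is only an illustration using the four rows above; the class and method names are made up for this example and are not part of the patch. With these numbers the throughput comes out roughly 29% lower and the average latency roughly 41% higher with patch v6.

```java
// Hypothetical helper, not part of HBASE-15160: summarizes the table above.
final class RegressionSummary {
    // Arithmetic mean of the given samples.
    static double mean(double... xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Relative change from baseline to patched; -0.25 means 25% lower.
    static double relativeChange(double baseline, double patched) {
        return (patched - baseline) / baseline;
    }

    public static void main(String[] args) {
        double tputBase = mean(120820.29, 122079.26);   // w/o patch, ops/s
        double tputPatch = mean(85544.53, 87071.61);    // w/ patch v6, ops/s
        double latBase = mean(26309.65, 26019.93);      // w/o patch, us
        double latPatch = mean(37222.54, 36563.49);     // w/ patch v6, us

        System.out.printf("throughput change: %.1f%%%n",
                100 * relativeChange(tputBase, tputPatch));
        System.out.printf("latency change: %.1f%%%n",
                100 * relativeChange(latBase, latPatch));
    }
}
```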


Test details:
{noformat}
# HBase Settings
-Xmx49152m -Xms49152m -Xmn6144m -XX:SurvivorRatio=2
hfile.block.cache.size => 0.16
hbase.regionserver.handler.count => 192
hbase.rpc.server.impl => org.apache.hadoop.hbase.ipc.NettyRpcServer

# YCSB settings
recordcount=11000000 (11M)
fieldcount=1
fieldlength=1024
table schema: {NAME => 'cf', DATA_BLOCK_ENCODING => 'DIFF', VERSIONS=> '1', COMPRESSION => 'SNAPPY', IN_MEMORY => 'false', BLOCKCACHE => 'true'},{SPLITS => (1..9).map {|i| "user#{1000+i*(9999-1000)/9}"}, METADATA => {'hbase.hstore.block.storage.policy' => 'ALL_SSD'}}

LRUCache hit ratio: 93%
{noformat}

After checking the patch in more detail, I suspect the regression comes from the {{System.nanoTime}} call. I will change it to {{System.currentTimeMillis}} and re-test.
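
For reference, the kind of change being considered can be sketched as a sampler whose clock is pluggable, so the same timing code can run against either {{System.nanoTime}} or {{System.currentTimeMillis}}. This is a minimal sketch under assumptions: the class {{FsLatencySampler}} and its methods are hypothetical and do not reflect how patch v6 is actually structured. Note that {{System.currentTimeMillis}} is wall-clock time with millisecond resolution and can jump if the system clock is adjusted, so switching trades precision for whatever per-call cost difference the platform has.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

// Hypothetical sampler, not the code in HBASE-15160: times an operation
// with whichever clock is supplied and accumulates the elapsed ticks.
final class FsLatencySampler {
    private final LongSupplier clock;
    private final LongAdder totalElapsed = new LongAdder();
    private final LongAdder samples = new LongAdder();

    FsLatencySampler(LongSupplier clock) {
        this.clock = clock;
    }

    // Runs op and records its elapsed time in the clock's own unit.
    void time(Runnable op) {
        long start = clock.getAsLong();
        try {
            op.run();
        } finally {
            totalElapsed.add(clock.getAsLong() - start);
            samples.increment();
        }
    }

    long totalElapsed() {
        return totalElapsed.sum();
    }

    long samples() {
        return samples.sum();
    }

    public static void main(String[] args) {
        FsLatencySampler nanos = new FsLatencySampler(System::nanoTime);
        FsLatencySampler millis = new FsLatencySampler(System::currentTimeMillis);

        // Stand-in for an HDFS read; a real test would wrap the actual call.
        Runnable fakeRead = () -> Math.sqrt(42.0);
        for (int i = 0; i < 1000; i++) {
            nanos.time(fakeRead);
            millis.time(fakeRead);
        }
        System.out.println("nanoTime total (ns): " + nanos.totalElapsed());
        System.out.println("currentTimeMillis total (ms): " + millis.totalElapsed());
    }
}
```

Passing the clock in as a {{LongSupplier}} also makes the sampler easy to unit-test with a fake clock.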

> Put back HFile's HDFS op latency sampling code and add metrics for monitoring
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15160
>                 URL: https://issues.apache.org/jira/browse/HBASE-15160
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0, 1.1.2
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Critical
>         Attachments: HBASE-15160.patch, HBASE-15160_v2.patch, HBASE-15160_v3.patch, hbase-15160_v4.patch, hbase-15160_v5.patch, hbase-15160_v6.patch
>
>
> In HBASE-11586, all HDFS op latency sampling code, including fsReadLatency, fsPreadLatency and fsWriteLatency, was removed. There was some discussion about putting it back in a new JIRA, but that never happened. In our experience, these metrics are useful for judging whether the problem lies in HDFS when slow requests occur, so we propose putting them back in this JIRA and adding the metrics for monitoring as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)