Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2019/09/06 13:40:00 UTC

[jira] [Created] (IMPALA-8923) Don't need synchronized in HBaseTable.getEstimatedRowStats

Quanlong Huang created IMPALA-8923:
--------------------------------------

             Summary: Don't need synchronized in HBaseTable.getEstimatedRowStats
                 Key: IMPALA-8923
                 URL: https://issues.apache.org/jira/browse/IMPALA-8923
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.2.0, Impala 3.1.0, Impala 2.12.0, Impala 3.0, Impala 2.11.0, Impala 2.10.0, Impala 2.9.0, Impala 2.7.1, Impala 2.8.0, Impala 2.7.0, Impala 3.3.0
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


HBaseTable.getEstimatedRowStats() estimates the number of rows and the row size by sampling the HBase table within the target key range. It issues HBase RPCs, so it can be slow.

Currently, HBaseTable.getEstimatedRowStats() is marked as synchronized. Its purpose was to protect the HTable object (old HBase API) in legacy code (before commit [cf9d248|https://github.com/apache/impala/commit/cf9d2485dd4e6544f6f1f407e2ad0b43eba31874]). However, since commit [cf9d248|https://github.com/apache/impala/commit/cf9d2485dd4e6544f6f1f407e2ad0b43eba31874], we create an org.apache.hadoop.hbase.client.Table object for each task (see the comments and usages of FeHBaseTable.Util.getHBaseTable()). So the "synchronized" marker is no longer needed in HBaseTable.getEstimatedRowStats().
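To illustrate why a per-task handle removes the need for a lock, here is a minimal, self-contained Java sketch. It does not use HBase: ScanHandle is a hypothetical stand-in for a per-task org.apache.hadoop.hbase.client.Table, and RowStatsEstimator.estimateRowStats() is a hypothetical simplification of the estimator, not Impala's actual code. The point it shows is only the ownership pattern: each call opens and closes its own mutable handle, so concurrent callers share nothing and the method can run without "synchronized".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a per-task org.apache.hadoop.hbase.client.Table:
// it keeps mutable scan state and is NOT safe to share across threads.
final class ScanHandle implements AutoCloseable {
  private long rowsSeen = 0;
  private long bytesSeen = 0;

  void sampleRow(int rowBytes) {
    rowsSeen++;
    bytesSeen += rowBytes;
  }

  long rowsSeen() { return rowsSeen; }
  long bytesSeen() { return bytesSeen; }

  @Override
  public void close() { /* a real handle would release its resources here */ }
}

// Hypothetical estimator: every call opens its own handle, so no shared
// mutable state exists and the method needs no "synchronized".
final class RowStatsEstimator {
  /** Returns {rowCount, averageRowBytes} for a simulated sample of rows. */
  static long[] estimateRowStats(int sampledRows, int rowBytes) {
    try (ScanHandle handle = new ScanHandle()) {  // per task, never shared
      for (int i = 0; i < sampledRows; i++) {
        handle.sampleRow(rowBytes);
      }
      long rows = handle.rowsSeen();
      return new long[] {rows, rows == 0 ? 0 : handle.bytesSeen() / rows};
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<long[]>> results = new ArrayList<>();
    for (int q = 1; q <= 4; q++) {
      final int rows = q * 100;
      // Concurrent "queries" on the same table do not block each other.
      results.add(pool.submit(() -> estimateRowStats(rows, 64)));
    }
    for (Future<long[]> f : results) {
      long[] stats = f.get();
      System.out.println(stats[0] + " rows, " + stats[1] + " bytes/row");
    }
    pool.shutdown();
  }
}
```

The trade-off in this pattern is a small per-call setup cost for the handle in exchange for removing the per-table lock entirely.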

Keeping the "synchronized" marker is also harmful. Under a high-QPS workload, concurrent queries on the same table must enter this method one at a time, and they can spend a lot of time waiting when the method is comparatively slow.

This can be revealed by manually adding latency (e.g. 100ms) in FeHBaseTable.Util.getEstimatedRowStats() and running concurrent queries on the same HBase table. In my experiment, removing "synchronized" improved the 95th percentile query time by 40%.
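The contention mechanism behind this experiment can be modeled in plain Java. The sketch below is an assumption-laden toy, not the actual benchmark: SlowEstimator and ContentionDemo are hypothetical names, Thread.sleep() stands in for the injected 100ms latency, and one SlowEstimator instance plays the role of one table. It releases several threads at once and measures wall-clock time with and without "synchronized". It demonstrates why waiting time grows with concurrency; it does not reproduce the 40% p95 figure above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the experiment: each call sleeps to mimic the
// injected sampling latency. The two methods differ only in "synchronized".
final class SlowEstimator {
  private final long latencyMs;

  SlowEstimator(long latencyMs) { this.latencyMs = latencyMs; }

  // Old behaviour: callers on the same instance (same table) enter one at a time.
  synchronized void estimateSynchronized() throws InterruptedException {
    Thread.sleep(latencyMs);
  }

  // Proposed behaviour: callers run concurrently.
  void estimateUnsynchronized() throws InterruptedException {
    Thread.sleep(latencyMs);
  }
}

final class ContentionDemo {
  interface Call { void run() throws InterruptedException; }

  /** Starts `queries` threads together and returns the wall-clock time in ms. */
  static long timeConcurrent(int queries, Call call) {
    CountDownLatch start = new CountDownLatch(1);
    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < queries; i++) {
      Thread t = new Thread(() -> {
        try {
          start.await();  // release all "queries" at the same moment
          call.run();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      threads.add(t);
      t.start();
    }
    long begin = System.nanoTime();
    start.countDown();
    try {
      for (Thread t : threads) t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return (System.nanoTime() - begin) / 1_000_000;
  }

  /** Returns {synchronizedMs, unsynchronizedMs} for the same simulated table. */
  static long[] compare(int queries, long latencyMs) {
    SlowEstimator table = new SlowEstimator(latencyMs);
    return new long[] {
        timeConcurrent(queries, table::estimateSynchronized),
        timeConcurrent(queries, table::estimateUnsynchronized)};
  }

  public static void main(String[] args) {
    long[] ms = compare(8, 100);
    System.out.println("synchronized:   ~" + ms[0] + " ms");  // roughly 8 x 100 ms
    System.out.println("unsynchronized: ~" + ms[1] + " ms");  // roughly 100 ms
  }
}
```

With N concurrent callers and latency L, the synchronized variant takes at least about N x L of wall time because the monitor serializes callers, while the unsynchronized variant takes roughly L plus scheduling overhead.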



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
