Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2019/09/07 14:07:00 UTC

[jira] [Work started] (IMPALA-8923) Don't need synchronized in HBaseTable.getEstimatedRowStats

     [ https://issues.apache.org/jira/browse/IMPALA-8923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on IMPALA-8923 started by Quanlong Huang.
----------------------------------------------
> Don't need synchronized in HBaseTable.getEstimatedRowStats
> ----------------------------------------------------------
>
>                 Key: IMPALA-8923
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8923
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.7.1, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0, Impala 3.2.0, Impala 3.3.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> HBaseTable.getEstimatedRowStats() estimates the number of rows and the row size by sampling the HBase table in the target key range. This requires HBase RPCs, so it can be slow.
> Currently, HBaseTable.getEstimatedRowStats() is marked as synchronized. The purpose was to protect the HTable (old HBase API) object in legacy code (before commit [cf9d248|https://github.com/apache/impala/commit/cf9d2485dd4e6544f6f1f407e2ad0b43eba31874]). However, since commit [cf9d248|https://github.com/apache/impala/commit/cf9d2485dd4e6544f6f1f407e2ad0b43eba31874], we create an org.apache.hadoop.hbase.client.Table object for each task (see the comments and usages of FeHBaseTable.Util.getHBaseTable()). So the "synchronized" marker is no longer needed in HBaseTable.getEstimatedRowStats().
> Keeping the "synchronized" marker is also harmful. Under a high-QPS workload, queries on the same table have to wait to enter this method and can spend a lot of time waiting (if this method is comparatively slow).
> This can be revealed by manually adding a latency (e.g. 100ms) to FeHBaseTable.Util.getEstimatedRowStats() and running concurrent queries on the same HBase table. In my experiment, removing "synchronized" gave a 40% improvement in 95th-percentile query time.
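The sampling-based estimate mentioned at the start of the issue can be illustrated with a simplified sketch. It assumes the total number of bytes stored in the target key range is already known and that a few rows have been sampled; the class and method names are hypothetical, and this is not the actual code of FeHBaseTable.Util.getEstimatedRowStats(). It averages the sampled row sizes and divides the size of the range by that average.

```java
import java.util.List;

/** Hypothetical sketch of sampling-based row estimation (not Impala's code). */
public final class RowEstimateSketch {

  /** Estimated row count and average row size in bytes. */
  public static final class Estimate {
    public final long numRows;
    public final long avgRowSize;

    Estimate(long numRows, long avgRowSize) {
      this.numRows = numRows;
      this.avgRowSize = avgRowSize;
    }
  }

  /**
   * @param totalBytes      assumed total bytes stored in the target key range
   * @param sampledRowSizes serialized sizes of the rows read by the sample
   */
  public static Estimate estimate(long totalBytes, List<Long> sampledRowSizes) {
    if (sampledRowSizes.isEmpty()) {
      throw new IllegalArgumentException("need at least one sampled row");
    }
    long sum = 0;
    for (long size : sampledRowSizes) {
      sum += size;
    }
    // Average sampled row size, clamped to at least 1 byte to avoid dividing by zero.
    long avg = Math.max(1, sum / sampledRowSizes.size());
    // Extrapolate from the sample to the whole key range.
    return new Estimate(totalBytes / avg, avg);
  }

  public static void main(String[] args) {
    Estimate e = estimate(1_000_000L, List.of(90L, 110L, 100L, 100L));
    // prints "10000 rows, 100 bytes/row"
    System.out.println(e.numRows + " rows, " + e.avgRowSize + " bytes/row");
  }
}
```

Each sample in the real method costs HBase RPCs, which is why the estimate can be slow enough for lock contention to matter.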
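The case for dropping the lock rests on each task owning its own Table object. A minimal sketch of that pattern follows, using a hypothetical non-thread-safe ScanHandle in place of the real org.apache.hadoop.hbase.client.Table: because every call opens and closes its own handle, concurrent callers share no mutable state, so the method needs no synchronized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: a per-call handle instead of a shared, locked one. */
public final class PerCallHandleSketch {

  /** Hypothetical stand-in for a non-thread-safe client object such as a Table. */
  static final class ScanHandle implements AutoCloseable {
    private long rowsSeen = 0; // mutable state: unsafe if shared across threads

    long scan(int rows) {
      for (int i = 0; i < rows; i++) {
        rowsSeen++; // unsynchronized read-modify-write
      }
      return rowsSeen;
    }

    @Override
    public void close() {
      // A real handle would release its resources here.
    }
  }

  /** No `synchronized`: each call owns a fresh ScanHandle, so nothing is shared. */
  public static long countRows(int rows) {
    try (ScanHandle handle = new ScanHandle()) {
      return handle.scan(rows);
    }
  }

  /** Calls countRows from `threads` threads at once and collects the results. */
  public static List<Long> runConcurrently(int threads, int rows) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      Callable<Long> task = () -> countRows(rows);
      List<Future<Long>> futures = new ArrayList<>();
      for (int i = 0; i < threads; i++) {
        futures.add(pool.submit(task));
      }
      List<Long> results = new ArrayList<>();
      for (Future<Long> f : futures) {
        results.add(f.get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Every entry is 100000 because no handle is shared between threads.
    System.out.println(runConcurrently(8, 100_000));
  }
}
```

If a single ScanHandle were shared instead, the unsynchronized increments could lose updates, which is the situation the old synchronized marker guarded against.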
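The experiment in the last paragraph can be approximated with a self-contained sketch. A hypothetical FakeTable stands in for an HBaseTable instance, and Thread.sleep(100) stands in for the sampling RPCs. Several concurrent "queries" call either the synchronized or the unsynchronized method on the same table instance, and the wall-clock time is measured. These names and numbers are assumptions for illustration, not Impala's test setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: lock contention on one table's monitor under concurrent calls. */
public final class SyncLatencySketch {
  static final long DELAY_MS = 100; // assumed artificial latency, as in the issue's example

  /** Hypothetical stand-in for an HBaseTable instance shared by all queries. */
  static final class FakeTable {
    // Callers on the same FakeTable are serialized by its monitor.
    synchronized void estimateSynchronized() throws InterruptedException {
      Thread.sleep(DELAY_MS); // simulated sampling RPCs
    }

    // Same work without the lock: callers can sleep in parallel.
    void estimateUnsynchronized() throws InterruptedException {
      Thread.sleep(DELAY_MS);
    }
  }

  /** Runs `queries` concurrent calls on one table; returns wall-clock ms. */
  public static long run(int queries, boolean useSync) throws Exception {
    FakeTable table = new FakeTable();
    ExecutorService pool = Executors.newFixedThreadPool(queries);
    try {
      Callable<Void> query = () -> {
        if (useSync) {
          table.estimateSynchronized();
        } else {
          table.estimateUnsynchronized();
        }
        return null;
      };
      List<Future<Void>> futures = new ArrayList<>();
      long start = System.nanoTime();
      for (int i = 0; i < queries; i++) {
        futures.add(pool.submit(query));
      }
      for (Future<Void> f : futures) {
        f.get(); // wait for every query to finish
      }
      return (System.nanoTime() - start) / 1_000_000;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("synchronized:   " + run(8, true) + " ms");
    System.out.println("unsynchronized: " + run(8, false) + " ms");
  }
}
```

With 8 callers, the synchronized variant should take roughly 8 x 100 ms, because the instance monitor lets only one caller sleep at a time, while the unsynchronized variant should finish in about 100 ms. The sketch only shows the serialization effect; it does not reproduce the 40% p95 figure, which came from real queries.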



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
