Posted to issues@kylin.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/12/26 14:45:00 UTC

[jira] [Commented] (KYLIN-4315) use metadata numRows in beeline client for quick row counting

    [ https://issues.apache.org/jira/browse/KYLIN-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003667#comment-17003667 ] 

ASF GitHub Bot commented on KYLIN-4315:
---------------------------------------

xiacongling commented on pull request #1024: KYLIN-4315 use metadata numRows in beeline client for quick row counting
URL: https://github.com/apache/kylin/pull/1024
 
 
   see https://issues.apache.org/jira/browse/KYLIN-4315
 


> use metadata numRows in beeline client for quick row counting
> -------------------------------------------------------------
>
>                 Key: KYLIN-4315
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4315
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Congling Xia
>            Assignee: Congling Xia
>            Priority: Major
>
> Hi, I found that the `getHiveTableRows` method in `BeelineHiveClient` counts table rows by running "select count(*) from <tb_name>". The method is invoked in the flat intermediate table redistribution step of cube building.
> The row count can instead be read from the metastore, which costs much less time than scanning every row of the Hive table. Since intermediate tables are created and populated by Kylin, the statistics are calculated and stored in the metastore automatically when `[hive.stats.autogather|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.stats.autogather]` is enabled (which is the default setting for Hive).
> See the Hive wiki for more detail about the `numRows` statistic: [https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables%E2%80%93ANALYZE]
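
The idea in the description above can be sketched roughly as follows. This is not the code from the pull request; it is a minimal illustration that assumes the rows of a `DESCRIBE FORMATTED <tb_name>` result have already been fetched (for example over JDBC) as three-column string arrays, and that the statistic appears as a row whose second column is `numRows`. The class name `NumRowsFromDescribe` and the method `parseNumRows` are hypothetical. A negative or missing value is treated as "unknown", in which case a caller would presumably fall back to "select count(*)".

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper, not part of Kylin: extracts numRows from
// DESCRIBE FORMATTED output that was already read into memory.
final class NumRowsFromDescribe {

    // Each row is assumed to be {col_name, data_type, comment}.
    // In the "Table Parameters" section Hive is assumed to emit the
    // property name in the second column and its value in the third,
    // both padded with spaces.
    static OptionalLong parseNumRows(List<String[]> rows) {
        for (String[] row : rows) {
            if (row == null || row.length < 3 || row[1] == null || row[2] == null) {
                continue;
            }
            if ("numRows".equals(row[1].trim())) {
                try {
                    long n = Long.parseLong(row[2].trim());
                    // Assumption: a negative value means the statistic is unknown.
                    return n >= 0 ? OptionalLong.of(n) : OptionalLong.empty();
                } catch (NumberFormatException e) {
                    return OptionalLong.empty();
                }
            }
        }
        // No numRows row found: statistics were probably never gathered.
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"Table Parameters:", null, null},
            new String[] {"", "numRows             ", "12345               "},
            new String[] {"", "totalSize           ", "67890               "});
        OptionalLong numRows = parseNumRows(rows);
        if (numRows.isPresent()) {
            System.out.println("numRows from metastore: " + numRows.getAsLong());
        } else {
            System.out.println("numRows unavailable, fall back to select count(*)");
        }
    }
}
```

Returning an `OptionalLong` rather than a sentinel keeps the "statistic missing" case explicit, so the existing count query can remain as a fallback when `hive.stats.autogather` is disabled.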


