Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/12/01 19:58:10 UTC

[jira] [Commented] (DRILL-4126) Adding HiveMetaStore caching when impersonation is enabled.

    [ https://issues.apache.org/jira/browse/DRILL-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034359#comment-15034359 ] 

ASF GitHub Bot commented on DRILL-4126:
---------------------------------------

GitHub user jinfengni opened a pull request:

    https://github.com/apache/drill/pull/286

    DRILL-4127: Reduce Hive metastore client API calls in HiveSchema

    It also includes the commit for DRILL-4126: Add cache to HiveSchema in order to reduce long planning time or execution time caused by a slow Hive metastore.
    
    Both DRILL-4127 and DRILL-4126 address the long delays caused by a slow Hive metastore.
    
    Passed unit tests, pre-commit regression tests, and additional impersonation tests before rebasing onto the latest master.
    
    Will re-run the above tests. 
    
    @vkorukanti, could you please review the two patches? Thanks.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinfengni/incubator-drill DRILL-4127

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/286.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #286
    
----
commit 19a5a4d1c9c23eedcb94c988bd2229680575a118
Author: Jinfeng Ni <jn...@apache.org>
Date:   2015-11-19T04:18:51Z

    DRILL-4127: Reduce Hive metastore client API call in HiveSchema.
    
    1) Use lazy loading of tableNames in HiveSchema, instead of pre-loading all table names under each HiveSchema.
    2) Do not call get_all_databases for subSchema to check existence if the name comes from getSubSchemaNames() directly.

commit 9570319c227649144d3a14f8d5774fbe4a282bc4
Author: Jinfeng Ni <jn...@apache.org>
Date:   2015-11-30T04:15:07Z

    DRILL-4126: Add cache to HiveSchema in order to reduce long planning time or execution time caused by slow Hive meta store.
    
    1) HiveSchema caching helps when impersonation is enabled.
    2) Use a flat-level cache for tables in DrillHiveMetaStoreClient.

----
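
For illustration, the lazy-loading idea described in the DRILL-4127 commit can be sketched as below. This is a minimal sketch under stated assumptions, not the code in the pull request: the class LazyTableNames and the fetchTables function are hypothetical stand-ins for HiveSchema and the metastore client call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a schema whose table names live in a slow metastore. */
final class LazyTableNames {
    private final String dbName;
    // Hypothetical metastore call: database name -> table names.
    private final Function<String, List<String>> fetchTables;
    // Stays null until the table names are first requested.
    private List<String> tableNames;

    LazyTableNames(String dbName, Function<String, List<String>> fetchTables) {
        this.dbName = dbName;
        this.fetchTables = fetchTables; // no metastore call at construction time
    }

    /** Fetches the table names on first use only; later calls reuse the result. */
    synchronized List<String> getTableNames() {
        if (tableNames == null) {
            tableNames = new ArrayList<>(fetchTables.apply(dbName));
        }
        return tableNames;
    }
}
```

With this shape, creating a schema object costs no metastore round trip; the call is made only if a query actually asks for the table names of that schema.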


> Adding HiveMetaStore caching when impersonation is enabled. 
> ------------------------------------------------------------
>
>                 Key: DRILL-4126
>                 URL: https://issues.apache.org/jira/browse/DRILL-4126
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> Currently, Hive metastore caching is used only when impersonation is disabled, in which case all Hive metastore calls go through NonCloseableHiveClientWithCaching [1]. When impersonation is enabled, caching is not used for Hive metastore access.
> This can significantly increase planning time when the Hive storage plugin is enabled, or execution time when running a query against INFORMATION_SCHEMA. Depending on the number of databases and tables in the Hive storage plugin, the planning time or the INFORMATION_SCHEMA query time can become unacceptable. It gets even worse if the Hive metastore runs on a different node from the drillbit, which makes metastore access slower still.
> We are seeing planning time, or execution time for an INFORMATION_SCHEMA query, of 30 to 60 seconds. Such long planning or execution times prevent Drill from acting "interactively" for such queries.
> We should enable caching when impersonation is used. As long as the authorizer verifies that the user has access to the databases/tables, we should serve the data from the cache. That should reduce the number of API calls to the Hive metastore.
> [1] https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L299
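
The proposal above (check authorization per user, then serve metadata from a shared cache) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Drill's implementation: AuthorizedTableCache, Authorizer, and TableLoader are hypothetical names, table metadata is reduced to a String, and the cache is a flat map keyed by "db.table".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical shared table-metadata cache guarded by a per-user access check. */
final class AuthorizedTableCache {
    /** Hypothetical authorizer: may this user read db.table? */
    interface Authorizer {
        boolean canRead(String user, String db, String table);
    }

    /** Hypothetical slow metastore lookup. */
    interface TableLoader {
        String load(String db, String table);
    }

    // Flat cache: one entry per table, keyed by "db.table".
    // Assumes names do not themselves contain '.', which a real key type would avoid.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Authorizer authorizer;
    private final TableLoader loader;

    AuthorizedTableCache(Authorizer authorizer, TableLoader loader) {
        this.authorizer = authorizer;
        this.loader = loader;
    }

    /** Checks access on every call, but hits the metastore only on a cache miss. */
    String getTable(String user, String db, String table) {
        if (!authorizer.canRead(user, db, table)) {
            throw new SecurityException(user + " may not read " + db + "." + table);
        }
        return cache.computeIfAbsent(db + "." + table, key -> loader.load(db, table));
    }
}
```

The access check runs for every request, so a user never receives cached metadata they are not authorized to see, while the expensive metastore lookup is shared across users. The sketch leaves out cache expiry and invalidation.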



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)