You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/16 06:18:00 UTC
[jira] [Commented] (DRILL-5972) Slow performance for query on INFORMATION_SCHEMA.TABLE

    [ https://issues.apache.org/jira/browse/DRILL-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254816#comment-16254816 ] 

ASF GitHub Bot commented on DRILL-5972:
---------------------------------------

GitHub user ppadma opened a pull request:

    https://github.com/apache/drill/pull/1038

    DRILL-5972: Slow performance for query on INFORMATION_SCHEMA.TABLE

    Please see DRILL-5972.
    Problem is while evaluating "boolean and", we are returning as soon as expression result is not TRUE. We should go through all the expressions and if any of them is FALSE, return FALSE. If any of them is INCONCLUSIVE, return INCONCLUSIVE. 
    Tested the fix with lot of hive tables and verified that it fixes the performance problem.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ppadma/drill DRILL-5972

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1038.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1038
    
----
commit 70f489098acc94d9e1af1aa05c35e2dfe01198c6
Author: Padma Penumarthy <pp...@yahoo.com>
Date:   2017-11-16T04:31:23Z

    DRILL-5972: Slow performance for query on INFORMATION_SCHEMA.TABLE

----


> Slow performance for query on INFORMATION_SCHEMA.TABLE
> ------------------------------------------------------
>
>                 Key: DRILL-5972
>                 URL: https://issues.apache.org/jira/browse/DRILL-5972
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Information Schema
>    Affects Versions: 1.11.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>             Fix For: 1.12.0
>
>
> A query like the following on INFORMATION_SCHEMA takes a long time to execute. 
> select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from INFORMATION_SCHEMA.`TABLES` WHERE TABLE_NAME LIKE '%' AND ( TABLE_SCHEMA = 'hive.default' ) ORDER BY TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME; 
> Reason being we fetch table information for all schemas instead of just 'hive.default' schema.
> If we  change the predicate like this, it executes very fast.
> select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from INFORMATION_SCHEMA.`TABLES` WHERE  ( TABLE_SCHEMA = 'hive.default' ) AND TABLE_NAME LIKE '%'  ORDER BY TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME; 
> The difference is in the order in which we evaluate the expressions in the predicate.
> In the first case,  we first evaluate TABLE_NAME LIKE '%' and decide that it is inconclusive (since we do not know the schema). So, we go get all tables for all the schemas.
> In the second case, we first evaluate  TABLE_SCHEMA = 'hive.default' and decide that we need to fetch only tables for that schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)