Posted to issues@tajo.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/10/08 10:46:33 UTC

[jira] [Commented] (TAJO-838) Improve query planner to utilize index

    [ https://issues.apache.org/jira/browse/TAJO-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163237#comment-14163237 ] 

ASF GitHub Bot commented on TAJO-838:
-------------------------------------

GitHub user jihoonson opened a pull request:

    https://github.com/apache/tajo/pull/192

    TAJO-838: Improve query planner to utilize index

    Hi guys.
    
    This is ongoing work. Although critical problems remain before it is ready for practical use, I'd like to share the progress on this issue.
    
    I have finally succeeded in using the index for query processing.
    To show the effectiveness of the index, I carried out a simple performance test, as follows.
    
    * Environment: an in-house cluster consisting of one master and 32 workers.
    * Data: TPC-DS store_sales table at scale factor 100 (41 GB).
    * DDL for index creation: create index ss_item_sk_idx on store_sales (ss_item_sk asc null first);
    * Test query: select ss_item_sk from store_sales where ss_item_sk = 1; (selectivity = 0.000045139%)
    * Result
    
    | | Without disk cache | With disk cache |
    |--- | --- | ---|
    | Without index | 23.917 | 19.154 |
    | With index | 4.207 | 3.995 |
    
    Although the selectivity of the query is very low, I think this result shows the potential benefit of an index.
    
    Here are some remaining issues.
    * Selectivity estimation. In the current patch, the index is always used whenever one exists. I'll improve this so that the index is used only when it is beneficial.
    * Support indexes on partitioned tables.
    * Consider the case where the query predicate includes two or more columns.
    * Code refactoring and potential bug fixes
    * Add more tests
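
As a rough illustration of the first remaining issue (using the index only when it is beneficial), the decision could be framed as a comparison of two estimated costs. The sketch below is hypothetical and is not the logic in this patch: the function name `should_use_index`, the per-row cost constants, and the row count in the usage note are all assumptions chosen for the example.

```python
def should_use_index(total_rows: int, selectivity: float,
                     seq_cost_per_row: float = 1.0,
                     random_cost_per_row: float = 50.0) -> bool:
    """Return True when the estimated index-scan cost is below the full-scan cost.

    selectivity is the expected fraction of matching rows (0.0 to 1.0).
    The cost constants are illustrative assumptions, not measured values.
    """
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    # A full scan reads every row sequentially.
    full_scan_cost = total_rows * seq_cost_per_row
    # An index scan fetches only the matching rows, but each fetch is
    # assumed to be a more expensive random read.
    index_scan_cost = total_rows * selectivity * random_cost_per_row
    return index_scan_cost < full_scan_cost
```

With the selectivity reported above (0.000045139%, a fraction of about 4.5e-7), `should_use_index(1_000_000, 4.5139e-7)` chooses the index, while a predicate matching 10% of the rows falls back to a full scan under these assumed constants.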

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jihoonson/tajo-2 TAJO-838

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #192
    
----
commit 7c98709f0fcb06dfb675acae3d6489a6126f55b5
Author: jinossy <ji...@gmail.com>
Date:   2014-08-06T08:43:35Z

    TAJO-995: HiveMetaStoreClient wrapper should retry the connection

commit 415d0867ae4a4543f47360294bead1fc7f41e292
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-08-10T06:07:24Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit 7a7b4fd26f61df89cacdb4fc41faf9c2abe456b2
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-08-11T02:28:48Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit 45f5ed3adba931f4706f26dda1d3c03240ee11d3
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-08-11T05:40:25Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit aa01e83859ef553ac4eb90c1678e3bc6be20c6c9
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-08-18T09:56:24Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit b33a94509c1a007b56785435c8e16640ffde91b7
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-09-04T02:14:19Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit 05c892448113db40daef54d2e06dad463dbae9c8
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-09-11T02:33:54Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit 42a6c4ebeebe36aad6f7dc5f92c83baee398c85e
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-09-11T03:25:05Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit 52a942136c549f197d6d1c3d1a13717e6f14a83f
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-09-24T01:26:36Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit 0e63abc71723c4c22f2c591ae60d892a9707973f
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-09-24T01:48:09Z

    TAJO-1062: Update TSQL documentation

commit e59fe460cc888bfebe968619405edd9b9e57e410
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-09-27T07:50:08Z

    Update tsql documentation.

commit a5402249a2f7df85aa01280ed359f1d1d3489281
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-09-27T07:52:05Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1062

commit 706be644389f1223f5ea19d1418c6b6aa8b9bc96
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-10-01T03:31:51Z

    Update some typos

commit 327a9c4edd5426521b86afac12dbab4640a7164a
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-10-01T05:51:24Z

    Rename back_command.rst

commit e1f2b6b437fdb842166f8cf7a8c8fbf4bce19041
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-10-01T05:54:36Z

    Use "Tajo" instead of "tajo"

commit b6c06138e756823daebf188174e63bead43c0c05
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-10-01T06:25:43Z

    Update some comments.

commit 4e203f03fb533e04577f5c2947823fedd8680b8a
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-10-04T16:11:59Z

    TAJO-1072: CLI gets stuck when wrong host/port is provided. (Jihun Kang via hyunsik)
    
    Closes #169

commit 68b44da57e53f53e88bedc0bb7ac763c97f069a9
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-10-04T16:36:25Z

    TAJO-1065: The \admin -cluster argument doesn't run as expected. (Jongyoung Park via hyunsik)
    
    Closes #173

commit 029054b45c158159325a68ac1491256e3abe71f4
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-10-05T00:56:12Z

    TAJO-1030: Not supported JDBC APIs should return empty results instead of Exception. (Hyoungjun Kim via hyunsik)
    
    Closes #145

commit ecc2b05af60d9540c758839c1f5d691850ac772b
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-10-05T01:04:27Z

    TAJO-668: Add datetime function documentation. (Jongyoung Park via hyunsik)
    
    Closes #160

commit a282fc1059c2489804fe08e02234a7d09dba2a10
Author: Jihoon Son <ji...@apache.org>
Date:   2014-10-05T07:43:40Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-838

commit 28b4cbc036b05a109694e1dcbaeacd802d0c9f71
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-10-05T13:15:27Z

    Rename cli.rst to tsql.rst

commit 03847cf497779d18d323bf94a9e2f0d79dadcb96
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-10-05T13:16:12Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1062

commit c86d7ade7cde7de60a79d44196eb8c401b9f2a68
Author: Jihoon Son <ji...@apache.org>
Date:   2014-10-05T14:40:31Z

    TAJO-838

commit ca187bcf68ce81b984ccb7c2e2b5adc25ebff237
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-10-06T01:43:27Z

    Updated Change Note.

commit 4f987d967aa3a68e2c17cd8472120ec4316e0fc0
Author: Jihoon Son <ji...@apache.org>
Date:   2014-10-06T02:56:55Z

    TAJO-838

commit ca5fb301bff4b38d80a523d5bece9eaf74f64ec3
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-10-06T05:12:42Z

    TAJO-1067: INSERT OVERWRITE INTO should not remove all partitions. (jaehwa)

commit 44e6fe595da28c6c06e5c82741478a8ff3031fa9
Author: Jaehwa Jung <bl...@apache.org>
Date:   2014-10-06T07:28:45Z

    TAJO-1096: Update download source documentation (Mai Hai Thanh via jaehwa)
    
    Closes #182

commit 67541c48aaa577848023e2e7cedab727f33b8a52
Author: Jihoon Son <ji...@apache.org>
Date:   2014-10-06T08:52:37Z

    TAJO-838

commit 3d630f93be0c50f09abf62aa00e69c0be5dabe7e
Author: Jihoon Son <ji...@apache.org>
Date:   2014-10-06T08:52:46Z

    Merge branch 'master' of http://git-wip-us.apache.org/repos/asf/tajo into TAJO-838

----


> Improve query planner to utilize index
> --------------------------------------
>
>                 Key: TAJO-838
>                 URL: https://issues.apache.org/jira/browse/TAJO-838
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: planner/optimizer
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>            Priority: Minor
>
> An index can improve query performance when the query is highly selective, that is, when it matches only a small fraction of the rows.
> Thus, the query planner should decide whether or not to use an index for a given query.
> The selectivity can be estimated using statistics.
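
One common way to estimate selectivity from column statistics is to assume that values are uniformly distributed. The following is a minimal sketch under that assumption. The function names and the statistics they take (number of distinct values, column minimum and maximum) are hypothetical and do not refer to Tajo's catalog API.

```python
def equality_selectivity(num_distinct: int) -> float:
    """Estimated fraction of rows matching `col = constant`.

    Assumes every distinct value occurs equally often.
    """
    if num_distinct <= 0:
        raise ValueError("num_distinct must be positive")
    return 1.0 / num_distinct


def range_selectivity(low: float, high: float,
                      col_min: float, col_max: float) -> float:
    """Estimated fraction of rows with `low <= col <= high`.

    Assumes values are spread evenly between col_min and col_max.
    """
    if col_max <= col_min:
        raise ValueError("col_max must be greater than col_min")
    # Clamp the requested range to the range the column actually covers.
    lo = max(low, col_min)
    hi = min(high, col_max)
    if hi <= lo:
        return 0.0
    return (hi - lo) / (col_max - col_min)
```

For example, a column with 4 distinct values gives an equality estimate of 0.25, and a predicate covering half of the column's value range gives 0.5. Skewed data would need histograms or similar statistics for a better estimate.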



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)