Posted to issues@tajo.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/10/08 10:46:33 UTC
[jira] [Commented] (TAJO-838) Improve query planner to utilize index
[ https://issues.apache.org/jira/browse/TAJO-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163237#comment-14163237 ]
ASF GitHub Bot commented on TAJO-838:
-------------------------------------
GitHub user jihoonson opened a pull request:
https://github.com/apache/tajo/pull/192
TAJO-838: Improve query planner to utilize index
Hi guys.
This is ongoing work. Critical problems still remain before it is practical to use, but I'd like to share the progress on this issue.
I have finally succeeded in using the index for query processing.
To show the effectiveness of the index, I ran a simple performance test as follows.
* Environment: an in-house cluster of one master and 32 workers.
* Data: TPC-DS store_sales table at scale factor 100 (41 GB).
* DDL for index creation: create index ss_item_sk_idx on store_sales (ss_item_sk asc null first);
* Test query: select ss_item_sk from store_sales where ss_item_sk = 1; (selectivity = 0.000045139%)
* Result
| | Without disk cache | With disk cache |
|--- | --- | ---|
| Without index | 23.917 | 19.154 |
| With index | 4.207 | 3.995 |
Although this query is an extreme case that matches only a tiny fraction of the rows, I think the result shows the potential benefit of the index.
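For reference, the relative improvement can be read off the table by dividing the time without the index by the time with it. The following is a minimal sketch of that arithmetic, assuming both rows of the table use the same time unit (the unit is not stated in this post). The `speedup` helper and the dictionary keys are names chosen for illustration.

```python
# Timings copied from the result table; the unit is not stated in the post.
timings = {
    "without disk cache": {"without_index": 23.917, "with_index": 4.207},
    "with disk cache": {"without_index": 19.154, "with_index": 3.995},
}


def speedup(without_index: float, with_index: float) -> float:
    """Return how many times faster the indexed run is (a unit-free ratio)."""
    return without_index / with_index


for setting, t in timings.items():
    ratio = speedup(t["without_index"], t["with_index"])
    print(f"{setting}: {ratio:.2f}x faster with the index")
```

With these numbers the index gives roughly a 5.7x improvement without the disk cache and roughly a 4.8x improvement with it.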
Here are some remaining issues.
* Selectivity estimation. In the current patch, the index is always used whenever one exists. I'll improve this so that the index is used only when it is beneficial.
* Index support for partitioned tables.
* Consider the case when the query predicate includes two or more columns.
* Code refactoring and potential bug fixes
* Add more tests
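The first remaining issue, using the index only when it is beneficial, could look roughly like the sketch below. This is not the code in this patch: the function names, the uniform-distribution estimate, and the 1% threshold are all assumptions made for illustration, and selectivity is taken here to mean the fraction of rows that match the predicate.

```python
def estimate_equality_selectivity(distinct_values: int) -> float:
    """Estimate the fraction of rows matched by `column = constant`.

    Assumes values are uniformly distributed, so each distinct value
    accounts for 1 / distinct_values of the rows.
    """
    if distinct_values <= 0:
        raise ValueError("distinct_values must be positive")
    return 1.0 / distinct_values


def should_use_index(selectivity: float, threshold: float = 0.01) -> bool:
    """Choose an index scan only when few enough rows are expected to match.

    `threshold` is a hypothetical tuning knob, not a Tajo setting.
    """
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be a fraction between 0 and 1")
    return selectivity <= threshold


# The test query in this post matches 0.000045139% of the rows.
test_query_selectivity = 0.000045139 / 100
print(should_use_index(test_query_selectivity))  # index scan is chosen
print(should_use_index(0.5))                     # full scan is chosen
```

A real planner would more likely compare estimated costs of the two scan types than apply a fixed cutoff, but the fixed cutoff keeps the idea visible.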
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jihoonson/tajo-2 TAJO-838
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tajo/pull/192.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #192
----
commit 7c98709f0fcb06dfb675acae3d6489a6126f55b5
Author: jinossy <ji...@gmail.com>
Date: 2014-08-06T08:43:35Z
TAJO-995: HiveMetaStoreClient wrapper should retry the connection
commit 415d0867ae4a4543f47360294bead1fc7f41e292
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-08-10T06:07:24Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
commit 7a7b4fd26f61df89cacdb4fc41faf9c2abe456b2
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-08-11T02:28:48Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
commit 45f5ed3adba931f4706f26dda1d3c03240ee11d3
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-08-11T05:40:25Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
commit aa01e83859ef553ac4eb90c1678e3bc6be20c6c9
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-08-18T09:56:24Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
commit b33a94509c1a007b56785435c8e16640ffde91b7
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-09-04T02:14:19Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
commit 05c892448113db40daef54d2e06dad463dbae9c8
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-09-11T02:33:54Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
commit 42a6c4ebeebe36aad6f7dc5f92c83baee398c85e
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-09-11T03:25:05Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
commit 52a942136c549f197d6d1c3d1a13717e6f14a83f
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-09-24T01:26:36Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
commit 0e63abc71723c4c22f2c591ae60d892a9707973f
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-09-24T01:48:09Z
TAJO-1062: Update TSQL documentation
commit e59fe460cc888bfebe968619405edd9b9e57e410
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-09-27T07:50:08Z
Update tsql documentation.
commit a5402249a2f7df85aa01280ed359f1d1d3489281
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-09-27T07:52:05Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1062
commit 706be644389f1223f5ea19d1418c6b6aa8b9bc96
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-10-01T03:31:51Z
Update some typos
commit 327a9c4edd5426521b86afac12dbab4640a7164a
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-10-01T05:51:24Z
Rename back_command.rst
commit e1f2b6b437fdb842166f8cf7a8c8fbf4bce19041
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-10-01T05:54:36Z
Use "Tajo" instead of "tajo"
commit b6c06138e756823daebf188174e63bead43c0c05
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-10-01T06:25:43Z
Update some comments.
commit 4e203f03fb533e04577f5c2947823fedd8680b8a
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-10-04T16:11:59Z
TAJO-1072: CLI gets stuck when wrong host/port is provided. (Jihun Kang via hyunsik)
Closes #169
commit 68b44da57e53f53e88bedc0bb7ac763c97f069a9
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-10-04T16:36:25Z
TAJO-1065: The \admin -cluster argument doesn't run as expected. (Jongyoung Park via hyunsik)
Closes #173
commit 029054b45c158159325a68ac1491256e3abe71f4
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-10-05T00:56:12Z
TAJO-1030: Not supported JDBC APIs should return empty results instead of Exception. (Hyoungjun Kim via hyunsik)
Closes #145
commit ecc2b05af60d9540c758839c1f5d691850ac772b
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-10-05T01:04:27Z
TAJO-668: Add datetime function documentation. (Jongyoung Park via hyunsik)
Closes #160
commit a282fc1059c2489804fe08e02234a7d09dba2a10
Author: Jihoon Son <ji...@apache.org>
Date: 2014-10-05T07:43:40Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-838
commit 28b4cbc036b05a109694e1dcbaeacd802d0c9f71
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-10-05T13:15:27Z
Rename cli.rst to tsql.rst
commit 03847cf497779d18d323bf94a9e2f0d79dadcb96
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-10-05T13:16:12Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1062
commit c86d7ade7cde7de60a79d44196eb8c401b9f2a68
Author: Jihoon Son <ji...@apache.org>
Date: 2014-10-05T14:40:31Z
TAJO-838
commit ca187bcf68ce81b984ccb7c2e2b5adc25ebff237
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-10-06T01:43:27Z
Updated Change Note.
commit 4f987d967aa3a68e2c17cd8472120ec4316e0fc0
Author: Jihoon Son <ji...@apache.org>
Date: 2014-10-06T02:56:55Z
TAJO-838
commit ca5fb301bff4b38d80a523d5bece9eaf74f64ec3
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-10-06T05:12:42Z
TAJO-1067: INSERT OVERWRITE INTO should not remove all partitions. (jaehwa)
commit 44e6fe595da28c6c06e5c82741478a8ff3031fa9
Author: Jaehwa Jung <bl...@apache.org>
Date: 2014-10-06T07:28:45Z
TAJO-1096: Update download source documentation (Mai Hai Thanh via jaehwa)
Closes #182
commit 67541c48aaa577848023e2e7cedab727f33b8a52
Author: Jihoon Son <ji...@apache.org>
Date: 2014-10-06T08:52:37Z
TAJO-838
commit 3d630f93be0c50f09abf62aa00e69c0be5dabe7e
Author: Jihoon Son <ji...@apache.org>
Date: 2014-10-06T08:52:46Z
Merge branch 'master' of http://git-wip-us.apache.org/repos/asf/tajo into TAJO-838
----
> Improve query planner to utilize index
> --------------------------------------
>
> Key: TAJO-838
> URL: https://issues.apache.org/jira/browse/TAJO-838
> Project: Tajo
> Issue Type: Sub-task
> Components: planner/optimizer
> Reporter: Jihoon Son
> Assignee: Jihoon Son
> Priority: Minor
>
> An index can improve query performance when the query is highly selective.
> Thus, the query planner should decide whether to use an index for a given query.
> The selectivity can be estimated from statistics.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)