Posted to dev@drill.apache.org by "Arina Ielchiieva (JIRA)" <ji...@apache.org> on 2016/08/25 09:09:20 UTC
[jira] [Resolved] (DRILL-4852) COUNT(*) query against a large JSON
table slower by 2x
[ https://issues.apache.org/jira/browse/DRILL-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arina Ielchiieva resolved DRILL-4852.
-------------------------------------
Resolution: Fixed
> COUNT(*) query against a large JSON table slower by 2x
> ------------------------------------------------------
>
> Key: DRILL-4852
> URL: https://issues.apache.org/jira/browse/DRILL-4852
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
> Reporter: Khurram Faraaz
> Assignee: Arina Ielchiieva
> Priority: Critical
> Fix For: 1.8.0
>
>
> We have a manual test that runs a COUNT(*) over 26 million JSON keys. The results suggest a regression: the query is about 2x slower on current master (1.8.0-SNAPSHOT, git commit ID 57dc9f43).
> The query consistently takes over 30 seconds to execute across several runs. Note that because this is a single large JSON file, just one fragment does all the work.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select count(*) from `twoKeyJsn.json`;
> +-----------+
> | EXPR$0 |
> +-----------+
> | 26212355 |
> +-----------+
> 1 row selected (29.001 seconds)
> {noformat}
> On Drill 1.2.0, the above query took 13.949 seconds.
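
As a rough check on the "2x" figure, the two timings quoted in the description can be compared directly. The sketch below is illustrative only: the constant and function names are made up for this example, and the inputs are the single-run timings and row count copied from the report, not a controlled benchmark.

```python
# Values copied from the issue description (single runs, not averaged).
ROWS = 26_212_355                 # result of SELECT COUNT(*)
SECONDS_1_8_0_SNAPSHOT = 29.001   # 1.8.0-SNAPSHOT, commit 57dc9f43
SECONDS_1_2_0 = 13.949            # Drill 1.2.0


def slowdown(new_seconds: float, old_seconds: float) -> float:
    """Return how many times longer the new run took than the old run."""
    return new_seconds / old_seconds


def rows_per_second(rows: int, seconds: float) -> float:
    """Return the average number of rows counted per second."""
    return rows / seconds


ratio = slowdown(SECONDS_1_8_0_SNAPSHOT, SECONDS_1_2_0)
print(f"slowdown: {ratio:.2f}x")  # prints "slowdown: 2.08x"
print(f"1.2.0 throughput:          {rows_per_second(ROWS, SECONDS_1_2_0):,.0f} rows/s")
print(f"1.8.0-SNAPSHOT throughput: {rows_per_second(ROWS, SECONDS_1_8_0_SNAPSHOT):,.0f} rows/s")
```

Since a single fragment does all the work here, the ratio reflects single-threaded JSON scan time rather than cluster-wide throughput.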
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)