Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/10/12 18:37:05 UTC
[jira] [Commented] (DRILL-3921) Hive LIMIT 1 queries take too long
[ https://issues.apache.org/jira/browse/DRILL-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953303#comment-14953303 ]
ASF GitHub Bot commented on DRILL-3921:
---------------------------------------
GitHub user sudheeshkatkam opened a pull request:
https://github.com/apache/drill/pull/197
DRILL-3921: Initialize the underlying record reader lazily in HiveRecordReader
@vkorukanti and @jacques-n, can you please take a look? I still need to add unit tests.
With my setup of 20K files, the LIMIT 1 query now takes 53 seconds (~48 seconds of which is planning). Previously, the same query took 1300 seconds (~45 seconds of planning).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sudheeshkatkam/drill DRILL-3921
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/197.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #197
----
commit fdca17f3c223a4f51099616e059c394c8db3974d
Author: Sudheesh Katkam <sk...@maprtech.com>
Date: 2015-10-12T16:32:15Z
DRILL-3921: Initialize the underlying record reader lazily in HiveRecordReader
----
> Hive LIMIT 1 queries take too long
> ----------------------------------
>
> Key: DRILL-3921
> URL: https://issues.apache.org/jira/browse/DRILL-3921
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Reporter: Sudheesh Katkam
> Assignee: Sudheesh Katkam
>
> Fragment initialization on a Hive table that is backed by a directory of many files can take a very long time. This is evident in LIMIT 1 queries. The root cause is that HiveRecordReader initializes its underlying reader when the constructor is called, rather than when setup is called.
> Two changes need to be made:
> 1) lazily initialize the underlying record reader in HiveRecordReader
> 2) allow for running a callable as a proxy user within an operator (through OperatorContext). This is required because the underlying record reader must be initialized as a proxy user (a proxy for the owner of the file). Previously, this was handled while creating the record batch tree.
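The two changes described in the issue can be illustrated with a small Java sketch. It is a simplified illustration, not the code from the pull request: `LazyHiveRecordReader`, `UnderlyingReader`, `OperatorContextSketch`, `runAsProxyUser`, and `readFirstRow` are hypothetical names, and the proxy-user step only records the user name rather than impersonating anyone (a real implementation would typically go through Hadoop's `UserGroupInformation`). It also assumes the scan calls `setup()` on each reader only when it is about to read from it, so a LIMIT 1 scan stops after the first reader that returns a row.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;

public class LazyReaderSketch {

    /** Hypothetical stand-in for the per-file reader that is expensive to open. */
    static final class UnderlyingReader {
        static int opened = 0; // how many underlying readers were actually created
        private final Iterator<String> rows;

        UnderlyingReader(List<String> rows) {
            opened++;
            this.rows = rows.iterator();
        }

        String nextRow() {
            return rows.hasNext() ? rows.next() : null;
        }
    }

    /** Hypothetical operator context that can run a callable as a proxy user. */
    static final class OperatorContextSketch {
        String lastUser; // user that the most recent callable ran as

        <T> T runAsProxyUser(String user, Callable<T> work) throws Exception {
            // Assumption: a real implementation would impersonate `user` here
            // (e.g. via Hadoop's UserGroupInformation.doAs); this only records it.
            lastUser = user;
            return work.call();
        }
    }

    /** Records its split in the constructor; opens the input only in setup(). */
    static final class LazyHiveRecordReader {
        private final String fileOwner;
        private final List<String> splitRows;
        private UnderlyingReader reader; // stays null until setup() is called

        LazyHiveRecordReader(String fileOwner, List<String> splitRows) {
            this.fileOwner = fileOwner; // cheap: nothing is opened here
            this.splitRows = splitRows;
        }

        void setup(OperatorContextSketch context) throws Exception {
            // Expensive initialization is deferred to here and runs as the file owner.
            reader = context.runAsProxyUser(fileOwner, () -> new UnderlyingReader(splitRows));
        }

        String next() {
            return reader.nextRow();
        }
    }

    /** Mimics a LIMIT 1 scan: set up readers one at a time, stop at the first row. */
    static String readFirstRow(List<LazyHiveRecordReader> readers, OperatorContextSketch context)
            throws Exception {
        for (LazyHiveRecordReader r : readers) {
            r.setup(context);
            String row = r.next();
            if (row != null) {
                return row;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        List<LazyHiveRecordReader> readers = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            readers.add(new LazyHiveRecordReader("owner" + i,
                    Collections.singletonList("row-from-file-" + i)));
        }
        OperatorContextSketch context = new OperatorContextSketch();
        System.out.println(readFirstRow(readers, context)); // row-from-file-0
        System.out.println(UnderlyingReader.opened);        // 1
        System.out.println(context.lastUser);               // owner0
    }
}
```

Because the constructor only records the split and its owner, creating one reader per file stays cheap even for 20K files; the cost of opening a file, and of switching to the owner's identity, is paid only for readers that are actually set up.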
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)