Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/02/09 22:18:18 UTC

[jira] [Commented] (DRILL-4380) Fix performance regression: in creation of FileSelection in ParquetFormatPlugin to not set files if metadata cache is available.

    [ https://issues.apache.org/jira/browse/DRILL-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139780#comment-15139780 ] 

ASF GitHub Bot commented on DRILL-4380:
---------------------------------------

GitHub user parthchandra opened a pull request:

    https://github.com/apache/drill/pull/369

    DRILL-4380: Fix performance regression: in creation of FileSelection …

    …in ParquetFormatPlugin to not set files if metadata cache is available.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/parthchandra/incubator-drill DRILL-4380

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/369.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #369
    
----
commit be374c12992ef581a285b0a260bb9ad037d6df92
Author: Parth Chandra <pa...@apache.org>
Date:   2015-12-18T00:30:42Z

    DRILL-4380: Fix performance regression: in creation of FileSelection in ParquetFormatPlugin to not set files if metadata cache is available.

----


> Fix performance regression: in creation of FileSelection in ParquetFormatPlugin to not set files if metadata cache is available.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4380
>                 URL: https://issues.apache.org/jira/browse/DRILL-4380
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Parth Chandra
>
> The regression has been caused by the changes in 367d74a65ce2871a1452361cbd13bbd5f4a6cc95 (DRILL-2618: handle queries over empty folders consistently so that they report table not found rather than failing.)
> In ParquetFormatPlugin, the original code created a FileSelection object as follows:
> {code}
> return new FileSelection(fileNames, metaRootPath.toString(), metadata, selection.getFileStatusList(fs));
> {code}
> The selection.getFileStatusList call made an inexpensive call to FileSelection.init(). The call was inexpensive because the FileSelection.files member was not set, so the code did not need to make an expensive call to get the file statuses corresponding to the files in the FileSelection.files member.
> In the new code, this is replaced by:
> {code}
> final FileSelection newSelection = FileSelection.create(null, fileNames, metaRootPath.toString());
> return ParquetFileSelection.create(newSelection, metadata);
> {code}
> This sets the FileSelection.files member but not the FileSelection.statuses member. A subsequent call to FileSelection.getStatuses (in ParquetGroupScan()) now makes an expensive call to get all the statuses.
> It appears that there was an implicit assumption that the FileSelection.statuses member should be set before the FileSelection.files member is set. This assumption is no longer true.
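
The cost difference described above can be illustrated with a minimal sketch. This is a hypothetical, simplified model and not Drill's actual FileSelection class: the class name SelectionSketch, the lookupStatus helper, and the statusLookups counter are all assumptions made for illustration. It only shows why a selection with files set and statuses unset pays one lookup per file on the first getStatuses call, while a selection with no files pays nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the behavior described in the issue;
// this is NOT the real org.apache.drill FileSelection implementation.
final class SelectionSketch {
    // Counts simulated file-system lookups so the cost is observable.
    static int statusLookups = 0;

    private final List<String> files;  // may be null (not set)
    private List<String> statuses;     // may be null until resolved

    SelectionSketch(List<String> files, List<String> statuses) {
        this.files = files;
        this.statuses = statuses;
    }

    // Assumed stand-in for an expensive per-file file-system call.
    private static String lookupStatus(String file) {
        statusLookups++;
        return "status:" + file;
    }

    // Resolves statuses lazily: cheap when files is unset or statuses
    // were supplied up front, expensive when only files is set.
    List<String> getStatuses() {
        if (statuses == null) {
            statuses = new ArrayList<>();
            if (files != null) {
                for (String file : files) {
                    statuses.add(lookupStatus(file));
                }
            }
        }
        return statuses;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a.parquet", "b.parquet", "c.parquet");

        // Files not set: nothing to look up.
        new SelectionSketch(null, null).getStatuses();
        System.out.println("lookups, files unset: " + statusLookups);

        // Files set, statuses unset: one lookup per file.
        new SelectionSketch(names, null).getStatuses();
        System.out.println("lookups, files only: " + statusLookups);
    }
}
```

In this model, supplying statuses up front (or leaving files unset when a metadata cache already describes them) avoids the per-file lookups, which is the general direction the issue title suggests; the actual change is in the pull request linked above.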


