Posted to issues@drill.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/09/01 14:29:00 UTC

[jira] [Commented] (DRILL-8290) Short cut recursive file listings for LIMIT 0 queries

    [ https://issues.apache.org/jira/browse/DRILL-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598995#comment-17598995 ] 

ASF GitHub Bot commented on DRILL-8290:
---------------------------------------

jnturton commented on PR #2636:
URL: https://github.com/apache/drill/pull/2636#issuecomment-1234361083

   @vvysotskyi I did spot one [other recursive file listing](https://github.com/jnturton/drill/blob/65fb7ddc144ecae5330c9325af63010748f74cdf/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java#L376) that could possibly use the short cut in this PR if I propagate a `limit0` flag down to it.
   
   It appears to be invoked only if there are Parquet files present at the top level of the queried path, which I don't think is common for big datasets since data files are generally present only at the leaves of the directory tree. Do you think it's worth implementing the single-file short cut here too, or should we just leave it alone?




> Short cut recursive file listings for LIMIT 0 queries
> -----------------------------------------------------
>
>                 Key: DRILL-8290
>                 URL: https://issues.apache.org/jira/browse/DRILL-8290
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 1.20.2
>            Reporter: James Turton
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> The existing LIMIT 0 query optimisations do not prevent a query run against the top of a deep DFS directory tree from recursively listing FileStatuses for everything within it using a pool of worker threads. This Issue proposes a new optimisation whereby such queries will recurse into the directory tree on a single thread that returns as soon as any single FileStatus has been obtained.
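
The single-threaded short cut described above can be illustrated with a minimal sketch. This is not the code from the PR: it assumes the local file system and `java.nio.file` in place of Hadoop's `FileSystem` and `FileStatus`, and the class name `FirstFileFinder` and method name `findFirstFile` are hypothetical names chosen for the illustration. The point is only that the walk is depth-first on the calling thread and returns as soon as one file has been found, rather than listing the whole tree.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical illustration only; Drill itself works with Hadoop FileStatus objects.
final class FirstFileFinder {

  private FirstFileFinder() {
  }

  /**
   * Walks the tree under dir depth-first on the calling thread and returns
   * the first regular file encountered, or an empty Optional if the tree
   * contains no files. No worker threads are used and the walk stops at
   * the first hit.
   */
  static Optional<Path> findFirstFile(Path dir) throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path entry : entries) {
        if (Files.isRegularFile(entry)) {
          // Short cut: one file is enough for a LIMIT 0 style query.
          return Optional.of(entry);
        }
        // Symbolic links to directories are not followed, to avoid cycles.
        if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
          Optional<Path> found = findFirstFile(entry);
          if (found.isPresent()) {
            return found;
          }
        }
      }
    }
    return Optional.empty();
  }
}
```

In this sketch the cost is bounded by the number of directories visited before the first file is found, which for a tree with data files only at the leaves is roughly its depth rather than its total size.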



--
This message was sent by Atlassian Jira
(v8.20.10#820010)