Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/11/21 08:37:00 UTC
[jira] [Commented] (DRILL-6857) Limit is not being pushed into scan when selecting from a parquet file with multiple row groups.

    [ https://issues.apache.org/jira/browse/DRILL-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694390#comment-16694390 ] 

ASF GitHub Bot commented on DRILL-6857:
---------------------------------------

arina-ielchiieva opened a new pull request #1548: DRILL-6857: Read only required row groups in a file when limit push down is applied
URL: https://github.com/apache/drill/pull/1548
 
 
   Details in [DRILL-6857](https://issues.apache.org/jira/browse/DRILL-6857).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Limit is not being pushed into scan when selecting from a parquet file with multiple row groups.
> ------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6857
>                 URL: https://issues.apache.org/jira/browse/DRILL-6857
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.15.0
>            Reporter: Anton Gozhiy
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.15.0
>
>         Attachments: DRILL_5796_test_data.parquet
>
>
> *Data:*
> A parquet file that contains more than one row group. An example is attached.
> *Query:*
> {code:sql}
> explain plan for select * from dfs.tmp.`DRILL_5796_test_data.parquet` limit 1
> {code}
> *Expected result:*
> numFiles=1, numRowGroups=1
> *Actual result:*
> numFiles=1, numRowGroups=3
> {noformat}
> 00-00    Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 1.0, cumulative cost = {274.1 rows, 280.1 cpu, 270.0 io, 0.0 network, 0.0 memory}, id = 13671
> 00-01      Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount = 1.0, cumulative cost = {274.0 rows, 280.0 cpu, 270.0 io, 0.0 network, 0.0 memory}, id = 13670
> 00-02        SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR **): rowcount = 1.0, cumulative cost = {273.0 rows, 279.0 cpu, 270.0 io, 0.0 network, 0.0 memory}, id = 13669
> 00-03          Limit(fetch=[1]) : rowType = RecordType(DYNAMIC_STAR **): rowcount = 1.0, cumulative cost = {272.0 rows, 278.0 cpu, 270.0 io, 0.0 network, 0.0 memory}, id = 13668
> 00-04            Limit(fetch=[1]) : rowType = RecordType(DYNAMIC_STAR **): rowcount = 1.0, cumulative cost = {271.0 rows, 274.0 cpu, 270.0 io, 0.0 network, 0.0 memory}, id = 13667
> 00-05              Scan(table=[[dfs, tmp, DRILL_5796_test_data.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/DRILL_5796_test_data.parquet]], selectionRoot=maprfs:/tmp/DRILL_5796_test_data.parquet, numFiles=1, numRowGroups=3, usedMetadataFile=false, columns=[`**`]]]) : rowType = RecordType(DYNAMIC_STAR **): rowcount = 270.0, cumulative cost = {270.0 rows, 270.0 cpu, 270.0 io, 0.0 network, 0.0 memory}, id = 13666
> {noformat}
> *Note:*
> Limit pushdown works when the same data is split across multiple files (one row group per file).
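
To illustrate the expected behavior, here is a minimal sketch of limit-based row group pruning: walk the row groups in order and stop once the accumulated row count covers the limit. This is not Drill's actual implementation. The class `LimitRowGroupPruning`, the `RowGroup` type, and the method `pruneForLimit` are hypothetical names, and the per-row-group counts are assumed values chosen only so that they sum to the 270 rows shown in the plan.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitRowGroupPruning {

    /** Hypothetical stand-in for row group metadata: an index and a row count. */
    static final class RowGroup {
        final int index;
        final long rowCount;

        RowGroup(int index, long rowCount) {
            this.index = index;
            this.rowCount = rowCount;
        }
    }

    /**
     * Keeps row groups, in order, until their combined row count reaches the limit.
     * Row groups after that point cannot contribute rows and are dropped.
     */
    static List<RowGroup> pruneForLimit(List<RowGroup> rowGroups, long limit) {
        List<RowGroup> selected = new ArrayList<>();
        long rowsSoFar = 0;
        for (RowGroup rowGroup : rowGroups) {
            if (rowsSoFar >= limit) {
                break; // the limit is already satisfied by earlier row groups
            }
            selected.add(rowGroup);
            rowsSoFar += rowGroup.rowCount;
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed row counts (not taken from the attached file); they sum to 270.
        List<RowGroup> rowGroups = new ArrayList<>();
        rowGroups.add(new RowGroup(0, 90));
        rowGroups.add(new RowGroup(1, 90));
        rowGroups.add(new RowGroup(2, 90));

        List<RowGroup> selected = pruneForLimit(rowGroups, 1);
        System.out.println("numRowGroups=" + selected.size());
    }
}
```

Under these assumptions, `limit 1` keeps only the first row group, which matches the expected `numRowGroups=1` in the report.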


