Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/24 03:24:39 UTC

[GitHub] [iceberg] zhangjun0x01 opened a new issue #1983: Flink : query large iceberg tables slowly

zhangjun0x01 opened a new issue #1983:
URL: https://github.com/apache/iceberg/issues/1983


   When I use Flink to query a large Iceberg table, the query is slow: it takes a few minutes.
   
   The table is partitioned, the daily data is about 700 million, and the SQL looks like this: `select * from mytable where d = 20201224 limit 1`.
   
   I only want to select one row to validate the data, but the query is slow even though we have pushed down the limit.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] zhangjun0x01 commented on issue #1983: Flink : query large iceberg tables slowly

Posted by GitBox <gi...@apache.org>.
zhangjun0x01 commented on issue #1983:
URL: https://github.com/apache/iceberg/issues/1983#issuecomment-751613468


   I did a detailed test. If we add a query condition in `where`, for example `select * from mytable where d = 20201224 limit 1`, the limit push down fails for both partitioned and unpartitioned tables.
   
   I tested `HiveTableSource` in Flink and it has the same problem (except when the condition is on the partition field).
   
   I am not sure whether this is a bug in Flink.
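   
   As an illustration only, the following Java sketch shows why a limit cannot simply be moved in front of a `where` condition: if the limit is applied before the filter, the query can return fewer rows than it should. The class name, the sample values, and the use of plain Java streams are assumptions made for this example; it is not Flink or Iceberg code and does not claim to explain the behaviour reported above.
   
   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.stream.Collectors;
   
   public class LimitVsFilterSketch {
       // Hypothetical rows: only the value of column `d` is modelled.
       static final List<Integer> ROWS = Arrays.asList(20201223, 20201223, 20201224, 20201224);
   
       // Semantics of `where d = ? limit n`: filter first, then limit.
       static List<Integer> filterThenLimit(int d, int n) {
           return ROWS.stream().filter(v -> v == d).limit(n).collect(Collectors.toList());
       }
   
       // Naive push down: the source stops after n rows and the filter runs afterwards.
       static List<Integer> limitThenFilter(int d, int n) {
           return ROWS.stream().limit(n).filter(v -> v == d).collect(Collectors.toList());
       }
   
       public static void main(String[] args) {
           System.out.println(filterThenLimit(20201224, 1)); // prints [20201224]
           System.out.println(limitThenFilter(20201224, 1)); // prints []
       }
   }
   ```
   
   In this sketch the limit is only safe to apply inside the source when the source also applies the filter itself.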


[GitHub] [iceberg] zhangjun0x01 commented on issue #1983: Flink : query large iceberg tables slowly

Posted by GitBox <gi...@apache.org>.
zhangjun0x01 commented on issue #1983:
URL: https://github.com/apache/iceberg/issues/1983#issuecomment-752306169


   I looked through the Flink source code and found this comment [here](https://github.com/apache/flink/blob/master/flink-table/flink-table-planner-blink/src/main/java/org/apache/flink/table/planner/plan/rules/logical/PushLimitIntoTableSourceScanRule.java#L64):
   
   ```
   // a limit can be pushed down only if it satisfies the two conditions: 1) do not have order
   // by keys, 2) have limit.
   ```
   
   I think it may be a Flink bug, so I created a [Flink issue](https://issues.apache.org/jira/browse/FLINK-20809) to track this.
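   
   The two conditions in the quoted comment can be written as a small predicate. This is only a sketch: the class name, method name, and parameters are hypothetical and are not taken from `PushLimitIntoTableSourceScanRule`, and any further requirements the real rule may have are not modelled here.
   
   ```java
   public class LimitPushDownConditionSketch {
       // Mirrors the quoted comment: 1) no order by keys, 2) a limit is present.
       // `limit` is null when the query has no LIMIT clause (an assumption of this sketch).
       static boolean canPushLimit(int orderByKeyCount, Long limit) {
           boolean noOrderByKeys = orderByKeyCount == 0;
           boolean hasLimit = limit != null;
           return noOrderByKeys && hasLimit;
       }
   
       public static void main(String[] args) {
           System.out.println(canPushLimit(0, 1L));   // limit 1, no order by
           System.out.println(canPushLimit(1, 1L));   // order by one key, limit 1
           System.out.println(canPushLimit(0, null)); // no limit at all
       }
   }
   ```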
   
   


[GitHub] [iceberg] zhangjun0x01 commented on issue #1983: Flink : query large iceberg tables slowly

Posted by GitBox <gi...@apache.org>.
zhangjun0x01 commented on issue #1983:
URL: https://github.com/apache/iceberg/issues/1983#issuecomment-751591824


   I tested it and found that `limit push down` is invalid for partitioned tables.


[GitHub] [iceberg] openinx commented on issue #1983: Flink : query large iceberg tables slowly

Posted by GitBox <gi...@apache.org>.
openinx commented on issue #1983:
URL: https://github.com/apache/iceberg/issues/1983#issuecomment-750789194


   Could you please dump the stack trace while executing the `SELECT` query? I will check what is causing the query to hang.


[GitHub] [iceberg] openinx commented on issue #1983: Flink : query large iceberg tables slowly

Posted by GitBox <gi...@apache.org>.
openinx commented on issue #1983:
URL: https://github.com/apache/iceberg/issues/1983#issuecomment-751595449


   > limit push down is invalid for partitioned tables. 
   
   Could you provide more details? Is this a limitation of Apache Flink, or a bug in the Iceberg integration work?


[GitHub] [iceberg] zhangjun0x01 removed a comment on issue #1983: Flink : query large iceberg tables slowly

Posted by GitBox <gi...@apache.org>.
zhangjun0x01 removed a comment on issue #1983:
URL: https://github.com/apache/iceberg/issues/1983#issuecomment-751591824


   I tested it and found that `limit push down` is invalid for partitioned tables.


[GitHub] [iceberg] zhangjun0x01 commented on issue #1983: Flink : query large iceberg tables slowly

Posted by GitBox <gi...@apache.org>.
zhangjun0x01 commented on issue #1983:
URL: https://github.com/apache/iceberg/issues/1983#issuecomment-751158493


   I checked the log and found that it is not comprehensive and contains no further useful information,
   so I want to add some logging to print the execution time of each step.
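   
   One way such timing logs could look is sketched below in plain Java. The class `StepTimerSketch`, its methods, and the step names are hypothetical and exist only for this example; a real change would use the project's own logger and wrap the actual planning and reading steps.
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.function.Supplier;
   
   public class StepTimerSketch {
       // Elapsed milliseconds per step, kept in the order the steps ran.
       private final Map<String, Long> elapsedMillis = new LinkedHashMap<>();
   
       // Runs the given work, records how long it took, and returns its result.
       <T> T time(String step, Supplier<T> work) {
           long start = System.nanoTime();
           try {
               return work.get();
           } finally {
               long ms = (System.nanoTime() - start) / 1_000_000;
               elapsedMillis.put(step, ms);
               System.out.println("step=" + step + " elapsedMs=" + ms);
           }
       }
   
       Map<String, Long> elapsed() {
           return elapsedMillis;
       }
   
       public static void main(String[] args) {
           StepTimerSketch timer = new StepTimerSketch();
           // Placeholder work standing in for real steps such as planning or reading.
           timer.time("plan", () -> "planned");
           timer.time("read", () -> "read");
       }
   }
   ```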

