Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/20 20:25:52 UTC

[GitHub] [iceberg] sudssf opened a new issue #1220: add option to disable compute statistics for spark2 and spark3

sudssf opened a new issue #1220:
URL: https://github.com/apache/iceberg/issues/1220


   More context is in this thread:
   https://lists.apache.org/thread.html/r477483ead7c7be7346bebcaab75b405dce67607457ac3f7cf682b999%40%3Cdev.iceberg.apache.org%3E
   
   In Spark 2, any operation such as a join or union ends up scanning the full table manifest (without predicate pushdown), which can be really slow for large tables.
   
   In Spark 3, predicate pushdown is performed before `estimateStatistics` is called on the reader, but the option is still useful if the size of the filter is comparatively large w.r.t. the manifest (for example, a table generated by stream ingestion).
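
The requested behavior can be illustrated with a toy model. This is only a sketch of the idea, not Iceberg or Spark code: `ToyReader`, `DataFile`, `Statistics`, the `compute_statistics` flag, and `UNKNOWN_SIZE_BYTES` are all hypothetical names invented for the example, and the choice to report a very large "unknown" size when statistics are skipped is an assumption. With the flag on, the estimate has to visit every manifest entry; with it off, nothing is scanned.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Assumption: when statistics are skipped, report "size unknown" as a very
# large value so a planner would not treat the table as small.
UNKNOWN_SIZE_BYTES = 2**63 - 1


@dataclass
class DataFile:
    """Hypothetical manifest entry describing one data file."""
    partition: int
    size_bytes: int
    record_count: int


@dataclass
class Statistics:
    size_bytes: int
    num_rows: Optional[int]  # None means "not computed"


class ToyReader:
    """Toy stand-in for a table reader; not the real Iceberg reader."""

    def __init__(self, files: Iterable[DataFile], compute_statistics: bool = True):
        self.files = list(files)
        self.compute_statistics = compute_statistics  # hypothetical option
        self.predicate: Optional[Callable[[DataFile], bool]] = None
        self.files_scanned = 0  # counts manifest entries visited

    def push_filter(self, predicate: Callable[[DataFile], bool]) -> None:
        # Models predicate pushdown happening before the estimate is requested.
        self.predicate = predicate

    def estimate_statistics(self) -> Statistics:
        if not self.compute_statistics:
            # Option disabled: skip the manifest scan entirely.
            return Statistics(UNKNOWN_SIZE_BYTES, None)
        size = rows = 0
        for f in self.files:
            self.files_scanned += 1
            if self.predicate is None or self.predicate(f):
                size += f.size_bytes
                rows += f.record_count
        return Statistics(size, rows)
```

In this model, a reader built with `compute_statistics=False` returns immediately from `estimate_statistics()` and leaves `files_scanned` at zero, at the cost of giving the planner no usable size or row count.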


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] sudssf commented on issue #1220: add option to disable compute statistics for spark2 and spark3

Posted by GitBox <gi...@apache.org>.
sudssf commented on issue #1220:
URL: https://github.com/apache/iceberg/issues/1220#issuecomment-661320100


   submitted PR https://github.com/apache/iceberg/pull/1221/files




[GitHub] [iceberg] rdblue closed issue #1220: add option to disable compute statistics for spark2 and spark3

Posted by GitBox <gi...@apache.org>.
rdblue closed issue #1220:
URL: https://github.com/apache/iceberg/issues/1220


   

