Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/26 13:25:27 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue #136: ParquetTable should avoid scanning all files twice

alamb opened a new issue #136:
URL: https://github.com/apache/arrow-datafusion/issues/136


   *Note*: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-11047
   
   ParquetTable currently reads the metadata of every file in the constructor to obtain the schema, then reads it all again on each call to scan().
   
   We could instead read the metadata once, in the constructor, and cache it for later scan() calls.
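
A minimal sketch of the caching idea, under stated assumptions: `FileMetadata`, `CachedTable`, `try_new`, and the `read_metadata` callback are hypothetical names used only for illustration. They are not DataFusion's actual `ParquetTable` API, and the callback stands in for whatever code opens a file and parses its Parquet footer.

```rust
/// Hypothetical stand-in for the footer metadata of one Parquet file.
#[derive(Clone, Debug, PartialEq)]
struct FileMetadata {
    path: String,
    num_rows: usize,
}

/// Hypothetical table type (not DataFusion's ParquetTable) that keeps
/// the metadata it read while being constructed.
struct CachedTable {
    metadata: Vec<FileMetadata>,
}

impl CachedTable {
    /// Read each file's metadata exactly once and store the results.
    /// `read_metadata` is an assumed callback that performs the real I/O.
    fn try_new<F>(paths: &[&str], mut read_metadata: F) -> Result<Self, String>
    where
        F: FnMut(&str) -> Result<FileMetadata, String>,
    {
        let mut metadata = Vec::with_capacity(paths.len());
        for &path in paths {
            metadata.push(read_metadata(path)?);
        }
        Ok(Self { metadata })
    }

    /// Serve scans from the cache instead of reopening every file.
    fn scan(&self) -> &[FileMetadata] {
        &self.metadata
    }
}
```

With this shape, the number of metadata reads depends only on the number of files, not on how many times `scan()` is called. The trade-off is that the cache can go stale if files change after the table is created.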


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb closed issue #136: ParquetTable should avoid scanning all files twice

Posted by GitBox <gi...@apache.org>.

alamb closed issue #136:
URL: https://github.com/apache/arrow-datafusion/issues/136


   



[GitHub] [arrow-datafusion] houqp commented on issue #136: ParquetTable should avoid scanning all files twice

Posted by GitBox <gi...@apache.org>.

houqp commented on issue #136:
URL: https://github.com/apache/arrow-datafusion/issues/136#issuecomment-945331026


   @alamb I think this can be closed now: @rdettai's file format implementation now scans only a single file for schema inference.



[GitHub] [arrow-datafusion] alamb commented on issue #136: ParquetTable should avoid scanning all files twice

Posted by GitBox <gi...@apache.org>.

alamb commented on issue #136:
URL: https://github.com/apache/arrow-datafusion/issues/136#issuecomment-946062165


   👍 

