Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/03/02 13:18:02 UTC

[GitHub] [beam] iemejia commented on pull request #14117: [BEAM-7929] Support column projection for Parquet Tables

iemejia commented on pull request #14117:
URL: https://github.com/apache/beam/pull/14117#issuecomment-788901968


   > I'm sad to see a SchemaIOProvider implementation go, but I think it's the right thing to do for now. We need to figure out how it can support projection/predicate pushdown. (Not that hard to add the interfaces, but it's hard to figure out how they'd be useful in core Beam).
   
   I thought SchemaIOProvider's main goal was just to easily wrap IOs for SQL, and it does a great job of that. I took the other route because more advanced use cases were using the full Table hierarchy.
    
   > My only ask for now is that you add some tests. I think the test case in https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTableProviderTest.java will actually exercise this code since it's doing a projection. I'm not sure if there's an easy way to verify that in the test though.
   
   I am not sure how to test this in more detail. The current test does exercise the projection, and it covers both the schema of the 'projected' collection and the results. What we can do, perhaps, is extend the coverage: I can add a test that does not project and only does `SELECT * FROM...` to cover the previously existing functionality.
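
A rough sketch of what those two checks amount to, in plain Java rather than Beam's API. Everything here is hypothetical and only for illustration: the `ProjectionSketch` class, the `project` and `row` helpers, and the column names are not taken from the PR or from `ParquetTableProviderTest`. One case projects a subset of columns and checks both the resulting field names and the values; the other passes no column list, standing in for `SELECT *`, and checks that the rows come back unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProjectionSketch {

  // Hypothetical helper, not a Beam API: keeps only the requested columns,
  // in the requested order. An empty column list stands in for `SELECT *`
  // and returns the rows unchanged.
  public static List<Map<String, Object>> project(
      List<Map<String, Object>> rows, List<String> columns) {
    if (columns.isEmpty()) {
      return rows;
    }
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> row : rows) {
      Map<String, Object> projected = new LinkedHashMap<>();
      for (String column : columns) {
        projected.put(column, row.get(column));
      }
      out.add(projected);
    }
    return out;
  }

  // Builds one row with three made-up columns; insertion order is the "schema".
  public static Map<String, Object> row(String name, int age, String country) {
    Map<String, Object> r = new LinkedHashMap<>();
    r.put("name", name);
    r.put("age", age);
    r.put("country", country);
    return r;
  }

  private static void check(boolean condition, String what) {
    if (!condition) {
      throw new AssertionError(what);
    }
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows =
        Arrays.asList(row("alice", 30, "FR"), row("bob", 41, "US"));

    // Projection case: verify the projected field names and the values.
    List<Map<String, Object>> projected = project(rows, Arrays.asList("age", "name"));
    check(
        new ArrayList<>(projected.get(0).keySet()).equals(Arrays.asList("age", "name")),
        "projected field names");
    check(projected.get(1).get("name").equals("bob"), "projected values");

    // No-projection case: all columns come back untouched.
    List<Map<String, Object>> all = project(rows, Collections.<String>emptyList());
    check(all.equals(rows), "unprojected rows");
  }
}
```

This only models the assertions. It says nothing about whether the reader actually skipped the unselected columns, which is the part that is hard to verify from a test.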
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org