
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/18 07:35:01 UTC

[GitHub] [spark] sadikovi commented on pull request #37419: [SPARK-39833][SQL] Disable Parquet column index in DSv1 to fix a correctness issue in the case of overlapping partition and data columns

sadikovi commented on PR #37419:
URL: https://github.com/apache/spark/pull/37419#issuecomment-1219137464

   I decided to disable column index altogether until I have a better fix or the Parquet bug is fixed. I also moved the tests to ParquetQueryV1, as one of them fails in DSv2 due to another bug in projection.
   
   @cloud-fan @sunchao Can you review this PR? 
   I just think adding a check on the required schema and column filters could be error-prone, especially when nested fields are involved. It seems easier to me to disable column index by default; users can still enable it manually.
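
For illustration only, here is a minimal sketch of the "off by default, still user-enabled" behaviour described above. It is not the code from this PR: a plain dict stands in for the Hadoop configuration, the helper names are hypothetical, and the key name `parquet.filter.columnindex.enabled` is assumed to be the parquet-mr switch for column index filtering.

```python
# Hypothetical sketch: a plain dict stands in for the Hadoop configuration.
# The key name below is an assumption, not taken from the PR text.
COLUMN_INDEX_KEY = "parquet.filter.columnindex.enabled"


def apply_default(conf: dict) -> dict:
    """Return a copy of conf with column index disabled unless the user set it."""
    result = dict(conf)
    # setdefault only writes the value when the key is absent,
    # so an explicit user setting is never overwritten.
    result.setdefault(COLUMN_INDEX_KEY, "false")
    return result


def column_index_enabled(conf: dict) -> bool:
    """Report whether column index filtering ends up enabled for this conf."""
    return apply_default(conf)[COLUMN_INDEX_KEY].lower() == "true"


print(column_index_enabled({}))                          # no user setting
print(column_index_enabled({COLUMN_INDEX_KEY: "true"}))  # explicit opt-in
```

The point of the set-if-unset shape is that the default never overrides an explicit opt-in, so users who know their data has no overlapping partition and data columns could keep the optimisation.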
   
   I am also open to other suggestions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

