Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/29 14:26:45 UTC

[GitHub] [spark] ala commented on pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent

ala commented on PR #38777:
URL: https://github.com/apache/spark/pull/38777#issuecomment-1330730907

   Sorry, I was on PTO/sick for a couple of days. 
   
   My idea was to leave the row index out of `_metadata` for formats that cannot generate it. While we have _many_ Parquet readers, I believe all of them support row index generation right now, so that's not a problem.
   
   My main concern is that we may want to keep growing the `_metadata` struct (most recently: row id, cc @tomvanbussel, @juliuszsompolski). New fields might not be supported in all readers right away (or, if they are Databricks-internal, we might not want to support them in OSS readers at all). So while I think it's OK to make row index non-nullable, it would be a problem if we later wanted to require that all `_metadata` fields be non-nullable.
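To make the trade-off concrete, here is a minimal plain-Scala sketch. It deliberately avoids Spark's `StructType` so it stays self-contained. `MetadataField`, `ReaderCapabilities`, `metadataSchema` and `rowIdFor` are hypothetical names, and the field list is an illustrative subset, not Spark's actual `_metadata` schema. It models the two options discussed in the comment: `row_index` is exposed only by readers that can generate it, so it can be non-nullable, while a newer field such as `row_id` stays nullable so that readers without support can still expose it and return null.

```scala
// Hypothetical model of a file source's `_metadata` struct (not Spark's real classes).
final case class MetadataField(name: String, nullable: Boolean)

// Hypothetical description of what one file-format reader can produce.
final case class ReaderCapabilities(format: String, supportsRowIndex: Boolean, supportsRowId: Boolean)

object MetadataSchemaSketch {
  // Fields every reader can always populate (illustrative subset only).
  private val baseFields: Seq[MetadataField] = Seq(
    MetadataField("file_path", nullable = false),
    MetadataField("file_name", nullable = false),
    MetadataField("file_size", nullable = false)
  )

  /** Fields exposed in `_metadata` for one reader.
    * row_index: included only when the reader can generate it, so it is non-nullable.
    * row_id (hypothetical newer field): always included but nullable, so readers
    * that cannot compute it yet still expose the field and emit null. */
  def metadataSchema(caps: ReaderCapabilities): Seq[MetadataField] = {
    val rowIndex =
      if (caps.supportsRowIndex) Seq(MetadataField("row_index", nullable = false))
      else Seq.empty
    val rowId = Seq(MetadataField("row_id", nullable = true))
    baseFields ++ rowIndex ++ rowId
  }

  // Value for the nullable row_id field: None (null) when the reader lacks support.
  def rowIdFor(caps: ReaderCapabilities, computeRowId: () => Long): Option[Long] =
    if (caps.supportsRowId) Some(computeRowId()) else None

  // A rule requiring every `_metadata` field to be non-nullable fails once row_id exists.
  def allNonNullable(fields: Seq[MetadataField]): Boolean = fields.forall(!_.nullable)

  def main(args: Array[String]): Unit = {
    val parquet = ReaderCapabilities("parquet", supportsRowIndex = true, supportsRowId = false)
    val csv = ReaderCapabilities("csv", supportsRowIndex = false, supportsRowId = false)
    println(metadataSchema(parquet).map(_.name).mkString(", "))
    println(metadataSchema(csv).map(_.name).mkString(", "))
  }
}
```

In this sketch, making `row_index` non-nullable is harmless because readers that cannot produce it simply omit it, whereas `allNonNullable` returns false for every reader once `row_id` is added, which is the kind of future restriction the comment warns against.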


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

For additional commands, e-mail: reviews-help@spark.apache.org