Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/15 10:57:07 UTC

[GitHub] [spark] bart-samwel commented on pull request #31552: [SPARK-34424][SQL][TESTS] Fix failures of HiveOrcHadoopFsRelationSuite

bart-samwel commented on pull request #31552:
URL: https://github.com/apache/spark/pull/31552#issuecomment-779141105


   > For me, the case of ORC's DATE type (and timestamps too) seems similar to Parquet's INT96 timestamps. The ORC spec says nothing about the calendar system (https://orc.apache.org/specification/ORCv2/); it just mentions the offset in days from the epoch:
   > _" Date data is encoded with a PRESENT stream, a DATA stream that records **the number of days after January 1, 1970 in UTC** "_
   > Since DATE is just stored as a number of days, the calendar system is not "hard coded" in the spec. I think we should support the **"CORRECTED"** mode (via a SQL config and/or a DS option) in the ORC datasource too, as we did recently for Parquet INT96 in #30056. @cloud-fan @bart-samwel WDYT?
   
   That makes sense. At least then we can store the data that gets generated internally and read it back. It would take some work for backward compatibility, just like for Parquet -- e.g., we'd have to add metadata to the ORC files, and if that's not present, we'd need to detect which system wrote the file and base the read-side rebasing decision on that.
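   
   To make that concrete, here's a rough sketch of how the per-file decision could look, combining an explicit user setting with writer detection via ORC user metadata. The option plumbing, the metadata key, and the version check are all assumptions for illustration, not the actual implementation:
   
   ```scala
   import java.nio.charset.StandardCharsets
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.Path
   import org.apache.orc.OrcFile
   
   // Hypothetical sketch: resolve the rebase mode for one ORC file. Prefer an explicit
   // user setting (e.g. a DS option / SQL config), otherwise guess the writer from the
   // file's user metadata, similar in spirit to the Parquet INT96 handling.
   def resolveRebaseMode(file: Path, conf: Configuration, userMode: Option[String]): String = {
     userMode.getOrElse {
       val reader = OrcFile.createReader(file, OrcFile.readerOptions(conf))
       try {
         // Assumed metadata key, for illustration only.
         val key = "org.apache.spark.version"
         if (reader.hasMetadataValue(key)) {
           val version = StandardCharsets.UTF_8.decode(reader.getMetadataValue(key)).toString
           // Naive version check, good enough for a sketch: 3.x writers use proleptic Gregorian.
           if (version.startsWith("1.") || version.startsWith("2.")) "LEGACY" else "CORRECTED"
         } else {
           // No marker at all: likely written by Hive or an old Spark, so rebase on read.
           "LEGACY"
         }
       } finally {
         reader.close()
       }
     }
   }
   ```
   
   The nice property is that an explicit setting always wins, so users with files from "unknown" writers can still force CORRECTED or LEGACY per read.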
   
   FWIW, I think the data generator's limitations should be tweaked explicitly per test to match that test's expectations. I.e., if we expect a test won't handle some kind of date correctly, *then and only then* do we turn those dates off.
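   
   As a minimal sketch of what I mean (the helper and its flag are made up, not the suite's actual generator), the opt-out becomes an explicit, per-test choice rather than a blanket restriction:
   
   ```scala
   import java.time.LocalDate
   import scala.util.Random
   
   // Hypothetical generator tweak: only exclude pre-Gregorian dates when the test
   // explicitly says it does not expect them to round-trip correctly.
   def randomDate(rng: Random, expectPreGregorianToWork: Boolean): LocalDate = {
     val gregorianCutOver = LocalDate.of(1582, 10, 15)
     val min = if (expectPreGregorianToWork) LocalDate.of(1, 1, 1) else gregorianCutOver
     val max = LocalDate.of(9999, 12, 31)
     val span = max.toEpochDay - min.toEpochDay
     LocalDate.ofEpochDay(min.toEpochDay + (rng.nextDouble() * span).toLong)
   }
   ```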


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org