Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/12 14:02:08 UTC

[GitHub] IvanVergiliev opened a new pull request #23766: [SPARK-26859][SQL] Fix data correctness bug in ORC deserializer

URL: https://github.com/apache/spark/pull/23766
 
 
   ## What changes were proposed in this pull request?
   
   There is a bug in `OrcDeserializer.scala` that causes `null`s to be set at the wrong column position and leaves state from previous records uncleared in subsequent records. The [JIRA issue](https://jira.apache.org/jira/browse/SPARK-26859) has more details on exactly when the bug is triggered and what the outcome is.
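The exact trigger conditions are described in the JIRA issue. What follows is a hypothetical, simplified Python model (not Spark's Scala code) of how this class of bug can arise. It rests on two assumptions: the deserializer reuses a single mutable output row across records, and it skips requested columns that are absent from the file. `RowDeserializer`, `MISSING`, and the `fixed` flag are illustrative names only. In the buggy variant, the null is written at the position in the filtered list of present columns rather than at the output column position. That puts a `null` in the wrong slot and leaves the correct slot holding the previous record's value.

```python
# Hypothetical, simplified model of a row deserializer; NOT Spark's actual code.
# Assumption: one mutable result row is reused across records, and requested
# columns missing from the file are skipped (they stay null).

MISSING = -1  # marks a requested column that does not exist in the file


class RowDeserializer:
    def __init__(self, requested_col_ids, fixed):
        # requested_col_ids[i] is the file column backing output column i,
        # or MISSING if the file has no such column.
        self.requested_col_ids = requested_col_ids
        self.fixed = fixed
        # Reused across records, so anything not overwritten leaks forward.
        self.result_row = [None] * len(requested_col_ids)
        # Output positions that actually have a backing column in the file.
        self.valid_targets = [
            i for i, c in enumerate(requested_col_ids) if c != MISSING
        ]

    def deserialize(self, record):
        for i, target in enumerate(self.valid_targets):
            value = record[self.requested_col_ids[target]]
            if value is None:
                # Buggy variant uses i (index into the filtered list) instead
                # of target (the output column position).
                self.result_row[target if self.fixed else i] = None
            else:
                self.result_row[target] = value
        # Return a copy so callers can keep results across calls.
        return list(self.result_row)


if __name__ == "__main__":
    records = [("a", "b"), (None, "d"), ("e", None)]
    buggy = RowDeserializer([MISSING, 0, 1], fixed=False)
    fixed = RowDeserializer([MISSING, 0, 1], fixed=True)
    print([buggy.deserialize(r) for r in records])
    # [[None, 'a', 'b'], [None, 'a', 'd'], [None, None, 'd']]
    print([fixed.deserialize(r) for r in records])
    # [[None, 'a', 'b'], [None, None, 'd'], [None, 'e', None]]
```

In the buggy output, the second record keeps the stale `'a'` from the first. The third record loses `'e'` and shows a stale `'d'` where a `null` belongs. Indexing by the output column position removes both symptoms.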
   
   In short, this bug causes severe data correctness issues. Fortunately, the combination of conditions needed to trigger it is complicated, which keeps the affected surface area fairly small.
   
   This change fixes the problem and adds a corresponding test.
   
   ## How was this patch tested?
   
   The change contains a test that fails on `master` and passes with this fix. The test is at the same level of abstraction as the existing `OrcSourceSuite` tests. I considered adding unit tests that exercise the `OrcDeserializer` class directly, but none exist yet, and it didn't seem to be a common pattern in the parts of the codebase I've seen recently, so I decided against it. I'm open to reconsidering that decision.

