Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/24 10:28:10 UTC

[GitHub] [spark] viirya opened a new pull request #24696: [SPARK-27832] Don't decompress and create column batch when the task is completed

viirya opened a new pull request #24696: [SPARK-27832] Don't decompress and create column batch when the task is completed
URL: https://github.com/apache/spark/pull/24696

## What changes were proposed in this pull request?

A cached relation decompresses the cached data and creates a column batch when the cache is accessed. A thread reading the cached relation may not stop immediately after the task is completed. Because of this race condition, the cached relation might still decompress data and create a new, unnecessary batch. At that moment, the returned batch is also closed immediately. On the reader side, reading a closed batch can cause a null pointer exception, and we would probably need to hide such exceptions.

We don't need to create the batch if the task is already completed. Skipping it saves the effort of decompressing the cached batch and also prevents such exceptions.
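
The idea can be illustrated with a minimal, self-contained sketch. This is not the actual patch: `TaskState`, `CompressedBatch`, `ColumnBatch`, and `GuardedBatchIterator` are hypothetical names introduced only for illustration, and the sketch does not depend on Spark. In Spark itself the completion check would presumably go through `TaskContext.isCompleted()`.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a task's completion flag. It only mimics the
// kind of check that Spark's TaskContext.isCompleted() provides.
final class TaskState {
  private val completed = new AtomicBoolean(false)
  def markCompleted(): Unit = completed.set(true)
  def isCompleted: Boolean = completed.get()
}

// Hypothetical placeholders for a cached (compressed) batch and the
// decompressed column batch handed to the reader.
final case class CompressedBatch(id: Int)
final case class ColumnBatch(id: Int)

// Wraps an iterator of compressed batches and stops producing column
// batches as soon as the task is observed to be completed.
final class GuardedBatchIterator(
    underlying: Iterator[CompressedBatch],
    task: TaskState,
    decompress: CompressedBatch => ColumnBatch)
  extends Iterator[ColumnBatch] {

  // Report no further elements once the task is completed, so that no
  // additional batch is decompressed or created.
  override def hasNext: Boolean = !task.isCompleted && underlying.hasNext

  override def next(): ColumnBatch = {
    if (!hasNext) {
      throw new NoSuchElementException("no more batches, or task completed")
    }
    decompress(underlying.next())
  }
}
```

A reader that polls `hasNext` after the task has completed simply sees the end of the iterator instead of receiving a batch that is closed right away. The check only narrows the race window (the task can still complete between the check and the decompression), so this is a sketch of the approach under the stated assumptions rather than a complete solution.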

## How was this patch tested?

It is hard to write a unit test for this case, so it was tested manually.
