You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by we...@apache.org on 2019/02/14 06:57:45 UTC

[spark] branch master updated: [SPARK-26851][SQL] Fix double-checked locking in CachedRDDBuilder

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new f34b872  [SPARK-26851][SQL] Fix double-checked locking in CachedRDDBuilder
f34b872 is described below

commit f34b872aed0040d120c54b704edf80e75cfe815e
Author: Bruce Robbins <be...@gmail.com>
AuthorDate: Thu Feb 14 14:57:25 2019 +0800

    [SPARK-26851][SQL] Fix double-checked locking in CachedRDDBuilder
    
    ## What changes were proposed in this pull request?
    
    According to Brian Goetz et al in Java Concurrency in Practice, the double checked locking pattern has worked since Java 5, but only if the resource is declared volatile:
    
    > Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to work if resource is made volatile, and the performance impact of this is small since volatile reads are usually only slightly more expensive than nonvolatile reads.
    
    CachedRDDBuilder. cachedColumnBuffers and CachedRDDBuilder.clearCache both use DCL to manage the resource ``_cachedColumnBuffers``. The missing ingredient is that ``_cachedColumnBuffers`` is not volatile.
    
    Because of this, clearCache may see ``_cachedColumnBuffers`` as null, when in fact it is not, and therefore fail to un-cache the RDD. There may be other, more subtle bugs due to visibility issues.
    
    To avoid these issues, this PR makes ``_cachedColumnBuffers`` volatile.
    
    ## How was this patch tested?
    
    - Existing SQL unit tests
    - Existing pyspark-sql tests
    
    Closes #23768 from bersprockets/SPARK-26851.
    
    Authored-by: Bruce Robbins <be...@gmail.com>
    Signed-off-by: Wenchen Fan <we...@databricks.com>
---
 .../org/apache/spark/sql/execution/columnar/InMemoryRelation.scala      | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
index efdb82f..bc6e958 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
@@ -49,7 +49,7 @@ case class CachedRDDBuilder(
     storageLevel: StorageLevel,
     @transient cachedPlan: SparkPlan,
     tableName: Option[String])(
-    @transient private var _cachedColumnBuffers: RDD[CachedBatch] = null) {
+    @transient @volatile private var _cachedColumnBuffers: RDD[CachedBatch] = null) {
 
   val sizeInBytesStats: LongAccumulator = cachedPlan.sqlContext.sparkContext.longAccumulator
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org