Posted to issues@spark.apache.org by "Takeshi Yamamuro (JIRA)" <ji...@apache.org> on 2017/07/31 13:17:00 UTC
[jira] [Comment Edited] (SPARK-21579) dropTempView has a critical BUG
[ https://issues.apache.org/jira/browse/SPARK-21579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107296#comment-16107296 ]
Takeshi Yamamuro edited comment on SPARK-21579 at 7/31/17 1:16 PM:
-------------------------------------------------------------------
I think this is expected behaviour in Spark: `cache1`'s plan is a sub-tree of `cache2`'s plan, so if you uncache `cache1`, Spark also uncaches `cache2`.
{code}
scala> Seq(("name1", 28)).toDF("name", "age").createOrReplaceTempView("base")
scala> sql("cache table cache1 as select * from base where age >= 25")
scala> sql("cache table cache2 as select * from cache1 where name = 'p1'")
scala> spark.table("cache1").explain
== Physical Plan ==
InMemoryTableScan [name#5, age#6]
   +- InMemoryRelation [name#5, age#6], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), `cache1`
         +- *Project [_1#2 AS name#5, _2#3 AS age#6]
            +- *Filter (_2#3 >= 25)
               +- LocalTableScan [_1#2, _2#3]
scala> spark.table("cache2").explain
== Physical Plan ==
InMemoryTableScan [name#5, age#6]
   +- InMemoryRelation [name#5, age#6], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), `cache2`
         +- *Filter (isnotnull(name#5) && (name#5 = p1))
            +- InMemoryTableScan [name#5, age#6], [isnotnull(name#5), (name#5 = p1)]
                  +- InMemoryRelation [name#5, age#6], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), `cache1`
                        +- *Project [_1#2 AS name#5, _2#3 AS age#6]
                           +- *Filter (_2#3 >= 25)
                              +- LocalTableScan [_1#2, _2#3]
{code}
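The cascade above can be modeled outside Spark as a dependency-tracked cache: each cached entry records which other cached entries its plan embeds, and uncaching an entry transitively drops everything built on top of it. This is only an illustrative sketch (the names `CacheModel`, `cache`, `uncache` are made up here), not Spark's actual CacheManager API.

```scala
// Minimal sketch of cascading cache invalidation. cache2's physical
// plan embeds cache1's InMemoryRelation, so dropping cache1 must also
// drop cache2, or cache2 would scan a relation that no longer exists.
object CacheModel {
  // cache name -> names of the cached relations its plan reads from
  private var deps = Map.empty[String, Set[String]]

  def cache(name: String, readsFrom: Set[String] = Set.empty): Unit =
    deps += name -> readsFrom

  def uncache(name: String): Unit = {
    deps -= name
    // transitively drop every entry whose plan embedded the removed cache
    val dependents = deps.collect { case (n, ds) if ds.contains(name) => n }
    dependents.foreach(uncache)
  }

  def cached: Set[String] = deps.keySet
}
```

With `cache("cache1")` and `cache("cache2", Set("cache1"))`, calling `uncache("cache1")` leaves the model empty, mirroring the behaviour reported in this ticket.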
Probably, you'd better do this instead; if both caches are built from a plain (uncached) temporary view, they stay independent of each other:
{code}
scala> sql("create temporary view tmp1 as select * from base where age >= 25")
scala> sql("cache table cache1 as select * from tmp1")
scala> sql("cache table cache2 as select * from tmp1 where name = 'p1'")
scala> spark.catalog.dropTempView("cache1")
// scala> sql("select * from cache1").explain --> Throws AnalysisException `Table or view not found`
scala> sql("select * from cache2").explain
== Physical Plan ==
InMemoryTableScan [name#5, age#6]
   +- InMemoryRelation [name#5, age#6], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), `cache2`
         +- *Project [_1#2 AS name#5, _2#3 AS age#6]
            +- *Filter (((_2#3 >= 25) && isnotnull(_1#2)) && (_1#2 = p1))
               +- LocalTableScan [_1#2, _2#3]
{code}
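The difference between the two setups comes down to the plan dependency graph: in the original script `cache2` embeds `cache1`, while in the workaround both caches embed only the plain view `tmp1`, which is not itself a cache entry. A small self-contained sketch of that contrast (the object and function names here are illustrative, not Spark internals):

```scala
// For each cached table, record which *other cached tables* its
// physical plan embeds. Dropping a cache takes down exactly the
// entries that embed it.
object CacheDeps {
  // original setup: cache2's plan embeds cache1's InMemoryRelation
  val chained = Map("cache1" -> Set.empty[String], "cache2" -> Set("cache1"))
  // workaround: both caches are built over the uncached view tmp1,
  // so there is no cache-to-cache edge
  val independent = Map("cache1" -> Set.empty[String], "cache2" -> Set.empty[String])

  // caches that survive uncaching `name` (single-level dependencies)
  def afterUncache(deps: Map[String, Set[String]], name: String): Set[String] =
    deps.keySet.filter(n => n != name && !deps(n).contains(name))
}
```

In the chained setup, uncaching `cache1` leaves nothing; in the independent setup, `cache2` survives, which is exactly what the second `explain` output shows.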
> dropTempView has a critical BUG
> -------------------------------
>
> Key: SPARK-21579
> URL: https://issues.apache.org/jira/browse/SPARK-21579
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Reporter: ant_nebula
> Priority: Critical
>
> When I call dropTempView on dwd_table1 only, the dependent table dwd_table2 also disappears from http://127.0.0.1:4040/storage/.
> This affects versions 2.1.1 and 2.2.0; 2.1.0 does not have this problem.
> {code:java}
> val spark = SparkSession.builder.master("local").appName("sparkTest").getOrCreate()
> val rows = Seq(Row("p1", 30), Row("p2", 20), Row("p3", 25), Row("p4", 10), Row("p5", 40), Row("p6", 15))
> val schema = new StructType().add(StructField("name", StringType)).add(StructField("age", IntegerType))
> val rowRDD = spark.sparkContext.parallelize(rows, 3)
> val df = spark.createDataFrame(rowRDD, schema)
> df.createOrReplaceTempView("ods_table")
> spark.sql("cache table ods_table")
> spark.sql("cache table dwd_table1 as select * from ods_table where age>=25")
> spark.sql("cache table dwd_table2 as select * from dwd_table1 where name='p1'")
> spark.catalog.dropTempView("dwd_table1")
> //spark.catalog.dropTempView("ods_table")
> spark.sql("select * from dwd_table2").show()
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)