Posted to issues@spark.apache.org by "Aoyuan Liao (Jira)" <ji...@apache.org> on 2020/10/21 00:47:00 UTC
[jira] [Commented] (SPARK-25985) Verify the SPARK-24613 Cache with UDF could not be matched with subsequent dependent caches
[ https://issues.apache.org/jira/browse/SPARK-25985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218019#comment-17218019 ]
Aoyuan Liao commented on SPARK-25985:
-------------------------------------
[~smilegator] I think recacheByCondition doesn't keep the cached plan. The following test would fail:
{code:java}
test("SPARK-24613 Cache with UDF could not be matched with subsequent dependent caches") {
val udf1 = udf({x: Int => x + 1})
val df = spark.range(0, 10).toDF("a").withColumn("b", udf1($"a"))
val df2 = df.agg(sum(df("b")))
df.cache()
df.count()
df2.cache()
df.unpersist() // recacheByCondition is called within unpersist()
val plan = df2.queryExecution.withCachedData
assert(plan.isInstanceOf[InMemoryRelation])
val internalPlan = plan.asInstanceOf[InMemoryRelation].cacheBuilder.cachedPlan
assert(internalPlan.find(_.isInstanceOf[InMemoryTableScanExec]).isDefined)
}
{code}
The second assertion fails, which means the data of df2 is still cached, but its rebuilt plan no longer contains an InMemoryTableScanExec, i.e. it does not read from the upstream cache of df.
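For reference, the behavior above can be inspected interactively in spark-shell (a sketch, assuming Spark 3.x; it reuses the same DataFrames as the test above). Printing `withCachedData` after the unpersist shows whether the rebuilt plan for df2 still scans df's InMemoryRelation:

{code:java}
import org.apache.spark.sql.functions.{sum, udf}

val udf1 = udf({ x: Int => x + 1 })
val df = spark.range(0, 10).toDF("a").withColumn("b", udf1($"a"))
val df2 = df.agg(sum(df("b")))

df.cache(); df.count()   // materialize df's cache
df2.cache()
df.unpersist()            // triggers recacheByCondition for df2

// The plan with cached data substituted in. If no InMemoryTableScan
// appears under df2's InMemoryRelation, the re-cached plan was rebuilt
// without matching df's cache, which is what the failing test shows.
println(df2.queryExecution.withCachedData)
{code}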
> Verify the SPARK-24613 Cache with UDF could not be matched with subsequent dependent caches
> -------------------------------------------------------------------------------------------
>
> Key: SPARK-25985
> URL: https://issues.apache.org/jira/browse/SPARK-25985
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 3.0.0
> Reporter: Xiao Li
> Priority: Major
> Labels: starter
>
> Verify whether recacheByCondition works well when the cache data is with UDF. This is a follow-up of https://github.com/apache/spark/pull/21602
--
This message was sent by Atlassian Jira
(v8.3.4#803005)