Posted to issues@spark.apache.org by "Aoyuan Liao (Jira)" <ji...@apache.org> on 2020/10/21 00:47:00 UTC

[jira] [Commented] (SPARK-25985) Verify the SPARK-24613 Cache with UDF could not be matched with subsequent dependent caches

    [ https://issues.apache.org/jira/browse/SPARK-25985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218019#comment-17218019 ] 

Aoyuan Liao commented on SPARK-25985:
-------------------------------------

[~smilegator] I think recacheByCondition doesn't keep the cached plan. The following test would fail:
{code:java}
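// Assumes a Spark SQL test suite with `spark`, its implicits, and
// org.apache.spark.sql.functions.{sum, udf} in scope.
test("SPARK-24613 Cache with UDF could not be matched with subsequent dependent caches") {
    val udf1 = udf({x: Int => x + 1})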
    val df = spark.range(0, 10).toDF("a").withColumn("b", udf1($"a"))
    val df2 = df.agg(sum(df("b")))
    df.cache()
    df.count()
    df2.cache()

    df.unpersist() // recacheByCondition is called internally

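    // df2 should still resolve to its cache entry after df is unpersisted.
    val plan = df2.queryExecution.withCachedData
    assert(plan.isInstanceOf[InMemoryRelation])

    // The plan stored in df2's cache entry should still scan an in-memory table.
    val internalPlan = plan.asInstanceOf[InMemoryRelation].cacheBuilder.cachedPlan
    assert(internalPlan.find(_.isInstanceOf[InMemoryTableScanExec]).isDefined)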
}
{code}
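The second assertion failed: df2 still resolves to an InMemoryRelation, but its cachedPlan no longer contains an InMemoryTableScanExec. In other words, the data is cached while the plan is not.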

> Verify the SPARK-24613 Cache with UDF could not be matched with subsequent dependent caches
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25985
>                 URL: https://issues.apache.org/jira/browse/SPARK-25985
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Tests
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Priority: Major
>              Labels: starter
>
> Verify whether recacheByCondition works correctly when the cached data involves a UDF. This is a follow-up to https://github.com/apache/spark/pull/21602


