Posted to issues@spark.apache.org by "sandeshyapuram (Jira)" <ji...@apache.org> on 2019/10/31 09:00:10 UTC

[jira] [Updated] (SPARK-29682) Failure when resolving conflicting references in Join:

     [ https://issues.apache.org/jira/browse/SPARK-29682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sandeshyapuram updated SPARK-29682:
-----------------------------------
    Description: 
When I try to self-join a parentDf with multiple childDfs (say childDf1 ... ...), where the childDfs are derived after a cube or rollup and are filtered based on the group-bys, I get an error:

{{Failure when resolving conflicting references in Join: }}

The accompanying error message is long and hard to read. On the other hand, if I replace cube or rollup with a plain groupBy, the same joins work without issues.

 

*Sample code:*
{code:java}
import org.apache.spark.sql.functions._  // col, lit, max, grouping_id

val numsDF = sc.parallelize(Seq(1,2,3,4,5,6)).toDF("nums")

val cubeDF = numsDF
    .cube("nums")
    .agg(
        max(lit(0)).as("agcol"),
        grouping_id().as("gid")
    )

val group0 = cubeDF.filter(col("gid") <=> lit(0))
val group1 = cubeDF.filter(col("gid") <=> lit(1))

cubeDF.printSchema
group0.printSchema
group1.printSchema

// Recreating cubeDF
cubeDF.select("nums").distinct
    .join(group0, Seq("nums"), "inner")
    .join(group1, Seq("nums"), "inner")
    .show
{code}
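
For comparison, the groupBy variant that the description says works might look like the sketch below. This is an illustration under assumptions, not the reporter's actual code: grouping_id() is only valid with cube, rollup, or grouping sets, so a literal 0 stands in for the "gid" column, and the local SparkSession setup and the names groupedDF and joined are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, max}

// Local session for a standalone run; in spark-shell this returns the existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("groupby-self-join-sketch")
  .getOrCreate()
import spark.implicits._

val numsDF = Seq(1, 2, 3, 4, 5, 6).toDF("nums")

// Assumption: a literal 0 replaces grouping_id(), which plain groupBy does not support.
val groupedDF = numsDF
  .groupBy("nums")
  .agg(max(lit(0)).as("agcol"))
  .withColumn("gid", lit(0))

val group0 = groupedDF.filter(col("gid") <=> lit(0))
val group1 = groupedDF.filter(col("gid") <=> lit(1)) // empty here, since gid is always 0

// Same join shape as the cube example: the parent joined with two filtered children.
val joined = groupedDF.select("nums").distinct
  .join(group0, Seq("nums"), "inner")
  .join(group1, Seq("nums"), "inner")

joined.show()
```

Because "gid" is constant in this sketch, group1 has no rows and the joined result is empty. The point is only that the plan has the same self-join shape as the cube example.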



> Failure when resolving conflicting references in Join:
> ------------------------------------------------------
>
>                 Key: SPARK-29682
>                 URL: https://issues.apache.org/jira/browse/SPARK-29682
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 2.4.3
>            Reporter: sandeshyapuram
>            Priority: Major
>


