Posted to issues@spark.apache.org by "jalendhar Baddam (JIRA)" <ji...@apache.org> on 2017/07/05 04:52:00 UTC

[jira] [Comment Edited] (SPARK-21299) except is throwing the fallowing exception after perform dropDuplicates on the Dataset object

    [ https://issues.apache.org/jira/browse/SPARK-21299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074241#comment-16074241 ] 

jalendhar Baddam edited comment on SPARK-21299 at 7/5/17 4:51 AM:
------------------------------------------------------------------

We are still getting the issue.
Dataset<Row> ds = spark.read().table("tab1");
ds = ds.dropDuplicates("colname");
Dataset<Row> ds1 = ds.limit(10);
ds = ds.except(ds1); // this call throws the exception from the issue description

I am using version 2.1.1:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.1</version>
    <scope>provided</scope>
</dependency>


> except is throwing the fallowing exception after perform dropDuplicates on the Dataset object
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21299
>                 URL: https://issues.apache.org/jira/browse/SPARK-21299
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.0
>         Environment: spark 2.1.0
>            Reporter: jalendhar Baddam
>
> INFO: org.apache.spark.sql.AnalysisException: resolved attribute(s) test_customer_CustID#569 missing from test_customer_ROW_NUM#589L,test_customer_CustID#590,test_customer_Telephone#598L,test_customer_HouseholdID#593,test_customer_Gender#592,test_customer_Title#599,test_customer_Surname#597,test_customer_Occupation#596,test_customer_DOB#591,test_customer_Initials#595,test_customer_Income#594 in operator !Filter (cast(test_customer_CustID#569 as double) > cast(1000 as double));;
> INFO: Except
> INFO: :- Project [test_customer_ROW_NUM#212L, test_customer_CustID#213, test_customer_DOB#214, test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217, test_customer_Initials#218, test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L, test_customer_Title#222]
> INFO: :  +- Sort [test_customer_ROW_NUM#212L ASC NULLS FIRST], true
> INFO: :     +- Project [test_customer_ROW_NUM#212L, test_customer_CustID#213, test_customer_DOB#214, test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217, test_customer_Initials#218, test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L, test_customer_Title#222]
> INFO: :        +- SubqueryAlias 1922a657-80bd-41a5-8e1f-04a248263e47
> INFO: :           +- Aggregate [test_customer_ROW_NUM#212L, test_customer_CustID#213, test_customer_DOB#214, test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217, test_customer_Initials#218, test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L, test_customer_Title#222], [test_customer_ROW_NUM#212L, test_customer_CustID#213, test_customer_DOB#214, test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217, test_customer_Initials#218, test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L, test_customer_Title#222]
> INFO: :              +- Project [test_customer_ROW_NUM#212L, test_customer_CustID#213, test_customer_DOB#214, test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217, test_customer_Initials#218, test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L, test_customer_Title#222]
> INFO: :                 +- Project [test_customer_ROW_NUM#212L, test_customer_CustID#213, test_customer_DOB#214, test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217, test_customer_Initials#218, test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L, test_customer_Title#222]
> INFO: :                    +- Aggregate [test_customer_Gender#215], [first(test_customer_ROW_NUM#212L, false) AS test_customer_ROW_NUM#212L, first(test_customer_CustID#213, false) AS test_customer_CustID#213, first(test_customer_DOB#214, false) AS test_customer_DOB#214, test_customer_Gender#215, first(test_customer_HouseholdID#216, false) AS test_customer_HouseholdID#216, first(test_customer_Income#217, false) AS test_customer_Income#217, first(test_customer_Initials#218, false) AS test_customer_Initials#218, first(test_customer_Occupation#219, false) AS test_customer_Occupation#219, first(test_customer_Surname#220, false) AS test_customer_Surname#220, first(test_customer_Telephone#221L, false) AS test_customer_Telephone#221L, first(test_customer_Title#222, false) AS test_customer_Title#222]
> INFO: :                       +- Project [test_customer_ROW_NUM#212L, test_customer_CustID#213, test_customer_DOB#214, test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217, test_customer_Initials#218, test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L, test_customer_Title#222]
> INFO: :                          +- Filter (cast(test_customer_CustID#213 as double) > cast(1000 as double))
> INFO: :                             +- Project [ROW_NUM#47L AS test_customer_ROW_NUM#212L, CustID#48 AS test_customer_CustID#213, DOB#49 AS test_customer_DOB#214, Gender#50 AS test_customer_Gender#215, HouseholdID#51 AS test_customer_HouseholdID#216, Income#52 AS test_customer_Income#217, Initials#53 AS test_customer_Initials#218, Occupation#54 AS test_customer_Occupation#219, Surname#55 AS test_customer_Surname#220, Telephone#56L AS test_customer_Telephone#221L, Title#57 AS test_customer_Title#222]
> INFO: :                                +- SubqueryAlias customer
> INFO: :                                   +- Relation[ROW_NUM#47L,CustID#48,DOB#49,Gender#50,HouseholdID#51,Income#52,Initials#53,Occupation#54,Surname#55,Telephone#56L,Title#57] parquet
> INFO: +- Project [test_customer_ROW_NUM#568L, test_customer_CustID#569, test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572, test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L, test_customer_Title#577]
> INFO:    +- GlobalLimit 0
> INFO:       +- LocalLimit 0
> INFO:          +- Sort [test_customer_ROW_NUM#568L ASC NULLS FIRST], true
> INFO:             +- Project [test_customer_ROW_NUM#568L, test_customer_CustID#569, test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572, test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L, test_customer_Title#577]
> INFO:                +- SubqueryAlias 1922a657-80bd-41a5-8e1f-04a248263e47
> INFO:                   +- Aggregate [test_customer_ROW_NUM#568L, test_customer_CustID#569, test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572, test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L, test_customer_Title#577], [test_customer_ROW_NUM#568L, test_customer_CustID#569, test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572, test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L, test_customer_Title#577]
> INFO:                      +- Project [test_customer_ROW_NUM#568L, test_customer_CustID#569, test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572, test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L, test_customer_Title#577]
> INFO:                         +- Project [test_customer_ROW_NUM#568L, test_customer_CustID#569, test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572, test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L, test_customer_Title#577]
> INFO:                            +- Project [test_customer_ROW_NUM#568L, test_customer_CustID#569, test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572, test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L, test_customer_Title#577]
> INFO:                               +- Aggregate [test_customer_Gender#592], [first(test_customer_ROW_NUM#568L, false) AS test_customer_ROW_NUM#568L, first(test_customer_CustID#569, false) AS test_customer_CustID#569, first(test_customer_DOB#570, false) AS test_customer_DOB#570, test_customer_Gender#592, first(test_customer_HouseholdID#571, false) AS test_customer_HouseholdID#571, first(test_customer_Income#572, false) AS test_customer_Income#572, first(test_customer_Initials#573, false) AS test_customer_Initials#573, first(test_customer_Occupation#574, false) AS test_customer_Occupation#574, first(test_customer_Surname#575, false) AS test_customer_Surname#575, first(test_customer_Telephone#576L, false) AS test_customer_Telephone#576L, first(test_customer_Title#577, false) AS test_customer_Title#577]
> INFO:                                  +- Project [test_customer_ROW_NUM#568L, test_customer_CustID#569, test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572, test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L, test_customer_Title#577]
> INFO:                                     +- !Project [test_customer_ROW_NUM#568L, test_customer_CustID#569, test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572, test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L, test_customer_Title#577]
> INFO:                                        +- !Filter (cast(test_customer_CustID#569 as double) > cast(1000 as double))
> INFO:                                           +- Project [ROW_NUM#47L AS test_customer_ROW_NUM#589L, CustID#48 AS test_customer_CustID#590, DOB#49 AS test_customer_DOB#591, Gender#50 AS test_customer_Gender#592, HouseholdID#51 AS test_customer_HouseholdID#593, Income#52 AS test_customer_Income#594, Initials#53 AS test_customer_Initials#595, Occupation#54 AS test_customer_Occupation#596, Surname#55 AS test_customer_Surname#597, Telephone#56L AS test_customer_Telephone#598L, Title#57 AS test_customer_Title#599]
> INFO:                                              +- SubqueryAlias customer
> INFO:                                                 +- Relation[ROW_NUM#47L,CustID#48,DOB#49,Gender#50,HouseholdID#51,Income#52,Initials#53,Occupation#54,Surname#55,Telephone#56L,Title#57] parquet
> INFO: 
> INFO: 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
> INFO: 	at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:57)
> INFO: 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:337)
> INFO: 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:128)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67)
> INFO: 	at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:57)
> INFO: 	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:48)
> INFO: 	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
> INFO: 	at org.apache.spark.sql.Dataset.withSetOperator(Dataset.scala:2834)
> INFO: 	at org.apache.spark.sql.Dataset.except(Dataset.scala:1652)
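
The first line of the trace can be read as a set difference: the operator marked with "!" (here the Filter) references an attribute ID that the operator directly beneath it does not output. The sketch below is only a simplified model of that reading, not Spark's actual analyzer code. The class name MissingAttributeCheck and the method missing are hypothetical, and attributes are represented as plain "name#id" strings copied from the plan above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model (an assumption, not Spark's implementation) of the check
// behind "resolved attribute(s) ... missing from ... in operator ...".
final class MissingAttributeCheck {

    // Returns the attributes an operator references that its child does not output.
    static Set<String> missing(List<String> referenced, List<String> childOutput) {
        Set<String> result = new LinkedHashSet<>(referenced);
        result.removeAll(childOutput);
        return result;
    }

    public static void main(String[] args) {
        // Right-hand branch of the Except: the failing !Filter references CustID#569,
        // but the Project beneath it re-aliases the columns with IDs #589L to #599.
        List<String> failingFilterRefs = Arrays.asList("test_customer_CustID#569");
        List<String> rightProjectOutput = Arrays.asList(
            "test_customer_ROW_NUM#589L", "test_customer_CustID#590",
            "test_customer_DOB#591", "test_customer_Gender#592",
            "test_customer_HouseholdID#593", "test_customer_Income#594",
            "test_customer_Initials#595", "test_customer_Occupation#596",
            "test_customer_Surname#597", "test_customer_Telephone#598L",
            "test_customer_Title#599");
        System.out.println(missing(failingFilterRefs, rightProjectOutput));

        // Left-hand branch: the Filter references CustID#213, which its child outputs,
        // so nothing is missing there.
        List<String> healthyFilterRefs = Arrays.asList("test_customer_CustID#213");
        List<String> leftProjectOutput = Arrays.asList(
            "test_customer_ROW_NUM#212L", "test_customer_CustID#213",
            "test_customer_DOB#214", "test_customer_Gender#215");
        System.out.println(missing(healthyFilterRefs, leftProjectOutput));
    }
}
```

Read this way, the plan shows that the columns under the right-hand branch were given new IDs (#589L to #599, with only Gender#592 carried upward), while the Filter above them still points at the old CustID#569.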


