Posted to issues@spark.apache.org by "Saurabh Santhosh (JIRA)" <ji...@apache.org> on 2016/04/27 08:44:12 UTC
[jira] [Created] (SPARK-14948) Exception when joining DataFrames derived from the same DataFrame
Saurabh Santhosh created SPARK-14948:
----------------------------------------
Summary: Exception when joining DataFrames derived from the same DataFrame
Key: SPARK-14948
URL: https://issues.apache.org/jira/browse/SPARK-14948
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.6.0
Reporter: Saurabh Santhosh
h2. The Spark analyzer throws the following exception when two DataFrames derived from the same DataFrame are joined:
h2. Exception :
org.apache.spark.sql.AnalysisException: resolved attribute(s) F1#3 missing from asd#5,F2#4,F1#6,F2#7 in operator !Project [asd#5,F1#3];
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
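One way to read the message: each attribute is printed as name#id, and the Project operator references F1#3 while its input only provides an F1 with a different id (F1#6). The sketch below is a plain-Java illustration of that comparison, not Spark code; the class MissingAttributes and its methods are hypothetical names introduced here, and the attribute lists are copied from the exception text above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper (not part of Spark): given the attribute lists printed in
 * an AnalysisException, reports which projected attributes the input lacks.
 */
final class MissingAttributes {
    private MissingAttributes() {}

    /** Splits a comma-separated list such as "asd#5,F1#3" into trimmed entries. */
    static List<String> parse(String attributes) {
        List<String> result = new ArrayList<>();
        for (String attribute : attributes.split(",")) {
            String trimmed = attribute.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    /** Returns the projected attributes that do not appear in the child output, in order. */
    static List<String> missing(String projected, String childOutput) {
        Set<String> available = new LinkedHashSet<>(parse(childOutput));
        List<String> result = new ArrayList<>();
        for (String attribute : parse(projected)) {
            // Match on the full "name#id" string: same name, different id is still a miss.
            if (!available.contains(attribute)) {
                result.add(attribute);
            }
        }
        return result;
    }
}
```

Calling MissingAttributes.missing("asd#5,F1#3", "asd#5,F2#4,F1#6,F2#7") yields a list containing only F1#3, which matches the attribute the exception reports as missing.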
h2. Code :
{code:title=SparkClient.java|borderStyle=solid}
StructField[] fields = new StructField[2];
fields[0] = new StructField("F1", DataTypes.StringType, true, Metadata.empty());
fields[1] = new StructField("F2", DataTypes.StringType, true, Metadata.empty());
JavaRDD<Row> rdd =
    sparkClient.getJavaSparkContext().parallelize(Arrays.asList(RowFactory.create("a", "b")));
DataFrame df = sparkClient.getSparkHiveContext().createDataFrame(rdd, new StructType(fields));
sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t1");
DataFrame aliasedDf = sparkClient.getSparkHiveContext().sql("select F1 as asd, F2 from t1");
sparkClient.getSparkHiveContext().registerDataFrameAsTable(aliasedDf, "t2");
sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t3");
DataFrame join = aliasedDf.join(df, aliasedDf.col("F2").equalTo(df.col("F2")), "inner");
DataFrame select = join.select(aliasedDf.col("asd"), df.col("F1"));
select.collect();
{code}
h2. Observations :
* The issue depends on the data type of the fields in the initial DataFrame: if the fields are not of type String, the join works.
* The join also works if the DataFrames are registered as temporary tables and the query is written in SQL (select a.asd, b.F1 from t2 a inner join t3 b on a.F2 = b.F2).