Posted to issues@spark.apache.org by "Amit Baghel (JIRA)" <ji...@apache.org> on 2016/10/15 07:09:20 UTC
[jira] [Created] (SPARK-17952) Java SparkSession createDataFrame doesn't work with nested Javabean
Amit Baghel created SPARK-17952:
-----------------------------------
Summary: Java SparkSession createDataFrame doesn't work with nested Javabean
Key: SPARK-17952
URL: https://issues.apache.org/jira/browse/SPARK-17952
Project: Spark
Issue Type: Bug
Affects Versions: 2.0.1, 2.0.0
Reporter: Amit Baghel
As per the latest Spark documentation at http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection, the createDataFrame method supports nested JavaBeans. However, this does not work. Please see the code below.
{code}
import java.io.Serializable;

public class SubCategory implements Serializable {
  private String id;
  private String name;

  public String getId() {
    return id;
  }

  public void setId(String id) {
    this.id = id;
  }

  public String getName() {
    return name;
  }

  public void setName(String name) {
    this.name = name;
  }
}
{code}
{code}
import java.io.Serializable;

public class Category implements Serializable {
  private String id;
  private SubCategory subCategory;

  public String getId() {
    return id;
  }

  public void setId(String id) {
    this.id = id;
  }

  public SubCategory getSubCategory() {
    return subCategory;
  }

  public void setSubCategory(SubCategory subCategory) {
    this.subCategory = subCategory;
  }
}
{code}
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSample {
  public static void main(String[] args) throws IOException {
    SparkSession spark = SparkSession
        .builder()
        .appName("SparkSample")
        .master("local")
        .getOrCreate();

    // SubCategory
    SubCategory sub = new SubCategory();
    sub.setId("sc-111");
    sub.setName("Sub-1");

    // Category
    Category category = new Category();
    category.setId("s-111");
    category.setSubCategory(sub);

    // categoryList
    List<Category> categoryList = new ArrayList<Category>();
    categoryList.add(category);

    // DF
    Dataset<Row> dframe = spark.createDataFrame(categoryList, Category.class);
    dframe.show();
  }
}
{code}
The above code throws the error below.
{code}
Exception in thread "main" scala.MatchError: com.sample.SubCategory@e7391d (of class com.sample.SubCategory)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:256)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:251)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:403)
	at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1106)
	at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1106)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1106)
	at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1104)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$class.toStream(Iterator.scala:1322)
	at scala.collection.AbstractIterator.toStream(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
	at scala.collection.AbstractIterator.toSeq(Iterator.scala:1336)
	at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
	at com.sample.SparkSample.main(SparkSample.java:33)
{code}
The createDataFrame method throws the above error. However, I observed that the createDataset method works fine with the code below.
{code}
Encoder<Category> encoder = Encoders.bean(Category.class);
Dataset<Category> dframe = spark.createDataset(categoryList, encoder);
dframe.show();
{code}
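As a possible extension of this workaround (my own sketch, not part of the original report): if a Dataset<Row> is specifically needed, the bean-encoder path can be combined with toDF(), which exposes the same data as an untyped DataFrame and should keep the nested bean as a struct column. Continuing from the main method above:

{code}
// Sketch: build a typed Dataset with a bean encoder,
// then convert to an untyped DataFrame.
Encoder<Category> encoder = Encoders.bean(Category.class);
Dataset<Category> typed = spark.createDataset(categoryList, encoder);

// toDF() returns the same data as Dataset<Row>; the nested
// SubCategory should appear as a struct column in the schema.
Dataset<Row> untyped = typed.toDF();
untyped.printSchema();
untyped.show();
{code}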