Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/10/09 11:06:00 UTC

[jira] [Updated] (SPARK-17952) SparkSession createDataFrame method throws exception for nested JavaBeans

     [ https://issues.apache.org/jira/browse/SPARK-17952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-17952:
---------------------------------
    Affects Version/s: 2.3.0

> SparkSession createDataFrame method throws exception for nested JavaBeans
> -------------------------------------------------------------------------
>
>                 Key: SPARK-17952
>                 URL: https://issues.apache.org/jira/browse/SPARK-17952
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.0.1, 2.3.0
>            Reporter: Amit Baghel
>
> As per the latest Spark documentation for Java at http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection:
> {quote}
> Nested JavaBeans and List or Array fields are supported though.
> {quote}
> However, nested JavaBeans do not work. Please see the code below.
> SubCategory class
> {code}
> import java.io.Serializable;
> 
> public class SubCategory implements Serializable {
> 	private String id;
> 	private String name;
> 	
> 	public String getId() {
> 		return id;
> 	}
> 	public void setId(String id) {
> 		this.id = id;
> 	}
> 	public String getName() {
> 		return name;
> 	}
> 	public void setName(String name) {
> 		this.name = name;
> 	}	
> }
> {code}
> Category class
> {code}
> import java.io.Serializable;
> 
> public class Category implements Serializable {
> 	private String id;
> 	private SubCategory subCategory;
> 	
> 	public String getId() {
> 		return id;
> 	}
> 	public void setId(String id) {
> 		this.id = id;
> 	}
> 	public SubCategory getSubCategory() {
> 		return subCategory;
> 	}
> 	public void setSubCategory(SubCategory subCategory) {
> 		this.subCategory = subCategory;
> 	}
> }
> {code}
> SparkSample class
> {code}
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> 
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SparkSession;
> 
> public class SparkSample {
> 	public static void main(String[] args) throws IOException {
> 		SparkSession spark = SparkSession
> 				.builder()
> 				.appName("SparkSample")
> 				.master("local")
> 				.getOrCreate();
> 		//SubCategory
> 		SubCategory sub = new SubCategory();
> 		sub.setId("sc-111");
> 		sub.setName("Sub-1");
> 		//Category
> 		Category category = new Category();
> 		category.setId("s-111");
> 		category.setSubCategory(sub);
> 		//categoryList
> 		List<Category> categoryList = new ArrayList<Category>();
> 		categoryList.add(category);
> 		//DataFrame - this call fails with scala.MatchError on the nested bean
> 		Dataset<Row> dframe = spark.createDataFrame(categoryList, Category.class);
> 		dframe.show();
> 	}
> }
> {code}
> The above code throws the error below.
> {code}
> Exception in thread "main" scala.MatchError: com.sample.SubCategory@e7391d (of class com.sample.SubCategory)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:256)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:251)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:403)
> 	at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1106)
> 	at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1106)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> 	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> 	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
> 	at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1106)
> 	at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1104)
> 	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> 	at scala.collection.Iterator$class.toStream(Iterator.scala:1322)
> 	at scala.collection.AbstractIterator.toStream(Iterator.scala:1336)
> 	at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
> 	at scala.collection.AbstractIterator.toSeq(Iterator.scala:1336)
> 	at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
> 	at com.sample.SparkSample.main(SparkSample.java:33)
> {code}
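> The MatchError is thrown by Catalyst's StructConverter when it receives the raw com.sample.SubCategory bean instead of a converted row, i.e. the nested bean is never translated. One way to sidestep bean reflection entirely is to declare the schema explicitly and build the Rows by hand; a minimal workaround sketch (not from the original report), reusing the objects created above:
> {code}
> // Workaround sketch (assumption, not part of the original report): declare the
> // nested schema explicitly and build Rows by hand, so no bean reflection runs.
> // Additionally needs: import org.apache.spark.sql.RowFactory;
> //                     import org.apache.spark.sql.types.DataTypes;
> //                     import org.apache.spark.sql.types.StructType;
> StructType subCategorySchema = new StructType()
> 		.add("id", DataTypes.StringType)
> 		.add("name", DataTypes.StringType);
> StructType categorySchema = new StructType()
> 		.add("id", DataTypes.StringType)
> 		.add("subCategory", subCategorySchema);
> List<Row> rows = new ArrayList<Row>();
> rows.add(RowFactory.create(category.getId(),
> 		RowFactory.create(sub.getId(), sub.getName())));
> // createDataFrame with an explicit schema accepts nested Rows without error.
> Dataset<Row> manual = spark.createDataFrame(rows, categorySchema);
> manual.show();
> {code}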
> So createDataFrame throws the above exception for the nested bean, but I observed that the createDataset method works fine with the code below.
> {code}
> Encoder<Category> encoder = Encoders.bean(Category.class); 
> Dataset<Category> dframe = spark.createDataset(categoryList, encoder);
> dframe.show();
> {code}
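> A DataFrame can also be obtained through the same bean encoder by converting the typed Dataset with toDF(); a minimal sketch, again reusing categoryList:
> {code}
> // Sketch: Encoders.bean handles the nested SubCategory bean, so converting
> // the typed Dataset to a DataFrame yields the nested struct column that
> // createDataFrame(categoryList, Category.class) fails to produce.
> Encoder<Category> encoder = Encoders.bean(Category.class);
> Dataset<Row> df = spark.createDataset(categoryList, encoder).toDF();
> df.printSchema(); // expected: id (string), subCategory (struct<id:string,name:string>)
> df.show();
> {code}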


