Posted to issues@spark.apache.org by "Jen-Ming Chung (JIRA)" <ji...@apache.org> on 2017/09/13 04:56:00 UTC
[jira] [Comment Edited] (SPARK-21989) createDataset and the schema of encoder class
[ https://issues.apache.org/jira/browse/SPARK-21989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164152#comment-16164152 ]
Jen-Ming Chung edited comment on SPARK-21989 at 9/13/17 4:55 AM:
-----------------------------------------------------------------
Hi [~client.test],
I wrote the above code in Scala and ran it on Spark 2.2.0, and it shows the schema and content you expected.
{code:java}
import com.google.gson.Gson
import org.apache.spark.rdd.RDD

case class SimpleData(str: String)
...
import spark.implicits._

val arr = Seq("{\"str\": \"everyone\"}", "{\"str\": \"Hello\"}")
// Parse each JSON string into a SimpleData instance with Gson
val rdd: RDD[SimpleData] =
  spark
    .sparkContext
    .parallelize(arr)
    .map(v => new Gson().fromJson[SimpleData](v, classOf[SimpleData]))

// The case class schema is derived by the implicit product encoder
val ds = spark.createDataset(rdd)

ds.printSchema()
// root
//  |-- str: string (nullable = true)

ds.show(false)
// +--------+
// |str     |
// +--------+
// |everyone|
// |Hello   |
// +--------+
{code}
> createDataset and the schema of encoder class
> ---------------------------------------------
>
> Key: SPARK-21989
> URL: https://issues.apache.org/jira/browse/SPARK-21989
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: taiho choi
>
> Hello.
> {code:java}
> public class SampleData implements Serializable {
>     public String str;
> }
>
> ArrayList<String> arr = new ArrayList<>();
> arr.add("{\"str\": \"everyone\"}");
> arr.add("{\"str\": \"Hello\"}");
> // Assumes sc is a JavaSparkContext and sqc is an SQLContext
> JavaRDD<SampleData> data2 = sc.parallelize(arr).map(v -> new Gson().fromJson(v, SampleData.class));
> Dataset<SampleData> df = sqc.createDataset(data2.rdd(), Encoders.bean(SampleData.class));
> df.printSchema();
> {code}
> The expected result of printSchema is the str field of the SampleData class.
> The actual result is the following:
> root
> And if I call df.show(), it displays the following:
> ++
> ||
> ++
> ||
> ||
> ++
> What I expected is that "Hello" and "everyone" would be displayed.
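A plausible explanation, offered as an assumption rather than anything confirmed in this thread: Encoders.bean derives its schema from JavaBean properties (getter/setter pairs), not from public fields. Under that assumption, a class like SampleData that exposes only `public String str` has no bean properties, which would produce the empty schema shown above. The following JDK-only sketch does not use Spark. It calls java.beans.Introspector to list the read/write properties of a field-only class shaped like SampleData and of a hypothetical WithAccessors variant. The names BeanPropertyDemo, FieldOnly, WithAccessors and readWriteProperties are illustrative only.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class BeanPropertyDemo {

    // Same shape as the reporter's SampleData: a public field, no accessors.
    public static class FieldOnly implements Serializable {
        public String str;
    }

    // Hypothetical variant that follows the JavaBean getter/setter convention.
    public static class WithAccessors implements Serializable {
        private String str;

        public String getStr() { return str; }

        public void setStr(String str) { this.str = str; }
    }

    // Lists JavaBean properties that have both a getter and a setter,
    // skipping the implicit read-only "class" property from getClass().
    public static List<String> readWriteProperties(Class<?> beanClass) {
        BeanInfo info;
        try {
            info = Introspector.getBeanInfo(beanClass);
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException("Cannot introspect " + beanClass, e);
        }
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            if (pd.getReadMethod() != null && pd.getWriteMethod() != null
                    && !"class".equals(pd.getName())) {
                names.add(pd.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("FieldOnly:     " + readWriteProperties(FieldOnly.class));
        System.out.println("WithAccessors: " + readWriteProperties(WithAccessors.class));
    }
}
```

Running main prints an empty list (`[]`) for FieldOnly and `[str]` for WithAccessors. If the assumption holds, adding getStr()/setStr(String) to SampleData should make printSchema() report the str column.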
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)