Posted to issues@spark.apache.org by "Jen-Ming Chung (JIRA)" <ji...@apache.org> on 2017/09/13 04:56:00 UTC

[jira] [Commented] (SPARK-21989) createDataset and the schema of encoder class

    [ https://issues.apache.org/jira/browse/SPARK-21989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164152#comment-16164152 ] 

Jen-Ming Chung commented on SPARK-21989:
----------------------------------------

Hi [~client.test],
I rewrote your code in Scala and ran it on Spark 2.2.0; it shows the schema and content you expected.

{code:scala}
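  import com.google.gson.Gson
  import org.apache.spark.rdd.RDD

  case class SimpleData(str: String)
  ...
  import spark.implicits._
  // Same JSON strings as in the report.
  val arr = Seq("{\"str\": \"everyone\"}", "{\"str\": \"Hello\"}")
  // Parse each JSON string into a SimpleData instance with Gson.
  val rdd: RDD[SimpleData] =
    spark
      .sparkContext
      .parallelize(arr)
      .map(v => new Gson().fromJson[SimpleData](v, classOf[SimpleData]))
  // The encoder for the case class is provided by spark.implicits._
  val ds = spark.createDataset(rdd)
  ds.printSchema()
  // root
  //   |-- str: string (nullable = true)

  ds.show(false)
  // +--------+
  // |str     |
  // +--------+
  // |everyone|
  // |Hello   |
  // +--------+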
{code}
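
One possible explanation for the difference, offered as an assumption rather than something confirmed in this thread: Encoders.bean derives the schema from JavaBean properties (getter/setter pairs), and the SampleData class in the report exposes only a public field. The sketch below uses only java.beans.Introspector from the JDK, not Spark itself, to show that such a class has no bean properties. The names BeanPropertyCheck, FieldOnly, WithAccessors and readWriteProperties are hypothetical and exist only for illustration.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class BeanPropertyCheck {

    // Same shape as the class in the report: a public field, no accessors.
    public static class FieldOnly implements Serializable {
        public String str;
    }

    // Hypothetical variant that follows the JavaBean convention.
    public static class WithAccessors implements Serializable {
        private String str;
        public String getStr() { return str; }
        public void setStr(String str) { this.str = str; }
    }

    // Returns the names of properties that have both a getter and a setter.
    // Introspection stops at Object so the built-in "class" property is skipped.
    public static List<String> readWriteProperties(Class<?> cls) {
        try {
            BeanInfo info = Introspector.getBeanInfo(cls, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWriteProperties(FieldOnly.class));     // prints []
        System.out.println(readWriteProperties(WithAccessors.class)); // prints [str]
    }
}
```

If this is indeed the cause, making str private and adding getStr()/setStr() to SampleData would be the first thing to try with Encoders.bean.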

> createDataset and the schema of encoder class
> ---------------------------------------------
>
>                 Key: SPARK-21989
>                 URL: https://issues.apache.org/jira/browse/SPARK-21989
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: taiho choi
>
> Hello.
> public class SampleData implements Serializable {
>     public String str;
> }
>         ArrayList<String> arr = new ArrayList<>();
>         arr.add("{\"str\": \"everyone\"}");
>         arr.add("{\"str\": \"Hello\"}");
>         JavaRDD<SampleData> data2 = sc.parallelize(arr).map(v -> {return new Gson().fromJson(v, SampleData.class);});
>         Dataset<SampleData> df = sqc.createDataset(data2.rdd(), Encoders.bean(SampleData.class));
>         df.printSchema();
> The expected result of printSchema is the str field of the SampleData class.
> The actual result is the following:
> root
> If I call df.show(), it displays the following:
> ++
> ||
> ++
> ||
> ||
> ++
> What I expected is that "everyone" and "Hello" would be displayed.


