Posted to issues@spark.apache.org by "Ahmed ZAROUI (JIRA)" <ji...@apache.org> on 2018/02/16 13:40:00 UTC

[jira] [Created] (SPARK-23448) Data encoding problem when not finding the right type

Ahmed ZAROUI created SPARK-23448:
------------------------------------

             Summary: Data encoding problem when not finding the right type
                 Key: SPARK-23448
                 URL: https://issues.apache.org/jira/browse/SPARK-23448
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.2
         Environment: Tested locally on a Linux machine
            Reporter: Ahmed ZAROUI


I have the following JSON file that contains some noisy data (a String instead of an Array):

{code:java}
{"attr1":"val1","attr2":["val2"]} 
{"attr1":"val1","attr2":"[\"val2\"]"}
{code}
And I need to specify the schema programmatically, like this:

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

implicit val spark = SparkSession
  .builder()
  .master("local[*]")
  .config("spark.ui.enabled", false)
  .config("spark.sql.caseSensitive", "True")
  .getOrCreate()
import spark.implicits._

val schema = StructType(Seq(
  StructField("attr1", StringType, true),
  StructField("attr2", ArrayType(StringType, true), true)
))

// input is the path to the JSON file shown above
spark.read.schema(schema).json(input).collect().foreach(println)
{code}
The result given by this code is:
{code:java}
[null,null]
[val1,WrappedArray(val2)]
{code}
Instead of putting null only in the corrupted column, all columns of the first message are null.
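
For comparison, here is a minimal sketch of one way to see which input line Spark treats as corrupt: add a string column for corrupt records to the schema and read with the columnNameOfCorruptRecord option pointing at it. The column name "_corrupt_record" and the input path are placeholders, and the exact behaviour may vary between Spark versions:
{code:java}
// Sketch only: keep the raw text of records Spark fails to convert
// by adding a corrupt-record column to the user-specified schema.
import org.apache.spark.sql.types._

val schemaWithCorrupt = StructType(Seq(
  StructField("attr1", StringType, true),
  StructField("attr2", ArrayType(StringType, true), true),
  StructField("_corrupt_record", StringType, true)
))

spark.read
  .schema(schemaWithCorrupt)
  .option("mode", "PERMISSIVE")                            // keep the row, null out the typed columns
  .option("columnNameOfCorruptRecord", "_corrupt_record")  // raw malformed line lands here
  .json(input)
  .collect()
  .foreach(println)

// Alternatives: .option("mode", "DROPMALFORMED") silently drops such rows,
// and .option("mode", "FAILFAST") throws an exception instead of returning nulls.
{code}
With the extra column, the noisy line shows up with its original text preserved, which makes it possible to tell which record produced the all-null row.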

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org