Posted to issues@spark.apache.org by "Ahmed ZAROUI (JIRA)" <ji...@apache.org> on 2018/02/16 13:40:00 UTC
[jira] [Created] (SPARK-23448) Data encoding problem when not finding the right type
Ahmed ZAROUI created SPARK-23448:
------------------------------------
Summary: Data encoding problem when not finding the right type
Key: SPARK-23448
URL: https://issues.apache.org/jira/browse/SPARK-23448
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.0.2
Environment: Tested locally in linux machine
Reporter: Ahmed ZAROUI
I have the following JSON file, which contains some noisy data (a String instead of an Array):
{code:java}
{"attr1":"val1","attr2":["val2"]}
{"attr1":"val1","attr2":"[\"val2\"]"}
{code}
And I need to specify the schema programmatically, like this:
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

implicit val spark = SparkSession
  .builder()
  .master("local[*]")
  .config("spark.ui.enabled", false)
  .config("spark.sql.caseSensitive", "true")
  .getOrCreate()
import spark.implicits._

val schema = StructType(Seq(
  StructField("attr1", StringType, nullable = true),
  StructField("attr2", ArrayType(StringType, containsNull = true), nullable = true)))

// `input` is the path to the JSON file above
spark.read.schema(schema).json(input).collect().foreach(println)
{code}
The result given by this code is:
{code:java}
[null,null]
[val1,WrappedArray(val2)]
{code}
Instead of putting null only in the corrupted column (attr2), all columns of the malformed record are null.
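
For reference, Spark's default PERMISSIVE parsing mode is documented to null out the entire row for a record it cannot decode, keeping the raw text in a corrupt-record column if the schema declares one. A possible workaround (a sketch, not part of the original report; the file path is hypothetical, and the per-read `columnNameOfCorruptRecord` option may require a newer Spark version than 2.0.2) is to add that column to the schema so the bad line can at least be inspected and repaired:
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Same schema as above, plus a column to capture the raw text of malformed records
val schema = StructType(Seq(
  StructField("attr1", StringType, nullable = true),
  StructField("attr2", ArrayType(StringType, containsNull = true), nullable = true),
  StructField("_corrupt_record", StringType, nullable = true)))

spark.read
  .schema(schema)
  .option("mode", "PERMISSIVE") // keep malformed rows instead of dropping/failing
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .json("input.json") // hypothetical path to the file above
  .show(false)
{code}
With this, the row whose attr2 is a String rather than an Array should still have null data columns, but its original JSON text is preserved in _corrupt_record rather than being silently lost.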
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)