Posted to issues@spark.apache.org by "Ahmed ZAROUI (JIRA)" <ji...@apache.org> on 2018/02/16 14:19:00 UTC
[jira] [Updated] (SPARK-23448) Dataframe returns wrong result when
column don't respect datatype
[ https://issues.apache.org/jira/browse/SPARK-23448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ahmed ZAROUI updated SPARK-23448:
---------------------------------
Summary: Dataframe returns wrong result when column don't respect datatype (was: Data encoding problem when not finding the right type)
> Dataframe returns wrong result when column don't respect datatype
> -----------------------------------------------------------------
>
> Key: SPARK-23448
> URL: https://issues.apache.org/jira/browse/SPARK-23448
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.0.2
> Environment: Local
> Reporter: Ahmed ZAROUI
> Priority: Major
>
> I have the following JSON file, which contains some noisy data (in the first record, attr2 is a String instead of an Array):
>
> {code:java}
> {"attr1":"val1","attr2":"[\"val2\"]"}
> {"attr1":"val1","attr2":["val2"]}
> {code}
> I need to specify the schema programmatically, like this:
>
> {code:scala}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.types._
>
> implicit val spark = SparkSession
>   .builder()
>   .master("local[*]")
>   .config("spark.ui.enabled", false)
>   .config("spark.sql.caseSensitive", "True")
>   .getOrCreate()
> import spark.implicits._
>
> // attr2 is declared as an array of strings
> val schema = StructType(
>   Seq(StructField("attr1", StringType, true),
>       StructField("attr2", ArrayType(StringType, true), true)))
>
> // `input` is the path to the JSON file shown above
> spark.read.schema(schema).json(input).collect().foreach(println)
> {code}
> The result given by this code is:
> {code:java}
> [null,null]
> [val1,WrappedArray(val2)]
> {code}
> Instead of putting null only in the corrupted column (attr2), all columns of the first record are null, so the valid value of attr1 is lost.
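>
> To inspect which input line was treated as malformed, a minimal sketch follows. It assumes the JSON reader's PERMISSIVE mode together with the columnNameOfCorruptRecord option, and it adds a string field named _corrupt_record to the schema above. The file path and the object name are hypothetical and used only for illustration. This is a diagnostic aid under those assumptions, not a fix for the reported behaviour.
>
> {code:scala}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.types._
>
> object CorruptRecordSketch {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession
>       .builder()
>       .master("local[*]")
>       .appName("corrupt-record-sketch")
>       .getOrCreate()
>
>     // Hypothetical location of the two-line JSON file from this report
>     val input = "/tmp/spark-23448/input.json"
>
>     // Same schema as in the report, plus a string column that receives
>     // the raw text of any record the parser could not fit to the schema
>     val schema = StructType(Seq(
>       StructField("attr1", StringType, nullable = true),
>       StructField("attr2", ArrayType(StringType, containsNull = true), nullable = true),
>       StructField("_corrupt_record", StringType, nullable = true)))
>
>     spark.read
>       .schema(schema)
>       .option("mode", "PERMISSIVE")
>       .option("columnNameOfCorruptRecord", "_corrupt_record")
>       .json(input)
>       .show(false) // print all columns without truncating the raw record
>
>     spark.stop()
>   }
> }
> {code}
> With this schema, the raw text of a record that does not match the declared types should appear in _corrupt_record, so the original value of attr1 can still be recovered from it.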
>
>