Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2019/11/11 17:07:00 UTC
[jira] [Commented] (SPARK-29806) Using multiline option for a JSON
file which is not multiline results in silent truncation of data.
[ https://issues.apache.org/jira/browse/SPARK-29806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971730#comment-16971730 ]
Hyukjin Kwon commented on SPARK-29806:
--------------------------------------
{{multiLine}} in the JSON source currently supports only a single JSON object or a single JSON array per file.
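For illustration, the same two records can be wrapped in one JSON array, which is one of the two shapes named above. This is only a sketch for spark-shell (where {{spark}} is predefined); the directory /tmp/json_array and the file name records.json are hypothetical and not from the report.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical location, chosen for this sketch only.
val dir = Paths.get("/tmp/json_array")
Files.createDirectories(dir)

// One top-level JSON array holding both records from the report.
val content =
  """[
    |  {"name":"John", "id":"100"},
    |  {"name":"Marry","id":"200"}
    |]""".stripMargin
Files.write(dir.resolve("records.json"), content.getBytes(StandardCharsets.UTF_8))

// Same read as in the report, pointed at the array-shaped file.
spark.read
  .option("multiLine", true)
  .option("mode", "FAILFAST")
  .format("json")
  .load(dir.toString)
  .show(false)
```

With this layout the multiLine read is expected to return one row per array element rather than only the first record.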
> Using multiline option for a JSON file which is not multiline results in silent truncation of data.
> ---------------------------------------------------------------------------------------------------
>
> Key: SPARK-29806
> URL: https://issues.apache.org/jira/browse/SPARK-29806
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Dilip Biswal
> Priority: Major
>
> The content of the input JSON file:
> {code:java}
> {"name":"John", "id":"100"}
> {"name":"Marry","id":"200"}{code}
> The above is a valid JSON file, but every record is on a single line. Reading this file
> with the multiLine option in FAILFAST mode results in data truncation without any error.
> {code:java}
> scala> spark.read.option("multiLine", true).option("mode", "FAILFAST").format("json").load("/tmp/json").show(false)
> +---+----+
> |id |name|
> +---+----+
> |100|John|
> +---+----+
> scala> spark.read.option("mode", "FAILFAST").format("json").load("/tmp/json").show(false)
> +---+-----+
> |id |name |
> +---+-----+
> |100|John |
> |200|Marry|
> +---+-----+{code}
> I think Spark should return an error in this case, especially in FAILFAST mode. This can be a common user error, and we should not silently truncate data.
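Until Spark reports this case itself, a caller could check the input before choosing the multiLine option. The following is a minimal sketch of such a check, not part of Spark: {{TopLevelJsonCounter}} and {{countTopLevelValues}} are hypothetical names, the input is assumed to be well-formed JSON, and only top-level objects and arrays are counted (bare scalars are ignored).

```scala
// Hypothetical helper, not a Spark API: counts top-level JSON objects
// and arrays in a string by tracking nesting depth outside string literals.
object TopLevelJsonCounter {
  def countTopLevelValues(text: String): Int = {
    var depth = 0          // current nesting depth of {} and []
    var inString = false   // true while inside a JSON string literal
    var escaped = false    // true right after a backslash inside a string
    var count = 0          // number of objects/arrays opened at depth 0

    for (c <- text) {
      if (inString) {
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"' => inString = true
        case '{' | '[' =>
          if (depth == 0) count += 1
          depth += 1
        case '}' | ']' => depth -= 1
        case _ => // whitespace, digits, punctuation: nothing to track
      }
    }
    count
  }

  def main(args: Array[String]): Unit = {
    val lineDelimited =
      """{"name":"John", "id":"100"}""" + "\n" + """{"name":"Marry","id":"200"}"""
    val singleArray =
      """[{"name":"John", "id":"100"},{"name":"Marry","id":"200"}]"""

    println(countTopLevelValues(lineDelimited)) // 2: more than one top-level value
    println(countTopLevelValues(singleArray))   // 1: a single top-level array
  }
}
```

A count greater than 1 for a file's content suggests the file is line-delimited and should be read without the multiLine option. The sketch scans the whole content in memory, so it suits small files or samples only.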