Posted to issues@spark.apache.org by "Dilip Biswal (Jira)" <ji...@apache.org> on 2019/11/09 01:21:00 UTC
[jira] [Created] (SPARK-29806) Using multiline option for a JSON
file which is not multiline results in silent truncation of data.
Dilip Biswal created SPARK-29806:
------------------------------------
Summary: Using multiline option for a JSON file which is not multiline results in silent truncation of data.
Key: SPARK-29806
URL: https://issues.apache.org/jira/browse/SPARK-29806
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.4
Reporter: Dilip Biswal
The content of the input JSON file:
{code:java}
{"name":"John", "id":"100"}
{"name":"Marry","id":"200"}{code}
The above is valid JSON input, but every record is on a single line (line-delimited JSON). Reading this file
with the multiLine option in FAILFAST mode truncates the data without raising any error.
{code:java}
scala> spark.read.option("multiLine", true).option("mode", "FAILFAST").format("json").load("/tmp/json").show(false)
+---+----+
|id |name|
+---+----+
|100|John|
+---+----+
scala> spark.read.option("mode", "FAILFAST").format("json").load("/tmp/json").show(false)
+---+-----+
|id |name |
+---+-----+
|100|John |
|200|Marry|
+---+-----+{code}
I think Spark should return an error in this case, especially in FAILFAST mode. This is likely to be a common user error, and we should not silently truncate data.
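
Until Spark reports this case itself, a caller could guard against it with a pre-check. The following is only a minimal sketch in plain Scala, not part of Spark: the object TopLevelJsonCheck and both of its methods are hypothetical names introduced here for illustration. It counts top-level JSON objects or arrays in a string by tracking bracket depth and skipping brackets inside string literals. It does not validate the JSON and it ignores top-level scalars.

```scala
// Hypothetical helper, not a Spark API: a rough pre-check for input that
// holds more than one top-level JSON value (for example, line-delimited JSON).
object TopLevelJsonCheck {

  /** Counts top-level JSON objects and arrays in `text`.
    * Brackets inside string literals are ignored; escapes such as \" are honoured.
    * This is a heuristic scan, not a JSON validator.
    */
  def countTopLevelValues(text: String): Int = {
    var depth = 0          // current nesting depth of {} and []
    var inString = false   // true while inside a JSON string literal
    var escaped = false    // true right after a backslash inside a string
    var count = 0          // number of top-level objects/arrays seen so far

    for (c <- text) {
      if (inString) {
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"' => inString = true
        case '{' | '[' =>
          if (depth == 0) count += 1   // a new top-level value starts here
          depth += 1
        case '}' | ']' =>
          if (depth > 0) depth -= 1    // tolerate unbalanced input
        case _ => ()
      }
    }
    count
  }

  /** True when the input holds more than one top-level value, i.e. reading it
    * with multiLine enabled would be suspicious.
    */
  def hasMultipleTopLevelValues(text: String): Boolean =
    countTopLevelValues(text) > 1
}
```

For the two-record file shown above, hasMultipleTopLevelValues would return true, so the caller could raise an error or fall back to the default line-delimited read instead of setting multiLine. The check needs the file contents as a string, so it is only practical for small files or a sampled prefix.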