Posted to issues@spark.apache.org by "Frederick Reiss (JIRA)" <ji...@apache.org> on 2015/05/01 18:18:06 UTC
[jira] [Commented] (SPARK-7273) The SQLContext.jsonFile() api has a problem when load a format json file?
[ https://issues.apache.org/jira/browse/SPARK-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523384#comment-14523384 ]
Frederick Reiss commented on SPARK-7273:
----------------------------------------
The error in the description indicates that there is a character in the middle of the first line of the JSON file that TextInputFormat treats as a line separator. Spark therefore sees the JSON content as several partial records, which is why the parser fails on the fragment "age" : "20"}.
I can think of two potential causes:
a) Steven's JSON content has run through a pretty-printing function, and there is a newline character between the two parts of the JSON object, or
b) Steven's local Hadoop/YARN configuration has a nonstandard setting for "textinputformat.record.delimiter".
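Either cause produces the same symptom: the JSON object is split across multiple text records, and each fragment is parsed on its own. A minimal Python sketch of that line-at-a-time behavior (a simulation of what Spark 1.x's jsonFile() does, not Spark's actual code; the file content is reconstructed from the reproduction in the description):

```python
import json

# Pretty-printed JSON: one object spread across three lines, as in the
# reporter's test.json.
pretty = '{ "name": "steven",\n"age" : "20"\n}'

def parse_per_line(text):
    """Parse each text line as an independent JSON record, the way
    Spark 1.x's jsonFile() treats its input."""
    results = []
    for line in text.split("\n"):
        try:
            results.append(json.loads(line))
        except ValueError:
            # Corresponds to Spark's "Failed to parse record ..." error.
            results.append(None)
    return results

# Every fragment of the pretty-printed object fails to parse.
print(parse_per_line(pretty))  # → [None, None, None]

# The same object on a single line parses fine.
print(parse_per_line(json.dumps(json.loads(pretty))))
# → [{'name': 'steven', 'age': '20'}]
```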
[~jiege]: Can you share a copy of your JSON file?
Technical details:
SQLContext.jsonFile() makes a call to org.apache.spark.sql.json.DefaultSource, which delegates the task to org.apache.spark.sql.json.JSONRelation, which uses SparkContext.textFile() to open the JSON file. SparkContext.textFile() uses TextInputFormat to read the file.
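Given that call chain, a practical workaround on Spark 1.x is to read the whole file in one piece (e.g. via SparkContext.wholeTextFiles) and re-emit it as one JSON object per line before handing it to jsonFile(). A sketch of that normalization step in plain Python (normalize_to_json_lines is a hypothetical helper name, not a Spark API):

```python
import json

def normalize_to_json_lines(raw):
    """Collapse a pretty-printed JSON document (a single object or an
    array of objects) into the one-record-per-line form that Spark 1.x's
    jsonFile() expects."""
    parsed = json.loads(raw)
    records = parsed if isinstance(parsed, list) else [parsed]
    return "\n".join(json.dumps(r) for r in records)

pretty = '{ "name": "steven",\n"age" : "20"\n}'
print(normalize_to_json_lines(pretty))  # → {"name": "steven", "age": "20"}
```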
> The SQLContext.jsonFile() api has a problem when load a format json file?
> -------------------------------------------------------------------------
>
> Key: SPARK-7273
> URL: https://issues.apache.org/jira/browse/SPARK-7273
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: steven
> Priority: Minor
>
> my code is as follows:
> val df = sqlContext.jsonFile("test.json");
> test.json content is:
> { "name": "steven",
> "age" : "20"
> }
> the jsonFile invocation throws an exception as follows:
> java.lang.RuntimeException: Failed to parse record "age" : "20"}. Please make sure that each line of the file (or each string in the RDD) is a valid JSON object or an array of JSON objects.
> at scala.sys.package$.error(package.scala:27)
> at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:313)
> at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:307)
> Is it a bug?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)