
Posted to user@spark.apache.org by fanooos <de...@gmail.com> on 2015/03/17 15:25:47 UTC

org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException

I have a Hadoop cluster and I need to query the data stored on HDFS using the
Spark SQL Thrift server.

The Spark SQL Thrift server is up and running and is configured to read from a
Hive table. The Hive table is an external table that corresponds to a set of
files stored on HDFS. These files contain JSON data.

I am connecting to the Spark SQL Thrift server using Beeline. When I execute a
simple query like *select * from mytable limit 3*, everything works fine.

But when I try to execute other queries, like *select count(*) from mytable*,
the following exception is thrown:

*org.apache.hadoop.hive.serde2.SerDeException:
org.codehaus.jackson.JsonParseException: Unrecognized character escape ' '
(code 32) at [Source: java.io.StringReader@34ef429a; line: 1, column: 351]*

What I understand from the exception is that some of the files contain
corrupted JSON.
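A quick way to see what this kind of message usually points at: "Unrecognized character escape ' ' (code 32)" means the parser met a backslash followed by a space inside a JSON string. The JSON grammar only allows a backslash to be followed by ", \, /, b, f, n, r, t or u, so a strict parser rejects such a record. The sketch below uses Python's standard json module rather than the Jackson parser the Hive SerDe uses, so the error wording differs; the two sample records are made up for illustration.

```python
import json

# Hypothetical records, written as raw strings so the backslashes
# reach the parser exactly as they would appear in the data files.
GOOD = r'{"path": "C:\\temp", "note": "tab\there"}'  # \\ and \t are legal escapes
BAD = r'{"path": "C:\ temp"}'                         # backslash + space: illegal

def check(record):
    """Return None if the record parses, otherwise the parser's error message."""
    try:
        json.loads(record)
        return None
    except json.JSONDecodeError as err:
        return err.msg

print(check(GOOD))  # None
print(check(BAD))   # a message starting with: Invalid \escape
```

If a record like BAD turns up in the data, the file is not valid JSON, even if it looks correct at a glance.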

Question 1: Do I understand this correctly?
Question 2: How can I find the file(s) causing this problem, given that I have
about 3,000 files and each file contains about 700 lines of JSON data?
Question 3: If I am sure that the files on HDFS contain valid JSON data, what
should I do?
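For question 2, one possible approach is to parse every line with a strict JSON parser and report the file, line and column of each failure. This is a minimal sketch under two assumptions: the table's files have first been copied to a local directory (for example with hdfs dfs -get), and each line holds exactly one JSON record, which is the layout line-oriented Hive JSON SerDes read. It uses Python's json module, not Jackson, so the two may disagree on rare edge cases. The script name and directory in the usage comment are hypothetical. At roughly 3,000 files x 700 lines, that is about 2.1 million records, which a single process can scan.

```python
import json
import sys
from pathlib import Path

def find_bad_records(path):
    """Yield (line_number, column, message, snippet) for each line of `path`
    that is not valid JSON. Assumes one JSON record per line, so a blank
    line is reported too."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.rstrip("\r\n")
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                # Show about 20 characters on each side of the failure point.
                start = max(err.pos - 20, 0)
                yield lineno, err.colno, err.msg, line[start:err.pos + 20]

def scan_directory(root):
    """Walk `root` recursively and collect every bad record found."""
    problems = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            for lineno, col, msg, snippet in find_bad_records(path):
                problems.append((str(path), lineno, col, msg, snippet))
    return problems

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python find_bad_json.py /local/copy/of/mytable
    for path, lineno, col, msg, snippet in scan_directory(sys.argv[1]):
        print(f"{path}:{lineno}:{col}: {msg} near {snippet!r}")
```

The column it reports can be compared with the "column: 351" in the Hive error to confirm that the same record is involved.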

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/org-apache-hadoop-hive-serde2-SerDeException-org-codehaus-jackson-JsonParseException-tp22103.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
