Posted to user@spark.apache.org by Jerry <je...@gmail.com> on 2016/02/23 18:02:45 UTC

Fast way to parse JSON in Spark

Hi,
I wrote a Java parser using GSON and packaged it as a Java library (e.g.
messageparserLib.jar). I use this library in Spark Streaming to parse the
incoming JSON messages. This is very slow, and there is a lot of lag in
parsing and inserting messages into Cassandra.
What is the fastest way to parse JSON messages in Spark on the fly? My JSON
messages are complex; I want to extract over 30 fields, wrap them in a
case class, and then store them in Cassandra in structured format.
Some candidate solutions come to mind:
(1) Use Spark SQL to register a temp table and then select the fields I
want to wrap in the case class.
(2) Use the Scala standard library, e.g.
"scala.util.parsing.json.JSON.parseFull", to parse the messages and extract
the fields to map to the case class.
(3) Use a third-party library such as play-json or lift-json to parse the
messages and then extract the fields to map to the case class.
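For option (2), the extraction step might look like the sketch below. It assumes the parser has already produced a Map[String, Any] (the shape scala.util.parsing.json.JSON.parseFull returns for a JSON object); the field names and the Event case class are hypothetical placeholders for the real 30-field schema:

```scala
// Hypothetical target case class (stand-in for the real 30-field schema).
case class Event(id: String, temperature: Double, city: String)

// Map a parsed JSON object (Map[String, Any]) into the case class,
// returning None if any field is missing or has the wrong type.
def toEvent(m: Map[String, Any]): Option[Event] =
  for {
    id   <- m.get("id").collect { case s: String => s }
    temp <- m.get("temperature").collect { case d: Double => d }
    city <- m.get("city").collect { case s: String => s }
  } yield Event(id, temp, city)

// Example input, standing in for JSON.parseFull's output on one message.
val parsed: Map[String, Any] =
  Map("id" -> "m-1", "temperature" -> 21.5, "city" -> "Oslo")

println(toEvent(parsed)) // Some(Event(m-1,21.5,Oslo))
```

The Option-based matching keeps malformed messages from throwing inside the streaming job; they can be filtered out or routed to a dead-letter sink instead.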
The JSON messages come from a Kafka consumer at over 1,500 messages per
second, so the message processing (parsing and writing to Cassandra) also
needs to keep up at that rate (1,500/second).
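One pattern that often helps at this throughput, whichever parser is chosen, is processing per partition so any expensive parser setup is paid once per batch rather than once per message. A minimal sketch of the idea (plain Scala, no Spark dependency; in Spark Streaming the equivalent would be mapPartitions on each RDD):

```scala
// Hypothetical output record (stand-in for the real case class).
case class Record(id: String)

// In Spark this body would run inside rdd.mapPartitions { iter => ... }:
// construct the (possibly expensive) parser once per partition, then
// reuse it lazily for every message in the iterator.
def processPartition(messages: Iterator[String]): Iterator[Record] = {
  val parser: String => Record = s => Record(s.trim) // built once, reused
  messages.map(parser)
}

val out = processPartition(Iterator(" a ", "b")).toList
println(out) // List(Record(a), Record(b))
```

The same per-partition scoping is also the usual place to hold a reusable Cassandra session for the writes.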

I would appreciate any help and advice.

Thanks in advance,
Jerry



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Fast-way-to-parse-JSON-in-Spark-tp26306.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org