Posted to dev@flink.apache.org by Norman Spangenberg <wir12kqe@studserv.uni-leipzig.de> on 2014/08/06 17:15:14 UTC

parse json-file with scala-api and json4s

Hello,
I hope this is the right place for this question.
I'm currently experimenting with and comparing Flink/Stratosphere and
Apache Spark.
My goal is to analyse large JSON files of Twitter data, and I'm now
looking for a way to parse the JSON tuples in a map function and put
them into a DataSet.
For this I'm using the Flink Scala API and json4s.
But in Flink the problem is parsing the JSON file:

val words = cleaned.map { line => parse(line) }

The error message is:
     Error analyzing UDT org.json4s.JValue:
       Subtype org.json4s.JsonAST.JInt - Field num: BigInt - Unsupported type BigInt
       Subtype org.json4s.JsonAST.JArray - Field arr: List[org.json4s.JsonAST.JValue] - Subtype org.json4s.JsonAST.JInt - Field num: BigInt - Unsupported type BigInt
       Subtype org.json4s.JsonAST.JArray - Field arr: List[org.json4s.JsonAST.JValue] - Subtype org.json4s.JsonAST.JDecimal - Field num: BigDecimal - Unsupported type BigDecimal
       Subtype org.json4s.JsonAST.JDecimal - Field num: BigDecimal - Unsupported type BigDecimal
       Subtype org.json4s.JsonAST.JObject - Field obj: List[(String, org.json4s.JsonAST.JValue)] - Field _2: org.json4s.JsonAST.JValue - Subtype org.json4s.JsonAST.JInt - Field num: BigInt - Unsupported type BigInt
       Subtype org.json4s.JsonAST.JObject - Field obj: List[(String, org.json4s.JsonAST.JValue)] - Field _2: org.json4s.JsonAST.JValue - Subtype org.json4s.JsonAST.JDecimal - Field num: BigDecimal - Unsupported type BigDecimal


In Spark I found a way, based on
https://gist.github.com/cotdp/b471cfff183b59d65ae1:

val user_interest = lines.map(line => parse(line))
                         .map { json =>
                           implicit lazy val formats = org.json4s.DefaultFormats
                           val name = (json \ "name").extract[String]
                           val location_x = (json \ "location" \ "x").extract[Double]
                           val location_y = (json \ "location" \ "y").extract[Double]
                           val likes = (json \ "likes").extract[Seq[String]].map(_.toLowerCase()).mkString(";")
                           UserInterest(name, location_x, location_y, likes)
                         }

This works fine in Spark, but is it possible to do the same with Flink?

Kind regards,
Norman

Re: parse json-file with scala-api and json4s

Posted by Aljoscha Krettek <al...@apache.org>.
Hi Norman,
right now it is only possible to use primitive types and case classes (of
which tuples are a special case) as data types in the Scala API. Your program
should work if you omit the second map function and instead put that code in
your first map function. This way you avoid having the custom JSON type
(org.json4s.JValue) as the data type of your first intermediate DataSet.
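
For example, something along these lines should do it (untested sketch; I'm
assuming the same UserInterest case class and json4s imports as in your Spark
snippet):

import org.json4s._
import org.json4s.native.JsonMethods._  // brings parse(...) into scope

// Hypothetical case class, modelled on your Spark snippet; adjust to your data.
case class UserInterest(name: String, x: Double, y: Double, likes: String)

val user_interest = lines.map { line =>
  // json4s needs an implicit Formats in scope for extract[...]
  implicit lazy val formats = org.json4s.DefaultFormats
  val json = parse(line)
  val name = (json \ "name").extract[String]
  val location_x = (json \ "location" \ "x").extract[Double]
  val location_y = (json \ "location" \ "y").extract[Double]
  val likes = (json \ "likes").extract[Seq[String]].map(_.toLowerCase()).mkString(";")
  // The DataSet's element type is now the case class, not org.json4s.JValue,
  // so the type analysis succeeds.
  UserInterest(name, location_x, location_y, likes)
}

Since the JValue never leaves the map function, Flink only has to analyze the
case class type.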

Just let me know if you need more information.

Aljoscha

