Posted to user@hive.apache.org by Tom Brown <to...@gmail.com> on 2012/08/08 17:04:06 UTC

Question about querying JSON data

I have a large amount of JSON data that was generated using
periods in the key names, e.g., {"category.field": "value"}.  I know
that's not the best way to do JSON but for better or worse, it's the
data I have to deal with.

I have tried using get_json_object, but I am concerned that its JSON
path expressions interpret "." as a special character. I am also
concerned about the overhead of repeatedly parsing each record (each
record is about 2K, so not tiny, but not huge either).

I have tried using Hive-JSON-Serde, but it seems to require that my
column names match my JSON field names.

I had heard that there was a serde somewhere that will allow me to
specify a JSON path to map to each specific field name, but other than
vague references on the mailing list, I haven't found any concrete
info about it.
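
To illustrate the kind of mapping described above, here is a minimal
Python sketch (not a Hive serde, and not any existing library's API).
The column names, the "user.id" key, and the helper extract_row are
hypothetical. It assumes each record is a flat JSON object, parses the
record once, and looks up each key as a literal string, so the "." is
never treated as a path separator.

```python
import json

# Hypothetical mapping: Hive-style column name -> literal JSON key.
# The keys contain periods but are looked up as plain strings.
COLUMN_TO_KEY = {
    "category_field": "category.field",
    "user_id": "user.id",
}


def extract_row(line, mapping=COLUMN_TO_KEY):
    """Parse one JSON record a single time and return column -> value.

    Keys missing from the record map to None.
    """
    record = json.loads(line)
    return {column: record.get(key) for column, key in mapping.items()}


if __name__ == "__main__":
    row = extract_row('{"category.field": "value", "user.id": 42}')
    print(row)
```

Because every column is read from the same parsed object, the cost of
parsing is paid once per record rather than once per extracted field.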

I would prefer to use existing code, but I can write my own serde if I have to.

What do you recommend?

Thanks in advance!

--Tom