You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Ike Walker <ik...@flite.com> on 2013/05/28 16:56:12 UTC

Combining 2 JSON objects in Hive

Hello,

I have two JSON objects stored as strings in a Hive table.

I would like to combine them into a single JSON object in Hive.

I'm running Hive 0.7, but am planning to upgrade soon so a solution that works in Hive 0.8 could be acceptable as well.

For example, here's the data now:
+-------------------------------------+-------------------------------+
| col1                                | col2                          |
+-------------------------------------+-------------------------------+
| {"age":"Over 30","gender":"female"} | {"counter":"0","version":"1"  |
| {"age":"Over 30","gender":"female"} | {"counter":"4","version":"1"  |
| {"age":"Over 30","gender":"male"}   | {"counter":"10","version":"1" |
+-------------------------------------+-------------------------------+

And here's what I want to select:

{"age":"Over 30","gender":"female","counter":"0","version":"1"}
{"age":"Over 30","gender":"female","counter":"4","version":"1"}
{"age":"Over 30","gender":"male","counter":"10","version":"1"}

Thanks,
Ike Walker

Re: Combining 2 JSON objects in Hive

Posted by Stephen Sprague <sp...@gmail.com>.
I know of no way to do this purely natively within hive, however, don't let
that stop you.  Enter the transform() function.   Write your JSON merge
using python, perl, ruby or whatever floats your boat.

Don't let the gnarly syntax on this page scare you:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform

basically its just this

     SELECT TRANSFORM(stuff) USING 'script' AS (thing1 INT, thing2 INT)

as quoted from the last example on that page.


On Tue, May 28, 2013 at 7:56 AM, Ike Walker <ik...@flite.com> wrote:

> Hello,
>
> I have two JSON objects stored as strings in a Hive table.
>
> I would like to combine them into a single JSON object in Hive.
>
> I'm running Hive 0.7, but am planning to upgrade soon so a solution that
> works in Hive 0.8 could be acceptable as well.
>
> For example, here's the data now:
> +-------------------------------------+-------------------------------+
> | col1                                | col2                          |
> +-------------------------------------+-------------------------------+
> | {"age":"Over 30","gender":"female"} | {"counter":"0","version":"1"  |
> | {"age":"Over 30","gender":"female"} | {"counter":"4","version":"1"  |
> | {"age":"Over 30","gender":"male"}   | {"counter":"10","version":"1" |
> +-------------------------------------+-------------------------------+
>
> And here's what I want to select:
>
> {"age":"Over 30","gender":"female","counter":"0","version":"1"}
> {"age":"Over 30","gender":"female","counter":"4","version":"1"}
> {"age":"Over 30","gender":"male","counter":"10","version":"1"}
>
> Thanks,
> Ike Walker
>