You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "Subramanian, Sanjay (HQP)" <sa...@roberthalf.com> on 2014/08/21 21:12:03 UTC

Denormalizing JSON arrays

Hey guys

How do I denormalize the JSON arrays with a select statement ?

Thanks

Warm Regards

sanjay


DDL
===

USE sansub01

;

ADD JAR

     ./json-serde-1.3-SNAPSHOT-jar-with-dependencies.jar

;

DROP  TABLE IF EXISTS

     res_score

;

CREATE EXTERNAL TABLE res_score (

     uniqueResumeIdentifier STRING,

     resumeLastModified STRING,

     resumeProcessedOn STRING,

     resume_scores array<struct<

                       funcrole_id:INT,

                          funcrole_name:STRING,

                          lob_id:INT,

  lob_name:STRING,

                          score:INT

                          >>

) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'  STORED AS TEXTFILE LOCATION '/data/resume_scores'

;

SAMPLE DATA
===========

{

    "uniqueResumeIdentifier": “1008552662”,

    "resumeLastModified": "2014-08-05_T_14-01-41",

    "resumeProcessedOn": "2014-08-21_T_11-21-12",

    "resume_scores": [

        {

            "funcrole_id": "1000139",

            "funcrole_name": “foo1",

            "lob_id": "5",

            "lob_name": “lobn5",

            "score": 2176

        },

        {

            "funcrole_id": "1000137",

            "funcrole_name": “foo2",

            "lob_id": "5",

            "lob_name": "lobn5",

            "score": 1774

        },

        {

            "funcrole_id": "1000138",

            "funcrole_name": “foo3",

            "lob_id": “4",

            "lob_name": "lobn4",

            "score": 2300

        },

        {

            "funcrole_id": "1000929",

            "funcrole_name": “foo4",

            "lob_id": "1",

            "lob_name": "lobn1",

            "score": 2878

        }

    ]

}


DATA FORMAT that I want with SELECT

===================================

1008552662 2014-08-05_T_14-01-41 2014-08-21_T_11-21-12 1000139 foo1 5 lobn5 2176

1008552662 2014-08-05_T_14-01-41 2014-08-21_T_11-21-12 1000137 foo2 5 lobn5 1774

1008552662 2014-08-05_T_14-01-41 2014-08-21_T_11-21-12 1000138 foo3 4 lobn4 2300

1008552662 2014-08-05_T_14-01-41 2014-08-21_T_11-21-12 1000929 foo4 1 lobn1 2878



MY ENVIRONMENT

==============

Version Summary
Cluster 1 — CDH 5
Hosts
hadoop[01-03]
Component       Version Release CDH Version
Bigtop-Tomcat (CDH 5 only)      0.7.0+cdh5.1.0+0        1.cdh5.1.0.p0.25        CDH 5
Crunch (CDH 5 only)     0.10.0+cdh5.1.0+14      1.cdh5.1.0.p0.25        CDH 5
Flume NG        1.5.0+cdh5.1.0+10       1.cdh5.1.0.p0.26        CDH 5
MapReduce 1     2.3.0+cdh5.1.0+795      1.cdh5.1.0.p0.58        CDH 5
Hadoop  2.3.0+cdh5.1.0+795      1.cdh5.1.0.p0.58        CDH 5
HDFS    2.3.0+cdh5.1.0+795      1.cdh5.1.0.p0.58        CDH 5
HttpFS  2.3.0+cdh5.1.0+795      1.cdh5.1.0.p0.58        CDH 5
MapReduce 2     2.3.0+cdh5.1.0+795      1.cdh5.1.0.p0.58        CDH 5
YARN    2.3.0+cdh5.1.0+795      1.cdh5.1.0.p0.58        CDH 5
HBase   0.98.1+cdh5.1.0+64      1.cdh5.1.0.p0.34        CDH 5
Lily HBase Indexer      1.5+cdh5.1.0+12 1.cdh5.1.0.p0.41        CDH 5
Hive    0.12.0+cdh5.1.0+369     1.cdh5.1.0.p0.39        CDH 5
HCatalog        0.12.0+cdh5.1.0+369     1.cdh5.1.0.p0.39        CDH 5
Hue     3.6.0+cdh5.1.0+86       1.cdh5.1.0.p0.36        CDH 5
Impala  1.4.0+cdh5.1.0+0        1.cdh5.1.0.p0.92        CDH 5
Kite (CDH 5 only)       0.10.0+cdh5.1.0+120     1.cdh5.1.0.p0.30        CDH 5
Llama (CDH 5 only)      1.0.0+cdh5.1.0+0        1.cdh5.1.0.p0.25        CDH 5
Mahout  0.9+cdh5.1.0+11 1.cdh5.1.0.p0.23        CDH 5
Oozie   4.0.0+cdh5.1.0+249      1.cdh5.1.0.p0.28        CDH 5
Parquet 1.2.5+cdh5.1.0+130      1.cdh5.1.0.p0.26        CDH 5
Pig     0.12.0+cdh5.1.0+33      1.cdh5.1.0.p0.23        CDH 5
sentry  1.3.0+cdh5.1.0+155      1.cdh5.1.0.p0.59        CDH 5
Solr    4.4.0+cdh5.1.0+231      1.cdh5.1.0.p0.32        CDH 5
spark   1.0.0+cdh5.1.0+41       1.cdh5.1.0.p0.27        CDH 5
Sqoop2  1.99.3+cdh5.1.0+26      1.cdh5.1.0.p0.22        CDH 5
Sqoop   1.4.4+cdh5.1.0+55       1.cdh5.1.0.p0.24        CDH 5
Whirr   0.9.0+cdh5.1.0+9        1.cdh5.1.0.p0.21        CDH 5
Zookeeper       3.4.5+cdh5.1.0+29       1.cdh5.1.0.p0.31        CDH 5
Cloudera Manager Management Daemons     5.1.1   1.cm511.p0.82   Not applicable
Java 6  java version "1.6.0_31" Java(TM) SE Runtime Environment (build 1.6.0_31-b04) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)     Unavailable     Not applicable
Java 7  java version "1.7.0_55" Java(TM) SE Runtime Environment (build 1.7.0_55-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)    Unavailable     Not applicable
Cloudera Manager Agent  5.1.1   1.cm511.p0.82   Not applicable



Re: Denormalizing JSON arrays

Posted by "Subramanian, Sanjay (HQP)" <sa...@roberthalf.com>.
Ok guys

Figured it out

Looks like the collective Hive Groups positive thought waves are helping me immensely – LOL

After many experiments , this is the HQL that worked beautifully . I had forgotten that explode can take an array as a param as well !


HIVE QUERY

==========

use sansub01

;


add jar ./json-serde-1.3-SNAPSHOT-jar-with-dependencies.jar

;

SELECT

        l.uniqueresumeidentifier,

        l.resumelastmodified,

        l.resumeprocessedon,

        l.lid.lob_id,

        l.lid.funcrole_id,

        l.lid.score

FROM

     (select

        uniqueresumeidentifier,

        resumelastmodified,

        resumeprocessedon,

        lid

     from

        res_score

     lateral view explode

        (res_score.resume_scores) oc1

     AS

        lid) l

;


Thanks

Warm Regards

sanjay



From: <Subramanian>, Sanjay Subramanian <sa...@roberthalf.com>>
Reply-To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Date: Thursday, August 21, 2014 at 12:12 PM
To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Subject: Denormalizing JSON arrays

Hey guys

How do I denormalize the JSON arrays with a select statement ?

Thanks

Warm Regards

sanjay


DDL
===

USE sansub01

;

ADD JAR

     ./json-serde-1.3-SNAPSHOT-jar-with-dependencies.jar

;

DROP  TABLE IF EXISTS

     res_score

;

CREATE EXTERNAL TABLE res_score (

     uniqueResumeIdentifier STRING,

     resumeLastModified STRING,

     resumeProcessedOn STRING,

     resume_scores array<struct<

                       funcrole_id:INT,

                          funcrole_name:STRING,

                          lob_id:INT,

  lob_name:STRING,

                          score:INT

                          >>

) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'  STORED AS TEXTFILE LOCATION '/data/resume_scores'

;

SAMPLE DATA
===========

{

    "uniqueResumeIdentifier": “1008552662”,

    "resumeLastModified": "2014-08-05_T_14-01-41",

    "resumeProcessedOn": "2014-08-21_T_11-21-12",

    "resume_scores": [

        {

            "funcrole_id": "1000139",

            "funcrole_name": “foo1",

            "lob_id": "5",

            "lob_name": “lobn5",

            "score": 2176

        },

        {

            "funcrole_id": "1000137",

            "funcrole_name": “foo2",

            "lob_id": "5",

            "lob_name": "lobn5",

            "score": 1774

        },

        {

            "funcrole_id": "1000138",

            "funcrole_name": “foo3",

            "lob_id": “4",

            "lob_name": "lobn4",

            "score": 2300

        },

        {

            "funcrole_id": "1000929",

            "funcrole_name": “foo4",

            "lob_id": "1",

            "lob_name": "lobn1",

            "score": 2878

        }

    ]

}


DATA FORMAT that I want with SELECT

===================================

10085526622014-08-05_T_14-01-412014-08-21_T_11-21-121000139 foo15 lobn52176

10085526622014-08-05_T_14-01-412014-08-21_T_11-21-121000137 foo25 lobn51774

10085526622014-08-05_T_14-01-412014-08-21_T_11-21-121000138 foo34 lobn42300

10085526622014-08-05_T_14-01-412014-08-21_T_11-21-121000929 foo41 lobn12878



MY ENVIRONMENT

==============

Version Summary
Cluster 1 — CDH 5
Hosts
hadoop[01-03]
Component       Version Release CDH Version
Bigtop-Tomcat (CDH 5 only)      0.7.0+cdh5.1.0+0        1.cdh5.1.0.p0.25        CDH 5
Crunch (CDH 5 only)     0.10.0+cdh5.1.0+14      1.cdh5.1.0.p0.25        CDH 5
Flume NG        1.5.0+cdh5.1.0+10       1.cdh5.1.0.p0.26        CDH 5
MapReduce 1     2.3.0+cdh5.1.0+795      1.cdh5.1.0.p0.58        CDH 5
Hadoop  2.3.0+cdh5.1.0+795      1.cdh5.1.0.p0.58        CDH 5
HDFS    2.3.0+cdh5.1.0+795      1.cdh5.1.0.p0.58        CDH 5
HttpFS  2.3.0+cdh5.1.0+795      1.cdh5.1.0.p0.58        CDH 5
MapReduce 2     2.3.0+cdh5.1.0+795      1.cdh5.1.0.p0.58        CDH 5
YARN    2.3.0+cdh5.1.0+795      1.cdh5.1.0.p0.58        CDH 5
HBase   0.98.1+cdh5.1.0+64      1.cdh5.1.0.p0.34        CDH 5
Lily HBase Indexer      1.5+cdh5.1.0+12 1.cdh5.1.0.p0.41        CDH 5
Hive    0.12.0+cdh5.1.0+369     1.cdh5.1.0.p0.39        CDH 5
HCatalog        0.12.0+cdh5.1.0+369     1.cdh5.1.0.p0.39        CDH 5
Hue     3.6.0+cdh5.1.0+86       1.cdh5.1.0.p0.36        CDH 5
Impala  1.4.0+cdh5.1.0+0        1.cdh5.1.0.p0.92        CDH 5
Kite (CDH 5 only)       0.10.0+cdh5.1.0+120     1.cdh5.1.0.p0.30        CDH 5
Llama (CDH 5 only)      1.0.0+cdh5.1.0+0        1.cdh5.1.0.p0.25        CDH 5
Mahout  0.9+cdh5.1.0+11 1.cdh5.1.0.p0.23        CDH 5
Oozie   4.0.0+cdh5.1.0+249      1.cdh5.1.0.p0.28        CDH 5
Parquet 1.2.5+cdh5.1.0+130      1.cdh5.1.0.p0.26        CDH 5
Pig     0.12.0+cdh5.1.0+33      1.cdh5.1.0.p0.23        CDH 5
sentry  1.3.0+cdh5.1.0+155      1.cdh5.1.0.p0.59        CDH 5
Solr    4.4.0+cdh5.1.0+231      1.cdh5.1.0.p0.32        CDH 5
spark   1.0.0+cdh5.1.0+41       1.cdh5.1.0.p0.27        CDH 5
Sqoop2  1.99.3+cdh5.1.0+26      1.cdh5.1.0.p0.22        CDH 5
Sqoop   1.4.4+cdh5.1.0+55       1.cdh5.1.0.p0.24        CDH 5
Whirr   0.9.0+cdh5.1.0+9        1.cdh5.1.0.p0.21        CDH 5
Zookeeper       3.4.5+cdh5.1.0+29       1.cdh5.1.0.p0.31        CDH 5
Cloudera Manager Management Daemons     5.1.1   1.cm511.p0.82   Not applicable
Java 6  java version "1.6.0_31" Java(TM) SE Runtime Environment (build 1.6.0_31-b04) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)     Unavailable     Not applicable
Java 7  java version "1.7.0_55" Java(TM) SE Runtime Environment (build 1.7.0_55-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)    Unavailable     Not applicable
Cloudera Manager Agent  5.1.1   1.cm511.p0.82   Not applicable