You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "Subramanian, Sanjay (HQP)" <sa...@roberthalf.com> on 2014/08/21 21:12:03 UTC
Denormalizing JSON arrays
Hey guys
How do I denormalize the JSON arrays with a select statement ?
Thanks
Warm Regards
sanjay
DDL
===
USE sansub01
;
ADD JAR
./json-serde-1.3-SNAPSHOT-jar-with-dependencies.jar
;
DROP TABLE IF EXISTS
res_score
;
CREATE EXTERNAL TABLE res_score (
uniqueResumeIdentifier STRING,
resumeLastModified STRING,
resumeProcessedOn STRING,
resume_scores array<struct<
funcrole_id:INT,
funcrole_name:STRING,
lob_id:INT,
lob_name:STRING,
score:INT
>>
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS TEXTFILE LOCATION '/data/resume_scores'
;
SAMPLE DATA
===========
{
"uniqueResumeIdentifier": “1008552662”,
"resumeLastModified": "2014-08-05_T_14-01-41",
"resumeProcessedOn": "2014-08-21_T_11-21-12",
"resume_scores": [
{
"funcrole_id": "1000139",
"funcrole_name": “foo1",
"lob_id": "5",
"lob_name": “lobn5",
"score": 2176
},
{
"funcrole_id": "1000137",
"funcrole_name": “foo2",
"lob_id": "5",
"lob_name": "lobn5",
"score": 1774
},
{
"funcrole_id": "1000138",
"funcrole_name": “foo3",
"lob_id": “4",
"lob_name": "lobn4",
"score": 2300
},
{
"funcrole_id": "1000929",
"funcrole_name": “foo4",
"lob_id": "1",
"lob_name": "lobn1",
"score": 2878
}
]
}
DATA FORMAT that I want with SELECT
===================================
1008552662 2014-08-05_T_14-01-41 2014-08-21_T_11-21-12 1000139 foo1 5 lobn5 2176
1008552662 2014-08-05_T_14-01-41 2014-08-21_T_11-21-12 1000137 foo2 5 lobn5 1774
1008552662 2014-08-05_T_14-01-41 2014-08-21_T_11-21-12 1000138 foo3 4 lobn4 2300
1008552662 2014-08-05_T_14-01-41 2014-08-21_T_11-21-12 1000929 foo4 1 lobn1 2878
MY ENVIRONMENT
==============
Version Summary
Cluster 1 — CDH 5
Hosts
hadoop[01-03]
Component Version Release CDH Version
Bigtop-Tomcat (CDH 5 only) 0.7.0+cdh5.1.0+0 1.cdh5.1.0.p0.25 CDH 5
Crunch (CDH 5 only) 0.10.0+cdh5.1.0+14 1.cdh5.1.0.p0.25 CDH 5
Flume NG 1.5.0+cdh5.1.0+10 1.cdh5.1.0.p0.26 CDH 5
MapReduce 1 2.3.0+cdh5.1.0+795 1.cdh5.1.0.p0.58 CDH 5
Hadoop 2.3.0+cdh5.1.0+795 1.cdh5.1.0.p0.58 CDH 5
HDFS 2.3.0+cdh5.1.0+795 1.cdh5.1.0.p0.58 CDH 5
HttpFS 2.3.0+cdh5.1.0+795 1.cdh5.1.0.p0.58 CDH 5
MapReduce 2 2.3.0+cdh5.1.0+795 1.cdh5.1.0.p0.58 CDH 5
YARN 2.3.0+cdh5.1.0+795 1.cdh5.1.0.p0.58 CDH 5
HBase 0.98.1+cdh5.1.0+64 1.cdh5.1.0.p0.34 CDH 5
Lily HBase Indexer 1.5+cdh5.1.0+12 1.cdh5.1.0.p0.41 CDH 5
Hive 0.12.0+cdh5.1.0+369 1.cdh5.1.0.p0.39 CDH 5
HCatalog 0.12.0+cdh5.1.0+369 1.cdh5.1.0.p0.39 CDH 5
Hue 3.6.0+cdh5.1.0+86 1.cdh5.1.0.p0.36 CDH 5
Impala 1.4.0+cdh5.1.0+0 1.cdh5.1.0.p0.92 CDH 5
Kite (CDH 5 only) 0.10.0+cdh5.1.0+120 1.cdh5.1.0.p0.30 CDH 5
Llama (CDH 5 only) 1.0.0+cdh5.1.0+0 1.cdh5.1.0.p0.25 CDH 5
Mahout 0.9+cdh5.1.0+11 1.cdh5.1.0.p0.23 CDH 5
Oozie 4.0.0+cdh5.1.0+249 1.cdh5.1.0.p0.28 CDH 5
Parquet 1.2.5+cdh5.1.0+130 1.cdh5.1.0.p0.26 CDH 5
Pig 0.12.0+cdh5.1.0+33 1.cdh5.1.0.p0.23 CDH 5
sentry 1.3.0+cdh5.1.0+155 1.cdh5.1.0.p0.59 CDH 5
Solr 4.4.0+cdh5.1.0+231 1.cdh5.1.0.p0.32 CDH 5
spark 1.0.0+cdh5.1.0+41 1.cdh5.1.0.p0.27 CDH 5
Sqoop2 1.99.3+cdh5.1.0+26 1.cdh5.1.0.p0.22 CDH 5
Sqoop 1.4.4+cdh5.1.0+55 1.cdh5.1.0.p0.24 CDH 5
Whirr 0.9.0+cdh5.1.0+9 1.cdh5.1.0.p0.21 CDH 5
Zookeeper 3.4.5+cdh5.1.0+29 1.cdh5.1.0.p0.31 CDH 5
Cloudera Manager Management Daemons 5.1.1 1.cm511.p0.82 Not applicable
Java 6 java version "1.6.0_31" Java(TM) SE Runtime Environment (build 1.6.0_31-b04) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) Unavailable Not applicable
Java 7 java version "1.7.0_55" Java(TM) SE Runtime Environment (build 1.7.0_55-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) Unavailable Not applicable
Cloudera Manager Agent 5.1.1 1.cm511.p0.82 Not applicable
Re: Denormalizing JSON arrays
Posted by "Subramanian, Sanjay (HQP)" <sa...@roberthalf.com>.
Ok guys
Figured it out
Looks like the collective Hive Groups positive thought waves are helping me immensely – LOL
After many experiments , this is the HQL that worked beautifully . I had forgotten that explode can take an array as a param as well !
HIVE QUERY
==========
use sansub01
;
add jar ./json-serde-1.3-SNAPSHOT-jar-with-dependencies.jar
;
SELECT
l.uniqueresumeidentifier,
l.resumelastmodified,
l.resumeprocessedon,
l.lid.lob_id,
l.lid.funcrole_id,
l.lid.score
FROM
(select
uniqueresumeidentifier,
resumelastmodified,
resumeprocessedon,
lid
from
res_score
lateral view explode
(res_score.resume_scores) oc1
AS
lid) l
;
Thanks
Warm Regards
sanjay
From: <Subramanian>, Sanjay Subramanian <sa...@roberthalf.com>>
Reply-To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Date: Thursday, August 21, 2014 at 12:12 PM
To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Subject: Denormalizing JSON arrays
Hey guys
How do I denormalize the JSON arrays with a select statement ?
Thanks
Warm Regards
sanjay
DDL
===
USE sansub01
;
ADD JAR
./json-serde-1.3-SNAPSHOT-jar-with-dependencies.jar
;
DROP TABLE IF EXISTS
res_score
;
CREATE EXTERNAL TABLE res_score (
uniqueResumeIdentifier STRING,
resumeLastModified STRING,
resumeProcessedOn STRING,
resume_scores array<struct<
funcrole_id:INT,
funcrole_name:STRING,
lob_id:INT,
lob_name:STRING,
score:INT
>>
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS TEXTFILE LOCATION '/data/resume_scores'
;
SAMPLE DATA
===========
{
"uniqueResumeIdentifier": “1008552662”,
"resumeLastModified": "2014-08-05_T_14-01-41",
"resumeProcessedOn": "2014-08-21_T_11-21-12",
"resume_scores": [
{
"funcrole_id": "1000139",
"funcrole_name": “foo1",
"lob_id": "5",
"lob_name": “lobn5",
"score": 2176
},
{
"funcrole_id": "1000137",
"funcrole_name": “foo2",
"lob_id": "5",
"lob_name": "lobn5",
"score": 1774
},
{
"funcrole_id": "1000138",
"funcrole_name": “foo3",
"lob_id": “4",
"lob_name": "lobn4",
"score": 2300
},
{
"funcrole_id": "1000929",
"funcrole_name": “foo4",
"lob_id": "1",
"lob_name": "lobn1",
"score": 2878
}
]
}
DATA FORMAT that I want with SELECT
===================================
10085526622014-08-05_T_14-01-412014-08-21_T_11-21-121000139 foo15 lobn52176
10085526622014-08-05_T_14-01-412014-08-21_T_11-21-121000137 foo25 lobn51774
10085526622014-08-05_T_14-01-412014-08-21_T_11-21-121000138 foo34 lobn42300
10085526622014-08-05_T_14-01-412014-08-21_T_11-21-121000929 foo41 lobn12878
MY ENVIRONMENT
==============
Version Summary
Cluster 1 — CDH 5
Hosts
hadoop[01-03]
Component Version Release CDH Version
Bigtop-Tomcat (CDH 5 only) 0.7.0+cdh5.1.0+0 1.cdh5.1.0.p0.25 CDH 5
Crunch (CDH 5 only) 0.10.0+cdh5.1.0+14 1.cdh5.1.0.p0.25 CDH 5
Flume NG 1.5.0+cdh5.1.0+10 1.cdh5.1.0.p0.26 CDH 5
MapReduce 1 2.3.0+cdh5.1.0+795 1.cdh5.1.0.p0.58 CDH 5
Hadoop 2.3.0+cdh5.1.0+795 1.cdh5.1.0.p0.58 CDH 5
HDFS 2.3.0+cdh5.1.0+795 1.cdh5.1.0.p0.58 CDH 5
HttpFS 2.3.0+cdh5.1.0+795 1.cdh5.1.0.p0.58 CDH 5
MapReduce 2 2.3.0+cdh5.1.0+795 1.cdh5.1.0.p0.58 CDH 5
YARN 2.3.0+cdh5.1.0+795 1.cdh5.1.0.p0.58 CDH 5
HBase 0.98.1+cdh5.1.0+64 1.cdh5.1.0.p0.34 CDH 5
Lily HBase Indexer 1.5+cdh5.1.0+12 1.cdh5.1.0.p0.41 CDH 5
Hive 0.12.0+cdh5.1.0+369 1.cdh5.1.0.p0.39 CDH 5
HCatalog 0.12.0+cdh5.1.0+369 1.cdh5.1.0.p0.39 CDH 5
Hue 3.6.0+cdh5.1.0+86 1.cdh5.1.0.p0.36 CDH 5
Impala 1.4.0+cdh5.1.0+0 1.cdh5.1.0.p0.92 CDH 5
Kite (CDH 5 only) 0.10.0+cdh5.1.0+120 1.cdh5.1.0.p0.30 CDH 5
Llama (CDH 5 only) 1.0.0+cdh5.1.0+0 1.cdh5.1.0.p0.25 CDH 5
Mahout 0.9+cdh5.1.0+11 1.cdh5.1.0.p0.23 CDH 5
Oozie 4.0.0+cdh5.1.0+249 1.cdh5.1.0.p0.28 CDH 5
Parquet 1.2.5+cdh5.1.0+130 1.cdh5.1.0.p0.26 CDH 5
Pig 0.12.0+cdh5.1.0+33 1.cdh5.1.0.p0.23 CDH 5
sentry 1.3.0+cdh5.1.0+155 1.cdh5.1.0.p0.59 CDH 5
Solr 4.4.0+cdh5.1.0+231 1.cdh5.1.0.p0.32 CDH 5
spark 1.0.0+cdh5.1.0+41 1.cdh5.1.0.p0.27 CDH 5
Sqoop2 1.99.3+cdh5.1.0+26 1.cdh5.1.0.p0.22 CDH 5
Sqoop 1.4.4+cdh5.1.0+55 1.cdh5.1.0.p0.24 CDH 5
Whirr 0.9.0+cdh5.1.0+9 1.cdh5.1.0.p0.21 CDH 5
Zookeeper 3.4.5+cdh5.1.0+29 1.cdh5.1.0.p0.31 CDH 5
Cloudera Manager Management Daemons 5.1.1 1.cm511.p0.82 Not applicable
Java 6 java version "1.6.0_31" Java(TM) SE Runtime Environment (build 1.6.0_31-b04) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) Unavailable Not applicable
Java 7 java version "1.7.0_55" Java(TM) SE Runtime Environment (build 1.7.0_55-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) Unavailable Not applicable
Cloudera Manager Agent 5.1.1 1.cm511.p0.82 Not applicable