You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Raihan Jamal <ja...@gmail.com> on 2012/07/08 00:20:59 UTC
Joining two tables using HiveQL
Hi Everyone,
CREATE EXTERNAL TABLE IF NOT EXISTS Table1 (This is the MAIN table
through which comparisons need to be made)
(
BUYER_ID BIGINT,
ITEM_ID BIGINT,
CREATED_TIME STRING
)
And this is the data in the above first table
* **BUYER_ID** | **ITEM_ID** | **CREATED_TIME** *
*
----------------------+---------------------------+-----------------------------------------
*
* 1015826235 220003038067 2012-06-21 07:57:39 *
* 1015826235 300003861266 2012-06-21 21:11:12 *
* 1015826235 140002997245 2012-06-14 20:10:16 *
* 1015826235 200002448035 2012-06-08 22:02:17 *
* 1015826235 *260003553381* *2002-01-30 23:12:18* *
This is Second table in Hive- It also contains information about the items
we are purchasing.
CREATE EXTERNAL TABLE IF NOT EXISTS Table2
(
USER_ID BIGINT,
PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>
)
And this is the data in the above second table (Table2)-
***USER_ID** | **PURCHASED_ITEM***
*
-------------------------------+---------------------------------------------
*
* 1015826235
[{"product_id":220003038067,"timestamps":"1340321132000"},
{"product_id":300003861266,"timestamps":"1340271857000"}, *
*
{"product_id":140002997245,"timestamps":"1339694926000"},
{"product_id":200002448035,"timestamps":"1339172659000"}]*
**Compare Table2 with Table1 on USER_ID and BUYER_ID.**
Find the `BUYER_ID(USER_ID)` and as well as those `ITEM_ID` and
`CREATED_TIME` which are missing from `Table2` after comparing from
`Table1` on `BUYER_ID`.
So If you look `Table2` data this(last) `ITEM_ID(PRODUCT_ID) 260003553381`
and `CREATED_TIME(TIMESTAMPS) 2002-01-30 23:12:18` is missing from Table2.
So I want to show the below result using the HiveQL query.
* **BUYER_ID or USER_ID** | **ITEM_ID** |
**CREATED_TIME** | **TIMESTAMP_OF_CREATED_TIME***
*
-----------------------------------------------------+------------------------------------+----------------------------------------------+----------------------------------------------------------------
*
* 1015826235 260003553381
2002-01-30 23:12:18 1012457538*
Can anyone help me with this. As I am new to HiveQL so having lot of
Problem. I wrote the below HiveQL, but it is not working the way I wanted
to. Can anyone suggest me what wrong I am doing wrong?
**Updated:-**
I have written this query, but it doesn't working the way I wanted to.
SELECT
Table1.buyer_id, Table1.item_id, Table1.created_time,
UNIX_TIMESTAMP(Table1.created_time)
FROM ( SELECT user_id,prodID
FROM Table2 test2
LATERAL VIEW explode( test2.purchased_item.product_id )
prodTable AS prodID
) prodTable
RIGHT OUTER JOIN Table1 ON prodTable.user_id = Table1.buyer_id;
*Raihan Jamal*