You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Raihan Jamal <ja...@gmail.com> on 2012/07/06 01:26:39 UTC
HiveQL Joins- Hadoop
Can anyone help me with the below query?
*
*
*CREATE EXTERNAL TABLE IF NOT EXISTS Table1 (This is the MAIN table
through which comparisons need to be made)*
* (*
* BUYER_ID BIGINT,*
* ITEM_ID BIGINT,*
* CREATED_TIME STRING*
* )*
And this is the data in the above first table
* *
* **BUYER_ID** **ITEM_ID** **CREATED_TIME** *
* 1015826235 220003038067 2012-06-21
07:57:39 *
* 1015826235 300003861266 2012-06-21
21:11:12 *
* 1015826235 140002997245 2012-06-14
20:10:16 *
* 1015826235 200002448035 2012-06-08
22:02:17 *
* 1015826235 *260003553381* *2002-01-30 23:12:18* *
This is Second table in Hive- It also contains information about the items
we are purchasing.
*CREATE EXTERNAL TABLE IF NOT EXISTS Table2*
* (*
* USER_ID BIGINT,*
* PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>*
* )*
And this is the data in the above second table-
***USER_ID** **PURCHASED_ITEM***
* 1015826235
[{"product_id":220003038067,"timestamps":"1340321132000"},
{"product_id":300003861266,"timestamps":"1340271857000"}, *
*
{"product_id":140002997245,"timestamps":"1339694926000"},**
{"product_id":200002448035,"timestamps":"1339172659000"}]*
*Compare Table2 with Table1 on USER_ID and BUYER_ID.*
*Find the `BUYER_ID(USER_ID)` and as well as those `ITEM_ID` and
`CREATED_TIME` which are missing from `Table2` after comparing from
`Table1` on `BUYER_ID`.*
So If you look `Table2` data this `ITEM_ID(PRODUCT_ID) 260003553381` and
`CREATED_TIME(TIMESTAMPS) 2002-01-30 23:12:18` is missing from Table2. So
I want to show the below result using the HiveQL query.
***BUYER_ID or USER_ID** **ITEM_ID**
**CREATED_TIME** **TIMESTAMP_OF_CREATED_TIME***
* 1015826235 260003553381
2002-01-30 23:12:18
1012457538*
Below is the query I wrote, I am not sure whether it is right or not as
HiveQL is totally new to me. Can anyone help with this HiveQL query?
select BUYER_ID, ITEM_ID, CREATED_TIME, UNIX_TIMESTAMP(CREATED_TIME) from
table1 where table1.user_id = table2.buyer_id and (table1.item_id <>
table2.product_id or UNIX_TIMESTAMP(table1.created_time) <>
table2.timestamps)
*Raihan Jamal*