You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "@Sanjiv Singh" <sa...@gmail.com> on 2016/01/09 13:18:51 UTC
HIVE : LEFT OUTER JOIN + NON-EQUI JOIN
Hi All,
I am having issue hive LEFT OUTER JOIN.
I had al table in sql-server. then used sqoop to migrate all tables on
hive.
This is the original query from sql-server which contains non-equi LEFT
OUTER JOIN. both table have *cartesian data*.
SELECT
vss.company_id,vss.shares_ship_id,vss.seatmap_cd,vss.cabin,vss.seat,
vss.seat_loc_dscr, vss.ep_seat AS EPlus_Seat, vss.ep_win_seat,
vss.ep_asle_seat, vss.ep_mid_seat, vss.em_win_seat,
vss.em_mid_seat,vss.em_asle_seat,vss.y_win_seat, vss.y_mid_seat,
vss.y_asle_seat, vss.fj_win_seat, vss.fj_mid_seat,
vss.fj_asle_seat,vss.exit_row, vss.bulkhead_row, vss.eff_dt, vss.disc_dt
FROM rvsed11 zz
LEFT OUTER JOIN rvsed22 vss
ON zz.company_id = vss.company_id
AND zz.shares_ship_id = vss.shares_ship_id
AND *zz.report_dt >= vss.eff_dt *
AND *zz.report_dt < vss.disc_dt*;
As we know that Nonequi joins are not working in hive ( Nonequi joins
working in WHERE clause but we cannot use with LEFT OUTER JOIN).
See below hive query with noon-equi condition moved to where clause.
SELECT
vss.company_id,vss.shares_ship_id,vss.seatmap_cd,vss.cabin,vss.seat,
vss.seat_loc_dscr, vss.ep_seat AS EPlus_Seat, vss.ep_win_seat,
vss.ep_asle_seat, vss.ep_mid_seat, vss.em_win_seat,
vss.em_mid_seat,vss.em_asle_seat,vss.y_win_seat, vss.y_mid_seat,
vss.y_asle_seat, vss.fj_win_seat, vss.fj_mid_seat,
vss.fj_asle_seat,vss.exit_row, vss.bulkhead_row, vss.eff_dt, vss.disc_dt
FROM rvsed11 zz
LEFT OUTER JOIN rvsed22 vss
ON zz.company_id = vss.company_id
AND zz.shares_ship_id = vss.shares_ship_id
*WHERE zz.report_dt >= vss.eff_dt AND zz.report_dt < vss.disc_dt;*
Original query is giving 1162 records on Sql-Server , but this hive query
giving 46240 records.
I tried multiple workaround to get same logic , but didn't get same result
on hive.
Can you please help me on this to identify this issue and get query working
on hive with same result set.
Let me know you need other details.
Regards
Sanjiv Singh
Mob : +091 9990-447-339
RE: HIVE : LEFT OUTER JOIN + NON-EQUI JOIN
Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Can you do two paths?
create temporary table tmp AS
SELECT zz.report_dt, vss.eff_dt, vss.disc_dt, … rest of columns
FROM rvsed11 zz
LEFT OUTER JOIN rvsed22 vss
ON zz.company_id = vss.company_id
AND zz.shares_ship_id = vss.shares_ship_id
;
select <needed columns> from #tmp
Where
report_dt >= eff_dt AND report_dt < disc_dt;
that should return correct results
HTH,
Mich
From: @Sanjiv Singh [mailto:sanjiv.is.on@gmail.com]
Sent: 09 January 2016 12:19
To: user@hive.apache.org; dev@hive.apache.org
Subject: HIVE : LEFT OUTER JOIN + NON-EQUI JOIN
Hi All,
I am having issue hive LEFT OUTER JOIN.
I had al table in sql-server. then used sqoop to migrate all tables on hive.
This is the original query from sql-server which contains non-equi LEFT OUTER JOIN. both table have cartesian data.
SELECT
vss.company_id,vss.shares_ship_id,vss.seatmap_cd,vss.cabin,vss.seat, vss.seat_loc_dscr, vss.ep_seat AS EPlus_Seat, vss.ep_win_seat, vss.ep_asle_seat, vss.ep_mid_seat, vss.em_win_seat,
vss.em_mid_seat,vss.em_asle_seat,vss.y_win_seat, vss.y_mid_seat, vss.y_asle_seat, vss.fj_win_seat, vss.fj_mid_seat, vss.fj_asle_seat,vss.exit_row, vss.bulkhead_row, vss.eff_dt, vss.disc_dt
FROM rvsed11 zz
LEFT OUTER JOIN rvsed22 vss
ON zz.company_id = vss.company_id
AND zz.shares_ship_id = vss.shares_ship_id
AND zz.report_dt >= vss.eff_dt
AND zz.report_dt < vss.disc_dt;
As we know that Nonequi joins are not working in hive ( Nonequi joins working in WHERE clause but we cannot use with LEFT OUTER JOIN).
See below hive query with noon-equi condition moved to where clause.
SELECT
vss.company_id,vss.shares_ship_id,vss.seatmap_cd,vss.cabin,vss.seat, vss.seat_loc_dscr, vss.ep_seat AS EPlus_Seat, vss.ep_win_seat, vss.ep_asle_seat, vss.ep_mid_seat, vss.em_win_seat,
vss.em_mid_seat,vss.em_asle_seat,vss.y_win_seat, vss.y_mid_seat, vss.y_asle_seat, vss.fj_win_seat, vss.fj_mid_seat, vss.fj_asle_seat,vss.exit_row, vss.bulkhead_row, vss.eff_dt, vss.disc_dt
FROM rvsed11 zz
LEFT OUTER JOIN rvsed22 vss
ON zz.company_id = vss.company_id
AND zz.shares_ship_id = vss.shares_ship_id
WHERE zz.report_dt >= vss.eff_dt
AND zz.report_dt < vss.disc_dt;
Original query is giving 1162 records on Sql-Server , but this hive query giving 46240 records.
I tried multiple workaround to get same logic , but didn't get same result on hive.
Can you please help me on this to identify this issue and get query working on hive with same result set.
Let me know you need other details.
Regards
Sanjiv Singh
Mob : +091 9990-447-339