You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Souvik Banerjee <so...@gmail.com> on 2012/12/28 00:16:31 UTC

HIVE left semi join multiple tables with columns from multiple table

Hi,

I am struggling with a problem described below.
Any help how to resolve this problem is highly appreciated.

I have got few tables the structure is over simplified for the sake of
describing the nature of the problem that I am facing.

*Table 1*
* *



ID

Value

1

V11

2

V12

3

V13



*Table 2*
* *



ID

Value

1

V21

2

V22

3

V23



*Table 3*
* *



ID_1

ID_2

Other_Column1

1

3

C1

2

1

C2

3

2

C3



I want to formulate a table which would look like

ID_1

ID_1_Value

ID_2

ID_2_value

Other_Column1

1

V11

3

V23

C1

2

V12

1

V21

C2

3

V13

2

V22

C3



I am facing problem with this.
I tried to LEFT SEMI JOIN in Hive.

I tried to do it in two steps (For 3 tables)
First step I wanted to do a LEFT SEMI JOIN with TABLE 1 and TABLE 3.
But the problem is that in LEFT SEMI JOIN you can not have columns from the
right table in the select clause. So after join my new table simply looks
like TABLE 3.

Can you help me how I can achieve this is HIVE.

Thanks and regards,
Souvik.

Re: HIVE left semi join multiple tables with columns from multiple table

Posted by Mark Grover <gr...@gmail.com>.
Souvik,
I am not sure I understand, you may want to play around with distinct
keyword if you want to join on unique values.
In any case, it looks like you've got it figured out already, so it's all
good:-)

Happy trails!
Mark

On Sat, Dec 29, 2012 at 2:19 PM, Souvik Banerjee
<so...@gmail.com>wrote:

> Thanks Mark for writing back to me.
> I did the same thing and still not getting the desired result.
> I was looking for the cause. I managed to get the desired result. Sharing
> my thought process and how I did it. Let me know your thoughts on this,
> that would be really nice to have. Also please share any better idea to
> achieve this result and feel free to point out if you think that my
> explanation is wrong.
> I think you noticed the fact that I want the value column for both the
> ID_1 and ID_2 and there values in these columns are not UNIQUE.
> It happened that I used to get more rows returned than the original no of
> rows in the second table due to the fact that there are duplicates in ID_1
> and ID_2 column.
> So first of all to get the Value column for both ID_1 and ID_2 I have to
> join twice and that too I can not have join on condition which have
> duplicates.
> So I created a new table out of table 2 with addition of a column having
> unique value for each row using HIVE UDF.
> Then I created 2 views each for ID_1 and ID_2 which contains the
> description by joining on the condition a.ID_1 = b.ID.
> Now I have two views and then I joined these two views on the condition
> that view1.uniqueid = view2.uniqueid.
> That gives me desired output.
>
> Looking forward to have your views.
>
> Thanks and regards,
> Souvik.
>
>
> On Fri, Dec 28, 2012 at 7:40 AM, Mark Grover <gr...@gmail.com>wrote:
>
>> Souvik,
>> In your new example, you need a RIGHT OUTER JOIN between table1 and
>> table2 (order matters - table1 on left, table2 on right) on the ID1 column.
>> Something like this (untested by me):
>> SELECT
>>    table1.*,
>>    table2.*
>> FROM
>>    table1
>>    RIGHT OUTER JOIN table2
>>    ON (table1.id=table2.id_1);
>>
>> Mark
>>
>>
>> On Thu, Dec 27, 2012 at 9:26 PM, Souvik Banerjee <
>> souvikbanerjee@gmail.com> wrote:
>>
>>> Thanks a lot Mark for your attention.
>>> But I think I cannot go for INNER join, the reason behind the fact being
>>> that I want all rows of Table3 irrespective of there is any row
>>> corresponding to that ID in Table 1 or Table 2.
>>> Probably I would have taken care of that while providing the example.
>>> The more refined example would be like below.
>>>
>>> Look forward for your help.
>>>
>>> Thanks and regards,
>>> Souvik.
>>>
>>> P.S. Dropped one table from the earlier example and Now we have table 1
>>> and table 2, we are looking for table 4. (Hope so it's more simple and it's
>>> exactly we need)
>>>
>>>
>>>
>>> *Table 1*
>>>
>>> ID
>>>
>>> Value
>>>
>>> 1
>>>
>>> V11
>>>
>>> 2
>>>
>>> V12
>>>
>>> 3
>>>
>>> V13
>>>
>>>
>>>
>>> *Table 2*
>>>
>>> ID_1
>>>
>>> ID_2
>>>
>>> Other_Column1
>>>
>>> 1
>>>
>>> 3
>>>
>>> C1
>>>
>>> 2
>>>
>>> 1
>>>
>>> C2
>>>
>>> 3
>>>
>>> 2
>>>
>>> C3
>>>
>>> 4
>>>
>>> 9
>>>
>>> C4
>>>
>>> 12
>>>
>>> 3
>>>
>>> C5
>>>
>>> 1
>>>
>>> 8
>>>
>>> C6
>>>
>>>
>>>
>>> I want to formulate a table which would look like (*Table 4*)
>>>
>>> ID_1
>>>
>>> ID_1_Value
>>>
>>> ID_2
>>>
>>> ID_2_value
>>>
>>> Other_Column1
>>>
>>> 1
>>>
>>> V11
>>>
>>> 3
>>>
>>> V13
>>>
>>> C1
>>>
>>> 2
>>>
>>> V12
>>>
>>> 1
>>>
>>> V11
>>>
>>> C2
>>>
>>> 3
>>>
>>> V13
>>>
>>> 2
>>>
>>> V12
>>>
>>> C3
>>>
>>> 4
>>>
>>> NULL / Empty
>>>
>>> 9
>>>
>>> NULL / Empty
>>>
>>> C4
>>>
>>> 12
>>>
>>> NULL / Empty
>>>
>>> 3
>>>
>>> V13
>>>
>>> C5
>>>
>>> 1
>>>
>>> V11
>>>
>>> 8
>>>
>>> NULL / Empty
>>>
>>> C6
>>>
>>>
>>> On Thu, Dec 27, 2012 at 9:06 PM, Mark Grover <
>>> grover.markgrover@gmail.com> wrote:
>>>
>>>> Souvik,
>>>> Let me begin by saying that simplifying the problem goes a long way in
>>>> helping us answer your question. You did it really nicely here, so thank
>>>> you for doing that.
>>>>
>>>> Why don't you use INNER JOIN instead of LEFT SEMI JOIN? You can express
>>>> the same query as INNER JOIN with no restrictions on what can be included
>>>> in the SELECT clause. LEFT SEMI JOIN just implements an "exists" like query
>>>> efficiently. If you want columns from the right table in your select list,
>>>> just do the regular (aka inefficient way) inner join.
>>>>
>>>> Of course, you can optimize inner joins as map joins, sorted merge join
>>>> or sorted merge bucketed joins depending on your use case.
>>>>
>>>> Mark
>>>>
>>>> On Thu, Dec 27, 2012 at 3:16 PM, Souvik Banerjee <
>>>> souvikbanerjee@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am struggling with a problem described below.
>>>>> Any help how to resolve this problem is highly appreciated.
>>>>>
>>>>> I have got few tables the structure is over simplified for the sake of
>>>>> describing the nature of the problem that I am facing.
>>>>>
>>>>> *Table 1*
>>>>> * *
>>>>>
>>>>>
>>>>>
>>>>> ID
>>>>>
>>>>> Value
>>>>>
>>>>> 1
>>>>>
>>>>> V11
>>>>>
>>>>> 2
>>>>>
>>>>> V12
>>>>>
>>>>> 3
>>>>>
>>>>> V13
>>>>>
>>>>>
>>>>>
>>>>> *Table 2*
>>>>> * *
>>>>>
>>>>>
>>>>>
>>>>> ID
>>>>>
>>>>> Value
>>>>>
>>>>> 1
>>>>>
>>>>> V21
>>>>>
>>>>> 2
>>>>>
>>>>> V22
>>>>>
>>>>> 3
>>>>>
>>>>> V23
>>>>>
>>>>>
>>>>>
>>>>> *Table 3*
>>>>> * *
>>>>>
>>>>>
>>>>>
>>>>> ID_1
>>>>>
>>>>> ID_2
>>>>>
>>>>> Other_Column1
>>>>>
>>>>> 1
>>>>>
>>>>> 3
>>>>>
>>>>> C1
>>>>>
>>>>> 2
>>>>>
>>>>> 1
>>>>>
>>>>> C2
>>>>>
>>>>> 3
>>>>>
>>>>> 2
>>>>>
>>>>> C3
>>>>>
>>>>>
>>>>>
>>>>> I want to formulate a table which would look like
>>>>>
>>>>> ID_1
>>>>>
>>>>> ID_1_Value
>>>>>
>>>>> ID_2
>>>>>
>>>>> ID_2_value
>>>>>
>>>>> Other_Column1
>>>>>
>>>>> 1
>>>>>
>>>>> V11
>>>>>
>>>>> 3
>>>>>
>>>>> V23
>>>>>
>>>>> C1
>>>>>
>>>>> 2
>>>>>
>>>>> V12
>>>>>
>>>>> 1
>>>>>
>>>>> V21
>>>>>
>>>>> C2
>>>>>
>>>>> 3
>>>>>
>>>>> V13
>>>>>
>>>>> 2
>>>>>
>>>>> V22
>>>>>
>>>>> C3
>>>>>
>>>>>
>>>>>
>>>>> I am facing problem with this.
>>>>> I tried to LEFT SEMI JOIN in Hive.
>>>>>
>>>>> I tried to do it in two steps (For 3 tables)
>>>>> First step I wanted to do a LEFT SEMI JOIN with TABLE 1 and TABLE 3.
>>>>> But the problem is that in LEFT SEMI JOIN you can not have columns
>>>>> from the right table in the select clause. So after join my new table
>>>>> simply looks like TABLE 3.
>>>>>
>>>>> Can you help me how I can achieve this is HIVE.
>>>>>
>>>>> Thanks and regards,
>>>>> Souvik.
>>>>>
>>>>
>>>>
>>>
>>
>

Re: HIVE left semi join multiple tables with columns from multiple table

Posted by Souvik Banerjee <so...@gmail.com>.
Thanks Mark for writing back to me.
I did the same thing and still not getting the desired result.
I was looking for the cause. I managed to get the desired result. Sharing
my thought process and how I did it. Let me know your thoughts on this,
that would be really nice to have. Also please share any better idea to
achieve this result and feel free to point out if you think that my
explanation is wrong.
I think you noticed the fact that I want the value column for both the ID_1
and ID_2 and there values in these columns are not UNIQUE.
It happened that I used to get more rows returned than the original no of
rows in the second table due to the fact that there are duplicates in ID_1
and ID_2 column.
So first of all to get the Value column for both ID_1 and ID_2 I have to
join twice and that too I can not have join on condition which have
duplicates.
So I created a new table out of table 2 with addition of a column having
unique value for each row using HIVE UDF.
Then I created 2 views each for ID_1 and ID_2 which contains the
description by joining on the condition a.ID_1 = b.ID.
Now I have two views and then I joined these two views on the condition
that view1.uniqueid = view2.uniqueid.
That gives me desired output.

Looking forward to have your views.

Thanks and regards,
Souvik.

On Fri, Dec 28, 2012 at 7:40 AM, Mark Grover <gr...@gmail.com>wrote:

> Souvik,
> In your new example, you need a RIGHT OUTER JOIN between table1 and table2
> (order matters - table1 on left, table2 on right) on the ID1 column.
> Something like this (untested by me):
> SELECT
>    table1.*,
>    table2.*
> FROM
>    table1
>    RIGHT OUTER JOIN table2
>    ON (table1.id=table2.id_1);
>
> Mark
>
>
> On Thu, Dec 27, 2012 at 9:26 PM, Souvik Banerjee <souvikbanerjee@gmail.com
> > wrote:
>
>> Thanks a lot Mark for your attention.
>> But I think I cannot go for INNER join, the reason behind the fact being
>> that I want all rows of Table3 irrespective of there is any row
>> corresponding to that ID in Table 1 or Table 2.
>> Probably I would have taken care of that while providing the example.
>> The more refined example would be like below.
>>
>> Look forward for your help.
>>
>> Thanks and regards,
>> Souvik.
>>
>> P.S. Dropped one table from the earlier example and Now we have table 1
>> and table 2, we are looking for table 4. (Hope so it's more simple and it's
>> exactly we need)
>>
>>
>>
>> *Table 1*
>>
>> ID
>>
>> Value
>>
>> 1
>>
>> V11
>>
>> 2
>>
>> V12
>>
>> 3
>>
>> V13
>>
>>
>>
>> *Table 2*
>>
>> ID_1
>>
>> ID_2
>>
>> Other_Column1
>>
>> 1
>>
>> 3
>>
>> C1
>>
>> 2
>>
>> 1
>>
>> C2
>>
>> 3
>>
>> 2
>>
>> C3
>>
>> 4
>>
>> 9
>>
>> C4
>>
>> 12
>>
>> 3
>>
>> C5
>>
>> 1
>>
>> 8
>>
>> C6
>>
>>
>>
>> I want to formulate a table which would look like (*Table 4*)
>>
>> ID_1
>>
>> ID_1_Value
>>
>> ID_2
>>
>> ID_2_value
>>
>> Other_Column1
>>
>> 1
>>
>> V11
>>
>> 3
>>
>> V13
>>
>> C1
>>
>> 2
>>
>> V12
>>
>> 1
>>
>> V11
>>
>> C2
>>
>> 3
>>
>> V13
>>
>> 2
>>
>> V12
>>
>> C3
>>
>> 4
>>
>> NULL / Empty
>>
>> 9
>>
>> NULL / Empty
>>
>> C4
>>
>> 12
>>
>> NULL / Empty
>>
>> 3
>>
>> V13
>>
>> C5
>>
>> 1
>>
>> V11
>>
>> 8
>>
>> NULL / Empty
>>
>> C6
>>
>>
>> On Thu, Dec 27, 2012 at 9:06 PM, Mark Grover <grover.markgrover@gmail.com
>> > wrote:
>>
>>> Souvik,
>>> Let me begin by saying that simplifying the problem goes a long way in
>>> helping us answer your question. You did it really nicely here, so thank
>>> you for doing that.
>>>
>>> Why don't you use INNER JOIN instead of LEFT SEMI JOIN? You can express
>>> the same query as INNER JOIN with no restrictions on what can be included
>>> in the SELECT clause. LEFT SEMI JOIN just implements an "exists" like query
>>> efficiently. If you want columns from the right table in your select list,
>>> just do the regular (aka inefficient way) inner join.
>>>
>>> Of course, you can optimize inner joins as map joins, sorted merge join
>>> or sorted merge bucketed joins depending on your use case.
>>>
>>> Mark
>>>
>>> On Thu, Dec 27, 2012 at 3:16 PM, Souvik Banerjee <
>>> souvikbanerjee@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am struggling with a problem described below.
>>>> Any help how to resolve this problem is highly appreciated.
>>>>
>>>> I have got few tables the structure is over simplified for the sake of
>>>> describing the nature of the problem that I am facing.
>>>>
>>>> *Table 1*
>>>> * *
>>>>
>>>>
>>>>
>>>> ID
>>>>
>>>> Value
>>>>
>>>> 1
>>>>
>>>> V11
>>>>
>>>> 2
>>>>
>>>> V12
>>>>
>>>> 3
>>>>
>>>> V13
>>>>
>>>>
>>>>
>>>> *Table 2*
>>>> * *
>>>>
>>>>
>>>>
>>>> ID
>>>>
>>>> Value
>>>>
>>>> 1
>>>>
>>>> V21
>>>>
>>>> 2
>>>>
>>>> V22
>>>>
>>>> 3
>>>>
>>>> V23
>>>>
>>>>
>>>>
>>>> *Table 3*
>>>> * *
>>>>
>>>>
>>>>
>>>> ID_1
>>>>
>>>> ID_2
>>>>
>>>> Other_Column1
>>>>
>>>> 1
>>>>
>>>> 3
>>>>
>>>> C1
>>>>
>>>> 2
>>>>
>>>> 1
>>>>
>>>> C2
>>>>
>>>> 3
>>>>
>>>> 2
>>>>
>>>> C3
>>>>
>>>>
>>>>
>>>> I want to formulate a table which would look like
>>>>
>>>> ID_1
>>>>
>>>> ID_1_Value
>>>>
>>>> ID_2
>>>>
>>>> ID_2_value
>>>>
>>>> Other_Column1
>>>>
>>>> 1
>>>>
>>>> V11
>>>>
>>>> 3
>>>>
>>>> V23
>>>>
>>>> C1
>>>>
>>>> 2
>>>>
>>>> V12
>>>>
>>>> 1
>>>>
>>>> V21
>>>>
>>>> C2
>>>>
>>>> 3
>>>>
>>>> V13
>>>>
>>>> 2
>>>>
>>>> V22
>>>>
>>>> C3
>>>>
>>>>
>>>>
>>>> I am facing problem with this.
>>>> I tried to LEFT SEMI JOIN in Hive.
>>>>
>>>> I tried to do it in two steps (For 3 tables)
>>>> First step I wanted to do a LEFT SEMI JOIN with TABLE 1 and TABLE 3.
>>>> But the problem is that in LEFT SEMI JOIN you can not have columns from
>>>> the right table in the select clause. So after join my new table simply
>>>> looks like TABLE 3.
>>>>
>>>> Can you help me how I can achieve this is HIVE.
>>>>
>>>> Thanks and regards,
>>>> Souvik.
>>>>
>>>
>>>
>>
>

Re: HIVE left semi join multiple tables with columns from multiple table

Posted by Mark Grover <gr...@gmail.com>.
Souvik,
In your new example, you need a RIGHT OUTER JOIN between table1 and table2
(order matters - table1 on left, table2 on right) on the ID1 column.
Something like this (untested by me):
SELECT
   table1.*,
   table2.*
FROM
   table1
   RIGHT OUTER JOIN table2
   ON (table1.id=table2.id_1);

Mark

On Thu, Dec 27, 2012 at 9:26 PM, Souvik Banerjee
<so...@gmail.com>wrote:

> Thanks a lot Mark for your attention.
> But I think I cannot go for INNER join, the reason behind the fact being
> that I want all rows of Table3 irrespective of there is any row
> corresponding to that ID in Table 1 or Table 2.
> Probably I would have taken care of that while providing the example.
> The more refined example would be like below.
>
> Look forward for your help.
>
> Thanks and regards,
> Souvik.
>
> P.S. Dropped one table from the earlier example and Now we have table 1
> and table 2, we are looking for table 4. (Hope so it's more simple and it's
> exactly we need)
>
>
>
> *Table 1*
>
> ID
>
> Value
>
> 1
>
> V11
>
> 2
>
> V12
>
> 3
>
> V13
>
>
>
> *Table 2*
>
> ID_1
>
> ID_2
>
> Other_Column1
>
> 1
>
> 3
>
> C1
>
> 2
>
> 1
>
> C2
>
> 3
>
> 2
>
> C3
>
> 4
>
> 9
>
> C4
>
> 12
>
> 3
>
> C5
>
> 1
>
> 8
>
> C6
>
>
>
> I want to formulate a table which would look like (*Table 4*)
>
> ID_1
>
> ID_1_Value
>
> ID_2
>
> ID_2_value
>
> Other_Column1
>
> 1
>
> V11
>
> 3
>
> V13
>
> C1
>
> 2
>
> V12
>
> 1
>
> V11
>
> C2
>
> 3
>
> V13
>
> 2
>
> V12
>
> C3
>
> 4
>
> NULL / Empty
>
> 9
>
> NULL / Empty
>
> C4
>
> 12
>
> NULL / Empty
>
> 3
>
> V13
>
> C5
>
> 1
>
> V11
>
> 8
>
> NULL / Empty
>
> C6
>
>
> On Thu, Dec 27, 2012 at 9:06 PM, Mark Grover <gr...@gmail.com>wrote:
>
>> Souvik,
>> Let me begin by saying that simplifying the problem goes a long way in
>> helping us answer your question. You did it really nicely here, so thank
>> you for doing that.
>>
>> Why don't you use INNER JOIN instead of LEFT SEMI JOIN? You can express
>> the same query as INNER JOIN with no restrictions on what can be included
>> in the SELECT clause. LEFT SEMI JOIN just implements an "exists" like query
>> efficiently. If you want columns from the right table in your select list,
>> just do the regular (aka inefficient way) inner join.
>>
>> Of course, you can optimize inner joins as map joins, sorted merge join
>> or sorted merge bucketed joins depending on your use case.
>>
>> Mark
>>
>> On Thu, Dec 27, 2012 at 3:16 PM, Souvik Banerjee <
>> souvikbanerjee@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am struggling with a problem described below.
>>> Any help how to resolve this problem is highly appreciated.
>>>
>>> I have got few tables the structure is over simplified for the sake of
>>> describing the nature of the problem that I am facing.
>>>
>>> *Table 1*
>>> * *
>>>
>>>
>>>
>>> ID
>>>
>>> Value
>>>
>>> 1
>>>
>>> V11
>>>
>>> 2
>>>
>>> V12
>>>
>>> 3
>>>
>>> V13
>>>
>>>
>>>
>>> *Table 2*
>>> * *
>>>
>>>
>>>
>>> ID
>>>
>>> Value
>>>
>>> 1
>>>
>>> V21
>>>
>>> 2
>>>
>>> V22
>>>
>>> 3
>>>
>>> V23
>>>
>>>
>>>
>>> *Table 3*
>>> * *
>>>
>>>
>>>
>>> ID_1
>>>
>>> ID_2
>>>
>>> Other_Column1
>>>
>>> 1
>>>
>>> 3
>>>
>>> C1
>>>
>>> 2
>>>
>>> 1
>>>
>>> C2
>>>
>>> 3
>>>
>>> 2
>>>
>>> C3
>>>
>>>
>>>
>>> I want to formulate a table which would look like
>>>
>>> ID_1
>>>
>>> ID_1_Value
>>>
>>> ID_2
>>>
>>> ID_2_value
>>>
>>> Other_Column1
>>>
>>> 1
>>>
>>> V11
>>>
>>> 3
>>>
>>> V23
>>>
>>> C1
>>>
>>> 2
>>>
>>> V12
>>>
>>> 1
>>>
>>> V21
>>>
>>> C2
>>>
>>> 3
>>>
>>> V13
>>>
>>> 2
>>>
>>> V22
>>>
>>> C3
>>>
>>>
>>>
>>> I am facing problem with this.
>>> I tried to LEFT SEMI JOIN in Hive.
>>>
>>> I tried to do it in two steps (For 3 tables)
>>> First step I wanted to do a LEFT SEMI JOIN with TABLE 1 and TABLE 3.
>>> But the problem is that in LEFT SEMI JOIN you can not have columns from
>>> the right table in the select clause. So after join my new table simply
>>> looks like TABLE 3.
>>>
>>> Can you help me how I can achieve this is HIVE.
>>>
>>> Thanks and regards,
>>> Souvik.
>>>
>>
>>
>

Re: HIVE left semi join multiple tables with columns from multiple table

Posted by Souvik Banerjee <so...@gmail.com>.
Thanks a lot Mark for your attention.
But I think I cannot go for INNER join, the reason behind the fact being
that I want all rows of Table3 irrespective of there is any row
corresponding to that ID in Table 1 or Table 2.
Probably I would have taken care of that while providing the example.
The more refined example would be like below.

Look forward for your help.

Thanks and regards,
Souvik.

P.S. Dropped one table from the earlier example and Now we have table 1 and
table 2, we are looking for table 4. (Hope so it's more simple and it's
exactly we need)



*Table 1*

ID

Value

1

V11

2

V12

3

V13



*Table 2*

ID_1

ID_2

Other_Column1

1

3

C1

2

1

C2

3

2

C3

4

9

C4

12

3

C5

1

8

C6



I want to formulate a table which would look like (*Table 4*)

ID_1

ID_1_Value

ID_2

ID_2_value

Other_Column1

1

V11

3

V13

C1

2

V12

1

V11

C2

3

V13

2

V12

C3

4

NULL / Empty

9

NULL / Empty

C4

12

NULL / Empty

3

V13

C5

1

V11

8

NULL / Empty

C6


On Thu, Dec 27, 2012 at 9:06 PM, Mark Grover <gr...@gmail.com>wrote:

> Souvik,
> Let me begin by saying that simplifying the problem goes a long way in
> helping us answer your question. You did it really nicely here, so thank
> you for doing that.
>
> Why don't you use INNER JOIN instead of LEFT SEMI JOIN? You can express
> the same query as INNER JOIN with no restrictions on what can be included
> in the SELECT clause. LEFT SEMI JOIN just implements an "exists" like query
> efficiently. If you want columns from the right table in your select list,
> just do the regular (aka inefficient way) inner join.
>
> Of course, you can optimize inner joins as map joins, sorted merge join or
> sorted merge bucketed joins depending on your use case.
>
> Mark
>
> On Thu, Dec 27, 2012 at 3:16 PM, Souvik Banerjee <souvikbanerjee@gmail.com
> > wrote:
>
>> Hi,
>>
>> I am struggling with a problem described below.
>> Any help how to resolve this problem is highly appreciated.
>>
>> I have got few tables the structure is over simplified for the sake of
>> describing the nature of the problem that I am facing.
>>
>> *Table 1*
>> * *
>>
>>
>>
>> ID
>>
>> Value
>>
>> 1
>>
>> V11
>>
>> 2
>>
>> V12
>>
>> 3
>>
>> V13
>>
>>
>>
>> *Table 2*
>> * *
>>
>>
>>
>> ID
>>
>> Value
>>
>> 1
>>
>> V21
>>
>> 2
>>
>> V22
>>
>> 3
>>
>> V23
>>
>>
>>
>> *Table 3*
>> * *
>>
>>
>>
>> ID_1
>>
>> ID_2
>>
>> Other_Column1
>>
>> 1
>>
>> 3
>>
>> C1
>>
>> 2
>>
>> 1
>>
>> C2
>>
>> 3
>>
>> 2
>>
>> C3
>>
>>
>>
>> I want to formulate a table which would look like
>>
>> ID_1
>>
>> ID_1_Value
>>
>> ID_2
>>
>> ID_2_value
>>
>> Other_Column1
>>
>> 1
>>
>> V11
>>
>> 3
>>
>> V23
>>
>> C1
>>
>> 2
>>
>> V12
>>
>> 1
>>
>> V21
>>
>> C2
>>
>> 3
>>
>> V13
>>
>> 2
>>
>> V22
>>
>> C3
>>
>>
>>
>> I am facing problem with this.
>> I tried to LEFT SEMI JOIN in Hive.
>>
>> I tried to do it in two steps (For 3 tables)
>> First step I wanted to do a LEFT SEMI JOIN with TABLE 1 and TABLE 3.
>> But the problem is that in LEFT SEMI JOIN you can not have columns from
>> the right table in the select clause. So after join my new table simply
>> looks like TABLE 3.
>>
>> Can you help me how I can achieve this is HIVE.
>>
>> Thanks and regards,
>> Souvik.
>>
>
>

Re: HIVE left semi join multiple tables with columns from multiple table

Posted by Mark Grover <gr...@gmail.com>.
Souvik,
Let me begin by saying that simplifying the problem goes a long way in
helping us answer your question. You did it really nicely here, so thank
you for doing that.

Why don't you use INNER JOIN instead of LEFT SEMI JOIN? You can express the
same query as INNER JOIN with no restrictions on what can be included in
the SELECT clause. LEFT SEMI JOIN just implements an "exists" like query
efficiently. If you want columns from the right table in your select list,
just do the regular (aka inefficient way) inner join.

Of course, you can optimize inner joins as map joins, sorted merge join or
sorted merge bucketed joins depending on your use case.

Mark

On Thu, Dec 27, 2012 at 3:16 PM, Souvik Banerjee
<so...@gmail.com>wrote:

> Hi,
>
> I am struggling with a problem described below.
> Any help how to resolve this problem is highly appreciated.
>
> I have got few tables the structure is over simplified for the sake of
> describing the nature of the problem that I am facing.
>
> *Table 1*
> * *
>
>
>
> ID
>
> Value
>
> 1
>
> V11
>
> 2
>
> V12
>
> 3
>
> V13
>
>
>
> *Table 2*
> * *
>
>
>
> ID
>
> Value
>
> 1
>
> V21
>
> 2
>
> V22
>
> 3
>
> V23
>
>
>
> *Table 3*
> * *
>
>
>
> ID_1
>
> ID_2
>
> Other_Column1
>
> 1
>
> 3
>
> C1
>
> 2
>
> 1
>
> C2
>
> 3
>
> 2
>
> C3
>
>
>
> I want to formulate a table which would look like
>
> ID_1
>
> ID_1_Value
>
> ID_2
>
> ID_2_value
>
> Other_Column1
>
> 1
>
> V11
>
> 3
>
> V23
>
> C1
>
> 2
>
> V12
>
> 1
>
> V21
>
> C2
>
> 3
>
> V13
>
> 2
>
> V22
>
> C3
>
>
>
> I am facing problem with this.
> I tried to LEFT SEMI JOIN in Hive.
>
> I tried to do it in two steps (For 3 tables)
> First step I wanted to do a LEFT SEMI JOIN with TABLE 1 and TABLE 3.
> But the problem is that in LEFT SEMI JOIN you can not have columns from
> the right table in the select clause. So after join my new table simply
> looks like TABLE 3.
>
> Can you help me how I can achieve this is HIVE.
>
> Thanks and regards,
> Souvik.
>