You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "Lunagariya, Dhaval " <dh...@citi.com> on 2016/06/08 12:22:11 UTC

Need Your Inputs For Below Scenario

Hey folks,

Need your help.

Input Table1:

Column1

Column2

Column3

Column4

Column5

Column6

Column7

A

B1

B2

B3(NULL)

B4

B5

B6


Input Table2:
Column1

Column2

B1

D1

B2

D2

B3

D3

B4

D4

B5

D5

B6

D6



Output:
Column1

Column2

Column3

A

B1

D1

A

B2

D2

A

B4

D4

A

B5

D5

A

B6

D6


Here B3 is skipped because B3 is NULL.

What is the efficient way to get above result using Hive?



Regards,
Dhaval


RE: Need Your Inputs For Below Scenario

Posted by "Markovitz, Dudu" <dm...@paypal.com>.
Explode + joins

----------------------------------------------------------------------------------------------------
--  bash
----------------------------------------------------------------------------------------------------

mkdir t1
mkdir t2

cat>t1/data.txt
A      B1     B2            B4     B5     B6

cat>t2/data.txt
B1     D1
B2     D2
B3     D3
B4     D4
B5     D5
B6     D6

hdfs dfs -put t1 t2 /tmp

----------------------------------------------------------------------------------------------------
--  hive
----------------------------------------------------------------------------------------------------

create external table t1
(
    Column1     string
   ,Column2     string
   ,Column3     string
   ,Column4     string
   ,Column5     string
   ,Column6     string
   ,Column7     string
)
    row format delimited
    fields terminated by '\t'
    location '/tmp/t1'
;

create external table t2
(
    Column1     string
   ,Column2     string
)
    row format delimited
    fields terminated by '\t'
    location '/tmp/t2'
;

Theoretically I would have written the query like this -

select          t1.Column1
               ,t1_unpivot.val
               ,t2.Column2

from                            t1

                lateral view    explode (array(Column2,Column3,Column4,Column5,Column6,Column7)) t1_unpivot as val

                join            t2

                on              t2.Column1  =
                                t1_unpivot.val
;

Unfortunately, this syntax is not supported

FAILED: SemanticException [Error 10085]: Line 7:32 JOIN with a LATERAL VIEW is not supported 'val'


As a work-around I'm nesting the "lateral view'


select          t1.Column1
               ,t1.val
               ,t2.Column2

from                           (select          t1.Column1
                                               ,t1_unpivot.val

                                from                            t1

                                                lateral view    explode (array(Column2,Column3,Column4,Column5,Column6,Column7)) t1_unpivot as val
                                )
                                as t1

                join            t2

                on              t2.Column1  =
                                t1.val
;

A      B1     D1
A      B2     D2
A      B4     D4
A      B5     D5
A      B6     D6

From: Lunagariya, Dhaval [mailto:dhaval.lunagariya@citi.com]
Sent: Wednesday, June 08, 2016 6:25 PM
To: 'user@hive.apache.org' <us...@hive.apache.org>
Cc: 'er.dcpatel@gmail.com' <er...@gmail.com>
Subject: RE: Need Your Inputs For Below Scenario

Here Table2 is very large table and contains lakhs of rows.

From: Lunagariya, Dhaval [CCC-OT]
Sent: Wednesday, June 08, 2016 5:52 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Need Your Inputs For Below Scenario

Hey folks,

Need your help.

Input Table1:

Column1

Column2

Column3

Column4

Column5

Column6

Column7

A

B1

B2

B3(NULL)

B4

B5

B6



Input Table2:
Column1

Column2

B1

D1

B2

D2

B3

D3

B4

D4

B5

D5

B6

D6



Output:
Column1

Column2

Column3

A

B1

D1

A

B2

D2

A

B4

D4

A

B5

D5

A

B6

D6


Here B3 is skipped because B3 is NULL.

What is the efficient way to get above result using Hive?



Regards,
Dhaval


RE: Need Your Inputs For Below Scenario

Posted by "Lunagariya, Dhaval " <dh...@citi.com>.
Here Table2 is very large table and contains lakhs of rows.

From: Lunagariya, Dhaval [CCC-OT]
Sent: Wednesday, June 08, 2016 5:52 PM
To: user@hive.apache.org
Subject: Need Your Inputs For Below Scenario

Hey folks,

Need your help.

Input Table1:

Column1

Column2

Column3

Column4

Column5

Column6

Column7

A

B1

B2

B3(NULL)

B4

B5

B6



Input Table2:
Column1

Column2

B1

D1

B2

D2

B3

D3

B4

D4

B5

D5

B6

D6



Output:
Column1

Column2

Column3

A

B1

D1

A

B2

D2

A

B4

D4

A

B5

D5

A

B6

D6


Here B3 is skipped because B3 is NULL.

What is the efficient way to get above result using Hive?



Regards,
Dhaval