You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "Periya.Data" <pe...@gmail.com> on 2012/12/03 18:26:10 UTC

cross join

Hi Hive users,
    I have Hive CDH - 0.7.1. I want to know if I can do cross-join with
this version. I am aware that Cross-Join is fixed in version 0.10  (see:
https://issues.apache.org/jira/browse/HIVE-2549 ). Is there a way to do it
in earlier version? Here is a simple example I am trying:

Your suggestions are much appreciated.

bash-3.2$ *hive -e "select * from tbl_A;"*
Hive history
file=/tmp/ttkera/hive_job_log_ttkera_201212031702_1010080644.txt
OK
123
456
789
Time taken: 4.108 seconds
bash-3.2$
bash-3.2$
bash-3.2$* hive -e "select * from tbl_B;"*
Hive history
file=/tmp/ttkera/hive_job_log_ttkera_201212031703_1024384290.txt
OK
A
B
C
D
Time taken: 1.058 seconds
bash-3.2$


[from Ashish Thusoo's earlier suggestion:]

bash-3.2$ *hive -e "select tbl_A.*, tbl_B.* from tbl_A join tbl_B on (1 =
1);"*
Hive history file=/tmp/ttkera/hive_job_log_ttkera_201212031704_162672418.txt
FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
        at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1353)
        at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.pushJoinFilters(SemanticAnalyzer.java:4727)
        at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6013)
        at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6606)
        at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
        at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
        at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:485)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

bash-3.2$
bash-3.2$* hive -e "select tbl_A.*, tbl_B.* from tbl_A cross join tbl_B;"*
Hive history file=/tmp/ttkera/hive_job_log_ttkera_201212031723_379116546.txt
FAILED: Parse Error: line 1:35 mismatched input 'cross' expecting EOF near
'tbl_A'

bash-3.2$

bash-3.2$ *hive -e "select tbl_A.*, tbl_B.* from tbl_A join tbl_B;"*
Hive history
file=/tmp/ttkera/hive_job_log_ttkera_201212031703_1723109044.txt
FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
        at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinReduceSinkChild(SemanticAnalyzer.java:4443)
        at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinOperator(SemanticAnalyzer.java:4548)
        at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinPlan(SemanticAnalyzer.java:4707)
        at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6014)
        at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6606)
        at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
        at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
        at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:485)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

Re: cross join

Posted by Mark Grover <gr...@gmail.com>.
Periya,
I am using Hive-0.8.1 and was able to get joins to work with:
1. No on clause
2. "on (true)" as the on clause to work.

I am not entirely sure why you are getting exceptions with the above
queries. Perhaps, there were some bugs in 0.7.1 that got resolved in 0.8.1?

As far as workarounds go, you could try having each table in a sub-query
and have an extra virtual column ("1 as one") and join on that virtual
column.

Mark



On Mon, Dec 3, 2012 at 9:26 AM, Periya.Data <pe...@gmail.com> wrote:

> Hi Hive users,
>     I have Hive CDH - 0.7.1. I want to know if I can do cross-join with
> this version. I am aware that Cross-Join is fixed in version 0.10  (see:
> https://issues.apache.org/jira/browse/HIVE-2549 ). Is there a way to do
> it in earlier version? Here is a simple example I am trying:
>
> Your suggestions are much appreciated.
>
> bash-3.2$ *hive -e "select * from tbl_A;"*
> Hive history
> file=/tmp/ttkera/hive_job_log_ttkera_201212031702_1010080644.txt
> OK
> 123
> 456
> 789
> Time taken: 4.108 seconds
> bash-3.2$
> bash-3.2$
> bash-3.2$* hive -e "select * from tbl_B;"*
> Hive history
> file=/tmp/ttkera/hive_job_log_ttkera_201212031703_1024384290.txt
> OK
> A
> B
> C
> D
> Time taken: 1.058 seconds
> bash-3.2$
>
>
> [from Ashish Thusoo's earlier suggestion:]
>
> bash-3.2$ *hive -e "select tbl_A.*, tbl_B.* from tbl_A join tbl_B on (1 =
> 1);"*
> Hive history
> file=/tmp/ttkera/hive_job_log_ttkera_201212031704_162672418.txt
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1353)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.pushJoinFilters(SemanticAnalyzer.java:4727)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6013)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6606)
>         at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:485)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>
> bash-3.2$
> bash-3.2$* hive -e "select tbl_A.*, tbl_B.* from tbl_A cross join tbl_B;"*
> Hive history
> file=/tmp/ttkera/hive_job_log_ttkera_201212031723_379116546.txt
> FAILED: Parse Error: line 1:35 mismatched input 'cross' expecting EOF near
> 'tbl_A'
>
> bash-3.2$
>
> bash-3.2$ *hive -e "select tbl_A.*, tbl_B.* from tbl_A join tbl_B;"*
> Hive history
> file=/tmp/ttkera/hive_job_log_ttkera_201212031703_1723109044.txt
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinReduceSinkChild(SemanticAnalyzer.java:4443)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinOperator(SemanticAnalyzer.java:4548)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinPlan(SemanticAnalyzer.java:4707)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6014)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6606)
>         at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:485)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>
>
>  --
>
>
>
>

Re: cross join

Posted by "Periya.Data" <pe...@gmail.com>.
I followed the suggestion from here:
http://mail-archives.apache.org/mod_mbox/hive-user/201105.mbox/%3C54123389-4309-43F9-978D-AD03FCE86BDA@fb.com%3E

added a new column in each of the table, filled them with 1's and joined on
those columns. I guess I have to work with this for now...until my company
upgrades to 0.10.

any other suggestions..without having to add columns to the tables?

Thanks..
PD


On Mon, Dec 3, 2012 at 9:26 AM, Periya.Data <pe...@gmail.com> wrote:

> Hi Hive users,
>     I have Hive CDH - 0.7.1. I want to know if I can do cross-join with
> this version. I am aware that Cross-Join is fixed in version 0.10  (see:
> https://issues.apache.org/jira/browse/HIVE-2549 ). Is there a way to do
> it in earlier version? Here is a simple example I am trying:
>
> Your suggestions are much appreciated.
>
> bash-3.2$ *hive -e "select * from tbl_A;"*
> Hive history
> file=/tmp/ttkera/hive_job_log_ttkera_201212031702_1010080644.txt
> OK
> 123
> 456
> 789
> Time taken: 4.108 seconds
> bash-3.2$
> bash-3.2$
> bash-3.2$* hive -e "select * from tbl_B;"*
> Hive history
> file=/tmp/ttkera/hive_job_log_ttkera_201212031703_1024384290.txt
> OK
> A
> B
> C
> D
> Time taken: 1.058 seconds
> bash-3.2$
>
>
> [from Ashish Thusoo's earlier suggestion:]
>
> bash-3.2$ *hive -e "select tbl_A.*, tbl_B.* from tbl_A join tbl_B on (1 =
> 1);"*
> Hive history
> file=/tmp/ttkera/hive_job_log_ttkera_201212031704_162672418.txt
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1353)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.pushJoinFilters(SemanticAnalyzer.java:4727)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6013)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6606)
>         at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:485)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>
> bash-3.2$
> bash-3.2$* hive -e "select tbl_A.*, tbl_B.* from tbl_A cross join tbl_B;"*
> Hive history
> file=/tmp/ttkera/hive_job_log_ttkera_201212031723_379116546.txt
> FAILED: Parse Error: line 1:35 mismatched input 'cross' expecting EOF near
> 'tbl_A'
>
> bash-3.2$
>
> bash-3.2$ *hive -e "select tbl_A.*, tbl_B.* from tbl_A join tbl_B;"*
> Hive history
> file=/tmp/ttkera/hive_job_log_ttkera_201212031703_1723109044.txt
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinReduceSinkChild(SemanticAnalyzer.java:4443)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinOperator(SemanticAnalyzer.java:4548)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinPlan(SemanticAnalyzer.java:4707)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6014)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6606)
>         at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:485)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>
>
>