Posted to dev@hive.apache.org by Bing Li <sm...@gmail.com> on 2012/01/16 11:06:00 UTC

Hive JOIN fails if SELECT statement contains fields from the first table.

1. I create two Hive tables:
hive> CREATE EXTERNAL TABLE student_details (studentid INT, studentname STRING, age INT, gpa FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '/home/biadmin/hivetbl';

hive> CREATE EXTERNAL TABLE student_score (studentid INT, classid INT, score FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '/home/biadmin/hivetbl';

2. Load data
hive> LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_details.txt' OVERWRITE INTO TABLE student_details;

hive> LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_score.txt' OVERWRITE INTO TABLE student_score;

3. Run inner join
hive> SELECT a.studentid, a.studentname, a.age, b.classid, b.score, c.classname FROM student_details a JOIN student_score b ON (a.studentid = b.studentid);

Result:
I get the following exception:
cannot find field studentname from [0:studentid, 1:classid, 2:score]

[My Question]: studentname is a field of the first table, student_details. Why is Hive looking for it in the second table, student_score?

The log looks like this:
... ...
2012-01-15 23:24:41,727 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201201152221_0014_m_000000_3: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"studentid":106,"classid":null,"score":635.0}
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"studentid":106,"classid":null,"score":635.0}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
        ... 4 more
Caused by: java.lang.RuntimeException: cannot find field studentname from [0:studentid, 1:classid, 2:score]
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:345)
        at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
        at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
        at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:896)
        at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:922)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:200)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
        ... 5 more

large sql file creating large num of columns

Posted by ameet chaubal <am...@yahoo.com>.
Hi All,

I have a 30 MB SQL file that is a single CREATE TABLE statement with about 800,000 columns, hence the size.

I am trying to execute it using hive -f <file>. Initially, hive ran the command with a 256 MB heap and gave me an OOM error. I increased the heap size via export HADOOP_HEAPSIZE to 1 GB and eventually 2 GB, which made the OOM error go away. However, the hive command ran for 5 hours without actually creating the table. The JVM was still running.
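
For reference, the sequence was roughly the following (a sketch; the file name is a placeholder, and HADOOP_HEAPSIZE is read in megabytes by the launcher scripts):

    export HADOOP_HEAPSIZE=2048        # heap in MB; tried 256 initially, then 1024, then 2048
    hive -f create_wide_table.sql      # the 30 MB single CREATE TABLE statement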
However:
1. Running strace on the process showed that it was stuck on a futex call.
2. I am using MySQL for the metastore, and no rows were added to either the TBLS or COLUMNS table.
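
The metastore check was along these lines (a sketch, not the exact session; the metastore database name is an assumption, and newer Hive schemas name the columns table COLUMNS_V2):

    mysql> USE metastore;                  -- database name is an assumption
    mysql> SELECT COUNT(*) FROM TBLS;      -- table definitions; stayed at 0
    mysql> SELECT COUNT(*) FROM COLUMNS;   -- column definitions; also stayed at 0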

Questions:
1. Can Hive do this CREATE TABLE of 800k columns from a 30 MB SQL file?
2. If it is theoretically possible, what could be happening that's taking it over 5 hours and still not succeeding?
Running it with debug, it spews the following,
---snip ----

stored as textfile location '<myfile>'
12/01/16 11:28:54 INFO parse.ParseDriver: Parse Completed
12/01/16 11:28:54 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
12/01/16 11:28:54 INFO parse.SemanticAnalyzer: Creating table my_table position=22
----

and it's stuck there at SemanticAnalyzer...

thanks for any insight.

sincerely,

ameet

Re: large sql file creating large num of columns

Posted by ameet chaubal <am...@yahoo.com>.
thanks,

Running it with debug, it spews the following,
stored as textfile location '<myfile>'
12/01/16 11:28:54 INFO parse.ParseDriver: Parse Completed
12/01/16 11:28:54 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
12/01/16 11:28:54 INFO parse.SemanticAnalyzer: Creating table my_table position=22

and it's waiting there at SemanticAnalyzer...

 
Sincerely,


Ameet


________________________________
 From: Bejoy Ks <be...@yahoo.com>
To: "user@hive.apache.org" <us...@hive.apache.org>; Edward Capriolo <ed...@gmail.com>; ameet chaubal <am...@yahoo.com> 
Sent: Monday, January 16, 2012 10:24 AM
Subject: Re: large sql file creating large num of columns
 

Hey Ameet
       Please find some pointers inline.

All that hive is supposed to do is to load the definition into mysql, right? 
[Bejoy] Yes, you are right.

Are you suggesting that it's reading the datafile in HDFS?
[Bejoy] AFAIK it won't do that at table-creation time. Only metadata entries happen at this stage.


 That should not be happening since the "external table" does not need the data to be present, right? 

[Bejoy] Again, your understanding is right.


I can't give you much of a hint on why the query takes 5 hrs because I've never tried such a large number of columns. What you can do at this point is enable DEBUG logging in Hive, get some more of the stack trace, and see whether it is an issue with parsing the DDL or with the database calls that store the metadata.

Regards
Bejoy.K.S


________________________________
 From: ameet chaubal <am...@yahoo.com>
To: Edward Capriolo <ed...@gmail.com>; "user@hive.apache.org" <us...@hive.apache.org> 
Sent: Monday, January 16, 2012 8:44 PM
Subject: Re: large sql file creating large num of columns
 

thanks,

this is an external table, so at the DDL stage there is no data loading happening. All that Hive is supposed to do is load the definition into MySQL, right? Are you suggesting that it's reading the data file in HDFS? That should not be happening, since an "external table" does not need the data to be present, right?
 
Sincerely,


Ameet


________________________________
 From: Edward Capriolo <ed...@gmail.com>
To: user@hive.apache.org; ameet chaubal <am...@yahoo.com> 
Sent: Monday, January 16, 2012 10:06 AM
Subject: Re: large sql file creating large num of columns
 

I highly doubt this will work. I think that many things in Hadoop and Hive will try to buffer an entire row, so even if you make it past the metastore, I do not think it will be of any use.

On Mon, Jan 16, 2012 at 9:42 AM, ameet chaubal <am...@yahoo.com> wrote:

Hi All,
>
>
>I have a SQL file of size 30mb which is a single create table statement with about 800,000 columns, hence the size. 
>
>I am trying to execute it using hive -f <file>. Initially, hive ran the command with 256mb heap size and gave me an OOM error. I increased the heap size using export HADOOP_HEAPSIZE to 1 gb and eventually 2gb which made the OOM error go away. However, the hive command ran for 5 hours without actually creating the table. The JVM was running.
>However,
>1. running a strace on the process showed that it was stuck on a futex call.
>2. I am using mysql for metastore and there were no rows added to either TBLS or COLUMNS table.
>
>
>Question.
>1. can hive do this create table of 800k columns from a sql file of 30mb?
>2. if theoretically possible, what could be happening that's taking it over 5 hours and still not succeeding?
>
>
>any insight is much appreciated.
> 
>Sincerely,
>
>
>Ameet
>

Re: large sql file creating large num of columns

Posted by Bejoy Ks <be...@yahoo.com>.
Hey Ameet
       Please find some pointers inline.

All that hive is supposed to do is to load the definition into mysql, right? 
[Bejoy] Yes, you are right.

Are you suggesting that it's reading the datafile in HDFS?
[Bejoy] AFAIK it won't do that at table-creation time. Only metadata entries happen at this stage.


 That should not be happening since the "external table" does not need the data to be present, right? 

[Bejoy] Again, your understanding is right.


I can't give you much of a hint on why the query takes 5 hrs because I've never tried such a large number of columns. What you can do at this point is enable DEBUG logging in Hive, get some more of the stack trace, and see whether it is an issue with parsing the DDL or with the database calls that store the metadata.
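
Something along these lines should turn it on from the CLI (a sketch; the script name is a placeholder, and the same property can also be set in conf/hive-log4j.properties):

    hive -hiveconf hive.root.logger=DEBUG,console -f create_wide_table.sql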

Regards
Bejoy.K.S


________________________________
 From: ameet chaubal <am...@yahoo.com>
To: Edward Capriolo <ed...@gmail.com>; "user@hive.apache.org" <us...@hive.apache.org> 
Sent: Monday, January 16, 2012 8:44 PM
Subject: Re: large sql file creating large num of columns
 

thanks,

this is an external table, so at the DDL stage there is no data loading happening. All that Hive is supposed to do is load the definition into MySQL, right? Are you suggesting that it's reading the data file in HDFS? That should not be happening, since an "external table" does not need the data to be present, right?
 
Sincerely,


Ameet


________________________________
 From: Edward Capriolo <ed...@gmail.com>
To: user@hive.apache.org; ameet chaubal <am...@yahoo.com> 
Sent: Monday, January 16, 2012 10:06 AM
Subject: Re: large sql file creating large num of columns
 

I highly doubt this will work. I think that many things in Hadoop and Hive will try to buffer an entire row, so even if you make it past the metastore, I do not think it will be of any use.

On Mon, Jan 16, 2012 at 9:42 AM, ameet chaubal <am...@yahoo.com> wrote:

Hi All,
>
>
>I have a SQL file of size 30mb which is a single create table statement with about 800,000 columns, hence the size. 
>
>I am trying to execute it using hive -f <file>. Initially, hive ran the command with 256mb heap size and gave me an OOM error. I increased the heap size using export HADOOP_HEAPSIZE to 1 gb and eventually 2gb which made the OOM error go away. However, the hive command ran for 5 hours without actually creating the table. The JVM was running.
>However,
>1. running a strace on the process showed that it was stuck on a futex call.
>2. I am using mysql for metastore and there were no rows added to either TBLS or COLUMNS table.
>
>
>Question.
>1. can hive do this create table of 800k columns from a sql file of 30mb?
>2. if theoretically possible, what could be happening that's taking it over 5 hours and still not succeeding?
>
>
>any insight is much appreciated.
> 
>Sincerely,
>
>
>Ameet
>

Re: large sql file creating large num of columns

Posted by ameet chaubal <am...@yahoo.com>.
thanks,

this is an external table, so at the DDL stage there is no data loading happening. All that Hive is supposed to do is load the definition into MySQL, right? Are you suggesting that it's reading the data file in HDFS? That should not be happening, since an "external table" does not need the data to be present, right?
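
That is, something like the following should come back almost immediately, touching only the metastore (the table name and path here are made up for illustration):

    hive> CREATE EXTERNAL TABLE probe_tbl (x INT)
          ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
          STORED AS TEXTFILE LOCATION '/tmp/probe_tbl';   -- no data needs to exist at this path yet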

 
Sincerely,


Ameet


________________________________
 From: Edward Capriolo <ed...@gmail.com>
To: user@hive.apache.org; ameet chaubal <am...@yahoo.com> 
Sent: Monday, January 16, 2012 10:06 AM
Subject: Re: large sql file creating large num of columns
 

I highly doubt this will work. I think that many things in Hadoop and Hive will try to buffer an entire row, so even if you make it past the metastore, I do not think it will be of any use.

On Mon, Jan 16, 2012 at 9:42 AM, ameet chaubal <am...@yahoo.com> wrote:

Hi All,
>
>
>I have a SQL file of size 30mb which is a single create table statement with about 800,000 columns, hence the size. 
>
>I am trying to execute it using hive -f <file>. Initially, hive ran the command with 256mb heap size and gave me an OOM error. I increased the heap size using export HADOOP_HEAPSIZE to 1 gb and eventually 2gb which made the OOM error go away. However, the hive command ran for 5 hours without actually creating the table. The JVM was running.
>However,
>1. running a strace on the process showed that it was stuck on a futex call.
>2. I am using mysql for metastore and there were no rows added to either TBLS or COLUMNS table.
>
>
>Question.
>1. can hive do this create table of 800k columns from a sql file of 30mb?
>2. if theoretically possible, what could be happening that's taking it over 5 hours and still not succeeding?
>
>
>any insight is much appreciated.
> 
>Sincerely,
>
>
>Ameet
>

Re: large sql file creating large num of columns

Posted by Edward Capriolo <ed...@gmail.com>.
I highly doubt this will work. I think that many things in Hadoop and Hive will try to buffer an entire row, so even if you make it past the metastore, I do not think it will be of any use.

On Mon, Jan 16, 2012 at 9:42 AM, ameet chaubal <am...@yahoo.com> wrote:

> Hi All,
>
> I have a SQL file of size 30mb which is a single create table statement
> with about 800,000 columns, hence the size.
> I am trying to execute it using hive -f <file>. Initially, hive ran the
> command with 256mb heap size and gave me an OOM error. I increased the heap
> size using export HADOOP_HEAPSIZE to 1 gb and eventually 2gb which made the
> OOM error go away. However, the hive command ran for 5 hours without
> actually creating the table. The JVM was running.
> However,
> 1. running a strace on the process showed that it was stuck on a futex
> call.
> 2. I am using mysql for metastore and there were no rows added to either
> TBLS or COLUMNS table.
>
> Question.
> 1. can hive do this create table of 800k columns from a sql file of 30mb?
> 2. if theoretically possible, what could be happening that's taking it
> over 5 hours and still not succeeding?
>
> any insight is much appreciated.
>
> Sincerely,
>
> Ameet
>

large sql file creating large num of columns

Posted by ameet chaubal <am...@yahoo.com>.
Hi All,

I have a 30 MB SQL file that is a single CREATE TABLE statement with about 800,000 columns, hence the size.

I am trying to execute it using hive -f <file>. Initially, hive ran the command with a 256 MB heap and gave me an OOM error. I increased the heap size via export HADOOP_HEAPSIZE to 1 GB and eventually 2 GB, which made the OOM error go away. However, the hive command ran for 5 hours without actually creating the table. The JVM was still running.
However:
1. Running strace on the process showed that it was stuck on a futex call.
2. I am using MySQL for the metastore, and no rows were added to either the TBLS or COLUMNS table.

Questions:
1. Can Hive do this CREATE TABLE of 800k columns from a 30 MB SQL file?
2. If it is theoretically possible, what could be happening that's taking it over 5 hours and still not succeeding?

any insight is much appreciated.

 
Sincerely,


Ameet

Re: Hive JOIN fails if SELECT statement contains fields from the first table.

Posted by Mark Grover <mg...@oanda.com>.
Hi Bing,
Something seems wrong about your create table statements.
You are using "LOAD DATA LOCAL INPATH" to load data into Hive tables. This makes me think that the files /home/biadmin/hivetbl/student_details.txt and /home/biadmin/hivetbl/student_score.txt are on the local drive.
In that case you want to copy them onto HDFS for use by Hive tables, and "LOAD DATA LOCAL INPATH" is the right way to do that.

However, when you create a table and specify the location
...STORED AS TEXTFILE LOCATION '/home/biadmin/hivetbl'...
this location refers to the HDFS location. If you don't specify it, Hive will choose one for you by default. However, if you do specify it, it's your responsibility to ensure that this location doesn't conflict with another Hive table's.

Long story short, a leaf-level directory in HDFS stores the partitions/buckets and contents of one table. In your create table statements, you were pointing both Hive tables to the same HDFS location, /home/biadmin/hivetbl.

If you make your two tables point to different HDFS locations, like '/usr/hive/warehouse/student_score' and '/usr/hive/warehouse/student_details', and then do your load data inpath statements, that should fix the problem.
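
A minimal sketch of the corrected DDL and loads (the warehouse paths are just the examples above; adjust as needed):

hive> CREATE EXTERNAL TABLE student_details (studentid INT, studentname STRING, age INT, gpa FLOAT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      STORED AS TEXTFILE LOCATION '/usr/hive/warehouse/student_details';

hive> CREATE EXTERNAL TABLE student_score (studentid INT, classid INT, score FLOAT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      STORED AS TEXTFILE LOCATION '/usr/hive/warehouse/student_score';

hive> LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_details.txt' OVERWRITE INTO TABLE student_details;
hive> LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_score.txt' OVERWRITE INTO TABLE student_score;

You can confirm which HDFS directory a table actually points at with DESCRIBE EXTENDED:

hive> DESCRIBE EXTENDED student_score;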

Good luck!
Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 
e: mgrover@oanda.com 

"Best Trading Platform" - World Finance's Forex Awards 2009. 
"The One to Watch" - Treasury Today's Adam Smith Awards 2009. 


----- Original Message -----
From: "Bing Li" <sm...@gmail.com>
To: dev@hive.apache.org, user@hive.apache.org
Sent: Monday, January 16, 2012 5:06:00 AM
Subject: Hive JOIN fails if SELECT statement contains fields from the first table.


1. I create two Hive tables:
Hive> CREATE EXTERNAL TABLE student_details (studentid INT,studentname STRING,age INT,gpa FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '/home/biadmin/hivetbl';


Hive>CREATE EXTERNAL TABLE student_score(studentid INT, classid INT,score FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '/home/biadmin/hivetbl'; 


2. Load data 
HIVE>LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_details.txt' OVERWRITE INTO TABLE student_details; 


HIVE>LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_score.txt' OVERWRITE INTO TABLE student_score; 


3. Run inner join 
Hive> SELECT a.studentid,a.studentname,a.age,b.classid,b.score,c.classname FROM student_details a JOIN student_score b ON (a.studentid = b.studentid); 


Result: 
I get the following exception:
cannot find field studentname from [0:studentid, 1:classid, 2:score] 


[My Question]: studentname is a field of the first table, student_details. Why is Hive looking for it in the second table, student_score?


The log looks like this:
... ... 
2012-01-15 23:24:41,727 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201201152221_0014_m_000000_3: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"studentid":106,"classid":null,"score":635.0} 
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161) 
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) 
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) 
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) 
at org.apache.hadoop.mapred.Child.main(Child.java:170) 
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"studentid":106,"classid":null,"score":635.0} 
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550) 
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) 
... 4 more 
Caused by: java.lang.RuntimeException: cannot find field studentname from [0:studentid, 1:classid, 2:score] 
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:345) 
at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168) 
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57) 
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:896) 
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:922) 
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:200) 
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) 
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83) 
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) 
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531) 
... 5 more 
