Posted to user@hive.apache.org by Soham Sardar <so...@gmail.com> on 2012/06/25 13:16:09 UTC
some queries actually fail in hive
1) hive> desc users_info;
OK
id int
name string
age int
country string
gender string
bday string
hive> desc users_audit;
OK
id int
userid int
logtime string
Time taken: 0.079 seconds
So both of my tables are fine and have data. The first query that fails is:
hive> select users_info.name from users_info inner join users_audit
> on users_audit.userid=users_info.id
> where month(users_audit.logtime)>10
> order by users_info.id;
FAILED: Error in semantic analysis: Line 4:20 Invalid column reference 'id'
Now my question is: why should it fail on id? (id is a primary key for the
users_info table.)
2) hive> select users_info.name from users_info inner join users_audit
> on users_audit.userid=users_info.id
> where month(users_audit.logtime)>10
> order by users_info.id;
For the same tables, when I run the query above it fails partway through the
map phase:
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
12/06/25 16:45:08 WARN conf.Configuration: mapred.job.name is
deprecated. Instead, use mapreduce.job.name
12/06/25 16:45:08 WARN conf.Configuration: mapred.system.dir is
deprecated. Instead, use mapreduce.jobtracker.system.dir
12/06/25 16:45:08 WARN conf.Configuration: mapred.local.dir is
deprecated. Instead, use mapreduce.cluster.local.dir
12/06/25 16:45:08 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the
log4j.properties files.
Execution log at:
/tmp/hduser/hduser_20120625164545_3c0a9948-f43f-428e-9d8f-ff89fe2f4937.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/hduser/cloudera/hadoop-2.0.0-cdh4.0.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/hduser/cloudera/hive-0.8.1-cdh4.0.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 2; number of reducers: 1
2012-06-25 16:45:18,112 null map = 0%, reduce = 0%
2012-06-25 16:45:25,886 null map = 50%, reduce = 0%, Cumulative CPU 4.1 sec
2012-06-25 16:45:26,951 null map = 50%, reduce = 0%, Cumulative CPU 4.1 sec
2012-06-25 16:45:28,007 null map = 50%, reduce = 0%, Cumulative CPU 4.1 sec
2012-06-25 16:45:29,069 null map = 83%, reduce = 0%, Cumulative CPU 10.92 sec
2012-06-25 16:45:30,118 null map = 83%, reduce = 0%, Cumulative CPU 10.92 sec
2012-06-25 16:45:31,192 null map = 100%, reduce = 17%, Cumulative CPU 14.64 sec
2012-06-25 16:45:32,251 null map = 100%, reduce = 17%, Cumulative CPU 14.64 sec
2012-06-25 16:45:33,300 null map = 100%, reduce = 17%, Cumulative CPU 14.64 sec
2012-06-25 16:45:34,369 null map = 100%, reduce = 100%, Cumulative
CPU 19.42 sec
MapReduce Total cumulative CPU time: 19 seconds 420 msec
Ended Job = job_1340607580565_0023
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
12/06/25 16:45:35 WARN conf.Configuration: mapred.job.name is
deprecated. Instead, use mapreduce.job.name
12/06/25 16:45:35 WARN conf.Configuration: mapred.system.dir is
deprecated. Instead, use mapreduce.jobtracker.system.dir
12/06/25 16:45:35 WARN conf.Configuration: mapred.local.dir is
deprecated. Instead, use mapreduce.cluster.local.dir
12/06/25 16:45:35 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the
log4j.properties files.
Execution log at:
/tmp/hduser/hduser_20120625164545_3c0a9948-f43f-428e-9d8f-ff89fe2f4937.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/hduser/cloudera/hadoop-2.0.0-cdh4.0.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/hduser/cloudera/hive-0.8.1-cdh4.0.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
java.io.FileNotFoundException: File does not exist:
/tmp/hduser/hive_2012-06-25_16-45-07_351_2914856137008935083/-mr-10002/000000_0
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:736)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:493)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:284)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:239)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:69)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:387)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:353)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:478)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:470)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:360)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:604)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:604)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:710)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Job Submission failed with exception
'java.io.FileNotFoundException(File does not exist:
/tmp/hduser/hive_2012-06-25_16-45-07_351_2914856137008935083/-mr-10002/000000_0)'
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID:
Stage-2
Logs:
/tmp/hduser/hive.log
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask
Can someone help me out with this?
Re: some queries actually fail in hive
Posted by Igor Tatarinov <ig...@decide.com>.
1) In Hive you have to SELECT every column you ORDER BY (id in this case).
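For example, adding the ORDER BY column to the SELECT list should get past the semantic-analysis error. This is just a sketch built from the DESCRIBE output in your post, not something I've run against your tables:

```sql
-- Hypothetical fix: include users_info.id in the SELECT list,
-- since Hive 0.8.x requires ORDER BY columns to appear there.
SELECT users_info.name, users_info.id
FROM users_info
JOIN users_audit
  ON users_audit.userid = users_info.id
WHERE month(users_audit.logtime) > 10
ORDER BY users_info.id;
```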
2) The query is the same as in 1); I assume you actually ran a different
query.
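As for the FileNotFoundException in 2): the log shows Hive converting the join into a map-join ("Mapred Local Task Succeeded . Convert the Join into MapJoin") and then Job 2 failing to find the local task's intermediate file under /tmp. I haven't verified this on your setup, but a workaround sometimes suggested for Hive 0.8.x is to disable automatic map-join conversion and rerun, to see whether a plain reduce-side join completes:

```sql
-- Assumption: the failure is tied to the automatic map-join conversion.
-- Disabling it forces a regular (reduce-side) join for this session.
set hive.auto.convert.join=false;

SELECT users_info.name, users_info.id
FROM users_info
JOIN users_audit
  ON users_audit.userid = users_info.id
WHERE month(users_audit.logtime) > 10
ORDER BY users_info.id;
```

If the query succeeds with the setting off, that would at least narrow the problem down to the map-join local-task path.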
igor
decide.com