Posted to user@hive.apache.org by Soham Sardar <so...@gmail.com> on 2012/06/25 13:16:09 UTC
some queries actually fail in hive
1) hive> desc users_info;
OK
id int
name string
age int
country string
gender string
bday string
hive> desc users_audit;
OK
id int
userid int
logtime string
Time taken: 0.079 seconds
So both of my tables are fine and have data. The first query that fails is:
hive> select users_info.name from users_info inner join users_audit
> on users_audit.userid=users_info.id
> where month(users_audit.logtime)>10
> order by users_info.id;
FAILED: Error in semantic analysis: Line 4:20 Invalid column reference 'id'
Now my question is: why should it fail on id? (id is a primary key for the
users_info table.)
2) hive> select users_info.name from users_info inner join users_audit
> on users_audit.userid=users_info.id
> where month(users_audit.logtime)>10
> order by users_info.id;
For the same tables, when I run the query above it fails partway through the
map phase:
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
12/06/25 16:45:08 WARN conf.Configuration: mapred.job.name is
deprecated. Instead, use mapreduce.job.name
12/06/25 16:45:08 WARN conf.Configuration: mapred.system.dir is
deprecated. Instead, use mapreduce.jobtracker.system.dir
12/06/25 16:45:08 WARN conf.Configuration: mapred.local.dir is
deprecated. Instead, use mapreduce.cluster.local.dir
12/06/25 16:45:08 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the
log4j.properties files.
Execution log at:
/tmp/hduser/hduser_20120625164545_3c0a9948-f43f-428e-9d8f-ff89fe2f4937.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/hduser/cloudera/hadoop-2.0.0-cdh4.0.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/hduser/cloudera/hive-0.8.1-cdh4.0.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 2; number of reducers: 1
2012-06-25 16:45:18,112 null map = 0%, reduce = 0%
2012-06-25 16:45:25,886 null map = 50%, reduce = 0%, Cumulative CPU 4.1 sec
2012-06-25 16:45:26,951 null map = 50%, reduce = 0%, Cumulative CPU 4.1 sec
2012-06-25 16:45:28,007 null map = 50%, reduce = 0%, Cumulative CPU 4.1 sec
2012-06-25 16:45:29,069 null map = 83%, reduce = 0%, Cumulative CPU 10.92 sec
2012-06-25 16:45:30,118 null map = 83%, reduce = 0%, Cumulative CPU 10.92 sec
2012-06-25 16:45:31,192 null map = 100%, reduce = 17%, Cumulative CPU 14.64 sec
2012-06-25 16:45:32,251 null map = 100%, reduce = 17%, Cumulative CPU 14.64 sec
2012-06-25 16:45:33,300 null map = 100%, reduce = 17%, Cumulative CPU 14.64 sec
2012-06-25 16:45:34,369 null map = 100%, reduce = 100%, Cumulative
CPU 19.42 sec
MapReduce Total cumulative CPU time: 19 seconds 420 msec
Ended Job = job_1340607580565_0023
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
12/06/25 16:45:35 WARN conf.Configuration: mapred.job.name is
deprecated. Instead, use mapreduce.job.name
12/06/25 16:45:35 WARN conf.Configuration: mapred.system.dir is
deprecated. Instead, use mapreduce.jobtracker.system.dir
12/06/25 16:45:35 WARN conf.Configuration: mapred.local.dir is
deprecated. Instead, use mapreduce.cluster.local.dir
12/06/25 16:45:35 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the
log4j.properties files.
Execution log at:
/tmp/hduser/hduser_20120625164545_3c0a9948-f43f-428e-9d8f-ff89fe2f4937.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/hduser/cloudera/hadoop-2.0.0-cdh4.0.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/hduser/cloudera/hive-0.8.1-cdh4.0.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
java.io.FileNotFoundException: File does not exist:
/tmp/hduser/hive_2012-06-25_16-45-07_351_2914856137008935083/-mr-10002/000000_0
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:736)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:493)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:284)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:239)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:69)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:387)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:353)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:478)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:470)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:360)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:604)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:604)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:710)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Job Submission failed with exception
'java.io.FileNotFoundException(File does not exist:
/tmp/hduser/hive_2012-06-25_16-45-07_351_2914856137008935083/-mr-10002/000000_0)'
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID:
Stage-2
Logs:
/tmp/hduser/hive.log
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask
Can someone help me out with this?
Re: some queries actually fail in hive
Posted by Igor Tatarinov <ig...@decide.com>.
1) In Hive you have to SELECT every column you ORDER BY (id in this case).
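For example, adding the ORDER BY column to the SELECT list should get past the semantic-analysis error. This is just a sketch built from the DESCRIBE output in your post, not something I've run against your tables:

```sql
-- Hypothetical fix: include users_info.id in the SELECT list,
-- since Hive 0.8.x requires ORDER BY columns to appear there.
SELECT users_info.name, users_info.id
FROM users_info
JOIN users_audit
  ON users_audit.userid = users_info.id
WHERE month(users_audit.logtime) > 10
ORDER BY users_info.id;
```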
2) The query is the same as in 1); I assume you actually ran a different
query.
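As for the FileNotFoundException in 2): the log shows Hive converting the join into a map-join ("Mapred Local Task Succeeded . Convert the Join into MapJoin") and then Job 2 failing to find the local task's intermediate file under /tmp. I haven't verified this on your setup, but a workaround sometimes suggested for Hive 0.8.x is to disable automatic map-join conversion and rerun, to see whether a plain reduce-side join completes:

```sql
-- Assumption: the failure is tied to the automatic map-join conversion.
-- Disabling it forces a regular (reduce-side) join for this session.
set hive.auto.convert.join=false;

SELECT users_info.name, users_info.id
FROM users_info
JOIN users_audit
  ON users_audit.userid = users_info.id
WHERE month(users_audit.logtime) > 10
ORDER BY users_info.id;
```

If the query succeeds with the setting off, that would at least narrow the problem down to the map-join local-task path.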
igor
decide.com