Posted to user@hive.apache.org by Su...@sanofipasteur.com on 2012/08/24 22:43:52 UTC
Hive on Amazon EC2 with S3
Hi,
I have set up a Hadoop cluster on Amazon EC2 with my data stored on S3. I would like to use Hive to process the data on S3.
I created an external table in hive using the following:
CREATE EXTERNAL TABLE mytable1
(
HIT_TIME_GMT string,
SERVICE string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://com.xxxxx.webanalytics/hive/';
I loaded a few records into the table (LOAD DATA LOCAL INPATH '/home/ubuntu/data/play/test' INTO TABLE mytable1;).
Select * from mytable1; shows me the data in the table.
When I try to run a query that requires a map-reduce job, for example select count(*) from mytable1; I see the following exception:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
java.io.FileNotFoundException: File does not exist: /hive/test
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:527)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getSplits(Hadoop20SShims.java:347)
at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getSplits(Hadoop20SShims.java:313)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:377)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1026)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1018)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:929)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:882)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:882)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:856)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:671)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:516)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /hive/test)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
The file does exist and I can see it on S3, and select * from mytable1 returns the data. I am not sure what is going wrong when the hive query initiates a map-reduce job. Any pointer as to where I went wrong? Appreciate your help.
Thank you
Suman
Re: Hive on Amazon EC2 with S3
Posted by Joe Crobak <jo...@gmail.com>.
On Thu, Aug 30, 2012 at 1:25 PM, <Su...@sanofipasteur.com> wrote:
> Thank you Joe. It works now. I will try to read up on the differences
> between CombineHiveInputFormat and HiveInputFormat.
>
I suspect this is a bug, but I'm on such an old version of Hive that I
haven't bothered to look into it any further since we have this workaround.
RE: Hive on Amazon EC2 with S3
Posted by Su...@sanofipasteur.com.
Thank you Joe. It works now. I will try to read up on the differences between CombineHiveInputFormat and HiveInputFormat.
From: Joe Crobak [mailto:joecrow@gmail.com]
Sent: Tuesday, August 28, 2012 10:22 PM
To: user@hive.apache.org
Subject: Re: Hive on Amazon EC2 with S3
Hi Suman,
We've seen this happen due to a bug in Hive's CombineHiveInputFormat. Try disabling that before querying by issuing:
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
HTH,
Joe
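To make Joe's workaround persist across CLI sessions, one option (an assumption on my part, not something stated in the thread) is to put the SET line in a ~/.hiverc file, which the Hive CLI executes at startup:

```sql
-- ~/.hiverc (executed by the Hive CLI at startup; assumed setup, not from the thread)
-- Fall back to plain HiveInputFormat so split computation works against s3n:// tables.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
```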
Re: Hive on Amazon EC2 with S3
Posted by Joe Crobak <jo...@gmail.com>.
Hi Suman,
We've seen this happen due to a bug in Hive's CombineHiveInputFormat. Try
disabling that before querying by issuing:
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
HTH,
Joe
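For readers skimming the archive, the fix boils down to one session-level setting. A minimal session might look like this (table name taken from the thread; the setting only affects the current CLI session):

```sql
-- Workaround from this thread: disable split combining, which
-- mishandles s3n:// table locations in this Hive version.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- The query that previously failed while computing splits:
SELECT COUNT(*) FROM mytable1;
```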
Re: Hive on Amazon EC2 with S3
Posted by Manish <ma...@rocketmail.com>.
Hi Suman,
I think you need another directory under hive named test. Copy the
data into s3://com.xxxxx/hive/test/
Thank You,
Manish.
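One way to check where the loaded file actually ended up, without leaving the Hive CLI, is a dfs listing (a diagnostic sketch; the bucket and prefix come from the CREATE TABLE statement earlier in the thread):

```sql
-- List the table's S3 location from inside the Hive CLI.
-- If LOAD DATA worked, the file (named test) should appear directly under hive/.
dfs -ls s3n://com.xxxxx.webanalytics/hive/;
```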