Posted to user@hadoop.apache.org by Pat Ferrel <pa...@gmail.com> on 2012/09/03 00:10:08 UTC

Error using hadoop in non-distributed mode

I'm using Mahout with a local filesystem (non-HDFS) config for debugging purposes, running inside IntelliJ IDEA. When I run one particular part of the analysis I get the following error. I didn't write the code, but we are looking for some hint about what might cause it. The same job completes without error in a single-node pseudo-distributed config, but I can't use the debugger on it very easily there.

Several jobs earlier in the pipeline complete without error, creating part files just fine:
…. 
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
	at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
	at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Bt job unsuccessful.
	at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
	at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
	at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
	at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '127.0.0.1:63483', transport: 'socket'

The file /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

(the subject of the error) does not exist, but

Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

does exist at the time of the error. So the code is looking for the data in the wrong place?
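
For reference, "local filesystem/non-hdfs config" here means roughly the following driver setup. This is a minimal sketch, assuming the Hadoop 1.x property names (fs.default.name, mapred.job.tracker) and assuming the AnalysisJob tool from the stack trace has a no-arg constructor; it is not the poster's actual code.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;

    public class LocalModeRunner {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "file:///");  // local filesystem instead of HDFS
            conf.set("mapred.job.tracker", "local");  // run tasks in-process via LocalJobRunner
            // AnalysisJob is com.finderbots.analysis.AnalysisJob from the stack trace.
            System.exit(ToolRunner.run(conf, new AnalysisJob(), args));
        }
    }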


Re: Error using hadoop in non-distributed mode

Posted by Pat Ferrel <pa...@gmail.com>.
Thanks! You nailed it. 

Mahout was using the distributed cache, but fortunately there was an easy way to tell it not to, and now the jobs run locally and can therefore be debugged.
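
The "easy way" isn't named here. A hedged sketch of what such a fix might look like, assuming the Mahout SSVD API of the time exposed a broadcast toggle; the setBroadcast() method name and the constructor arguments below are assumptions, so check them against your Mahout version:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver;

    public class SsvdLocalDebug {
        static void runLocally(Configuration conf, Path[] input, Path tmp) throws Exception {
            // Block height, k, p, and reducer count below are illustrative values only.
            SSVDSolver solver = new SSVDSolver(conf, input, tmp, 30000, 100, 15, 1);
            solver.setBroadcast(false); // assumed toggle: read R from the filesystem, not the DistributedCache
            solver.run();
        }
    }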


On Sep 4, 2012, at 9:22 PM, Hemanth Yamijala <yh...@thoughtworks.com> wrote:

Hi,

The path /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/<snip> is a location used by the tasktracker process for the 'DistributedCache', a mechanism that distributes files to all tasks running in a MapReduce job (http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache).

You mentioned Mahout, so I am assuming the specific analysis job you are running uses this feature to distribute the output file /Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 to the job that is failing.

Also, I found links stating that the distributed cache does not work in local (non-HDFS) mode; see the second answer at http://stackoverflow.com/questions/9148724/multiple-input-into-a-mapper-in-hadoop.

Thanks
hemanth
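
To make the mechanism concrete: a minimal sketch of the DistributedCache pattern described above, using the Hadoop 1.x API. The class and file names are illustrative, not Mahout's actual code.

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CacheExample {
        // Driver side: register a file; the framework localizes a copy for each task.
        static void addToCache(Job job, String file) throws Exception {
            DistributedCache.addCacheFile(new URI(file), job.getConfiguration());
        }

        // Task side: resolve the localized copies in setup(). Each returned path points
        // under ${mapred.local.dir}/archive/..., which is exactly where the failing
        // BtMapper was looking (/tmp/hadoop-pat/mapred/local/archive/...).
        static class CacheMapper extends Mapper<Object, Object, Object, Object> {
            @Override
            protected void setup(Context ctx) throws IOException {
                Path[] cached = DistributedCache.getLocalCacheFiles(ctx.getConfiguration());
                // e.g. open cached[0] with a SequenceFile.Reader and load it into memory
            }
        }
    }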


On Tue, Sep 4, 2012 at 10:33 PM, Pat Ferrel <pa...@gmail.com> wrote:
The job is creating several output and intermediate files, all under the location Users/pat/Projects/big-data/b/ssvd/. Several output directories and files are created correctly, and the file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and exists at the time of the error. We seem to be passing in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.

Under what circumstances would an input path passed in as "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"?


On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ra...@gmail.com> wrote:

Hi Pat,
            Please specify the correct input file location.
Thanks & Regards,
Ramesh.Narasingu



Re: Error using hadoop in non-distributed mode

Posted by Pat Ferrel <pa...@gmail.com>.
Thanks! You nailed it. 

Mahout was using the cache but fortunately there was an easy way to tell it not to and now the jobs run local and therefore in a debugging setup.


On Sep 4, 2012, at 9:22 PM, Hemanth Yamijala <yh...@thoughtworks.com> wrote:

Hi,

The path /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/<snip> is a location used by the tasktracker process for the 'DistributedCache' - a mechanism to distribute files to all tasks running in a map reduce job. (http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache).

You have mentioned Mahout, so I am assuming that the specific analysis job you are running is using this feature to distribute the output of the file /Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 to the job that is causing a failure.

Also, I find links stating the distributed cache does not work with in the local (non-HDFS) mode. (http://stackoverflow.com/questions/9148724/multiple-input-into-a-mapper-in-hadoop). Look at the second answer.

Thanks
hemanth


On Tue, Sep 4, 2012 at 10:33 PM, Pat Ferrel <pa...@gmail.com> wrote:
The job is creating several output and intermediate files all under the location: Users/pat/Projects/big-data/b/ssvd/ several output directories and files are created correctly and the file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and exists at the time of the error. We seem to be passing in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.

Under what circumstances would an input path passed in as "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"

???


On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ra...@gmail.com> wrote:

Hi Pat,
            Please specify correct input file location.
Thanks & Regards,
Ramesh.Narasingu

On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
Using hadoop with mahout in a local filesystem/non-hdfs config for debugging purposes inside Intellij IDEA. When I run one particular part of the analysis I get the error below. I didn't write the code but we are looking for some hint about what might cause it. This job completes without error in a single node pseudo-clustered config outside of IDEA.

several jobs in the pipeline complete without error creating part files just fine in the local file system

The file /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

which is the subject of the error - does not exist

Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

does exist at the time of the error. So the code is looking for the data in the wrong place?

….
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Bt job unsuccessful.
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
        at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
        at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '127.0.0.1:63483', transport: 'socket'






Re: Error using hadoop in non-distributed mode

Posted by Pat Ferrel <pa...@gmail.com>.
Thanks! You nailed it. 

Mahout was using the cache but fortunately there was an easy way to tell it not to and now the jobs run local and therefore in a debugging setup.


On Sep 4, 2012, at 9:22 PM, Hemanth Yamijala <yh...@thoughtworks.com> wrote:

Hi,

The path /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/<snip> is a location used by the tasktracker process for the 'DistributedCache' - a mechanism to distribute files to all tasks running in a map reduce job. (http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache).

You have mentioned Mahout, so I am assuming that the specific analysis job you are running is using this feature to distribute the output of the file /Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 to the job that is causing a failure.

Also, I find links stating the distributed cache does not work with in the local (non-HDFS) mode. (http://stackoverflow.com/questions/9148724/multiple-input-into-a-mapper-in-hadoop). Look at the second answer.

Thanks
hemanth


On Tue, Sep 4, 2012 at 10:33 PM, Pat Ferrel <pa...@gmail.com> wrote:
The job is creating several output and intermediate files all under the location: Users/pat/Projects/big-data/b/ssvd/ several output directories and files are created correctly and the file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and exists at the time of the error. We seem to be passing in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.

Under what circumstances would an input path passed in as "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"

???


On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ra...@gmail.com> wrote:

Hi Pat,
            Please specify correct input file location.
Thanks & Regards,
Ramesh.Narasingu

On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
Using hadoop with mahout in a local filesystem/non-hdfs config for debugging purposes inside Intellij IDEA. When I run one particular part of the analysis I get the error below. I didn't write the code but we are looking for some hint about what might cause it. This job completes without error in a single node pseudo-clustered config outside of IDEA.

several jobs in the pipeline complete without error creating part files just fine in the local file system

The file /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

which is the subject of the error - does not exist

Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

does exist at the time of the error. So the code is looking for the data in the wrong place?

….
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Bt job unsuccessful.
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
        at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
        at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '127.0.0.1:63483', transport: 'socket'






Re: Error using hadoop in non-distributed mode

Posted by Pat Ferrel <pa...@gmail.com>.
Thanks! You nailed it. 

Mahout was using the cache but fortunately there was an easy way to tell it not to and now the jobs run local and therefore in a debugging setup.


On Sep 4, 2012, at 9:22 PM, Hemanth Yamijala <yh...@thoughtworks.com> wrote:

Hi,

The path /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/<snip> is a location used by the tasktracker process for the 'DistributedCache' - a mechanism to distribute files to all tasks running in a map reduce job. (http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache).

You have mentioned Mahout, so I am assuming that the specific analysis job you are running is using this feature to distribute the output of the file /Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 to the job that is causing a failure.

Also, I find links stating the distributed cache does not work with in the local (non-HDFS) mode. (http://stackoverflow.com/questions/9148724/multiple-input-into-a-mapper-in-hadoop). Look at the second answer.

Thanks
hemanth


On Tue, Sep 4, 2012 at 10:33 PM, Pat Ferrel <pa...@gmail.com> wrote:
The job is creating several output and intermediate files all under the location: Users/pat/Projects/big-data/b/ssvd/ several output directories and files are created correctly and the file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and exists at the time of the error. We seem to be passing in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.

Under what circumstances would an input path passed in as "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"

???


On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ra...@gmail.com> wrote:

Hi Pat,
            Please specify correct input file location.
Thanks & Regards,
Ramesh.Narasingu

On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
Using hadoop with mahout in a local filesystem/non-hdfs config for debugging purposes inside Intellij IDEA. When I run one particular part of the analysis I get the error below. I didn't write the code but we are looking for some hint about what might cause it. This job completes without error in a single node pseudo-clustered config outside of IDEA.

several jobs in the pipeline complete without error creating part files just fine in the local file system

The file /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

which is the subject of the error - does not exist

Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

does exist at the time of the error. So the code is looking for the data in the wrong place?

….
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Bt job unsuccessful.
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
        at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
        at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '127.0.0.1:63483', transport: 'socket'






Re: Error using hadoop in non-distributed mode

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Hi,

The path /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/<snip>
is a location used by the tasktracker process for the 'DistributedCache' -
a mechanism to distribute files to all tasks running in a map reduce job. (
http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache
).

You have mentioned Mahout, so I am assuming that the specific analysis job
you are running is using this feature to distribute the output of the file /
Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 to the job that is
causing a failure.

Also, I find links stating the distributed cache does not work with in the
local (non-HDFS) mode. (
http://stackoverflow.com/questions/9148724/multiple-input-into-a-mapper-in-hadoop).
Look at the second answer.

Thanks
hemanth


On Tue, Sep 4, 2012 at 10:33 PM, Pat Ferrel <pa...@gmail.com> wrote:

> The job is creating several output and intermediate files all under the
> location: Users/pat/Projects/big-data/b/ssvd/ several output directories
> and files are created correctly and the
> file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and
> exists at the time of the error. We seem to be passing
> in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.
>
> Under what circumstances would an input path passed in as
> "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into
> "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"
>
> ???
>
>
> On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ra...@gmail.com>
> wrote:
>
> Hi Pat,
>             Please specify correct input file location.
> Thanks & Regards,
> Ramesh.Narasingu
>
> On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
>> Using hadoop with mahout in a local filesystem/non-hdfs config for
>> debugging purposes inside Intellij IDEA. When I run one particular part of
>> the analysis I get the error below. I didn't write the code but we are
>> looking for some hint about what might cause it. This job completes without
>> error in a single node pseudo-clustered config outside of IDEA.
>>
>> several jobs in the pipeline complete without error creating part files
>> just fine in the local file system
>>
>> The file
>> /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>>
>> which is the subject of the error - does not exist
>>
>> Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>>
>> does exist at the time of the error. So the code is looking for the data
>> in the wrong place?
>>
>> ….
>> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
>> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
>> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
>> 12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
>> java.io.FileNotFoundException: File
>> /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>> does not exist.
>>         at
>> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
>>         at
>> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>>         at
>> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
>>         at
>> org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>         at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> Exception in thread "main" java.io.IOException: Bt job unsuccessful.
>>         at
>> org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
>>         at
>> org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
>>         at
>> com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
>>         at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
>> Disconnected from the target VM, address: '127.0.0.1:63483', transport:
>> 'socket'
>>
>>
>
>

Re: Error using hadoop in non-distributed mode

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Hi,

The path /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/<snip>
is a location used by the tasktracker process for the 'DistributedCache' -
a mechanism to distribute files to all tasks running in a map reduce job. (
http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache
).

You have mentioned Mahout, so I am assuming that the specific analysis job
you are running is using this feature to distribute the output of the file /
Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 to the job that is
causing a failure.

Also, I find links stating the distributed cache does not work with in the
local (non-HDFS) mode. (
http://stackoverflow.com/questions/9148724/multiple-input-into-a-mapper-in-hadoop).
Look at the second answer.

Thanks
hemanth


On Tue, Sep 4, 2012 at 10:33 PM, Pat Ferrel <pa...@gmail.com> wrote:

> The job is creating several output and intermediate files all under the
> location: Users/pat/Projects/big-data/b/ssvd/ several output directories
> and files are created correctly and the
> file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and
> exists at the time of the error. We seem to be passing
> in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.
>
> Under what circumstances would an input path passed in as
> "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into
> "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"
>
> ???
>
>
> On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ra...@gmail.com>
> wrote:
>
> Hi Pat,
>             Please specify correct input file location.
> Thanks & Regards,
> Ramesh.Narasingu
>
> On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
>> Using hadoop with mahout in a local filesystem/non-hdfs config for
>> debugging purposes inside Intellij IDEA. When I run one particular part of
>> the analysis I get the error below. I didn't write the code but we are
>> looking for some hint about what might cause it. This job completes without
>> error in a single node pseudo-clustered config outside of IDEA.
>>
>> several jobs in the pipeline complete without error creating part files
>> just fine in the local file system
>>
>> The file
>> /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>>
>> which is the subject of the error - does not exist
>>
>> Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>>
>> does exist at the time of the error. So the code is looking for the data
>> in the wrong place?
>>
>> ….
>> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
>> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
>> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
>> 12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
>> java.io.FileNotFoundException: File
>> /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>> does not exist.
>>         at
>> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
>>         at
>> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>>         at
>> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
>>         at
>> org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>         at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> Exception in thread "main" java.io.IOException: Bt job unsuccessful.
>>         at
>> org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
>>         at
>> org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
>>         at
>> com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
>>         at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
>> Disconnected from the target VM, address: '127.0.0.1:63483', transport:
>> 'socket'
>>
>>
>
>

Re: Error using hadoop in non-distributed mode

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Hi,

The path /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/<snip>
is a location used by the tasktracker process for the 'DistributedCache' -
a mechanism to distribute files to all tasks running in a map reduce job. (
http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache
).

You have mentioned Mahout, so I am assuming that the specific analysis job
you are running is using this feature to distribute the output of the file /
Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 to the job that is
causing a failure.

Also, I find links stating the distributed cache does not work with in the
local (non-HDFS) mode. (
http://stackoverflow.com/questions/9148724/multiple-input-into-a-mapper-in-hadoop).
Look at the second answer.

Thanks
hemanth


On Tue, Sep 4, 2012 at 10:33 PM, Pat Ferrel <pa...@gmail.com> wrote:

> The job is creating several output and intermediate files all under the
> location: Users/pat/Projects/big-data/b/ssvd/ several output directories
> and files are created correctly and the
> file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and
> exists at the time of the error. We seem to be passing
> in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.
>
> Under what circumstances would an input path passed in as
> "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into
> "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"
>
> ???
>
>
> On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ra...@gmail.com>
> wrote:
>
> Hi Pat,
>             Please specify correct input file location.
> Thanks & Regards,
> Ramesh.Narasingu
>
> On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
>> Using hadoop with mahout in a local filesystem/non-hdfs config for
>> debugging purposes inside Intellij IDEA. When I run one particular part of
>> the analysis I get the error below. I didn't write the code but we are
>> looking for some hint about what might cause it. This job completes without
>> error in a single node pseudo-clustered config outside of IDEA.
>>
>> several jobs in the pipeline complete without error creating part files
>> just fine in the local file system
>>
>> The file
>> /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>>
>> which is the subject of the error - does not exist
>>
>> Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>>
>> does exist at the time of the error. So the code is looking for the data
>> in the wrong place?
>>
>> ….
>> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
>> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
>> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
>> 12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
>> java.io.FileNotFoundException: File
>> /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>> does not exist.
>>         at
>> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
>>         at
>> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>>         at
>> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
>>         at
>> org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>         at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> Exception in thread "main" java.io.IOException: Bt job unsuccessful.
>>         at
>> org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
>>         at
>> org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
>>         at
>> com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
>>         at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
>> Disconnected from the target VM, address: '127.0.0.1:63483', transport:
>> 'socket'
>>
>>
>
>

Re: Error using hadoop in non-distributed mode

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Hi,

The path /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/<snip>
is a location used by the tasktracker process for the 'DistributedCache' -
a mechanism to distribute files to all tasks running in a map reduce job. (
http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache
).

You have mentioned Mahout, so I am assuming that the specific analysis job
you are running is using this feature to distribute the output of the file /
Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 to the job that is
causing a failure.

Also, I find links stating the distributed cache does not work with in the
local (non-HDFS) mode. (
http://stackoverflow.com/questions/9148724/multiple-input-into-a-mapper-in-hadoop).
Look at the second answer.

Thanks
hemanth


On Tue, Sep 4, 2012 at 10:33 PM, Pat Ferrel <pa...@gmail.com> wrote:

> The job is creating several output and intermediate files all under the
> location: Users/pat/Projects/big-data/b/ssvd/ several output directories
> and files are created correctly and the
> file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and
> exists at the time of the error. We seem to be passing
> in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.
>
> Under what circumstances would an input path passed in as
> "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into
> "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"
>
> ???
>
>
> On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ra...@gmail.com>
> wrote:
>
> Hi Pat,
>             Please specify correct input file location.
> Thanks & Regards,
> Ramesh.Narasingu
>
> On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
>> Using hadoop with mahout in a local filesystem/non-hdfs config for
>> debugging purposes inside Intellij IDEA. When I run one particular part of
>> the analysis I get the error below. I didn't write the code but we are
>> looking for some hint about what might cause it. This job completes without
>> error in a single node pseudo-clustered config outside of IDEA.
>>
>> several jobs in the pipeline complete without error creating part files
>> just fine in the local file system
>>
>> The file
>> /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>>
>> which is the subject of the error - does not exist
>>
>> Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>>
>> does exist at the time of the error. So the code is looking for the data
>> in the wrong place?
>>
>> ….
>> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
>> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
>> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
>> 12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
>> java.io.FileNotFoundException: File
>> /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>> does not exist.
>>         at
>> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
>>         at
>> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>>         at
>> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
>>         at
>> org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>         at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> Exception in thread "main" java.io.IOException: Bt job unsuccessful.
>>         at
>> org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
>>         at
>> org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
>>         at
>> com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
>>         at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
>> Disconnected from the target VM, address: '127.0.0.1:63483', transport:
>> 'socket'
>>
>>
>
>

Re: Error using hadoop in non-distributed mode

Posted by Pat Ferrel <pa...@gmail.com>.
The job is creating several output and intermediate files all under the location: Users/pat/Projects/big-data/b/ssvd/ several output directories and files are created correctly and the file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and exists at the time of the error. We seem to be passing in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.

Under what circumstances would an input path passed in as "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"

???


On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ra...@gmail.com> wrote:

Hi Pat,
            Please specify correct input file location.
Thanks & Regards,
Ramesh.Narasingu

On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
Using hadoop with mahout in a local filesystem/non-hdfs config for debugging purposes inside Intellij IDEA. When I run one particular part of the analysis I get the error below. I didn't write the code but we are looking for some hint about what might cause it. This job completes without error in a single node pseudo-clustered config outside of IDEA.

several jobs in the pipeline complete without error creating part files just fine in the local file system

The file /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

which is the subject of the error - does not exist

Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

does exist at the time of the error. So the code is looking for the data in the wrong place?

….
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Bt job unsuccessful.
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
        at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
        at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '127.0.0.1:63483', transport: 'socket'




Re: Error using hadoop in non-distributed mode

Posted by Pat Ferrel <pa...@gmail.com>.
The job is creating several output and intermediate files all under the location: Users/pat/Projects/big-data/b/ssvd/ several output directories and files are created correctly and the file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and exists at the time of the error. We seem to be passing in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.

Under what circumstances would an input path passed in as "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"

???


On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ra...@gmail.com> wrote:

Hi Pat,
            Please specify correct input file location.
Thanks & Regards,
Ramesh.Narasingu

On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
Using hadoop with mahout in a local filesystem/non-hdfs config for debugging purposes inside Intellij IDEA. When I run one particular part of the analysis I get the error below. I didn't write the code but we are looking for some hint about what might cause it. This job completes without error in a single node pseudo-clustered config outside of IDEA.

several jobs in the pipeline complete without error creating part files just fine in the local file system

The file /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

which is the subject of the error - does not exist

Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

does exist at the time of the error. So the code is looking for the data in the wrong place?

….
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Bt job unsuccessful.
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
        at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
        at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '127.0.0.1:63483', transport: 'socket'




Re: Error using hadoop in non-distributed mode

Posted by Pat Ferrel <pa...@gmail.com>.
The job is creating several output and intermediate files all under the location: Users/pat/Projects/big-data/b/ssvd/ several output directories and files are created correctly and the file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and exists at the time of the error. We seem to be passing in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.

Under what circumstances would an input path passed in as "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"

???


On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ra...@gmail.com> wrote:

Hi Pat,
            Please specify correct input file location.
Thanks & Regards,
Ramesh.Narasingu

On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
Using hadoop with mahout in a local filesystem/non-hdfs config for debugging purposes inside Intellij IDEA. When I run one particular part of the analysis I get the error below. I didn't write the code but we are looking for some hint about what might cause it. This job completes without error in a single node pseudo-clustered config outside of IDEA.

several jobs in the pipeline complete without error creating part files just fine in the local file system

The file /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

which is the subject of the error - does not exist

Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

does exist at the time of the error. So the code is looking for the data in the wrong place?

….
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Bt job unsuccessful.
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
        at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
        at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '127.0.0.1:63483', transport: 'socket'




Re: Error using hadoop in non-distributed mode

Posted by Pat Ferrel <pa...@gmail.com>.
The job is creating several output and intermediate files all under the location: Users/pat/Projects/big-data/b/ssvd/ several output directories and files are created correctly and the file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and exists at the time of the error. We seem to be passing in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.

Under what circumstances would an input path passed in as "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"

???


On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ra...@gmail.com> wrote:

Hi Pat,
            Please specify correct input file location.
Thanks & Regards,
Ramesh.Narasingu

On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
Using hadoop with mahout in a local filesystem/non-hdfs config for debugging purposes inside Intellij IDEA. When I run one particular part of the analysis I get the error below. I didn't write the code but we are looking for some hint about what might cause it. This job completes without error in a single node pseudo-clustered config outside of IDEA.

several jobs in the pipeline complete without error creating part files just fine in the local file system

The file /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

which is the subject of the error - does not exist

Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

does exist at the time of the error. So the code is looking for the data in the wrong place?

….
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Bt job unsuccessful.
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
        at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
        at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '127.0.0.1:63483', transport: 'socket'




Re: Error using hadoop in non-distributed mode

Posted by Narasingu Ramesh <ra...@gmail.com>.
Hi Pat,
            Please specify correct input file location.
Thanks & Regards,
Ramesh.Narasingu

On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> Using hadoop with mahout in a local filesystem/non-hdfs config for
> debugging purposes inside Intellij IDEA. When I run one particular part of
> the analysis I get the error below. I didn't write the code but we are
> looking for some hint about what might cause it. This job completes without
> error in a single node pseudo-clustered config outside of IDEA.
>
> several jobs in the pipeline complete without error creating part files
> just fine in the local file system
>
> The file
> /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>
> which is the subject of the error - does not exist
>
> Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
>
> does exist at the time of the error. So the code is looking for the data
> in the wrong place?
>
> ….
> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
> 12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
> 12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
> java.io.FileNotFoundException: File
> /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
> does not exist.
>         at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
>         at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>         at
> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Exception in thread "main" java.io.IOException: Bt job unsuccessful.
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
>         at
> com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
>         at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
> Disconnected from the target VM, address: '127.0.0.1:63483', transport:
> 'socket'
>
>
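
Narasingu's suggestion is easy to check before launching anything: resolve the path the job will read against the filesystem the Configuration actually points at, and see whether it is visible there. A minimal sketch using the stock Hadoop FileSystem API; the path is the one from the report and stands in for whatever the job consumes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckInputPath {
  public static void main(String[] args) throws Exception {
    // A bare Configuration with no cluster config on the classpath defaults
    // to the local filesystem (file:///), matching a non-distributed setup.
    Configuration conf = new Configuration();

    Path input = new Path("/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000");
    FileSystem fs = input.getFileSystem(conf);

    // Report which filesystem the path resolves against and whether it exists there.
    System.out.println("Resolved on: " + fs.getUri());
    System.out.println("Exists:      " + fs.exists(input));
  }
}

If this prints true while the job still fails, the input location is not the problem, and attention shifts to how the job localizes the file at task setup time.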


Error using hadoop in non-distributed mode

Posted by Pat Ferrel <pa...@occamsmachete.com>.
I'm using Hadoop with Mahout in a local filesystem/non-HDFS config for debugging purposes inside IntelliJ IDEA. When I run one particular part of the analysis I get the error below. I didn't write the code, but we are looking for some hint about what might cause it. This job completes without error in a single-node pseudo-clustered config outside of IDEA.

Several jobs in the pipeline complete without error, creating part files just fine in the local file system.

The file /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

which is the subject of the error, does not exist, while

/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000

does exist at the time of the error. So is the code looking for the data in the wrong place?

…. 
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
	at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
	at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Bt job unsuccessful.
	at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
	at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
	at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
	at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '127.0.0.1:63483', transport: 'socket'
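
For anyone reproducing this setup: on Hadoop 1.x, running a job entirely in-process, so an IDE debugger can step into the mappers and reducers, takes two configuration overrides, after which every task executes inside the launching JVM via LocalJobRunner. A minimal sketch; it assumes the AnalysisJob class from the stack trace implements Tool (the trace's ToolRunner.run -> AnalysisJob.run frames suggest it does) and has a no-argument constructor:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class DebugLauncher {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Use the local filesystem instead of HDFS...
    conf.set("fs.default.name", "file:///");
    // ...and run map/reduce tasks in this JVM instead of on a jobtracker,
    // so breakpoints set in task code are actually hit.
    conf.set("mapred.job.tracker", "local");

    // The driver class is taken from the stack trace; any Tool works here.
    int exit = ToolRunner.run(conf, new com.finderbots.analysis.AnalysisJob(), args);
    System.exit(exit);
  }
}

The caveat this thread demonstrates: in-process execution changes how the DistributedCache localizes files, so a job that passes on a pseudo-cluster can still fail locally when it depends on cache localization.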

