Posted to user@mahout.apache.org by PEDRO MANUEL JIMENEZ RODRIGUEZ <pm...@hotmail.com> on 2012/03/03 19:13:56 UTC

DistributedRowMatrix - FileNotFoundException

Hi everyone!

I'm trying to use DistributedRowMatrix in my class code, but I keep getting the same error: "FileNotFoundException"

I have put a file in my HDFS directory under /user/hduser/diffuse, and I run the program with "diffuse" as both the input and the output directory. The code looks like:

Configuration originalConfig = getConf();
DistributedRowMatrix matrix = new DistributedRowMatrix(inputPath,
                                                       outputPath,
                                                       numRows,
                                                       numCols);

JobConf conf = new JobConf(originalConfig);
matrix.configure(conf);

DistributedRowMatrix t1 = matrix.transpose();

12/03/03 18:55:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/03 18:55:14 INFO mapred.FileInputFormat: Total input paths to process : 7
12/03/03 18:55:14 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201203031751_0007
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/user/hduser/diffuse/7476429391099/data
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:525)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:51)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:211)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
    at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:765)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1200)
    at org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:159)
    at Distributed.MatrixTransposeJob.run(MatrixTransposeJob.java:51)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at Distributed.MatrixTransposeJob.main(MatrixTransposeJob.java:58)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

What am I doing wrong?

Every time I run the code I get a path like this one:

FileNotFoundException: File does not exist: hdfs://localhost:54310/user/hduser/diffuse/7476429391099/data

Thanks a lot.

Pedro.

RE: DistributedRowMatrix - FileNotFoundException

Posted by PEDRO MANUEL JIMENEZ RODRIGUEZ <pm...@hotmail.com>.
Thanks everyone for the replies. I will try what you have suggested.


> Date: Wed, 7 Mar 2012 21:33:16 -0800
> Subject: Re: DistributedRowMatrix - FileNotFoundException
> From: goksron@gmail.com
> To: user@mahout.apache.org
> 
> In examples/bin/asf-email-examples.sh it shows how the Bayes
> classifier takes raw text and creates int/vector sequence files.  You
> can get a very small subset of the Apache mail archives.
> 
> Try running this example and watch the different files as the script
> makes them. The Mahout job is seq2sparse.

Re: DistributedRowMatrix - FileNotFoundException

Posted by Lance Norskog <go...@gmail.com>.
In examples/bin/asf-email-examples.sh it shows how the Bayes
classifier takes raw text and creates int/vector sequence files.  You
can get a very small subset of the Apache mail archives.

Try running this example and watch the different files as the script
makes them. The Mahout job is seq2sparse.

On Wed, Mar 7, 2012 at 8:20 PM, Paritosh Ranjan <pr...@xebia.com> wrote:
> You will have to use org.apache.hadoop.io.SequenceFile.Writer to write a
> sequence file that can be used as input.
>
> Something like,
>
> Writer writer = new Writer(fileSystem, conf, pathToWrite, IntWritable.class,
> VectorWritable.class);
> //for all IntWritable, VectorWritable pairs
> writer.append(new IntWritable(theIntValue), new VectorWritable(theVector));
>
> and then use this sequence file.
>



-- 
Lance Norskog
goksron@gmail.com

Re: DistributedRowMatrix - FileNotFoundException

Posted by Paritosh Ranjan <pr...@xebia.com>.
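You will have to use org.apache.hadoop.io.SequenceFile.Writer to write a
sequence file that can be used as input.

Something like:

// Writer is org.apache.hadoop.io.SequenceFile.Writer
SequenceFile.Writer writer = new SequenceFile.Writer(fileSystem, conf, pathToWrite,
    IntWritable.class, VectorWritable.class);
// for every (IntWritable, VectorWritable) pair, i.e. one pair per matrix row
writer.append(new IntWritable(theIntValue), new VectorWritable(theVector));
// close the writer once all rows have been appended
writer.close();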

and then use this sequence file.

On 08-03-2012 02:57, Sean Owen wrote:
> DistributedRowMatrix operates on IntWritable,VectorWritable in a
> sequence file, and it looks like you're feeding text. No, it doesn't
> accept some text-based format.


Re: DistributedRowMatrix - FileNotFoundException

Posted by Sean Owen <sr...@gmail.com>.
DistributedRowMatrix operates on IntWritable,VectorWritable in a
sequence file, and it looks like you're feeding text. No, it doesn't
accept some text-based format.

On Wed, Mar 7, 2012 at 8:41 PM, PEDRO MANUEL JIMENEZ RODRIGUEZ
<pm...@hotmail.com> wrote:
>
> Sorry, but I can't understand how to do it.
>
> I have a single space-separated text file with my input matrix. To run DistributedRowMatrix with that file I need to convert the data to seqFile format.
>
> How can I do this with SequenceFileInputFormat? I have tried InputDriver but without success.
>
> Thanks for your help.
>

RE: DistributedRowMatrix - FileNotFoundException

Posted by PEDRO MANUEL JIMENEZ RODRIGUEZ <pm...@hotmail.com>.
Sorry, but I can't understand how to do it.

I have a single space-separated text file with my input matrix. To run DistributedRowMatrix with that file I need to convert the data to seqFile format.

How can I do this with SequenceFileInputFormat? I have tried InputDriver but without success.

Thanks for your help.



> Date: Tue, 6 Mar 2012 19:31:58 +0000
> Subject: Re: DistributedRowMatrix - FileNotFoundException
> From: srowen@gmail.com
> To: user@mahout.apache.org
> 
> Your input is still text though, and I assume you're trying to use
> TextInputFormat. You can't do this as it expects an IntWritable, and
> that means it expects input as a sequence file, via
> SequenceFileInputFormat.

Re: DistributedRowMatrix - FileNotFoundException

Posted by Sean Owen <sr...@gmail.com>.
Your input is still text though, and I assume you're trying to use
TextInputFormat. You can't do this as it expects an IntWritable, and
that means it expects input as a sequence file, via
SequenceFileInputFormat.
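
The ClassCastException quoted below comes from this kind of key-type mismatch. As a rough, library-free illustration (the interface and class names here are hypothetical stand-ins, not Hadoop or Mahout types), a mapper declared for integer keys fails only when a text key actually reaches it, which is why the job is submitted fine and then dies inside map():

```java
// Hypothetical stand-ins; none of these types come from Hadoop or Mahout.
class KeyTypeMismatchDemo {

    /** Generic mapper interface, loosely analogous to a Hadoop Mapper. */
    interface RowMapper<K> {
        String map(K key, String value);
    }

    /** Declared with Integer keys, much as TransposeMapper is declared with IntWritable keys. */
    static class IntKeyMapper implements RowMapper<Integer> {
        @Override
        public String map(Integer key, String value) {
            return "row " + key;
        }
    }

    /** The "framework" side: it only sees raw types, so the key type is checked at call time. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String feed(Object key, String value) {
        RowMapper raw = new IntKeyMapper();
        try {
            return raw.map(key, value);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        // Integer key: accepted by the mapper.
        System.out.println(feed(Integer.valueOf(0), "1.0 2.5"));
        // Text key: rejected when the mapper casts it to Integer.
        System.out.println(feed("some text key", "1.0 2.5"));
    }
}
```

The fix is therefore on the input side: the sequence file has to be written with integer keys in the first place, rather than changing anything in the job.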

On Tue, Mar 6, 2012 at 7:21 PM, PEDRO MANUEL JIMENEZ RODRIGUEZ
<pm...@hotmail.com> wrote:
>
> Thanks for the reply.
>
> I was doing something wrong: I have to convert my input file to a seqFile, and I'm now trying to do that.
>
> The file looks like:
>
> 2323.03 994.45 87.....
> 56.45 76.21 275.1 12.456......
> ......
>
> Each line represents a matrix row, and the columns are separated by spaces.
>
>
> So I executed the following command to get the seqFile:
>
>  bin/mahout seqdirectory -i /home/pedro/input -o /home/pedro/diffuse/output -c UTF-8
>
> Then I tried to run my program with the generated file and got the following error:
>
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
>    at org.apache.mahout.math.hadoop.TransposeJob$TransposeMapper.map(TransposeJob.java:100)
>    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:396)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>    at org.apache.hadoop.mapred.Child.main(Child.java:253)
>
> Do I have to change the input file to another format?
>
> Thanks.

RE: DistributedRowMatrix - FileNotFoundException

Posted by PEDRO MANUEL JIMENEZ RODRIGUEZ <pm...@hotmail.com>.
Thanks for the reply.

I was doing something wrong: I have to convert my input file to a seqFile, and I'm now trying to do that.

The file looks like:

2323.03 994.45 87.....
56.45 76.21 275.1 12.456......
......

Each line represents a matrix row, and the columns are separated by spaces.
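
A minimal sketch of reading this layout, assuming exactly what the message describes: one matrix row per line, values separated by whitespace. The class and method names (MatrixTextParser, parseRow, parseMatrix) and the sample values are made up for illustration and are not part of Mahout or Hadoop. Each parsed row, together with its position as the row index, is what would then be wrapped as an IntWritable key and a VectorWritable value when writing the sequence file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout or Hadoop.
class MatrixTextParser {

    /** Parses one whitespace-separated line into the values of a matrix row. */
    static double[] parseRow(String line) {
        String[] tokens = line.trim().split("\\s+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    /** Parses all non-blank lines; the position in the returned list is the row index. */
    static List<double[]> parseMatrix(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                rows.add(parseRow(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample values, only to exercise the parser.
        List<String> lines = Arrays.asList("1.0 2.5 3.0", "4.0 5.5 6.0");
        List<double[]> rows = parseMatrix(lines);
        for (int row = 0; row < rows.size(); row++) {
            // In a real conversion, this is where the pair would be appended, e.g.
            // key = new IntWritable(row), value = new VectorWritable(new DenseVector(rows.get(row)))
            // (assumes Mahout's DenseVector(double[]) constructor).
            System.out.println(row + " -> " + Arrays.toString(rows.get(row)));
        }
    }
}
```

Keeping the parsing free of Hadoop types makes it easy to check on a small file before any sequence file is written.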


So I executed the following command to get the seqFile:

 bin/mahout seqdirectory -i /home/pedro/input -o /home/pedro/diffuse/output -c UTF-8

Then I tried to run my program with the generated file and got the following error:

java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
    at org.apache.mahout.math.hadoop.TransposeJob$TransposeMapper.map(TransposeJob.java:100)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:253)

Do I have to change the input file to another format?

Thanks.



> Date: Sun, 4 Mar 2012 17:48:56 -0800
> Subject: Re: DistributedRowMatrix - FileNotFoundException
> From: goksron@gmail.com
> To: user@mahout.apache.org
> 
> This could be a problem with the DRM code or HDFS management. Try
> running it without HDFS or a Hadoop cluster, with local files and in
> pseudo-distributed mode. This way you can narrow the problem to one of
> the above.

Re: DistributedRowMatrix - FileNotFoundException

Posted by Lance Norskog <go...@gmail.com>.
This could be a problem with the DRM code or HDFS management. Try
running it without HDFS or a Hadoop cluster, with local files and in
pseudo-distributed mode. This way you can narrow the problem to one of
the above.

-- 
Lance Norskog
goksron@gmail.com