Posted to user@mahout.apache.org by Matthew Runo <ma...@gmail.com> on 2011/02/24 19:35:23 UTC

ItemSimilarityJob FileNotFound errors

Hello folks -

I made an attempt at running the ItemSimilarityJob on our Hadoop
cluster today, but I can't seem to get past this error:

11/02/24 09:18:17 INFO mapred.JobClient: Task Id :
attempt_201102231433_0008_m_000070_0, Status : FAILED
java.io.FileNotFoundException: File does not exist: /user/mruno/temp
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1586)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1577)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:428)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
	at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.mapred.Child.main(Child.java:234)

I ran the job with this command:

hadoop jar mahout-core-0.5-SNAPSHOT-jar-with-dependencies.jar
org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
-Dmapred.input.dir=/user/mruno -Dmapred.output.dir=/user/mruno
--similarityClassname SIMILARITY_LOGLIKELIHOOD --tempDir
/user/mruno/temp

The first job runs fine, but the two jobs it spawns after that always fail:
ItemSimilarityJob-ItemIDIndexMapper-ItemIDIndexReducer (runs fine)
ItemSimilarityJob-CountUsersMapper-CountUsersReducer (fails with above error)
ItemSimilarityJob-ToItemPrefsMapper-ToUserVectorReducer (fails with above error)

If I look in HDFS, I have the following directory structure:
/user/mruno/input-data-file.csv
/user/mruno/temp/countUsers/...
/user/mruno/temp/itemIDIndex/...
/user/mruno/temp/userVectors/...

...so the path I gave for --tempDir obviously exists and is writable:
the job created everything listed above except the input file.

Does anyone have an idea on this? I'm not sure where to start, and
the exception isn't all that helpful.

If I look at the job's XML file, I see that it has mapred.output.dir
set to /user/mruno/temp/userVectors, which does exist there.

I'd appreciate any ideas. I apologize if this would be better asked
on the Hadoop mailing list, but I thought I'd try here first since it
is specific to the ItemSimilarityJob.

Re: ItemSimilarityJob FileNotFound errors

Posted by Matthew Runo <ma...@gmail.com>.
That may have been it, though I'm not sure. This command seems to work:

hadoop jar mahout-core-0.5-SNAPSHOT-jar-with-dependencies.jar
org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
--similarityClassname SIMILARITY_LOGLIKELIHOOD --tempDir
/user/mruno/temp -o /user/mruno/output -i /user/mruno/input

I hope this helps anyone who's trying to run this stuff.
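
For anyone comparing the two invocations: the visible difference is that the failing command pointed input and output at /user/mruno, the parent of --tempDir, while the working one gives each role its own directory. The sketch below is only an illustration of that difference, not part of Mahout or Hadoop; the helper names (is_same_or_inside, overlapping_pairs) are hypothetical, and it works on path strings alone without touching HDFS.

```python
from pathlib import PurePosixPath


def is_same_or_inside(path, ancestor):
    """True if path equals ancestor or lies somewhere beneath it."""
    p, a = PurePosixPath(path), PurePosixPath(ancestor)
    return p == a or a in p.parents


def overlapping_pairs(paths):
    """Return (role, role) pairs whose directories are equal or nested.

    paths maps a role name (hypothetical labels such as "input") to a path.
    """
    names = sorted(paths)
    clashes = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            a, b = paths[first], paths[second]
            if is_same_or_inside(a, b) or is_same_or_inside(b, a):
                clashes.append((first, second))
    return clashes


# Paths taken from the two commands in this thread.
failing = {"input": "/user/mruno", "output": "/user/mruno", "temp": "/user/mruno/temp"}
working = {"input": "/user/mruno/input", "output": "/user/mruno/output", "temp": "/user/mruno/temp"}

print(overlapping_pairs(failing))  # every pair overlaps
print(overlapping_pairs(working))  # no overlaps
```

A check like this only flags the layout difference between the two commands; whether that overlap is what caused the FileNotFoundException is not confirmed in this thread.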

--Matthew

On Thu, Feb 24, 2011 at 11:46 AM, Sebastian Schelter <ss...@apache.org> wrote:
> Hi Matthew,
>
> I can't really see what's wrong. The only thing that makes me wonder is
> that your input and output dirs are the same. Are you sure that's right?
>
> --sebastian

Re: ItemSimilarityJob FileNotFound errors

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Matthew,

I can't really see what's wrong. The only thing that makes me wonder is
that your input and output dirs are the same. Are you sure that's right?

--sebastian
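
One possible reading of why a shared input/temp parent could matter, offered as an assumption rather than a confirmed diagnosis: if the input directory is listed non-recursively and every entry is handed to a line reader as though it were a file, then the temp directory itself becomes an "input file" once the first job has created it. That would match the stack trace, where LineRecordReader.initialize fails on /user/mruno/temp. The sketch below simulates this with a plain dictionary; the HDFS snapshot and the function names are hypothetical stand-ins, not Hadoop APIs.

```python
from pathlib import PurePosixPath

# Hypothetical snapshot of HDFS after the first job, based on the
# directory listing given in the original message.
HDFS = {
    "/user/mruno/input-data-file.csv": "file",
    "/user/mruno/temp": "dir",
    "/user/mruno/temp/countUsers": "dir",
    "/user/mruno/temp/itemIDIndex": "dir",
    "/user/mruno/temp/userVectors": "dir",
}


def list_children(tree, directory):
    """Non-recursive listing: direct children only, directories included."""
    parent = PurePosixPath(directory)
    return sorted(p for p in tree if PurePosixPath(p).parent == parent)


def unreadable_inputs(tree, input_dir):
    """Entries of input_dir that could not be opened as a plain file."""
    return [p for p in list_children(tree, input_dir) if tree[p] != "file"]


# With the input dir set to /user/mruno, the temp directory is picked up
# alongside the real input file.
print(unreadable_inputs(HDFS, "/user/mruno"))
```

Under this assumption, the first job succeeds because the temp directory does not exist yet when its input is listed, and later jobs fail because it does.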
