Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2011/02/02 05:51:03 UTC

Hadoop error running Wikipedia exercise

Running the dataset creator on the full Wikipedia set:

 bin/mahout wikipediaDataSetCreator -i wiki -o
../datasets/wikipediainput -c examples/src/test/resources/country.txt

After some time, I got this error and the job quit. It left no output files.

Is this a hiccup, a Hadoop error, or something wrong in Mahout?

----------------------------

11/02/01 01:44:52 INFO bayes.WikipediaDatasetCreatorMapper: Configure:
Input Categories size: 229 Exact Match: false Analyzer:
org.apache.mahout.analysis.WikipediaAnalyzer
11/02/01 01:44:52 INFO mapred.MapTask: Starting flush of map output
11/02/01 01:44:52 INFO mapred.MapTask: Finished spill 0
11/02/01 01:44:52 INFO mapred.TaskRunner:
Task:attempt_local_0001_m_028511_0 is done. And is in the process of
commiting
11/02/01 01:44:52 INFO mapred.LocalJobRunner:
11/02/01 01:44:52 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_028511_0' done.
11/02/01 01:45:18 WARN mapred.LocalJobRunner: job_local_0001
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/file.out
in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:50)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:193)
11/02/01 01:45:19 INFO mapred.JobClient: Job complete: job_local_0001
11/02/01 01:45:19 INFO mapred.JobClient: Counters: 8
11/02/01 01:45:19 INFO mapred.JobClient:   FileSystemCounters
11/02/01 01:45:19 INFO mapred.JobClient:     FILE_BYTES_READ=435709583455348
11/02/01 01:45:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=72839164345155
11/02/01 01:45:19 INFO mapred.JobClient:   Map-Reduce Framework
11/02/01 01:45:19 INFO mapred.JobClient:     Combine output records=0
11/02/01 01:45:19 INFO mapred.JobClient:     Map input records=10860674
11/02/01 01:45:19 INFO mapred.JobClient:     Spilled Records=1164848
11/02/01 01:45:19 INFO mapred.JobClient:     Map output bytes=4282654947
11/02/01 01:45:19 INFO mapred.JobClient:     Combine input records=0
11/02/01 01:45:19 INFO mapred.JobClient:     Map output records=1164848
11/02/01 01:45:19 INFO driver.MahoutDriver: Program took 12692646 ms
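One thing worth ruling out before digging into Mahout itself (my own guess, not something the log states outright): the job ran under LocalJobRunner, so the ~4 GB of map spill files land in Hadoop's local directories, which by default live under hadoop.tmp.dir (/tmp/hadoop-${user.name}). A full or nearly full partition there produces exactly this kind of DiskErrorException. A minimal check, assuming the default location:

```shell
# In local mode, map spill files go to Hadoop's local dirs, which default
# to a subdirectory of hadoop.tmp.dir (/tmp/hadoop-$USER). If the partition
# holding them fills up, spill files can vanish mid-job.
df -h /tmp

# Hypothetical default path; adjust to your actual hadoop.tmp.dir setting.
ls -l /tmp/hadoop-$USER/mapred/local 2>/dev/null || echo "no local mapred dir found"
```

If /tmp is on a small partition, pointing mapred.local.dir somewhere roomier is the usual fix.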


-- 
Lance Norskog
goksron@gmail.com

Re: Hadoop error running Wikipedia exercise

Posted by vineet yadav <vi...@gmail.com>.
Hi Lance,
The job is reading from the local file system, not from the Hadoop file
system (HDFS); please check your Hadoop configuration. Also, since the error
happens while creating the Wikipedia dataset, make sure you have enough disk
space available on your system: the Wikipedia dataset is huge.
Thanks
Vineet Yadav
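To make the job use HDFS and a real JobTracker rather than LocalJobRunner, the old-style (Hadoop 0.20.x, current at the time of this thread) configuration would look roughly like the sketch below. The hostnames, ports, and paths are placeholders, not values from this thread:

```xml
<!-- core-site.xml: point the default file system at HDFS instead of file:/// -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

<!-- mapred-site.xml: any value other than "local" avoids LocalJobRunner -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>

<!-- mapred-site.xml: put map spill files on a partition with plenty of space -->
<property>
  <name>mapred.local.dir</name>
  <value>/path/with/space/mapred/local</value>
</property>
```

With fs.default.name set to HDFS, the -i and -o paths on the mahout command line are then resolved against HDFS rather than the local disk.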