Posted to common-user@hadoop.apache.org by Deepika Khera <de...@collarity.com> on 2008/11/01 01:23:22 UTC

RE: "Merge of the inmemory files threw an exception" and diffs between 0.17.2 and 0.18.1

Wow, if the issue is fixed with version 0.20, then could we please have
a patch for version 0.18? 

Thanks,
Deepika

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org] 
Sent: Thursday, October 30, 2008 12:19 PM
To: core-user@hadoop.apache.org
Subject: Re: "Merge of the inmemory files threw an exception" and diffs
between 0.17.2 and 0.18.1

So, Philippe reports that the problem goes away with 0.20-dev  
(trunk?): http://mahout.markmail.org/message/swmzreg6fnzf6icv   We  
aren't totally clear on the structure of SVN for Hadoop, but it seems  
like it is not fixed by this patch.



On Oct 29, 2008, at 10:28 AM, Grant Ingersoll wrote:

> We'll try it out...
>
> On Oct 28, 2008, at 3:00 PM, Arun C Murthy wrote:
>
>>
>> On Oct 27, 2008, at 7:05 PM, Grant Ingersoll wrote:
>>
>>> Hi,
>>>
>>> Over in Mahout (lucene.a.o/mahout), we are seeing an oddity with  
>>> some of our clustering code and Hadoop 0.18.1.  The thread in  
>>> context is at:  http://mahout.markmail.org/message/vcyvlz2met7fnthr
>>>
>>> The problem seems to occur when going from 0.17.2 to 0.18.1.  In  
>>> the user logs, we are seeing the following exception:
>>> 2008-10-27 21:18:37,014 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
>>> 2008-10-27 21:18:37,033 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810272112_0011_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>
>> If you are sure that this isn't caused by your application-logic, you could try running with http://issues.apache.org/jira/browse/HADOOP-4277.
>>
>> That bug caused many a ship to sail in large circles, hopelessly.
>>
>> Arun
>>
>>>
>>>      at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>      at java.lang.Double.parseDouble(Double.java:510)
>>>      at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>      at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>      at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>      at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
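
[Editor's note: the bottom-most failure in the trace above can be reproduced in isolation. The sketch below is a standalone demo, not Mahout's actual `DenseVector.decodeFormat`; it only shows that `Double.parseDouble` rejects the string "[" with exactly the message reported.]

```java
// Minimal reproduction of the root failure in the stack trace:
// Double.parseDouble cannot parse "[" (the opening bracket of a
// serialized vector), so it throws the NumberFormatException that
// surfaces through DenseVector.decodeFormat in the trace.
public class ParseFailureDemo {
    public static void main(String[] args) {
        try {
            Double.parseDouble("[");
            System.out.println("parsed");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```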
>>>
>>> And in the main output log (from running bin/hadoop jar mahout/examples/build/apache-mahout-examples-0.1-dev.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job) we see:
>>> 08/10/27 21:18:41 INFO mapred.JobClient: Task Id : attempt_200810272112_0011_r_000000_0, Status : FAILED
>>> java.io.IOException: attempt_200810272112_0011_r_000000_0The reduce copier failed
>>>      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>
>>> If I run this exact same job on 0.17.2 it all runs fine.  I suppose either a bug was introduced in 0.18.1, or a bug was fixed that we were relying on.  Looking at the release notes between the two releases, nothing in particular struck me as related.  If it helps, I can provide the instructions for how to run the example in question (they need to be written up anyway!)
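
[Editor's note: one pattern consistent with "a bug was fixed that we were relying on" is the combiner contract. The trace shows the combiner running inside the reduce-side merge (`ReduceCopier.combineAndSpill`), so in 0.18 a combiner can be applied to its own output. The sketch below is a hypothetical combiner, not `KMeansCombiner`'s actual code; it only illustrates how a combiner whose output format differs from its input format fails on a second application.]

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: a combiner that consumes bracketed vectors
// like "[1.0, 2.0]" but emits "count|[...]" breaks when Hadoop applies
// it a second time during the reduce-side merge, because the second
// pass tries to parse its own reformatted output.
public class CombinerDemo {
    // Parse a bracketed vector such as "[1.0, 2.0]".
    static double[] decode(String s) {
        String[] parts = s.substring(1, s.length() - 1).split(",\\s*");
        double[] v = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            v[i] = Double.parseDouble(parts[i]); // throws if s is not "[...]"
        }
        return v;
    }

    // Sum the input vectors, but emit them in a DIFFERENT format.
    static String combine(List<String> values) {
        double[] sum = null;
        for (String s : values) {
            double[] v = decode(s);
            if (sum == null) sum = v;
            else for (int i = 0; i < v.length; i++) sum[i] += v[i];
        }
        return values.size() + "|" + Arrays.toString(sum); // format mismatch
    }

    public static void main(String[] args) {
        String once = combine(Arrays.asList("[1.0, 2.0]", "[3.0, 4.0]"));
        System.out.println("first pass: " + once);
        try {
            combine(Arrays.asList(once)); // combiner re-applied to its output
        } catch (NumberFormatException e) {
            System.out.println("second pass: NumberFormatException");
        }
    }
}
```

A combiner that emits exactly the format it consumes is safe no matter how many times the framework runs it, which is the contract Hadoop assumes.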
>>>
>>>
>>> I see some related things at http://hadoop.markmail.org/search/?q=Merge+of+the+inmemory+files+threw+an+exception , but those are older, it seems, so not sure what to make of them.
>>>
>>> Thanks,
>>> Grant
>>
>
> --------------------------
> Grant Ingersoll
> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
> http://www.lucenebootcamp.com
>
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>