Posted to common-dev@hadoop.apache.org by Yi-Kai Tsai <yi...@yahoo-inc.com> on 2008/08/20 06:24:02 UTC

Missing lib/native/Linux-amd64-64 on hadoop-0.17.2.tar.gz

hi

I found that lib/native/Linux-amd64-64 is missing from hadoop-0.17.2.tar.gz. Is that intentional?

thanks

-- 
Yi-Kai Tsai (cuma) <yi...@yahoo-inc.com>, Asia Regional Search Engineering.


Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Ashish Venugopal <ar...@andrew.cmu.edu>.
It's slightly counterintuitive, but I used to get errors like this when my
reducers ran out of memory. It turns out that if a reducer uses too much
memory and brings down a node, it can also kill the services that make map
data available to other reducers. I can't explain exactly why this
particular error happens, but I have found that the culprit is often
memory usage (normally in the reducer).
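If reducer memory is indeed the culprit, one conservative workaround is to cap the heap given to each task JVM in hadoop-site.xml. A minimal sketch; the 512 MB figure below is only an illustrative value, not a recommendation from this thread:

```xml
<!-- hadoop-site.xml: cap the heap of each map/reduce child JVM.
     -Xmx512m is an example value; size it to your nodes' memory. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```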
Ashish

On Thu, Aug 28, 2008 at 7:59 AM, Jason Venner <ja...@attributor.com> wrote:

> We have started to see this class of error under Hadoop 0.16.1 on a
> medium-sized HDFS cluster under moderate load.
>
> wangxu wrote:
> > Hi all,
> > I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
> > and running hadoop on one namenode and 4 slaves.
> > Attached is my hadoop-site.xml; I didn't change the file
> > hadoop-default.xml.
> >
> > When the data in the segments is large, this kind of error occurs:
> >
> > java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
> > 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
> > 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
> > 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
> > 	at java.io.DataInputStream.readFully(DataInputStream.java:178)
> > 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> > 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> > 	at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
> > 	at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
> > 	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
> > 	at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
> > 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
> > 	at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
> > 	at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
> > 	at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
> > 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
> > 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
> > 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
> > 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
> > 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> > 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> >
> >
> > how can I correct this?
> > thanks.
> > Xu
> >
> >
> --
> Jason Venner
> Attributor - Program the Web <http://www.attributor.com/>
> Attributor is hiring Hadoop Wranglers and coding wizards, contact if
> interested
>

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Jason Venner <ja...@attributor.com>.
We have started to see this class of error under Hadoop 0.16.1 on a
medium-sized HDFS cluster under moderate load.

wangxu wrote:
> Hi all,
> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
> and running hadoop on one namenode and 4 slaves.
> Attached is my hadoop-site.xml; I didn't change the file
> hadoop-default.xml.
>
> When the data in the segments is large, this kind of error occurs:
>
> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
> 	at java.io.DataInputStream.readFully(DataInputStream.java:178)
> 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> 	at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
> 	at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
> 	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
> 	at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
> 	at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
> 	at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
> 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
> 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>
>
> how can I correct this?
> thanks.
> Xu
>
>   
-- 
Jason Venner
Attributor - Program the Web <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested
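As a first diagnostic step for "Could not obtain block" errors, it can help to ask the namenode whether the file's blocks are actually present and replicated. A sketch, assuming a standard Hadoop installation of the 0.17/0.18 era and a machine that can reach the namenode:

```
# Check block health and replica locations for the affected path.
# Missing or under-replicated blocks point at datanode problems
# rather than at the reading job.
bin/hadoop fsck /user/root/crawl_debug -files -blocks -locations
```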

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Thanks Stefan.

What you are seeing is fixed in HADOOP-3232. It is different from the main
problems reported in this thread. Please try 0.18.1 and see how it works.

Raghu.

Stefan Will wrote:
> I'll add a comment to Jira. I haven't tried the latest version of the patch
> yet, but since it only changes the dfs client, not the datanode, I don't
> see how it would help with this.
> 
> Two more things I noticed when the datanodes become unresponsive
> (i.e. the "Last Contact" field on the namenode keeps increasing):
> 
> 1. The datanode process seems to be completely hung for a while, including
> its Jetty web interface, sometimes for over 10 minutes.
> 
> 2. The task tracker on the same machine keeps humming along, sending regular
> heartbeats.
> 
> To me this looks like there is some sort of temporary deadlock in the
> datanode that keeps it from responding to requests. Perhaps it's the block
> report being generated?
> 
> -- Stefan
> 

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Stefan Will <st...@gmx.net>.
I'll add a comment to Jira. I haven't tried the latest version of the patch
yet, but since it only changes the dfs client, not the datanode, I don't
see how it would help with this.

Two more things I noticed when the datanodes become unresponsive
(i.e. the "Last Contact" field on the namenode keeps increasing):

1. The datanode process seems to be completely hung for a while, including
its Jetty web interface, sometimes for over 10 minutes.

2. The task tracker on the same machine keeps humming along, sending regular
heartbeats.

To me this looks like there is some sort of temporary deadlock in the
datanode that keeps it from responding to requests. Perhaps it's the block
report being generated?

-- Stefan
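One way to test the deadlock theory above is to capture a thread dump from the datanode while it is unresponsive. A hedged sketch, assuming a Sun JDK with jps/jstack available on the datanode host:

```
# Find the DataNode's pid and dump its threads while it is hung.
# If jstack shows a thread stuck scanning the block directories
# (i.e. generating the block report) while others wait on a lock,
# that would support the deadlock theory.
DN_PID=$(jps | awk '/DataNode/ {print $1}')
jstack "$DN_PID" > /tmp/datanode-threads.txt
```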

> From: Raghu Angadi <ra...@yahoo-inc.com>
> Reply-To: <co...@hadoop.apache.org>
> Date: Tue, 09 Sep 2008 16:40:02 -0700
> To: <co...@hadoop.apache.org>
> Subject: Re: Could not obtain block: blk_-2634319951074439134_1129
> file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
> 
> Espen Amble Kolstad wrote:
>> There's a JIRA on this already:
>> https://issues.apache.org/jira/browse/HADOOP-3831
>> Setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml seems
>> to do the trick for now.
> 
> Please comment on HADOOP-3831 that you are seeing this error, so that
> the patch gets committed. Did you try the patch for HADOOP-3831?
> 
> thanks,
> Raghu.
> 
>> Espen
>> 
>> On Mon, Sep 8, 2008 at 11:24 AM, Espen Amble Kolstad <es...@trank.no> wrote:
>>> Hi,
>>> 
>>> Thanks for the tip!
>>> 
>>> I tried revision 692572 of the 0.18 branch, but I still get the same errors.
>>> 
>>> On Sunday 07 September 2008 09:42:43 Dhruba Borthakur wrote:
>>>> The DFS errors might have been caused by
>>>> 
>>>> http://issues.apache.org/jira/browse/HADOOP-4040
>>>> 
>>>> thanks,
>>>> dhruba
>>>> 
>>>> On Sat, Sep 6, 2008 at 6:59 AM, Devaraj Das <dd...@yahoo-inc.com> wrote:
>>>>> These exceptions are apparently coming from the dfs side of things. Could
>>>>> someone from the dfs side please look at these?
>>>>> 
>>>>> On 9/5/08 3:04 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> Thanks!
>>>>>> The patch applies without change to hadoop-0.18.0, and should be
>>>>>> included in a 0.18.1.
>>>>>> 
>>>>>> However, I'm still seeing:
>>>>>> in hadoop.log:
>>>>>> 2008-09-05 11:13:54,805 WARN  dfs.DFSClient - Exception while reading
>>>>>> from blk_3428404120239503595_2664 of
>>>>>> /user/trank/segments/20080905102650/crawl_generate/part-00010 from
>>>>>> somehost:50010: java.io.IOException: Premeture EOF from inputStream
>>>>>> 
>>>>>> in datanode.log:
>>>>>> 2008-09-05 11:15:09,554 WARN  dfs.DataNode -
>>>>>> DatanodeRegistration(somehost:50010,
>>>>>> storageID=DS-751763840-somehost-50010-1219931304453, infoPort=50075,
>>>>>> ipcPort=50020):Got exception while serving
>>>>>> blk_-4682098638573619471_2662 to
>>>>>> /somehost:
>>>>>> java.net.SocketTimeoutException: 480000 millis timeout while waiting
>>>>>> for channel to be ready for write. ch :
>>>>>> java.nio.channels.SocketChannel[connected local=/somehost:50010
>>>>>> remote=/somehost:45244]
>>>>>> 
>>>>>> These entries in datanode.log appear a few minutes apart, repeatedly.
>>>>>> I've reduced # map-tasks so load on this node is below 1.0 with 5GB of
>>>>>> free memory (so it's not resource starvation).
>>>>>> 
>>>>>> Espen
>>>>>> 
>>>>>> On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das <dd...@yahoo-inc.com> wrote:
>>>>>>>> I started a profile of the reduce-task. I've attached the profiling
>>>>>>>> output. It seems from the samples that ramManager.waitForDataToMerge()
>>>>>>>> doesn't actually wait.
>>>>>>>> Has anybody seen this behavior.
>>>>>>> This has been fixed in HADOOP-3940
>>>>>>> 
>>>>>>> On 9/4/08 6:36 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>>>>>>>> I have the same problem on our cluster.
>>>>>>>> 
>>>>>>>> It seems the reducer-tasks are using all cpu, long before there's
>>>>>>>> anything to
>>>>>>>> shuffle.
>>>>>>>> 
>>>>>>>> I started a profile of the reduce-task. I've attached the profiling
>>>>>>>> output. It seems from the samples that ramManager.waitForDataToMerge()
>>>>>>>> doesn't actually wait.
>>>>>>>> Has anybody seen this behavior.
>>>>>>>> 
>>>>>>>> Espen
>>>>>>>> 
>>>>>>>> On Thursday 28 August 2008 06:11:42 wangxu wrote:
>>>>>>>>> Hi,all
>>>>>>>>> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
>>>>>>>>> and running hadoop on one namenode and 4 slaves.
>>>>>>>>> attached is my hadoop-site.xml, and I didn't change the file
>>>>>>>>> hadoop-default.xml
>>>>>>>>> 
>>>>>>>>> When the data in the segments is large, this kind of error occurs:
>>>>>>>>> 
>>>>>>>>> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
>>>>>>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
>>>>>>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
>>>>>>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
>>>>>>>>> 	at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>>>>>>>> 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
>>>>>>>>> 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
>>>>>>>>> 	at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
>>>>>>>>> 	at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
>>>>>>>>> 	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
>>>>>>>>> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
>>>>>>>>> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
>>>>>>>>> 	at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
>>>>>>>>> 	at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
>>>>>>>>> 	at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
>>>>>>>>> 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
>>>>>>>>> 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
>>>>>>>>> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
>>>>>>>>> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>>>>>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>>>>>>>>> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> how can I correct this?
>>>>>>>>> thanks.
>>>>>>>>> Xu
>>> 



Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Espen Amble Kolstad wrote:
> There's a JIRA on this already:
> https://issues.apache.org/jira/browse/HADOOP-3831
> Setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml seems
> to do the trick for now.

Please comment on HADOOP-3831 that you are seeing this error, so that
the patch gets committed. Did you try the patch for HADOOP-3831?

thanks,
Raghu.

> Espen
> 
> On Mon, Sep 8, 2008 at 11:24 AM, Espen Amble Kolstad <es...@trank.no> wrote:
>> Hi,
>>
>> Thanks for the tip!
>>
>> I tried revision 692572 of the 0.18 branch, but I still get the same errors.
>>
>> On Sunday 07 September 2008 09:42:43 Dhruba Borthakur wrote:
>>> The DFS errors might have been caused by
>>>
>>> http://issues.apache.org/jira/browse/HADOOP-4040
>>>
>>> thanks,
>>> dhruba
>>>
>>> On Sat, Sep 6, 2008 at 6:59 AM, Devaraj Das <dd...@yahoo-inc.com> wrote:
>>>> These exceptions are apparently coming from the dfs side of things. Could
>>>> someone from the dfs side please look at these?
>>>>
>>>> On 9/5/08 3:04 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>>>>> Hi,
>>>>>
>>>>> Thanks!
>>>>> The patch applies without change to hadoop-0.18.0, and should be
>>>>> included in a 0.18.1.
>>>>>
>>>>> However, I'm still seeing:
>>>>> in hadoop.log:
>>>>> 2008-09-05 11:13:54,805 WARN  dfs.DFSClient - Exception while reading
>>>>> from blk_3428404120239503595_2664 of
>>>>> /user/trank/segments/20080905102650/crawl_generate/part-00010 from
>>>>> somehost:50010: java.io.IOException: Premeture EOF from inputStream
>>>>>
>>>>> in datanode.log:
>>>>> 2008-09-05 11:15:09,554 WARN  dfs.DataNode -
>>>>> DatanodeRegistration(somehost:50010,
>>>>> storageID=DS-751763840-somehost-50010-1219931304453, infoPort=50075,
>>>>> ipcPort=50020):Got exception while serving
>>>>> blk_-4682098638573619471_2662 to
>>>>> /somehost:
>>>>> java.net.SocketTimeoutException: 480000 millis timeout while waiting
>>>>> for channel to be ready for write. ch :
>>>>> java.nio.channels.SocketChannel[connected local=/somehost:50010
>>>>> remote=/somehost:45244]
>>>>>
>>>>> These entries in datanode.log appear a few minutes apart, repeatedly.
>>>>> I've reduced # map-tasks so load on this node is below 1.0 with 5GB of
>>>>> free memory (so it's not resource starvation).
>>>>>
>>>>> Espen
>>>>>
>>>>> On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das <dd...@yahoo-inc.com> wrote:
>>>>>>> I started a profile of the reduce-task. I've attached the profiling
>>>>>>> output. It seems from the samples that ramManager.waitForDataToMerge()
>>>>>>> doesn't actually wait.
>>>>>>> Has anybody seen this behavior.
>>>>>> This has been fixed in HADOOP-3940
>>>>>>
>>>>>> On 9/4/08 6:36 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>>>>>>> I have the same problem on our cluster.
>>>>>>>
>>>>>>> It seems the reducer-tasks are using all cpu, long before there's
>>>>>>> anything to
>>>>>>> shuffle.
>>>>>>>
>>>>>>> I started a profile of the reduce-task. I've attached the profiling
>>>>>>> output. It seems from the samples that ramManager.waitForDataToMerge()
>>>>>>> doesn't actually wait.
>>>>>>> Has anybody seen this behavior.
>>>>>>>
>>>>>>> Espen
>>>>>>>
>>>>>>> On Thursday 28 August 2008 06:11:42 wangxu wrote:
>>>>>>>> Hi,all
>>>>>>>> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
>>>>>>>> and running hadoop on one namenode and 4 slaves.
>>>>>>>> attached is my hadoop-site.xml, and I didn't change the file
>>>>>>>> hadoop-default.xml
>>>>>>>>
>>>>>>>> When the data in the segments is large, this kind of error occurs:
>>>>>>>>
>>>>>>>> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
>>>>>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
>>>>>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
>>>>>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
>>>>>>>> 	at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>>>>>>> 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
>>>>>>>> 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
>>>>>>>> 	at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
>>>>>>>> 	at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
>>>>>>>> 	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
>>>>>>>> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
>>>>>>>> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
>>>>>>>> 	at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
>>>>>>>> 	at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
>>>>>>>> 	at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
>>>>>>>> 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
>>>>>>>> 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
>>>>>>>> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
>>>>>>>> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>>>>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>>>>>>>> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>>>>>>>>
>>>>>>>>
>>>>>>>> how can I correct this?
>>>>>>>> thanks.
>>>>>>>> Xu
>>


Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Stefan Will <st...@gmx.net>.
I'm not sure whether this is the same issue or not, but on my 4-slave
cluster, setting the parameter below doesn't seem to fix the issue.

What I'm seeing is that occasionally data nodes stop responding for up to 10
minutes at a time. When this happens, the TaskTrackers will mark the nodes as
dead, and occasionally the namenode will mark them as dead as well (you can
see the "Last Contact" time steadily increasing for a random node every half
hour or so).

This seems to be happening during times of high disk utilization.

-- Stefan



> From: Espen Amble Kolstad <es...@trank.no>
> Reply-To: <co...@hadoop.apache.org>
> Date: Mon, 8 Sep 2008 12:40:01 +0200
> To: <co...@hadoop.apache.org>
> Subject: Re: Could not obtain block: blk_-2634319951074439134_1129
> file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
> 
> There's a JIRA on this already:
> https://issues.apache.org/jira/browse/HADOOP-3831
> Setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml seems
> to do the trick for now.
> 
> Espen
> 
> On Mon, Sep 8, 2008 at 11:24 AM, Espen Amble Kolstad <es...@trank.no> wrote:
>> Hi,
>> 
>> Thanks for the tip!
>> 
>> I tried revision 692572 of the 0.18 branch, but I still get the same errors.
>> 
>> On Sunday 07 September 2008 09:42:43 Dhruba Borthakur wrote:
>>> The DFS errors might have been caused by
>>> 
>>> http://issues.apache.org/jira/browse/HADOOP-4040
>>> 
>>> thanks,
>>> dhruba
>>> 
>>> On Sat, Sep 6, 2008 at 6:59 AM, Devaraj Das <dd...@yahoo-inc.com> wrote:
>>>> These exceptions are apparently coming from the dfs side of things. Could
>>>> someone from the dfs side please look at these?
>>>> 
>>>> On 9/5/08 3:04 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>>>>> Hi,
>>>>> 
>>>>> Thanks!
>>>>> The patch applies without change to hadoop-0.18.0, and should be
>>>>> included in a 0.18.1.
>>>>> 
>>>>> However, I'm still seeing:
>>>>> in hadoop.log:
>>>>> 2008-09-05 11:13:54,805 WARN  dfs.DFSClient - Exception while reading
>>>>> from blk_3428404120239503595_2664 of
>>>>> /user/trank/segments/20080905102650/crawl_generate/part-00010 from
>>>>> somehost:50010: java.io.IOException: Premeture EOF from inputStream
>>>>> 
>>>>> in datanode.log:
>>>>> 2008-09-05 11:15:09,554 WARN  dfs.DataNode -
>>>>> DatanodeRegistration(somehost:50010,
>>>>> storageID=DS-751763840-somehost-50010-1219931304453, infoPort=50075,
>>>>> ipcPort=50020):Got exception while serving
>>>>> blk_-4682098638573619471_2662 to
>>>>> /somehost:
>>>>> java.net.SocketTimeoutException: 480000 millis timeout while waiting
>>>>> for channel to be ready for write. ch :
>>>>> java.nio.channels.SocketChannel[connected local=/somehost:50010
>>>>> remote=/somehost:45244]
>>>>> 
>>>>> These entries in datanode.log appear a few minutes apart, repeatedly.
>>>>> I've reduced # map-tasks so load on this node is below 1.0 with 5GB of
>>>>> free memory (so it's not resource starvation).
>>>>> 
>>>>> Espen
>>>>> 
>>>>> On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das <dd...@yahoo-inc.com> wrote:
>>>>>>> I started a profile of the reduce-task. I've attached the profiling
>>>>>>> output. It seems from the samples that ramManager.waitForDataToMerge()
>>>>>>> doesn't actually wait.
>>>>>>> Has anybody seen this behavior.
>>>>>> 
>>>>>> This has been fixed in HADOOP-3940
>>>>>> 
>>>>>> On 9/4/08 6:36 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>>>>>>> I have the same problem on our cluster.
>>>>>>> 
>>>>>>> It seems the reducer-tasks are using all cpu, long before there's
>>>>>>> anything to
>>>>>>> shuffle.
>>>>>>> 
>>>>>>> I started a profile of the reduce-task. I've attached the profiling
>>>>>>> output. It seems from the samples that ramManager.waitForDataToMerge()
>>>>>>> doesn't actually wait.
>>>>>>> Has anybody seen this behavior.
>>>>>>> 
>>>>>>> Espen
>>>>>>> 
>>>>>>> On Thursday 28 August 2008 06:11:42 wangxu wrote:
>>>>>>>> Hi,all
>>>>>>>> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
>>>>>>>> and running hadoop on one namenode and 4 slaves.
>>>>>>>> attached is my hadoop-site.xml, and I didn't change the file
>>>>>>>> hadoop-default.xml
>>>>>>>> 
>>>>>>>> When the data in the segments is large, this kind of error occurs:
>>>>>>>> 
>>>>>>>> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
>>>>>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
>>>>>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
>>>>>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
>>>>>>>> 	at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>>>>>>> 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
>>>>>>>> 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
>>>>>>>> 	at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
>>>>>>>> 	at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
>>>>>>>> 	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
>>>>>>>> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
>>>>>>>> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
>>>>>>>> 	at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
>>>>>>>> 	at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
>>>>>>>> 	at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
>>>>>>>> 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
>>>>>>>> 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
>>>>>>>> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
>>>>>>>> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>>>>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>>>>>>>> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> how can I correct this?
>>>>>>>> thanks.
>>>>>>>> Xu
>> 
>> 



Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Espen Amble Kolstad <es...@trank.no>.
There's a JIRA on this already:
https://issues.apache.org/jira/browse/HADOOP-3831
Setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml seems
to do the trick for now.
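In hadoop-site.xml the workaround looks like this (0 disables the datanode's socket write timeout; treat it as a stopgap until HADOOP-3831 is committed):

```xml
<!-- hadoop-site.xml: disable the datanode's socket write timeout
     as a temporary workaround for HADOOP-3831. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
```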

Espen

On Mon, Sep 8, 2008 at 11:24 AM, Espen Amble Kolstad <es...@trank.no> wrote:
> Hi,
>
> Thanks for the tip!
>
> I tried revision 692572 of the 0.18 branch, but I still get the same errors.
>
> On Sunday 07 September 2008 09:42:43 Dhruba Borthakur wrote:
>> The DFS errors might have been caused by
>>
>> http://issues.apache.org/jira/browse/HADOOP-4040
>>
>> thanks,
>> dhruba
>>
>> On Sat, Sep 6, 2008 at 6:59 AM, Devaraj Das <dd...@yahoo-inc.com> wrote:
>> > These exceptions are apparently coming from the dfs side of things. Could
>> > someone from the dfs side please look at these?
>> >
>> > On 9/5/08 3:04 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>> >> Hi,
>> >>
>> >> Thanks!
>> >> The patch applies without change to hadoop-0.18.0, and should be
>> >> included in a 0.18.1.
>> >>
>> >> However, I'm still seeing:
>> >> in hadoop.log:
>> >> 2008-09-05 11:13:54,805 WARN  dfs.DFSClient - Exception while reading
>> >> from blk_3428404120239503595_2664 of
>> >> /user/trank/segments/20080905102650/crawl_generate/part-00010 from
>> >> somehost:50010: java.io.IOException: Premeture EOF from inputStream
>> >>
>> >> in datanode.log:
>> >> 2008-09-05 11:15:09,554 WARN  dfs.DataNode -
>> >> DatanodeRegistration(somehost:50010,
>> >> storageID=DS-751763840-somehost-50010-1219931304453, infoPort=50075,
>> >> ipcPort=50020):Got exception while serving
>> >> blk_-4682098638573619471_2662 to
>> >> /somehost:
>> >> java.net.SocketTimeoutException: 480000 millis timeout while waiting
>> >> for channel to be ready for write. ch :
>> >> java.nio.channels.SocketChannel[connected local=/somehost:50010
>> >> remote=/somehost:45244]
>> >>
>> >> These entries in datanode.log appear a few minutes apart, repeatedly.
>> >> I've reduced # map-tasks so load on this node is below 1.0 with 5GB of
>> >> free memory (so it's not resource starvation).
>> >>
>> >> Espen
>> >>
>> >> On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das <dd...@yahoo-inc.com> wrote:
>> >>>> I started a profile of the reduce-task. I've attached the profiling
>> >>>> output. It seems from the samples that ramManager.waitForDataToMerge()
>> >>>> doesn't actually wait.
>> >>>> Has anybody seen this behavior.
>> >>>
>> >>> This has been fixed in HADOOP-3940
>> >>>
>> >>> On 9/4/08 6:36 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>> >>>> I have the same problem on our cluster.
>> >>>>
>> >>>> It seems the reducer-tasks are using all cpu, long before there's
>> >>>> anything to
>> >>>> shuffle.
>> >>>>
>> >>>> I started a profile of the reduce-task. I've attached the profiling
>> >>>> output. It seems from the samples that ramManager.waitForDataToMerge()
>> >>>> doesn't actually wait.
>> >>>> Has anybody seen this behavior.
>> >>>>
>> >>>> Espen
>> >>>>
>> >>>> On Thursday 28 August 2008 06:11:42 wangxu wrote:
>> >>>>> Hi,all
>> >>>>> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
>> >>>>> and running hadoop on one namenode and 4 slaves.
>> >>>>> attached is my hadoop-site.xml, and I didn't change the file
>> >>>>> hadoop-default.xml
>> >>>>>
>> >>>>> When the data in the segments is large, this kind of error occurs:
>> >>>>>
>> >>>>> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
>> >>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
>> >>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
>> >>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
>> >>>>> 	at java.io.DataInputStream.readFully(DataInputStream.java:178)
>> >>>>> 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
>> >>>>> 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
>> >>>>> 	at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
>> >>>>> 	at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
>> >>>>> 	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
>> >>>>> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
>> >>>>> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
>> >>>>> 	at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
>> >>>>> 	at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
>> >>>>> 	at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
>> >>>>> 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
>> >>>>> 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
>> >>>>> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
>> >>>>> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>> >>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>> >>>>> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>> >>>>>
>> >>>>>
>> >>>>> how can I correct this?
>> >>>>> thanks.
>> >>>>> Xu
>
>

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Espen Amble Kolstad <es...@trank.no>.
Hi,

Thanks for the tip!

I tried revision 692572 of the 0.18 branch, but I still get the same errors.
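For anyone reproducing this, a hedged sketch of how a 0.18-branch build at that revision could be produced (the repository URL and ant target are assumptions based on the 2008-era project layout and should be verified):

```
# Hypothetical: check out and build the 0.18 branch at revision 692572.
svn co -r 692572 \
  http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 hadoop-0.18
cd hadoop-0.18
ant jar    # produces build/hadoop-*-core.jar
```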

On Sunday 07 September 2008 09:42:43 Dhruba Borthakur wrote:
> The DFS errors might have been caused by
>
> http://issues.apache.org/jira/browse/HADOOP-4040
>
> thanks,
> dhruba
>
> On Sat, Sep 6, 2008 at 6:59 AM, Devaraj Das <dd...@yahoo-inc.com> wrote:
> > These exceptions are apparently coming from the dfs side of things. Could
> > someone from the dfs side please look at these?
> >
> > On 9/5/08 3:04 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
> >> Hi,
> >>
> >> Thanks!
> >> The patch applies without change to hadoop-0.18.0, and should be
> >> included in a 0.18.1.
> >>
> >> However, I'm still seeing:
> >> in hadoop.log:
> >> 2008-09-05 11:13:54,805 WARN  dfs.DFSClient - Exception while reading
> >> from blk_3428404120239503595_2664 of
> >> /user/trank/segments/20080905102650/crawl_generate/part-00010 from
> >> somehost:50010: java.io.IOException: Premeture EOF from inputStream
> >>
> >> in datanode.log:
> >> 2008-09-05 11:15:09,554 WARN  dfs.DataNode -
> >> DatanodeRegistration(somehost:50010,
> >> storageID=DS-751763840-somehost-50010-1219931304453, infoPort=50075,
> >> ipcPort=50020):Got exception while serving
> >> blk_-4682098638573619471_2662 to
> >> /somehost:
> >> java.net.SocketTimeoutException: 480000 millis timeout while waiting
> >> for channel to be ready for write. ch :
> >> java.nio.channels.SocketChannel[connected local=/somehost:50010
> >> remote=/somehost:45244]
> >>
> >> These entries in datanode.log happens a few minutes apart repeatedly.
> >> I've reduced # map-tasks so load on this node is below 1.0 with 5GB of
> >> free memory (so it's not resource starvation).
> >>
> >> Espen
> >>
> >> On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das <dd...@yahoo-inc.com> wrote:
> >>>> I started a profile of the reduce-task. I've attached the profiling
> >>>> output. It seems from the samples that ramManager.waitForDataToMerge()
> >>>> doesn't actually wait.
> >>>> Has anybody seen this behavior.
> >>>
> >>> This has been fixed in HADOOP-3940
> >>>
> >>> On 9/4/08 6:36 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
> >>>> I have the same problem on our cluster.
> >>>>
> >>>> It seems the reducer-tasks are using all cpu, long before there's
> >>>> anything to
> >>>> shuffle.
> >>>>
> >>>> I started a profile of the reduce-task. I've attached the profiling
> >>>> output. It seems from the samples that ramManager.waitForDataToMerge()
> >>>> doesn't actually wait.
> >>>> Has anybody seen this behavior.
> >>>>
> >>>> Espen
> >>>>
> >>>> On Thursday 28 August 2008 06:11:42 wangxu wrote:
> >>>>> Hi,all
> >>>>> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
> >>>>> and running hadoop on one namenode and 4 slaves.
> >>>>> attached is my hadoop-site.xml, and I didn't change the file
> >>>>> hadoop-default.xml
> >>>>>
> >>>>> when data in segments are large,this kind of errors occure:
> >>>>>
> >>>>> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
> >>>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
> >>>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
> >>>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
> >>>>> at java.io.DataInputStream.readFully(DataInputStream.java:178)
> >>>>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> >>>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> >>>>> at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
> >>>>> at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
> >>>>> at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
> >>>>> at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
> >>>>> at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
> >>>>> at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
> >>>>> at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
> >>>>> at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
> >>>>> at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
> >>>>> at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
> >>>>> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
> >>>>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
> >>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> >>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> >>>>>
> >>>>>
> >>>>> how can I correct this?
> >>>>> thanks.
> >>>>> Xu


Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Dhruba Borthakur <dh...@gmail.com>.
The DFS errors might have been caused by

http://issues.apache.org/jira/browse/HADOOP-4040

thanks,
dhruba

On Sat, Sep 6, 2008 at 6:59 AM, Devaraj Das <dd...@yahoo-inc.com> wrote:
> These exceptions are apparently coming from the dfs side of things. Could
> someone from the dfs side please look at these?
>
>
> On 9/5/08 3:04 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>
>> Hi,
>>
>> Thanks!
>> The patch applies without change to hadoop-0.18.0, and should be
>> included in a 0.18.1.
>>
>> However, I'm still seeing:
>> in hadoop.log:
>> 2008-09-05 11:13:54,805 WARN  dfs.DFSClient - Exception while reading
>> from blk_3428404120239503595_2664 of
>> /user/trank/segments/20080905102650/crawl_generate/part-00010 from
>> somehost:50010: java.io.IOException: Premeture EOF from inputStream
>>
>> in datanode.log:
>> 2008-09-05 11:15:09,554 WARN  dfs.DataNode -
>> DatanodeRegistration(somehost:50010,
>> storageID=DS-751763840-somehost-50010-1219931304453, infoPort=50075,
>> ipcPort=50020):Got exception while serving
>> blk_-4682098638573619471_2662 to
>> /somehost:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting
>> for channel to be ready for write. ch :
>> java.nio.channels.SocketChannel[connected local=/somehost:50010
>> remote=/somehost:45244]
>>
>> These entries in datanode.log happens a few minutes apart repeatedly.
>> I've reduced # map-tasks so load on this node is below 1.0 with 5GB of
>> free memory (so it's not resource starvation).
>>
>> Espen
>>
>> On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das <dd...@yahoo-inc.com> wrote:
>>>> I started a profile of the reduce-task. I've attached the profiling output.
>>>> It seems from the samples that ramManager.waitForDataToMerge() doesn't
>>>> actually wait.
>>>> Has anybody seen this behavior.
>>>
>>> This has been fixed in HADOOP-3940
>>>
>>>
>>> On 9/4/08 6:36 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>>>
>>>> I have the same problem on our cluster.
>>>>
>>>> It seems the reducer-tasks are using all cpu, long before there's anything
>>>> to
>>>> shuffle.
>>>>
>>>> I started a profile of the reduce-task. I've attached the profiling output.
>>>> It seems from the samples that ramManager.waitForDataToMerge() doesn't
>>>> actually wait.
>>>> Has anybody seen this behavior.
>>>>
>>>> Espen
>>>>
>>>> On Thursday 28 August 2008 06:11:42 wangxu wrote:
>>>>> Hi,all
>>>>> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
>>>>> and running hadoop on one namenode and 4 slaves.
>>>>> attached is my hadoop-site.xml, and I didn't change the file
>>>>> hadoop-default.xml
>>>>>
>>>>> when data in segments are large,this kind of errors occure:
>>>>>
> >>>>> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
> >>>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
> >>>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
> >>>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
> >>>>> at java.io.DataInputStream.readFully(DataInputStream.java:178)
> >>>>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> >>>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> >>>>> at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
> >>>>> at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
> >>>>> at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
> >>>>> at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
> >>>>> at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
> >>>>> at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
> >>>>> at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
> >>>>> at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
> >>>>> at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
> >>>>> at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
> >>>>> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
> >>>>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
> >>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> >>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>>>>>
>>>>>
>>>>> how can I correct this?
>>>>> thanks.
>>>>> Xu
>>>>
>>>
>>>
>>>
>
>
>

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Devaraj Das <dd...@yahoo-inc.com>.
These exceptions are apparently coming from the dfs side of things. Could
someone from the dfs side please look at these?


On 9/5/08 3:04 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:

> Hi,
> 
> Thanks!
> The patch applies without change to hadoop-0.18.0, and should be
> included in a 0.18.1.
> 
> However, I'm still seeing:
> in hadoop.log:
> 2008-09-05 11:13:54,805 WARN  dfs.DFSClient - Exception while reading
> from blk_3428404120239503595_2664 of
> /user/trank/segments/20080905102650/crawl_generate/part-00010 from
> somehost:50010: java.io.IOException: Premeture EOF from inputStream
> 
> in datanode.log:
> 2008-09-05 11:15:09,554 WARN  dfs.DataNode -
> DatanodeRegistration(somehost:50010,
> storageID=DS-751763840-somehost-50010-1219931304453, infoPort=50075,
> ipcPort=50020):Got exception while serving
> blk_-4682098638573619471_2662 to
> /somehost:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting
> for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/somehost:50010
> remote=/somehost:45244]
> 
> These entries in datanode.log happens a few minutes apart repeatedly.
> I've reduced # map-tasks so load on this node is below 1.0 with 5GB of
> free memory (so it's not resource starvation).
> 
> Espen
> 
> On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das <dd...@yahoo-inc.com> wrote:
>>> I started a profile of the reduce-task. I've attached the profiling output.
>>> It seems from the samples that ramManager.waitForDataToMerge() doesn't
>>> actually wait.
>>> Has anybody seen this behavior.
>> 
>> This has been fixed in HADOOP-3940
>> 
>> 
>> On 9/4/08 6:36 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>> 
>>> I have the same problem on our cluster.
>>> 
>>> It seems the reducer-tasks are using all cpu, long before there's anything
>>> to
>>> shuffle.
>>> 
>>> I started a profile of the reduce-task. I've attached the profiling output.
>>> It seems from the samples that ramManager.waitForDataToMerge() doesn't
>>> actually wait.
>>> Has anybody seen this behavior.
>>> 
>>> Espen
>>> 
>>> On Thursday 28 August 2008 06:11:42 wangxu wrote:
>>>> Hi,all
>>>> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
>>>> and running hadoop on one namenode and 4 slaves.
>>>> attached is my hadoop-site.xml, and I didn't change the file
>>>> hadoop-default.xml
>>>> 
>>>> when data in segments are large,this kind of errors occure:
>>>> 
>>>> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
>>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
>>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
>>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
>>>> at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
>>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
>>>> at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
>>>> at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
>>>> at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
>>>> at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
>>>> at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
>>>> at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
>>>> at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
>>>> at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
>>>> at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
>>>> at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
>>>> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
>>>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>>>> 
>>>> 
>>>> how can I correct this?
>>>> thanks.
>>>> Xu
>>> 
>> 
>> 
>> 



Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Espen Amble Kolstad <es...@trank.no>.
Hi,

Thanks!
The patch applies without change to hadoop-0.18.0, and should be
included in a 0.18.1.

However, I'm still seeing:
in hadoop.log:
2008-09-05 11:13:54,805 WARN  dfs.DFSClient - Exception while reading
from blk_3428404120239503595_2664 of
/user/trank/segments/20080905102650/crawl_generate/part-00010 from
somehost:50010: java.io.IOException: Premeture EOF from inputStream

in datanode.log:
2008-09-05 11:15:09,554 WARN  dfs.DataNode -
DatanodeRegistration(somehost:50010,
storageID=DS-751763840-somehost-50010-1219931304453, infoPort=50075,
ipcPort=50020):Got exception while serving
blk_-4682098638573619471_2662 to
/somehost:
java.net.SocketTimeoutException: 480000 millis timeout while waiting
for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/somehost:50010
remote=/somehost:45244]

These entries in datanode.log appear a few minutes apart, repeatedly.
I've reduced # map-tasks so load on this node is below 1.0 with 5GB of
free memory (so it's not resource starvation).

Espen
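
The 480000 ms in the SocketTimeoutException above is the datanode's default write timeout. One workaround some clusters use is to raise that timeout (or disable it with 0) in hadoop-site.xml. This is a hedged sketch: the property name below is an assumption based on 0.18-era configuration and should be checked against your hadoop-default.xml:

```xml
<!-- Hypothetical hadoop-site.xml fragment: raise the datanode write timeout
     from the assumed default of 480000 ms (8 min); 0 disables the timeout. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>1200000</value>
</property>
```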

On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das <dd...@yahoo-inc.com> wrote:
>> I started a profile of the reduce-task. I've attached the profiling output.
>> It seems from the samples that ramManager.waitForDataToMerge() doesn't
>> actually wait.
>> Has anybody seen this behavior.
>
> This has been fixed in HADOOP-3940
>
>
> On 9/4/08 6:36 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>
>> I have the same problem on our cluster.
>>
>> It seems the reducer-tasks are using all cpu, long before there's anything to
>> shuffle.
>>
>> I started a profile of the reduce-task. I've attached the profiling output.
>> It seems from the samples that ramManager.waitForDataToMerge() doesn't
>> actually wait.
>> Has anybody seen this behavior.
>>
>> Espen
>>
>> On Thursday 28 August 2008 06:11:42 wangxu wrote:
>>> Hi,all
>>> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
>>> and running hadoop on one namenode and 4 slaves.
>>> attached is my hadoop-site.xml, and I didn't change the file
>>> hadoop-default.xml
>>>
>>> when data in segments are large,this kind of errors occure:
>>>
>>> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
>>> at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
>>> at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
>>> at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
>>> at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
>>> at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
>>> at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
>>> at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
>>> at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
>>> at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
>>> at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
>>> at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
>>> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
>>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>>>
>>>
>>> how can I correct this?
>>> thanks.
>>> Xu
>>
>
>
>

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Chris Douglas <ch...@yahoo-inc.com>.
FWIW: HADOOP-3940 is merged into the 0.18 branch and should be part of  
0.18.1. -C

On Sep 4, 2008, at 6:33 AM, Devaraj Das wrote:

>> I started a profile of the reduce-task. I've attached the profiling  
>> output.
>> It seems from the samples that ramManager.waitForDataToMerge()  
>> doesn't
>> actually wait.
>> Has anybody seen this behavior.
>
> This has been fixed in HADOOP-3940
>
>
> On 9/4/08 6:36 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:
>
>> I have the same problem on our cluster.
>>
>> It seems the reducer-tasks are using all cpu, long before there's  
>> anything to
>> shuffle.
>>
>> I started a profile of the reduce-task. I've attached the profiling  
>> output.
>> It seems from the samples that ramManager.waitForDataToMerge()  
>> doesn't
>> actually wait.
>> Has anybody seen this behavior.
>>
>> Espen
>>
>> On Thursday 28 August 2008 06:11:42 wangxu wrote:
>>> Hi,all
>>> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
>>> and running hadoop on one namenode and 4 slaves.
>>> attached is my hadoop-site.xml, and I didn't change the file
>>> hadoop-default.xml
>>>
>>> when data in segments are large,this kind of errors occure:
>>>
>>> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
>>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
>>> at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
>>> at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
>>> at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
>>> at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
>>> at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
>>> at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
>>> at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
>>> at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
>>> at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
>>> at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
>>> at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
>>> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
>>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>>>
>>>
>>> how can I correct this?
>>> thanks.
>>> Xu
>>
>
>


Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Devaraj Das <dd...@yahoo-inc.com>.
> I started a profile of the reduce-task. I've attached the profiling output.
> It seems from the samples that ramManager.waitForDataToMerge() doesn't
> actually wait.
> Has anybody seen this behavior.

This has been fixed in HADOOP-3940


On 9/4/08 6:36 PM, "Espen Amble Kolstad" <es...@trank.no> wrote:

> I have the same problem on our cluster.
> 
> It seems the reducer-tasks are using all cpu, long before there's anything to
> shuffle.
> 
> I started a profile of the reduce-task. I've attached the profiling output.
> It seems from the samples that ramManager.waitForDataToMerge() doesn't
> actually wait.
> Has anybody seen this behavior.
> 
> Espen
> 
> On Thursday 28 August 2008 06:11:42 wangxu wrote:
>> Hi,all
>> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
>> and running hadoop on one namenode and 4 slaves.
>> attached is my hadoop-site.xml, and I didn't change the file
>> hadoop-default.xml
>> 
>> when data in segments are large,this kind of errors occure:
>> 
>> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
>> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
>> at java.io.DataInputStream.readFully(DataInputStream.java:178)
>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
>> at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
>> at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
>> at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
>> at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
>> at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
>> at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
>> at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
>> at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
>> at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
>> at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
>> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>> 
>> 
>> how can I correct this?
>> thanks.
>> Xu
> 



Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by Espen Amble Kolstad <es...@trank.no>.
I have the same problem on our cluster.

It seems the reducer-tasks are using all cpu, long before there's anything to 
shuffle.

I started a profile of the reduce-task. I've attached the profiling output.
It seems from the samples that ramManager.waitForDataToMerge() doesn't 
actually wait.
Has anybody seen this behavior.

Espen
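
The busy-spin symptom described above (a wait method that consumes CPU long before there is anything to merge) is the classic failure mode of a guarded wait written without actually blocking on the monitor. The following is a hypothetical sketch, not the actual Hadoop 0.18 ramManager code; the class name and threshold are invented for illustration:

```java
// Hypothetical sketch of a correct guarded wait (NOT the Hadoop 0.18 source).
// A merge thread should block in wait() until enough in-memory segments
// accumulate; a loop that re-checks the condition without blocking spins at
// 100% CPU, which matches the profiling samples reported above.
class RamManagerSketch {
    private int pendingSegments = 0;           // segments waiting to be merged
    private static final int MERGE_THRESHOLD = 4;

    // Blocks until MERGE_THRESHOLD segments are available, then claims them.
    public synchronized void waitForDataToMerge() {
        while (pendingSegments < MERGE_THRESHOLD) {
            try {
                wait();                        // releases the monitor; no busy spin
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        pendingSegments -= MERGE_THRESHOLD;
    }

    // Called by the shuffle as each map output lands in memory.
    public synchronized void segmentArrived() {
        pendingSegments++;
        notifyAll();                           // wake the merge thread to re-check
    }

    public synchronized int pending() {
        return pendingSegments;
    }
}
```

The essential points are the `while` loop (to tolerate spurious wakeups) and the `wait()` call inside it, which parks the thread instead of spinning.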

On Thursday 28 August 2008 06:11:42 wangxu wrote:
> Hi,all
> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
> and running hadoop on one namenode and 4 slaves.
> attached is my hadoop-site.xml, and I didn't change the file
> hadoop-default.xml
>
> when data in segments are large,this kind of errors occure:
>
> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
> 	at java.io.DataInputStream.readFully(DataInputStream.java:178)
> 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> 	at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
> 	at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
> 	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
> 	at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
> 	at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
> 	at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
> 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
> 	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>
>
> how can I correct this?
> thanks.
> Xu


Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by wangxu <hb...@gmail.com>.
Hi,all
I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
and running hadoop on one namenode and 4 slaves.
attached is my hadoop-site.xml, and I didn't change the file
hadoop-default.xml

When data in segments is large, this kind of error occurs:

java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
	at java.io.DataInputStream.readFully(DataInputStream.java:178)
	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
	at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
	at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
	at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
	at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
	at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)


how can I correct this?
thanks.
Xu


Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

Posted by wangxu <hb...@gmail.com>.
Hi, all,
I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
running Hadoop on one namenode and 4 slaves.
Attached is my hadoop-site.xml; I didn't change the file
hadoop-default.xml.

When the data in the segments is large, this kind of error occurs:

java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
	at java.io.DataInputStream.readFully(DataInputStream.java:178)
	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
	at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
	at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
	at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
	at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
	at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
	at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)


How can I correct this?
Thanks.
Xu
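
For reference, a workaround often suggested for "Could not obtain block" errors on small 0.18-era clusters was to raise the datanode transceiver limit and the child task heap in hadoop-site.xml. This is a hedged sketch, not a confirmed fix for this thread's cluster: the property names are real 0.18-era settings, but the values are illustrative assumptions you would tune for your own nodes.

```xml
<!-- Sketch of hadoop-site.xml additions; values are illustrative. -->
<property>
  <!-- Raise the per-datanode cap on concurrent block transceivers; the
       low default was a common cause of "Could not obtain block" under
       load. Note the property name really is spelled "xcievers". -->
  <name>dfs.datanode.max.xcievers</name>
  <value>1024</value>
</property>
<property>
  <!-- Give child task JVMs more heap, in case reducer memory pressure
       (as suggested earlier in the thread) is taking datanodes down. -->
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```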


Re: Missing lib/native/Linux-amd64-64 on hadoop-0.17.2.tar.gz

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Aug 20, 2008, at 3:39 AM, Yi-Kai Tsai wrote:

> hi
>
> Could anyone help to re-pack the 0.17.2 with missing  lib/native/ 
> Linux-amd64-64  ?

Once the release is official, we can't change the bytes in the  
tarball. We'd need to make a 17.3.

-- Owen

Re: Missing lib/native/Linux-amd64-64 on hadoop-0.17.2.tar.gz

Posted by Yi-Kai Tsai <yi...@yahoo-inc.com>.
hi

Could anyone help re-pack 0.17.2 with the missing
lib/native/Linux-amd64-64?

thanks
> On Wed, Aug 20, 2008 at 9:31 AM, Yi-Kai Tsai <yi...@yahoo-inc.com> wrote:
>
>   
>> But we do have  lib/native/Linux-amd64-64 on  hadoop-0.17.1.tar.gz and
>> hadoop-0.18.0.tar.gz ?
>>     
>
>
> At least for -0.17.1, yes there is.
>
> Regards,
>
> Leon Mergen
>   


-- 
Yi-Kai Tsai (cuma) <yi...@yahoo-inc.com>, Asia Regional Search Engineering.


Re: Missing lib/native/Linux-amd64-64 on hadoop-0.17.2.tar.gz

Posted by Leon Mergen <le...@solatis.com>.
On Wed, Aug 20, 2008 at 9:31 AM, Yi-Kai Tsai <yi...@yahoo-inc.com> wrote:

> But we do have  lib/native/Linux-amd64-64 on  hadoop-0.17.1.tar.gz and
> hadoop-0.18.0.tar.gz ?


At least for -0.17.1, yes there is.

Regards,

Leon Mergen


Re: Missing lib/native/Linux-amd64-64 on hadoop-0.17.2.tar.gz

Posted by Yi-Kai Tsai <yi...@yahoo-inc.com>.
hi

But we do have lib/native/Linux-amd64-64 in hadoop-0.17.1.tar.gz and
hadoop-0.18.0.tar.gz?

> ya, looks like Owen never built the 64bit native library.  It's an
> optional build step:
> wiki.apache.org/hadoop/HowToRelease
>
> Nige
>
> On Aug 19, 2008, at 9:24 PM, Yi-Kai Tsai wrote:
>
>   
>> hi
>>
>> I found we miss lib/native/Linux-amd64-64 on hadoop-0.17.2.tar.gz ?
>>
>> thanks
>>
>> --
>> Yi-Kai Tsai (cuma) <yi...@yahoo-inc.com>, Asia Regional Search
>> Engineering.
>>
>>     
>
>   


-- 
Yi-Kai Tsai (cuma) <yi...@yahoo-inc.com>, Asia Regional Search Engineering.


Re: Missing lib/native/Linux-amd64-64 on hadoop-0.17.2.tar.gz

Posted by Nigel Daley <nd...@yahoo-inc.com>.
Ya, looks like Owen never built the 64-bit native library. It's an
optional build step:
wiki.apache.org/hadoop/HowToRelease

Nige

On Aug 19, 2008, at 9:24 PM, Yi-Kai Tsai wrote:

> hi
>
> I found we miss lib/native/Linux-amd64-64 on hadoop-0.17.2.tar.gz ?
>
> thanks
>
> -- 
> Yi-Kai Tsai (cuma) <yi...@yahoo-inc.com>, Asia Regional Search  
> Engineering.
>
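
Until a re-packed release exists, you can check whether a given tarball actually shipped the 64-bit native directory before deploying it. A minimal sketch: the dummy archive built at the top only exists so the check is runnable as-is; with a real download you would skip that part and point `tarball` at the file.

```shell
# Build a stand-in archive so the check below can run anywhere; with a
# real release, skip this block and set tarball to the downloaded file.
workdir=$(mktemp -d)
mkdir -p "$workdir/hadoop-0.17.2/lib/native/Linux-amd64-64"
touch "$workdir/hadoop-0.17.2/lib/native/Linux-amd64-64/libhadoop.so"
tar -czf "$workdir/hadoop-0.17.2.tar.gz" -C "$workdir" hadoop-0.17.2

tarball="$workdir/hadoop-0.17.2.tar.gz"

# The actual check: list the archive and look for the 64-bit native dir.
if tar -tzf "$tarball" | grep -q 'lib/native/Linux-amd64-64'; then
  echo "native 64-bit libs present"
else
  echo "native 64-bit libs missing"
fi
```

The same one-liner (`tar -tzf hadoop-0.17.2.tar.gz | grep Linux-amd64-64`) works directly against the official download.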

