Posted to user@nutch.apache.org by MilleBii <mi...@gmail.com> on 2010/01/25 20:06:53 UTC

Error in merge segments

I'm merging segments; yesterday it worked fine, but I got this error today.
HDFS is not full, because I did a lot of cleanup.
I also ran hadoop fsck and it said "healthy". What could it be?

2010-01-25 12:15:18,998 WARN  hdfs.DFSClient - Exception while reading from
blk_7327930697821434381_9954 of
/user/root/nutch/indexed-segments/20100122220933/part-00000/_1nk.prx from
127.0.0.1:50010: java.io.IOException: Premeture EOF from inputStream
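
One way to see which datanodes hold that file's blocks is hadoop fsck with
block listing (a sketch; the path is just the one from the warning above, so
substitute your own file):

  # List files, blocks, and datanode locations for the affected segment file.
  hadoop fsck /user/root/nutch/indexed-segments/20100122220933/part-00000/_1nk.prx \
      -files -blocks -locations

If fsck can enumerate the block but reads still fail, the datanode is likely
hitting a resource limit rather than actual data loss.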


-- 
-MilleBii-

Re: Error in merge segments

Posted by MilleBii <mi...@gmail.com>.
Just in case someone looks for the solution in the future (thanks to Ken on
the Hadoop mailing list):

"Could not obtain block" errors are often caused by running out of available
> file handles.  You can confirm this by going to the shell and entering
> "ulimit -n".  If it says 1024, the default, then you will want to increase
> it to about 64,000.
>
>
Indeed, the Ubuntu default is 1024.
But just changing it at the command prompt won't do the job; you need to
change your Linux settings and reboot.

The how-to is well explained here:
http://posidev.com/blog/2009/06/04/set-ulimit-parameters-on-ubuntu/
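
For the record, the gist of it (a sketch for Ubuntu; the "hadoop" username is
an assumption, and exact file locations can vary by release):

  # /etc/security/limits.conf -- raise the open-file limit for the user that
  # runs Hadoop (shown here as "hadoop"; use whatever user you actually run as):
  #   hadoop  soft  nofile  64000
  #   hadoop  hard  nofile  64000
  #
  # /etc/pam.d/common-session -- make sure PAM actually applies the limits:
  #   session required pam_limits.so
  #
  # Then log back in (or reboot) and verify:
  ulimit -n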



2010/1/30 MilleBii <mi...@gmail.com>

> HEEELP !!!
>
> I'm kind of stuck on this one.
> I backed up my HDFS data, reformatted HDFS, put the data back, and tried to
> merge my segments together, and it blew up again.
>
> Exception in thread "Lucene Merge Thread #0"
> org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException:
> Could not obtain block: blk_4670839132945043210_1585
> file=/user/nutch/crawl/indexed-segments/20100113003609/part-00000/_ym.frq
>     at
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:309)
>
> If I go into the hdfs/data directory I DO find the faulty block ????
> Could it be a synchronization problem in the segment merger code?
>
> 2010/1/25 MilleBii <mi...@gmail.com>
>
>> I'm merging segments; yesterday it worked fine, but I got this error today.
>> HDFS is not full, because I did a lot of cleanup.
>> I also ran hadoop fsck and it said "healthy". What could it be?
>>
>> 2010-01-25 12:15:18,998 WARN  hdfs.DFSClient - Exception while reading
>> from blk_7327930697821434381_9954 of
>> /user/root/nutch/indexed-segments/20100122220933/part-00000/_1nk.prx from
>> 127.0.0.1:50010: java.io.IOException: Premeture EOF from inputStream
>>
>>
>> --
>> -MilleBii-
>>
>
>
>
> --
> -MilleBii-
>



-- 
-MilleBii-

Re: Error in merge segments

Posted by MilleBii <mi...@gmail.com>.
HEEELP !!!

I'm kind of stuck on this one.
I backed up my HDFS data, reformatted HDFS, put the data back, and tried to
merge my segments together, and it blew up again.

Exception in thread "Lucene Merge Thread #0"
org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException:
Could not obtain block: blk_4670839132945043210_1585
file=/user/nutch/crawl/indexed-segments/20100113003609/part-00000/_ym.frq
    at
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:309)

If I go into the hdfs/data directory I DO find the faulty block ????
Could it be a synchronization problem in the segment merger code?
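
One way to double-check that on the datanode (a sketch; /path/to/dfs/data
stands for whatever dfs.data.dir points at in your configuration):

  # Look for the block named in the exception under the datanode's storage dir.
  find /path/to/dfs/data -name 'blk_4670839132945043210*'

If both the block file and its .meta companion show up, the data is on disk,
which points at a read/resource problem on the datanode rather than real
corruption.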

2010/1/25 MilleBii <mi...@gmail.com>

> I'm merging segments; yesterday it worked fine, but I got this error today.
> HDFS is not full, because I did a lot of cleanup.
> I also ran hadoop fsck and it said "healthy". What could it be?
>
> 2010-01-25 12:15:18,998 WARN  hdfs.DFSClient - Exception while reading from
> blk_7327930697821434381_9954 of
> /user/root/nutch/indexed-segments/20100122220933/part-00000/_1nk.prx from
> 127.0.0.1:50010: java.io.IOException: Premeture EOF from inputStream
>
>
> --
> -MilleBii-
>



-- 
-MilleBii-