Posted to user@nutch.apache.org by Matei Zaharia <ma...@eecs.berkeley.edu> on 2007/10/18 09:32:59 UTC

Lock obtain timed out when running on Hadoop

Hi,

I'm sometimes getting the following error in the dedup 3 job when  
running Nutch 0.9 on top of Hadoop 0.14.2:

java.io.IOException: Lock obtain timed out: Lock@hdfs://r37:54310/user/matei/crawl4/indexes/part-00000/write.lock
	at org.apache.lucene.store.Lock.obtain(Lock.java:69)
	at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:526)
	at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:551)
	at org.apache.nutch.indexer.DeleteDuplicates.reduce(DeleteDuplicates.java:378)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:322)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1782)

Other times, it works just fine. Do you know why this is happening?
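
For reference, the exception comes from Lucene's per-index write lock:
before it can delete documents, IndexReader must obtain the index's
write.lock file, and Lock.obtain polls for that file and throws the
IOException above if it cannot get it within the timeout. Below is a
minimal, self-contained sketch of that polling behavior; it is a
hypothetical stand-in, not the actual Lucene source:

import java.io.File;
import java.io.IOException;

// Hypothetical stand-in for Lucene's Lock.obtain(timeout), showing where
// "Lock obtain timed out" comes from: poll for a lock file, give up after
// the timeout. A behavioral sketch only, not the real Lucene code.
public class LockObtainSketch {

    static final long POLL_INTERVAL_MS = 1000L;

    static void obtain(File lockFile, long timeoutMs)
            throws IOException, InterruptedException {
        long waited = 0;
        // createNewFile() atomically creates the file only if it is absent.
        // That atomicity holds on a local filesystem; a lock file kept on
        // HDFS has no such guarantee and can be left stale by a dead task,
        // which would be consistent with intermittent failures like the one
        // above.
        while (!lockFile.createNewFile()) {
            if (waited >= timeoutMs) {
                throw new IOException("Lock obtain timed out: " + lockFile);
            }
            Thread.sleep(POLL_INTERVAL_MS);
            waited += POLL_INTERVAL_MS;
        }
    }

    public static void main(String[] args) throws Exception {
        File lock = new File(System.getProperty("java.io.tmpdir"), "write.lock");
        obtain(lock, 10000L);
        System.out.println("acquired " + lock);
        lock.delete();
    }
}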

Thanks,

Matei Zaharia

Re: Lock obtain timed out when running on Hadoop

Posted by Matei Zaharia <ma...@eecs.berkeley.edu>.
Thanks for your reply. I'm wondering: is it possible to skip this
dedup phase, then, or to not acquire a lock? The reason I'd like to
use the 0.14 code is that I've instrumented it to add some tracing,
and I'd like to collect traces of how Nutch uses Hadoop. It may be
possible to port the changes back to 0.12, but I'd prefer not to:
I may have other apps that use things in 0.14, and I want to trace
the best-performing Hadoop version possible.
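
On the "not acquire a lock" idea: in Lucene 2.x the usual route is to
give the Directory a lock whose obtain() always succeeds; if the
bundled Lucene is 2.1 or later, org.apache.lucene.store.NoLockFactory
does exactly this. Below is a hedged sketch of such a no-op lock.
Whether Nutch's FsDirectory can be wired to return it from makeLock()
is an assumption here, not something verified against the 0.9 source:

import org.apache.lucene.store.Lock;

// No-op lock in the spirit of Lucene's NoLockFactory: obtain() always
// "succeeds", so a Directory returning this from makeLock() never hits
// the timeout. Only safe when exactly one writer touches the index --
// which should hold for a dedup reduce task that owns its part-NNNNN
// index. (Sketch; the FsDirectory wiring is assumed, not shown.)
public class NoOpLock extends Lock {
    public boolean obtain() {
        return true;            // pretend the lock was acquired
    }

    public void release() {
        // nothing was acquired, so nothing to release
    }

    public boolean isLocked() {
        return false;           // never actually held
    }
}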

Matei

On Oct 18, 2007, at 12:58 AM, Nguyen Manh Tien wrote:

> You should use Hadoop 0.12.3, for example, to dedup. The current version,
> 0.14.x, doesn't support the Lock operation.
>
> 2007/10/18, Matei Zaharia <ma...@eecs.berkeley.edu>:
>>
>> Hi,
>>
>> I'm sometimes getting the following error in the dedup 3 job when
>> running Nutch 0.9 on top of Hadoop 0.14.2:
>>
>> java.io.IOException: Lock obtain timed out: Lock@hdfs://r37:54310/user/matei/crawl4/indexes/part-00000/write.lock
>>         at org.apache.lucene.store.Lock.obtain(Lock.java:69)
>>         at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:526)
>>         at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:551)
>>         at org.apache.nutch.indexer.DeleteDuplicates.reduce(DeleteDuplicates.java:378)
>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:322)
>>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1782)
>>
>> Other times, it works just fine. Do you know why this is happening?
>>
>> Thanks,
>>
>> Matei Zaharia
>>


Re: Lock obtain timed out when running on Hadoop

Posted by Nguyen Manh Tien <ti...@gmail.com>.
You should use Hadoop 0.12.3, for example, to dedup. The current version,
0.14.x, doesn't support the Lock operation.
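
For context on "doesn't support the Lock operation": early Hadoop
exposed advisory file locking directly on FileSystem, and Nutch's
Lucene-on-HDFS directory relied on that facility; the calls were
deprecated around 0.14 and later removed. Below is a sketch of the
older usage, under the assumption that the 0.12-era lock()/release()
signatures are as recalled here; check the javadoc for the exact
version in use:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of the pre-0.13 style of HDFS locking assumed above. On 0.14.x
// these calls are deprecated (and later removed), so they cannot be
// counted on -- consistent with the intermittent timeouts in the dedup
// job. The path below mirrors the one in the stack trace.
public class HdfsLockProbe {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path lockFile =
            new Path("/user/matei/crawl4/indexes/part-00000/write.lock");
        fs.lock(lockFile, false);   // false = exclusive rather than shared
        try {
            System.out.println("holding " + lockFile);
            // ... index mutation would happen here ...
        } finally {
            fs.release(lockFile);
        }
    }
}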

2007/10/18, Matei Zaharia <ma...@eecs.berkeley.edu>:
>
> Hi,
>
> I'm sometimes getting the following error in the dedup 3 job when
> running Nutch 0.9 on top of Hadoop 0.14.2:
>
> java.io.IOException: Lock obtain timed out: Lock@hdfs://r37:54310/user/matei/crawl4/indexes/part-00000/write.lock
>         at org.apache.lucene.store.Lock.obtain(Lock.java:69)
>         at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:526)
>         at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:551)
>         at org.apache.nutch.indexer.DeleteDuplicates.reduce(DeleteDuplicates.java:378)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:322)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1782)
>
> Other times, it works just fine. Do you know why this is happening?
>
> Thanks,
>
> Matei Zaharia
>