Posted to dev@nutch.apache.org by Briggs <ac...@gmail.com> on 2007/06/07 17:20:36 UTC

Lock file problems...

I am getting these lock file errors all over the place when indexing
or even creating crawldbs.  It doesn't happen all the time, but
sometimes it happens continuously.  So, I am not quite sure how these
locks are getting in there, or why they aren't getting removed.

I am not sure where to go from here.

My current application is designed for crawling individual domains.
So, I have multiple custom crawlers that work concurrently.  Each one
basically does:

1) fetch
2) invert links
3) segment merge
4) index
5) deduplicate
6) merge indexes


Though I am still not 100% sure what the "indexes" directory is truly for.




java.io.IOException: Lock obtain timed out:
Lock@file:/crawloutput/http$~~www.camlawblog.com/indexes/part-00000/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:69)
        at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:526)
        at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:551)
        at org.apache.nutch.indexer.DeleteDuplicates.reduce(DeleteDuplicates.java:414)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
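
For what it's worth, a small diagnostic sketch, assuming the Lucene 2.x API bundled with Nutch at the time (IndexReader.isLocked/unlock, FSDirectory.getDirectory): it checks whether a part index still holds a write.lock and clears it, which is only safe if the process that took the lock is known to be dead. The path is just the one from the trace above.

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ClearStaleLock {
        public static void main(String[] args) throws IOException {
            String path = "/crawloutput/http$~~www.camlawblog.com/indexes/part-00000";
            Directory dir = FSDirectory.getDirectory(path, false);
            if (IndexReader.isLocked(dir)) {
                // Only safe if the writer that took the lock is really dead.
                IndexReader.unlock(dir);
                System.out.println("Cleared stale write.lock in " + path);
            }
            dir.close();
        }
    }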


So, has anyone seen this come up in their own implementations?

RE: Lock file problems...

Posted by Gal Nitzan <ga...@gmail.com>.
I index directly to Solr.
This happened to me when two separate indexers accessed the same index at
the same time. The Lucene index seemed to stay hung (which is why the lock
file was still there) until I killed the process. After that I had to
rebuild the index, since I was afraid it had been corrupted.
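
A tiny sketch, assuming the Lucene 2.x API that Nutch bundled, which reproduces that failure mode: while one writer holds write.lock on a directory, a second writer (or an IndexReader that tries to delete documents, which is what DeleteDuplicates does) times out with "Lock obtain timed out". The /tmp path is made up.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class LockTimeoutDemo {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.getDirectory("/tmp/lock-demo", true);
            IndexWriter first = new IndexWriter(dir, new StandardAnalyzer(), true);
            try {
                // Second writer on the same directory: blocks on write.lock,
                // then throws an IOException ("Lock obtain timed out").
                new IndexWriter(dir, new StandardAnalyzer(), false);
            } catch (java.io.IOException expected) {
                System.out.println(expected.getMessage());
            } finally {
                first.close();
                dir.close();
            }
        }
    }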

> -----Original Message-----
> From: Briggs [mailto:acidbriggs@gmail.com]
> Sent: Thursday, June 07, 2007 6:21 PM
> To: nutch-dev@lucene.apache.org
> Subject: Lock file problems...