Posted to user@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/09/01 14:31:31 UTC

LinkDB merging completed but..

Hi,

Last night I started another round of link inverting and merging with the 
existing LinkDB. Many hours later I saw that the job had finished without 
failures or errors.

Yet the LinkDB wasn't updated. To my surprise, the intermediate files were 
still lying around in my Hadoop home directory (~). I expected the temp files 
to be cleaned up and the new LinkDB to be copied over the old one.

The new LinkDB that should have been copied over is also broken: I cannot 
read it with `readlinkdb`. Fortunately the old one still works, but before I 
start debugging a process that takes many hours: any ideas?
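The replacement step expected above can be modeled as a short sequence of directory renames. The following is a minimal local-disk sketch, not Nutch's code. The function name `install`, the `old` backup directory and the `.locked` lock-file name are assumptions modeled on a typical Nutch database layout, and the `current` name comes from this thread. Nutch itself works against HDFS through Hadoop's FileSystem API, and the exact order of steps may differ between versions.

```python
import os
import shutil
import tempfile

# Assumed layout (hypothetical, modeled on Nutch): <db>/current, <db>/old, <db>/.locked
CURRENT = "current"
OLD = "old"
LOCK = ".locked"


def install(db_dir: str, new_output: str, keep_backup: bool = False) -> None:
    """Promote a finished job's output directory to <db_dir>/current (local-disk model)."""
    current = os.path.join(db_dir, CURRENT)
    old = os.path.join(db_dir, OLD)
    if os.path.exists(current):
        # Move the previous version aside so it survives until the new one is in place.
        if os.path.exists(old):
            shutil.rmtree(old)
        os.rename(current, old)
    os.makedirs(db_dir, exist_ok=True)
    # Move the job output into place: the step that, per the report, never took effect.
    os.rename(new_output, current)
    if not keep_backup and os.path.exists(old):
        shutil.rmtree(old)
    # Release the lock so the next job on this database can start.
    lock = os.path.join(db_dir, LOCK)
    if os.path.exists(lock):
        os.remove(lock)
```

If a job finishes but skips `install`, you end up in exactly the state described: the old `current` is untouched, the temporary output is still present, and the lock file is still there.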

Thanks

Re: LinkDB merging completed but..

Posted by Markus Jelsma <ma...@openindex.io>.
Good news: manually moving the tmp dir into place as `current` works. The data 
is not corrupt at all and is readable. I recall this was not the case for the 
LinkDB, but I may have done something else wrong.

Of course I had to remove the lock file as well.
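The lock file needs to be removed because a database lock is usually created with an exclusive "create only if absent" operation. If a job dies after creating the lock, the stale lock blocks later runs. The following minimal local-disk sketch shows that behavior. The `.locked` name and the helper names `acquire_lock` and `release_lock` are assumptions for illustration only. They are not Nutch's API, which performs the equivalent steps on HDFS.

```python
import os
import tempfile

LOCK = ".locked"  # assumed lock-file name (hypothetical, modeled on Nutch's DB lock)


def acquire_lock(db_dir: str) -> str:
    """Create the lock atomically; raises FileExistsError if any lock (even a stale one) exists."""
    path = os.path.join(db_dir, LOCK)
    # O_CREAT | O_EXCL makes creation fail when the file is already there.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)
    return path


def release_lock(db_dir: str) -> None:
    """Remove the lock: the manual cleanup needed after a job that never reached its install step."""
    path = os.path.join(db_dir, LOCK)
    if os.path.exists(path):
        os.remove(path)
```

Under these assumptions, the second `acquire_lock` call on the same directory fails until `release_lock` runs. This matches having to delete the lock by hand before the next job could start.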

> The same now occasionally happens with the crawldb. Today we tested new
> filters a couple of times. Twice the job failed silently: no failure was
> reported, no strange error message or anything. The only clue is that the
> tmp files do not replace the `current` dir.
> 
> Anything?
> 
> > Hi,
> > 
> > Last night i started another round of link inverting and merging with the
> > existing LinkDB. Many hours later i see the job finished without failures
> > or errors.
> > 
> > Yet the LinkDB wasn't updated. To my surprise i saw the intermediate
> > files hanging around in my Hadoop ~ but i expected the temp files to be
> > cleaned and the new LinkDB file to be copied over the old one.
> > 
> > The new file that ought to be copied over is also broken. I cannot
> > readlinkdb. Gladly the old still works but before i try to debug some
> > process that takes many hours.. any idea?
> > 
> > Thanks

Re: LinkDB merging completed but..

Posted by Markus Jelsma <ma...@openindex.io>.
The same now occasionally happens with the crawldb. Today we tested new 
filters a couple of times. Twice the job failed silently: no failure was 
reported, no strange error message or anything. The only clue is that the 
tmp files do not replace the `current` dir.

Anything?
