Posted to user@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/09/01 14:31:31 UTC
LinkDB merging completed but..
Hi,
Last night I started another round of link inverting and merging with the
existing LinkDB. Many hours later I see the job finished without failures or
errors.
Yet the LinkDB wasn't updated. To my surprise the intermediate files are still
hanging around in my Hadoop home directory (~), although I expected the temp
files to be cleaned up and the new LinkDB to be copied over the old one.
The new LinkDB that ought to have been copied over is also broken: readlinkdb
cannot read it. Luckily the old one still works, but before I try to debug a
process that takes many hours: any idea?
Thanks
Re: LinkDB merging completed but..
Posted by Markus Jelsma <ma...@openindex.io>.
Good. Manually replacing `current` with the tmp dir works. The tmp dir is not
corrupt in any way and is readable. I recall this was not the case for the
linkdb, but I may have done something else wrong there.
Of course I had to remove the lock file as well.
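
For illustration, the manual swap described above could be sketched as follows.
This is only a minimal sketch on a local filesystem, not Nutch's own code: the
function name `install_db`, the `old` backup directory, and the lock file name
`.locked` are assumptions made for the example, and on HDFS the same steps
would have to be done with the Hadoop filesystem tools instead.

```python
import shutil
from pathlib import Path


def install_db(db_dir: Path, new_dir: Path, lock_name: str = ".locked") -> None:
    """Move new_dir into place as db_dir/current and drop a stale lock file.

    The previous `current` is kept as db_dir/old so the swap can be undone.
    The names `old` and `.locked` are assumptions for this sketch.
    """
    if not new_dir.is_dir():
        raise FileNotFoundError(f"no such directory: {new_dir}")

    current = db_dir / "current"
    old = db_dir / "old"

    if current.exists():
        # Keep only one backup: discard an earlier one before renaming.
        if old.exists():
            shutil.rmtree(old)
        current.rename(old)

    db_dir.mkdir(parents=True, exist_ok=True)
    new_dir.rename(current)

    # A job that stopped before its final step may have left its lock behind.
    lock = db_dir / lock_name
    if lock.exists():
        lock.unlink()
```

A call such as `install_db(Path("crawl/crawldb"), Path("crawldb-tmp"))` (both
paths hypothetical) would then leave the merged data in `crawl/crawldb/current`
and the previous version in `crawl/crawldb/old`.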
Re: LinkDB merging completed but..
Posted by Markus Jelsma <ma...@openindex.io>.
The same now occasionally happens with the crawldb. Today we tested new
filters a couple of times. Twice the job failed silently: no strange error
message or anything. The only clue is that the tmp dir does not replace the
`current` dir.
Any ideas?