Posted to user@nutch.apache.org by Manish Verma <m_...@apple.com> on 2016/07/12 06:08:26 UTC

Delete db_gone from crawldb

Hi,

We want to delete db_gone docs from the crawldb without turning purge on.
We want to control this so that we can delete them whenever we wish to clean the crawldb.

Regards,
MV


Re: Delete db_gone from crawldb

Posted by Manish Verma <m_...@apple.com>.
I mean, like the solrclean and -deleteGone options, do we have any option to delete these from the crawldb? To use purge we have to change a nutch-site property, and we don't want purge turned on all the time.
Can we specify something at run time to delete these from the crawldb (some script or runtime argument)?

Regards,
MV

> On Jul 12, 2016, at 1:48 AM, Markus Jelsma <ma...@openindex.io> wrote:
> 
> Hi - what do you mean by control? In any case, you can turn it on once and purge db_gone, then turn it off again, right?
> Markus


RE: Delete db_gone from crawldb

Posted by Markus Jelsma <ma...@openindex.io>.
Hi - what do you mean by control? In any case, you can turn it on once and purge db_gone, then turn it off again, right?
Markus
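
One possible way to do the "turn it on once" step without editing nutch-site.xml is sketched below. It rests on two assumptions: that the purge is controlled by the db.update.purge.404 property, and that updatedb accepts a Hadoop-style -D override placed before its positional arguments. The helper names, paths and segment name are hypothetical and only serve as illustration.

```python
import subprocess
from typing import List

# Assumption: this is the property that makes updatedb drop db_gone records.
PURGE_PROPERTY = "db.update.purge.404"


def build_purge_command(crawldb: str, segment: str,
                        nutch_bin: str = "bin/nutch") -> List[str]:
    """Build an updatedb command line with purging enabled for this run only."""
    return [
        nutch_bin,
        "updatedb",
        f"-D{PURGE_PROPERTY}=true",  # per-run override; nutch-site.xml stays untouched
        crawldb,
        segment,
    ]


def purge_gone(crawldb: str, segment: str, nutch_bin: str = "bin/nutch") -> None:
    """Run the command; raises CalledProcessError if Nutch exits non-zero."""
    subprocess.run(build_purge_command(crawldb, segment, nutch_bin), check=True)


if __name__ == "__main__":
    # Hypothetical paths; print the command instead of running it.
    print(" ".join(build_purge_command("crawl/crawldb",
                                       "crawl/segments/20160712000000")))
```

Because the override applies only to that single invocation, later updatedb runs fall back to whatever nutch-site.xml says, which would keep purge off by default. It is worth trying this on a copy of the crawldb first.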

 
 