Posted to user@nutch.apache.org by Bayu Widyasanyata <bw...@gmail.com> on 2013/10/02 01:24:20 UTC

Delete specific host DB index on Solr database

Hi,

One of my seed sites has moved to a new CMS, which changed the format of
its URIs.

How can I delete the documents with the old URI format from the Solr index,
so that I can recrawl and reindex with the new URI format from the new CMS?

Thanks!

-- 
wassalam,
[bayu]

Re: Delete specific host DB index on Solr database

Posted by feng lu <am...@gmail.com>.
Yes, until those pages are re-fetched, users' queries will still return the
old URIs. One solution is to remove the broken records with a Solr delete
query on the document id, if you know all of the broken URIs. Another
method would be to change the fetch time of the broken URIs so that they
are re-fetched sooner, but this is not currently supported.
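
The delete-query approach could be sketched as follows. This is a minimal
illustration, not a tested recipe: it assumes the index uses the schema
shipped with Nutch (a `host` field, and the page URL as the document `id`),
that Solr accepts XML update messages at an `/update` handler, and the host
name and Solr URL shown are placeholders. The helper names are made up for
this example.

```python
import urllib.request
from typing import Iterable
from xml.sax.saxutils import escape


def quote_term(value: str) -> str:
    """Wrap a value in double quotes for a Solr query, escaping \\ and "."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def delete_by_host_xml(host: str) -> str:
    """Build an XML delete-by-query message matching every document of one host.

    Assumption: the index has a `host` field, as in the Nutch-provided schema.
    """
    return "<delete><query>" + escape("host:" + quote_term(host)) + "</query></delete>"


def delete_by_ids_xml(urls: Iterable[str]) -> str:
    """Build an XML delete message for known ids (assumed to be the page URLs)."""
    return "<delete>" + "".join("<id>" + escape(u) + "</id>" for u in urls) + "</delete>"


def post_update(update_url: str, xml_body: str) -> int:
    """POST a delete message to the update handler, commit, return HTTP status."""
    request = urllib.request.Request(
        update_url + "?commit=true",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder host; replace with the host whose old records should go.
    body = delete_by_host_xml("www.example.com")
    print(body)
    # Uncomment once the URL points at your own Solr update handler:
    # post_update("http://localhost:8983/solr/update", body)
```

Deleting by host removes everything indexed for that site in one request,
which fits the case where all of its URIs changed; deleting by id is the
narrower option when only some URIs are broken. Running the same query as a
normal search first shows what would be removed.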




On Thu, Oct 3, 2013 at 8:18 AM, Bayu Widyasanyata
<bw...@gmail.com> wrote:

> Hi Feng,
>
> What about the existing records already stored in the Solr index?
> Until those URLs are re-fetched, search results will still point to the
> old URI format (from the old CMS), and users will be sent to broken links.
>
> If I could delete those old records, that would clean up the old index.
> Then I could recrawl and reindex as usual.
>
> Thanks,



-- 
Don't Grow Old, Grow Up... :-)

Re: Delete specific host DB index on Solr database

Posted by Bayu Widyasanyata <bw...@gmail.com>.
Hi Feng,

What about the existing records already stored in the Solr index?
Until those URLs are re-fetched, search results will still point to the
old URI format (from the old CMS), and users will be sent to broken links.

If I could delete those old records, that would clean up the old index.
Then I could recrawl and reindex as usual.

Thanks,


On Wed, Oct 2, 2013 at 9:14 AM, feng lu <am...@gmail.com> wrote:

> Hi Bayu
>
> Nutch sets the status of a URL to STATUS_DB_GONE when it can no longer be
> fetched successfully. If you then run the bin/nutch solrclean command,
> Nutch removes the GONE documents from Solr.



-- 
wassalam,
[bayu]

Re: Delete specific host DB index on Solr database

Posted by feng lu <am...@gmail.com>.
Hi Bayu

Nutch sets the status of a URL to STATUS_DB_GONE when it can no longer be
fetched successfully. If you then run the bin/nutch solrclean command,
Nutch removes the GONE documents from Solr.
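
As a rough sketch of that step, the command could be assembled and run from
a script like the one below. It assumes a Nutch 1.x install where the usage
is `solrclean <crawldb> <solrurl> [noCommit]`. The install path, crawldb
path and Solr URL are placeholders, and the helper names are made up for
this example. Note that a URL only reaches the GONE status after a fetch
attempt has failed, so this cleans up after a recrawl rather than before.

```python
import subprocess


def solrclean_command(nutch_home, crawldb, solr_url, commit=True):
    """Return the argv list for `bin/nutch solrclean <crawldb> <solrurl> [noCommit]`."""
    argv = [nutch_home.rstrip("/") + "/bin/nutch", "solrclean", crawldb, solr_url]
    if not commit:
        # Optional third argument (assumed from the Nutch 1.x usage string).
        argv.append("noCommit")
    return argv


def run_solrclean(nutch_home, crawldb, solr_url):
    """Run solrclean; raises CalledProcessError if the command exits non-zero."""
    subprocess.run(solrclean_command(nutch_home, crawldb, solr_url), check=True)


if __name__ == "__main__":
    # Placeholder locations; adjust to your own install before running for real.
    argv = solrclean_command("/opt/nutch", "crawl/crawldb", "http://localhost:8983/solr")
    print(" ".join(argv))
```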


On Wed, Oct 2, 2013 at 7:24 AM, Bayu Widyasanyata
<bw...@gmail.com> wrote:

> Hi,
>
> One of my seed sites has moved to a new CMS, which changed the format of
> its URIs.
>
> How can I delete the documents with the old URI format from the Solr index,
> so that I can recrawl and reindex with the new URI format from the new CMS?
>
> Thanks!
>
> --
> wassalam,
> [bayu]
>



-- 
Don't Grow Old, Grow Up... :-)