Posted to user@nutch.apache.org by Alaak <al...@gmx.de> on 2012/09/08 00:04:54 UTC

Keeping an externally created field in solr.

Hi,

I have an external program which changes a field for some websites 
within my Solr index. Nutch sets this field to a default value using a 
plugin when it indexes a page. My problem is that Nutch also resets the 
field for already indexed pages when it updates them. Is there a way to 
tell Nutch not to touch that field if it already exists in the Solr 
index?

Thanks and Regards

RE: Keeping an externally created field in solr.

Posted by Markus Jelsma <ma...@openindex.io>.
Don't delete the crawl db, that's pointless. You can either delete the whole segment, or remove everything in it except crawl_generate and try again. You should delete the segment if you've successfully crawled another segment after it, because it'll contain the same URLs.

 
 
-----Original message-----
> From:Alaak <al...@gmx.de>
> Sent: Sat 08-Sep-2012 10:43
> To: user@nutch.apache.org
> Cc: Markus Jelsma <ma...@openindex.io>
> Subject: Re: Keeping an externally created field in solr.
> 
> Hi,
> 
> Ok. Thanks. Then I guess I will follow your last proposal and read the 
> value from the Solr Index if the URL is already there.
> 
> On Sat 08 Sep 2012 00:11:41 CEST, Markus Jelsma wrote:
> >
> > No, but you could modify the indexer to do so. Or make use of Solr's 
> > new capability of updating specific fields. You could also modify 
> > that indexer plugin to fetch the value for that field from some source 
> > you have prior to indexing. I think the latter is the easiest to make 
> > but it only works for fields specifically set by Nutch.

Re: Keeping an externally created field in solr.

Posted by Alaak <al...@gmx.de>.
Hi,

Ok. Thanks. Then I guess I will follow your last proposal and read the 
value from the Solr Index if the URL is already there.
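
The lookup-then-fallback idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from Nutch or from this thread: it assumes the Solr unique key is `id` and holds the page URL, and the field name `rating`, the default `"unrated"`, and both function names are hypothetical. The actual HTTP request is left out; the sketch only builds the query URL and picks the value from an already parsed JSON response.

```python
from urllib.parse import urlencode

# Hypothetical default that the indexing plugin would otherwise always write.
DEFAULT_VALUE = "unrated"


def build_lookup_url(solr_base, doc_id, field):
    """Build a Solr select URL that asks for one field of one document.

    Assumes the unique key field is called "id" (an assumption, check
    your schema).
    """
    # Escape backslashes and double quotes so the id can sit inside a phrase.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    params = urlencode({
        "q": 'id:"%s"' % escaped,
        "fl": field,      # return only the field we care about
        "wt": "json",
        "rows": 1,
    })
    return "%s/select?%s" % (solr_base, params)


def choose_value(solr_response, field, default=DEFAULT_VALUE):
    """Return the value already in the index, or the default if absent."""
    docs = solr_response.get("response", {}).get("docs", [])
    if docs and field in docs[0]:
        return docs[0][field]
    return default
```

In an indexing plugin the result of `choose_value` would replace the hard-coded default, so pages that are already indexed keep whatever the external program wrote. The extra query per document adds load on Solr, which may matter for large crawls.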


RE: Keeping an externally created field in solr.

Posted by Markus Jelsma <ma...@openindex.io>.
No, but you could modify the indexer to do so, or make use of Solr's new capability of updating specific fields. You could also modify the indexer plugin to fetch the value for that field from some source you have prior to indexing. I think the latter is the easiest to implement, but it only works for fields specifically set by Nutch.
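
For the "updating specific fields" option, a minimal sketch of what such a request body could look like is shown below. It assumes Solr's JSON update syntax with the `set` modifier and a unique key named `id`; the helper name `build_atomic_update` and the field names are hypothetical, and this is not how the stock Nutch indexer sends documents. As far as I know, this kind of update also needs the other fields to be stored so Solr can rebuild the document, so treat it as something to verify against your schema.

```python
import json


def build_atomic_update(doc_id, fields):
    """Build a JSON body that sets only the given fields of one document.

    Fields not listed here (for example the externally maintained one)
    are meant to be left as they are in the index.
    """
    doc = {"id": doc_id}  # assumes the unique key field is "id"
    for name, value in fields.items():
        doc[name] = {"set": value}  # "set" replaces just this field's value
    return json.dumps([doc])


# Example: refresh title and content, but never mention the external field.
body = build_atomic_update(
    "http://example.com/",
    {"title": "Example page", "content": "Fetched text"},
)
```

The body would be posted to the core's update handler with a JSON content type. Leaving the externally managed field out of `fields` is the whole point of the approach.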
 