Posted to user@nutch.apache.org by "D.Saravanaraj" <sa...@gmail.com> on 2006/03/06 19:24:05 UTC
Problem running Nutch Mapred after applying patch for Adaptive refetch
Hi Andrzej,
I applied your patch for adaptive refetch. In Indexer.java, the case
statement for STATUS_FETCH_UNMODIFIED is missing in the reduce() method. I
assume a simple break statement should be added there.
Thanks
D.Saravanaraj
Re: Problem running Nutch Mapred after applying patch for Adaptive refetch
Posted by Andrzej Bialecki <ab...@getopt.org>.
Raghavendra Prabhu wrote:
> Hi Andrzej,
>
> I had also done the first thing when I added lines to IndexMerger.
>
> Thanks for the change.
>
> I was wondering how a master index behaves when we recreate it from the
> segments.
>
> Say that some content has changed and the new version has been fetched and
> parsed.
>
> When the index merger is run and the entire index is rebuilt, will the entry
> from the old version be removed so that only the new one remains? (I am
> rebuilding the index to speed up search.)
>
> Or will the old entry also remain and show up in search results?
>
That's the purpose of DeleteDuplicates (dedup), which removes obsolete
versions of pages from the indexes. The old versions remain in the segments
until you delete those segments, but they won't appear in the searchable
index.
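
To illustrate the idea only, here is a minimal sketch of "keep the newest
version per URL". It is not the DeleteDuplicates code: the Doc class, its
fields, and the keepNewest method are hypothetical names, and it assumes
each indexed page carries a URL and a fetch time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Nutch DeleteDuplicates implementation.
class DedupSketch {

    // Hypothetical stand-in for an indexed page: its URL and when it was fetched.
    static final class Doc {
        final String url;
        final long fetchTime;

        Doc(String url, long fetchTime) {
            this.url = url;
            this.fetchTime = fetchTime;
        }
    }

    // Keep only the most recently fetched Doc for each URL.
    static List<Doc> keepNewest(List<Doc> docs) {
        Map<String, Doc> newest = new LinkedHashMap<>();
        for (Doc d : docs) {
            Doc seen = newest.get(d.url);
            if (seen == null || d.fetchTime > seen.fetchTime) {
                newest.put(d.url, d);
            }
        }
        return new ArrayList<>(newest.values());
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("http://example.com/a", 100L)); // from an older segment
        docs.add(new Doc("http://example.com/b", 100L));
        docs.add(new Doc("http://example.com/a", 200L)); // refetched, newer version
        for (Doc d : keepNewest(docs)) {
            System.out.println(d.url + " @ " + d.fetchTime);
        }
    }
}
```

In this sketch the older copy of a page is dropped from the result list,
while the input list (standing in for the segments) is left untouched.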
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: Problem running Nutch Mapred after applying patch for Adaptive refetch
Posted by Raghavendra Prabhu <rr...@gmail.com>.
Hi Andrzej,
I had also done the first thing when I added lines to IndexMerger.
Thanks for the change.
I was wondering how a master index behaves when we recreate it from the
segments.
Say that some content has changed and the new version has been fetched and
parsed.
When the index merger is run and the entire index is rebuilt, will the entry
from the old version be removed so that only the new one remains? (I am
rebuilding the index to speed up search.)
Or will the old entry also remain and show up in search results?
Rgds
Prabhu
On 3/7/06, Andrzej Bialecki <ab...@getopt.org> wrote:
>
> Andrzej Bialecki wrote:
> > D.Saravanaraj wrote:
> >> Hi Andrzej,
> >>
> >> I applied your patch for adaptive refetch. In Indexer.java, the case
> >> statement for STATUS_FETCH_UNMODIFIED is missing in the reduce()
> >> method. I assume a simple break statement should be added there.
> >>
> >
> > Good catch! Yes, this case needs to be added to the list, just next
> > to STATUS_FETCH_GONE.
>
> Actually, I was wrong, please add this (it will be added to the new
> version of the patch):
>
>     case CrawlDatum.STATUS_FETCH_UNMODIFIED:
>       // we don't really have the new version of this page,
>       // so skip it - it's already in some older segment
>       fetchDatum = null;
>       break;
>
Re: Problem running Nutch Mapred after applying patch for Adaptive refetch
Posted by Andrzej Bialecki <ab...@getopt.org>.
Andrzej Bialecki wrote:
> D.Saravanaraj wrote:
>> Hi Andrzej,
>>
>> I applied your patch for adaptive refetch. In Indexer.java, the case
>> statement for STATUS_FETCH_UNMODIFIED is missing in the reduce()
>> method. I assume a simple break statement should be added there.
>>
>
> Good catch! Yes, this case needs to be added to the list, just next
> to STATUS_FETCH_GONE.
Actually, I was wrong, please add this (it will be added to the new
version of the patch):
    case CrawlDatum.STATUS_FETCH_UNMODIFIED:
      // we don't really have the new version of this page,
      // so skip it - it's already in some older segment
      fetchDatum = null;
      break;
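
For anyone who wants to see the effect in isolation, here is a
self-contained sketch of a status switch with this case in place. The
status values, the Datum class, and the selectFetchDatum method are
stand-ins made up for the example, not the real CrawlDatum or Indexer
code; only the STATUS_FETCH_UNMODIFIED branch mirrors the snippet above.

```java
// Hypothetical sketch; the constants and Datum class are stand-ins,
// not the real org.apache.nutch.crawl.CrawlDatum.
class ReduceSketch {
    static final byte STATUS_FETCH_SUCCESS = 1;    // assumed value
    static final byte STATUS_FETCH_UNMODIFIED = 2; // assumed value

    static final class Datum {
        final byte status;

        Datum(byte status) {
            this.status = status;
        }
    }

    // Returns the datum to index, or null when the page should be skipped.
    static Datum selectFetchDatum(Datum candidate) {
        Datum fetchDatum = candidate;
        switch (candidate.status) {
            case STATUS_FETCH_SUCCESS:
                // new content was fetched, keep the datum
                break;
            case STATUS_FETCH_UNMODIFIED:
                // no new version of this page in this segment, so skip it
                fetchDatum = null;
                break;
            default:
                // a status with no case falls through to here, which is
                // how a missing case shows up in this sketch
                throw new IllegalArgumentException(
                        "Unexpected status: " + candidate.status);
        }
        return fetchDatum;
    }

    public static void main(String[] args) {
        Datum fresh = selectFetchDatum(new Datum(STATUS_FETCH_SUCCESS));
        Datum same = selectFetchDatum(new Datum(STATUS_FETCH_UNMODIFIED));
        System.out.println("success kept: " + (fresh != null));
        System.out.println("unmodified skipped: " + (same == null));
    }
}
```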
--
Best regards,
Andrzej Bialecki <><
Re: Problem running Nutch Mapred after applying patch for Adaptive refetch
Posted by Andrzej Bialecki <ab...@getopt.org>.
D.Saravanaraj wrote:
> Hi Andrzej,
>
> I applied your patch for adaptive refetch. In Indexer.java, the case
> statement for STATUS_FETCH_UNMODIFIED is missing in the reduce() method. I
> assume a simple break statement should be added there.
>
Good catch! Yes, this case needs to be added to the list, just next to
STATUS_FETCH_GONE.
--
Best regards,
Andrzej Bialecki <><