Posted to user@nutch.apache.org by "D.Saravanaraj" <sa...@gmail.com> on 2006/03/06 19:24:05 UTC

Problem running Nutch Mapred after applying patch for Adaptive refetch

Hi  Andrzej,

I applied your patch for adaptive refetch. In Indexer.java, the switch
statement in the reduce() method has no case for STATUS_FETCH_UNMODIFIED. I
assume a simple break statement should be added there?

Thanks
D.Saravanaraj

Re: Problem running Nutch Mapred after applying patch for Adaptive refetch

Posted by Andrzej Bialecki <ab...@getopt.org>.
Raghavendra Prabhu wrote:
> Hi Andrzej,
>
> I had also done the first thing, when I added lines to IndexMerger.
>
> Thanks for the change.
>
> I was wondering how things behave when we recreate a master index from
> the segments.
>
> Say a page's content has changed and the new version has been fetched
> and parsed.
>
> When the index merger is run and the entire index is rebuilt, will the entry
> from the old version be removed so that only the new one remains? (I am
> rebuilding the index to speed up search.)
>
> Or will the old entry also remain and show up in search results?
>   

That's the purpose of DeleteDuplicates (dedup), which removes obsolete
versions of pages from the indexes. The pages are still present in the
segments until you delete the old segments, but they won't appear in the
searchable index.
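
As a rough illustration of the idea described above (not the actual
DeleteDuplicates code, which operates on Lucene indexes and whose exact
criteria are not spelled out in this thread), here is a minimal Java sketch
that keeps only the most recently fetched entry per URL. The IndexEntry and
DedupSketch classes and their fields are hypothetical names made up for this
example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one indexed version of a page.
class IndexEntry {
    final String url;
    final long fetchTime;   // when this version was fetched
    final String segment;   // segment the version came from

    IndexEntry(String url, long fetchTime, String segment) {
        this.url = url;
        this.fetchTime = fetchTime;
        this.segment = segment;
    }
}

class DedupSketch {
    // Keeps only the newest entry for each URL; older versions are dropped
    // from the result, although they would still exist in their segments.
    static List<IndexEntry> keepNewestPerUrl(List<IndexEntry> entries) {
        Map<String, IndexEntry> newest = new LinkedHashMap<>();
        for (IndexEntry entry : entries) {
            IndexEntry current = newest.get(entry.url);
            if (current == null || entry.fetchTime > current.fetchTime) {
                newest.put(entry.url, entry);
            }
        }
        return new ArrayList<>(newest.values());
    }
}
```

Under these assumptions, a page fetched once in an old segment and again in
a newer one would be left with a single searchable entry, the one from the
newer segment.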

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Problem running Nutch Mapred after applying patch for Adaptive refetch

Posted by Raghavendra Prabhu <rr...@gmail.com>.
Hi Andrzej,

I had also done the first thing, when I added lines to IndexMerger.

Thanks for the change.

I was wondering how things behave when we recreate a master index from
the segments.

Say a page's content has changed and the new version has been fetched
and parsed.

When the index merger is run and the entire index is rebuilt, will the entry
from the old version be removed so that only the new one remains? (I am
rebuilding the index to speed up search.)

Or will the old entry also remain and show up in search results?


Rgds
Prabhu




On 3/7/06, Andrzej Bialecki <ab...@getopt.org> wrote:
>
> Andrzej Bialecki wrote:
> > D.Saravanaraj wrote:
> >> Hi  Andrzej,
> >>
> >> I applied your patch for adaptive refetch. In the Indexer.java, the case
> >> statement for STATUS_FETCH_UNMODIFIED is missing in the reduce() method. I
> >> hope a simple break statement is to be added there.
> >>
> >
> > Good catch!  Yes, this case needs to be added to the list, just next
> > to STATUS_FETCH_GONE.
>
> Actually, I was wrong, please add this (it will be added to the new
> version of the patch):
>
>        case CrawlDatum.STATUS_FETCH_UNMODIFIED:
>          // we don't really have the new version of this page,
>          // so skip it - it's already in some older segment
>          fetchDatum = null;
>          break;

Re: Problem running Nutch Mapred after applying patch for Adaptive refetch

Posted by Andrzej Bialecki <ab...@getopt.org>.
Andrzej Bialecki wrote:
> D.Saravanaraj wrote:
>> Hi  Andrzej,
>>
>> I applied your patch for adaptive refetch. In the Indexer.java, the case
>> statement for STATUS_FETCH_UNMODIFIED is missing in the reduce() method. I
>> hope a simple break statement is to be added there.
>>   
>
> Good catch!  Yes, this case needs to be added to the list, just next 
> to STATUS_FETCH_GONE.

Actually, I was wrong, please add this (it will be added to the new 
version of the patch):

        case CrawlDatum.STATUS_FETCH_UNMODIFIED:
          // we don't really have the new version of this page,
          // so skip it - it's already in some older segment
          fetchDatum = null;
          break;
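
To show how the case above differs from a bare break, here is a small
self-contained Java sketch of such a switch. It is not the Indexer source:
SketchDatum, IndexerSketch, selectFetchDatum and the status byte values are
hypothetical stand-ins, and the default branch that rejects unknown statuses
is an assumption made for the example.

```java
import java.util.List;

// Hypothetical stand-in for CrawlDatum; the byte values are placeholders
// chosen for this sketch, not the real constants.
class SketchDatum {
    static final byte STATUS_FETCH_SUCCESS = 1;
    static final byte STATUS_FETCH_GONE = 2;
    static final byte STATUS_FETCH_UNMODIFIED = 3;

    final byte status;

    SketchDatum(byte status) {
        this.status = status;
    }
}

class IndexerSketch {
    // Returns the datum to index for one URL, or null if the page
    // should be skipped.
    static SketchDatum selectFetchDatum(List<SketchDatum> values) {
        SketchDatum fetchDatum = null;
        for (SketchDatum datum : values) {
            switch (datum.status) {
                case SketchDatum.STATUS_FETCH_SUCCESS:
                    fetchDatum = datum;
                    break;
                case SketchDatum.STATUS_FETCH_UNMODIFIED:
                    // no new version of the page here, so skip it;
                    // it is already in some older segment
                    fetchDatum = null;
                    break;
                case SketchDatum.STATUS_FETCH_GONE:
                    // assumption for this sketch: nothing to index,
                    // leave fetchDatum as it is
                    break;
                default:
                    // assumption: statuses without a case are rejected
                    throw new IllegalStateException(
                        "Unexpected status: " + datum.status);
            }
        }
        return fetchDatum;
    }
}
```

In this sketch, resetting fetchDatum to null means that an unmodified page
is never selected for indexing from the current segment, whereas a bare
break would leave any previously assigned datum in place.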






Re: Problem running Nutch Mapred after applying patch for Adaptive refetch

Posted by Andrzej Bialecki <ab...@getopt.org>.
D.Saravanaraj wrote:
> Hi  Andrzej,
>
> I applied your patch for adaptive refetch. In the Indexer.java, the case
> statement for STATUS_FETCH_UNMODIFIED is missing in the reduce() method. I
> hope a simple break statement is to be added there.
>   

Good catch!  Yes, this case needs to be added to the list, just next to 
STATUS_FETCH_GONE.
