Posted to dev@nutch.apache.org by Marko Bauhardt <mb...@media-style.com> on 2006/03/15 18:52:43 UTC

update linkdb

Hi all,
I took a look at LinkDb and noticed that the class has no
update mechanism. The only input dirs are those of the parsed
segments, so the existing linkdb is never updated with the new links.

Did I overlook something, or wouldn't it be sensible to build a temporary
linkdb from the segment folder(s) and merge it with the
existing linkdb? I imagine behavior similar to how the crawlDb
update works today.

Any comments? If that sounds interesting, I can provide a patch.
Marko
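To make the proposal concrete, here is a minimal, hypothetical Java sketch of "build a temporary linkdb from the segments, then merge it with the existing one". It does not use Nutch's real LinkDb, Inlink or MapReduce classes. The names LinkDbMergeSketch, Outlink, Inlink, invert and merge are made up for illustration, and a linkdb is modeled as an in-memory map from target URL to inlinks. The dedupe rule (one inlink per source URL per target, newer anchor text wins) is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch, NOT Nutch's LinkDb: build a temporary linkdb from
 * freshly parsed segments, then merge it into the existing linkdb.
 * A linkdb is modeled here as an in-memory map: target URL -> inlinks.
 */
public class LinkDbMergeSketch {

    /** Hypothetical record: a link found while parsing a segment. */
    record Outlink(String toUrl, String anchor) {}

    /** Hypothetical record: a page (fromUrl) that links to some target URL. */
    record Inlink(String fromUrl, String anchor) {}

    /** Step 1: invert parsed outlinks (source -> outlinks) into a temp linkdb (target -> inlinks). */
    static Map<String, List<Inlink>> invert(Map<String, List<Outlink>> parsed) {
        Map<String, List<Inlink>> tmp = new TreeMap<>();
        parsed.forEach((fromUrl, outlinks) -> {
            for (Outlink out : outlinks) {
                tmp.computeIfAbsent(out.toUrl(), k -> new ArrayList<>())
                   .add(new Inlink(fromUrl, out.anchor()));
            }
        });
        return tmp;
    }

    /**
     * Step 2: merge the existing linkdb with the temp linkdb, like a reduce
     * step that reads both as input. Assumed rule: keep one inlink per source
     * URL per target; when both databases have one, the newer anchor text wins.
     */
    static Map<String, List<Inlink>> merge(Map<String, List<Inlink>> existing,
                                           Map<String, List<Inlink>> fromSegments) {
        Map<String, Map<String, Inlink>> byTarget = new TreeMap<>();
        // Old data first, new data second, so put() lets newer records replace older ones.
        for (Map<String, List<Inlink>> db : List.of(existing, fromSegments)) {
            db.forEach((target, inlinks) -> {
                Map<String, Inlink> bySource =
                    byTarget.computeIfAbsent(target, k -> new LinkedHashMap<>());
                for (Inlink in : inlinks) {
                    bySource.put(in.fromUrl(), in);
                }
            });
        }
        Map<String, List<Inlink>> merged = new TreeMap<>();
        byTarget.forEach((target, bySource) ->
            merged.put(target, new ArrayList<>(bySource.values())));
        return merged;
    }

    public static void main(String[] args) {
        // Existing linkdb: b.example links to a.example.
        Map<String, List<Inlink>> existing = Map.of(
            "http://a.example/", List.of(new Inlink("http://b.example/", "A site")));
        // New segment: c.example was parsed and links to a.example and d.example.
        Map<String, List<Outlink>> parsed = Map.of(
            "http://c.example/", List.of(
                new Outlink("http://a.example/", "About A"),
                new Outlink("http://d.example/", "D page")));
        Map<String, List<Inlink>> merged = merge(existing, invert(parsed));
        merged.forEach((target, inlinks) -> System.out.println(target + " <- " + inlinks));
    }
}
```

Running main shows a.example keeping its old inlink from b.example while gaining the new one from c.example. A linkdb built from the new segment alone would have dropped the b.example inlink, which is the problem raised here. The merge step mirrors the idea of adding the existing linkdb as an extra input folder for the reduce step, so that each URL sees both its old and its new inlinks.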


Re: update linkdb

Posted by Andrzej Bialecki <ab...@getopt.org>.
Marko Bauhardt wrote:
> On 15.03.2006 at 19:31, Andrzej Bialecki wrote:
>
>> It's already implemented (svn log -r 380163).
>
> Hm. I must have tomatoes on my eyes (German idiom: I'm missing something obvious). ;-)

Oops, no, I have egg on my face. :-$

> Could you help me find the line of code where the existing linkdb is
> added as an input folder for the reduce step? I think the old linkdb must be
> merged with the new one.

You are right, it's not there. Strange... I ran some tests before committing, so I must have missed something. The current code has no way to update the existing data.

If you have a patch I'll be happy to apply it. Thanks!

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: update linkdb

Posted by Marko Bauhardt <mb...@media-style.com>.
On 15.03.2006 at 19:31, Andrzej Bialecki wrote:

> It's already implemented (svn log -r 380163).

Hm. I must have tomatoes on my eyes (German idiom: I'm missing something obvious). ;-)
Could you help me find the line of code where the existing linkdb is
added as an input folder for the reduce step? I think the old linkdb must be
merged with the new one.

Thanks,
Marko



Re: update linkdb

Posted by Andrzej Bialecki <ab...@getopt.org>.
Marko Bauhardt wrote:
> Hi all,
> I took a look at LinkDb and noticed that the class has no
> update mechanism. The only input dirs are those of the parsed
> segments, so the existing linkdb is never updated with the new links.
>
> Did I overlook something, or wouldn't it be sensible to build a temporary
> linkdb from the segment folder(s) and merge it with the
> existing linkdb? I imagine behavior similar to how the crawlDb
> update works today.
>
> Any comments? If that sounds interesting, I can provide a patch.

It's already implemented (svn log -r 380163).
