Posted to dev@nutch.apache.org by Marko Bauhardt <mb...@media-style.com> on 2006/03/15 18:52:43 UTC
update linkdb
Hi all,
I took a look at the LinkDb class and noticed that it has no
update mechanism. The input dirs are only the dirs of the parsed
segments, so the original linkdb is not updated with the new links.
Did I overlook something, or wouldn't it make sense to write a temp
linkdb as output from the segment folder(s) and merge that with the
existing linkdb?
I can imagine behavior similar to how the crawlDb update works today.
Any comments? If that sounds interesting I can provide a patch.
Marko
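
The merge proposed above can be sketched roughly as follows. This is only an illustration of the idea, assuming a linkdb can be modeled as an in-memory map from a target URL to the set of URLs linking to it. The class LinkDbMergeSketch and the method merge are hypothetical names and are not taken from the Nutch code base, where such a merge would more likely run as a MapReduce job over the segment and linkdb directories.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: not Nutch's LinkDb, just the merge idea in plain Java.
public class LinkDbMergeSketch {

    // Assumption: a linkdb is modeled as target URL -> set of inlink URLs.
    // Returns a new map containing the union of both inputs; neither input is modified.
    static Map<String, Set<String>> merge(Map<String, Set<String>> existing,
                                          Map<String, Set<String>> fresh) {
        Map<String, Set<String>> merged = new HashMap<>();
        // Start from a copy of the existing linkdb.
        for (Map.Entry<String, Set<String>> e : existing.entrySet()) {
            merged.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        // Add inlinks found in the newly parsed segments.
        for (Map.Entry<String, Set<String>> e : fresh.entrySet()) {
            merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                  .addAll(e.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> existing = new HashMap<>();
        existing.put("http://example.org/a", new TreeSet<>(Set.of("http://example.org/x")));

        Map<String, Set<String>> fresh = new HashMap<>();
        fresh.put("http://example.org/a", new TreeSet<>(Set.of("http://example.org/y")));
        fresh.put("http://example.org/b", new TreeSet<>(Set.of("http://example.org/x")));

        // The merged linkdb keeps the old inlinks and adds the new ones.
        System.out.println(merge(existing, fresh));
    }
}
```

Without the first loop, the result would contain only the links from the new segments, which is the behavior described in the message above.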
Re: update linkdb
Posted by Andrzej Bialecki <ab...@getopt.org>.
Marko Bauhardt wrote:
> On 15.03.2006 at 19:31, Andrzej Bialecki wrote:
>
>> It's already implemented (svn log -r 380163).
>
> Hm. I have tomatoes on my eyes. ;-)
Ooops, no, I have an egg on my face. :-$
> Please help me find the line of code where the existing linkdb is
> added as an input folder for the reduce step. I think the old linkdb
> must be merged with the new linkdb?
You are right, it's not there. Strange... I ran some tests before the commit, but I must have missed something: the current code has no way to update the existing data.
If you have a patch I'll be happy to apply it. Thanks!
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: update linkdb
Posted by Marko Bauhardt <mb...@media-style.com>.
On 15.03.2006 at 19:31, Andrzej Bialecki wrote:
> It's already implemented (svn log -r 380163).
Hm. I have tomatoes on my eyes. ;-)
Please help me find the line of code where the existing linkdb is
added as an input folder for the reduce step. I think the old linkdb
must be merged with the new linkdb?
Thanks,
Marko
Re: update linkdb
Posted by Andrzej Bialecki <ab...@getopt.org>.
Marko Bauhardt wrote:
> Hi all,
> I took a look at the LinkDb class and noticed that it has no
> update mechanism. The input dirs are only the dirs of the parsed
> segments, so the original linkdb is not updated with the new links.
>
> Did I overlook something, or wouldn't it make sense to write a temp
> linkdb as output from the segment folder(s) and merge that with the
> existing linkdb?
> I can imagine behavior similar to how the crawlDb update works today.
>
> Any comments? If that sounds interesting I can provide a patch.
It's already implemented (svn log -r 380163).
--
Best regards,
Andrzej Bialecki <><