Posted to dev@nutch.apache.org by Chris Fellows <cc...@sbcglobal.net> on 2006/05/06 01:32:28 UTC

Merging segments

Hello,

So the last discussion on merging segments was back in
Jan. Has there been any progress in this direction?
What would be the benefit of being able to merge
segments? Would being able to merge segments open up
new functionality options, or is merging just a
convenience? Also, what's the estimate of how involved
developing merge functionality would be?

Regards,

- Chris


Re: Merging segments

Posted by Andrzej Bialecki <ab...@getopt.org>.
Chris Fellows wrote:
> That's great.
>
> Well, my follow up to that then is: 
>
> Will the new tool allow any form of "diff'ing"
> segments? In practice this would allow you to run a
>   

No, it does only two things - merging and slicing. That's already one 
too many... ;)

> crawl on a series of sites one week. Then run another
> crawl on the same sites a week or so later. Diff the
> segments and allow users to search on changes within
> the search domain.
>   

Interesting concept, but I think it would be better implemented as a 
variant of de-duplication, rather than segment content manipulation.
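To make the de-duplication framing concrete, here is a minimal sketch. It assumes each crawl is available as an in-memory mapping from URL to raw page content, which is a simplification: real segments are not stored this way. A page from the later crawl counts as a "duplicate" of the earlier one when the MD5 digests of their content match. Only the non-duplicates (new or changed pages) are kept for searching. The names `signature` and `changed_urls` are hypothetical, and MD5 is just one possible choice of content signature.

```python
import hashlib

# Hypothetical sketch: crawls are modelled as dicts of URL -> raw content,
# not as real Nutch segments.

def signature(content: bytes) -> str:
    """MD5 digest of the raw page content, used as a change signature."""
    return hashlib.md5(content).hexdigest()

def changed_urls(old_crawl: dict[str, bytes], new_crawl: dict[str, bytes]) -> set[str]:
    """Return URLs that are new in new_crawl or whose content signature differs.

    This is de-duplication turned around: pages whose signature matches the
    earlier crawl are treated as duplicates and dropped, the rest are kept.
    """
    old_sigs = {url: signature(body) for url, body in old_crawl.items()}
    return {
        url
        for url, body in new_crawl.items()
        # A URL missing from the old crawl yields None, so it always counts as changed.
        if old_sigs.get(url) != signature(body)
    }

week1 = {"http://a.example/": b"v1", "http://b.example/": b"same"}
week2 = {"http://a.example/": b"v2", "http://b.example/": b"same",
         "http://c.example/": b"new"}
print(sorted(changed_urls(week1, week2)))
# ['http://a.example/', 'http://c.example/']
```

Pages whose content is unchanged between the two crawls (here `http://b.example/`) drop out, so a search restricted to the result covers only what changed.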

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Merging segments

Posted by Chris Fellows <cc...@sbcglobal.net>.
That's great.

Well, my follow-up to that, then, is:

Will the new tool allow any form of "diff'ing"
segments? In practice this would allow you to run a
crawl on a series of sites one week, run another
crawl on the same sites a week or so later, diff the
segments, and let users search only the changes within
the search domain.



Re: Merging segments

Posted by Andrzej Bialecki <ab...@getopt.org>.
Chris Fellows wrote:
> Hello,
>
> So the last discussion on merging segments was back in
> Jan. Has there been any progress in this direction?
> What would be the benefit of being able to merge
> segments? Would being able to merge segments open up
> new functionality options, or is merging just a
> convenience? Also, what's the estimate of how involved
> developing merge functionality would be?
>   

Relief is on the way. Fine folks at houxou.com have sponsored the 
development of a brand-new SegmentMerger + slicer, and decided to donate 
it to the project - big thanks!

I'm running some final tests, and will commit it today/tomorrow.
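Conceptually, merging and slicing can be sketched as follows. The sketch assumes each segment is a list of hypothetical `Record` tuples (URL, fetch time, content) rather than Nutch's on-disk format. When segments are merged it keeps only the most recently fetched copy of each URL, then cuts the merged output into slices of at most `slice_size` records. This illustrates the idea only and is not the donated SegmentMerger's code. In particular, the newest-fetch-wins rule and the names `Record`, `merge_segments` and `slice_records` are assumptions.

```python
from typing import NamedTuple

# Hypothetical sketch of merge + slice; not the actual SegmentMerger code.

class Record(NamedTuple):
    """Hypothetical stand-in for one fetched page in a segment."""
    url: str
    fetch_time: int  # e.g. seconds since the epoch
    content: bytes

def merge_segments(segments: list[list[Record]]) -> list[Record]:
    """Combine segments, keeping only the most recently fetched record per URL."""
    newest: dict[str, Record] = {}
    for segment in segments:
        for rec in segment:
            current = newest.get(rec.url)
            # A later record replaces an earlier one only if strictly newer.
            if current is None or rec.fetch_time > current.fetch_time:
                newest[rec.url] = rec
    # Sort by URL so the output order is deterministic.
    return sorted(newest.values(), key=lambda r: r.url)

def slice_records(records: list[Record], slice_size: int) -> list[list[Record]]:
    """Split records into consecutive slices of at most slice_size records."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [records[i:i + slice_size] for i in range(0, len(records), slice_size)]

seg1 = [Record("http://a.example/", 100, b"old"), Record("http://b.example/", 100, b"b")]
seg2 = [Record("http://a.example/", 200, b"new"), Record("http://c.example/", 200, b"c")]
merged = merge_segments([seg1, seg2])
print([(r.url, r.content) for r in merged])
# [('http://a.example/', b'new'), ('http://b.example/', b'b'), ('http://c.example/', b'c')]
print([len(s) for s in slice_records(merged, 2)])
# [2, 1]
```

Keying the merge on URL is what lets several overlapping crawls collapse into one set of records. Slicing then keeps each output chunk to a bounded size.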

-- 
Best regards,
Andrzej Bialecki     <><