Posted to user@nutch.apache.org by Doug Cutting <cu...@apache.org> on 2006/05/26 22:14:02 UTC

0.8 release soon?

Andrzej Bialecki wrote:
> 0.8 is pretty stable now, I think we should start considering a release 
> soon, within the next month's time frame.

+1

Are there substantial features still missing from 0.8 that were 
supported in 0.7?

Are there any showstopping bugs, things that worked in 0.7 that are 
broken in 0.8?

Doug

Re: 0.8 release soon?

Posted by Stefan Groschupf <sg...@media-style.com>.
Having the URL's IP in crawl-datum is a big issue from my point of
view, since larger crawls are just not possible because of the
described honey pot problems.
I will collect some more information soon.
The solution of looking up IPs during segment generation is just too slow
as soon as you generate larger segments.

Stefan


On 26.05.2006 at 22:14, Doug Cutting wrote:

> Andrzej Bialecki wrote:
>> 0.8 is pretty stable now, I think we should start considering a  
>> release soon, within the next month's time frame.
>
> +1
>
> Are there substantial features still missing from 0.8 that were  
> supported in 0.7?
>
> Are there any showstopping bugs, things that worked in 0.7 that are  
> broken in 0.8?
>
> Doug
>


Re: 0.8 release soon?

Posted by Stefan Neufeind <ap...@stefan-neufeind.de>.
Andrzej Bialecki wrote:
> Doug Cutting wrote:
>> Andrzej Bialecki wrote:
>>> 0.8 is pretty stable now, I think we should start considering a
>>> release soon, within the next month's time frame.
>>
>> +1
>>
>> Are there substantial features still missing from 0.8 that were
>> supported in 0.7?
> 
> Next week I'll be working on NUTCH-61 to bring it to a state where it
> could be committed. It's a new feature, so the question is: should we
> play it safe and hold it until after the release, or should we go with it
> in the hope that it will get a wider testing audience? ;)

+1 for being "safe" and instead focusing on some of the already
mentioned patches that might need attention more urgently.

  Stefan

Re: 0.8 release soon?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> 0.8 is pretty stable now, I think we should start considering a 
>> release soon, within the next month's time frame.
>
> +1
>
> Are there substantial features still missing from 0.8 that were 
> supported in 0.7?

Next week I'll be working on NUTCH-61 to bring it to a state where it
could be committed. It's a new feature, so the question is: should we
play it safe and hold it until after the release, or should we go with it
in the hope that it will get a wider testing audience? ;)

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: 0.8 release soon?

Posted by Lukas Vlcek <lu...@gmail.com>.
Hi,

I would also lobby for NUTCH-273 (redirected pages not updated in DB).
This seems like quite an important feature to me - in other words,
nutch-0.8 would be of no use to me without this fix.

Regards,
Lukas

On 5/26/06, Stefan Neufeind <ap...@stefan-neufeind.de> wrote:
> Doug Cutting wrote:
> > Andrzej Bialecki wrote:
> >> 0.8 is pretty stable now, I think we should start considering a
> >> release soon, within the next month's time frame.
> >
> > +1
> >
> > Are there substantial features still missing from 0.8 that were
> > supported in 0.7?
> >
> > Are there any showstopping bugs, things that worked in 0.7 that are
> > broken in 0.8?
>
> +1 as well, though I'm still new to the topic.
>
> During setup I've come across a few patches that I think might be
> worth including in the 0.8 release. Those are:
>
> fixes:
> NUTCH-110-fixIllegalXmlChars08.patch
> NUTCH-254-fetcher_filter_url_patch.txt
>
> new features that I tested and that work fine here:
> NUTCH-48-did-you-mean-combined08.patch
> NUTCH-173-patch08-new.patch
> NUTCH-279-regex-normalize.patch
> NUTCH-288-OpenSearch-fix.patch
>
>
> !! open issues, from my side:
> NUTCH-277 (seems to affect httpclient, changing to http helped)
>
>
> Feedback welcome.
>
>
> Regards,
>  Stefan
>

Re: 0.8 release soon?

Posted by Stefan Neufeind <ap...@stefan-neufeind.de>.
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> 0.8 is pretty stable now, I think we should start considering a
>> release soon, within the next month's time frame.
> 
> +1
> 
> Are there substantial features still missing from 0.8 that were
> supported in 0.7?
> 
> Are there any showstopping bugs, things that worked in 0.7 that are
> broken in 0.8?

+1 as well, though I'm still new to the topic.

During setup I've come across a few patches that I think might be
worth including in the 0.8 release. Those are:

fixes:
NUTCH-110-fixIllegalXmlChars08.patch
NUTCH-254-fetcher_filter_url_patch.txt

new features that I tested and that work fine here:
NUTCH-48-did-you-mean-combined08.patch
NUTCH-173-patch08-new.patch
NUTCH-279-regex-normalize.patch
NUTCH-288-OpenSearch-fix.patch


!! open issues, from my side:
NUTCH-277 (seems to affect httpclient, changing to http helped)


Feedback welcome.


Regards,
 Stefan