Posted to user@nutch.apache.org by Doug Cutting <cu...@apache.org> on 2006/05/26 22:14:02 UTC
0.8 release soon?
Andrzej Bialecki wrote:
> 0.8 is pretty stable now, I think we should start considering a release
> soon, within the next month's time frame.
+1
Are there substantial features still missing from 0.8 that were
supported in 0.7?
Are there any showstopping bugs, things that worked in 0.7 that are
broken in 0.8?
Doug
Re: 0.8 release soon?
Posted by Stefan Groschupf <sg...@media-style.com>.
Having the URL's IP in the crawl datum is a big issue from my point of
view, since larger crawls are just not possible because of the
described honey pot problems.
I will collect some more information soon.
The solution of looking up IPs during segment generation is just too slow
as soon as you generate larger segments.
Stefan
On 26.05.2006 at 22:14, Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> 0.8 is pretty stable now, I think we should start considering a
>> release soon, within the next month's time frame.
>
> +1
>
> Are there substantial features still missing from 0.8 that were
> supported in 0.7?
>
> Are there any showstopping bugs, things that worked in 0.7 that are
> broken in 0.8?
>
> Doug
>
Re: 0.8 release soon?
Posted by Stefan Neufeind <ap...@stefan-neufeind.de>.
Andrzej Bialecki wrote:
> Doug Cutting wrote:
>> Andrzej Bialecki wrote:
>>> 0.8 is pretty stable now, I think we should start considering a
>>> release soon, within the next month's time frame.
>>
>> +1
>>
>> Are there substantial features still missing from 0.8 that were
>> supported in 0.7?
>
> Next week I'll be working on NUTCH-61 to bring it to a state where it
> could be committed. It's a new feature, so the question is: should we
> play safe, and wait with it until after the release, or should we go with it
> in the hope that it will get a wider testing audience? ;)
+1 for being "safe" and instead focusing on some of the already
mentioned patches that might need attention more urgently.
Stefan
Re: 0.8 release soon?
Posted by Andrzej Bialecki <ab...@getopt.org>.
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> 0.8 is pretty stable now, I think we should start considering a
>> release soon, within the next month's time frame.
>
> +1
>
> Are there substantial features still missing from 0.8 that were
> supported in 0.7?
Next week I'll be working on NUTCH-61 to bring it to a state where it
could be committed. It's a new feature, so the question is: should we
play safe, and wait with it until after the release, or should we go with it
in the hope that it will get a wider testing audience? ;)
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: 0.8 release soon?
Posted by Lukas Vlcek <lu...@gmail.com>.
Hi,
I would also lobby for NUTCH-273 (redirected pages not updated in DB).
This seems like quite an important feature to me - in other words,
nutch-0.8 would not be useful for me without this fix.
Regards,
Lukas
On 5/26/06, Stefan Neufeind <ap...@stefan-neufeind.de> wrote:
> Doug Cutting wrote:
> > Andrzej Bialecki wrote:
> >> 0.8 is pretty stable now, I think we should start considering a
> >> release soon, within the next month's time frame.
> >
> > +1
> >
> > Are there substantial features still missing from 0.8 that were
> > supported in 0.7?
> >
> > Are there any showstopping bugs, things that worked in 0.7 that are
> > broken in 0.8?
>
> +1 as well, though I'm still new to the topic.
>
> During the setup I've come across a few patches that I think might be
> useful to maybe go into the 0.8-release. Those are:
>
> fixes:
> NUTCH-110-fixIllegalXmlChars08.patch
> NUTCH-254-fetcher_filter_url_patch.txt
>
> new features, that I tested and work fine here:
> NUTCH-48-did-you-mean-combined08.patch
> NUTCH-173-patch08-new.patch
> NUTCH-279-regex-normalize.patch
> NUTCH-288-OpenSearch-fix.patch
>
>
> !! open issues, from my side:
> NUTCH-277 (seems to affect httpclient, changing to http helped)
>
>
> Feedback welcome.
>
>
> Regards,
> Stefan
>
Re: 0.8 release soon?
Posted by Stefan Neufeind <ap...@stefan-neufeind.de>.
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> 0.8 is pretty stable now, I think we should start considering a
>> release soon, within the next month's time frame.
>
> +1
>
> Are there substantial features still missing from 0.8 that were
> supported in 0.7?
>
> Are there any showstopping bugs, things that worked in 0.7 that are
> broken in 0.8?
+1 as well, though I'm still new to the topic.
During the setup I've come across a few patches that I think might be
useful to maybe go into the 0.8-release. Those are:
fixes:
NUTCH-110-fixIllegalXmlChars08.patch
NUTCH-254-fetcher_filter_url_patch.txt
new features, that I tested and work fine here:
NUTCH-48-did-you-mean-combined08.patch
NUTCH-173-patch08-new.patch
NUTCH-279-regex-normalize.patch
NUTCH-288-OpenSearch-fix.patch
!! open issues, from my side:
NUTCH-277 (seems to affect httpclient, changing to http helped)
Feedback welcome.
Regards,
Stefan