Posted to dev@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2005/12/15 17:14:02 UTC
vote results.
Hi,
I counted the votes manually; I hope I didn't overlook anything. I
didn't filter out issues that are 0.8-related, since it is good to
know the community's wishes anyway. :-)
Greetings,
Stefan
P.S. I agree with Piotr that we should use the issue voting
functionality more for the next releases.
NUTCH-140 Add alias capability in parse-plugins.xml file that allows
mimeType->extensionId mapping
1
NUTCH-139 Standard metadata property names in the ParseData metadata
2
NUTCH-138 non-Latin-1 characters cannot be submitted for search
1
NUTCH-3 multi values of header discarded
1
NUTCH-134 Summarizer doesn't select the best snippets
1
NUTCH-98 RobotRulesParser interprets robots.txt incorrectly
1
NUTCH-120 one "bad" link on a page kills parsing
3
NUTCH-127 uncorrect values using -du, or ls does not return items
2
NUTCH-126 Fetching via https does not work with a proxy (patch)
1
NUTCH-125 OpenOffice Parser plugin
2
NUTCH-110 OpenSearchServlet outputs illegal xml characters
1
NUTCH-36 Chinese in Nutch
1
NUTCH-123 Cache.jsp some times generate NullPointerException
1
NUTCH-121 SegmentReader for mapred
2
NUTCH-119 Regexp to extract outlinks incorrect
1
NUTCH-115 jobtracker.jsp shows too much information
1
NUTCH-108 tasktracker crashs when reconnecting to a new jobtracker.
1
NUTCH-113 Disable permanent DNS-to-IP caching for JVM 1.4
1
NUTCH-111 ndfs.replication is not documented within the
nutch-default.xml configuration file.
1
NUTCH-100 New plugin urlfilter-db
1
NUTCH-106 Datanode corruption
1
NUTCH-95 DeleteDuplicates depends on the order of input segments
1
NUTCH-92 DistributedSearch incorrectly scores results
2
NUTCH-91 empty encoding causes exception
1
NUTCH-52 Parser plugin for MS Excel files
1
NUTCH-74 French Analyzer Plugin
1
NUTCH-64 no results after a restart of a search--server (without
tomcat restart)
1
NUTCH-68 A tool to generate arbitrary fetchlists
1
NUTCH-62 Add html META tag information into metaData in index-more
plugin
1
NUTCH-61 Adaptive re-fetch interval. Detecting umodified content
1
NUTCH-13 If dns points to 127.0.0.1, the url is also crawled
1
NUTCH-48 "Did you mean" query enhancement/refignment feature request
1
NUTCH-45 Log corrupt segments in SegmentMergeTool
1
NUTCH-24 Cannot handle incorrectly cased Content-Type
1
NUTCH-16 boost documents matching a url pattern
1
Re: vote results.
Posted by Jérôme Charron <je...@gmail.com>.
> Just continue voting; I will continue with my tally sheet. :-)
Why not create a wiki page, so that you don't have to do this "bad
work"?
Jérôme
Re: vote results.
Posted by Stefan Groschupf <sg...@media-style.com>.
> Shouldn't the period for voting be a bit longer?
Sure - sorry, just let me know and I will calculate the vote points
again as soon as people think we should finish the voting.
Just continue voting; I will continue with my tally sheet. :-)
Stefan
Re: vote results.
Posted by Andrzej Bialecki <ab...@getopt.org>.
Stefan Groschupf wrote:
> Hi,
> I counted the votes manually; I hope I didn't overlook anything. I
> didn't filter out issues that are 0.8-related, since it is good to
> know the community's wishes anyway. :-)
Shouldn't the period for voting be a bit longer? I didn't have time to
vote yet... Anyway, my take on this:
> NUTCH-140 Add alias capability in parse-plugins.xml file that
> allows mimeType->extensionId mapping
> 1
> NUTCH-139 Standard metadata property names in the ParseData metadata
> 2
+1
> NUTCH-138 non-Latin-1 characters cannot be submitted for search
> 1
> NUTCH-3 multi values of header discarded
> 1
+1
>
> NUTCH-134 Summarizer doesn't select the best snippets
> 1
+1
I have some patches that use the Lucene Highlighter package instead.
> NUTCH-98 RobotRulesParser interprets robots.txt incorrectly
> 1
> NUTCH-120 one "bad" link on a page kills parsing
> 3
> NUTCH-127 uncorrect values using -du, or ls does not return items
> 2
+1
> NUTCH-126 Fetching via https does not work with a proxy (patch)
> 1
> NUTCH-125 OpenOffice Parser plugin
> 2
+1. Ready to commit, I'll do it tomorrow.
> NUTCH-110 OpenSearchServlet outputs illegal xml characters
> 1
> NUTCH-36 Chinese in Nutch
> 1
> NUTCH-123 Cache.jsp some times generate NullPointerException
> 1
> NUTCH-121 SegmentReader for mapred
> 2
Nearly ready to commit; I can probably do it by the end of the week.
However, this is valid only for the mapred branch, so it doesn't affect
the release.
> NUTCH-119 Regexp to extract outlinks incorrect
> 1
> NUTCH-115 jobtracker.jsp shows too much information
> 1
> NUTCH-108 tasktracker crashs when reconnecting to a new jobtracker.
> 1
> NUTCH-113 Disable permanent DNS-to-IP caching for JVM 1.4
> 1
> NUTCH-111 ndfs.replication is not documented within the
> nutch-default.xml configuration file.
> 1
> NUTCH-100 New plugin urlfilter-db
> 1
> NUTCH-106 Datanode corruption
> 1
> NUTCH-95 DeleteDuplicates depends on the order of input segments
> 1
+1
> NUTCH-92 DistributedSearch incorrectly scores results
> 2
+1. However, solving this correctly is _hard_ ... it's a very similar
problem to the MultiSearcher in Lucene, and it took that group quite
some time to reach an acceptable solution...
> NUTCH-91 empty encoding causes exception
> 1
> NUTCH-52 Parser plugin for MS Excel files
> 1
> NUTCH-74 French Analyzer Plugin
> 1
> NUTCH-64 no results after a restart of a search--server (without
> tomcat restart)
> 1
> NUTCH-68 A tool to generate arbitrary fetchlists
> 1
> NUTCH-62 Add html META tag information into metaData in index-more
> plugin
> 1
> NUTCH-61 Adaptive re-fetch interval. Detecting umodified content
> 1
+1. I think this is an important feature. I have some patches, which
need to be updated. However, I wouldn't be so bold as to commit them
just before a release. There are quite a few subtle issues with the
segment handling if you use this.
> NUTCH-13 If dns points to 127.0.0.1, the url is also crawled
> 1
> NUTCH-48 "Did you mean" query enhancement/refignment feature request
> 1
> NUTCH-45 Log corrupt segments in SegmentMergeTool
> 1
> NUTCH-24 Cannot handle incorrectly cased Content-Type
> 1
Isn't this solved already?
> NUTCH-16 boost documents matching a url pattern
> 1
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com