Posted to dev@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2005/12/15 17:14:02 UTC

vote results.

Hi,
I counted the votes manually; I hope I didn't overlook anything. I
didn't filter out issues that are 0.8 related, since it is good to
know the community's wishes anyway. :-)

Greetings,
Stefan

P.S. I agree with Piotr that we should use the issue voting  
functionality more for the next releases.


NUTCH-140	Add alias capability in parse-plugins.xml file that allows  
mimeType->extensionId mapping
1
NUTCH-139	Standard metadata property names in the ParseData metadata
2
NUTCH-138	non-Latin-1 characters cannot be submitted for search
1
NUTCH-3	multi values of header discarded	
1
NUTCH-134	Summarizer doesn't select the best snippets	
1
NUTCH-98	RobotRulesParser interprets robots.txt incorrectly
1
NUTCH-120	one "bad" link on a page kills parsing	
3
NUTCH-127	uncorrect values using -du, or ls does not return items
2
NUTCH-126	Fetching via https does not work with a proxy (patch)
1
NUTCH-125	OpenOffice Parser plugin	
2
NUTCH-110	OpenSearchServlet outputs illegal xml characters
1
NUTCH-36	Chinese in Nutch	
1
NUTCH-123	Cache.jsp some times generate NullPointerException
1
NUTCH-121	SegmentReader for mapred	
2
NUTCH-119	Regexp to extract outlinks incorrect	
1
NUTCH-115	jobtracker.jsp shows too much information	
1
NUTCH-108	tasktracker crashs when reconnecting to a new jobtracker.
1
NUTCH-113	Disable permanent DNS-to-IP caching for JVM 1.4
1
NUTCH-111	ndfs.replication is not documented within the
nutch-default.xml configuration file.
1
NUTCH-100	New plugin urlfilter-db	
1
NUTCH-106	Datanode corruption	
1
NUTCH-95	DeleteDuplicates depends on the order of input segments
1
NUTCH-92	DistributedSearch incorrectly scores results	
2
NUTCH-91	empty encoding causes exception	
1
NUTCH-52	Parser plugin for MS Excel files	
1
NUTCH-74	French Analyzer Plugin	
1
NUTCH-64	no results after a restart of a search--server (without  
tomcat restart)
1
NUTCH-68	A tool to generate arbitrary fetchlists	
1
NUTCH-62	Add html META tag information into metaData in index-more  
plugin
1
NUTCH-61	Adaptive re-fetch interval. Detecting umodified content
1
NUTCH-13	If dns points to 127.0.0.1, the url is also crawled
1
NUTCH-48	"Did you mean" query enhancement/refignment feature request
1
NUTCH-45	Log corrupt segments in SegmentMergeTool	
1
NUTCH-24	Cannot handle incorrectly cased Content-Type	
1
NUTCH-16	boost documents matching a url pattern	
1



Re: vote results.

Posted by Jérôme Charron <je...@gmail.com>.
> Just continue voting I will continue with  my tally sheet. :-)

Why not create a wiki page, so that you don't have to do this "bad
work"?

Jérôme

Re: vote results.

Posted by Stefan Groschupf <sg...@media-style.com>.
> Shouldn't the period for voting be a bit longer?
Sure - sorry, just let me know and I will calculate the vote points
again as soon as people think we should finish the voting.
Just continue voting; I will continue with my tally sheet. :-)
Stefan




Re: vote results.

Posted by Andrzej Bialecki <ab...@getopt.org>.
Stefan Groschupf wrote:

> Hi,
> I counted the votes manually; I hope I didn't overlook anything. I
> didn't filter out issues that are 0.8 related, since it is good to  
> know community wishes anyway. :-)


Shouldn't the period for voting be a bit longer? I didn't have time to 
vote yet... Anyway, my take on this:


> NUTCH-140    Add alias capability in parse-plugins.xml file that 
> allows  mimeType->extensionId mapping
> 1
> NUTCH-139    Standard metadata property names in the ParseData metadata
> 2

+1

> NUTCH-138    non-Latin-1 characters cannot be submitted for search
> 1
> NUTCH-3    multi values of header discarded   
> 1


+1

>
> NUTCH-134    Summarizer doesn't select the best snippets   
> 1


+1
I have some patches which use the Lucene Highlighter package instead.

> NUTCH-98    RobotRulesParser interprets robots.txt incorrectly
> 1
> NUTCH-120    one "bad" link on a page kills parsing   
> 3
> NUTCH-127    uncorrect values using -du, or ls does not return items
> 2


+1

> NUTCH-126    Fetching via https does not work with a proxy (patch)
> 1
> NUTCH-125    OpenOffice Parser plugin   
> 2


+1. Ready to commit, I'll do it tomorrow.

> NUTCH-110    OpenSearchServlet outputs illegal xml characters
> 1
> NUTCH-36    Chinese in Nutch   
> 1
> NUTCH-123    Cache.jsp some times generate NullPointerException
> 1
> NUTCH-121    SegmentReader for mapred   
> 2


Nearly ready to commit; I can probably do it by the end of the week.
However, this is valid only for the mapred branch, so it doesn't affect
the release.

> NUTCH-119    Regexp to extract outlinks incorrect   
> 1
> NUTCH-115    jobtracker.jsp shows too much information   
> 1
> NUTCH-108    tasktracker crashs when reconnecting to a new jobtracker.
> 1
> NUTCH-113    Disable permanent DNS-to-IP caching for JVM 1.4
> 1
> NUTCH-111    ndfs.replication is not documented within the
> nutch-default.xml configuration file.
> 1
> NUTCH-100    New plugin urlfilter-db   
> 1
> NUTCH-106    Datanode corruption   
> 1
> NUTCH-95    DeleteDuplicates depends on the order of input segments
> 1


+1

> NUTCH-92    DistributedSearch incorrectly scores results   
> 2


+1. However, solving this correctly is _hard_ ... it's a very similar 
problem to the MultiSearcher in Lucene, and it took that group quite 
some time to reach an acceptable solution...

> NUTCH-91    empty encoding causes exception   
> 1
> NUTCH-52    Parser plugin for MS Excel files   
> 1
> NUTCH-74    French Analyzer Plugin   
> 1
> NUTCH-64    no results after a restart of a search--server (without  
> tomcat restart)
> 1
> NUTCH-68    A tool to generate arbitrary fetchlists   
> 1
> NUTCH-62    Add html META tag information into metaData in index-more  
> plugin
> 1
> NUTCH-61    Adaptive re-fetch interval. Detecting umodified content
> 1

+1. I think this is an important feature. I have some patches, which 
need to be updated. However, I wouldn't be so bold as to commit them 
just before a release. There are quite a few subtle issues with the 
segment handling if you use this.

> NUTCH-13    If dns points to 127.0.0.1, the url is also crawled
> 1
> NUTCH-48    "Did you mean" query enhancement/refignment feature request
> 1
> NUTCH-45    Log corrupt segments in SegmentMergeTool   
> 1
> NUTCH-24    Cannot handle incorrectly cased Content-Type   
> 1


Isn't this solved already?

> NUTCH-16    boost documents matching a url pattern   
> 1
>
>
>


-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com