- [jira] Updated: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server - posted by "Susam Pal (JIRA)" <ji...@apache.org> on 2007/11/01 14:12:51 UTC, 1 replies.
- [jira] Commented: (NUTCH-442) Integrate Solr/Nutch - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/01 14:45:50 UTC, 4 replies.
- When is the Clause.getQuery().getBoost == 0? - posted by Ned Rockson <ne...@discoveryengine.com> on 2007/11/01 22:32:51 UTC, 1 replies.
- Re: plugin analyzer - posted by karthik085 <ka...@gmail.com> on 2007/11/02 04:08:33 UTC, 0 replies.
- Nutch automatically deleting sites from search results - posted by karthik085 <ka...@gmail.com> on 2007/11/02 04:27:40 UTC, 0 replies.
- [jira] Created: (NUTCH-571) parse-mp3 plugin doesn't always index album of mp3 - posted by "Joseph Chen (JIRA)" <ji...@apache.org> on 2007/11/03 03:25:50 UTC, 0 replies.
- Re: How to extract specified information from html? - posted by qi wu <ch...@gmail.com> on 2007/11/03 14:56:42 UTC, 1 replies.
- How does the Nutch-0.9 read the configuration file? - posted by Xin Zhang <nu...@gmail.com> on 2007/11/04 12:30:18 UTC, 1 replies.
- JIRA emails and Nutch - posted by Dennis Kubes <ku...@apache.org> on 2007/11/04 16:48:46 UTC, 4 replies.
- [jira] Issue Comment Edited: (NUTCH-356) Plugin repository cache can lead to memory leak - posted by "Sam Xia (JIRA)" <ji...@apache.org> on 2007/11/06 19:55:50 UTC, 4 replies.
- adding dmoz meta data to index. - posted by "ned@bcit" <ne...@yahoo.com> on 2007/11/06 20:29:55 UTC, 1 replies.
- Tika API - posted by Ned Rockson <ne...@discoveryengine.com> on 2007/11/06 23:47:45 UTC, 5 replies.
- MD5 vs TextProfile Signature - posted by karthik085 <ka...@gmail.com> on 2007/11/07 01:27:45 UTC, 0 replies.
- [jira] Created: (NUTCH-573) Multiple Domains - Query Search - posted by "Rajasekar Karthik (JIRA)" <ji...@apache.org> on 2007/11/07 19:59:50 UTC, 0 replies.
- [jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/11/07 20:22:51 UTC, 4 replies.
- [jira] Updated: (NUTCH-547) Redirection handling: YahooSlurp's algorithm - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/11/07 20:24:50 UTC, 0 replies.
- [jira] Commented: (NUTCH-572) Scoring and redirected Urls - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/11/07 20:36:51 UTC, 1 replies.
- [jira] Created: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results. - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/11/07 20:45:51 UTC, 0 replies.
- [jira] Commented: (NUTCH-356) Plugin repository cache can lead to memory leak - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/07 20:58:51 UTC, 0 replies.
- [jira] Commented: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results. - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/07 21:13:50 UTC, 19 replies.
- [jira] Updated: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results. - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/11/07 21:13:50 UTC, 3 replies.
- db.ignore.internal.links and ranking algorithms - posted by karthik085 <ka...@gmail.com> on 2007/11/07 21:32:14 UTC, 4 replies.
- NullPointerException in FetchedSegments.getSummary() - posted by John Doe <or...@yahoo.com> on 2007/11/08 01:27:28 UTC, 0 replies.
- [jira] Resolved: (NUTCH-547) Redirection handling: YahooSlurp's algorithm - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/08 14:20:50 UTC, 0 replies.
- [jira] Closed: (NUTCH-411) Parse ignores meta refresh redirection - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/08 16:04:50 UTC, 0 replies.
- [jira] Closed: (NUTCH-547) Redirection handling: YahooSlurp's algorithm - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/08 16:04:50 UTC, 0 replies.
- [jira] Resolved: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/08 16:10:50 UTC, 0 replies.
- [jira] Updated: (NUTCH-567) Proper (?) handling of URIs in TagSoup. - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/08 16:14:50 UTC, 3 replies.
- [jira] Closed: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/08 16:14:50 UTC, 0 replies.
- [jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup. - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/08 16:21:50 UTC, 2 replies.
- [jira] Resolved: (NUTCH-538) Delete unused classes under o.a.n.util - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/08 20:09:50 UTC, 0 replies.
- [jira] Closed: (NUTCH-538) Delete unused classes under o.a.n.util - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/08 20:09:50 UTC, 0 replies.
- [jira] Closed: (NUTCH-465) I download nutch 0.9 used tar zxvf nutch-0.9.tar.gz at last A lone zero block - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/08 20:11:50 UTC, 0 replies.
- [jira] Closed: (NUTCH-494) FindBugs: CrawlDbReader and DeleteDuplicates - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/08 20:15:50 UTC, 0 replies.
- [jira] Resolved: (NUTCH-494) FindBugs: CrawlDbReader and DeleteDuplicates - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/08 20:15:50 UTC, 0 replies.
- Usage of mapred-default.xml is deprecated in hadoop0.15.0 - posted by Ned Rockson <ne...@discoveryengine.com> on 2007/11/08 23:20:17 UTC, 0 replies.
- [jira] Created: (NUTCH-575) NPE in OpenSearchServlet when summary is null - posted by "John H. Lee (JIRA)" <ji...@apache.org> on 2007/11/08 23:43:50 UTC, 0 replies.
- [jira] Updated: (NUTCH-575) NPE in OpenSearchServlet when summary is null - posted by "John H. Lee (JIRA)" <ji...@apache.org> on 2007/11/08 23:45:50 UTC, 1 replies.
- Build failed in Hudson: Nutch-Nightly #261 - posted by hu...@lucene.zones.apache.org on 2007/11/09 06:36:56 UTC, 1 replies.
- [jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat - posted by "Hudson (JIRA)" <ji...@apache.org> on 2007/11/09 06:38:50 UTC, 1 replies.
- [jira] Commented: (NUTCH-538) Delete unused classes under o.a.n.util - posted by "Hudson (JIRA)" <ji...@apache.org> on 2007/11/09 06:38:51 UTC, 1 replies.
- [jira] Commented: (NUTCH-494) FindBugs: CrawlDbReader and DeleteDuplicates - posted by "Hudson (JIRA)" <ji...@apache.org> on 2007/11/09 06:38:51 UTC, 0 replies.
- EOF exception while fetching - posted by Ned Rockson <ne...@discoveryengine.com> on 2007/11/09 20:48:26 UTC, 0 replies.
- Can we add this to nutch? - posted by misc <mi...@robotgenius.net> on 2007/11/10 00:14:58 UTC, 1 replies.
- Auto complete - posted by misc <mi...@robotgenius.net> on 2007/11/10 02:35:35 UTC, 0 replies.
- Generator speed - posted by misc <mi...@robotgenius.net> on 2007/11/10 02:46:11 UTC, 0 replies.
- wiki faq - posted by misc <mi...@robotgenius.net> on 2007/11/10 02:51:58 UTC, 0 replies.
- Hudson build is back to normal: Nutch-Nightly #262 - posted by hu...@lucene.zones.apache.org on 2007/11/10 05:41:51 UTC, 0 replies.
- [jira] Commented: (NUTCH-479) Support for OR queries - posted by "Sebastian Steinmetz (JIRA)" <ji...@apache.org> on 2007/11/10 20:58:50 UTC, 0 replies.
- takes the URI info, Content, headers, etc. into a MYSQL database. - posted by xingjian <xi...@gmail.com> on 2007/11/13 06:37:34 UTC, 2 replies.
- [jira] Commented: (NUTCH-540) some problem about the Nutch cache - posted by "david euler (JIRA)" <ji...@apache.org> on 2007/11/13 15:12:51 UTC, 1 replies.
- Need help in updating url in runtime in [Fetcher.java] - posted by eyal edri <ey...@gmail.com> on 2007/11/13 16:30:14 UTC, 0 replies.
- [jira] Resolved: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results. - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/11/13 18:46:43 UTC, 0 replies.
- [jira] Assigned: (NUTCH-573) Multiple Domains - Query Search - posted by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2007/11/14 08:20:43 UTC, 0 replies.
- [jira] Updated: (NUTCH-573) Multiple Domains - Query Search - posted by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2007/11/14 08:58:43 UTC, 1 replies.
- [jira] Commented: (NUTCH-573) Multiple Domains - Query Search - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/11/14 10:48:43 UTC, 5 replies.
- [jira] Created: (NUTCH-576) Different Analyzers Support - posted by "Rajasekar Karthik (JIRA)" <ji...@apache.org> on 2007/11/14 16:10:43 UTC, 0 replies.
- Re: Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic - posted by w00_008 <ta...@sina.com> on 2007/11/14 19:14:37 UTC, 0 replies.
- [jira] Reopened: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/11/15 00:33:43 UTC, 0 replies.
- [jira] Updated: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/11/15 00:40:43 UTC, 0 replies.
- [jira] Updated: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/11/15 17:47:43 UTC, 0 replies.
- [jira] Issue Comment Edited: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/11/15 18:09:43 UTC, 0 replies.
- [jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility - posted by "Renaud Richardet (JIRA)" <ji...@apache.org> on 2007/11/15 18:13:43 UTC, 1 replies.
- [jira] Closed: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/11/15 19:12:43 UTC, 0 replies.
- Commit Times for Issues - posted by Dennis Kubes <ku...@apache.org> on 2007/11/15 22:37:22 UTC, 6 replies.
- Nutch trunk js-parser problem with extremely long and meaningless Elements - posted by Ned Rockson <ne...@discoveryengine.com> on 2007/11/16 03:18:58 UTC, 0 replies.
- about heritrix crawl, Who will tell me in this Nutch forum? thanks - posted by xingjian <xi...@gmail.com> on 2007/11/16 06:00:08 UTC, 0 replies.
- [jira] Commented: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x - posted by "Hudson (JIRA)" <ji...@apache.org> on 2007/11/16 21:26:43 UTC, 0 replies.
- [jira] Created: (NUTCH-577) Use explicit tika-config.xml file to enable mime magic detection to be turned on and off - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/11/18 00:29:43 UTC, 0 replies.
- [jira] Commented: (NUTCH-577) Use explicit tika-config.xml file to enable mime magic detection to be turned on and off - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/18 09:48:43 UTC, 0 replies.
- [jira] Updated: (NUTCH-442) Integrate Solr/Nutch - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/19 22:12:43 UTC, 0 replies.
- [jira] Issue Comment Edited: (NUTCH-442) Integrate Solr/Nutch - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/19 22:27:43 UTC, 0 replies.
- [jira] Created: (NUTCH-578) URL fetched with 403 is generated over and over again - posted by "Nathaniel Powell (JIRA)" <ji...@apache.org> on 2007/11/20 22:39:43 UTC, 0 replies.
- [jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again - posted by "Nathaniel Powell (JIRA)" <ji...@apache.org> on 2007/11/20 22:41:43 UTC, 7 replies.
- [jira] Created: (NUTCH-579) Feed plugin only indexes one post per feed due to identical digest - posted by "Joseph Chen (JIRA)" <ji...@apache.org> on 2007/11/21 08:41:43 UTC, 0 replies.
- [jira] Commented: (NUTCH-579) Feed plugin only indexes one post per feed due to identical digest - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/11/21 09:50:43 UTC, 0 replies.
- [jira] Updated: (NUTCH-580) Remove deprecated hadoop api calls (FS) - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/11/21 17:48:43 UTC, 0 replies.
- [jira] Created: (NUTCH-580) Remove deprecated hadoop api calls (FS) - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/11/21 17:48:43 UTC, 0 replies.
- [jira] Created: (NUTCH-581) DistributedSearch does not update search servers added to search-servers.txt on the fly - posted by "Rohan Mehta (JIRA)" <ji...@apache.org> on 2007/11/21 17:58:43 UTC, 0 replies.
- [jira] Updated: (NUTCH-581) DistributedSearch does not update search servers added to search-servers.txt on the fly - posted by "Rohan Mehta (JIRA)" <ji...@apache.org> on 2007/11/21 18:00:43 UTC, 0 replies.
- [jira] Created: (NUTCH-582) Add missing type parameters - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/11/21 19:47:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-582) Add missing type parameters - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/11/21 19:49:43 UTC, 0 replies.
- Backwards compatibility strategy - posted by Sami Siren <ss...@gmail.com> on 2007/11/22 18:45:57 UTC, 1 replies.
- Applicant for Nutch Project - posted by shaowen yu <yu...@carnation.com.cn> on 2007/11/23 07:13:49 UTC, 1 replies.
- Maintaining source url data (father) during runtime - posted by eyal edri <ey...@gmail.com> on 2007/11/25 12:34:56 UTC, 4 replies.
- [jira] Created: (NUTCH-583) FeedParser empty links for items - posted by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2007/11/27 16:01:43 UTC, 0 replies.
- Issue with IndexSearcher initialization in NutchBean - posted by Frederic Ciminera <ci...@gmail.com> on 2007/11/27 18:10:16 UTC, 0 replies.
- [jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/11/27 20:06:43 UTC, 0 replies.
- some question about development - posted by 颜韵旋 <ju...@126.com> on 2007/11/28 15:39:39 UTC, 1 replies.
- [jira] Created: (NUTCH-584) urls missing from fetchlist - posted by "Ruslan Ermilov (JIRA)" <ji...@apache.org> on 2007/11/28 16:57:43 UTC, 0 replies.
- [jira] Created: (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed - posted by "Andrea Spinelli (JIRA)" <ji...@apache.org> on 2007/11/29 12:13:43 UTC, 1 replies.
- Parsing ppt with mimetype application/x-mspowerpoint - posted by pavan kumar donepudi <pa...@gmail.com> on 2007/11/29 16:38:40 UTC, 0 replies.
- [jira] Updated: (NUTCH-586) Add option to run compiled classes w/o job file - posted by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2007/11/30 11:36:43 UTC, 0 replies.
- [jira] Created: (NUTCH-586) Add option to run compiled classes w/o job file - posted by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2007/11/30 11:36:43 UTC, 0 replies.