- Re: ant test failures - posted by Doğacan Güney <do...@gmail.com> on 2007/09/01 14:48:43 UTC, 0 replies.
- [jira] Created: (NUTCH-547) Redirection handling: YahooSlurp's algorithm - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/03 09:47:18 UTC, 0 replies.
- [jira] Updated: (NUTCH-547) Redirection handling: YahooSlurp's algorithm - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/03 09:49:19 UTC, 1 reply.
- [jira] Commented: (NUTCH-546) file URL are filtered out by the crawler - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/03 09:53:19 UTC, 2 replies.
- [jira] Updated: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2007/09/03 10:27:19 UTC, 0 replies.
- [jira] Resolved: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/03 15:38:58 UTC, 0 replies.
- [jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/09/03 20:14:57 UTC, 4 replies.
- [jira] Closed: (NUTCH-526) Use a combiner in LinDbMerger to improve the performance as in LinkDb - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2007/09/04 05:36:57 UTC, 0 replies.
- [jira] Updated: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2007/09/04 09:16:57 UTC, 0 replies.
- [jira] Updated: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2007/09/04 10:46:58 UTC, 3 replies.
- [jira] Created: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2007/09/04 12:34:59 UTC, 0 replies.
- [jira] Updated: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2007/09/04 12:34:59 UTC, 0 replies.
- [jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2007/09/04 12:38:58 UTC, 4 replies.
- [jira] Closed: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/04 14:32:47 UTC, 0 replies.
- [jira] Commented: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time - posted by "Hudson (JIRA)" <ji...@apache.org> on 2007/09/04 19:00:55 UTC, 0 replies.
- [jira] Commented: (NUTCH-251) Administration GUI - posted by "Marc Brette (JIRA)" <ji...@apache.org> on 2007/09/05 18:33:34 UTC, 1 reply.
- [jira] Updated: (NUTCH-546) file URL are filtered out by the crawler - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/06 14:56:31 UTC, 0 replies.
- [jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/06 15:24:31 UTC, 1 reply.
- [jira] Commented: (NUTCH-524) Generate Problem with Single Node - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/06 15:26:34 UTC, 0 replies.
- Meta Tags and Indexing - posted by Jeff Maki <cr...@gmail.com> on 2007/09/06 16:45:14 UTC, 0 replies.
- Labeling URLs a-la Google - posted by Jeff Maki <cr...@gmail.com> on 2007/09/06 22:04:18 UTC, 1 reply.
- Limiting outlink tags. - posted by Marcin Okraszewski <ok...@o2.pl> on 2007/09/06 23:09:33 UTC, 2 replies.
- [jira] Created: (NUTCH-549) Bug - posted by "crossany (JIRA)" <ji...@apache.org> on 2007/09/07 04:35:28 UTC, 0 replies.
- Re: bug with generate performance - posted by Doğacan Güney <do...@gmail.com> on 2007/09/07 09:37:51 UTC, 2 replies.
- [jira] Created: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/07 10:29:30 UTC, 0 replies.
- [jira] Updated: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/07 10:29:31 UTC, 0 replies.
- [jira] Created: (NUTCH-551) performance for generate is often really bad - posted by "Jim (JIRA)" <ji...@apache.org> on 2007/09/08 01:43:31 UTC, 0 replies.
- [jira] Commented: (NUTCH-551) performance for generate is often really bad - posted by "Jim (JIRA)" <ji...@apache.org> on 2007/09/08 04:14:30 UTC, 5 replies.
- Pl...Give me example - posted by "m.harig" <m....@gmail.com> on 2007/09/08 06:23:36 UTC, 0 replies.
- Daniel Udatny is out of the office. - posted by ru...@rosa.com on 2007/09/08 10:09:24 UTC, 0 replies.
- [jira] Updated: (NUTCH-44) too many search results - posted by "Susam Pal (JIRA)" <ji...@apache.org> on 2007/09/08 11:55:29 UTC, 2 replies.
- [jira] Updated: (NUTCH-281) cached.jsp: base-href needs to be outside comments - posted by "Susam Pal (JIRA)" <ji...@apache.org> on 2007/09/09 12:57:30 UTC, 0 replies.
- [jira] Resolved: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/10 21:41:29 UTC, 0 replies.
- [jira] Closed: (NUTCH-549) Bug - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/10 21:41:30 UTC, 0 replies.
- [jira] Closed: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/10 21:41:30 UTC, 0 replies.
- [jira] Resolved: (NUTCH-546) file URL are filtered out by the crawler - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/10 21:47:29 UTC, 0 replies.
- [jira] Closed: (NUTCH-491) dedup fails with ArrayIndexOutOfBoundsException - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/10 21:49:30 UTC, 0 replies.
- [jira] Commented: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/10 21:53:29 UTC, 2 replies.
- Build failed in Hudson: Nutch-Nightly #203 - posted by hu...@lucene.zones.apache.org on 2007/09/11 08:37:59 UTC, 3 replies.
- [jira] Commented: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 - posted by "Hudson (JIRA)" <ji...@apache.org> on 2007/09/11 08:39:32 UTC, 0 replies.
- Downloading file types to file system - posted by eyal edri <ey...@gmail.com> on 2007/09/11 10:41:14 UTC, 2 replies.
- GoogleMini URL rewriting - posted by Andrzej Bialecki <ab...@getopt.org> on 2007/09/11 22:01:25 UTC, 0 replies.
- Hudson build is back to normal: Nutch-Nightly #204 - posted by hu...@lucene.zones.apache.org on 2007/09/12 06:22:12 UTC, 0 replies.
- Scoring API issues (LONG) - posted by Andrzej Bialecki <ab...@getopt.org> on 2007/09/13 17:44:32 UTC, 4 replies.
- [jira] Created: (NUTCH-552) Upgrade Nutch to Hadoop 0.14.x - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/09/13 18:09:32 UTC, 0 replies.
- [jira] Created: (NUTCH-553) Add more normalization rules to regex-normalize file. - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/09/13 18:41:32 UTC, 0 replies.
- protocol-httpclient Authentication schemes - posted by Susam Pal <su...@gmail.com> on 2007/09/14 23:40:25 UTC, 0 replies.
- [jira] Commented: (NUTCH-434) Replace usage of ObjectWritable with something based on GenericWritable - posted by "Brian Whitman (JIRA)" <ji...@apache.org> on 2007/09/15 00:47:32 UTC, 0 replies.
- [jira] Updated: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog - posted by "Brian Whitman (JIRA)" <ji...@apache.org> on 2007/09/15 01:34:32 UTC, 0 replies.
- [jira] Created: (NUTCH-554) Generator throws java.io.IOException and dies on injected urls with no protocol - posted by "Brian Whitman (JIRA)" <ji...@apache.org> on 2007/09/15 17:16:32 UTC, 0 replies.
- [jira] Created: (NUTCH-555) StackOverflowError in DomContentUtils - posted by "Karsten Dello (JIRA)" <ji...@apache.org> on 2007/09/16 20:07:32 UTC, 0 replies.
- [jira] Updated: (NUTCH-555) StackOverflowError in DomContentUtils - posted by "Karsten Dello (JIRA)" <ji...@apache.org> on 2007/09/16 20:09:32 UTC, 6 replies.
- [jira] Created: (NUTCH-556) automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks - posted by "King Kong (JIRA)" <ji...@apache.org> on 2007/09/17 08:34:32 UTC, 0 replies.
- [jira] Updated: (NUTCH-556) automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks - posted by "King Kong (JIRA)" <ji...@apache.org> on 2007/09/17 08:57:32 UTC, 0 replies.
- {Dangerous Content?} Fwd: 100 Messaggi Inoltrati - posted by g....@ifc.cnr.it on 2007/09/17 19:13:30 UTC, 19 replies.
- Fwd: 11 Messaggi Inoltrati - posted by g....@ifc.cnr.it on 2007/09/17 19:27:21 UTC, 0 replies.
- [jira] Updated: (NUTCH-554) Generator throws java.io.IOException and dies on injected urls with no protocol - posted by "Brian Whitman (JIRA)" <ji...@apache.org> on 2007/09/17 20:20:43 UTC, 0 replies.
- Host-level stats, ranking and recrawl - posted by Andrzej Bialecki <ab...@getopt.org> on 2007/09/17 21:38:36 UTC, 3 replies.
- [jira] Created: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication - posted by "Susam Pal (JIRA)" <ji...@apache.org> on 2007/09/18 20:13:43 UTC, 0 replies.
- [jira] Updated: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication - posted by "Susam Pal (JIRA)" <ji...@apache.org> on 2007/09/18 20:15:44 UTC, 1 reply.
- [jira] Resolved: (NUTCH-554) Generator throws java.io.IOException and dies on injected urls with no protocol - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/09/18 21:08:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-554) Generator throws java.io.IOException and dies on injected urls with no protocol - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/09/18 21:10:43 UTC, 0 replies.
- [jira] Commented: (NUTCH-554) Generator throws java.io.IOException and dies on injected urls with no protocol - posted by "Hudson (JIRA)" <ji...@apache.org> on 2007/09/19 07:09:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2007/09/19 12:50:43 UTC, 5 replies.
- [jira] Created: (NUTCH-558) Need tool to retrieve domain statistics - posted by "Chris Schneider (JIRA)" <ji...@apache.org> on 2007/09/20 01:52:12 UTC, 0 replies.
- NUTCH-251(Administration gui) and next version - posted by karthik085 <ka...@gmail.com> on 2007/09/20 18:57:28 UTC, 2 replies.
- [jira] Updated: (NUTCH-488) Avoid parsing uneccessary links and get a more relevant outlink list - posted by "Marcin Okraszewski (JIRA)" <ji...@apache.org> on 2007/09/20 22:17:50 UTC, 0 replies.
- Blank result page - posted by Balachanthar <ba...@gmail.com> on 2007/09/21 08:29:36 UTC, 0 replies.
- [jira] Commented: (NUTCH-488) Avoid parsing uneccessary links and get a more relevant outlink list - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/21 13:27:51 UTC, 0 replies.
- [jira] Work started: (NUTCH-558) Need tool to retrieve domain statistics - posted by "Chris Schneider (JIRA)" <ji...@apache.org> on 2007/09/21 20:30:50 UTC, 0 replies.
- [jira] Commented: (NUTCH-503) Generator exits incorrectly for small fetchlists - posted by "The Jin Group (JIRA)" <ji...@apache.org> on 2007/09/21 22:12:53 UTC, 0 replies.
- [jira] Updated: (NUTCH-558) Need tool to retrieve domain statistics - posted by "Chris Schneider (JIRA)" <ji...@apache.org> on 2007/09/22 23:59:50 UTC, 0 replies.
- Re: nutch trunk filtering URLs in invertlinks even if -noFilter is on? - posted by Brian Whitman <br...@variogr.am> on 2007/09/23 17:38:17 UTC, 1 reply.
- [jira] Commented: (NUTCH-558) Need tool to retrieve domain statistics - posted by "Chris Schneider (JIRA)" <ji...@apache.org> on 2007/09/23 18:25:50 UTC, 2 replies.
- [jira] Closed: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/24 10:28:50 UTC, 0 replies.
- [jira] Resolved: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/24 10:28:50 UTC, 0 replies.
- [jira] Created: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server - posted by "Susam Pal (JIRA)" <ji...@apache.org> on 2007/09/24 20:28:51 UTC, 0 replies.
- [jira] Updated: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server - posted by "Susam Pal (JIRA)" <ji...@apache.org> on 2007/09/24 20:41:50 UTC, 2 replies.
- [jira] Closed: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication - posted by "Susam Pal (JIRA)" <ji...@apache.org> on 2007/09/24 20:53:50 UTC, 0 replies.
- [jira] Created: (NUTCH-560) protocol-httpclient reading more bytes than http.content.limit - posted by "Joseph M. (JIRA)" <ji...@apache.org> on 2007/09/25 12:34:50 UTC, 0 replies.
- [jira] Commented: (NUTCH-539) HttpClient plugin does not work with BasicAuthentication - posted by "Alexis Votta (JIRA)" <ji...@apache.org> on 2007/09/25 19:26:50 UTC, 0 replies.
- [jira] Created: (NUTCH-561) HttpClient plugin does not work with NTLM authentication - posted by "Alexis Votta (JIRA)" <ji...@apache.org> on 2007/09/25 19:28:51 UTC, 0 replies.
- [jira] Commented: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server - posted by "Robert Dale (JIRA)" <ji...@apache.org> on 2007/09/25 19:30:50 UTC, 1 reply.
- [jira] Issue Comment Edited: (NUTCH-539) HttpClient plugin does not work with BasicAuthentication - posted by "Susam Pal (JIRA)" <ji...@apache.org> on 2007/09/25 19:54:50 UTC, 0 replies.
- [jira] Resolved: (NUTCH-25) needs 'character encoding' detector - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/26 16:06:51 UTC, 0 replies.
- [jira] Closed: (NUTCH-487) Neko HTML parser goes on default settings. - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/26 16:06:52 UTC, 0 replies.
- [jira] Closed: (NUTCH-369) StringUtil.resolveEncodingAlias is unuseful. - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/09/26 16:08:50 UTC, 0 replies.
- [jira] Commented: (NUTCH-560) protocol-httpclient reading more bytes than http.content.limit - posted by "Susam Pal (JIRA)" <ji...@apache.org> on 2007/09/26 20:54:50 UTC, 0 replies.
- Problem with trunk HtmlParser.java - posted by Ned Rockson <nr...@stanford.edu> on 2007/09/27 01:15:36 UTC, 2 replies.
- Parsing extra fields from an html page in the web..... - posted by Pratyush Banerjee <pr...@gmail.com> on 2007/09/27 15:13:01 UTC, 1 reply.
- query parsing - posted by Sebastian Schick <sc...@informatik.uni-rostock.de> on 2007/09/27 15:59:24 UTC, 1 reply.
- Build failed in Hudson: Nutch-Nightly #219 - posted by hu...@lucene.zones.apache.org on 2007/09/27 19:38:26 UTC, 0 replies.
- [jira] Commented: (NUTCH-487) Neko HTML parser goes on default settings. - posted by "Hudson (JIRA)" <ji...@apache.org> on 2007/09/27 19:38:52 UTC, 0 replies.
- [jira] Commented: (NUTCH-25) needs 'character encoding' detector - posted by "Hudson (JIRA)" <ji...@apache.org> on 2007/09/27 19:38:52 UTC, 1 reply.
- [jira] Commented: (NUTCH-369) StringUtil.resolveEncodingAlias is unuseful. - posted by "Hudson (JIRA)" <ji...@apache.org> on 2007/09/27 19:38:52 UTC, 0 replies.
- Adding fields to BasicQueryFilter - posted by julien nioche <di...@googlemail.com> on 2007/09/27 23:40:34 UTC, 0 replies.
- Build failed in Hudson: Nutch-Nightly #220 - posted by hu...@lucene.zones.apache.org on 2007/09/28 08:19:49 UTC, 0 replies.
- Build failed in Hudson: Nutch-Nightly #221 - posted by hu...@lucene.zones.apache.org on 2007/09/29 06:14:04 UTC, 1 reply.
- [jira] Created: (NUTCH-562) Port mime type framework to use Tika mime detection framework - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/09/29 06:36:50 UTC, 0 replies.
- [jira] Work started: (NUTCH-562) Port mime type framework to use Tika mime detection framework - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/09/29 06:36:50 UTC, 0 replies.
- Hudson build is back to normal: Nutch-Nightly #222 - posted by hu...@lucene.zones.apache.org on 2007/09/30 06:16:58 UTC, 0 replies.