- [jira] Commented: (NUTCH-351) Protocol forward proxy - posted by "Chris Schneider (JIRA)" <ji...@apache.org> on 2006/11/02 02:45:17 UTC, 0 replies.
- [jira] Closed: (NUTCH-387) host normalization in Generator$Selector - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/11/03 12:56:18 UTC, 0 replies.
- Fetcher freezes - posted by Aisha <ai...@yahoo.com> on 2006/11/03 15:53:20 UTC, 4 replies.
- [jira] Created: (NUTCH-396) mergesegs sorts URLs, making segments useless for subsequent fetch - posted by "Doug Cook (JIRA)" <ji...@apache.org> on 2006/11/04 00:07:17 UTC, 0 replies.
- deep limitation - posted by an...@orbita1.ru on 2006/11/06 09:31:02 UTC, 0 replies.
- need help to speed up map-reduce - posted by AJ Chen <ca...@gmail.com> on 2006/11/06 22:34:56 UTC, 2 replies.
- [jira] Commented: (NUTCH-36) Chinese in Nutch - posted by "juwen (JIRA)" <ji...@apache.org> on 2006/11/07 07:23:33 UTC, 0 replies.
- implement thai lanaguage analyzer in nutch - posted by sanjeev <sa...@hotmail.com> on 2006/11/07 09:06:19 UTC, 18 replies.
- Modifiying Nutch Indexer - posted by "Javier P. L." <li...@gmail.com> on 2006/11/07 11:23:24 UTC, 2 replies.
- [jira] Updated: (NUTCH-389) a url tokenizer implementation for tokenizing index fields : url and host - posted by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2006/11/07 14:16:51 UTC, 0 replies.
- [jira] Commented: (NUTCH-393) Indexer doesn't handle null documents returned by filters - posted by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2006/11/07 14:34:51 UTC, 1 replies.
- [jira] Created: (NUTCH-397) porting clustering-carrot2 plugin to carrot2 v2.0 - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2006/11/07 17:20:51 UTC, 0 replies.
- [jira] Updated: (NUTCH-397) porting clustering-carrot2 plugin to carrot2 v2.0 - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2006/11/07 17:22:55 UTC, 0 replies.
- [jira] Created: (NUTCH-398) map-reduce very slow when crawling on single server - posted by "AJ Chen (JIRA)" <ji...@apache.org> on 2006/11/08 01:28:51 UTC, 0 replies.
- [jira] Commented: (NUTCH-398) map-reduce very slow when crawling on single server - posted by "nutch.newbie (JIRA)" <ji...@apache.org> on 2006/11/08 06:20:52 UTC, 2 replies.
- implement thai language in nutch - posted by sanjeev <sa...@hotmail.com> on 2006/11/08 11:24:55 UTC, 1 replies.
- Nutch 0.9 not loading plugins (sorry very long) - posted by zzcgiacomini <zz...@echo.fr> on 2006/11/08 11:25:05 UTC, 1 replies.
- why can't build in the Linux with ant - posted by kauu <ba...@gmail.com> on 2006/11/09 03:52:21 UTC, 1 replies.
- How to start working with MapReduce? - posted by kauu <ba...@gmail.com> on 2006/11/09 09:46:11 UTC, 2 replies.
- Nutch and Lucene - posted by hzhong <he...@gmail.com> on 2006/11/10 09:08:40 UTC, 1 replies.
- [jira] Commented: (NUTCH-395) Increase fetching speed - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/10 17:44:39 UTC, 4 replies.
- [jira] Updated: (NUTCH-395) Increase fetching speed - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/11 09:57:38 UTC, 2 replies.
- [jira] Created: (NUTCH-399) Change CommandRunner to use concurrent api from jdk - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/11 16:24:39 UTC, 0 replies.
- [jira] Resolved: (NUTCH-399) Change CommandRunner to use concurrent api from jdk - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/11 16:29:38 UTC, 0 replies.
- [jira] Created: (NUTCH-400) Update & add missing license headers - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/12 01:11:37 UTC, 0 replies.
- [jira] Updated: (NUTCH-400) Update & add missing license headers - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/12 01:11:38 UTC, 0 replies.
- [jira] Commented: (NUTCH-185) XMLParser is configurable xml parser plugin. - posted by "Jayant Kumar Gandhi (JIRA)" <ji...@apache.org> on 2006/11/12 08:36:38 UTC, 3 replies.
- [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content - posted by "Armel Nene (JIRA)" <ji...@apache.org> on 2006/11/12 12:46:40 UTC, 4 replies.
- [jira] Commented: (NUTCH-397) porting clustering-carrot2 plugin to carrot2 v2.0 - posted by "Stanislaw Osinski (JIRA)" <ji...@apache.org> on 2006/11/12 14:47:38 UTC, 1 replies.
- Last-modified http field - posted by Javier Parapar Lopez <ja...@udc.es> on 2006/11/13 13:24:33 UTC, 1 replies.
- [jira] Commented: (NUTCH-400) Update & add missing license headers - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/13 19:38:39 UTC, 0 replies.
- [jira] Created: (NUTCH-401) Hardcoded /tmp directory in SegmentReader - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2006/11/13 20:35:38 UTC, 0 replies.
- [jira] Resolved: (NUTCH-395) Increase fetching speed - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/13 20:50:38 UTC, 2 replies.
- Nutch requires now Java 1.5 - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/11/13 21:25:07 UTC, 0 replies.
- [jira] Commented: (NUTCH-401) Hardcoded /tmp directory in SegmentReader - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/13 21:31:38 UTC, 0 replies.
- [jira] Closed: (NUTCH-401) Hardcoded /tmp directory in SegmentReader - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/11/14 13:26:38 UTC, 0 replies.
- [jira] Closed: (NUTCH-378) MetaWrapper decorator - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/11/14 20:48:38 UTC, 0 replies.
- File Protocol - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2006/11/15 13:45:33 UTC, 0 replies.
- [jira] Created: (NUTCH-402) Incrementalcrawling and indexing - posted by "Arun Kumar Sharma (JIRA)" <ji...@apache.org> on 2006/11/16 05:53:37 UTC, 0 replies.
- implement thai language indexing and search - posted by sanjeev <sa...@hotmail.com> on 2006/11/16 07:56:05 UTC, 4 replies.
- [jira] Updated: (NUTCH-289) CrawlDatum should store IP address - posted by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2006/11/16 09:44:40 UTC, 0 replies.
- [jira] Commented: (NUTCH-289) CrawlDatum should store IP address - posted by "Uros Gruber (JIRA)" <ji...@apache.org> on 2006/11/16 09:59:40 UTC, 0 replies.
- [jira] Commented: (NUTCH-261) Multi Language Support - posted by "nutch.newbie (JIRA)" <ji...@apache.org> on 2006/11/16 09:59:41 UTC, 0 replies.
- More fetcher speed increases - posted by Doug Cook <na...@candiru.com> on 2006/11/16 17:30:32 UTC, 2 replies.
- 0.7.3 version - posted by Piotr Kosiorowski <pk...@gmail.com> on 2006/11/16 22:09:44 UTC, 1 replies.
- [jira] Created: (NUTCH-403) Make URL filtering optional in Generator - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/18 22:36:37 UTC, 0 replies.
- [jira] Updated: (NUTCH-403) Make URL filtering optional in Generator - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/18 22:40:38 UTC, 1 replies.
- [jira] Resolved: (NUTCH-388) nutch-default.xml has outdated example for urlfilter.order - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/18 22:59:42 UTC, 0 replies.
- [jira] Commented: (NUTCH-403) Make URL filtering optional in Generator - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/11/19 09:33:41 UTC, 0 replies.
- [jira] Created: (NUTCH-404) Fix LinkDB Usage - implementation mismatch - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/19 13:54:39 UTC, 0 replies.
- [jira] Resolved: (NUTCH-404) Fix LinkDB Usage - implementation mismatch - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/19 13:58:38 UTC, 0 replies.
- [jira] Commented: (NUTCH-273) When a page is redirected, the original url is NOT updated. - posted by "Johannes Zillmann (JIRA)" <ji...@apache.org> on 2006/11/19 17:13:40 UTC, 0 replies.
- [jira] Resolved: (NUTCH-403) Make URL filtering optional in Generator - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/19 19:51:38 UTC, 0 replies.
- Can I rewrite org.apache.nutch.parse.msword.extractText(InputStream input) like this - posted by TKDD <my...@gmail.com> on 2006/11/20 04:00:46 UTC, 0 replies.
- Errors in RegexURLFilter - posted by scott green <sm...@gmail.com> on 2006/11/20 16:28:21 UTC, 2 replies.
- [jira] Updated: (NUTCH-92) DistributedSearch incorrectly scores results - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2006/11/20 18:00:06 UTC, 1 replies.
- What's the status of Nutch-GUI? - posted by scott green <sm...@gmail.com> on 2006/11/20 18:12:12 UTC, 17 replies.
- [jira] Commented: (NUTCH-251) Administration GUI - posted by "nutch.newbie (JIRA)" <ji...@apache.org> on 2006/11/20 22:14:04 UTC, 3 replies.
- Nutch HTTPS & Sessions - posted by Gavino Marras <g....@ifc.cnr.it> on 2006/11/21 09:24:58 UTC, 0 replies.
- Nutch crawl a Application Server Authentication - posted by Gavino Marras <g....@ifc.cnr.it> on 2006/11/21 09:57:57 UTC, 0 replies.
- [jira] Created: (NUTCH-405) Content object is not properly initialized in map method of ParseSegment - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/21 18:18:02 UTC, 0 replies.
- [jira] Resolved: (NUTCH-405) Content object is not properly initialized in map method of ParseSegment - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/21 18:21:04 UTC, 0 replies.
- [jira] Closed: (NUTCH-380) Nutch does not run/build against Hadoop 0.6 - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/21 18:33:03 UTC, 0 replies.
- [jira] Closed: (NUTCH-349) Port Nutch to use Hadoop Text instead of UTF8 - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/21 18:39:03 UTC, 0 replies.
- [jira] Resolved: (NUTCH-362) Remove parse-text from unsupported filetypes in parse-plugins.xml - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/21 18:53:05 UTC, 0 replies.
- Nutch sessions cookies https - posted by Gavino Marras <g....@ifc.cnr.it> on 2006/11/21 19:00:07 UTC, 0 replies.
- [jira] Resolved: (NUTCH-305) Update crawl and url filter lists to exclude jpeg|JPEG|bmp|BMP - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/21 19:41:03 UTC, 0 replies.
- Nutch folder configuration - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2006/11/21 22:55:53 UTC, 1 replies.
- Nutch - Hadoop error - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2006/11/22 18:49:31 UTC, 0 replies.
- Question on adaptive re-fetch plugin - posted by Scott Green <sm...@gmail.com> on 2006/11/23 07:37:14 UTC, 1 replies.
- [jira] Commented: (NUTCH-331) Fetcher incorrectly reports task progress to tasktracker resulting in skipped URLs - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2006/11/23 11:27:03 UTC, 1 replies.
- [jira] Closed: (NUTCH-331) Fetcher incorrectly reports task progress to tasktracker resulting in skipped URLs - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/11/23 11:58:03 UTC, 0 replies.
- Welcome Chris Mattmann as Nutch committer - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/11/23 13:10:01 UTC, 1 replies.
- [jira] Created: (NUTCH-406) Metadata tries to write null values - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2006/11/23 14:27:04 UTC, 0 replies.
- [jira] Updated: (NUTCH-406) Metadata tries to write null values - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2006/11/23 14:29:03 UTC, 2 replies.
- [jira] Updated: (NUTCH-251) Administration GUI - posted by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2006/11/23 15:35:08 UTC, 1 replies.
- [jira] Work started: (NUTCH-406) Metadata tries to write null values - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2006/11/23 16:45:04 UTC, 0 replies.
- [jira] Commented: (NUTCH-406) Metadata tries to write null values - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/11/23 16:59:03 UTC, 4 replies.
- [jira] Resolved: (NUTCH-406) Metadata tries to write null values - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2006/11/23 18:18:03 UTC, 0 replies.
- [jira] Closed: (NUTCH-406) Metadata tries to write null values - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2006/11/23 18:20:04 UTC, 4 replies.
- [jira] Created: (NUTCH-407) Make Nutch crawling parent directories for file protocol configurable - posted by "Thorsten Scherler (JIRA)" <ji...@apache.org> on 2006/11/24 14:24:01 UTC, 0 replies.
- [jira] Updated: (NUTCH-407) Make Nutch crawling parent directories for file protocol configurable - posted by "Thorsten Scherler (JIRA)" <ji...@apache.org> on 2006/11/24 14:34:02 UTC, 0 replies.
- [jira] Assigned: (NUTCH-390) Javadoc warnings - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2006/11/24 19:28:03 UTC, 0 replies.
- [jira] Assigned: (NUTCH-185) XMLParser is configurable xml parser plugin. - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2006/11/24 19:30:04 UTC, 0 replies.
- [jira] Updated: (NUTCH-339) Refactor nutch to allow fetcher improvements - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/11/24 19:55:04 UTC, 1 replies.
- [jira] Assigned: (NUTCH-339) Refactor nutch to allow fetcher improvements - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/11/24 20:06:05 UTC, 0 replies.
- [jira] Commented: (NUTCH-339) Refactor nutch to allow fetcher improvements - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/24 22:52:05 UTC, 5 replies.
- [jira] Commented: (NUTCH-390) Javadoc warnings - posted by "nutch.newbie (JIRA)" <ji...@apache.org> on 2006/11/25 04:29:03 UTC, 0 replies.
- [jira] Created: (NUTCH-408) Plugin development documentation - posted by "nutch.newbie (JIRA)" <ji...@apache.org> on 2006/11/25 04:45:01 UTC, 2 replies.
- [jira] Updated: (NUTCH-273) When a page is redirected, the original url is NOT updated. - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/11/25 11:40:05 UTC, 0 replies.
- [jira] Commented: (NUTCH-408) Plugin development documentation - posted by "nutch.newbie (JIRA)" <ji...@apache.org> on 2006/11/26 00:04:03 UTC, 0 replies.
- [jira] Created: (NUTCH-409) Add "short circuit" notion to filters to speedup mixed site/subsite crawling - posted by "Doug Cook (JIRA)" <ji...@apache.org> on 2006/11/26 01:18:01 UTC, 0 replies.
- [jira] Updated: (NUTCH-409) Add "short circuit" notion to filters to speedup mixed site/subsite crawling - posted by "Doug Cook (JIRA)" <ji...@apache.org> on 2006/11/26 01:20:03 UTC, 0 replies.
- [jira] Commented: (NUTCH-409) Add "short circuit" notion to filters to speedup mixed site/subsite crawling - posted by "Doug Cook (JIRA)" <ji...@apache.org> on 2006/11/26 02:03:03 UTC, 0 replies.
- implement thai lanaguage analyzer during nutch crawl process - posted by sanjeev <sa...@hotmail.com> on 2006/11/27 05:46:52 UTC, 2 replies.
- [jira] Commented: (NUTCH-407) Make Nutch crawling parent directories for file protocol configurable - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/11/27 09:42:22 UTC, 3 replies.
- [jira] Closed: (NUTCH-407) Make Nutch crawling parent directories for file protocol configurable - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/11/27 10:40:23 UTC, 0 replies.
- [jira] Commented: (NUTCH-92) DistributedSearch incorrectly scores results - posted by "Dogacan Güney (JIRA)" <ji...@apache.org> on 2006/11/27 20:24:24 UTC, 0 replies.
- [jira] Commented: (NUTCH-233) wrong regular expression hang reduce process for ever - posted by "Sean Dean (JIRA)" <ji...@apache.org> on 2006/11/28 14:37:23 UTC, 0 replies.
- updating index without refetching - posted by DS jha <ae...@gmail.com> on 2006/11/28 15:11:17 UTC, 0 replies.
- RE: updating index without refitting - posted by Gal Nitzan <gn...@usa.net> on 2006/11/28 15:24:11 UTC, 1 replies.
- Indexing and Re-crawling site - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2006/11/28 21:20:28 UTC, 0 replies.
- [jira] Created: (NUTCH-410) Faster RegexNormalize with more features - posted by "Doug Cook (JIRA)" <ji...@apache.org> on 2006/11/29 20:44:21 UTC, 0 replies.
- [jira] Updated: (NUTCH-410) Faster RegexNormalize with more features - posted by "Doug Cook (JIRA)" <ji...@apache.org> on 2006/11/29 20:46:22 UTC, 0 replies.
- Re: Should URL normalization iterate? - posted by Doug Cook <na...@candiru.com> on 2006/11/29 20:47:17 UTC, 0 replies.
- Multi-NutchBean - posted by Scott Green <sm...@gmail.com> on 2006/11/30 06:34:07 UTC, 0 replies.
- [jira] Created: (NUTCH-411) Parse ignores meta refresh redirection - posted by "Dogacan Güney (JIRA)" <ji...@apache.org> on 2006/11/30 15:36:21 UTC, 0 replies.
- [jira] Commented: (NUTCH-411) Parse ignores meta refresh redirection - posted by "Dogacan Güney (JIRA)" <ji...@apache.org> on 2006/11/30 15:53:22 UTC, 0 replies.
- [jira] Updated: (NUTCH-411) Parse ignores meta refresh redirection - posted by "Dogacan Güney (JIRA)" <ji...@apache.org> on 2006/11/30 15:55:22 UTC, 0 replies.
- Brochure for Nutch - posted by Peter Landolt <re...@rosa.com> on 2006/11/30 17:29:54 UTC, 1 replies.