- RE: what contibute to fetch slowing down - posted by Fuad Efendi <fu...@efendi.ca> on 2005/10/01 04:58:21 UTC, 12 replies.
- Nutch 0.7.1 and Nutch web site - posted by Piotr Kosiorowski <pk...@gmail.com> on 2005/10/01 22:35:32 UTC, 2 replies.
- How can I unsubscribe from the mailing list? - posted by ni...@gmail.com on 2005/10/02 22:09:41 UTC, 1 replies.
- Re[2]: what contibute to fetch slowing down - posted by Michael <mi...@gameservice.ru> on 2005/10/03 02:36:45 UTC, 4 replies.
- RE: BUG - > RobotRulesParser - posted by Fuad Efendi <fu...@efendi.ca> on 2005/10/03 05:24:43 UTC, 0 replies.
- java.net.MalformedURLException: no protocol for parse-plugins.xml - posted by Earl Cahill <ca...@yahoo.com> on 2005/10/03 07:04:54 UTC, 2 replies.
- tasks is not killed - posted by Stefan Groschupf <sg...@media-style.com> on 2005/10/03 15:23:15 UTC, 1 replies.
- umbilical.done is called two times - posted by Stefan Groschupf <sg...@media-style.com> on 2005/10/03 18:37:46 UTC, 1 replies.
- IlTrovatore check: e' SPAM? Re: [Fwd: Fetch list priority] - posted by massimo miccoli <mm...@iltrovatore.it> on 2005/10/03 20:10:07 UTC, 0 replies.
- [jira] Commented: (NUTCH-99) ports are hardcoded or random - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/10/03 22:44:47 UTC, 2 replies.
- DNS - posted by Fuad Efendi <fu...@efendi.ca> on 2005/10/04 07:24:42 UTC, 4 replies.
- [jira] Created: (NUTCH-103) Vivisimo like treeview and url redirect - posted by "robert benea (JIRA)" <ji...@apache.org> on 2005/10/04 17:48:47 UTC, 8 replies.
- No more FetchListEntry in MapReduce branch - posted by Kelvin Tan <ke...@relevanz.com> on 2005/10/04 17:50:51 UTC, 0 replies.
- [jira] Updated: (NUTCH-103) Vivisimo like treeview and url redirect - posted by "robert benea (JIRA)" <ji...@apache.org> on 2005/10/04 17:52:47 UTC, 1 replies.
- Lius Framework - posted by Valmir Macário <va...@gmail.com> on 2005/10/04 22:27:52 UTC, 0 replies.
- plugin analyzer - posted by Robert Benea <ro...@gmail.com> on 2005/10/04 23:20:48 UTC, 3 replies.
- Nutch Contract Work - posted by Rod Taylor <rb...@sitesell.com> on 2005/10/05 17:09:40 UTC, 0 replies.
- [jira] Commented: (NUTCH-36) Chinese in Nutch - posted by "Jack Tang (JIRA)" <ji...@apache.org> on 2005/10/05 18:19:47 UTC, 0 replies.
- Q about "exact" match counts? - posted by "Gaulin, Mark" <mg...@globalspec.com> on 2005/10/05 20:04:07 UTC, 1 replies.
- [jira] Created: (NUTCH-104) Nutch query parser does not support CJK bi-gram segmentation. - posted by "Jack Tang (JIRA)" <ji...@apache.org> on 2005/10/05 20:12:47 UTC, 0 replies.
- java.lang.NoClassDefFoundError: org/jdom/JDOMException at LIUS - posted by Valmir Macário <va...@gmail.com> on 2005/10/05 21:00:51 UTC, 0 replies.
- Fetch Speed Issues - posted by Matt Zytaruk <ma...@wavefire.com> on 2005/10/06 01:47:32 UTC, 0 replies.
- search.jsp (and opensearchservlet) and query (utf-8) encoding - posted by stack <st...@archive.org> on 2005/10/06 02:50:34 UTC, 1 replies.
- [jira] Created: (NUTCH-105) Network error during robots.txt fetch causes file to be ignored - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/10/06 17:43:48 UTC, 0 replies.
- Fetcher Speed Issues - posted by Matt Zytaruk <ma...@wavefire.com> on 2005/10/06 17:59:15 UTC, 1 replies.
- Noob Questions - posted by "mhammons (sent by Nabble.com)" <li...@nabble.com> on 2005/10/07 04:09:22 UTC, 1 replies.
- dtrace and nutch - posted by Earl Cahill <ca...@yahoo.com> on 2005/10/07 04:39:05 UTC, 1 replies.
- [jira] Created: (NUTCH-106) Datanode corruption - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/10/07 04:46:48 UTC, 0 replies.
- [jira] Created: (NUTCH-107) Typo in plugin/urlfilter-regex/plugin.xml - posted by "Stephen Cross (JIRA)" <ji...@apache.org> on 2005/10/07 04:54:47 UTC, 0 replies.
- [jira] Commented: (NUTCH-94) MapFile.Writer throwing 'File exists error'. - posted by "raghavendra prabhu (JIRA)" <ji...@apache.org> on 2005/10/08 08:05:47 UTC, 1 replies.
- [jira] Commented: (NUTCH-96) MapFile.Writer throws directory exists exception if run multiple times in the same JVM or server JVM. - posted by "raghavendra prabhu (JIRA)" <ji...@apache.org> on 2005/10/08 08:07:47 UTC, 0 replies.
- [jira] Commented: (NUTCH-101) RobotRulesParser - posted by "Fuad Efendi (JIRA)" <ji...@apache.org> on 2005/10/09 04:28:52 UTC, 0 replies.
- [jira] Updated: (NUTCH-101) RobotRulesParser - posted by "Fuad Efendi (JIRA)" <ji...@apache.org> on 2005/10/09 04:32:53 UTC, 0 replies.
- [jira] Updated: (NUTCH-100) New plugin urlfilter-db - posted by "Gal Nitzan (JIRA)" <ji...@apache.org> on 2005/10/09 10:06:47 UTC, 12 replies.
- mr: tasks crash & tasks assign to old nodes - posted by Stefan Groschupf <sg...@media-style.com> on 2005/10/09 11:54:28 UTC, 0 replies.
- [jira] Created: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker. - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/10/09 12:45:49 UTC, 1 replies.
- [jira] Updated: (NUTCH-99) ports are hardcoded or random - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/10/09 16:56:47 UTC, 0 replies.
- reprocessing hanging tasks - posted by Stefan Groschupf <sg...@media-style.com> on 2005/10/10 15:34:26 UTC, 3 replies.
- fetch speed issue - posted by AJ Chen <ca...@gmail.com> on 2005/10/10 23:10:23 UTC, 0 replies.
- [jira] Created: (NUTCH-109) Nutch - Fetcher - HTTP - Performance Testing & Tuning - posted by "Fuad Efendi (JIRA)" <ji...@apache.org> on 2005/10/11 02:01:10 UTC, 0 replies.
- [jira] Updated: (NUTCH-109) Nutch - Fetcher - HTTP - Performance Testing & Tuning - posted by "Fuad Efendi (JIRA)" <ji...@apache.org> on 2005/10/11 02:07:04 UTC, 0 replies.
- [jira] Commented: (NUTCH-109) Nutch - Fetcher - HTTP - Performance Testing & Tuning - posted by "Fuad Efendi (JIRA)" <ji...@apache.org> on 2005/10/11 03:19:04 UTC, 0 replies.
- to many hdd reads - posted by Stefan Groschupf <sg...@media-style.com> on 2005/10/11 13:46:55 UTC, 1 replies.
- [jira] Closed: (NUTCH-107) Typo in plugin/urlfilter-regex/plugin.xml - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2005/10/11 21:50:04 UTC, 0 replies.
- [jira] Updated: (NUTCH-109) Nutch - Fetcher - Performance Test - new Protocol-HTTPClient-Innovation - posted by "Fuad Efendi (JIRA)" <ji...@apache.org> on 2005/10/12 00:35:09 UTC, 5 replies.
- [jira] Commented: (NUTCH-109) Nutch - Fetcher - Performance Test - new Protocol-HTTPClient-Innovation - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/10/12 00:50:04 UTC, 20 replies.
- org.apache.commons.io.FileUtils - posted by Paul Baclace <pe...@baclace.net> on 2005/10/12 08:28:44 UTC, 1 replies.
- Re: nutch downloads - posted by Erik Hatcher <er...@ehatchersolutions.com> on 2005/10/12 13:34:47 UTC, 4 replies.
- suspicious outlink count - posted by EM <em...@cpuedge.com> on 2005/10/12 21:32:57 UTC, 0 replies.
- Re: svn commit: r314958 - in /lucene/nutch/trunk/site: about.html bot.html credits.html i18n.html index.html index.pdf issue_tracking.html linkmap.html mailing_lists.html tutorial.html version_control.html - posted by Erik Hatcher <er...@ehatchersolutions.com> on 2005/10/12 22:34:36 UTC, 0 replies.
- keep count of selected url - posted by Daniele Menozzi <me...@ngi.it> on 2005/10/12 23:35:18 UTC, 0 replies.
- clustering strategies - posted by Earl Cahill <ca...@yahoo.com> on 2005/10/13 01:21:53 UTC, 2 replies.
- [jira] Created: (NUTCH-110) OpenSearchServlet outputs illegal xml characters - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2005/10/13 02:13:13 UTC, 0 replies.
- [jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2005/10/13 02:19:04 UTC, 11 replies.
- Enter Chinese in search box, returns messy results - posted by Song Han <ha...@gmail.com> on 2005/10/13 09:40:46 UTC, 1 replies.
- NutchAnalysis -- Distinguishing between quoted clauses (phrases) and unquoted clauses (individual terms) after parsing - posted by "Dalton, Jeffery" <jd...@globalspec.com> on 2005/10/13 15:01:54 UTC, 1 replies.
- how to make fetcher to use the full bandwidth - posted by AJ Chen <ca...@gmail.com> on 2005/10/13 22:35:18 UTC, 5 replies.
- [jira] Created: (NUTCH-111) ndfs.replication is not documented within the nutch-default.xml configuration file. - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/10/13 23:04:04 UTC, 0 replies.
- patch for changes related to TestNDFS - posted by Paul Baclace <pe...@baclace.net> on 2005/10/14 03:44:17 UTC, 0 replies.
- All trackers exited on all nodes - posted by Rod Taylor <rb...@sitesell.com> on 2005/10/14 16:53:01 UTC, 4 replies.
- crawl db stats - posted by Stefan Groschupf <sg...@media-style.com> on 2005/10/15 00:17:06 UTC, 10 replies.
- [jira] Resolved: (NUTCH-88) Enhance ParserFactory plugin selection policy - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2005/10/15 01:45:45 UTC, 0 replies.
- [jira] Created: (NUTCH-112) Link in cached.jsp page to cached content is an absolute link - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2005/10/15 19:42:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-112) Link in cached.jsp page to cached content is an absolute link - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2005/10/15 19:44:45 UTC, 0 replies.
- [jira] Created: (NUTCH-113) Disable permanent DNS-to-IP caching for JVM 1.4 - posted by "Fuad Efendi (JIRA)" <ji...@apache.org> on 2005/10/15 21:25:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-113) Disable permanent DNS-to-IP caching for JVM 1.4 - posted by "Fuad Efendi (JIRA)" <ji...@apache.org> on 2005/10/15 21:59:47 UTC, 0 replies.
- [jira] Commented: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker. - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/10/15 23:26:44 UTC, 1 replies.
- Problem opening checksum file - posted by Stefan Groschupf <sg...@media-style.com> on 2005/10/16 04:20:22 UTC, 1 replies.
- why is segslice so slow? - posted by EM <em...@cpuedge.com> on 2005/10/16 06:29:51 UTC, 0 replies.
- [jira] Created: (NUTCH-114) getting number of urls and links from crawldb - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/10/16 07:14:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-114) getting number of urls and links from crawldb - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/10/16 07:19:45 UTC, 2 replies.
- developing a parse-/index-/query- plugin set - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2005/10/16 19:53:03 UTC, 7 replies.
- [jira] Commented: (NUTCH-114) getting number of urls and links from crawldb - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/10/17 20:30:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-82) Nutch Commands should run on Windows without external tools - posted by "Fuad Efendi (JIRA)" <ji...@apache.org> on 2005/10/18 01:27:45 UTC, 7 replies.
- [jira] Commented: (NUTCH-103) Vivisimo like treeview and url redirect - posted by "Bong Chih How (JIRA)" <ji...@apache.org> on 2005/10/18 04:59:44 UTC, 3 replies.
- patch to fix NPE in Daemon.getRunnable() - posted by Paul Baclace <pe...@baclace.net> on 2005/10/18 08:44:58 UTC, 0 replies.
- RegexUrlFilter hangs up - posted by Marko Bauhardt <mb...@media-style.com> on 2005/10/18 09:50:49 UTC, 2 replies.
- [jira] Created: (NUTCH-115) jobtracker.jsp shows too much information - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/10/18 17:41:44 UTC, 0 replies.
- No buffer space available - posted by ni...@gmail.com on 2005/10/19 03:28:49 UTC, 11 replies.
- searching return 0 hit - posted by Michael Ji <fj...@yahoo.com> on 2005/10/19 04:02:20 UTC, 3 replies.
- [jira] Created: (NUTCH-116) TestNDFS a JUnit test specifically for NDFS - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/10/19 04:16:44 UTC, 0 replies.
- Map-reduce based SegmentReader - posted by radu mateescu <rb...@gmail.com> on 2005/10/19 04:18:14 UTC, 0 replies.
- [jira] Updated: (NUTCH-116) TestNDFS a JUnit test specifically for NDFS - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/10/19 05:45:44 UTC, 2 replies.
- Re: Event queues vs threads - posted by Paul Baclace <pe...@baclace.net> on 2005/10/19 07:59:35 UTC, 0 replies.
- [jira] Created: (NUTCH-117) Crawl crashes with java.io.IOException: already exists: C:\nutch\crawl.intranet\oct18\db\webdb.new\pagesByURL - posted by "Stephen Cross (JIRA)" <ji...@apache.org> on 2005/10/19 14:43:44 UTC, 0 replies.
- Re: [Nutch-dev] [Fwd: Fetch list priority] - posted by Massimo Miccoli <mm...@iltrovatore.it> on 2005/10/19 18:52:46 UTC, 2 replies.
- [jira] Commented: (NUTCH-116) TestNDFS a JUnit test specifically for NDFS - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/10/19 19:10:48 UTC, 1 replies.
- [jira] Commented: (NUTCH-88) Enhance ParserFactory plugin selection policy - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/10/19 23:36:45 UTC, 10 replies.
- Re: OPIC - posted by Doug Cutting <cu...@nutch.org> on 2005/10/20 00:29:16 UTC, 4 replies.
- [jira] Updated: (NUTCH-82) Nutch Commands should run on Windows without external tools - posted by "Nick Jacobsen (JIRA)" <ji...@apache.org> on 2005/10/20 01:37:44 UTC, 0 replies.
- [jira] Created: (NUTCH-118) FAQ link points to invalid URL - posted by "Steve Betts (JIRA)" <ji...@apache.org> on 2005/10/20 17:17:44 UTC, 0 replies.
- [jira] Created: (NUTCH-119) Regexp to extract outlinks incorrect - posted by "Sébastien Le Callonnec (JIRA)" <ji...@apache.org> on 2005/10/20 21:32:44 UTC, 0 replies.
- rel=nofollow - posted by Doug Cutting <cu...@nutch.org> on 2005/10/20 21:34:12 UTC, 1 replies.
- [jira] Updated: (NUTCH-119) Regexp to extract outlinks incorrect - posted by "Sébastien Le Callonnec (JIRA)" <ji...@apache.org> on 2005/10/20 21:38:59 UTC, 1 replies.
- [jira] Created: (NUTCH-120) one "bad" link on a page kills parsing - posted by "Earl Cahill (JIRA)" <ji...@apache.org> on 2005/10/20 21:40:54 UTC, 0 replies.
- NDFS Limitations or Bug - posted by Rod Taylor <rb...@sitesell.com> on 2005/10/21 03:12:26 UTC, 0 replies.
- [jira] Created: (NUTCH-121) SegmentReader for mapred - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/10/21 03:35:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-121) SegmentReader for mapred - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/10/21 03:37:46 UTC, 0 replies.
- [jira] Created: (NUTCH-122) block numbers need a better random number generator - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/10/21 06:26:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-122) block numbers need a better random number generator - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/10/21 09:31:03 UTC, 1 replies.
- [jira] Commented: (NUTCH-117) Crawl crashes with java.io.IOException: already exists: C:\nutch\crawl.intranet\oct18\db\webdb.new\pagesByURL - posted by "Nick Jacobsen (JIRA)" <ji...@apache.org> on 2005/10/21 18:10:21 UTC, 1 replies.
- error ParseOutputFormat.java:69: inconvertible types in revision 327230 - posted by Gal Nitzan <gn...@usa.net> on 2005/10/22 00:16:33 UTC, 3 replies.
- status dedub - posted by Stefan Groschupf <sg...@media-style.com> on 2005/10/24 20:48:12 UTC, 4 replies.
- mapred questions - posted by Ken van Mulder <ke...@wavefire.com> on 2005/10/25 00:19:42 UTC, 0 replies.
- [jira] Commented: (NUTCH-49) Flag for generate to fetch only new pages to complement the -refetchonly flag - posted by "byron miller (JIRA)" <ji...@apache.org> on 2005/10/25 16:49:08 UTC, 0 replies.
- merge indices from multiple webdb - posted by AJ Chen <ca...@gmail.com> on 2005/10/25 22:02:36 UTC, 5 replies.
- Long delay in httpclient - posted by Ken Krugler <kk...@krugle.net> on 2005/10/26 17:43:21 UTC, 0 replies.
- best way to load page components - posted by Stefan Groschupf <sg...@media-style.com> on 2005/10/27 15:07:20 UTC, 0 replies.
- debug JSP with eclipse - posted by AJ Chen <ca...@gmail.com> on 2005/10/30 07:19:40 UTC, 1 replies.
- [jira] Commented: (NUTCH-39) pagination in search result - posted by "byron miller (JIRA)" <ji...@apache.org> on 2005/10/31 04:57:55 UTC, 0 replies.
- deltas to wiki page nutch/NutchDistributedFileSystem - posted by Paul Baclace <pe...@baclace.net> on 2005/10/31 21:54:53 UTC, 1 replies.