- [jira] Commented: (NUTCH-130) Be explicit about target JVM when building (1.4.x?) - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2005/12/01 01:09:30 UTC, 0 replies.
- [jira] Resolved: (NUTCH-130) Be explicit about target JVM when building (1.4.x?) - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/12/01 19:26:31 UTC, 0 replies.
- Re: Urlfilter Patch - posted by Doug Cutting <cu...@nutch.org> on 2005/12/01 19:43:05 UTC, 17 replies.
- incremental crawling - posted by Doug Cutting <cu...@nutch.org> on 2005/12/01 20:15:49 UTC, 5 replies.
- NDFS/MapReduce? - posted by "Goldschmidt, Dave" <dg...@globalspec.com> on 2005/12/01 21:20:02 UTC, 1 replies.
- Re: [Nutch-dev] incremental crawling - posted by Matt Kangas <ka...@gmail.com> on 2005/12/01 21:22:06 UTC, 2 replies.
- [jira] Resolved: (NUTCH-116) TestNDFS a JUnit test specifically for NDFS - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/12/01 21:30:31 UTC, 0 replies.
- [jira] Resolved: (NUTCH-114) getting number of urls and links from crawldb - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2005/12/02 09:46:30 UTC, 0 replies.
- [jira] Commented: (NUTCH-98) RobotRulesParser interprets robots.txt incorrectly - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/12/03 20:34:30 UTC, 0 replies.
- [jira] Created: (NUTCH-131) Non-documented variable: mapred.child.heap.size - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/12/03 20:37:29 UTC, 0 replies.
- NDFS Connection reset - posted by Jack Tang <hi...@gmail.com> on 2005/12/05 17:45:48 UTC, 3 replies.
- Killing lines - posted by an...@orbita1.ru on 2005/12/06 16:13:20 UTC, 0 replies.
- submitting a patch? - posted by James Nelson <te...@gmail.com> on 2005/12/06 17:44:43 UTC, 3 replies.
- NUTCH-112: Link in cached.jsp page to cached content is an absolute link - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2005/12/06 18:03:28 UTC, 0 replies.
- RCP known limitation or bug? - posted by Stefan Groschupf <sg...@media-style.com> on 2005/12/06 18:13:46 UTC, 2 replies.
- FetchListTool.TableSet.append() result - posted by Chris Schneider <Sc...@TransPac.com> on 2005/12/06 21:01:18 UTC, 0 replies.
- [jira] Created: (NUTCH-132) Add ability to sort on more than one column - posted by "James Nelson (JIRA)" <ji...@apache.org> on 2005/12/06 22:28:08 UTC, 0 replies.
- [jira] Commented: (NUTCH-132) Add ability to sort on more than one column - posted by "James Nelson (JIRA)" <ji...@apache.org> on 2005/12/06 22:35:08 UTC, 0 replies.
- [jira] Updated: (NUTCH-132) Add ability to sort on more than one column - posted by "James Nelson (JIRA)" <ji...@apache.org> on 2005/12/06 22:37:08 UTC, 0 replies.
- [jira] Closed: (NUTCH-112) Link in cached.jsp page to cached content is an absolute link - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2005/12/06 23:04:08 UTC, 0 replies.
- [jira] Created: (NUTCH-133) ParserFactory does not work as expected - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/06 23:04:09 UTC, 0 replies.
- [jira] Updated: (NUTCH-133) ParserFactory does not work as expected - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/06 23:06:11 UTC, 1 replies.
- [jira] Commented: (NUTCH-133) ParserFactory does not work as expected - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2005/12/06 23:23:10 UTC, 16 replies.
- [jira] Created: (NUTCH-134) Summarizer doesn't select the best snippets - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2005/12/07 15:11:08 UTC, 0 replies.
- [jira] Commented: (NUTCH-134) Summarizer doesn't select the best snippets - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/12/07 21:03:08 UTC, 4 replies.
- Nutch 0.8 update issue - posted by Jack Tang <hi...@gmail.com> on 2005/12/08 03:29:00 UTC, 1 replies.
- Re: Lucene performance bottlenecks - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/12/08 10:04:32 UTC, 5 replies.
- about the question of clustering-carrot2 - posted by charlie <ch...@ipedo.com.cn> on 2005/12/08 11:02:23 UTC, 0 replies.
- [jira] Closed: (NUTCH-133) ParserFactory does not work as expected - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/08 12:06:08 UTC, 0 replies.
- nutch questions - posted by Ken van Mulder <ke...@wavefire.com> on 2005/12/09 00:59:31 UTC, 3 replies.
- Should nutch try to reduce first? - posted by Rod Taylor <rb...@sitesell.com> on 2005/12/09 05:57:35 UTC, 1 replies.
- Re: [C2-devel] about the question of clustering-carrot2 - posted by Dawid Weiss <da...@cs.put.poznan.pl> on 2005/12/09 10:09:09 UTC, 0 replies.
- Google performance bottlenecks ;-) (Re: Lucene performance bottlenecks) - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/12/09 10:42:48 UTC, 3 replies.
- parse.getData().getMetadata().get("propName") is NULL? - posted by Jack Tang <hi...@gmail.com> on 2005/12/09 20:04:00 UTC, 1 replies.
- [jira] Created: (NUTCH-135) http header meta data are case insensitive in the real world (e.g. Content-Type or content-type) - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/09 21:51:08 UTC, 0 replies.
- [jira] Updated: (NUTCH-135) http header meta data are case insensitive in the real world (e.g. Content-Type or content-type) - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/09 22:14:08 UTC, 2 replies.
- [jira] Commented: (NUTCH-135) http header meta data are case insensitive in the real world (e.g. Content-Type or content-type) - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2005/12/09 22:59:08 UTC, 3 replies.
- [jira] Assigned: (NUTCH-3) multi values of header discarded - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/10 04:56:11 UTC, 0 replies.
- [jira] Resolved: (NUTCH-135) http header meta data are case insensitive in the real world (e.g. Content-Type or content-type) - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2005/12/11 01:45:08 UTC, 0 replies.
- [jira] Commented: (NUTCH-34) Parsing different content formats - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2005/12/11 19:11:08 UTC, 0 replies.
- Hot Search! Re: Nutch Suggestion? (Google like "did you mean") - posted by Jack Tang <hi...@gmail.com> on 2005/12/12 03:08:09 UTC, 3 replies.
- Re: [Nutch-dev] What are the limitations of nutch - posted by Stefan Groschupf <sg...@media-style.com> on 2005/12/12 09:34:44 UTC, 0 replies.
- Customize the time to retry - posted by Nguyen Ngoc Giang <gi...@gmail.com> on 2005/12/12 11:04:50 UTC, 0 replies.
- BLAST plugin for nutch - posted by Leen Toelen <to...@gmail.com> on 2005/12/12 14:02:50 UTC, 0 replies.
- IndexOptimizer (Re: Lucene performance bottlenecks) - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/12/12 17:32:59 UTC, 17 replies.
- Re: MapRed Generator - posted by Marko Bauhardt <mb...@media-style.com> on 2005/12/12 23:03:39 UTC, 0 replies.
- [jira] Created: (NUTCH-136) mapreduce segment generator generates 50 % less than excepted urls - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/12 23:50:45 UTC, 0 replies.
- Hard-coded Content-type checks - posted by Jérôme Charron <je...@gmail.com> on 2005/12/13 14:24:30 UTC, 2 replies.
- Standard metadata property names in the ParseData metadata - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2005/12/13 18:07:16 UTC, 6 replies.
- [Fwd: Crawler submits forms?] - posted by Doug Cutting <cu...@nutch.org> on 2005/12/13 18:17:25 UTC, 16 replies.
- best file system for NDFS? - posted by Stefan Groschupf <sg...@media-style.com> on 2005/12/13 20:22:44 UTC, 3 replies.
- Idea about aliases in the parse-plugins.xml file - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2005/12/13 21:00:18 UTC, 0 replies.
- [jira] Created: (NUTCH-137) footer is not displayed in search result page - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2005/12/13 23:55:45 UTC, 0 replies.
- [jira] Created: (NUTCH-138) non-Latin-1 characters cannot be submitted for search - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2005/12/14 00:07:45 UTC, 0 replies.
- [jira] Created: (NUTCH-139) Standard metadata property names in the ParseData metadata - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2005/12/14 05:02:45 UTC, 0 replies.
- [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2005/12/14 05:04:46 UTC, 14 replies.
- [jira] Updated: (NUTCH-139) Standard metadata property names in the ParseData metadata - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2005/12/14 05:04:48 UTC, 2 replies.
- [jira] Created: (NUTCH-140) Add alias capability in parse-plugins.xml file that allows mimeType->extensionId mapping - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2005/12/14 05:10:46 UTC, 0 replies.
- problem in merging index - posted by Rozina Sorathia <Ro...@KPITCummins.com> on 2005/12/14 06:32:28 UTC, 0 replies.
- Timeout that does not retry - posted by Rod Taylor <rb...@sitesell.com> on 2005/12/14 06:36:19 UTC, 0 replies.
- [jira] Commented: (NUTCH-140) Add alias capability in parse-plugins.xml file that allows mimeType->extensionId mapping - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/14 11:36:45 UTC, 1 replies.
- [jira] Created: (NUTCH-141) jobdetails.jsp doesnt work on webbrowser "safari" - posted by "Marko Bauhardt (JIRA)" <ji...@apache.org> on 2005/12/14 11:53:45 UTC, 0 replies.
- vote for issues to fix in 0.7.2 - posted by Stefan Groschupf <sg...@media-style.com> on 2005/12/14 14:18:51 UTC, 7 replies.
- translation of Nutch search page - posted by hi...@ai.univ-paris8.fr on 2005/12/14 16:57:05 UTC, 0 replies.
- mapreduce fetcher doesn't fetch all urls - posted by Florent Gluck <fl...@busytonight.com> on 2005/12/14 20:39:45 UTC, 12 replies.
- [jira] Closed: (NUTCH-141) jobdetails.jsp doesnt work on webbrowser "safari" - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2005/12/14 22:33:45 UTC, 0 replies.
- duplicated poperty in nutch-default.xml (0.8) - posted by Florent Gluck <fl...@busytonight.com> on 2005/12/15 01:39:39 UTC, 0 replies.
- Nutch design queries - posted by Mike Cannon-Brookes <mc...@gmail.com> on 2005/12/15 14:52:23 UTC, 7 replies.
- vote results. - posted by Stefan Groschupf <sg...@media-style.com> on 2005/12/15 17:14:02 UTC, 0 replies.
- Re: vote results. - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/12/15 17:50:33 UTC, 2 replies.
- JUnit test failures - posted by Piotr Kosiorowski <pk...@gmail.com> on 2005/12/15 19:06:21 UTC, 0 replies.
- [jira] Created: (NUTCH-142) NutchConf should use the thread context classloader - posted by "Mike Cannon-Brookes (JIRA)" <ji...@apache.org> on 2005/12/15 23:22:45 UTC, 0 replies.
- mapred merge to trunk - posted by Doug Cutting <cu...@nutch.org> on 2005/12/15 23:49:47 UTC, 2 replies.
- Re: version branches / two products - posted by David Wallace <da...@nzqa.govt.nz> on 2005/12/16 01:25:06 UTC, 1 replies.
- [jira] Created: (NUTCH-143) Improper error numbers returned on exit - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/12/16 06:27:49 UTC, 0 replies.
- [jira] Commented: (NUTCH-143) Improper error numbers returned on exit - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/16 11:55:45 UTC, 1 replies.
- Re: [Nutch-dev] distributed seach - posted by Stefan Groschupf <sg...@media-style.com> on 2005/12/16 12:13:33 UTC, 1 replies.
- [jira] Commented: (NUTCH-39) pagination in search result - posted by "Dima (JIRA)" <ji...@apache.org> on 2005/12/16 14:34:46 UTC, 0 replies.
- "Something is Wrong with Google’s Mathematical Model" - posted by Stefan Groschupf <sg...@media-style.com> on 2005/12/16 20:27:32 UTC, 1 replies.
- [jira] Updated: (NUTCH-3) multi values of header discarded - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/16 20:37:46 UTC, 1 replies.
- [VOTE] Commiter access for Stefan Groschupf - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/12/16 22:50:19 UTC, 7 replies.
- Re: [Nutch-dev] Re: [VOTE] Commiter access for Stefan Groschupf - posted by Kashif Khadim <ka...@yahoo.com> on 2005/12/16 23:44:26 UTC, 0 replies.
- bug in parse-rtf? - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2005/12/17 00:05:48 UTC, 0 replies.
- RE: "Something is Wrong with Google's Mathematical Model" - posted by Paul Sutter <ps...@implicitlabs.com> on 2005/12/17 00:21:24 UTC, 0 replies.
- TrustRank (was Re: "Something is Wrong with Google’s Mathematical Model") - posted by Erik Hatcher <er...@ehatchersolutions.com> on 2005/12/17 10:54:11 UTC, 0 replies.
- [jira] Closed: (NUTCH-3) multi values of header discarded - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2005/12/17 11:13:35 UTC, 0 replies.
- [jira] Commented: (NUTCH-3) multi values of header discarded - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/17 11:17:35 UTC, 4 replies.
- Boolean search support - posted by Nguyen Ngoc Giang <gi...@gmail.com> on 2005/12/17 14:33:59 UTC, 0 replies.
- Re: svn commit: r357334 - in /lucene/nutch/trunk: conf/nutch-default.xml src/java/org/apache/nutch/protocol/Content.java src/java/org/apache/nutch/protocol/ContentProperties.java - posted by Doug Cutting <cu...@nutch.org> on 2005/12/17 17:14:23 UTC, 2 replies.
- [jira] Created: (NUTCH-144) corrupt language identifier tri files and bad language recognition for german - posted by "Bernhard Messer (JIRA)" <ji...@apache.org> on 2005/12/17 17:51:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-144) corrupt language identifier tri files and bad language recognition for german - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/17 18:01:34 UTC, 1 replies.
- [jira] Reopened: (NUTCH-3) multi values of header discarded - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/17 18:55:34 UTC, 0 replies.
- no nightly builds until 27 December - posted by Doug Cutting <cu...@nutch.org> on 2005/12/18 21:39:19 UTC, 0 replies.
- [bug] overwriting job properties until runtime is not possible - posted by Stefan Groschupf <sg...@media-style.com> on 2005/12/19 00:19:01 UTC, 2 replies.
- Latest version of Mapred - posted by Rafi Iz <ra...@hotmail.com> on 2005/12/19 18:46:39 UTC, 1 replies.
- problems http-client - posted by Stefan Groschupf <sg...@media-style.com> on 2005/12/19 19:37:47 UTC, 4 replies.
- Re: Latest version of Mapred - posted by Rafi Iz <ra...@hotmail.com> on 2005/12/19 23:36:45 UTC, 1 replies.
- RE: [Nutch-dev] distributed search - posted by Ledio Ago <la...@looksmart.net> on 2005/12/20 00:24:31 UTC, 8 replies.
- [jira] Created: (NUTCH-145) ant build of the war fie fails on Chinese (zh) .xml files due to UTF-8 BOM - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2005/12/20 00:59:30 UTC, 0 replies.
- [jira] Commented: (NUTCH-145) ant build of the war fie fails on Chinese (zh) .xml files due to UTF-8 BOM - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/20 01:07:30 UTC, 0 replies.
- [jira] Updated: (NUTCH-145) ant build of the war fie fails on Chinese (zh) .xml files due to UTF-8 BOM - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2005/12/20 01:09:30 UTC, 0 replies.
- [jira] Updated: (NUTCH-145) build of war file fails on Chinese (zh) .xml files due to UTF-8 BOM - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2005/12/20 01:09:31 UTC, 0 replies.
- GNU Getopt - posted by Andrew McNabb <am...@mcnabbs.org> on 2005/12/20 08:47:51 UTC, 2 replies.
- nutch and google suggestion - posted by Jack Tang <hi...@gmail.com> on 2005/12/20 10:29:58 UTC, 2 replies.
- [jira] Updated: (NUTCH-131) Non-documented variable: mapred.child.heap.size - posted by "Marko Bauhardt (JIRA)" <ji...@apache.org> on 2005/12/20 12:04:31 UTC, 0 replies.
- Static initializers - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/12/20 14:19:14 UTC, 6 replies.
- [jira] Created: (NUTCH-146) mapred.job.tracker.info.port is defined 2 times in the nutch-default.xml - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/20 18:42:30 UTC, 0 replies.
- [jira] Resolved: (NUTCH-146) mapred.job.tracker.info.port is defined 2 times in the nutch-default.xml - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2005/12/20 19:27:31 UTC, 0 replies.
- [jira] Resolved: (NUTCH-145) build of war file fails on Chinese (zh) .xml files due to UTF-8 BOM - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2005/12/20 19:33:30 UTC, 0 replies.
- nightly build - posted by "tigger ." <b1...@hotmail.com> on 2005/12/20 21:35:04 UTC, 1 replies.
- GETTING OUT OF MAILING LIST - posted by "Rolando H. Martinelli - CoBuys, S.A." <ro...@cobuys.com> on 2005/12/20 22:06:17 UTC, 0 replies.
- nutch-0.8-dev *mapred.input.subdir* problem ? - posted by Lukas Vlcek <lu...@gmail.com> on 2005/12/21 06:56:58 UTC, 4 replies.
- IndexSorter optimizer - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/12/21 14:14:43 UTC, 4 replies.
- Crawling a nutch index with Lucene - posted by Oliver Hummel <hu...@informatik.uni-mannheim.de> on 2005/12/21 17:13:07 UTC, 2 replies.
- [jira] Created: (NUTCH-147) nutch map reduce does not work in windows map reduce runs in a loop - posted by "raghavendra prabhu (JIRA)" <ji...@apache.org> on 2005/12/22 04:45:30 UTC, 0 replies.
- Commons HttpClient 3.0 released - posted by Stefan Groschupf <sg...@media-style.com> on 2005/12/22 10:07:23 UTC, 1 replies.
- [jira] Created: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates - posted by "raghavendra prabhu (JIRA)" <ji...@apache.org> on 2005/12/22 10:49:30 UTC, 0 replies.
- [jira] Commented: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2005/12/22 19:37:30 UTC, 3 replies.
- [jira] Created: (NUTCH-149) outlinks not shown properly in cached.jsp - posted by "raghavendra prabhu (JIRA)" <ji...@apache.org> on 2005/12/22 19:37:31 UTC, 0 replies.
- [jira] Commented: (NUTCH-149) outlinks not shown properly in cached.jsp - posted by "raghavendra prabhu (JIRA)" <ji...@apache.org> on 2005/12/22 19:43:30 UTC, 0 replies.
- [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content - posted by "raghavendra prabhu (JIRA)" <ji...@apache.org> on 2005/12/22 20:04:32 UTC, 4 replies.
- Removing old classes from trunk/ - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/12/23 02:16:10 UTC, 1 replies.
- [jira] Created: (NUTCH-150) OutlinkExtractor extremely slow on some non-plain text - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/23 04:17:32 UTC, 0 replies.
- [jira] Updated: (NUTCH-150) OutlinkExtractor extremely slow on some non-plain text - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/23 04:39:31 UTC, 0 replies.
- [jira] Commented: (NUTCH-147) nutch map reduce does not work in windows map reduce runs in a loop - posted by "raghavendra prabhu (JIRA)" <ji...@apache.org> on 2005/12/23 17:12:30 UTC, 0 replies.
- [jira] Closed: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2005/12/23 20:45:30 UTC, 0 replies.
- [jira] Closed: (NUTCH-147) nutch map reduce does not work in windows map reduce runs in a loop - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2005/12/23 20:47:30 UTC, 0 replies.
- [jira] Created: (NUTCH-151) CommandRunner can hang after the main thread exec is finished and has inefficient busy loop - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/24 02:02:30 UTC, 0 replies.
- failure with crawl using 12/23 trunk - posted by Byron Miller <by...@yahoo.com> on 2005/12/24 05:35:54 UTC, 0 replies.
- severe error in fetch - posted by AJ Chen <ca...@gmail.com> on 2005/12/25 23:38:31 UTC, 4 replies.
- Fwd: bug in Nutch wiki - FAQ - posted by Stefan Groschupf <sg...@media-style.com> on 2005/12/26 14:50:01 UTC, 0 replies.
- [jira] Commented: (NUTCH-151) CommandRunner can hang after the main thread exec is finished and has inefficient busy loop - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/26 23:02:31 UTC, 0 replies.
- [jira] Updated: (NUTCH-151) CommandRunner can hang after the main thread exec is finished and has inefficient busy loop - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/26 23:09:30 UTC, 1 replies.
- [jira] Created: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/27 03:52:30 UTC, 0 replies.
- [jira] Updated: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/27 03:54:31 UTC, 0 replies.
- [jira] Created: (NUTCH-153) TextParser is only supposed to parse plain text, but if given postscript, it can take hours and then fail - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/27 04:29:30 UTC, 0 replies.
- [jira] Updated: (NUTCH-153) TextParser is only supposed to parse plain text, but if given postscript, it can take hours and then fail - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/27 04:31:30 UTC, 0 replies.
- [jira] Commented: (NUTCH-128) second configuration nodes overwrites first node - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/27 04:51:31 UTC, 0 replies.
- [bug?] PRC called emthod require parameter - posted by Stefan Groschupf <sg...@media-style.com> on 2005/12/27 19:17:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-95) DeleteDuplicates depends on the order of input segments - posted by "byron miller (JIRA)" <ji...@apache.org> on 2005/12/28 05:28:32 UTC, 1 replies.
- [jira] Commented: (NUTCH-55) Create dmoz.org search plugin - incorporate the dmoz.org title/category/description if available & - posted by "byron miller (JIRA)" <ji...@apache.org> on 2005/12/28 05:31:31 UTC, 0 replies.
- [jira] Created: (NUTCH-154) Unable to add/update new files to fetchlist/fetcher and thus index, when u rerun crawl tool on same db. - posted by "Arun Kumar Sharma (JIRA)" <ji...@apache.org> on 2005/12/28 09:04:00 UTC, 0 replies.
- [jira] Closed: (NUTCH-154) Unable to add/update new files to fetchlist/fetcher and thus index, when u rerun crawl tool on same db. - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/28 14:19:12 UTC, 0 replies.
- [jira] Closed: (NUTCH-55) Create dmoz.org search plugin - incorporate the dmoz.org title/category/description if available & - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/12/28 14:21:00 UTC, 0 replies.
- [jira] Created: (NUTCH-155) Remove web gui from the distribution to "contrib" and use OpenSearch Servlet - posted by "nutch.newbie (JIRA)" <ji...@apache.org> on 2005/12/28 15:06:01 UTC, 0 replies.
- [jira] Commented: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker. - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/28 22:47:04 UTC, 0 replies.
- [jira] Updated: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker. - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/28 22:51:00 UTC, 0 replies.
- [jira] Created: (NUTCH-156) nutch-daemon.sh should not overwrite old logs by default - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/29 00:05:00 UTC, 0 replies.
- [jira] Updated: (NUTCH-156) nutch-daemon.sh should not overwrite old logs by default - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/29 00:15:01 UTC, 0 replies.
- Mega-cleanup in trunk/ - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/12/29 01:56:05 UTC, 1 replies.
- [jira] Commented: (NUTCH-153) TextParser is only supposed to parse plain text, but if given postscript, it can take hours and then fail - posted by "byron miller (JIRA)" <ji...@apache.org> on 2005/12/29 03:35:01 UTC, 0 replies.
- [jira] Commented: (NUTCH-92) DistributedSearch incorrectly scores results - posted by "byron miller (JIRA)" <ji...@apache.org> on 2005/12/29 03:45:01 UTC, 0 replies.
- [jira] Created: (NUTCH-157) Problem during parsing msword document . It fetching properly but parsing is not working. Please show me the way how can i parse it - posted by "karamjit (JIRA)" <ji...@apache.org> on 2005/12/29 13:25:00 UTC, 0 replies.
- help!search yielding 0 hits with nutch 8 segments - posted by Rozina Sorathia <Ro...@KPITCummins.com> on 2005/12/29 15:32:20 UTC, 0 replies.
- [jira] Closed: (NUTCH-121) SegmentReader for mapred - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2005/12/29 19:38:01 UTC, 0 replies.
- [jira] Created: (NUTCH-158) Process Sitemap data in text, rss or xml format as well as OAI-PMH - posted by "byron miller (JIRA)" <ji...@apache.org> on 2005/12/29 21:00:00 UTC, 0 replies.
- [jira] Commented: (NUTCH-155) Remove web gui from the distribution to "contrib" and use OpenSearch Servlet - posted by "byron miller (JIRA)" <ji...@apache.org> on 2005/12/29 21:05:01 UTC, 0 replies.
- Trunk is broken - posted by Gal Nitzan <gn...@usa.net> on 2005/12/29 23:28:57 UTC, 5 replies.
- Bug in DeleteDuplicates.java ? - posted by Gal Nitzan <gn...@usa.net> on 2005/12/30 00:05:34 UTC, 1 replies.
- java.io.IOException: Job failed - posted by Gal Nitzan <gn...@usa.net> on 2005/12/30 00:05:49 UTC, 0 replies.
- [jira] Updated: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2005/12/30 17:07:02 UTC, 0 replies.
- Adaptive fetch interval & unmodified content detection, episode II - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/12/30 17:31:01 UTC, 0 replies.
- [jira] Created: (NUTCH-159) Specify temp/working directory for crawl - posted by "byron miller (JIRA)" <ji...@apache.org> on 2005/12/31 19:06:01 UTC, 0 replies.
- [jira] Created: (NUTCH-160) Use standard Java Regex library rather than org.apache.oro.text.regex - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/12/31 19:14:02 UTC, 0 replies.
- [jira] Updated: (NUTCH-160) Use standard Java Regex library rather than org.apache.oro.text.regex - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/12/31 19:14:03 UTC, 0 replies.
- [jira] Commented: (NUTCH-160) Use standard Java Regex library rather than org.apache.oro.text.regex - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/12/31 21:09:01 UTC, 0 replies.
- [jira] Commented: (NUTCH-123) Cache.jsp some times generate NullPointerException - posted by "byron miller (JIRA)" <ji...@apache.org> on 2005/12/31 21:45:01 UTC, 0 replies.
- [jira] Commented: (NUTCH-42) enhance search.jsp such that it can also returns XML - posted by "byron miller (JIRA)" <ji...@apache.org> on 2005/12/31 21:57:01 UTC, 0 replies.
- [jira] Closed: (NUTCH-42) enhance search.jsp such that it can also returns XML - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2005/12/31 22:34:00 UTC, 0 replies.
- how to add additional factor at search time to ranking score - posted by AJ Chen <ca...@gmail.com> on 2005/12/31 23:50:19 UTC, 0 replies.