- Re: Refactoring some plugins - posted by Jérôme Charron <je...@gmail.com> on 2006/04/01 00:18:10 UTC, 0 replies.
- [jira] Updated: (NUTCH-240) Scoring API: extension point, scoring filters and an OPIC plugin - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/04/03 14:21:00 UTC, 2 replies.
- [jira] Closed: (NUTCH-238) NDFSck - fsck utility for NDFS (pre-Hadoop) - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/04/03 14:22:46 UTC, 0 replies.
- [jira] Closed: (NUTCH-230) OPIC score for outlinks should be based on # of valid links, not total # of links. - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/04/03 14:24:49 UTC, 0 replies.
- [jira] Commented: (NUTCH-240) Scoring API: extension point, scoring filters and an OPIC plugin - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2006/04/03 15:30:45 UTC, 4 replies.
- [jira] Assigned: (NUTCH-240) Scoring API: extension point, scoring filters and an OPIC plugin - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/04/03 15:48:02 UTC, 0 replies.
- Add ".settings" to svn:ignore on root Nutch folder? - posted by Dawid Weiss <da...@cs.put.poznan.pl> on 2006/04/04 10:19:53 UTC, 17 replies.
- [jira] Updated: (NUTCH-237) Carrot2 clustering plugin upgrade. - posted by "Dawid Weiss (JIRA)" <ji...@apache.org> on 2006/04/04 12:22:44 UTC, 0 replies.
- [jira] Created: (NUTCH-243) Some meta-refresh urls get ignored due to matching regular expression - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2006/04/04 23:35:46 UTC, 0 replies.
- Which nutch-site.xml wins? - posted by Chris Schneider <Sc...@TransPac.com> on 2006/04/05 04:34:46 UTC, 0 replies.
- [jira] Created: (NUTCH-244) Inconsistent handling of property values boundaries / unable to set db.max.outlinks.per.page to infinite - posted by "AJ Banck (JIRA)" <ji...@apache.org> on 2006/04/05 10:08:43 UTC, 0 replies.
- Search quality evaluation - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/04/05 13:22:37 UTC, 4 replies.
- Patch to fix Redirects - posted by Dennis Kubes <nu...@dragonflymc.com> on 2006/04/05 17:40:47 UTC, 2 replies.
- [jira] Commented: (NUTCH-244) Inconsistent handling of property values boundaries / unable to set db.max.outlinks.per.page to infinite - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2006/04/05 18:32:44 UTC, 2 replies.
- Patch to remove Nutch formating from logs - posted by Christopher Burkey <cb...@openedit.org> on 2006/04/05 23:11:35 UTC, 2 replies.
- [jira] Closed: (NUTCH-244) Inconsistent handling of property values boundaries / unable to set db.max.outlinks.per.page to infinite - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2006/04/06 19:07:09 UTC, 0 replies.
- PMD integration (was: Re: Add ".settings" to svn:ignore on root Nutch folder?) - posted by Piotr Kosiorowski <pk...@gmail.com> on 2006/04/06 21:24:30 UTC, 0 replies.
- 0.8 release schedule (was Re: latest build throws error - critical) - posted by Doug Cutting <cu...@apache.org> on 2006/04/06 21:25:50 UTC, 11 replies.
- Re: PMD integration - posted by Dawid Weiss <da...@cs.put.poznan.pl> on 2006/04/07 09:03:50 UTC, 12 replies.
- [Proposal] New Lucene sub-project - posted by Jérôme Charron <je...@gmail.com> on 2006/04/07 10:26:54 UTC, 6 replies.
- Entity � - posted by ma...@provinzial.com on 2006/04/07 11:28:58 UTC, 0 replies.
- CrawlDbReducer - selecting data for DB update - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/04/07 12:24:11 UTC, 1 reply.
- [jira] Created: (NUTCH-245) XML Schemas for xml configuration files in conf directory - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2006/04/07 22:13:23 UTC, 0 replies.
- [jira] Updated: (NUTCH-245) DTD Schemas for plugin.xml configuration files in conf directory - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2006/04/07 22:15:24 UTC, 2 replies.
- web ui improvement - posted by Sami Siren <ss...@gmail.com> on 2006/04/07 23:13:12 UTC, 2 replies.
- mapred branch - posted by Anton Potehin <an...@orbita1.ru> on 2006/04/10 12:06:56 UTC, 2 replies.
- image search - posted by Anton Potehin <an...@orbita1.ru> on 2006/04/10 13:42:17 UTC, 0 replies.
- Content-Type inconsistency? - posted by Jérôme Charron <je...@gmail.com> on 2006/04/10 23:08:29 UTC, 7 replies.
- nighly build brocken? - posted by Stefan Groschupf <sg...@media-style.com> on 2006/04/11 02:02:13 UTC, 3 replies.
- [jira] Created: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/04/11 15:52:19 UTC, 0 replies.
- [jira] Commented: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement - posted by "Chris Schneider (JIRA)" <ji...@apache.org> on 2006/04/11 17:13:26 UTC, 3 replies.
- Microformats Support - HReview - posted by mikeyc <mc...@gmail.com> on 2006/04/11 21:28:53 UTC, 2 replies.
- Swap with Nutch - posted by larryp <la...@hotmail.com> on 2006/04/12 00:17:54 UTC, 7 replies.
- [jira] Created: (NUTCH-247) robot parser to restrict. - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/04/12 03:47:19 UTC, 0 replies.
- NPE in CrawlDbReducer - posted by Marko Bauhardt <mb...@media-style.com> on 2006/04/12 16:55:02 UTC, 1 reply.
- 0.8 release? - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2006/04/12 18:33:41 UTC, 4 replies.
- [jira] Commented: (NUTCH-245) DTD Schemas for plugin.xml configuration files in conf directory - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/04/12 18:48:24 UTC, 1 reply.
- [jira] Updated: (NUTCH-245) DTD for plugin.xml configuration files - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2006/04/12 18:56:23 UTC, 0 replies.
- [jira] Updated: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement - posted by "Chris Schneider (JIRA)" <ji...@apache.org> on 2006/04/12 23:18:21 UTC, 2 replies.
- Duplicate Detection: Offlince vs. Search Time - posted by Shailesh Kochhar <ko...@uiuc.edu> on 2006/04/13 00:06:18 UTC, 3 replies.
- haddoop - posted by Anton Potehin <an...@orbita1.ru> on 2006/04/13 09:12:23 UTC, 0 replies.
- [jira] Commented: (NUTCH-245) DTD for plugin.xml configuration files - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2006/04/13 15:46:03 UTC, 0 replies.
- [ot] binary subversion diffs - posted by Stefan Groschupf <sg...@media-style.com> on 2006/04/13 23:04:00 UTC, 1 reply.
- Java Main Example - posted by Faisal Akeel <fa...@gmail.com> on 2006/04/14 16:18:14 UTC, 0 replies.
- Nutch calendar - posted by Jérôme Charron <je...@gmail.com> on 2006/04/14 23:41:09 UTC, 0 replies.
- [jira] Closed: (NUTCH-245) DTD for plugin.xml configuration files - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2006/04/15 02:00:01 UTC, 0 replies.
- Seacrh for keywords by url - posted by Richard Braman <rb...@bramantax.com> on 2006/04/15 17:01:05 UTC, 0 replies.
- [jira] Created: (NUTCH-248) add support for internationalized domain names - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/04/15 19:33:00 UTC, 0 replies.
- Can nutch fit to this task ? - posted by ahmed ghouzia <gh...@yahoo.com> on 2006/04/16 10:49:28 UTC, 0 replies.
- plugin.dtd - posted by Stefan Groschupf <sg...@media-style.com> on 2006/04/16 15:02:48 UTC, 2 replies.
- [jira] Created: (NUTCH-249) black- white list url filtering - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/04/17 16:02:40 UTC, 0 replies.
- [jira] Updated: (NUTCH-249) black- white list url filtering - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/04/17 16:18:17 UTC, 1 reply.
- Re: svn commit: r394228 - in /lucene/nutch/trunk: ./ src/java/org/apache/nutch/plugin/ src/plugin/ src/plugin/analysis-de/ src/plugin/analysis-fr/ src/plugin/clustering-carrot2/ src/plugin/creativecommons/ src/plugin/index-basic/ src/plugin/index-more/ src... - posted by Doug Cutting <cu...@apache.org> on 2006/04/17 19:10:25 UTC, 0 replies.
- question about crawldb - posted by Anton Potehin <an...@orbita1.ru> on 2006/04/18 13:36:11 UTC, 2 replies.
- Boost - posted by TDLN <di...@gmail.com> on 2006/04/18 18:27:48 UTC, 2 replies.
- jobtaraker and tasktracker - posted by Anton Potehin <an...@orbita1.ru> on 2006/04/19 15:27:20 UTC, 0 replies.
- Re: jobtaraker and tasktracker - posted by Doug Cutting <cu...@apache.org> on 2006/04/19 17:55:23 UTC, 0 replies.
- [jira] Created: (NUTCH-250) Generate to log truncation caused by generate.max.per.host - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2006/04/20 02:57:34 UTC, 0 replies.
- [jira] Updated: (NUTCH-250) Generate to log truncation caused by generate.max.per.host - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2006/04/20 02:57:35 UTC, 0 replies.
- mapred.map.tasks - posted by Anton Potehin <an...@orbita1.ru> on 2006/04/20 08:56:24 UTC, 3 replies.
- dfs filesystem - posted by Anton Potehin <an...@orbita1.ru> on 2006/04/20 09:03:55 UTC, 0 replies.
- [jira] Commented: (NUTCH-173) PerHost Crawling Policy ( crawl.ignore.external.links ) - posted by "Christophe Noel (JIRA)" <ji...@apache.org> on 2006/04/20 11:15:06 UTC, 1 reply.
- [jira] Resolved: (NUTCH-250) Generate to log truncation caused by generate.max.per.host - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/04/20 21:20:06 UTC, 0 replies.
- nutch user meeting in San Francisco: May 18th - posted by Stefan Groschupf <sg...@media-style.com> on 2006/04/21 01:14:13 UTC, 1 reply.
- refetching interval - posted by Michael Ji <fj...@yahoo.com> on 2006/04/21 22:25:41 UTC, 0 replies.
- [jira] Created: (NUTCH-251) Administration GUI - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/04/21 23:44:05 UTC, 0 replies.
- [jira] Updated: (NUTCH-251) Administration GUI - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/04/22 00:01:07 UTC, 3 replies.
- [jira] Created: (NUTCH-252) Launching a segread/readdb command kills any running nutch commands - posted by "Chris Schneider (JIRA)" <ji...@apache.org> on 2006/04/22 01:31:05 UTC, 0 replies.
- [jira] Commented: (NUTCH-251) Administration GUI - posted by "Zaheed Haque (JIRA)" <ji...@apache.org> on 2006/04/22 14:09:06 UTC, 0 replies.
- [jira] Created: (NUTCH-253) Normalize Host during Generate - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2006/04/24 05:38:05 UTC, 0 replies.
- [jira] Updated: (NUTCH-253) Normalize Host during Generate - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2006/04/24 05:40:05 UTC, 0 replies.
- update crawldb - posted by Anton Potehin <an...@orbita1.ru> on 2006/04/24 13:53:28 UTC, 0 replies.
- Errors in PluginManifestParser - posted by Dennis Kubes <nu...@dragonflymc.com> on 2006/04/24 22:28:50 UTC, 5 replies.
- [jira] Created: (NUTCH-254) Fetcher throws NullPointer if redirect URL is filtered - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2006/04/24 23:00:09 UTC, 0 replies.
- [jira] Updated: (NUTCH-254) Fetcher throws NullPointer if redirect URL is filtered - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2006/04/24 23:02:11 UTC, 0 replies.
- [jira] Closed: (NUTCH-254) Fetcher throws NullPointer if redirect URL is filtered - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/04/25 00:57:06 UTC, 0 replies.
- Search engine project - posted by om...@binde.net on 2006/04/25 10:00:01 UTC, 0 replies.
- [jira] Created: (NUTCH-255) Regular Expression for RegexUrlNormalizer to remove jsessionid - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2006/04/25 19:44:10 UTC, 0 replies.
- [jira] Updated: (NUTCH-255) Regular Expression for RegexUrlNormalizer to remove jsessionid - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2006/04/25 19:44:16 UTC, 0 replies.
- [jira] Closed: (NUTCH-125) OpenOffice Parser plugin - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/04/25 21:14:03 UTC, 0 replies.
- CrawlDatum.metaData should never be null - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/04/25 21:40:35 UTC, 4 replies.
- [jira] Updated: (NUTCH-243) Some meta-refresh urls get ignored due to matching regular expression - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2006/04/25 21:56:03 UTC, 0 replies.
- Nutch Parser Bug - posted by Alex <al...@yahoo.com> on 2006/04/25 23:41:22 UTC, 2 replies.
- exception - posted by Anton Potehin <an...@orbita1.ru> on 2006/04/26 10:31:09 UTC, 3 replies.
- [jira] Commented: (NUTCH-249) black- white list url filtering - posted by "Thomas Delnoij (JIRA)" <ji...@apache.org> on 2006/04/26 13:10:06 UTC, 3 replies.
- Nutch-18 illegal chars in urls: Not sure what the problem is - posted by Chris Fellows <cc...@sbcglobal.net> on 2006/04/26 22:15:05 UTC, 0 replies.
- [jira] Commented: (NUTCH-18) Windows servers include illegal characters in URLs - posted by "Chris Fellows (JIRA)" <ji...@apache.org> on 2006/04/26 23:22:24 UTC, 1 reply.
- Re: svn commit: r394228 - in /lucene/nutch/trunk: ./ src/java/org/apache/nutch/plugin/ src/plugin/ src/plugin/analysis-de/ src/plugin/analysis-fr/ src/plugin/clustering-carrot2/ src/plugin/creativecommons/ src/plugin/index-basic/ src/plugin/index-mor - posted by Jérôme Charron <je...@gmail.com> on 2006/04/26 23:45:06 UTC, 0 replies.
- [jira] Commented: (NUTCH-25) needs 'character encoding' detector - posted by "Chris Fellows (JIRA)" <ji...@apache.org> on 2006/04/27 01:59:03 UTC, 0 replies.
- Re: [Nutch-cvs] svn commit: r397320 - /lucene/nutch/trunk/src/plugin/parse-oo/plugin.xml - posted by Jérôme Charron <je...@gmail.com> on 2006/04/27 09:52:44 UTC, 0 replies.
- Analyze command? - posted by Tran Van Hung <tv...@yahoo.com> on 2006/04/27 13:47:56 UTC, 0 replies.
- TRUNK IllegalArgumentException: Argument is not an array (WAS: Re: exception) - posted by Michael Stack <st...@archive.org> on 2006/04/27 19:07:48 UTC, 1 reply.
- [jira] Created: (NUTCH-256) Cannot open filename ....index.done.crc - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2006/04/28 01:05:37 UTC, 0 replies.
- [jira] Updated: (NUTCH-256) Cannot open filename ....index.done.crc - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2006/04/28 01:08:37 UTC, 0 replies.
- [jira] Commented: (NUTCH-256) Cannot open filename ....index.done.crc - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/04/28 01:16:37 UTC, 3 replies.
- new parameters - posted by an...@orbita1.ru on 2006/04/28 09:12:13 UTC, 0 replies.
- [jira] Created: (NUTCH-257) Summary#toString always Entity encodes -- problem for OpenSearchServlet#description field - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2006/04/28 22:37:37 UTC, 0 replies.
- [jira] Created: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore - posted by "Scott Ganyo (JIRA)" <ji...@apache.org> on 2006/04/28 22:39:37 UTC, 0 replies.
- [jira] Commented: (NUTCH-257) Summary#toString always Entity encodes -- problem for OpenSearchServlet#description field - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/04/28 22:46:37 UTC, 1 reply.
- [jira] Resolved: (NUTCH-256) Cannot open filename ....index.done.crc - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/04/28 23:41:39 UTC, 0 replies.
- [jira] Created: (NUTCH-259) Problem in IndexSorter after dedup - posted by "Michael (JIRA)" <ji...@apache.org> on 2006/04/29 00:18:37 UTC, 0 replies.
- Re: CrawlDbReducer and the lone STATUS_SIGNATURE record - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/04/29 09:49:50 UTC, 1 reply.
- Php frontend - posted by Marco Pereira <ma...@gmail.com> on 2006/04/29 17:55:43 UTC, 0 replies.