- [jira] Commented: (NUTCH-123) Cache.jsp some times generate NullPointerException - posted by "YourSoft (JIRA)" <ji...@apache.org> on 2006/01/01 11:40:01 UTC, 0 replies.
- Re: Mega-cleanup in trunk/ - posted by Piotr Kosiorowski <pk...@gmail.com> on 2006/01/01 20:46:25 UTC, 1 replies.
- Re: how to add additional factor at search time to ranking score - posted by Piotr Kosiorowski <pk...@gmail.com> on 2006/01/01 21:56:26 UTC, 0 replies.
- [jira] Commented: (NUTCH-142) NutchConf should use the thread context classloader - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2006/01/01 22:11:00 UTC, 0 replies.
- [jira] Commented: (NUTCH-138) non-Latin-1 characters cannot be submitted for search - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2006/01/02 13:02:02 UTC, 3 replies.
- [jira] Commented: (NUTCH-159) Specify temp/working directory for crawl - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/02 18:53:01 UTC, 2 replies.
- Re: svn commit: r359822 - in /lucene/nutch/trunk: bin/ conf/ src/java/org/apache/nutch/crawl/ src/java/org/apache/nutch/fetcher/ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/parse/ src/java/org/apache/nutch/segment/ src/java/org/apache/nutc... - posted by Doug Cutting <cu...@nutch.org> on 2006/01/02 19:39:30 UTC, 3 replies.
- Re: java.io.IOException: Job failed - posted by Doug Cutting <cu...@nutch.org> on 2006/01/02 19:51:25 UTC, 0 replies.
- Re: Bug in DeleteDuplicates.java ? - posted by Doug Cutting <cu...@nutch.org> on 2006/01/02 19:57:56 UTC, 0 replies.
- Re: [bug?] PRC called emthod require parameter - posted by Doug Cutting <cu...@nutch.org> on 2006/01/02 20:18:57 UTC, 4 replies.
- Re: IndexSorter optimizer - posted by Doug Cutting <cu...@nutch.org> on 2006/01/02 20:50:04 UTC, 9 replies.
- [jira] Closed: (NUTCH-138) non-Latin-1 characters cannot be submitted for search - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2006/01/02 21:08:01 UTC, 0 replies.
- [jira] Created: (NUTCH-161) Plain text parser should use parser.character.encoding.default property for fall back encoding - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2006/01/03 00:36:01 UTC, 0 replies.
- NullPointerException (new as of Dec 31st) - posted by Rod Taylor <rb...@sitesell.com> on 2006/01/03 03:25:44 UTC, 1 replies.
- LogFormatter - posted by Daniel Feinstein <da...@rawsugar.com> on 2006/01/03 14:58:08 UTC, 3 replies.
- Nutch-87 Setup - posted by Neal Whitley <ne...@e-travelmedia.com> on 2006/01/03 19:51:49 UTC, 2 replies.
- [jira] Commented: (NUTCH-87) Efficient site-specific crawling for a large number of sites - posted by "Neal Whitley (JIRA)" <ji...@apache.org> on 2006/01/03 20:00:01 UTC, 1 replies.
- [jira] Created: (NUTCH-162) country code "jp" is used instead of language code "ja" for Japanese - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2006/01/03 23:14:00 UTC, 0 replies.
- [jira] Commented: (NUTCH-162) country code "jp" is used instead of language code "ja" for Japanese - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2006/01/03 23:42:00 UTC, 2 replies.
- NegativeArraySizeException in search server - posted by Gal Nitzan <gn...@usa.net> on 2006/01/04 02:00:37 UTC, 0 replies.
- Adding some theory & publication links into the Wiki.. - posted by Byron Miller <by...@yahoo.com> on 2006/01/04 05:12:26 UTC, 0 replies.
- mapred crawling exception - Job failed! - posted by Lukas Vlcek <lu...@gmail.com> on 2006/01/04 08:10:46 UTC, 15 replies.
- [bug] Re: NegativeArraySizeException in search server - posted by Marko Bauhardt <mb...@media-style.com> on 2006/01/04 10:57:57 UTC, 2 replies.
- [jira] Created: (NUTCH-163) LogFormatter design - posted by "Daniel Feinstein (JIRA)" <ji...@apache.org> on 2006/01/04 14:46:00 UTC, 0 replies.
- Problem crawling with nutch-08 - posted by Rozina Sorathia <Ro...@KPITCummins.com> on 2006/01/04 15:18:27 UTC, 0 replies.
- no static NutchConf - posted by Stefan Groschupf <sg...@media-style.com> on 2006/01/04 15:39:38 UTC, 26 replies.
- [jira] Created: (NUTCH-164) Locale (language) choice by first session has global effect to all sessions - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2006/01/04 19:53:00 UTC, 0 replies.
- [jira] Commented: (NUTCH-39) pagination in search result - posted by "Neal Whitley (JIRA)" <ji...@apache.org> on 2006/01/04 20:14:01 UTC, 1 replies.
- [jira] Commented: (NUTCH-164) Locale (language) choice by first session has global effect to all sessions - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2006/01/04 20:29:01 UTC, 0 replies.
- Re: svn commit: r365850 - in /lucene/nutch/trunk/src/plugin/protocol-httpclient: ./ lib/ src/java/org/apache/nutch/protocol/httpclient/ - posted by Piotr Kosiorowski <pk...@gmail.com> on 2006/01/04 20:33:47 UTC, 1 replies.
- [jira] Closed: (NUTCH-142) NutchConf should use the thread context classloader - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2006/01/04 21:38:01 UTC, 0 replies.
- injection infinite loop - posted by Andy Liu <an...@gmail.com> on 2006/01/04 22:57:45 UTC, 1 replies.
- [jira] Updated: (NUTCH-163) LogFormatter design - posted by "Daniel Feinstein (JIRA)" <ji...@apache.org> on 2006/01/05 09:02:00 UTC, 0 replies.
- [jira] Commented: (NUTCH-163) LogFormatter design - posted by "nutch.newbie (JIRA)" <ji...@apache.org> on 2006/01/05 09:16:01 UTC, 3 replies.
- Per-page crawling policy - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/01/05 14:58:44 UTC, 14 replies.
- Re: [VOTE] Commiter access for Stefan Groschupf - posted by Doug Cutting <cu...@nutch.org> on 2006/01/05 19:46:27 UTC, 2 replies.
- [jira] Resolved: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker. - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/05 21:16:27 UTC, 0 replies.
- Re: problems http-client - posted by Doug Cutting <cu...@nutch.org> on 2006/01/05 21:34:32 UTC, 6 replies.
- [jira] Resolved: (NUTCH-131) Non-documented variable: mapred.child.heap.size - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/05 22:09:19 UTC, 0 replies.
- Re: GNU Getopt - posted by Doug Cutting <cu...@nutch.org> on 2006/01/05 22:12:34 UTC, 0 replies.
- Normalizing URLs with anchors - posted by Ken Krugler <kk...@transpac.com> on 2006/01/05 22:40:07 UTC, 2 replies.
- [jira] Updated: (NUTCH-139) Standard metadata property names in the ParseData metadata - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2006/01/05 23:07:19 UTC, 6 replies.
- [jira] Commented: (NUTCH-153) TextParser is only supposed to parse plain text, but if given postscript, it can take hours and then fail - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2006/01/05 23:12:19 UTC, 5 replies.
- [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/01/05 23:30:19 UTC, 32 replies.
- creating MapFiles from unsorted data? - posted by Matt Kangas <ka...@gmail.com> on 2006/01/06 18:11:29 UTC, 2 replies.
- Re: svn commit: r366550 - /lucene/nutch/trunk/src/java/org/apache/nutch/ipc/Client.java - posted by Stefan Groschupf <sg...@media-style.com> on 2006/01/06 20:16:35 UTC, 0 replies.
- Re: Class Cast exception - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/01/06 20:27:19 UTC, 7 replies.
- [jira] Commented: (NUTCH-160) Use standard Java Regex library rather than org.apache.oro.text.regex - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/06 20:38:15 UTC, 0 replies.
- Re: Adaptive fetch interval & unmodified content detection, episode II - posted by Doug Cutting <cu...@nutch.org> on 2006/01/06 20:43:25 UTC, 0 replies.
- [jira] Commented: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/06 22:26:25 UTC, 1 replies.
- [jira] Resolved: (NUTCH-151) CommandRunner can hang after the main thread exec is finished and has inefficient busy loop - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/06 22:37:15 UTC, 1 replies.
- [jira] Resolved: (NUTCH-150) OutlinkExtractor extremely slow on some non-plain text - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/06 22:44:15 UTC, 0 replies.
- Nutch Deployment - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2006/01/06 23:55:43 UTC, 1 replies.
- Reporter interface - posted by Andrew McNabb <am...@mcnabbs.org> on 2006/01/07 00:43:43 UTC, 9 replies.
- [jira] Created: (NUTCH-166) secure jobtracker info pages with a password - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/01/08 00:08:20 UTC, 0 replies.
- [jira] Updated: (NUTCH-166) secure jobtracker info pages with a password - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/01/08 00:10:22 UTC, 0 replies.
- NPE in Indexer.java line 184 - posted by Gal Nitzan <gn...@usa.net> on 2006/01/08 01:18:40 UTC, 6 replies.
- [jira] Created: (NUTCH-167) Observation of directive - posted by "Ed Whittaker (JIRA)" <ji...@apache.org> on 2006/01/08 03:44:22 UTC, 0 replies.
- [jira] Updated: (NUTCH-167) Observation of directive - posted by "Ed Whittaker (JIRA)" <ji...@apache.org> on 2006/01/08 03:46:26 UTC, 0 replies.
- test suite fails? - posted by Stefan Groschupf <sg...@media-style.com> on 2006/01/08 20:02:10 UTC, 2 replies.
- What/how num of required maps is set? - posted by Gal Nitzan <gn...@usa.net> on 2006/01/09 11:07:14 UTC, 0 replies.
- Re: What/how num of required maps is set? OOP Wrong list - posted by Gal Nitzan <gn...@usa.net> on 2006/01/09 12:19:33 UTC, 0 replies.
- why index not in segment anymore - posted by Stefan Groschupf <sg...@media-style.com> on 2006/01/09 13:59:08 UTC, 1 replies.
- XmlInputFortmat ? - posted by Jack Tang <hi...@gmail.com> on 2006/01/09 17:16:42 UTC, 1 replies.
- Crawl and parse exceptions - posted by Matt Zytaruk <ma...@wavefire.com> on 2006/01/09 19:00:46 UTC, 3 replies.
- Re: svn commit: r367137 - in /lucene/nutch/trunk/src: java/org/apache/nutch/net/protocols/ plugin/ plugin/lib-http/ plugin/lib-http/src/ plugin/lib-http/src/java/ plugin/lib-http/src/java/org/ plugin/lib-http/src/java/org/apache/ plugin/lib-http/src/java/o... - posted by Doug Cutting <cu...@nutch.org> on 2006/01/09 19:55:40 UTC, 0 replies.
- wiki:commandline options classpaths - posted by Jerry Russell <Je...@activereasoning.com> on 2006/01/09 20:20:02 UTC, 1 replies.
- Re: svn commit: r367137 - in /lucene/nutch/trunk/src: java/org/apache/nutch/net/protocols/ plugin/ plugin/lib-http/ plugin/lib-http/src/ plugin/lib-http/src/java/ plugin/lib-http/src/java/org/ plugin/lib-http/src/java/org/apache/ plugin/lib-http/src/ - posted by Jérôme Charron <je...@gmail.com> on 2006/01/09 22:08:21 UTC, 0 replies.
- [jira] Resolved: (NUTCH-160) Use standard Java Regex library rather than org.apache.oro.text.regex - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/09 23:00:34 UTC, 0 replies.
- [jira] Created: (NUTCH-168) setting http.content.limit to -1 seems to break text parsing on some files - posted by "Jerry Russell (JIRA)" <ji...@apache.org> on 2006/01/09 23:14:35 UTC, 0 replies.
- HTMLMetaProcessor a bug? - posted by Gal Nitzan <gn...@usa.net> on 2006/01/10 01:20:47 UTC, 4 replies.
- NDFS / map tasks - posted by Byron Miller <by...@yahoo.com> on 2006/01/10 03:39:02 UTC, 0 replies.
- [jira] Created: (NUTCH-169) remove static NutchConf - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/01/10 13:42:21 UTC, 0 replies.
- [jira] Updated: (NUTCH-169) remove static NutchConf - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/01/10 13:53:20 UTC, 5 replies.
- [jira] Reopened: (NUTCH-151) CommandRunner can hang after the main thread exec is finished and has inefficient busy loop - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2006/01/10 14:29:20 UTC, 0 replies.
- [jira] Updated: (NUTCH-151) CommandRunner can hang after the main thread exec is finished and has inefficient busy loop - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2006/01/10 14:31:20 UTC, 0 replies.
- [jira] Commented: (NUTCH-169) remove static NutchConf - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/01/10 16:45:20 UTC, 11 replies.
- fetch of XXX failed with: java.lang.ClassCastException: java.util.ArrayList - posted by Gal Nitzan <gn...@usa.net> on 2006/01/10 17:16:46 UTC, 1 replies.
- ParserFactory test fail - posted by Stefan Groschupf <sg...@media-style.com> on 2006/01/10 18:07:27 UTC, 2 replies.
- [jira] Created: (NUTCH-170) Crash with multiple temp directories - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2006/01/10 18:45:19 UTC, 0 replies.
- [jira] Commented: (NUTCH-170) Crash with multiple temp directories - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2006/01/10 18:47:20 UTC, 1 replies.
- OpenOffice and Excel parsers - posted by Rida Benjelloun <ri...@doculibre.com> on 2006/01/10 20:21:31 UTC, 1 replies.
- [jira] Commented: (NUTCH-151) CommandRunner can hang after the main thread exec is finished and has inefficient busy loop - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2006/01/10 23:33:20 UTC, 0 replies.
- PluginManifestParser should be NutchConfigurable - posted by Jack Tang <hi...@gmail.com> on 2006/01/11 16:15:37 UTC, 0 replies.
- weird fetcher behavior - posted by Florent Gluck <fl...@busytonight.com> on 2006/01/11 16:17:33 UTC, 2 replies.
- Problem with latest SVN during reduce phase - posted by Byron Miller <by...@yahoo.com> on 2006/01/11 20:31:55 UTC, 8 replies.
- [jira] Created: (NUTCH-171) Bring back multiple segment support for Generate / Update - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2006/01/12 00:27:19 UTC, 0 replies.
- [jira] Updated: (NUTCH-171) Bring back multiple segment support for Generate / Update - posted by "Rod Taylor (JIRA)" <ji...@apache.org> on 2006/01/12 00:27:20 UTC, 0 replies.
- Bug - Freezes if the last line in the url file does not finish with EOL symbol - posted by Mike Alulin <mi...@yahoo.com> on 2006/01/12 01:05:49 UTC, 0 replies.
- [jira] Commented: (NUTCH-171) Bring back multiple segment support for Generate / Update - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/12 01:07:20 UTC, 1 replies.
- Does the data size in 0.8 vesion should be much smaller than in version 0.7? - posted by Rafi Iz <ra...@hotmail.com> on 2006/01/12 04:29:55 UTC, 0 replies.
- Speed up searching - posted by YourSoft <yo...@freemail.hu> on 2006/01/12 15:46:36 UTC, 0 replies.
- NutchQuery adding non required Terms - posted by Stefan Groschupf <sg...@media-style.com> on 2006/01/12 16:10:54 UTC, 3 replies.
- Where is org.apache.nutch.protocol.http.api.HttpBase? - posted by Jack Tang <hi...@gmail.com> on 2006/01/12 18:06:12 UTC, 1 replies.
- quit the maillist - posted by Su Yan <su...@gmail.com> on 2006/01/12 18:46:51 UTC, 0 replies.
- MapReduce and segment merging - posted by Mike Alulin <mi...@yahoo.com> on 2006/01/12 18:51:32 UTC, 5 replies.
- [jira] Created: (NUTCH-172) Segment merger - posted by "Mike Alulin (JIRA)" <ji...@apache.org> on 2006/01/12 20:56:20 UTC, 0 replies.
- [jira] Updated: (NUTCH-87) Efficient site-specific crawling for a large number of sites - posted by "Matt Kangas (JIRA)" <ji...@apache.org> on 2006/01/13 02:52:20 UTC, 2 replies.
- Nutch/Lucene Document Model - posted by Chih How Bong <ch...@gmail.com> on 2006/01/13 03:15:10 UTC, 0 replies.
- java.io.EOFException ... at org.apache.nutch.ndfs.DataNode$DataXceiver.run... - posted by Rafi Iz <ra...@hotmail.com> on 2006/01/13 07:36:16 UTC, 0 replies.
- [jira] Created: (NUTCH-173) PerHost Crawling Policy ( crawl.ignore.external.links ) - posted by "Philippe EUGENE (JIRA)" <ji...@apache.org> on 2006/01/13 10:19:19 UTC, 0 replies.
- [jira] Updated: (NUTCH-173) PerHost Crawling Policy ( crawl.ignore.external.links ) - posted by "Philippe EUGENE (JIRA)" <ji...@apache.org> on 2006/01/13 10:21:20 UTC, 1 replies.
- Generating multiple fetchlists between updates - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/01/13 14:31:16 UTC, 1 replies.
- number of block duplicated - posted by Stefan Groschupf <sg...@media-style.com> on 2006/01/13 19:57:06 UTC, 4 replies.
- Suggestions on plugin repository - posted by Thomas Jaeger <nu...@thjaeger.org> on 2006/01/14 12:01:44 UTC, 2 replies.
- [jira] Created: (NUTCH-174) Problem encountered with ant during compilation - posted by "Matthias Günter (JIRA)" <ji...@apache.org> on 2006/01/14 16:09:20 UTC, 0 replies.
- [jira] Closed: (NUTCH-174) Problem encountered with ant during compilation - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2006/01/14 19:17:44 UTC, 0 replies.
- [jira] Created: (NUTCH-175) No input directories specified in: while crawing in nightly build from the 14.1.2006: sh ./nutch crawl urllist.txt -dir tmpdir - posted by "Matthias Günter (JIRA)" <ji...@apache.org> on 2006/01/14 21:07:20 UTC, 1 replies.
- [jira] Created: (NUTCH-176) Using -dir: creates an error, when the directory already exists - posted by "Matthias Günter (JIRA)" <ji...@apache.org> on 2006/01/15 14:10:20 UTC, 0 replies.
- [jira] Created: (NUTCH-177) Default installation seems to produce working entity of nutch - posted by "Matthias Günter (JIRA)" <ji...@apache.org> on 2006/01/15 14:20:19 UTC, 0 replies.
- [jira] Updated: (NUTCH-177) Default installation seems to produce working entity of nutch - posted by "Matthias Günter (JIRA)" <ji...@apache.org> on 2006/01/15 14:22:20 UTC, 1 replies.
- [jira] Created: (NUTCH-178) in search.jsp must be session creation "false" - posted by "YourSoft (JIRA)" <ji...@apache.org> on 2006/01/15 15:17:19 UTC, 0 replies.
- Seperating mapred/ndfs and nutch search engine - posted by Dominik Friedrich <do...@wipe-records.org> on 2006/01/15 18:43:00 UTC, 0 replies.
- [jira] Created: (NUTCH-179) Proposition: Enable Nutch to use a parser plugin not just based on content type - posted by "Gal Nitzan (JIRA)" <ji...@apache.org> on 2006/01/15 23:37:19 UTC, 0 replies.
- [jira] Updated: (NUTCH-179) Proposition: Enable Nutch to use a parser plugin not just based on content type - posted by "Gal Nitzan (JIRA)" <ji...@apache.org> on 2006/01/15 23:40:20 UTC, 2 replies.
- [jira] Created: (NUTCH-180) Performance problem with widely used keywords - posted by "Mike Alulin (JIRA)" <ji...@apache.org> on 2006/01/16 00:59:19 UTC, 0 replies.
- [jira] Updated: (NUTCH-180) Performance problem with widely used keywords - posted by "Mike Alulin (JIRA)" <ji...@apache.org> on 2006/01/16 01:01:20 UTC, 0 replies.
- [jira] Commented: (NUTCH-180) Performance problem with widely used keywords - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/01/16 08:06:21 UTC, 0 replies.
- [jira] Created: (NUTCH-181) mapred.local.dir temp dir. space allocation limited by smallest area - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2006/01/17 02:11:19 UTC, 0 replies.
- question/suggestion on nutch file format - posted by Tom <ap...@yahoo.com> on 2006/01/17 05:50:52 UTC, 0 replies.
- Class MultiProperties - posted by Rida Benjelloun <ri...@doculibre.com> on 2006/01/17 22:58:25 UTC, 0 replies.
- Pagination for the Web App - posted by Tyrell Perera <ty...@gmail.com> on 2006/01/18 06:49:25 UTC, 0 replies.
- [jira] Updated: (NUTCH-102) jobtracker does not start when webapps is in src - posted by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/01/18 22:38:42 UTC, 0 replies.
- [jira] Commented: (NUTCH-102) jobtracker does not start when webapps is in src - posted by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/01/18 22:42:42 UTC, 0 replies.
- [jira] Resolved: (NUTCH-102) jobtracker does not start when webapps is in src - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/18 23:05:42 UTC, 0 replies.
- [jira] Commented: (NUTCH-136) mapreduce segment generator generates 50 % less than excepted urls - posted by "Dominik Friedrich (JIRA)" <ji...@apache.org> on 2006/01/19 02:09:43 UTC, 4 replies.
- [jira] Commented: (NUTCH-117) Crawl crashes with java.io.IOException: already exists: C:\nutch\crawl.intranet\oct18\db\webdb.new\pagesByURL - posted by "Spike Wang (JIRA)" <ji...@apache.org> on 2006/01/19 05:02:47 UTC, 1 replies.
- [jira] Closed: (NUTCH-179) Proposition: Enable Nutch to use a parser plugin not just based on content type - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/19 20:40:43 UTC, 0 replies.
- [jira] Resolved: (NUTCH-177) Default installation seems to produce working entity of nutch - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/19 21:46:42 UTC, 0 replies.
- [jira] Resolved: (NUTCH-176) Using -dir: creates an error, when the directory already exists - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/19 21:48:42 UTC, 0 replies.
- [jira] Updated: (NUTCH-136) mapreduce segment generator generates 50 % less than excepted urls - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/19 22:37:43 UTC, 0 replies.
- [jira] Commented: (NUTCH-173) PerHost Crawling Policy ( crawl.ignore.external.links ) - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/19 22:45:42 UTC, 1 replies.
- Authentication / Content-type - posted by Thushara Wijeratna <Th...@revenuescience.com> on 2006/01/19 23:08:14 UTC, 0 replies.
- [jira] Created: (NUTCH-182) Log when db.max configuration limits reached - posted by "Matt Kangas (JIRA)" <ji...@apache.org> on 2006/01/20 02:48:41 UTC, 0 replies.
- [jira] Updated: (NUTCH-182) Log when db.max configuration limits reached - posted by "Matt Kangas (JIRA)" <ji...@apache.org> on 2006/01/20 03:30:42 UTC, 0 replies.
- Patch for NDFS's df.java - posted by Dominik Friedrich <do...@wipe-records.org> on 2006/01/20 12:54:04 UTC, 2 replies.
- [jira] Commented: (NUTCH-134) Summarizer doesn't select the best snippets - posted by "byron miller (JIRA)" <ji...@apache.org> on 2006/01/20 14:31:42 UTC, 0 replies.
- Link index & page content obtained separately. - posted by Pashabhai <pa...@yahoo.com> on 2006/01/20 14:33:11 UTC, 0 replies.
- [jira] Commented: (NUTCH-68) A tool to generate arbitrary fetchlists - posted by "byron miller (JIRA)" <ji...@apache.org> on 2006/01/20 15:02:42 UTC, 0 replies.
- lang identifier and nutch analyzer in trunk - posted by Jack Tang <hi...@gmail.com> on 2006/01/20 17:37:32 UTC, 15 replies.
- [jira] Updated: (NUTCH-183) MapReduce has a series of problems concerning task-allocation to worker nodes - posted by "Mike Cafarella (JIRA)" <ji...@apache.org> on 2006/01/20 21:42:42 UTC, 0 replies.
- [jira] Created: (NUTCH-183) MapReduce has a series of problems concerning task-allocation to worker nodes - posted by "Mike Cafarella (JIRA)" <ji...@apache.org> on 2006/01/20 21:42:42 UTC, 0 replies.
- [jira] Commented: (NUTCH-183) MapReduce has a series of problems concerning task-allocation to worker nodes - posted by "byron miller (JIRA)" <ji...@apache.org> on 2006/01/21 00:10:42 UTC, 5 replies.
- tool to mount nutch filesystem - posted by John X <jo...@neasys.com> on 2006/01/21 01:55:17 UTC, 5 replies.
- [jira] Closed: (NUTCH-45) Log corrupt segments in SegmentMergeTool - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2006/01/21 08:56:42 UTC, 0 replies.
- Using org.apache.nutch.indexer.IndexMerger (Nutch 0.7) - posted by Chun Wei Ho <cw...@gmail.com> on 2006/01/23 03:56:09 UTC, 0 replies.
- [jira] Commented: (NUTCH-127) uncorrect values using -du, or ls does not return items - posted by "Mike Cafarella (JIRA)" <ji...@apache.org> on 2006/01/23 05:54:04 UTC, 0 replies.
- protocol-httpclient; maximum total connections - posted by or...@agmlab.com on 2006/01/23 15:00:17 UTC, 1 replies.
- [jira] Resolved: (NUTCH-127) uncorrect values using -du, or ls does not return items - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/01/23 19:48:09 UTC, 0 replies.
- xml-parser plugin contribution - posted by Rida Benjelloun <ri...@doculibre.com> on 2006/01/24 05:09:42 UTC, 3 replies.
- patch for nutch and nutch-daemon.sh - posted by Zaheed Haque <za...@gmail.com> on 2006/01/24 08:18:22 UTC, 0 replies.
- Nutch merge problem after fetch is aborted with hung threads. - posted by Lukas Vlcek <lu...@gmail.com> on 2006/01/24 10:11:07 UTC, 0 replies.
- Two possible extensions - posted by "Guenter, Matthias" <Ma...@ipi.ch> on 2006/01/24 11:08:19 UTC, 2 replies.
- [jira] Created: (NUTCH-184) Serbian (sr, Cyrilic) and Serbo-Croatian (sh, Latin) translation - posted by "Ivan Sekulovic (JIRA)" <ji...@apache.org> on 2006/01/24 12:21:09 UTC, 0 replies.
- [jira] Updated: (NUTCH-184) Serbian (sr, Cyrilic) and Serbo-Croatian (sh, Latin) translation - posted by "Ivan Sekulovic (JIRA)" <ji...@apache.org> on 2006/01/24 12:21:10 UTC, 0 replies.
- [jira] Created: (NUTCH-185) XMLParser is configurable plugin. It use XPath and namespaces to do the mapping between the XML elements and Lucene fields. - posted by "Rida Benjelloun (JIRA)" <ji...@apache.org> on 2006/01/24 17:47:09 UTC, 0 replies.
- [jira] Updated: (NUTCH-185) XMLParser is configurable plugin. It use XPath and namespaces to do the mapping between the XML elements and Lucene fields. - posted by "Rida Benjelloun (JIRA)" <ji...@apache.org> on 2006/01/24 17:49:09 UTC, 0 replies.
- [jira] Created: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml - posted by "Gal Nitzan (JIRA)" <ji...@apache.org> on 2006/01/24 23:05:10 UTC, 0 replies.
- [jira] Closed: (NUTCH-136) mapreduce segment generator generates 50 % less than excepted urls - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/01/24 23:20:10 UTC, 0 replies.
- [jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/01/24 23:23:09 UTC, 5 replies.
- [jira] Updated: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml - posted by "Gal Nitzan (JIRA)" <ji...@apache.org> on 2006/01/25 01:36:09 UTC, 1 replies.
- [jira] Created: (NUTCH-187) Run Nutch on Windows without Cygwin - posted by "Dominik Friedrich (JIRA)" <ji...@apache.org> on 2006/01/25 09:44:09 UTC, 0 replies.
- [jira] Updated: (NUTCH-187) Run Nutch on Windows without Cygwin - posted by "Dominik Friedrich (JIRA)" <ji...@apache.org> on 2006/01/25 09:52:09 UTC, 0 replies.
- [jira] Commented: (NUTCH-187) Run Nutch on Windows without Cygwin - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/25 20:25:09 UTC, 0 replies.
- Re: Optimizing which links to fetch - posted by Doug Cutting <cu...@nutch.org> on 2006/01/25 20:31:44 UTC, 0 replies.
- Re: Ideas for enhancements - posted by Doug Cutting <cu...@nutch.org> on 2006/01/25 20:41:14 UTC, 0 replies.
- Re: Searchable mailing lists on nutch.org? - posted by Doug Cutting <cu...@nutch.org> on 2006/01/25 20:47:05 UTC, 0 replies.
- need volunteer to develop search for apache.org - posted by Doug Cutting <cu...@nutch.org> on 2006/01/25 22:24:26 UTC, 12 replies.
- [jira] Commented: (NUTCH-185) XMLParser is configurable plugin. It use XPath and namespaces to do the mapping between the XML elements and Lucene fields. - posted by "Philippe EUGENE (JIRA)" <ji...@apache.org> on 2006/01/26 15:42:10 UTC, 1 replies.
- [jira] Created: (NUTCH-188) Add searchable mailing list links to http://lucene.apache.org/nutch/mailing_lists.html - posted by "Andy Liu (JIRA)" <ji...@apache.org> on 2006/01/26 17:50:10 UTC, 0 replies.
- [jira] Updated: (NUTCH-188) Add searchable mailing list links to http://lucene.apache.org/nutch/mailing_lists.html - posted by "Andy Liu (JIRA)" <ji...@apache.org> on 2006/01/26 17:52:09 UTC, 0 replies.
- [jira] Created: (NUTCH-189) Injection infinite loop - posted by "Andy Liu (JIRA)" <ji...@apache.org> on 2006/01/26 17:54:11 UTC, 0 replies.
- A Nutch config editor... - posted by Dominik Friedrich <do...@wipe-records.org> on 2006/01/26 18:57:18 UTC, 2 replies.
- [jira] Commented: (NUTCH-59) meta data support in webdb - posted by "James Jonas (JIRA)" <ji...@apache.org> on 2006/01/26 20:36:09 UTC, 3 replies.
- [jira] Created: (NUTCH-190) ParseUtil drops reason for failed parse - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2006/01/26 23:33:10 UTC, 0 replies.
- [jira] Updated: (NUTCH-190) ParseUtil drops reason for failed parse - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2006/01/26 23:35:10 UTC, 0 replies.
- [jira] Commented: (NUTCH-190) ParseUtil drops reason for failed parse - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2006/01/26 23:43:09 UTC, 1 replies.
- [jira] Closed: (NUTCH-190) ParseUtil drops reason for failed parse - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/01/27 00:49:34 UTC, 0 replies.
- Re: [Nutch-cvs] svn commit: r372810 - /lucene/nutch/trunk/bin/nutch - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/01/27 12:01:47 UTC, 3 replies.
- Re: svn commit: r372810 - /lucene/nutch/trunk/bin/nutch - posted by Rod Taylor <rb...@sitesell.com> on 2006/01/27 15:23:19 UTC, 5 replies.
- older Nutch list archives (@sf.net)? - posted by "Gordon Mohr (archive.org)" <go...@archive.org> on 2006/01/27 22:45:18 UTC, 4 replies.
- [jira] Commented: (NUTCH-189) Injection infinite loop - posted by "Bryan Pendleton (JIRA)" <ji...@apache.org> on 2006/01/27 23:29:33 UTC, 0 replies.
- [jira] Updated: (NUTCH-189) Injection infinite loop - posted by "Bryan Pendleton (JIRA)" <ji...@apache.org> on 2006/01/27 23:33:33 UTC, 0 replies.
- Nutch - New Features (?) - posted by Fuad Efendi <fu...@efendi.ca> on 2006/01/28 06:28:36 UTC, 1 replies.
- [jira] Commented: (NUTCH-95) DeleteDuplicates depends on the order of input segments - posted by "byron miller (JIRA)" <ji...@apache.org> on 2006/01/29 02:32:33 UTC, 1 replies.
- [jira] Commented: (NUTCH-16) boost documents matching a url pattern - posted by "byron miller (JIRA)" <ji...@apache.org> on 2006/01/29 03:05:35 UTC, 0 replies.
- [jira] Commented: (NUTCH-79) Fault tolerant searching. - posted by "byron miller (JIRA)" <ji...@apache.org> on 2006/01/29 03:09:34 UTC, 1 replies.
- [jira] Commented: (NUTCH-14) NullPointerException NutchBean.getSummary - posted by "byron miller (JIRA)" <ji...@apache.org> on 2006/01/29 03:11:32 UTC, 1 replies.
- [bug] combiner class never used - posted by Stefan Groschupf <sg...@media-style.com> on 2006/01/30 01:41:53 UTC, 4 replies.
- CrawlDb and inputDir's - posted by Stefan Groschupf <sg...@media-style.com> on 2006/01/30 02:33:14 UTC, 0 replies.
- where we need meta data? - posted by Stefan Groschupf <sg...@media-style.com> on 2006/01/30 03:12:52 UTC, 2 replies.
- [jira] Assigned: (NUTCH-169) remove static NutchConf - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/01/30 21:50:33 UTC, 0 replies.
- [jira] Created: (NUTCH-191) InputFormat used in job must be in JobTracker classpath (not loaded from job JAR) - posted by "Bryan Pendleton (JIRA)" <ji...@apache.org> on 2006/01/30 23:00:32 UTC, 0 replies.
- [jira] Created: (NUTCH-192) meta data support for CrawlDatum - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/01/31 01:17:32 UTC, 0 replies.
- [jira] Updated: (NUTCH-192) meta data support for CrawlDatum - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/01/31 01:30:33 UTC, 0 replies.
- [jira] Commented: (NUTCH-192) meta data support for CrawlDatum - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/01/31 10:04:33 UTC, 4 replies.
- indexSorter - applied to SVN or patch in Jira? - posted by Byron Miller <by...@yahoo.com> on 2006/01/31 14:35:47 UTC, 1 replies.
- mapred: config parameters - posted by Michael Nebel <mi...@nebel.de> on 2006/01/31 15:52:20 UTC, 1 replies.
- [jira] Closed: (NUTCH-169) remove static NutchConf - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/01/31 17:10:34 UTC, 0 replies.
- [jira] Created: (NUTCH-193) move NDFS and MapReduce to a separate project - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/31 18:53:32 UTC, 0 replies.
- [jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/31 19:06:32 UTC, 6 replies.
- [jira] Commented: (NUTCH-191) InputFormat used in job must be in JobTracker classpath (not loaded from job JAR) - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/31 20:15:33 UTC, 0 replies.
- [jira] Commented: (NUTCH-44) too many search results - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/01/31 20:23:33 UTC, 0 replies.
- Re: CrawlDb and inputDir's - posted by Doug Cutting <cu...@nutch.org> on 2006/01/31 20:31:44 UTC, 1 replies.
- [Fwd: NutchCVS/0.8-dev] - posted by Doug Cutting <cu...@nutch.org> on 2006/01/31 20:33:08 UTC, 0 replies.
- Re: Lucene's VInt for lengths/counts/sizes - posted by Stefan Groschupf <sg...@media-style.com> on 2006/01/31 22:05:48 UTC, 2 replies.
- [jira] Created: (NUTCH-194) Nutch-169 introduced two tiny bugs - posted by "Marko Bauhardt (JIRA)" <ji...@apache.org> on 2006/01/31 22:35:32 UTC, 0 replies.
- [jira] Updated: (NUTCH-194) Nutch-169 introduced two tiny bugs - posted by "Marko Bauhardt (JIRA)" <ji...@apache.org> on 2006/01/31 22:36:35 UTC, 0 replies.