- NekoHTML 0.9.5 - posted by Byron Miller <by...@yahoo.com> on 2005/11/01 14:34:24 UTC, 2 replies.
- Halloween Joke at Google - posted by Fuad Efendi <fu...@efendi.ca> on 2005/11/02 04:04:31 UTC, 12 replies.
- Crawling unpolite problem - posted by Christophe Noel <ch...@cetic.be> on 2005/11/03 10:11:51 UTC, 0 replies.
- defininig order of IndexingFilters / accessing Fields of passed Document - posted by "Mr. Udatny" <ru...@rosa.com> on 2005/11/03 17:18:19 UTC, 0 replies.
- Error on parser? new parser parse gif jpeg? - posted by Massimo Miccoli <mm...@iltrovatore.it> on 2005/11/03 19:20:57 UTC, 0 replies.
- mapred bug -- bad part calculation? - posted by Rod Taylor <rb...@sitesell.com> on 2005/11/03 21:32:08 UTC, 22 replies.
- Hiring a Nutch Developer - posted by Nathan Gwilliam <na...@gwilliam.com> on 2005/11/04 09:18:28 UTC, 2 replies.
- nutch cluster questions. - posted by Arsen Popovyan <sb...@orbita1.ru> on 2005/11/04 14:32:51 UTC, 1 replies.
- [jira] Created: (NUTCH-123) Cache.jsp some times generate NullPointerException - posted by "Lutischán Ferenc (JIRA)" <ji...@apache.org> on 2005/11/04 18:24:19 UTC, 0 replies.
- Re: mapred questions - posted by Doug Cutting <cu...@nutch.org> on 2005/11/04 22:47:29 UTC, 0 replies.
- [jira] Updated: (NUTCH-116) TestNDFS a JUnit test specifically for NDFS - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/11/04 23:30:19 UTC, 1 replies.
- [jira] Created: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/11/05 04:39:19 UTC, 0 replies.
- [jira] Commented: (NUTCH-99) ports are hardcoded or random - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/11/05 16:40:19 UTC, 5 replies.
- Javacc - posted by "Rajan, Renuka" <re...@navteq.com> on 2005/11/07 04:53:40 UTC, 2 replies.
- session support for nutch - posted by ma...@provinzial.com on 2005/11/07 08:26:45 UTC, 0 replies.
- java open source software for Tagging ? - posted by AJ Chen <ca...@gmail.com> on 2005/11/07 09:02:15 UTC, 0 replies.
- probem with inject url to db using ndfs - posted by Arsen Popovyan <sb...@orbita1.ru> on 2005/11/07 15:29:06 UTC, 1 replies.
- standard version of log4j - posted by Byron Miller <by...@yahoo.com> on 2005/11/07 16:08:11 UTC, 3 replies.
- [jira] Resolved: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/11/07 19:16:20 UTC, 0 replies.
- Site Query Filter Bug? - posted by Matt Zytaruk <ma...@wavefire.com> on 2005/11/07 20:14:35 UTC, 0 replies.
- Request for info regarding filesystem based index. - posted by Mike Reynols <au...@hotmail.com> on 2005/11/08 06:45:22 UTC, 0 replies.
- rank system - posted by Anton Potehin <an...@orbita1.ru> on 2005/11/08 11:22:43 UTC, 2 replies.
- index folder structure - posted by Marko Bauhardt <mb...@media-style.com> on 2005/11/08 13:40:02 UTC, 0 replies.
- questions - posted by Anton Potehin <an...@orbita1.ru> on 2005/11/08 14:31:20 UTC, 3 replies.
- mapred branch - posted by Andrew McNabb <am...@mcnabbs.org> on 2005/11/08 18:11:17 UTC, 1 replies.
- Index update and Google Dance - posted by Jack Tang <hi...@gmail.com> on 2005/11/08 18:38:07 UTC, 10 replies.
- [OTAnn] Feedback - posted by shenanigans <ma...@roomity.com> on 2005/11/08 19:29:07 UTC, 0 replies.
- mapreduce with large amounts of data - posted by Andrew McNabb <am...@mcnabbs.org> on 2005/11/08 20:16:03 UTC, 0 replies.
- Distributed nutch - posted by Rozina Sorathia <Ro...@KPITCummins.com> on 2005/11/09 12:36:29 UTC, 4 replies.
- Re: [Nutch-dev] [jira] Resolved: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt - posted by Massimo Miccoli <mm...@iltrovatore.it> on 2005/11/09 14:17:47 UTC, 1 replies.
- Lucene or Nutch - posted by Klaus <kl...@vommond.de> on 2005/11/09 14:48:35 UTC, 9 replies.
- protocol-http versus protocol-httpclient - posted by Doug Cutting <cu...@nutch.org> on 2005/11/09 19:19:18 UTC, 5 replies.
- [jira] Closed: (NUTCH-109) Nutch - Fetcher - Performance Test - new Protocol-HTTPClient-Innovation - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2005/11/09 21:03:03 UTC, 0 replies.
- [jira] Commented: (NUTCH-36) Chinese in Nutch - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2005/11/09 21:09:03 UTC, 0 replies.
- [jira] Commented: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt - posted by "Fuad Efendi (JIRA)" <ji...@apache.org> on 2005/11/10 04:30:05 UTC, 0 replies.
- What is suitable environment? - posted by KAAS INFOTECH <ar...@gmail.com> on 2005/11/10 06:44:40 UTC, 1 replies.
- Problem about method Query#query() - posted by Game Now <ga...@gmail.com> on 2005/11/10 09:37:07 UTC, 2 replies.
- Do nutch help me? - posted by Arun Kumar Sharma <sh...@yahoo.co.in> on 2005/11/10 10:04:03 UTC, 4 replies.
- problem with inject url on mapred - posted by Anton Potehin <an...@orbita1.ru> on 2005/11/10 12:33:14 UTC, 5 replies.
- Max Per Host and topN - posted by Rod Taylor <rb...@sitesell.com> on 2005/11/10 19:03:33 UTC, 2 replies.
- Re: [Nutch Wiki] Update of "PluginCentral" by JakeVanderdray - posted by Stefan Groschupf <sg...@media-style.com> on 2005/11/10 20:10:25 UTC, 0 replies.
- threading versus nio - posted by Doug Cutting <cu...@nutch.org> on 2005/11/10 21:03:19 UTC, 9 replies.
- [jira] Commented: (NUTCH-110) OpenSearchServlet outputs illegal xml characters - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2005/11/10 23:34:03 UTC, 0 replies.
- Fetch not finishing everything in its list? - posted by Rod Taylor <rb...@sitesell.com> on 2005/11/11 03:58:01 UTC, 2 replies.
- Require answer for configuration and other issues - posted by Arun Kumar Sharma <sh...@yahoo.co.in> on 2005/11/11 07:26:56 UTC, 0 replies.
- How to parse the content of password-protected site- OR Do Nutuch can parse the content of password protected site? - posted by Arun Kumar Sharma <sh...@yahoo.co.in> on 2005/11/11 10:48:33 UTC, 0 replies.
- Re: [Nutch Wiki] Update of "OverviewDeploymentConfigs" by PaulBaclace - posted by Stefan Groschupf <sg...@media-style.com> on 2005/11/11 13:45:20 UTC, 2 replies.
- Re: [Nutch-cvs] [Nutch Wiki] Update of "OverviewDeploymentConfigs" by PaulBaclace - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/11/11 17:24:42 UTC, 0 replies.
- mapSearcher was Re: Index update and Google Dance - posted by Stefan Groschupf <sg...@media-style.com> on 2005/11/11 18:32:04 UTC, 0 replies.
- [jira] Updated: (NUTCH-99) ports are hardcoded or random - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/11/11 19:03:03 UTC, 0 replies.
- Urlfilter Patch - posted by Rod Taylor <rb...@sitesell.com> on 2005/11/11 19:48:14 UTC, 3 replies.
- to find if a url is present in the nutch master index - posted by Rozina Sorathia <Ro...@KPITCummins.com> on 2005/11/12 07:00:42 UTC, 0 replies.
- lucene write.lock error - posted by Kashif Khadim <ka...@yahoo.com> on 2005/11/12 12:56:39 UTC, 1 replies.
- NDFS/Mapreduce questions - posted by Joanna Harpell <jo...@gmail.com> on 2005/11/13 00:33:37 UTC, 1 replies.
- mapper & Exceptions - posted by Stefan Groschupf <sg...@media-style.com> on 2005/11/13 17:28:36 UTC, 0 replies.
- Re: suspicious outlink count - posted by Piotr Kosiorowski <pk...@gmail.com> on 2005/11/13 21:13:15 UTC, 0 replies.
- InterruptedException from ControllerThreadSocketFactory.SocketTask - posted by Chris Schneider <Sc...@TransPac.com> on 2005/11/13 22:43:07 UTC, 0 replies.
- Question of Range search - posted by Game Now <ga...@gmail.com> on 2005/11/14 04:39:32 UTC, 1 replies.
- a defect of org.apache.nutch.analysis.NutchAnalysis ? - posted by Game Now <ga...@gmail.com> on 2005/11/14 04:46:26 UTC, 1 replies.
- Fetcher timeout - posted by Jonathan Reichhold <jd...@speakeasy.net> on 2005/11/14 18:40:18 UTC, 0 replies.
- [jira] Closed: (NUTCH-99) ports are hardcoded or random - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2005/11/14 23:41:28 UTC, 0 replies.
- Nutch WebDb storage alternatives: Revisited - posted by "Dalton, Jeffery" <jd...@globalspec.com> on 2005/11/15 23:15:59 UTC, 9 replies.
- what happened to NUTCH Doug - posted by "tigger ." <b1...@hotmail.com> on 2005/11/15 23:16:45 UTC, 0 replies.
- Issue with index-more and query-more plugins - posted by Jonathan Reichhold <jd...@speakeasy.net> on 2005/11/16 20:18:49 UTC, 2 replies.
- Log Newly Found Urls - Patch - posted by Rod Taylor <rb...@sitesell.com> on 2005/11/16 21:51:10 UTC, 0 replies.
- [Slightly off topic] A search interface for the next generation? - posted by Dawid Weiss <da...@cs.put.poznan.pl> on 2005/11/17 09:27:09 UTC, 0 replies.
- Expiry of a page in the Nutch database - posted by Rozina Sorathia <Ro...@KPITCummins.com> on 2005/11/17 12:29:47 UTC, 0 replies.
- [jira] Created: (NUTCH-125) OpenOffice Parser plugin - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2005/11/17 23:32:41 UTC, 0 replies.
- [jira] Updated: (NUTCH-125) OpenOffice Parser plugin - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2005/11/17 23:34:41 UTC, 0 replies.
- [jira] Created: (NUTCH-126) Fetching via https does not work with a proxy (patch) - posted by "Fritz Elfert (JIRA)" <ji...@apache.org> on 2005/11/18 10:14:41 UTC, 0 replies.
- [jira] Updated: (NUTCH-126) Fetching via https does not work with a proxy (patch) - posted by "Fritz Elfert (JIRA)" <ji...@apache.org> on 2005/11/18 10:14:42 UTC, 1 replies.
- [jira] Created: (NUTCH-127) uncorrect values using -du, or ls does not return items - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/11/18 16:37:41 UTC, 0 replies.
- Urlfilter bug (doesn't return on long URLs) - posted by Rod Taylor <rb...@sitesell.com> on 2005/11/19 05:01:28 UTC, 2 replies.
- Problem with CRC files on NDFS - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/11/19 17:24:16 UTC, 1 replies.
- About tomcat - posted by Anton Potehin <an...@orbita1.ru> on 2005/11/21 09:48:43 UTC, 0 replies.
- jobdetails.jsp and jobtracker.jsp - posted by an...@orbita1.ru on 2005/11/21 10:34:03 UTC, 5 replies.
- mapred.map.tasks - posted by an...@orbita1.ru on 2005/11/21 11:22:56 UTC, 0 replies.
- fetcher.thread.per.host not working ?? - posted by Christophe Noel <ch...@cetic.be> on 2005/11/21 14:22:32 UTC, 0 replies.
- merging auto-crawls - posted by Ben Halsted <bh...@gmail.com> on 2005/11/21 19:46:12 UTC, 0 replies.
- Re: mapred.map.tasks - posted by Doug Cutting <cu...@nutch.org> on 2005/11/22 00:09:46 UTC, 1 replies.
- Performance issues with ConjunctionScorer - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/11/22 12:49:45 UTC, 6 replies.
- ndfs / Lost connection to namenode - posted by "Mr. Udatny" <ru...@rosa.com> on 2005/11/22 16:22:24 UTC, 1 replies.
- Questions about Nutch and enterprise search - posted by Karine Storaker <ka...@gmail.com> on 2005/11/22 19:40:39 UTC, 1 replies.
- [Fwd: Spider Causing Contact Form Submissions] - posted by Doug Cutting <cu...@nutch.org> on 2005/11/22 20:30:56 UTC, 2 replies.
- Small bug in Generator - posted by an...@orbita1.ru on 2005/11/23 09:35:33 UTC, 2 replies.
- mapred crawl - posted by Anton Potehin <an...@orbita1.ru> on 2005/11/23 11:44:34 UTC, 0 replies.
- MapRed Generator - posted by Anton Potehin <an...@orbita1.ru> on 2005/11/23 15:04:32 UTC, 0 replies.
- Re: svn commit: r348431 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl: CrawlDatum.java CrawlDbReader.java - posted by Sami Siren <s....@sonera.inet.fi> on 2005/11/23 18:18:03 UTC, 5 replies.
- Re: MapRed Generator - posted by Doug Cutting <cu...@nutch.org> on 2005/11/23 18:33:05 UTC, 0 replies.
- [jira] Commented: (NUTCH-120) one "bad" link on a page kills parsing - posted by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/11/23 23:31:36 UTC, 1 replies.
- [proposal] Generic Markup Language Parser - posted by Jérôme Charron <je...@gmail.com> on 2005/11/24 00:01:54 UTC, 11 replies.
- Incremental crawling - posted by Anton Potehin <an...@orbita1.ru> on 2005/11/24 09:26:39 UTC, 1 replies.
- problem with ndfs - posted by Anton Potehin <an...@orbita1.ru> on 2005/11/24 16:04:19 UTC, 1 replies.
- [jira] Created: (NUTCH-128) second configuration nodes overwrites first node - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/11/24 16:12:55 UTC, 3 replies.
- Re: [Nutch-dev] RE: [proposal] Generic Markup Language Parser - posted by Erik Hatcher <er...@ehatchersolutions.com> on 2005/11/25 11:30:27 UTC, 2 replies.
- [jira] Created: (NUTCH-129) rtf-parser does not work when opened with wordpad files and saved - posted by "raghavendra prabhu (JIRA)" <ji...@apache.org> on 2005/11/25 13:56:55 UTC, 0 replies.
- unsubscribe me please - posted by Keith Campbell <ke...@mac.com> on 2005/11/26 17:57:43 UTC, 0 replies.
- [jira] Closed: (NUTCH-67) I want crawl the websites including news.yahoo.com,game.yahoo.com,blog.yahoo.com,etc! - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/11/27 15:11:56 UTC, 0 replies.
- Summary length - posted by rupa priya <ru...@yahoo.com> on 2005/11/28 09:45:08 UTC, 0 replies.
- Need metadata transport. - posted by ma...@provinzial.com on 2005/11/28 10:37:28 UTC, 1 replies.
- translation in the Italian language - posted by pa...@cli.di.unipi.it on 2005/11/28 19:00:10 UTC, 2 replies.
- I want translate in the Italian language - posted by pa...@cli.di.unipi.it on 2005/11/29 17:21:39 UTC, 0 replies.
- [jira] Created: (NUTCH-130) Be explicit about target JVM when building (1.4.x?) - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2005/11/29 19:43:30 UTC, 0 replies.
- (Re-Formatted) RE: Nutch WebDb storage alternatives: Revisited - posted by "Dalton, Jeffery" <jd...@globalspec.com> on 2005/11/29 20:09:46 UTC, 0 replies.
- How to hack the config? - posted by Kristan Uccello <ku...@gmail.com> on 2005/11/29 22:57:17 UTC, 2 replies.