- upgrading to hadoop-0.4 - posted by Zaheed Haque <za...@gmail.com> on 2006/07/01 11:16:52 UTC, 2 replies.
- how can I index only a portion of html content? - posted by Brent Verner <br...@rcfile.org> on 2006/07/02 23:33:58 UTC, 1 reply.
- Re: deleting URL duplicates - never actually deleted? - posted by Marko Bauhardt <mb...@media-style.com> on 2006/07/02 23:39:19 UTC, 0 replies.
- How to use multiple indexes - posted by Maher <lo...@yahoo.com> on 2006/07/03 16:26:30 UTC, 0 replies.
- Re: Fetcher hanging temporarily on "deflateBytes" method - posted by Daniel Varela Santoalla <dv...@ecmwf.int> on 2006/07/04 10:07:45 UTC, 0 replies.
- when to use STATUS_SIGNATURE in CrawlDatum - posted by Feng Ji <fe...@gmail.com> on 2006/07/04 21:33:38 UTC, 0 replies.
- logging - posted by Murat Ali Bayir <mu...@agmlab.com> on 2006/07/05 16:05:23 UTC, 6 replies.
- problem with fetching PDF or word format - posted by aicha BEN <ai...@yahoo.com> on 2006/07/05 17:24:42 UTC, 1 reply.
- Alternatives - posted by karl wettin <ka...@snigel.net> on 2006/07/05 17:58:15 UTC, 2 replies.
- Re : problem with fetching PDF or word format - posted by aicha BEN <ai...@yahoo.com> on 2006/07/05 18:00:57 UTC, 0 replies.
- after mergesegs - updatedb? - posted by Honda-Search Administrator <ad...@honda-search.com> on 2006/07/05 20:09:32 UTC, 0 replies.
- Re: [Nutch-general] Alternatives - posted by Jason Calabrese <ma...@jasoncalabrese.com> on 2006/07/06 17:26:53 UTC, 0 replies.
- why i can't crawl all the linked pages in the specified page to crawl. - posted by kevin pang <ke...@gmail.com> on 2006/07/07 04:12:33 UTC, 4 replies.
- Link db (traversal + modification) - posted by og...@yahoo.com on 2006/07/07 07:47:42 UTC, 2 replies.
- Re: [Nutch-general] Link db (traversal + modification) - posted by og...@yahoo.com on 2006/07/07 18:50:23 UTC, 5 replies.
- Number of pages different to number of indexed pages - posted by Lourival Júnior <ju...@gmail.com> on 2006/07/07 19:20:50 UTC, 2 replies.
- Index algorithm - posted by Lourival Júnior <ju...@gmail.com> on 2006/07/07 20:53:15 UTC, 1 reply.
- how to change fetch pages's character encoding when crawing the web pages. - posted by kevin pang <ke...@gmail.com> on 2006/07/09 06:14:20 UTC, 0 replies.
- Re: .8 svn - fetcher performance.. - posted by Zaheed Haque <za...@gmail.com> on 2006/07/10 11:42:17 UTC, 2 replies.
- Character corruption in localized search result GUI? - posted by Teruhiko Kurosaka <Ku...@basistech.com> on 2006/07/11 01:27:35 UTC, 1 reply.
- question about plugins - posted by Abdelhakim Diab <ab...@gmail.com> on 2006/07/11 14:00:51 UTC, 1 reply.
- OpenOffice Support? - posted by Matthew Holt <mh...@redhat.com> on 2006/07/11 15:11:47 UTC, 1 reply.
- Re: Adddays confusion - easy question for the experts - posted by Matthew Holt <mh...@redhat.com> on 2006/07/11 22:51:21 UTC, 2 replies.
- Eclipse IDE - posted by Matthew Holt <mh...@redhat.com> on 2006/07/12 00:29:05 UTC, 2 replies.
- Nutch Newbie question: why do .txt files not show up in search results? - posted by Scott Kim <sc...@gmail.com> on 2006/07/12 08:25:54 UTC, 1 reply.
- Pornfilter - posted by "NG-Marketing, M.Schneider" <sc...@ng-marketing.com> on 2006/07/12 12:10:00 UTC, 2 replies.
- Re[2]: Adddays confusion - easy question for the experts - posted by Dima Mazmanov <nu...@proservice.ge> on 2006/07/12 14:18:16 UTC, 0 replies.
- DocNo ? - posted by Marco Pereira <ma...@gmail.com> on 2006/07/12 16:25:33 UTC, 0 replies.
- Bug in nutch admin option -top? - posted by Timo Scheuer <ti...@dfki.de> on 2006/07/12 17:09:56 UTC, 0 replies.
- any success with php-java-bridge and Nutch? - posted by Chris Stephens <ch...@liveoakinteractive.com> on 2006/07/12 19:20:29 UTC, 3 replies.
- Error running intranet crawl with 0.8.0-dev - posted by Daniel Varela Santoalla <dv...@ecmwf.int> on 2006/07/12 19:31:44 UTC, 3 replies.
- Customizing Search Results - posted by Matthew Holt <mh...@redhat.com> on 2006/07/12 20:09:34 UTC, 3 replies.
- parse-oo plugin - posted by Matthew Holt <mh...@redhat.com> on 2006/07/13 06:35:29 UTC, 0 replies.
- Re: nutch 0.7.2 does not work - posted by manish_sanju <ma...@universeinfosys.com> on 2006/07/13 12:23:49 UTC, 0 replies.
- nutch suitable for blogs? - posted by Chris Newton <cd...@gmail.com> on 2006/07/13 15:58:18 UTC, 1 reply.
- Commom words - posted by Marco Pereira <ma...@gmail.com> on 2006/07/13 17:36:40 UTC, 2 replies.
- Nutch and the Law - posted by Marco Pereira <ma...@gmail.com> on 2006/07/13 18:25:15 UTC, 0 replies.
- nutch-0.8.0-dev search error - posted by Matthew Holt <mh...@redhat.com> on 2006/07/13 18:47:37 UTC, 2 replies.
- 0.8.0 stable enough to use? - posted by Matthew Holt <mh...@redhat.com> on 2006/07/13 19:28:19 UTC, 4 replies.
- Takes a long time for the reduce to go from 95% to 100% - posted by "Shekhar, Jayant" <js...@shopping.com> on 2006/07/13 19:47:56 UTC, 0 replies.
- Added 0 pages - posted by Julius Schorzman <ju...@gmail.com> on 2006/07/13 21:19:01 UTC, 4 replies.
- Recrawl a specific web Page - posted by Lourival Júnior <ju...@gmail.com> on 2006/07/13 21:45:50 UTC, 0 replies.
- Extending scoring plugin - posted by Jacob Brunson <ja...@gmail.com> on 2006/07/13 23:41:44 UTC, 3 replies.
- Unused Segments - posted by Lourival Júnior <ju...@gmail.com> on 2006/07/14 14:29:58 UTC, 0 replies.
- Nullpointer exception dependent on search terms - posted by Chris Stephens <ch...@liveoakinteractive.com> on 2006/07/14 19:42:13 UTC, 1 reply.
- debian 3.1 - posted by "Schackenberg, Benedikt" <sc...@termindoc.de> on 2006/07/14 19:49:40 UTC, 0 replies.
- Nutch on Windows - posted by Kerry Wilson <kw...@wmsco.com> on 2006/07/14 20:49:43 UTC, 4 replies.
- Intranet Recrawl Script for 0.8.0 - posted by Matthew Holt <mh...@redhat.com> on 2006/07/14 22:33:27 UTC, 2 replies.
- Re: page ranking computation in Nutch 08 - posted by Feng Ji <fe...@gmail.com> on 2006/07/15 01:22:18 UTC, 0 replies.
- Parser returning several ParseData? - posted by HUYLEBROECK Jeremy RD-ILAB-SSF <je...@orange-ft.com> on 2006/07/15 03:16:08 UTC, 1 reply.
- Nuch 0.8-dev DmozParser output - posted by sinking <si...@bonbon.net> on 2006/07/15 18:39:21 UTC, 1 reply.
- Built-in Recrawl - posted by Matthew Holt <mh...@redhat.com> on 2006/07/15 21:05:55 UTC, 0 replies.
- Not crawling certain directories. - posted by Matthew Holt <mh...@redhat.com> on 2006/07/15 23:19:59 UTC, 1 reply.
- integrate nutch search engine with cms - posted by Abdelhakim Diab <ab...@gmail.com> on 2006/07/16 12:26:28 UTC, 3 replies.
- probl. big help me - posted by "Schackenberg, Benedikt" <sc...@termindoc.de> on 2006/07/16 20:01:13 UTC, 1 reply.
- help me - posted by "Schackenberg, Benedikt" <sc...@termindoc.de> on 2006/07/16 20:34:16 UTC, 2 replies.
- Volunteers requested for Web Spam Classification - posted by Stefan Groschupf <sg...@media-style.com> on 2006/07/17 05:43:43 UTC, 0 replies.
- Re: stemming - posted by bb...@mail.ru on 2006/07/17 07:51:44 UTC, 5 replies.
- crawl doesn't work - posted by bb...@mail.ru on 2006/07/17 10:09:14 UTC, 2 replies.
- Nutch 0.8 java 1.4/1.5 - posted by "Håvard W. Kongsgård" <h....@niap.no> on 2006/07/17 10:42:08 UTC, 1 reply.
- Vertical Search (Nutch) for Opensource Jobs- http://www.myopensourcejobs.com - posted by Sudhi Seshachala <su...@yahoo.com> on 2006/07/17 15:21:45 UTC, 4 replies.
- Crawl injected Domains only - posted by Ronny <ro...@metzgerei-lebek.de> on 2006/07/18 11:30:15 UTC, 8 replies.
- search in jsp generated pages - posted by yves-marie daniel <yv...@gmail.com> on 2006/07/18 11:45:10 UTC, 0 replies.
- Tutorial for Hadoop and Nutch nigtly build - posted by info <in...@radionav.it> on 2006/07/18 12:31:56 UTC, 0 replies.
- commons-cli-2.0-SNAPSHOT.jar exception - posted by seok keun oh <oh...@gmail.com> on 2006/07/18 13:11:17 UTC, 1 reply.
- Could we configure nutch-site.xml with two directories? - posted by nasm <ri...@gmail.com> on 2006/07/18 16:48:32 UTC, 4 replies.
- 0.8 – Will not accept url list file on Windows - posted by BDalton <bi...@uniform.ca> on 2006/07/18 22:05:50 UTC, 4 replies.
- Re: 0.8 Dev Will not accept url list file on Windows - posted by Sudhi Seshachala <su...@yahoo.com> on 2006/07/18 22:39:28 UTC, 0 replies.
- Re: 0.8 - Will not accept url list file on Windows - posted by Sudhi Seshachala <su...@yahoo.com> on 2006/07/18 23:39:14 UTC, 0 replies.
- missing, but declared functionality - posted by Tomi NA <he...@gmail.com> on 2006/07/19 19:00:34 UTC, 3 replies.
- Reworked recrawl script for 0.8.0 - posted by Matthew Holt <mh...@redhat.com> on 2006/07/20 00:25:50 UTC, 0 replies.
- Best performance approach for single MP machine? - posted by Doug Cook <na...@candiru.com> on 2006/07/20 08:34:38 UTC, 3 replies.
- Generate linkDb | hadoop/nutch 0.8 - posted by "Håvard W. Kongsgård" <h....@niap.no> on 2006/07/20 10:34:03 UTC, 3 replies.
- Please Help.. recrawl script.. will send out to the list when finished for 0.8.0 - posted by Matthew Holt <mh...@redhat.com> on 2006/07/20 16:07:26 UTC, 1 reply.
- Indexing segment | nutch 0.8/hadoop - posted by "Håvard W. Kongsgård" <h....@niap.no> on 2006/07/20 16:49:17 UTC, 0 replies.
- [Fwd: Reworked recrawl script for 0.8.0] - posted by Matthew Holt <mh...@redhat.com> on 2006/07/20 18:02:29 UTC, 0 replies.
- Nutch with Domino web server - posted by Deepa Devanathan <ti...@gmail.com> on 2006/07/21 16:21:22 UTC, 1 reply.
- Recrawl script for 0.8.0 completed... - posted by Matthew Holt <mh...@redhat.com> on 2006/07/21 16:53:57 UTC, 10 replies.
- PLease help... this has to be simple (re: mergesegs) - posted by Honda Search Administrator <ad...@honda-search.com> on 2006/07/21 18:25:39 UTC, 0 replies.
- Help associating domain name and ip address - posted by Sudhi Seshachala <su...@yahoo.com> on 2006/07/21 19:02:21 UTC, 0 replies.
- Why would a record be in the database but not show up in the results? - posted by Matt Timion <ad...@honda-search.com> on 2006/07/21 19:10:35 UTC, 2 replies.
- Hadoop and Recrawl - posted by Info <in...@radionav.it> on 2006/07/21 22:17:40 UTC, 2 replies.
- Null pointer error when perform search - posted by Eric Wu <ta...@gmail.com> on 2006/07/22 02:46:01 UTC, 1 reply.
- HELP ME PLEASE R: Hadoop and Nutch 0.8 - posted by Info <in...@radionav.it> on 2006/07/22 10:16:20 UTC, 0 replies.
- This is my tutorial for hadoop + nutch 0.8 I'm searching a tutorial for recrawl script for nutch+hadoop - posted by info <in...@radionav.it> on 2006/07/22 14:15:42 UTC, 0 replies.
- Nutch to...Frutch - posted by Hans Vallden <ha...@vallden.com> on 2006/07/22 14:39:51 UTC, 0 replies.
- Hadoop and Inject and Recrawl hadoop and nutch v0.8 WORK FINE!!!! - posted by roberto navoni <r....@radionav.it> on 2006/07/22 20:27:47 UTC, 0 replies.
- Dissecting the Nutch Search Page (Please Help!) - posted by Bryan Woliner <br...@gmail.com> on 2006/07/23 19:51:14 UTC, 1 reply.
- Please Help - Patch install - posted by Ronny <ro...@metzgerei-lebek.de> on 2006/07/24 10:03:21 UTC, 6 replies.
- Search with sponsored ads? - posted by Chun Wei Ho <cw...@gmail.com> on 2006/07/24 10:04:12 UTC, 4 replies.
- installation de Nutch - posted by kawther khazri <nu...@yahoo.fr> on 2006/07/24 16:24:40 UTC, 0 replies.
- Nutch 0.8-dev? - posted by Matthew Holt <mh...@redhat.com> on 2006/07/24 17:56:46 UTC, 2 replies.
- Nutch questions... - posted by Robert Sanford <rs...@trefs.com> on 2006/07/24 22:31:23 UTC, 0 replies.
- Lucene question - posted by "Rajan, Renuka" <re...@navteq.com> on 2006/07/24 23:03:37 UTC, 0 replies.
- Re: Lucene question - posted by Renaud Richardet <re...@wyona.com> on 2006/07/25 00:18:06 UTC, 3 replies.
- Nutch Problem on Godaddy.com server: Can't find bundle for base name org.nutch.jsp.search - posted by WAJ <wa...@yahoo.fr> on 2006/07/25 03:55:23 UTC, 0 replies.
- Injecting Into Intranet Crawl - posted by Robert Sanford <rs...@trefs.com> on 2006/07/25 04:16:11 UTC, 2 replies.
- Links - posted by te...@gmail.com on 2006/07/25 14:28:18 UTC, 1 reply.
- Please Help - Patch not working - external links still crawled - posted by Ronny <ro...@metzgerei-lebek.de> on 2006/07/25 15:33:06 UTC, 2 replies.
- Two Errors in Nutch 0.8 Tutorial? - posted by Bryan Woliner <br...@gmail.com> on 2006/07/25 16:00:11 UTC, 2 replies.
- Problem with logging of Fetcher output in 0.8-dev - posted by e w <ep...@gmail.com> on 2006/07/25 18:24:50 UTC, 0 replies.
- Nutch 0.8 – Spell Check - posted by BDalton <bi...@uniform.ca> on 2006/07/26 00:14:27 UTC, 0 replies.
- Nutch with nsf files - posted by Deepa Devanathan <ti...@gmail.com> on 2006/07/26 10:01:40 UTC, 1 reply.
- installation de nutch - posted by kawther khazri <nu...@yahoo.fr> on 2006/07/26 13:31:19 UTC, 3 replies.
- [Fwd: [Fwd: Re: [jira] Commented: (NUTCH-271) Meta-data per URL/site/section]] - posted by Sami Siren <ss...@gmail.com> on 2006/07/26 13:47:18 UTC, 5 replies.
- Howto deploy a ROOT.war (if needed) - posted by "NG-Marketing, M.Schneider" <sc...@ng-marketing.com> on 2006/07/26 16:09:25 UTC, 0 replies.
- 0.8 much slower than 0.7 - posted by Vasja Ocvirk <va...@vizija.si> on 2006/07/26 16:23:41 UTC, 5 replies.
- Getting Keywords from Metatags - posted by Dennis Kubes <nu...@dragonflymc.com> on 2006/07/26 21:21:05 UTC, 0 replies.
- How to add database to an existing nutch index? - posted by Patrick Kratzenstein <pk...@googlemail.com> on 2006/07/27 12:15:33 UTC, 0 replies.
- Total time of a search - posted by Lourival Júnior <ju...@gmail.com> on 2006/07/27 14:17:34 UTC, 1 reply.
- nutch analyze - posted by "NG-Marketing, M.Schneider" <sc...@ng-marketing.com> on 2006/07/27 14:25:47 UTC, 0 replies.
- Embedded Docs - posted by Oleg Galkin <ol...@hotmail.com> on 2006/07/27 15:18:36 UTC, 0 replies.
- mergesegs tool hangs up - posted by Dima Mazmanov <nu...@proservice.ge> on 2006/07/27 16:20:12 UTC, 0 replies.
- Re[2]: stemming - posted by bb...@mail.ru on 2006/07/27 18:25:08 UTC, 0 replies.
- Plugin Documentation - posted by Matthew Holt <mh...@redhat.com> on 2006/07/27 19:08:00 UTC, 0 replies.
- Namenode, Jobtracker don't start correctly - posted by Vishal Shah <vi...@rediff.co.in> on 2006/07/28 06:32:10 UTC, 0 replies.
- [ANNOUNCE] nutch 0.8 - posted by Sami Siren <ss...@gmail.com> on 2006/07/28 14:22:37 UTC, 0 replies.
- java.lang.NoClassDefFoundError - posted by Lourival Júnior <ju...@gmail.com> on 2006/07/28 16:25:15 UTC, 1 reply.
- Re: stemming - RESOLVED - posted by Matthew Holt <mh...@redhat.com> on 2006/07/28 17:03:25 UTC, 1 reply.
- "unknown protocol" and some other problems in 0.8. - posted by Daniel Varela Santoalla <dv...@ecmwf.int> on 2006/07/28 18:18:05 UTC, 0 replies.
- Starting Nutch in init.d? - posted by Bill Goffe <go...@Oswego.EDU> on 2006/07/28 18:34:24 UTC, 4 replies.
- Recrawling... methodology? - posted by Matthew Holt <mh...@redhat.com> on 2006/07/28 21:40:52 UTC, 0 replies.
- Multiable Server setup - posted by aisg <ai...@aol.com> on 2006/07/29 05:46:14 UTC, 0 replies.
- nutch 0.8: invertlinks IOException segments/parse_data - posted by Alexander E Genaud <lx...@pobox.com> on 2006/07/29 16:42:13 UTC, 2 replies.
- Re[2]: stemming - RESOLVED - posted by bb...@mail.ru on 2006/07/29 16:51:30 UTC, 0 replies.
- nutch 0.8 and luke - posted by Tomi NA <he...@gmail.com> on 2006/07/29 20:35:04 UTC, 3 replies.
- Sync 2 different DB. - posted by Boon Siong <bo...@asiaep.com> on 2006/07/31 04:22:48 UTC, 0 replies.
- nutch crawl on a site that needs authentication - posted by Deepa Devanathan <ti...@gmail.com> on 2006/07/31 08:27:46 UTC, 1 reply.
- nutch and external database - posted by aicha BEN <ai...@yahoo.com> on 2006/07/31 14:00:13 UTC, 0 replies.
- [Fwd: Recrawling... methodology?] - posted by Matthew Holt <mh...@redhat.com> on 2006/07/31 16:15:05 UTC, 0 replies.
- max file size vs. available RAM size: crawl uses up all available memory - posted by Tomi NA <he...@gmail.com> on 2006/07/31 17:13:02 UTC, 0 replies.