- crawling a certain site - posted by Cam Bazz <ca...@gmail.com> on 2006/08/01 16:12:29 UTC, 2 replies.
- linkdbmerge - posted by Murat Ali Bayir <mu...@agmlab.com> on 2006/08/01 19:35:41 UTC, 3 replies.
- Recrawling until there's nothing left, or depth N, whichever comes first - posted by Benjamin Higgins <bh...@gmail.com> on 2006/08/02 01:07:24 UTC, 0 replies.
- Questions about (re)crawling - posted by Benjamin Higgins <bh...@gmail.com> on 2006/08/02 01:22:19 UTC, 4 replies.
- Re: 0.8 much slower than 0.7 - posted by Vasja Ocvirk <va...@vizija.si> on 2006/08/02 09:30:46 UTC, 2 replies.
- Nutch Install problems - posted by Fred Tyre <fr...@hlipublishing.com> on 2006/08/02 14:29:59 UTC, 2 replies.
- no search result for powerpoint file - posted by aicha BEN <ai...@yahoo.com> on 2006/08/02 17:37:28 UTC, 0 replies.
- [0.7.2] - Custom fields - Reasonable limit - posted by Philippe EUGENE <ph...@neuf.fr> on 2006/08/02 18:00:46 UTC, 0 replies.
- Installation help - posted by EDGAR CHIBAKA <ch...@btinternet.com> on 2006/08/02 18:16:24 UTC, 0 replies.
- sorry, test - posted by webmaster <we...@ogmi.farlep.odessa.ua> on 2006/08/02 19:02:54 UTC, 2 replies.
- Querying Fields - posted by Matthew Holt <mh...@redhat.com> on 2006/08/02 22:58:05 UTC, 19 replies.
- 0.8 Recrawl script updated - posted by Matthew Holt <mh...@redhat.com> on 2006/08/02 23:13:49 UTC, 5 replies.
- Help: No content fetched in V0.8 - posted by Bob Song <da...@163.com> on 2006/08/03 05:57:28 UTC, 0 replies.
- -numFetchers in generate command - posted by Murat Ali Bayir <mu...@agmlab.com> on 2006/08/03 10:07:53 UTC, 5 replies.
- Recrawl urls - posted by Nahuel ANGELINETTI <na...@develog.com> on 2006/08/03 11:50:57 UTC, 8 replies.
- configuration property fetcher.store.content - posted by Timo Scheuer <ti...@dfki.de> on 2006/08/03 14:15:17 UTC, 1 replies.
- Re: Howto deploy a ROOT.war (if needed) - posted by Timo Scheuer <ti...@dfki.de> on 2006/08/03 14:30:57 UTC, 0 replies.
- Re: How to add database to an existing nutch index? - posted by Timo Scheuer <ti...@dfki.de> on 2006/08/03 16:15:27 UTC, 0 replies.
- NullPointException - posted by Lourival Júnior <ju...@gmail.com> on 2006/08/03 16:33:31 UTC, 4 replies.
- ZIP plugin in nutch 0.7.2 - posted by Lourival Júnior <ju...@gmail.com> on 2006/08/03 21:32:24 UTC, 0 replies.
- indexing or search problem? - posted by Rocio Chongtay <oc...@yahoo.com> on 2006/08/04 12:33:07 UTC, 2 replies.
- question re ant job task: why is hadoop jar not included - posted by Renaud Richardet <re...@wyona.com> on 2006/08/04 20:02:29 UTC, 0 replies.
- Fetch jumps to 1.0 complete - posted by Dennis Kubes <nu...@dragonflymc.com> on 2006/08/04 21:24:11 UTC, 13 replies.
- how fetcher thread termination - posted by jian chen <ch...@gmail.com> on 2006/08/05 03:52:20 UTC, 1 replies.
- Warning! IndexMerger trashed our DFS - posted by Chris Schneider <Sc...@TransPac.com> on 2006/08/05 06:35:47 UTC, 0 replies.
- fetcher failure - posted by Feng Ji <fe...@gmail.com> on 2006/08/06 06:05:54 UTC, 1 replies.
- Problems and questions on 0.8 - posted by Iain <ia...@idcl.co.uk> on 2006/08/06 13:15:21 UTC, 2 replies.
- Works by Adding Agency --- Data Re: fetcher failure - posted by Feng Ji <fe...@gmail.com> on 2006/08/06 14:39:52 UTC, 0 replies.
- How do I find out what is going wrong? - posted by Iain <ia...@idcl.co.uk> on 2006/08/06 19:28:55 UTC, 1 replies.
- RE: How.... - posted by Iain <ia...@idcl.co.uk> on 2006/08/07 10:52:41 UTC, 0 replies.
- problems in logs files on 0.8 - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/07 13:40:39 UTC, 1 replies.
- Distributed Searching Index Size - posted by Dennis Kubes <nu...@dragonflymc.com> on 2006/08/07 22:25:37 UTC, 2 replies.
- nutch08 indexer error - posted by Feng Ji <fe...@gmail.com> on 2006/08/08 03:00:20 UTC, 4 replies.
- Re: Search with sponsored ads? - posted by Chun Wei Ho <cw...@gmail.com> on 2006/08/08 04:07:18 UTC, 1 replies.
- How do I write a nutch query. - posted by Fred Tyre <fr...@hlipublishing.com> on 2006/08/08 14:57:28 UTC, 6 replies.
- problems to start up all of the hadoop servers on the local machine - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/08 16:44:19 UTC, 0 replies.
- boosting - posted by Matthew Holt <mh...@redhat.com> on 2006/08/08 16:50:06 UTC, 0 replies.
- Possible bug in nutch crawl - posted by Fred Tyre <fr...@hlipublishing.com> on 2006/08/08 18:26:48 UTC, 0 replies.
- Re: [Fwd: Re: 0.8 Recrawl script updated] - posted by Matthew Holt <mh...@redhat.com> on 2006/08/08 20:59:30 UTC, 6 replies.
- parse-oo plugin - posted by Matthew Holt <mh...@redhat.com> on 2006/08/08 22:54:58 UTC, 0 replies.
- Plugins not found - posted by "Hamaker, Janna" <ja...@amazon.com> on 2006/08/09 01:33:16 UTC, 0 replies.
- Feedparser 0.6 fork source code - posted by HUYLEBROECK Jeremy RD-ILAB-SSF <je...@orange-ft.com> on 2006/08/09 01:55:12 UTC, 1 replies.
- Single DFS or alternative architectures for performance? - posted by Murat Ali Bayir <mu...@agmlab.com> on 2006/08/09 09:45:37 UTC, 2 replies.
- Aborting with hung threads - posted by Uroš Gruber <ur...@sir-mag.com> on 2006/08/09 10:18:54 UTC, 1 replies.
- problems with the dfs commande - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/09 10:21:09 UTC, 1 replies.
- Error in 0.8 regex-urlfilter.txt - posted by Matthew Holt <mh...@redhat.com> on 2006/08/09 15:51:19 UTC, 0 replies.
- Crawling the entire web -- what's involved? - posted by Chris <sh...@yahoo.com> on 2006/08/09 19:46:51 UTC, 5 replies.
- crawl webpages and files at the same time - posted by Renaud Richardet <re...@wyona.com> on 2006/08/09 19:59:19 UTC, 0 replies.
- Re: [Nutch-general] Single DFS or alternative architectures for performance? - posted by og...@yahoo.com on 2006/08/09 21:16:28 UTC, 0 replies.
- HTMLParseFilter is not called by ParseSegment (nutch parse command) - posted by Bipin Parmar <bi...@yahoo.com> on 2006/08/09 21:47:23 UTC, 1 replies.
- Advice on adding custom search operator that has special(?) characters - posted by Benjamin Higgins <bh...@gmail.com> on 2006/08/10 02:12:28 UTC, 1 replies.
- how to show log in nutch-0.8. release package - posted by Feng Ji <fe...@gmail.com> on 2006/08/10 03:49:50 UTC, 0 replies.
- problems with start-all command - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/10 10:39:21 UTC, 1 replies.
- Crawling flash - posted by Iain <ia...@idcl.co.uk> on 2006/08/10 10:39:39 UTC, 3 replies.
- problem with the DFS commande - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/10 10:54:44 UTC, 0 replies.
- Extended crawling configuration with "mapred.input.value.class"? - posted by Timo Scheuer <ti...@dfki.de> on 2006/08/10 12:22:46 UTC, 0 replies.
- number of mapper - posted by Murat Ali Bayir <mu...@agmlab.com> on 2006/08/10 14:29:45 UTC, 5 replies.
- Index with synonyms - posted by Keyserzero <ke...@yahoo.de> on 2006/08/10 15:10:01 UTC, 0 replies.
- More Fetcher NullPointerException - posted by "Sellek, Greg" <GS...@NameProtect.com> on 2006/08/10 18:44:54 UTC, 2 replies.
- file access rights/permissions considerations - the least painful way - posted by Tomi NA <he...@gmail.com> on 2006/08/10 19:17:44 UTC, 0 replies.
- common-terms.utf8 - posted by Lourival Júnior <ju...@gmail.com> on 2006/08/10 20:58:51 UTC, 3 replies.
- Nutch vs. Google Appliance - posted by "Stevenson, Kerry" <Ke...@gwl.ca> on 2006/08/10 23:28:01 UTC, 2 replies.
- crawl-urlfilter subpages of domains - posted by Jens Martin Schubert <sc...@rz.tu-clausthal.de> on 2006/08/10 23:38:24 UTC, 2 replies.
- Stalling during fetch (0.7) - posted by Benjamin Higgins <bh...@gmail.com> on 2006/08/11 00:04:09 UTC, 2 replies.
- Google feature in Nutch - posted by Florian Fricker <fl...@wyona.com> on 2006/08/11 10:46:25 UTC, 0 replies.
- log4j.properties - posted by Murat Ali Bayir <mu...@agmlab.com> on 2006/08/11 12:56:41 UTC, 0 replies.
- turn on debug log on nutch-0.8. - posted by Feng Ji <fe...@gmail.com> on 2006/08/12 00:36:16 UTC, 2 replies.
- Re: [Nutch-general] log4j.properties - posted by og...@yahoo.com on 2006/08/12 04:40:40 UTC, 0 replies.
- Re: [Nutch-general] Google feature in Nutch - posted by og...@yahoo.com on 2006/08/12 04:43:37 UTC, 8 replies.
- Re: [Nutch-general] common-terms.utf8 - posted by og...@yahoo.com on 2006/08/12 04:56:56 UTC, 1 replies.
- log4j.properties bug(?) - posted by og...@yahoo.com on 2006/08/12 05:14:34 UTC, 2 replies.
- hadoop.log vs. nutch.log - posted by og...@yahoo.com on 2006/08/12 05:17:45 UTC, 1 replies.
- build kaput: plugins' jars can't be found - posted by og...@yahoo.com on 2006/08/12 05:38:06 UTC, 0 replies.
- On fetcher slowness - posted by og...@yahoo.com on 2006/08/12 07:17:51 UTC, 6 replies.
- Re: [Nutch-general] log4j.properties bug(?) - posted by og...@yahoo.com on 2006/08/12 07:26:22 UTC, 3 replies.
- [Nutch-0.8] Missing WAR file - posted by Hou Keat Lee <qu...@gmail.com> on 2006/08/12 17:50:45 UTC, 5 replies.
- crawl w/o store - posted by Paul M Lieberman <pa...@alum.mit.edu> on 2006/08/12 21:38:02 UTC, 1 replies.
- Re: [Nutch-general] On fetcher slowness - posted by og...@yahoo.com on 2006/08/13 05:23:34 UTC, 1 replies.
- nutch-xml.conf - posted by "Insurance Squared Inc." <gc...@insurancesquared.com> on 2006/08/13 18:28:35 UTC, 0 replies.
- reducer can't start - posted by Murat Ali Bayir <mu...@agmlab.com> on 2006/08/14 08:50:53 UTC, 0 replies.
- RE : Re: problems with start-all command - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/14 09:53:07 UTC, 0 replies.
- Subcollection setup and use - posted by Bud Witney <wi...@osu.edu> on 2006/08/14 16:53:28 UTC, 4 replies.
- How can I change the tmp directory location for nutch? - posted by Matt Timion <ad...@honda-search.com> on 2006/08/14 21:13:46 UTC, 1 replies.
- searching on multiple subcollections - posted by Mark Jones <ma...@quovadx.com> on 2006/08/15 00:28:59 UTC, 0 replies.
- The Nutch Crawler and the Web Link Graph - posted by John Casey <jo...@gmail.com> on 2006/08/15 14:00:30 UTC, 2 replies.
- Any plans to move to build Nutch using Maven? - posted by steven shingler <sh...@gmail.com> on 2006/08/15 18:05:05 UTC, 4 replies.
- Re: nutch installer - posted by thegallier <th...@gmail.com> on 2006/08/15 23:51:27 UTC, 1 replies.
- Neko parsing fix inadvertently reverted? - posted by Benjamin Higgins <bh...@gmail.com> on 2006/08/16 00:23:56 UTC, 0 replies.
- Re: Starting Nutch in init.d? - posted by Bill Goffe <go...@oswego.edu> on 2006/08/16 02:04:20 UTC, 1 replies.
- Webinterface ignores hidden language field - posted by David Podunavac <da...@wyona.com> on 2006/08/16 16:16:44 UTC, 0 replies.
- Creating a store of search terms - posted by steven shingler <sh...@gmail.com> on 2006/08/16 18:29:09 UTC, 0 replies.
- Underlined Phrases - posted by Marco Vanossi <ma...@gmail.com> on 2006/08/16 22:31:25 UTC, 1 replies.
- what Linux distribution goes best with Nutch? - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/17 14:39:11 UTC, 6 replies.
- Help to understand Nutch segment and index - posted by "Sellek, Greg" <GS...@NameProtect.com> on 2006/08/17 15:56:03 UTC, 0 replies.
- Nutch-site.xml properties missing from nutch.xml - posted by Jonathan Addison <jo...@wyona.com> on 2006/08/17 21:33:07 UTC, 0 replies.
- Hadoop replication warning - posted by HUYLEBROECK Jeremy RD-ILAB-SSF <je...@orange-ft.com> on 2006/08/18 00:43:50 UTC, 3 replies.
- RE : Re: what Linux distribution goes best with Nutch? - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/18 11:01:46 UTC, 1 replies.
- Nutch doesn't dive deeper - posted by Michael Wechner <mi...@wyona.com> on 2006/08/18 11:42:26 UTC, 7 replies.
- problem with the rpmbuild commande - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/18 12:05:41 UTC, 1 replies.
- uploading the nutch war file - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/19 13:43:12 UTC, 2 replies.
- How to Search in Category? - posted by victor_emailbox <vi...@yahoo.com> on 2006/08/20 07:19:48 UTC, 1 replies.
- index/search filtering by category - posted by Ernesto De Santis <de...@yahoo.com.ar> on 2006/08/20 22:08:17 UTC, 8 replies.
- show additional lucene index information on Nutch's Search Page - posted by Feng Ji <fe...@gmail.com> on 2006/08/20 22:30:45 UTC, 1 replies.
- RE : Re: uploading the nutch war file - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/21 10:20:29 UTC, 2 replies.
- problem in starting tomcat5 - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/21 11:27:54 UTC, 1 replies.
- RE : Re: RE : Re: uploading the nutch war file - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/21 11:50:59 UTC, 0 replies.
- RE : Re: problem in starting tomcat5 - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/21 12:06:39 UTC, 1 replies.
- Zip Plugin - posted by Lourival Júnior <ju...@gmail.com> on 2006/08/21 18:19:24 UTC, 0 replies.
- Re: Problem with logging of Fetcher output in 0.8-dev - posted by Doug Cook <na...@candiru.com> on 2006/08/21 22:55:50 UTC, 3 replies.
- problem in crawling...... - posted by Abdelhakim Diab <ab...@gmail.com> on 2006/08/22 11:33:47 UTC, 1 replies.
- Re: ö ü ä! German language - posted by dee <da...@wyona.com> on 2006/08/22 16:11:02 UTC, 1 replies.
- differ search in filesystem or webpages - posted by David Podunavac <da...@wyona.com> on 2006/08/22 17:41:07 UTC, 0 replies.
- log4j:WARN Please initialize the log4j system properly. - posted by Chris Stephens <ch...@liveoakinteractive.com> on 2006/08/22 22:51:11 UTC, 1 replies.
- How does Nutch-0.7.2 data upgrade to 0.8? - posted by King Kong <ch...@hotmail.com> on 2006/08/23 10:13:47 UTC, 10 replies.
- How long to get 100 million page - posted by Bui Quang Hung <bq...@nishilab.sys.es.osaka-u.ac.jp> on 2006/08/23 13:49:42 UTC, 3 replies.
- File paths with symbolic links in crawl-urlfilter.txt do not work - posted by Renaud Richardet <re...@wyona.com> on 2006/08/23 17:24:08 UTC, 0 replies.
- What is the purpose of the nutch-*.job file - posted by Michael Wechner <mi...@wyona.com> on 2006/08/24 11:05:33 UTC, 2 replies.
- pb in installing stax-bea - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/24 11:47:47 UTC, 1 replies.
- RE : Re: pb in installing stax-bea - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/24 13:13:08 UTC, 1 replies.
- nutch - start me up - help please - posted by Philip Brown <ph...@primeradesigns.com> on 2006/08/24 14:45:46 UTC, 1 replies.
- restarting fetch - posted by Richard Braman <rb...@bramantax.com> on 2006/08/24 16:00:39 UTC, 1 replies.
- no accent and stemming problem - posted by aicha BEN <ai...@yahoo.com> on 2006/08/24 17:41:07 UTC, 0 replies.
- Query refinement - posted by qu...@webmail.co.za on 2006/08/25 00:22:15 UTC, 3 replies.
- no result for crawling 2000 site seed , for nutch 08 release package - posted by Feng Ji <fe...@gmail.com> on 2006/08/25 04:49:22 UTC, 2 replies.
- new configuration proposal in nutch-site.xml (maximum url length) - posted by Murat Ali Bayir <mu...@agmlab.com> on 2006/08/25 13:01:38 UTC, 3 replies.
- Speeding up compilation without compiling plugins - posted by Michael Wechner <mi...@wyona.com> on 2006/08/25 14:48:15 UTC, 4 replies.
- Making crawler stop after all pages are found. - posted by Sandy Polanski <sa...@yahoo.com> on 2006/08/27 06:23:24 UTC, 4 replies.
- How to Force Nutch to Crawl Only Certain Sites' Pages? - posted by victor_emailbox <vi...@yahoo.com> on 2006/08/27 06:55:45 UTC, 0 replies.
- Is there a way to get Nutch to parse/index by file access directly (not over HTTP)? - posted by Sandy Polanski <sa...@yahoo.com> on 2006/08/27 13:41:44 UTC, 1 replies.
- High System Cpu usage while Fetching,Is it normal? - posted by Jeff Cai <je...@gmail.com> on 2006/08/27 15:16:02 UTC, 0 replies.
- CrawlDbReducer failed Fetcher - posted by Feng Ji <fe...@gmail.com> on 2006/08/27 15:16:12 UTC, 0 replies.
- 0.8.23 Inject would not inject all Urls. - posted by Frank Kempf <fl...@2112portals.com> on 2006/08/27 18:44:36 UTC, 2 replies.
- using tomcat5 - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/28 11:46:00 UTC, 2 replies.
- pb with tomcat5 - posted by kawther khazri <nu...@yahoo.fr> on 2006/08/28 12:43:07 UTC, 0 replies.
- opensearchservlet - HowTo ? - posted by Philip Brown <ph...@primeradesigns.com> on 2006/08/28 15:30:47 UTC, 5 replies.
- Merge result-sets from two or more different indices - posted by Michael Wechner <mi...@wyona.com> on 2006/08/28 16:56:32 UTC, 0 replies.
- Importing already crawled web pages - posted by Zhen Zhen <zh...@cs.dal.ca> on 2006/08/28 18:23:03 UTC, 0 replies.
- Re: RSS search by nutch - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2006/08/28 18:55:15 UTC, 5 replies.
- How to start nutch?? - posted by nutnoob <ha...@hotmail.com> on 2006/08/28 19:32:43 UTC, 1 replies.
- processing parallel sites - posted by bruce <be...@earthlink.net> on 2006/08/28 19:39:24 UTC, 2 replies.
- how to set NUTCH_JAVA_HOME - posted by nutnoob <ha...@hotmail.com> on 2006/08/29 07:23:16 UTC, 3 replies.
- I got something like this when try to run nutch in eclipe - posted by nutnoob <ha...@hotmail.com> on 2006/08/29 07:35:56 UTC, 2 replies.
- A text-based search engine - posted by Bui Quang Hung <bq...@nishilab.sys.es.osaka-u.ac.jp> on 2006/08/29 08:00:13 UTC, 2 replies.
- Re: how to unsubscribe? - posted by Philip Brown <ph...@primeradesigns.com> on 2006/08/29 09:20:01 UTC, 1 replies.
- problem with RTF parsing - posted by aicha BEN <ai...@yahoo.com> on 2006/08/29 16:26:36 UTC, 1 replies.
- Re : problem with RTF parsing - posted by aicha BEN <ai...@yahoo.com> on 2006/08/30 12:09:33 UTC, 1 replies.
- when to use cmd "parse" to parse a segment's pages - posted by Feng Ji <fe...@gmail.com> on 2006/08/30 14:40:43 UTC, 2 replies.
- svn code: no status line reported in log - posted by AJ Chen <ca...@gmail.com> on 2006/08/30 17:55:48 UTC, 0 replies.
- DmozParser questions - posted by Andy Markham <an...@gmail.com> on 2006/08/30 19:31:28 UTC, 0 replies.
- Follow urls with GET/Query String? - posted by Chris Stephens <ch...@liveoakinteractive.com> on 2006/08/30 21:22:13 UTC, 1 replies.
- intranet crawl problems: mime types; .doc-related exceptions; really, really slow crawl + possible infinite loop - posted by Tomi NA <he...@gmail.com> on 2006/08/30 21:25:23 UTC, 2 replies.
- Re: searching by categories - posted by Ernesto De Santis <de...@yahoo.com.ar> on 2006/08/30 23:19:11 UTC, 1 replies.
- httpclient fetcher error in hadoop log - posted by Feng Ji <fe...@gmail.com> on 2006/08/30 23:47:55 UTC, 3 replies.
- some urls in fetch list is not being fetched - posted by Feng Ji <fe...@gmail.com> on 2006/08/31 04:27:40 UTC, 1 replies.
- How to Make Nutch Return Search Results Belonged to the Crawl URL List? - posted by victor_emailbox <vi...@yahoo.com> on 2006/08/31 06:38:15 UTC, 1 replies.
- Re: How to Make Nutch Return Search Results Belonged to the Crawl URL Li - posted by victor_emailbox <vi...@yahoo.com> on 2006/08/31 07:21:03 UTC, 1 replies.
- bug or feature - posted by Uroš Gruber <ur...@sir-mag.com> on 2006/08/31 20:16:32 UTC, 0 replies.
- Re: bug or feature - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/08/31 20:51:35 UTC, 1 replies.
- indexing folders with nutch - posted by Cam Bazz <ca...@gmail.com> on 2006/08/31 23:20:09 UTC, 0 replies.