- Is Nutch Administration still active? - posted by Martin Xu <ma...@gmail.com> on 2007/11/01 03:52:24 UTC, 1 reply.
- Re: Language not supported in Carrot2 - posted by Uygar BAYAR <uy...@beriltech.com> on 2007/11/01 09:22:43 UTC, 1 reply.
- Re: [URGENT] : Query regarding handling multiple index with nutch.... - posted by Ravi Chintakunta <ra...@gmail.com> on 2007/11/01 12:30:34 UTC, 0 replies.
- Re: XMLParser for Nutch - posted by Sebastian Steinmetz <s....@mederi-research.de> on 2007/11/01 13:58:39 UTC, 1 reply.
- Why I can't install plugin in nutch-0.9 - posted by Xin Zhang <nu...@gmail.com> on 2007/11/01 14:58:13 UTC, 2 replies.
- Multiple Domains Search - posted by karthik085 <ka...@gmail.com> on 2007/11/01 20:25:51 UTC, 3 replies.
- RE: Restricting query to a domain - posted by karthik085 <ka...@gmail.com> on 2007/11/02 04:19:16 UTC, 0 replies.
- restrict indexing only to a domain list with no using crawl-urlfilter - posted by rubenll <ru...@hotmail.com> on 2007/11/02 18:17:37 UTC, 2 replies.
- Is there any plugin for data extraction using Xpath, XQuery or regex for nutch - posted by Anarus <as...@gmail.com> on 2007/11/03 10:13:45 UTC, 0 replies.
- looking for "hire" dev for a customization - posted by rubenll <ru...@hotmail.com> on 2007/11/03 12:59:12 UTC, 0 replies.
- Different Analyzers - posted by karthik085 <ka...@gmail.com> on 2007/11/04 06:00:47 UTC, 1 reply.
- I only need fetcher of Nutch,i need not index of Nutch.How to i input segments to my database's tables. - posted by xingjian <xi...@gmail.com> on 2007/11/05 09:14:48 UTC, 2 replies.
- Template/Menu Detection - posted by Emmanuel <jo...@gmail.com> on 2007/11/05 16:11:36 UTC, 1 reply.
- Out of Memory Error While Crawling - posted by Kunal Wku <wk...@yahoo.com> on 2007/11/05 18:28:39 UTC, 2 replies.
- Reduce copy slow ? - posted by Karol Rybak <ka...@gmail.com> on 2007/11/06 14:23:21 UTC, 0 replies.
- Problem with partititioning - posted by Karol Rybak <ka...@gmail.com> on 2007/11/06 14:58:25 UTC, 1 reply.
- help for a nutch beginner - posted by Josh Attenberg <jo...@gmail.com> on 2007/11/06 16:06:42 UTC, 5 replies.
- how can i get the document object in Nutch. - posted by xingjian <xi...@gmail.com> on 2007/11/07 01:51:14 UTC, 2 replies.
- How i can read the index of Nutch by Lucene's IndexReader. - posted by xingjian <xi...@gmail.com> on 2007/11/07 02:30:54 UTC, 4 replies.
- multiple crawl-urlfilter.txt files for different sites - posted by jian chen <ch...@gmail.com> on 2007/11/07 07:51:40 UTC, 1 reply.
- parser problem - posted by Uygar BAYAR <uy...@beriltech.com> on 2007/11/07 12:37:34 UTC, 0 replies.
- SaveSearch or Adult Filter - posted by Milan Krendzelak <mk...@mtld.mobi> on 2007/11/07 15:24:37 UTC, 0 replies.
- nutch-user@lucene.apache.org - posted by DigitalPebble <ju...@digitalpebble.com> on 2007/11/07 15:36:20 UTC, 0 replies.
- Re: SaveSearch or Adult - posted by Milan Krendzelak <mk...@mtld.mobi> on 2007/11/07 17:07:06 UTC, 0 replies.
- Re: Using nutch just for the crawler/fetcher - posted by jian chen <ch...@gmail.com> on 2007/11/07 20:09:46 UTC, 0 replies.
- [HOW-TO] How to make Nutch Ignore META Tags - posted by karthik085 <ka...@gmail.com> on 2007/11/07 20:29:44 UTC, 0 replies.
- Re: How to limit nutch to fetch, refetch and index just the injected URLs? - posted by karthik085 <ka...@gmail.com> on 2007/11/07 21:17:46 UTC, 0 replies.
- How to returns the stored fields of the Document in this index of Nutch? - posted by xingjian <xi...@gmail.com> on 2007/11/08 02:04:19 UTC, 2 replies.
- slow crawl... - posted by Sebastien Rainville <sr...@brightspark.com> on 2007/11/08 06:31:15 UTC, 1 reply.
- How can I know the Cached Web Charset - posted by crossafire <cr...@gmail.com> on 2007/11/08 09:09:52 UTC, 2 replies.
- noob wants to know: joining with a relational database result, is it possible? - posted by hank williams <ha...@gmail.com> on 2007/11/08 10:42:41 UTC, 1 reply.
- OR query (NUTCH-479) - posted by Sebastian Steinmetz <s....@mederi-research.de> on 2007/11/08 16:51:45 UTC, 0 replies.
- search custom field with search.jsp - posted by jeff gelb <jg...@pearsoncmg.com> on 2007/11/08 19:11:22 UTC, 2 replies.
- Cluster hadoop-site.xml Settings - posted by Daniel Clark <da...@verizon.net> on 2007/11/08 19:46:15 UTC, 1 reply.
- java.lang.NoClassDefFoundError Nutch 0.9 - posted by karthik085 <ka...@gmail.com> on 2007/11/08 21:12:26 UTC, 3 replies.
- error using JobStream.py - posted by Josh Attenberg <jo...@gmail.com> on 2007/11/08 22:25:33 UTC, 1 reply.
- Hadoop .15 and eclipse on windows - posted by Tim Gautier <ti...@gmail.com> on 2007/11/09 01:28:02 UTC, 4 replies.
- crawl on non-standard port, index/search on port 80? - posted by jgelb <jg...@pearsoncmg.com> on 2007/11/09 22:13:04 UTC, 0 replies.
- Nutch-0.9 plugins, trouble with ant 1.6.5 and 1.7 - posted by Mark Bennett <mb...@ideaeng.com> on 2007/11/10 02:29:25 UTC, 3 replies.
- Fetching many pages off LAN - posted by Matei Zaharia <ma...@eecs.berkeley.edu> on 2007/11/10 20:57:19 UTC, 5 replies.
- How to writes the results of successful fetcher to database. - posted by xingjian <xi...@gmail.com> on 2007/11/12 03:55:31 UTC, 4 replies.
- URI is not absolute... - posted by paradise <pa...@gmail.com> on 2007/11/13 13:07:20 UTC, 2 replies.
- java.io.IOException: Unknown format version:-3 - posted by paradise <pa...@gmail.com> on 2007/11/13 13:13:11 UTC, 1 reply.
- Indexing process - posted by payo <pa...@yahoo.com> on 2007/11/13 19:52:04 UTC, 0 replies.
- run the crawl - posted by payo <pa...@yahoo.com> on 2007/11/13 19:59:18 UTC, 2 replies.
- java.net.SocketException: Connection reset when using too many threads - posted by eyal edri <ey...@gmail.com> on 2007/11/14 15:37:36 UTC, 0 replies.
- Higher depth, fewer urls? - posted by Annona Keene <an...@yahoo.com> on 2007/11/14 17:45:59 UTC, 1 reply.
- results display for languages other than English - posted by charlie w <sp...@gmail.com> on 2007/11/14 18:28:44 UTC, 0 replies.
- configuration Nutch - posted by payo <pa...@yahoo.com> on 2007/11/14 23:14:21 UTC, 0 replies.
- Error when using nutch - posted by "Brehm, Robert P" <Ro...@xerox.com> on 2007/11/15 00:34:03 UTC, 1 reply.
- Mobile web sites - posted by Yari M <ya...@mail.ru> on 2007/11/15 21:26:06 UTC, 0 replies.
- indexing word file - posted by crazy <el...@gmail.com> on 2007/11/16 09:15:58 UTC, 9 replies.
- Exception in thread "main" java.lang.IllegalArgumentException: URI is not absolute - posted by paradise <pa...@gmail.com> on 2007/11/16 09:24:32 UTC, 0 replies.
- word caché - posted by payo <pa...@yahoo.com> on 2007/11/16 16:25:04 UTC, 1 reply.
- Nightly version - no results? - posted by ca...@globo.com on 2007/11/16 19:00:46 UTC, 0 replies.
- very low fieldnorm leading to bad results - posted by Sathyam Y <sa...@yahoo.com> on 2007/11/16 19:26:24 UTC, 1 reply.
- Reduce job in invertlinks and index tasks often fails - posted by Matei Zaharia <ma...@eecs.berkeley.edu> on 2007/11/18 05:07:49 UTC, 0 replies.
- A record version mismatch occured. Expecting v5, found v69 - posted by Josh Attenberg <jo...@gmail.com> on 2007/11/18 20:41:38 UTC, 3 replies.
- Adddays & topN - posted by Yari M <ya...@mail.ru> on 2007/11/19 09:32:48 UTC, 0 replies.
- Re: indexing excel file - posted by crazy <el...@gmail.com> on 2007/11/19 15:40:08 UTC, 3 replies.
- nutch 0.9 and eclipse 3.3 - - posted by Lev Kantorovich <le...@gmail.com> on 2007/11/19 20:18:11 UTC, 0 replies.
- http://www.mail-archive.com/nutch-user@lucene.apache.org/msg09096.html - posted by "Moore, Lee C" <Le...@xerox.com> on 2007/11/19 21:41:01 UTC, 2 replies.
- dfs.DataNode - Failed to transfer blk_xxxx to 192.168.140.244:50010 got java.net.SocketException: Connection reset - posted by 施兴 <pa...@gmail.com> on 2007/11/20 04:18:00 UTC, 1 reply.
- Handling authentication - posted by "|^| /-\\ |\\| |) /-\\ |2" <ma...@gmail.com> on 2007/11/20 05:57:17 UTC, 1 reply.
- Re: nutch 0.9 and eclipse 3.3 - - posted by eyal edri <ey...@gmail.com> on 2007/11/20 07:39:59 UTC, 1 reply.
- Is storing 20 fields in a lucene document desirable? - posted by kumarlimbu <ku...@gmail.com> on 2007/11/20 12:44:13 UTC, 1 reply.
- PDF Indexing Problem - posted by Christopher Condit <co...@sdsc.edu> on 2007/11/20 21:00:46 UTC, 0 replies.
- No space left on device - posted by Josh Attenberg <jo...@gmail.com> on 2007/11/21 04:24:57 UTC, 9 replies.
- trying to configure nutch-0.9 - posted by Abdou RABBA <ab...@presse-ocean.com> on 2007/11/21 13:30:03 UTC, 1 reply.
- Re: dfs.DataNode - Failed to transfer blk_xxxx to 192.168.140.244:50010 got java.net.SocketException: Connection reset - posted by Tomislav Poljak <tp...@gmail.com> on 2007/11/21 14:13:29 UTC, 0 replies.
- Crawl API Help - posted by Cool Coder <te...@yahoo.com> on 2007/11/21 23:18:28 UTC, 0 replies.
- several requests with different headers to the same resource - posted by Guido García Bernardo <gg...@itdeusto.com> on 2007/11/23 10:48:36 UTC, 0 replies.
- graphExtractor.pl - posted by Daniele Zuco <da...@gmail.com> on 2007/11/23 20:24:36 UTC, 0 replies.
- using trunk, urls disappearing when using 4 nodes - posted by obradoa <ao...@gmail.com> on 2007/11/23 20:54:27 UTC, 1 reply.
- crawl only option for Crawl.java and crawled content reader class - posted by jian chen <ch...@gmail.com> on 2007/11/24 02:19:26 UTC, 7 replies.
- Relevant feedback - posted by josky <te...@lu.unisi.ch> on 2007/11/26 14:13:53 UTC, 0 replies.
- process crawl - posted by payo <pa...@yahoo.com> on 2007/11/26 20:16:27 UTC, 0 replies.
- Crash in Parser - posted by "Bolle, Jeffrey F." <jb...@mitre.org> on 2007/11/26 21:08:00 UTC, 4 replies.
- Newbie question: fetching specific files only. - posted by "Jose C. Lacal" <Jo...@OpenPHI.com> on 2007/11/26 21:47:08 UTC, 1 reply.
- Generate times - posted by Karol Rybak <ka...@gmail.com> on 2007/11/27 00:02:12 UTC, 2 replies.
- Problems with mixed English/Russian page - posted by charlie w <sp...@gmail.com> on 2007/11/27 01:04:04 UTC, 0 replies.
- Usage readdb dump - posted by Daniele Zuco <da...@gmail.com> on 2007/11/27 09:10:39 UTC, 0 replies.
- NullPointerException with trunk - posted by Alexis Votta <al...@gmail.com> on 2007/11/27 15:11:03 UTC, 3 replies.
- URL-Filter for ?indexing?? - posted by "Christoph M. Pflügler" <ch...@stud.uni-bamberg.de> on 2007/11/27 21:30:55 UTC, 1 reply.
- How to read crawldb - posted by Cool Coder <te...@yahoo.com> on 2007/11/27 23:20:49 UTC, 4 replies.
- fetch: An unexpected error has been detected by Java Runtime Environment - posted by Josh Attenberg <jo...@gmail.com> on 2007/11/28 02:13:16 UTC, 1 reply.
- Problems testing Authentication - posted by j....@thomson.com on 2007/11/28 13:50:18 UTC, 3 replies.
- very poor fetch performance with nutch .8 - posted by Josh Attenberg <jo...@gmail.com> on 2007/11/28 20:50:33 UTC, 1 reply.
- can't find hadoop classes necessary to use Nutch API - posted by Ana Rodighiero <an...@entelepon.com> on 2007/11/28 22:44:01 UTC, 1 reply.
- Hello Nutch! - posted by v k <vk...@gmail.com> on 2007/11/29 00:23:10 UTC, 0 replies.
- Hardware Planning - posted by Paul Stewart <ps...@nexicomgroup.net> on 2007/11/29 03:38:58 UTC, 5 replies.
- Basic question about indexing - posted by Venkat Korvi <vk...@gmail.com> on 2007/11/29 20:33:47 UTC, 0 replies.
- Merge indexes using nutch v 0.9 - posted by Cool Coder <te...@yahoo.com> on 2007/11/29 22:05:11 UTC, 0 replies.
- nutch programmer needed for custom scoring plugin - posted by ronjonbb <ro...@hotmail.com> on 2007/11/29 23:19:04 UTC, 0 replies.
- maintainability of nutch - building incremental index - posted by Koe Black <ko...@yahoo.com> on 2007/11/30 02:38:02 UTC, 0 replies.
- Fetching site's sub-folders only - posted by peashey <pa...@quintura.com> on 2007/11/30 07:42:06 UTC, 0 replies.
- Different configuration for different sites in a crawl possible? - posted by "Mubey N." <mu...@gmail.com> on 2007/11/30 09:24:20 UTC, 2 replies.
- can nutch fetch specific file? - posted by daniel lau <da...@gmail.com> on 2007/11/30 13:50:38 UTC, 0 replies.
- Update nutch index process - posted by ajaxtrend <te...@yahoo.com> on 2007/11/30 21:00:35 UTC, 0 replies.