- Re: The Future of Nutch - posted by Thorsten Scherler <th...@apache.org> on 2009/04/01 02:28:39 UTC, 4 replies.
- Re: Crawler Output Flat file or Database? - posted by Dennis Kubes <ku...@apache.org> on 2009/04/01 02:59:32 UTC, 2 replies.
- Re: lukeall-0.9.1 to manually add indexes - posted by al...@aim.com on 2009/04/01 06:42:41 UTC, 4 replies.
- Two urls cannot fetch - posted by 陈琛 <ky...@gmail.com> on 2009/04/01 10:40:50 UTC, 2 replies.
- Re: crawl_parse keeps growing after re-crawling and segment merging - posted by Doğacan Güney <do...@gmail.com> on 2009/04/01 11:21:16 UTC, 5 replies.
- only fetch home page - posted by 陈琛 <ky...@gmail.com> on 2009/04/01 11:48:50 UTC, 20 replies.
- Nutch 1.0 experience - posted by consultas <co...@qualidade.eng.br> on 2009/04/01 21:47:36 UTC, 2 replies.
- what is subcollection plugin? - posted by ianwong <yi...@hotmail.com> on 2009/04/02 13:11:10 UTC, 0 replies.
- Problem with Crawler and Parent Directories - posted by Wolf Fischer <Wo...@informatik.uni-augsburg.de> on 2009/04/02 17:00:47 UTC, 5 replies.
- nutch/hadoop performance and optimal configuration - posted by DS jha <ae...@gmail.com> on 2009/04/03 00:39:05 UTC, 5 replies.
- Nutch can't find all files - posted by Hannu Väisänen <hv...@joyx.joensuu.fi> on 2009/04/03 06:35:37 UTC, 5 replies.
- Re: Dedup: Job Failed and crawl stopped at depth 1 - posted by pranesh <pr...@hcl.in> on 2009/04/03 06:45:53 UTC, 0 replies.
- nutch-1.0 distribution config problem - posted by zxh116116 <zx...@sina.com> on 2009/04/03 11:01:38 UTC, 3 replies.
- Re: Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing. - posted by andy2005cst <an...@gmail.com> on 2009/04/03 11:06:26 UTC, 2 replies.
- Problem in compiling nutch 0.7 - posted by Mayank Kamthan <mk...@gmail.com> on 2009/04/03 15:54:33 UTC, 0 replies.
- What means "Ignoring position" using ArcSegmentCreator? - posted by Felix Zimmermann <fe...@gmx.de> on 2009/04/04 12:55:19 UTC, 1 reply.
- How to find out the encoding and format of the content stored in the index? - posted by dealmaker <vi...@gmail.com> on 2009/04/05 07:54:35 UTC, 0 replies.
- Re: How to find out the encoding and format of the content stored in the index? - posted by yanky young <ya...@gmail.com> on 2009/04/05 08:28:23 UTC, 1 reply.
- Re: How to find out the encoding and format of the content stored in the index? - posted by dealmaker <vi...@gmail.com> on 2009/04/05 18:19:08 UTC, 0 replies.
- nutch-1.0 datanode exception when fetching - posted by zxh116116 <zx...@sina.com> on 2009/04/06 03:45:10 UTC, 0 replies.
- Problem crawling BBC Hindi Site - posted by Ankur Garg <ga...@gmail.com> on 2009/04/06 08:12:04 UTC, 1 reply.
- nutch 0.9 protocol-file plugin break with windows file name that contains space - posted by yanky young <ya...@gmail.com> on 2009/04/06 09:50:02 UTC, 0 replies.
- Why 'crawl' is created in local directory instead of HDFS? - posted by Foss User <fo...@gmail.com> on 2009/04/06 20:42:06 UTC, 0 replies.
- why nutch repeat fetching some pages - posted by yanky young <ya...@gmail.com> on 2009/04/08 07:32:41 UTC, 2 replies.
- resubmitting failed reduce task - posted by DS jha <ae...@gmail.com> on 2009/04/08 13:11:17 UTC, 0 replies.
- java heap space error - posted by srinivas jaini <sr...@gmail.com> on 2009/04/09 08:37:30 UTC, 2 replies.
- number of fetcher threads per host? - posted by Alex Basa <al...@yahoo.com> on 2009/04/09 16:16:03 UTC, 3 replies.
- Subcollections plugin not working - posted by Filipe Antunes <fa...@tecnica.cc> on 2009/04/09 16:49:44 UTC, 0 replies.
- Re: type is incompatible in 1.0! - posted by fmccown <fm...@harding.edu> on 2009/04/09 16:49:55 UTC, 0 replies.
- nutch: java.nio.charset.IllegalCharsetNameException: - posted by "jet2web@trashmail.net" <je...@trashmail.net> on 2009/04/10 02:39:44 UTC, 1 reply.
- java.nio.charset.IllegalCharsetNameException - posted by "jet2web@trashmail.net" <je...@trashmail.net> on 2009/04/10 02:41:54 UTC, 0 replies.
- Re: app question.... - posted by yanky young <ya...@gmail.com> on 2009/04/10 04:33:17 UTC, 0 replies.
- Sizing Guide? - posted by John Whelan <jo...@whelanlabs.com> on 2009/04/11 23:46:11 UTC, 0 replies.
- How come getContent returns HTML Entities? - posted by dealmaker <vi...@gmail.com> on 2009/04/12 07:05:46 UTC, 0 replies.
- fetcher issues - posted by Fadzi Ushewokunze <fa...@butterflycluster.net> on 2009/04/13 04:52:35 UTC, 6 replies.
- Multi-Lingual Support in Nutch - posted by Kunal Wku <wk...@yahoo.com> on 2009/04/13 17:30:17 UTC, 0 replies.
- Null pointer exception - posted by Niraj Aswani <N....@dcs.shef.ac.uk> on 2009/04/14 16:18:03 UTC, 0 replies.
- null-pointer exception - posted by Niraj Aswani <n....@sheffield.ac.uk> on 2009/04/14 16:18:49 UTC, 0 replies.
- Re: Language Identifier plugin - posted by wku_kunal <wk...@yahoo.com> on 2009/04/14 17:17:52 UTC, 0 replies.
- How does Nutch Fetch Files in Relative Path? - posted by dealmaker <vi...@gmail.com> on 2009/04/14 22:35:11 UTC, 0 replies.
- Problems with custom field query - posted by Raymond Balmès <ra...@gmail.com> on 2009/04/15 16:47:05 UTC, 4 replies.
- How to ensure that a particular URL is not crawled (ever) again - posted by Grease <gi...@aplopio.com> on 2009/04/16 07:41:01 UTC, 0 replies.
- How to index segments after converted from Heritrix ARC-files. - posted by Felix Zimmermann <fe...@gmx.de> on 2009/04/16 22:50:25 UTC, 1 reply.
- Seattle / PNW Hadoop + Lucene User Group? - posted by Bradford Stephens <br...@gmail.com> on 2009/04/17 00:27:18 UTC, 8 replies.
- Spell checker in nutch 0.9 - posted by "Gosavi.Shyam" <sh...@gmail.com> on 2009/04/17 10:21:42 UTC, 0 replies.
- nutch search score - posted by Zanzico Gioele <gi...@vitecgroup.it> on 2009/04/17 11:35:19 UTC, 0 replies.
- nutch multiple site - posted by Zanzico Gioele <gi...@vitecgroup.it> on 2009/04/17 11:37:17 UTC, 0 replies.
- Odd results and broken docs when indexing converted ARC-files. - posted by Felix Zimmermann <fe...@gmx.de> on 2009/04/17 14:47:02 UTC, 2 replies.
- Odd results and broken docs when indexing converted ARC-files (-> link to gif). - posted by Felix Zimmermann <fe...@gmx.de> on 2009/04/17 14:54:41 UTC, 1 reply.
- getting WORDLIST - posted by Ilia chachkhunashvili <il...@gmail.com> on 2009/04/17 21:35:26 UTC, 0 replies.
- Nutch-based Application for Windows - posted by John Whelan <jo...@whelanlabs.com> on 2009/04/18 04:44:22 UTC, 1 reply.
- Re: fetcher questions - posted by Dennis Kubes <ku...@apache.org> on 2009/04/18 06:56:46 UTC, 0 replies.
- Dedup not working any more (Lock obtain timed out) - posted by ML mail <ml...@yahoo.com> on 2009/04/19 09:53:38 UTC, 0 replies.
- Query-more problem - posted by Raymond Balmès <ra...@gmail.com> on 2009/04/19 18:09:34 UTC, 2 replies.
- ebook resources - including lucene in action - posted by wu fuheng <wu...@gmail.com> on 2009/04/20 05:58:56 UTC, 4 replies.
- Can't build Nutch - posted by Filipe Antunes <fa...@tecnica.cc> on 2009/04/20 12:00:15 UTC, 4 replies.
- how to restrict search result in defined domains? - posted by ianwong <yi...@hotmail.com> on 2009/04/20 14:56:45 UTC, 3 replies.
- Re: Multiple "site:" in query - posted by ianwong <yi...@hotmail.com> on 2009/04/20 15:22:58 UTC, 0 replies.
- way to get list of indexed URLS and list of words - posted by Ilia chachkhunashvili <il...@gmail.com> on 2009/04/20 16:25:06 UTC, 0 replies.
- Nutch Crawling Questions - posted by Jason Todd Slack-Moehrle <ma...@MailNewsRSS.com> on 2009/04/21 01:10:45 UTC, 2 replies.
- running two crawlers at the same time - posted by Alexander Aristov <al...@gmail.com> on 2009/04/21 14:21:58 UTC, 2 replies.
- nutch 1.0 - posted by Jaime Martín <ja...@gmail.com> on 2009/04/21 23:45:43 UTC, 2 replies.
- hi Kubes:the question about develop environment! - posted by askNutch <he...@126.com> on 2009/04/22 07:41:11 UTC, 7 replies.
- Re: AW: Nutch Training Seminar - posted by brainstorm <br...@gmail.com> on 2009/04/22 12:01:04 UTC, 0 replies.
- Hadoop thread seems to remain alive - posted by "Lukas, Ray" <Ra...@idearc.com> on 2009/04/22 22:30:01 UTC, 12 replies.
- run nutch on eclipse problem? - posted by askNutch <he...@126.com> on 2009/04/23 08:24:22 UTC, 3 replies.
- How to resume crawler after crash - posted by Sherjeel Niazi <sh...@softmatics.com> on 2009/04/23 17:02:42 UTC, 1 reply.
- Using nutchBean - posted by "Lukas, Ray" <Ra...@idearc.com> on 2009/04/23 22:36:07 UTC, 4 replies.
- URL Scoring - posted by MyD <My...@googlemail.com> on 2009/04/24 10:14:17 UTC, 1 reply.
- How to get the html that i crawled - posted by sgirao <sg...@altentic.com> on 2009/04/27 13:28:08 UTC, 4 replies.
- Searching multiple indexes with Nutch-2 servers,0 segments - posted by jqq <re...@gmail.com> on 2009/04/27 14:58:42 UTC, 0 replies.
- Nutch fetch creates too many http sessions - posted by kazam <az...@gmail.com> on 2009/04/27 18:25:56 UTC, 2 replies.
- Unable to register IndexingFilter extesion plugin - N 0.9 - posted by Joel Halbert <jo...@storequery.com> on 2009/04/27 19:40:04 UTC, 2 replies.
- Problem in generating the war file - posted by Mayank Kamthan <mk...@gmail.com> on 2009/04/27 20:47:43 UTC, 3 replies.
- dual core and crawling - posted by Raymond Balmès <ra...@gmail.com> on 2009/04/27 23:17:14 UTC, 9 replies.
- Adding a new class in Nutch and using it in a JSP - posted by Mayank Kamthan <mk...@gmail.com> on 2009/04/27 23:46:45 UTC, 0 replies.
- in nutch1.0 incread summary problem - posted by zxh116116 <zx...@sina.com> on 2009/04/28 16:18:14 UTC, 0 replies.
- N 0.9 - fetcher.threads.per.host - posted by Joel Halbert <jo...@su3analytics.com> on 2009/04/28 18:34:14 UTC, 2 replies.
- Possible bug in when fetching page relative links after redirects - N 1.0. - posted by Joel Halbert <jo...@su3analytics.com> on 2009/04/29 11:07:35 UTC, 0 replies.
- Possible bug in when fetching relative links after a redirect - N 1.0 - posted by Joel Halbert <jo...@storequery.com> on 2009/04/29 11:27:25 UTC, 1 reply.
- Is it possible to avoid Nutch 1.0 from indexing local directories ? - posted by vs...@free.fr on 2009/04/30 11:14:30 UTC, 2 replies.
- General queries - posted by Rahil Baig <ra...@expw.com> on 2009/04/30 17:06:22 UTC, 0 replies.