- Re: Problem in Distributed file system - posted by nu...@dragonflymc.com on 2006/10/01 05:41:03 UTC, 1 replies.
- Accessing term frequency with/without Vector? - posted by Erik J <sw...@hotmail.com> on 2006/10/02 09:36:49 UTC, 2 replies.
- RE: Java heap space - posted by Vladimir Olenin <VO...@cihi.ca> on 2006/10/02 17:36:14 UTC, 0 replies.
- A couple of questions with 0.8.1 - posted by Omar <or...@yahoo.com> on 2006/10/02 21:01:48 UTC, 0 replies.
- Nutch crawler ignores sites without default page - posted by te...@gmail.com on 2006/10/03 14:58:04 UTC, 0 replies.
- category string gets matched as a term - posted by Dima Gritsenko <di...@ekreative.com> on 2006/10/03 15:35:13 UTC, 1 replies.
- Performance problem in nutch 0.8.1 - posted by Mohan Lal <mo...@gmail.com> on 2006/10/03 16:21:08 UTC, 0 replies.
- focussed crawling - posted by Apache Lucene <lu...@gmail.com> on 2006/10/03 22:42:26 UTC, 5 replies.
- RegexURLFilter Pattern - posted by Dima Mazmanov <di...@proservice.ge> on 2006/10/04 08:46:12 UTC, 1 replies.
- Problem parsing some MS Excel documents (Office 2003) - posted by tryma <tr...@creuna.no> on 2006/10/04 09:38:12 UTC, 1 replies.
- backup/failover NameNode - posted by Mohan Lal <mo...@gmail.com> on 2006/10/04 15:40:01 UTC, 1 replies.
- Inconsistent behaviour while parsing pdf/word/ppt files - posted by Omar <or...@yahoo.com> on 2006/10/04 20:08:53 UTC, 0 replies.
- 0.7.2 Compile Problems - posted by Gary Bone <Ga...@spgmedia.com> on 2006/10/04 22:56:49 UTC, 1 replies.
- Motivation for Crawl-urlfilter.txt - posted by Jared Dunne <ja...@thomson.com> on 2006/10/05 01:17:49 UTC, 1 replies.
- Lucene query support in Nutch - posted by Cristina Belderrain <cr...@gmail.com> on 2006/10/05 04:09:58 UTC, 13 replies.
- Re: Problem Searching - posted by WebDev Freak <we...@gmail.com> on 2006/10/05 06:17:01 UTC, 0 replies.
- DFS Shadow Server - posted by Sunil Kumar PK <pk...@gmail.com> on 2006/10/05 11:23:33 UTC, 0 replies.
- java.lang.NoSuchMethodError while indexing - posted by Adam Borkowski <bo...@3miasto.net> on 2006/10/06 15:32:46 UTC, 3 replies.
- NutchWax - posted by Shay Lawless <se...@gmail.com> on 2006/10/06 18:22:48 UTC, 1 replies.
- fixing segmentes - posted by carmmello <ca...@globo.com> on 2006/10/06 19:32:18 UTC, 1 replies.
- How do I search multiple text fields in Lucene .NET? - posted by codejunky codejunky <co...@yahoo.com> on 2006/10/06 21:02:56 UTC, 1 replies.
- dump page content to Windows file system? - posted by David Bargeron <Da...@nervana.com> on 2006/10/06 22:39:22 UTC, 0 replies.
- nonzero status of 134 - posted by "Håvard W. Kongsgård" <nu...@niap.org> on 2006/10/07 12:46:44 UTC, 0 replies.
- Database update - posted by jaison <ja...@qburst.com> on 2006/10/07 13:56:44 UTC, 2 replies.
- First Time Run Nutch0.8.1 in Eclipse 3.2.1 Problem! - posted by Ong Jin Yang <ji...@metaterri.com> on 2006/10/07 22:18:19 UTC, 1 replies.
- Re: Searching with "and" and "or? - posted by Nguyen Ngoc Giang <gi...@gmail.com> on 2006/10/08 16:59:16 UTC, 0 replies.
- crawl db disrtibution on different data nodes - posted by jaison Qburst <ja...@qburst.com> on 2006/10/09 15:41:42 UTC, 1 replies.
- can nutch 0.7.2 set the max pages when doing a crawl job? - posted by kevin <ke...@gmail.com> on 2006/10/09 18:21:37 UTC, 0 replies.
- Problem with readseg - posted by Pankaj Mathur <ag...@hotmail.com> on 2006/10/10 05:00:35 UTC, 0 replies.
- Searching terms saved in a file - posted by frgrfg gfsdgffsd <ki...@yahoo.fr> on 2006/10/10 10:26:41 UTC, 1 replies.
- Deleting Pages - posted by Gary Bone <Ga...@spgmedia.com> on 2006/10/10 12:04:44 UTC, 1 replies.
- term frequencies for multiple term query - posted by Erik J <sw...@hotmail.com> on 2006/10/10 15:53:07 UTC, 0 replies.
- Re : Searching terms saved in a file - posted by frgrfg gfsdgffsd <ki...@yahoo.fr> on 2006/10/10 16:57:44 UTC, 1 replies.
- summarizer extension - posted by "NG-Marketing, M.Schneider" <sc...@ng-marketing.com> on 2006/10/10 18:32:16 UTC, 0 replies.
- Recrawl script - posted by Chris Stephens <ch...@liveoakinteractive.com> on 2006/10/10 19:03:00 UTC, 3 replies.
- Segment size and mergesegs slicing - posted by Jacob Brunson <ja...@gmail.com> on 2006/10/10 22:33:34 UTC, 1 replies.
- Fetcher aborts with hung threads - posted by Bruno Thiel <br...@objectconsulting.com.au> on 2006/10/11 01:51:17 UTC, 3 replies.
- I can not query myplugin in field category:test - posted by xu nutch <nu...@gmail.com> on 2006/10/11 04:26:30 UTC, 8 replies.
- Re : Re : Searching terms saved in a file - posted by frgrfg gfsdgffsd <ki...@yahoo.fr> on 2006/10/11 09:38:13 UTC, 0 replies.
- java.lang.NullPointerException - posted by Guruprasad Iyer <mu...@gmail.com> on 2006/10/11 12:57:48 UTC, 3 replies.
- HELP: Why crawled files so small? nutch version 0.8.1 - posted by kevin <ke...@gmail.com> on 2006/10/11 16:28:56 UTC, 2 replies.
- invoke crawl from servlet? - posted by Paul M Lieberman <pa...@alum.mit.edu> on 2006/10/11 22:43:03 UTC, 0 replies.
- why I use "site:com" to query , but no result return?? - posted by xu nutch <nu...@gmail.com> on 2006/10/12 02:49:16 UTC, 1 replies.
- crawling sites which require authentication - posted by Guruprasad Iyer <mu...@gmail.com> on 2006/10/12 11:50:02 UTC, 10 replies.
- Creating multiple output segments for generate - posted by Vishal Shah <vi...@rediff.co.in> on 2006/10/12 12:59:02 UTC, 0 replies.
- Re: IdentityReducer while fetching - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/10/12 14:43:38 UTC, 2 replies.
- Record no - posted by jaison Qburst <ja...@qburst.com> on 2006/10/12 15:53:55 UTC, 1 replies.
- getMetaDescription(); - posted by Matze <ma...@dermatzeimnetz.de> on 2006/10/12 17:22:44 UTC, 0 replies.
- How can i crawl more site? - posted by martin <ma...@gmail.com> on 2006/10/13 02:12:22 UTC, 0 replies.
- Depricated methods in hadoop 0.6.2 - posted by Mohan Lal <mo...@gmail.com> on 2006/10/13 13:40:35 UTC, 2 replies.
- Dedup undeletes previously deleted documents - posted by ia...@thomson.com on 2006/10/14 03:38:45 UTC, 0 replies.
- how to share code? - posted by Ernesto De Santis <de...@yahoo.com.ar> on 2006/10/14 18:25:22 UTC, 3 replies.
- Re: Dedup undeletes previously deleted documents - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/10/14 20:50:17 UTC, 0 replies.
- How to read content of a particular url from the crawldb? - posted by shjiang <ji...@souchang.com> on 2006/10/16 05:03:45 UTC, 3 replies.
- Steps in writing a plugin - posted by frgrfg gfsdgffsd <ki...@yahoo.fr> on 2006/10/16 06:38:25 UTC, 0 replies.
- Field?? - posted by frgrfg gfsdgffsd <ki...@yahoo.fr> on 2006/10/16 06:41:26 UTC, 1 replies.
- index url category plugin - posted by Ernesto De Santis <de...@yahoo.com.ar> on 2006/10/16 14:32:29 UTC, 0 replies.
- Re : Field?? - posted by frgrfg gfsdgffsd <ki...@yahoo.fr> on 2006/10/17 05:15:19 UTC, 0 replies.
- My search keeps on looping - posted by Mat Fix <sp...@yahoo.fr> on 2006/10/17 09:24:14 UTC, 0 replies.
- Extending BasicQueryFilter for a new plugiin? - posted by frgrfg gfsdgffsd <ki...@yahoo.fr> on 2006/10/17 13:20:17 UTC, 1 replies.
- near duplicates - posted by Find Me <fi...@gmail.com> on 2006/10/17 17:54:07 UTC, 5 replies.
- java 1.5 or 1.4 - posted by "NG-Marketing, M.Schneider" <sc...@ng-marketing.com> on 2006/10/17 18:44:10 UTC, 1 replies.
- fetch fails at reduce stage because can not sense heartbeat for 600 seconds - posted by Mike Smith <mi...@gmail.com> on 2006/10/18 00:58:34 UTC, 8 replies.
- Re: Reduce Error during fetch - posted by Mike Smith <mi...@gmail.com> on 2006/10/18 01:19:35 UTC, 1 replies.
- Indexing the file system / best approach - posted by Bruno Thiel <br...@objectconsulting.com.au> on 2006/10/18 02:21:07 UTC, 2 replies.
- Re : Extending BasicQueryFilter for a new plugiin? - posted by frgrfg gfsdgffsd <ki...@yahoo.fr> on 2006/10/18 08:27:08 UTC, 0 replies.
- Fetching outside the domain ? - posted by Frederic Goudal <go...@enseirb.fr> on 2006/10/18 08:43:36 UTC, 2 replies.
- nutch page searched by google? - posted by Josef Novak <jo...@gmail.com> on 2006/10/18 13:57:13 UTC, 0 replies.
- Re: Fetching outside the domain ? - posted by go...@enseirb.fr on 2006/10/18 15:08:18 UTC, 0 replies.
- Query Error? - posted by Matthew Holt <ho...@wfu.edu> on 2006/10/18 21:40:29 UTC, 0 replies.
- nutch 0.7.2 index.jsp java error on Solaris - posted by Ron Swartzendruber <sw...@wou.edu> on 2006/10/19 03:52:27 UTC, 0 replies.
- can't search - posted by bb...@mail.ru on 2006/10/19 15:31:24 UTC, 2 replies.
- [Fwd: Query Error?] - posted by Matthew Holt <ho...@wfu.edu> on 2006/10/19 21:50:26 UTC, 1 replies.
- Jdk installation conflict? RedHat 3 and java.lang.NoClassDefFoundError nutch errors - posted by Marco Vanossi <ma...@gmail.com> on 2006/10/20 05:36:25 UTC, 0 replies.
- problem parsing documents : word, rtf, excel, etc... - posted by Aïcha <ai...@yahoo.com> on 2006/10/20 17:52:22 UTC, 0 replies.
- Does WebDB keep the contents of the pages? - posted by Qi Wu <wu...@cn.ibm.com> on 2006/10/24 12:51:37 UTC, 1 replies.
- Re: Plugin HitCollector - posted by Dennis Kubes <nu...@dragonflymc.com> on 2006/10/24 16:32:34 UTC, 0 replies.
- Nutch slow how to speed up? - posted by "Håvard W. Kongsgård" <nu...@niap.org> on 2006/10/24 19:06:53 UTC, 3 replies.
- Modifying Nutch core - posted by Benjamin Higgins <bh...@gmail.com> on 2006/10/24 20:06:02 UTC, 1 replies.
- nutch 0.8 (+ hadoop 0.5) does not crawl reliably - posted by Teruhiko Kurosaka <Ku...@basistech.com> on 2006/10/25 03:13:35 UTC, 4 replies.
- Preventing pages to be indexed based on content - posted by Eelco Lempsink <le...@paragin.nl> on 2006/10/25 16:02:46 UTC, 2 replies.
- My dear - posted by Randell Gore <bi...@0733.com> on 2006/10/25 16:08:47 UTC, 0 replies.
- Spaces in URLs - posted by Scott Hayes <ha...@gmail.com> on 2006/10/26 00:00:06 UTC, 0 replies.
- Nutch Crawl and Webserver Authentication - posted by Christian Gottschalch <ma...@llbc.de> on 2006/10/26 14:38:59 UTC, 0 replies.
- Problem in executing Nutch Tutorial - posted by Ha ward <sm...@gmail.com> on 2006/10/26 17:59:26 UTC, 0 replies.
- map-reduce very slow on single machine - posted by AJ Chen <ca...@gmail.com> on 2006/10/26 18:09:30 UTC, 3 replies.
- Problem in executing Nutch tutorial with cygwin - posted by Haward <sm...@gmail.com> on 2006/10/26 19:18:43 UTC, 1 replies.
- Re: Query Error - posted by Matthew Holt <ho...@wfu.edu> on 2006/10/27 06:40:35 UTC, 0 replies.
- nutch 0.8.1 newbie question (invertlinks error) - posted by "Ilia S. Yatsenko" <il...@gmail.com> on 2006/10/27 09:30:24 UTC, 7 replies.
- [about RecordReader.creatValue()] - posted by TKDD <my...@gmail.com> on 2006/10/27 16:37:49 UTC, 0 replies.
- http.content.limit - posted by Find Me <fi...@gmail.com> on 2006/10/27 18:50:04 UTC, 0 replies.
- Magpie RSS parsing of OpenSearch Format - posted by Bud Witney <wi...@osu.edu> on 2006/10/27 23:50:35 UTC, 2 replies.
- RecordReader.creatValue() - posted by TKDD <my...@gmail.com> on 2006/10/28 03:00:52 UTC, 0 replies.
- when doing crawl by Nutch 0.7.2,the following excetion appear,can't I reuse the crawled db and segments? - posted by kevin <ke...@gmail.com> on 2006/10/28 05:38:18 UTC, 0 replies.
- How can I setup an mp3 search engine? - posted by fa...@gzedu.gov.cn on 2006/10/28 11:31:31 UTC, 3 replies.
- returning a description of a returned document - posted by Tomi NA <he...@gmail.com> on 2006/10/28 12:35:58 UTC, 2 replies.
- nice a indexer - posted by "NG-Marketing, M.Schneider" <sc...@ng-marketing.com> on 2006/10/28 18:20:25 UTC, 1 replies.
- Speeding things up! - posted by Marco Vanossi <ma...@gmail.com> on 2006/10/29 02:46:01 UTC, 2 replies.
- On local file system crawl, why does nutch crawl parent directories? - posted by John George <jt...@yahoo.com> on 2006/10/30 01:50:06 UTC, 0 replies.
- Problem in compile nutch8.1 - posted by fa...@gzedu.gov.cn on 2006/10/30 06:22:39 UTC, 1 replies.
- mergesegs bigger than original - posted by "NG-Marketing, M.Schneider" <sc...@ng-marketing.com> on 2006/10/30 10:41:09 UTC, 2 replies.
- Urgent : Fetcher aborts with hung threads - posted by Aïcha <ai...@yahoo.com> on 2006/10/30 18:16:26 UTC, 0 replies.
- Get messy code while fecthing ftp sites - posted by fa...@gzedu.gov.cn on 2006/10/31 03:51:17 UTC, 1 replies.
- Re: Get messy code while fecthing ftp sites - posted by kauu <ba...@gmail.com> on 2006/10/31 04:08:10 UTC, 0 replies.
- Nutch as static exporter? - posted by Thorsten Scherler <th...@apache.org> on 2006/10/31 16:36:02 UTC, 2 replies.
- large number of urls from Generator are not fetched? - posted by AJ Chen <ca...@gmail.com> on 2006/10/31 19:41:37 UTC, 4 replies.