- Re: how to crawl Specified type files? - posted by Dennis Kubes <nu...@dragonflymc.com> on 2007/01/01 07:29:07 UTC, 0 replies.
- Re: NUTCH 0.8.1: Difficulties with Analyzers - posted by Dennis Kubes <nu...@dragonflymc.com> on 2007/01/02 01:05:20 UTC, 0 replies.
- Re: Unknown encoding for 'GBK-EUC-H' - posted by Ben Litchfield <be...@benlitchfield.com> on 2007/01/02 02:48:34 UTC, 0 replies.
- fetcher : some doubts - posted by shrinivas patwardhan <sh...@gmail.com> on 2007/01/02 05:48:52 UTC, 11 replies.
- Re: Error on convert to 0.9 during mergesegs step - posted by Alan Tanaman <al...@idna-solutions.com> on 2007/01/02 22:36:37 UTC, 2 replies.
- Duplicate URLs with slightly different URIs.. how to normalize? - posted by Brian Whitman <br...@variogr.am> on 2007/01/02 23:08:06 UTC, 1 replies.
- Intranet crawling maintenance - posted by Daniel López <D....@uib.es> on 2007/01/03 12:58:59 UTC, 0 replies.
- NutchBean searching options - posted by Daniel López <D....@uib.es> on 2007/01/03 13:06:25 UTC, 1 replies.
- nutch81 pages seems were not kept but no error message found - posted by Chee Wu <ch...@gmail.com> on 2007/01/03 13:33:08 UTC, 3 replies.
- Google Search on Nutch? - posted by Justin Hartman <jj...@gmail.com> on 2007/01/03 13:39:54 UTC, 9 replies.
- re-parse hang? - posted by Brian Whitman <br...@variogr.am> on 2007/01/04 05:09:47 UTC, 6 replies.
- Plugins for features - posted by karthik085 <ka...@gmail.com> on 2007/01/04 06:29:38 UTC, 2 replies.
- nutch 0.9 does not recognize slaves - posted by Shailendra Mudgal <mu...@gmail.com> on 2007/01/04 16:49:19 UTC, 2 replies.
- Issues Starting Hadoop Process in Nutch0.9l.1 - posted by srinath <co...@gmail.com> on 2007/01/04 17:13:52 UTC, 7 replies.
- Reparsing fetched content - posted by Mathijs Homminga <ma...@knowlogy.nl> on 2007/01/05 11:42:29 UTC, 1 replies.
- Can't start datanode on slaves (hadoop 0.9.1, nutch nightly build) - posted by Vishal Shah <vi...@rediff.co.in> on 2007/01/05 15:01:41 UTC, 0 replies.
- Reading Inlinks - posted by Ashish <as...@smarteinc.com> on 2007/01/05 19:22:20 UTC, 1 replies.
- Re: Nutch and OSCache - posted by Sean Dean <se...@rogers.com> on 2007/01/06 10:42:42 UTC, 0 replies.
- Wikipedia founder lauching Wiki-inspired search engine; Nutch as potential framework - posted by Renaud Richardet <re...@oslutions.com> on 2007/01/06 15:32:38 UTC, 0 replies.
- Nutch .81: the process to add a new analyzer ? - posted by Chee Wu <ch...@gmail.com> on 2007/01/07 10:12:28 UTC, 6 replies.
- List owner? - posted by James Phillips <ja...@keypot.com> on 2007/01/07 10:55:56 UTC, 2 replies.
- Re: Nutch Programmer Wanted - posted by e w <ep...@gmail.com> on 2007/01/07 16:50:31 UTC, 1 replies.
- Error after SVN update - posted by Nutch Newbie <nu...@gmail.com> on 2007/01/08 12:12:38 UTC, 4 replies.
- Sending cookies in Nutch - posted by Annona Keene <an...@yahoo.com> on 2007/01/08 18:33:34 UTC, 0 replies.
- Using Nutch for special content pages - posted by Tor Harald Thorland <li...@strigen.com> on 2007/01/09 10:17:22 UTC, 4 replies.
- Filtering URLs in CrawlDB - posted by Dennis Kubes <nu...@dragonflymc.com> on 2007/01/09 17:30:57 UTC, 2 replies.
- LocalFileSystem , LinkDbReader and workingDir - posted by Paul Dhaliwal <su...@gmail.com> on 2007/01/09 18:21:19 UTC, 2 replies.
- Running Nutch in Eclipse - posted by Jonathan Hunter <Jo...@oberlin.edu> on 2007/01/10 07:24:31 UTC, 5 replies.
- fetcher fails with NullPointerException - posted by Phạm Hải Thanh <ph...@vasc.com.vn> on 2007/01/10 08:54:01 UTC, 1 replies.
- which port nutch uses ??? - posted by Phạm Hải Thanh <ph...@vasc.com.vn> on 2007/01/10 09:39:28 UTC, 1 replies.
- How to index and return files names ? - posted by Arnaud Goupil <go...@yahoo.fr> on 2007/01/10 11:04:02 UTC, 4 replies.
- fetch list - posted by Carlos González-Cadenas <ca...@gonzalez.name> on 2007/01/10 11:53:02 UTC, 3 replies.
- Starting nutch fails - posted by Tor Harald Thorland <li...@strigen.com> on 2007/01/10 14:22:18 UTC, 2 replies.
- sort result on different set of terms - posted by DS jha <ae...@gmail.com> on 2007/01/10 14:46:32 UTC, 0 replies.
- How to retrieve and store the date infromation of a page - posted by chee wu <ch...@gmail.com> on 2007/01/10 15:13:51 UTC, 0 replies.
- nutch-0.9 trunk is failing in Indexer - posted by Lukas Vlcek <lu...@gmail.com> on 2007/01/10 17:29:30 UTC, 7 replies.
- Job Opportunity (Sunnyvale, CA) - posted by "J. Delgado" <jo...@gmail.com> on 2007/01/10 17:56:27 UTC, 0 replies.
- parse crash: PluginManifestParser - posted by Brian Whitman <br...@variogr.am> on 2007/01/10 21:24:28 UTC, 0 replies.
- Build Failure with 0.8.1 - posted by Steve Kallestad <ka...@gmail.com> on 2007/01/11 01:08:51 UTC, 0 replies.
- RE : RE: How to index and return files names ? - posted by Arnaud Goupil <go...@yahoo.fr> on 2007/01/11 08:34:17 UTC, 0 replies.
- Nutch zone (was Re: Google Search on Nutch?) - posted by Thorsten Scherler <th...@juntadeandalucia.es> on 2007/01/11 09:19:32 UTC, 1 replies.
- (null) when indexing - posted by Phạm Hải Thanh <ph...@vasc.com.vn> on 2007/01/11 10:47:36 UTC, 0 replies.
- nutch in eclipse, No input directories specified - posted by Tim Benke <ze...@fusemail.com> on 2007/01/11 15:16:20 UTC, 4 replies.
- DFS with nutch- 0.72 - posted by Shrinivas Patwardhan <sh...@krawlernetworks.com> on 2007/01/12 06:22:55 UTC, 1 replies.
- problems to exclude subdirectories in a web site - posted by yl...@ifrance.com, yl...@ifrance.com on 2007/01/12 15:16:15 UTC, 2 replies.
- BUG with error: failure closing block of file with Hadoop 0.9.2 and Nutch 0.8.1 - posted by yl...@ifrance.com, yl...@ifrance.com on 2007/01/12 15:26:55 UTC, 1 replies.
- Nutch Crawler (.81) picking up strange links - posted by Steve Kallestad <ka...@gmail.com> on 2007/01/12 21:20:33 UTC, 1 replies.
- Nutch support for frames - posted by karthik085 <ka...@gmail.com> on 2007/01/12 22:03:36 UTC, 0 replies.
- alternative for dmoz rdf ? - posted by Shrinivas Patwardhan <sh...@krawlernetworks.com> on 2007/01/13 07:30:47 UTC, 8 replies.
- nutch server - posted by Shrinivas Patwardhan <sh...@krawlernetworks.com> on 2007/01/13 10:54:35 UTC, 1 replies.
- Redirect source remains unfetched - posted by Mathijs Homminga <ma...@knowlogy.nl> on 2007/01/13 14:34:41 UTC, 3 replies.
- Crawling but no indexing.. - posted by chee wu <ch...@gmail.com> on 2007/01/13 17:21:49 UTC, 0 replies.
- crawling url list - posted by visava <vi...@hotmail.com> on 2007/01/14 05:49:00 UTC, 7 replies.
- Where have all the flowers gone... err... the logs :) - posted by Gal Nitzan <gn...@usa.net> on 2007/01/15 09:58:07 UTC, 1 replies.
- Problem finding out the number of crawled pages per domain - posted by te...@gmail.com on 2007/01/15 14:38:11 UTC, 2 replies.
- Problems stressing "./bin/nutch server" command - posted by Alvaro Cabrerizo <to...@gmail.com> on 2007/01/15 18:24:42 UTC, 0 replies.
- checksum error in segment merger - posted by Brian Whitman <br...@variogr.am> on 2007/01/15 18:30:27 UTC, 7 replies.
- not indexing - posted by bb...@mail.ru on 2007/01/15 18:36:03 UTC, 2 replies.
- nutch-0.8 bundle for eclipse - posted by Renaud Richardet <re...@oslutions.com> on 2007/01/16 02:12:37 UTC, 2 replies.
- Issue While Creating Inverted Links - posted by srinath <co...@gmail.com> on 2007/01/16 07:18:58 UTC, 1 replies.
- Searcher doesn't find what expected - posted by Libor Štefek <li...@logis.cz> on 2007/01/16 07:25:46 UTC, 3 replies.
- DB_unfetched status - posted by cesar voulgaris <ce...@gmail.com> on 2007/01/17 05:57:21 UTC, 3 replies.
- NameNode throws FileNotFoundException: Parent path does not exist on startup - posted by Shailendra Mudgal <mu...@gmail.com> on 2007/01/17 09:26:03 UTC, 4 replies.
- search or Tomcat ill response - posted by yo_keller <yo...@kepler.fr> on 2007/01/17 09:44:10 UTC, 2 replies.
- How to recover data from filesystem - posted by Shailendra Mudgal <mu...@gmail.com> on 2007/01/17 11:28:51 UTC, 1 replies.
- out of memory error at end of indexing - posted by Brian Whitman <br...@variogr.am> on 2007/01/17 17:57:45 UTC, 1 replies.
- How to stop a slow fetch? - posted by Shailendra Mudgal <mu...@gmail.com> on 2007/01/18 06:26:27 UTC, 4 replies.
- Nutch 0.8 cannot find all the links on a page - posted by te...@gmail.com on 2007/01/18 09:30:46 UTC, 2 replies.
- Reduce segment size - posted by Ledio Ago <la...@looksmart.net> on 2007/01/19 02:57:15 UTC, 9 replies.
- notch 0.9 + hadoop 0.10.1 problem - posted by Gal Nitzan <gn...@usa.net> on 2007/01/19 10:44:40 UTC, 1 replies.
- java.lang.OutOfMemoryError - trunk - posted by Gal Nitzan <gn...@usa.net> on 2007/01/19 16:57:01 UTC, 4 replies.
- how to use PorterStemFilter with NutchDocumentAnalyzer - posted by DS jha <ae...@gmail.com> on 2007/01/19 18:14:30 UTC, 3 replies.
- Input directory urls/url-fr.txt in localhost:9000 is invalid with Hadoop 0.4.0patched and Nutch 0.8.1 - posted by yl...@ifrance.com, yl...@ifrance.com on 2007/01/19 19:05:52 UTC, 1 replies.
- Does nutch segments from hadoop .7.1 different from hadoop .10.1 - posted by Gal Nitzan <gn...@usa.net> on 2007/01/19 22:28:45 UTC, 0 replies.
- Unique out of memory exception while fetching.. - posted by Bharat Beedu <bb...@stanfordalumni.org> on 2007/01/20 09:58:21 UTC, 0 replies.
- Limiting the total number of urls to crawl on a single website - posted by Vlador <te...@gmail.com> on 2007/01/21 18:10:47 UTC, 0 replies.
- Indexing only some filetypes with Nutch - posted by Tobias Zahn <To...@arcor.de> on 2007/01/21 18:50:00 UTC, 5 replies.
- Compiling PruneIndexTool trouble - posted by Jonathan Hunter <Jo...@oberlin.edu> on 2007/01/22 06:56:58 UTC, 3 replies.
- "Or" searches in nutch - posted by Nicolás Lichtmaier <ni...@reloco.com.ar> on 2007/01/22 21:51:55 UTC, 0 replies.
- Can I generate nutch index without crawling? - posted by Scott Green <sm...@gmail.com> on 2007/01/23 18:08:31 UTC, 4 replies.
- Boolean searches, again - posted by Nicolás Lichtmaier <ni...@reloco.com.ar> on 2007/01/23 20:08:03 UTC, 2 replies.
- cannot search by url (url:) with Nutch 0.8 - posted by Renaud Richardet <re...@oslutions.com> on 2007/01/24 01:34:42 UTC, 0 replies.
- nutch scrawls only relative links - posted by Denis Pimenov <ki...@bitel.ru> on 2007/01/24 16:16:53 UTC, 2 replies.
- exact matches and stemming - posted by Aïcha <ai...@yahoo.com> on 2007/01/24 18:13:11 UTC, 1 replies.
- Merging large sets of segments, help. - posted by Briggs <ac...@gmail.com> on 2007/01/24 18:48:34 UTC, 5 replies.
- Problem crawling/fetching using https - posted by Michael Wechner <mi...@wyona.com> on 2007/01/24 23:29:25 UTC, 8 replies.
- Partial Success installing Nutch 0.8.1 under Debian Etch: Procedure and Question(s) - posted by "Steve W." <mi...@gmail.com> on 2007/01/25 00:17:55 UTC, 0 replies.
- Multiple collections - posted by Nathan Ter Bogt <na...@agileware.net> on 2007/01/25 02:02:35 UTC, 1 replies.
- Crawling JSPs - posted by Deepa Devanathan <ti...@gmail.com> on 2007/01/25 12:50:08 UTC, 3 replies.
- http://jakarta.apache.org/taglibs/i18n cannot be resolved - posted by "Boemio, Neil (FGIC)" <Ne...@fgic.com> on 2007/01/26 04:58:51 UTC, 3 replies.
- Linking url metadata to nutch search results - posted by ma...@jcademy.com on 2007/01/26 14:57:32 UTC, 1 replies.
- Problems Searching an Index with Nutch - posted by Erik Höschler <er...@l0bster.de> on 2007/01/26 16:04:17 UTC, 6 replies.
- IndexMerger and non-nutch Lucene indexes - posted by Brian Whitman <br...@variogr.am> on 2007/01/26 17:21:25 UTC, 0 replies.
- How to limit nutch to fetch, refetch and index just the injected URLs? - posted by Nicolás Lichtmaier <ni...@reloco.com.ar> on 2007/01/26 23:00:27 UTC, 1 replies.
- Need help with form based authentication - posted by sandeep pujar <sa...@yahoo.com> on 2007/01/26 23:26:21 UTC, 0 replies.
- Re: Need help with form based authentication - posted by Andrzej Bialecki <ab...@getopt.org> on 2007/01/26 23:56:42 UTC, 1 replies.
- Trunk version and NUTCH-251(Administration gui) - posted by karthik085 <ka...@gmail.com> on 2007/01/27 01:51:28 UTC, 0 replies.
- Problem witch PDF title - posted by MS <sc...@gmail.com> on 2007/01/27 16:52:20 UTC, 0 replies.
- Nutch content with Lucene search - posted by Gilbert Groenendijk <gi...@gmail.com> on 2007/01/27 19:34:16 UTC, 5 replies.
- How to remove base url from index entries - posted by Mark_Fletcher <ma...@workday.com> on 2007/01/27 20:19:10 UTC, 0 replies.
- Fetcher threads & automation - posted by Justin Hartman <jj...@gmail.com> on 2007/01/28 10:17:49 UTC, 12 replies.
- Lease expired exception - posted by djames <dj...@supinfo.com> on 2007/01/28 12:04:33 UTC, 3 replies.
- Error while accessing Nutch from browser/tomcat, command-line works fine - posted by Jayant Kumar Gandhi <ja...@gmail.com> on 2007/01/28 13:10:01 UTC, 1 replies.
- Analyzer for searching directly with Lucene - posted by Markus <sp...@yahoo.de> on 2007/01/28 16:07:54 UTC, 0 replies.
- Re : exact matches and stemming - posted by Aïcha <ai...@yahoo.com> on 2007/01/29 09:15:12 UTC, 0 replies.
- Announcing 6S and user study - posted by Le-Shin Wu <le...@cs.indiana.edu> on 2007/01/29 16:42:53 UTC, 0 replies.
- luke cannot open searchable index - posted by Sunnyvale Fl <su...@gmail.com> on 2007/01/29 20:54:18 UTC, 2 replies.
- Vertical Search Means - posted by Reddeppa Naidu <pa...@gmail.com> on 2007/01/30 07:54:43 UTC, 4 replies.
- New to Nutch, a few questions - posted by Nes Yarug <ne...@gmail.com> on 2007/01/30 14:37:47 UTC, 7 replies.
- nutch and form based authentication - posted by Magnus Grimsell <ma...@idainfront.se> on 2007/01/30 17:46:37 UTC, 0 replies.
- Do Nutch crawler/fetcher take cookies - posted by sandeep pujar <sa...@yahoo.com> on 2007/01/30 21:39:51 UTC, 0 replies.
- httpresponse + xml = not reading all bytes - posted by sdeck <sc...@gmail.com> on 2007/01/31 05:09:27 UTC, 0 replies.
- Dedup index error - posted by Hetal Shah <he...@investorsprovident.com> on 2007/01/31 12:27:59 UTC, 2 replies.
- ClassNotFoundException on Hadoop Trunk - posted by Dennis Kubes <nu...@dragonflymc.com> on 2007/01/31 17:31:20 UTC, 0 replies.
- List Domains and adding Boost Values for Custom Fields - posted by Briggs <ac...@gmail.com> on 2007/01/31 19:11:05 UTC, 0 replies.
- Plugin ClassLoader issues... - posted by Briggs <ac...@gmail.com> on 2007/01/31 22:34:59 UTC, 1 replies.