- Re: Crawling + Indexing staging vs. production and URL conflict - posted by Tomi N/A <he...@gmail.com> on 2007/04/01 16:38:04 UTC, 1 replies.
- Re: Help on Activation of Subcollection at Indexing & searching - posted by prashant_nutch <pr...@in.v2solutions.com> on 2007/04/02 09:47:09 UTC, 1 replies.
- How to delete already stored indexed fields??? - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/04/02 09:47:56 UTC, 10 replies.
- Re: Wildly different crawl results depending on environment... - posted by Enis Soztutar <en...@gmail.com> on 2007/04/02 11:06:47 UTC, 1 replies.
- Can we store field as subcollection name??? - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/04/02 12:20:38 UTC, 0 replies.
- How to prevent a page from being index during crawl or after crawl?? - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/04/02 13:34:57 UTC, 0 replies.
- Running nutch with SOCKS proxy - posted by Vinh Khuc Ngoc <kn...@gmail.com> on 2007/04/02 14:09:21 UTC, 0 replies.
- Fetcher2 too many spinWaiting, How to tune? - posted by qi wu <ch...@gmail.com> on 2007/04/02 18:15:20 UTC, 3 replies.
- problem with date fetched pages? - posted by cesar voulgaris <ce...@gmail.com> on 2007/04/03 05:14:15 UTC, 0 replies.
- how to get rid of some of the fields that are indexed by default eg. content,title,url etc. - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/04/03 15:08:16 UTC, 0 replies.
- Configuration frustrations - posted by Trond Andersen <tr...@gmail.com> on 2007/04/03 16:15:19 UTC, 0 replies.
- Index updates between machines - posted by Chun Wei Ho <cw...@gmail.com> on 2007/04/03 16:39:57 UTC, 4 replies.
- Using nutch as a web crawler - posted by Meryl Silverburgh <si...@gmail.com> on 2007/04/04 04:42:10 UTC, 5 replies.
- Re: Nutch and GET - posted by Damian Florczyk <th...@gentoo.org> on 2007/04/04 10:57:23 UTC, 0 replies.
- Re: Unable to load native-hadoop library - posted by Andrzej Bialecki <ab...@getopt.org> on 2007/04/04 12:05:41 UTC, 3 replies.
- Query on regular expression - posted by ravi_network <ra...@gmail.com> on 2007/04/04 13:04:48 UTC, 2 replies.
- ERROR org.apache.nutch.protocol.http.Http:?java.net.SocketTimeoutException: Read timed out - posted by cha <ch...@metrixline.com> on 2007/04/04 13:06:39 UTC, 3 replies.
- WARN mapred.LocalJobRunner - job_fajjx6 - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/04/04 13:53:37 UTC, 1 replies.
- Nutch - incorrect JavaScript url - posted by Stjepan Marjanovic <m_...@yahoo.com> on 2007/04/04 16:06:52 UTC, 0 replies.
- Nutch Step by Step Maybe someone will find this useful ? - posted by zzcgiacomini <zz...@echo.fr> on 2007/04/04 16:53:54 UTC, 3 replies.
- Exception in thread "main" java.io.IOException: Job failed! - posted by jim shirreffs <jp...@flash.net> on 2007/04/04 18:26:51 UTC, 0 replies.
- crawl-delay and nutch - posted by karthik085 <ka...@gmail.com> on 2007/04/04 23:14:03 UTC, 0 replies.
- Re: [Nutch-general] Nutch Step by Step Maybe someone will find this useful ? - posted by og...@yahoo.com on 2007/04/05 07:04:43 UTC, 0 replies.
- Removing pages from index immediately - posted by og...@yahoo.com on 2007/04/05 08:47:06 UTC, 1 replies.
- help needed on filters - posted by cha <ch...@metrixline.com> on 2007/04/05 09:33:55 UTC, 2 replies.
- Re: [Nutch-general] Removing pages from index immediately - posted by og...@yahoo.com on 2007/04/05 10:09:54 UTC, 6 replies.
- Run Job Crashing - posted by jim shirreffs <jp...@verizon.net> on 2007/04/05 18:51:11 UTC, 1 replies.
- Help please trying to crawl local file system - posted by jim shirreffs <jp...@verizon.net> on 2007/04/05 22:06:22 UTC, 2 replies.
- Nutch 0.9 officially released! - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2007/04/06 04:46:41 UTC, 0 replies.
- Nutch changes 0.9.txt - posted by Paul Liddelow <pa...@gmail.com> on 2007/04/06 08:45:07 UTC, 2 replies.
- Re: how can I handle the files under /tmp? - posted by zh...@live.com on 2007/04/06 11:46:13 UTC, 3 replies.
- web app 0.8 and 0.9 index - posted by djames <dj...@supinfo.com> on 2007/04/06 16:20:05 UTC, 0 replies.
- Trying to setup Nutch - posted by Meryl Silverburgh <si...@gmail.com> on 2007/04/06 21:08:15 UTC, 11 replies.
- NullPointerException during Fetch - posted by Meryl Silverburgh <si...@gmail.com> on 2007/04/07 04:23:46 UTC, 5 replies.
- Incremental indexing and link exploration, /tmp full, nutch design - posted by class acts <cl...@gmail.com> on 2007/04/08 10:43:15 UTC, 1 replies.
- Combining standard Lucene and Nutch - posted by Michael Böckling <Mi...@dmc.de> on 2007/04/10 18:11:31 UTC, 6 replies.
- Probably simple, but... - posted by Brian Hill <hi...@yosemite.cc.ca.us> on 2007/04/10 19:06:46 UTC, 0 replies.
- Garbled cache.jsp - posted by 阿部 公俊 <ro...@hotmail.co.jp> on 2007/04/11 09:32:48 UTC, 0 replies.
- How to recude the tmp disk space usage during linkdb process? - posted by qi wu <ch...@gmail.com> on 2007/04/11 15:01:30 UTC, 5 replies.
- Snippet size - posted by derevo <da...@inbox.ru> on 2007/04/11 21:35:09 UTC, 1 replies.
- ParseException while crawling - posted by Sridhar Teegala <ts...@yahoo.com> on 2007/04/11 22:48:43 UTC, 1 replies.
- Running Nutch on Windows - posted by Sridhar Teegala <ts...@yahoo.com> on 2007/04/11 22:56:07 UTC, 1 replies.
- How to config nutch just crawl html links? - posted by Meryl Silverburgh <si...@gmail.com> on 2007/04/12 03:48:17 UTC, 4 replies.
- How to crawl useful information - posted by James liu <li...@gmail.com> on 2007/04/12 04:19:34 UTC, 0 replies.
- How to dump all the valid links which has been crawled? - posted by Meryl Silverburgh <si...@gmail.com> on 2007/04/12 05:53:18 UTC, 4 replies.
- nutch-09 start problem - posted by Nuther <nu...@proservice.ge> on 2007/04/12 08:56:49 UTC, 4 replies.
- crawl problem with nutch 0.9 - posted by Tomi N/A <he...@gmail.com> on 2007/04/12 09:33:29 UTC, 1 replies.
- Re: Have anybody thought of replacing CrawlDb with any kind of Rational DB? - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/04/12 13:27:28 UTC, 1 replies.
- Forcing update of some URLs - posted by Arie Karhendana <ar...@gamaisitb.org> on 2007/04/12 17:12:22 UTC, 1 replies.
- extracting the result score - posted by Tomi N/A <he...@gmail.com> on 2007/04/12 17:38:57 UTC, 0 replies.
- Pointing UI to custom dir location in .9 - posted by Brian Hill <hi...@yosemite.cc.ca.us> on 2007/04/12 20:33:04 UTC, 0 replies.
- how to use craw-urlfilter.txt - posted by Meryl Silverburgh <si...@gmail.com> on 2007/04/13 06:32:53 UTC, 0 replies.
- Crawling only Links - posted by Matze <ma...@dermatzeimnetz.de> on 2007/04/13 14:26:17 UTC, 0 replies.
- How to add ney segment to index - posted by derevo <da...@inbox.ru> on 2007/04/13 15:43:12 UTC, 0 replies.
- Using Flash, Nutch and OpenSearch - posted by Bud Witney <wi...@osu.edu> on 2007/04/13 21:11:42 UTC, 0 replies.
- Question on searcher.dir in nutch-site.xml - posted by Guanyu Chu <me...@gmail.com> on 2007/04/13 23:50:33 UTC, 2 replies.
- incremental crawling - posted by c wanek <sp...@gmail.com> on 2007/04/14 00:28:27 UTC, 6 replies.
- Plugins Question (fields vs. raw-fields) - posted by nealw <ne...@e-travelmedia.com> on 2007/04/14 03:30:15 UTC, 0 replies.
- Long URL's in results - posted by Paul Liddelow <pa...@gmail.com> on 2007/04/14 10:01:30 UTC, 5 replies.
- nutch books - posted by "Insurance Squared Inc." <gc...@insurancesquared.com> on 2007/04/14 22:44:52 UTC, 0 replies.
- Great Article about Indexers - posted by nealw <ne...@e-travelmedia.com> on 2007/04/15 02:08:52 UTC, 0 replies.
- Index compression - posted by Paul Liddelow <pa...@gmail.com> on 2007/04/15 09:28:00 UTC, 1 replies.
- Crawl www.yahoo.com with nutch - posted by Meryl Silverburgh <si...@gmail.com> on 2007/04/16 05:32:38 UTC, 7 replies.
- Nutch Admin GUI - posted by djames <dj...@supinfo.com> on 2007/04/16 15:06:25 UTC, 0 replies.
- import HTML/XML content files into nutch with properties - posted by David Xiao <da...@gmail.com> on 2007/04/16 17:40:02 UTC, 0 replies.
- regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default - posted by Meryl Silverburgh <si...@gmail.com> on 2007/04/17 06:08:04 UTC, 0 replies.
- Nutch Crawl Question - posted by Ab...@aol.com on 2007/04/17 21:56:31 UTC, 6 replies.
- Re: Fetching outside the domain ? - posted by Tomi N/A <he...@gmail.com> on 2007/04/18 12:40:02 UTC, 5 replies.
- admin db -create doesn't working for m - posted by David Xiao <da...@gmail.com> on 2007/04/18 14:53:04 UTC, 0 replies.
- Language Identification - posted by Honorez Dylan <Dy...@cronos.be> on 2007/04/18 17:30:51 UTC, 0 replies.
- Source of Outlink and how to get Outlinks in 0.9 - posted by Briggs <ac...@gmail.com> on 2007/04/18 23:05:28 UTC, 1 replies.
- Classpath and plugins question - posted by Antony Bowesman <ad...@teamware.com> on 2007/04/19 05:59:14 UTC, 4 replies.
- nutch-0.9.release: Odd Fetcher behaviour - posted by Nuther <nu...@proservice.ge> on 2007/04/19 08:29:01 UTC, 1 replies.
- Nutch admin GUI for 0.9 - posted by Nuther <nu...@proservice.ge> on 2007/04/19 10:08:44 UTC, 0 replies.
- java.net.SocketTimeoutException:connect timed out - posted by cha <ch...@metrixline.com> on 2007/04/19 13:30:30 UTC, 1 replies.
- Cannot crawl from Server - posted by cha <ch...@metrixline.com> on 2007/04/19 13:36:19 UTC, 1 replies.
- having problems with search reading word docs and pdf's in 0.8.1 - posted by Stephen Wilkinson <St...@northdevon.gov.uk> on 2007/04/19 15:58:55 UTC, 0 replies.
- Nutch 0.9 - Generator: 0 records selected for fetching, exiting - posted by Ab...@aol.com on 2007/04/19 20:47:41 UTC, 0 replies.
- Nutch and Crawl Frequency - posted by Briggs <ac...@gmail.com> on 2007/04/19 21:02:24 UTC, 3 replies.
- Office 2007 + XML parser - posted by Antony Bowesman <ad...@teamware.com> on 2007/04/20 04:08:27 UTC, 2 replies.
- Can anybody tell me how the Nutch-0.9 is different than nutch-0.8.1 - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/04/20 08:09:50 UTC, 1 replies.
- Re: having problems with search reading word docs and pdf's in 0.8.1 - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/04/20 08:25:34 UTC, 0 replies.
- Plugin to index categories by url rules - posted by derevo <da...@inbox.ru> on 2007/04/21 01:16:59 UTC, 4 replies.
- Hardware Crashes and Garbage Collection on Nutch/Hadoop - posted by Dennis Kubes <nu...@dragonflymc.com> on 2007/04/21 02:50:00 UTC, 3 replies.
- Re: Any way for removing pages with same title in index? - posted by Chee Wu <ch...@gmail.com> on 2007/04/22 12:12:59 UTC, 0 replies.
- 0.9 ClassCastException: org.apache.hadoop.io.Text - posted by Lauren Massa Lochridge <la...@ieee.org> on 2007/04/23 00:58:27 UTC, 2 replies.
- Can any body explain me the new features of nutch-0.9 - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/04/23 07:49:23 UTC, 1 replies.
- Why Nutch returns 0 results? - posted by openxu <op...@gmail.com> on 2007/04/23 08:06:03 UTC, 4 replies.
- Optional terms - posted by Trond Andersen <tr...@gmail.com> on 2007/04/23 15:40:48 UTC, 0 replies.
- strange URL filter behavior - posted by Ben Szekely <bs...@gmail.com> on 2007/04/23 18:04:08 UTC, 0 replies.
- updating crawls with Nutch 0.9 - posted by Michael McDougall <mc...@grammatech.com> on 2007/04/23 23:40:06 UTC, 0 replies.
- Re: Compile Nutch - posted by franklinb4u <sm...@yahoo.com> on 2007/04/24 08:00:19 UTC, 0 replies.
- ExcelExtractor performance - posted by Antony Bowesman <ad...@teamware.com> on 2007/04/24 11:22:08 UTC, 0 replies.
- Query pdf, etc.. - posted by ekoje ekoje <jo...@gmail.com> on 2007/04/24 15:01:25 UTC, 3 replies.
- Index - posted by ekoje ekoje <jo...@gmail.com> on 2007/04/24 15:06:27 UTC, 3 replies.
- Nutch 0.9 recrawl - posted by Annona Keene <an...@yahoo.com> on 2007/04/24 23:57:59 UTC, 1 replies.
- Using nutch just for the crawler/fetcher - posted by John Kleven <jo...@gmail.com> on 2007/04/25 06:57:51 UTC, 4 replies.
- search in more than one index. - posted by Abdelhakim Diab <ab...@gmail.com> on 2007/04/25 11:51:11 UTC, 2 replies.
- nutch-site.xml score - posted by karthik085 <ka...@gmail.com> on 2007/04/25 19:55:32 UTC, 0 replies.
- nutch-0.9 plugins - posted by karthik085 <ka...@gmail.com> on 2007/04/25 20:43:50 UTC, 0 replies.
- Can I make a custom web searcher with Nutch? - posted by Marcin Okraszewski <ok...@gmail.com> on 2007/04/25 22:41:05 UTC, 1 replies.
- Outlinks during parsing - posted by Antony Bowesman <ad...@teamware.com> on 2007/04/26 01:03:41 UTC, 0 replies.
- nutch search results problem - posted by karthik085 <ka...@gmail.com> on 2007/04/26 03:01:16 UTC, 0 replies.
- nutch freegen bug? - posted by Nuther <nu...@proservice.ge> on 2007/04/26 08:20:18 UTC, 0 replies.
- Adding documents to already created distributed index - posted by Ilya Vishnevsky <Il...@e-legion.com> on 2007/04/26 14:03:32 UTC, 0 replies.
- How to reIndex after reCrawl? - posted by Ilya Vishnevsky <Il...@e-legion.com> on 2007/04/26 17:08:58 UTC, 0 replies.
- Case Sensitive - posted by karthik085 <ka...@gmail.com> on 2007/04/27 01:07:22 UTC, 3 replies.
- Problems during Merging Indexes - posted by Nuther <nu...@proservice.ge> on 2007/04/27 09:06:55 UTC, 1 replies.
- Nutch crawl crashing during merge with ArrayIndexOutOfBoundsException - posted by Mike Brzozowski <bi...@gmail.com> on 2007/04/27 19:51:44 UTC, 0 replies.
- Ignore Robots meta tag - posted by karthik085 <ka...@gmail.com> on 2007/04/27 20:47:46 UTC, 1 replies.
- query filter ordering - posted by c wanek <sp...@gmail.com> on 2007/04/28 00:34:10 UTC, 1 replies.
- crystal - posted by TCXO <fy...@worldzgc.com> on 2007/04/29 10:18:17 UTC, 0 replies.
- Question: Crawl web page and parse - posted by James liu <li...@gmail.com> on 2007/04/30 04:15:19 UTC, 0 replies.
- Nutch encoding problem - posted by Zsolt Horváth <zs...@polymeta.com> on 2007/04/30 09:29:30 UTC, 3 replies.
- Iterate through stored pages - posted by Anton Beza <an...@gmail.com> on 2007/04/30 16:07:18 UTC, 1 replies.
- Nutch and running crawls within a container. - posted by Briggs <ac...@gmail.com> on 2007/04/30 16:45:59 UTC, 3 replies.
- Crawling fixed set of urls (newbie question) - posted by Somnath Banerjee <so...@gmail.com> on 2007/04/30 17:12:03 UTC, 0 replies.