user@nutch.apache.org, 2012-03

- Re: nutch crawling - posted by Elisabeth Adler <el...@gmail.com> on 2012/03/01 10:51:59 UTC, 0 replies.
- Distributed Indexing on MapReduce - posted by Frank Scholten <fr...@frankscholten.nl> on 2012/03/01 11:08:13 UTC, 0 replies.
- Featured link support in Nutch - posted by Stany Fargose <st...@gmail.com> on 2012/03/01 20:59:00 UTC, 3 replies.
- Re: http.redirect.max - posted by al...@aim.com on 2012/03/01 21:09:42 UTC, 1 replies.
- multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred - posted by Jeremy Villalobos <je...@gmail.com> on 2012/03/01 22:26:00 UTC, 5 replies.
- Only fetching initial seedlist - posted by James Ford <si...@gmail.com> on 2012/03/01 23:28:52 UTC, 7 replies.
- different fetch interval for each depth urls - posted by al...@aim.com on 2012/03/02 06:19:34 UTC, 3 replies.
- Webgraph / getmerge - posted by Rafael Pappert <rp...@fwpsystems.com> on 2012/03/02 12:31:16 UTC, 0 replies.
- [Blog Post]: Accumulo and Pig play together now - posted by Jason Trost <ja...@gmail.com> on 2012/03/02 14:48:36 UTC, 1 replies.
- Nutch with Letor - posted by varunpandeyengg <va...@gmail.com> on 2012/03/03 05:36:00 UTC, 7 replies.
- Incompatible format version 2 expected 1 or lower - posted by dafna <ni...@elbitsystems.com> on 2012/03/03 19:30:25 UTC, 2 replies.
- Solrindex job failed ! - posted by Haya AL-Tuwaijri <ha...@hotmail.com> on 2012/03/04 07:18:52 UTC, 2 replies.
- java.net.UnknownHostException during fetching - posted by hadi <md...@gmail.com> on 2012/03/04 13:19:09 UTC, 5 replies.
- nutch craling file system - posted by alessio crisantemi <al...@gmail.com> on 2012/03/04 17:02:17 UTC, 5 replies.
- Java Script Crawling using nutch - posted by Dayal <li...@gmail.com> on 2012/03/06 18:10:41 UTC, 1 replies.
- Optimizing crawling for small number of domains/sites (aka. intranet crawling) - posted by webdev1977 <we...@gmail.com> on 2012/03/06 21:03:03 UTC, 2 replies.
- Multiple parsers - posted by "nutch.buddy@gmail.com" <nu...@gmail.com> on 2012/03/07 14:34:07 UTC, 5 replies.
- Crawling with Certs - posted by Christopher Gross <co...@gmail.com> on 2012/03/07 21:22:56 UTC, 8 replies.
- NutchGora continuous indexing - posted by Daniel Rosher <ro...@gmail.com> on 2012/03/08 11:33:11 UTC, 1 replies.
- Nutch as crawler for text analysis: setup ? version ? - posted by Piet van Remortel <pi...@gmail.com> on 2012/03/09 16:19:03 UTC, 2 replies.
- Re: nutch crawling file system SOLVED - posted by alessio crisantemi <al...@gmail.com> on 2012/03/10 17:42:52 UTC, 12 replies.
- crawling with -1 as fetch.interval causes all pages to be refetched at same running instance - posted by "nutch.buddy@gmail.com" <nu...@gmail.com> on 2012/03/11 19:53:11 UTC, 0 replies.
- Re: Exception in thread "main" java.io.IOException: Job failed! - posted by Pantelis <pk...@hotmail.com> on 2012/03/12 13:32:54 UTC, 1 replies.
- Hostnames changed for lots of URLS in crawldb, solr index, how to change? - posted by webdev1977 <we...@gmail.com> on 2012/03/12 14:32:57 UTC, 1 replies.
- split content field into two fields - posted by HaYa aziz <ha...@hotmail.com> on 2012/03/13 12:13:53 UTC, 1 replies.
- unable to crwal a specefic site- Lithium Based - posted by kingping <ik...@gmail.com> on 2012/03/13 13:47:44 UTC, 3 replies.
- Configuring nutch to run on hadoop - posted by Magnús Skúlason <ma...@gmail.com> on 2012/03/14 11:38:19 UTC, 5 replies.
- Order of loading plugins - posted by Mohammad Tambe <Mo...@persistent.co.in> on 2012/03/14 13:50:27 UTC, 2 replies.
- Blacklisted Tasktracker / AlreadyBeingCreatedException - posted by Rafael Pappert <ra...@pappert.biz> on 2012/03/16 14:46:19 UTC, 5 replies.
- Running CrawlDbReader: _SUCCESS/data does not exist - posted by Sudip Datta <pi...@gmail.com> on 2012/03/16 20:46:30 UTC, 4 replies.
- Fetching/Indexing process is taking a lot of time - posted by George <ad...@proservice.ge> on 2012/03/17 07:40:58 UTC, 8 replies.
- Re: Meta Tags - posted by blunderboy <sa...@gmail.com> on 2012/03/19 12:23:07 UTC, 5 replies.
- NutchHadoopTutorial Updated - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/03/19 16:19:59 UTC, 8 replies.
- crawling sile system - posted by alessio crisantemi <al...@gmail.com> on 2012/03/19 20:41:29 UTC, 1 replies.
- Nutch 1.4 with Hadoop - how does Nutch know where Hadoop is running - posted by Dean Pullen <de...@semantico.com> on 2012/03/20 11:51:41 UTC, 4 replies.
- Job failed while creating SolrIndex - posted by blunderboy <sa...@gmail.com> on 2012/03/20 11:56:20 UTC, 1 replies.
- Re: urls won't get crawled - posted by jepse <jp...@jepse.net> on 2012/03/20 12:41:25 UTC, 4 replies.
- Too much logging - posted by Andy Xue <an...@gmail.com> on 2012/03/21 09:00:00 UTC, 1 replies.
- Nutch on Elastic Map Reduce - posted by Milica Bogicevic <mb...@nsphere.net> on 2012/03/21 16:46:41 UTC, 0 replies.
- Generator taking time - posted by James Ford <si...@gmail.com> on 2012/03/22 11:48:40 UTC, 4 replies.
- crawl and update one url already in crawldb - posted by webdev1977 <we...@gmail.com> on 2012/03/22 13:53:02 UTC, 5 replies.
- Amazon S3 and EC2 - posted by Milica Bogicevic <mb...@nsphere.net> on 2012/03/22 15:15:14 UTC, 0 replies.
- canonical tag support - posted by th...@wellsfargo.com on 2012/03/22 20:32:26 UTC, 3 replies.
- Fwd: http://webdatacommons.org/ - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/03/23 13:31:37 UTC, 0 replies.
- Partially parsed pages - posted by Elisabeth Adler <el...@gmail.com> on 2012/03/23 13:39:38 UTC, 1 replies.
- db_unfetched large number, but crawling not fetching any longer - posted by webdev1977 <we...@gmail.com> on 2012/03/23 14:46:20 UTC, 4 replies.
- Problems in Getting the tutorial running. - posted by Apurv Verma <da...@gmail.com> on 2012/03/24 13:51:58 UTC, 1 replies.
- Older plugin in Nutch 1.4 - posted by Vicente Canhoto <vi...@gmail.com> on 2012/03/24 18:14:50 UTC, 2 replies.
- Out-of-the-box Nutch indexing url source to Solr - posted by JohnRodey <ti...@yahoo.com> on 2012/03/25 18:39:53 UTC, 4 replies.
- Re: [ANNOUNCEMENT] Lewis John Mc Gibbney is a Nutch committer and PMC member - posted by Hangthunder <ji...@gmail.com> on 2012/03/26 06:54:51 UTC, 2 replies.
- Nutch not crawling jabong - posted by blunderboy <sa...@gmail.com> on 2012/03/26 10:32:59 UTC, 2 replies.
- Bottleneck of my crawls: NativeCodeLoader - posted by James Ford <si...@gmail.com> on 2012/03/26 13:35:54 UTC, 1 replies.
- Pages that does not dedup - posted by Jan Riewe <ja...@comspace.de> on 2012/03/26 18:07:06 UTC, 0 replies.
- divide fetch process ? - posted by pepe3059 <pe...@gmail.com> on 2012/03/27 00:19:26 UTC, 1 replies.
- Different number of parsed pages for crawls with same settings - posted by Elisabeth Adler <el...@gmail.com> on 2012/03/27 10:28:29 UTC, 2 replies.
- Re: Crawling blogs, feeds & comments - posted by pragya <pr...@gmail.com> on 2012/03/27 11:35:00 UTC, 1 replies.
- Nutch limiting crawl to 100 documents per directory - posted by shano <Sh...@gmail.com> on 2012/03/27 13:02:58 UTC, 3 replies.
- Relative urls, interpage href anchors - posted by webdev1977 <we...@gmail.com> on 2012/03/27 14:43:19 UTC, 3 replies.
- Re-indexing temporarily unavailable page - posted by dspathis <ds...@gmail.com> on 2012/03/27 21:26:12 UTC, 2 replies.
- How to get Term Frequency Vector - posted by Vijith <vi...@gmail.com> on 2012/03/28 08:10:57 UTC, 6 replies.
- Re: Merging issues! - posted by "nutch.buddy@gmail.com" <nu...@gmail.com> on 2012/03/29 11:04:45 UTC, 0 replies.
- Nutch on Hadoop cluster - posted by ashish vyas <ma...@gmail.com> on 2012/03/30 11:58:10 UTC, 0 replies.