- Solrdedup fails due to date format - posted by al...@aim.com on 2012/02/01 06:26:28 UTC, 3 replies.
- is it necessary to merge DBs before solrindex? - posted by remi tassing <ta...@gmail.com> on 2012/02/01 07:23:50 UTC, 0 replies.
- Re: why nutch dosen't crawl Arabic sites well? - posted by mina <ta...@gmail.com> on 2012/02/01 08:44:25 UTC, 1 reply.
- Re: Focused crawling with nutch - posted by Vijith <vi...@gmail.com> on 2012/02/01 11:53:05 UTC, 11 replies.
- Re: why nutch dosen't crawl all links - posted by mina <ta...@gmail.com> on 2012/02/01 14:09:53 UTC, 0 replies.
- Bad Request in nutch when i use parsechecker? - posted by mina <ta...@gmail.com> on 2012/02/01 14:12:40 UTC, 5 replies.
- Error with solrindex - posted by Joshua J Pavel <jp...@us.ibm.com> on 2012/02/01 15:00:55 UTC, 3 replies.
- Re: invalid uri with "three dots" - posted by remi tassing <ta...@gmail.com> on 2012/02/01 19:18:51 UTC, 0 replies.
- org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: - posted by kaveh minooie <ka...@plutoz.com> on 2012/02/02 01:32:23 UTC, 0 replies.
- how can i use patch-with-utf8-encoding.diff in https://issues.apache.org/jira/browse/NUTCH-1098? - posted by mina <ta...@gmail.com> on 2012/02/02 12:49:21 UTC, 1 reply.
- Nutch 2.0 Webapp - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/02 15:06:52 UTC, 0 replies.
- Failed fetching - posted by Dean Pullen <de...@semantico.com> on 2012/02/02 17:44:11 UTC, 22 replies.
- index-blacklist-whitelist pluign for multiple set of urls - posted by abhayd <aj...@hotmail.com> on 2012/02/03 01:10:43 UTC, 2 replies.
- How parse *only* specific URLs under a domain... -depth 1 -topN 1 does not work as desired - posted by Matt Poff <ma...@headfirst.co.nz> on 2012/02/03 01:37:02 UTC, 3 replies.
- Crawling Local Files within Cygwin - posted by costas0811 <co...@hotmail.com> on 2012/02/03 05:06:59 UTC, 6 replies.
- Nutch unfetched urls count - posted by nutchsolruser <nu...@gmail.com> on 2012/02/03 09:37:08 UTC, 1 reply.
- Is it still possible to create a pure lucene index? - posted by Marek Bachmann <m....@uni-kassel.de> on 2012/02/03 19:55:33 UTC, 3 replies.
- One WebPage to many NutchDocuments - posted by SUJIT PAL <su...@comcast.net> on 2012/02/04 19:59:34 UTC, 3 replies.
- Custom Plugin - Multiple Title values error - posted by Joshua J Pavel <jp...@us.ibm.com> on 2012/02/04 20:49:23 UTC, 0 replies.
- nutch logs when run over hadoop - posted by kaveh minooie <ka...@plutoz.com> on 2012/02/05 00:09:16 UTC, 5 replies.
- Just fetch a specified URL list - posted by Xiao Li <sh...@gmail.com> on 2012/02/06 06:07:08 UTC, 2 replies.
- RSS parser - posted by Michael Kazekin <Mi...@mediainsight.info> on 2012/02/06 14:10:34 UTC, 9 replies.
- Too few parsed pages - posted by Danicela nutch <Da...@mail.com> on 2012/02/06 16:44:25 UTC, 2 replies.
- Re : Re: Too few parsed pages - posted by Danicela nutch <Da...@mail.com> on 2012/02/06 17:03:52 UTC, 1 replies.
- Thread spinWaiting, utilizing bandwidth and connection time out error - posted by kaveh minooie <ka...@plutoz.com> on 2012/02/06 23:22:35 UTC, 0 replies.
- how are CSV/TXT files handled - posted by remi tassing <ta...@gmail.com> on 2012/02/07 08:16:20 UTC, 9 replies.
- Dump into Cassandra using Nutch 1.x - posted by co...@complexityintelligence.com on 2012/02/07 15:12:59 UTC, 7 replies.
- Solandra & Nutch [WAS] Re: Dump into Cassandra using Nutch 1.x - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/08 17:46:58 UTC, 3 replies.
- Re: Java out of memory error - posted by webdev1977 <we...@gmail.com> on 2012/02/08 20:10:22 UTC, 1 reply.
- Seed urls not getting crawled. - posted by Sudip Datta <pi...@gmail.com> on 2012/02/09 08:26:07 UTC, 1 reply.
- WARN regex.RegexURLNormalizer: Can't load the default rules! during Nutch Crawl - posted by Haggai R <ha...@gmail.com> on 2012/02/09 09:16:59 UTC, 1 reply.
- generate.count.mode host vs. domain - posted by kaveh minooie <ka...@plutoz.com> on 2012/02/09 20:57:27 UTC, 1 reply.
- How do "content" and "parseResult" relate? - posted by Joshua J Pavel <jp...@us.ibm.com> on 2012/02/09 23:22:07 UTC, 1 reply.
- Problem in crawling a button (which contains a link) through Nutch - posted by gauravchaudhary <ch...@hcl.com> on 2012/02/10 08:28:33 UTC, 1 reply.
- fetch status in hadoop jobtasks.jsp - posted by kaveh minooie <ka...@plutoz.com> on 2012/02/10 19:58:53 UTC, 1 reply.
- number of map tasks for a fetch job - posted by kaveh minooie <ka...@plutoz.com> on 2012/02/10 20:24:23 UTC, 2 replies.
- Understanding NutchConfigration properly - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/11 23:21:54 UTC, 7 replies.
- Stylesheet in plugin not found when run in distributed mode - posted by webdev1977 <we...@gmail.com> on 2012/02/13 16:48:41 UTC, 6 replies.
- Invalid uri? - posted by kaveh minooie <ka...@plutoz.com> on 2012/02/14 00:57:49 UTC, 3 replies.
- Build a pipeline using nutch - posted by Puneet Pandey <pu...@gmail.com> on 2012/02/14 06:12:43 UTC, 11 replies.
- Are Injector, Generator, Fetcher and Parser Pluggable? - posted by Akash Ashok <th...@gmail.com> on 2012/02/14 11:54:59 UTC, 1 reply.
- fetcher.max.crawl.delay = -1 doesn't work? - posted by Danicela nutch <Da...@mail.com> on 2012/02/14 16:10:09 UTC, 1 reply.
- Filter out unnecessary fields in solrindex - posted by Michael Kazekin <Mi...@mediainsight.info> on 2012/02/14 17:45:30 UTC, 0 replies.
- From Nutch 1.2 to 1.4 - posted by remi tassing <ta...@gmail.com> on 2012/02/14 19:09:20 UTC, 0 replies.
- fetcher.threads.per.queue and fetcher.server.delay - posted by kaveh minooie <ka...@plutoz.com> on 2012/02/14 23:00:57 UTC, 2 replies.
- Re : Re: fetcher.max.crawl.delay = -1 doesn't work? - posted by Danicela nutch <Da...@mail.com> on 2012/02/15 10:08:53 UTC, 1 reply.
- tstamp vs. lastModified ... - posted by remi tassing <ta...@gmail.com> on 2012/02/15 14:26:33 UTC, 13 replies.
- Re : Re: Re : Re: fetcher.max.crawl.delay = -1 doesn't work? - posted by Danicela nutch <Da...@mail.com> on 2012/02/16 10:38:46 UTC, 1 reply.
- Question regarding NutchHadoopTutorial - posted by webdev1977 <we...@gmail.com> on 2012/02/16 18:15:11 UTC, 2 replies.
- Trouble with checking Gora trunk from SVN - posted by apachenutch <po...@gmail.com> on 2012/02/17 05:33:54 UTC, 5 replies.
- Nutch setup on Cassandra error - posted by apachenutch <po...@gmail.com> on 2012/02/17 08:13:30 UTC, 9 replies.
- Failure authenticating with NTLM - posted by Gouri Deshpande <go...@gmail.com> on 2012/02/17 12:27:20 UTC, 1 reply.
- fetch "Aborting with 50 hung threads." - posted by Danicela nutch <Da...@mail.com> on 2012/02/17 14:55:21 UTC, 1 reply.
- concurrency and solrindex - posted by kaveh minooie <ka...@plutoz.com> on 2012/02/18 02:41:21 UTC, 0 replies.
- Tika with nutch - posted by Haya AL-Tuwaijri <ha...@hotmail.com> on 2012/02/18 06:49:43 UTC, 2 replies.
- IOExeption when crawling with nutch in Fetching process - posted by hadi <md...@gmail.com> on 2012/02/18 14:05:45 UTC, 7 replies.
- Some PDF contains is not readable when crawling with nutch - posted by hadi <md...@gmail.com> on 2012/02/18 14:37:22 UTC, 2 replies.
- URLNormalizer not working properly - posted by remi tassing <ta...@gmail.com> on 2012/02/18 20:57:43 UTC, 6 replies.
- index-basic and index-more cause multi-value on non-multi-value title field? - posted by shlomi java <sh...@gmail.com> on 2012/02/19 11:15:36 UTC, 3 replies.
- ParseSegment taking a long time to finish - posted by Magnús Skúlason <ma...@gmail.com> on 2012/02/19 14:53:23 UTC, 2 replies.
- Out of heap memory on 175K links in 'local' mode - posted by Michael Kazekin <Mi...@mediainsight.info> on 2012/02/20 11:10:15 UTC, 1 reply.
- nutch - posted by janwen <to...@163.com> on 2012/02/20 11:53:57 UTC, 1 reply.
- Re : Re : Re: Too few parsed pages - posted by Danicela nutch <Da...@mail.com> on 2012/02/20 17:17:16 UTC, 0 replies.
- Optimising the speed of Nutch. - posted by Bharat Goyal <bh...@shiksha.com> on 2012/02/21 09:19:26 UTC, 5 replies.
- problem with solrindex - posted by ka...@plutoz.com on 2012/02/21 12:56:06 UTC, 0 replies.
- attn:Markus :) multiple_values_encountered_for_non_multiValued_field_title - posted by kaveh minooie <ka...@plutoz.com> on 2012/02/21 21:05:17 UTC, 0 replies.
- Please help - Nutch fetch command not fetching data - posted by apachenutch <po...@gmail.com> on 2012/02/21 21:32:13 UTC, 5 replies.
- [nutchgora] - proposal to support distributed indexing - posted by SUJIT PAL <su...@comcast.net> on 2012/02/22 04:45:35 UTC, 9 replies.
- Re: attn:Markus :) multiple_values_encountered_for_non_multiValued_field_title - posted by Geek Gamer <ge...@gmail.com> on 2012/02/22 07:50:21 UTC, 1 reply.
- Error running Nutch 1.4 crawl on Amazon EMR using the S3 (s3n://) filesystem - posted by Ali S Kureishy <sa...@gmail.com> on 2012/02/22 13:42:36 UTC, 1 reply.
- Using jcifs for NTLM in HttpClient - posted by remi tassing <ta...@gmail.com> on 2012/02/22 14:58:00 UTC, 0 replies.
- Exception in thread "main" java.io.IOException: Job failed! - posted by Daniel Bourrion <da...@univ-angers.fr> on 2012/02/22 16:17:42 UTC, 7 replies.
- Re: http.redirect.max - posted by xuyuanme <xu...@gmail.com> on 2012/02/23 05:08:27 UTC, 8 replies.
- Nutch data to Solr on HTTPS - posted by Christopher Gross <co...@gmail.com> on 2012/02/23 19:26:19 UTC, 6 replies.
- Nutch AND Solr? Nutch performance and features - posted by Spadez <ja...@hotmail.com> on 2012/02/24 15:47:24 UTC, 1 reply.
- Re: Solr Indexing - posted by sc...@gmx.net, sc...@gmx.net on 2012/02/25 15:55:40 UTC, 0 replies.
- Re: how to set Adaptive Fetch Schedule for cwarling? - posted by lazetics <la...@yahoo.com> on 2012/02/25 17:41:53 UTC, 1 reply.
- run nutch1.4 in eclipse - posted by jianwen lou <lo...@gmail.com> on 2012/02/27 09:03:23 UTC, 1 reply.
- crawldb modifications - posted by Charles Thomas <ct...@wisc.edu> on 2012/02/27 20:10:00 UTC, 4 replies.
- Large Shared Drive Crawl - posted by webdev1977 <we...@gmail.com> on 2012/02/27 21:06:08 UTC, 5 replies.
- Query in nutch - posted by Geetha Venu <Ge...@infosys.com> on 2012/02/28 07:53:47 UTC, 0 replies.
- How to crowl AJAX populated pages - posted by Grijesh <pi...@gmail.com> on 2012/02/28 09:56:33 UTC, 5 replies.
- Re: Query in nutch - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/28 18:30:01 UTC, 0 replies.
- too few db_fetched - posted by pepe3059 <pe...@gmail.com> on 2012/02/29 02:33:24 UTC, 3 replies.
- [blog post] Accumulo, Nutch, and GORA - posted by Jason Trost <ja...@gmail.com> on 2012/02/29 02:41:58 UTC, 1 reply.
- Re: [blog post] Accumulo, Nutch, and Gora - posted by Enis Söztutar <en...@apache.org> on 2012/02/29 03:47:25 UTC, 0 replies.
- nutch crawling - posted by sanjay87 <Av...@infosys.com> on 2012/02/29 12:15:02 UTC, 0 replies.