- please help me solve the problem - posted by 周杰 <zh...@126.com> on 2011/08/01 07:01:05 UTC, 1 reply.
- "network timeout" on 404 pages - posted by Christian Weiske <ch...@netresearch.de> on 2011/08/01 08:41:07 UTC, 4 replies.
- Error "Input path does not exist" when crawling - posted by Christian Weiske <ch...@netresearch.de> on 2011/08/01 09:32:27 UTC, 7 replies.
- Re: Change user-agent in runtime - posted by Markus Jelsma <ma...@openindex.io> on 2011/08/01 13:12:38 UTC, 0 replies.
- Re: TF in wide internet crawls - posted by Markus Jelsma <ma...@openindex.io> on 2011/08/01 13:16:21 UTC, 0 replies.
- Re: Client certificate authentication - posted by Benjamin Heilbrunn <be...@gmail.com> on 2011/08/01 13:41:25 UTC, 0 replies.
- topN with maxNumSegments? - posted by Markus Jelsma <ma...@openindex.io> on 2011/08/01 14:59:25 UTC, 2 replies.
- Nutch-1.3 + Solr 3.3.0 = fail - posted by "John R. Brinkema" <br...@teo.uscourts.gov> on 2011/08/01 20:45:56 UTC, 9 replies.
- Re: Fetched pages has no content - posted by webdev1977 <we...@gmail.com> on 2011/08/01 20:49:23 UTC, 4 replies.
- RE: Nutch not indexing full collection - posted by Chip Calhoun <cc...@aip.org> on 2011/08/01 21:26:39 UTC, 3 replies.
- protocol-httpclient - posted by webdev1977 <we...@gmail.com> on 2011/08/01 21:28:27 UTC, 2 replies.
- redirect and cookie - posted by Dinçer Kavraal <dk...@gmail.com> on 2011/08/02 00:17:28 UTC, 1 reply.
- Some Dump Content Truncated/Corrupted - posted by espeed <ja...@jamesthornton.com> on 2011/08/02 22:56:02 UTC, 2 replies.
- how to extract tf-idf - posted by Zhanibek Datbayev <it...@gmail.com> on 2011/08/03 06:28:10 UTC, 2 replies.
- Re: imported to solr - posted by Kiks <ki...@gmail.com> on 2011/08/03 08:31:16 UTC, 4 replies.
- NullPointerException when calling readdb on empty database - posted by Christian Weiske <ch...@netresearch.de> on 2011/08/03 09:34:49 UTC, 1 reply.
- Fetching ever-changing URLs - posted by Christian Weiske <ch...@netresearch.de> on 2011/08/03 10:02:10 UTC, 1 reply.
- New wiki page for Running Nutch 1.3 in Eclipse - posted by lewis john mcgibbney <le...@gmail.com> on 2011/08/03 14:13:20 UTC, 3 replies.
- solrclean doesn't send delete commands to solr (nutch-1.3) - posted by Alexander Malamud <am...@gmail.com> on 2011/08/04 00:09:21 UTC, 1 reply.
- Re: ranking in nutch/solr results - posted by Way Cool <wa...@gmail.com> on 2011/08/04 00:47:52 UTC, 0 replies.
- remove me - posted by Cheng Li <ch...@usc.edu> on 2011/08/04 22:53:03 UTC, 1 reply.
- Need help handeling corrupted files - posted by Marek Bachmann <m....@uni-kassel.de> on 2011/08/05 13:38:49 UTC, 2 replies.
- How to avoid splitting strings when indexing to solr - posted by Marek Bachmann <m....@uni-kassel.de> on 2011/08/05 15:08:14 UTC, 5 replies.
- Issue with erroneous URL - posted by Sammy Yu <sy...@brightedge.com> on 2011/08/06 12:11:16 UTC, 1 reply.
- Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk - posted by lewis john mcgibbney <le...@gmail.com> on 2011/08/06 19:54:24 UTC, 3 replies.
- Subcollection - posted by Simone Frenzel <ps...@googlemail.com> on 2011/08/08 13:34:38 UTC, 1 reply.
- Re: DocuemntFragement and XPath - posted by gonenc <go...@hotmail.com> on 2011/08/09 09:15:43 UTC, 1 reply.
- Some questions regarding nutch in distributed computing environment - posted by jeffersonzhou <je...@gmail.com> on 2011/08/10 10:17:18 UTC, 3 replies.
- Nutch & Hadoop - posted by jeffersonzhou <je...@gmail.com> on 2011/08/10 11:39:36 UTC, 5 replies.
- Crawl Page, Store full HTML content - posted by Christopher Gross <co...@gmail.com> on 2011/08/10 14:12:13 UTC, 3 replies.
- Re: Question to reduce while parsing - posted by Markus Jelsma <ma...@openindex.io> on 2011/08/10 14:24:49 UTC, 2 replies.
- questions about solrwriter - posted by Cam Bazz <ca...@gmail.com> on 2011/08/10 14:32:28 UTC, 2 replies.
- mysql or berkeley db in distributed nutch environment - posted by jeffersonzhou <je...@gmail.com> on 2011/08/12 08:57:26 UTC, 1 reply.
- ParseResult.put : result not added if Url contains ?,& or # - posted by Max Stricker <st...@gmail.com> on 2011/08/12 13:36:54 UTC, 0 replies.
- Working with facets - posted by Johan Svensson <jo...@euroling.se> on 2011/08/12 14:07:42 UTC, 1 reply.
- Multi-Value metadata missing in ParseResult - posted by Max Stricker <st...@gmail.com> on 2011/08/13 11:02:42 UTC, 4 replies.
- desktop search - posted by Andrew Naylor <na...@gmail.com> on 2011/08/15 04:41:11 UTC, 4 replies.
- Is running nutch in psuedo-distributed mode really worth it? - posted by webdev1977 <we...@gmail.com> on 2011/08/15 14:59:20 UTC, 3 replies.
- Re: ParseResult.put : result not added if Url contains ?,& or # - posted by Markus Jelsma <ma...@openindex.io> on 2011/08/15 15:08:17 UTC, 1 reply.
- Reducer failed when nutch and hadoop work togather - posted by jeffersonzhou <je...@gmail.com> on 2011/08/16 11:59:27 UTC, 1 reply.
- Some question about the generator - posted by Marek Bachmann <m....@uni-kassel.de> on 2011/08/16 15:16:53 UTC, 8 replies.
- fetcher runs without error with no internet connection - posted by al...@aim.com on 2011/08/16 22:23:30 UTC, 7 replies.
- Re: example of searching Nutch with Lucene - posted by acse <a2...@yahoo.com> on 2011/08/17 11:25:10 UTC, 2 replies.
- nutch redirect treatment - posted by abhayd <aj...@hotmail.com> on 2011/08/17 15:01:41 UTC, 6 replies.
- [ANNOUNCE] New Apache Nutch PMC Chair: Julien Nioche - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2011/08/18 01:10:40 UTC, 2 replies.
- customizing URL injection - posted by Dinçer Kavraal <dk...@gmail.com> on 2011/08/18 15:35:26 UTC, 9 replies.
- CrawlDatum.getMetaData() - posted by Markus Jelsma <ma...@openindex.io> on 2011/08/19 15:59:17 UTC, 1 reply.
- No hit on root word when stemming enabled - posted by Johan Svensson <jo...@euroling.se> on 2011/08/19 16:12:06 UTC, 1 reply.
- force recrawl - posted by Max Stricker <st...@gmail.com> on 2011/08/19 19:01:01 UTC, 7 replies.
- linkdb empty - posted by Cam Bazz <ca...@gmail.com> on 2011/08/20 00:34:47 UTC, 1 reply.
- Searching for special characters - posted by Harris Rappaport <hp...@gmail.com> on 2011/08/21 05:10:00 UTC, 3 replies.
- readdblink not showing alllinks - posted by abhayd <aj...@hotmail.com> on 2011/08/22 07:31:02 UTC, 8 replies.
- How to save html source to local drive - posted by dyzc2010 <je...@gmail.com> on 2011/08/22 16:02:51 UTC, 3 replies.
- Empty LinkDB after invertlinks - posted by Marek Bachmann <m....@uni-kassel.de> on 2011/08/23 16:05:16 UTC, 8 replies.
- How to store data in new column in MySQL database Nutch 2.0 - posted by jcoffield <co...@hotmail.com> on 2011/08/23 16:59:10 UTC, 1 reply.
- Nutch crawl updates ignore cans URL - posted by "Ramanathapuram, Rajesh" <Ra...@turner.com> on 2011/08/23 17:36:28 UTC, 0 replies.
- Nutch on EMR - posted by Peter Harrington <pe...@gmail.com> on 2011/08/24 03:03:09 UTC, 2 replies.
- Recursively searching through web dirs - posted by Adam Estrada <es...@gmail.com> on 2011/08/24 22:03:13 UTC, 3 replies.
- Trying to understand and use URLmeta - posted by "John R. Brinkema" <br...@teo.uscourts.gov> on 2011/08/24 22:36:26 UTC, 5 replies.
- Why URLNormalizer doesn't implement the Pluggable? - posted by Kaiwii Ho <ka...@gmail.com> on 2011/08/25 05:31:12 UTC, 2 replies.
- Are there any tutorial for writing regex-normalize.xml? - posted by Kaiwii Ho <ka...@gmail.com> on 2011/08/26 10:22:02 UTC, 2 replies.
- subscription for nutch - posted by Samata Sirsikar <sa...@gmail.com> on 2011/08/26 15:33:31 UTC, 0 replies.
- Re: keeping index up to date - posted by Radim Kolar <hs...@sendmail.cz> on 2011/08/27 07:49:12 UTC, 0 replies.
- Trying to complete index structure wiki page - posted by lewis john mcgibbney <le...@gmail.com> on 2011/08/27 22:19:24 UTC, 2 replies.
- How to generate multiple small segments w/o -numFetchers? - posted by Gabriele Kahlout <ga...@mysimpatico.com> on 2011/08/28 05:24:12 UTC, 4 replies.
- subcribtion - posted by Nikitha Shenoy <ni...@gmail.com> on 2011/08/28 16:13:42 UTC, 0 replies.
- a question about job failed - posted by zhao <25...@qq.com> on 2011/08/29 04:24:52 UTC, 3 replies.
- SSHD for Nutch 1.3 in Pseudo Distributed mode - posted by webdev1977 <we...@gmail.com> on 2011/08/29 16:58:59 UTC, 2 replies.
- Parameter tuning or how to accelerate fetching - posted by "Eggebrecht, Thomas (GfK Marktforschung)" <th...@gfk.com> on 2011/08/29 17:33:48 UTC, 11 replies.
- Injector hanging on Hadoop 0.20.6 - posted by lewis john mcgibbney <le...@gmail.com> on 2011/08/29 17:56:59 UTC, 2 replies.
- Nutch 1.3 - DIFAT array IOException on parsing files - posted by Elisabeth Adler <el...@gmail.com> on 2011/08/30 19:29:51 UTC, 8 replies.
- Regarding Decrease in number of domains in readdb -stats -sort - posted by Gaurav Bagga <gb...@gmail.com> on 2011/08/30 22:12:40 UTC, 3 replies.
- Parse reduce slow as a snail - posted by Markus Jelsma <ma...@openindex.io> on 2011/08/31 00:29:55 UTC, 3 replies.
- Parser crash with HeapSpace error - posted by Marek Bachmann <m....@uni-kassel.de> on 2011/08/31 01:47:05 UTC, 2 replies.
- Weight servers differently - posted by Johan Svensson <jo...@euroling.se> on 2011/08/31 10:22:34 UTC, 8 replies.
- Parsing only common file types - posted by Marek Bachmann <m....@uni-kassel.de> on 2011/08/31 12:49:02 UTC, 1 reply.
- ERROR INDEXING - posted by zm...@facinf.uho.edu.cu on 2011/08/31 19:20:20 UTC, 2 replies.
- Redirect Handling - posted by Gaurav Bagga <gb...@gmail.com> on 2011/08/31 19:42:48 UTC, 0 replies.