- Re: updatedb fails - posted by AJ Chen <aj...@web2express.org> on 2010/10/01 07:20:13 UTC, 0 replies.
- Re: Not getting all documents - posted by webdev1977 <we...@gmail.com> on 2010/10/01 13:41:12 UTC, 1 reply.
- problems with libraries parse-rtf and parse-msword - posted by Miguel Tinte <mi...@gmail.com> on 2010/10/01 14:01:13 UTC, 2 replies.
- RE: Excluding javascript files from indexing and search results. - posted by "Nemani, Raj" <Ra...@turner.com> on 2010/10/01 16:11:53 UTC, 0 replies.
- Run crawl from java code - posted by Marseld Dedgjonaj <ma...@ikubinfo.com> on 2010/10/02 15:51:28 UTC, 4 replies.
- solrindex with a pseudo-cluster - posted by Steve Cohen <ma...@gmail.com> on 2010/10/02 17:05:07 UTC, 2 replies.
- Re: hadoop or nutch problem? - posted by AJ Chen <aj...@web2express.org> on 2010/10/02 19:28:29 UTC, 0 replies.
- How to Know the flow of the plugins in nutch - posted by nitin hardeniya <ni...@gmail.com> on 2010/10/02 21:57:25 UTC, 0 replies.
- Re: New to Nutch - posted by Israel <we...@gmail.com> on 2010/10/04 05:26:48 UTC, 0 replies.
- Advanced Search with nutch + Boolean operators - posted by Israel <we...@gmail.com> on 2010/10/04 05:56:30 UTC, 1 reply.
- About SOLR and Nutch - posted by Israel <we...@gmail.com> on 2010/10/04 06:02:05 UTC, 2 replies.
- Nutch on file system and web - posted by Davide Cavalaglio <da...@desktopsrl.com> on 2010/10/04 12:49:08 UTC, 2 replies.
- Hadoop compression - posted by Christopher Laux <ct...@googlemail.com> on 2010/10/04 17:48:24 UTC, 1 reply.
- map & reduce tasks numbers - posted by Dennis <ar...@yahoo.com.cn> on 2010/10/05 12:31:44 UTC, 2 replies.
- Nutch-Eclipse - posted by Yavuz Selim YILMAZ <yv...@gmail.com> on 2010/10/05 13:15:25 UTC, 9 replies.
- need a larger map task number - posted by Dennis <ar...@yahoo.com.cn> on 2010/10/05 15:24:27 UTC, 7 replies.
- org.apache.hadoop.mapred.FileAlreadyExistsException - posted by "McGibbney, Lewis John" <Le...@gcu.ac.uk> on 2010/10/05 20:15:32 UTC, 0 replies.
- How to Setup Multiple Crawls in same Nutch code base? - posted by Savannah Beckett <sa...@yahoo.com> on 2010/10/05 21:32:21 UTC, 0 replies.
- revisit time as a function of content type - posted by Christopher Laux <ct...@googlemail.com> on 2010/10/05 23:17:49 UTC, 2 replies.
- very slow fetch job - posted by Dennis <ar...@yahoo.com.cn> on 2010/10/06 02:58:03 UTC, 0 replies.
- Custom Search - posted by Yavuz Selim YILMAZ <yv...@gmail.com> on 2010/10/06 11:16:17 UTC, 2 replies.
- crawling encoding problem - posted by Miguel Tinte <mi...@gmail.com> on 2010/10/06 13:26:05 UTC, 0 replies.
- How to know what fields can be searched? - posted by Bill Arduino <ro...@gmail.com> on 2010/10/06 14:03:54 UTC, 1 reply.
- Re: Fwd: Fetch/Dump problem: Some Chinese characters incorrect. - posted by matinte <mi...@gmail.com> on 2010/10/06 17:10:59 UTC, 1 reply.
- Ip filtering - posted by Jean-Francois Gingras <je...@gmail.com> on 2010/10/06 19:59:09 UTC, 5 replies.
- nutch 1.2 crawl error - posted by herbs yang <he...@gmail.com> on 2010/10/06 23:26:14 UTC, 2 replies.
- How to Parse Non-Url fields in XML? - posted by Savannah Beckett <sa...@yahoo.com> on 2010/10/07 07:49:15 UTC, 0 replies.
- Exclude html-content from index - posted by Matthias Paul <ma...@gmail.com> on 2010/10/07 12:12:03 UTC, 3 replies.
- Can't find org.gora.sql.store.SqlStore - posted by Markus Jelsma <ma...@openindex.io> on 2010/10/07 12:31:08 UTC, 4 replies.
- fetcher.store.content and fetcher.parse - posted by webdev1977 <we...@gmail.com> on 2010/10/07 14:42:00 UTC, 3 replies.
- Parse MS Office etc. in Nutch 1.2 - posted by Erlend Garåsen <e....@usit.uio.no> on 2010/10/08 11:28:44 UTC, 5 replies.
- empty search.jsp page, Distributed Searching - posted by Dennis <ar...@yahoo.com.cn> on 2010/10/08 15:55:00 UTC, 0 replies.
- side by side versions of Nutch - posted by MilleBii <mi...@gmail.com> on 2010/10/08 19:06:22 UTC, 1 reply.
- Adding servers in the cluster - posted by MilleBii <mi...@gmail.com> on 2010/10/08 19:21:42 UTC, 4 replies.
- bug? distributed searching, ugly search.jsp - posted by Dennis <ar...@yahoo.com.cn> on 2010/10/09 08:14:33 UTC, 0 replies.
- Distributed Searching, the crawl folder in HDFS - posted by Dennis <ar...@yahoo.com.cn> on 2010/10/09 09:16:14 UTC, 0 replies.
- Crawling sub-pages but not indexing parent page - posted by Žygimantas Medelis <zy...@medelis.lt> on 2010/10/09 21:51:37 UTC, 0 replies.
- Crawl speed control and HTTP Post - posted by zouzhile <zo...@126.com> on 2010/10/10 07:37:08 UTC, 1 reply.
- HTTP Scheme problem - posted by matinte <mi...@gmail.com> on 2010/10/11 13:23:26 UTC, 0 replies.
- Crawl in AIX - posted by Yavuz Selim YILMAZ <yv...@gmail.com> on 2010/10/12 09:20:13 UTC, 0 replies.
- Instant Search - posted by Dennis <ar...@yahoo.com.cn> on 2010/10/12 10:11:54 UTC, 6 replies.
- Eclipse and Ant build problems - posted by Erlend Garåsen <e....@usit.uio.no> on 2010/10/12 11:38:17 UTC, 0 replies.
- Issues with certain URLs not being fetched. - posted by Mike Pountney <Mi...@semantico.com> on 2010/10/12 12:10:08 UTC, 3 replies.
- Class loading problem running Nutch on existing Hadoop cluster. - posted by Massimo Schiavon <ms...@volunia.com> on 2010/10/12 12:59:04 UTC, 0 replies.
- nutch and page parameters - posted by Antonios Katsikadamos <ak...@googlemail.com> on 2010/10/12 15:42:08 UTC, 0 replies.
- Search RSS - Search only into the links URG - posted by Israel <we...@gmail.com> on 2010/10/13 01:25:02 UTC, 0 replies.
- Problem parsing application/xhtml+xml - posted by Okke Klein <kl...@octoweb.nl> on 2010/10/13 11:33:17 UTC, 1 reply.
- apache nutch query,search - posted by Antonios Katsikadamos <ak...@googlemail.com> on 2010/10/13 15:48:56 UTC, 0 replies.
- i wanto to Not index a home page.....but i want to index the links - posted by Israel <we...@gmail.com> on 2010/10/13 17:34:53 UTC, 5 replies.
- Get content-length in IndexingFilter - posted by Hannes Carl Meyer <ha...@googlemail.com> on 2010/10/13 18:35:02 UTC, 0 replies.
- How to crawl some specific URLs - posted by nitin hardeniya <ni...@gmail.com> on 2010/10/15 19:53:56 UTC, 1 reply.
- How to ensure CrawlDb gets updated with All urls from previous Fetch - posted by Emmanuel de Castro Santana <em...@gmail.com> on 2010/10/15 20:01:31 UTC, 1 reply.
- SmartChineseAnalyzer - posted by Dennis <ar...@yahoo.com.cn> on 2010/10/16 09:43:49 UTC, 1 reply.
- Removing Common Web Page Header and Footer from All Content Fetched by Nutch - posted by Israel Ekpo <is...@gmail.com> on 2010/10/19 03:01:50 UTC, 3 replies.
- No urls in crawldb, just unfetched seed - posted by Atreiu Fuyu <at...@gmail.com> on 2010/10/19 11:53:21 UTC, 2 replies.
- [ANNOUNCE] Welcome Markus Jelsma as a Nutch Committer - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2010/10/19 23:36:07 UTC, 2 replies.
- Nutch-1.2 Crawling Chinese Web Pages - posted by Dennis <ar...@yahoo.com.cn> on 2010/10/20 03:53:07 UTC, 1 reply.
- Why donnot we skip .js - posted by Dennis <ar...@yahoo.com.cn> on 2010/10/20 07:40:49 UTC, 0 replies.
- Multilingual html-pages with Nutch and Solr - posted by Matthias Paul <ma...@gmail.com> on 2010/10/20 09:27:42 UTC, 4 replies.
- http.agent and "unsupported browser" - posted by brad <br...@bcs-mail.net> on 2010/10/20 18:34:47 UTC, 8 replies.
- Crawl the whole blog, but store just the last post - posted by Alberto <al...@lsi.upc.edu> on 2010/10/21 17:45:32 UTC, 4 replies.
- crawl and clustering - posted by Antonios Katsikadamos <ak...@googlemail.com> on 2010/10/22 12:04:57 UTC, 0 replies.
- SOLR Support - can nutch be more "tolerant" if a push error does happen? - posted by Torsten Krah <tk...@fachschaft.imn.htwk-leipzig.de> on 2010/10/22 15:20:21 UTC, 0 replies.
- memory leak using Nutch 1.2! help - posted by cong liu <co...@gmail.com> on 2010/10/23 11:05:46 UTC, 2 replies.
- Read RSS - not return a link with the same page - posted by Israel <we...@gmail.com> on 2010/10/24 03:23:03 UTC, 0 replies.
- regex-urlfilter.txt is ignored - posted by Erlend Garåsen <e....@usit.uio.no> on 2010/10/25 17:42:33 UTC, 2 replies.
- Are there any web crawlers based on database? - posted by xiao yang <ya...@gmail.com> on 2010/10/26 04:56:39 UTC, 15 replies.
- Any changes to setting up solr with nutch 1.2? - posted by Steve Cohen <ma...@gmail.com> on 2010/10/26 21:05:33 UTC, 6 replies.
- Re: Any changes to setting up solr with nutch1.2? - posted by Markus Jelsma <ma...@openindex.io> on 2010/10/26 21:43:14 UTC, 1 reply.
- Host Grouping - posted by Rob Hunter <rh...@oversee.net> on 2010/10/26 22:53:40 UTC, 0 replies.
- If-Modified-Since header with Nutch - posted by Davide Cavalaglio <da...@desktopsrl.com> on 2010/10/27 12:28:15 UTC, 0 replies.
- http authentication and multicore - posted by Juan Felix <ja...@hotmail.com> on 2010/10/27 20:11:45 UTC, 2 replies.
- downloading exact number of pages from list of seed urls - posted by Krish Pan <ce...@gmail.com> on 2010/10/27 23:29:02 UTC, 5 replies.
- failure running on hadoop - posted by Claudio Martella <cl...@tis.bz.it> on 2010/10/28 12:30:22 UTC, 4 replies.
- Can Nutch index/parse targeted sections of a web page? - posted by Andrew McCombe <eu...@gmail.com> on 2010/10/28 12:41:23 UTC, 4 replies.
- How to let nutch index a field? - posted by Dennis <ar...@yahoo.com.cn> on 2010/10/28 13:29:38 UTC, 0 replies.
- Crawling some specific url & avoiding other urls - posted by nitin hardeniya <ni...@gmail.com> on 2010/10/29 11:51:38 UTC, 0 replies.
- org.apache.hadoop.util.DiskChecker$DiskErrorException - posted by a a <mb...@msn.com> on 2010/10/29 16:44:42 UTC, 0 replies.
- Error: No agents listed in 'http.agent.name' property. - posted by Matthew Stevens <ma...@matthewstevens.org> on 2010/10/29 20:08:54 UTC, 3 replies.
- When is it safe to delete a segment? - posted by MilleBii <mi...@gmail.com> on 2010/10/29 21:56:02 UTC, 2 replies.
- Storing voting data with Nutch - posted by MilleBii <mi...@gmail.com> on 2010/10/29 22:08:54 UTC, 2 replies.
- RSS output; deleting urls - posted by Rob Hunter <rh...@oversee.net> on 2010/10/30 00:18:54 UTC, 0 replies.
- Upgrading from 0.9 to 1.2: OpenSearch - posted by "David M. Cole" <dm...@colegroup.com> on 2010/10/31 23:37:50 UTC, 0 replies.