- Re: Crawling the local file system with Nutch - Document- - posted by kauu <ba...@gmail.com> on 2006/04/01 02:56:23 UTC, 2 replies.
- Re: Log Analysis - posted by TDLN <di...@gmail.com> on 2006/04/01 12:17:46 UTC, 0 replies.
- Nutch 0.7.2 release - posted by Piotr Kosiorowski <pk...@gmail.com> on 2006/04/01 21:26:11 UTC, 3 replies.
- hi all - posted by kauu <ba...@gmail.com> on 2006/04/02 16:48:01 UTC, 6 replies.
- RE: Multiple crawls how to get them to work together - posted by Dan Morrill <ra...@baker.edu> on 2006/04/02 16:49:44 UTC, 0 replies.
- Ubsubscribe - posted by Shahinul Islam <ja...@gmail.com> on 2006/04/02 17:14:10 UTC, 0 replies.
- Re: Nutch 0.7.2 release | upgrading from 0.7.1? - posted by "Håvard W. Kongsgård" <h....@niap.no> on 2006/04/02 18:06:03 UTC, 1 replies.
- Problems Installing - posted by Paul Stewart <ps...@nexicomgroup.net> on 2006/04/02 19:59:48 UTC, 5 replies.
- Merging Nutch crawls under 0.8-dev - posted by Carl Dorestos <ca...@gmail.com> on 2006/04/02 20:34:26 UTC, 0 replies.
- Tomcat Problem - posted by Paul Stewart <ps...@nexicomgroup.net> on 2006/04/03 00:54:54 UTC, 4 replies.
- Re: Merging indexes -- please help.... - posted by Vertical Search <ve...@gmail.com> on 2006/04/03 04:06:51 UTC, 6 replies.
- clean up of hadoop files - posted by Raghavendra Prabhu <rr...@gmail.com> on 2006/04/03 09:05:04 UTC, 2 replies.
- help please! - issues with merging indexes w/ DFS on 0.8 - posted by Olive g <ol...@hotmail.com> on 2006/04/03 16:53:24 UTC, 1 replies.
- Case insensitive regular expressions ? - posted by Thimo Eichstaedt <ab...@digithi.de> on 2006/04/03 17:47:54 UTC, 0 replies.
- thanks, but what I wanted to do is to merge segments from multiple crawls - posted by Olive g <ol...@hotmail.com> on 2006/04/03 21:14:38 UTC, 1 replies.
- Saving Metadata to Mysql - posted by mikeyc <mc...@gmail.com> on 2006/04/03 22:18:42 UTC, 4 replies.
- more questions on this - please advice - posted by Olive g <ol...@hotmail.com> on 2006/04/03 23:06:50 UTC, 4 replies.
- Crawling a file but not indexing it - posted by Benjamin Higgins <bh...@gmail.com> on 2006/04/04 00:10:33 UTC, 4 replies.
- Separate search and index servers? - posted by Scott Simpson <ss...@InterchangeUSA.com> on 2006/04/04 01:43:20 UTC, 1 replies.
- Query on merged indexes returned 0 hit - test case included (Nutch 0.8) - posted by Olive g <ol...@hotmail.com> on 2006/04/04 03:05:30 UTC, 3 replies.
- I had the same issue ... is this a bug or a configuration issue? - posted by Olive g <ol...@hotmail.com> on 2006/04/04 15:52:45 UTC, 0 replies.
- Meta-Refresh Question - posted by Dennis Kubes <nu...@dragonflymc.com> on 2006/04/04 16:38:32 UTC, 2 replies.
- RE: nutch config setup to crawl/query for word/pdf files - posted by Teruhiko Kurosaka <Ku...@basistech.com> on 2006/04/04 19:09:01 UTC, 0 replies.
- stackoverflow - posted by Rajesh Munavalli <fi...@gmail.com> on 2006/04/05 00:18:21 UTC, 1 replies.
- Re: Query on merged indexes returned 0 hit - more issues - posted by Olive g <ol...@hotmail.com> on 2006/04/05 02:24:29 UTC, 0 replies.
- Re: Adaptive fetch - posted by "D.Saravanaraj" <sa...@gmail.com> on 2006/04/05 06:47:14 UTC, 0 replies.
- nutch-svn: Reduce operations: can't open map output - posted by Shawn Gervais <pr...@project10.net> on 2006/04/05 07:32:49 UTC, 0 replies.
- Crawl status - posted by Fabrice Estiévenart <fe...@cetic.be> on 2006/04/05 11:13:49 UTC, 2 replies.
- generate failes - class org.apache.nutch.crawl.Generator$SelectorInverseMapper not org.apache.hadoop.mapred.Mapper - posted by Byron Miller <by...@yahoo.com> on 2006/04/05 15:03:04 UTC, 1 replies.
- Re: generate failes - class org.apache.nutch.crawl.Generator$SelectorInverseMapper not org.apache.hadoop.mapred.Mapper - posted by Jérôme Charron <je...@gmail.com> on 2006/04/05 15:08:02 UTC, 1 replies.
- Re: Adaptive Refetch - posted by Mehmet Tan <me...@agmlab.com> on 2006/04/05 16:21:31 UTC, 5 replies.
- details: stackoverflow error - posted by Rajesh Munavalli <fi...@gmail.com> on 2006/04/05 17:22:06 UTC, 13 replies.
- Info on scoring/indexing and pagerank - posted by "Insurance Squared Inc." <gc...@insurancesquared.com> on 2006/04/05 17:51:13 UTC, 0 replies.
- Nutch 500 Error - posted by Paul Stewart <ps...@nexicomgroup.net> on 2006/04/05 20:39:50 UTC, 8 replies.
- Re: Tuning nutch-0.8-dev (rev-374745 of 2006-02-03) - posted by Doug Cutting <cu...@apache.org> on 2006/04/05 23:47:01 UTC, 0 replies.
- please help!! inverlinks not work properly with more than 5 input parts (0.8) - posted by Olive g <ol...@hotmail.com> on 2006/04/06 14:25:48 UTC, 1 replies.
- .classpath and .project for 0.8 - posted by TDLN <di...@gmail.com> on 2006/04/06 14:50:33 UTC, 4 replies.
- Re: please help!! inverlinks not work properly with more than 5 input parts (0.8 - posted by Olive g <ol...@hotmail.com> on 2006/04/06 15:39:43 UTC, 0 replies.
- RuntimeException running Generator - posted by TDLN <di...@gmail.com> on 2006/04/06 16:01:07 UTC, 1 replies.
- latest build throws error - critical - posted by Raghavendra Prabhu <rr...@gmail.com> on 2006/04/06 16:10:47 UTC, 9 replies.
- Error while updating - posted by "K.A.Hussain Ali" <Hu...@photoninfotech.com> on 2006/04/06 17:08:07 UTC, 0 replies.
- Re: [Nutch-general] RE: boosting custom field values in scoring algorithm - posted by Daniel Iversen <da...@nexle.dk> on 2006/04/07 07:06:37 UTC, 3 replies.
- Large fetch fails with "Task process exit with nonzero status" - posted by Shawn Gervais <pr...@project10.net> on 2006/04/07 09:22:59 UTC, 1 replies.
- is this a bug? - posted by Zaheed Haque <za...@gmail.com> on 2006/04/07 09:33:41 UTC, 1 replies.
- Re: is this a bug? - NOT A BUG! - posted by Zaheed Haque <za...@gmail.com> on 2006/04/07 10:59:59 UTC, 0 replies.
- please help!! It always return 0 hit. - posted by lin yuan <li...@msn.com> on 2006/04/07 11:32:55 UTC, 2 replies.
- fetching stuck in the middle of processing - posted by Michael Ji <fj...@yahoo.com> on 2006/04/08 14:41:24 UTC, 2 replies.
- Add new content on the fly! - posted by "Goldschmidt, Dave" <dg...@globalspec.com> on 2006/04/08 22:32:30 UTC, 0 replies.
- Top query terms - posted by Berlin Brown <be...@gmail.com> on 2006/04/09 01:20:32 UTC, 0 replies.
- Nutch search and hard drive hot spots - posted by Dan Morrill <ra...@baker.edu> on 2006/04/09 16:20:50 UTC, 0 replies.
- Strange question!! but i want to know how to stop Nutch successfully - posted by sapan euf <sa...@yahoo.com> on 2006/04/09 23:43:36 UTC, 0 replies.
- Question about crawldb and segments - posted by Jason Camp <jc...@vhosting.com> on 2006/04/10 00:46:52 UTC, 7 replies.
- refetching interval - posted by Michael Ji <fj...@yahoo.com> on 2006/04/10 03:17:42 UTC, 2 replies.
- When Nutch fetches using mapred ... - posted by Shawn Gervais <pr...@project10.net> on 2006/04/10 10:13:14 UTC, 2 replies.
- Ran a crawl, but stopped in the middle - posted by Berlin Brown <be...@gmail.com> on 2006/04/10 15:34:29 UTC, 0 replies.
- Re: Jpeg and Exif Plugin - posted by Nutch Newbie <nu...@gmail.com> on 2006/04/10 15:37:09 UTC, 0 replies.
- Quesiton about Nutch needs for crawl data - posted by Dan Morrill <ra...@baker.edu> on 2006/04/10 16:00:05 UTC, 0 replies.
- Invalid index (can't re-index) - posted by "Fankhauser, Alain" <Al...@ipi.ch> on 2006/04/10 16:53:50 UTC, 0 replies.
- Nutch administration web interface? - posted by Robert Douglass <ro...@robshouse.net> on 2006/04/10 18:16:13 UTC, 4 replies.
- Does hadoop not reclaim blocks when files are deleted? - posted by Shawn Gervais <pr...@project10.net> on 2006/04/11 07:13:52 UTC, 3 replies.
- Re: [Nutch-general] Add new content on the fly! - posted by Kelvin Tan <ke...@relevanz.com> on 2006/04/11 11:42:57 UTC, 1 replies.
- Re: Small dev question - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/04/11 12:17:04 UTC, 2 replies.
- Auto-crawling & re-crawling the web site - posted by bob knob <an...@yahoo.com> on 2006/04/11 16:05:11 UTC, 2 replies.
- Enabling different file types - posted by bob knob <an...@yahoo.com> on 2006/04/11 17:57:21 UTC, 6 replies.
- Adaptive fetch patch - posted by Raghavendra Prabhu <rr...@gmail.com> on 2006/04/11 18:08:44 UTC, 0 replies.
- Crawling a large, finite set of sites. - posted by Terry Pothecary <te...@pothecary.com> on 2006/04/11 20:02:12 UTC, 0 replies.
- Crawling a large, finite set of websites - posted by David Steglov <da...@yahoo.co.uk> on 2006/04/11 20:11:04 UTC, 0 replies.
- Same Error (Version 0.8) - posted by mikeyc <mc...@gmail.com> on 2006/04/12 05:05:30 UTC, 16 replies.
- java.net.SocketTimeoutException: Read timed out - posted by Elwin <ma...@gmail.com> on 2006/04/12 09:55:44 UTC, 6 replies.
- Nutch0.6 and Nutch 0.7 crawlers - posted by eric park <hk...@gmail.com> on 2006/04/12 10:12:13 UTC, 3 replies.
- Re[2]: Nutch administration web interface? - posted by Dima Mazmanov <nu...@proservice.ge> on 2006/04/12 16:06:03 UTC, 0 replies.
- How do I run AND, OR queries and other query filters on nutch? - posted by Ravish Bhagdev <ra...@gmail.com> on 2006/04/12 18:58:04 UTC, 0 replies.
- plugins directory - posted by mikeyc <mc...@gmail.com> on 2006/04/12 19:58:23 UTC, 1 replies.
- Success - posted by mikeyc <mc...@gmail.com> on 2006/04/12 20:15:56 UTC, 0 replies.
- How best to debug failed fetch-reduce task - posted by Shawn Gervais <pr...@project10.net> on 2006/04/12 23:46:49 UTC, 1 replies.
- Help needed - how to import local files into Nutch 0.8? - posted by Carl Dorestos <ca...@gmail.com> on 2006/04/13 03:12:09 UTC, 2 replies.
- possible bug - posted by Raghavendra Prabhu <rr...@gmail.com> on 2006/04/13 14:39:35 UTC, 2 replies.
- Bug Fix ; File Response.java - posted by Raghavendra Prabhu <rr...@gmail.com> on 2006/04/13 15:10:14 UTC, 0 replies.
- nutch 0.7.2 webapp on resin3 - posted by sub paul <su...@gmail.com> on 2006/04/13 16:54:19 UTC, 0 replies.
- FileNotFoundException on crawl - posted by Michael Levy <Lu...@gmail.com> on 2006/04/13 21:20:09 UTC, 1 replies.
- Adding Level to Website Parse Data - posted by Dennis Kubes <nu...@dragonflymc.com> on 2006/04/13 22:44:50 UTC, 1 replies.
- NullPointerException due to nonexistent (mis-pointed) segments directory - posted by Michael Levy <Lu...@gmail.com> on 2006/04/14 15:40:29 UTC, 1 replies.
- Retrieve RSS Content - posted by mikeyc <mc...@gmail.com> on 2006/04/15 03:12:40 UTC, 0 replies.
- Using Nutch's distributed search server mode - posted by Shawn Gervais <pr...@project10.net> on 2006/04/15 08:57:37 UTC, 5 replies.
- How to run bin/nutch dedup when running multiple servers - posted by "Håvard W. Kongsgård" <h....@niap.no> on 2006/04/15 13:28:22 UTC, 0 replies.
- redirect treatment - posted by "Insurance Squared Inc." <gc...@insurancesquared.com> on 2006/04/15 15:01:15 UTC, 5 replies.
- ant to maven migration - posted by Raghavendra Prabhu <rr...@gmail.com> on 2006/04/15 21:42:31 UTC, 0 replies.
- Out of Memory Error Java Memory Space - posted by Vertical Search <ve...@gmail.com> on 2006/04/16 02:48:38 UTC, 0 replies.
- Can nutch fit to thi task ? - posted by ahmed ghouzia <gh...@yahoo.com> on 2006/04/16 10:36:17 UTC, 4 replies.
- Interesting sites crawled - posted by Berlin Brown <be...@gmail.com> on 2006/04/16 17:05:44 UTC, 0 replies.
- please help-updatedb error for step-by-step crawling (followed 0.8 tutorial) - posted by Olive g <ol...@hotmail.com> on 2006/04/16 18:41:27 UTC, 0 replies.
- Nutch and global scaling - posted by Iain <ia...@idcl.co.uk> on 2006/04/16 18:58:06 UTC, 0 replies.
- modify nutch to boost crawl for certain keywords - posted by Jason Viloria <jn...@yahoo.com> on 2006/04/17 12:11:09 UTC, 1 replies.
- Analyze - posted by "R.Mayoran" <ma...@team-lab.com> on 2006/04/17 14:31:37 UTC, 0 replies.
- Blogger RSS Parsing Error - posted by mikeyc <mc...@gmail.com> on 2006/04/17 19:02:36 UTC, 2 replies.
- Nutch shows same results multiple times. - posted by Dima Mazmanov <nu...@proservice.ge> on 2006/04/18 11:04:18 UTC, 6 replies.
- Nutch crawl not fetching portions of site - posted by Andrew Libby <an...@gmail.com> on 2006/04/18 16:33:04 UTC, 2 replies.
- Could someone please share your experience with 0.8 step-by-step crawl?? - posted by Olive g <ol...@hotmail.com> on 2006/04/18 17:08:42 UTC, 0 replies.
- Re: Could someone please share your experience with 0.8 step-by-step crawl?? - posted by mo...@richmondinformatics.com on 2006/04/18 17:36:24 UTC, 1 replies.
- Index statistics - posted by Benjamin Higgins <bh...@gmail.com> on 2006/04/18 19:52:54 UTC, 4 replies.
- How to get relevance scores of search results.... - posted by Ravish Bhagdev <ra...@gmail.com> on 2006/04/18 19:58:01 UTC, 1 replies.
- USing PruneTool - posted by Vertical Search <ve...@gmail.com> on 2006/04/18 23:50:33 UTC, 1 replies.
- How to deal with javascript urls? - posted by Elwin <ma...@gmail.com> on 2006/04/19 09:25:45 UTC, 1 replies.
- Numerical relevance - posted by Aled Jones <Al...@comtec-europe.co.uk> on 2006/04/19 12:04:36 UTC, 0 replies.
- java.io.IOException: No input directories specified in - posted by Peter Swoboda <pr...@gmx.de> on 2006/04/19 12:50:42 UTC, 32 replies.
- Re: Where to put the nutch-site.xml ? - posted by ap...@fcom.uh.cu on 2006/04/19 13:03:51 UTC, 2 replies.
- nutch 0.7.2 requires JAVA_HOME but not SDK - posted by Alexander E Genaud <lx...@pobox.com> on 2006/04/19 14:22:57 UTC, 0 replies.
- What is the ontology plugin? - posted by Ravish Bhagdev <ra...@gmail.com> on 2006/04/19 14:37:47 UTC, 0 replies.
- crawl command params misinterpreted under Solaris? - posted by Michael Levy <Lu...@gmail.com> on 2006/04/19 17:01:03 UTC, 1 replies.
- Re: Could someone please share your experience with 0.8step-by-step crawl?? - posted by Olive g <ol...@hotmail.com> on 2006/04/19 17:04:44 UTC, 0 replies.
- Re: Could someone please share your experience with 0.8step-by-step crawl?? - posted by mo...@richmondinformatics.com on 2006/04/19 18:05:13 UTC, 0 replies.
- crawling sites that require cookies/passwords. - posted by Cam Bazz <ca...@gmail.com> on 2006/04/19 21:56:11 UTC, 1 replies.
- java.io.IOException: Cannot create file - posted by mo...@richmondinformatics.com on 2006/04/20 12:17:48 UTC, 3 replies.
- nutch advanced usage tutorial - posted by Cam Bazz <ca...@gmail.com> on 2006/04/20 13:18:40 UTC, 1 replies.
- Re[2]: Nutch shows same results multiple times. - posted by Dima Mazmanov <nu...@proservice.ge> on 2006/04/20 14:46:43 UTC, 2 replies.
- nutch user meeting in San Francisco: May 18th - posted by Stefan Groschupf <sg...@media-style.com> on 2006/04/21 01:14:13 UTC, 1 replies.
- nutch installer - posted by Nutch Newbie <nu...@gmail.com> on 2006/04/21 08:15:34 UTC, 0 replies.
- Nutch Search stats - posted by Aled Jones <Al...@comtec-europe.co.uk> on 2006/04/21 15:50:54 UTC, 3 replies.
- favicon? - posted by Bill Goffe <go...@Oswego.EDU> on 2006/04/21 16:06:08 UTC, 1 replies.
- yes, a European nutch meeting is also planed :) - posted by Stefan Groschupf <sg...@media-style.com> on 2006/04/22 00:07:38 UTC, 0 replies.
- Nutch, Java bad on the harddrive? - posted by Berlin Brown <be...@gmail.com> on 2006/04/22 04:55:15 UTC, 0 replies.
- Re: yes, a European nutch meeting is also planed :) - posted by Arun Kaundal <ar...@gmail.com> on 2006/04/22 06:03:59 UTC, 8 replies.
- Intranet crawl - posted by Markus Franz <ma...@markus-franz.de> on 2006/04/22 13:36:30 UTC, 1 replies.
- indexing news sites that have RSS feeds - posted by Cam Bazz <ca...@gmail.com> on 2006/04/22 14:55:42 UTC, 1 replies.
- Search results at command line - posted by Markus Franz <ma...@markus-franz.de> on 2006/04/22 16:59:53 UTC, 2 replies.
- modifying header logo and page content - posted by Chris Fellows <cc...@sbcglobal.net> on 2006/04/23 04:48:53 UTC, 2 replies.
- deletable files - posted by alexis artes <al...@yahoo.com> on 2006/04/24 08:04:53 UTC, 0 replies.
- nutch readdb question - posted by TDLN <di...@gmail.com> on 2006/04/24 09:03:06 UTC, 1 replies.
- unable to filter different file format like .java,.jar,.class with nutch version 0.7.2 - posted by Arun Kumar Sharma <sh...@yahoo.co.in> on 2006/04/24 09:53:01 UTC, 0 replies.
- Re: unable to filter different file format like .java,.jar,.class with nutch version 0.7.2 - posted by TDLN <di...@gmail.com> on 2006/04/24 11:19:36 UTC, 1 replies.
- Restrictive searching approaches? - posted by Andrew Libby <an...@gmail.com> on 2006/04/24 15:55:49 UTC, 9 replies.
- Question about crawl expectations - posted by Jason Camp <jc...@vhosting.com> on 2006/04/25 06:55:38 UTC, 1 replies.
- How to get Text and Parse data for URL - posted by Dennis Kubes <nu...@dragonflymc.com> on 2006/04/25 22:12:16 UTC, 6 replies.
- IOException when generate fetch - posted by Ensheng Wang <nu...@yahoo.com.cn> on 2006/04/27 07:19:13 UTC, 0 replies.
- Optimizing the performance of a Nutch-based web application? - posted by Chun Wei Ho <cw...@gmail.com> on 2006/04/27 08:42:37 UTC, 2 replies.
- Beagle and Nutch - posted by Andrew Libby <an...@gmail.com> on 2006/04/27 13:52:56 UTC, 1 replies.
- Re: [Nutch-general] Beagle and Nutch - posted by og...@yahoo.com on 2006/04/27 20:21:33 UTC, 0 replies.
- MultiSearcher & skewed IDF values - posted by Ken Krugler <kk...@transpac.com> on 2006/04/27 23:32:05 UTC, 3 replies.
- Problem with sorting index - posted by Michael <mi...@gameservice.ru> on 2006/04/28 03:53:25 UTC, 2 replies.
- Running nutch on a non-port 80 site - posted by Deepa Devanathan <ti...@gmail.com> on 2006/04/28 10:21:54 UTC, 1 replies.
- Heritrix - posted by Aled Jones <Al...@comtec-europe.co.uk> on 2006/04/28 10:58:33 UTC, 6 replies.
- partial crawling - posted by Dima Mazmanov <di...@proservice.ge> on 2006/04/28 11:05:15 UTC, 0 replies.
- Connection refused tasktracker on slave machine - posted by zzcgiacomini <zz...@echo.fr> on 2006/04/28 13:00:26 UTC, 1 replies.
- Admin Gui beta test (was Re: ATB: Heritrix) - posted by Stefan Groschupf <sg...@media-style.com> on 2006/04/28 15:23:41 UTC, 15 replies.
- Startscript in windows - posted by Cement Xianyu <ce...@gmail.com> on 2006/04/30 17:57:25 UTC, 5 replies.