- Re: [Nutch-general] Filtering words... - posted by og...@yahoo.com on 2005/09/01 00:00:16 UTC, 0 replies.
- RE: Analyser error - posted by EM <em...@cpuedge.com> on 2005/09/01 03:42:02 UTC, 0 replies.
- how to generate segments with html pages as input - posted by Rajendra Patil <Ra...@KPITCummins.com> on 2005/09/01 06:55:52 UTC, 3 replies.
- how to fetch all web pages on one site - posted by AJ Chen <an...@sbcglobal.net> on 2005/09/01 08:09:44 UTC, 1 reply.
- Re: parser for xsl, ppt and zip - posted by Ayyanar Inbamohan <te...@yahoo.com> on 2005/09/01 08:28:18 UTC, 0 replies.
- Search Summary - posted by qu...@webmail.co.za on 2005/09/01 11:06:16 UTC, 0 replies.
- Nutch in Solaris (Urgent) - posted by Valmir Macário <va...@gmail.com> on 2005/09/01 15:34:14 UTC, 3 replies.
- Re: How to crawl password required sites in nutch intranet usage? - posted by Pavan <ya...@gmail.com> on 2005/09/01 17:35:49 UTC, 0 replies.
- extension point for omitting page content? - posted by Kamil Wnuk <ka...@gmail.com> on 2005/09/01 23:03:56 UTC, 0 replies.
- Re: searching Accented characters - posted by Ken Krugler <kk...@transpac.com> on 2005/09/01 23:54:33 UTC, 0 replies.
- Made it to the end of the tutorial - posted by Berlin Brown <be...@gmail.com> on 2005/09/02 05:51:45 UTC, 0 replies.
- One more thing, on the web-app - posted by Berlin Brown <be...@gmail.com> on 2005/09/02 06:06:47 UTC, 0 replies.
- Is ppt,xls and powerpoint plugins compatible with nutch 6.0 - posted by Ayyanar Inbamohan <te...@yahoo.com> on 2005/09/02 06:20:25 UTC, 3 replies.
- a question about character encoding - posted by Joey Lv <jo...@achievo.com> on 2005/09/02 09:46:37 UTC, 0 replies.
- Nutch and zero search results - posted by Berlin Brown <be...@gmail.com> on 2005/09/02 13:24:46 UTC, 0 replies.
- Nutch on Intranet: segments DB update question - posted by "Kotov, Alex" <AK...@dpscs.state.md.us> on 2005/09/02 13:57:33 UTC, 0 replies.
- Get a list of links via API? - posted by Brent Goran <br...@strategoit.com> on 2005/09/02 17:33:15 UTC, 0 replies.
- per-page "boost" - concise definition anywhere? - posted by Brent Goran <br...@strategoit.com> on 2005/09/02 18:17:26 UTC, 3 replies.
- RangQuery problem. - posted by Benny <be...@gmail.com> on 2005/09/03 02:18:56 UTC, 5 replies.
- simple question - posted by Luke James <lj...@corp.trancos.com> on 2005/09/03 02:33:48 UTC, 4 replies.
- How to search by "links-to"? - posted by Brent Goran <br...@strategoit.com> on 2005/09/03 08:20:21 UTC, 1 reply.
- Link Analysis Score.. - posted by Rozina Sorathia <Ro...@KPITCummins.com> on 2005/09/03 11:38:52 UTC, 4 replies.
- How is Link Analysis score being calculated? - posted by Rozina Sorathia <Ro...@KPITCummins.com> on 2005/09/03 11:45:49 UTC, 2 replies.
- Content-type mismatch for Excel - posted by Ayyanar Inbamohan <te...@yahoo.com> on 2005/09/05 07:11:44 UTC, 6 replies.
- RE: [Nutch-general] DMOZ Web coverage - posted by EM <em...@cpuedge.com> on 2005/09/05 12:25:33 UTC, 0 replies.
- httpd/unix-directory - posted by EM <em...@cpuedge.com> on 2005/09/05 13:19:31 UTC, 2 replies.
- segment update result on an App Server - posted by Cherian Thomas <Ch...@KPITCummins.com> on 2005/09/05 13:51:00 UTC, 0 replies.
- set link analysis score to lucene index - posted by Michael Ji <fj...@yahoo.com> on 2005/09/05 15:38:41 UTC, 0 replies.
- Wildcards and different sites in Nutch - posted by Mark Johannes <Jo...@eon-is.com> on 2005/09/06 11:32:40 UTC, 0 replies.
- nutch 7.0 not fetching powerpoint, plugin is present - posted by Ayyanar Inbamohan <te...@yahoo.com> on 2005/09/06 13:42:37 UTC, 26 replies.
- why nutch taking application/msword for powerpoint - posted by Ayyanar Inbamohan <te...@yahoo.com> on 2005/09/06 14:36:14 UTC, 1 reply.
- I runed Nutch crawl but got an "FileNotFountException" ,why? - posted by mu xiaofeng <he...@gmail.com> on 2005/09/06 16:35:21 UTC, 1 reply.
- com.sun.net.ssl Error - posted by "Vanderdray, Jake" <JV...@aarp.org> on 2005/09/06 16:55:35 UTC, 5 replies.
- Link Analysis in OC - posted by Michael Ji <fj...@yahoo.com> on 2005/09/07 04:17:42 UTC, 0 replies.
- scope filter in OC - posted by Michael Ji <fj...@yahoo.com> on 2005/09/07 04:32:15 UTC, 0 replies.
- Re: [Nutch-general] scope filter in OC - posted by Kelvin Tan <ke...@relevanz.com> on 2005/09/07 05:45:39 UTC, 0 replies.
- Re: [Nutch-general] Link Analysis in OC - posted by Kelvin Tan <ke...@relevanz.com> on 2005/09/07 05:46:34 UTC, 0 replies.
- Re: JavaScript Urls - posted by Jack Tang <hi...@gmail.com> on 2005/09/07 09:51:39 UTC, 2 replies.
- OT sun jvm licenses (was: Re: com.sun.net.ssl Error) - posted by Bart van der Ouderaa <ba...@masterobjects.com> on 2005/09/07 15:02:53 UTC, 0 replies.
- Recrawling - posted by "Vanderdray, Jake" <JV...@aarp.org> on 2005/09/07 20:44:15 UTC, 4 replies.
- How can I use Nutch 0.7 to crawl the Dynamic news? - posted by mu xiaofeng <he...@gmail.com> on 2005/09/08 08:50:58 UTC, 6 replies.
- Difference between application/vnd.ms-powerpoint and application/powerpoint - posted by Ayyanar Inbamohan <te...@yahoo.com> on 2005/09/08 11:58:04 UTC, 1 reply.
- wildcards - posted by Ro...@wuestenrot.at on 2005/09/08 13:38:27 UTC, 0 replies.
- nutch merge - posted by Jay Pound <we...@poundwebhosting.com> on 2005/09/08 15:15:33 UTC, 1 reply.
- File system at a intranet - posted by Valmir Macário <va...@gmail.com> on 2005/09/08 15:36:03 UTC, 1 reply.
- I am able to crawl powerpoint and Excel files in nutch 7.0, - posted by Ayyanar Inbamohan <te...@yahoo.com> on 2005/09/09 09:07:48 UTC, 0 replies.
- Fetcher Error for local file system - posted by Rajendra Patil <Ra...@KPITCummins.com> on 2005/09/09 15:28:10 UTC, 2 replies.
- bin/nutch: IFS: cannot unset - posted by Valmir Macário <va...@gmail.com> on 2005/09/09 18:41:57 UTC, 0 replies.
- Can I use range query alone - posted by smith learner <sm...@yahoo.com> on 2005/09/11 04:20:17 UTC, 0 replies.
- Re: [Nutch-general] Can I use range query alone - posted by smith learner <sm...@yahoo.com> on 2005/09/11 22:39:43 UTC, 1 reply.
- intrantet crawling with nutch-0.7 - posted by Rajendra Patil <Ra...@KPITCummins.com> on 2005/09/12 08:26:59 UTC, 1 reply.
- Exception while searching, ArrayIndexOutOfBoundsException: -1 - posted by Ayyanar Inbamohan <te...@yahoo.com> on 2005/09/12 08:36:17 UTC, 0 replies.
- FW: Re: crawling protected pages - posted by Edward Quick <ed...@hotmail.com> on 2005/09/12 10:56:25 UTC, 0 replies.
- Recommended Links - posted by "Vanderdray, Jake" <JV...@aarp.org> on 2005/09/12 16:57:10 UTC, 1 reply.
- Sequences of Terms - posted by "Wilkerson, Cory" <cw...@cars.com> on 2005/09/12 18:09:38 UTC, 1 reply.
- Re: [Nutch-general] Re: Sequences of Terms - posted by Lars Aronsson <la...@aronsson.se> on 2005/09/12 19:51:39 UTC, 0 replies.
- why are unfetchable sites kept in webdb? - posted by Kamil Wnuk <ka...@gmail.com> on 2005/09/12 21:46:44 UTC, 1 reply.
- Is there any way to do range query on numerical field? - posted by smith learner <sm...@yahoo.com> on 2005/09/12 22:00:40 UTC, 2 replies.
- Crash when calling new NutchSearcher using nutch-0.7 - posted by Filip Radlinski <fr...@gmail.com> on 2005/09/13 00:29:41 UTC, 0 replies.
- How do I get OR to work? - posted by Lucas Rockwell <lu...@tsw.berkeley.edu> on 2005/09/13 05:46:10 UTC, 0 replies.
- does nutch frame servlet page - posted by ad...@interfree.it on 2005/09/13 13:04:15 UTC, 0 replies.
- Re-indexing segments to add more field information - posted by Mike Berrow <mb...@pacbell.net> on 2005/09/13 19:34:34 UTC, 1 reply.
- [nutch] - http.max.delays: retry later issue? - posted by Lukas Vlcek <lu...@gmail.com> on 2005/09/14 10:30:05 UTC, 9 replies.
- does Nutch crawl dynamic pages??? - posted by ad...@interfree.it on 2005/09/14 10:37:24 UTC, 3 replies.
- Whole web search depth - posted by Paul Williams <Pa...@becta.org.uk> on 2005/09/14 10:38:46 UTC, 5 replies.
- crawl-urlfilter.txt - posted by ad...@interfree.it on 2005/09/14 16:54:28 UTC, 3 replies.
- clustering support documentation - posted by "Ordway, Ryan" <Ry...@oregonstate.edu> on 2005/09/14 22:28:31 UTC, 0 replies.
- Prune - posted by Gal Nitzan <gn...@usa.net> on 2005/09/15 00:01:53 UTC, 4 replies.
- NULL pointer exception - posted by "Ordway, Ryan" <Ry...@oregonstate.edu> on 2005/09/15 01:20:43 UTC, 2 replies.
- can't parse https - posted by Edward Quick <ed...@hotmail.com> on 2005/09/15 16:49:08 UTC, 3 replies.
- Re: [Nutch-general] Re: [nutch] - http.max.delays: retry later issue? - posted by og...@yahoo.com on 2005/09/15 17:36:25 UTC, 2 replies.
- [0.7] Optimize Whole Web Crawl Process - posted by Jon Shoberg <jo...@shoberg.net> on 2005/09/15 19:22:43 UTC, 0 replies.
- Re[2]: Prune - posted by Egor Chernodarov <eg...@zarinsk.dem.ru> on 2005/09/16 08:25:25 UTC, 0 replies.
- Should type: and date: queries work with search.jsp? - posted by Edward Quick <ed...@hotmail.com> on 2005/09/16 10:53:14 UTC, 4 replies.
- crawl-urlfilter - posted by ad...@interfree.it on 2005/09/16 12:13:26 UTC, 0 replies.
- index local system - posted by Valmir Macário <va...@gmail.com> on 2005/09/16 15:41:59 UTC, 5 replies.
- indexing is very very very slow - posted by Gal Nitzan <gn...@usa.net> on 2005/09/16 15:59:29 UTC, 10 replies.
- hello nutchers! - posted by Jimmy Forrester <ji...@gmail.com> on 2005/09/16 23:46:32 UTC, 5 replies.
- pdf parsing - posted by Johannes Söllner <jo...@gmx.net> on 2005/09/17 00:27:47 UTC, 1 reply.
- pinging - posted by EM <em...@cpuedge.com> on 2005/09/17 00:57:07 UTC, 0 replies.
- Exception in mergesegs - posted by Gal Nitzan <gn...@usa.net> on 2005/09/17 04:33:55 UTC, 1 reply.
- crawler-documentation - posted by ad...@interfree.it on 2005/09/17 16:48:02 UTC, 1 reply.
- Using Luke with a Nutch index... FileNotFoundException - posted by dp...@comcast.net on 2005/09/18 05:05:07 UTC, 2 replies.
- UnknownHostException for known hosts - posted by AJ Chen <ca...@gmail.com> on 2005/09/18 06:04:13 UTC, 0 replies.
- Re: [Nutch-general] UnknownHostException for known hosts - posted by og...@yahoo.com on 2005/09/18 06:49:00 UTC, 1 reply.
- Setting segment location in webapp - posted by Vinny <xa...@gmail.com> on 2005/09/18 15:49:44 UTC, 3 replies.
- type: searches - posted by Edward Quick <ed...@hotmail.com> on 2005/09/19 13:58:33 UTC, 1 reply.
- Proposal: refuse to open partially trunc. MapFile, unless forced (Re: indexing is very very very slow) - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/09/19 21:10:57 UTC, 4 replies.
- Switch where index is located? - posted by Adrian Nadeau <an...@evolvingsolutions.ca> on 2005/09/19 22:35:48 UTC, 2 replies.
- regarding gal's faq proposal - posted by gekkokid <me...@gekkokid.org.uk> on 2005/09/20 02:26:37 UTC, 0 replies.
- nutch-user mail archive - posted by Gal Nitzan <gn...@usa.net> on 2005/09/20 08:26:53 UTC, 0 replies.
- Re: Is it possible to change the list of common words without crawling everything again - posted by Andrzej Bialecki <ab...@getopt.org> on 2005/09/20 09:59:06 UTC, 3 replies.
- JDK 1.5 - posted by Gal Nitzan <gn...@usa.net> on 2005/09/20 10:51:17 UTC, 3 replies.
- Updated FAQ - posted by Gal Nitzan <gn...@usa.net> on 2005/09/20 10:58:14 UTC, 0 replies.
- Seeking Nutch Consultant(s) for Six Week Project - posted by "Joe Reger, Jr." <re...@gmail.com> on 2005/09/20 17:48:53 UTC, 1 reply.
- Re: re-generating a fetchlist - posted by Michael Ji <fj...@yahoo.com> on 2005/09/20 19:04:41 UTC, 3 replies.
- Fetching FAQ - posted by "Vanderdray, Jake" <JV...@aarp.org> on 2005/09/20 20:20:27 UTC, 3 replies.
- NDFS java.io.IOException - posted by "Ordway, Ryan" <Ry...@oregonstate.edu> on 2005/09/20 22:52:51 UTC, 5 replies.
- Removing a page from WebDB - posted by Gustavo García <gg...@tid.es> on 2005/09/20 23:26:21 UTC, 0 replies.
- Newbie; Documentation? - posted by Niclas Rothman <ni...@lechill.com> on 2005/09/21 15:39:33 UTC, 0 replies.
- resuming intranet crawl - posted by Edward Quick <ed...@hotmail.com> on 2005/09/21 22:23:05 UTC, 0 replies.
- java.io.IOException: Task process exit with nonzero status - posted by Michael <mi...@gameservice.ru> on 2005/09/22 15:12:40 UTC, 3 replies.
- Links in a segement - posted by Richard Rodrigues <rr...@gold-solutions.com> on 2005/09/22 16:47:47 UTC, 2 replies.
- Re: How can I recover an aborted fetch process - posted by EM <em...@cpuedge.com> on 2005/09/23 07:44:47 UTC, 2 replies.
- page crawl limit? - posted by Edward Quick <ed...@hotmail.com> on 2005/09/23 13:12:43 UTC, 1 reply.
- No external command defined for contentType: - posted by Jon Shoberg <jo...@shoberg.net> on 2005/09/23 18:26:31 UTC, 1 reply.
- Parcer Policy - Re: No external command defined for contentType: - posted by Jon Shoberg <jo...@shoberg.net> on 2005/09/23 19:26:42 UTC, 5 replies.
- SocketTimeoutException - posted by AJ Chen <ca...@gmail.com> on 2005/09/23 19:40:40 UTC, 1 reply.
- Documents in Nutch - posted by dp...@comcast.net on 2005/09/24 05:08:32 UTC, 4 replies.
- HD question for large DB - posted by EM <em...@cpuedge.com> on 2005/09/24 19:07:05 UTC, 0 replies.
- Response content length is not known - posted by Gal Nitzan <gn...@usa.net> on 2005/09/25 08:19:14 UTC, 4 replies.
- Nutch Crawlerget stuck, database unusable? - posted by Paul van Brouwershaven <pa...@vanbrouwershaven.com> on 2005/09/26 07:39:27 UTC, 2 replies.
- nutch to start crawl from the browser, by clicking a command button - posted by Ayyanar Inbamohan <te...@yahoo.com> on 2005/09/26 07:48:04 UTC, 0 replies.
- link analysis and update segments - posted by AJ Chen <ca...@gmail.com> on 2005/09/26 08:24:43 UTC, 6 replies.
- New SE - posted by David Webster <tr...@loxinfo.co.th> on 2005/09/26 09:01:33 UTC, 0 replies.
- Faster Merging? - posted by Jon Shoberg <jo...@shoberg.net> on 2005/09/26 12:36:24 UTC, 0 replies.
- is there any way to prune webdb? - posted by Gal Nitzan <gn...@usa.net> on 2005/09/26 21:00:37 UTC, 0 replies.
- fetcher hangs and thead lifetime - posted by Jon Shoberg <jo...@shoberg.net> on 2005/09/27 02:26:04 UTC, 3 replies.
- Re[2]: java.io.IOException: Task process exit with nonzero status - posted by Michael <mi...@gameservice.ru> on 2005/09/27 03:29:12 UTC, 1 reply.
- Re: Map Reduce - posted by Jack Tang <hi...@gmail.com> on 2005/09/27 12:17:13 UTC, 2 replies.
- SessionIDs and forums are killing my fetch - posted by Jon Shoberg <jo...@shoberg.net> on 2005/09/28 05:11:49 UTC, 5 replies.
- Fetcher Speed with Threads - posted by Paul van Brouwershaven <pa...@vanbrouwershaven.com> on 2005/09/28 09:14:04 UTC, 2 replies.
- HTTP ERROR: 500 - posted by Gal Nitzan <gn...@usa.net> on 2005/09/28 09:56:49 UTC, 0 replies.
- Re: Maintaining only one FAQ - posted by Stefan Groschupf <sg...@media-style.com> on 2005/09/28 11:21:09 UTC, 4 replies.
- Parsing HTML meta tags - posted by Paul Williams <Pa...@becta.org.uk> on 2005/09/28 11:50:53 UTC, 1 reply.
- regex-normalize - Re: SessionIDs and forums are killing my fetch - posted by Jon Shoberg <jo...@shoberg.net> on 2005/09/28 13:48:06 UTC, 0 replies.
- search with ndfs/mapred index - posted by Gal Nitzan <gn...@usa.net> on 2005/09/28 15:48:27 UTC, 2 replies.
- Re: [Nutch-general] Maintaining only one FAQ - posted by og...@yahoo.com on 2005/09/28 18:14:37 UTC, 0 replies.
- pattern matching and boolean searches - posted by Edward Quick <ed...@hotmail.com> on 2005/09/28 21:26:49 UTC, 1 reply.
- java.lang.ClassNotFoundException: org.apache.nutch.ipc.RPC$NullInstance - posted by Gal Nitzan <gn...@usa.net> on 2005/09/28 21:35:12 UTC, 0 replies.
- Re[3]: java.io.IOException: Task process exit with nonzero status - posted by Michael <mi...@gameservice.ru> on 2005/09/28 21:53:59 UTC, 0 replies.
- Re: java.lang.ClassNotFoundException: org.apache.nutch.ipc.RPC$NullInstance - IGNORE sorry - posted by Gal Nitzan <gn...@usa.net> on 2005/09/28 22:28:02 UTC, 0 replies.
- Is it at all necessary to merge segments in MapRed? - posted by Gal Nitzan <gn...@usa.net> on 2005/09/29 01:43:50 UTC, 3 replies.
- java.io.IOException: Cannot create file (in reduce task) - posted by Gal Nitzan <gn...@usa.net> on 2005/09/29 13:56:45 UTC, 4 replies.
- problem about the fetch of dinamic page - posted by ad...@interfree.it on 2005/09/29 16:28:59 UTC, 0 replies.
- MapReduce - posted by Paul van Brouwershaven <pa...@vanbrouwershaven.com> on 2005/09/29 20:22:37 UTC, 3 replies.
- nutch-daemon.sh patch (PID file and IDENT string) - posted by Rod Taylor <rb...@sitesell.com> on 2005/09/29 21:01:45 UTC, 4 replies.
- Re: Maintaining only one FAQ - I can not do it only webmaster - posted by Gal Nitzan <gn...@usa.net> on 2005/09/29 21:07:42 UTC, 0 replies.
- Doug - FAQ - Re: Maintaining only one FAQ - posted by Jon Shoberg <jo...@shoberg.net> on 2005/09/29 21:51:03 UTC, 2 replies.
- New plugin - posted by Gal Nitzan <gn...@usa.net> on 2005/09/29 21:53:42 UTC, 2 replies.
- java.io.IOException: key out of order - posted by Michael <mi...@gameservice.ru> on 2005/09/30 13:32:57 UTC, 0 replies.
- mapred -numFetchers gone? - posted by Rod Taylor <rb...@sitesell.com> on 2005/09/30 16:46:01 UTC, 1 reply.
- Faster UpdateDB - posted by Jon Shoberg <jo...@shoberg.net> on 2005/09/30 18:31:45 UTC, 9 replies.