- Re: Parallelizing URLFiltering - posted by Dennis Kubes <ku...@apache.org> on 2007/06/01 06:44:29 UTC, 1 replies.
- NoClassDefFoundError while trying to run (format) namenode - posted by Ilya Vishnevsky <Il...@e-legion.com> on 2007/06/01 13:46:43 UTC, 0 replies.
- Content Type Not Resolved Correctly? - posted by Briggs <ac...@gmail.com> on 2007/06/01 16:11:09 UTC, 6 replies.
- Error with the inject command - posted by Berlin Brown <be...@gmail.com> on 2007/06/02 09:20:36 UTC, 3 replies.
- Nutch and faceted search - posted by chris sleeman <ch...@gmail.com> on 2007/06/02 15:52:57 UTC, 3 replies.
- Compression - posted by Emmanuel JOKE <jo...@gmail.com> on 2007/06/02 20:23:07 UTC, 5 replies.
- Checking existence of index segments - posted by Mathijs Homminga <ma...@knowlogy.nl> on 2007/06/02 22:10:20 UTC, 0 replies.
- Cleaning up segments after indexing - posted by Mathijs Homminga <ma...@knowlogy.nl> on 2007/06/02 22:15:45 UTC, 0 replies.
- Is fetcher.throttle.bandwidth known to work? - posted by Enzo Michelangeli <en...@gmail.com> on 2007/06/03 18:17:54 UTC, 7 replies.
- Job Opportunity for Developers, Distributed Java at Fredhopper (Amsterdam) - posted by Vasil Kokareshkov <va...@fredhopper.com> on 2007/06/03 22:20:41 UTC, 1 replies.
- How to enable followRedirects? - posted by Manoharam Reddy <ma...@gmail.com> on 2007/06/04 06:30:07 UTC, 1 replies.
- Number of Pages - posted by carmmello <ca...@globo.com> on 2007/06/04 18:07:06 UTC, 0 replies.
- Re: Nutch 0.9 and Crawl-Delay - posted by Ken Krugler <kk...@transpac.com> on 2007/06/04 21:32:26 UTC, 0 replies.
- Loading mechnism of plugin classes and singleton objects - posted by Enzo Michelangeli <en...@gmail.com> on 2007/06/05 04:20:28 UTC, 11 replies.
- field collapsing impl - posted by Yonik Seeley <yo...@apache.org> on 2007/06/05 05:08:42 UTC, 0 replies.
- Complex problem of recrawling economically - posted by Manoharam Reddy <ma...@gmail.com> on 2007/06/05 06:31:21 UTC, 0 replies.
- Re: WIN XP PRO -Djava.protocol* file:///c:/folder/ Crawling Parents - posted by Vadim B <Ma...@unterderbruecke.de> on 2007/06/05 14:27:25 UTC, 2 replies.
- no datanode to stop - posted by Ilya Vishnevsky <Il...@e-legion.com> on 2007/06/05 16:17:30 UTC, 0 replies.
- Fetch list size affecting fetch speed? - posted by Tim_G <ti...@gmail.com> on 2007/06/05 22:00:38 UTC, 0 replies.
- Changing Initial number of hits/page Searcher shows. - posted by Nick Pisarro <Ni...@aperture.com> on 2007/06/06 00:43:47 UTC, 1 replies.
- search problem-no segments* file found - posted by xu xiong <xi...@gmail.com> on 2007/06/06 05:49:20 UTC, 1 replies.
- urls/nutch in local is invalid - posted by Martin Kammerlander <Ma...@student.uibk.ac.at> on 2007/06/06 17:02:14 UTC, 4 replies.
- stackoverflow error - posted by djames <dj...@supinfo.com> on 2007/06/06 19:09:57 UTC, 4 replies.
- indexing only special documents - posted by Martin Kammerlander <Ma...@student.uibk.ac.at> on 2007/06/06 20:29:41 UTC, 8 replies.
- Hadoop oddity - posted by "Bolle, Jeffrey F." <jb...@mitre.org> on 2007/06/07 01:16:27 UTC, 7 replies.
- RE(2): Changing Initial number of hits/page Searcher shows. - posted by Nick Pisarro <Ni...@aperture.com> on 2007/06/07 01:45:56 UTC, 0 replies.
- ParseData encoding problem - posted by xu xiong <xi...@gmail.com> on 2007/06/07 04:21:40 UTC, 1 replies.
- one Problem - posted by DHANU BUDIREDDI <bu...@gmail.com> on 2007/06/07 14:30:39 UTC, 1 replies.
- Why datanode does not work properly on slave? - posted by Ilya Vishnevsky <Il...@e-legion.com> on 2007/06/07 14:32:36 UTC, 1 replies.
- Cookie - posted by Emmanuel JOKE <jo...@gmail.com> on 2007/06/07 16:09:57 UTC, 3 replies.
- Explanation of topN - posted by monkeynuts84 <mo...@hotmail.com> on 2007/06/08 22:29:10 UTC, 2 replies.
- Crawling the web and going into depth - posted by Berlin Brown <be...@gmail.com> on 2007/06/09 22:19:50 UTC, 6 replies.
- How to add parsed metadata to Parse.getData? - posted by Li Zheng wei <ma...@hotmail.com> on 2007/06/10 23:55:29 UTC, 0 replies.
- Incremental indexing - posted by Enzo Michelangeli <en...@gmail.com> on 2007/06/11 02:58:53 UTC, 0 replies.
- crawling by ip range - posted by Cesar Voulgaris <ce...@gmail.com> on 2007/06/11 03:24:14 UTC, 1 replies.
- is it possible to set different addDays for different sites? - posted by Manoharam Reddy <ma...@gmail.com> on 2007/06/11 07:36:22 UTC, 1 replies.
- Why Nutch is indexing HTTP 302 pages - posted by Manoharam Reddy <ma...@gmail.com> on 2007/06/11 07:37:33 UTC, 1 replies.
- Hadoop startup... - posted by Emmanuel JOKE <jo...@gmail.com> on 2007/06/11 16:43:48 UTC, 0 replies.
- Nutch/Hadoop Fetcher confusion - posted by patrik <pa...@clipblast.com> on 2007/06/12 02:53:05 UTC, 4 replies.
- Cache problem, - posted by Phạm Hải Thanh <ph...@vasc.com.vn> on 2007/06/12 03:29:00 UTC, 4 replies.
- Re: What is parse-oo and why doesn't parsed PDF content show up in cached.jsp ? - posted by Manoharam Reddy <ma...@gmail.com> on 2007/06/12 06:42:58 UTC, 1 replies.
- How to index javascript contents - posted by cyanean <cy...@gmail.com> on 2007/06/12 08:53:37 UTC, 0 replies.
- Hadoop Log4j ? - posted by Emmanuel JOKE <jo...@gmail.com> on 2007/06/12 17:01:01 UTC, 2 replies.
- Can nutch index the javascript code too? - posted by Joseph Chan <cy...@gmail.com> on 2007/06/12 18:28:40 UTC, 1 replies.
- meaning of depth value - tutorial wrong? - posted by Manoharam Reddy <ma...@gmail.com> on 2007/06/13 07:49:32 UTC, 4 replies.
- why number of results is more than topN x depth? - posted by Manoharam Reddy <ma...@gmail.com> on 2007/06/13 08:04:52 UTC, 0 replies.
- Problems stemming - posted by shinta himura <on...@hotmail.com> on 2007/06/13 10:36:52 UTC, 3 replies.
- Enabling Spell-Check plugin in contrib - posted by chris sleeman <ch...@gmail.com> on 2007/06/13 14:04:48 UTC, 2 replies.
- Indexing problems in nutch-nightly - posted by ca...@globo.com on 2007/06/14 20:25:06 UTC, 15 replies.
- Re: Any URL filter available for search.jsp? - posted by Scam <sc...@inbox.ru> on 2007/06/14 23:04:27 UTC, 1 replies.
- Re[2]: Any URL filter available for search.jsp? - posted by Scam <sc...@inbox.ru> on 2007/06/15 00:33:57 UTC, 0 replies.
- Re[2]: Enabling Spell-Check plugin in contrib - posted by Scam <sc...@inbox.ru> on 2007/06/15 01:47:38 UTC, 1 replies.
- URLs and encoding problems - posted by Árni Hermann Reynisson <ar...@hugsmidjan.is> on 2007/06/15 12:46:47 UTC, 1 replies.
- fetch failing while crawling - posted by karan thakral <ka...@gmail.com> on 2007/06/15 16:49:33 UTC, 2 replies.
- Hadoop Fetch Log - posted by Emmanuel JOKE <jo...@gmail.com> on 2007/06/16 19:32:52 UTC, 1 replies.
- deleting pages from db - posted by cesar voulgaris <ce...@gmail.com> on 2007/06/17 08:41:22 UTC, 0 replies.
- Re[3]: Enabling Spell-Check plugin in contrib - posted by Scam <sc...@inbox.ru> on 2007/06/17 20:39:24 UTC, 0 replies.
- Trouble configuring Nutch - posted by niraj tulachan <nj...@yahoo.com> on 2007/06/17 21:03:06 UTC, 2 replies.
- Search Help! - posted by niraj tulachan <nj...@yahoo.com> on 2007/06/18 01:56:06 UTC, 0 replies.
- Reload index - posted by "Naess, Ronny" <Ro...@avinor.no> on 2007/06/18 15:22:56 UTC, 6 replies.
- Having problems getting the field of "content" to be stored - posted by Micah Vivion <mi...@gmail.com> on 2007/06/19 01:36:59 UTC, 1 replies.
- Different config files for different jobs - posted by patrik <pa...@clipblast.com> on 2007/06/19 09:37:32 UTC, 0 replies.
- Re[2]: Problems stemming - posted by Scam <sc...@inbox.ru> on 2007/06/19 11:53:39 UTC, 1 replies.
- doubt about indexing - posted by karan thakral <ka...@gmail.com> on 2007/06/19 12:08:45 UTC, 9 replies.
- Re[4]: Problems stemming - posted by Scam <sc...@inbox.ru> on 2007/06/19 13:16:35 UTC, 0 replies.
- Searching Filter - posted by Milan Krendzelak <mk...@mtld.mobi> on 2007/06/19 16:14:59 UTC, 2 replies.
- Lucene client and nutch index - posted by "Naess, Ronny" <Ro...@avinor.no> on 2007/06/19 19:39:11 UTC, 8 replies.
- Nutch 0.9 hung threads - posted by Sunnyvale Fl <su...@gmail.com> on 2007/06/19 23:03:38 UTC, 3 replies.
- prevent of external links crawling does not work - posted by Scam <sc...@inbox.ru> on 2007/06/20 00:56:44 UTC, 0 replies.
- First nutch based public application, botlist - posted by Berlin Brown <be...@gmail.com> on 2007/06/20 06:19:25 UTC, 0 replies.
- RE: Nutch 0.9 - Generator: 0 records selected for fetching, exiting - posted by patrik <pa...@clipblast.com> on 2007/06/20 06:45:25 UTC, 0 replies.
- how fast can nutch fetch urls ? - posted by Ian Holsman <li...@holsman.net> on 2007/06/20 07:50:49 UTC, 1 replies.
- meta data plugin needed - posted by karan thakral <ka...@gmail.com> on 2007/06/20 11:03:34 UTC, 3 replies.
- Performance: Fetcher2 or Fetcher - posted by Emmanuel JOKE <jo...@gmail.com> on 2007/06/20 14:55:12 UTC, 2 replies.
- not crawling relative URLs - posted by Kai_testing Middleton <ka...@yahoo.com> on 2007/06/20 21:08:29 UTC, 2 replies.
- Possibly use a different library to parse RSS feed for improved performance and compatibility - posted by Kai_testing Middleton <ka...@yahoo.com> on 2007/06/21 01:42:22 UTC, 3 replies.
- Found the bug in Generator when number of URLs is small - posted by Vishal Shah <vi...@rediff.co.in> on 2007/06/21 08:43:30 UTC, 0 replies.
- Problem with merge-output - posted by Phạm Hải Thanh <ph...@vasc.com.vn> on 2007/06/21 11:49:57 UTC, 2 replies.
- How to score a paticular page higher than the other pages - posted by "Harmesh, V2solutions" <ha...@in.v2solutions.com> on 2007/06/21 12:06:00 UTC, 8 replies.
- http.content.limit not respected when the Content-Type header has charset attributes - posted by Vishal Shah <vi...@rediff.co.in> on 2007/06/21 12:06:27 UTC, 2 replies.
- Distributed index - posted by Karol Rybak <ka...@gmail.com> on 2007/06/21 12:46:26 UTC, 10 replies.
- how to specify crawl urls - posted by karan <ka...@gmail.com> on 2007/06/21 18:27:35 UTC, 0 replies.
- Index gets no results - posted by "Rüdiger Schulz (SkyGate)" <sc...@skygate.de> on 2007/06/21 19:00:07 UTC, 1 replies.
- fetching http://www.variety.com/ - posted by Kai_testing Middleton <ka...@yahoo.com> on 2007/06/22 00:24:05 UTC, 10 replies.
- Redirects not working - posted by H H <hi...@yahoo.com> on 2007/06/22 00:46:04 UTC, 0 replies.
- 0.9 document boost inflated - posted by Sunnyvale Fl <su...@gmail.com> on 2007/06/22 03:52:16 UTC, 1 replies.
- injector failing - posted by karan <ka...@gmail.com> on 2007/06/22 10:15:01 UTC, 1 replies.
- OR searches possible? - posted by Robert Young <bu...@gmail.com> on 2007/06/22 11:26:15 UTC, 1 replies.
- Merging Nutch Hits objects - posted by Robert Young <bu...@gmail.com> on 2007/06/22 13:32:53 UTC, 0 replies.
- Cookie question - posted by David Xiao <da...@gmail.com> on 2007/06/22 15:08:02 UTC, 2 replies.
- slow distributed crawling - posted by Des Sant <sa...@gmail.com> on 2007/06/22 17:30:12 UTC, 0 replies.
- How to read all the urls crawled - posted by hzhong <he...@gmail.com> on 2007/06/22 21:04:53 UTC, 0 replies.
- Adding options to individual tasks - posted by patrik <pa...@clipblast.com> on 2007/06/23 01:12:02 UTC, 0 replies.
- Re: Using nutch just for the crawler/fetcher - posted by Kai_testing Middleton <ka...@yahoo.com> on 2007/06/23 04:15:06 UTC, 0 replies.
- Fwd: nutch plugin include failing - posted by karan <ka...@gmail.com> on 2007/06/23 13:26:02 UTC, 0 replies.
- search.jsp not being displayed - posted by karan <ka...@gmail.com> on 2007/06/23 14:29:28 UTC, 0 replies.
- Integrate nutch crawler with Solr index server - posted by David Xiao <da...@gmail.com> on 2007/06/23 14:37:55 UTC, 1 replies.
- search error - posted by karan <ka...@gmail.com> on 2007/06/24 10:28:33 UTC, 7 replies.
- Indexer NPE - posted by Emmanuel JOKE <jo...@gmail.com> on 2007/06/24 12:10:12 UTC, 5 replies.
- how to apply a patch to nutch - posted by Kai_testing Middleton <ka...@yahoo.com> on 2007/06/25 21:51:02 UTC, 6 replies.
- NUTCH-505 - cannot find symbol: variable URL_VALIDATOR - posted by Kai_testing Middleton <ka...@yahoo.com> on 2007/06/26 06:43:41 UTC, 1 replies.
- Weird encoding problem - posted by Karol Rybak <ka...@gmail.com> on 2007/06/26 09:34:46 UTC, 2 replies.
- Case insensitive searching - posted by Robert Young <bu...@gmail.com> on 2007/06/26 12:25:04 UTC, 0 replies.
- Re: [Nutch-general] Integrate nutch crawler with Solr index server - posted by og...@yahoo.com on 2007/06/26 14:42:31 UTC, 9 replies.
- The ranking is wrong - posted by "Naess, Ronny" <Ro...@avinor.no> on 2007/06/26 15:36:58 UTC, 5 replies.
- Deploying Nutch on Tomcat - posted by Jason Ma <ra...@gmail.com> on 2007/06/27 19:03:32 UTC, 1 replies.
- hadoop-site.xml Help - posted by DANIEL CLARK <da...@verizon.net> on 2007/06/27 21:17:27 UTC, 1 replies.
- too slow for re-parse job .. - posted by qi wu <ch...@gmail.com> on 2007/06/28 10:20:26 UTC, 0 replies.
- Problem with ooParser - posted by Karol Rybak <ka...@gmail.com> on 2007/06/28 11:33:54 UTC, 0 replies.
- Stemming with Nutch - posted by Robert Young <bu...@gmail.com> on 2007/06/28 13:00:13 UTC, 2 replies.
- Crawl error with hadoop - posted by Emmanuel JOKE <jo...@gmail.com> on 2007/06/28 14:54:36 UTC, 1 replies.
- Scaling up to several machines with Lucene - posted by Chun Wei Ho <cw...@gmail.com> on 2007/06/28 15:49:07 UTC, 0 replies.
- Using nutch to find image links - posted by bbrown <bb...@botspiritcompany.com> on 2007/06/28 19:10:17 UTC, 0 replies.
- IOException using feed plugin - NUTCH-444 - posted by Kai_testing Middleton <ka...@yahoo.com> on 2007/06/29 01:21:13 UTC, 4 replies.
- what is the meaning of Metadata: _pst_:notfound(14), lastModified=0: - posted by "Harmesh, V2solutions" <ha...@in.v2solutions.com> on 2007/06/29 09:38:50 UTC, 0 replies.
- No buffer space available (maximum connections reached?): connect - posted by Fritz Bein <fr...@gmx.de> on 2007/06/29 10:02:19 UTC, 2 replies.
- Nutch 0.9 - posted by DANIEL CLARK <da...@verizon.net> on 2007/06/29 19:11:32 UTC, 1 replies.
- Nutch crashes during search - posted by Jason Ma <ra...@gmail.com> on 2007/06/29 20:38:23 UTC, 0 replies.
- Nutch 0.9 Help - posted by DANIEL CLARK <da...@verizon.net> on 2007/06/29 21:27:32 UTC, 0 replies.
- Is there a plugin that allows modification of the hit url before it's added to the index? - posted by Mark_Fletcher <ma...@workday.com> on 2007/06/29 22:03:26 UTC, 1 replies.
- NoRouteToHostException - posted by DANIEL CLARK <da...@verizon.net> on 2007/06/29 22:07:10 UTC, 1 replies.
- windows eclipse run - posted by Tsengtan A Shuy <tt...@sbcglobal.net> on 2007/06/29 23:53:06 UTC, 1 replies.
- integrate Nutch into my php front page - posted by Tsengtan A Shuy <tt...@sbcglobal.net> on 2007/06/30 00:34:45 UTC, 11 replies.
- nutch-0.9 windows eclipse run - posted by Tsengtan A Shuy <tt...@sbcglobal.net> on 2007/06/30 03:12:10 UTC, 0 replies.
- Interrupting a nutch crawl -- or use topN? - posted by Kai_testing Middleton <ka...@yahoo.com> on 2007/06/30 04:10:29 UTC, 0 replies.