- Memory leak during crawl? - posted by "Rüdiger Schulz (SkyGate)" <sc...@skygate.de> on 2007/03/01 18:37:28 UTC, 0 replies.
- Behavior of nutch-site.xml vs. hadoop-site.xml - posted by "Ricardo J. Méndez" <me...@gmail.com> on 2007/03/01 19:09:02 UTC, 14 replies.
- Arabic language in Nutch - posted by Munir <mu...@yahoo.com> on 2007/03/02 14:27:46 UTC, 1 replies.
- Total Hits: 0 - posted by av...@gmx.de on 2007/03/03 15:29:19 UTC, 2 replies.
- Getting a list of all items in the database - posted by "Ricardo J. Méndez" <me...@gmail.com> on 2007/03/05 07:57:49 UTC, 3 replies.
- nutch0.8.1+dfs fetch return nothing - posted by xu xiong <xi...@gmail.com> on 2007/03/05 11:20:02 UTC, 0 replies.
- SSL & Nutch (SecureProtocolSocketFactory) - posted by g....@ifc.cnr.it on 2007/03/05 12:04:47 UTC, 1 replies.
- moving crawled db from windows to linux - posted by kan001 <ka...@yahoo.com> on 2007/03/05 18:37:47 UTC, 1 replies.
- Nutch 0.8.1 not parsing XHTML using XML (even mime.type.magic off) - posted by cybercouf <cy...@free.fr> on 2007/03/05 19:26:37 UTC, 0 replies.
- Re: Unable to display search result on Tomcat - posted by png han <pi...@hotmail.com> on 2007/03/05 21:15:53 UTC, 1 replies.
- Hadoop native compression libs [FreeBSD-specific] - Revisited - posted by Sean Dean <se...@rogers.com> on 2007/03/06 01:56:29 UTC, 0 replies.
- Re: [SOLVED] moving crawled db from windows to linux - posted by kan001 <ka...@yahoo.com> on 2007/03/06 05:48:56 UTC, 5 replies.
- Index populated but NutchBean can't find hits - posted by "Ricardo J. Méndez" <me...@gmail.com> on 2007/03/06 06:08:41 UTC, 2 replies.
- Merge Crawls nutch - 0.7.2 - posted by Nuther <nu...@proservice.ge> on 2007/03/06 09:18:51 UTC, 0 replies.
- Re: [SOLVED] Nutch 0.8.1 not parsing XHTML using XML (even mime.type.magic off) - posted by cybercouf <cy...@free.fr> on 2007/03/06 17:21:23 UTC, 0 replies.
- Re: [SOLVED] Unable to display search result on Tomcat - posted by Ping Searcher <pi...@playstarmusic.com> on 2007/03/06 21:42:37 UTC, 0 replies.
- Crawl slow on one machine, fast on another - posted by sdeck <sc...@gmail.com> on 2007/03/06 23:08:34 UTC, 1 replies.
- Re: [SOLVED] Crawl slow on one machine, fast on another - posted by sdeck <sc...@gmail.com> on 2007/03/07 00:44:23 UTC, 0 replies.
- Following outlinks during - or after - seed fetch - posted by "Ricardo J. Méndez" <me...@gmail.com> on 2007/03/07 06:16:54 UTC, 2 replies.
- memory consumpition by nutch - posted by Harmesh <ha...@in.v2solutions.com> on 2007/03/07 08:01:50 UTC, 1 replies.
- How to configured crawl-urlfilters.txt - posted by Harmesh <ha...@in.v2solutions.com> on 2007/03/07 08:05:22 UTC, 1 replies.
- Re: [SOLVED] memory consumpition by nutch - posted by Harmesh <ha...@in.v2solutions.com> on 2007/03/07 10:14:52 UTC, 1 replies.
- Nutch Searching Issue - posted by prashant_nutch <pr...@in.v2solutions.com> on 2007/03/07 10:25:10 UTC, 0 replies.
- Issue with DB_GONE - posted by "Ratnesh Srivastava, India" <ra...@in.v2solutions.com> on 2007/03/07 10:34:45 UTC, 1 replies.
- Re: Nutch and adsense integration - posted by Arun Kaundal <ar...@gmail.com> on 2007/03/07 14:11:21 UTC, 0 replies.
- RuntimeException: x point net.nutch.parse.Parser not found - posted by #KHOO BING JIN# <KH...@ntu.edu.sg> on 2007/03/07 17:27:51 UTC, 1 replies.
- Good config for ntop - posted by HUYLEBROECK Jeremy RD-ILAB-SSF <je...@orange-ftgroup.com> on 2007/03/07 20:14:25 UTC, 0 replies.
- Nutch 9.x Tomcat Failure - posted by Rafael Turk <ra...@gmail.com> on 2007/03/08 02:06:22 UTC, 0 replies.
- How to restart the crawling process if its stop in between - posted by Harmesh <ha...@in.v2solutions.com> on 2007/03/08 07:03:58 UTC, 1 replies.
- Newbie questions about followed links - posted by Jeroen Verhagen <je...@gmail.com> on 2007/03/08 11:32:25 UTC, 3 replies.
- Re: [SOLVED] Newbie questions about followed links - posted by djames <dj...@supinfo.com> on 2007/03/08 13:47:34 UTC, 0 replies.
- external host link logging - posted by djames <dj...@supinfo.com> on 2007/03/08 14:10:11 UTC, 2 replies.
- Fetch: java.lang.NullPointerException - posted by Rafael Turk <ra...@gmail.com> on 2007/03/09 06:20:26 UTC, 4 replies.
- Re: [SOLVED] external host link logging - posted by djames <dj...@supinfo.com> on 2007/03/09 09:29:49 UTC, 4 replies.
- How to avoid outlinks on jpg/css/... ? - posted by cybercouf <cy...@free.fr> on 2007/03/09 11:27:30 UTC, 1 replies.
- Java Programmatic Access to Invoking Search - posted by d e <cr...@gmail.com> on 2007/03/09 22:27:56 UTC, 1 replies.
- Nothing Fetched when attempting to crawl other than the apache site ! - posted by d e <cr...@gmail.com> on 2007/03/10 10:13:50 UTC, 1 replies.
- Opps! Nothing Fetched when attempting to crawl other than the apache site ! - posted by d e <cr...@gmail.com> on 2007/03/10 10:59:59 UTC, 2 replies.
- fetch2 very slow - anyone try this?? - posted by RP <rp...@earthlink.net> on 2007/03/11 16:27:06 UTC, 1 replies.
- How to crawl for tag specific search - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/12 07:00:53 UTC, 1 replies.
- text extraction - posted by Shrinivas Patwardhan <sh...@krawlernetworks.com> on 2007/03/12 09:01:02 UTC, 0 replies.
- nutch crawl - strange results - posted by Neelesh Rathore <ne...@in.v2solutions.com> on 2007/03/12 12:49:50 UTC, 0 replies.
- Re: [SOLVED] nutch crawl - strange results - posted by Rajneesh Makhija <ra...@in.v2solutions.com> on 2007/03/12 12:54:53 UTC, 0 replies.
- dedup is not removing duplicate record - posted by Harmesh <ha...@in.v2solutions.com> on 2007/03/12 12:54:58 UTC, 0 replies.
- nutch depth level - posted by Neelesh Rathore <ne...@in.v2solutions.com> on 2007/03/12 13:08:17 UTC, 0 replies.
- nutch on tomcat gets shutdown - posted by Neelesh Rathore <ne...@in.v2solutions.com> on 2007/03/12 13:20:09 UTC, 1 replies.
- classpath issue plugins - posted by Jeroen Verhagen <je...@gmail.com> on 2007/03/12 13:57:57 UTC, 0 replies.
- Re: [SOLVED] dedup is not removing duplicate record - posted by "Harmesh, V2solutions" <ha...@in.v2solutions.com> on 2007/03/12 14:15:35 UTC, 0 replies.
- how to remove duplicate URL's - posted by "Harmesh, V2solutions" <ha...@in.v2solutions.com> on 2007/03/12 14:17:01 UTC, 0 replies.
- Hi What is the use of refine-query-init.jsp,refine-query.jsp - posted by inalasuresh <in...@care2.com> on 2007/03/12 14:43:11 UTC, 2 replies.
- Hi what is the use of subcollections.xml - posted by inalasuresh <in...@care2.com> on 2007/03/12 14:47:01 UTC, 1 replies.
- Crawling - posted by inalasuresh <in...@care2.com> on 2007/03/12 14:56:20 UTC, 0 replies.
- nutch-0.8.1 - PDF Fragment problem - posted by Lucifersam <ro...@tagish.co.uk> on 2007/03/12 14:56:42 UTC, 0 replies.
- Re: Recovering aborted fetch - posted by Mathijs Homminga <ma...@knowlogy.nl> on 2007/03/12 15:05:31 UTC, 0 replies.
- DummySSLProtocolSocketFactory problem - posted by Gavino Marras <g....@ifc.cnr.it> on 2007/03/12 16:53:19 UTC, 0 replies.
- Contributing a plugin - posted by "Ricardo J. Méndez" <me...@gmail.com> on 2007/03/12 21:50:12 UTC, 2 replies.
- nutch crawl - incremental update - posted by Bonardo Pascal <p....@free.fr> on 2007/03/13 02:07:50 UTC, 0 replies.
- LinkDB - posted by hzhong <he...@gmail.com> on 2007/03/13 05:30:44 UTC, 0 replies.
- how to restrict the size of segments - posted by "Harmesh, V2solutions" <ha...@in.v2solutions.com> on 2007/03/13 10:59:04 UTC, 1 replies.
- Nutch conf reading - posted by djames <dj...@supinfo.com> on 2007/03/14 11:34:45 UTC, 5 replies.
- Any hints for debuging errors like "java.io.exception: read 95 bytes, should read 159" ? - posted by qi wu <ch...@gmail.com> on 2007/03/14 15:30:36 UTC, 2 replies.
- DummySSLProtocolSocketFactory problem, please help me!!!! - posted by Gavino Marras <g....@ifc.cnr.it> on 2007/03/14 15:39:46 UTC, 1 replies.
- extracting urls into text files - posted by cha <ch...@metrixline.com> on 2007/03/15 16:36:39 UTC, 10 replies.
- Error Nutch_default.xml and crawl-tool.xml not found during compilation - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/16 05:39:16 UTC, 0 replies.
- help me in writing plugin for extracting tag from a HTML page - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/16 05:49:34 UTC, 2 replies.
- When can I delete segments? (still usefull after indexing?) - posted by cybercouf <cy...@free.fr> on 2007/03/16 10:41:35 UTC, 1 replies.
- Problem with stemmer - posted by te...@gmail.com on 2007/03/16 12:16:11 UTC, 0 replies.
- How to reslove ?? java.lang.RuntimeException: No scoring plugins - at least one scoring plugin is required - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/16 14:38:47 UTC, 2 replies.
- Do I need to include Nutch-0.8.1 Source code For writing our own application - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/16 16:09:49 UTC, 0 replies.
- Nutch-0.8.1 Errors - posted by RJ <ry...@sympatico.ca> on 2007/03/17 03:33:18 UTC, 0 replies.
- Crawling sucessful without fetching - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/17 10:49:04 UTC, 1 replies.
- Nutch 0.8.1 issue with fetch - posted by kkfromus <kk...@gmail.com> on 2007/03/19 05:31:27 UTC, 2 replies.
- Problems crawling a URL - posted by Paul Liddelow <pa...@gmail.com> on 2007/03/19 10:14:37 UTC, 1 replies.
- Nutch On Eclipse (windows) - posted by prashant_nutch <pr...@in.v2solutions.com> on 2007/03/19 11:00:08 UTC, 1 replies.
- writing urls to xml files - posted by utsavi <ut...@yahoo.com> on 2007/03/19 16:45:56 UTC, 1 replies.
- Scoring - posted by Damian Florczyk <th...@gentoo.org> on 2007/03/19 16:55:21 UTC, 0 replies.
- HTTP Response Code - posted by Ab...@aol.com on 2007/03/19 22:10:45 UTC, 0 replies.
- Any way for removing pages with same title in index? - posted by qi wu <ch...@gmail.com> on 2007/03/20 18:18:21 UTC, 1 replies.
- Re: Newbie question - syntax error on bin/nutch - posted by Trung Tran <tr...@yahoo.com> on 2007/03/21 01:21:10 UTC, 0 replies.
- WARN QueryFilters - QueryFilter: RecommendedQueryFilter :names no fields. - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/21 07:16:34 UTC, 1 replies.
- Video search - posted by Anton Potekhin <an...@orbita1.ru> on 2007/03/21 11:27:36 UTC, 4 replies.
- help needed : filters in regex-urlfilter.txt - posted by cha <ch...@metrixline.com> on 2007/03/21 16:37:53 UTC, 5 replies.
- Crawl not crawling entire page - posted by Mike Howarth <he...@mikehowarth.co.uk> on 2007/03/22 10:59:57 UTC, 5 replies.
- bzr branches for Apache Lucene/Nutch/Solr/Hadoop at Launchpad - posted by rubdabadub <ru...@gmail.com> on 2007/03/22 12:14:56 UTC, 0 replies.
- Lucene IndexWriter and Nutch index - posted by Ilya Vishnevsky <Il...@e-legion.com> on 2007/03/22 14:42:07 UTC, 1 replies.
- Need Help with crawl-urlfilter.txt - posted by SriramG <sg...@etrade.com> on 2007/03/22 22:00:26 UTC, 1 replies.
- removing jsessionid - posted by cha <ch...@metrixline.com> on 2007/03/23 06:43:03 UTC, 3 replies.
- Merging WebDBs - posted by prashant_nutch <pr...@in.v2solutions.com> on 2007/03/23 07:25:14 UTC, 1 replies.
- Fwd: HOW DO I GO FORWARD ?? - posted by Info <in...@radionav.it> on 2007/03/23 10:56:13 UTC, 0 replies.
- nutch-0.7 Compatible API Problem?? - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/23 13:07:15 UTC, 0 replies.
- Nutch and GET - posted by Damian Florczyk <th...@gentoo.org> on 2007/03/23 14:20:32 UTC, 8 replies.
- Logger duplicates entries by the thousands - posted by Briggs <ac...@gmail.com> on 2007/03/23 14:44:15 UTC, 1 replies.
- Nutch HTML Tag Filter - posted by Anton Beza <an...@gmail.com> on 2007/03/23 20:04:55 UTC, 2 replies.
- ant build + speed - posted by sdeck <sc...@gmail.com> on 2007/03/25 01:10:24 UTC, 0 replies.
- Wikia Search Engine? Anyone working on it? - posted by Dennis Kubes <nu...@dragonflymc.com> on 2007/03/25 07:37:02 UTC, 3 replies.
- Re: Wikia Search Engine? Anyone working on it? - posted by rubdabadub <ru...@gmail.com> on 2007/03/25 11:12:32 UTC, 0 replies.
- not able to index a field in lucene - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/25 14:45:26 UTC, 0 replies.
- plugin inclusion steps - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/25 15:00:58 UTC, 2 replies.
- WARN SummarizerFactory - java.lang.ArrayIndexOutOfBoundsException: 0 - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/26 09:08:03 UTC, 0 replies.
- Re: WARN SummarizerFactory - java.lang.ArrayIndexOutOfBoundsException: 0 - posted by Ravi Chintakunta <ra...@gmail.com> on 2007/03/26 15:50:00 UTC, 1 replies.
- number of fetcher tasks on a hadoop cluster - posted by Mathijs Homminga <ma...@knowlogy.nl> on 2007/03/26 16:25:46 UTC, 0 replies.
- Splitting segments - posted by Mathijs Homminga <ma...@knowlogy.nl> on 2007/03/26 16:58:21 UTC, 2 replies.
- log4j:ERROR Failed to flush writer, - posted by Ab...@aol.com on 2007/03/27 03:38:54 UTC, 0 replies.
- Re: [Nutch-general] Wikia Search Engine? Anyone working on it? - posted by og...@yahoo.com on 2007/03/27 05:08:34 UTC, 0 replies.
- How to store a field for searching??? - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/27 11:51:23 UTC, 1 replies.
- Re: what does this exception probably mean? - posted by Dennis Kubes <nu...@dragonflymc.com> on 2007/03/27 17:02:33 UTC, 1 replies.
- can't remove navigation_id while crawling - posted by cha <ch...@metrixline.com> on 2007/03/27 17:53:06 UTC, 0 replies.
- 0.8.x Crawler compared to 0.7.2 Crawler - posted by Gaurav Agarwal <ga...@yahoo.com> on 2007/03/27 22:11:34 UTC, 3 replies.
- Exception in DeleteDuplicates in nutch-nightly - posted by Tim Benke <ze...@fusemail.com> on 2007/03/27 23:39:26 UTC, 2 replies.
- Need Help ASAP - posted by Yakn <bo...@yahoo.com> on 2007/03/28 06:07:42 UTC, 0 replies.
- Search on Restricted URL ASAP - posted by prashant_nutch <pr...@in.v2solutions.com> on 2007/03/28 09:03:41 UTC, 0 replies.
- recno,segment in ParseData class??? - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/28 10:38:41 UTC, 0 replies.
- error while crawling - posted by cha <ch...@metrixline.com> on 2007/03/28 12:51:34 UTC, 0 replies.
- parse-rss e - posted by og...@yahoo.com on 2007/03/28 23:31:35 UTC, 0 replies.
- 1 Nutch, multiple indices? - posted by og...@yahoo.com on 2007/03/29 00:03:28 UTC, 1 replies.
- Fine tuning scoring/ranking - posted by Annona Keene <an...@yahoo.com> on 2007/03/29 00:24:33 UTC, 0 replies.
- Nutch dataset dirstructure - posted by pike <pi...@kw.nl> on 2007/03/29 10:37:30 UTC, 2 replies.
- java.lang.ClassFormatError: Illegal field name "has inconsistent hierarchy" in class - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/29 16:40:31 UTC, 0 replies.
- [SOLVED] Re: Exception in DeleteDuplicates in nutch-nightly - posted by Tim Benke <ze...@fusemail.com> on 2007/03/29 18:00:42 UTC, 0 replies.
- Help on Activation of Subcollection at Indexing & searching - posted by prashant_nutch <pr...@in.v2solutions.com> on 2007/03/30 08:54:13 UTC, 3 replies.
- Can't find resource: regex-urlfilter.txt - posted by cha <ch...@metrixline.com> on 2007/03/30 09:40:23 UTC, 0 replies.
- Crawling + Indexing staging vs. production and URL conflict - posted by og...@yahoo.com on 2007/03/30 16:58:47 UTC, 2 replies.
- trouble adding fields to index - posted by Siddharth Jonathan <jo...@gmail.com> on 2007/03/31 11:52:08 UTC, 7 replies.
- WARN parse.ParserFactory - ParserFactory: Plugin: OBJECTLinkParser mapped to contentType text/html via parse-plugins.xml, but not enabled via plugin.includes in nutch-default.xml - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/31 12:38:53 UTC, 1 replies.
- Wildly different crawl results depending on environment... - posted by Briggs <ac...@gmail.com> on 2007/03/31 16:10:30 UTC, 0 replies.