- Searching multiple indexes with Nutch-2 servers,0 segments - posted by jqq <re...@gmail.com> on 2009/05/02 07:59:06 UTC, 4 replies.
- [jira] Created: (NUTCH-734) option to filter "a" tag text - posted by "ron (JIRA)" <ji...@apache.org> on 2009/05/02 13:49:30 UTC, 0 replies.
- Similarity with few keywords - posted by Xalan <aa...@gmail.com> on 2009/05/02 19:57:35 UTC, 0 replies.
- [jira] Commented: (NUTCH-733) plain text view of cached files ignores HTML encoding - posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org> on 2009/05/05 01:12:30 UTC, 0 replies.
- Filtering URLs - posted by MyD <my...@googlemail.com> on 2009/05/05 16:49:06 UTC, 1 reply.
- Nutch crawled results for Clustering with Carrot2 - posted by Gaurang Patel <ga...@gmail.com> on 2009/05/06 15:18:46 UTC, 1 reply.
- [jira] Created: (NUTCH-735) crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command - posted by "Susam Pal (JIRA)" <ji...@apache.org> on 2009/05/09 08:56:45 UTC, 0 replies.
- [jira] Updated: (NUTCH-735) crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command - posted by "Susam Pal (JIRA)" <ji...@apache.org> on 2009/05/09 09:01:45 UTC, 0 replies.
- Re: crawl-tool.xml mentions nutch-site.xml for overriding but it is not possible - posted by Susam Pal <su...@gmail.com> on 2009/05/09 10:37:13 UTC, 0 replies.
- Source code of web pages crawled by Nutch - posted by Gaurang Patel <ga...@gmail.com> on 2009/05/11 23:15:36 UTC, 0 replies.
- Content(source code) of web pages crawled by nutch - posted by Gaurang Patel <ga...@gmail.com> on 2009/05/12 05:20:34 UTC, 0 replies.
- Is there any working Nutch Administration interface in Nutch 1.0? - posted by "Rodrigo Reyes C." <ro...@avity.com> on 2009/05/12 18:16:03 UTC, 2 replies.
- Nutch/Solr: storing the page cache in Solr - posted by Siddhartha Reddy <si...@grok.in> on 2009/05/13 15:36:56 UTC, 2 replies.
- Regarding Solr1.3 and Nutch 0.9 Integration - posted by malli j <co...@gmail.com> on 2009/05/13 17:39:15 UTC, 0 replies.
- [jira] Created: (NUTCH-736) how long it takes nutch 1.0 to fetch - posted by "Filipe Antunes (JIRA)" <ji...@apache.org> on 2009/05/14 11:39:45 UTC, 0 replies.
- [jira] Updated: (NUTCH-736) how long it takes nutch 1.0 to fetch - posted by "Filipe Antunes (JIRA)" <ji...@apache.org> on 2009/05/14 15:28:45 UTC, 0 replies.
- The Future of Nutch, reactivated - posted by Andrzej Bialecki <ab...@getopt.org> on 2009/05/14 15:59:11 UTC, 9 replies.
- [Nutch Wiki] Trivial Update of "HttpAuthenticationSchemes" by susam - posted by Apache Wiki <wi...@apache.org> on 2009/05/14 16:30:52 UTC, 0 replies.
- [Nutch Wiki] Update of "RunningNutchAndSolr" by amitkumar - posted by Apache Wiki <wi...@apache.org> on 2009/05/14 19:07:44 UTC, 4 replies.
- [jira] Commented: (NUTCH-386) Plugin to index categories by url rules - posted by "martin lopez (JIRA)" <ji...@apache.org> on 2009/05/16 03:07:45 UTC, 0 replies.
- [jira] Issue Comment Edited: (NUTCH-386) Plugin to index categories by url rules - posted by "martin lopez (JIRA)" <ji...@apache.org> on 2009/05/16 03:09:45 UTC, 0 replies.
- Ranking Algorithms - posted by atencorps <ch...@googlemail.com> on 2009/05/17 16:55:04 UTC, 1 reply.
- Performance issues with queue-based fetching - posted by Ken Krugler <kk...@transpac.com> on 2009/05/20 02:27:25 UTC, 0 replies.
- Re: Support for Sitemap Protocol and Canonical URLs - posted by Frank McCown <fm...@harding.edu> on 2009/05/20 23:05:25 UTC, 4 replies.
- A link that begins with the question mark(?) can't be crawled. - posted by Donghyeok Kang <wo...@gmail.com> on 2009/05/21 15:51:22 UTC, 0 replies.
- [jira] Updated: (NUTCH-716) Make subcollection index filed multivalued - posted by "Dmitry Lihachev (JIRA)" <ji...@apache.org> on 2009/05/22 06:06:45 UTC, 0 replies.
- [jira] Resolved: (NUTCH-736) how long it takes nutch 1.0 to fetch - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2009/05/24 04:42:45 UTC, 0 replies.
- [jira] Commented: (NUTCH-731) Redirection of robots.txt in RobotRulesParser - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2009/05/24 05:20:45 UTC, 1 reply.
- [jira] Commented: (NUTCH-721) Fetcher2 Slow - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2009/05/24 05:51:45 UTC, 2 replies.
- [jira] Created: (NUTCH-737) urlnormalizer-unalias plugin - posted by "Dmitry Lihachev (JIRA)" <ji...@apache.org> on 2009/05/26 06:16:45 UTC, 0 replies.
- [jira] Updated: (NUTCH-737) urlnormalizer-unalias plugin - posted by "Dmitry Lihachev (JIRA)" <ji...@apache.org> on 2009/05/26 06:18:45 UTC, 3 replies.
- [jira] Created: (NUTCH-738) Close SegmentUpdater when FetchedSegments is closed - posted by "Martina Koch (JIRA)" <ji...@apache.org> on 2009/05/26 08:40:45 UTC, 0 replies.
- [jira] Updated: (NUTCH-738) Close SegmentUpdater when FetchedSegments is closed - posted by "Martina Koch (JIRA)" <ji...@apache.org> on 2009/05/26 08:42:45 UTC, 1 reply.
- [jira] Commented: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum - posted by "Dmitry Lihachev (JIRA)" <ji...@apache.org> on 2009/05/27 04:46:45 UTC, 0 replies.
- [jira] Updated: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2009/05/27 11:10:45 UTC, 0 replies.
- [jira] Updated: (NUTCH-677) Segment merge filering based on segment content - posted by "Marcin Okraszewski (JIRA)" <ji...@apache.org> on 2009/05/27 23:05:45 UTC, 0 replies.
- [jira] Updated: (NUTCH-490) Extension point with filters for Neko HTML parser (with patch) - posted by "Marcin Okraszewski (JIRA)" <ji...@apache.org> on 2009/05/27 23:28:26 UTC, 0 replies.
- [jira] Assigned: (NUTCH-693) Add configurable option for treating nofollow behaviour. - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2009/05/28 06:16:45 UTC, 0 replies.
- [jira] Commented: (NUTCH-693) Add configurable option for treating nofollow behaviour. - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2009/05/28 06:18:45 UTC, 0 replies.
- [jira] Commented: (NUTCH-650) Hbase Integration - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2009/05/28 06:24:45 UTC, 0 replies.
- [jira] Created: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop - posted by "Dmitry Lihachev (JIRA)" <ji...@apache.org> on 2009/05/28 06:34:45 UTC, 0 replies.
- [jira] Updated: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop - posted by "Dmitry Lihachev (JIRA)" <ji...@apache.org> on 2009/05/28 06:36:45 UTC, 1 reply.
- [jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2009/05/28 19:51:45 UTC, 11 replies.
- [jira] Commented: (NUTCH-677) Segment merge filering based on segment content - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2009/05/28 19:59:45 UTC, 0 replies.
- Remove duplicate nutch conf files from .job file - posted by Kirby Bohling <ki...@gmail.com> on 2009/05/28 20:30:47 UTC, 2 replies.
- [jira] Created: (NUTCH-740) Configuration option to override default language for fetched pages. - posted by "Marcin Okraszewski (JIRA)" <ji...@apache.org> on 2009/05/28 23:13:45 UTC, 0 replies.
- [jira] Updated: (NUTCH-740) Configuration option to override default language for fetched pages. - posted by "Marcin Okraszewski (JIRA)" <ji...@apache.org> on 2009/05/28 23:15:45 UTC, 1 reply.
- Eclipse Nutch1.0 IOException - posted by Georg Kirschner <ge...@gmail.com> on 2009/05/29 15:34:38 UTC, 3 replies.
- [jira] Created: (NUTCH-741) Job file includes multiple copies of nutch config files. - posted by "Kirby Bohling (JIRA)" <ji...@apache.org> on 2009/05/29 22:00:45 UTC, 0 replies.
- [jira] Updated: (NUTCH-741) Job file includes multiple copies of nutch config files. - posted by "Kirby Bohling (JIRA)" <ji...@apache.org> on 2009/05/29 22:02:45 UTC, 0 replies.