- Protocol.secure - posted by Gavino Marras <g....@ifc.cnr.it> on 2006/12/01 15:32:09 UTC, 0 replies.
- [jira] Commented: (NUTCH-224) Nutch doesn't handle Korean text at all - posted by "Sean Dean (JIRA)" <ji...@apache.org> on 2006/12/02 02:41:22 UTC, 1 reply.
- Re: What's the status of Nutch-GUI? - posted by Stefan Groschupf <sg...@101tec.com> on 2006/12/02 09:04:34 UTC, 1 reply.
- Phrase query analysis-fr - posted by Rida Benjelloun <ri...@doculibre.com> on 2006/12/02 23:45:22 UTC, 0 replies.
- [jira] Created: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog - posted by "Renaud Richardet (JIRA)" <ji...@apache.org> on 2006/12/03 08:21:21 UTC, 0 replies.
- [jira] Updated: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog - posted by "Renaud Richardet (JIRA)" <ji...@apache.org> on 2006/12/03 08:23:22 UTC, 1 reply.
- Re: implement thai language indexing and search - posted by sanjeev <sa...@hotmail.com> on 2006/12/04 08:21:38 UTC, 5 replies.
- Re: Indexing and Re-crawling site - posted by Lukas Vlcek <lu...@gmail.com> on 2006/12/04 23:11:40 UTC, 3 replies.
- lucene/nutch investigation - posted by bruce <be...@earthlink.net> on 2006/12/05 18:43:56 UTC, 0 replies.
- Full List of Metadata Fields - posted by Shay Lawless <se...@gmail.com> on 2006/12/06 16:31:39 UTC, 0 replies.
- Re: [Archive-access-discuss] Full List of Metadata Fields - posted by Michael Stack <st...@archive.org> on 2006/12/06 17:03:40 UTC, 0 replies.
- Nutch Re-crawl same file over and over again - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2006/12/07 00:43:27 UTC, 0 replies.
- Nutch site crawling - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2006/12/07 11:47:20 UTC, 0 replies.
- [jira] Created: (NUTCH-413) Fetcher ignores -noParsing command line option - posted by "Jonathan Amir (JIRA)" <ji...@apache.org> on 2006/12/08 00:11:21 UTC, 0 replies.
- [jira] Commented: (NUTCH-413) Fetcher ignores -noParsing command line option - posted by "Dogacan Güney (JIRA)" <ji...@apache.org> on 2006/12/08 15:00:24 UTC, 2 replies.
- Want some idea abt distributed searching behind Nutch - posted by howard chen <ho...@gmail.com> on 2006/12/08 17:46:44 UTC, 0 replies.
- Re: Brochure for Nutch - posted by Doug Cutting <cu...@apache.org> on 2006/12/08 21:26:11 UTC, 0 replies.
- hi all: - posted by 吴志敏 <ba...@gmail.com> on 2006/12/09 08:59:04 UTC, 3 replies.
- Re: svn commit: r485076 - in /lucene/nutch/trunk/src: java/org/apache/nutch/metadata/SpellCheckedMetadata.java test/org/apache/nutch/metadata/TestSpellCheckedMetadata.java - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2006/12/09 23:56:11 UTC, 4 replies.
- Porn sites' link at the download page - posted by howard chen <ho...@gmail.com> on 2006/12/10 10:21:15 UTC, 2 replies.
- Fetching problem and FileProtocol bug in Nutch 0.8.1 - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2006/12/10 22:16:00 UTC, 1 reply.
- parse-mp3 plugin concatenating previous tags for text field - posted by Brian Whitman <br...@variogr.am> on 2006/12/11 14:32:54 UTC, 1 reply.
- Changing NutchConf params at Runtime. - posted by Briggs <ac...@gmail.com> on 2006/12/11 16:39:09 UTC, 0 replies.
- include hadoop native libs to nutch? - posted by Sami Siren <ss...@gmail.com> on 2006/12/11 17:26:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-248) add support for internationalized domain names - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/12/11 19:54:22 UTC, 0 replies.
- [jira] Created: (NUTCH-414) parse-mp3 plugin concatenating previous tags for text field - posted by "Brian Whitman (JIRA)" <ji...@apache.org> on 2006/12/12 16:29:20 UTC, 0 replies.
- NUTCH 0.8.1: Difficulties with Analyzers - posted by Fr...@bnc.ca on 2006/12/13 17:21:54 UTC, 0 replies.
- [jira] Created: (NUTCH-415) Generate should mark selected records in crawlDB - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/12/15 13:32:27 UTC, 0 replies.
- [jira] Created: (NUTCH-416) CrawlDatum status and CrawlDbReducer refactoring - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/12/15 13:47:20 UTC, 0 replies.
- [jira] Created: (NUTCH-417) After upgrade to hadoop-0.9.1, parsing and indexing doesn't work. - posted by "Dogacan Güney (JIRA)" <ji...@apache.org> on 2006/12/15 14:27:20 UTC, 0 replies.
- [jira] Commented: (NUTCH-417) After upgrade to hadoop-0.9.1, parsing and indexing doesn't work. - posted by "Dogacan Güney (JIRA)" <ji...@apache.org> on 2006/12/15 14:31:22 UTC, 4 replies.
- [jira] Updated: (NUTCH-417) After upgrade to hadoop-0.9.1, parsing and indexing doesn't work. - posted by "Dogacan Güney (JIRA)" <ji...@apache.org> on 2006/12/15 14:31:24 UTC, 0 replies.
- [jira] Commented: (NUTCH-415) Generate should mark selected records in crawlDB - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/12/15 15:59:22 UTC, 1 reply.
- Warning: set speculative execution to false - posted by Andrzej Bialecki <ab...@getopt.org> on 2006/12/15 16:05:06 UTC, 0 replies.
- Extracting title from XHTML pages - posted by Michael Wechner <mi...@wyona.com> on 2006/12/20 14:42:47 UTC, 4 replies.
- difference between intranet and internet crawling - posted by Michael Wechner <mi...@wyona.com> on 2006/12/20 17:47:59 UTC, 0 replies.
- [jira] Commented: (NUTCH-416) CrawlDatum status and CrawlDbReducer refactoring - posted by "Doug Cook (JIRA)" <ji...@apache.org> on 2006/12/20 23:40:22 UTC, 1 reply.
- [jira] Updated: (NUTCH-272) Max. pages to crawl/fetch per site (emergency limit) - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/12/21 06:10:22 UTC, 0 replies.
- crawl null pointer - posted by hyrogen <jo...@gmail.com> on 2006/12/21 11:22:08 UTC, 0 replies.
- [jira] Created: (NUTCH-418) Fixes parsing of XHTML (e.g. title) - posted by "Michael Wechner (JIRA)" <ji...@apache.org> on 2006/12/21 13:58:21 UTC, 0 replies.
- [jira] Updated: (NUTCH-418) Fixes parsing of XHTML (e.g. title) - posted by "Michael Wechner (JIRA)" <ji...@apache.org> on 2006/12/21 14:00:23 UTC, 0 replies.
- [jira] Commented: (NUTCH-418) Fixes parsing of XHTML (e.g. title) - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/12/21 15:49:22 UTC, 0 replies.
- [jira] Updated: (NUTCH-273) When a page is redirected, the original url is NOT updated. - posted by "Eelco Lempsink (JIRA)" <ji...@apache.org> on 2006/12/22 10:39:27 UTC, 1 reply.
- [jira] Created: (NUTCH-419) unavailable robots.txt kills fetch - posted by "Carsten Lehmann (JIRA)" <ji...@apache.org> on 2006/12/24 13:45:21 UTC, 0 replies.
- [jira] Updated: (NUTCH-419) unavailable robots.txt kills fetch - posted by "Carsten Lehmann (JIRA)" <ji...@apache.org> on 2006/12/24 14:01:26 UTC, 2 replies.
- [jira] Commented: (NUTCH-419) unavailable robots.txt kills fetch - posted by "Carsten Lehmann (JIRA)" <ji...@apache.org> on 2006/12/24 14:26:22 UTC, 0 replies.
- [jira] Created: (NUTCH-420) DeleteDuplicates.HashPartitioner depends on the order of IndexDocs - posted by "Dogacan Güney (JIRA)" <ji...@apache.org> on 2006/12/26 12:30:20 UTC, 0 replies.
- [jira] Updated: (NUTCH-420) DeleteDuplicates.HashPartitioner depends on the order of IndexDocs - posted by "Dogacan Güney (JIRA)" <ji...@apache.org> on 2006/12/26 12:32:24 UTC, 0 replies.
- [jira] Created: (NUTCH-421) Allow predeterminate running order of index filters - posted by "Alan Tanaman (JIRA)" <ji...@apache.org> on 2006/12/27 14:57:20 UTC, 0 replies.
- [jira] Updated: (NUTCH-421) Allow predeterminate running order of index filters - posted by "Alan Tanaman (JIRA)" <ji...@apache.org> on 2006/12/27 15:01:25 UTC, 2 replies.
- [jira] Closed: (NUTCH-415) Generate should mark selected records in crawlDB - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/12/28 01:10:22 UTC, 0 replies.
- [jira] Closed: (NUTCH-416) CrawlDatum status and CrawlDbReducer refactoring - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/12/28 01:14:22 UTC, 0 replies.
- [jira] Closed: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/12/28 01:18:24 UTC, 0 replies.
- [jira] Closed: (NUTCH-273) When a page is redirected, the original url is NOT updated. - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/12/28 01:18:31 UTC, 0 replies.
- [jira] Closed: (NUTCH-274) Empty row in/at end of URL-list results in error - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/12/28 01:22:23 UTC, 0 replies.
- RE: Issue with Boosting Fields - posted by Alan Tanaman <al...@idna-solutions.com> on 2006/12/28 14:39:27 UTC, 0 replies.
- linkdb bug - posted by Doğacan Güney <do...@agmlab.com> on 2006/12/28 17:15:38 UTC, 3 replies.
- [jira] Created: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic - posted by "Alan Tanaman (JIRA)" <ji...@apache.org> on 2006/12/28 20:23:22 UTC, 0 replies.
- [jira] Updated: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic - posted by "Alan Tanaman (JIRA)" <ji...@apache.org> on 2006/12/28 20:25:22 UTC, 0 replies.
- [jira] Created: (NUTCH-423) Add other index-basic fields as query plugins - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2006/12/29 01:46:20 UTC, 0 replies.
- [jira] Updated: (NUTCH-423) Add other index-basic fields as query plugins - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2006/12/29 01:48:30 UTC, 0 replies.