- [jira] Created: (NUTCH-424) CLONE - Problem persists with Nutch 0.8.1 (Nekohtml 0.9.4) - NekoHTML's DOMFragmentParser hangs on certain URLs - posted by "Karsten Dello (JIRA)" <ji...@apache.org> on 2007/01/01 22:27:27 UTC, 0 replies.
- [jira] Commented: (NUTCH-424) CLONE - Problem persists with Nutch 0.8.1 (Nekohtml 0.9.4) - NekoHTML's DOMFragmentParser hangs on certain URLs - posted by "Karsten Dello (JIRA)" <ji...@apache.org> on 2007/01/01 22:42:27 UTC, 0 replies.
- database exchange of 2 nutches (hybridity of nutch with yacy) - posted by th...@gmx.net on 2007/01/02 01:00:51 UTC, 2 replies.
- Re: [Search-l] database exchange of 2 nutches (hybridity of nutch with yacy) - posted by Toufeeq Hussain <to...@gmail.com> on 2007/01/02 02:50:39 UTC, 0 replies.
- New index-extra plugin and patch to IndexFilters - posted by Alan Tanaman <al...@idna-solutions.com> on 2007/01/02 11:24:16 UTC, 0 replies.
- [jira] Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic - posted by "nutch.newbie (JIRA)" <ji...@apache.org> on 2007/01/02 12:03:27 UTC, 7 replies.
- Creating Lucence Compound Index - posted by Alan Tanaman <al...@idna-solutions.com> on 2007/01/02 13:57:03 UTC, 4 replies.
- Nutch Programmer Wanted - posted by Nutch User <nu...@gmail.com> on 2007/01/02 22:24:15 UTC, 0 replies.
- nutch81 pages seems were not kept but no error message found - posted by Chee Wu <ch...@gmail.com> on 2007/01/03 13:30:02 UTC, 0 replies.
- Bug in Nutch, possibly due to issues-273 and 322 - posted by Meghna Kukreja <om...@gmail.com> on 2007/01/03 20:03:43 UTC, 1 reply.
- [jira] Commented: (NUTCH-420) DeleteDuplicates.HashPartitioner depends on the order of IndexDocs - posted by "Dogacan Güney (JIRA)" <ji...@apache.org> on 2007/01/04 10:30:27 UTC, 3 replies.
- [jira] Updated: (NUTCH-420) DeleteDuplicates.HashPartitioner depends on the order of IndexDocs - posted by "Dogacan Güney (JIRA)" <ji...@apache.org> on 2007/01/04 10:30:27 UTC, 2 replies.
- Issues Starting Hadoop Process in Nutch0.9l.1 - posted by srinath <co...@gmail.com> on 2007/01/04 18:00:31 UTC, 0 replies.
- [jira] Created: (NUTCH-425) parse-js pollutes anchor text with base URL of source page - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2007/01/04 18:21:27 UTC, 0 replies.
- [jira] Updated: (NUTCH-425) parse-js pollutes anchor text with base URL of source page - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2007/01/04 20:05:28 UTC, 0 replies.
- [jira] Commented: (NUTCH-425) parse-js pollutes anchor text with base URL of source page - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2007/01/04 20:14:27 UTC, 0 replies.
- [jira] Created: (NUTCH-426) parse-js skips parsing if found URL fails java.net.URL parse - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2007/01/04 21:12:28 UTC, 0 replies.
- [jira] Commented: (NUTCH-426) parse-js skips parsing if found URL fails java.net.URL parse - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2007/01/04 21:14:27 UTC, 0 replies.
- [jira] Updated: (NUTCH-426) parse-js skips parsing if found URL fails java.net.URL parse - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2007/01/04 21:14:27 UTC, 0 replies.
- [jira] Created: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation. - posted by "Armel Nene (JIRA)" <ji...@apache.org> on 2007/01/05 15:44:27 UTC, 0 replies.
- [jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation. - posted by "Armel Nene (JIRA)" <ji...@apache.org> on 2007/01/05 16:11:28 UTC, 0 replies.
- protocol-smb: a new protocol plugin for Windows Shares - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2007/01/05 16:22:48 UTC, 0 replies.
- [jira] Commented: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation. - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/01/05 16:56:27 UTC, 1 reply.
- [jira] Closed: (NUTCH-426) parse-js skips parsing if found URL fails java.net.URL parse - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/01/05 18:01:27 UTC, 0 replies.
- [jira] Closed: (NUTCH-425) parse-js pollutes anchor text with base URL of source page - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/01/05 18:01:27 UTC, 0 replies.
- [jira] Resolved: (NUTCH-325) UrlFilters.java throws NPE in case urlfilter.order contains Filters that are not in plugin.includes - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/06 10:44:27 UTC, 0 replies.
- [jira] Assigned: (NUTCH-421) Allow predeterminate running order of index filters - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/06 11:36:27 UTC, 0 replies.
- [jira] Assigned: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/06 11:36:27 UTC, 0 replies.
- [jira] Resolved: (NUTCH-421) Allow predeterminate running order of index filters - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/06 21:01:27 UTC, 0 replies.
- Job Opportunity (Sunnyvale, CA) - posted by "J. Delgado" <jo...@gmail.com> on 2007/01/10 04:20:53 UTC, 0 replies.
- [jira] Created: (NUTCH-428) NullPointerException - posted by "Piyush (JIRA)" <ji...@apache.org> on 2007/01/10 15:57:28 UTC, 0 replies.
- sort result on different set of terms - posted by DS jha <ae...@gmail.com> on 2007/01/10 16:02:12 UTC, 3 replies.
- [jira] Created: (NUTCH-429) Secured Searches - posted by "Piyush (JIRA)" <ji...@apache.org> on 2007/01/11 21:08:27 UTC, 0 replies.
- [jira] Closed: (NUTCH-429) Secured Searches - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2007/01/11 21:48:27 UTC, 0 replies.
- [jira] Closed: (NUTCH-420) DeleteDuplicates.HashPartitioner depends on the order of IndexDocs - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/01/11 23:02:27 UTC, 0 replies.
- [jira] Resolved: (NUTCH-428) NullPointerException - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/12 23:16:27 UTC, 0 replies.
- [jira] Created: (NUTCH-430) integer overflow in HashComparator.compare - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/14 00:07:27 UTC, 0 replies.
- [jira] Updated: (NUTCH-430) integer overflow in HashComparator.compare - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/14 00:09:27 UTC, 0 replies.
- How can I get one plugin's root dir - posted by Scott Green <sm...@gmail.com> on 2007/01/15 03:40:00 UTC, 15 replies.
- [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content - posted by "Armel Nene (JIRA)" <ji...@apache.org> on 2007/01/15 11:12:27 UTC, 4 replies.
- [jira] Resolved: (NUTCH-430) integer overflow in HashComparator.compare - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/15 16:05:27 UTC, 0 replies.
- Multiple collections - posted by Nathan ter Bogt <nt...@gmail.com> on 2007/01/16 05:08:41 UTC, 0 replies.
- [jira] Commented: (NUTCH-39) pagination in search result - posted by "fantoni benjamin (JIRA)" <ji...@apache.org> on 2007/01/16 11:33:27 UTC, 2 replies.
- Next Nutch release - posted by Sami Siren <ss...@gmail.com> on 2007/01/16 16:53:41 UTC, 25 replies.
- Amazon S3/Ec2 problem [injection and fs.rename() problem] - posted by Mike Smith <mi...@gmail.com> on 2007/01/16 21:30:03 UTC, 0 replies.
- How to index in real time? - posted by Scott Green <sm...@gmail.com> on 2007/01/17 04:15:38 UTC, 2 replies.
- Issue with trunk (rev 496535) - posted by Sean Dean <se...@rogers.com> on 2007/01/17 08:19:51 UTC, 0 replies.
- SynonymEditor - posted by "Krebs, Urs" <Ur...@ipi.ch> on 2007/01/17 15:59:18 UTC, 1 reply.
- [jira] Closed: (NUTCH-68) A tool to generate arbitrary fetchlists - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/01/17 20:57:30 UTC, 0 replies.
- Fetcher2 - posted by Andrzej Bialecki <ab...@getopt.org> on 2007/01/17 22:18:15 UTC, 7 replies.
- [jira] Updated: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content - posted by "Armel Nene (JIRA)" <ji...@apache.org> on 2007/01/18 10:52:29 UTC, 0 replies.
- Field.index... - posted by Paul Sponagl <pa...@abrac.us> on 2007/01/18 11:23:44 UTC, 1 reply.
- java.io.EOFException in latest nightly in mergesegs from hadoop.io.DataOutputBuffer - posted by Brian Whitman <br...@variogr.am> on 2007/01/18 21:08:12 UTC, 11 replies.
- [jira] Commented: (NUTCH-48) "Did you mean" query enhancement/refignment feature request - posted by "fantoni benjamin (JIRA)" <ji...@apache.org> on 2007/01/19 09:58:30 UTC, 0 replies.
- java.lang.IllegalStateException - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2007/01/19 11:17:27 UTC, 0 replies.
- [jira] Commented: (NUTCH-74) French Analyzer Plugin - posted by "nutch.newbie (JIRA)" <ji...@apache.org> on 2007/01/20 11:11:30 UTC, 0 replies.
- [jira] Commented: (NUTCH-353) pages that serverside forwards will be refetched every time - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2007/01/20 19:28:30 UTC, 4 replies.
- [jira] Created: (NUTCH-431) Move plugin specific properties out of nutch-site.xml and into specific conf files for plugins - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/01/20 23:03:30 UTC, 0 replies.
- How to Become a Nutch Developer - posted by Dennis Kubes <nu...@dragonflymc.com> on 2007/01/21 20:47:53 UTC, 10 replies.
- Reviving Nutch 0.7 - posted by Otis Gospodnetic <ot...@yahoo.com> on 2007/01/22 07:47:38 UTC, 13 replies.
- Finished "How to Become a Nutch Developer" - posted by nu...@dragonflymc.com on 2007/01/23 06:33:22 UTC, 2 replies.
- How to modify crawldb values - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2007/01/23 15:50:29 UTC, 2 replies.
- is crawldb format in Nutch 0.8 compatible with Nutch0.7 - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2007/01/23 20:26:35 UTC, 1 reply.
- Cross Platform Administration and Deployment for Nutch and Hadoop - posted by Dennis Kubes <nu...@dragonflymc.com> on 2007/01/23 21:06:18 UTC, 1 reply.
- [jira] Created: (NUTCH-432) JAVA_PLATFORM with spaces (i.e. Mac OS X-ppc-32) breaks bin/nutch script - posted by "Brian Whitman (JIRA)" <ji...@apache.org> on 2007/01/24 18:39:49 UTC, 0 replies.
- [jira] Commented: (NUTCH-432) JAVA_PLATFORM with spaces (i.e. Mac OS X-ppc-32) breaks bin/nutch script - posted by "Brian Whitman (JIRA)" <ji...@apache.org> on 2007/01/24 18:44:49 UTC, 0 replies.
- [jira] Created: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer - posted by "Brian Whitman (JIRA)" <ji...@apache.org> on 2007/01/24 18:53:49 UTC, 0 replies.
- [jira] Commented: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/24 19:01:49 UTC, 4 replies.
- [jira] Assigned: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/24 19:03:49 UTC, 0 replies.
- Minor Javascript error when english search.html page loads. - posted by Peter Lenahan <pl...@optonline.net> on 2007/01/24 19:25:50 UTC, 0 replies.
- [jira] Resolved: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/24 20:53:49 UTC, 0 replies.
- [jira] Created: (NUTCH-434) Replace usage of ObjectWritable with something based on GenericWritable - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/01/24 21:22:49 UTC, 0 replies.
- [jira] Commented: (NUTCH-434) Replace usage of ObjectWritable with something based on GenericWritable - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/01/24 21:47:49 UTC, 2 replies.
- [jira] Updated: (NUTCH-339) Refactor nutch to allow fetcher improvements - posted by "chee.wu (JIRA)" <ji...@apache.org> on 2007/01/25 06:53:49 UTC, 0 replies.
- [jira] Commented: (NUTCH-339) Refactor nutch to allow fetcher improvements - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/01/25 08:43:50 UTC, 0 replies.
- parse-rss test problem - posted by kauu <ba...@gmail.com> on 2007/01/25 10:08:12 UTC, 0 replies.
- Modified date in crawldb - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2007/01/25 12:52:15 UTC, 4 replies.
- threads-safe methods in Nutch - posted by "Armel T. Nene" <ar...@idna-solutions.com> on 2007/01/25 17:27:04 UTC, 0 replies.
- Re: i18n in nutch home page is misnomor - posted by Doug Cutting <cu...@apache.org> on 2007/01/25 19:00:31 UTC, 0 replies.
- Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore - posted by Doug Cutting <cu...@apache.org> on 2007/01/25 19:08:49 UTC, 6 replies.
- parse-rss make them items as different pages - posted by kauu <ba...@gmail.com> on 2007/01/26 03:17:20 UTC, 7 replies.
- [jira] Created: (NUTCH-435) Synonym-Editor that creates OWL for the ontology plugin - posted by "Urs Krebs (JIRA)" <ji...@apache.org> on 2007/01/26 13:32:50 UTC, 0 replies.
- [jira] Created: (NUTCH-436) Incorrect handling of relative paths when the embedded URL path is empty - posted by "Andrew Groh (JIRA)" <ji...@apache.org> on 2007/01/26 15:09:49 UTC, 0 replies.
- [jira] Updated: (NUTCH-436) Incorrect handling of relative paths when the embedded URL path is empty - posted by "Andrew Groh (JIRA)" <ji...@apache.org> on 2007/01/26 15:13:49 UTC, 0 replies.
- [jira] Updated: (NUTCH-435) Synonym-Editor that creates OWL for the ontology plugin - posted by "Urs Krebs (JIRA)" <ji...@apache.org> on 2007/01/26 15:13:49 UTC, 0 replies.
- record version mismatch occured - posted by Gal Nitzan <gn...@usa.net> on 2007/01/26 15:57:24 UTC, 5 replies.
- [jira] Commented: (NUTCH-436) Incorrect handling of relative paths when the embedded URL path is empty - posted by "Andrew Groh (JIRA)" <ji...@apache.org> on 2007/01/26 19:37:49 UTC, 0 replies.
- [jira] Assigned: (NUTCH-431) Move plugin specific properties out of nutch-site.xml and into specific conf files for plugins - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/01/26 19:47:49 UTC, 0 replies.
- java.io.FileNotFoundException: / (Is a directory) - posted by Gal Nitzan <gn...@usa.net> on 2007/01/26 22:36:45 UTC, 0 replies.
- Trunk version and NUTCH-251(Administration gui) - posted by karthik085 <ka...@gmail.com> on 2007/01/27 01:51:09 UTC, 1 reply.
- why can't test the parse-xml plugin - posted by kauu <ba...@gmail.com> on 2007/01/29 08:51:29 UTC, 0 replies.
- Generator: 0 records selected for fetching, exiting - posted by Gal Nitzan <gn...@usa.net> on 2007/01/29 15:35:27 UTC, 0 replies.
- 'RegexIndexingFilter' - posted by Tobias Zahn <To...@arcor.de> on 2007/01/29 19:57:47 UTC, 2 replies.
- [jira] Work started: (NUTCH-390) Javadoc warnings - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/01/30 06:55:33 UTC, 0 replies.
- [jira] Resolved: (NUTCH-390) Javadoc warnings - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/01/30 06:57:33 UTC, 0 replies.
- [jira] Closed: (NUTCH-390) Javadoc warnings - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/01/30 06:59:33 UTC, 0 replies.
- [jira] Work started: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/01/30 07:05:33 UTC, 0 replies.
- Meeting Date for our project - posted by ahmed ghouzia <gh...@yahoo.com> on 2007/01/30 19:37:34 UTC, 0 replies.
- RSS-fecter and index individul-how can i realize this function - posted by kauu <ba...@gmail.com> on 2007/01/31 03:01:36 UTC, 7 replies.
- log4j problem - posted by kauu <ba...@gmail.com> on 2007/01/31 06:45:01 UTC, 2 replies.
- Can't Compile Revision 501954 - posted by Tobias Zahn <To...@arcor.de> on 2007/01/31 21:00:45 UTC, 0 replies.