- [jira] Created: (NUTCH-452) Nutch JSF/My Faces Search Frontend - posted by "Zaheed Haque (JIRA)" <ji...@apache.org> on 2007/03/01 09:54:50 UTC, 0 replies.
- Re: Welcome Dennis Kubes as Nutch committer - posted by Sami Siren <ss...@gmail.com> on 2007/03/01 17:16:30 UTC, 2 replies.
- [jira] Created: (NUTCH-453) Move stop words to a config file - posted by "Steve Severance (JIRA)" <ji...@apache.org> on 2007/03/02 01:33:50 UTC, 0 replies.
- [jira] Commented: (NUTCH-224) Nutch doesn't handle Korean text at all - posted by "Steve Severance (JIRA)" <ji...@apache.org> on 2007/03/02 06:36:50 UTC, 0 replies.
- Issues pending before 0.9 release - posted by Andrzej Bialecki <ab...@getopt.org> on 2007/03/03 01:05:33 UTC, 27 replies.
- [jira] Created: (NUTCH-454) Review Debug Level Log Guards - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/03/04 07:00:50 UTC, 0 replies.
- [jira] Assigned: (NUTCH-436) Incorrect handling of relative paths when the embedded URL path is empty - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/03/04 07:08:50 UTC, 0 replies.
- [jira] Resolved: (NUTCH-400) Update & add missing license headers - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/03/04 08:35:51 UTC, 0 replies.
- [jira] Updated: (NUTCH-436) Incorrect handling of relative paths when the embedded URL path is empty - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/03/04 20:00:51 UTC, 0 replies.
- SSL & Nutch (SecureProtocolSocketFactory) - posted by g....@ifc.cnr.it on 2007/03/05 12:04:23 UTC, 1 reply.
- Re: java.io.FileNotFoundException: / (Is a directory) - posted by Dennis Kubes <nu...@dragonflymc.com> on 2007/03/05 16:15:04 UTC, 0 replies.
- FW: Nutch release process help - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2007/03/06 19:53:57 UTC, 3 replies.
- Nutch invertlinks error - posted by Steve Severance <st...@ivirtuoso.com> on 2007/03/06 23:44:52 UTC, 0 replies.
- [jira] Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic - posted by "Nathan ter Bogt (JIRA)" <ji...@apache.org> on 2007/03/07 05:36:24 UTC, 2 replies.
- [jira] Created: (NUTCH-455) dedup on tokenized fields is faulty - posted by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2007/03/07 11:09:24 UTC, 0 replies.
- [jira] Updated: (NUTCH-455) dedup on tokenized fields is faulty - posted by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2007/03/07 11:11:24 UTC, 0 replies.
- No live nodes contain current block - posted by "Pope, Jackson" <Ja...@bl.uk> on 2007/03/07 16:02:06 UTC, 0 replies.
- 0.9 release - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2007/03/07 18:06:18 UTC, 13 replies.
- [PROPOSAL] Tika, a content analysis toolkit - posted by Jukka Zitting <ju...@gmail.com> on 2007/03/07 18:55:51 UTC, 0 replies.
- [jira] Closed: (NUTCH-432) JAVA_PLATFORM with spaces (i.e. Mac OS X-ppc-32) breaks bin/nutch script - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/07 20:01:43 UTC, 0 replies.
- [jira] Commented: (NUTCH-455) dedup on tokenized fields is faulty - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2007/03/07 20:07:24 UTC, 1 reply.
- [jira] Commented: (NUTCH-296) Image Search - posted by "Steve Severance (JIRA)" <ji...@apache.org> on 2007/03/07 22:59:24 UTC, 0 replies.
- [jira] Closed: (NUTCH-437) MapFile in Hadoop Trunk has changed, must update references - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/07 23:11:24 UTC, 0 replies.
- [jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation. - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/07 23:29:24 UTC, 1 reply.
- [jira] Closed: (NUTCH-167) Observation of directive - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/08 00:38:24 UTC, 0 replies.
- language identification training data - posted by karl wettin <ka...@gmail.com> on 2007/03/08 08:09:40 UTC, 0 replies.
- [jira] Created: (NUTCH-456) parse msexcel plugin speedup - posted by "Heiko Dietze (JIRA)" <ji...@apache.org> on 2007/03/08 10:21:24 UTC, 0 replies.
- [jira] Updated: (NUTCH-456) parse msexcel plugin speedup - posted by "Heiko Dietze (JIRA)" <ji...@apache.org> on 2007/03/08 10:23:24 UTC, 0 replies.
- [jira] Updated: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly - posted by "Heiko Dietze (JIRA)" <ji...@apache.org> on 2007/03/08 10:46:24 UTC, 1 reply.
- [jira] Created: (NUTCH-457) Create top level dist directory and checkin KEYS file to subversion be standard with Lucene Java and Hadoop - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/03/08 21:11:24 UTC, 0 replies.
- [jira] Commented: (NUTCH-457) Create top level dist directory and checkin KEYS file to subversion be standard with Lucene Java and Hadoop - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/03/08 21:26:24 UTC, 0 replies.
- How to read data from segments - posted by Steve Severance <st...@ivirtuoso.com> on 2007/03/08 22:28:13 UTC, 5 replies.
- [jira] Commented: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/03/08 22:28:24 UTC, 4 replies.
- Course Developer / Supporter Needed: News Site Building - posted by d e <cr...@gmail.com> on 2007/03/09 23:02:19 UTC, 0 replies.
- [jira] Resolved: (NUTCH-233) wrong regular expression hang reduce process for ever - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/03/09 23:43:09 UTC, 0 replies.
- Indexing the Interesting Part Only... - posted by d e <cr...@gmail.com> on 2007/03/10 00:48:43 UTC, 13 replies.
- Building an Archive of Pages Crawled Over - posted by d e <cr...@gmail.com> on 2007/03/10 00:57:13 UTC, 0 replies.
- [jira] Closed: (NUTCH-436) Incorrect handling of relative paths when the embedded URL path is empty - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/03/10 03:38:09 UTC, 0 replies.
- [jira] Resolved: (NUTCH-436) Incorrect handling of relative paths when the embedded URL path is empty - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/03/10 03:38:09 UTC, 0 replies.
- [jira] Closed: (NUTCH-233) wrong regular expression hang reduce process for ever - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/03/10 03:42:09 UTC, 0 replies.
- [jira] Resolved: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/03/10 07:49:09 UTC, 0 replies.
- [jira] Closed: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/03/10 07:54:09 UTC, 0 replies.
- Re: svn commit: r516728 - in /lucene/nutch/trunk/src/plugin/parse-html/src: java/org/apache/nutch/parse/html/DOMContentUtils.java test/org/apache/nutch/parse/html/TestDOMContentUtils.java - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2007/03/10 17:21:28 UTC, 0 replies.
- Re: svn commit: r516728 - in /lucene/nutch/trunk/src/plugin/parse-html/src: java/org/apache/nutch/parse/html/DOMContentUtils.java test/org/apache/nutch/parse/html/TestDOMContentUtils.java - posted by Dennis Kubes <nu...@dragonflymc.com> on 2007/03/10 18:56:10 UTC, 0 replies.
- Re: svn commit: r516759 - /lucene/nutch/trunk/CHANGES.txt - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2007/03/10 19:37:51 UTC, 2 replies.
- Re: svn commit: r516728 - in /lucene/nutch/trunk/src/plugin/parse-html/src: java/org/apache/nutch/parse/html/DOMContentUtils.java test/org/apache/nutch/parse/html/TestDOMContentUtils.java - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2007/03/10 19:39:53 UTC, 1 reply.
- Re: [Nutch-cvs] svn commit: r516885 - /lucene/nutch/trunk/build.xml - posted by Andrzej Bialecki <ab...@getopt.org> on 2007/03/11 13:18:35 UTC, 1 reply.
- Re: [Nutch-cvs] svn commit: r516888 - /lucene/nutch/trunk/bin/nutch - posted by Andrzej Bialecki <ab...@getopt.org> on 2007/03/11 13:24:20 UTC, 5 replies.
- [jira] Reopened: (NUTCH-432) JAVA_PLATFORM with spaces (i.e. Mac OS X-ppc-32) breaks bin/nutch script - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/03/11 15:35:09 UTC, 0 replies.
- Hadoop 0.11.2 vs. 0.12.1 - posted by Andrzej Bialecki <ab...@getopt.org> on 2007/03/11 21:34:38 UTC, 15 replies.
- HEADSUP: reverting my changes - posted by Sami Siren <ss...@gmail.com> on 2007/03/11 21:45:28 UTC, 0 replies.
- [jira] Updated: (NUTCH-451) Tool to recover partial fetcher output - posted by "Mathijs Homminga (JIRA)" <ji...@apache.org> on 2007/03/12 14:12:09 UTC, 1 reply.
- [jira] Commented: (NUTCH-451) Tool to recover partial fetcher output - posted by "Mathijs Homminga (JIRA)" <ji...@apache.org> on 2007/03/12 15:05:09 UTC, 0 replies.
- DummySSLProtocolSocketFactory problem - posted by Gavino Marras <g....@ifc.cnr.it> on 2007/03/12 16:53:19 UTC, 1 reply.
- [jira] Created: (NUTCH-458) Proxy forwarding to nutch.war does not work. Need to add some code... - posted by "My Nutch (JIRA)" <ji...@apache.org> on 2007/03/12 17:50:09 UTC, 0 replies.
- Build failed in Hudson: Nutch-Nightly #19 - posted by hu...@lucene.zones.apache.org on 2007/03/13 08:02:32 UTC, 0 replies.
- Hudson build is back to normal: Nutch-Nightly #22 - posted by hu...@lucene.zones.apache.org on 2007/03/14 02:42:04 UTC, 0 replies.
- DummySSLProtocolSocketFactory problem, please help me!!!! - posted by Gavino Marras <g....@ifc.cnr.it> on 2007/03/14 15:39:46 UTC, 3 replies.
- New Jira Hudson plugin - posted by Nigel Daley <nd...@yahoo-inc.com> on 2007/03/14 19:22:44 UTC, 3 replies.
- Re: 0.12.1 release plan - posted by Nigel Daley <nd...@yahoo-inc.com> on 2007/03/14 22:46:15 UTC, 1 reply.
- [jira] Updated: (NUTCH-459) Upgrade Nutch to Hadoop 0.12.1 - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/03/15 15:23:09 UTC, 0 replies.
- [jira] Created: (NUTCH-459) Upgrade Nutch to Hadoop 0.12.1 - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/03/15 15:23:09 UTC, 0 replies.
- ApacheCon in Amsterdam - posted by Marc Boucher <ma...@gmail.com> on 2007/03/16 01:42:49 UTC, 4 replies.
- Help me in writing plugin for extracting tag from HTML Pages - posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/16 05:53:35 UTC, 0 replies.
- [jira] Created: (NUTCH-460) RDF parser plugin - posted by "Ricardo J. Méndez (JIRA)" <ji...@apache.org> on 2007/03/17 05:43:09 UTC, 0 replies.
- [jira] Updated: (NUTCH-460) RDF parser plugin - posted by "Ricardo J. Méndez (JIRA)" <ji...@apache.org> on 2007/03/17 05:45:10 UTC, 0 replies.
- Launching custom classes - posted by Steve Severance <st...@ivirtuoso.com> on 2007/03/19 15:09:56 UTC, 2 replies.
- [jira] Created: (NUTCH-461) microformats-reltag plugin and relative links - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2007/03/19 23:09:32 UTC, 0 replies.
- [jira] Commented: (NUTCH-381) Ignore external link not work as expected - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/20 00:41:32 UTC, 0 replies.
- [jira] Closed: (NUTCH-381) Ignore external link not work as expected - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/20 00:44:32 UTC, 0 replies.
- [jira] Closed: (NUTCH-277) Fetcher dies because of "max. redirects" (avoiding infinite loop) - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/20 00:46:32 UTC, 0 replies.
- [jira] Closed: (NUTCH-459) Upgrade Nutch to Hadoop 0.12.1 - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/20 00:51:32 UTC, 0 replies.
- [jira] Updated: (NUTCH-353) pages that serverside forwards will be refetched every time - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/20 00:51:32 UTC, 0 replies.
- [jira] Closed: (NUTCH-450) How to set up nutch - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/20 00:53:32 UTC, 0 replies.
- [jira] Created: (NUTCH-462) Noarchive urls are available via the cache link - posted by "Steve Severance (JIRA)" <ji...@apache.org> on 2007/03/20 03:40:33 UTC, 0 replies.
- [jira] Commented: (NUTCH-462) Noarchive urls are available via the cache link - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/20 09:23:32 UTC, 0 replies.
- Re: svn commit: r516643 - in /lucene/nutch/trunk/src/plugin/parse-html/src: java/org/apache/nutch/parse/html/DOMContentUtils.java test/org/apache/nutch/parse/html/TestDOMContentUtils.java - posted by Doug Cutting <cu...@apache.org> on 2007/03/20 18:48:09 UTC, 0 replies.
- [jira] Closed: (NUTCH-462) Noarchive urls are available via the cache link - posted by "Steve Severance (JIRA)" <ji...@apache.org> on 2007/03/20 19:40:32 UTC, 0 replies.
- Multi-pass algorithms - posted by Steve Severance <st...@ivirtuoso.com> on 2007/03/20 22:28:20 UTC, 0 replies.
- [jira] Created: (NUTCH-463) Nutch powerpoint parser plugin fails to parse ppt with images - posted by "Wilson Fong (JIRA)" <ji...@apache.org> on 2007/03/20 23:26:32 UTC, 0 replies.
- [jira] Updated: (NUTCH-463) Nutch powerpoint parser plugin fails to parse ppt with images - posted by "Wilson Fong (JIRA)" <ji...@apache.org> on 2007/03/20 23:28:32 UTC, 0 replies.
- [jira] Commented: (NUTCH-460) RDF parser plugin - posted by "Ricardo J. Méndez (JIRA)" <ji...@apache.org> on 2007/03/21 16:14:32 UTC, 0 replies.
- Distributed Search with nutch - posted by Xavier Quintuna <xa...@gmail.com> on 2007/03/21 19:46:18 UTC, 0 replies.
- [jira] Commented: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement - posted by "Michael Gillis (JIRA)" <ji...@apache.org> on 2007/03/22 04:37:32 UTC, 0 replies.
- [jira] Closed: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/22 11:08:33 UTC, 0 replies.
- nutch slide in lucene presentation - posted by Yonik Seeley <yo...@apache.org> on 2007/03/22 20:36:49 UTC, 4 replies.
- indexing with current trunk - posted by Steve Severance <st...@ivirtuoso.com> on 2007/03/22 20:49:25 UTC, 6 replies.
- FW: [jira] Created: (HADOOP-1147) remove all @author tags from source - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2007/03/22 21:24:37 UTC, 1 reply.
- I: COME SI FA' AD ANDARE AVANTI ?? [Fwd: How do I move forward??] - posted by Info <in...@radionav.it> on 2007/03/23 10:56:13 UTC, 0 replies.
- Breaking change in webapp? - posted by Steve Severance <st...@ivirtuoso.com> on 2007/03/23 15:57:47 UTC, 0 replies.
- Nutch on windows with cygwin. - posted by Yundeng Cao <yu...@gmail.com> on 2007/03/25 11:32:33 UTC, 0 replies.
- Problem with modifying Plugin - posted by z0mbi3 <ak...@gmail.com> on 2007/03/26 11:32:01 UTC, 2 replies.
- Initiation of 0.9 release process - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2007/03/26 17:55:00 UTC, 2 replies.
- Image Search Engine Input - posted by Steve Severance <st...@ivirtuoso.com> on 2007/03/26 22:04:26 UTC, 5 replies.
- Nutch 0 .9 release progress update - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2007/03/27 04:51:00 UTC, 2 replies.
- [jira] Commented: (NUTCH-330) command line tool to search a Lucene index - posted by "chsanthoshkumar (JIRA)" <ji...@apache.org> on 2007/03/27 07:03:32 UTC, 0 replies.
- [jira] Created: (NUTCH-464) Commandline Search - posted by "chsanthoshkumar (JIRA)" <ji...@apache.org> on 2007/03/27 07:09:32 UTC, 0 replies.
- [jira] Updated: (NUTCH-464) Commandline Search - posted by "chsanthoshkumar (JIRA)" <ji...@apache.org> on 2007/03/27 07:11:32 UTC, 0 replies.
- [VOTE] Release Apache Nutch 0.9 - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2007/03/27 07:43:17 UTC, 28 replies.
- [jira] Created: (NUTCH-465) I download nutch 0.9 used tar zxvf nutch-0.9.tar.gz at last A lone zero block - posted by "qiuwenbin (JIRA)" <ji...@apache.org> on 2007/03/27 12:54:32 UTC, 0 replies.
- search for specific html tag by Nutch - posted by Neelesh Rathore <ne...@in.v2solutions.com> on 2007/03/27 13:55:18 UTC, 0 replies.
- Search inside any html tag in nutch - posted by Neelesh Rathore <ne...@in.v2solutions.com> on 2007/03/27 14:16:48 UTC, 0 replies.
- Search inside any html tag by nutch - posted by Neelesh Rathore <ne...@in.v2solutions.com> on 2007/03/27 14:16:48 UTC, 1 reply.
- [jira] Commented: (NUTCH-464) Commandline Search - posted by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2007/03/27 14:58:32 UTC, 0 replies.
- [jira] Resolved: (NUTCH-432) JAVA_PLATFORM with spaces (i.e. Mac OS X-ppc-32) breaks bin/nutch script - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2007/03/27 19:01:33 UTC, 0 replies.
- [jira] Updated: (NUTCH-438) Add -noAdditions to updatedb - posted by "Nicolás Lichtmaier (JIRA)" <ji...@apache.org> on 2007/03/27 20:26:32 UTC, 0 replies.
- [jira] Closed: (NUTCH-464) Commandline Search - posted by "chsanthoshkumar (JIRA)" <ji...@apache.org> on 2007/03/28 08:03:32 UTC, 0 replies.
- Filter the urls from search results. - posted by inalasuresh <in...@care2.com> on 2007/03/28 09:31:35 UTC, 0 replies.
- Next release - 0.10.0 or 1.0.0 ? - posted by Andrzej Bialecki <ab...@getopt.org> on 2007/03/28 20:38:15 UTC, 3 replies.
- Sequence File Question - posted by Steve Severance <st...@ivirtuoso.com> on 2007/03/28 22:11:07 UTC, 5 replies.
- [jira] Commented: (NUTCH-435) Synonym-Editor that creates OWL for the ontology plugin - posted by "Urs Krebs (JIRA)" <ji...@apache.org> on 2007/03/29 15:01:25 UTC, 0 replies.
- Problem Extracting HTML Meta Tags - posted by z0mbi3 <ak...@gmail.com> on 2007/03/30 08:18:47 UTC, 0 replies.
- Re: Image Search Engine Input (General storage of extra data for use by Nutch) - posted by Ed Whittaker <ep...@gmail.com> on 2007/03/30 17:56:19 UTC, 0 replies.