- question about page fetch - posted by beansproud <ga...@gmail.com> on 2008/09/02 05:21:52 UTC, 1 reply.
- problems: crawling specific domain - posted by Mohammad Monirul Hoque <im...@yahoo.com> on 2008/09/03 06:53:39 UTC, 0 replies.
- fetch an ammeded url - posted by Edward Quick <ed...@hotmail.com> on 2008/09/03 21:43:39 UTC, 1 reply.
- [jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage - posted by "Grant Ingersoll (JIRA)" <ji...@apache.org> on 2008/09/04 15:43:44 UTC, 10 replies.
- [jira] Work started: (NUTCH-621) Nutch needs to declare it's crypto usage - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2008/09/04 16:35:49 UTC, 0 replies.
- [jira] Updated: (NUTCH-621) Nutch needs to declare it's crypto usage - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2008/09/04 16:47:52 UTC, 3 replies.
- FW: Job failed! - posted by Edward Quick <ed...@hotmail.com> on 2008/09/06 09:10:11 UTC, 1 reply.
- problems parsing pdf's - posted by Edward Quick <ed...@hotmail.com> on 2008/09/07 22:59:51 UTC, 0 replies.
- nutch fetch issue - empty content - posted by Viral Shah <vi...@metaweb.com> on 2008/09/10 01:54:29 UTC, 0 replies.
- [jira] Commented: (NUTCH-631) MoreIndexingFilter fails with NoSuchElementException - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/10 16:44:44 UTC, 2 replies.
- [jira] Commented: (NUTCH-635) LinkAnalysis Tool for Nutch - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/11 19:35:44 UTC, 2 replies.
- TSU NOTIFICATION - Encryption - posted by Grant Ingersoll <gs...@apache.org> on 2008/09/11 19:48:56 UTC, 0 replies.
- Droids crawler - posted by Andrzej Bialecki <ab...@getopt.org> on 2008/09/12 14:50:27 UTC, 4 replies.
- [Nutch Wiki] Update of "PublicServers" by amitabhabanerjee - posted by Apache Wiki <wi...@apache.org> on 2008/09/17 03:01:43 UTC, 1 reply.
- [Nutch Wiki] Update of "PublicServers" by EcoliHub - posted by Apache Wiki <wi...@apache.org> on 2008/09/17 04:23:24 UTC, 0 replies.
- [jira] Created: (NUTCH-650) Hbase Integration - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/18 14:03:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-650) Hbase Integration - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/18 15:25:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-639) Change LuceneDocumentWrapper visibility from private to protected - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/19 13:44:44 UTC, 2 replies.
- [jira] Created: (NUTCH-651) Remove bin/{start|stop}-balancer.sh from svn tracking - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/19 14:04:44 UTC, 0 replies.
- [jira] Created: (NUTCH-652) AdaptiveFetchSchedule#setFetchSchedule doesn't calculate fetch interval correctly - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/19 15:02:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-652) AdaptiveFetchSchedule#setFetchSchedule doesn't calculate fetch interval correctly - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/19 15:04:44 UTC, 0 replies.
- [jira] Created: (NUTCH-653) Upgrade to hadoop 0.18 - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/19 15:04:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-653) Upgrade to hadoop 0.18 - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/19 15:06:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-633) ParseSegment no longer allow reparsing - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/19 15:18:45 UTC, 0 replies.
- [jira] Updated: (NUTCH-442) Integrate Solr/Nutch - posted by "Nick Tkach (JIRA)" <ji...@apache.org> on 2008/09/19 17:54:44 UTC, 0 replies.
- good crawler - droids - posted by Rakesh Singh <er...@yahoo.com> on 2008/09/19 21:04:54 UTC, 0 replies.
- [Nutch Wiki] Update of "Nutch0.9-Hadoop0.10-Tutorial" by MarcinOkraszewski - posted by Apache Wiki <wi...@apache.org> on 2008/09/20 00:05:52 UTC, 0 replies.
- [jira] Closed: (NUTCH-639) Change LuceneDocumentWrapper visibility from private to protected - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/20 19:05:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-640) confusing description "set it to Integer.MAX_VALUE" - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/20 19:13:46 UTC, 0 replies.
- [jira] Closed: (NUTCH-651) Remove bin/{start|stop}-balancer.sh from svn tracking - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/22 13:08:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-120) one "bad" link on a page kills parsing - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 16:56:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-120) one "bad" link on a page kills parsing - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 16:56:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-153) TextParser is only supposed to parse plain text, but if given postscript, it can take hours and then fail - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 17:02:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-153) TextParser is only supposed to parse plain text, but if given postscript, it can take hours and then fail - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 17:02:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-155) Remove web gui from the distribution to "contrib" and use OpenSearch Servlet - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 17:06:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-155) Remove web gui from the distribution to "contrib" and use OpenSearch Servlet - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 17:06:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-255) Regular Expression for RegexUrlNormalizer to remove jsessionid - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 17:12:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-255) Regular Expression for RegexUrlNormalizer to remove jsessionid - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 17:12:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-330) command line tool to search a Lucene index - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 17:22:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-330) command line tool to search a Lucene index - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 17:22:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-355) The title of query result could like the summary have the highlight?? - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:06:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-359) extraction of links will fail for whole page if one single link cannot be parsed - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:08:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-359) extraction of links will fail for whole page if one single link cannot be parsed - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:08:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-402) Incrementalcrawling and indexing - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:12:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-413) Fetcher ignores -noParsing command line option - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:20:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-413) Fetcher ignores -noParsing command line option - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:20:44 UTC, 0 replies.
- [jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation. - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:22:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-451) Tool to recover partial fetcher output - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:24:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-451) Tool to recover partial fetcher output - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:24:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-530) Add a combiner to improve performance on updatedb - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:32:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-524) Generate Problem with Single Node - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:32:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-524) Generate Problem with Single Node - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:32:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-556) automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:34:45 UTC, 0 replies.
- [jira] Commented: (NUTCH-582) Add missing type parameters - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/22 18:36:44 UTC, 0 replies.
- [jira] Closed: (NUTCH-633) ParseSegment no longer allow reparsing - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/22 18:44:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-637) Add method to nutch and tika system(Code written) - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/22 18:46:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-653) Upgrade to hadoop 0.18 - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/22 23:14:44 UTC, 3 replies.
- [jira] Commented: (NUTCH-375) Link to 0.8.x apidocs broken on website - posted by "Hudson (JIRA)" <ji...@apache.org> on 2008/09/23 06:18:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-633) ParseSegment no longer allow reparsing - posted by "Hudson (JIRA)" <ji...@apache.org> on 2008/09/23 06:18:44 UTC, 0 replies.
- [jira] Commented: (NUTCH-651) Remove bin/{start|stop}-balancer.sh from svn tracking - posted by "Hudson (JIRA)" <ji...@apache.org> on 2008/09/23 06:18:44 UTC, 1 reply.
- [jira] Commented: (NUTCH-650) Hbase Integration - posted by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2008/09/24 00:27:50 UTC, 0 replies.
- [jira] Resolved: (NUTCH-653) Upgrade to hadoop 0.18 - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/24 10:53:44 UTC, 0 replies.
- Crawled documents in readable format - posted by Allan Avendaño <aa...@fiec.espol.edu.ec> on 2008/09/27 20:24:14 UTC, 1 reply.
- Help needed in Integrating a module - posted by Nimesh Priyodit <pr...@yahoo.co.in> on 2008/09/27 21:32:44 UTC, 1 reply.
- Advise On Building Jobs Search Engine - posted by neil_rosewarm <ne...@YAHOO.COM> on 2008/09/28 14:25:53 UTC, 0 replies.
- [jira] Updated: (NUTCH-631) MoreIndexingFilter fails with NoSuchElementException - posted by "Edward Quick (JIRA)" <ji...@apache.org> on 2008/09/28 22:24:44 UTC, 0 replies.
- [jira] Resolved: (NUTCH-621) Nutch needs to declare it's crypto usage - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2008/09/29 15:06:44 UTC, 0 replies.