- FOSDEM 2016 - take action by 4th of December 2015 - posted by Roman Shaposhnik <rv...@apache.org> on 2015/12/01 07:30:21 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2177) Generator produces only one partition even in distributed mode - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/12/01 11:43:10 UTC, 3 replies.
- [jira] [Comment Edited] (NUTCH-2177) Generator produces only one partition even in distributed mode - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/12/01 12:43:10 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2177) Generator produces only one partition even in distributed mode - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/12/01 12:48:10 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2177) Generator produces only one partition even in distributed mode - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/12/01 13:49:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2172) Parsing whitespace not just tabs in contenttype-mapping.txt - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/12/01 19:55:11 UTC, 4 replies.
- [jira] [Created] (NUTCH-2179) Cleanup job for SOLR Performance Boost - posted by "David Johnson (JIRA)" <ji...@apache.org> on 2015/12/01 20:47:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2179) Cleanup job for SOLR Performance Boost - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/12/01 21:29:11 UTC, 2 replies.
- [jira] [Issue Comment Deleted] (NUTCH-2179) Cleanup job for SOLR Performance Boost - posted by "David Johnson (JIRA)" <ji...@apache.org> on 2015/12/01 21:48:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2172) Parsing whitespace not just tabs in contenttype-mapping.txt - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/12/01 21:55:11 UTC, 3 replies.
- [jira] [Assigned] (NUTCH-2107) plugin.xml to validate against plugin.dtd - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/12/01 22:09:10 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2107) plugin.xml to validate against plugin.dtd - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/12/01 22:18:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2179) Cleanup job for SOLR Performance Boost - posted by "David Johnson (JIRA)" <ji...@apache.org> on 2015/12/01 22:44:11 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2107) plugin.xml to validate against plugin.dtd - posted by "Hudson (JIRA)" <ji...@apache.org> on 2015/12/01 22:49:11 UTC, 1 reply.
- [jira] [Updated] (NUTCH-2176) Clean up of log4j.properties - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/02 13:41:10 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2176) Clean up of log4j.properties - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/02 13:41:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2176) Clean up of log4j.properties - posted by "Hudson (JIRA)" <ji...@apache.org> on 2015/12/02 13:55:11 UTC, 0 replies.
- [jira] [Created] (NUTCH-2180) FileDumper dumps data, but breaks midway on corrupt segments - posted by "Harshavardhan Manjunatha (JIRA)" <ji...@apache.org> on 2015/12/03 17:11:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2180) FileDumper dumps data, but breaks midway on corrupt segments - posted by "Harshavardhan Manjunatha (JIRA)" <ji...@apache.org> on 2015/12/03 17:15:10 UTC, 8 replies.
- [Nutch Wiki] Trivial Update of "Nutch2Tutorial" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/12/04 06:23:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2149) REST endpoint to read Nutch sequence files - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/04 08:12:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2128) Refactor configuration end point - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/04 08:12:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2178) DeduplicationJob to optionally group on host or domain - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/04 08:14:11 UTC, 0 replies.
- Dropping Nutch 1.11RC#1 Artifacts - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/12/04 08:51:52 UTC, 1 reply.
- [Nutch Wiki] Trivial Update of "Release_HOWTO" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/12/04 08:54:08 UTC, 4 replies.
- [VOTE] Release Apache Nutch 1.11 RC#2 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/12/04 19:03:57 UTC, 4 replies.
- Re: [MASSMAIL][VOTE] Release Apache Nutch 1.11 RC#2 - posted by Roannel Fernández Hernández <ro...@uci.cu> on 2015/12/04 19:20:02 UTC, 1 reply.
- [jira] [Assigned] (NUTCH-2172) Parsing whitespace not just tabs in contenttype-mapping.txt - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/12/06 21:57:10 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2172) index-more: document format of contenttype-mapping.txt - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/12/06 22:12:11 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-2172) index-more: document format of contenttype-mapping.txt - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/12/06 22:23:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2172) index-more: document format of contenttype-mapping.txt - posted by "Hudson (JIRA)" <ji...@apache.org> on 2015/12/06 22:55:10 UTC, 0 replies.
- [RESULT] WAS Re: [VOTE] Release Apache Nutch 1.11 RC#2 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/12/08 01:41:18 UTC, 0 replies.
- [RELEASE] Apache Nutch 1.11 - posted by lewis john mcgibbney <le...@apache.org> on 2015/12/08 02:34:11 UTC, 4 replies.
- [jira] [Created] (NUTCH-2181) Add Webpage for 3rd Party Connectors/Libraries to Apache Nutch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/08 02:50:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2181) Add Webpage for 3rd Party Connectors/Libraries to Apache Nutch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/08 02:51:11 UTC, 0 replies.
- Fwd: ApacheCon NA 2015 Travel Assistance Applications now open! - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/12/08 05:21:07 UTC, 0 replies.
- [jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support - posted by "Vincent Slot (JIRA)" <ji...@apache.org> on 2015/12/08 11:49:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1449) Optionally delete documents skipped by IndexingFilters - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/08 14:03:10 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2076) exceptions are not handled when using method waitForCompletion in a try block - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/12/08 22:15:11 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2042) parse-html increase chunk size used to detect charset - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/12/08 22:47:10 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2042) parse-html increase chunk size used to detect charset - posted by "Hudson (JIRA)" <ji...@apache.org> on 2015/12/08 23:48:11 UTC, 1 reply.
- [jira] [Created] (NUTCH-2182) Make reverseUrlDirs file dumper option hash the URL for consistency - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/12/09 00:02:10 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2182) Make reverseUrlDirs file dumper option hash the URL for consistency - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/12/09 00:04:11 UTC, 0 replies.
- [jira] [Created] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/09 04:01:10 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/09 04:04:10 UTC, 1 reply.
- [GitHub] nutch pull request: fix for NUTCH-2180 FileDumper skips Corrupt Se... - posted by harsham05 <gi...@git.apache.org> on 2015/12/09 04:39:07 UTC, 2 replies.
- [jira] [Updated] (NUTCH-2180) FileDumper dumps data, but breaks midway on corrupt segments - posted by "Harshavardhan Manjunatha (JIRA)" <ji...@apache.org> on 2015/12/09 04:53:11 UTC, 1 reply.
- [jira] [Assigned] (NUTCH-2180) FileDumper dumps data, but breaks midway on corrupt segments - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/12/09 17:06:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/12/09 17:12:10 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-2180) FileDumper dumps data, but breaks midway on corrupt segments - posted by "Harshavardhan Manjunatha (JIRA)" <ji...@apache.org> on 2015/12/09 18:14:11 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-2180) FileDumper dumps data, but breaks midway on corrupt segments - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/10 04:04:11 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/10 04:06:11 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3327 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/12/10 06:16:18 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #3328 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/12/10 07:42:20 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/10 16:36:10 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/10 16:37:11 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/10 17:01:11 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2184) Enable IndexingJob to function with no crawldb - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/12 03:37:46 UTC, 0 replies.
- [jira] [Created] (NUTCH-2184) Enable IndexingJob to function with no crawldb - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/12 03:37:46 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/12 03:37:46 UTC, 23 replies.
- [jira] [Created] (NUTCH-2185) protocol-soda-consumer plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/14 01:01:46 UTC, 0 replies.
- Deploy a Nutch crawler or use Webhose.io? - posted by "Jon.P" <jo...@gmail.com> on 2015/12/14 09:40:11 UTC, 0 replies.
- [jira] [Work stopped] (NUTCH-2184) Enable IndexingJob to function with no crawldb - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/15 23:13:46 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2184) Enable IndexingJob to function with no crawldb - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/15 23:15:47 UTC, 1 reply.
- [jira] [Created] (NUTCH-2186) -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/12/15 23:19:46 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2184) Enable IndexingJob to function with no crawldb - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/16 10:43:46 UTC, 0 replies.
- [jira] [Created] (NUTCH-2187) Change FileDumper SHAs to all uppercase - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/12/16 22:59:46 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2187) Change FileDumper SHAs to all uppercase - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/12/16 23:06:46 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2182) Make reverseUrlDirs file dumper option hash the URL for consistency - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/12/16 23:10:47 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2182) Make reverseUrlDirs file dumper option hash the URL for consistency - posted by "Hudson (JIRA)" <ji...@apache.org> on 2015/12/17 00:00:48 UTC, 0 replies.
- [jira] [Created] (NUTCH-2188) While crawling with solr url (kerberos enabled) Error: org.apache.solr.common.SolrException: Unauthorized - posted by "Mohankumar K H (JIRA)" <ji...@apache.org> on 2015/12/17 07:11:46 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2188) While crawling with solr url (kerberos enabled) Error: org.apache.solr.common.SolrException: Unauthorized - posted by "Mohankumar K H (JIRA)" <ji...@apache.org> on 2015/12/17 11:18:46 UTC, 4 replies.
- [jira] [Updated] (NUTCH-2188) While crawling with solr url (kerberos enabled) Error: org.apache.solr.common.SolrException: Unauthorized - posted by "Mohankumar K H (JIRA)" <ji...@apache.org> on 2015/12/18 06:49:46 UTC, 1 reply.
- [jira] [Created] (NUTCH-2189) Domain filter must deactivate if no rules are present - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/21 13:34:46 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2189) Domain filter must deactivate if no rules are present - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/21 13:47:46 UTC, 1 reply.
- [jira] [Updated] (NUTCH-2065) Domain URL filter to support protocols - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/21 13:54:46 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2189) Domain filter must deactivate if no rules are present - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/12/22 22:02:46 UTC, 4 replies.
- [jira] [Commented] (NUTCH-2065) Domain URL filter to support protocols - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/12/22 22:29:46 UTC, 1 reply.
- [jira] [Closed] (NUTCH-2065) Domain URL filter to support protocols - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/23 10:20:46 UTC, 0 replies.
- [jira] [Created] (NUTCH-2190) Protocol normalizer - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/23 10:22:46 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6.1 - posted by "Auro Miralles (JIRA)" <ji...@apache.org> on 2015/12/24 09:51:49 UTC, 1 reply.
- [jira] [Created] (NUTCH-2191) Add protocol-htmlunit - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/24 13:21:49 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2191) Add protocol-htmlunit - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/24 13:25:49 UTC, 0 replies.
- [jira] [Created] (NUTCH-2192) Get rid of oro - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/24 13:38:49 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2192) Get rid of oro - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/24 13:39:49 UTC, 0 replies.
- [jira] [Closed] (NUTCH-2189) Domain filter must deactivate if no rules are present - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/24 13:46:49 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2189) Domain filter must deactivate if no rules are present - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/24 13:46:49 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2190) Protocol normalizer - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/12/24 14:34:49 UTC, 0 replies.