- [jira] [Commented] (NUTCH-2667) Update Tika and Commons Collections 4 - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/04 08:49:00 UTC, 3 replies.
- Build failed in Jenkins: Nutch-nutchgora #1624 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/04 09:40:28 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1625 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/04 10:05:22 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1626 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/04 12:16:15 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1627 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/04 12:32:11 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #1628 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/04 14:49:39 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2667) Update Tika and Commons Collections 4 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/04 15:52:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2682) Upgrade to Tika 1.20 - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/04 16:48:00 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2658) Add README file to all plugins in src/plugin - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/06 11:18:00 UTC, 1 replies.
- [jira] [Updated] (NUTCH-2658) Add README file to all plugins in src/plugin - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 11:20:00 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2658) Add README file to all plugins in src/plugin - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 11:21:00 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3593 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/06 11:44:18 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2307) Implement Missing NutchServer REST API Tests - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 11:46:00 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3594 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/06 11:49:09 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3595 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/06 11:50:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2657) Protocol-http to store HTTP response header with "\r\n" - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/06 11:53:00 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2657) Protocol-http to store HTTP response header with "\r\n" - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 11:53:00 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #3596 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/06 12:53:45 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2628) Fetcher: optionally generate signature of unparsed content - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/06 19:41:00 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2628) Fetcher: optionally generate signature of unparsed content - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 19:41:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2475) If and else-if branches has the same condition - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 20:01:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2627) Fetcher to optionally filter URLs - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 20:06:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2671) Upgrade ant ivy library - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 20:08:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2475) If and else-if branches has the same condition - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 20:26:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1783) Cleanup temp folders in case of failures - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 20:33:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1377) Add option to index via CloudSolrServer instead - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 20:40:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2475) If and else-if branches has the same condition - posted by "Hudson (JIRA)" <ji...@apache.org> on 2019/01/06 20:43:00 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2675) Give parsers the capability to read and write CrawlDatum - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 20:44:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2656) Update description to configure Solr 7.x in tutorial - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 20:46:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2663) Improve index-jexl-filter syntax for scripts - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 21:06:00 UTC, 1 replies.
- [jira] [Updated] (NUTCH-2395) Cannot run job worker! - error while running multiple crawling jobs in parallel - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/06 21:13:00 UTC, 5 replies.
- [jira] [Created] (NUTCH-2683) DeduplicationJob: add option to prefer https:// over http:// - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/07 08:05:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1623) Implement file.content.ignored function - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/07 09:27:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2395) Cannot run job worker! - error while running multiple crawling jobs in parallel - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/07 09:38:01 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2673) EOFException protocol-http - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/07 10:09:00 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/07 10:10:01 UTC, 6 replies.
- [jira] [Resolved] (NUTCH-2670) org.apache.nutch.indexer.IndexerMapReduce does not read the value of "indexer.delete" from nutch-site.xml - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/07 10:12:00 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver - posted by "Stas Batururimi (JIRA)" <ji...@apache.org> on 2019/01/07 10:49:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2683) DeduplicationJob: add option to prefer https:// over http:// - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/07 11:14:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2680) Documentation: https supported by multiple protocol plugins not only httpclient - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/07 11:19:00 UTC, 2 replies.
- [jira] [Closed] (NUTCH-2673) EOFException protocol-http - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2019/01/07 11:29:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2666) Increase default value for http.content.limit / ftp.content.limit / file.content.limit - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/07 11:43:00 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2666) Increase default value for http.content.limit / ftp.content.limit / file.content.limit - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/07 11:48:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2678) Allow for per-host configurable protocol plugin - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2019/01/08 12:43:00 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2678) Allow for per-host configurable protocol plugin - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2019/01/08 12:44:00 UTC, 7 replies.
- [ANNOUNCE] Apache Roadshow Chicago, Call for Presentations - posted by Trevor Grant <ra...@apache.org> on 2019/01/15 14:41:58 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2631) KafkaIndexWriter - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/15 15:11:00 UTC, 6 replies.
- [jira] [Created] (NUTCH-2684) Add README.md file to all indexer writers plugins - posted by "Roannel Fernández Hernández (JIRA)" <ji...@apache.org> on 2019/01/15 21:39:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2685) Add README.md file to all exchange plugins - posted by "Roannel Fernández Hernández (JIRA)" <ji...@apache.org> on 2019/01/15 21:43:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2686) Separate field for mime types mapped by index-more plugin - posted by "Roannel Fernández Hernández (JIRA)" <ji...@apache.org> on 2019/01/15 23:31:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2686) Separate field for mime types mapped by index-more plugin - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/16 13:49:00 UTC, 5 replies.
- [jira] [Created] (NUTCH-2687) Regex for reading title from Content-Disposition is wrong - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2019/01/16 14:45:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2687) Regex for reading title from Content-Disposition is wrong - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2019/01/16 14:48:00 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2685) Add README.md file to all exchange plugins - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/16 15:52:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-2688) Unify the licence headers - posted by "Roannel Fernández Hernández (JIRA)" <ji...@apache.org> on 2019/01/17 15:55:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2688) Unify the licence headers - posted by "Roannel Fernández Hernández (JIRA)" <ji...@apache.org> on 2019/01/17 15:59:00 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2688) Unify the licence headers - posted by "Roannel Fernández Hernández (JIRA)" <ji...@apache.org> on 2019/01/17 16:04:00 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2687) Regex for reading title from Content-Disposition is wrong - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/18 09:08:00 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-2687) Regex for reading title from Content-Disposition is wrong - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2019/01/18 10:39:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2678) Allow for per-host configurable protocol plugin - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2019/01/18 12:29:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2653) ProtocolFactory.getProtocol(url) creates separate plugin instances for http/https - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/18 15:17:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2663) Improve index-jexl-filter syntax for scripts - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/18 15:25:00 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2663) Improve index-jexl-filter syntax for scripts - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/18 15:25:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2680) Documentation: https supported by multiple protocol plugins not only httpclient - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/18 15:27:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2680) Documentation: https supported by multiple protocol plugins not only httpclient - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/18 15:27:00 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3601 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/18 15:44:10 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3602 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/18 15:55:41 UTC, 0 replies.
- [Nutch Wiki] Update of "IndexWriters" by RoannelFernandez - posted by Apache Wiki <wi...@apache.org> on 2019/01/20 22:57:52 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2629) Documentation for CSV Index Writer - posted by "Roannel Fernández Hernández (JIRA)" <ji...@apache.org> on 2019/01/21 00:03:00 UTC, 0 replies.
- [Nutch Wiki] Update of "IndexWriters" by SebastianNagel - posted by Apache Wiki <wi...@apache.org> on 2019/01/21 11:03:33 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2629) Documentation for CSV Index Writer - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/21 11:05:00 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3603 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/21 14:44:04 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2682) Upgrade to Tika 1.20 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/21 15:38:00 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3604 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/21 15:44:14 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3605 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/21 15:44:46 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3606 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/21 15:45:35 UTC, 0 replies.
- [jira] [Created] (NUTCH-2689) Speed up urlfilter-regex and urlfilter-automaton - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/22 13:46:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2689) Speed up urlfilter-regex and urlfilter-automaton - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/22 14:14:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2689) Speed up urlfilter-regex and urlfilter-automaton - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/22 14:14:00 UTC, 4 replies.
- [jira] [Resolved] (NUTCH-2686) Separate field for mime types mapped by index-more plugin - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/22 15:45:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2690) Configurable and fast URL filter - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/22 15:49:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2690) Configurable and fast URL filter - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/22 15:52:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2691) Improve logging from scoring-depth plugin - posted by "Yossi Tamari (JIRA)" <ji...@apache.org> on 2019/01/22 15:57:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2598) URLNormalizerChecker fails on invalid URLs in input - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/22 16:06:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2691) Improve logging from scoring-depth plugin - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/22 16:06:00 UTC, 3 replies.
- [jira] [Updated] (NUTCH-2691) Improve logging from scoring-depth plugin - posted by "Yossi Tamari (JIRA)" <ji...@apache.org> on 2019/01/22 16:25:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2685) Add README.md file to all exchange plugins - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/22 16:27:00 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3607 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/22 16:44:12 UTC, 0 replies.
- [jira] [Created] (NUTCH-2692) Subcollection to support case-insensitive white and black lists - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2019/01/28 09:41:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2692) Subcollection to support case-insensitive white and black lists - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2019/01/28 11:08:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2691) Improve logging from scoring-depth plugin - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/29 10:20:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2689) Speed up urlfilter-regex and urlfilter-automaton - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/01/29 10:33:00 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3608 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/29 10:44:12 UTC, 0 replies.