- [jira] [Commented] (NUTCH-2278) Handle alpha-2 language codes consistently - posted by "Fengtan (Jira)" <ji...@apache.org> on 2022/01/02 16:47:00 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2923) Add Job Id in Job Failure messages - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/02 19:17:00 UTC, 21 replies.
- Addressing Nutch use of CMS WAS: [IMPORTANT] - ci.apache.org and CMS Shutdown end of January 2022 - posted by lewis john mcgibbney <le...@apache.org> on 2022/01/03 00:02:42 UTC, 3 replies.
- [jira] [Comment Edited] (NUTCH-2923) Add Job Id in Job Failure messages - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/04 05:12:00 UTC, 0 replies.
- [GitHub] [nutch] lewismc commented on pull request #717: WIP NUTCH-2919 Upgrade to Tika 2.2.0 - posted by GitBox <gi...@apache.org> on 2022/01/04 16:11:29 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2919) Upgrade to Tika 2.2.0 - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/04 16:12:00 UTC, 5 replies.
- [jira] [Created] (NUTCH-2925) Secure the Nutch REST API using Apache Shiro - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/04 17:02:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2925) Secure the Nutch REST API using Apache Shiro - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/04 17:05:00 UTC, 1 reply.
- [jira] [Created] (NUTCH-2926) Implement persistent storage for Nutch Webserver resources - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/04 18:05:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2926) Implement persistent storage for Nutch Webserver resources - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/04 18:06:00 UTC, 1 reply.
- [GitHub] [nutch] kennethmcfarland commented on pull request #717: WIP NUTCH-2919 Upgrade to Tika 2.2.0 - posted by GitBox <gi...@apache.org> on 2022/01/04 21:34:32 UTC, 0 replies.
- [GitHub] [nutch] prakharchaube opened a new pull request #721: NUTCH-2923: Added JobId in Job Failure logs - posted by GitBox <gi...@apache.org> on 2022/01/05 17:54:06 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2923) Add Job Id in Job Failure messages - posted by "Prakhar Chaube (Jira)" <ji...@apache.org> on 2022/01/05 17:56:00 UTC, 1 reply.
- [GitHub] [nutch] lewismc commented on pull request #720: NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers - posted by GitBox <gi...@apache.org> on 2022/01/05 19:32:09 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2429) Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/05 19:33:00 UTC, 4 replies.
- [GitHub] [nutch] lewismc commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs - posted by GitBox <gi...@apache.org> on 2022/01/05 19:38:59 UTC, 3 replies.
- [GitHub] [nutch] prakharchaube commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs - posted by GitBox <gi...@apache.org> on 2022/01/05 20:07:37 UTC, 5 replies.
- [jira] [Commented] (NUTCH-2838) Apache Tez integration - posted by "László Bodor (Jira)" <ji...@apache.org> on 2022/01/07 18:31:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2839) Implement Tez counters in Injector job - posted by "László Bodor (Jira)" <ji...@apache.org> on 2022/01/07 19:15:00 UTC, 3 replies.
- [GitHub] [nutch] lewismc merged pull request #720: NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers - posted by GitBox <gi...@apache.org> on 2022/01/08 04:08:08 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-2429) Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/08 04:09:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2429) Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/08 04:09:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2856) Implement a protocol-smb plugin based on hierynomus/smbj - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/08 04:10:00 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2839) Implement Tez counters in Injector job - posted by "László Bodor (Jira)" <ji...@apache.org> on 2022/01/08 12:09:00 UTC, 4 replies.
- [GitHub] [nutch] sebastian-nagel commented on pull request #720: NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers - posted by GitBox <gi...@apache.org> on 2022/01/08 17:50:31 UTC, 0 replies.
- [jira] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins - posted by "Karl-Philipp Richter (Jira)" <ji...@apache.org> on 2022/01/08 20:54:00 UTC, 0 replies.
- [GitHub] [nutch] lewismc commented on pull request #703: NUTCH-2903 indexer-elastic: allow to connect to Elastic server via HTTPS - posted by GitBox <gi...@apache.org> on 2022/01/08 21:39:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2903) Unable to Connect to Elasticsearch over HTTPS - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/08 21:40:00 UTC, 4 replies.
- [GitHub] [nutch] lewismc edited a comment on pull request #703: NUTCH-2903 indexer-elastic: allow to connect to Elastic server via HTTPS - posted by GitBox <gi...@apache.org> on 2022/01/08 21:40:44 UTC, 0 replies.
- [jira] [Work stopped] (NUTCH-2839) Implement Tez counters in Injector job - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/08 22:25:00 UTC, 0 replies.
- [ANNOUNCE] Apache Any23 2.6 Release - posted by lewis john mcgibbney <le...@apache.org> on 2022/01/09 00:36:33 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel merged pull request #703: NUTCH-2903 indexer-elastic: allow to connect to Elastic server via HTTPS - posted by GitBox <gi...@apache.org> on 2022/01/09 09:46:07 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2903) Unable to Connect to Elasticsearch over HTTPS - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/09 09:55:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2927) indexer-elastic: use Java API client - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/09 10:47:00 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel commented on pull request #703: NUTCH-2903 indexer-elastic: allow to connect to Elastic server via HTTPS - posted by GitBox <gi...@apache.org> on 2022/01/09 10:47:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1999) Add http://nutch.apache.org/robots.txt - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/09 12:43:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2928) Fix favicon of content pages - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/09 12:52:00 UTC, 0 replies.
- [GitHub] [nutch-site] sebastian-nagel opened a new pull request #1: NUTCH-1999 Add /robots.txt to Nutch site - posted by GitBox <gi...@apache.org> on 2022/01/09 12:59:15 UTC, 0 replies.
- Re: !! Join the #nutch Slack channel !! - posted by Sebastian Nagel <wa...@googlemail.com> on 2022/01/09 13:16:38 UTC, 1 reply.
- [GitHub] [nutch] sebastian-nagel commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs - posted by GitBox <gi...@apache.org> on 2022/01/10 11:56:07 UTC, 1 reply.
- [GitHub] [nutch] lewismc commented on pull request #717: NUTCH-2919 Upgrade to Tika 2.2.0 - posted by GitBox <gi...@apache.org> on 2022/01/10 20:26:52 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2919) NUTCH-2919 Upgrade to Tika 2.2.0 and Any23 2.6 - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/10 20:28:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2924) Generate maxCount expr evaluated only once - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/11 13:45:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2929) Fetcher: start threads slowly to avoid that resources are temporarily exhausted - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/11 14:04:00 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel opened a new pull request #722: NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted - posted by GitBox <gi...@apache.org> on 2022/01/11 14:18:06 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2929) Fetcher: start threads slowly to avoid that resources are temporarily exhausted - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/11 14:19:00 UTC, 7 replies.
- [jira] [Created] (NUTCH-2930) Protocol-okhttp: implement IP filter - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/11 15:03:00 UTC, 0 replies.
- [GitHub] [nutch] lewismc commented on pull request #722: NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted - posted by GitBox <gi...@apache.org> on 2022/01/11 23:40:23 UTC, 1 reply.
- [jira] [Created] (NUTCH-2931) Improvements to 1.x REST API - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/12 05:10:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2925) Secure the Nutch REST API using Apache Shiro - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/12 05:11:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2932) Create OpenAPI specification for Nutch 1.x REST API - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/12 05:16:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2933) GET /seed doesn't return previously generated seed lists - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/12 05:25:00 UTC, 0 replies.
- [GitHub] [nutch-site] lewismc commented on pull request #1: NUTCH-1999 Add /robots.txt to Nutch site - posted by GitBox <gi...@apache.org> on 2022/01/12 06:09:41 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel commented on pull request #722: NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted - posted by GitBox <gi...@apache.org> on 2022/01/12 16:51:15 UTC, 0 replies.
- [GitHub] [nutch] lewismc commented on pull request #717: NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6 - posted by GitBox <gi...@apache.org> on 2022/01/13 01:09:49 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2919) NUTCH-2919 Upgrade to Tika 2.2.0 and Any23 2.6 - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/13 01:10:00 UTC, 1 reply.
- [jira] [Created] (NUTCH-2934) Replace Apache Ant build system with Gradle - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/13 18:39:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/13 18:41:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2638) Publish plugins in Maven - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/13 18:42:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2244) Publish Protocol-Interactiveselenium to central maven repo - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/13 18:42:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2901) migrate to maven or gradle - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/13 18:43:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2293) Make the unit tests which requires "plugin.folders" as integration tests - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/13 18:43:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2934) Replace Apache Ant build system with Gradle - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/13 18:45:00 UTC, 0 replies.
- NUTCH-2934 Replace Apache Ant build system with Gradle - posted by lewis john mcgibbney <le...@apache.org> on 2022/01/13 18:59:36 UTC, 0 replies.
- [jira] [Created] (NUTCH-2935) DeduplicationJob: failure on URLs with invalid percent encoding - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/14 09:12:00 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel opened a new pull request #723: NUTCH-2935 DeduplicationJob: failure on URLs with invalid percent encoding - posted by GitBox <gi...@apache.org> on 2022/01/14 09:38:47 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2935) DeduplicationJob: failure on URLs with invalid percent encoding - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/14 09:39:00 UTC, 3 replies.
- [GitHub] [nutch] sebastian-nagel merged pull request #722: NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted - posted by GitBox <gi...@apache.org> on 2022/01/14 09:41:34 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2929) Fetcher: start threads slowly to avoid that resources are temporarily exhausted - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/14 09:43:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2936) Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/14 14:05:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2573) Suspend crawling if robots.txt fails to fetch with 5xx status - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/14 15:57:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2573) Suspend crawling if robots.txt fails to fetch with 5xx status - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/14 15:57:00 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel opened a new pull request #724: NUTCH-2573 Suspend crawling if robots.txt fails to fetch with 5xx status - posted by GitBox <gi...@apache.org> on 2022/01/15 13:16:30 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2573) Suspend crawling if robots.txt fails to fetch with 5xx status - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/15 13:17:00 UTC, 5 replies.
- [jira] [Created] (NUTCH-2937) parse-tika: review dependency exclusions and avoid dependency conflicts in distributed mode - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/15 14:15:00 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel commented on pull request #717: NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6 - posted by GitBox <gi...@apache.org> on 2022/01/15 14:15:33 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2936) Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/15 14:20:00 UTC, 4 replies.
- [jira] [Updated] (NUTCH-2936) Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/15 14:21:00 UTC, 0 replies.
- [GitHub] [nutch] lewismc merged pull request #717: NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6 - posted by GitBox <gi...@apache.org> on 2022/01/15 23:24:37 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2919) NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6 - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/15 23:25:00 UTC, 2 replies.
- [jira] [Updated] (NUTCH-2919) NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6 - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/15 23:25:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2919) NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6 - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/15 23:25:00 UTC, 0 replies.
- [GitHub] [nutch] lewismc commented on a change in pull request #724: NUTCH-2573 Suspend crawling if robots.txt fails to fetch with 5xx status - posted by GitBox <gi...@apache.org> on 2022/01/15 23:54:01 UTC, 0 replies.
- [jira] [Created] (NUTCH-2938) Use Any23's RepositoryWriter to write structured data to Rdf4j repository - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/16 00:44:00 UTC, 0 replies.
- [GitHub] [nutch] lewismc opened a new pull request #725: NUTCH-2938 Use Any23's RepositoryWriter to write structured data to Rdf4j repository - posted by GitBox <gi...@apache.org> on 2022/01/16 01:07:05 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2938) Use Any23's RepositoryWriter to write structured data to Rdf4j repository - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/16 01:08:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2936) Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/16 02:41:00 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2936) Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/16 02:41:00 UTC, 0 replies.
- [GitHub] [nutch] lewismc opened a new pull request #726: NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode - posted by GitBox <gi...@apache.org> on 2022/01/16 04:53:11 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel commented on pull request #724: NUTCH-2573 Suspend crawling if robots.txt fails to fetch with 5xx status - posted by GitBox <gi...@apache.org> on 2022/01/17 18:41:08 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel merged pull request #723: NUTCH-2935 DeduplicationJob: failure on URLs with invalid percent encoding - posted by GitBox <gi...@apache.org> on 2022/01/17 18:57:12 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel commented on pull request #723: NUTCH-2935 DeduplicationJob: failure on URLs with invalid percent encoding - posted by GitBox <gi...@apache.org> on 2022/01/17 18:58:35 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2935) DeduplicationJob: failure on URLs with invalid percent encoding - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/17 19:00:00 UTC, 0 replies.
- [GitHub] [nutch] lewismc commented on pull request #724: NUTCH-2573 Suspend crawling if robots.txt fails to fetch with 5xx status - posted by GitBox <gi...@apache.org> on 2022/01/17 19:41:27 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel merged pull request #724: NUTCH-2573 Suspend crawling if robots.txt fails to fetch with 5xx status - posted by GitBox <gi...@apache.org> on 2022/01/18 07:22:53 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2573) Suspend crawling if robots.txt fails to fetch with 5xx status - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/18 07:25:00 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel commented on a change in pull request #721: NUTCH-2923: Added JobId in Job Failure logs - posted by GitBox <gi...@apache.org> on 2022/01/18 16:49:05 UTC, 0 replies.
- [GitHub] [nutch] prakharchaube commented on a change in pull request #721: NUTCH-2923: Added JobId in Job Failure logs - posted by GitBox <gi...@apache.org> on 2022/01/18 17:11:36 UTC, 1 reply.
- [jira] [Work started] (NUTCH-2925) Secure the Nutch REST API using Apache Shiro - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/19 16:38:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-122) block numbers need a better random number generator - posted by "pankaj kumar singh (Jira)" <ji...@apache.org> on 2022/01/24 10:49:00 UTC, 10 replies.
- [GitHub] [nutch] sebastian-nagel merged pull request #721: NUTCH-2923: Added JobId in Job Failure logs - posted by GitBox <gi...@apache.org> on 2022/01/27 16:04:05 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2923) Add Job Id in Job Failure messages - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/27 16:11:00 UTC, 0 replies.