- [jira] [Resolved] (NUTCH-2423) Update contributor info page - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/01 07:01:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2053) Uncessary dependencies included in ivy.xml (post NUTCH-2038) - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/01 07:16:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-2053) Uncessary dependencies included in ivy.xml (post NUTCH-2038) - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/01 07:16:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1947) Overhaul o.a.n.parse.OutlinkExtractor.java - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/01 07:21:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1947) Overhaul o.a.n.parse.OutlinkExtractor.java - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/01 07:21:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1635) New crawldb sometimes ends up in current - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/01 07:32:00 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1374) Workaround for license headers - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/01 07:40:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1373) Implement consistent execution of normalising and filtering in Generator - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/01 07:41:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1373) Implement consistent execution of normalising and filtering in Generator - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/01 07:41:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-3011) HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx) - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/01 12:08:00 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel opened a new pull request, #786: NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx) - posted by "sebastian-nagel (via GitHub)" <gi...@apache.org> on 2023/10/01 12:20:41 UTC, 0 replies.
- [jira] [Commented] (NUTCH-3011) HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx) - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/01 12:21:00 UTC, 3 replies.
- Re: [PR] NUTCH-3010 Injector: count unique number of injected URLs [nutch] - posted by "sebastian-nagel (via GitHub)" <gi...@apache.org> on 2023/10/02 09:17:32 UTC, 1 replies.
- [jira] [Commented] (NUTCH-3010) Injector: count unique number of injected URLs - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/02 09:18:00 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-3010) Injector: count unique number of injected URLs - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/02 09:19:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1635) New crawldb sometimes ends up in current - posted by "Markus Jelsma (Jira)" <ji...@apache.org> on 2023/10/02 09:32:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.9.0 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2023/10/02 15:51:00 UTC, 8 replies.
- [jira] [Comment Edited] (NUTCH-2959) Upgrade to Apache Tika 2.9.0 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2023/10/02 15:52:00 UTC, 0 replies.
- Re: [PR] NUTCH-2897 Do not supress deprecated API warnings [nutch] - posted by "sebastian-nagel (via GitHub)" <gi...@apache.org> on 2023/10/03 10:25:33 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2897) Do not supress deprecated API warnings - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/03 10:26:00 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2897) Do not supress deprecated API warnings - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/03 10:27:00 UTC, 0 replies.
- Re: [PR] NUTCH-2853 bin/nutch: remove deprecated commands solrindex, solrdedup, solrclean [nutch] - posted by "sebastian-nagel (via GitHub)" <gi...@apache.org> on 2023/10/03 10:28:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2853) bin/nutch: remove deprecated commands solrindex, solrdedup, solrclean - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/03 10:29:00 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2853) bin/nutch: remove deprecated commands solrindex, solrdedup, solrclean - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/03 10:29:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2938) Use Any23's RepositoryWriter to write structured data to Rdf4j repository - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/03 10:32:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2938) Use Any23's RepositoryWriter to write structured data to Rdf4j repository - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/03 10:35:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-2938) Use Any23's RepositoryWriter to write structured data to Rdf4j repository - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/03 10:35:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1130) JUnit test for Any23 RDF plugin - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/03 10:36:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1130) JUnit test for Any23 RDF plugin - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/03 10:36:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-475) Adaptive crawl delay - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/03 11:17:00 UTC, 0 replies.
- Re: [PR] NUTCH-2959 -- upgrade Tika to 2.9.0 [nutch] - posted by "tballison (via GitHub)" <gi...@apache.org> on 2023/10/03 13:55:34 UTC, 4 replies.
- Re: [PR] NUTCH-3002 Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive [nutch] - posted by "jnioche (via GitHub)" <gi...@apache.org> on 2023/10/06 11:07:30 UTC, 1 replies.
- Re: [PR] NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx) [nutch] - posted by "jnioche (via GitHub)" <gi...@apache.org> on 2023/10/06 11:07:55 UTC, 1 replies.
- [jira] [Commented] (NUTCH-3002) Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/06 11:08:00 UTC, 2 replies.
- Re: [PR] NUTCH-2990 HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309 [nutch] - posted by "jnioche (via GitHub)" <gi...@apache.org> on 2023/10/06 11:10:36 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2990) HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309 - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/06 11:11:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-3012) SegmentReader when dumping with option -recode: NPE on documents without charset defined - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/09 06:12:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-3012) SegmentReader when dumping with option -recode: NPE on unparsed documents - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/09 08:24:00 UTC, 1 replies.
- [PR] NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed documents [nutch] - posted by "sebastian-nagel (via GitHub)" <gi...@apache.org> on 2023/10/09 08:29:41 UTC, 1 replies.
- [jira] [Commented] (NUTCH-3012) SegmentReader when dumping with option -recode: NPE on unparsed documents - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/09 08:30:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/21 03:31:00 UTC, 0 replies.
- [PR] NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic [nutch] - posted by "lewismc (via GitHub)" <gi...@apache.org> on 2023/10/21 05:29:09 UTC, 3 replies.
- [jira] [Commented] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/21 05:30:00 UTC, 4 replies.
- [jira] [Work started] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/21 05:36:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-3014) Standardize NutchJob job names - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/21 06:02:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-3014) Standardize NutchJob job names - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/21 13:45:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-3002) Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/21 13:47:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-3002) Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/21 13:47:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-3006) Downgrade Tika dependency to 2.2.1 (core and parse-tika) - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/21 13:48:00 UTC, 0 replies.
- Re: [PR] NUTCH-3009 Upgrade to Hadoop 3.3.6 [nutch] - posted by "sebastian-nagel (via GitHub)" <gi...@apache.org> on 2023/10/21 13:49:10 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-3009) Upgrade to Hadoop 3.3.6 - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/21 13:50:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-3009) Upgrade to Hadoop 3.3.6 - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/21 13:50:00 UTC, 1 replies.
- [jira] [Assigned] (NUTCH-3009) Upgrade to Hadoop 3.3.6 - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/21 13:50:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2990) HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309 - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/21 13:54:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-3011) HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx) - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/21 14:21:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-3012) SegmentReader when dumping with option -recode: NPE on unparsed documents - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/21 14:22:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/21 18:11:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/21 18:11:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-3014) Standardize Job names - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/22 17:49:00 UTC, 2 replies.
- [PR] NUTCH-3014 Standardize Job names [nutch] - posted by "lewismc (via GitHub)" <gi...@apache.org> on 2023/10/22 18:03:55 UTC, 1 replies.
- [jira] [Commented] (NUTCH-3014) Standardize Job names - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/22 18:04:00 UTC, 1 replies.
- [jira] [Created] (NUTCH-3015) Add more CI steps to GitHub master-build.yml - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/22 19:28:00 UTC, 0 replies.
- [PR] NUTCH-3015 Add more CI steps to GitHub master-build.yml [nutch] - posted by "lewismc (via GitHub)" <gi...@apache.org> on 2023/10/23 02:42:12 UTC, 3 replies.
- [jira] [Commented] (NUTCH-3015) Add more CI steps to GitHub master-build.yml - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/23 02:43:00 UTC, 4 replies.
- Nutch codebase formatting - posted by lewis john mcgibbney <le...@apache.org> on 2023/10/23 19:28:45 UTC, 2 replies.
- [jira] [Work started] (NUTCH-3014) Standardize Job names - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/23 20:08:00 UTC, 0 replies.
- [jira] [Work started] (NUTCH-3015) Add more CI steps to GitHub master-build.yml - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/23 20:09:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2887) Migrate to JUnit 5 Jupiter - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/24 00:22:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-3016) Upgrade Apache Ivy to 2.5.2 - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/24 14:54:00 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2887) Migrate to JUnit 5 Jupiter - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/24 16:38:00 UTC, 0 replies.
- [PR] NUTCH-2887 Migrate to JUnit 5 Jupiter [nutch] - posted by "lewismc (via GitHub)" <gi...@apache.org> on 2023/10/24 16:39:10 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2887) Migrate to JUnit 5 Jupiter - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/24 16:40:00 UTC, 1 replies.
- [jira] [Work stopped] (NUTCH-3015) Add more CI steps to GitHub master-build.yml - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/27 22:05:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-3015) Add more CI steps to GitHub master-build.yml - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/27 22:05:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-3015) Add more CI steps to GitHub master-build.yml - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2023/10/27 22:05:00 UTC, 0 replies.
- Build failed in Jenkins: Nutch » Nutch-trunk #135 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2023/10/27 22:26:46 UTC, 0 replies.
- Call for Presentations now open: Community over Code EU 2024 - posted by Ryan Skraba <rs...@apache.org> on 2023/10/30 17:06:16 UTC, 0 replies.
- [jira] [Created] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input - posted by "Julien Nioche (Jira)" <ji...@apache.org> on 2023/10/30 17:18:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2023/10/30 17:58:00 UTC, 1 replies.
- [PR] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input [NUTCH-3017] [nutch] - posted by "jnioche (via GitHub)" <gi...@apache.org> on 2023/10/30 18:22:31 UTC, 2 replies.
- [jira] [Commented] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/30 18:23:00 UTC, 5 replies.
- [PR] [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 [nutch] - posted by "jnioche (via GitHub)" <gi...@apache.org> on 2023/10/30 18:47:42 UTC, 2 replies.
- [jira] [Created] (NUTCH-3018) Consider pooling remote webdrivers for Selenium? - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2023/10/31 18:23:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2959) Upgrade to Apache Tika 2.9.0 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2023/10/31 18:24:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-3019) Upgrade to Apache Tika 2.9.1 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2023/10/31 18:25:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-3019) Upgrade to Apache Tika 2.9.1 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2023/10/31 18:26:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-3018) Consider pooling remote webdrivers for Selenium? - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2023/10/31 18:32:00 UTC, 1 replies.
- [jira] [Commented] (NUTCH-3018) Consider pooling remote webdrivers for Selenium? - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2023/10/31 18:40:00 UTC, 1 replies.
- [jira] [Comment Edited] (NUTCH-3018) Consider pooling remote webdrivers for Selenium? - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2023/10/31 18:47:00 UTC, 1 replies.