- [jira] [Created] (NUTCH-2741) Remove ivy/ivy-2.2.0.jar - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 08:58:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1917) index.parse.md, index.content.md and index.db.md should support wildcard - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 11:54:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1403) Add default ScoringFilter for manipulating metadata - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/10/01 12:37:00 UTC, 6 replies.
- [jira] [Resolved] (NUTCH-1220) Upgrade Solr deps - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 12:46:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1805) Remove unnecessary transitive dependencies from Hadoop core - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 12:51:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1035) Tune Solr config for Nutch users - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 12:58:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1076) Solrindex has no documents following bin/nutch solrindex when using protocol-file - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:01:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1176) Fix all javadoc warnings from nightly builds - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:02:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1194) CrawlDB lock should be released earlier - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:17:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1186) FreeGenerator always normalizes - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:22:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:22:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1342) Read time out protocol-http - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:30:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1380) Fetcher reducer not to configure filter/normalizers - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:45:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1380) Fetcher reducer not to configure filter/normalizers - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:45:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1749) Optionally exclude title from content field - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:46:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1559) parse-metatags duplicates extracted metatags in combination with parse-tika - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:47:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2735) Update the indexer-solr documentation about the schema.xml usage - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:48:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2735) Update the indexer-solr documentation about the schema.xml usage - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:48:00 UTC, 2 replies.
- [jira] [Updated] (NUTCH-2248) CSS parser plugin - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:51:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2278) Handle alpha-2 language codes consistently - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:51:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1403) Add default ScoringFilter for manipulating metadata - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:51:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2511) SitemapProcessor limited by http.content.limit - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:52:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2419) Domain blacklist URL filter does not respect command-line override for file - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:52:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2506) host is not available for filtering on the JEXL indexing plugin - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:52:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2525) Metadata indexer cannot handle uppercase parse metadata - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:53:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2309) Scoring-Similarity Plugin raises NullPointerException when error occurs in fetching URL - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:53:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2353) Create seed file with metadata using the REST API - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 13:53:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2737) Generator: count and log reason of rejections during selection - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 14:20:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2738) Generator: document property generate.restrict.status - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 14:20:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2737) Generator: count and log reason of rejections during selection - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/10/01 14:20:00 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-2740) Generator: generate.max.count overflow not logged - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 14:21:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2740) Generator: generate.max.count overflow not logged - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 14:22:00 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2279) LinkRank fails when using Hadoop MR output compression - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/10/01 14:24:00 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-2279) LinkRank fails when using Hadoop MR output compression - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/01 14:25:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2738) Generator: document property generate.restrict.status - posted by "Hudson (Jira)" <ji...@apache.org> on 2019/10/01 15:00:00 UTC, 0 replies.
- ApacheCon North America 2020, project participation - posted by Rich Bowen <rb...@apache.org> on 2019/10/01 16:35:16 UTC, 0 replies.
- Re: [VOTE] Release Apache Nutch 2.4 RC#1 - posted by Jorge Betancourt <be...@gmail.com> on 2019/10/02 15:12:22 UTC, 2 replies.
- [VOTE] Release Apache Nutch 1.16 RC#1 - posted by Sebastian Nagel <wa...@googlemail.com> on 2019/10/02 17:54:59 UTC, 6 replies.
- [jira] [Created] (NUTCH-2742) Unable to parse specific pdf file - posted by "Mark Aragon (Jira)" <ji...@apache.org> on 2019/10/06 13:16:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2742) Unable to parse specific pdf file - posted by "Mark Aragon (Jira)" <ji...@apache.org> on 2019/10/06 13:17:00 UTC, 5 replies.
- [jira] [Commented] (NUTCH-2742) Unable to parse specific pdf file - posted by "Yossi Tamari (Jira)" <ji...@apache.org> on 2019/10/06 13:38:00 UTC, 1 reply.
- [jira] [Closed] (NUTCH-2742) Unable to parse specific pdf file - posted by "Mark Aragon (Jira)" <ji...@apache.org> on 2019/10/06 14:36:00 UTC, 1 reply.
- [jira] [Issue Comment Deleted] (NUTCH-2742) Unable to parse specific pdf file - posted by "M A (Jira)" <ji...@apache.org> on 2019/10/06 15:44:00 UTC, 1 reply.
- [jira] [Reopened] (NUTCH-2742) Unable to parse specific pdf file - posted by "M A (Jira)" <ji...@apache.org> on 2019/10/06 15:44:00 UTC, 0 replies.
- [RESULT] was [VOTE] Release Apache Nutch 2.4 RC#1 - posted by Sebastian Nagel <wa...@googlemail.com> on 2019/10/08 11:45:47 UTC, 0 replies.
- [RESULT] was [VOTE] Release Apache Nutch 1.16 RC#1 - posted by Sebastian Nagel <wa...@googlemail.com> on 2019/10/08 12:00:07 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2511) SitemapProcessor limited by http.content.limit - posted by "Yossi Tamari (Jira)" <ji...@apache.org> on 2019/10/10 15:46:00 UTC, 1 reply.
- [ANNOUNCE] Apache Nutch 2.4 Release - posted by Sebastian Nagel <sn...@apache.org> on 2019/10/11 14:59:58 UTC, 0 replies.
- [ANNOUNCE] Apache Nutch 1.16 Release - posted by Sebastian Nagel <sn...@apache.org> on 2019/10/11 15:03:07 UTC, 0 replies.
- [jira] [Created] (NUTCH-2743) Add list of Nutch properties (nutch-default.xml) to documentation - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 15:26:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2743) Add list of Nutch properties (nutch-default.xml) to documentation - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 15:28:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-2360) HTTP Basic Authentication in SolrIndexerPlugin is gone - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 15:36:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1126) JUnit test for urlfilter-prefix - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 15:37:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1522) Upgrade to Tika 1.3 - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 15:37:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1591) Incorrect conversion of ByteBuffer to String - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 15:37:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1475) Index-More Plugin -- A better fall back value for date field - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 15:37:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1578) Upgrade to Hadoop 1.2.0 - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 15:37:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2744) CrawlDbReader: improved reporting of syntactic errors in Jexl expression - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 19:13:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2669) Reliable solution for javax.ws packaging.type - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 19:34:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2671) Upgrade ant ivy library - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 19:37:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 19:38:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2290) Update licenses of bundled libraries - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/11 19:38:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2722) Fetch dependencies via https - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2019/10/13 22:30:00 UTC, 0 replies.
- [jira] [Work stopped] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2019/10/13 22:31:00 UTC, 0 replies.
- [jira] [Work stopped] (NUTCH-2307) Implement Missing NutchServer REST API Tests - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2019/10/13 22:32:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2307) Implement Missing NutchServer REST API Tests - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2019/10/13 22:32:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2133) Transfer Selenium Documentation to Wiki - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/14 11:32:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2133) Transfer Selenium Documentation to WIki - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/14 11:32:00 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-2133) Transfer Selenium Documentation to WIki - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/14 11:32:00 UTC, 0 replies.
- [SECURITY] Nutch 2.3.1 affected by downstream dependency CVE-2016-6809 - posted by lewis john mcgibbney <le...@apache.org> on 2019/10/14 22:26:45 UTC, 0 replies.
- [jira] [Created] (NUTCH-2745) Solr schema.xml not shipped in binary release - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/15 11:05:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2511) SitemapProcessor limited by http.content.limit - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/15 12:04:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2746) Basic URL normalizer to normalize Unicode domain names - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/15 14:04:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2746) Basic URL normalizer to normalize Unicode domain names - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/10/15 14:10:00 UTC, 1 reply.
- [jira] [Updated] (NUTCH-1559) parse-metatags duplicates extracted metatags - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/17 14:03:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1559) parse-metatags duplicates extracted metatags - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/17 14:07:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1559) parse-metatags duplicates extracted metatags - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/17 14:08:00 UTC, 6 replies.
- [jira] [Created] (NUTCH-2747) Replace remaining o.a.commons.logging by org.slf4j - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/17 18:34:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2747) Replace remaining o.a.commons.logging by org.slf4j - posted by "Shashanka Balakuntala Srinivasa (Jira)" <ji...@apache.org> on 2019/10/18 03:34:00 UTC, 5 replies.
- [jira] [Updated] (NUTCH-2747) Replace remaining o.a.commons.logging by org.slf4j - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/18 07:27:00 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2747) Replace remaining o.a.commons.logging by org.slf4j - posted by "Shashanka Balakuntala Srinivasa (Jira)" <ji...@apache.org> on 2019/10/18 10:12:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2748) Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/18 14:00:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2748) Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/18 15:37:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2749) Fetcher and scoring-opic: transfer score to redirects - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/18 16:21:02 UTC, 0 replies.
- [jira] [Created] (NUTCH-2750) improve CrawlDbReader & LinkDbReader reader handling - posted by "Jurian Broertjes (Jira)" <ji...@apache.org> on 2019/10/24 12:03:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2750) improve CrawlDbReader & LinkDbReader reader handling - posted by "Jurian Broertjes (Jira)" <ji...@apache.org> on 2019/10/24 12:15:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2750) improve CrawlDbReader & LinkDbReader reader handling - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/10/24 14:11:00 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2677) Update Jest client in indexer-elastic-rest plugin - posted by "Shashanka Balakuntala Srinivasa (Jira)" <ji...@apache.org> on 2019/10/30 09:06:00 UTC, 1 reply.
- [jira] [Comment Edited] (NUTCH-2677) Update Jest client in indexer-elastic-rest plugin - posted by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2019/10/30 16:59:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2735) Update the indexer-solr documentation about the schema.xml usage - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/31 12:02:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2733) protocol-okhttp: add support for Brotli compression (Content-Encoding) - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/31 12:21:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2733) protocol-okhttp: add support for Brotli compression (Content-Encoding) - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/31 12:25:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2671) Upgrade ant ivy library - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/10/31 12:39:00 UTC, 0 replies.