- [jira] [Created] (NUTCH-2786) TrustManager methods do not have certificate validation logic - posted by "Md Mahir Asef Kabir (Jira)" <ji...@apache.org> on 2020/05/04 03:21:00 UTC, 0 replies.
- [GitHub] [nutch] AthenaXiao opened a new pull request #524: [NUTCH-2786] add a warning for insecure TrustManager - posted by GitBox <gi...@apache.org> on 2020/05/04 16:05:25 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2786) TrustManager methods do not have certificate validation logic - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/05/04 16:06:00 UTC, 1 reply.
- [jira] [Updated] (NUTCH-2786) TrustManager methods do not have certificate validation logic - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/04 20:06:00 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-2434) Add methods to reset parameters HTMLMetaTags - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/05 09:32:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1652) Avoid instanciation of MimeUtil for each Content object created - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/05 09:45:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1945) Test for XLSX parser - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/05 09:53:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1945) Test for XLSX parser - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/05 09:53:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2434) Add methods to reset parameters HTMLMetaTags - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/05/05 09:55:00 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel opened a new pull request #525: NUTCH-1945 Test for XLSX parser - posted by GitBox <gi...@apache.org> on 2020/05/05 11:31:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1945) Test for XLSX parser - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/05/05 11:32:00 UTC, 1 reply.
- [jira] [Updated] (NUTCH-1806) Delegate processing of URL domains to crawler commons - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/05 11:33:00 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel commented on pull request #514: NUTCH-1194 Generator: CrawlDB lock should be released earlier - posted by GitBox <gi...@apache.org> on 2020/05/05 11:38:27 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1194) Generator: CrawlDB lock should be released earlier - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/05/05 11:39:00 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1194) Generator: CrawlDB lock should be released earlier - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/05 12:11:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2785) FreeGenerator: command-line option to define number of generated fetch lists - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/05 13:58:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2002) ParserChecker and IndexingFiltersChecker to check robots.txt - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/05 14:00:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2753) Add -listen option to command-line help of CrawlDbReader and LinkDbReader - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/05 14:01:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2753) Add -listen option to command-line help of CrawlDbReader and LinkDbReader - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/05 14:01:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2758) Add plugin READMEs to binary release packages - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/05 14:02:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2758) Add plugin READMEs to binary release packages - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/05/05 14:55:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2753) Add -listen option to command-line help of CrawlDbReader and LinkDbReader - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/05/05 14:55:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2785) FreeGenerator: command-line option to define number of generated fetch lists - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/05/05 14:55:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2002) ParserChecker and IndexingFiltersChecker to check robots.txt - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/05/05 14:55:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2419) Domain blacklist URL filter does not respect command-line override for file - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/12 13:12:00 UTC, 1 reply.
- [jira] [Updated] (NUTCH-2419) Some URL filters and normalizers do not respect command-line override for rule file - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/12 13:20:00 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1945) Test for XLSX parser - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/12 13:36:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2318) Text extraction in HtmlParser adds too much whitespace. - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/12 17:11:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2419) Some URL filters and normalizers do not respect command-line override for rule file - posted by "Markus Jelsma (Jira)" <ji...@apache.org> on 2020/05/13 10:27:00 UTC, 4 replies.
- [GitHub] [nutch] sebastian-nagel merged pull request #526: NUTCH-2419 Some URL filters and normalizers do not respect command-line override for rule file - posted by GitBox <gi...@apache.org> on 2020/05/14 15:43:26 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2419) Some URL filters and normalizers do not respect command-line override for rule file - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/14 15:44:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2596) Upgrade from org.mortbay.jetty to org.eclipse.jetty - posted by "Shashanka Balakuntala Srinivasa (Jira)" <ji...@apache.org> on 2020/05/15 15:47:00 UTC, 1 reply.
- [GitHub] [nutch] sebastian-nagel opened a new pull request #527: NUTCH-2496 Speed up link inversion step in crawling script - posted by GitBox <gi...@apache.org> on 2020/05/15 17:22:27 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2496) Speed up link inversion step in crawling script - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/05/15 17:23:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2496) Speed up link inversion step in crawling script - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/15 17:26:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1971) The crawldb.url.filters property is not present in any configuration file - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/05/15 17:31:00 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel opened a new pull request #528: NUTCH-2720 ROBOTS metatag ignored when capitalized - posted by GitBox <gi...@apache.org> on 2020/05/15 21:18:44 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2720) ROBOTS metatag ignored when capitalized - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/05/15 21:19:00 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2567) parse-metatags writes all meta tags twice - posted by "Sandro Osswald (Jira)" <ji...@apache.org> on 2020/05/18 12:54:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2567) parse-metatags writes all meta tags twice - posted by "Sandro Osswald (Jira)" <ji...@apache.org> on 2020/05/18 12:54:00 UTC, 0 replies.