- [jira] [Created] (NUTCH-2787) CrawlDb JSON dump does not export metadata primitive data types correctly - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/04 13:29:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2787) CrawlDb JSON dump does not export metadata primitive data types correctly - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/04 13:32:00 UTC, 3 replies.
- [jira] [Updated] (NUTCH-2567) parse-metatags writes all meta tags twice - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/08 12:07:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2567) parse-metatags writes all meta tags twice - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/08 12:07:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2788) ParseData: improve presentation of Metadata in method toString() - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 09:29:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2788) ParseData: improve presentation of Metadata in method toString() - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 09:29:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2567) parse-metatags writes all meta tags twice - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 09:40:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2788) ParseData: improve presentation of Metadata in method toString() - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 09:43:00 UTC, 1 reply.
- [jira] [Created] (NUTCH-2789) Docker README: update links to point to cwiki - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 10:06:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2789) Documentation: update links to point to cwiki - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 10:09:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2789) Documentation: update links to point to cwiki - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 10:09:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2789) Documentation: update links to point to cwiki - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 10:09:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2789) Documentation: update links to point to cwiki - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 10:30:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2720) ROBOTS metatag ignored when capitalized - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 10:46:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2496) Speed up link inversion step in crawling script - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 10:48:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2496) Speed up link inversion step in crawling script - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/06/09 12:00:00 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2720) ROBOTS metatag ignored when capitalized - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/06/09 12:00:11 UTC, 1 reply.
- [jira] [Assigned] (NUTCH-2787) CrawlDb JSON dump does not export metadata primitive data types correctly - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 12:11:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2787) CrawlDb JSON dump does not export metadata primitive data types correctly - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/09 12:11:00 UTC, 4 replies.
- [jira] [Created] (NUTCH-2790) CSVIndexWriter does not escape leading quotes properly - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/09 14:53:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2790) CSVIndexWriter does not escape leading quotes properly - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/09 15:21:00 UTC, 4 replies.
- [jira] [Created] (NUTCH-2791) domainstats, protocolstats and crawlcomplete do not handle GCS URLs - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/09 15:36:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2791) domainstats, protocolstats and crawlcomplete do not handle GCS URLs - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/09 15:36:00 UTC, 2 replies.
- [GitHub] [nutch] sebastian-nagel opened a new pull request #530: NUTCH-2789 Documentation: update links to point to cwiki - posted by GitBox <gi...@apache.org> on 2020/06/09 15:50:21 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2789) Documentation: update links to point to cwiki - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/06/09 15:51:01 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2791) domainstats, protocolstats and crawlcomplete do not handle GCS URLs - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/09 15:56:00 UTC, 7 replies.
- [GitHub] [nutch] sebastian-nagel opened a new pull request #529: NUTCH-2788 ParseData: improve presentation of Metadata in method toString() - posted by GitBox <gi...@apache.org> on 2020/06/09 16:02:32 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2788) ParseData: improve presentation of Metadata in method toString() - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/06/09 16:03:01 UTC, 4 replies.
- [GitHub] [nutch] sebastian-nagel opened a new pull request #531: NUTCH-2787 CrawlDb JSON dump does not export metadata primitive data types correctly - posted by GitBox <gi...@apache.org> on 2020/06/09 16:04:11 UTC, 0 replies.
- [GitHub] [nutch] jorgelbg commented on pull request #529: NUTCH-2788 ParseData: improve presentation of Metadata in method toString() - posted by GitBox <gi...@apache.org> on 2020/06/09 16:09:25 UTC, 0 replies.
- [GitHub] [nutch] lewismc commented on pull request #530: NUTCH-2789 Documentation: update links to point to cwiki - posted by GitBox <gi...@apache.org> on 2020/06/09 16:25:46 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel merged pull request #528: NUTCH-2720 ROBOTS metatag ignored when capitalized - posted by GitBox <gi...@apache.org> on 2020/06/09 16:25:48 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel merged pull request #527: NUTCH-2496 Speed up link inversion step in crawling script - posted by GitBox <gi...@apache.org> on 2020/06/09 16:27:48 UTC, 0 replies.
- [GitHub] [nutch] pmezard opened a new pull request #532: NUTCH-2790 indexer-csv: escape field leading quote character - posted by GitBox <gi...@apache.org> on 2020/06/09 16:30:40 UTC, 0 replies.
- [GitHub] [nutch] lewismc commented on pull request #529: NUTCH-2788 ParseData: improve presentation of Metadata in method toString() - posted by GitBox <gi...@apache.org> on 2020/06/09 16:35:24 UTC, 0 replies.
- [GitHub] [nutch] lewismc commented on pull request #531: NUTCH-2787 CrawlDb JSON dump does not export metadata primitive data types correctly - posted by GitBox <gi...@apache.org> on 2020/06/09 16:45:14 UTC, 0 replies.
- [GitHub] [nutch] pmezard opened a new pull request #533: NUTCH-2791 Handle GCS URLs in stats commands - posted by GitBox <gi...@apache.org> on 2020/06/09 16:46:59 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel commented on pull request #532: NUTCH-2790 indexer-csv: escape field leading quote character - posted by GitBox <gi...@apache.org> on 2020/06/09 20:53:45 UTC, 0 replies.
- [PROPOSAL] Replace whitelist blacklist with allowlist denylist - posted by lewis john mcgibbney <le...@apache.org> on 2020/06/09 22:20:48 UTC, 2 replies.
- Re: [EXTERNAL] [PROPOSAL] Replace whitelist blacklist with allowlist denylist - posted by Chris Mattmann <ma...@apache.org> on 2020/06/09 22:37:28 UTC, 1 reply.
- [GitHub] [nutch] mfeltscher commented on a change in pull request #279: NUTCH-2501: Take NUTCH_HEAPSIZE into account when crawling using crawl script - posted by GitBox <gi...@apache.org> on 2020/06/09 23:21:57 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2501) allow to set Java heap size when using crawl script in distributed mode - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/06/09 23:22:00 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2755) Remove obsolete plugin indexer-elastic-rest - posted by "Moreno Feltscher (Jira)" <ji...@apache.org> on 2020/06/09 23:39:00 UTC, 1 reply.
- [GitHub] [nutch] pmezard commented on pull request #531: NUTCH-2787 CrawlDb JSON dump does not export metadata primitive data types correctly - posted by GitBox <gi...@apache.org> on 2020/06/10 06:52:59 UTC, 0 replies.
- [jira] [Created] (NUTCH-2792) nutch index -params is only used in Solr indexer - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/10 08:23:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2792) nutch index -params is only used in Solr indexer - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/10 08:50:00 UTC, 4 replies.
- [jira] [Updated] (NUTCH-2792) nutch index -params is only used in Solr indexer - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/10 08:51:00 UTC, 1 reply.
- [jira] [Created] (NUTCH-2793) CSV indexer does not work in distributed mode - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/10 12:02:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2793) CSV indexer does not work in distributed mode - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/10 12:12:00 UTC, 2 replies.
- [GitHub] [nutch] pmezard opened a new pull request #534: NUTCH-2793 indexer-csv: make it work in distributed mode - posted by GitBox <gi...@apache.org> on 2020/06/10 12:14:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2793) CSV indexer does not work in distributed mode - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/06/10 12:15:00 UTC, 10 replies.
- [jira] [Issue Comment Deleted] (NUTCH-2793) CSV indexer does not work in distributed mode - posted by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/10 12:24:00 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel commented on a change in pull request #279: NUTCH-2501: Take NUTCH_HEAPSIZE into account when crawling using crawl script - posted by GitBox <gi...@apache.org> on 2020/06/10 14:08:44 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel commented on a change in pull request #534: NUTCH-2793 indexer-csv: make it work in distributed mode - posted by GitBox <gi...@apache.org> on 2020/06/10 15:21:58 UTC, 1 reply.
- [GitHub] [nutch] pmezard commented on pull request #534: NUTCH-2793 indexer-csv: make it work in distributed mode - posted by GitBox <gi...@apache.org> on 2020/06/10 16:31:55 UTC, 2 replies.
- [GitHub] [nutch] pmezard commented on a change in pull request #534: NUTCH-2793 indexer-csv: make it work in distributed mode - posted by GitBox <gi...@apache.org> on 2020/06/10 16:32:39 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel commented on pull request #534: NUTCH-2793 indexer-csv: make it work in distributed mode - posted by GitBox <gi...@apache.org> on 2020/06/10 16:49:32 UTC, 2 replies.
- [GitHub] [nutch] sebastian-nagel commented on a change in pull request #533: NUTCH-2791 Handle GCS URLs in stats commands - posted by GitBox <gi...@apache.org> on 2020/06/10 18:25:47 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2790) CSVIndexWriter does not escape leading quotes properly - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/10 18:27:00 UTC, 1 reply.
- [GitHub] [nutch] sebastian-nagel merged pull request #532: NUTCH-2790 indexer-csv: escape field leading quote character - posted by GitBox <gi...@apache.org> on 2020/06/10 18:27:04 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2790) CSVIndexWriter does not escape leading quotes properly - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/10 18:29:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2787) CrawlDb JSON dump does not export metadata primitive data types correctly - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/10 18:36:00 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel merged pull request #529: NUTCH-2788 ParseData: improve presentation of Metadata in method toString() - posted by GitBox <gi...@apache.org> on 2020/06/10 18:42:48 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2788) ParseData: improve presentation of Metadata in method toString() - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/10 18:44:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2789) Documentation: update links to point to cwiki - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/10 18:45:00 UTC, 0 replies.
- [GitHub] [nutch] pmezard commented on pull request #533: NUTCH-2791 Handle GCS URLs in stats commands - posted by GitBox <gi...@apache.org> on 2020/06/11 06:56:09 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel commented on pull request #533: NUTCH-2791 Handle GCS URLs in stats commands - posted by GitBox <gi...@apache.org> on 2020/06/11 11:21:25 UTC, 0 replies.
- [GitHub] [nutch] sebastian-nagel merged pull request #533: NUTCH-2791 Handle GCS URLs in stats commands - posted by GitBox <gi...@apache.org> on 2020/06/11 11:21:37 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2791) domainstats, protocolstats and crawlcomplete do not handle GCS URLs - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/11 11:27:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2596) Upgrade from org.mortbay.jetty to org.eclipse.jetty - posted by "Shashanka Balakuntala Srinivasa (Jira)" <ji...@apache.org> on 2020/06/15 06:54:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-2794) Add additional ciphers to HTTP base's default cipher suite - posted by "Markus Jelsma (Jira)" <ji...@apache.org> on 2020/06/16 12:48:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2794) Add additional ciphers to HTTP base's default cipher suite - posted by "Markus Jelsma (Jira)" <ji...@apache.org> on 2020/06/16 12:49:00 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2794) Add additional ciphers to HTTP base's default cipher suite - posted by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/06/16 16:07:00 UTC, 4 replies.
- Preparing to release 1.17 - posted by Sebastian Nagel <wa...@googlemail.com> on 2020/06/16 16:59:57 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2794) Add additional ciphers to HTTP base's default cipher suite - posted by "Markus Jelsma (Jira)" <ji...@apache.org> on 2020/06/17 11:24:00 UTC, 0 replies.
- [VOTE] Release Apache Nutch 1.17 RC#1 - posted by Sebastian Nagel <sn...@apache.org> on 2020/06/18 10:22:56 UTC, 4 replies.
- Regarding the branch 2.x - posted by Shashanka Balakuntala <sh...@gmail.com> on 2020/06/19 08:39:10 UTC, 0 replies.
- Announcing ApacheCon @Home 2020 - posted by Rich Bowen <rb...@apache.org> on 2020/06/29 12:54:01 UTC, 0 replies.