- ApacheCon North America 2018 schedule is now live. - posted by Rich Bowen <rb...@apache.org> on 2018/05/01 12:36:05 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2468) should filter out invalid URLs by default - posted by "Michael Coffey (JIRA)" <ji...@apache.org> on 2018/05/03 18:51:00 UTC, 1 reply.
- [jira] [Comment Edited] (NUTCH-2468) should filter out invalid URLs by default - posted by "Michael Coffey (JIRA)" <ji...@apache.org> on 2018/05/03 18:52:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2575) protocol-http does not respect the maximum content-size - posted by "Omkar Reddy (JIRA)" <ji...@apache.org> on 2018/05/06 09:19:00 UTC, 2 replies.
- [jira] [Assigned] (NUTCH-2513) ant eclipse protocol unsafe - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/08 11:23:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2513) ant eclipse target fails with "protocol switch unsafe" - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/08 11:28:00 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-2513) ant eclipse target fails with "protocol switch unsafe" - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/08 11:33:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2513) ant eclipse target fails with "protocol switch unsafe" - posted by "Hudson (JIRA)" <ji...@apache.org> on 2018/05/08 11:57:00 UTC, 1 reply.
- [jira] [Created] (NUTCH-2576) HTTP protocol plugin based on okhttp - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/09 11:48:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2576) HTTP protocol plugin based on okhttp - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/09 11:52:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2576) HTTP protocol plugin based on okhttp - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/09 12:01:00 UTC, 4 replies.
- [jira] [Resolved] (NUTCH-2514) Segmentation Fault issue while running crawl job. - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/10 11:00:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2501) Take into account $NUTCH_HEAPSIZE when crawling using crawl script - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/10 12:16:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2161) Interrupted failed and/or killed tasks fail to clean up temp directories in HDFS - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/10 12:59:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2575) protocol-http does not respect the maximum content-size for chunked responses - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/10 13:52:00 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2575) protocol-http does not respect the maximum content-size for chunked responses - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/10 21:02:00 UTC, 4 replies.
- [jira] [Resolved] (NUTCH-2575) protocol-http does not respect the maximum content-size for chunked responses - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/10 21:04:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2562) protocol-http fails to read large chunked HTTP responses - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/10 21:22:00 UTC, 3 replies.
- [jira] [Updated] (NUTCH-2562) protocol-http fails to read large chunked HTTP responses - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/10 21:36:00 UTC, 1 reply.
- REMINDER: Apache EU Roadshow 2018 schedule announced! - posted by sh...@apache.org on 2018/05/11 12:13:36 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2574) hostCount >= maxCount comparison wrong - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/11 14:03:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2574) hostCount >= maxCount comparison wrong - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/11 14:03:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2577) protocol-selenium can't handle https - posted by "hussein Al_Ahmad (JIRA)" <ji...@apache.org> on 2018/05/15 13:28:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2577) protocol-selenium can't handle https - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/15 15:19:00 UTC, 3 replies.
- [jira] [Created] (NUTCH-2578) Avoid lock by MimeUtil in constructor of protocol.Content - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/17 12:17:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2578) Avoid lock by MimeUtil in constructor of protocol.Content - posted by "Yossi Tamari (JIRA)" <ji...@apache.org> on 2018/05/17 12:47:00 UTC, 4 replies.
- [jira] [Updated] (NUTCH-2578) Avoid lock by MimeUtil in constructor of protocol.Content - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/18 15:09:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2579) Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url) - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/18 16:17:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers. - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/20 23:52:00 UTC, 1 reply.
- [jira] [Created] (NUTCH-2580) Improvements for Rabbitmq support - posted by "Roannel Fernández Hernández (JIRA)" <ji...@apache.org> on 2018/05/21 14:21:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2580) Improvements for Rabbitmq support - posted by "Roannel Fernández Hernández (JIRA)" <ji...@apache.org> on 2018/05/21 14:40:00 UTC, 1 reply.
- [jira] [Comment Edited] (NUTCH-2578) Avoid lock by MimeUtil in constructor of protocol.Content - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2018/05/21 18:38:00 UTC, 3 replies.
- [jira] [Created] (NUTCH-2581) Caching of redirected robots.txt may overwrite correct robots.txt rules - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/22 13:04:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2581) Caching of redirected robots.txt may overwrite correct robots.txt rules - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/22 13:08:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2581) Caching of redirected robots.txt may overwrite correct robots.txt rules - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/22 13:08:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2582) Set pool size of XML SAX parsers used for MIME detection in Tika 1.19 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/22 15:56:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2582) Set pool size of XML SAX parsers used for MIME detection in Tika 1.19 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/22 15:58:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2290) Update licenses of bundled libraries - posted by "Ralf (JIRA)" <ji...@apache.org> on 2018/05/22 20:35:00 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2512) Nutch 1.14 does not work under JDK9 - posted by "Ralf (JIRA)" <ji...@apache.org> on 2018/05/22 20:48:00 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2500) Add pull-reqest template to github - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/23 10:50:00 UTC, 4 replies.
- [jira] [Resolved] (NUTCH-2577) protocol-selenium can't handle https - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/23 16:22:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2310) Protocol-Selenium does not support HTTPS protocol - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/23 16:23:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2273) Selenium and InteractiveSelenium Do Not Support HTTPS - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/23 16:23:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2549) protocol-http does not behave the same as browsers - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/05/24 12:29:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2549) protocol-http does not behave the same as browsers - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/05/24 12:29:00 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-2500) Add pull-reqest template to github - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/24 12:37:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2500) Add pull-reqest template to github - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/24 12:38:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2557) protocol-http fails to follow redirections when an HTTP response body is invalid - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/24 13:31:00 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-2557) protocol-http fails to follow redirections when an HTTP response body is invalid - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/24 13:33:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2583) Upgrading Nutch's dependencies - posted by "Ralf (JIRA)" <ji...@apache.org> on 2018/05/24 13:48:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2583) Upgrading Nutch's dependencies - posted by "Ralf (JIRA)" <ji...@apache.org> on 2018/05/24 13:56:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-2584) Upgrade parse-tika to use Tika 1.18 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/24 14:52:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2583) Upgrading Nutch's dependencies - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/24 14:53:00 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2579) Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url) - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/24 15:58:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2584) Upgrade parse-tika to use Tika 1.18 - posted by "Ralf (JIRA)" <ji...@apache.org> on 2018/05/24 16:22:00 UTC, 5 replies.
- [jira] [Commented] (NUTCH-2580) Improvements for Rabbitmq support - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/24 20:44:00 UTC, 1 reply.
- [jira] [Created] (NUTCH-2585) NPE in TrieStringMatcher - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2018/05/25 14:33:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2586) Add a fallback mechanism for missing meta tags - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/05/28 14:30:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2587) Tests do not pass - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/05/28 15:16:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2588) Getting status code x01 (unfetched) on more than 80% crawled urls - posted by "Usama Tahir (JIRA)" <ji...@apache.org> on 2018/05/29 06:29:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2588) Getting status code x01 (unfetched) on more than 80% crawled urls - posted by "Usama Tahir (JIRA)" <ji...@apache.org> on 2018/05/29 06:57:00 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2584) Upgrade parse-tika to use Tika 1.18 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/29 07:23:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2587) Tests do not pass - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/29 08:16:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2588) Getting status code x01 (unfetched) on more than 80% crawled urls - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/29 08:39:00 UTC, 1 reply.
- [jira] [Closed] (NUTCH-2587) Tests do not pass - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/05/29 08:47:00 UTC, 0 replies.
- A Hadoop documentation issue about Nutch - posted by Cihad Guzel <cg...@gmail.com> on 2018/05/29 08:55:52 UTC, 1 reply.
- [jira] [Created] (NUTCH-2589) HTML redirections are not followed when using parse-tika - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/05/29 16:01:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2589) HTML redirections are not followed when using parse-tika - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/05/29 16:02:00 UTC, 1 reply.
- [jira] [Comment Edited] (NUTCH-2588) Getting status code x01 (unfetched) on more than 80% crawled urls - posted by "Usama Tahir (JIRA)" <ji...@apache.org> on 2018/05/30 05:58:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2588) Getting status code x01 (unfetched) on more than 80% crawled urls - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/30 10:30:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2589) HTML redirections are not followed when using parse-tika - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/30 11:15:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-2590) SegmentReader -get fails - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/31 15:37:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2590) SegmentReader -get fails - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/31 15:51:00 UTC, 1 reply.
- [jira] [Updated] (NUTCH-2590) SegmentReader -get fails - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/31 15:52:00 UTC, 0 replies.
- REMINDER: Apache EU Roadshow 2018 in Berlin is less than 2 weeks away! - posted by sh...@apache.org on 2018/05/31 20:51:47 UTC, 0 replies.