- svn commit: r1077906 - /websites/production/nutch/ - posted by gm...@apache.org on 2022/01/06 06:57:20 UTC, 0 replies.
- svn commit: r1077907 - /websites/staging/nutch/ - posted by gm...@apache.org on 2022/01/06 06:57:56 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers (#720) - posted by le...@apache.org on 2022/01/08 04:09:29 UTC, 0 replies.
- [nutch] branch master updated (e76d69f -> 78e827a) - posted by sn...@apache.org on 2022/01/09 09:46:04 UTC, 0 replies.
- [nutch-site] branch NUTCH-1999-nutch-site-robots-txt created (now 142489f) - posted by sn...@apache.org on 2022/01/09 12:57:59 UTC, 0 replies.
- [nutch-site] 01/01: NUTCH-1999 Add /robots.txt to Nutch site - posted by sn...@apache.org on 2022/01/09 12:58:00 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted - sleep for a configurable delay (fetcher.threads.start.delay) before starting the next Fetcher thread to avoid that resources (DNS, Tika XML parser pools) are temporarily exhausted when Fetcher threads fetch the first pages simultaneously - posted by sn...@apache.org on 2022/01/14 09:41:59 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6 (#717) - posted by le...@apache.org on 2022/01/15 23:24:29 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2935 DeduplicationJob: failure on URLs with invalid percent encoding - catch IllegalArgumentException when unescaping percent-encoding in URLs - if one URL of two compared URLs is valid, keep it as non-duplicate - add unit tests for DeduplicationJob - posted by sn...@apache.org on 2022/01/17 18:57:03 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2573 Suspend crawling if robots.txt fails to fetch with 5xx status (#724) - posted by sn...@apache.org on 2022/01/18 07:22:42 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2923: Added JobId in Job Failure logs (#721) - posted by sn...@apache.org on 2022/01/27 16:05:38 UTC, 0 replies.