- [nutch] branch master updated: NUTCH-2683 DeduplicationJob: add option to prefer https:// over http:// - add optional value "httpsOverHttp" to -compareOrder argument to prefer https:// over http:// if it comes before the "urlLength" and neither "score" nor "fetchTime" takes precedence - code improvements: remove nested loop, sort imports, add @Override statements where applicable - posted by sn...@apache.org on 2019/04/10 11:34:51 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2666 Increase default value for http.content.limit / ftp.content.limit / file.content.limit - increase the default content limit from 64 kB to 1024 kB - posted by sn...@apache.org on 2019/04/10 11:39:39 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2701 Fetcher: log dates and times also in human-readable form - add human-readable date to log message about time limit - move date formatter to TimingUtil - use new thread-safe date and time API - posted by sn...@apache.org on 2019/04/10 11:43:41 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2703 parse-tika: Boilerpipe should not run for non-(X)HTML pages - posted by ma...@apache.org on 2019/04/11 10:33:54 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2700 Indexchecker: improve command-line help - add option `-doIndex` to pass "checked" document to index writers (the property `doIndex` is kept to ensure backward compatibility) - posted by sn...@apache.org on 2019/04/11 10:45:36 UTC, 0 replies.
- [nutch] branch master updated (510a4ea -> b56a577) - posted by sn...@apache.org on 2019/04/12 11:27:13 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2704 Upgrade crawler-commons dependency to 1.0 - posted by sn...@apache.org on 2019/04/12 11:28:36 UTC, 0 replies.