- [jira] [Commented] (NUTCH-2545) Upgrade to Any23 2.2 - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/02 16:10:00 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-2509) Inconsistent behavior in SitemapProcessor - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/02 16:10:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2545) Upgrade to Any23 2.2 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/04/02 16:12:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-2545) Upgrade to Any23 2.2 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/04/02 16:12:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2518) Must check return value of job.waitForCompletion() - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/02 16:42:00 UTC, 14 replies.
- [jira] [Assigned] (NUTCH-2518) Must check return value of job.waitForCompletion() - posted by "Kenneth McFarland (JIRA)" <ji...@apache.org> on 2018/04/03 05:29:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2548) Compressed content skipped. Content of size 78 was truncated to 74 - posted by "rusty x (JIRA)" <ji...@apache.org> on 2018/04/03 15:53:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2548) Compressed content skipped. Content of size 78 was truncated to 74 - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/03 18:11:00 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-2518) Must check return value of job.waitForCompletion() - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/04 11:00:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2549) protocol-http does not behave the same as browsers - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/06 14:40:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2549) protocol-http does not behave the same as browsers - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/06 14:42:00 UTC, 5 replies.
- [jira] [Created] (NUTCH-2550) Redirects are broken - posted by "Hans Brende (JIRA)" <ji...@apache.org> on 2018/04/07 23:17:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2550) Redirects are broken - posted by "Hans Brende (JIRA)" <ji...@apache.org> on 2018/04/07 23:48:00 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2550) Redirects are broken - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/08 00:18:00 UTC, 1 reply.
- [jira] [Created] (NUTCH-2551) NullPointerException in generator - posted by "Hans Brende (JIRA)" <ji...@apache.org> on 2018/04/08 05:39:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2551) NullPointerException in generator - posted by "Hans Brende (JIRA)" <ji...@apache.org> on 2018/04/08 05:52:00 UTC, 1 reply.
- [jira] [Created] (NUTCH-2552) CrawlDbReader -topN fails - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/09 08:21:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2552) CrawlDbReader -topN fails - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/09 08:23:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2553) Fetcher not to modify URLs to be fetched - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/09 08:27:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2553) Fetcher not to modify URLs to be fetched - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/09 08:28:00 UTC, 6 replies.
- [jira] [Created] (NUTCH-2554) parsechecker can't fetch some URLs - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 08:49:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2554) parserchecker can't fetch some URLs - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 08:49:00 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2554) parserchecker can't fetch some URLs - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/09 09:45:00 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-2548) Compressed content skipped. Content of size 78 was truncated to 74 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/09 10:01:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2550) Fetcher fails to follow redirects - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/09 10:06:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2549) protocol-http does not behave the same as browsers - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/09 10:44:00 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2012) Merge parsechecker and indexchecker - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/09 12:13:00 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2455) Speed up the merging of HostDb entries for variable fetch delay - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/09 13:30:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2554) parserchecker can't fetch some URLs - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 13:46:00 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2549) protocol-http does not behave the same as browsers - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 14:23:00 UTC, 3 replies.
- [jira] [Created] (NUTCH-2555) URL normalization problem: path not starting with a '/' - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 14:33:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2555) URL normalization problem: path not starting with a '/' - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 14:34:00 UTC, 1 reply.
- [jira] [Created] (NUTCH-2556) protocol-http makes invalid HTTP/1.0 requests - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 14:35:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2557) protocol-http fails to follow redirections when an HTTP response body is invalid - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 14:41:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2558) protocol-http cannot handle a missing HTTP status line - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 14:42:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2559) protocol-http cannot handle colons after the HTTP status code - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 14:44:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2560) protocol-http throws an error when an http header spans over multiple lines - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 14:45:01 UTC, 0 replies.
- [jira] [Created] (NUTCH-2561) protocol-http can be made to read arbitrarily large HTTP responses - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 14:48:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2561) protocol-http can be made to read arbitrarily large HTTP responses - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 14:51:00 UTC, 4 replies.
- [jira] [Created] (NUTCH-2562) protocol-http fails to read large chunked HTTP responses - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 14:59:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2562) protocol-http fails to read large chunked HTTP responses - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 15:02:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2563) HTTP header spellchecking issues - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/09 15:06:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2551) NullPointerException in generator - posted by "Omkar Reddy (JIRA)" <ji...@apache.org> on 2018/04/10 06:50:00 UTC, 9 replies.
- [jira] [Created] (NUTCH-2564) protocol-http throws an error when the content-length header is not a number - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/10 08:45:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2564) protocol-http throws an error when the content-length header is not a number - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/10 09:54:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2565) MergeDB incorrectly handles unfetched CrawlDatums - posted by "Jurian Broertjes (JIRA)" <ji...@apache.org> on 2018/04/10 11:00:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2565) MergeDB incorrectly handles unfetched CrawlDatums - posted by "Jurian Broertjes (JIRA)" <ji...@apache.org> on 2018/04/10 11:15:00 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2550) Fetcher fails to follow redirects - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/10 22:52:00 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-2550) Fetcher fails to follow redirects - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/04/10 22:53:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2539) Not correct naming of db.url.filters and db.url.normalizers in nutch-default.xml - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/10 22:54:00 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-2539) Not correct naming of db.url.filters and db.url.normalizers in nutch-default.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/04/10 22:55:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2539) Not correct naming of db.url.filters and db.url.normalizers in nutch-default.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/04/10 22:55:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 09:45:00 UTC, 4 replies.
- [jira] [Assigned] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 09:45:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 09:46:00 UTC, 7 replies.
- [jira] [Assigned] (NUTCH-2566) Fix exception log messages - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 10:12:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2566) Fix exception log messages - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 10:12:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2566) Fix exception log messages - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/11 10:25:00 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-1226) Migrate CrawlDbReader to MapReduce API - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 10:56:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1226) Migrate CrawlDbReader to MapReduce API - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 10:57:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1224) Migrate FreeGenerator to MapReduce API - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 10:59:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1223) Migrate WebGraph to MapReduce API - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 10:59:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1219) Upgrade all jobs to new MapReduce API - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 11:00:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2384) nutch 2.3.1 job not properly interacting with hadoop 2.7.1 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 11:39:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2552) CrawlDbReader -topN fails - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/11 11:54:00 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-2012) Merge parsechecker and indexchecker - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 12:22:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2145) parse/index checker fail to fetch valid percent-encoded URLs - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 12:22:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2566) Fix exception log messages - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/11 12:39:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2567) parse-metatags writes every meta tags twice - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/11 15:58:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2567) parse-metatags writes every meta tags twice - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/11 16:00:00 UTC, 4 replies.
- [jira] [Resolved] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/12 08:29:00 UTC, 0 replies.
- [Nutch Wiki] Update of "NutchHadoopSingleNodeTutorial" by SebastianNagel - posted by Apache Wiki <wi...@apache.org> on 2018/04/12 12:41:42 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2551) NullPointerException in generator - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/12 13:04:00 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2551) NullPointerException in generator - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/12 13:08:00 UTC, 0 replies.
- Testing Nutch on a single-node Hadoop cluster - posted by Sebastian Nagel <wa...@googlemail.com> on 2018/04/12 13:10:51 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2551) NullPointerException in generator - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/12 13:13:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2568) Caught exception is immediately rethrown - posted by "Hans Brende (JIRA)" <ji...@apache.org> on 2018/04/13 00:32:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2568) Caught exception is immediately rethrown - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/16 09:56:00 UTC, 2 replies.
- Unsubscribe to nutch mailing/dev list - posted by Pramod Nagarajarao <pr...@gmail.com> on 2018/04/18 18:28:02 UTC, 1 reply.
- How does Nutch update a Solr Document? - posted by BlackIce <bl...@gmail.com> on 2018/04/19 17:27:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2552) CrawlDbReader -topN fails - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/21 16:25:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1228) Change mapred.task.timeout to mapreduce.task.timeout in fetcher - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/21 16:34:00 UTC, 5 replies.
- [jira] [Updated] (NUTCH-1228) Change mapred.task.timeout to mapreduce.task.timeout in fetcher - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/21 16:34:00 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-2553) Fetcher not to modify URLs to be fetched - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/21 16:37:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2568) Caught exception is immediately rethrown - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/21 16:45:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2544) Nutch 1.15 no longer compatible with AWS EMR and S3 - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/22 07:32:01 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2517) mergesegs corrupts segment data - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/22 19:19:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-2569) ClassNotFoundException when running in (pseudo-)distributed mode - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/22 19:24:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2569) ClassNotFoundException when running in (pseudo-)distributed mode - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/22 19:31:00 UTC, 2 replies.
- [jira] [Assigned] (NUTCH-2569) ClassNotFoundException when running in (pseudo-)distributed mode - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/22 19:50:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2570) Deduplication job fails to install deduplicated CrawlDb - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/22 19:58:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2526) scoring-opic creating Issues while indexing some documents which were generated at parsetime. - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/22 20:36:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2570) Deduplication job fails to install deduplicated CrawlDb - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/22 20:43:00 UTC, 2 replies.
- [jira] [Updated] (NUTCH-2526) NPE in scoring-opic when indexing document without CrawlDb datum - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/23 09:41:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2526) NPE in scoring-opic when indexing document without CrawlDb datum - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/23 09:54:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-2571) SegmentReader -list fails to read segment - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/23 11:26:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2570) Deduplication job fails to install deduplicated CrawlDb - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/23 11:27:01 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2544) Nutch 1.15 no longer compatible with AWS EMR and S3 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/23 11:27:01 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2571) SegmentReader -list fails to read segment - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/23 11:27:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2571) SegmentReader -list fails to read segment - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/23 11:48:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2572) HostDb: updatehostdb does not set values - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/23 11:57:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2571) SegmentReader -list fails to read segment - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/23 12:15:00 UTC, 3 replies.
- [jira] [Assigned] (NUTCH-2572) HostDb: updatehostdb does not set values - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/23 15:32:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2572) HostDb: updatehostdb does not set values - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/23 15:32:00 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-1228) Change mapred.task.timeout to mapreduce.task.timeout in fetcher - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 10:34:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2527) URL filter: provide rules to exclude localhost and private address spaces - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/26 10:38:00 UTC, 2 replies.
- [jira] [Assigned] (NUTCH-2527) URL filter: provide rules to exclude localhost and private address spaces - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 10:43:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2527) URL filter: provide rules to exclude localhost and private address spaces - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 10:44:00 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1228) Change mapred.task.timeout to mapreduce.task.timeout in fetcher - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 10:45:01 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2544) Nutch 1.15 no longer compatible with AWS EMR and S3 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 10:58:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2526) NPE in scoring-opic when indexing document without CrawlDb datum - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 11:00:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2517) mergesegs corrupts segment data - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 11:10:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2569) ClassNotFoundException when running in (pseudo-)distributed mode - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 11:13:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2570) Deduplication job fails to install deduplicated CrawlDb - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 11:15:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2571) SegmentReader -list fails to read segment - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 11:17:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2572) HostDb: updatehostdb does not set values - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 11:19:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2573) Suspend crawling if robots.txt fails to fetch with 5xx status - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 14:15:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2573) Suspend crawling if robots.txt fails to fetch with 5xx status - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 14:19:00 UTC, 1 reply.
- [jira] [Updated] (NUTCH-2573) Suspend crawling if robots.txt fails to fetch with 5xx status - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/04/26 14:20:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2574) hostCount >= maxCount comparison wrong - posted by "Michael Coffey (JIRA)" <ji...@apache.org> on 2018/04/28 00:05:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2575) protocol-http does not respect the maximum content-size - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/30 09:55:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2575) protocol-http does not respect the maximum content-size - posted by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/04/30 09:56:00 UTC, 0 replies.