- [jira] [Created] (NUTCH-2490) Sitemap processing: Sitemap index files not working - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/02 22:51:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2490) Sitemap processing: Sitemap index files not working - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/02 22:55:00 UTC, 4 replies.
- [jira] [Updated] (NUTCH-2490) Sitemap processing: Sitemap index files not working - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/02 23:41:00 UTC, 1 replies.
- [jira] [Created] (NUTCH-2491) Integrate sitemap processing and HostDB into crawl script - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/03 14:16:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2491) Integrate sitemap processing and HostDB into crawl script - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/03 14:18:00 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2460) use the headless option of firefox and chrome in protocol-selenium - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/03 15:45:01 UTC, 5 replies.
- [jira] [Commented] (NUTCH-2454) REST API fix for usage of hostdb in generator - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/03 17:31:00 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2454) REST API fix for usage of hostdb in generator - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/03 17:32:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2491) Integrate sitemap processing and HostDB into crawl script - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/03 17:35:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2491) Integrate sitemap processing and HostDB into crawl script - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/03 17:36:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers. - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/03 17:37:00 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2490) Sitemap processing: Sitemap index files not working - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/03 17:42:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1129) Any23 Nutch plugin - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/03 20:10:00 UTC, 31 replies.
- [jira] [Created] (NUTCH-2492) Add more configuration parameters to crawl script - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/03 23:31:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2492) Add more configuration parameters to crawl script - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/03 23:34:00 UTC, 6 replies.
- [jira] [Commented] (NUTCH-2375) Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/05 17:37:00 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-2467) Sitemap type field can be null - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/01/06 08:45:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1807) avoid methods relying on system-specific default locale / charset - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/01/07 21:07:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1807) avoid methods relying on system-specific default locale / charset - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/01/07 21:08:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2492) Add more configuration parameters to crawl script - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/08 11:51:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2492) Add more configuration parameters to crawl script - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/08 11:51:01 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2488) Please use SSL (https) for KEYS, sigs, hashes - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/01/08 12:25:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2488) Please use SSL (https) for KEYS, sigs, hashes - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/01/08 12:26:00 UTC, 0 replies.
- [jira] [Closed] (NUTCH-2488) Please use SSL (https) for KEYS, sigs, hashes - posted by "Sebb (JIRA)" <ji...@apache.org> on 2018/01/08 13:28:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2321) Indexing filter checker leaks threads - posted by "Jurian Broertjes (JIRA)" <ji...@apache.org> on 2018/01/08 17:49:00 UTC, 3 replies.
- [jira] [Created] (NUTCH-2493) Add configuration parameter for sitemap processing to crawler script - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/09 00:24:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2493) Add configuration parameter for sitemap processing to crawler script - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/09 00:26:00 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2493) Add configuration parameter for sitemap processing to crawler script - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/09 00:32:00 UTC, 4 replies.
- [jira] [Commented] (NUTCH-2441) ARG_SEGMENT usage - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/10 01:44:00 UTC, 3 replies.
- [jira] [Updated] (NUTCH-2324) Issue in setting default linkdb path - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/10 01:48:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2324) Issue in setting default linkdb path - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/10 01:51:04 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2493) Add configuration parameter for sitemap processing to crawler script - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/10 16:16:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2494) Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3 - posted by "Ashraful Islam (JIRA)" <ji...@apache.org> on 2018/01/11 10:18:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2494) Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3 - posted by "Ashraful Islam (JIRA)" <ji...@apache.org> on 2018/01/11 10:20:01 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2494) Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3 - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/11 11:26:00 UTC, 5 replies.
- [jira] [Updated] (NUTCH-1129) Any23 Nutch plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/11 21:22:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1129) Any23 Nutch plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/11 21:22:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2495) Use -deleteGone instead of clean job in crawler script while indexing - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/12 23:23:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2495) Use -deleteGone instead of clean job in crawler script while indexing - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/12 23:24:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2496) Speed up link inversion step in crawling script - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/12 23:33:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2496) Speed up link inversion step in crawling script - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/12 23:34:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2496) Speed up link inversion step in crawling script - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/12 23:35:00 UTC, 4 replies.
- [jira] [Created] (NUTCH-2497) Elastic REST Indexer: Allow multiple hosts - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/13 01:11:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2497) Elastic REST Indexer: Allow multiple hosts - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/13 01:12:00 UTC, 3 replies.
- [jira] [Created] (NUTCH-2498) Docker files are outdated - posted by "dhirajforyou (JIRA)" <ji...@apache.org> on 2018/01/13 08:58:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2498) Docker files are outdated - posted by "dhirajforyou (JIRA)" <ji...@apache.org> on 2018/01/13 09:00:15 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2321) Indexing filter checker leaks threads - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/15 17:39:01 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2461) Generate passes the data to when maxCount == 0 - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/15 17:40:00 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2461) Generate passes the data to when maxCount == 0 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/15 17:41:01 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2496) Speed up link inversion step in crawling script - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2018/01/16 10:53:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2499) Elastic REST Indexer: Duplicate values - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/16 21:43:00 UTC, 3 replies.
- [jira] [Created] (NUTCH-2499) Elastic REST Indexer: Duplicate values - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/16 21:43:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2499) Elastic REST Indexer: Duplicate values - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/16 21:47:00 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2370) FileDumper: save JSON mapping file -> URL - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/16 22:00:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2500) Add pull-request template to github - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/01/17 11:04:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2500) Add pull-request template to github - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/01/17 11:04:01 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2466) Sitemap processor to follow redirects - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/01/17 13:26:00 UTC, 13 replies.
- [jira] [Updated] (NUTCH-2481) HostDatum deltas(previous step statistics) - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2018/01/17 13:32:00 UTC, 2 replies.
- [jira] [Updated] (NUTCH-2481) HostDatum deltas(previous step statistics) and Metadata expressions - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2018/01/17 13:36:00 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2455) Speed up the merging of HostDb entries for variable fetch delay - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/18 13:00:00 UTC, 2 replies.
- [jira] [Updated] (NUTCH-2497) Elastic REST Indexer: Allow multiple hosts - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/18 17:48:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2497) Elastic REST Indexer: Allow multiple hosts - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/18 17:49:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2441) ARG_SEGMENT usage - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/18 17:52:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2481) HostDatum deltas(previous step statistics) and Metadata expressions - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2018/01/19 11:37:00 UTC, 4 replies.
- [jira] [Created] (NUTCH-2501) Take into account $NUTCH_HEAPSIZE when crawling using crawl script - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/22 22:31:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/23 13:29:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2466) Sitemap processor to follow redirects - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2018/01/23 15:50:00 UTC, 1 replies.
- [jira] [Created] (NUTCH-2503) Add option to run tests for a single plugin - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/23 16:07:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2503) Add option to run tests for a single plugin - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2018/01/23 16:13:00 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/23 16:16:00 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2501) Take into account $NUTCH_HEAPSIZE when crawling using crawl script - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/23 16:17:01 UTC, 8 replies.
- [jira] [Assigned] (NUTCH-2499) Elastic REST Indexer: Duplicate values - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/23 16:18:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2503) Add option to run tests for a single plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/23 17:52:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2503) Add option to run tests for a single plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/23 17:53:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2501) Take into account $NUTCH_HEAPSIZE when crawling using crawl script - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/23 17:55:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/23 17:55:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2495) Use -deleteGone instead of clean job in crawler script while indexing - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/23 17:56:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2499) Elastic REST Indexer: Duplicate values - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/23 17:59:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/23 18:01:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/23 18:02:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2369) Create a new GraphGenerator Tool for writing Nutch Records as a Full Web Graph - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/24 20:55:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2369) Create a new GraphGenerator Tool for writing Nutch Records as a Full Web Graph - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2018/01/24 21:53:00 UTC, 1 replies.
- [jira] [Created] (NUTCH-2504) Results of maxCountExpr and fetchDelayExpr should be stored in memory in Generate - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2018/01/25 14:55:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2504) Results of maxCountExpr and fetchDelayExpr should be stored in memory in Generate - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2018/01/25 15:07:00 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-2481) HostDatum deltas(previous step statistics) and Metadata expressions - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2018/01/25 16:20:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/26 04:21:00 UTC, 6 replies.
- [Nutch Wiki] New attachment added to page Anthelion - posted by Apache Wiki <wi...@apache.org> on 2018/01/26 19:28:12 UTC, 0 replies.
- [jira] [Created] (NUTCH-2505) nutch does not delete the .locked file, when the generator partition got an exception - posted by "Ajoy Lian (JIRA)" <ji...@apache.org> on 2018/01/27 07:57:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2505) nutch does not delete the .locked file, when the generator partition got an exception - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/27 08:15:00 UTC, 2 replies.
- [jira] [Assigned] (NUTCH-2494) Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/01/29 15:40:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2494) Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/01/29 15:42:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2506) host is not available for filtering on the JEXL indexing plugin - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2018/01/30 12:56:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2506) host is not available for filtering on the JEXL indexing plugin - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2018/01/30 14:47:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2507) NutchTutorial wiki pages has a lot of outdated command line calls when it starts with the solr interaction - posted by "artodeto (JIRA)" <ji...@apache.org> on 2018/01/31 11:15:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2466) Sitemap processor to follow redirects - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2018/01/31 13:55:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2508) Misleading documentation about http.proxy.exception.list - posted by "Moreno Feltscher (JIRA)" <ji...@apache.org> on 2018/01/31 22:39:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2508) Misleading documentation about http.proxy.exception.list - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/31 22:52:00 UTC, 1 replies.
- [jira] [Updated] (NUTCH-2508) Misleading documentation about http.proxy.exception.list - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/31 23:00:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2508) Misleading documentation about http.proxy.exception.list - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/01/31 23:00:00 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2466) Sitemap processor to follow redirects - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2018/01/31 23:15:00 UTC, 0 replies.