- [jira] [Commented] (NUTCH-2399) indexer-elastic does not index multi-value fields (only the first value is indexed) - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/01 10:24:02 UTC, 2 replies.
- parse-zip Nutch 2.x compatibility? - posted by Michael Chen <yi...@u.northwestern.edu> on 2017/08/02 00:21:32 UTC, 1 reply.
- Re: Question on 2.x sitemap functionality - posted by Michael Chen <yi...@u.northwestern.edu> on 2017/08/02 00:28:29 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2375) Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/02 03:14:00 UTC, 3 replies.
- HTML Support for jsoup-extractor in Nutch 2.x? - posted by Michael Chen <yi...@u.northwestern.edu> on 2017/08/02 21:42:38 UTC, 1 reply.
- Parse-zip porting? - posted by Michael Chen <yi...@u.northwestern.edu> on 2017/08/04 23:52:19 UTC, 0 replies.
- [jira] [Created] (NUTCH-2405) jsoup-extractor structure correction, typo fixed - posted by "Kaidul Islam (JIRA)" <ji...@apache.org> on 2017/08/06 09:09:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2405) jsoup-extractor structure correction, typo fixed - posted by "Kaidul Islam (JIRA)" <ji...@apache.org> on 2017/08/06 09:13:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2405) jsoup-extractor structure correction, typo fixed - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/06 09:23:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-2406) Sum up constants, make minor changes - posted by "kenneth mcfarland (JIRA)" <ji...@apache.org> on 2017/08/08 08:28:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2406) Sum up constants, make minor changes - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/08 08:29:00 UTC, 2 replies.
- fetching pdfs from our website - posted by "d.kumar@technisat.de" <d....@technisat.de> on 2017/08/08 13:00:03 UTC, 0 replies.
- Regarding checksum error in hadoop in my latest PR. - posted by Omkar Reddy <om...@apache.org> on 2017/08/09 09:56:42 UTC, 1 reply.
- Release of TREC Dynamic Domain: Polar Dataset - posted by "Mattmann, Chris A (3010)" <ch...@jpl.nasa.gov> on 2017/08/09 16:55:34 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2406) Sum up constants, make minor changes - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2017/08/09 17:20:01 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2406) Sum up constants, make minor changes - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2017/08/09 17:20:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2406) Sum up constants, make minor changes - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2017/08/09 17:20:01 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2405) jsoup-extractor structure correction, typo fixed - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2017/08/09 17:26:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2407) Memory leak causing Nutch Server to run out of memory - posted by "Vyacheslav Pascarel (JIRA)" <ji...@apache.org> on 2017/08/11 21:20:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1932) Automatically remove orphaned pages - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/12 13:37:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-2408) CrawlDb: allow update from unparsed segments - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/12 14:16:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2408) CrawlDb: allow update from unparsed segments - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/12 14:24:00 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2407) Memory leak causing Nutch Server to run out of memory - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/14 14:18:01 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2408) CrawlDb: allow update from unparsed segments - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/15 12:21:01 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2400) Solr 6.6.0 compatibility - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/15 15:26:00 UTC, 2 replies.
- [Nutch Wiki] Update of "NutchTutorial" by SebastianNagel - posted by Apache Wiki <wi...@apache.org> on 2017/08/15 15:29:35 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2400) Solr 6.6.0 compatibility - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2017/08/15 15:52:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/15 16:32:00 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2298) TestCrawlDbStates.testCrawlDbStatTransitionInject broken - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/15 19:57:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2378) ChildFirst plugin classloader - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/15 20:28:00 UTC, 10 replies.
- [jira] [Updated] (NUTCH-2378) ChildFirst plugin classloader - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/15 20:29:01 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-2378) ChildFirst plugin classloader - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/15 20:29:01 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2378) ChildFirst plugin classloader - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/16 12:49:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2407) Memory leak causing Nutch Server to run out of memory - posted by "Vyacheslav Pascarel (JIRA)" <ji...@apache.org> on 2017/08/16 21:55:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/08/17 08:46:03 UTC, 5 replies.
- [jira] [Updated] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/08/17 08:46:03 UTC, 0 replies.
- [jira] [Created] (NUTCH-2409) Injector: complete command-line help and counters - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/17 10:53:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2409) Injector: complete command-line help and counters - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/17 10:55:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2410) Unit test for jsoup-extractor not to depend on external resources - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/17 11:07:00 UTC, 0 replies.
- NutchServer - posted by kenneth mcfarland <ke...@gmail.com> on 2017/08/17 18:46:40 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1690) IndexClean: mark url as unindexed after clean to not delete again - posted by "hussein Al_Ahmad (JIRA)" <ji...@apache.org> on 2017/08/18 13:58:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2378) ChildFirst plugin classloader - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/18 14:35:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2071) A parser failure on a single document may fail crawling job - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/18 14:36:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2071) A parser failure on a single document may fail crawling job - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/18 14:36:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2316) Library conflict with Parser-Tika Plugin and Lib Folder - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/18 14:37:00 UTC, 0 replies.
- [jira] [Work stopped] (NUTCH-2316) Library conflict with Parser-Tika Plugin and Lib Folder - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/18 14:37:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2316) Library conflict with Parser-Tika Plugin and Lib Folder - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/18 14:37:00 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2316) Library conflict with Parser-Tika Plugin and Lib Folder - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/18 14:37:00 UTC, 0 replies.
- Styles - posted by kenneth mcfarland <ke...@gmail.com> on 2017/08/18 22:08:55 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-1690) IndexClean: mark url as unindexed after clean to not delete again - posted by "hussein Al_Ahmad (JIRA)" <ji...@apache.org> on 2017/08/19 15:16:01 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2399) indexer-elastic does not index multi-value fields (only the first value is indexed) - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2017/08/21 16:51:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2399) indexer-elastic does not index multi-value fields (only the first value is indexed) - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2017/08/21 16:51:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1129) Any23 Nutch plugin - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/21 17:41:00 UTC, 6 replies.
- [jira] [Created] (NUTCH-2411) Index-metadata to support indexing multiple values for a field - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/08/22 14:27:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2411) Index-metadata to support indexing multiple values for a field - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/08/22 14:30:00 UTC, 6 replies.
- [jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers. - posted by "Roannel Fernández Hernández (JIRA)" <ji...@apache.org> on 2017/08/23 14:41:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-2412) Exchange component for indexing job - posted by "Roannel Fernández Hernández (JIRA)" <ji...@apache.org> on 2017/08/24 13:26:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2413) When fetching and parsing together, parameter "parse.filter.urls" is ignored - posted by "Marcos Bori (JIRA)" <ji...@apache.org> on 2017/08/25 12:16:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2413) When fetching and parsing together, parameter "parse.filter.urls" is ignored - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/25 12:24:00 UTC, 9 replies.
- [jira] [Updated] (NUTCH-2413) When fetching and parsing together, parameter "parse.filter.urls" is ignored - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/25 13:43:00 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2310) Protocol-Selenium does not support HTTPS protocol - posted by "Antoine DELMOTTE (JIRA)" <ji...@apache.org> on 2017/08/25 16:31:00 UTC, 1 reply.
- [jira] [Commented] (NUTCH-2413) Parsing fetcher to respect property "parse.filter.urls" - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/26 08:48:00 UTC, 1 reply.
- [jira] [Updated] (NUTCH-2413) Parsing fetcher to respect property "parse.filter.urls" - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/26 08:48:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2413) Parsing fetcher to respect property "parse.filter.urls" - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/26 08:49:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2413) Parsing fetcher to respect property "parse.filter.urls" - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/26 08:50:00 UTC, 0 replies.
- [Nutch Wiki] Update of "GoogleSummerOfCode/GraphGeneratorTool/WeeklyReports" by OmkarReddy - posted by Apache Wiki <wi...@apache.org> on 2017/08/27 06:17:03 UTC, 0 replies.
- [jira] [Created] (NUTCH-2414) Allow LanguageIndexingFilter to actually filter documents by language. - posted by "Yossi Tamari (JIRA)" <ji...@apache.org> on 2017/08/28 12:49:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2414) Allow LanguageIndexingFilter to actually filter documents by language. - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/28 12:52:00 UTC, 5 replies.
- [jira] [Created] (NUTCH-2415) Create a JEXL based IndexingFilter - posted by "Yossi Tamari (JIRA)" <ji...@apache.org> on 2017/08/29 16:14:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2415) Create a JEXL based IndexingFilter - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/29 16:17:00 UTC, 17 replies.
- [jira] [Created] (NUTCH-2416) Fetcher to log thread ID - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/08/30 08:44:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2416) Fetcher to log thread ID - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/08/30 08:44:01 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2416) Fetcher to log thread ID - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/08/30 08:54:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2417) Support for variable fetch delay via FreeGenerator - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/08/30 11:58:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2415) Create a JEXL based IndexingFilter - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2017/08/31 08:44:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2411) Index-metadata to support indexing multiple values for a field - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/08/31 12:23:01 UTC, 0 replies.