- [jira] [Work started] (NUTCH-2213) CommonCrawlDataDumper saves gzipped body in extracted form - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/01 04:35:18 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2213) CommonCrawlDataDumper saves gzipped body in extracted form - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/01 04:35:18 UTC, 0 replies.
- [GitHub] nutch pull request: NUTCH-2213 : do not store the headers verbatim... - posted by asfgit <gi...@git.apache.org> on 2016/03/01 04:36:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2213) CommonCrawlDataDumper saves gzipped body in extracted form - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/03/01 04:36:18 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2213) CommonCrawlDataDumper saves gzipped body in extracted form - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/01 04:44:18 UTC, 0 replies.
- [GitHub] nutch pull request: Fix the issue of the bad tstamp - posted by asfgit <gi...@git.apache.org> on 2016/03/01 04:59:55 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2236) Upgrade to Hadoop 2.7.1 - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2016/03/01 16:47:18 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2060) dedup is removing entries with status db_gone - posted by "Rupanshu Satsangi (JIRA)" <ji...@apache.org> on 2016/03/02 00:20:18 UTC, 4 replies.
- [GitHub] nutch pull request: NUTCH-2184 Enable IndexingJob to function with... - posted by lewismc <gi...@git.apache.org> on 2016/03/02 05:23:45 UTC, 5 replies.
- [jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/03/02 05:24:18 UTC, 8 replies.
- [jira] [Commented] (NUTCH-2197) Add solr5 solrcloud indexer support - posted by "Arun Kumar (JIRA)" <ji...@apache.org> on 2016/03/02 09:08:18 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_ - posted by "Adnane B. (JIRA)" <ji...@apache.org> on 2016/03/02 13:48:18 UTC, 5 replies.
- [jira] [Created] (NUTCH-2237) DeduplicationJob: Add extra order criteria based on slug - posted by "Ron van der Vegt (JIRA)" <ji...@apache.org> on 2016/03/02 14:51:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2237) DeduplicationJob: Add extra order criteria based on slug - posted by "Ron van der Vegt (JIRA)" <ji...@apache.org> on 2016/03/02 15:20:18 UTC, 2 replies.
- [jira] [Closed] (NUTCH-2233) Index-basic incorrect assignment of next fetch time when using Mongodb as storage backend - posted by "Pablo Torres (JIRA)" <ji...@apache.org> on 2016/03/02 16:53:18 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_ - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2016/03/02 18:35:18 UTC, 4 replies.
- [jira] [Created] (NUTCH-2238) Indexer for Elasticsearch 2.x - posted by "Pablo Torres (JIRA)" <ji...@apache.org> on 2016/03/03 14:05:18 UTC, 0 replies.
- [GitHub] nutch pull request: fix for NUTCH-2238 contributed by ptorrestr - posted by ptorrestr <gi...@git.apache.org> on 2016/03/03 15:37:56 UTC, 4 replies.
- [jira] [Commented] (NUTCH-2238) Indexer for Elasticsearch 2.x - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/03/03 15:38:18 UTC, 4 replies.
- [jira] [Commented] (NUTCH-2237) DeduplicationJob: Add extra order criteria based on slug - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2016/03/03 21:46:18 UTC, 1 replies.
- [jira] [Updated] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_ - posted by "Adnane B. (JIRA)" <ji...@apache.org> on 2016/03/03 23:41:18 UTC, 0 replies.
- GSOC 2016 - posted by Cihad Guzel <cg...@gmail.com> on 2016/03/07 23:38:50 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/08 05:09:41 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/08 05:09:41 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/08 05:10:41 UTC, 3 replies.
- [jira] [Updated] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2016/03/08 14:01:40 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch - posted by "Robert Meusel (JIRA)" <ji...@apache.org> on 2016/03/08 16:05:40 UTC, 2 replies.
- [jira] [Assigned] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2016/03/08 19:41:40 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2005) Implement HTrace'ing in Nutch - posted by "Farasath Ahamed (JIRA)" <ji...@apache.org> on 2016/03/08 20:30:41 UTC, 3 replies.
- [GitHub] nutch pull request: NUTCH-2202 Integration of Anthelion (Focused C... - posted by lewismc <gi...@git.apache.org> on 2016/03/09 10:21:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2185) protocol-soda-consumer plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2016/03/09 23:38:40 UTC, 0 replies.
- [jira] [Created] (NUTCH-2239) Selenium Handlers for Ajax Patterns from Student submissions - posted by "Raghav Bharadwaj Jayasimha Rao (JIRA)" <ji...@apache.org> on 2016/03/14 02:58:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2239) Selenium Handlers for Ajax Patterns from Student submissions - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/14 07:44:33 UTC, 2 replies.
- [jira] [Assigned] (NUTCH-2239) Selenium Handlers for Ajax Patterns from Student submissions - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/14 07:44:33 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2239) Selenium Handlers for Ajax Patterns from Student submissions - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/14 07:45:33 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2191) Add protocol-htmlunit - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/14 08:04:33 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2191) Add protocol-htmlunit - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/14 08:04:33 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2191) Add protocol-htmlunit - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/14 08:04:33 UTC, 33 replies.
- [jira] [Commented] (NUTCH-2138) Tika cannot OCR embedded images from PDF - posted by "Longuemare (JIRA)" <ji...@apache.org> on 2016/03/15 16:04:34 UTC, 1 replies.
- [jira] [Comment Edited] (NUTCH-2138) Tika cannot OCR embedded images from PDF - posted by "eldk (JIRA)" <ji...@apache.org> on 2016/03/15 16:12:33 UTC, 7 replies.
- [jira] [Commented] (NUTCH-2076) exceptions are not handled when using method waitForCompletion in a try block - posted by "songwanging (JIRA)" <ji...@apache.org> on 2016/03/16 06:01:33 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1492) Support gora-dynamodb in Nutch 2.x - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2016/03/17 08:15:33 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2206) Provide example scoring.similarity.stopword.file - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2016/03/17 08:15:33 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2191) Add protocol-htmlunit - posted by "Karanjeet Singh (JIRA)" <ji...@apache.org> on 2016/03/17 08:58:33 UTC, 1 replies.
- 1.11 branch/tag - posted by Markus Jelsma <ma...@openindex.io> on 2016/03/17 10:43:11 UTC, 2 replies.
- [jira] [Created] (NUTCH-2240) ava.lang.NoSuchFieldError: INSTANCE selenium nutch - posted by "lq (JIRA)" <ji...@apache.org> on 2016/03/17 18:29:33 UTC, 0 replies.
- [jira] [Created] (NUTCH-2241) Unstable Selenium plugin in Nutch. Fixed bugs and enhanced configuration - posted by "Karanjeet Singh (JIRA)" <ji...@apache.org> on 2016/03/20 00:21:33 UTC, 0 replies.
- [GitHub] nutch pull request: fix for NUTCH-2241 contributed by karanjeets - posted by karanjeets <gi...@git.apache.org> on 2016/03/20 00:59:33 UTC, 4 replies.
- [jira] [Commented] (NUTCH-2241) Unstable Selenium plugin in Nutch. Fixed bugs and enhanced configuration - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/03/20 00:59:33 UTC, 4 replies.
- [jira] [Work started] (NUTCH-2241) Unstable Selenium plugin in Nutch. Fixed bugs and enhanced configuration - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/20 01:42:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2241) Unstable Selenium plugin in Nutch. Fixed bugs and enhanced configuration - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/20 01:42:33 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2241) Unstable Selenium plugin in Nutch. Fixed bugs and enhanced configuration - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/20 01:42:33 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2241) Unstable Selenium plugin in Nutch. Fixed bugs and enhanced configuration - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/03/20 01:44:33 UTC, 0 replies.
- [GitHub] nutch pull request: Add the boilerpipe parsing adapted from NUTCH-... - posted by asfgit <gi...@git.apache.org> on 2016/03/20 01:47:27 UTC, 0 replies.
- [jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/03/20 01:47:33 UTC, 0 replies.
- GSoC 2016 for NUTCH-1756 - posted by Furkan KAMACI <fu...@gmail.com> on 2016/03/21 18:46:19 UTC, 1 replies.
- Build failed in Jenkins: Nutch-nutchgora #1550 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/03/22 04:04:04 UTC, 0 replies.
- [GitHub] nutch pull request: NUTCH-2222 re-fetch deletes all metadata excep... - posted by lewismc <gi...@git.apache.org> on 2016/03/22 04:46:02 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3357 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/03/22 05:11:02 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2230) Nutch doesn't index all URLs found - posted by "Aaron Cosand (JIRA)" <ji...@apache.org> on 2016/03/22 14:40:25 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2230) Nutch doesn't index all URLs found - posted by "Aaron Cosand (JIRA)" <ji...@apache.org> on 2016/03/23 15:26:25 UTC, 1 replies.
- [jira] [Created] (NUTCH-2242) lastModified not always set - posted by "Jurian Broertjes (JIRA)" <ji...@apache.org> on 2016/03/23 15:37:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2242) lastModified not always set - posted by "Jurian Broertjes (JIRA)" <ji...@apache.org> on 2016/03/23 15:50:25 UTC, 1 replies.
- New to Nutch2.x - posted by Sabah Sajjad Khan <sa...@wayne.edu> on 2016/03/23 18:06:59 UTC, 1 replies.
- [jira] [Issue Comment Deleted] (NUTCH-2230) Nutch doesn't index all URLs found - posted by "Aaron Cosand (JIRA)" <ji...@apache.org> on 2016/03/23 18:24:25 UTC, 0 replies.
- [jira] [Created] (NUTCH-2243) Documentation for Nutch 2.X REST API - posted by "Furkan KAMACI (JIRA)" <ji...@apache.org> on 2016/03/23 23:46:25 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1756) Security layer for NutchServer - posted by "Furkan KAMACI (JIRA)" <ji...@apache.org> on 2016/03/24 00:55:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1756) Security layer for NutchServer - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2016/03/24 01:02:25 UTC, 1 replies.
- [jira] [Comment Edited] (NUTCH-2005) Implement HTrace'ing in Nutch - posted by "Farasath Ahamed (JIRA)" <ji...@apache.org> on 2016/03/24 07:03:25 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2242) lastModified not always set - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2016/03/24 13:03:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2005) Implement HTrace'ing in Nutch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2016/03/24 14:37:25 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2089) Move Nutch to compile on JDK 8 - posted by "Furkan KAMACI (JIRA)" <ji...@apache.org> on 2016/03/26 01:24:25 UTC, 1 replies.
- [jira] [Updated] (NUTCH-2191) Add protocol-htmlunit - posted by "Karanjeet Singh (JIRA)" <ji...@apache.org> on 2016/03/27 04:10:25 UTC, 0 replies.
- [GitHub] nutch pull request: fix for NUTCH-2191 contributed by karanjeets - posted by karanjeets <gi...@git.apache.org> on 2016/03/27 08:24:10 UTC, 22 replies.
- [selenium] running selenium headless - posted by Sabah Sajjad Khan <sa...@wayne.edu> on 2016/03/29 02:07:42 UTC, 3 replies.
- [jira] [Created] (NUTCH-2244) Publish Protocol-Interactiveselenium to central maven repo - posted by "Raghav Bharadwaj Jayasimha Rao (JIRA)" <ji...@apache.org> on 2016/03/30 18:46:25 UTC, 0 replies.
- [jira] [Created] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model - posted by "Bhavya Sanghavi (JIRA)" <ji...@apache.org> on 2016/03/30 20:22:25 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model - posted by "Sujen Shah (JIRA)" <ji...@apache.org> on 2016/03/30 20:29:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model - posted by "Sujen Shah (JIRA)" <ji...@apache.org> on 2016/03/30 20:29:25 UTC, 0 replies.
- [GitHub] nutch pull request: Fix for NUTCH-2245 NGram Model for Cosine Simi... - posted by bhavyasanghavi <gi...@git.apache.org> on 2016/03/31 07:20:02 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/03/31 07:20:25 UTC, 0 replies.
- Nutch: Tika Parser error while parsing an image - posted by Karanjeet Singh <ka...@usc.edu> on 2016/03/31 11:40:15 UTC, 1 replies.