- RE: [DISCUSS] Release Apache Nutch 1.10 - posted by Markus Jelsma <ma...@openindex.io> on 2015/04/01 00:45:07 UTC, 4 replies.
- [GitHub] nutch pull request: fix for NUTCH-1771 contributed by Chong Li - posted by areshero <gi...@git.apache.org> on 2015/04/01 02:44:28 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1771) Solrindex fails if a segment is corrupted or incomplete - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/04/01 02:44:53 UTC, 10 replies.
- [GitHub] nutch pull request: fix for Nutch 1973 by sujen1412 - posted by sujen1412 <gi...@git.apache.org> on 2015/04/01 03:36:28 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service - posted by "Sujen Shah (JIRA)" <ji...@apache.org> on 2015/04/01 03:36:53 UTC, 12 replies.
- [Nutch Wiki] Update of "Nutch_1.X_RESTAPI/RunningJobsTutorial" by SujenShah - posted by Apache Wiki <wi...@apache.org> on 2015/04/01 05:54:57 UTC, 1 reply.
- [Nutch Wiki] Update of "Nutch_1.X_RESTAPI" by SujenShah - posted by Apache Wiki <wi...@apache.org> on 2015/04/01 05:54:57 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1977) commoncrawldump java heap space - posted by "Jiaheng Zhang (JIRA)" <ji...@apache.org> on 2015/04/01 06:31:53 UTC, 0 replies.
- [Nutch Wiki] Update of "ContributorsGroup" by ChrisMattmann - posted by Apache Wiki <wi...@apache.org> on 2015/04/01 07:25:59 UTC, 0 replies.
- Re: svn commit: r1670442 - /nutch/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2015/04/01 07:46:37 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1977) commoncrawldump java heap space - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/01 07:47:53 UTC, 0 replies.
- [Nutch Wiki] Update of "CommonCrawlDataDumper" by darrencheng - posted by Apache Wiki <wi...@apache.org> on 2015/04/01 19:08:18 UTC, 0 replies.
- [jira] [Created] (NUTCH-1980) Jexl expressions for CrawlDbReader - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/04/01 23:54:53 UTC, 0 replies.
- [jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support - posted by "Alexander Kingson (JIRA)" <ji...@apache.org> on 2015/04/01 23:59:54 UTC, 0 replies.
- [jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support - posted by "Alexander Kingson (JIRA)" <ji...@apache.org> on 2015/04/02 00:00:54 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1975) New configuration for CommonCrawlDataDumper tool - posted by "Giuseppe Totaro (JIRA)" <ji...@apache.org> on 2015/04/02 01:03:53 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1980) Jexl expressions for CrawlDbReader - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2015/04/02 06:36:52 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1980) Jexl expressions for CrawlDbReader - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/04/02 10:49:53 UTC, 2 replies.
- [jira] [Created] (NUTCH-1981) Upgrade icu4j to version 51.1 - posted by "Marko Asplund (JIRA)" <ji...@apache.org> on 2015/04/02 15:30:19 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1981) Upgrade icu4j to version 51.1 - posted by "Marko Asplund (JIRA)" <ji...@apache.org> on 2015/04/02 15:30:20 UTC, 3 replies.
- [jira] [Created] (NUTCH-1982) Make Git ignore IDE project files and add note about IDE setup - posted by "Marko Asplund (JIRA)" <ji...@apache.org> on 2015/04/02 15:30:22 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1982) Make Git ignore IDE project files and add note about IDE setup - posted by "Marko Asplund (JIRA)" <ji...@apache.org> on 2015/04/02 15:30:22 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1975) New configuration for CommonCrawlDataDumper tool - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/03 16:33:53 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1975) New configuration for CommonCrawlDataDumper tool - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/03 16:36:53 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "Nutch_1.X_RESTAPI/RunningJobsTutorial" by SujenShah - posted by Apache Wiki <wi...@apache.org> on 2015/04/04 04:09:48 UTC, 1 reply.
- Re: GSOC RDF Microformats Support - posted by Remzi Düzağaç <re...@gmail.com> on 2015/04/04 16:20:35 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1973) Job Administration end point for the REST service - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/04 17:14:33 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1973) Job Administration end point for the REST service - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/04 17:14:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1933) nutch-selenium plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/06 18:57:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/06 18:57:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-827) HTTP POST Authentication - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/06 18:58:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1856) Document webpage.avsc and host.avsc - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/06 18:58:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1823) Upgrade to elasticsearch 1.4.1 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/06 18:58:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1839) Improve WebGraph CLI parsing - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/06 18:58:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1660) Index filter for Page's latitude and longitude - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/06 18:58:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1526) Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/06 18:59:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-841) Create a Wicket-based Web Application for Nutch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/06 18:59:13 UTC, 0 replies.
- Warm hello! - posted by "Thapar, Shivika" <st...@indiana.edu> on 2015/04/06 20:22:11 UTC, 1 reply.
- Hello! - posted by "Doshi, Nipurn" <ni...@indiana.edu> on 2015/04/06 20:22:11 UTC, 1 reply.
- unsubscribe - posted by Chris Hairfield <ch...@gmail.com> on 2015/04/06 20:29:49 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher - posted by "Asitang Mishra (JIRA)" <ji...@apache.org> on 2015/04/06 21:11:12 UTC, 19 replies.
- Nutch 1.9 integration with Solr 5.0.0 - posted by Anchit Jain <an...@gmail.com> on 2015/04/06 22:12:05 UTC, 1 reply.
- [jira] [Updated] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher - posted by "Asitang Mishra (JIRA)" <ji...@apache.org> on 2015/04/07 03:38:12 UTC, 6 replies.
- [jira] [Updated] (NUTCH-1697) SegmentMerger to implement Tool - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/04/07 13:35:12 UTC, 1 reply.
- Re: HTTP Post Authentication - posted by Tizy Ninan <ti...@gmail.com> on 2015/04/07 14:11:36 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/07 22:31:12 UTC, 12 replies.
- trouble using nutch server - posted by Mahmoud Gzawi <gz...@gmail.com> on 2015/04/08 00:58:47 UTC, 4 replies.
- [jira] [Commented] (NUTCH-1981) Upgrade icu4j to version 51.1 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/04/08 14:00:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1981) Upgrade icu4j - posted by "Marko Asplund (JIRA)" <ji...@apache.org> on 2015/04/08 14:25:12 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1981) Upgrade icu4j - posted by "Marko Asplund (JIRA)" <ji...@apache.org> on 2015/04/08 14:28:12 UTC, 4 replies.
- Issue with Nutch 2.3 and solr 4.9.1 on crawling website: NoSuchElementException - posted by Suman Saurabh <ss...@gmail.com> on 2015/04/08 15:21:51 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1247) CrawlDatum.retries should be int - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/04/08 16:26:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6.1 - posted by "Jeroen Vlek (JIRA)" <ji...@apache.org> on 2015/04/09 15:26:12 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1946) Upgrade to Gora 0.6.1 - posted by "Jeroen Vlek (JIRA)" <ji...@apache.org> on 2015/04/09 15:27:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1771) Solrindex fails if a segment is corrupted or incomplete - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/04/10 00:23:12 UTC, 0 replies.
- [jira] [Created] (NUTCH-1983) CommonCrawlDumper and FileDumper don't dump correct JSON - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/10 06:42:12 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1983) CommonCrawlDumper and FileDumper don't dump correct JSON - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/10 06:43:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1972) Dockerfile for Nutch 1.x - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/10 06:59:12 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1944) Add raw content to indexes - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/10 07:20:14 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1944) Add raw content to indexes - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/10 07:20:14 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1944) Add raw content to indexes - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/10 07:21:12 UTC, 0 replies.
- Git/SVN integration on Nutch 2.x - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2015/04/10 07:25:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1944) Add raw content to indexes - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/10 07:29:13 UTC, 4 replies.
- [jira] [Comment Edited] (NUTCH-1944) Add raw content to indexes - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/10 07:54:12 UTC, 0 replies.
- [jira] [Created] (NUTCH-1984) Eliminate unnecessary dependencies - posted by "Marko Asplund (JIRA)" <ji...@apache.org> on 2015/04/10 12:45:20 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1984) Eliminate unnecessary dependencies - posted by "Marko Asplund (JIRA)" <ji...@apache.org> on 2015/04/10 12:47:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1983) CommonCrawlDumper and FileDumper don't dump correct JSON - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/11 01:32:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1905) Nutch index tool should be resilient to segments that don't have crawl_* data - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/11 01:34:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1983) CommonCrawlDumper and FileDumper don't dump correct JSON - posted by "Hudson (JIRA)" <ji...@apache.org> on 2015/04/11 01:51:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1960) JUnit test for dump method of CommonCrawlDataDumper - posted by "Hudson (JIRA)" <ji...@apache.org> on 2015/04/11 06:07:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1960) JUnit test for dump method of CommonCrawlDataDumper - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/11 06:41:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1960) JUnit test for dump method of CommonCrawlDataDumper - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/11 06:42:12 UTC, 0 replies.
- Unsubscribe - posted by Anne Mary Joy <an...@usc.edu> on 2015/04/11 21:20:00 UTC, 7 replies.
- [jira] [Resolved] (NUTCH-1981) Upgrade icu4j - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/04/12 00:14:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1984) Eliminate unnecessary dependencies - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/04/12 00:44:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/12 08:11:12 UTC, 8 replies.
- Review Request 33112: NUTCH-1927: Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing - posted by Chris Mattmann <ma...@apache.org> on 2015/04/12 18:29:14 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/12 18:30:12 UTC, 15 replies.
- [jira] [Comment Edited] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/04/13 13:25:13 UTC, 0 replies.
- [jira] [Issue Comment Deleted] (NUTCH-1946) Upgrade to Gora 0.6.1 - posted by "Jeroen Vlek (JIRA)" <ji...@apache.org> on 2015/04/14 10:36:14 UTC, 0 replies.
- [Nutch Wiki] Update of "SumanSaurabh/GSoC2015Nutch" by SumanSaurabh - posted by Apache Wiki <wi...@apache.org> on 2015/04/14 12:43:30 UTC, 1 reply.
- [jira] [Created] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2015/04/14 22:30:59 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2015/04/14 22:34:00 UTC, 0 replies.
- [Nutch Wiki] Update of "FrontPage" by ChrisMattmann - posted by Apache Wiki <wi...@apache.org> on 2015/04/15 07:28:18 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1972) Dockerfile for Nutch 1.x - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/15 16:38:58 UTC, 0 replies.
- [Nutch Wiki] Update of "WhiteListRobots" by ChrisMattmann - posted by Apache Wiki <wi...@apache.org> on 2015/04/15 16:56:48 UTC, 4 replies.
- [jira] [Created] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/15 17:12:58 UTC, 0 replies.
- [GitHub] nutch pull request: NUTCH-1986 - Update and clarify default Elasti... - posted by MJJoyce <gi...@git.apache.org> on 2015/04/15 17:30:24 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/04/15 17:30:59 UTC, 5 replies.
- [jira] [Created] (NUTCH-1987) Make bin/crawl indexer agnostic - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/15 17:45:58 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1987) Make bin/crawl indexer agnostic - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/15 17:53:59 UTC, 9 replies.
- [jira] [Comment Edited] (NUTCH-1987) Make bin/crawl indexer agnostic - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/15 17:54:58 UTC, 0 replies.
- [GitHub] nutch pull request: NUTCH-1987 - Make bin/crawl indexer agnostic - posted by MJJoyce <gi...@git.apache.org> on 2015/04/15 20:14:55 UTC, 1 reply.
- [jira] [Created] (NUTCH-1988) Make nested output directory dump optional - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/15 21:17:58 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1988) Make nested output directory dump optional - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/15 21:22:59 UTC, 1 reply.
- [GitHub] nutch pull request: NUTCH-1988 - Add optional flat directory flag ... - posted by MJJoyce <gi...@git.apache.org> on 2015/04/15 21:23:33 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1988) Make nested output directory dump optional - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/04/15 21:23:59 UTC, 4 replies.
- [jira] [Commented] (NUTCH-1930) Fetcher erases Markers for certain URLs / documents - posted by "Clement Mai (JIRA)" <ji...@apache.org> on 2015/04/16 00:13:59 UTC, 0 replies.
- [jira] [Created] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper - posted by "Giuseppe Totaro (JIRA)" <ji...@apache.org> on 2015/04/16 17:50:58 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper - posted by "Giuseppe Totaro (JIRA)" <ji...@apache.org> on 2015/04/16 17:51:58 UTC, 3 replies.
- [jira] [Created] (NUTCH-1990) Use URI.normalise() in BasicURLNormalizer - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/04/16 18:02:58 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/16 21:03:59 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1987) Make bin/crawl indexer agnostic - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/16 21:03:59 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1964) tmp directory not cleaned up after using commoncrawldump tool - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/16 21:29:59 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1964) tmp directory not cleaned up after using commoncrawldump tool - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/16 21:29:59 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1964) tmp directory not cleaned up after using commoncrawldump tool - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/16 21:29:59 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1906) Typo in CrawlDbReader command line help - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/16 21:45:59 UTC, 3 replies.
- [GitHub] nutch pull request: NUTCH-1906 - Remove duplicate stats flag listi... - posted by MJJoyce <gi...@git.apache.org> on 2015/04/16 21:46:42 UTC, 1 reply.
- [GitHub] nutch pull request: NUTCH-1911 - Make domainstatics help info a sm... - posted by MJJoyce <gi...@git.apache.org> on 2015/04/16 22:36:12 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1911) Imeprove DomainStatistics tool command line parsing - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/04/16 22:36:59 UTC, 4 replies.
- [jira] [Created] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection - posted by "Iain Lopata (JIRA)" <ji...@apache.org> on 2015/04/17 02:03:43 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection - posted by "Iain Lopata (JIRA)" <ji...@apache.org> on 2015/04/17 02:13:02 UTC, 5 replies.
- [jira] [Commented] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection - posted by "Iain Lopata (JIRA)" <ji...@apache.org> on 2015/04/17 02:51:58 UTC, 6 replies.
- [jira] [Assigned] (NUTCH-1911) Imeprove DomainStatistics tool command line parsing - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/17 20:05:59 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1911) Imeprove DomainStatistics tool command line parsing - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/17 20:06:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1906) Typo in CrawlDbReader command line help - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/17 20:14:00 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1906) Typo in CrawlDbReader command line help - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/17 20:14:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1906) Typo in CrawlDbReader command line help - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/17 20:26:01 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/17 20:27:59 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/17 20:27:59 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/17 22:37:59 UTC, 0 replies.
- DARPA Memex - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2015/04/17 22:55:58 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1988) Make nested output directory dump optional - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/17 22:57:02 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1988) Make nested output directory dump optional - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/17 22:58:59 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/18 18:31:58 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/18 18:31:59 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/18 18:33:59 UTC, 0 replies.
- [jira] [Created] (NUTCH-1992) Port whitelist from NUTCH-1927 to 2.x - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/18 18:35:58 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/18 18:35:59 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper - posted by "Hudson (JIRA)" <ji...@apache.org> on 2015/04/18 18:50:58 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1992) Port whitelist from NUTCH-1927 to 2.x - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/18 19:29:58 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/04/18 22:43:02 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1990) Use URI.normalise() in BasicURLNormalizer - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/04/19 23:18:58 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1697) SegmentMerger to implement Tool - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/20 02:39:59 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1990) Use URI.normalise() in BasicURLNormalizer - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/04/20 17:52:59 UTC, 4 replies.
- [jira] [Work started] (NUTCH-1987) Make bin/crawl indexer agnostic - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/21 04:45:59 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1987) Make bin/crawl indexer agnostic - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/21 04:45:59 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1987) Make bin/crawl indexer agnostic - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/21 04:48:59 UTC, 0 replies.
- [jira] [Created] (NUTCH-1993) Nutch does not use backup parsers - posted by "Arkadi Kosmynin (JIRA)" <ji...@apache.org> on 2015/04/21 08:01:07 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1993) Nutch does not use backup parsers - posted by "Arkadi Kosmynin (JIRA)" <ji...@apache.org> on 2015/04/21 09:17:59 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1697) SegmentMerger to implement Tool - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/04/21 09:43:59 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1993) Nutch does not use backup parsers - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/21 16:57:58 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1993) Nutch does not use backup parsers - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/21 16:57:59 UTC, 0 replies.
- [jira] [Created] (NUTCH-1994) Upgrade to Apache Tika 1.8 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/21 17:59:58 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/22 01:28:01 UTC, 16 replies.
- [jira] [Updated] (NUTCH-1973) Job Administration end point for the REST service - posted by "Sujen Shah (JIRA)" <ji...@apache.org> on 2015/04/22 03:24:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1973) Job Administration end point for the REST service - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/22 03:47:58 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3077 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/22 04:43:46 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #3078 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/22 06:15:57 UTC, 0 replies.
- [jira] [Created] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist - posted by "Giuseppe Totaro (JIRA)" <ji...@apache.org> on 2015/04/22 08:36:58 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist - posted by "Giuseppe Totaro (JIRA)" <ji...@apache.org> on 2015/04/22 08:36:59 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-1990) Use URI.normalise() in BasicURLNormalizer - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/04/22 11:55:58 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection - posted by "Iain Lopata (JIRA)" <ji...@apache.org> on 2015/04/22 14:51:58 UTC, 0 replies.
- [jira] [Created] (NUTCH-1996) Make protocol-selenium README part of plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/22 18:35:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1996) Make protocol-selenium README part of plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/22 18:36:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1996) Make protocol-selenium README part of plugin - posted by "Hudson (JIRA)" <ji...@apache.org> on 2015/04/22 18:59:59 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1994) Upgrade to Apache Tika 1.8 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/22 19:42:59 UTC, 1 reply.
- [jira] [Work started] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/22 20:20:59 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/22 20:20:59 UTC, 0 replies.
- [jira] [Created] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output - posted by "Giuseppe Totaro (JIRA)" <ji...@apache.org> on 2015/04/22 20:34:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output - posted by "Giuseppe Totaro (JIRA)" <ji...@apache.org> on 2015/04/22 20:35:59 UTC, 3 replies.
- [jira] [Created] (NUTCH-1998) Add support for user-defined file extension to CommonCrawlDataDumper - posted by "Giuseppe Totaro (JIRA)" <ji...@apache.org> on 2015/04/22 20:40:59 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1993) Nutch does not use backup parsers - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/04/22 22:50:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1062) Migrate BasicURLNormalizer from Apache ORO to java.util.regex - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/04/22 23:08:59 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1998) Add support for user-defined file extension to CommonCrawlDataDumper - posted by "Giuseppe Totaro (JIRA)" <ji...@apache.org> on 2015/04/23 00:19:58 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1998) Add support for user-defined file extension to CommonCrawlDataDumper - posted by "Luke sh (JIRA)" <ji...@apache.org> on 2015/04/23 00:23:59 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output - posted by "Luke sh (JIRA)" <ji...@apache.org> on 2015/04/23 00:45:00 UTC, 6 replies.
- [jira] [Created] (NUTCH-1999) Add http://nutch.apache.org/robots.txt - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/04/23 12:53:39 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1999) Add http://nutch.apache.org/robots.txt - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/04/23 12:53:39 UTC, 0 replies.
- [jira] [Created] (NUTCH-2000) Link inversion fails with .locked already exists. - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/04/23 13:03:38 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2015/04/23 16:51:38 UTC, 3 replies.
- [GitHub] nutch pull request: Branch 1.6 - posted by isAbird <gi...@git.apache.org> on 2015/04/23 17:36:29 UTC, 0 replies.
- [PROPOSE] Kick off Apache Nutch 1.8 by EoB Friday 04232015 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/04/23 20:14:47 UTC, 1 reply.
- Unsubscribe - posted by Mengxian Li <me...@usc.edu> on 2015/04/23 21:07:20 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1994) Upgrade to Apache Tika 1.8 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/23 23:39:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1947) Overhaul o.a.n.parse.OutlinkExtractor.java - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/23 23:40:39 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2000) Link inversion fails with .locked already exists. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/23 23:40:39 UTC, 4 replies.
- [jira] [Commented] (NUTCH-1963) CommonsCrawlDataDumper is too long ( > 100 bytes) when -gzip option invoked - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/23 23:41:39 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1969) URL Normalizer properly handling slashes - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/23 23:42:39 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1958) Remove scoring-opic from nutch-default.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/23 23:43:39 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2000) Link inversion fails with .locked already exists. - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/04/23 23:50:38 UTC, 3 replies.
- Build failed in Jenkins: Nutch-trunk #3083 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/23 23:50:50 UTC, 0 replies.
- [jira] [Created] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml - posted by "Jeff Cocking (JIRA)" <ji...@apache.org> on 2015/04/24 00:20:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml - posted by "Jeff Cocking (JIRA)" <ji...@apache.org> on 2015/04/24 00:43:38 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml - posted by "Jeff Cocking (JIRA)" <ji...@apache.org> on 2015/04/24 00:45:38 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1963) CommonsCrawlDataDumper is too long ( > 100 bytes) when -gzip option invoked - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/24 01:37:38 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3084 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/24 02:50:02 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3085 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/24 03:23:25 UTC, 0 replies.
- Re: [MASSMAIL]Re: [PROPOSE] Kick off Apache Nutch 1.8 by EoB Friday 04232015 - posted by Jorge Luis Betancourt González <jl...@uci.cu> on 2015/04/24 04:04:21 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2015/04/24 04:31:38 UTC, 0 replies.
- [jira] [Issue Comment Deleted] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output - posted by "Luke sh (JIRA)" <ji...@apache.org> on 2015/04/24 04:42:38 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3086 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/24 04:50:02 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3087 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/24 06:14:38 UTC, 0 replies.
- (Unknown) - posted by Kunal Parakh <kp...@usc.edu> on 2015/04/24 21:08:32 UTC, 0 replies.
- [ANNOUNCE] New Nutch committer and PMC - Guiseppe Totaro - posted by Sebastian Nagel <wa...@googlemail.com> on 2015/04/24 22:00:49 UTC, 3 replies.
- Build failed in Jenkins: Nutch-trunk #3088 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/25 06:19:07 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/25 17:49:38 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/25 17:52:39 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/25 17:52:39 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/25 17:57:38 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3089 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/25 18:49:59 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3090 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/26 06:06:57 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/27 03:36:38 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/27 03:36:38 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/27 03:38:38 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1969) URL Normalizer properly handling slashes - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/27 03:39:38 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1969) URL Normalizer properly handling slashes - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/27 03:39:38 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1969) URL Normalizer properly handling slashes - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/04/27 03:41:38 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3091 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/27 03:50:04 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3092 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/27 06:07:17 UTC, 0 replies.
- [jira] [Created] (NUTCH-2002) ParserChecker to check robots.txt - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/04/27 16:47:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2002) ParserChecker to check robots.txt - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/04/27 16:48:38 UTC, 0 replies.
- All issues fixed for 1.10 - Tika 1.8 build issue - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2015/04/27 18:39:18 UTC, 1 replies.
- Build failed in Jenkins: Nutch-trunk #3093 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/28 06:23:08 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3094 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/29 06:23:14 UTC, 0 replies.
- [jira] [Created] (NUTCH-2003) topN is not work correctly - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2015/04/29 11:33:05 UTC, 0 replies.
- [jira] [Created] (NUTCH-2004) ParseChecker does not handle redirects - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/29 21:30:07 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2004) ParseChecker does not handle redirects - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/04/29 21:31:08 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #3095 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/04/29 21:52:59 UTC, 0 replies.
- [VOTE] Release Apache Nutch 1.10 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/04/29 23:54:26 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1559) parse-metatags duplicates extracted metatags in combination with parse-tika - posted by "Jeff Cocking (JIRA)" <ji...@apache.org> on 2015/04/30 21:38:06 UTC, 1 replies.
- Reverse Geocoding with Nutch 1.10 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/04/30 23:26:17 UTC, 0 replies.