- [jira] [Updated] (NUTCH-1924) Nutch + HBase Docker - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/01 03:11:34 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7 - posted by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/02/01 03:13:34 UTC, 15 replies.
- [jira] [Assigned] (NUTCH-827) HTTP POST Authentication - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/01 05:22:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-827) HTTP POST Authentication - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/01 05:22:35 UTC, 5 replies.
- [jira] [Commented] (NUTCH-827) HTTP POST Authentication - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/01 05:23:34 UTC, 9 replies.
- [jira] [Updated] (NUTCH-1925) Upgrade Tika to version 1.7 - posted by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/02/02 04:03:34 UTC, 6 replies.
- [jira] [Updated] (NUTCH-1928) Indexing filter of documents by the MIME type - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2015/02/02 06:37:34 UTC, 7 replies.
- [jira] [Commented] (NUTCH-1928) Indexing filter of documents by the MIME type - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2015/02/02 06:39:34 UTC, 12 replies.
- [jira] [Created] (NUTCH-1930) Fetcher erases Markers for certain URLs / documents - posted by "Michiel (JIRA)" <ji...@apache.org> on 2015/02/02 16:45:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1930) Fetcher erases Markers for certain URLs / documents - posted by "Michiel (JIRA)" <ji...@apache.org> on 2015/02/02 16:46:35 UTC, 2 replies.
- Re: Blog topic: Maxmind's GeoIP2 API being used in Apache Nutch 1.10 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/02/02 18:30:53 UTC, 1 replies.
- [Nutch Wiki] Trivial Update of "HttpAuthenticationSchemes" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/02/02 20:19:49 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "HttpPostAuthentication" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/02/02 20:21:31 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1924) Nutch + HBase Docker - posted by "Radosław Stankiewicz (JIRA)" <ji...@apache.org> on 2015/02/03 00:56:35 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1924) Nutch + HBase Docker - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/03 02:30:34 UTC, 5 replies.
- [jira] [Created] (NUTCH-1931) Apache Nutch 1.x REST service and crawler visualization - posted by "Sujen Shah (JIRA)" <ji...@apache.org> on 2015/02/03 05:57:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1930) Fetcher erases Markers for certain URLs / documents - posted by "Michiel (JIRA)" <ji...@apache.org> on 2015/02/03 17:28:36 UTC, 0 replies.
- [jira] [Work started] (NUTCH-827) HTTP POST Authentication - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/04 01:05:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1929) Consider implementing dependency injection for crawl HTTPS sites that use self signed certificates - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/04 01:32:34 UTC, 0 replies.
- [jira] [Created] (NUTCH-1932) Automatically remove orphaned pages - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/04 17:35:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1932) Automatically remove orphaned pages - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/04 17:36:34 UTC, 0 replies.
- [jira] [Created] (NUTCH-1933) nutch-selenium plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/04 19:37:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1933) nutch-selenium plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/04 19:37:36 UTC, 6 replies.
- [jira] [Created] (NUTCH-1934) Refactor Fetcher in trunk - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/04 21:34:34 UTC, 0 replies.
- [jira] [Created] (NUTCH-1935) too many open files - posted by "yuanyun.cn (JIRA)" <ji...@apache.org> on 2015/02/04 22:56:36 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "ContributorsGroup" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/02/04 23:33:11 UTC, 1 replies.
- GSoC 2015 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/02/04 23:42:22 UTC, 6 replies.
- [Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/02/04 23:42:22 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1935) too many open files - posted by "stack (JIRA)" <ji...@apache.org> on 2015/02/04 23:58:34 UTC, 3 replies.
- [Nutch Wiki] Trivial Update of "AdvancedAjaxInteraction" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/02/05 00:19:26 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/05 02:17:34 UTC, 5 replies.
- [jira] [Commented] (NUTCH-1933) nutch-selenium plugin - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/05 13:29:34 UTC, 21 replies.
- [jira] [Comment Edited] (NUTCH-1933) nutch-selenium plugin - posted by "Mo Omer (JIRA)" <ji...@apache.org> on 2015/02/05 17:16:34 UTC, 2 replies.
- [jira] [Created] (NUTCH-1936) GSoC 2015 - Move Nutch to Hadoop 2.X - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/05 18:54:37 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "GoogleSummerOfCode" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/02/05 18:58:57 UTC, 2 replies.
- [INVITATION] Apache Nutch Google Summer of Code 2015 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/02/05 19:35:03 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1936) GSoC 2015 - Move Nutch to Hadoop 2.X - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/05 19:35:39 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1931) Apache Nutch 1.x REST service and crawler visualization - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/02/06 19:40:34 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1931) Apache Nutch 1.x REST service and crawler visualization - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/02/06 19:40:35 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1916) Apache Nutch CXF-based REST services - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/02/06 19:41:35 UTC, 0 replies.
- Google Summer of Code Program - posted by Owen A Lin <oa...@nyu.edu> on 2015/02/06 23:23:53 UTC, 3 replies.
- (Unknown) - posted by lujinhong <lu...@yahoo.com> on 2015/02/07 14:21:03 UTC, 12 replies.
- hbase content of the injectorjob - posted by lujinhong <lu...@yahoo.com> on 2015/02/07 14:23:16 UTC, 3 replies.
- hbase content of injectorjob - posted by jinhong lu <lu...@yahoo.com> on 2015/02/07 15:33:40 UTC, 2 replies.
- hbase content of nutch - posted by lu_jin_hong(陆锦洪) <lu...@163.com> on 2015/02/07 15:37:33 UTC, 2 replies.
- [GitHub] nutch pull request: Update README.txt - posted by chrismattmann <gi...@git.apache.org> on 2015/02/07 19:51:42 UTC, 2 replies.
- Suggestion: move README.txt to README.md - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2015/02/07 19:55:03 UTC, 0 replies.
- [jira] [Created] (NUTCH-1937) Error: Could not find or load main class bin.crawl - posted by "Nishant Jani (JIRA)" <ji...@apache.org> on 2015/02/07 21:25:34 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1937) Error: Could not find or load main class bin.crawl - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/02/07 22:34:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1937) Error: Could not find or load main class bin.crawl - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/02/07 22:42:34 UTC, 2 replies.
- Isnt the All-in-one Crawl Deprecated? - posted by nishant jani <ni...@gmail.com> on 2015/02/07 23:56:48 UTC, 1 replies.
- [jira] [Created] (NUTCH-1938) Error When Running Nutch - posted by "Pranshu Kumar (JIRA)" <ji...@apache.org> on 2015/02/08 08:25:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1938) Error When Running Nutch - posted by "Pranshu Kumar (JIRA)" <ji...@apache.org> on 2015/02/08 08:26:34 UTC, 1 replies.
- unsubscribe - posted by Arthur Cinader <ac...@gmail.com> on 2015/02/08 13:56:55 UTC, 3 replies.
- [jira] [Work started] (NUTCH-1938) Unable to load realm info from SCDynamicStore - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/02/08 17:59:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1938) Unable to load realm info from SCDynamicStore - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/02/08 17:59:34 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1938) Unable to load realm info from SCDynamicStore - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/02/08 17:59:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1938) Unable to load realm info from SCDynamicStore - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/02/08 18:00:41 UTC, 1 replies.
- Fetch queue size, Multiple seed URLs and Maximum Depth - posted by Preetam Pradeepkumar Shingavi <sh...@usc.edu> on 2015/02/08 19:18:52 UTC, 4 replies.
- [jira] [Resolved] (NUTCH-1938) Unable to load realm info from SCDynamicStore - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/08 20:06:34 UTC, 0 replies.
- [Nutch Wiki] New attachment added to page GoogleSummerOfCode - posted by Apache Wiki <wi...@apache.org> on 2015/02/08 20:43:16 UTC, 0 replies.
- 572:Crawl statistics for each repository ? - posted by Jaydeep Bagrecha <ba...@usc.edu> on 2015/02/08 23:22:36 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1913) LinkDB to implement db.ignore.external.links - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/02/09 17:56:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1932) Automatically remove orphaned pages - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/02/09 18:13:35 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-1932) Automatically remove orphaned pages - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/09 18:27:34 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1735) code dedup fetcher queue redirects - posted by "Leo Ye (JIRA)" <ji...@apache.org> on 2015/02/10 02:36:35 UTC, 1 replies.
- [jira] [Work stopped] (NUTCH-827) HTTP POST Authentication - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/10 06:10:35 UTC, 0 replies.
- Re: Reverse Geocoding for the Masses - Apache Nutch Guest Post - Revised - STF - Invitation to comment - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/02/10 17:42:43 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1323) AjaxNormalizer - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/10 17:58:12 UTC, 2 replies.
- [jira] [Assigned] (NUTCH-1735) code dedup fetcher queue redirects - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/02/10 23:12:12 UTC, 0 replies.
- [jira] [Created] (NUTCH-1939) Fetcher fails to follow redirects - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/02/10 23:18:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1939) Fetcher fails to follow redirects - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/02/10 23:22:13 UTC, 0 replies.
- Why the protocol-httpclient Does Handle URL with Special Characters - posted by Renxia Wang <re...@usc.edu> on 2015/02/10 23:58:04 UTC, 1 replies.
- [Nutch Wiki] Update of "FrontPage" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/02/11 01:13:12 UTC, 0 replies.
- [Nutch Wiki] Update of "SujenShah" by SujenShah - posted by Apache Wiki <wi...@apache.org> on 2015/02/11 01:18:28 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1939) Fetcher fails to follow redirects - posted by "lufeng (JIRA)" <ji...@apache.org> on 2015/02/11 02:41:11 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-1939) Fetcher fails to follow redirects - posted by "lufeng (JIRA)" <ji...@apache.org> on 2015/02/11 03:17:11 UTC, 0 replies.
- org.mortbay.proxy package not found in nutch 1.x, Ref Class - ProxyTestbed - posted by Preetam Pradeepkumar Shingavi <sh...@usc.edu> on 2015/02/11 05:00:04 UTC, 2 replies.
- [jira] [Created] (NUTCH-1940) Port HTTP POST Authentication to 2.X - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/11 05:22:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1940) Port HTTP POST Authentication to 2.X - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/11 05:22:11 UTC, 0 replies.
- Nutch-Selenium in Nutch 1.10 - posted by Shuo Li <sl...@usc.edu> on 2015/02/11 06:36:46 UTC, 17 replies.
- [jira] [Created] (NUTCH-1941) Optional rolling http.agent.name's - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/11 07:24:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1941) Optional rolling http.agent.name's - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/11 07:26:11 UTC, 0 replies.
- Move Nutch to Hadoop 2.X - posted by Dulaj Viduranga <vi...@icloud.com> on 2015/02/11 15:25:21 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1323) AjaxNormalizer - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/12 09:30:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1323) AjaxNormalizer - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/12 09:31:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1913) LinkDB to implement db.ignore.external.links - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/12 09:45:11 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1913) LinkDB to implement db.ignore.external.links - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/12 09:48:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1921) Optionally parse fetch_not_modified - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/12 09:51:11 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1684) ParseMeta to be added before fetch schedulers are run - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/12 09:51:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1730) Scoring-depth optionally not to increment depth for external hosts - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/12 09:52:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1724) LinkDBReader to support regex output filtering - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/12 09:59:11 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1939) Fetcher fails to follow redirects - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/02/12 12:54:11 UTC, 0 replies.
- [jira] [Created] (NUTCH-1942) Remove TopLevelDomain - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/02/12 15:49:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1942) Remove TopLevelDomain - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/02/12 15:56:12 UTC, 6 replies.
- Varying Number of URLS Crawled. - posted by Nagarjun Pola <np...@usc.edu> on 2015/02/12 19:39:12 UTC, 1 replies.
- nutch subscribe - posted by Poojan Jhaveri <pj...@usc.edu> on 2015/02/13 00:53:07 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1925) Upgrade Tika to version 1.7 - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/13 13:26:12 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1724) LinkDBReader to support regex output filtering - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/13 13:29:11 UTC, 0 replies.
- Re: [nutch-cassandra-docker] Inquiry on contribution (#1) - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/02/13 15:23:10 UTC, 1 replies.
- Build failed in Jenkins: Nutch-nutchgora #1337 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/02/13 15:48:36 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1724) LinkDBReader to support regex output filtering - posted by "Hudson (JIRA)" <ji...@apache.org> on 2015/02/13 16:09:13 UTC, 0 replies.
- Vagrant Crushed When using Nutch-Selenium - posted by Shuo Li <sl...@usc.edu> on 2015/02/13 19:12:21 UTC, 13 replies.
- Integrate Splash with Nutch akin to Selenium - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2015/02/13 20:38:23 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-827) HTTP POST Authentication - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/13 23:07:11 UTC, 0 replies.
- [jira] [Created] (NUTCH-1943) Form authentication should not be global and ignore - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/13 23:09:11 UTC, 0 replies.
- [jira] [Created] (NUTCH-1944) Add raw content to indexes - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/14 05:35:11 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1338 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/02/14 11:32:59 UTC, 0 replies.
- [GitHub] nutch pull request: fix for NUTCH-1944 Index HTML raw content cont... - posted by Meabed <gi...@git.apache.org> on 2015/02/14 14:33:06 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1944) Add raw content to indexes - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/02/14 14:33:11 UTC, 3 replies.
- Build failed in Jenkins: Nutch-nutchgora #1339 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/02/15 09:27:27 UTC, 0 replies.
- Does Limiting the (ftp|http).content.size Affect the Parsing and Deduplication? - posted by Renxia Wang <re...@usc.edu> on 2015/02/15 19:10:33 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient - posted by "Fabio Santagostino (JIRA)" <ji...@apache.org> on 2015/02/15 20:53:11 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient - posted by "Fabio Santagostino (JIRA)" <ji...@apache.org> on 2015/02/15 21:03:12 UTC, 1 replies.
- How to know whether politeness policy is well set? - posted by Jaydeep Bagrecha <ba...@usc.edu> on 2015/02/16 01:43:07 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1340 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/02/16 05:01:56 UTC, 0 replies.
- Re: - posted by Jiaxin Ye <ji...@usc.edu> on 2015/02/16 08:34:31 UTC, 5 replies.
- Any suggestions for avoiding this fetch failure error? - posted by Jaydeep Bagrecha <ba...@usc.edu> on 2015/02/16 09:57:19 UTC, 0 replies.
- Nutch-Selenium Error - posted by Mohammad Al-Mohsin <me...@mem9.net> on 2015/02/16 10:57:00 UTC, 4 replies.
- [jira] [Updated] (NUTCH-1921) Optionally disable HTTP if-modified-since header - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/02/16 11:11:12 UTC, 1 replies.
- Nutch Crawler Java.io.IOException - posted by Siddharth Mahendra Dasani <sd...@usc.edu> on 2015/02/16 22:39:06 UTC, 6 replies.
- Build failed in Jenkins: Nutch-nutchgora #1341 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/02/17 05:03:08 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-1925) Upgrade Tika to version 1.7 - posted by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/02/17 19:59:11 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1925) Upgrade Tika to version 1.7 - posted by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/02/17 20:42:12 UTC, 1 replies.
- HttpPostAuthentication Cannot Find Authentication Form - posted by Mohammad Al-Mohsin <me...@mem9.net> on 2015/02/17 21:22:59 UTC, 2 replies.
- [jira] [Issue Comment Deleted] (NUTCH-1925) Upgrade Tika to version 1.7 - posted by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/02/17 21:39:12 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1342 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/02/18 05:04:14 UTC, 0 replies.
- subscribe - posted by Shreshta Manu <ma...@usc.edu> on 2015/02/18 05:25:40 UTC, 0 replies.
- Selenium Grid 2 Installation Tutorial for Mac - posted by Nagarjun Pola <np...@usc.edu> on 2015/02/18 05:48:27 UTC, 2 replies.
- Tesseract OCR and GDAL in Tika plugin for Nutch? - posted by Nikunj Gala <ni...@usc.edu> on 2015/02/18 23:24:59 UTC, 6 replies.
- Build failed in Jenkins: Nutch-nutchgora #1343 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/02/19 05:03:28 UTC, 0 replies.
- [Maven Failed] - posted by Jiaxin Ye <ji...@usc.edu> on 2015/02/19 08:48:15 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1679) UpdateDb using batchId, link may override crawled page. - posted by "Alfonso Nishikawa (JIRA)" <ji...@apache.org> on 2015/02/19 15:27:12 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1679) UpdateDb using batchId, link may override crawled page. - posted by "Alfonso Nishikawa (JIRA)" <ji...@apache.org> on 2015/02/19 15:46:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1944) Add raw content to indexes - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/19 18:05:12 UTC, 2 replies.
- [ANNOUNCE] New Nutch committer and PMC - Jorge Luis Betancourt Gonzalez - posted by Sebastian Nagel <wa...@googlemail.com> on 2015/02/19 18:20:53 UTC, 6 replies.
- [jira] [Closed] (NUTCH-1935) too many open files - posted by "jefferyyuan (JIRA)" <ji...@apache.org> on 2015/02/19 20:35:12 UTC, 0 replies.
- Re: [MASSMAIL] Re: [ANNOUNCE] New Nutch committer and PMC - Jorge Luis Betancourt Gonzalez - posted by Yusniel Hidalgo Delgado <yh...@uci.cu> on 2015/02/19 20:57:37 UTC, 0 replies.
- [jira] [Created] (NUTCH-1945) Test for XLSX parser - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/02/20 00:08:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1945) Test for XLSX parser - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/02/20 00:13:11 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1344 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/02/20 05:32:35 UTC, 0 replies.
- Problem Fetching with Selenium Installed - posted by Nagarjun Pola <np...@usc.edu> on 2015/02/20 06:44:43 UTC, 7 replies.
- Problem installing Selenium on Ubuntu with Nutch trunk 1.10 - posted by Yash Sangani <ys...@usc.edu> on 2015/02/20 09:36:13 UTC, 15 replies.
- linkdb/current/part-00000/data does not exist - posted by Shuo Li <sl...@usc.edu> on 2015/02/21 00:26:22 UTC, 3 replies.
- Build failed in Jenkins: Nutch-nutchgora #1345 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/02/21 05:03:14 UTC, 0 replies.
- Nutchpy crawled statistics - posted by Pranshu Kumar <pr...@usc.edu> on 2015/02/21 05:45:10 UTC, 3 replies.
- Nutch-Selenium Plugin Truncates Binary Data - posted by Mohammad Al-Mohsin <me...@mem9.net> on 2015/02/21 15:03:28 UTC, 5 replies.
- [Nutch Wiki] Update of "NutchTutorial" by SujenShah - posted by Apache Wiki <wi...@apache.org> on 2015/02/22 03:30:53 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1346 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/02/22 05:01:56 UTC, 0 replies.
- [Nutch Wiki] Update of "Nutch_1.X_RESTAPI" by SujenShah - posted by Apache Wiki <wi...@apache.org> on 2015/02/22 05:08:16 UTC, 0 replies.
- Selenium error - posted by Puranjay Rajpal <pr...@usc.edu> on 2015/02/22 09:49:08 UTC, 0 replies.
- [jira] [Created] (NUTCH-1946) Upgrade to Gora 0.6 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/22 21:23:11 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #1347 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/02/22 21:41:16 UTC, 0 replies.
- [Nutch Wiki] Update of "RunNutchInEclipse" by SebastianNagel - posted by Apache Wiki <wi...@apache.org> on 2015/02/22 21:41:27 UTC, 0 replies.
- [Nutch Wiki] New attachment added to page RunNutchInEclipse - posted by Apache Wiki <wi...@apache.org> on 2015/02/22 21:49:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-840) Port tests from parse-html to parse-tika - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/22 22:00:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1923) Nutch + Cassandra Docker - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/22 22:41:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/22 22:51:11 UTC, 0 replies.
- [jira] [Created] (NUTCH-1947) Overhaul o.a.n.parse.OutlinkExtractor.java - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/22 23:03:11 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1946) Upgrade to Gora 0.6 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/22 23:25:11 UTC, 0 replies.
- How to read metadata/content of an URL in URLFilter? - posted by Renxia Wang <re...@usc.edu> on 2015/02/23 00:36:52 UTC, 18 replies.
- [Nutch Wiki] Update of "AdvancedAjaxInteraction" by ChrisMattmann - posted by Apache Wiki <wi...@apache.org> on 2015/02/23 01:35:24 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1946) Upgrade to Gora 0.6 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/23 04:16:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/23 04:53:11 UTC, 11 replies.
- Subscribe to the mailing list - posted by Chetan Vazirabadkar <va...@usc.edu> on 2015/02/23 06:31:28 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1928) Indexing filter of documents by the MIME type - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2015/02/23 14:55:11 UTC, 0 replies.
- [jira] [Created] (NUTCH-1948) Make the Selenium remote web driver specification, configuration and selection available via a Factory-type mechanism - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/23 19:29:13 UTC, 0 replies.
- Re: [MASSMAIL]Re: How to read metadata/content of an URL in URLFilter? - posted by Jorge Luis Betancourt González <jl...@uci.cu> on 2015/02/23 21:56:37 UTC, 1 replies.
- unsubscribe - posted by Gioele Zanzico <gi...@manfrotto.com> on 2015/02/23 22:06:56 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "Getting_Started" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/02/24 02:40:25 UTC, 0 replies.
- How to verify Nutch - Selenium - posted by nishant jani <ni...@gmail.com> on 2015/02/24 03:41:59 UTC, 0 replies.
- [jira] [Created] (NUTCH-1949) Dump out the Nuth data into the Common Crawl format - posted by "Giuseppe Totaro (JIRA)" <ji...@apache.org> on 2015/02/24 05:41:11 UTC, 0 replies.
- questions about the webui packages - posted by lujinhong <lu...@yahoo.com> on 2015/02/24 16:05:52 UTC, 2 replies.
- [Nutch Wiki] Update of "ContributorsGroup" by ChrisMattmann - posted by Apache Wiki <wi...@apache.org> on 2015/02/24 16:37:05 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1949) Dump out the Nuth data into the Common Crawl format - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/24 17:41:04 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1949) Dump out the Nuth data into the Common Crawl format - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/24 17:42:04 UTC, 2 replies.
- [jira] [Created] (NUTCH-1950) File name too long when bin/nutch dump - posted by "Chong Li (JIRA)" <ji...@apache.org> on 2015/02/24 20:07:05 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1946) Upgrade to Gora 0.6 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/24 20:28:04 UTC, 1 replies.
- [jira] [Created] (NUTCH-1951) Parse tool to accept a parent segment directory as well as individual segment directory - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/24 23:23:05 UTC, 0 replies.
- [jira] [Created] (NUTCH-1952) Add a timezone to the Nutch log4j.properties configuration - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/24 23:31:04 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "bin/nutch parse" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/02/24 23:49:20 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1870) Generic xsl parser plugin - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/02/25 22:56:04 UTC, 0 replies.
- tika to parse url data content - posted by Nancy Sharma <na...@gmail.com> on 2015/02/26 04:52:48 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1870) Generic xsl parser plugin - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/02/26 05:05:05 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1950) File name too long when bin/nutch dump - posted by "Chong Li (JIRA)" <ji...@apache.org> on 2015/02/26 07:33:06 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1950) File name too long when bin/nutch dump - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/02/26 07:35:05 UTC, 5 replies.
- [GitHub] nutch pull request: fix for NUTCH-1950 contributed by xzjh - posted by xzjh <gi...@git.apache.org> on 2015/02/26 08:04:08 UTC, 0 replies.
- [jira] [Issue Comment Deleted] (NUTCH-1950) File name too long when bin/nutch dump - posted by "Chong Li (JIRA)" <ji...@apache.org> on 2015/02/26 08:57:05 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1933) nutch-selenium plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/26 19:32:04 UTC, 0 replies.
- MetaData fornear duplicates - posted by Ami Akshay Parikh <am...@usc.edu> on 2015/02/26 19:53:52 UTC, 4 replies.
- Unsubscribe - posted by Massimo Miccoli <mm...@iltrovatore.it> on 2015/02/26 20:11:52 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-1946) Upgrade to Gora 0.6 - posted by "Henry Saputra (JIRA)" <ji...@apache.org> on 2015/02/26 22:57:06 UTC, 1 replies.
- unsuscribe - posted by Jiangang Sun <ji...@usc.edu> on 2015/02/27 00:30:06 UTC, 0 replies.
- [jira] [Created] (NUTCH-1953) Integrate µBlock into Nutch to block Ads - posted by "Trevor Claude Lewis (JIRA)" <ji...@apache.org> on 2015/02/28 03:22:04 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "Nutch_1.X_RESTAPI" by SujenShah - posted by Apache Wiki <wi...@apache.org> on 2015/02/28 09:12:54 UTC, 0 replies.
- [Nutch Wiki] Update of "AdvancedAjaxInteraction" by JayavanthShenoy - posted by Apache Wiki <wi...@apache.org> on 2015/02/28 11:02:37 UTC, 0 replies.