- [jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/01 09:47:18 UTC, 3 replies.
- [jira] [Created] (NUTCH-2029) Mark.checkMark returns empty string when null is expected with mongodb storage - posted by "Alexander Yastrebov (JIRA)" <ji...@apache.org> on 2015/06/01 17:46:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2029) Mark.checkMark returns empty string when null is expected with mongodb storage - posted by "Alexander Yastrebov (JIRA)" <ji...@apache.org> on 2015/06/01 17:47:17 UTC, 1 replies.
- [Nutch Wiki] Update of "GoogleSummerOfCode/SitemapCrawler/week1" by CihadGuzel - posted by Apache Wiki <wi...@apache.org> on 2015/06/01 21:29:07 UTC, 0 replies.
- [Nutch Wiki] Update of "GoogleSummerOfCode/SitemapCrawler/week2" by CihadGuzel - posted by Apache Wiki <wi...@apache.org> on 2015/06/01 21:32:59 UTC, 0 replies.
- [Nutch Wiki] Update of "GoogleSummerOfCode/SitemapCrawler" by CihadGuzel - posted by Apache Wiki <wi...@apache.org> on 2015/06/01 22:03:15 UTC, 4 replies.
- [jira] [Updated] (NUTCH-2028) java.lang.IllegalArgumentException: can't serialize class org.apache.avro.util.Utf8 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/02 00:38:20 UTC, 0 replies.
- [GitHub] nutch pull request: fix for NUTCH-2015 contributed by Sujen Shah - posted by asfgit <gi...@git.apache.org> on 2015/06/02 06:29:17 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/02 06:30:18 UTC, 0 replies.
- [jira] [Created] (NUTCH-2030) ParseZip plugin is not able to extract language from zip document,this could solve that problem. - posted by "Eyeris Rodriguez Rueda (JIRA)" <ji...@apache.org> on 2015/06/02 15:19:18 UTC, 0 replies.
- [jira] [Created] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service - posted by "Sujen Shah (JIRA)" <ji...@apache.org> on 2015/06/02 21:01:51 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1944) Add raw content to indexes - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/02 21:06:50 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1785) Ability to index raw content - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/02 21:06:50 UTC, 2 replies.
- [GitHub] nutch pull request: fix for NUTCH-2031 contributed by Sujen Shah - posted by sujen1412 <gi...@git.apache.org> on 2015/06/02 21:11:10 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/06/02 21:11:50 UTC, 7 replies.
- [jira] [Updated] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/03 04:18:50 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/03 04:18:50 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/03 04:18:50 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3148 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/06/03 06:06:36 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/03 06:06:50 UTC, 0 replies.
- [GitHub] nutch pull request: added missing class NutchServerInfo for fix fo... - posted by sujen1412 <gi...@git.apache.org> on 2015/06/03 06:20:55 UTC, 1 replies.
- Jenkins build is back to normal : Nutch-trunk #3149 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/06/03 07:02:30 UTC, 0 replies.
- [GitHub] nutch pull request: fix for NUTCH-2027 - posted by asitang <gi...@git.apache.org> on 2015/06/03 18:50:25 UTC, 4 replies.
- [jira] [Commented] (NUTCH-2027) seed list REST endpoint for Nutch 1.10 - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/06/03 18:50:39 UTC, 13 replies.
- [jira] [Assigned] (NUTCH-2027) seed list REST endpoint for Nutch 1.10 - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/03 19:33:47 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2027) seed list REST endpoint for Nutch 1.10 - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/03 19:33:47 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2027) seed list REST endpoint for Nutch 1.10 - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/03 19:33:50 UTC, 2 replies.
- [jira] [Created] (NUTCH-2032) Plugin to index the raw content of a readable document. - posted by "Luis Lopez (JIRA)" <ji...@apache.org> on 2015/06/03 21:01:39 UTC, 0 replies.
- [jira] [Created] (NUTCH-2033) parse-tika skips valid documents. - posted by "Luis Lopez (JIRA)" <ji...@apache.org> on 2015/06/03 21:14:38 UTC, 0 replies.
- [jira] [Created] (NUTCH-2034) CrawlDB filtered documents counter. - posted by "Luis Lopez (JIRA)" <ji...@apache.org> on 2015/06/03 21:21:37 UTC, 0 replies.
- [jira] [Created] (NUTCH-2035) Regex filter using case sensitive rules. - posted by "Luis Lopez (JIRA)" <ji...@apache.org> on 2015/06/03 21:26:38 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2035) Regex filter using case sensitive rules. - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/03 23:27:38 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2032) Plugin to index the raw content of a readable document. - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/03 23:38:38 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2034) CrawlDB filtered documents counter. - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/03 23:41:40 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-2034) CrawlDB filtered documents counter. - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/03 23:41:40 UTC, 0 replies.
- [jira] [Created] (NUTCH-2036) Adding some continuous crawl goodies to the crawl script - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2015/06/04 14:51:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2036) Adding some continuous crawl goodies to the crawl script - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2015/06/04 14:52:37 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2036) Adding some continuous crawl goodies to the crawl script - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/06/04 16:11:38 UTC, 5 replies.
- [jira] [Updated] (NUTCH-2035) Regex filter using case sensitive rules. - posted by "Luis Lopez (JIRA)" <ji...@apache.org> on 2015/06/04 18:47:38 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1459 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/06/06 06:00:27 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3152 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/06/06 06:00:27 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #1460 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/06/07 06:03:04 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #3153 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/06/07 06:07:54 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2027) seed list REST endpoint for Nutch 1.10 - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/07 18:32:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2037) Job endpoint to support Indexing from the REST API - posted by "Sujen Shah (JIRA)" <ji...@apache.org> on 2015/06/08 02:03:00 UTC, 0 replies.
- [GitHub] nutch pull request: fix for NUTCH-2037 contributed by Sujen Shah - posted by sujen1412 <gi...@git.apache.org> on 2015/06/08 02:10:43 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2037) Job endpoint to support Indexing from the REST API - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/06/08 02:11:00 UTC, 3 replies.
- Build failed in Jenkins: Nutch-trunk #3154 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/06/08 06:09:31 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2037) Job endpoint to support Indexing from the REST API - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/08 07:59:00 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2037) Job endpoint to support Indexing from the REST API - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/08 07:59:00 UTC, 0 replies.
- [Nutch Wiki] Update of "Nutch_1.X_RESTAPI/RunningJobsTutorial" by SujenShah - posted by Apache Wiki <wi...@apache.org> on 2015/06/08 19:30:27 UTC, 0 replies.
- [Nutch Wiki] Update of "Nutch_1.X_RESTAPI/RunningJobsTutorial/IndexJob" by SujenShah - posted by Apache Wiki <wi...@apache.org> on 2015/06/08 20:44:16 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1076) Solrindex has no documents following bin/nutch solrindex when using protocol-file - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/08 22:41:01 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2017) Remove debug log from MimeUtil - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/08 22:54:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2017) Remove debug log from MimeUtil - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/08 22:54:00 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #3155 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/06/08 23:50:43 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2017) Remove debug log from MimeUtil - posted by "Hudson (JIRA)" <ji...@apache.org> on 2015/06/08 23:51:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2033) parse-tika skips valid documents. - posted by "Luis Lopez (JIRA)" <ji...@apache.org> on 2015/06/09 02:09:00 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2037) Job endpoint to support Indexing from the REST API - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/09 08:03:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2038) url filter that uses a model (from a classifier) - posted by "Asitang Mishra (JIRA)" <ji...@apache.org> on 2015/06/10 19:38:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2038) url filter that uses a model (from a classifier) - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/10 23:04:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-2039) Relevance based scoring filter - posted by "Sujen Shah (JIRA)" <ji...@apache.org> on 2015/06/11 06:29:00 UTC, 0 replies.
- crawler-commons 0.6 released - posted by Julien Nioche <li...@gmail.com> on 2015/06/11 10:02:57 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2038) url filter that uses a model (from a classifier) - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/13 17:47:00 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2038) url filter that uses a model (from a classifier) - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/13 17:47:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2038) url filter that uses a model (from a classifier) - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/13 17:47:00 UTC, 3 replies.
- [Nutch Wiki] Trivial Update of "NutchTutorial" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/06/13 19:37:19 UTC, 1 replies.
- [Nutch Wiki] Trivial Update of "bin/crawl" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/06/13 20:26:29 UTC, 0 replies.
- [GitHub] nutch pull request: fix for NUTCH-2039 contributed by Sujen Shah - posted by sujen1412 <gi...@git.apache.org> on 2015/06/15 07:59:38 UTC, 5 replies.
- [jira] [Commented] (NUTCH-2039) Relevance based scoring filter - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/06/15 08:00:12 UTC, 24 replies.
- Generate separate fetchlist by host - posted by Sujen Shah <su...@gmail.com> on 2015/06/15 19:50:23 UTC, 0 replies.
- [jira] [Created] (NUTCH-2040) Upgrade to Crawler Commons 0.6 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/15 21:42:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2039) Relevance based scoring filter - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/16 19:42:01 UTC, 1 replies.
- [jira] [Updated] (NUTCH-2038) Naive Bayes classifier based url filter - posted by "Asitang Mishra (JIRA)" <ji...@apache.org> on 2015/06/17 02:17:01 UTC, 1 replies.
- GSoC Reporting and Progress - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/06/17 15:55:52 UTC, 0 replies.
- GSoC Progress and Reporting - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/06/17 16:04:38 UTC, 1 replies.
- [jira] [Created] (NUTCH-2041) indexer fails if linkdb is missing - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/17 16:34:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2041) indexer fails if linkdb is missing - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/17 16:35:01 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2000) Link inversion fails with .locked already exists. - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/17 16:40:00 UTC, 3 replies.
- [jira] [Updated] (NUTCH-2000) Link inversion fails with .locked already exists. - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/17 16:41:01 UTC, 1 replies.
- [Nutch Wiki] Update of "GoogleSummerOfCode/SitemapCrawler/weeklyreport" by CihadGuzel - posted by Apache Wiki <wi...@apache.org> on 2015/06/17 16:41:02 UTC, 6 replies.
- [jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter - posted by "Asitang Mishra (JIRA)" <ji...@apache.org> on 2015/06/17 18:51:00 UTC, 50 replies.
- [jira] [Comment Edited] (NUTCH-2038) Naive Bayes classifier based url filter - posted by "Asitang Mishra (JIRA)" <ji...@apache.org> on 2015/06/17 18:52:01 UTC, 2 replies.
- [jira] [Issue Comment Deleted] (NUTCH-2038) Naive Bayes classifier based url filter - posted by "Asitang Mishra (JIRA)" <ji...@apache.org> on 2015/06/17 23:00:01 UTC, 0 replies.
- (Unknown) - posted by Swati Kothari <sw...@usc.edu> on 2015/06/17 23:11:01 UTC, 0 replies.
- Unsubscribe - posted by Swati Kothari <sw...@usc.edu> on 2015/06/17 23:11:18 UTC, 5 replies.
- Re: [jira] [Created] (NUTCH-2000) Link inversion fails with .locked already exists. - posted by Sahil Shah <sa...@gmail.com> on 2015/06/18 03:01:38 UTC, 2 replies.
- [GitHub] nutch pull request: Nutch 2038 - posted by asitang <gi...@git.apache.org> on 2015/06/18 08:31:14 UTC, 12 replies.
- Nutch Committer Workflow - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/06/18 08:52:55 UTC, 1 replies.
- [GitHub] nutch pull request: NUTCH-2038 - posted by asitang <gi...@git.apache.org> on 2015/06/18 17:16:54 UTC, 41 replies.
- [jira] [Created] (NUTCH-2042) parse-html increase chunk size used to detect charset - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/18 17:17:00 UTC, 0 replies.
- Added git@git.apache.org to dev - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2015/06/18 17:24:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2042) parse-html increase chunk size used to detect charset - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/18 17:29:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2043) Interface and high level design for classification using models - posted by "Asitang Mishra (JIRA)" <ji...@apache.org> on 2015/06/18 20:03:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2043) Interface and high level design for classification using models - posted by "Asitang Mishra (JIRA)" <ji...@apache.org> on 2015/06/18 20:06:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2043) Interface and high level design for classification using models - posted by "Asitang Mishra (JIRA)" <ji...@apache.org> on 2015/06/18 20:09:03 UTC, 0 replies.
- [jira] [Created] (NUTCH-2044) Support for an expanded HttpHeaders list - posted by "Soren Scott (JIRA)" <ji...@apache.org> on 2015/06/18 23:51:02 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2039) Relevance based scoring filter - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/19 05:08:01 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2039) Relevance based scoring filter - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/19 05:08:01 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2039) Relevance based scoring filter - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/19 05:21:01 UTC, 0 replies.
- GSOC2015 - sitemap parser - posted by Cihad Guzel <cg...@gmail.com> on 2015/06/21 01:40:50 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3169 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/06/21 06:07:23 UTC, 0 replies.
- [jira] [Commented] (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed - posted by "Bojan Tomic (JIRA)" <ji...@apache.org> on 2015/06/21 12:48:01 UTC, 0 replies.
- [jira] [Created] (NUTCH-2045) index-basic incorrect assignment of next fetch time (page.getFetchTime()) as page fetch time - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/22 20:32:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2045) index-basic incorrect assignment of next fetch time (page.getFetchTime()) as page fetch time - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/22 20:36:00 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2045) index-basic incorrect assignment of next fetch time (page.getFetchTime()) as page fetch time - posted by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/06/23 01:19:01 UTC, 2 replies.
- [jira] [Closed] (NUTCH-1711) Normalizer does not encode exclamation mark - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/06/23 15:07:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1625) IndexerMapReduce skips FETCH_NOTMODIFIED - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/06/23 16:28:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1684) ParseMeta to be added before fetch schedulers are run - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/06/23 16:31:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1692) SegmentReader broken in distributed mode - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/06/23 16:45:02 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1730) Scoring-depth optionally not to increment depth for external hosts - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/06/23 17:00:02 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1838) Host and domain based regex and automaton filtering - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/06/23 17:19:00 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1980) Jexl expressions for CrawlDbReader - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/06/23 17:38:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1335) OutlinkDB to collect unique URL's only - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/06/23 18:10:01 UTC, 0 replies.
- [Nutch Wiki] Update of "GoogleSummerOfCode/SitemapCrawler/weeklyreport" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/06/23 18:49:50 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "GoogleSummerOfCode/SitemapCrawler/weeklyreport" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2015/06/23 18:55:11 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2045) index-basic incorrect assignment of next fetch time (page.getFetchTime()) as page fetch time - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/24 00:20:43 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2045) index-basic incorrect assignment of next fetch time (page.getFetchTime()) as page fetch time - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/24 00:33:42 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1504) Pluggable url partitioner - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/24 03:57:42 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1504) Pluggable url partitioner - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/24 03:58:42 UTC, 1 replies.
- [jira] [Created] (NUTCH-2046) The crawl script should be able to skip an initial injection. - posted by "Luis Lopez (JIRA)" <ji...@apache.org> on 2015/06/24 18:14:04 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2046) The crawl script should be able to skip an initial injection. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/24 19:08:04 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2046) The crawl script should be able to skip an initial injection. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/24 19:08:04 UTC, 2 replies.
- [jira] [Updated] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks) - posted by "Asitang Mishra (JIRA)" <ji...@apache.org> on 2015/06/24 19:43:05 UTC, 1 replies.
- Github Spam - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/06/24 21:46:59 UTC, 5 replies.
- [jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks) - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 00:20:04 UTC, 28 replies.
- [jira] [Commented] (NUTCH-1692) SegmentReader broken in distributed mode - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 00:42:04 UTC, 0 replies.
- [jira] [Created] (NUTCH-2047) Improvements to the relevance scoring plugin - posted by "Sujen Shah (JIRA)" <ji...@apache.org> on 2015/06/25 00:59:04 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2047) Improvements to the relevance scoring plugin - posted by "Sujen Shah (JIRA)" <ji...@apache.org> on 2015/06/25 01:02:04 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1625) IndexerMapReduce skips FETCH_NOTMODIFIED - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 01:05:04 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-2047) Improvements to the relevance scoring plugin - posted by "Sujen Shah (JIRA)" <ji...@apache.org> on 2015/06/25 01:10:04 UTC, 0 replies.
- [IMPORTANT] Migration Towards HAdoop 2.X --> 3.X - posted by Lewis John Mcgibbney <le...@gmail.com> on 2015/06/25 01:19:22 UTC, 1 replies.
- [jira] [Comment Edited] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks) - posted by "Asitang Mishra (JIRA)" <ji...@apache.org> on 2015/06/25 02:20:05 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2016) Remove OldFetcher from trunk - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/06/25 10:58:04 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2041) indexer fails if linkdb is missing - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/06/25 10:58:04 UTC, 1 replies.
- [jira] [Comment Edited] (NUTCH-2036) Adding some continuous crawl goodies to the crawl script - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 14:00:06 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2036) Adding some continuous crawl goodies to the crawl script - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2015/06/25 15:57:06 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-1416) IndexerMapReduce can index older version of a document instead of latest one - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 17:12:05 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1517) CloudSearch indexer - posted by "Ji Kwon Lim (JIRA)" <ji...@apache.org> on 2015/06/25 20:03:04 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2000) Link inversion fails with .locked already exists. - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 20:43:05 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2016) Remove unused class OldFetcher - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 20:50:05 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2016) Remove unused class OldFetcher - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 20:51:05 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2000) Link inversion fails with .locked already exists. - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 20:52:04 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2041) indexer fails if linkdb is missing - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 21:09:04 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1335) OutlinkDB to collect unique URL's only - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 21:18:06 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-2041) indexer fails if linkdb is missing - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 21:21:04 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1684) ParseMeta to be added before fetch schedulers are run - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 21:32:04 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1730) Scoring-depth optionally not to increment depth for external hosts - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/25 22:11:04 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2016) Remove unused class OldFetcher - posted by "Hudson (JIRA)" <ji...@apache.org> on 2015/06/25 22:17:05 UTC, 1 replies.
- GSOC midterm report - posted by Cihad Guzel <cg...@gmail.com> on 2015/06/25 23:59:58 UTC, 1 replies.
- [Nutch Wiki] Trivial Update of "GoogleSummerOfCode/SitemapCrawler" by CihadGuzel - posted by Apache Wiki <wi...@apache.org> on 2015/06/26 10:45:23 UTC, 0 replies.
- [Nutch Wiki] Update of "GoogleSummerOfCode/SitemapCrawler/midtermreport" by CihadGuzel - posted by Apache Wiki <wi...@apache.org> on 2015/06/26 11:07:27 UTC, 0 replies.
- [jira] [Created] (NUTCH-2048) parset-tika: fix dependencies in plugin.xml - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/26 12:16:04 UTC, 0 replies.
- [jira] [Created] (NUTCH-2049) Upgrade Trunk to Hadoop > 2.6 stable - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/26 21:33:04 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2049) Upgrade Trunk to Hadoop > 2.6 stable - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/26 21:35:04 UTC, 0 replies.
- [jira] [Created] (NUTCH-2050) Upgrade HBase and Hadoop versioning on 2.X Docker - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/06/27 00:23:04 UTC, 0 replies.
- [jira] [Created] (NUTCH-2051) Add 'publish-local-m2' ant target for creating local maven artifacts - posted by "Matt DeBoer (JIRA)" <ji...@apache.org> on 2015/06/27 09:23:04 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2051) Add 'publish-local-m2' ant target for creating local maven artifacts - posted by "Matt DeBoer (JIRA)" <ji...@apache.org> on 2015/06/27 09:25:04 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks) - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/29 07:16:05 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3181 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/06/29 07:49:47 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks) - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/06/29 09:43:04 UTC, 0 replies.
- [jira] [Created] (NUTCH-2052) Enhance index-static to allow configurable delimiters - posted by "Peter Ciuffetti (JIRA)" <ji...@apache.org> on 2015/06/29 19:17:04 UTC, 0 replies.
- [Nutch Wiki] Update of "NutchScoring" by SujenShah - posted by Apache Wiki <wi...@apache.org> on 2015/06/29 23:01:37 UTC, 0 replies.
- [Nutch Wiki] Update of "SimilarityScoringFilter" by SujenShah - posted by Apache Wiki <wi...@apache.org> on 2015/06/29 23:24:42 UTC, 1 replies.
- [GitHub] nutch pull request: Nutch 2052 - Enhancement to index-static to al... - posted by PeterCiuffetti <gi...@git.apache.org> on 2015/06/30 00:41:42 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2052) Enhance index-static to allow configurable delimiters - posted by "Peter Ciuffetti (JIRA)" <ji...@apache.org> on 2015/06/30 00:48:04 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #3182 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2015/06/30 06:41:36 UTC, 0 replies.
- [jira] [Work started] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks) - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/06/30 07:01:05 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1464) index-static plugin doesn't allow the colon within the field value - posted by "Peter Ciuffetti (JIRA)" <ji...@apache.org> on 2015/06/30 13:33:05 UTC, 0 replies.
- Class not found exception when adding external jars in a plugin!!! - posted by Asitang Mishra <as...@usc.edu> on 2015/06/30 20:56:11 UTC, 0 replies.