- [jira] [Commented] (NUTCH-1778) Generator not logging number of URLs in batch correctly - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/03 22:51:13 UTC, 3 replies.
- Build failed in Jenkins: Nutch-trunk #2883 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/12/04 05:00:28 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1251 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/12/04 05:00:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1778) Generator not logging number of URLs in batch correctly - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/04 23:18:13 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1877) Suffix URL filter to ignore query string by default - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/04 23:32:12 UTC, 1 replies.
- Jenkins build is back to normal : Nutch-nutchgora #1252 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/12/05 05:03:50 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2884 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/12/05 05:07:08 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1877) Suffix URL filter to ignore query string by default - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/05 20:56:12 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1778) Generator not logging number of URLs in batch correctly - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/05 21:38:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1877) Suffix URL filter to ignore query string by default - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/12/05 21:42:12 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1829) Generator : unable to distinguish real errors - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/05 21:44:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/05 22:18:13 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1829) Generator : unable to distinguish real errors - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/05 22:19:12 UTC, 1 replies.
- [jira] [Assigned] (NUTCH-1779) Apply formatting to the code - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/05 22:20:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1823) Upgrade to elasticsearch 1.2/1.3 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/05 22:21:14 UTC, 1 replies.
- [jira] [Assigned] (NUTCH-1823) Upgrade to elasticsearch 1.2/1.3 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/06 00:47:13 UTC, 0 replies.
- [jira] [Created] (NUTCH-1893) Parse-tika plugin seems broken when parsing some feed file - posted by "Mengying Wang (JIRA)" <ji...@apache.org> on 2014/12/06 08:32:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1893) Parse-tika plugin seems broken when parsing some feed file - posted by "Mengying Wang (JIRA)" <ji...@apache.org> on 2014/12/06 12:34:19 UTC, 1 replies.
- Re: [nsf-polar-usc-students] Parse-tika plugin with tika (1.7-SNAPSHOT) can't retrieve any parser - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2014/12/06 18:11:22 UTC, 0 replies.
- [jira] [Created] (NUTCH-1894) Revert "Normalize duplicate slashes in URL's" - posted by "Jigal van Hemert (JIRA)" <ji...@apache.org> on 2014/12/08 11:30:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1894) Revert "Normalize duplicate slashes in URL's" - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/12/08 16:19:12 UTC, 1 replies.
- [jira] [Assigned] (NUTCH-1778) Generator not logging number of URLs in batch correctly - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/08 20:48:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1778) Generator not logging number of URLs in batch correctly - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/08 20:49:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1829) Generator : unable to distinguish real errors - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/08 20:51:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1893) Parse-tika plugin seems broken when parsing some feed file - posted by "Mengying Wang (JIRA)" <ji...@apache.org> on 2014/12/09 16:08:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1592) XPath works on documents parsed with parse-html but not parse-tika - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/12/09 16:26:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1823) Upgrade to elasticsearch 1.4.1 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/09 19:34:13 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1823) Upgrade to elasticsearch 1.2/1.3 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/09 19:34:13 UTC, 0 replies.
- Re: Crawling a site and saving the page html exactly as is in a database - posted by Xavier Morera <xa...@familiamorera.com> on 2014/12/09 22:35:07 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1823) Upgrade to elasticsearch 1.4.1 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/09 23:18:12 UTC, 9 replies.
- [jira] [Created] (NUTCH-1895) run() method in Crawler.java doesnt put Nutch.ARG_BATCH in argMap - posted by "FeiTian (JIRA)" <ji...@apache.org> on 2014/12/10 03:33:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1895) run() method in Crawler.java doesnt put Nutch.ARG_BATCH in argMap - posted by "FeiTian (JIRA)" <ji...@apache.org> on 2014/12/10 03:35:12 UTC, 3 replies.
- [jira] [Comment Edited] (NUTCH-1823) Upgrade to elasticsearch 1.4.1 - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/12/10 10:26:13 UTC, 0 replies.
- HttpPostAuthentication - posted by Tizy Ninan <ti...@gmail.com> on 2014/12/10 11:02:18 UTC, 0 replies.
- Re: Already subscribed to dev@nutch.apache.org - posted by Tizy Ninan <ti...@gmail.com> on 2014/12/10 12:04:42 UTC, 0 replies.
- Not able to crawl a website using Nutch - posted by "Thalatam, Venkata naveen" <ve...@bankofamerica.com> on 2014/12/10 12:54:41 UTC, 2 replies.
- [jira] [Created] (NUTCH-1896) SolrDeleteDuplicates does not use the mapped Solr field names from solrindex-mapping.xml - posted by "Brian (JIRA)" <ji...@apache.org> on 2014/12/10 17:28:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1896) SolrDeleteDuplicates does not use the mapped Solr field names from solrindex-mapping.xml - posted by "Brian (JIRA)" <ji...@apache.org> on 2014/12/10 17:30:15 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1592) TikaParser can uppercase the element names while generating the DOM - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/12/11 12:40:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1592) TikaParser can uppercase the element names while generating the DOM - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/12/11 12:41:13 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-1592) TikaParser can uppercase the element names while generating the DOM - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/12/11 12:41:13 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1592) TikaParser can uppercase the element names while generating the DOM - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/12/11 12:43:13 UTC, 0 replies.
- help regarding nutch headings plugin - posted by Krishna Chaitanya <kk...@gmail.com> on 2014/12/11 15:04:41 UTC, 0 replies.
- [jira] [Created] (NUTCH-1897) Easier debugging of plugin XML errors - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/12/11 15:32:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1897) Easier debugging of plugin XML errors - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/12/11 15:33:13 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1897) Easier debugging of plugin XML errors - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/12/11 16:23:13 UTC, 4 replies.
- [jira] [Updated] (NUTCH-1898) Add -dumpRawText prameter to parsechecker tool - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/11 19:40:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1898) Add -dumpRawHTML prameter to parsechecker tool - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/11 19:40:13 UTC, 1 replies.
- [jira] [Created] (NUTCH-1898) Add -dumpRawText prameter to parsechecker tool - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/11 19:40:13 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1895) run() method in Crawler.java doesnt put Nutch.ARG_BATCH in argMap - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/11 21:50:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1898) Add -dumpRawHTML prameter to parsechecker tool - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/11 22:40:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1856) Document webpage.avsc and host.avsc - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/12 01:11:13 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1856) Document webpage.avsc and host.avsc - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/12 01:11:13 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1823) Upgrade to elasticsearch 1.4.1 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/12/12 01:21:14 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1897) Easier debugging of plugin XML errors - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/12/12 11:17:13 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1897) Easier debugging of plugin XML errors - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/12/12 11:30:13 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1263 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/12/12 11:39:53 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1264 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/12/13 05:01:45 UTC, 3 replies.
- Jenkins build is back to normal : Nutch-nutchgora #1265 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/12/14 05:03:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1797) remove unused package o.a.n.html - posted by "Saurabh Chhajed (JIRA)" <ji...@apache.org> on 2014/12/15 07:40:13 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1888) Specify HTMLMapper to use in TikaParser - posted by "Halil Simsek (JIRA)" <ji...@apache.org> on 2014/12/15 15:44:13 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1888) Specify HTMLMapper to use in TikaParser - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/12/15 17:56:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1888) Specify HTMLMapper to use in TikaParser - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/12/15 18:42:13 UTC, 0 replies.
- [jira] [Created] (NUTCH-1899) upgrade restlet lib to prevent build failure - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/15 22:10:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1899) upgrade restlet lib to prevent build failure - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/15 22:17:14 UTC, 2 replies.
- [jira] [Assigned] (NUTCH-1899) upgrade restlet lib to prevent build failure - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/12/16 09:17:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1899) upgrade restlet lib to prevent build failure - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/12/16 09:25:13 UTC, 0 replies.
- [jira] [Created] (NUTCH-1900) DockerFile for Nutch 2.x - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/12/16 09:29:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1900) DockerFile for Nutch 2.x - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/12/16 09:30:13 UTC, 1 replies.
- [jira] [Comment Edited] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/12/16 13:09:13 UTC, 0 replies.
- [jira] [Created] (NUTCH-1901) ability to tag Urls and index those tags using Solr - posted by "Krishna Chaitanya (JIRA)" <ji...@apache.org> on 2014/12/16 13:24:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1901) ability to tag Urls and index those tags using Solr - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/12/16 14:10:13 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1901) ability to tag Urls and index those tags using Solr - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/12/16 14:11:15 UTC, 0 replies.
- nutch 2.2.1 inject error on Windows - posted by Hesham Hussein <v-...@outlook.com> on 2014/12/16 19:57:29 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1797) remove unused package o.a.n.html - posted by "Saurabh Chhajed (JIRA)" <ji...@apache.org> on 2014/12/16 20:58:13 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1797) remove unused package o.a.n.html - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/16 21:59:13 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1797) remove unused package o.a.n.html - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/16 22:00:14 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1797) remove unused package o.a.n.html - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/16 22:01:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1863) Add JSON format dump output to readdb command - posted by "Tamer Yousef (JIRA)" <ji...@apache.org> on 2014/12/17 19:56:14 UTC, 0 replies.
- [jira] [Created] (NUTCH-1902) Missing nekohtml.jar - posted by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2014/12/18 11:06:13 UTC, 0 replies.
- resolve-default failed with branch 2.x on svn - posted by Đạt Cao Mạnh <ca...@gmail.com> on 2014/12/18 11:24:10 UTC, 0 replies.
- [jira] [Created] (NUTCH-1903) Resolve-default failed with branch 2.x - posted by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2014/12/19 05:20:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1903) Resolve-default failed with branch 2.x - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/19 22:26:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1902) Missing nekohtml.jar - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/19 22:35:14 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1831) compiling against gora-0.5 fails - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/19 22:46:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1810) Duplicate jdom dependency - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/19 23:42:13 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-1638) SolrWriter Bad String comparision - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/20 00:04:14 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1895) run() method in Crawler.java doesnt put Nutch.ARG_BATCH in argMap - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/20 00:06:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1755) Project name bug in build.xml - posted by "Juan GF (JIRA)" <ji...@apache.org> on 2014/12/21 23:24:13 UTC, 3 replies.
- [jira] [Comment Edited] (NUTCH-1755) Project name bug in build.xml - posted by "Juan GF (JIRA)" <ji...@apache.org> on 2014/12/21 23:26:13 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2905 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/12/22 05:07:51 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "Nutch2Tutorial" by SebastianNagel - posted by Apache Wiki <wi...@apache.org> on 2014/12/22 23:50:32 UTC, 0 replies.
- nutch 2.2.1 inject error on Windows - posted by Hesham Hussein <v-...@outlook.com> on 2014/12/23 02:42:53 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2906 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/12/23 05:07:01 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1834) GeneratorMapper behavior depends on log level - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/23 16:50:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1834) GeneratorMapper behavior depends on log level - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/23 16:50:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1834) GeneratorMapper behavior depends on log level - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/23 16:51:13 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1834) GeneratorMapper behavior depends on log level - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/23 17:04:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1842) crawl.gen.delay has a wrong default value in nutch-default.xml or is being parsed incorrectly - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/12/23 23:00:14 UTC, 0 replies.
- elasticindex error job failed: name=elastic-index - posted by Hesham Hussein <v-...@outlook.com> on 2014/12/26 16:30:43 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1903) Resolve-default failed with branch 2.x - posted by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2014/12/27 02:37:13 UTC, 0 replies.
- Re: elasticindex error job failed: name=elastic-index - posted by Talat Uyarer <ta...@uyarer.com> on 2014/12/27 10:38:45 UTC, 0 replies.
- Re: elasticindex error job failed: name=elastic-index - posted by Talat Uyarer <ta...@uyarer.com> on 2014/12/29 20:52:44 UTC, 1 replies.