- Build failed in Jenkins: Nutch-trunk #2063 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/01 05:16:56 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #450 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/01 05:18:20 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series - posted by "J. Gobel (JIRA)" <ji...@apache.org> on 2013/01/01 13:08:12 UTC, 1 replies.
- [jira] [Comment Edited] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series - posted by "J. Gobel (JIRA)" <ji...@apache.org> on 2013/01/01 13:08:13 UTC, 0 replies.
- [jira] [Created] (NUTCH-1511) Metadata in MYSQL updated with 'garbage' - posted by "J. Gobel (JIRA)" <ji...@apache.org> on 2013/01/01 14:56:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1511) Metadata in MYSQL updated with 'garbage' - posted by "J. Gobel (JIRA)" <ji...@apache.org> on 2013/01/01 15:00:12 UTC, 6 replies.
- [jira] [Commented] (NUTCH-1511) Metadata in MYSQL updated with 'garbage' - posted by "kiran (JIRA)" <ji...@apache.org> on 2013/01/01 19:00:16 UTC, 4 replies.
- [jira] [Closed] (NUTCH-1511) Metadata in MYSQL updated with 'garbage' - posted by "J. Gobel (JIRA)" <ji...@apache.org> on 2013/01/01 23:36:12 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2064 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/02 05:14:40 UTC, 3 replies.
- Build failed in Jenkins: Nutch-nutchgora #451 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/02 05:15:51 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-1511) Metadata in MYSQL updated with 'garbage' - posted by "J. Gobel (JIRA)" <ji...@apache.org> on 2013/01/02 16:40:12 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1511) Metadata in MYSQL updated with 'garbage' - posted by "J. Gobel (JIRA)" <ji...@apache.org> on 2013/01/02 16:44:13 UTC, 0 replies.
- [jira] [Issue Comment Deleted] (NUTCH-1511) Metadata in MYSQL updated with 'garbage' - posted by "J. Gobel (JIRA)" <ji...@apache.org> on 2013/01/02 16:46:12 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2065 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/03 05:16:33 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #452 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/03 05:16:43 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1494) RSS feed plugin seems broken - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/03 09:20:14 UTC, 8 replies.
- [jira] [Commented] (NUTCH-1053) Parsing of RSS feeds fails - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/03 09:32:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1274) Fix [cast] javac warnings - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/03 12:12:12 UTC, 3 replies.
- [Nutch Wiki] Trivial Update of "NutchAdministrationUserInterface" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2013/01/03 14:19:31 UTC, 1 replies.
- Build failed in Jenkins: Nutch-trunk #2066 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/03 14:31:45 UTC, 0 replies.
- Re: Nutch Admin Interface (looking for work) - posted by Lewis John Mcgibbney <le...@gmail.com> on 2013/01/03 14:32:04 UTC, 0 replies.
- Re: Problems with activation.jar - posted by Lewis John Mcgibbney <le...@gmail.com> on 2013/01/03 14:54:54 UTC, 0 replies.
- [jira] [Created] (NUTCH-1512) SegmentMerger to normalize - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/03 15:42:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1512) SegmentMerger to normalize - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/03 15:46:12 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2067 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/04 05:10:32 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #453 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/04 05:13:51 UTC, 0 replies.
- [jira] [Created] (NUTCH-1513) Support Robots.txt for Ftp urls - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/04 09:56:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1513) Support Robots.txt for Ftp urls - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/04 10:04:13 UTC, 3 replies.
- Build failed in Jenkins: Nutch-nutchgora #454 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/05 05:16:07 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2068 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/05 05:16:37 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2069 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/06 05:12:44 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #455 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/06 05:14:02 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1514) Phase out the deprecated configuration properties (if possible) - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/06 16:08:12 UTC, 1 replies.
- [jira] [Created] (NUTCH-1514) Phase out the deprecated configuration properties (if possible) - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/06 16:08:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1514) Phase out the deprecated configuration properties (if possible) - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/01/06 21:26:12 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1513) Support Robots.txt for Ftp urls - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/06 23:10:12 UTC, 1 replies.
- Build failed in Jenkins: Nutch-trunk #2070 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/07 05:16:40 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #456 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/07 05:21:21 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/07 09:02:13 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1508) Port limit crawler to defined depth to 2.x - posted by "Ferdy Galema (JIRA)" <ji...@apache.org> on 2013/01/07 11:16:13 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-1508) Port limit crawler to defined depth to 2.x - posted by "Ferdy Galema (JIRA)" <ji...@apache.org> on 2013/01/07 11:16:16 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/01/07 16:24:12 UTC, 11 replies.
- Build failed in Jenkins: Nutch-trunk #2071 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/08 00:11:46 UTC, 0 replies.
- Failing Nightly Builds - posted by Lewis John Mcgibbney <le...@gmail.com> on 2013/01/08 00:24:11 UTC, 4 replies.
- [jira] [Commented] (NUTCH-1245) URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 00:58:13 UTC, 1 replies.
- [jira] [Commented] (NUTCH-978) A Plugin for extracting certain element of a web page on html page parsing. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 01:40:13 UTC, 4 replies.
- [jira] [Updated] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 01:44:12 UTC, 4 replies.
- [jira] [Updated] (NUTCH-1507) Remove FetcherOutput - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 01:46:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1509) Implement read/write in NutchField - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 01:46:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1506) Add UPDATE action to NutchIndexAction - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 01:48:13 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1505) java.lang.IllegalArgumentException during updatedb - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 01:50:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1505) java.lang.IllegalArgumentException during updatedb - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 01:50:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-710) Support for rel="canonical" attribute - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 01:52:13 UTC, 0 replies.
- [PROPOSAL] Base Testing class for Scoring plugins - posted by Lewis John Mcgibbney <le...@gmail.com> on 2013/01/08 04:02:03 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1494) RSS feed plugin seems broken - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/08 04:16:13 UTC, 2 replies.
- [jira] [Commented] (NUTCH-840) Port tests from parse-html to parse-tika - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 04:28:13 UTC, 5 replies.
- [jira] [Updated] (NUTCH-840) Port tests from parse-html to parse-tika - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 04:28:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1119) JUnit test for index-static - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 04:34:13 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-1119) JUnit test for index-static - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 04:36:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1224) Migrate FreeGenerator to MapReduce API - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 04:38:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1127) JUnit test for urlfilter-validator - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 04:56:12 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-1127) JUnit test for urlfilter-validator - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/08 04:58:12 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #457 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/08 05:15:57 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2072 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/08 05:24:08 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1506) Add UPDATE action to NutchIndexAction - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/08 12:38:16 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1507) Remove FetcherOutput - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/08 12:40:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1509) Implement read/write in NutchField - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/08 12:40:13 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2073 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/09 05:21:01 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #458 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/09 05:23:42 UTC, 0 replies.
- Fwd: UnknownHostException after upgrade from 1.0.3 > 1.1.1 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2013/01/09 23:21:56 UTC, 1 replies.
- Build failed in Jenkins: Nutch-trunk #2074 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 00:14:42 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1494) RSS feed plugin seems broken - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/10 00:32:14 UTC, 0 replies.
- [jira] [Created] (NUTCH-1515) RSS plugin broken and won't compile - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/10 00:34:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1513) Support Robots.txt for Ftp urls - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/10 00:42:13 UTC, 1 replies.
- Build failed in Jenkins: Nutch-trunk #2075 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 01:47:32 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1308) Unnecessary truncate content configuration, and logging in parse-zip/ZipParser - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/10 03:54:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1308) Unnecessary truncate content configuration, and logging in parse-zip/ZipParser - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/10 03:56:12 UTC, 0 replies.
- Build failed in Jenkins: Nutch-2.x-Windows #1 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 04:09:34 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk-Windows #1 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 04:10:44 UTC, 0 replies.
- Jenkins Builds - posted by Lewis John Mcgibbney <le...@gmail.com> on 2013/01/10 04:15:40 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1308) Unnecessary truncate content configuration, and logging in parse-zip/ZipParser - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/10 04:22:12 UTC, 0 replies.
- Build failed in Jenkins: nutch-2.x-maven #1 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 04:25:51 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2076 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 05:17:31 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #459 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 05:23:46 UTC, 0 replies.
- Build failed in Jenkins: Nutch-2.x-Windows #2 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 05:30:42 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk-Windows #2 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 05:34:37 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-978) A Plugin for extracting certain element of a web page on html page parsing. - posted by "Emmanuel Colin (JIRA)" <ji...@apache.org> on 2013/01/10 11:12:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1047) Pluggable indexing backends - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/01/10 17:46:13 UTC, 4 replies.
- Build failed in Jenkins: Nutch-trunk #2077 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 21:12:04 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2078 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 21:31:09 UTC, 0 replies.
- [jira] [Created] (NUTCH-1516) Nutch 2.x pom.xml out of sync with ivy.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/10 22:02:13 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1516) Nutch 2.x pom.xml out of sync with ivy.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/10 22:12:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1516) Nutch 2.x pom.xml out of sync with ivy.xml - posted by "Hudson (JIRA)" <ji...@apache.org> on 2013/01/10 22:22:15 UTC, 2 replies.
- Build failed in Jenkins: Nutch-2.x-Windows #3 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 23:14:17 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #460 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/10 23:14:26 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "CommandLineOptions" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2013/01/11 02:42:20 UTC, 4 replies.
- [Nutch Wiki] Trivial Update of "bin/nutch_hostinject" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2013/01/11 02:59:46 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2079 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/11 05:18:32 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #461 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/11 05:21:08 UTC, 0 replies.
- Build failed in Jenkins: Nutch-2.x-Windows #4 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/11 05:29:12 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk-Windows #3 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/11 05:38:05 UTC, 0 replies.
- Which is the main branch of nutch 2 ? - posted by vetus <ve...@isac.cat> on 2013/01/11 09:29:56 UTC, 2 replies.
- [jira] [Created] (NUTCH-1517) CloudSearch indexer - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/01/11 18:42:12 UTC, 0 replies.
- Nightly Builds Nearly fixed - posted by Lewis John Mcgibbney <le...@gmail.com> on 2013/01/11 19:48:21 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1274) Fix [cast] javac warnings - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 05:00:12 UTC, 7 replies.
- Jenkins build is back to normal : Nutch-trunk #2080 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/12 05:21:56 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1419) parsechecker and indexchecker to report protocol status - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 05:44:13 UTC, 2 replies.
- [jira] [Assigned] (NUTCH-1274) Fix [cast] javac warnings - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 06:20:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default. - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/12 09:00:15 UTC, 7 replies.
- [jira] [Resolved] (NUTCH-1274) Fix [cast] javac warnings - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 17:38:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1042) Fetcher.max.crawl.delay property not taken into account correctly when set to -1 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 17:48:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-802) Problems managing outlinks with large url length - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 18:38:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1472) InvalidRequestException(why:(String didn't validate.) [webpage][f][ts] failed validation) - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 18:46:12 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1436) bin/nutch absent in zip package - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 18:48:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1495) -normalize and -filter for updatedb command in nutch 2.x - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 18:50:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1190) MoreIndexingFilter refactor: move data formats used to parse "lastModified" to a config file. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:00:12 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1015) MoreIndexingFilter: can't parse erroneous date: 2006-05-24T20:03:42 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:26:13 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #463 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/12 19:31:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:36:12 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1461) Problem with TableUtil - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:38:13 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2013/01/12 19:39:38 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-1094) create comprehensive documentation for Nutchgora branch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:42:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1447) Nutch 2.x with Cloudera CDH 4 get Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:44:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1473) Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:44:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1418) error parsing robots rules- can't decode path: /wiki/Wikipedia%3Mediation_Committee/ - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:46:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1458) Support for raw HTML field added to Solr - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:46:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1452) hadoop.job.history.user.location in nutch-default making job history useless - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:48:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1457) Nutch2 Refactor the update process so that fetched items are only processed once - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:48:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-806) Merge CrawlDBScanner with CrawlDBReader - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:52:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1410) impact of a map-reduce problem - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:52:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1502) Test for CrawlDatum state transitions - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:54:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1481) When using MySQL as storage unicode characters within URLS cause nutch to fail - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:54:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1490) Data Truncation exceptions when using mysql - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:54:12 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1487) Nutch parse fails first time for PDF files and works on reparse - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:56:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1297) it is better for fetchItemQueues to select items from greater queues first - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:56:13 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1267) urlmeta to delegate indexing to index-metadata - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:56:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1286) Refactoring/reimplementing crawling API (NutchApp) - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:56:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1268) parse-meta to delegate indexing to index-metadata - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:56:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1282) linkdb scalability - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:58:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1303) Fetcher to skip queues for URLS getting repeated exceptions, based on percentage - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:58:12 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1269) Generate main problems - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:58:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1270) some of Deflate encoded pages not fetched - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:58:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1281) tika parser not work properly with unwanted file types that passed from filters in nutch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:00:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1278) Fetch Improvement in threads per host - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:04:13 UTC, 1 replies.
- [jira] [Updated] (NUTCH-926) Nutch follows wrong url in - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:04:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-881) Good quality documentation for Nutch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:06:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1253) Incompatible neko and xerces versions - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:06:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1257) Support for the x-robots-tag HTTP Header - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:08:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:08:14 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1080) Type safe members , arguments for better readability - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:10:12 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1076) Solrindex has no documents following bin/nutch solrindex when using protocol-file - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:10:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1371) Replace Ivy with Maven Ant tasks - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:14:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1387) All parsers should respond to cancellation / interrupts. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:16:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1382) Adding support for EmbeddedSolrServer to SolrIndexer - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:16:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1375) extract main content of a html file - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:18:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1334) NPE in FetcherOutputFormat - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:18:13 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1329) parser not extract outlinks to external web sites - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:18:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1321) IDNNormalizer - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:20:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1315) reduce speculation on but ParseOutputFormat doesn't name output files correctly? - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:20:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1309) fetch queue management - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:20:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-933) Fetcher does not save a pages Last-Modified value in CrawlDatum - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:22:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-929) Create a REST-based admin UI for Nutch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:22:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-969) FTP erro encoding - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:22:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-968) Crawling - File Error 404 when fetching file with an chinese word in the file name - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:22:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-891) Nutch build should not depend on unversioned local deps - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:22:13 UTC, 1 replies.
- [jira] [Updated] (NUTCH-952) fix outlink which started with '?' in html parser - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:24:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-649) Log list of files found but not crawled. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:24:13 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-960) Language ID - confidence factor - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:26:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-945) Indexing to multiple SOLR Servers - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:28:12 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-734) option to filter "a" tag text - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:30:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-745) MyHtmlParser getParse return not null,so all Analyzer-(zh|fr) cannot run - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:30:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-685) Content-level redirect status lost in ParseSegment - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:32:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-583) FeedParser empty links for items - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:34:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-356) Plugin repository cache can lead to memory leak - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:38:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-366) Merge URLFilters and URLNormalizers - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:38:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-475) Adaptive crawl delay - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:38:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-314) Multiple language identifier instances - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:40:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-207) Bandwidth target for fetcher rather than a thread count - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:40:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1508) Port limit crawler to defined depth to 2.x - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:40:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-802) Problems managing outlinks with large url length - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:42:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-795) Add ability to maintain nofollow attribute in linkdb - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:42:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:42:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-789) Improvements to Tika parser - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:44:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1505) java.lang.IllegalArgumentException during updatedb - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:44:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-804) CrawlDatum.statNames can be modified - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:44:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-813) Repetitive crawl 403 status page - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:46:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:46:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1464) index-static plugin doesn't allow the colon within the field value - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:46:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:48:12 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1485) TableUtil reverseURL to keep userinfo part - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:48:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1182) fetcher should track and shut down hung threads - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:48:13 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1018) Solr Document Size Limit - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:50:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1007) Add readdb -host output - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:52:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/01/12 20:54:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1345) JAVA_HOME should not be required - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:56:12 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-1316) create EmbeddedNutchInstance testing utility class - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:00:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1313) Nutch trunk add response headers to datastore for the protocol-httpclient plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:00:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-966) Behavior of NOINDEX,FOLLOW is not intuitive - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:02:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-813) Repetitive crawl 403 status page - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/01/12 21:02:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-911) recrawls file protocol causes Errors/Exceptions when actually not modified or gone - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:02:13 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-910) Cached.jsp has a bug with encoding - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:04:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-923) Multilingual support for Solr-index-mapping - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:04:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-625) Non-ascii character broken in dumped content for mixed encoding (utf-8 and multi-byte) - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:06:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-829) duplicate hadoop temp files - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:06:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-670) feed plugin does not parse RSS2 enclosures - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:08:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-609) Allow Plugins to be Loaded from Jar File(s) - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:08:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-664) Possibility to update already stored documents. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:10:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-750) HtmlParser plugin - page title extraction - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:10:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-718) urlfilter-subnets plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:10:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-737) urlnormalizer-unalias plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:12:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-690) bug in DomContentUtils.shouldThrowAwayLink? - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:12:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-589) Hierarchical Classloaders - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:14:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-566) Sun's URL class has bug in creation of relative query URLs - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:14:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-569) Protocol plugins should report progress to the fetcher - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:14:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:16:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-431) Move plugin specific properties out of nutch-site.xml and into specific conf files for plugins - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:16:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-410) Faster RegexNormalize with more features - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:18:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-409) Add "short circuit" notion to filters to speedup mixed site/subsite crawling - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:18:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-449) Format of junit output should be configurable - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:18:13 UTC, 1 reply.
- [jira] [Updated] (NUTCH-386) Plugin to index categories by url rules - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:20:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-351) Protocol forward proxy - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:20:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-490) Extension point with filters for Neko HTML parser (with patch) - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:22:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-346) Improve readability of logs/hadoop.log - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:22:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:22:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-477) Extend URLFilters to support different filtering chains - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:22:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-213) checkstyle - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:24:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-248) add support for internationalized domain names - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:24:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-215) Plugin execution order - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:26:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-737) urlnormalizer-unalias plugin - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/12 21:28:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-49) Flag for generate to fetch only new pages to complement the -refetchonly flag - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:28:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-693) Add configurable option for treating nofollow behaviour. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:28:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1500) bin/crawl fails on step solrindex with wrong path to segment - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:30:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-49) Flag for generate to fetch only new pages to complement the -refetchonly flag - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/12 21:32:13 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1489) elasticindex should report the indexed documents like solrindex does - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 21:32:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-693) Add configurable option for treating nofollow behaviour. - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/12 21:34:12 UTC, 2 replies.
- [jira] [Closed] (NUTCH-693) Add configurable option for treating nofollow behaviour. - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/12 21:42:12 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2082 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/13 05:20:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1257) Support for the x-robots-tag HTTP Header - posted by "Mike (JIRA)" <ji...@apache.org> on 2013/01/13 09:16:12 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1257) Support for the x-robots-tag HTTP Header - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/13 12:06:12 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "NutchMeetUps" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2013/01/14 00:36:17 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1371) Replace Ivy with Maven Ant tasks - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/14 01:42:12 UTC, 1 reply.
- [jira] [Assigned] (NUTCH-1371) Replace Ivy with Maven Ant tasks - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/14 01:42:13 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2083 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/14 05:23:23 UTC, 0 replies.
- [ANNOUNCE] New Nutch committer and PMC : Tejas Patil - posted by Julien Nioche <li...@gmail.com> on 2013/01/14 09:49:27 UTC, 3 replies.
- Build failed in Jenkins: Nutch-trunk #2084 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/15 05:11:07 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #466 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/15 05:11:07 UTC, 0 replies.
- pass null NutchDocument BasicIndexingFilter - posted by feng lu <am...@gmail.com> on 2013/01/15 07:54:33 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1453) Substantiate tests for IndexingFilters - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/15 08:04:13 UTC, 6 replies.
- [jira] [Updated] (NUTCH-1453) Substantiate tests for IndexingFilters - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/15 08:04:13 UTC, 4 replies.
- [jira] [Created] (NUTCH-1518) session cookies support - posted by "David Michael Gang (JIRA)" <ji...@apache.org> on 2013/01/15 08:58:14 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "bin/nutch_inject" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2013/01/15 18:52:32 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1518) session cookies support - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/15 19:08:13 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1518) session cookies support - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/15 19:08:13 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1518) session cookies support - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/15 21:22:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-827) HTTP POST Authentication - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/15 21:30:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1087) Deprecate crawl command and replace with example script - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/01/15 22:14:15 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-1500) bin/crawl fails on step solrindex with wrong path to segment - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/01/15 22:26:12 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2085 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/15 23:11:26 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1500) bin/crawl fails on step solrindex with wrong path to segment - posted by "Hudson (JIRA)" <ji...@apache.org> on 2013/01/15 23:12:15 UTC, 1 reply.
- [Nutch Wiki] Trivial Update of "PluginCentral" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2013/01/15 23:53:34 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1453) Substantiate tests for IndexingFilters - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/16 03:13:14 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #467 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/16 05:12:13 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1453) Substantiate tests for IndexingFilters - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/16 06:46:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1506) Add UPDATE action to NutchIndexAction - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/16 12:08:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1507) Remove FetcherOutput - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/16 12:08:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1509) Implement read/write in NutchField - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/16 12:12:12 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2087 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/16 13:06:45 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2088 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/16 14:11:49 UTC, 0 replies.
- [jira] [Created] (NUTCH-1519) Configuration Overrides not in sync between WebTableReader and nutch-default.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/16 21:48:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1047) Pluggable indexing backends - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/17 00:08:13 UTC, 26 replies.
- Jenkins build is back to normal : Nutch-nutchgora #468 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/17 06:55:32 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1519) Configuration Overrides not in sync between WebTableReader and nutch-default.xml - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/17 07:54:24 UTC, 3 replies.
- [jira] [Created] (NUTCH-1520) SegmentMerger loses records - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/17 10:00:28 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1520) SegmentMerger loses records - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/17 12:00:23 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1520) SegmentMerger loses records - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/17 12:10:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers. - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/01/17 12:52:13 UTC, 3 replies.
- Build failed in Jenkins: Nutch-2.x-Windows #5 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/18 00:17:32 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk-Windows #4 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/18 00:22:39 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1449) Optionally delete documents skipped by IndexingFilters - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/18 03:48:13 UTC, 1 reply.
- [jira] [Work stopped] (NUTCH-1453) Substantiate tests for IndexingFilters - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/18 04:38:16 UTC, 0 replies.
- [CALL FOR TESTING] NUTCH-1047 Pluggable indexing backends - posted by Julien Nioche <li...@gmail.com> on 2013/01/18 17:15:53 UTC, 4 replies.
- [jira] [Resolved] (NUTCH-1453) Substantiate tests for IndexingFilters - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/18 22:10:12 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/20 11:12:12 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1042) Fetcher.max.crawl.delay property not taken into account correctly when set to -1 - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/20 11:44:12 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1042) Fetcher.max.crawl.delay property not taken into account correctly when set to -1 - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/20 11:44:13 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1329) parser not extract outlinks to external web sites - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/20 12:12:15 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1223) Migrate WebGraph to MapReduce API - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/21 02:46:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1219) Upgrade all jobs to new MapReduce API - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/21 02:46:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1223) Migrate WebGraph to MapReduce API - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/21 07:20:12 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1223) Migrate WebGraph to MapReduce API - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/21 08:10:14 UTC, 1 reply.
- [jira] [Updated] (NUTCH-1219) Upgrade all jobs to new MapReduce API - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/21 10:34:12 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/22 04:18:27 UTC, 0 replies.
- CrawlDbFilter urlNormalizers NULL pointer - posted by feng lu <am...@gmail.com> on 2013/01/22 06:59:15 UTC, 2 replies.
- [jira] [Created] (NUTCH-1521) CrawlDbFilter pass null url to urlNormalizers - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/22 08:30:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1521) CrawlDbFilter pass null url to urlNormalizers - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/22 08:34:13 UTC, 3 replies.
- [jira] [Created] (NUTCH-1522) Upgrade to Tika 1.3 - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/01/23 10:50:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1482) Rename HTMLParseFilter - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/01/23 10:52:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1522) Upgrade to Tika 1.3 - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/23 10:54:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.1.0 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/24 04:43:12 UTC, 0 replies.
- [jira] [Created] (NUTCH-1523) Upgrade solr-solr4j dependency to 4.1.0 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/24 04:47:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1523) Upgrade solr-solr4j dependency to 4.1.0 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/24 04:47:12 UTC, 0 replies.
- Daily Batch Digests of Mailing Lists Available - posted by Lewis John Mcgibbney <le...@gmail.com> on 2013/01/24 19:26:26 UTC, 0 replies.
- [jira] [Created] (NUTCH-1524) Internal links are not being saved even with change in parameter (db.ignore.internal.links) - posted by "kiran (JIRA)" <ji...@apache.org> on 2013/01/24 22:21:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1250) parse-html does not parse links with empty anchor - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/25 07:39:13 UTC, 2 replies.
- installing nutch on windows 7 cygpath cant convert empty path - posted by peterbarretto <pe...@gmail.com> on 2013/01/25 07:52:52 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2100 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/26 05:20:55 UTC, 0 replies.
- review board - posted by Tejas Patil <te...@gmail.com> on 2013/01/26 07:28:57 UTC, 1 reply.
- Addition to Pluggable Backends - posted by Lewis John Mcgibbney <le...@gmail.com> on 2013/01/26 09:31:42 UTC, 2 replies.
- Build failed in Jenkins: Nutch-trunk #2101 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/27 05:20:24 UTC, 0 replies.
- [jira] [Created] (NUTCH-1525) Generator to record - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/27 19:31:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1524) Internal links are not being saved even with change in parameter (db.ignore.internal.links) - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/27 19:33:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1525) Generator to record external links even when db.ignore.external.links set to true - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/27 19:35:12 UTC, 4 replies.
- [jira] [Created] (NUTCH-1526) Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/01/28 03:33:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/28 03:33:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1526) Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/01/28 03:33:13 UTC, 0 replies.
- Review Request: Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs - posted by Chris Mattmann <ma...@apache.org> on 2013/01/28 03:34:18 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2013/01/28 04:33:12 UTC, 7 replies.
- Build failed in Jenkins: Nutch-nutchgora #477 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/28 05:10:18 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2102 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/28 05:17:22 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default. - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/28 09:05:14 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1042) Fetcher.max.crawl.delay property not taken into account correctly when set to -1 - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/28 09:07:13 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #478 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/28 09:16:40 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1465) Support sitemaps in Nutch - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/28 09:39:13 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1465) Support sitemaps in Nutch - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/01/28 21:03:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-945) Indexing to multiple SOLR Servers - posted by "Alexander Kingson (JIRA)" <ji...@apache.org> on 2013/01/29 02:26:11 UTC, 1 reply.
- Jenkins build is back to normal : Nutch-nutchgora #479 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/29 05:21:36 UTC, 0 replies.
- Outlinks in parse filter - posted by Markus Jelsma <ma...@openindex.io> on 2013/01/29 13:16:03 UTC, 1 reply.
- Inlinks not being saved in the database - posted by kiran chitturi <ch...@gmail.com> on 2013/01/29 16:42:50 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1521) CrawlDbFilter pass null url to urlNormalizers - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/30 04:33:18 UTC, 4 replies.
- Build failed in Jenkins: Nutch-nutchgora #480 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/30 05:11:12 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2105 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/30 05:11:13 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #481 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/31 05:08:11 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2106 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/01/31 05:08:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1525) Generator to record external links even when db.ignore.external.links set to true - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/01/31 10:41:12 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1526) Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/31 21:51:12 UTC, 0 replies.
- [DISCUSS] Nutch Policy/Opinion on Review Board - posted by Lewis John Mcgibbney <le...@gmail.com> on 2013/01/31 21:57:50 UTC, 1 reply.