- [jira] [Commented] (NUTCH-1667) Updatedb always ignore batchId - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/12/02 02:34:35 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1672) Inlinks are added twice in DbUpdateReducer - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/12/02 02:34:35 UTC, 0 replies.
- Re: [DISCUSS] Release Trunk - posted by Sebastian Nagel <wa...@googlemail.com> on 2013/12/02 22:02:01 UTC, 2 replies.
- [jira] [Created] (NUTCH-1678) Remove dependency on org.apache.oro - posted by "James Sullivan (JIRA)" <ji...@apache.org> on 2013/12/03 08:13:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1678) Remove dependency on org.apache.oro - posted by "James Sullivan (JIRA)" <ji...@apache.org> on 2013/12/03 08:19:35 UTC, 1 replies.
- nutch with Hadoop 2 - posted by d_k <ma...@gmail.com> on 2013/12/03 09:31:53 UTC, 0 replies.
- [jira] [Created] (NUTCH-1679) UpdateDb using batchId, link may override crawled page. - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/04 17:29:35 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1556) enabling updatedb to accept batchId - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/12/04 20:20:36 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-1556) enabling updatedb to accept batchId - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/12/04 20:20:37 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1679) UpdateDb using batchId, link may override crawled page. - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/12/04 20:22:35 UTC, 1 replies.
- [jira] [Comment Edited] (NUTCH-1556) enabling updatedb to accept batchId - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/12/04 20:24:36 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1556) enabling updatedb to accept batchId - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/12/04 20:26:35 UTC, 0 replies.
- [jira] [Commented] (NUTCH-656) DeleteDuplicates based on crawlDB only - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/12/08 14:57:35 UTC, 1 replies.
- Nutch with YARN (aka Hadoop 2.0) - posted by Tejas Patil <te...@gmail.com> on 2013/12/09 07:42:56 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1321) IDNNormalizer - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/12/09 11:31:07 UTC, 5 replies.
- [jira] [Commented] (NUTCH-1326) HostDeduplicator for Nutch - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/12/09 15:14:07 UTC, 1 replies.
- [jira] [Created] (NUTCH-1680) CrawldbReader to dump minRetry value - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/12/10 13:53:08 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1680) CrawldbReader to dump minRetry value - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/12/10 13:55:07 UTC, 0 replies.
- [jira] [Created] (NUTCH-1681) In URLUtil.java, toUNICODE method does not work correctly - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/12/10 17:01:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1681) In URLUtil.java, toUNICODE method does not work correctly - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/12/10 17:07:07 UTC, 11 replies.
- [jira] [Created] (NUTCH-1682) Port optionally maintain custom fetch interval despite AdaptiveFetchSchedule to 2.x - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/11 03:44:07 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1682) Port optionally maintain custom fetch interval despite AdaptiveFetchSchedule to 2.x - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/11 04:03:10 UTC, 3 replies.
- [jira] [Created] (NUTCH-1683) Optionally maintain custom fetch interval despite AbstractFetchSchedule - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/11 04:10:08 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1683) Optionally maintain custom fetch interval despite AbstractFetchSchedule - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/11 04:14:15 UTC, 4 replies.
- [jira] [Created] (NUTCH-1684) ParseMeta to be added before fetch schedulers are run - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/12/11 15:57:08 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1684) ParseMeta to be added before fetch schedulers are run - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/12/11 15:57:08 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1681) In URLUtil.java, toUNICODE method does not work correctly - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/12/12 21:31:08 UTC, 12 replies.
- [jira] [Updated] (NUTCH-1360) Suport the storing of IP address connected to when web crawling - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/12/13 17:59:09 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-1360) Suport the storing of IP address connected to when web crawling - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/12/13 18:01:09 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-1130) JUnit test for Any23 RDF plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/12/13 18:03:08 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1130) JUnit test for Any23 RDF plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/12/13 18:03:08 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1129) Any23 Nutch plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/12/13 18:03:08 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1577) Add target for creating eclipse project - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/12/15 01:31:07 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1325) HostDB for Nutch - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/12/15 03:35:07 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/12/15 09:18:08 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/12/16 00:32:07 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1465) Support sitemaps in Nutch - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/12/16 01:10:08 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/12/16 10:28:07 UTC, 5 replies.
- [jira] [Updated] (NUTCH-1321) IDNNormalizer - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/12/16 17:57:12 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1676) Add rudimentary SSL support to protocol-http - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/12/17 15:50:11 UTC, 5 replies.
- [jira] [Updated] (NUTCH-1676) Add rudimentary SSL support to protocol-http - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/12/18 12:48:08 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/12/19 14:28:09 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/12/20 16:24:11 UTC, 2 replies.
- [jira] [Created] (NUTCH-1685) URLUtil.toUNICODE fails on IDNs - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/12/20 23:44:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1685) URLUtil.toUNICODE fails on IDNs - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/12/20 23:48:09 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #855 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/22 08:05:04 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2458 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/22 08:07:50 UTC, 0 replies.
- [jira] [Created] (NUTCH-1686) Optimize UpdateDb to load less field from Store - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/23 04:41:50 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1686) Optimize UpdateDb to load less field from Store - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/23 04:41:51 UTC, 1 replies.
- [jira] [Created] (NUTCH-1687) Pick queue in Round Robin - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/23 04:59:50 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1687) Pick queue in Round Robin - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/23 05:01:50 UTC, 7 replies.
- Step Through Nutch 1.7 Inside Eclipse Missing Argument - posted by Bin Wang <bi...@gmail.com> on 2013/12/23 05:05:05 UTC, 3 replies.
- Jenkins build is back to normal : Nutch-nutchgora #856 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/23 05:06:23 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2459 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/23 05:07:04 UTC, 0 replies.
- [jira] [Created] (NUTCH-1688) Port DeleteDuplicate based on crawlDB only to 2.x - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/23 05:16:50 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1688) Port DeleteDuplicate based on crawlDB only to 2.x - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/23 05:18:50 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1687) Pick queue in Round Robin - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/12/23 05:31:50 UTC, 5 replies.
- [jira] [Created] (NUTCH-1689) Improve CrawlDb stats - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/23 05:31:51 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1689) Improve CrawlDb stats - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/23 05:31:51 UTC, 5 replies.
- [jira] [Commented] (NUTCH-1686) Optimize UpdateDb to load less field from Store - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/12/23 05:33:50 UTC, 1 replies.
- [jira] [Created] (NUTCH-1690) IndexClean: mark url as unindexed after clean to not delete again - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/23 05:53:50 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1690) IndexClean: mark url as unindexed after clean to not delete again - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/12/23 05:55:50 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1689) Improve CrawlDb stats - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/12/23 06:48:54 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1690) IndexClean: mark url as unindexed after clean to not delete again - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/12/23 09:34:50 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1685) URLUtil.toUNICODE fails on IDNs - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/12/23 10:26:50 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1685) URLUtil.toUNICODE fails on IDNs - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/12/23 11:42:50 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1681) In URLUtil.java, toUNICODE method does not work correctly - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/12/23 15:30:53 UTC, 1 replies.
- Jenkins build is back to normal : Nutch-trunk #2460 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/23 15:49:46 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1681) In URLUtil.java, toUNICODE method does not work correctly - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/12/23 16:10:57 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1360) Suport the storing of IP address connected to when web crawling - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/12/23 18:19:55 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2461 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/23 18:47:32 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2462 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/24 05:04:26 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Alparslan Avcı (JIRA)" <ji...@apache.org> on 2013/12/24 16:05:51 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Alparslan Avcı (JIRA)" <ji...@apache.org> on 2013/12/24 16:07:51 UTC, 2 replies.
- Nutch Several Segment Folders Containing Duplicate Key/URLs - posted by Bin Wang <bi...@gmail.com> on 2013/12/24 17:06:19 UTC, 1 replies.
- Build failed in Jenkins: Nutch-trunk #2463 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/25 06:30:02 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #860 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/25 06:31:59 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #861 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/26 05:03:41 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2464 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/26 05:05:17 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2465 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/27 05:06:08 UTC, 0 replies.
- Nutch Crawl a Specific List Of URLs (150K) - posted by Bin Wang <bi...@gmail.com> on 2013/12/27 19:49:52 UTC, 3 replies.
- Build failed in Jenkins: Nutch-trunk #2466 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/28 05:06:24 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2467 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/29 05:05:18 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2468 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/30 05:06:08 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #865 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/30 05:08:09 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2469 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/31 05:06:10 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #866 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/12/31 05:08:10 UTC, 0 replies.