- Build failed in Jenkins: Nutch-trunk #2470 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/01 05:05:23 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1647) protocol-http throws unzipBestEffort returned null for some pages - posted by "lufeng (JIRA)" <ji...@apache.org> on 2014/01/01 15:18:52 UTC, 2 replies.
- Build failed in Jenkins: Nutch-nutchgora #868 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/02 02:11:31 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1670) set same crawldb directory in mergedb parameter - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/02 02:39:50 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1325) HostDB for Nutch - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/02 03:35:51 UTC, 3 replies.
- Build failed in Jenkins: Nutch-nutchgora #869 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/02 05:22:14 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2471 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/02 07:11:09 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1080) Type safe members , arguments for better readability - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/02 07:24:50 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1325) HostDB for Nutch - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/02 11:35:52 UTC, 8 replies.
- [jira] [Commented] (NUTCH-1080) Type safe members , arguments for better readability - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/02 11:43:54 UTC, 2 replies.
- [jira] [Reopened] (NUTCH-1360) Suport the storing of IP address connected to when web crawling - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/02 12:25:52 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/02 12:25:53 UTC, 3 replies.
- [jira] [Updated] (NUTCH-356) Plugin repository cache can lead to memory leak - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/02 12:36:51 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1360) Suport the storing of IP address connected to when web crawling - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/02 12:56:53 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1360) Suport the storing of IP address connected to when web crawling - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/02 12:56:56 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2472 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/02 15:13:36 UTC, 0 replies.
- [jira] [Created] (NUTCH-1691) DomainBlacklist url filter does not allow -D filter file override - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/02 16:48:53 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1691) DomainBlacklist url filter does not allow -D filter file override - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/02 16:56:50 UTC, 5 replies.
- [jira] [Updated] (NUTCH-1691) DomainBlacklist url filter does not allow -D filter file override - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/02 16:56:50 UTC, 0 replies.
- [jira] [Created] (NUTCH-1692) SegmentReader broken in distributed mode - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/02 17:47:50 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1692) SegmentReader broken in distributed mode - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/02 17:47:50 UTC, 1 replies.
- [jira] [Assigned] (NUTCH-1080) Type safe members , arguments for better readability - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/02 20:31:52 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1670) set same crawldb directory in mergedb parameter - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/02 20:43:51 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2473 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/02 21:42:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1454) parsing chm failed - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/02 22:39:50 UTC, 0 replies.
- Re: Nutch Crawl a Specific List Of URLs (150K) - posted by Bin Wang <bi...@gmail.com> on 2014/01/02 23:13:06 UTC, 1 replies.
- use to parse big Nutch/Content file - posted by Bin Wang <bi...@gmail.com> on 2014/01/02 23:48:41 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1686) Optimize UpdateDb to load less field from Store - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/03 03:57:51 UTC, 0 replies.
- How Map Reduce code in Nutch run in local mode vs distributed mode? - posted by Bin Wang <bi...@gmail.com> on 2014/01/03 04:28:59 UTC, 3 replies.
- Build failed in Jenkins: Nutch-nutchgora #870 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/03 05:05:30 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2474 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/03 05:08:25 UTC, 0 replies.
- [jira] [Created] (NUTCH-1693) TextMD5Signatue compute on textual content - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/03 05:25:50 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1693) TextMD5Signatue compute on textual content - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/03 05:27:51 UTC, 5 replies.
- [jira] [Commented] (NUTCH-1693) TextMD5Signatue compute on textual content - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/03 05:31:50 UTC, 4 replies.
- [jira] [Commented] (NUTCH-356) Plugin repository cache can lead to memory leak - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/03 06:05:53 UTC, 12 replies.
- [jira] [Created] (NUTCH-1694) Consider removing URL filter attribute warnings. - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/03 14:40:50 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1694) Consider removing URL filter attribute warnings. - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/03 14:40:52 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #871 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/04 05:04:02 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2475 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/04 05:06:53 UTC, 0 replies.
- Independent Map Reduce to parse Nutch content (Cont.) - posted by Bin Wang <bi...@gmail.com> on 2014/01/04 05:56:07 UTC, 4 replies.
- [jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/04 08:44:51 UTC, 12 replies.
- Inject operation: can't it be done in a single map-reduce job ? - posted by Tejas Patil <te...@gmail.com> on 2014/01/04 09:00:47 UTC, 4 replies.
- Build failed in Jenkins: Nutch-nutchgora #872 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/04 14:53:55 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #873 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/05 05:05:14 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2476 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/05 05:10:13 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #874 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/06 05:06:19 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2477 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/06 05:09:08 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1371) Replace Ivy with Maven Ant tasks - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/01/06 13:04:54 UTC, 6 replies.
- [jira] [Updated] (NUTCH-1675) NutchField to support long - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/06 15:45:56 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1675) NutchField to support long - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/06 15:45:57 UTC, 1 replies.
- Build failed in Jenkins: Nutch-nutchgora #875 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/07 05:04:33 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2478 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/07 05:09:01 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1675) NutchField to support long - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/07 13:21:51 UTC, 0 replies.
- Nightly builds - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/01/07 13:34:34 UTC, 2 replies.
- Build failed in Jenkins: Nutch-trunk #2479 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/07 13:50:05 UTC, 0 replies.
- [jira] [Created] (NUTCH-1695) NutchDocument.toString() - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/07 14:17:51 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1695) NutchDocument.toString() - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/07 14:19:51 UTC, 1 replies.
- [jira] [Created] (NUTCH-1696) Enable use of (Gora) SNAPSHOT dependencies - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/07 19:15:53 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1696) Enable use of (Gora) SNAPSHOT dependencies - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/07 19:45:51 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1695) NutchDocument.toString() - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/01/07 21:49:50 UTC, 4 replies.
- Build failed in Jenkins: Nutch-nutchgora #876 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/08 05:06:26 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2480 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/08 05:09:50 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1695) NutchDocument.toString() - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/08 10:42:50 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2481 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/08 10:50:07 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1696) Enable use of (Gora) SNAPSHOT dependencies - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/08 12:24:51 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #877 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/08 12:36:21 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2482 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/08 12:39:12 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2483 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/08 12:53:36 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #878 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/08 13:02:09 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1616) SegmentMerger missing proper crawl_fetch datum - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/08 14:13:50 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index? - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/08 14:13:55 UTC, 24 replies.
- [jira] [Updated] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index? - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/08 14:21:51 UTC, 7 replies.
- [jira] [Created] (NUTCH-1697) SegmentMerger to implement Tool - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/09 11:43:50 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1697) SegmentMerger to implement Tool - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/09 12:43:50 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index? - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/09 17:53:00 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1568) port pluggable indexing architecture to 2.x - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/01/11 13:16:51 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1655) Indexer Plugin for Elastic Search - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/01/11 13:18:50 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1568) port pluggable indexing architecture to 2.x - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/13 14:07:51 UTC, 6 replies.
- [jira] [Resolved] (NUTCH-1667) Updatedb always ignore batchId - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/13 14:17:56 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1672) Inlinks are added twice in DbUpdateReducer - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/13 14:23:51 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1667) Updatedb always ignore batchId - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/01/13 14:41:57 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1672) Inlinks are added twice in DbUpdateReducer - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/01/13 14:42:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-1698) crawl script should not specify solrUrl to accommodate pluggable indexing architecture - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/13 15:40:50 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1698) crawl script should not specify solrUrl to accommodate pluggable indexing architecture - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/13 15:40:51 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1568) port pluggable indexing architecture to 2.x - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/13 19:46:58 UTC, 2 replies.
- [jira] [Created] (NUTCH-1699) Tika Parser - Image Parse Bug - posted by "Mehmet Zahid Yüzügüldü (JIRA)" <ji...@apache.org> on 2014/01/14 08:30:50 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1699) Tika Parser - Image Parse Bug - posted by "Mehmet Zahid Yüzügüldü (JIRA)" <ji...@apache.org> on 2014/01/14 08:30:51 UTC, 4 replies.
- [jira] [Commented] (NUTCH-1699) Tika Parser - Image Parse Bug - posted by "Mehmet Zahid Yüzügüldü (JIRA)" <ji...@apache.org> on 2014/01/14 08:30:51 UTC, 4 replies.
- [jira] [Created] (NUTCH-1700) Remove deprecated code in src/plugin/creativecommons/build.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/14 09:58:51 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1699) Tika Parser - Image Parse Bug - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/14 10:01:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1680) CrawldbReader to dump minRetry value - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/14 13:21:50 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1680) CrawldbReader to dump minRetry value - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/14 13:21:50 UTC, 0 replies.
- Proposal for SolrIndexWriter - posted by Lajos <la...@protulae.com> on 2014/01/14 14:07:00 UTC, 3 replies.
- [jira] [Created] (NUTCH-1701) Make Solr Document Boost as an option - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/15 08:43:21 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1701) Make Solr Document Boost as an option - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/15 08:45:19 UTC, 1 replies.
- [jira] [Created] (NUTCH-1702) Port HostNormalizer to 2.x - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/15 09:01:27 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1702) Port HostNormalizer to 2.x - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/15 09:24:20 UTC, 3 replies.
- [jira] [Created] (NUTCH-1703) Nutch ignores alt text of images - posted by "Canan Girgin (JIRA)" <ji...@apache.org> on 2014/01/15 10:20:20 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1703) Nutch ignores alt text of images - posted by "Canan Girgin (JIRA)" <ji...@apache.org> on 2014/01/15 10:22:19 UTC, 5 replies.
- [jira] [Commented] (NUTCH-1703) Nutch ignores alt text of images - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/15 11:21:20 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-1568) port pluggable indexing architecture to 2.x - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/15 13:04:21 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1655) Indexer Plugin for Elastic Search - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/15 13:34:20 UTC, 3 replies.
- [jira] [Comment Edited] (NUTCH-1703) Nutch ignores alt text of images - posted by "Canan Girgin (JIRA)" <ji...@apache.org> on 2014/01/15 15:19:21 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1701) Make Solr Document Boost as an option - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/15 15:45:19 UTC, 0 replies.
- [jira] [Created] (NUTCH-1704) Port DomainBlacklist urlfilter to 2.x - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/15 16:05:20 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1704) Port DomainBlacklist urlfilter to 2.x - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/15 16:05:21 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1699) Tika Parser - Image Parse Bug - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/15 16:21:20 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Alexander Uretsky (JIRA)" <ji...@apache.org> on 2014/01/15 16:39:21 UTC, 4 replies.
- [jira] [Updated] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/15 16:39:23 UTC, 8 replies.
- [jira] [Updated] (NUTCH-1662) Indexer Plugin for Solr Cloud - posted by "Yasin Kılınç (JIRA)" <ji...@apache.org> on 2014/01/15 16:39:27 UTC, 0 replies.
- [jira] [Created] (NUTCH-1705) Make configuration option for HtmlParser & TikaParser to extract text or title for noIndex page - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/15 17:11:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1705) Make configuration option for HtmlParser & TikaParser to extract text or title for noIndex page - posted by "Tien Nguyen Manh (JIRA)" <ji...@apache.org> on 2014/01/15 17:11:26 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size) - posted by "Yasin Kılınç (JIRA)" <ji...@apache.org> on 2014/01/16 16:35:21 UTC, 0 replies.
- [jira] [Created] (NUTCH-1706) IndexerMapReduce does not remove db_redir_temp etc - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/16 16:44:19 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1706) IndexerMapReduce does not remove db_redir_temp etc - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/16 17:00:19 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1706) IndexerMapReduce does not remove db_redir_temp etc - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/16 17:08:19 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-1655) Indexer Plugin for Elastic Search - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/16 23:07:21 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1486) Upgrade to the latest Solr 4.x - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/16 23:07:21 UTC, 0 replies.
- Fwd: ApacheCon NA 2014 Travel Assistance Applications now open! - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/01/17 11:05:51 UTC, 0 replies.
- [jira] [Created] (NUTCH-1707) DummyIndexingWriter - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/17 11:54:29 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1707) DummyIndexingWriter - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/17 11:58:20 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1707) DummyIndexingWriter - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/01/17 13:02:21 UTC, 4 replies.
- [jira] [Comment Edited] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/17 13:02:23 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1707) DummyIndexingWriter - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/01/17 13:18:19 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size) - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/19 17:31:21 UTC, 5 replies.
- [jira] [Commented] (NUTCH-1697) SegmentMerger to implement Tool - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/19 17:42:19 UTC, 1 replies.
- [jira] [Created] (NUTCH-1708) use same id when indexing and deleting redirects - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/01/20 01:13:19 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2497 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/01/20 05:09:04 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1680) CrawldbReader to dump minRetry value - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/20 10:31:20 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1708) use same id when indexing and deleting redirects - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/20 10:44:20 UTC, 0 replies.
- What is the correct way to serialize a MapWritable to WebPage's metadata? - posted by d_k <ma...@gmail.com> on 2014/01/20 16:02:02 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1645) Junit Test Case for Adaptive Fetch Schedule class - posted by "msertacturkel (JIRA)" <ji...@apache.org> on 2014/01/21 12:20:23 UTC, 3 replies.
- Renovating "Nutch Hadoop Tutorial" wiki page - posted by Tejas Patil <te...@gmail.com> on 2014/01/21 18:44:56 UTC, 8 replies.
- [jira] [Commented] (NUTCH-1572) Nutch 2.x should use o.a.g.mem.store.MemStore for testing - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/01/21 19:53:20 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/21 20:21:22 UTC, 4 replies.
- Request for reviewing HostDb and Sitemap features - posted by Tejas Patil <te...@gmail.com> on 2014/01/21 20:26:49 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1325) HostDB for Nutch - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/21 20:27:23 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1413) Record response time - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/21 20:59:20 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1164) Write JUnit tests for protocol-http - posted by "Sertac TURKEL (JIRA)" <ji...@apache.org> on 2014/01/22 10:24:20 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-1325) HostDB for Nutch - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/22 12:29:21 UTC, 0 replies.
- Right was to run crawl script in deploy mode - posted by Tejas Patil <te...@gmail.com> on 2014/01/22 15:26:25 UTC, 0 replies.
- [jira] [Created] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/22 21:11:20 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/22 21:13:25 UTC, 5 replies.
- [jira] [Resolved] (NUTCH-1413) Record response time - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/01/22 22:25:46 UTC, 0 replies.
- Nutch 2.x HEAD + gora-core & gora-cassandra 0.4-SNAPSHOT (trunk) - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/01/22 22:47:46 UTC, 0 replies.
- [jira] [Created] (NUTCH-1710) Add gora package logging to log4j.properties - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/22 23:28:19 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1710) Add gora package logging to log4j.properties - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/22 23:32:20 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1710) Add gora package logging to log4j.properties - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/22 23:32:20 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1710) Add gora package logging to log4j.properties - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/22 23:32:21 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1710) Add gora package logging to log4j.properties - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/01/22 23:42:23 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc - posted by "Alparslan Avcı (JIRA)" <ji...@apache.org> on 2014/01/23 13:03:38 UTC, 1 replies.
- [jira] [Created] (NUTCH-1711) Normalizer does not encode exclamation mark - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/23 14:07:37 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1711) Normalizer does not encode exclamation mark - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/23 14:15:37 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1465) Support sitemaps in Nutch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/23 15:39:37 UTC, 0 replies.
- [jira] [Created] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/23 15:54:41 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/23 15:56:37 UTC, 1 replies.
- [jira] [Created] (NUTCH-1713) IpAddressResolver and DNSCache - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/23 16:13:37 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1713) IpAddressResolver and DNSCache - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/23 16:15:37 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1660) Index filter for Page's latitude and longitude - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/23 16:23:39 UTC, 0 replies.
- [jira] [Created] (NUTCH-1714) Nutch 2.x upgrade to use GORA_94 branch - posted by "Alparslan Avcı (JIRA)" <ji...@apache.org> on 2014/01/23 17:00:41 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1714) Nutch 2.x upgrade to use GORA_94 branch - posted by "Alparslan Avcı (JIRA)" <ji...@apache.org> on 2014/01/23 17:02:37 UTC, 1 replies.
- [jira] [Comment Edited] (NUTCH-1714) Nutch 2.x upgrade to use GORA_94 branch - posted by "Alparslan Avcı (JIRA)" <ji...@apache.org> on 2014/01/23 17:18:40 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1164) Write JUnit tests for protocol-http - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/23 20:06:41 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/23 20:54:46 UTC, 0 replies.
- Re: Right way to run crawl script in deploy mode - posted by Tejas Patil <te...@gmail.com> on 2014/01/23 21:11:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1645) Junit Test Case for Adaptive Fetch Schedule class - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/23 22:05:39 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1253) Incompatible neko and xerces versions - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/23 22:11:40 UTC, 3 replies.
- [jira] [Comment Edited] (NUTCH-1253) Incompatible neko and xerces versions - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/23 22:11:41 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1677) ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION are not set in Parse HTML - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/24 01:07:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1677) ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION are not set in Parse HTML - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/24 01:07:39 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1622) Create Outlinks with metadata - posted by "Daniel Kugel (JIRA)" <ji...@apache.org> on 2014/01/24 07:34:40 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1676) Add rudimentary SSL support to protocol-http - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/24 13:40:37 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1676) Add rudimentary SSL support to protocol-http - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/01/24 13:44:42 UTC, 3 replies.
- [jira] [Assigned] (NUTCH-356) Plugin repository cache can lead to memory leak - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/24 14:06:37 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-356) Plugin repository cache can lead to memory leak - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/24 14:24:39 UTC, 0 replies.
- [jira] [Created] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/24 18:29:37 UTC, 0 replies.
- [jira] [Created] (NUTCH-1716) RobotRulesParser adds extra '*' to the robots name - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/24 18:39:39 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/24 18:41:40 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-1716) RobotRulesParser adds extra '*' to the robots name - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/24 18:43:37 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/24 18:49:37 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/24 18:51:37 UTC, 2 replies.
- Nutch meetup / hackathon at BerlinBuzzwords next May? - posted by Julien Nioche <li...@gmail.com> on 2014/01/24 22:39:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1164) Write JUnit tests for protocol-http - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/01/25 06:48:40 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1414) Date extraction parse filter - posted by "Luke (JIRA)" <ji...@apache.org> on 2014/01/26 13:01:38 UTC, 5 replies.
- [jira] [Comment Edited] (NUTCH-1414) Date extraction parse filter - posted by "Luke (JIRA)" <ji...@apache.org> on 2014/01/26 13:01:40 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1692) SegmentReader broken in distributed mode - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/26 17:58:38 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1084) ReadDB url throws exception - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/27 13:09:38 UTC, 0 replies.
- [jira] [Created] (NUTCH-1717) HostDB not to complain if filters/normalizers are disabled - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/27 16:15:37 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1717) HostDB not to complain if filters/normalizers are disabled - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/27 16:15:38 UTC, 0 replies.
- [jira] [Created] (NUTCH-1718) update description of property http.robots.agent - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/01/28 09:55:37 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1718) update description of property http.robots.agent - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/28 11:58:38 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1717) HostDB not to complain if filters/normalizers are disabled - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/28 14:00:40 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1719) DomainStatistics fails in 2.x because URL is not unreversed - posted by "Gerhard Gossen (JIRA)" <ji...@apache.org> on 2014/01/28 14:52:39 UTC, 2 replies.
- [jira] [Created] (NUTCH-1719) DomainStatistics fails in 2.x because URL is not unreversed - posted by "Gerhard Gossen (JIRA)" <ji...@apache.org> on 2014/01/28 14:52:39 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1718) update description of property http.robots.agent - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/28 20:38:45 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1719) DomainStatistics fails in 2.x because URL is not unreversed - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/28 21:20:39 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1253) Incompatible neko and xerces versions - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/29 15:18:08 UTC, 0 replies.
- [jira] [Created] (NUTCH-1720) Duplicate lines in HttpBase.java - posted by "Walter Tietze (JIRA)" <ji...@apache.org> on 2014/01/29 15:36:09 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1720) Duplicate lines in HttpBase.java - posted by "Walter Tietze (JIRA)" <ji...@apache.org> on 2014/01/29 16:04:08 UTC, 5 replies.
- [jira] [Updated] (NUTCH-1720) Duplicate lines in HttpBase.java - posted by "Walter Tietze (JIRA)" <ji...@apache.org> on 2014/01/29 16:06:08 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-1719) DomainStatistics fails in 2.x because URL is not unreversed - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/30 12:00:09 UTC, 0 replies.
- Submission to ApacheCon on Tika - posted by Chris Mattmann <ma...@apache.org> on 2014/01/31 07:02:22 UTC, 0 replies.
- [jira] [Created] (NUTCH-1721) Upgrade to Crawler commons 0.3 - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/31 15:16:11 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1721) Upgrade to Crawler commons 0.3 - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/31 15:18:09 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1721) Upgrade to Crawler commons 0.3 - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2014/01/31 15:22:08 UTC, 1 replies.