- Build failed in Jenkins: Nutch-nutchgora #806 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/11/01 05:07:31 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2409 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/11/01 05:14:30 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling - posted by "Yasin Kılınç (JIRA)" <ji...@apache.org> on 2013/11/01 07:40:18 UTC, 9 replies.
- [jira] [Commented] (NUTCH-1640) OOM in ParseSegment Phase - posted by "Ian H. (JIRA)" <ji...@apache.org> on 2013/11/01 11:54:19 UTC, 1 reply.
- [jira] [Comment Edited] (NUTCH-1640) OOM in ParseSegment Phase - posted by "Ian H. (JIRA)" <ji...@apache.org> on 2013/11/01 11:54:19 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1640) OOM in ParseSegment Phase - posted by "Ian H. (JIRA)" <ji...@apache.org> on 2013/11/01 14:06:23 UTC, 1 reply.
- [jira] [Issue Comment Deleted] (NUTCH-1640) OOM in ParseSegment Phase - posted by "Ian H. (JIRA)" <ji...@apache.org> on 2013/11/01 14:08:21 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1360) Suport the storing of IP address connected to when web crawling - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/01 18:43:19 UTC, 4 replies.
- [jira] [Resolved] (NUTCH-1125) JUnit test for tld - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/01 19:47:19 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #807 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/11/02 05:31:41 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1125) JUnit test for tld - posted by "Hudson (JIRA)" <ji...@apache.org> on 2013/11/02 05:32:18 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1651) modifiedTime and prevmodifiedTime never set - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/02 14:41:18 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1413) Record response time - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/02 15:03:19 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1413) Record response time - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/02 15:07:17 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1643) Unnecessary fetching with http.content.limit when using protocol-http - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/02 15:09:17 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1650) Adaptive Fetch Scheduler interval Wrong Set - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/02 15:13:18 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1588) Port NUTCH-1245 URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again to 2.x - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/02 15:17:20 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1643) Unnecessary fetching with http.content.limit when using protocol-http - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/11/02 16:14:19 UTC, 0 replies.
- [jira] [Created] (NUTCH-1660) Index filter for Page's latitude and longitude - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/11/02 16:20:20 UTC, 0 replies.
- Why is createWebStore not generic ? - posted by Talat UYARER <ta...@agmlab.com> on 2013/11/02 17:07:40 UTC, 1 reply.
- [jira] [Created] (NUTCH-1661) Language based crawling - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/11/02 17:17:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1661) Language based crawling - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/11/02 17:57:20 UTC, 3 replies.
- Jenkins build is back to normal : Nutch-trunk #2410 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/11/02 23:12:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1588) Port NUTCH-1245 URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again to 2.x - posted by "Hudson (JIRA)" <ji...@apache.org> on 2013/11/03 09:02:21 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1245) URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again - posted by "Hudson (JIRA)" <ji...@apache.org> on 2013/11/03 09:02:22 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1650) Adaptive Fetch Scheduler interval Wrong Set - posted by "Hudson (JIRA)" <ji...@apache.org> on 2013/11/03 09:02:23 UTC, 0 replies.
- [jira] [Created] (NUTCH-1662) Indexer Plugin for Solr Cloud - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/11/03 11:30:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1659) Custom partitioner for Adaptive Queue Size - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/11/03 14:01:18 UTC, 1 reply.
- [jira] [Updated] (NUTCH-828) Fetch Filter - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/03 14:23:18 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1371) Replace Ivy with Maven Ant tasks - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/03 14:26:17 UTC, 0 replies.
- Best way to avoid filtering outlinks? [PATCH] - posted by "Andy Boothe [WCG]" <ab...@wcgworld.com> on 2013/11/03 18:12:59 UTC, 1 reply.
- [jira] [Updated] (NUTCH-1659) Custom partitioner for Adaptive Queue Size - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/11/04 11:12:17 UTC, 0 replies.
- [jira] [Created] (NUTCH-1663) Crawl page with specified language - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/11/04 18:31:17 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1651) modifiedTime and prevmodifiedTime never set - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/04 20:14:19 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1663) Crawl page with specified language - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/11/05 10:32:18 UTC, 4 replies.
- [jira] [Updated] (NUTCH-1660) Index filter for Page's latitude and longitude - posted by "Yasin Kılınç (JIRA)" <ji...@apache.org> on 2013/11/05 11:04:18 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1663) Crawl page with specified language - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/11/05 14:37:20 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1660) Index filter for Page's latitude and longitude - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/11/05 16:11:19 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1517) CloudSearch indexer - posted by "Tom Hill (JIRA)" <ji...@apache.org> on 2013/11/05 18:00:19 UTC, 0 replies.
- [jira] [Created] (NUTCH-1664) Support for Hadoop 2.x - posted by "Paul Inventado (JIRA)" <ji...@apache.org> on 2013/11/06 17:35:17 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1664) Support for Hadoop 2.x - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/06 23:00:19 UTC, 0 replies.
- [jira] [Created] (NUTCH-1665) Generator to implement Tool - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/11/08 20:24:17 UTC, 0 replies.
- [jira] [Created] (NUTCH-1666) Optimisation for BasicURLNormalizer - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/11 10:56:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1666) Optimisation for BasicURLNormalizer - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/11 10:56:18 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1666) Optimisation for BasicURLNormalizer - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/11/11 11:04:18 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1666) Optimisation for BasicURLNormalizer - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/11 11:16:28 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1402) Create AbstractScoringFilter - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/11 16:40:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1324) DupeDB for Nutch - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/11 16:46:19 UTC, 1 reply.
- [jira] [Commented] (NUTCH-656) DeleteDuplicates based on crawlDB only - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/11 16:48:17 UTC, 5 replies.
- [jira] [Resolved] (NUTCH-1100) SolrDedup broken - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/11 17:02:19 UTC, 0 replies.
- Partial document update Solr - posted by erik rombouts <e....@gmail.com> on 2013/11/11 23:15:52 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1100) SolrDedup broken - posted by "Hudson (JIRA)" <ji...@apache.org> on 2013/11/12 01:33:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1556) enabling updatedb to accept batchId - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/11/13 08:38:19 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1646) IndexerMapReduce to consider DB status - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/11/13 21:19:23 UTC, 0 replies.
- [jira] [Created] (NUTCH-1667) Updatedb always ignore batchId - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/11/14 09:43:20 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1667) Updatedb always ignore batchId - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/11/14 09:45:21 UTC, 1 reply.
- [jira] [Updated] (NUTCH-656) DeleteDuplicates based on crawlDB only - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/14 11:29:24 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-656) DeleteDuplicates based on crawlDB only - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/14 12:57:22 UTC, 0 replies.
- [jira] [Created] (NUTCH-1668) Remove package org.apache.nutch.indexer.solr - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/14 13:01:25 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1621) Deprecated class o.a.n.crawl.Crawler is still in code base - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/14 13:13:21 UTC, 0 replies.
- All in one Crawl class - posted by Julien Nioche <li...@gmail.com> on 2013/11/14 13:22:50 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1668) Remove package org.apache.nutch.indexer.solr - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/14 15:41:24 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1646) IndexerMapReduce to consider DB status - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/11/14 23:29:21 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #819 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/11/15 02:15:37 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1621) Deprecated class o.a.n.crawl.Crawler is still in code base - posted by "Hudson (JIRA)" <ji...@apache.org> on 2013/11/15 02:16:18 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-828) Fetch Filter - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/15 18:39:22 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1607) Make inproper multiValued field configurable - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/15 18:53:22 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1558) CharEncodingForConversion in ParseData's ParseMeta, not in ParseData's ContentMeta - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/15 18:57:21 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1382) Adding support for EmbeddedSolrServer to SolrIndexer - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/15 19:03:22 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #820 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/11/16 02:39:40 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1668) Remove package org.apache.nutch.indexer.solr - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/18 13:09:20 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1668) Remove package org.apache.nutch.indexer.solr - posted by "Hudson (JIRA)" <ji...@apache.org> on 2013/11/18 19:39:21 UTC, 0 replies.
- [jira] [Created] (NUTCH-1669) FTP crawl does not use FTP's server root folder - posted by "Rafael Thomas Goz Coutinho (JIRA)" <ji...@apache.org> on 2013/11/19 13:33:23 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1669) FTP crawl does not use FTP's server root folder - posted by "Rafael Thomas Goz Coutinho (JIRA)" <ji...@apache.org> on 2013/11/19 13:41:20 UTC, 0 replies.
- [jira] [Created] (NUTCH-1670) set same crawldb directory in mergedb parameter - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/11/20 15:49:36 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1670) set same crawldb directory in mergedb parameter - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/11/20 15:53:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1670) set same crawldb directory in mergedb parameter - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/11/20 15:53:36 UTC, 1 reply.
- [jira] [Reopened] (NUTCH-1587) misspelled property "threshold" in conf/log4j.properties - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/11/21 23:02:35 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1587) misspelled property "threshold" in conf/log4j.properties - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/11/21 23:06:35 UTC, 0 replies.
- [jira] [Created] (NUTCH-1671) indexchecker to add digest field - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/11/21 23:41:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1671) indexchecker to add digest field - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/11/21 23:43:35 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1587) misspelled property "threshold" in conf/log4j.properties - posted by "Hudson (JIRA)" <ji...@apache.org> on 2013/11/22 08:34:37 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1309) fetch queue management - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/22 12:41:35 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1667) Updatedb always ignore batchId - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/11/23 03:35:35 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1671) indexchecker to add digest field - posted by "lufeng (JIRA)" <ji...@apache.org> on 2013/11/23 03:53:35 UTC, 0 replies.
- [jira] [Created] (NUTCH-1672) Inlinks are added twice in DbUpdateReducer - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/11/24 09:36:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1672) Inlinks are added twice in DbUpdateReducer - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/11/24 09:36:36 UTC, 1 reply.
- [jira] [Created] (NUTCH-1673) Title isn't reset in MoreIndexingFilter - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/11/24 09:42:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1673) Title isn't reset in MoreIndexingFilter - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/11/24 09:44:35 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1674) Use batchId filter enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/11/25 03:44:35 UTC, 1 reply.
- [jira] [Created] (NUTCH-1674) Use batchId filter enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/11/25 03:44:35 UTC, 0 replies.
- [jira] [Created] (NUTCH-1675) NutchField to support long - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/11/26 13:09:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1675) NutchField to support long - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/11/26 13:13:35 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1673) Title isn't reset in MoreIndexingFilter - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/26 14:41:35 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1674) Use batchId filter enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/26 14:47:35 UTC, 1 reply.
- [jira] [Updated] (NUTCH-1486) Upgrade to the latest Solr 4.x - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/11/27 04:21:38 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size) - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/11/27 04:31:36 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1325) HostDB for Nutch - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/11/27 04:37:35 UTC, 1 reply.
- [jira] [Updated] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/11/27 04:48:36 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1661) Language based crawling - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/11/27 06:00:36 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1297) it is better for fetchItemQueues to select items from greater queues first - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/11/27 06:02:36 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/27 10:59:38 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1673) Title isn't reset in MoreIndexingFilter - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/11/27 11:17:35 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1647) protocol-http throws unzipBestEffort returned null for some pages - posted by "Luke (JIRA)" <ji...@apache.org> on 2013/11/27 20:39:35 UTC, 1 reply.
- [jira] [Created] (NUTCH-1676) Add rudimentary SSL support to protocol-http - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/28 10:13:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1676) Add rudimentary SSL support to protocol-http - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/11/28 10:15:35 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1647) protocol-http throws unzipBestEffort returned null for some pages - posted by "Luke (JIRA)" <ji...@apache.org> on 2013/11/28 12:55:39 UTC, 0 replies.
- [DISCUSS] Release Trunk - posted by Lewis John Mcgibbney <le...@gmail.com> on 2013/11/28 17:34:27 UTC, 1 reply.
- [jira] [Created] (NUTCH-1677) ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION are not set in Parse HTML - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/11/29 13:35:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1677) ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION are not set in Parse HTML - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/11/29 17:26:36 UTC, 0 replies.