- [jira] [Commented] (NUTCH-1714) Nutch 2.x upgrade to use GORA_94 branch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/01 11:34:16 UTC, 9 replies.
- [WELCOME] Nutch PMC Welcomes Talat Uyarer to PMC and Committer - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/04/01 16:33:44 UTC, 3 replies.
- [jira] [Created] (NUTCH-1744) FTP Issue when entering Passive mode - posted by "Rafael Thomas Goz Coutinho (JIRA)" <ji...@apache.org> on 2014/04/01 18:17:32 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1744) FTP Issue when entering Passive mode - posted by "Rafael Thomas Goz Coutinho (JIRA)" <ji...@apache.org> on 2014/04/01 19:47:16 UTC, 1 replies.
- [jira] [Closed] (NUTCH-1744) FTP Issue when entering Passive mode - posted by "Rafael Thomas Goz Coutinho (JIRA)" <ji...@apache.org> on 2014/04/01 20:09:26 UTC, 0 replies.
- [jira] [Created] (NUTCH-1745) Upgrade to ElasticSearch 1.1.0 - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/02 17:15:45 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1745) Upgrade to ElasticSearch 1.1.0 - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/02 17:15:51 UTC, 0 replies.
- Add Field to crawled content for indexing - posted by Yann Levreau <ya...@gmail.com> on 2014/04/02 17:42:17 UTC, 2 replies.
- Build failed in Jenkins: Nutch-trunk #2587 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/04/03 06:24:40 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1745) Upgrade to ElasticSearch 1.1.0 - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/04/03 12:03:16 UTC, 1 replies.
- [jira] [Created] (NUTCH-1746) OutOfMemory Error in Mappers - posted by "Greg Padiasek (JIRA)" <ji...@apache.org> on 2014/04/03 15:58:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1746) OutOfMemoryError in Mappers - posted by "Greg Padiasek (JIRA)" <ji...@apache.org> on 2014/04/03 16:00:44 UTC, 4 replies.
- [jira] [Resolved] (NUTCH-351) Protocol forward proxy - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/03 17:16:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1739) ExecutorService field in ParseUtil.java not be right use and cause memory leak - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/03 17:22:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-351) Protocol forward proxy - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/03 21:33:18 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2588 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/04/04 08:31:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1486) Upgrade to the latest Solr 4.x - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/04/04 13:07:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1486) Upgrade to the latest Solr 4.x - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/04/04 13:14:19 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x - posted by "Alparslan Avcı (JIRA)" <ji...@apache.org> on 2014/04/04 14:47:15 UTC, 0 replies.
- Url validator rejected url because of 2 dots - posted by Mustafa Sertac Turkel <se...@agmlab.com> on 2014/04/04 14:59:07 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1746) OutOfMemoryError in Mappers - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/04 15:13:15 UTC, 4 replies.
- [jira] [Comment Edited] (NUTCH-1746) OutOfMemoryError in Mappers - posted by "Greg Padiasek (JIRA)" <ji...@apache.org> on 2014/04/04 15:37:16 UTC, 4 replies.
- upgrading protocol-httpclient to httpclient 4.1.1 - posted by d_k <ma...@gmail.com> on 2014/04/04 16:14:07 UTC, 5 replies.
- [jira] [Resolved] (NUTCH-1745) Upgrade to ElasticSearch 1.1.0 - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/04 16:46:16 UTC, 0 replies.
- [jira] [Issue Comment Deleted] (NUTCH-1746) OutOfMemoryError in Mappers - posted by "Greg Padiasek (JIRA)" <ji...@apache.org> on 2014/04/05 05:46:15 UTC, 1 replies.
- [jira] [Created] (NUTCH-1747) Use AtomicInteger as semaphore in Fetcher - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 09:27:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1747) Use AtomicInteger as semaphore in Fetcher - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 09:29:17 UTC, 1 replies.
- [jira] [Assigned] (NUTCH-207) Bandwidth target for fetcher rather than a thread count - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 09:33:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1735) code dedup fetcher queue redirects - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 09:40:14 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1687) Pick queue in Round Robin - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 10:31:16 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1735) code dedup fetcher queue redirects - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/05 19:08:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1182) fetcher should track and shut down hung threads - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/05 21:07:14 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1747) Use AtomicInteger as semaphore in Fetcher - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/05 21:44:14 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1747) Use AtomicInteger as semaphore in Fetcher - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 21:55:14 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-385) Server delay feature conflicts with maxThreadsPerHost - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 22:03:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-490) Extension point with filters for Neko HTML parser (with patch) - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 22:07:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1297) it is better for fetchItemQueues to select items from greater queues first - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 22:15:16 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1278) Fetch Improvement in threads per host - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 22:19:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-410) Faster RegexNormalize with more features - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 22:21:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-827) HTTP POST Authentication - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 22:23:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1342) Read time out protocol-http - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/05 22:25:14 UTC, 0 replies.
- [jira] [Created] (NUTCH-1748) despite unix systems allow "abc..xyz.txt" kind of urls, url validator plugin rejects. - posted by "Sertac TURKEL (JIRA)" <ji...@apache.org> on 2014/04/05 22:36:15 UTC, 0 replies.
- [jira] [Issue Comment Deleted] (NUTCH-1615) Implementing A Feature for Fetching From Websites Dump - posted by "cihad güzel (JIRA)" <ji...@apache.org> on 2014/04/05 23:51:16 UTC, 0 replies.
- [jira] [Created] (NUTCH-1749) Title duplicated in document body - posted by "Greg Padiasek (JIRA)" <ji...@apache.org> on 2014/04/06 06:05:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1749) Title duplicated in document body - posted by "Greg Padiasek (JIRA)" <ji...@apache.org> on 2014/04/06 06:07:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1615) Implementing A Feature for Fetching From Websites Dump - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/06 12:28:15 UTC, 0 replies.
- [jira] [Created] (NUTCH-1750) Improvement of Fetcher's reportStatus - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/06 15:05:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1750) Improvement of Fetcher's reportStatus - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/06 15:09:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1750) Improvement of Fetcher's reportStatus - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/06 15:16:15 UTC, 2 replies.
- [jira] [Commented] (NUTCH-385) Server delay feature conflicts with maxThreadsPerHost - posted by "Chris Schneider (JIRA)" <ji...@apache.org> on 2014/04/07 01:06:14 UTC, 1 replies.
- [jira] [Reopened] (NUTCH-385) Server delay feature conflicts with maxThreadsPerHost - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/07 10:51:15 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2014/04/07 18:07:25 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2596 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/04/09 08:08:32 UTC, 0 replies.
- [jira] [Created] (NUTCH-1751) Empty anchors should not index - posted by "Sertac TURKEL (JIRA)" <ji...@apache.org> on 2014/04/09 08:44:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1751) Empty anchors should not index - posted by "Sertac TURKEL (JIRA)" <ji...@apache.org> on 2014/04/09 08:50:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1748) Despite Unix systems accept files containing two dots.Urlfilter-validator rejects such path names. - posted by "Sertac TURKEL (JIRA)" <ji...@apache.org> on 2014/04/09 08:50:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1750) Improvement of Fetcher's reportStatus - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/09 10:38:15 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1750) Improvement of Fetcher's reportStatus - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/09 10:38:16 UTC, 0 replies.
- [jira] [Created] (NUTCH-1752) cache robots.txt rules per protocol:host:port - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/09 11:01:31 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1752) cache robots.txt rules per protocol:host:port - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/09 11:03:16 UTC, 1 replies.
- Jenkins build is back to normal : Nutch-trunk #2597 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/04/09 12:14:40 UTC, 0 replies.
- [jira] [Created] (NUTCH-1753) Eclipse dependecy problem for 2.x - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/04/09 13:18:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1753) Eclipse dependecy problem for 2.x - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/04/09 13:18:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-710) Support for rel="canonical" attribute - posted by "Sertac TURKEL (JIRA)" <ji...@apache.org> on 2014/04/09 13:26:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1748) Despite Unix systems accept files containing two dots.Urlfilter-validator rejects such path names. - posted by "Alex McLintock (JIRA)" <ji...@apache.org> on 2014/04/09 14:09:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1748) urlfilter-validator to allow .. (two dots) inside file names (path elements) - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/09 15:18:15 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1748) urlfilter-validator to allow .. (two dots) inside file names (path elements) - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/09 15:26:16 UTC, 1 replies.
- [jira] [Commented] (NUTCH-710) Support for rel="canonical" attribute - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/09 16:30:24 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1752) cache robots.txt rules per protocol:host:port - posted by "lufeng (JIRA)" <ji...@apache.org> on 2014/04/09 16:40:17 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1731) Better cmd line parsing for NutchServer - posted by "Fjodor Vershinin (JIRA)" <ji...@apache.org> on 2014/04/09 20:28:15 UTC, 5 replies.
- [jira] [Comment Edited] (NUTCH-1731) Better cmd line parsing for NutchServer - posted by "Fjodor Vershinin (JIRA)" <ji...@apache.org> on 2014/04/09 20:30:17 UTC, 0 replies.
- Creating newbie tag for Nutch Jira - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/04/09 22:35:15 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1751) Empty anchors should not index - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/09 23:11:14 UTC, 1 replies.
- [jira] [Created] (NUTCH-1754) remove BOM from extracted plain text - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/10 00:06:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1733) parse-html to support HTML5 charset definitions - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/10 00:08:19 UTC, 0 replies.
- [jira] [Created] (NUTCH-1755) Project name bug in build.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/10 01:12:16 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1751) Empty anchors should not index - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/10 01:41:16 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1733) parse-html to support HTML5 charset definitions - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/04/10 01:41:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1414) Date extraction parse filter - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/04/10 15:39:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1422) reset signature for redirects - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/04/10 15:41:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1694) Consider removing URL filter attribute warnings. - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/04/10 15:43:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1732) IndexerMapReduce to delete explicitly not indexable documents - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/04/10 15:43:15 UTC, 0 replies.
- Pushing content to Solr from Nutch - posted by Xavier Morera <xa...@familiamorera.com> on 2014/04/10 19:05:01 UTC, 3 replies.
- [ANNOUNCE] crawler-commons 0.4 is released - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/04/11 20:14:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1731) Better cmd line parsing for NutchServer - posted by "Fjodor Vershinin (JIRA)" <ji...@apache.org> on 2014/04/12 22:54:15 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1454) parsing chm failed - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/13 18:27:16 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1454) parsing chm failed - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/13 18:27:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1708) use same id when indexing and deleting redirects - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/04/14 13:43:17 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1731) Better cmd line parsing for NutchServer - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/14 20:35:27 UTC, 0 replies.
- [jira] [Created] (NUTCH-1756) Security layer for NutchServer - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/14 20:41:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1756) Security layer for NutchServer - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/14 20:41:24 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1726) HeadingsFilter does not find nested nodes - posted by "lufeng (JIRA)" <ji...@apache.org> on 2014/04/15 16:58:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1714) Nutch 2.x upgrade to use GORA_94 branch - posted by "Alexander Kingson (JIRA)" <ji...@apache.org> on 2014/04/15 21:00:17 UTC, 6 replies.
- [jira] [Commented] (NUTCH-1676) Add rudimentary SSL support to protocol-http - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 16:22:18 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1720) Duplicate lines in HttpBase.java - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 16:30:16 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1147) WebGraph nodeDumper uses only 1 reducer - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 16:32:21 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1697) SegmentMerger to implement Tool - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 16:38:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1603) ZIP parser complains about truncated PDF file - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 16:40:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 16:42:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1743) parsechecker to show outlinks - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 16:58:16 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1743) parsechecker to show outlinks - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 16:58:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1743) parsechecker to show outlinks - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 17:00:28 UTC, 2 replies.
- [jira] [Issue Comment Deleted] (NUTCH-1743) parsechecker to show outlinks - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 17:08:19 UTC, 0 replies.
- [jira] [Created] (NUTCH-1757) ParserChecker to take custom metadata as input - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 17:14:20 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1757) ParserChecker to take custom metadata as input - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 17:16:19 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1603) ZIP parser complains about truncated PDF file - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/04/16 17:28:19 UTC, 2 replies.
- [jira] [Created] (NUTCH-1758) IndexChecker to send document to IndexWriters - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 17:32:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1758) IndexChecker to send document to IndexWriters - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 17:34:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1758) IndexChecker to send document to IndexWriters - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/16 17:34:19 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1566) bin/nutch to allow whitespace in paths - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/17 00:00:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1603) ZIP parser complains about truncated PDF file - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:18:18 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1473) Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:22:16 UTC, 1 replies.
- [jira] [Closed] (NUTCH-1473) Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:22:17 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1164) Write JUnit tests for protocol-http - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:24:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-992) SolrDedup is broken in 2.x - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:24:17 UTC, 0 replies.
- [jira] [Closed] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:26:15 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:26:15 UTC, 1 replies.
- [jira] [Updated] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:28:15 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:28:15 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-1473) Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:30:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1473) Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:30:19 UTC, 0 replies.
- [jira] [Created] (NUTCH-1759) Upgrade to Crawler Commons 0.4 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:34:20 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1308) Unnecessary truncate content configuration, and logging in parse-zip/ZipParser - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/17 00:40:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1308) Unnecessary truncate content configuration, and logging in parse-zip/ZipParser - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/17 00:48:18 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1605) mime type detector recognizes xlsx as zip file - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/17 00:48:24 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers - posted by "lufeng (JIRA)" <ji...@apache.org> on 2014/04/17 03:59:16 UTC, 0 replies.
- [jira] [Created] (NUTCH-1760) Crawl script fails to find job file if called from outside bin dir - posted by "David Hosking (JIRA)" <ji...@apache.org> on 2014/04/17 10:00:36 UTC, 0 replies.
- [jira] [Created] (NUTCH-1761) Crawl script fails to find job file if not started from inside bin dir - posted by "David Hosking (JIRA)" <ji...@apache.org> on 2014/04/17 10:02:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1761) Crawl script fails to find job file if not started from inside bin dir - posted by "David Hosking (JIRA)" <ji...@apache.org> on 2014/04/17 10:04:14 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1760) Crawl script fails to find job file if called from outside bin dir - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/17 11:28:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1761) Crawl script fails to find job file if not started from inside bin dir - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/17 12:04:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1761) Crawl script fails to find job file if not started from inside bin dir - posted by "David Hosking (JIRA)" <ji...@apache.org> on 2014/04/17 12:45:15 UTC, 2 replies.
- [jira] [Closed] (NUTCH-1761) Crawl script fails to find job file if not started from inside bin dir - posted by "David Hosking (JIRA)" <ji...@apache.org> on 2014/04/17 12:45:16 UTC, 0 replies.
- [jira] [Commented] (NUTCH-207) Bandwidth target for fetcher rather than a thread count - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/17 16:35:15 UTC, 0 replies.
- [jira] [Issue Comment Deleted] (NUTCH-207) Bandwidth target for fetcher rather than a thread count - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/17 17:29:17 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1197) Add statically configured field values to solrindex-mapping.xml - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 10:24:15 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-685) Content-level redirect status lost in ParseSegment - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 10:24:16 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-477) Extend URLFilters to support different filtering chains - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 10:24:17 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a "?" - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 10:24:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-385) Server delay feature conflicts with maxThreadsPerHost - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 10:24:19 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1615) Implementing A Feature for Fetching From Websites Dump - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 10:26:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1511) Metadata in MYSQL updated with 'garbage' - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 10:28:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1079) StringBuffer converted to StringBuilder - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 10:28:18 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1079) StringBuffer converted to StringBuilder - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 10:30:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1625) IndexerMapReduce skips FETCH_NOTMODIFIED - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 10:30:18 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1182) fetcher should track and shut down hung threads - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 10:40:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 11:05:15 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1270) some of Deflate encoded pages not fetched - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 11:05:16 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1410) impact of a map-reduce problem - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/18 11:08:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1416) IndexerMapReduce can index older version of a document instead of latest one - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/20 17:56:14 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1416) IndexerMapReduce can index older version of a document instead of latest one - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/20 18:02:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-926) Redirections from META tag - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/20 18:06:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-926) Redirections from META tag don't get filtered - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/20 18:08:14 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1700) Remove deprecated code in src/plugin/creativecommons/build.xml - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/20 18:30:16 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-1665) Generator to implement Tool - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/20 18:37:19 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1627) Debian package for installing nutch - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/20 18:43:16 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1626) Homebrew formula for installing Nutch in Mac OS X - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/20 18:45:17 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1628) Chocolatey package for Windows users - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/20 18:45:17 UTC, 0 replies.
- [jira] [Created] (NUTCH-1762) project web site's search (provided by lucid) is broken - posted by "Ahmet Arslan (JIRA)" <ji...@apache.org> on 2014/04/23 03:15:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1762) project web site's search (provided by lucid) is broken - posted by "Ahmet Arslan (JIRA)" <ji...@apache.org> on 2014/04/23 03:17:17 UTC, 2 replies.
- [ANNOUNCE] NUTCH-841 Accepted into Google Summer of Code - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/04/23 15:07:31 UTC, 0 replies.
- Debugging Nutch from Windows - posted by Diaa Abdallah <di...@gmail.com> on 2014/04/23 16:57:23 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1700) Remove deprecated code in src/plugin/creativecommons/build.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/23 17:22:15 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1700) Remove deprecated code in src/plugin/creativecommons/build.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/23 17:28:17 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1700) Remove deprecated code in src/plugin/creativecommons/build.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/23 17:28:18 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1762) project web site's search (provided by lucid) is broken - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/23 22:06:16 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1762) project web site's search (provided by lucid) is broken - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/23 22:06:16 UTC, 0 replies.
- Re: [ANNOUNCEMENT] Apache Gora 0.4 Release - posted by Talat Uyarer <ta...@uyarer.com> on 2014/04/24 07:34:42 UTC, 0 replies.
- [jira] [Created] (NUTCH-1763) Improving comments on the Injector Class - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/04/24 14:35:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1763) Improving comments on the Injector Class - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/04/24 14:37:14 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1763) Improving comments on the Injector Class - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/04/24 14:37:15 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1762) project web site's search (provided by lucid) is broken - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/24 20:13:16 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1182) fetcher should track and shut down hung threads - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/25 00:13:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1182) fetcher to log hung threads - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/25 00:15:16 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1182) fetcher to log hung threads - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/25 00:15:16 UTC, 0 replies.
- Contributing Improvements to Classes documentation - posted by Diaa Abdallah <di...@gmail.com> on 2014/04/25 01:01:47 UTC, 5 replies.
- Build failed in Jenkins: Nutch-trunk #2616 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/04/25 06:33:52 UTC, 0 replies.
- Why are web urls not assumed to be http - posted by Diaa Abdallah <di...@gmail.com> on 2014/04/25 11:53:24 UTC, 1 replies.
- [jira] [Created] (NUTCH-1764) readdb arguments check bug - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/04/25 13:44:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1764) readdb arguments check bug - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/04/25 13:46:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1764) readdb arguments check bug - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/04/25 13:46:16 UTC, 1 replies.
- [jira] [Closed] (NUTCH-1762) project web site's search (provided by lucid) is broken - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/25 20:03:24 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1752) cache robots.txt rules per protocol:host:port - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/25 23:39:15 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2617 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/04/26 10:17:45 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-566) Sun's URL class has bug in creation of relative query URLs - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/26 23:36:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-952) fix outlink which started with '?' in html parser - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/26 23:47:15 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-952) fix outlink which started with '?' in html parser - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/26 23:47:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-566) Sun's URL class has bug in creation of relative query URLs - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/26 23:47:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a "?" - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/26 23:53:15 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1764) readdb to show command-line help if no action (-stats, -dump, etc.) given - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/27 00:03:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1764) readdb to show command-line help if no action (-stats, -dump, etc.) given - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/27 00:14:16 UTC, 0 replies.
- [jira] [Created] (NUTCH-1765) SolrClean to remove redirected URLs from Solr - posted by "Iain Lopata (JIRA)" <ji...@apache.org> on 2014/04/27 00:42:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1764) readdb to show command-line help if no action (-stats, -dump, etc.) given - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/04/27 01:00:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-797) URL not properly constructed when link target begins with a "?" - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/27 11:24:15 UTC, 2 replies.
- [jira] [Commented] (NUTCH-797) URL not properly constructed when link target begins with a "?" - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/27 11:24:17 UTC, 2 replies.
- [jira] [Commented] (NUTCH-952) fix outlink which started with '?' in html parser - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/27 11:29:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-566) Sun's URL class has bug in creation of relative query URLs - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/27 11:31:15 UTC, 0 replies.
- [jira] [Created] (NUTCH-1766) Generator to unlock crawldb and remove tempdir if generate job fails - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/04/27 23:55:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1766) Generator to unlock crawldb and remove tempdir if generate job fails - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/04/28 00:04:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1766) Generator to unlock crawldb and remove tempdir if generate job fails - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/04/28 00:04:15 UTC, 0 replies.
- [jira] [Created] (NUTCH-1767) remove special treatment of "params" in relative links - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/28 00:20:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1767) remove special treatment of "params" in relative links - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/28 00:24:14 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1364) Add a counter in Generator for malformed urls - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/04/28 00:28:15 UTC, 1 replies.
- [DISCUSS] Roadmap for 2.3 Release - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/04/28 02:50:16 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1129) Any23 Nutch plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/28 02:54:14 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1422) reset signature for redirects - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/28 10:51:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-207) Bandwidth target for fetcher rather than a thread count - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/28 10:59:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1759) Upgrade to Crawler Commons 0.4 - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/28 13:03:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/28 14:14:17 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Alparslan Avcı (JIRA)" <ji...@apache.org> on 2014/04/28 16:16:16 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #1005 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/04/28 16:20:31 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1759) Upgrade to Crawler Commons 0.4 - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/04/28 16:22:15 UTC, 1 replies.
- Build failed in Jenkins: Nutch-trunk #2621 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/04/28 17:01:52 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-797) URL not properly constructed when link target begins with a "?" - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/28 23:11:16 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-797) URL not properly constructed when link target begins with a "?" - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/04/28 23:11:17 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2622 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/04/29 11:52:54 UTC, 0 replies.
- [jira] [Created] (NUTCH-1768) port NUTCH-1745 to Nutch 2.x (Upgrade to ElasticSearch 1.1.0) - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/30 11:12:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1768) port NUTCH-1745 to Nutch 2.x (Upgrade to ElasticSearch 1.1.0) - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/30 11:14:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1720) Duplicate lines in HttpBase.java - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/30 15:15:16 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1720) Duplicate lines in HttpBase.java - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/04/30 15:15:17 UTC, 1 replies.