- Re: Fwd: Maven configuration - posted by Sebastian Nagel <wa...@googlemail.com> on 2017/11/02 11:06:22 UTC, 2 replies.
- [jira] [Updated] (NUTCH-2368) Variable generate.max.count and fetcher.server.delay - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2017/11/03 14:12:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2368) Variable generate.max.count and fetcher.server.delay - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/03 14:14:00 UTC, 6 replies.
- [jira] [Created] (NUTCH-2454) REST API fix for usage of hostdb in generator - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2017/11/03 14:20:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2454) REST API fix for usage of hostdb in generator - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2017/11/03 14:22:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2442) Injector to stop if job fails to avoid loss of CrawlDb - posted by "Omkar Reddy (JIRA)" <ji...@apache.org> on 2017/11/03 19:59:03 UTC, 10 replies.
- [jira] [Resolved] (NUTCH-2383) Wrong FS exception in Fetcher - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/04 16:43:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2242) lastModified not always set - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/04 17:12:00 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2450) Remove FixMe in ParseOutputFormat - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/05 19:46:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2451) MalformedURLExceptions on perfectly looking URLs? - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/05 20:44:01 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2452) Problem retrieving encoded URLs via FTP? - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/05 20:47:00 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2453) FTP protocol seems to have issues running multithreaded - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/05 20:49:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2033) parse-tika skips valid documents. - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/05 21:03:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2443) Extract links from the video tag with the parse-html plugin - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/05 21:04:00 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2443) Extract links from the video tag with the parse-html plugin - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/05 21:05:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2452) Problem retrieving encoded URLs via FTP? - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/05 21:25:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2431) Filterchecker to implement Tool-interface - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/06 13:21:00 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2422) Update information about git repository - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/06 13:52:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2420) Bug in variable generate.max.count and fetcher.server.delay - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/06 15:06:00 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-2420) Bug in variable generate.max.count and fetcher.server.delay - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/06 16:10:01 UTC, 0 replies.
- [jira] [Created] (NUTCH-2455) Speed up the merging of HostDb entries for variable fetch delay - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/06 16:11:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2455) Speed up the merging of HostDb entries for variable fetch delay - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/06 16:11:00 UTC, 1 replies.
- quick start for dev Nutch in Intellij? - posted by "Allison, Timothy B." <ta...@mitre.org> on 2017/11/06 17:30:44 UTC, 2 replies.
- [jira] [Created] (NUTCH-2456) Redirected documents are not indexed - posted by "Yossi Tamari (JIRA)" <ji...@apache.org> on 2017/11/06 17:54:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2456) Redirected documents are not indexed - posted by "Yossi Tamari (JIRA)" <ji...@apache.org> on 2017/11/06 18:09:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2442) Injector to stop if job fails to avoid loss of CrawlDb - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/06 21:44:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/11/06 21:55:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2422) Update information about git repository - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/06 22:11:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2422) Update information about git repository - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/06 22:12:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2298) TestCrawlDbStates.testCrawlDbStatTransitionInject broken - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/06 22:13:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2076) exceptions are not handled when using method waitForCompletion in a try block - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/06 22:16:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2040) Upgrade to Crawler Commons 0.6 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/06 22:24:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2040) Upgrade to Crawler Commons 0.6 - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/06 22:25:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2040) Upgrade to recent version of Crawler-Commons - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/06 22:26:00 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2040) Upgrade to recent version of Crawler-Commons - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/06 22:26:00 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-2451) MalformedURLExceptions on perfectly looking URLs? - posted by "Hiran Chaudhuri (JIRA)" <ji...@apache.org> on 2017/11/07 11:42:00 UTC, 5 replies.
- [jira] [Commented] (NUTCH-2456) Redirected documents are not indexed - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/07 15:56:00 UTC, 3 replies.
- [jira] [Comment Edited] (NUTCH-2456) Redirected documents are not indexed - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/07 15:57:00 UTC, 1 replies.
- [jira] [Updated] (NUTCH-2456) Allow to index pages/URLs not contained in CrawlDb - posted by "Yossi Tamari (JIRA)" <ji...@apache.org> on 2017/11/07 17:37:02 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2456) Allow to index pages/URLs not contained in CrawlDb - posted by "Yossi Tamari (JIRA)" <ji...@apache.org> on 2017/11/07 17:41:02 UTC, 14 replies.
- [jira] [Commented] (NUTCH-2317) Plugin jars don't get added to classpath while running in local - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/08 08:52:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers. - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/08 14:24:00 UTC, 4 replies.
- [jira] [Commented] (NUTCH-2383) Wrong FS exception in Fetcher - posted by "Omkar Reddy (JIRA)" <ji...@apache.org> on 2017/11/08 14:27:00 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2375) Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/08 20:05:00 UTC, 2 replies.
- [jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/08 21:04:00 UTC, 0 replies.
- [Nutch Wiki] Update of "NutchHadoopSingleNodeTutorial" by OmkarReddy - posted by Apache Wiki <wi...@apache.org> on 2017/11/09 12:56:18 UTC, 0 replies.
- Request for patches review - posted by Semyon Semyonov <se...@mail.com> on 2017/11/09 16:34:52 UTC, 1 replies.
- [jira] [Created] (NUTCH-2458) TikaParser doesn't work with tika-config.xml set - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/09 16:48:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2458) TikaParser doesn't work with tika-config.xml set - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/09 16:49:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2458) TikaParser doesn't work with tika-config.xml set - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/09 16:58:00 UTC, 2 replies.
- [jira] [Created] (NUTCH-2459) Nutch cannot download/parse some files - posted by "Hiran Chaudhuri (JIRA)" <ji...@apache.org> on 2017/11/10 00:01:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2459) Nutch cannot download/parse some files via FTP - posted by "Hiran Chaudhuri (JIRA)" <ji...@apache.org> on 2017/11/10 00:02:00 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-2458) TikaParser doesn't work with tika-config.xml set - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/10 09:59:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2455) Speed up the merging of HostDb entries for variable fetch delay - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/10 16:54:00 UTC, 4 replies.
- [jira] [Created] (NUTCH-2460) use the headless option of firefox and chrome in protocol-selenium - posted by "hussein Al_Ahmad (JIRA)" <ji...@apache.org> on 2017/11/11 12:04:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2460) use the headless option of firefox and chrome in protocol-selenium - posted by "hussein Al_Ahmad (JIRA)" <ji...@apache.org> on 2017/11/11 13:15:00 UTC, 2 replies.
- [jira] [Closed] (NUTCH-2452) Problem retrieving encoded URLs via FTP? - posted by "Hiran Chaudhuri (JIRA)" <ji...@apache.org> on 2017/11/11 23:28:00 UTC, 0 replies.
- [Nutch Wiki] Update of "Release_HOWTO" by SebastianNagel - posted by Apache Wiki <wi...@apache.org> on 2017/11/13 08:27:02 UTC, 0 replies.
- [jira] [Created] (NUTCH-2461) Generate pass the data to when maxCount == 0 - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2017/11/14 13:41:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2461) Generate passes the data to when maxCount == 0 - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2017/11/14 13:42:00 UTC, 0 replies.
- [jira] [Issue Comment Deleted] (NUTCH-2368) Variable generate.max.count and fetcher.server.delay - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2017/11/14 13:43:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2439) Upgrade to Apache Tika 1.17 - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/15 14:36:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2462) Cleanup Tika Boilerpipe patch - posted by "Jorge Luis Betancourt Gonzalez (JIRA)" <ji...@apache.org> on 2017/11/15 15:45:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/11/16 16:16:00 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/11/16 16:23:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2463) Enable sampling CrawlDB - posted by "Yossi Tamari (JIRA)" <ji...@apache.org> on 2017/11/20 14:35:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2463) Enable sampling CrawlDB - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/20 14:45:00 UTC, 4 replies.
- [jira] [Commented] (NUTCH-1129) Any23 Nutch plugin - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/20 17:03:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2464) Headers That Contain HTML Elements Are Not Parsed - posted by "Cass Pallansch (JIRA)" <ji...@apache.org> on 2017/11/20 18:44:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2464) Headers That Contain HTML Elements Are Not Parsed - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/21 22:02:00 UTC, 7 replies.
- [jira] [Updated] (NUTCH-2464) Headers That Contain HTML Elements Are Not Parsed - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/21 22:03:00 UTC, 1 replies.
- [jira] [Comment Edited] (NUTCH-2464) Headers That Contain HTML Elements Are Not Parsed - posted by "Cass Pallansch (JIRA)" <ji...@apache.org> on 2017/11/22 12:33:00 UTC, 0 replies.
- Build path errors(Eclipse) in the latest nutch develop - posted by Semyon Semyonov <se...@mail.com> on 2017/11/23 13:36:00 UTC, 3 replies.
- [jira] [Commented] (NUTCH-2460) use the headless option of firefox and chrome in protocol-selenium - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/23 16:02:14 UTC, 0 replies.
- [jira] [Created] (NUTCH-2465) Broken Eclipse project. Classpaths and interactiveselenium should be fixed. - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2017/11/27 10:53:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2465) Broken Eclipse project. Classpaths and interactiveselenium should be fixed. - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2017/11/27 14:31:00 UTC, 7 replies.
- [jira] [Updated] (NUTCH-2463) Enable sampling CrawlDB - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/28 10:49:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2463) Enable sampling CrawlDB - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/28 10:50:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-2466) Sitemap processor to follow redirects - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/28 12:19:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2466) Sitemap processor to follow redirects - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/28 12:21:00 UTC, 1 replies.
- [jira] [Created] (NUTCH-2467) Sitemap type field can be null - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/28 13:12:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2467) Sitemap type field can be null - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2017/11/28 13:50:00 UTC, 1 replies.
- [jira] [Commented] (NUTCH-2454) REST API fix for usage of hostdb in generator - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/29 13:21:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2461) Generate passes the data to when maxCount == 0 - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/29 13:51:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2441) ARG_SEGMENT usage - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/29 14:46:01 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2467) Sitemap type field can be null - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/29 18:46:00 UTC, 0 replies.
- [jira] [Issue Comment Deleted] (NUTCH-2465) Broken Eclipse project. Classpaths and interactiveselenium should be fixed. - posted by "Semyon Semyonov (JIRA)" <ji...@apache.org> on 2017/11/30 13:23:00 UTC, 0 replies.
- Test org.apache.nutch.net.TestURLNormalizers FAILED in the latest master - posted by Semyon Semyonov <se...@mail.com> on 2017/11/30 13:30:32 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-2465) Broken Eclipse project. Classpaths and interactiveselenium should be fixed. - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/30 18:58:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-2464) Headers That Contain HTML Elements Are Not Parsed - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/30 19:21:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-2464) Plugin headings: Headers That Contain HTML Elements Are Not Parsed - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/30 19:22:01 UTC, 0 replies.
- [jira] [Created] (NUTCH-2468) should filter out invalid URLs by default - posted by "Michael Coffey (JIRA)" <ji...@apache.org> on 2017/11/30 19:39:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2468) should filter out invalid URLs by default - posted by "Michael Coffey (JIRA)" <ji...@apache.org> on 2017/11/30 19:47:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2464) Plugin headings: Headers That Contain HTML Elements Are Not Parsed - posted by "Hudson (JIRA)" <ji...@apache.org> on 2017/11/30 19:53:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-2395) Cannot run job worker! - error while running multiple crawling jobs in parallel - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/11/30 20:05:00 UTC, 2 replies.