- Build failed in Hudson: Nutch-Nightly #312 - posted by hu...@lucene.zones.apache.org on 2008/01/01 05:19:03 UTC, 0 replies.
- Build failed in Hudson: Nutch-Nightly #313 - posted by hu...@lucene.zones.apache.org on 2008/01/02 05:24:53 UTC, 0 replies.
- [jira] Created: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2008/01/02 09:54:34 UTC, 0 replies.
- [jira] Updated: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2008/01/02 09:58:34 UTC, 1 reply.
- Build failed in Hudson: Nutch-Nightly #314 - posted by hu...@lucene.zones.apache.org on 2008/01/02 17:05:33 UTC, 0 replies.
- Hudson build is back to normal: Nutch-Nightly #315 - posted by hu...@lucene.zones.apache.org on 2008/01/02 20:38:38 UTC, 0 replies.
- Student contributions - posted by Frank McCown <fm...@harding.edu> on 2008/01/02 23:44:52 UTC, 3 replies.
- Build failed in Hudson: Nutch-Nightly #316 - posted by hu...@lucene.zones.apache.org on 2008/01/03 05:42:56 UTC, 0 replies.
- Hudson build is back to normal: Nutch-Nightly #317 - posted by hu...@lucene.zones.apache.org on 2008/01/04 06:44:23 UTC, 0 replies.
- [jira] Commented: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2008/01/04 08:31:34 UTC, 1 reply.
- [jira] Commented: (NUTCH-580) Remove deprecated hadoop api calls (FS) - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2008/01/04 08:37:33 UTC, 2 replies.
- [jira] Commented: (NUTCH-531) Pages with no ContentType cause a Null Pointer exception - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2008/01/04 08:57:34 UTC, 1 reply.
- [jira] Issue Comment Edited: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2008/01/04 10:55:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-596) ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2008/01/04 12:04:34 UTC, 1 reply.
- [jira] Resolved: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/01/04 20:51:34 UTC, 0 replies.
- [jira] Closed: (NUTCH-481) http.content.limit is broken in the protocol-httpclient plugin - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/01/04 20:51:35 UTC, 0 replies.
- [jira] Closed: (NUTCH-560) protocol-httpclient reading more bytes than http.content.limit - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/01/04 20:53:34 UTC, 0 replies.
- [jira] Closed: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/01/04 20:53:34 UTC, 0 replies.
- [jira] Closed: (NUTCH-561) HttpClient plugin does not work with NTLM authentication - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/01/04 20:53:34 UTC, 0 replies.
- [jira] Closed: (NUTCH-539) HttpClient plugin does not work with BasicAuthentication - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/01/04 20:53:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/01/04 20:59:34 UTC, 1 reply.
- [jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup. - posted by "Dawid Weiss (JIRA)" <ji...@apache.org> on 2008/01/05 18:38:33 UTC, 1 reply.
- Build failed in Hudson: Nutch-Nightly #319 - posted by hu...@lucene.zones.apache.org on 2008/01/06 05:27:39 UTC, 0 replies.
- Build failed in Hudson: Nutch-Nightly #320 - posted by hu...@lucene.zones.apache.org on 2008/01/07 05:29:01 UTC, 0 replies.
- Tika 0.1-incubating released - posted by Chris Mattmann <ch...@jpl.nasa.gov> on 2008/01/07 19:00:29 UTC, 0 replies.
- [jira] Created: (NUTCH-599) nutch crawl and index problem - posted by "sudarat (JIRA)" <ji...@apache.org> on 2008/01/08 02:46:38 UTC, 2 replies.
- Build failed in Hudson: Nutch-Nightly #321 - posted by hu...@lucene.zones.apache.org on 2008/01/08 05:46:11 UTC, 0 replies.
- [jira] Closed: (NUTCH-599) nutch crawl and index problem - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/01/08 08:44:34 UTC, 0 replies.
- Problems with Hadhoop Log4J on Nutch 0.8.1 - posted by Jesiel Trevisan <je...@gmail.com> on 2008/01/08 19:01:14 UTC, 0 replies.
- [jira] Created: (NUTCH-600) Nutch index problem - posted by "sudarat (JIRA)" <ji...@apache.org> on 2008/01/09 05:54:33 UTC, 0 replies.
- Build failed in Hudson: Nutch-Nightly #322 - posted by hu...@lucene.zones.apache.org on 2008/01/09 06:31:57 UTC, 0 replies.
- Hudson build is back to normal: Nutch-Nightly #323 - posted by hu...@lucene.zones.apache.org on 2008/01/09 21:40:49 UTC, 0 replies.
- nutch and future - posted by "tigger ." <b1...@hotmail.com> on 2008/01/10 17:34:51 UTC, 1 reply.
- [jira] Commented: (NUTCH-534) SegmentMerger: add -normalize option - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2008/01/11 13:02:34 UTC, 1 reply.
- [jira] Commented: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format - posted by "Emmanuel Joke (JIRA)" <ji...@apache.org> on 2008/01/11 13:04:34 UTC, 0 replies.
- [jira] Closed: (NUTCH-600) Nutch index problem - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/01/11 19:03:35 UTC, 0 replies.
- setting number of reduce outputs problem - posted by viz <vi...@gmail.com> on 2008/01/12 01:05:11 UTC, 1 reply.
- Plugins? - posted by Bryan Bishop <ka...@gmail.com> on 2008/01/12 02:37:28 UTC, 1 reply.
- [jira] Closed: (NUTCH-534) SegmentMerger: add -normalize option - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/15 18:55:36 UTC, 0 replies.
- [jira] Resolved: (NUTCH-534) SegmentMerger: add -normalize option - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/15 18:55:36 UTC, 0 replies.
- [jira] Commented: (NUTCH-368) Message queueing system - posted by "Chris Chiappone (JIRA)" <ji...@apache.org> on 2008/01/15 22:09:38 UTC, 1 reply.
- [jira] Resolved: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/15 23:03:34 UTC, 0 replies.
- [jira] Closed: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/15 23:05:39 UTC, 0 replies.
- [jira] Resolved: (NUTCH-597) Fetcher2 - java.lang.NullPointerException when host does not exist and fetcher.threads.per.host.by.ip is set to true causes threads to finish. - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/15 23:39:34 UTC, 0 replies.
- [jira] Closed: (NUTCH-597) Fetcher2 - java.lang.NullPointerException when host does not exist and fetcher.threads.per.host.by.ip is set to true causes threads to finish. - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/15 23:41:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-594) Serve Nutch search results in XML and JSON - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/15 23:49:39 UTC, 0 replies.
- [jira] Commented: (NUTCH-592) Fetcher2 : NPE for page with status ProtocolStatus.TEMP_MOVED - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/16 00:01:41 UTC, 0 replies.
- [jira] Commented: (NUTCH-590) Index multiple docs per call using IndexingFilter extension point - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/16 00:11:35 UTC, 0 replies.
- [jira] Commented: (NUTCH-584) urls missing from fetchlist - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/16 02:09:34 UTC, 2 replies.
- [jira] Updated: (NUTCH-584) urls missing from fetchlist - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/16 02:11:34 UTC, 0 replies.
- Serious bug in Generator / FreeGenerator - posted by Andrzej Bialecki <ab...@getopt.org> on 2008/01/16 02:15:33 UTC, 0 replies.
- [jira] Commented: (NUTCH-363) Fetcher normalizes everything at least twice - posted by "iwan cornelius (JIRA)" <ji...@apache.org> on 2008/01/16 07:57:35 UTC, 1 reply.
- [jira] Commented: (NUTCH-597) Fetcher2 - java.lang.NullPointerException when host does not exist and fetcher.threads.per.host.by.ip is set to true causes threads to finish. - posted by "Hudson (JIRA)" <ji...@apache.org> on 2008/01/16 09:15:42 UTC, 0 replies.
- [jira] Resolved: (NUTCH-584) urls missing from fetchlist - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/16 17:54:35 UTC, 0 replies.
- [jira] Closed: (NUTCH-584) urls missing from fetchlist - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/16 17:54:35 UTC, 0 replies.
- Help: parsing pdf files - posted by Krishnamohan Meduri <Kr...@Sun.COM> on 2008/01/16 21:31:17 UTC, 1 reply.
- Need pointers regarding accessing crawled data/customizing policy for crawl. - posted by Manoj Bist <ma...@gmail.com> on 2008/01/17 08:32:47 UTC, 1 reply.
- Build failed in Hudson: Nutch-Nightly #331 - posted by hu...@lucene.zones.apache.org on 2008/01/17 17:34:07 UTC, 0 replies.
- [jira] Updated: (NUTCH-570) Improvement of URL Ordering in Generator.java - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/17 21:20:34 UTC, 0 replies.
- [jira] Resolved: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/17 21:28:34 UTC, 0 replies.
- [jira] Resolved: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/17 21:28:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/17 21:28:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/17 21:28:35 UTC, 0 replies.
- [jira] Closed: (NUTCH-159) Specify temp/working directory for crawl - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/17 21:30:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-159) Specify temp/working directory for crawl - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/17 21:30:34 UTC, 0 replies.
- [jira] Closed: (NUTCH-95) DeleteDuplicates depends on the order of input segments - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/17 21:32:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-95) DeleteDuplicates depends on the order of input segments - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/17 21:32:34 UTC, 0 replies.
- End-Of-Life status for 0.7.x? - posted by Andrzej Bialecki <ab...@getopt.org> on 2008/01/17 21:38:32 UTC, 7 replies.
- New Developer - posted by Ahmad Dahlan <a_...@yahoo.com> on 2008/01/18 02:53:56 UTC, 0 replies.
- Hudson build is back to normal: Nutch-Nightly #332 - posted by hu...@lucene.zones.apache.org on 2008/01/18 07:00:01 UTC, 0 replies.
- NOTICE: End Of Life status for Nutch 0.7.x - posted by Andrzej Bialecki <ab...@getopt.org> on 2008/01/18 10:52:41 UTC, 0 replies.
- [jira] Resolved: (NUTCH-580) Remove deprecated hadoop api calls (FS) - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2008/01/19 10:01:08 UTC, 0 replies.
- Build failed in Hudson: Nutch-trunk #333 - posted by Hudson Apache Zone <hu...@hudson.zones.apache.org> on 2008/01/19 10:08:59 UTC, 0 replies.
- [jira] Commented: (NUTCH-595) "Target file:/.... already exists" - posted by "armand rayman (JIRA)" <ji...@apache.org> on 2008/01/20 06:14:36 UTC, 0 replies.
- Build failed in Hudson: Nutch-trunk #334 - posted by Hudson Apache Zone <hu...@hudson.zones.apache.org> on 2008/01/20 10:08:01 UTC, 0 replies.
- Crawl taking too much time - posted by ki...@wipro.com on 2008/01/21 06:57:08 UTC, 1 reply.
- Hudson build is back to normal: Nutch-trunk #335 - posted by Hudson Apache Zone <hu...@hudson.zones.apache.org> on 2008/01/21 10:15:26 UTC, 0 replies.
- [jira] Closed: (NUTCH-12) WebDBReader options to print incoming links - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:10:37 UTC, 0 replies.
- [jira] Commented: (NUTCH-12) WebDBReader options to print incoming links - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:10:38 UTC, 0 replies.
- [jira] Closed: (NUTCH-175) No input directories specified in: while crawing in nightly build from the 14.1.2006: sh ./nutch crawl urllist.txt -dir tmpdir - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:12:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-226) CrawlDb Filter tool - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:14:34 UTC, 0 replies.
- [jira] Closed: (NUTCH-226) CrawlDb Filter tool - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:14:34 UTC, 0 replies.
- [jira] Closed: (NUTCH-59) meta data support in webdb - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:18:35 UTC, 0 replies.
- [jira] Commented: (NUTCH-59) meta data support in webdb - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:18:36 UTC, 0 replies.
- [jira] Closed: (NUTCH-115) jobtracker.jsp shows too much information - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:22:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-128) second configuration nodes overwrites first node - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:22:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-115) jobtracker.jsp shows too much information - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:22:34 UTC, 0 replies.
- [jira] Closed: (NUTCH-128) second configuration nodes overwrites first node - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:22:34 UTC, 0 replies.
- [jira] Closed: (NUTCH-163) LogFormatter design - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:26:43 UTC, 0 replies.
- [jira] Commented: (NUTCH-163) LogFormatter design - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:26:49 UTC, 0 replies.
- [jira] Closed: (NUTCH-252) Launching a segread/readdb command kills any running nutch commands - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:30:47 UTC, 0 replies.
- [jira] Commented: (NUTCH-252) Launching a segread/readdb command kills any running nutch commands - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:31:04 UTC, 0 replies.
- [jira] Closed: (NUTCH-438) Add -noAdditions to updatedb - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:38:34 UTC, 0 replies.
- [jira] Commented: (NUTCH-440) Command line utilities should exit with an error message when given wrong arguments - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:38:35 UTC, 0 replies.
- [jira] Closed: (NUTCH-440) Command line utilities should exit with an error message when given wrong arguments - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:38:35 UTC, 0 replies.
- [jira] Commented: (NUTCH-438) Add -noAdditions to updatedb - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:38:35 UTC, 0 replies.
- [jira] Closed: (NUTCH-368) Message queueing system - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/01/22 15:50:36 UTC, 0 replies.
- Build failed in Hudson: Nutch-Nightly #336 - posted by hu...@lucene.zones.apache.org on 2008/01/23 09:21:35 UTC, 0 replies.
- [ANNOUNCE] New Build Server - posted by Nigel Daley <nd...@yahoo-inc.com> on 2008/01/24 00:42:18 UTC, 0 replies.
- URLs can not be fetched when run nutch0.9 in Eclipse3.2 - posted by Shi Wang <wa...@gmail.com> on 2008/01/24 10:40:27 UTC, 0 replies.
- Build failed in Hudson: Nutch-trunk #340 - posted by Apache Hudson Server <hu...@hudson.zones.apache.org> on 2008/01/26 11:03:18 UTC, 0 replies.
- Hudson build is back to normal: Nutch-trunk #341 - posted by Apache Hudson Server <hu...@hudson.zones.apache.org> on 2008/01/26 21:14:50 UTC, 0 replies.
- [jira] Updated: (NUTCH-587) Upgrade Nutch to use Hadoop 0.15.3 release - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2008/01/26 23:08:34 UTC, 1 reply.
- Help needed? - posted by showWayer <ec...@Hotmail.com> on 2008/01/27 10:08:20 UTC, 0 replies.
- Build failed in Hudson: Nutch-trunk #343 - posted by Apache Hudson Server <hu...@hudson.zones.apache.org> on 2008/01/28 07:30:59 UTC, 0 replies.
- Build failed in Hudson: Nutch-trunk #344 - posted by Apache Hudson Server <hu...@hudson.zones.apache.org> on 2008/01/28 09:10:05 UTC, 0 replies.
- [jira] Resolved: (NUTCH-587) Upgrade Nutch to use Hadoop 0.15.3 release - posted by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2008/01/28 23:39:34 UTC, 0 replies.
- Hudson build is back to normal: Nutch-trunk #345 - posted by Apache Hudson Server <hu...@hudson.zones.apache.org> on 2008/01/29 05:35:16 UTC, 0 replies.
- read crawldb. - posted by nadav hashimshony <na...@gmail.com> on 2008/01/29 17:28:48 UTC, 8 replies.
- Reg: Nutch Admin GUI - posted by Prafulla <pr...@gmail.com> on 2008/01/30 05:44:58 UTC, 1 reply.
- cache page return http 500 in 1.0-dev (rev 616745) - posted by Vinci <vi...@polyu.edu.hk> on 2008/01/31 14:34:03 UTC, 2 replies.