- Re: Patch Available status? - posted by Stefan Groschupf <sg...@101tec.com> on 2006/09/01 00:06:54 UTC, 3 replies.
- [jira] Updated: (NUTCH-358) Language Switching - posted by "David Podunavac (JIRA)" <ji...@apache.org> on 2006/09/01 09:49:23 UTC, 0 replies.
- [jira] Updated: (NUTCH-358) Language Switching PROBLEM FIXED - posted by "David Podunavac (JIRA)" <ji...@apache.org> on 2006/09/01 09:51:23 UTC, 0 replies.
- [jira] Created: (NUTCH-360) Switch nutch to use java 5 source format - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/01 17:19:22 UTC, 0 replies.
- LuceneQueryOptimizer and no query - posted by daniel rosher <da...@hotonline.com> on 2006/09/01 17:22:22 UTC, 0 replies.
- [jira] Resolved: (NUTCH-360) Switch nutch to use java 5 source format - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/02 07:15:23 UTC, 0 replies.
- [jira] Created: (NUTCH-361) generator create fetchlist randomly - posted by "Uros Gruber (JIRA)" <ji...@apache.org> on 2006/09/02 20:15:22 UTC, 0 replies.
- [jira] Commented: (NUTCH-361) generator create fetchlist randomly - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/03 06:39:23 UTC, 16 replies.
- limitation - posted by an...@orbita1.ru on 2006/09/04 10:53:10 UTC, 0 replies.
- [jira] Updated: (NUTCH-361) generator create fetchlist randomly - posted by "Uros Gruber (JIRA)" <ji...@apache.org> on 2006/09/04 11:33:23 UTC, 0 replies.
- several url to search for [multiple url] - posted by David Podunavac <da...@wyona.com> on 2006/09/04 15:43:53 UTC, 0 replies.
- Re: Should URL normalization iterate? - posted by Neal Richter <nr...@gmail.com> on 2006/09/04 20:51:46 UTC, 0 replies.
- [jira] Commented: (NUTCH-249) black- white list url filtering - posted by "Uros Gruber (JIRA)" <ji...@apache.org> on 2006/09/05 10:11:23 UTC, 2 replies.
- problem with hadoop - posted by Richard Braman <rb...@bramantax.com> on 2006/09/06 01:11:19 UTC, 2 replies.
- Nutch nightly build failure - posted by nu...@lucene.apache.org on 2006/09/06 02:22:16 UTC, 1 replies.
- [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb - posted by "Richard Braman (JIRA)" <ji...@apache.org> on 2006/09/06 03:13:25 UTC, 2 replies.
- Why "nutch plugin" says the plugin is "not present or inactive"? - posted by Teruhiko Kurosaka <Ku...@basistech.com> on 2006/09/06 05:39:06 UTC, 0 replies.
- indexing problem - posted by an...@orbita1.ru on 2006/09/06 10:31:38 UTC, 2 replies.
- Content-type detection for Tika - posted by Jukka Zitting <ju...@gmail.com> on 2006/09/06 11:36:33 UTC, 1 replies.
- 0.8.1 - posted by Sami Siren <ss...@gmail.com> on 2006/09/06 17:49:29 UTC, 4 replies.
- [Fwd: Re: get CrawlDatum] - posted by Uroš Gruber <ur...@sir-mag.com> on 2006/09/06 19:43:05 UTC, 2 replies.
- [jira] Updated: (NUTCH-105) Network error during robots.txt fetch causes file to be ignored - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/07 16:27:35 UTC, 0 replies.
- log error in deploying nutch-0.9-dev.jar - posted by AJ Chen <ca...@gmail.com> on 2006/09/07 18:16:01 UTC, 1 replies.
- [jira] Commented: (NUTCH-208) http: proxy exception list: - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/07 19:29:23 UTC, 0 replies.
- [jira] Created: (NUTCH-362) Remove parse-text from unsupported filetypes in parse-plugins.xml - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/07 20:02:23 UTC, 0 replies.
- [jira] Commented: (NUTCH-273) When a page is redirected, the original url is NOT updated. - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/07 20:14:25 UTC, 0 replies.
- [jira] Commented: (NUTCH-339) Refactor nutch to allow fetcher improvements - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/07 20:16:24 UTC, 2 replies.
- [jira] Commented: (NUTCH-359) extraction of links will fail for whole page if one single link cannot be parsed - posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2006/09/08 06:56:23 UTC, 0 replies.
- Re: HTTP/1.1 problem - posted by og...@yahoo.com on 2006/09/08 07:19:08 UTC, 0 replies.
- Re: Ontology compile bug - posted by og...@yahoo.com on 2006/09/08 07:25:00 UTC, 2 replies.
- [jira] Updated: (NUTCH-339) Refactor nutch to allow fetcher improvements - posted by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2006/09/08 10:17:28 UTC, 0 replies.
- [jira] Created: (NUTCH-363) Fetcher normalizes everything at least twice - posted by "Doug Cook (JIRA)" <ji...@apache.org> on 2006/09/08 20:47:22 UTC, 0 replies.
- [jira] Created: (NUTCH-364) Javascript parser creates some fairly bogus URLs - posted by "Doug Cook (JIRA)" <ji...@apache.org> on 2006/09/09 02:23:49 UTC, 0 replies.
- [jira] Created: (NUTCH-365) Flexible URL normalization - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/09 15:22:22 UTC, 0 replies.
- [jira] Assigned: (NUTCH-365) Flexible URL normalization - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/09 15:22:23 UTC, 0 replies.
- [jira] Updated: (NUTCH-365) Flexible URL normalization - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/09 15:24:23 UTC, 0 replies.
- [jira] Commented: (NUTCH-365) Flexible URL normalization - posted by "Doug Cook (JIRA)" <ji...@apache.org> on 2006/09/09 17:44:26 UTC, 5 replies.
- How could I test my modify to NutchAnalysis.jj? - posted by heack <ko...@gmail.com> on 2006/09/10 09:49:51 UTC, 2 replies.
- Help: DistributedSearch thown ClassCastException - posted by emanihc <em...@gmail.com> on 2006/09/10 17:15:27 UTC, 0 replies.
- I use eclipse to run NutchAnalysis.java, but it meet QueryFilter RunTime error - posted by heack <ko...@gmail.com> on 2006/09/10 17:19:34 UTC, 0 replies.
- [jira] Created: (NUTCH-366) Merge URLFilters and URLNormalizers - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/12 16:29:22 UTC, 1 replies.
- [jira] Created: (NUTCH-367) DistributedSearch thown ClassCastException - posted by "emanihc (JIRA)" <ji...@apache.org> on 2006/09/12 16:59:22 UTC, 0 replies.
- File system watching for intranets - posted by Ben Ogle <og...@gmail.com> on 2006/09/12 20:04:13 UTC, 2 replies.
- I modify NutchAnalysis.jj and NutchDocumentTokenizer.java to let nutch support chinese word. - posted by heack <ko...@gmail.com> on 2006/09/13 16:02:50 UTC, 0 replies.
- Re: Any plans to move to build Nutchusing Maven? - posted by og...@yahoo.com on 2006/09/13 23:14:47 UTC, 0 replies.
- ask a problem about nutch (from China) - posted by yin chunhui <hu...@hotmail.com> on 2006/09/15 05:14:45 UTC, 2 replies.
- [jira] Commented: (NUTCH-353) pages that serverside forwards will be refetched every time - posted by "King Kong (JIRA)" <ji...@apache.org> on 2006/09/15 06:18:23 UTC, 1 replies.
- [jira] Created: (NUTCH-368) Message queueing system - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/15 22:37:24 UTC, 0 replies.
- [jira] Updated: (NUTCH-368) Message queueing system - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/15 22:39:23 UTC, 1 replies.
- A Problem about Nutch Plugin - posted by yin chunhui <hu...@hotmail.com> on 2006/09/18 11:53:07 UTC, 0 replies.
- [jira] Created: (NUTCH-369) StringUtil.resolveEncodingAlias is unuseful. - posted by "King Kong (JIRA)" <ji...@apache.org> on 2006/09/18 12:24:26 UTC, 0 replies.
- Time of Reading Local Files - posted by Jane Zhen <zh...@hotmail.com> on 2006/09/18 14:55:07 UTC, 0 replies.
- Speed of reading local files - posted by Zhen Zhen <zh...@cs.dal.ca> on 2006/09/18 19:07:54 UTC, 0 replies.
- Empty "incoming anchor text" - posted by Zhen Zhen <zh...@cs.dal.ca> on 2006/09/18 19:13:38 UTC, 0 replies.
- [jira] Commented: (NUTCH-368) Message queueing system - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/09/18 19:32:23 UTC, 4 replies.
- Which tutorial to use for getting Nutch 9.12 up and running on a single machine? - posted by Jp Mutch <jp...@yahoo.com> on 2006/09/18 19:48:47 UTC, 2 replies.
- I cann't fetch wml page - posted by yin chunhui <hu...@hotmail.com> on 2006/09/19 07:49:04 UTC, 0 replies.
- [jira] Resolved: (NUTCH-105) Network error during robots.txt fetch causes file to be ignored - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/19 18:10:24 UTC, 0 replies.
- [jira] Commented: (NUTCH-364) Javascript parser creates some fairly bogus URLs - posted by "Doug Cook (JIRA)" <ji...@apache.org> on 2006/09/19 19:37:25 UTC, 0 replies.
- [jira] Resolved: (NUTCH-367) DistributedSearch thown ClassCastException - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/19 21:36:24 UTC, 0 replies.
- CrawlDatum.modifiedTime ? - posted by "Kim, Greg" <gr...@shopping.com> on 2006/09/19 23:22:04 UTC, 0 replies.
- Ant tasks/build.xml file for running Nutch in debug mode? - posted by Jp Mutch <jp...@yahoo.com> on 2006/09/22 07:05:38 UTC, 0 replies.
- [jira] Created: (NUTCH-370) Generator loosed urls when run with LocalJobRunner - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/22 19:03:22 UTC, 0 replies.
- [jira] Updated: (NUTCH-370) Generator looses urls when run with LocalJobRunner - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/22 19:03:23 UTC, 0 replies.
- [jira] Closed: (NUTCH-365) Flexible URL normalization - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/22 23:02:23 UTC, 0 replies.
- [jira] Assigned: (NUTCH-332) doubling score causes by page internal anchors. - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/22 23:04:25 UTC, 0 replies.
- [jira] Closed: (NUTCH-332) doubling score causes by page internal anchors. - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/22 23:46:24 UTC, 0 replies.
- [jira] Closed: (NUTCH-336) Harvested links shouldn't get db.score.injected in addition to inbound contributions - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/23 19:28:23 UTC, 0 replies.
- [jira] Updated: (NUTCH-353) pages that serverside forwards will be refetched every time - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/23 19:50:24 UTC, 0 replies.
- [jira] Closed: (NUTCH-337) Fetcher ignores the fetcher.parse value configured in config file - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/23 20:57:23 UTC, 0 replies.
- [jira] Closed: (NUTCH-253) Normalize Host during Generate - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/23 21:01:23 UTC, 0 replies.
- [jira] Closed: (NUTCH-350) urls blocked db.fetch.retry.max * http.max.delays times during fetching are marked as STATUS_DB_GONE - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/23 21:45:23 UTC, 0 replies.
- [jira] Closed: (NUTCH-276) db.score.link.internal problem - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/23 21:47:24 UTC, 0 replies.
- [jira] Closed: (NUTCH-205) Wrong 'fetch date' for non available pages - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/23 21:51:23 UTC, 0 replies.
- [jira] Closed: (NUTCH-105) Network error during robots.txt fetch causes file to be ignored - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/24 17:31:24 UTC, 0 replies.
- [jira] Closed: (NUTCH-266) hadoop bug when doing updatedb - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/24 17:31:26 UTC, 0 replies.
- [jira] Closed: (NUTCH-318) log4j not proper configured, readdb doesnt give any information - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/24 17:31:27 UTC, 0 replies.
- [jira] Closed: (NUTCH-338) Remove the text parser as an option for parsing PDF files in parse-plugins.xml - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/24 17:31:28 UTC, 0 replies.
- [jira] Closed: (NUTCH-344) Fetcher threads blocked on synchronized block in cleanExpiredServerBlocks - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/24 17:31:31 UTC, 0 replies.
- [jira] Closed: (NUTCH-370) Generator looses urls when run with LocalJobRunner - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/24 18:42:23 UTC, 0 replies.
- Modifications necessary to upgrade to Hadoop 0.6.2 - posted by Marcel Petrisor <ma...@mcr.ro> on 2006/09/25 17:49:16 UTC, 0 replies.
- [jira] Created: (NUTCH-371) DeleteDuplicates should remove documents with duplicate URLs - posted by "Chris Schneider (JIRA)" <ji...@apache.org> on 2006/09/25 18:14:51 UTC, 0 replies.
- [jira] Updated: (NUTCH-371) DeleteDuplicates should remove documents with duplicate URLs - posted by "Chris Schneider (JIRA)" <ji...@apache.org> on 2006/09/25 18:16:51 UTC, 0 replies.
- [jira] Created: (NUTCH-372) Fetcher halting and throttling - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/26 11:47:50 UTC, 0 replies.
- [jira] Created: (NUTCH-373) Fetcher halting and throttling - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/26 11:49:50 UTC, 0 replies.
- [jira] Closed: (NUTCH-372) Fetcher halting and throttling - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/26 11:57:53 UTC, 0 replies.
- [jira] Closed: (NUTCH-373) Fetcher halting and throttling - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/26 11:57:54 UTC, 0 replies.
- Searching on fields with uppercase letters - posted by Enrico Triolo <en...@gmail.com> on 2006/09/26 16:10:24 UTC, 2 replies.
- [jira] Commented: (NUTCH-351) Protocol forward proxy - posted by "Chris Schneider (JIRA)" <ji...@apache.org> on 2006/09/27 04:52:51 UTC, 1 replies.
- [jira] Created: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing. - posted by "King Kong (JIRA)" <ji...@apache.org> on 2006/09/27 19:32:51 UTC, 0 replies.
- [jira] Created: (NUTCH-375) Link to 0.8.x apidocs broken on website - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/29 05:25:50 UTC, 0 replies.
- [jira] Resolved: (NUTCH-375) Link to 0.8.x apidocs broken on website - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/29 05:27:51 UTC, 0 replies.
- [jira] Commented: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing. - posted by "Meghna Kukreja (JIRA)" <ji...@apache.org> on 2006/09/29 16:31:51 UTC, 1 replies.
- wavering again and then the hell the earth which bodies are - posted by Bradley Parker <ef...@yelir.com> on 2006/09/30 16:40:00 UTC, 0 replies.
- [jira] Assigned: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing. - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2006/09/30 21:31:21 UTC, 0 replies.
- [jira] Resolved: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing. - posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org> on 2006/09/30 21:39:21 UTC, 0 replies.
- [jira] Created: (NUTCH-376) Add methods to control runtime behaviour of NutchBean - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/09/30 21:45:19 UTC, 0 replies.
- Re: svn commit: r451649 - /lucene/nutch/trunk/CHANGES.txt - posted by Sami Siren <ss...@gmail.com> on 2006/09/30 21:46:09 UTC, 5 replies.