- [jira] Commented: (NUTCH-313) moreFrom property in search.properties cannot be translated into Japanese. Compound text issue. - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2006/07/01 01:08:30 UTC, 0 replies.
- [jira] Created: (NUTCH-316) Confusion about query languages - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2006/07/01 01:34:29 UTC, 0 replies.
- [jira] Created: (NUTCH-317) Clarify what the queryLanguage argument of Query.parse(...) means - posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2006/07/01 01:40:30 UTC, 0 replies.
- neko parser or tagsoup parser? - posted by Uygar Yüzsüren <uy...@gmail.com> on 2006/07/03 09:27:29 UTC, 0 replies.
- Re: Nutch web site - posted by Piotr Kosiorowski <pk...@gmail.com> on 2006/07/04 17:55:02 UTC, 0 replies.
- Re: 0.8 release - posted by Piotr Kosiorowski <pk...@gmail.com> on 2006/07/04 17:56:13 UTC, 8 replies.
- Error with Hadoop-0.4.0 - posted by Jérôme Charron <je...@gmail.com> on 2006/07/06 17:54:41 UTC, 12 replies.
- [jira] Resolved: (NUTCH-317) Clarify what the queryLanguage argument of Query.parse(...) means - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2006/07/06 18:49:30 UTC, 0 replies.
- [jira] Reopened: (NUTCH-309) Uses commons logging Code Guards - posted by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/07/07 10:59:32 UTC, 0 replies.
- [jira] Commented: (NUTCH-309) Uses commons logging Code Guards - posted by "Jerome Charron (JIRA)" <ji...@apache.org> on 2006/07/07 11:27:32 UTC, 0 replies.
- [jira] Commented: (NUTCH-300) Clustering API improvements - posted by "nutch.newbie (JIRA)" <ji...@apache.org> on 2006/07/07 12:01:30 UTC, 1 reply.
- Number of pages different to Indexed documents - posted by Lourival Júnior <ju...@gmail.com> on 2006/07/07 19:02:29 UTC, 0 replies.
- Nutch based directory and crawler based on keyword - posted by Syed Kamran Ali <sy...@gmail.com> on 2006/07/08 16:03:19 UTC, 1 reply.
- [jira] Updated: (NUTCH-279) Additions for regex-normalize - posted by "Stefan Neufeind (JIRA)" <ji...@apache.org> on 2006/07/09 17:33:33 UTC, 0 replies.
- Crawl error - posted by AJ Chen <ca...@gmail.com> on 2006/07/10 06:47:23 UTC, 0 replies.
- Re: [Nutch-dev] Crawl error - posted by Stefan Groschupf <sg...@media-style.com> on 2006/07/10 07:36:12 UTC, 2 replies.
- [jira] Created: (NUTCH-318) log4j not proper configured, readdb doesnt give any information - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/10 21:07:29 UTC, 0 replies.
- Opportunities at Oracle Corporation - Oracle Enterprise Search - posted by Mark Wilkerson <ma...@oracle.com> on 2006/07/11 07:42:02 UTC, 0 replies.
- [jira] Resolved: (NUTCH-172) Segment merger - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/11 23:02:30 UTC, 0 replies.
- Know about Xapian-features - posted by Stefan Neufeind <ap...@stefan-neufeind.de> on 2006/07/12 01:24:05 UTC, 0 replies.
- Simultaneous update/search? - posted by Stefan Neufeind <ap...@stefan-neufeind.de> on 2006/07/12 01:36:40 UTC, 0 replies.
- Basic character-cleanups easily possible? - posted by Stefan Neufeind <ap...@stefan-neufeind.de> on 2006/07/13 00:23:36 UTC, 1 reply.
- Re: Possible memory leak? - posted by Enrico Triolo <en...@gmail.com> on 2006/07/13 13:29:02 UTC, 1 reply.
- OPICScoringFilter & Metadata transport scores as String - posted by Stefan Groschupf <sg...@media-style.com> on 2006/07/16 00:36:47 UTC, 0 replies.
- [jira] Created: (NUTCH-319) OPICScoringFilter should use logging API instead of printStackTrace - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/16 00:45:14 UTC, 0 replies.
- Possible problem in WebAppModule - posted by William Surowiec <ws...@gmail.com> on 2006/07/17 04:40:13 UTC, 2 replies.
- [jira] Created: (NUTCH-320) DmozParser does not output urls to stdout - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/17 08:53:13 UTC, 0 replies.
- [jira] Resolved: (NUTCH-320) DmozParser does not output urls to stdout - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/17 08:55:14 UTC, 0 replies.
- [Re: Possible problem in WebAppModule] - posted by William Surowiec <ws...@gmail.com> on 2006/07/17 13:48:50 UTC, 0 replies.
- Vertical Search (Nutch) for Opensource Jobs- http://www.myopensourcejobs.com - posted by Sudhi Seshachala <su...@yahoo.com> on 2006/07/17 15:21:45 UTC, 1 reply.
- [jira] Created: (NUTCH-321) Scoring API deficiency - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/17 15:53:14 UTC, 0 replies.
- [jira] Updated: (NUTCH-321) Scoring API deficiency - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/17 16:08:14 UTC, 0 replies.
- Windows BAT - posted by Kerry Wilson <kw...@wmsco.com> on 2006/07/17 16:18:27 UTC, 0 replies.
- Library for extracting text content from binaries - posted by Jukka Zitting <ju...@gmail.com> on 2006/07/17 23:59:48 UTC, 4 replies.
- [jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/18 21:51:14 UTC, 3 replies.
- db.max.inlinks - posted by Stefan Groschupf <sg...@media-style.com> on 2006/07/19 01:00:20 UTC, 3 replies.
- [jira] Created: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/19 14:11:13 UTC, 0 replies.
- error in recommended plugin example - posted by Chris Stephens <ch...@liveoakinteractive.com> on 2006/07/19 19:24:50 UTC, 0 replies.
- [jira] Closed: (NUTCH-173) PerHost Crawling Policy ( crawl.ignore.external.links ) - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/19 19:34:14 UTC, 0 replies.
- [jira] Closed: (NUTCH-271) Meta-data per URL/site/section - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/19 20:22:14 UTC, 0 replies.
- [jira] Commented: (NUTCH-271) Meta-data per URL/site/section - posted by "Stefan Neufeind (JIRA)" <ji...@apache.org> on 2006/07/19 20:54:15 UTC, 2 replies.
- [jira] Created: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it. - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/19 23:39:14 UTC, 0 replies.
- [jira] Updated: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it. - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/19 23:41:16 UTC, 0 replies.
- [jira] Closed: (NUTCH-293) support for Crawl-delay in Robots.txt - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/20 00:06:15 UTC, 0 replies.
- [jira] Closed: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it. - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/20 00:33:15 UTC, 0 replies.
- [jira] Closed: (NUTCH-321) Scoring API deficiency - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/20 00:42:14 UTC, 0 replies.
- Webcrawler - posted by "Brian M.B. Keaney" <bk...@fas.harvard.edu> on 2006/07/20 01:20:39 UTC, 0 replies.
- [jira] Created: (NUTCH-324) db.score.link.internal and db.score.link.external are ignored - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/20 01:48:13 UTC, 0 replies.
- [jira] Updated: (NUTCH-324) db.score.link.internal and db.score.link.external are ignored - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/20 01:54:15 UTC, 0 replies.
- [jira] Resolved: (NUTCH-319) OPICScoringFilter should use logging API instead of printStackTrace - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/20 01:56:15 UTC, 0 replies.
- [jira] Commented: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages - posted by "Enrico Triolo (JIRA)" <ji...@apache.org> on 2006/07/20 11:50:14 UTC, 5 replies.
- nutch-extensionpoints not in plugin.includes - posted by Stefan Groschupf <sg...@media-style.com> on 2006/07/20 22:09:24 UTC, 2 replies.
- [jira] Created: (NUTCH-325) UrlFilters.java throws NPE in case urlfilter.order contains Filters that are not in plugin.includes - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/20 23:55:14 UTC, 0 replies.
- [jira] Updated: (NUTCH-325) UrlFilters.java throws NPE in case urlfilter.order contains Filters that are not in plugin.includes - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/20 23:57:17 UTC, 1 reply.
- log when blocked by robots.txt - posted by Stefan Groschupf <sg...@media-style.com> on 2006/07/21 01:21:38 UTC, 1 reply.
- Distributed Matrix Computering on Hadoop - posted by Jack Tang <hi...@gmail.com> on 2006/07/21 11:24:27 UTC, 0 replies.
- multiple query filters - posted by Chris Stephens <ch...@liveoakinteractive.com> on 2006/07/21 18:08:08 UTC, 0 replies.
- Changing javac.version to 1.5? - posted by Greg Kim <gr...@gmail.com> on 2006/07/21 21:44:13 UTC, 1 reply.
- [jira] Created: (NUTCH-326) WordExtractor throws java.util.NoSuchElementException on some documents - posted by "Tom Jensen (JIRA)" <ji...@apache.org> on 2006/07/21 23:59:13 UTC, 0 replies.
- [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/23 20:22:15 UTC, 2 replies.
- [jira] Created: (NUTCH-327) bin/nutch setting of log path problems on cygwin - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/23 20:30:13 UTC, 0 replies.
- [jira] Resolved: (NUTCH-327) bin/nutch setting of log path problems on cygwin - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/23 20:45:14 UTC, 0 replies.
- [jira] Created: (NUTCH-328) commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4 - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/23 20:56:14 UTC, 0 replies.
- [jira] Resolved: (NUTCH-328) commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4 - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/23 21:00:14 UTC, 0 replies.
- tests failing - posted by Sami Siren <ss...@gmail.com> on 2006/07/23 22:27:16 UTC, 1 reply.
- [Fwd: Re: [jira] Commented: (NUTCH-271) Meta-data per URL/site/section] - posted by Stefan Neufeind <ap...@stefan-neufeind.de> on 2006/07/24 01:01:20 UTC, 0 replies.
- result comparison tool? - posted by Stefan Groschupf <sg...@media-style.com> on 2006/07/24 02:41:56 UTC, 1 reply.
- [jira] Created: (NUTCH-329) CrawlDbReader processTopNJob does not set jobNames - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/24 03:11:13 UTC, 0 replies.
- [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2006/07/24 06:33:19 UTC, 0 replies.
- [jira] Closed: (NUTCH-329) CrawlDbReader processTopNJob does not set jobNames - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/24 10:39:15 UTC, 0 replies.
- [jira] Updated: (NUTCH-167) Observation of directive - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/24 17:08:14 UTC, 0 replies.
- [jira] Closed: (NUTCH-324) db.score.link.internal and db.score.link.external are ignored - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/24 17:26:14 UTC, 0 replies.
- segread vs. readseg - posted by Stefan Groschupf <sg...@media-style.com> on 2006/07/24 20:17:06 UTC, 4 replies.
- Why was "prune" removed in 0.8? - posted by Stefan Neufeind <ap...@stefan-neufeind.de> on 2006/07/25 01:37:29 UTC, 1 reply.
- Scanning the database - posted by Robert Sanford <rs...@trefs.com> on 2006/07/25 17:10:11 UTC, 1 reply.
- Indexing href attribute in links. - posted by Robert Sanford <rs...@trefs.com> on 2006/07/25 17:11:12 UTC, 0 replies.
- How can i get a page content or parse data by the page's url - posted by Aaron Tang <ga...@gmail.com> on 2006/07/25 18:36:55 UTC, 3 replies.
- Limiting Results By Domain - posted by Robert Sanford <rs...@trefs.com> on 2006/07/25 19:58:29 UTC, 0 replies.
- [jira] Updated: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/25 21:40:15 UTC, 0 replies.
- [jira] Updated: (NUTCH-249) black- white list url filtering - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/25 21:40:16 UTC, 0 replies.
- [jira] Updated: (NUTCH-86) LanguageIdentifier API enhancements - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/25 21:40:17 UTC, 0 replies.
- [jira] Updated: (NUTCH-74) French Analyzer Plugin - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/25 21:40:19 UTC, 0 replies.
- [jira] Updated: (NUTCH-251) Administration GUI - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/25 21:40:21 UTC, 0 replies.
- [jira] Updated: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/25 21:40:22 UTC, 0 replies.
- [jira] Updated: (NUTCH-318) log4j not proper configured, readdb doesnt give any information - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/25 21:40:23 UTC, 0 replies.
- [jira] Updated: (NUTCH-262) Summary excerpts and highlights problems - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/25 21:40:24 UTC, 0 replies.
- [jira] Updated: (NUTCH-310) Review Log Levels - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/25 21:40:25 UTC, 0 replies.
- [jira] Updated: (NUTCH-247) robot parser to restrict. - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/25 21:40:26 UTC, 0 replies.
- [jira] Updated: (NUTCH-233) wrong regular expression hang reduce process for ever - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/25 21:40:27 UTC, 0 replies.
- [jira] Commented: (NUTCH-318) log4j not proper configured, readdb doesnt give any information - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/25 22:16:15 UTC, 8 replies.
- [jira] Created: (NUTCH-330) command line tool to search a Lucene index - posted by "Renaud Richardet (JIRA)" <ji...@apache.org> on 2006/07/25 22:20:13 UTC, 0 replies.
- [jira] Updated: (NUTCH-330) command line tool to search a Lucene index - posted by "Renaud Richardet (JIRA)" <ji...@apache.org> on 2006/07/25 22:20:15 UTC, 1 reply.
- [jira] Commented: (NUTCH-233) wrong regular expression hang reduce process for ever - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/25 22:24:14 UTC, 0 replies.
- [jira] Updated: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2006/07/25 23:56:15 UTC, 1 reply.
- [jira] Resolved: (NUTCH-315) CrawlDbReader usage text - implementation mismatch - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/26 08:48:14 UTC, 0 replies.
- [jira] Created: (NUTCH-331) Fetcher incorrectly reports task progress to tasktracker resulting in skipped URLs - posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/07/27 13:12:14 UTC, 0 replies.
- [jira] Created: (NUTCH-332) doubling score causes by page internal anchors. - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/28 08:14:13 UTC, 0 replies.
- [jira] Updated: (NUTCH-332) doubling score causes by page internal anchors. - posted by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/07/28 08:16:14 UTC, 1 reply.
- [jira] Created: (NUTCH-333) SegmentMerger and SegmentReader should use NutchJob - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2006/07/29 06:22:13 UTC, 0 replies.
- [jira] Updated: (NUTCH-333) SegmentMerger and SegmentReader should use NutchJob - posted by "stack@archive.org (JIRA)" <ji...@apache.org> on 2006/07/29 06:27:14 UTC, 0 replies.
- [jira] Updated: (NUTCH-331) Fetcher incorrectly reports task progress to tasktracker resulting in skipped URLs - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/29 06:51:18 UTC, 0 replies.
- [jira] Updated: (NUTCH-309) Uses commons logging Code Guards - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/29 06:51:19 UTC, 0 replies.
- [jira] Updated: (NUTCH-261) Multi Language Support - posted by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/07/29 06:51:21 UTC, 0 replies.
- [jira] Created: (NUTCH-334) I am using the search technique - posted by "Siddharudh nadgeri (JIRA)" <ji...@apache.org> on 2006/07/31 16:33:15 UTC, 0 replies.
- [jira] Created: (NUTCH-335) Pdf summary corrupt issue - posted by "Siddharudh nadgeri (JIRA)" <ji...@apache.org> on 2006/07/31 16:37:13 UTC, 0 replies.
- [jira] Commented: (NUTCH-335) Pdf summary corrupt issue - posted by "Stefan Neufeind (JIRA)" <ji...@apache.org> on 2006/07/31 16:46:15 UTC, 0 replies.
- [jira] Commented: (NUTCH-334) I am using the search technique - posted by "Stefan Neufeind (JIRA)" <ji...@apache.org> on 2006/07/31 16:48:14 UTC, 0 replies.
- [jira] Updated: (NUTCH-208) http: proxy exception list: - posted by "Renaud Richardet (JIRA)" <ji...@apache.org> on 2006/07/31 23:25:15 UTC, 0 replies.