- Jenkins build is back to normal : Nutch-nutchgora #213 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/01 06:21:17 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #1804 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/01 06:32:58 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1234) Upgrade to Tika 1.1 - posted by "Julien Nioche (Resolved) (JIRA)" <ji...@apache.org> on 2012/04/02 13:51:22 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1234) Upgrade to Tika 1.1 - posted by "Julien Nioche (Commented) (JIRA)" <ji...@apache.org> on 2012/04/02 13:51:22 UTC, 3 replies.
- Build failed in Jenkins: nutch-trunk-maven #222 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/02 16:06:24 UTC, 0 replies.
- [jira] [Created] (NUTCH-1323) AjaxNormalizer - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/04/02 22:05:22 UTC, 0 replies.
- [jira] [Created] (NUTCH-1324) DupeDB for Nutch - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/04/02 22:07:21 UTC, 0 replies.
- [jira] [Created] (NUTCH-1325) HostDB for Nutch - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/04/02 22:07:21 UTC, 0 replies.
- [jira] [Created] (NUTCH-1326) HostDeduplicator for Nutch - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/04/02 22:11:22 UTC, 0 replies.
- [jira] [Created] (NUTCH-1327) QueryStringNormalizer - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/04/02 22:57:24 UTC, 0 replies.
- GSoC : Web page scraper plugin - posted by Aamir Khan <sy...@gmail.com> on 2012/04/03 06:45:51 UTC, 5 replies.
- Jenkins build is back to normal : nutch-trunk-maven #223 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/03 07:03:26 UTC, 0 replies.
- [jira] [Created] (NUTCH-1328) a problem with regex-normalize.xml - posted by "behnam nikbakht (Created) (JIRA)" <ji...@apache.org> on 2012/04/03 07:12:40 UTC, 0 replies.
- Re: NutchGora release, and Nutch 1.x trunk release - posted by Markus Jelsma <ma...@openindex.io> on 2012/04/03 12:29:48 UTC, 9 replies.
- [jira] [Resolved] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API - posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org> on 2012/04/03 13:26:23 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1222) Upgrade to new Hadoop 0.22.0 - posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org> on 2012/04/03 13:26:24 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1318) Parse time outs crash parsing fetcher - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 13:32:24 UTC, 0 replies.
- [jira] [Updated] (NUTCH-717) Make Nutch Solr integration easier - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 13:32:24 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1245) URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 13:32:24 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1219) Upgrade all jobs to new MapReduce API - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 13:32:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1251) Deletion of duplicates fails with org.apache.solr.client.solrj.SolrServerException - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 13:32:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1129) Any23 Nutch plugin - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 13:32:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-578) URL fetched with 403 is generated over and over again - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 13:32:26 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint arguement - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 13:58:24 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1273) Fix [deprecation] javac warnings - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:24 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1194) CrawlDB lock should be released earlier - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1116) Write JUnit tests for all plugins - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index? - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1084) ReadDB url throws exception - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1150) http.redirect.max can lead to multiple parses of the same url - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1147) WebGraph nodeDumper uses only 1 reducer - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1183) Summary task for adding command line usage instructions to webgraph classes - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:26 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1176) Fix all javadoc warnings from nightly builds - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:26 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1262) Map `duplicating` content-types to a single type - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:26 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1274) Fix [cast] javac warnings - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:26 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1201) Allow for different FetcherThread impls - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:26 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1119) JUnit test for index-static - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:26 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1040) Backport REST-API from 2.0 - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:26 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1233) Rely on Tika for outlink extraction - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:27 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1014) Migrate from Apache ORO to java.util.regex - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:28 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1063) OutlinkExtractor test generates an exception but does not fail - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:29 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1123) JUnit test for scoring-link - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:29 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1224) Migrate FreeGenerator to MapReduce API - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:29 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1220) Upgrade Solr deps - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:29 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1120) JUnit test for microformats-reltag - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:30 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1122) JUnit test for protocol-ftp - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:30 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1121) JUnit test for parse-js - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:30 UTC, 0 replies.
- [jira] [Updated] (NUTCH-809) Parse-metatags plugin - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:30 UTC, 0 replies.
- [jira] [Updated] (NUTCH-865) Format source code in unique style - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:30 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1252) SegmentReader -get shows wrong data - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:30 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1100) SolrDedup broken - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:31 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1228) Change mapred.task.timeout to mapreduce.task.timeout in fetcher - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:31 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1197) Add statically configured field values to solrindex-mapping.xml - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:31 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1124) JUnit test for scoring-opic - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:31 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1001) bin/nutch fetch/parse handle crawl/segments directory - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:31 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1308) Unnecessary truncate content configuration, and logging in parse-zip/ZipParser - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:31 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1046) Add tests for indexing to SOLR - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:31 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1186) FreeGenerator always normalizes - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:31 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1060) URL filters to produce regexes to be used by OutlinkExtractor. - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:31 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1143) Omit anchor in webgraph's LinkDatum - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:32 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1179) Option to restrict generated records by metadata - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:32 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1247) CrawlDatum.retries should be int - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:32 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1127) JUnit test for urlfilter-validator - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:32 UTC, 0 replies.
- [jira] [Updated] (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1062) Migrate BasicURLNormalizer from Apache ORO to java.util.regex - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1107) Log slow parse entries - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1125) JUnit test for tld - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1126) JUnit test for urlfilter-prefix - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-208) http: proxy exception list: - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1226) Migrate CrawlDbReader to MapReduce API - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1130) JUnit test for Any23 RDF plugin - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1035) Tune Solr config for Nutch users - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1034) Create Solr Velocity templates - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1300) Indexer to normalize URL's - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:34 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1047) Pluggable indexing backends - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1223) Migrate WebGraph to MapReduce API - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1320) IndexChecker and ParseChecker choke on IDN's - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1128) JUnit test for urlmeta - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1319) HostNormalizer - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1151) Index-anchor to add numInlinks count - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1140) index-more plugin, resetTitle method creates multiple values in the Title field - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1021) Migrate OutlinkExtractor from Apache ORO to java.util.regex - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1039) Fetcher fails for pages without content-length header - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1275) Fix [unchecked] javac warnings - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-827) HTTP POST Authentication - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1079) StringBuffer converted to StringBuilder - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1202) Fetcher timebomb kills long waiting fetch jobs - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1218) Improve trunk API documentation - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:35 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default. - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1149) DomainStats should process numeric CrawlDB metadata - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1053) Parsing of RSS feeds fails - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1117) JUnit test for index-anchor - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1181) Indexer to use webgraph inlinks - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1118) JUnit test for index-basic - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:37 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1317) Max content length by MIME-type - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:37 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1215) UpdateDB should not require segment as input - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:37 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1088) Write Solr XML documents - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1103) Port protocol-sftp to 1.4 - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-828) Fetch Filter - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:08:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1277) Fix [fallthrough] javac warnings - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 14:10:23 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1270) some of Deflate encoded pages not fetched - posted by "behnam nikbakht (Commented) (JIRA)" <ji...@apache.org> on 2012/04/03 14:32:24 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1208) Don't include KEYS file in bin distribution - posted by "Julien Nioche (Resolved) (JIRA)" <ji...@apache.org> on 2012/04/03 14:42:30 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1208) Don't include KEYS file in bin distribution - posted by "Hudson (Commented) (JIRA)" <ji...@apache.org> on 2012/04/03 15:10:29 UTC, 1 replies.
- [Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2012/04/03 16:20:53 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "Release_HOWTO" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2012/04/03 16:22:07 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1306) Commit after finished writing to solr index - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/04/04 00:00:29 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1251) Deletion of duplicates fails with org.apache.solr.client.solrj.SolrServerException - posted by "Arkadi Kosmynin (Commented) (JIRA)" <ji...@apache.org> on 2012/04/04 01:20:25 UTC, 0 replies.
- [jira] [Created] (NUTCH-1329) parser not extract outlinks to external web sites - posted by "behnam nikbakht (Created) (JIRA)" <ji...@apache.org> on 2012/04/04 14:21:23 UTC, 0 replies.
- [Nutch Wiki] Update of "IndexMetatags" by JulienNioche - posted by Apache Wiki <wi...@apache.org> on 2012/04/04 16:46:07 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-809) Parse-metatags plugin - posted by "Julien Nioche (Resolved) (JIRA)" <ji...@apache.org> on 2012/04/04 16:51:24 UTC, 0 replies.
- [jira] [Closed] (NUTCH-809) Parse-metatags plugin - posted by "Julien Nioche (Closed) (JIRA)" <ji...@apache.org> on 2012/04/04 16:51:24 UTC, 0 replies.
- Jenkins build is back to normal : nutch-trunk-maven #226 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/06 07:05:35 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions - posted by "Ferdy Galema (Commented) (JIRA)" <ji...@apache.org> on 2012/04/06 12:23:23 UTC, 0 replies.
- [jira] [Created] (NUTCH-1330) OutlinkDB to preserve back up - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/04/06 15:09:23 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1330) OutlinkDB to preserve back up - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/06 15:15:24 UTC, 1 replies.
- Build failed in Jenkins: Nutch-nutchgora #218 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/07 06:17:27 UTC, 4 replies.
- Jenkins build is back to normal : Nutch-nutchgora #219 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/08 06:19:51 UTC, 0 replies.
- [jira] [Commented] (NUTCH-366) Merge URLFilters and URLNormalizers - posted by "Yangxiaolong (Commented) (JIRA)" <ji...@apache.org> on 2012/04/09 14:37:20 UTC, 0 replies.
- Run Nutch Crawl in Eclipse - posted by Andy Xue <an...@gmail.com> on 2012/04/10 03:37:23 UTC, 0 replies.
- question about ObjectCache - posted by Xiaolong Yang <ya...@gmail.com> on 2012/04/10 05:00:53 UTC, 2 replies.
- Nutch 1.x trunk release - posted by Julien Nioche <li...@gmail.com> on 2012/04/10 17:07:57 UTC, 1 replies.
- [jira] [Commented] (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic - posted by "Manuel Antonio Novoa (Commented) (JIRA)" <ji...@apache.org> on 2012/04/11 06:08:38 UTC, 0 replies.
- [jira] [Created] (NUTCH-1331) limit crawler to defined depth - posted by "behnam nikbakht (Created) (JIRA)" <ji...@apache.org> on 2012/04/11 09:38:52 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1331) limit crawler to defined depth - posted by "behnam nikbakht (Updated) (JIRA)" <ji...@apache.org> on 2012/04/11 12:22:18 UTC, 1 replies.
- Build failed in Jenkins: Nutch-trunk #1813 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/11 13:06:18 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1331) limit crawler to defined depth - posted by "Julien Nioche (Commented) (JIRA)" <ji...@apache.org> on 2012/04/11 17:49:18 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #222 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/11 23:17:29 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #223 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/13 06:20:34 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #1814 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/13 06:32:19 UTC, 0 replies.
- [jira] [Created] (NUTCH-1332) db.max.outlinks.per.page not honored - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/04/13 12:47:19 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #224 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/13 15:58:48 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1330) OutlinkDB to preserve back up - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/04/13 16:22:18 UTC, 2 replies.
- Build failed in Jenkins: Nutch-nutchgora #225 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/13 16:23:44 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #226 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/13 16:52:37 UTC, 1 replies.
- Build failed in Jenkins: Nutch-nutchgora #228 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/15 06:07:45 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #1816 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/15 06:09:15 UTC, 0 replies.
- [jira] [Created] (NUTCH-1333) Introduce AvroStore, DataFileAvroStore and Accumulo Datastore implementations - posted by "Lewis John McGibbney (Created) (JIRA)" <ji...@apache.org> on 2012/04/15 20:19:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1333) Introduce AvroStore, DataFileAvroStore and Accumulo Datastore implementations - posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org> on 2012/04/15 21:01:18 UTC, 1 replies.
- [jira] [Closed] (NUTCH-1333) Introduce AvroStore, DataFileAvroStore and Accumulo Datastore implementations - posted by "Lewis John McGibbney (Closed) (JIRA)" <ji...@apache.org> on 2012/04/15 21:01:18 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1333) Introduce AvroStore, DataFileAvroStore and Accumulo Datastore implementations - posted by "Lewis John McGibbney (Resolved) (JIRA)" <ji...@apache.org> on 2012/04/15 21:01:18 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #229 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/16 06:06:37 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #1817 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/16 06:08:10 UTC, 0 replies.
- Build failed in Jenkins: nutch-trunk-maven #236 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/16 07:06:18 UTC, 0 replies.
- [VOTE] Apache Nutch 1.5 release rc #1 - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2012/04/16 07:43:22 UTC, 10 replies.
- Jenkins build is back to normal : nutch-trunk-maven #237 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/16 08:06:24 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "AboutPlugins" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2012/04/16 14:15:54 UTC, 1 replies.
- [jira] [Created] (NUTCH-1334) NPE in FetcherOutputFormat - posted by "Julien Nioche (Created) (JIRA)" <ji...@apache.org> on 2012/04/16 14:52:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1334) NPE in FetcherOutputFormat - posted by "Julien Nioche (Updated) (JIRA)" <ji...@apache.org> on 2012/04/16 14:54:17 UTC, 0 replies.
- [jira] [Created] (NUTCH-1335) OutlinkDB to emit unique URL's only - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/04/16 21:28:18 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #230 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/17 06:19:03 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #1818 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/17 06:31:15 UTC, 0 replies.
- [jira] [Created] (NUTCH-1336) Optionally not index db_notmodified pages - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/04/17 12:07:18 UTC, 0 replies.
- NUTCH-1129 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/04/17 12:35:07 UTC, 8 replies.
- [jira] [Updated] (NUTCH-1336) Optionally not index db_notmodified pages - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/17 12:39:18 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1335) OutlinkDB to collect unique URL's only - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/17 14:09:22 UTC, 1 replies.
- [jira] [Commented] (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed - posted by "Roberto Gardenier (Commented) (JIRA)" <ji...@apache.org> on 2012/04/17 14:11:17 UTC, 3 replies.
- [jira] [Created] (NUTCH-1337) WebGraph to follow redirects - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/04/17 14:51:17 UTC, 0 replies.
- [jira] [Issue Comment Edited] (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed - posted by "Markus Jelsma (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/04/17 15:03:18 UTC, 0 replies.
- [jira] [Created] (NUTCH-1338) Determine/remove activation WARN's from project builds. - posted by "Lewis John McGibbney (Created) (JIRA)" <ji...@apache.org> on 2012/04/17 20:27:20 UTC, 0 replies.
- [jira] [Created] (NUTCH-1339) Default URL normalization rules to remove page anchors completely - posted by "Sebastian Nagel (Created) (JIRA)" <ji...@apache.org> on 2012/04/17 22:53:13 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1339) Default URL normalization rules to remove page anchors completely - posted by "Sebastian Nagel (Updated) (JIRA)" <ji...@apache.org> on 2012/04/17 22:57:13 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1339) Default URL normalization rules to remove page anchors completely - posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org> on 2012/04/17 23:01:13 UTC, 1 replies.
- [jira] [Closed] (NUTCH-1246) Upgrade to Hadoop 1.0.0 - posted by "Julien Nioche (Closed) (JIRA)" <ji...@apache.org> on 2012/04/18 11:38:36 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls - posted by "Julien Nioche (Commented) (JIRA)" <ji...@apache.org> on 2012/04/18 11:42:37 UTC, 5 replies.
- [jira] [Commented] (NUTCH-1297) it is better for fetchItemQueues to select items from greater queues first - posted by "Ferdy Galema (Commented) (JIRA)" <ji...@apache.org> on 2012/04/18 14:33:47 UTC, 1 replies.
- [jira] [Created] (NUTCH-1340) Increase scalability by only removing markers when they actually exist for DbUpdaterReducer - posted by "Ferdy Galema (Created) (JIRA)" <ji...@apache.org> on 2012/04/18 16:39:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1340) Increase scalability by only removing markers when they actually exist for DbUpdaterReducer - posted by "Ferdy Galema (Updated) (JIRA)" <ji...@apache.org> on 2012/04/18 16:43:36 UTC, 1 replies.
- [jira] [Commented] (NUTCH-882) Design a Host table in GORA - posted by "Patrick Hennig (Commented) (JIRA)" <ji...@apache.org> on 2012/04/19 10:42:47 UTC, 4 replies.
- [jira] [Updated] (NUTCH-882) Design a Host table in GORA - posted by "Julien Nioche (Updated) (JIRA)" <ji...@apache.org> on 2012/04/19 10:56:48 UTC, 2 replies.
- [jira] [Created] (NUTCH-1341) NotModified time set to now but page not modified - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/04/19 16:50:42 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1341) NotModified time set to now but page not modified - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/04/19 16:54:45 UTC, 0 replies.
- [jira] [Created] (NUTCH-1342) Read time out protocol-http - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/04/20 11:40:40 UTC, 0 replies.
- [jira] [Created] (NUTCH-1343) Crawl sites with hashtags in url - posted by "Roberto Gardenier (Created) (JIRA)" <ji...@apache.org> on 2012/04/20 13:18:40 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #234 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/21 06:07:33 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #1822 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/21 06:09:17 UTC, 0 replies.
- [jira] [Created] (NUTCH-1344) BasicURLNormalizer to normalize https same as http - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2012/04/21 12:08:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1344) BasicURLNormalizer to normalize https same as http - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2012/04/21 12:12:35 UTC, 0 replies.
- [jira] [Created] (NUTCH-1345) JAVA_HOME should not be required - posted by "Ben McCann (JIRA)" <ji...@apache.org> on 2012/04/22 01:04:33 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #235 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/22 06:18:55 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #1823 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/04/22 06:30:04 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient - posted by "Ross Judson (JIRA)" <ji...@apache.org> on 2012/04/23 00:31:33 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1322) Indexer not to reindex unmodified docs - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/04/23 10:53:44 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1322) Indexer not to reindex unmodified docs - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/04/23 10:53:44 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1345) JAVA_HOME should not be required - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/23 16:21:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1345) JAVA_HOME should not be required - posted by "Ben McCann (JIRA)" <ji...@apache.org> on 2012/04/23 20:36:33 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1317) Max content length by MIME-type - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/04/24 09:39:50 UTC, 5 replies.
- [jira] [Created] (NUTCH-1346) Follow outlinks to ignore external - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/04/24 16:03:36 UTC, 0 replies.
- [jira] [Closed] (NUTCH-713) Config options for webgraph Scoring not documented - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/04/24 16:11:35 UTC, 0 replies.
- Skipping Root File from Indexing - posted by atul <at...@hexaware.com> on 2012/04/24 21:44:56 UTC, 1 reply.
- [jira] [Updated] (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 10:42:02 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1081) ant tests fail - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 10:42:02 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1167) Write JUnit tests for scoring-opic - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 10:44:03 UTC, 0 replies.
- [jira] [Updated] (NUTCH-896) Gora-based tests need to have their own config files - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 10:44:03 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1162) Write JUnit tests for parse-js - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 10:44:03 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1159) Write JUnit tests for index-anchor - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 10:44:04 UTC, 0 replies.
- Suitable naming for > Nutchgora branch? - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/04/25 10:46:22 UTC, 5 replies.
- [jira] [Resolved] (NUTCH-946) cache.jsp does not recognize encoding conversion from content different to UTF-8 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 10:52:02 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1170) Write JUnit tests for urlfilter-validator - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:12:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1158) Write JUnit tests for all nutchgora plugins - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:16:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1160) Write JUnit tests for index-basic - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:18:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1165) Write JUnit tests for protocol-sftp - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:18:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1161) Write JUnit tests for microformats-reltag plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:18:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1163) Write JUnit tests for protocol-ftp - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:18:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1169) Write JUnit tests for urlfilter-prefix - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:20:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1168) Write JUnit tests for tld - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:20:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1166) Write JUnit tests for scoring-link - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:20:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1164) Write JUnit tests for protocol-http - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:20:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1104) Port issues from trunk NutchGora branch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:22:18 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1340) Increase scalability by only removing markers when they actually exist for DbUpdaterReducer - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:26:18 UTC, 1 reply.
- [jira] [Updated] (NUTCH-1283) Radically update all Solr configuration in Nutchgora - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:26:19 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1038) Port IndexingFiltersChecker to 2.0 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:28:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-887) Delegate parsing of feeds to Tika - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:42:20 UTC, 0 replies.
- [jira] [Updated] (NUTCH-842) AutoGenerate WebPage code - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:46:25 UTC, 0 replies.
- [jira] [Updated] (NUTCH-840) Port tests from parse-html to parse-tika - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:46:26 UTC, 0 replies.
- [jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:52:18 UTC, 5 replies.
- [jira] [Updated] (NUTCH-956) solrindex issues - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/25 23:58:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-879) URL-s getting lost - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:00:18 UTC, 1 reply.
- [jira] [Updated] (NUTCH-992) SolrDedup is broken in trunk - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:00:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1026) Strip UTF-8 non-character codepoints - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:08:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1094) create comprehensive documentation for Nutchgora branch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:08:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:10:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-875) Port Webgraph to Nutch 2.0 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:12:20 UTC, 0 replies.
- [jira] [Updated] (NUTCH-864) Fetcher generates entries with status 0 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:12:20 UTC, 0 replies.
- [jira] [Updated] (NUTCH-841) Nutch 2.0 webapp - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:14:19 UTC, 0 replies.
- [jira] [Updated] (NUTCH-978) A Plugin for extracting certain element of a web page on html page parsing. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:16:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1285) Debian Packaging for Nutch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:18:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1025) Add option not to commit to Solr - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:20:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-944) Increase the number of elements to look for URLs and add the ability to specify multiple attributes by elements - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:20:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1290) crawlId not supported by all Tools - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:38:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-710) Support for rel="canonical" attribute - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:40:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a "?" - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:42:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-979) Add support for deleting Solr documents with ProtocolStatusCodes.NOTFOUND - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:46:17 UTC, 1 reply.
- [jira] [Updated] (NUTCH-849) different versions of the same library in nutch-2.0-dev.job and local\lib directory - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 00:50:17 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1340) Increase scalability by only removing markers when they actually exist for DbUpdaterReducer - posted by "Ferdy Galema (JIRA)" <ji...@apache.org> on 2012/04/26 11:04:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-882) Design a Host table in GORA - posted by "Ferdy Galema (JIRA)" <ji...@apache.org> on 2012/04/26 11:21:25 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1189) add commented out default settings to gora.properties files - posted by "Ferdy Galema (JIRA)" <ji...@apache.org> on 2012/04/26 12:13:18 UTC, 1 reply.
- [jira] [Closed] (NUTCH-882) Design a Host table in GORA - posted by "Ferdy Galema (JIRA)" <ji...@apache.org> on 2012/04/26 12:19:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 13:27:20 UTC, 4 replies.
- [jira] [Closed] (NUTCH-1290) crawlId not supported by all Tools - posted by "Ferdy Galema (JIRA)" <ji...@apache.org> on 2012/04/26 13:35:18 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 14:28:18 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1306) Commit after finished writing to solr index - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/26 15:10:20 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2012/04/26 22:46:50 UTC, 0 replies.
- [jira] [Issue Comment Edited] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/04/27 00:16:48 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml - posted by "Ferdy Galema (JIRA)" <ji...@apache.org> on 2012/04/27 14:17:50 UTC, 0 replies.
- We just blocked Nutch - posted by Jerry Durand <jd...@interstellar.com> on 2012/04/29 15:29:35 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1084) ReadDB url throws exception - posted by "Andy Xue (JIRA)" <ji...@apache.org> on 2012/04/30 04:12:48 UTC, 0 replies.