- [jira] [Commented] (NUTCH-1242) Allow disabling of URL Filters in ParseSegment - posted by "Hudson (Commented) (JIRA)" <ji...@apache.org> on 2012/02/01 05:22:58 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1256) WebGraph to dump host + score - posted by "Hudson (Commented) (JIRA)" <ji...@apache.org> on 2012/02/01 05:22:58 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1081) ant tests fail - posted by "Ferdy Galema (Commented) (JIRA)" <ji...@apache.org> on 2012/02/01 10:16:59 UTC, 0 replies.
- [jira] [Created] (NUTCH-1264) Configurable indexing plugin (index-extra) - posted by "Julien Nioche (Created) (JIRA)" <ji...@apache.org> on 2012/02/01 13:19:04 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1264) Configurable indexing plugin (index-extra) - posted by "Julien Nioche (Updated) (JIRA)" <ji...@apache.org> on 2012/02/01 14:51:02 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1005) Index headings plugin - posted by "Julien Nioche (Commented) (JIRA)" <ji...@apache.org> on 2012/02/01 15:18:58 UTC, 8 replies.
- [jira] [Updated] (NUTCH-1005) Index headings plugin - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/02/01 15:40:58 UTC, 1 replies.
- [jira] [Created] (NUTCH-1265) [nutchgora] - update to work with gora-0.2-incubating - posted by "Sujit Pal (Created) (JIRA)" <ji...@apache.org> on 2012/02/02 01:49:54 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1265) [nutchgora] - update to work with gora-0.2-incubating - posted by "Sujit Pal (Updated) (JIRA)" <ji...@apache.org> on 2012/02/02 01:51:53 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1265) [nutchgora] - update to work with gora-0.2-incubating - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/02 01:59:53 UTC, 2 replies.
- NUTCH-1205 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/02 13:16:43 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1265) [nutchgora] - update to work with gora-0.2-incubating - posted by "Lewis John McGibbney (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/02 17:16:51 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1265) [nutchgora] - update to work with gora-0.2-incubating - posted by "Lewis John McGibbney (Closed) (JIRA)" <ji...@apache.org> on 2012/02/02 17:16:51 UTC, 0 replies.
- Nutch db_unfetched - posted by nutchsolruser <nu...@gmail.com> on 2012/02/03 09:37:41 UTC, 0 replies.
- Problem with db.max.anchor.length property in nutch-default.xml - posted by nutchsolruser <nu...@gmail.com> on 2012/02/03 14:19:33 UTC, 0 replies.
- [jira] [Commented] (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed - posted by "Abhay Dabholkar (Commented) (JIRA)" <ji...@apache.org> on 2012/02/03 15:55:53 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1140) index-more plugin, resetTitle method creates multiple values in the Title field - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/05 11:33:53 UTC, 0 replies.
- Fwd: [Announce] Google Summer of Code 2012 - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2012/02/06 05:24:49 UTC, 1 replies.
- [jira] [Created] (NUTCH-1266) Subcollection to optionally write to configured fields - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/02/06 14:15:59 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1266) Subcollection to optionally write to configured fields - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/02/06 14:35:59 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1264) Configurable indexing plugin (index-extra) - posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org> on 2012/02/06 15:19:59 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic - posted by "Julien Nioche (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/06 17:33:59 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1264) Configurable indexing plugin (index-metadata) - posted by "Julien Nioche (Updated) (JIRA)" <ji...@apache.org> on 2012/02/06 17:37:59 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1264) Configurable indexing plugin (index-metadata) - posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org> on 2012/02/06 17:51:59 UTC, 2 replies.
- [jira] [Created] (NUTCH-1267) urlmeta to delegate indexing to index-metadata - posted by "Julien Nioche (Created) (JIRA)" <ji...@apache.org> on 2012/02/06 17:55:59 UTC, 0 replies.
- [jira] [Created] (NUTCH-1268) parse-meta to delegate indexing to index-metadata - posted by "Julien Nioche (Created) (JIRA)" <ji...@apache.org> on 2012/02/06 17:57:59 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1264) Configurable indexing plugin (index-metadata) - posted by "Julien Nioche (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/06 17:57:59 UTC, 0 replies.
- unsubscribe - posted by linyuan <li...@gmail.com> on 2012/02/07 07:45:35 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1210) DomainBlacklistFilter - posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org> on 2012/02/07 11:45:02 UTC, 9 replies.
- [jira] [Commented] (NUTCH-1266) Subcollection to optionally write to configured fields - posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org> on 2012/02/07 11:45:02 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1005) Parse headings plugin - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/02/07 11:48:59 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1005) Parse headings plugin - posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/07 14:26:59 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1005) Parse headings plugin - posted by "Hudson (Commented) (JIRA)" <ji...@apache.org> on 2012/02/07 15:02:59 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1259) TikaParser should not add Content-Type from HTTP Headers to Nutch Metadata - posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org> on 2012/02/07 16:24:59 UTC, 11 replies.
- [jira] [Updated] (NUTCH-1259) TikaParser should not add Content-Type from HTTP Headers to Nutch Metadata - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/02/07 16:26:59 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1258) MoreIndexingFilter should be able to read Content-Type from both parse metadata and content metadata - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/02/07 16:32:59 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1258) MoreIndexingFilter should be able to read Content-Type from both parse metadata and content metadata - posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org> on 2012/02/07 16:32:59 UTC, 0 replies.
- [jira] [Created] (NUTCH-1269) Generate main problems - posted by "behnam nikbakht (Created) (JIRA)" <ji...@apache.org> on 2012/02/08 11:24:59 UTC, 0 replies.
- [jira] [Created] (NUTCH-1270) some of Deflate encoded pages not fetched - posted by "behnam nikbakht (Created) (JIRA)" <ji...@apache.org> on 2012/02/08 11:39:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1269) Generate main problems - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/08 11:40:59 UTC, 3 replies.
- [jira] [Created] (NUTCH-1271) Fix errors @ compile time - posted by "Lewis John McGibbney (Created) (JIRA)" <ji...@apache.org> on 2012/02/08 11:46:59 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1269) Generate main problems - posted by "behnam nikbakht (Updated) (JIRA)" <ji...@apache.org> on 2012/02/08 12:18:57 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1270) some of Deflate encoded pages not fetched - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/08 13:35:59 UTC, 0 replies.
- Fwd: Mandatory svnpubsub migration by Jan 2013 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/08 13:40:03 UTC, 2 replies.
- tika-core, tika-parser - posted by Markus Jelsma <ma...@openindex.io> on 2012/02/08 13:50:18 UTC, 8 replies.
- Finding specific file types only --> *.ics files - posted by Peter Jameson <pe...@curveos.com> on 2012/02/08 18:04:49 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1206) tika parser of nutch 1.3 is failing to prcess pdfs - posted by "dibyendu ghosh (Commented) (JIRA)" <ji...@apache.org> on 2012/02/09 10:38:59 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1266) Subcollection to optionally write to configured fields - posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/09 10:57:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1262) Map `duplicating` content-types to a single type - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/02/09 10:57:01 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1145) Add linkrank config directives to default conf - posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/09 11:00:59 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1129) Any23 Nutch plugin - posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org> on 2012/02/09 16:27:59 UTC, 3 replies.
- Build failed in Jenkins: Nutch-nutchgora #158 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/10 05:17:55 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1205) Upgrade gora modules to 0.2-incubating in ivy/ivy.xml - posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org> on 2012/02/11 19:04:59 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "bin/nutch webgraph" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2012/02/11 19:36:01 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "bin/nutch linkrank" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2012/02/11 19:42:56 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #160 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/13 05:04:03 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #1756 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/13 05:04:09 UTC, 0 replies.
- subscribe - posted by nutch buddy <nu...@gmail.com> on 2012/02/13 09:08:44 UTC, 0 replies.
- (Unknown) - posted by nutch buddy <nu...@gmail.com> on 2012/02/13 09:09:57 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1259) Store detected content type in crawldatum metadata - posted by "Julien Nioche (Updated) (JIRA)" <ji...@apache.org> on 2012/02/13 12:25:00 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1259) Store detected content type in crawldatum metadata - posted by "Julien Nioche (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/13 12:50:59 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1259) Store detected content type in crawldatum metadata - posted by "Hudson (Commented) (JIRA)" <ji...@apache.org> on 2012/02/13 13:06:59 UTC, 4 replies.
- Re: Understanding NutchConfigration properly - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/13 14:36:41 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1205) Upgrade gora modules to 0.2-SNAPSHOT in ivy/ivy.xml - posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org> on 2012/02/13 15:44:59 UTC, 2 replies.
- Unsubscribe - posted by "Rajan Renuka (Nokia-LC/Chicago)" <re...@nokia.com> on 2012/02/13 16:02:30 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1205) Upgrade gora modules to 0.2-SNAPSHOT in ivy/ivy.xml - posted by "Ferdy Galema (Commented) (JIRA)" <ji...@apache.org> on 2012/02/13 16:26:59 UTC, 5 replies.
- Build failed in Jenkins: Nutch-nutchgora #161 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/14 05:02:09 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #1757 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/14 05:02:22 UTC, 0 replies.
- Build failed in Jenkins: nutch-trunk-maven #147 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/14 06:04:30 UTC, 0 replies.
- [jira] [Created] (NUTCH-1272) Wrong property name in nutch-default.xml - posted by "Daniel Baur (Created) (JIRA)" <ji...@apache.org> on 2012/02/14 10:39:59 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1272) Wrong property name in nutch-default.xml - posted by "Daniel Baur (Updated) (JIRA)" <ji...@apache.org> on 2012/02/14 10:51:59 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1272) Wrong property name in nutch-default.xml - posted by "Julien Nioche (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/14 13:07:01 UTC, 0 replies.
- Jenkins build is back to normal : nutch-trunk-maven #148 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/14 14:05:55 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-1259) Store detected content type in crawldatum metadata - posted by "Markus Jelsma (Reopened) (JIRA)" <ji...@apache.org> on 2012/02/14 14:57:00 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1222) Upgrade to new Hadoop 0.22.0 - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/14 15:19:59 UTC, 2 replies.
- [jira] [Created] (NUTCH-1273) Fix [deprecation] javac warnings - posted by "Lewis John McGibbney (Created) (JIRA)" <ji...@apache.org> on 2012/02/14 15:29:59 UTC, 0 replies.
- [jira] [Created] (NUTCH-1274) Fix [cast] javac warnings - posted by "Lewis John McGibbney (Created) (JIRA)" <ji...@apache.org> on 2012/02/14 15:36:03 UTC, 0 replies.
- [jira] [Created] (NUTCH-1275) Fix [unchecked] javac warnings - posted by "Lewis John McGibbney (Created) (JIRA)" <ji...@apache.org> on 2012/02/14 15:42:05 UTC, 0 replies.
- [jira] [Created] (NUTCH-1276) Fix [dep-ann] - posted by "Lewis John McGibbney (Created) (JIRA)" <ji...@apache.org> on 2012/02/14 15:43:59 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1276) Fix [dep-ann] - posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org> on 2012/02/14 15:44:00 UTC, 1 replies.
- [jira] [Created] (NUTCH-1277) Fix [fallthrough] javac warnings - posted by "Lewis John McGibbney (Created) (JIRA)" <ji...@apache.org> on 2012/02/14 15:50:02 UTC, 0 replies.
- XSD for Solr Schema - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/14 19:23:25 UTC, 1 replies.
- Detecting Encoding with plugins - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/14 23:00:38 UTC, 10 replies.
- [jira] [Work started] (NUTCH-1129) Any23 Nutch plugin - posted by "Lewis John McGibbney (Work started) (JIRA)" <ji...@apache.org> on 2012/02/15 00:10:00 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1129) Any23 Nutch plugin - posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org> on 2012/02/15 00:34:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-1278) Fetch Improvement in threads per host - posted by "behnam nikbakht (Created) (JIRA)" <ji...@apache.org> on 2012/02/15 07:08:03 UTC, 0 replies.
- how linkdb impact on scores - posted by "behnam.nikbakht" <be...@gmail.com> on 2012/02/15 07:56:00 UTC, 1 replies.
- [jira] [Created] (NUTCH-1279) Check if limit has been reached in GeneraterReducer must be the first check performance-wise. - posted by "Ferdy Galema (Created) (JIRA)" <ji...@apache.org> on 2012/02/15 10:35:59 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1279) Check if limit has been reached in GeneraterReducer must be the first check performance-wise. - posted by "Ferdy Galema (Closed) (JIRA)" <ji...@apache.org> on 2012/02/15 10:39:59 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1279) Check if limit has been reached in GeneraterReducer must be the first check performance-wise. - posted by "Ferdy Galema (Updated) (JIRA)" <ji...@apache.org> on 2012/02/15 10:39:59 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1279) Check if limit has been reached in GeneraterReducer must be the first check performance-wise. - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/15 11:24:59 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1278) Fetch Improvement in threads per host - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/15 13:21:09 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1215) UpdateDB should not require segment as input - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/02/15 14:11:03 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1215) UpdateDB should not require segment as input - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/15 14:22:59 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1262) Map `duplicating` content-types to a single type - posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org> on 2012/02/15 14:29:02 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #162 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/16 05:02:20 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #1758 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/16 05:02:29 UTC, 0 replies.
- [jira] [Created] (NUTCH-1280) language-identifier should have option to use detected value by Tika even when uncertain - posted by "Ferdy Galema (Created) (JIRA)" <ji...@apache.org> on 2012/02/16 11:43:05 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1280) language-identifier should have option to use detected value by Tika even when uncertain - posted by "Ferdy Galema (Updated) (JIRA)" <ji...@apache.org> on 2012/02/16 11:44:59 UTC, 1 replies.
- [jira] [Commented] (NUTCH-809) Parse-metatags plugin - posted by "Rajasekar Karthik (Commented) (JIRA)" <ji...@apache.org> on 2012/02/16 21:31:00 UTC, 1 replies.
- Jenkins build is back to normal : Nutch-nutchgora #163 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/17 05:09:32 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #1759 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/17 05:18:55 UTC, 0 replies.
- issue in nutch-default.xml - posted by ka...@plutoz.com on 2012/02/17 08:15:58 UTC, 4 replies.
- [jira] [Commented] (NUTCH-1246) Upgrade to Hadoop 1.0.0 - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/17 16:11:59 UTC, 5 replies.
- [jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient - posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org> on 2012/02/17 16:15:59 UTC, 0 replies.
- [DISCUSS] Nutchgora 2.0 release - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/17 16:23:03 UTC, 5 replies.
- [jira] [Commented] (NUTCH-1001) bin/nutch fetch/parse handle crawl/segments directory - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/17 21:14:57 UTC, 4 replies.
- [jira] [Commented] (NUTCH-1079) StringBuffer converted to StringBuilder - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/17 21:28:57 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1193) Incorrect url transform to lowercase: parameter solr - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/17 21:48:58 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1193) Incorrect url transform to lowercase: parameter solr - posted by "Lewis John McGibbney (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/17 21:50:57 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1193) Incorrect url transform to lowercase: parameter solr - posted by "Lewis John McGibbney (Closed) (JIRA)" <ji...@apache.org> on 2012/02/17 21:50:57 UTC, 0 replies.
- Build failed in Jenkins: nutch-trunk-maven #153 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/17 22:02:28 UTC, 0 replies.
- Re: svn commit: r1245753 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/Crawl.java - posted by USC Mail <gg...@usc.edu> on 2012/02/18 00:00:46 UTC, 0 replies.
- Jenkins build is back to normal : nutch-trunk-maven #154 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/18 06:03:54 UTC, 0 replies.
- [jira] [Created] (NUTCH-1281) tika parser not work properly with unwanted file types that passed from filters in nutch - posted by "behnam nikbakht (Created) (JIRA)" <ji...@apache.org> on 2012/02/19 06:43:59 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1278) Fetch Improvement in threads per host - posted by "behnam nikbakht (Updated) (JIRA)" <ji...@apache.org> on 2012/02/19 10:24:34 UTC, 0 replies.
- [jira] [Created] (NUTCH-1282) linkdb scalability - posted by "behnam nikbakht (Created) (JIRA)" <ji...@apache.org> on 2012/02/19 11:16:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1281) tika parser not work properly with unwanted file types that passed from filters in nutch - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/19 12:38:34 UTC, 2 replies.
- [jira] [Created] (NUTCH-1283) Ridically update all Solr configuration in Nutchgora - posted by "Lewis John McGibbney (Created) (JIRA)" <ji...@apache.org> on 2012/02/19 13:50:34 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "NutchAdministrationUserInterface" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2012/02/19 15:22:46 UTC, 9 replies.
- [jira] [Commented] (NUTCH-929) Create a REST-based admin UI for Nutch - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/19 17:42:39 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1253) Incompatible neko and xerces versions - posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org> on 2012/02/19 18:50:36 UTC, 1 replies.
- [jira] [Updated] (NUTCH-728) Improve nutch release packaging - posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org> on 2012/02/19 19:08:34 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1276) Fix [dep-ann] - posted by "Lewis John McGibbney (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/19 19:26:36 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1276) Fix [dep-ann] - posted by "Lewis John McGibbney (Closed) (JIRA)" <ji...@apache.org> on 2012/02/19 19:26:36 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint arguement - posted by "Lewis John McGibbney (Assigned) (JIRA)" <ji...@apache.org> on 2012/02/19 19:28:36 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1273) Fix [deprecation] javac warnings - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/19 19:28:36 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1271) Fix errors @ compile time - posted by "Lewis John McGibbney (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/19 19:32:37 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1271) Fix errors @ compile time - posted by "Lewis John McGibbney (Closed) (JIRA)" <ji...@apache.org> on 2012/02/19 19:32:38 UTC, 0 replies.
- [jira] [Commented] (NUTCH-978) [GSoC 2011] A Plugin for extracting certain element of a web page on html page parsing. - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/19 19:42:42 UTC, 9 replies.
- [jira] [Updated] (NUTCH-978) [GSoC 2011] A Plugin for extracting certain element of a web page on html page parsing. - posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org> on 2012/02/19 19:44:39 UTC, 0 replies.
- [jira] [Created] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default. - posted by "Lewis John McGibbney (Created) (JIRA)" <ji...@apache.org> on 2012/02/19 19:58:40 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1276) Fix [dep-ann] - posted by "Hudson (Commented) (JIRA)" <ji...@apache.org> on 2012/02/19 20:02:34 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1283) Ridically update all Solr configuration in Nutchgora - posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org> on 2012/02/20 10:13:38 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1280) language-identifier should have option to use detected value by Tika even when uncertain - posted by "Ferdy Galema (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/20 10:40:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1053) Parsing of RSS feeds fails - posted by "Michael Kazekin (Updated) (JIRA)" <ji...@apache.org> on 2012/02/20 12:41:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1283) Radically update all Solr configuration in Nutchgora - posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org> on 2012/02/20 12:43:34 UTC, 0 replies.
- [jira] [Issue Comment Edited] (NUTCH-1053) Parsing of RSS feeds fails - posted by "Michael Kazekin (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/02/20 12:43:34 UTC, 0 replies.
- [jira] [Created] (NUTCH-1285) Debian Packaging for Nutch - posted by "Lewis John McGibbney (Created) (JIRA)" <ji...@apache.org> on 2012/02/20 12:51:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1285) Debian Packaging for Nutch - posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org> on 2012/02/20 12:53:34 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1277) Fix [fallthrough] javac warnings - posted by "Lewis John McGibbney (Closed) (JIRA)" <ji...@apache.org> on 2012/02/20 15:30:32 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1277) Fix [fallthrough] javac warnings - posted by "Lewis John McGibbney (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/20 15:30:32 UTC, 0 replies.
- [jira] [Commented] (NUTCH-965) Skip parsing for truncated documents - posted by "Ferdy Galema (Commented) (JIRA)" <ji...@apache.org> on 2012/02/20 15:30:33 UTC, 26 replies.
- [jira] [Created] (NUTCH-1286) Refactoring/reimplementing crawling API (NutchApp) - posted by "Ferdy Galema (Created) (JIRA)" <ji...@apache.org> on 2012/02/20 15:55:36 UTC, 0 replies.
- Build failed in Jenkins: nutch-trunk-maven #158 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/20 16:02:37 UTC, 2 replies.
- [jira] [Reopened] (NUTCH-1277) Fix [fallthrough] javac warnings - posted by "Lewis John McGibbney (Reopened) (JIRA)" <ji...@apache.org> on 2012/02/20 16:51:34 UTC, 0 replies.
- [jira] [Created] (NUTCH-1287) Upgrade to hsqldb 2.2.8 - posted by "Ferdy Galema (Created) (JIRA)" <ji...@apache.org> on 2012/02/20 16:57:34 UTC, 0 replies.
- Jenkins build is back to normal : nutch-trunk-maven #159 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/20 17:05:08 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1287) Upgrade to hsqldb 2.2.8 - posted by "Ferdy Galema (Closed) (JIRA)" <ji...@apache.org> on 2012/02/20 17:09:36 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1286) Refactoring/reimplementing crawling API (NutchApp) - posted by "Ferdy Galema (Updated) (JIRA)" <ji...@apache.org> on 2012/02/20 17:27:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1287) Upgrade to hsqldb 2.2.8 - posted by "Hudson (Commented) (JIRA)" <ji...@apache.org> on 2012/02/21 05:51:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1280) language-identifier should have option to use detected value by Tika even when uncertain - posted by "Hudson (Commented) (JIRA)" <ji...@apache.org> on 2012/02/21 05:51:34 UTC, 0 replies.
- Build failed in Jenkins: nutch-trunk-maven #160 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/21 06:02:16 UTC, 0 replies.
- [jira] [Created] (NUTCH-1288) Generator should not generate filter and not found and denied and gone and permanently moved pages - posted by "behnam nikbakht (Created) (JIRA)" <ji...@apache.org> on 2012/02/21 08:59:32 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1288) Generator should not generate filter and not found and denied and gone and permanently moved pages - posted by "behnam nikbakht (Updated) (JIRA)" <ji...@apache.org> on 2012/02/21 09:02:33 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1288) Generator should not generate filter and not found and denied and gone and permanently moved pages - posted by "Julien Nioche (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/21 11:15:34 UTC, 0 replies.
- slf4j-log4j12 new version causes runtime error - posted by kaveh minooie <ka...@plutoz.com> on 2012/02/22 00:47:39 UTC, 3 replies.
- I think I found a bug --> multiple_values_encountered_for_non_multiValued_field_title - posted by kaveh minooie <ka...@plutoz.com> on 2012/02/22 01:48:41 UTC, 1 replies.
- Build failed in Jenkins: Nutch-nutchgora #169 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/22 05:12:21 UTC, 0 replies.
- Jenkins build is back to normal : nutch-trunk-maven #161 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/22 06:06:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-965) Skip parsing for truncated documents - posted by "Ferdy Galema (Updated) (JIRA)" <ji...@apache.org> on 2012/02/22 09:40:49 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-965) Skip parsing for truncated documents - posted by "Lewis John McGibbney (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/22 12:06:50 UTC, 0 replies.
- [jira] [Closed] (NUTCH-965) Skip parsing for truncated documents - posted by "Lewis John McGibbney (Closed) (JIRA)" <ji...@apache.org> on 2012/02/22 12:08:50 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #170 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/23 05:12:49 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #1766 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/23 05:23:09 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #171 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/23 12:49:03 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #172 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/23 13:05:53 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1210) DomainBlacklistFilter - posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/23 13:33:48 UTC, 0 replies.
- Re: svn commit: r1292764 - in /nutch/trunk: ./ conf/ src/plugin/ src/plugin/urlfilter-domainblacklist/ src/plugin/urlfilter-domainblacklist/data/ src/plugin/urlfilter-domainblacklist/src/ src/plugin/urlfilter-domainblacklist/src/java/ src/plugin/urlf - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/23 13:36:35 UTC, 1 replies.
- Jenkins build is back to normal : Nutch-trunk #1767 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/24 05:30:44 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "WhichTechnicalConceptsAreBehindTheNutchPluginSystem" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2012/02/25 12:44:37 UTC, 0 replies.
- [jira] [Commented] (NUTCH-670) feed plugin does not parse RSS2 enclosures - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/25 12:53:48 UTC, 2 replies.
- Build failed in Jenkins: nutch-trunk-maven #172 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/26 06:02:51 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1286) Refactoring/reimplementing crawling API (NutchApp) - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/26 13:22:48 UTC, 1 replies.
- [jira] [Commented] (NUTCH-728) Improve nutch release packaging - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/26 15:06:48 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/26 15:08:48 UTC, 0 replies.
- Proposal to remove o.a.n.crawl.MapWritable from Nutch codebase. - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/26 15:58:25 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1273) Fix [deprecation] javac warnings - posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org> on 2012/02/26 17:26:48 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1001) bin/nutch fetch/parse handle crawl/segments directory - posted by "Gabriele Kahlout (Updated) (JIRA)" <ji...@apache.org> on 2012/02/26 19:24:49 UTC, 1 replies.
- [jira] [Issue Comment Edited] (NUTCH-1001) bin/nutch fetch/parse handle crawl/segments directory - posted by "Gabriele Kahlout (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/02/26 19:26:48 UTC, 0 replies.
- Jenkins build is back to normal : nutch-trunk-maven #173 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/02/27 06:05:00 UTC, 0 replies.
- [jira] [Created] (NUTCH-1289) In distributed mode URL's are not partitioned - posted by "Dan Rosher (Created) (JIRA)" <ji...@apache.org> on 2012/02/27 12:07:48 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1289) In distributed mode URL's are not partitioned - posted by "Dan Rosher (Updated) (JIRA)" <ji...@apache.org> on 2012/02/27 12:07:48 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/27 12:37:48 UTC, 3 replies.
- [jira] [Created] (NUTCH-1290) crawlId not supported by all Tools - posted by "Mathijs Homminga (Created) (JIRA)" <ji...@apache.org> on 2012/02/28 11:39:49 UTC, 0 replies.
- [nutchgora] AbstractFetchSchedule.forceFetch method resets fetch status - posted by Mathijs Homminga <ma...@kalooga.com> on 2012/02/28 14:09:25 UTC, 2 replies.
- [jira] [Commented] (NUTCH-945) Indexing to multiple SOLR Servers - posted by "Sujit Pal (Commented) (JIRA)" <ji...@apache.org> on 2012/02/29 02:24:04 UTC, 1 replies.
- [jira] [Updated] (NUTCH-945) Indexing to multiple SOLR Servers - posted by "Sujit Pal (Updated) (JIRA)" <ji...@apache.org> on 2012/02/29 02:26:05 UTC, 2 replies.
- Fwd: [blog post] Accumulo, Nutch, and Gora - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2012/02/29 03:24:45 UTC, 0 replies.
- [jira] [Created] (NUTCH-1291) Fetcher to stringify exception on // unexpected exception - posted by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/02/29 14:53:59 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1291) Fetcher to stringify exception on // unexpected exception - posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2012/02/29 14:55:56 UTC, 1 replies.
- NUTCH-1273 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/29 14:57:46 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1291) Fetcher to stringify exception on // unexpected exception - posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2012/02/29 14:59:56 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-1291) Fetcher to stringify exception on // unexpected exception - posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org> on 2012/02/29 15:13:56 UTC, 0 replies.
- [jira] [Created] (NUTCH-1292) Better exception logging and debugging during fetch. - posted by "Ferdy Galema (Created) (JIRA)" <ji...@apache.org> on 2012/02/29 16:41:58 UTC, 0 replies.