- Build failed in Jenkins: Nutch-nutchgora #1213 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/11/01 05:03:18 UTC, 0 replies.
- [jira] [Created] (NUTCH-1886) Review and update default.properties - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 17:36:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1843) Upgrade to Gora 0.5 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 17:45:34 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1843) Upgrade to Gora 0.5 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 17:46:33 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #1214 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/11/01 18:41:22 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 18:53:33 UTC, 2 replies.
- [jira] [Work started] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 18:54:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:07:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1791) Null pointer exceptions with gora-cassandra-0.4 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:11:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-840) Port tests from parse-html to parse-tika - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:12:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1855) Upgrade Hadoop dependencies to Hadoop 2 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:12:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1886) Review and update default.properties - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:14:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1884) NullPointerException in parsechecker and indexchecker with symlinks in file URL - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:15:33 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1820) remove field "orig" which duplicates "id" - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:33:34 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1820) remove field "orig" which duplicates "id" - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:33:34 UTC, 0 replies.
- [jira] [Work started] (NUTCH-1820) remove field "orig" which duplicates "id" - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:45:33 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1820) remove field "orig" which duplicates "id" - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:45:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1885) Protocol-file should treat symbolic links as redirects - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:46:33 UTC, 1 reply.
- [jira] [Updated] (NUTCH-1878) urlnormalizer-regex to keep third slash in file:///path/index.html - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:47:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1644) Should have a parser that uses xpath - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:47:34 UTC, 5 replies.
- Patch reviews for 2.X - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/11/01 19:51:07 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1679) UpdateDb using batchId, link may override crawled page. - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:51:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1823) Upgrade to elasticsearch 1.2/1.3 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:52:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1880) URLUtil should not add additional slashes for file URLs - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:52:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1879) Regex URL normalizer should remove multiple slashes after file: protocol - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/01 19:52:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/11/01 20:02:33 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1644) Should have a parser that uses xpath - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/02 01:15:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1791) Null pointer exceptions with gora-cassandra-0.4 - posted by "Renato Javier Marroquín Mogrovejo (JIRA)" <ji...@apache.org> on 2014/11/02 14:07:33 UTC, 6 replies.
- NSF DataViz Hackathon for Polar CyberInfrastructure: New York, NY 11/3/2014 - 11/4/2014 Call for Remote Participation - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2014/11/02 18:37:12 UTC, 1 reply.
- Nutch 2.X question - posted by amit sehas <cu...@yahoo.com> on 2014/11/04 19:26:23 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1870) Generic xsl parser plugin - posted by "Albinscode (JIRA)" <ji...@apache.org> on 2014/11/04 20:54:34 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1870) Generic xsl parser plugin - posted by "Albinscode (JIRA)" <ji...@apache.org> on 2014/11/04 20:55:34 UTC, 1 reply.
- [jira] [Comment Edited] (NUTCH-1870) Generic xsl parser plugin - posted by "Albinscode (JIRA)" <ji...@apache.org> on 2014/11/04 20:56:35 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/04 22:13:34 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1878) urlnormalizer-regex to keep third slash in file:///path/index.html - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/04 22:15:34 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1879) Regex URL normalizer should remove multiple slashes after file: protocol - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/04 22:15:36 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1885) Protocol-file should treat symbolic links as redirects - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/04 22:16:34 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1880) URLUtil should not add additional slashes for file URLs - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/04 22:16:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1879) Regex URL normalizer should remove multiple slashes after file: protocol - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/11/04 22:53:34 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1885) Protocol-file should treat symbolic links as redirects - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/11/04 22:53:35 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1880) URLUtil should not add additional slashes for file URLs - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/11/04 22:53:35 UTC, 1 reply.
- [jira] [Created] (NUTCH-1887) Specify HTMLMapper to use in TikaParser - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/11/05 16:45:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1887) Specify HTMLMapper to use in TikaParser - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/11/05 16:51:33 UTC, 1 reply.
- Re: dev Digest 4 Nov 2014 21:53:35 -0000 Issue 1905 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/11/05 21:19:44 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1887) Specify HTMLMapper to use in TikaParser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/11/05 21:28:34 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-1825) protocol-http may hang for certain web pages - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/06 22:54:34 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1884) NullPointerException in parsechecker and indexchecker with symlinks in file URL - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/06 23:01:35 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1825) protocol-http may hang for certain web pages - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/11/06 23:43:34 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1884) NullPointerException in parsechecker and indexchecker with symlinks in file URL - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/11/06 23:52:35 UTC, 0 replies.
- Re: Nutch 2.3 - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/11/07 04:31:48 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/11/07 05:04:34 UTC, 0 replies.
- [jira] [Created] (NUTCH-1888) Specify HTMLMapper to use in TikaParser - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/11/07 10:58:33 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1887) Specify HTMLMapper to use in TikaParser - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/11/07 11:00:47 UTC, 0 replies.
- [jira] [Created] (NUTCH-1889) Store all values from Tika metadata in Nutch metadata - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/11/07 11:25:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1889) Store all values from Tika metadata in Nutch metadata - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/11/07 11:42:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1592) XPath works on documents parsed with parse-html but not parse-tika - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/11/07 15:43:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1883) bin/crawl: use function to run bin/nutch and check exit value - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/11/07 15:50:34 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1140) index-more plugin, resetTitle method creates multiple values in the Title field - posted by "kaveh minooie (JIRA)" <ji...@apache.org> on 2014/11/07 20:07:36 UTC, 3 replies.
- Re: svn commit: r1637236 - in /nutch: branches/2.x/ branches/2.x/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/ trunk/ trunk/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/ - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2014/11/08 18:17:50 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-1883) bin/crawl: use function to run bin/nutch and check exit value - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/09 17:04:34 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1883) bin/crawl: use function to run bin/nutch and check exit value - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/09 18:28:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1829) Generator : unable to distinguish real errors - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/09 18:35:34 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-1829) Generator : unable to distinguish real errors - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/09 18:35:34 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1829) Generator : unable to distinguish real errors - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/09 18:36:34 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1883) bin/crawl: use function to run bin/nutch and check exit value - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/11/11 17:21:35 UTC, 0 replies.
- [jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support - posted by "sarath chandra chama (JIRA)" <ji...@apache.org> on 2014/11/12 10:02:35 UTC, 0 replies.
- [jira] [Created] (NUTCH-1890) Major Typo in Documentation - posted by "Boadu Akoto Charles Jnr (JIRA)" <ji...@apache.org> on 2014/11/17 10:17:33 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1890) Major Typo in Documentation - posted by "Boadu Akoto Charles Jnr (JIRA)" <ji...@apache.org> on 2014/11/17 11:28:33 UTC, 1 reply.
- [jira] [Work started] (NUTCH-1890) Major Typo in Documentation - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/11/17 21:56:34 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1890) Major Typo in Documentation - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/11/17 21:56:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1890) Major Typo in Documentation - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/11/17 21:56:34 UTC, 1 reply.
- [GitHub] nutch pull request: Fix for NUTCH-1890 - posted by chrismattmann <gi...@git.apache.org> on 2014/11/17 22:44:17 UTC, 1 reply.
- [jira] [Updated] (NUTCH-1890) Major Typo in Documentation for Integrating Nutch and Solr - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/11/17 22:52:34 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1890) Major Typo in Documentation for Integrating Nutch and Solr - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/11/17 22:53:34 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1890) Major Typo in Documentation for Integrating Nutch and Solr - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/11/17 22:54:34 UTC, 0 replies.
- Nutch ContributorsGroup on Wiki and admin group on Wiki += mattmann - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2014/11/17 22:57:58 UTC, 1 reply.
- [Nutch Wiki] Trivial Update of "ContributorsGroup" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2014/11/18 05:56:04 UTC, 1 reply.
- [Nutch Wiki] Trivial Update of "AdminGroup" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2014/11/18 05:56:07 UTC, 0 replies.
- [Nutch Wiki] Update of "NutchTutorial" by ChrisMattmann - posted by Apache Wiki <wi...@apache.org> on 2014/11/18 05:58:17 UTC, 0 replies.
- Where happens the inject of Redirects and outlinks? - posted by Alfonso Nishikawa <al...@gmail.com> on 2014/11/18 18:26:49 UTC, 3 replies.
- ExceptionInInitializerError caused by NPE - posted by MengYing Wang <me...@gmail.com> on 2014/11/19 16:20:54 UTC, 0 replies.
- Re: [nsf-polar-usc-students] ExceptionInInitializerError caused by NPE - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/11/19 17:42:40 UTC, 6 replies.
- [Nutch Wiki] Update of "PluginCentral" by JorgeLuis - posted by Apache Wiki <wi...@apache.org> on 2014/11/19 22:36:20 UTC, 1 reply.
- Nutch in Windows: Failed to set permissions of path - posted by MengYing Wang <me...@gmail.com> on 2014/11/20 19:29:51 UTC, 0 replies.
- Re: [nsf-polar-usc-students] Nutch in Windows: Failed to set permissions of path - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/11/20 19:35:10 UTC, 1 reply.
- [jira] [Created] (NUTCH-1891) Can't run nutch2.3-snapshot on hadoop2.4.0 using gora0.5 and mongodb as backend datastore - posted by "wilco sheh (JIRA)" <ji...@apache.org> on 2014/11/23 04:58:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1891) Can't run nutch2.3-snapshot on hadoop2.4.0 using gora0.5 and mongodb as backend datastore - posted by "wilco sheh (JIRA)" <ji...@apache.org> on 2014/11/23 05:01:12 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1891) Can't run nutch2.3-snapshot on hadoop2.4.0 using gora0.5 and mongodb as backend datastore - posted by "wilco sheh (JIRA)" <ji...@apache.org> on 2014/11/23 09:24:12 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1891) Can't run nutch2.3-snapshot on hadoop2.4.0 using gora0.5 and mongodb as backend datastore - posted by "wilco sheh (JIRA)" <ji...@apache.org> on 2014/11/25 07:59:12 UTC, 0 replies.
- [Nutch Wiki] Update of "Presentations" by SebastianNagel - posted by Apache Wiki <wi...@apache.org> on 2014/11/25 22:06:53 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2014/11/26 01:49:21 UTC, 0 replies.
- [Nutch Wiki] Trivial Update of "Nutch2Roadmap" by LewisJohnMcgibbney - posted by Apache Wiki <wi...@apache.org> on 2014/11/26 02:02:01 UTC, 0 replies.
- Query about indexing crawled data from Nutch to Solr - posted by Prashant Shekar <sh...@gmail.com> on 2014/11/26 19:33:04 UTC, 4 replies.
- [jira] [Created] (NUTCH-1892) Update the FileDumper tool to fetch only those URLs with status db_fetched in nutch - posted by "Prasanth Iyer (JIRA)" <ji...@apache.org> on 2014/11/26 22:48:12 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1877) Suffix URL to ignore query string by default - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/11/28 10:47:12 UTC, 3 replies.
- [jira] [Commented] (NUTCH-1877) Suffix URL to ignore query string by default - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/11/28 10:47:13 UTC, 0 replies.