- [nutch] branch master updated (55c7f75 -> 3f0ecdf) - posted by sn...@apache.org on 2017/12/05 09:39:40 UTC, 0 replies.
- [nutch] 01/05: NUTCH-2456: Redirected documents are not indexed - posted by sn...@apache.org on 2017/12/05 09:39:41 UTC, 0 replies.
- [nutch] 02/05: Code style fixes. - posted by sn...@apache.org on 2017/12/05 09:39:42 UTC, 0 replies.
- [nutch] 03/05: Allow index removals even if dbDatum is null. - posted by sn...@apache.org on 2017/12/05 09:39:43 UTC, 0 replies.
- [nutch] 04/05: Fix for previous commit - posted by sn...@apache.org on 2017/12/05 09:39:44 UTC, 0 replies.
- [nutch] 05/05: NUTCH-2456 Allow to index pages/URLs not contained in CrawlDb - posted by sn...@apache.org on 2017/12/05 09:39:45 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2468 should filter out invalid URLs by default - enable plugin urlfilter-validate by default - posted by sn...@apache.org on 2017/12/05 10:22:43 UTC, 0 replies.
- [nutch] branch 2.x updated: NUTCH-2468 should filter out invalid URLs by default - enable plugin urlfilter-validate by default - posted by sn...@apache.org on 2017/12/05 10:23:00 UTC, 0 replies.
- [nutch] branch master updated (d8754b7 -> 9931acc) - posted by sn...@apache.org on 2017/12/05 11:10:06 UTC, 0 replies.
- [nutch] 01/03: This suggested change seems to work. MalformedURLExceptions no longer occur. - posted by sn...@apache.org on 2017/12/05 11:10:07 UTC, 0 replies.
- [nutch] 02/03: NUTCH-2451 protocol-ftp to resolve relative URL when following redirects - return empty protocol output instead of throwing exception if relative redirect URL fails to resolve - format source code - complete LOG message - posted by sn...@apache.org on 2017/12/05 11:10:08 UTC, 0 replies.
- [nutch] 03/03: Merge branch 'NUTCH-2451' - cherry-picked e159ad4 from HiranChaudhuri:NUTCH-2451 - closes #241 - posted by sn...@apache.org on 2017/12/05 11:10:09 UTC, 0 replies.
- [nutch] branch 2.x updated: NUTCH-2451 protocol-ftp to resolve relative URL when following redirects - posted by sn...@apache.org on 2017/12/05 11:10:12 UTC, 0 replies.
- [nutch] branch 2.x updated: NUTCH-2469 Documents not committed to Solr in Server mode - applied patch contributed by Ninaad Joshi - posted by sn...@apache.org on 2017/12/05 11:42:18 UTC, 0 replies.
- [nutch] branch master updated (9931acc -> 708cc56) - posted by sn...@apache.org on 2017/12/05 12:23:39 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #252 from sebastian-nagel/nutch-2470-crawldb-reader-stats-quantiles - posted by sn...@apache.org on 2017/12/05 12:23:40 UTC, 0 replies.
- [nutch] branch master updated (708cc56 -> f483e52) - posted by jo...@apache.org on 2017/12/06 11:48:27 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #236 from jorgelbg/NUTCH-2399 - posted by jo...@apache.org on 2017/12/06 11:48:28 UTC, 0 replies.
- [nutch] branch 2.x updated (cc2f4ab -> 3486539) - posted by le...@apache.org on 2017/12/13 20:47:27 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #258 from lewismc/NUTCH-2438 - posted by le...@apache.org on 2017/12/13 20:47:28 UTC, 0 replies.
- [nutch] branch master updated (f483e52 -> d4a2b47) - posted by le...@apache.org on 2017/12/13 20:54:15 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #217 from pipldev/LanguageIndexingFilter1 - posted by le...@apache.org on 2017/12/13 20:54:16 UTC, 0 replies.
- [nutch] branch master updated (d4a2b47 -> 6b04090) - posted by le...@apache.org on 2017/12/13 21:30:24 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #253 from smartive/fix/indexer-elastic-rest-dependecy - posted by le...@apache.org on 2017/12/13 21:30:25 UTC, 0 replies.
- [nutch] branch master updated (6b04090 -> 0e3036b) - posted by sn...@apache.org on 2017/12/14 15:12:12 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #255 from sebastian-nagel/nutch-2474-crawldb-reader-stats-class-cast-exception - posted by sn...@apache.org on 2017/12/14 15:12:13 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2035 urlfilter-regex case insensitive rules - posted by sn...@apache.org on 2017/12/15 16:26:49 UTC, 0 replies.
- [nutch] branch 2.x updated: NUTCH-2035 urlfilter-regex case insensitive rules - posted by sn...@apache.org on 2017/12/15 16:50:58 UTC, 0 replies.
- [nutch] branch master updated (df14c8a -> f6bd25b) - posted by sn...@apache.org on 2017/12/15 17:17:12 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #259 from sebastian-nagel/nutch-2439-upgrade-tika-1.17 - posted by sn...@apache.org on 2017/12/15 17:17:13 UTC, 0 replies.
- [nutch] branch master updated (f6bd25b -> 310295c) - posted by sn...@apache.org on 2017/12/15 19:36:17 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #260 from sebastian-nagel/nutch-2480-upgrade-crawler-commons-0.9 - posted by sn...@apache.org on 2017/12/15 19:36:18 UTC, 0 replies.
- [nutch] branch master updated (310295c -> bda25c8) - posted by sn...@apache.org on 2017/12/15 19:37:44 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #261 from sebastian-nagel/nutch-2354-upgrade-hadoop-2.7.4 - posted by sn...@apache.org on 2017/12/15 19:37:45 UTC, 0 replies.
- [nutch] branch master updated (bda25c8 -> cfd8900) - posted by sn...@apache.org on 2017/12/15 20:48:29 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #262 from sebastian-nagel/nutch-2362-update-maxmind-geoip-dependency - posted by sn...@apache.org on 2017/12/15 20:48:30 UTC, 0 replies.
- [nutch] branch master updated (cfd8900 -> 45ce310) - posted by le...@apache.org on 2017/12/16 18:51:45 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #257 from smartive/feat/indexer-elastic-rest-languages - posted by le...@apache.org on 2017/12/16 18:51:46 UTC, 0 replies.
- [nutch] branch master updated (45ce310 -> d73f293) - posted by sn...@apache.org on 2017/12/17 11:33:32 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #263 from sebastian-nagel/nutch-2478-parser-resolve-base-url - posted by sn...@apache.org on 2017/12/17 11:33:33 UTC, 0 replies.
- [nutch] branch master updated (d73f293 -> 8e6cb9d) - posted by sn...@apache.org on 2017/12/17 13:15:51 UTC, 0 replies.
- [nutch] 01/03: fix for NUTCH-2477 (refactor checker classes) contributed by Jurian Broertjes - posted by sn...@apache.org on 2017/12/17 13:15:52 UTC, 0 replies.
- [nutch] 02/03: Improve command-line help for URL filter and normalizer checker - posted by sn...@apache.org on 2017/12/17 13:15:53 UTC, 0 replies.
- [nutch] 03/03: Merge branch 'sju:NUTCH-2431' contributed by Jurian Broertjes, closes #256 - posted by sn...@apache.org on 2017/12/17 13:15:54 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2322 URL not available for Jexl operations - apply patch contributed by Markus Jelsma - posted by sn...@apache.org on 2017/12/17 14:35:09 UTC, 0 replies.
- [nutch] branch master updated (8b3412a -> 2ce1177) - posted by sn...@apache.org on 2017/12/17 14:46:52 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #180 from smadha/NUTCH-2370 - posted by sn...@apache.org on 2017/12/17 14:46:53 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2034 CrawlDB update job to count documents in CrawlDb rejected by URL filters (patch contributed by Luis Lopez) - posted by sn...@apache.org on 2017/12/17 15:15:05 UTC, 0 replies.
- [nutch] branch 2.x updated: NUTCH-2358 HostInjectorJob doesn't work - posted by le...@apache.org on 2017/12/17 17:13:48 UTC, 0 replies.
- [nutch] branch master updated (961c725 -> fc89e4f) - posted by sn...@apache.org on 2017/12/18 15:49:44 UTC, 0 replies.
- [nutch] 01/23: fix for NUTCH-2370 contributed by msharan@usc.edu - posted by sn...@apache.org on 2017/12/18 15:49:45 UTC, 0 replies.
- [nutch] 02/23: NUTCH-2474 CrawlDbReader -stats fails with ClassCastException - replace CrawlDbStatCombiner by CrawlDbStatReducer and ensure that data is properly processed independently whether and how often combiner is called - simplify calculation of minimum and maximum - posted by sn...@apache.org on 2017/12/18 15:49:46 UTC, 0 replies.
- [nutch] 03/23: - filter out NaN scores which break the quantile calculation - posted by sn...@apache.org on 2017/12/18 15:49:47 UTC, 0 replies.
- [nutch] 04/23: Extend indexer-elastic-rest to support languages - posted by sn...@apache.org on 2017/12/18 15:49:48 UTC, 0 replies.
- [nutch] 05/23: fix formatting - posted by sn...@apache.org on 2017/12/18 15:49:49 UTC, 0 replies.
- [nutch] 06/23: add languages to default config - posted by sn...@apache.org on 2017/12/18 15:49:50 UTC, 0 replies.
- [nutch] 07/23: fix delete - posted by sn...@apache.org on 2017/12/18 15:49:51 UTC, 0 replies.
- [nutch] 08/23: NUTCH-2439 Upgrade Apache Tika dependency to 1.17 - posted by sn...@apache.org on 2017/12/18 15:49:52 UTC, 0 replies.
- [nutch] 09/23: Add tika-config.xml to suppress Tika warnings on stderr - posted by sn...@apache.org on 2017/12/18 15:49:53 UTC, 0 replies.
- [nutch] 10/23: make fully configurable - posted by sn...@apache.org on 2017/12/18 15:49:54 UTC, 0 replies.
- [nutch] 11/23: NUTCH-2480 Upgrade crawler-commons dependency to 0.9 - posted by sn...@apache.org on 2017/12/18 15:49:55 UTC, 0 replies.
- [nutch] 12/23: fix indentation - posted by sn...@apache.org on 2017/12/18 15:49:56 UTC, 0 replies.
- [nutch] 13/23: scope variables - posted by sn...@apache.org on 2017/12/18 15:49:57 UTC, 0 replies.
- [nutch] 14/23: NUTCH-2354 Upgrade Hadoop dependencies to 2.7.4 - posted by sn...@apache.org on 2017/12/18 15:49:58 UTC, 0 replies.
- [nutch] 15/23: NUTCH-2362 Upgrade MaxMind GeoIP version in index-geoip - posted by sn...@apache.org on 2017/12/18 15:49:59 UTC, 0 replies.
- [nutch] 16/23: NUTCH-2035 urlfilter-regex case insensitive rules - posted by sn...@apache.org on 2017/12/18 15:50:00 UTC, 0 replies.
- [nutch] 17/23: NUTCH-2478 HTML parser should resolve base URL - fix parse-html and parse-tika - add unit test for parse-html - posted by sn...@apache.org on 2017/12/18 15:50:01 UTC, 0 replies.
- [nutch] 18/23: NUTCH-2478 HTML parser should resolve base URL - finally fix parse-tika: - href attribute of base element dropped in DOM - need to call tikamd.get("Content-Location") - port HTML parser test from parse-html to parse-tika - add method to DomUtil which prints DocumentFragment - posted by sn...@apache.org on 2017/12/18 15:50:02 UTC, 0 replies.
- [nutch] 19/23: fix for NUTCH-2477 (refactor checker classes) contributed by Jurian Broertjes - posted by sn...@apache.org on 2017/12/18 15:50:03 UTC, 0 replies.
- [nutch] 20/23: Improve command-line help for URL filter and normalizer checker - posted by sn...@apache.org on 2017/12/18 15:50:04 UTC, 0 replies.
- [nutch] 21/23: NUTCH-2322 URL not available for Jexl operations - apply patch contributed by Markus Jelsma - posted by sn...@apache.org on 2017/12/18 15:50:05 UTC, 0 replies.
- [nutch] 22/23: NUTCH-2034 CrawlDB update job to count documents in CrawlDb rejected by URL filters (patch contributed by Luis Lopez) - posted by sn...@apache.org on 2017/12/18 15:50:06 UTC, 0 replies.
- [nutch] 23/23: NUTCH-2415 Create a JEXL based IndexingFilter Merge branch 'pipldev-index-jexl-filter', closes #219 - posted by sn...@apache.org on 2017/12/18 15:50:07 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2380 Upgrade indexer-elastic to Elasticsearch version 5.3.0 (contributed by Jurian Broertjes) - posted by sn...@apache.org on 2017/12/18 16:23:52 UTC, 0 replies.
- [nutch] branch master updated (dd94a61 -> c6e5dfb) - posted by sn...@apache.org on 2017/12/18 16:29:53 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #264 from sebastian-nagel/nutch-2365-fetcher-redirects-mode - posted by sn...@apache.org on 2017/12/18 16:29:54 UTC, 0 replies.
- [nutch] branch master updated (c6e5dfb -> 30db933) - posted by sn...@apache.org on 2017/12/18 16:33:56 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #266 from sebastian-nagel/nutch-2295-docker - posted by sn...@apache.org on 2017/12/18 16:33:57 UTC, 0 replies.
- [nutch] branch master updated (30db933 -> c274029) - posted by sn...@apache.org on 2017/12/18 17:12:47 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #265 from sebastian-nagel/nutch-2483-remove-dependency-on-org-json - posted by sn...@apache.org on 2017/12/18 17:12:48 UTC, 0 replies.
- [nutch] branch master updated: NUTCH-2353 Create seed file with metadata using the REST API - reverse commits 0312bae38c9e95d496336dc24133b15ebefd4d3c and 7deb576bc58bb74725cbb6c5d82d7b9244c6ad42 to fix exception in Nutch webapp - posted by sn...@apache.org on 2017/12/18 17:14:18 UTC, 0 replies.
- [nutch] branch branch-1.14 created (now a8e60bd) - posted by sn...@apache.org on 2017/12/18 19:09:36 UTC, 0 replies.
- [nutch] 01/01: Nutch 1.14 release - update version number - add changes / release notes - posted by sn...@apache.org on 2017/12/18 19:09:37 UTC, 0 replies.
- [nutch] annotated tag release-1.14 updated (a8e60bd -> af6d141) - posted by sn...@apache.org on 2017/12/18 21:28:54 UTC, 0 replies.
- svn commit: r23782 - /release/nutch/KEYS - posted by sn...@apache.org on 2017/12/18 21:46:37 UTC, 0 replies.
- svn commit: r23783 - in /dev/nutch: ./ 1.14/ - posted by sn...@apache.org on 2017/12/18 22:07:02 UTC, 0 replies.
- [nutch] branch master updated (dae62f8 -> 9e0c316) - posted by le...@apache.org on 2017/12/19 14:56:27 UTC, 0 replies.
- [nutch] 01/01: Merge pull request #267 from smartive/fix/NUTCH-2486-unsafe-warning - posted by le...@apache.org on 2017/12/19 14:56:28 UTC, 0 replies.
- [nutch] branch 2.x updated: Nutch 2.X GeneratorJob creates NullPointerException when using DataFileAvroStore - posted by le...@apache.org on 2017/12/21 14:48:52 UTC, 1 reply.
- svn commit: r23868 - /release/nutch/1.14/ - posted by sn...@apache.org on 2017/12/22 17:55:46 UTC, 0 replies.
- svn commit: r23869 - /dev/nutch/1.14/ - posted by sn...@apache.org on 2017/12/22 17:56:00 UTC, 0 replies.
- svn commit: r1819181 - in /nutch/cms_site/trunk/content: ./ apidocs/apidocs-1.14/ apidocs/apidocs-1.14/org/ apidocs/apidocs-1.14/org/apache/ apidocs/apidocs-1.14/org/apache/nutch/ apidocs/apidocs-1.14/org/apache/nutch/analysis/ apidocs/apidocs-1.14/org... - posted by sn...@apache.org on 2017/12/23 20:37:27 UTC, 0 replies.
- svn commit: r1022673 - in /websites/staging/nutch/trunk/content: ./ apidocs/apidocs-1.14/ apidocs/apidocs-1.14/org/ apidocs/apidocs-1.14/org/apache/ apidocs/apidocs-1.14/org/apache/nutch/ apidocs/apidocs-1.14/org/apache/nutch/analysis/ apidocs/apidocs-... - posted by bu...@apache.org on 2017/12/24 16:21:19 UTC, 0 replies.
- svn commit: r1819242 - /nutch/cms_site/trunk/content/javadoc.md - posted by sn...@apache.org on 2017/12/25 11:36:22 UTC, 0 replies.
- svn commit: r1022700 - in /websites/staging/nutch/trunk/content: ./ javadoc.html - posted by bu...@apache.org on 2017/12/25 11:37:21 UTC, 0 replies.
- svn commit: r1022708 - /websites/production/nutch/content/ - posted by sn...@apache.org on 2017/12/25 17:34:45 UTC, 0 replies.
- svn commit: r23905 - /release/nutch/1.13/ - posted by sn...@apache.org on 2017/12/25 17:39:15 UTC, 0 replies.
- [nutch] branch master updated: Prepare for new development after release of 1.14, bump - version number (1.14 -> 1.15-SNAPSHOT) - year (2017 -> 2018) - posted by sn...@apache.org on 2017/12/25 17:56:10 UTC, 0 replies.