Posted to commits@nutch.apache.org by sn...@apache.org on 2018/06/02 11:35:16 UTC

[nutch] branch master updated (0cec7b5 -> 2544fad)

This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.


    from 0cec7b5  Merge pull request #335 from r0ann3l/NUTCH-2580
     add 4f73c63  NUTCH-2583 Upgrading Nutch's dependencies - apply patch contributed by Ralf
     add 20ecad2  NUTCH-2584 Upgrade parse-tika to use Tika 1.18
     add f5e3a30  NUTCH-2584 Upgrade parse-tika to use Tika 1.18
                  - fix failing unit tests
                  - use Tika parser to get DOM tree of test documents
                  - fix HTMLMetaProcessor to extract no-cache and base-href
                    attributes on DOM tree modified by Tika
                  - ignore links from FORM and SOURCE elements which are
                    not extracted by Tika parser
     add 217e646  Add target "report" to view dependency tree of plugins
     add 107b364  NUTCH-2589 HTML redirections are not followed when using parse-tika
                  - extract meta-refresh redirects from DOM tree normalized by Tika
                  - add unit test to check whether meta-refresh redirects are
                    extracted and parse status holds the redirect target
     new 2544fad  Merge pull request #336 from sebastian-nagel/NUTCH-2583-upgrade-dependencies

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
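
The NUTCH-2589 entry in the list above mentions extracting meta-refresh
redirects from a DOM tree. The following is only a minimal sketch of that
general idea using the JDK's own DOM API, not the code from
HTMLMetaProcessor. The class name MetaRefreshSketch, the method names
findRefreshTarget and parseTarget, and the example URL are all hypothetical,
and the sketch assumes a well-formed XHTML input and an unquoted "url=" value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not part of Nutch or Tika.
public class MetaRefreshSketch {

  /** Depth-first search for the first <meta http-equiv="refresh"> target, or null. */
  public static String findRefreshTarget(Node node) {
    if (node.getNodeType() == Node.ELEMENT_NODE
        && "meta".equalsIgnoreCase(node.getNodeName())) {
      Element meta = (Element) node;
      // getAttribute returns "" when the attribute is absent.
      if ("refresh".equalsIgnoreCase(meta.getAttribute("http-equiv"))) {
        String target = parseTarget(meta.getAttribute("content"));
        if (target != null) {
          return target;
        }
      }
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      String target = findRefreshTarget(children.item(i));
      if (target != null) {
        return target;
      }
    }
    return null;
  }

  /** Parses a content value such as "5; url=http://example.com/"; null if no URL part. */
  public static String parseTarget(String content) {
    int semi = content.indexOf(';');
    if (semi < 0) {
      return null; // plain refresh without a redirect target
    }
    String rest = content.substring(semi + 1).trim();
    if (!rest.toLowerCase(Locale.ROOT).startsWith("url=")) {
      return null;
    }
    String url = rest.substring("url=".length()).trim();
    return url.isEmpty() ? null : url;
  }

  public static void main(String[] args) throws Exception {
    String xhtml = "<html><head>"
        + "<meta http-equiv=\"Refresh\" content=\"0; URL=http://example.com/new\"/>"
        + "</head><body/></html>";
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    System.out.println(findRefreshTarget(doc));
  }
}
```

A real parser plugin would additionally have to resolve relative targets
against the page's base URL and record the redirect in the parse status,
which this sketch leaves out.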


Summary of changes:
 ivy/ivy.xml                                        |  67 +++++------
 src/plugin/build-plugin.xml                        |   4 +
 src/plugin/parse-tika/build.xml                    |  15 +--
 src/plugin/parse-tika/howto_upgrade_tika.txt       |  16 ++-
 src/plugin/parse-tika/ivy.xml                      |   2 +-
 src/plugin/parse-tika/plugin.xml                   |  65 +++++++----
 .../apache/nutch/parse/tika/HTMLMetaProcessor.java | 125 +++++++++++++--------
 .../org/apache/nutch/parse/tika/TikaParser.java    |  20 ++--
 .../{ => parse}/tika/TestDOMContentUtils.java      |  78 +++++++------
 .../nutch/{ => parse}/tika/TestFeedParser.java     |   2 +-
 .../nutch/{ => parse}/tika/TestHtmlParser.java     |   2 +-
 .../nutch/{ => parse}/tika/TestImageMetadata.java  |   2 +-
 .../nutch/{ => parse}/tika/TestMSWordParser.java   |   2 +-
 .../nutch/{ => parse}/tika/TestOOParser.java       |   2 +-
 .../nutch/{ => parse}/tika/TestPdfParser.java      |   2 +-
 .../nutch/{ => parse}/tika/TestRTFParser.java      |   2 +-
 .../{ => parse}/tika/TestRobotsMetaProcessor.java  |  70 ++++++++----
 17 files changed, 276 insertions(+), 200 deletions(-)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestDOMContentUtils.java (89%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestFeedParser.java (99%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestHtmlParser.java (99%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestImageMetadata.java (98%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestMSWordParser.java (98%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestOOParser.java (98%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestPdfParser.java (98%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestRTFParser.java (98%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestRobotsMetaProcessor.java (68%)

-- 
To stop receiving notification emails like this one, please contact
snagel@apache.org.

[nutch] 01/01: Merge pull request #336 from sebastian-nagel/NUTCH-2583-upgrade-dependencies

Posted by sn...@apache.org.
snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit 2544fad223faeaafef966d0f04ff00da9f749641
Merge: 0cec7b5 107b364
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Sat Jun 2 13:35:11 2018 +0200

    Merge pull request #336 from sebastian-nagel/NUTCH-2583-upgrade-dependencies
    
    NUTCH-2583 Upgrading Nutch's dependencies (contributed by Ralf)
    NUTCH-2584 Upgrade parse-tika to use Tika 1.18
    NUTCH-2589 HTML redirections are not followed when using parse-tika

 ivy/ivy.xml                                        |  67 +++++------
 src/plugin/build-plugin.xml                        |   4 +
 src/plugin/parse-tika/build.xml                    |  15 +--
 src/plugin/parse-tika/howto_upgrade_tika.txt       |  16 ++-
 src/plugin/parse-tika/ivy.xml                      |   2 +-
 src/plugin/parse-tika/plugin.xml                   |  65 +++++++----
 .../apache/nutch/parse/tika/HTMLMetaProcessor.java | 125 +++++++++++++--------
 .../org/apache/nutch/parse/tika/TikaParser.java    |  20 ++--
 .../{ => parse}/tika/TestDOMContentUtils.java      |  78 +++++++------
 .../nutch/{ => parse}/tika/TestFeedParser.java     |   2 +-
 .../nutch/{ => parse}/tika/TestHtmlParser.java     |   2 +-
 .../nutch/{ => parse}/tika/TestImageMetadata.java  |   2 +-
 .../nutch/{ => parse}/tika/TestMSWordParser.java   |   2 +-
 .../nutch/{ => parse}/tika/TestOOParser.java       |   2 +-
 .../nutch/{ => parse}/tika/TestPdfParser.java      |   2 +-
 .../nutch/{ => parse}/tika/TestRTFParser.java      |   2 +-
 .../{ => parse}/tika/TestRobotsMetaProcessor.java  |  70 ++++++++----
 17 files changed, 276 insertions(+), 200 deletions(-)
