Posted to commits@nutch.apache.org by sn...@apache.org on 2017/09/29 11:46:43 UTC

[nutch] branch master updated (da64358 -> 777e759)

This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.


    from da64358  Merge pull request #227 from kpm1985/NUTCH-2436
     add 7db1173  NUTCH-2433 / Html Parser: keep htmltag where the outlinks are found
     add ca59744  New configuration parameter: 'parser.html.outlinks.htmlnode_metadata_name' set empty value as default.
     add bfd47db  Small adjustment: Keep a reference to the last outlink to set metadata.
     add 3067753  Apply new parameter "parser.html.outlinks.ignore_tags" to the tika parser, as well. Some extra [eclipse-codeformat.xml] formatting changes applied as well.
     new 777e759  Merge pull request #224 from maborec/NUTCH-2433

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 conf/nutch-default.xml                             |  7 ++++
 .../apache/nutch/parse/html/DOMContentUtils.java   | 24 ++++++++++++--
 .../apache/nutch/parse/tika/DOMContentUtils.java   | 37 +++++++++++++++++-----
 3 files changed, 58 insertions(+), 10 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
"commits@nutch.apache.org" <co...@nutch.apache.org>.

[nutch] 01/01: Merge pull request #224 from maborec/NUTCH-2433

Posted by sn...@apache.org.

snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit 777e759ada24eac84072a5f1722938442432eadc
Merge: da64358 3067753
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Fri Sep 29 13:46:40 2017 +0200

    Merge pull request #224 from maborec/NUTCH-2433
    
    Nutch 2433 - New configuration for HTML parser to keep the HTML nodes in outlinks metadata

