Posted to dev@nutch.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/06/09 16:03:01 UTC

[jira] [Commented] (NUTCH-2788) ParseData: improve presentation of Metadata in method toString()

    [ https://issues.apache.org/jira/browse/NUTCH-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129461#comment-17129461 ] 

ASF GitHub Bot commented on NUTCH-2788:
---------------------------------------

sebastian-nagel opened a new pull request #529:
URL: https://github.com/apache/nutch/pull/529


   - switch to multi-line presentation of Metadata in ParseData::toString
   - default implementation of Metadata::toString is still single-line
   - replace StringBuffer with StringBuilder in the modified methods
   
   Parsechecker will now show metadata as follows:
   ```
   $> bin/nutch parsechecker -Dplugin.includes='parse-(tika|metatags)|protocol-okhttp' http://localhost/
   fetching: http://localhost/
   ...
   Title: Apache2 Ubuntu Default Page: It works
   Outlinks: 2
     outlink: toUrl: http://localhost/icons/ubuntu-logo.png anchor: Ubuntu Logo
     outlink: toUrl: http://localhost/manual anchor: manual
   Content Metadata:
     Accept-Ranges = bytes
     Keep-Alive = timeout=5, max=100
     nutch.fetch.time = 1591696071739
     Server = Apache/2.4.41 (Ubuntu)
     ETag = "2aa6-59647cb960db3-gzip"
     Connection = Keep-Alive
     Vary = Accept-Encoding
     Last-Modified = Fri, 01 Nov 2019 12:06:26 GMT
     Date = Tue, 09 Jun 2020 09:47:51 GMT
     Content-Type = text/html
   Parse Metadata:
     dc:title = Apache2 Ubuntu Default Page: It works
     Content-Encoding = UTF-8
     Content-Type-Hint = text/html; charset=UTF-8
     Content-Type = application/xhtml+xml; charset=UTF-8
   ```
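
For illustration, the difference between the single-line and the multi-line presentation can be sketched as below. This is only a minimal sketch and not the code from the pull request: the class `MetadataFormatSketch`, its method names, and the use of a plain `Map<String, List<String>>` in place of Nutch's Metadata class are all assumptions made for the example. The two-space indent and the ` = ` separator are copied from the parsechecker output above.

```java
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch only. This is NOT Nutch's Metadata or ParseData class;
 * the class and method names are hypothetical, and a plain Map stands in for
 * a multi-valued metadata container.
 */
final class MetadataFormatSketch {

  /**
   * Appends every name/value pair to a StringBuilder. StringBuilder is
   * unsynchronized, which is sufficient for a buffer local to one method call.
   */
  static String format(Map<String, List<String>> metadata, String prefix,
      String keyValueSeparator, String suffix) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
      // A name may carry several values; each one gets its own pair.
      for (String value : entry.getValue()) {
        sb.append(prefix).append(entry.getKey()).append(keyValueSeparator)
            .append(value).append(suffix);
      }
    }
    return sb.toString();
  }

  /** Single-line form (assumed layout): name=value pairs separated by spaces. */
  static String toSingleLine(Map<String, List<String>> metadata) {
    return format(metadata, "", "=", " ");
  }

  /** Multi-line form: one indented "name = value" pair per line. */
  static String toMultiLine(Map<String, List<String>> metadata) {
    return format(metadata, "  ", " = ", "\n");
  }
}
```

Keeping the prefix and separators as parameters of one helper lets a default single-line `toString()` and a multi-line variant share the same loop, which matches the intent of the first two bullets above.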


> ParseData: improve presentation of Metadata in method toString()
> ----------------------------------------------------------------
>
>                 Key: NUTCH-2788
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2788
>             Project: Nutch
>          Issue Type: Improvement
>          Components: metadata, parser
>    Affects Versions: 1.16
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.18
>
>
> See NUTCH-2567:
> bq. I would also suggest making the output of Metadata::toString more readable (for instance by adding a newline before each new metadata value). It would have made this bug way easier to spot inside the output of the parsechecker.


