Posted to dev@nutch.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/09/01 15:49:00 UTC

[jira] [Commented] (NUTCH-2696) Nutch SegmentReader does not dump non-ASCII characters with Hadoop 3.x

    [ https://issues.apache.org/jira/browse/NUTCH-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920421#comment-16920421 ] 

ASF GitHub Bot commented on NUTCH-2696:
---------------------------------------

sebastian-nagel commented on pull request #440: NUTCH-2696 Nutch SegmentReader does not dump non-ASCII characters with Hadoop 3.x
URL: https://github.com/apache/nutch/pull/440


> Nutch SegmentReader does not dump non-ASCII characters with Hadoop 3.x
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-2696
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2696
>             Project: Nutch
>          Issue Type: Bug
>          Components: segment
>    Affects Versions: 1.15
>         Environment: Hadoop version : 3.0.0 (CDH 6.1)
> Nutch : 1.15
> Mode : distributed mode
>            Reporter: Laurent Hervaud
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.16
>
>
> All Nutch tasks work properly with Hadoop 3.x, except SegmentReader.
>  SegmentReader with the -get option works fine.
>  SegmentReader with the -dump option replaces each non-ASCII character with ?
> Example URL: [http://www.wikipedia.fr/index.php]
>  
> {code:java}
> command : ./runtime/deploy/bin/nutch readseg -dump /user/nutch/crawl1.15/segments/20190221093756 /tmp/dump1.15 -nocontent -nogenerate -noparse -noparsedata
> ParseText::
>  Wikipedia.fr - Portail de recherche sur les projets Wikim?dia
>  Chercher sur Wikip?dia en fran?ais
>  L?encyclop?die librement r?utilisable que chacun peut am?liorer.
> {code}
>  
>  
> {code:java}
> command : ./runtime/deploy/bin/nutch readseg -get /user/nutch/crawl1.15/segments/20190221093756 http://www.wikipedia.fr/index.php -nocontent -nogenerate -noparse -noparsedata
> ParseText::
>  Wikipedia.fr - Portail de recherche sur les projets Wikimédia
>  Chercher sur Wikipédia en français
>  L’encyclopédie librement réutilisable que chacun peut améliorer.
> {code}
>  
> I tried building with Hadoop 3.0.0 dependencies in ivy.xml, but I got the same result.
> It works fine in local mode.
>  
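
The `?` substitutions in the -dump output above look like what Java produces when text is encoded with a charset that cannot represent the characters. The report does not state the root cause, so the following is only a minimal sketch under the assumption that the dump is written through a writer using a non-UTF-8 charset (for example a JVM default charset in the task containers). The class `CharsetDumpSketch` and its `dump` method are hypothetical names for illustration and are not part of Nutch.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not Nutch code.
public class CharsetDumpSketch {

  /**
   * Writes the text through a PrintWriter that encodes with the given
   * charset, then decodes the resulting bytes as UTF-8 for inspection.
   */
  public static String dump(String text, Charset charset) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Closing the writer flushes the encoder into the byte buffer.
    try (PrintWriter writer = new PrintWriter(new OutputStreamWriter(out, charset))) {
      writer.print(text);
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Unicode escapes keep this source file independent of the compiler's encoding.
    String line = "Chercher sur Wikip\u00e9dia en fran\u00e7ais";

    // Assumed failure mode: an ASCII-only charset replaces unmappable characters.
    System.out.println(dump(line, StandardCharsets.US_ASCII));

    // An explicit UTF-8 charset preserves the accented characters.
    System.out.println(dump(line, StandardCharsets.UTF_8));
  }
}
```

If this assumption holds, it would also be consistent with the observation that local mode works, since the local JVM may simply run with a UTF-8 default charset. Passing an explicit charset to the writer removes the dependency on the environment.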
