Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/04/05 13:19:05 UTC

[jira] [Commented] (NUTCH-974) Parsing Error in Nutch 1.2 on Windows7

    [ https://issues.apache.org/jira/browse/NUTCH-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015864#comment-13015864 ] 

Markus Jelsma commented on NUTCH-974:
-------------------------------------

Niksa, I tested a fetch and parse cycle of that URL with both Nutch 1.1 and Nutch 1.2 without any problems. You have something misconfigured, probably somewhere in parse-plugins. Next time, please open a thread on the Nutch user mailing list before opening an issue in Jira.

Thanks.

> Parsing Error in Nutch 1.2 on Windows7
> --------------------------------------
>
>                 Key: NUTCH-974
>                 URL: https://issues.apache.org/jira/browse/NUTCH-974
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.2
>         Environment: Windows7 64-bit, Cygwin 1.7.9-1
>            Reporter: Niksa Jakovljevic
>
> The Hello World crawling example does not work with the Nutch 1.2 libs, but works fine with the Nutch 1.1 libs. Note that the same configuration is used with both Nutch 1.2 and Nutch 1.1.
> Nutch 1.2 always throws the following exception:
> 2011-04-01 16:33:45,177 WARN  parse.ParseUtil - Unable to successfully parse content http://www.test.com/ of type text/html
> 2011-04-01 16:33:45,177 WARN  fetcher.Fetcher - Error parsing: http://www.test.com/: failed(2,200): org.apache.nutch.parse.ParseException: Unable to successfully parse content
> Thanks,
> Niksa Jakovljevic
