Posted to dev@nutch.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2010/08/08 21:06:24 UTC

[jira] Resolved: (NUTCH-564) External parser supports encoding attribute

     [ https://issues.apache.org/jira/browse/NUTCH-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved NUTCH-564.
-------------------------------------

    Fix Version/s: 2.0
       Resolution: Fixed

- Patch applied in r983472. Thanks, Antony! One thing to note: ext-parser hasn't been brought up to date with Nutch 2.0, so I imagine we'll have to do that before we release 2.0. I'll file an issue for that shortly.

> External parser supports encoding attribute
> -------------------------------------------
>
>                 Key: NUTCH-564
>                 URL: https://issues.apache.org/jira/browse/NUTCH-564
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 0.9.0
>         Environment: All
>            Reporter: Antony Bowesman
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>             Fix For: 2.0
>
>         Attachments: ExtParser_0.9.0.patch, ExtParser_1.0.0.patch
>
>
> When an external component generates text and returns it to the external parser, the parser always converts that text using the platform default character set (os.toString()). For example, the returned text may be UTF-8, but it will not be converted to a String correctly unless UTF-8 happens to be the default.
> I added an encoding attribute to the <implementation> XML in plugin.xml; its value is then used to convert the text.
> I have tested my original fix on my local 0.9 installation and include a patch for it. I have also made a patch for trunk, but that one is untested.
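
The behaviour described in the issue can be illustrated with a minimal Java sketch. This is not the code from the attached patches: the class name ExtParserEncodingSketch and the helper decode() are hypothetical, and the sketch only assumes that the external command's output is captured in a ByteArrayOutputStream named os, as the reference to os.toString() suggests.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class ExtParserEncodingSketch {

    // Hypothetical helper (not the actual patch): decode the captured output
    // with an explicitly configured encoding. When no encoding is configured,
    // fall back to the platform default, which is the behaviour the issue reports.
    public static String decode(ByteArrayOutputStream os, String encoding)
            throws UnsupportedEncodingException {
        if (encoding == null || encoding.isEmpty()) {
            return os.toString();      // platform default charset
        }
        return os.toString(encoding);  // charset named in the configuration
    }

    public static void main(String[] args) throws IOException {
        // Simulate an external command that emits UTF-8 bytes.
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        os.write("caf\u00e9".getBytes(StandardCharsets.UTF_8));

        // Decoding with the right charset round-trips the text.
        System.out.println(decode(os, "UTF-8").equals("caf\u00e9"));

        // Decoding with the wrong charset turns the two-byte sequence for
        // U+00E9 into two separate characters.
        System.out.println(decode(os, "ISO-8859-1").length());
    }
}
```

Passing a null or empty encoding keeps the old default-charset behaviour, so a configuration without the attribute would continue to work in this sketch.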

-- 
This message is automatically generated by JIRA.