Posted to dev@nutch.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2010/09/17 21:07:35 UTC

[jira] Resolved: (NUTCH-906) Nutch OpenSearch sometimes raises DOMExceptions due to Lucene column names not being valid XML tag names

     [ https://issues.apache.org/jira/browse/NUTCH-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  resolved NUTCH-906.
-------------------------------------

    Fix Version/s: 1.2
       Resolution: Fixed

Fixed in rev. 998261. Thanks!

> Nutch OpenSearch sometimes raises DOMExceptions due to Lucene column names not being valid XML tag names
> --------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-906
>                 URL: https://issues.apache.org/jira/browse/NUTCH-906
>             Project: Nutch
>          Issue Type: Bug
>          Components: web gui
>    Affects Versions: 1.1
>         Environment: Debian GNU/Linux 64-bit
>            Reporter: Asheesh Laroia
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.2
>
>         Attachments: 0001-OpenSearch-If-a-Lucene-column-name-begins-with-a-num.patch
>
>   Original Estimate: 0.33h
>  Remaining Estimate: 0.33h
>
> The Nutch FAQ explains that OpenSearch includes "all fields that are available at search result time." However, a Lucene column name can start with a digit, and a valid XML tag name cannot. When Nutch generates OpenSearch results for a document that has a Lucene column whose name starts with a digit, the underlying Xerces library throws this exception:
> org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
> So I have written a patch that tests strings before they are used to generate tags within OpenSearch.
> I hope you merge this, or a better version of the patch!
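
For illustration only, the kind of check described above might look like the following sketch. It is not the code from the attached patch or from rev. 998261. The class name XmlTagNames, the methods toValidTagName and appendField, and the choice of '_' as the replacement character are all hypothetical. The sketch accepts only a conservative ASCII subset of what XML 1.0 allows in a name, so every name it returns is safe to pass to Document.createElement.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper; an assumption for illustration, not the actual Nutch fix. */
final class XmlTagNames {

    private XmlTagNames() {}

    /** Characters accepted at the start of a tag name (conservative ASCII subset). */
    private static boolean isNameStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    /** Characters accepted after the first position of a tag name. */
    private static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    /**
     * Maps an arbitrary field name to a string that is a valid XML tag name.
     * A leading '_' is added when the first character cannot start a name,
     * and every other disallowed character is replaced by '_'.
     */
    static String toValidTagName(String fieldName) {
        if (fieldName == null || fieldName.isEmpty()) {
            return "_";
        }
        StringBuilder sb = new StringBuilder(fieldName.length() + 1);
        if (!isNameStart(fieldName.charAt(0))) {
            sb.append('_');
        }
        for (int i = 0; i < fieldName.length(); i++) {
            char c = fieldName.charAt(i);
            sb.append(isNameChar(c) ? c : '_');
        }
        return sb.toString();
    }

    /** Appends <tag>value</tag> to parent, sanitizing the tag name first. */
    static Element appendField(Document doc, Element parent, String fieldName, String value) {
        Element child = doc.createElement(toValidTagName(fieldName));
        child.appendChild(doc.createTextNode(value));
        parent.appendChild(child);
        return child;
    }
}
```

With this sketch, a field named "2ndTitle" would be emitted as the tag "_2ndTitle" rather than triggering INVALID_CHARACTER_ERR. Note that two different field names can map to the same tag name, so a real fix may prefer to skip such fields or to carry the original name in an attribute.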
