Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2012/07/07 23:34:34 UTC

[jira] [Resolved] (TIKA-518) Attribute values are not indexed

     [ https://issues.apache.org/jira/browse/TIKA-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-518.
--------------------------------

    Resolution: Won't Fix

Resolving as Won't Fix: in most common cases, XML attribute values are not interesting from a text extraction perspective. Document types for which extracting attribute values makes sense should have their own parser classes.
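
For a document type where attribute values do matter, the core of such a dedicated parser could look roughly like the following. This is only a minimal sketch using the plain JDK SAX API, not Tika's own parser interface; the class names AttributeTextSketch and AttributeTextHandler and the sample record are hypothetical and chosen purely for illustration. It shows a handler that emits attribute values alongside element text, so that both end up in the extracted text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: plain JDK SAX, not a Tika Parser implementation.
class AttributeTextSketch {

    /** Collects character data and attribute values into one string. */
    static class AttributeTextHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            // Append every attribute value of the element, separated by spaces.
            for (int i = 0; i < attrs.getLength(); i++) {
                text.append(attrs.getValue(i)).append(' ');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Element text is passed through unchanged.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Keep text from adjacent elements from running together.
            text.append(' ');
        }

        String getText() {
            return text.toString().trim();
        }
    }

    /** Parses the given XML and returns element text plus attribute values. */
    static String extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        AttributeTextHandler handler = new AttributeTextHandler();
        factory.newSAXParser()
               .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        // Both the attribute values and the element text appear in the result.
        System.out.println(extract(
            "<record id=\"r-42\"><title lang=\"en\">Hello</title></record>"));
    }
}
```

A real parser for a specific document type would presumably pick out only the attributes that carry meaningful text rather than all of them, which is the reason for keeping this logic out of the generic XML handling.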
                
> Attribute values are not indexed
> --------------------------------
>
>                 Key: TIKA-518
>                 URL: https://issues.apache.org/jira/browse/TIKA-518
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 0.6
>            Reporter: Ovidiu  Cilnician
>            Assignee: Jukka Zitting
>
> I just switched from Jackrabbit 1.4.11 to Jackrabbit 2.1.1.
> Some of the test cases that were working in 1.4 fail in 2.1.1.
> These test cases (CSW service related) contain an AnyText filter and look for an attribute value. No records are returned in this case. It works when an element value is used.
> Looking at the Jackrabbit Content Repository project, I found this issue (JCR-470 XMLIndexFilter should index the attributes), which was fixed for Jackrabbit 1.4.
> Did the switch to Tika (my version of Jackrabbit 2.1.1 uses Tika 0.6) cause this problem?
> Thank you.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira