Posted to dev@tika.apache.org by "Luis Filipe Nassif (JIRA)" <ji...@apache.org> on 2016/07/14 20:39:20 UTC

[jira] [Created] (TIKA-2033) Value attributes of input elements not extracted from HTML

Luis Filipe Nassif created TIKA-2033:
----------------------------------------

             Summary: Value attributes of input elements not extracted from HTML 
                 Key: TIKA-2033
                 URL: https://issues.apache.org/jira/browse/TIKA-2033
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.10
         Environment: Windows 7, java8 x64
            Reporter: Luis Filipe Nassif
            Priority: Minor


The text in the value attributes of input elements is currently not extracted from HTML files, even though browsers render it. I tried using IdentityHtmlMapper and experimented with HtmlSchema, without success. A simple test HTML follows:

<HTML><body><input value='text'></input></body></HTML>
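
To make the requested behaviour concrete, here is a minimal sketch of a SAX handler that treats the value attribute of input elements as text. It uses only the JDK's built-in SAX parser, not Tika, so it works on the test snippet above only because that snippet happens to be well-formed XML. The class name InputValueExtractor and its extract method are hypothetical, and skipping type="hidden" inputs is an assumption based on browsers not rendering them.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration, not part of Tika: collects character data
// plus the value attribute of visible <input> elements.
class InputValueExtractor extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!"input".equalsIgnoreCase(qName)) {
            return;
        }
        // Assumption: hidden inputs are skipped because browsers do not render them.
        if ("hidden".equalsIgnoreCase(atts.getValue("type"))) {
            return;
        }
        String value = atts.getValue("value");
        if (value != null && !value.isEmpty()) {
            // Separate the value from any text collected so far.
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(value);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Parses well-formed XHTML with the JDK SAX parser and returns the collected text.
    static String extract(String xhtml) {
        try {
            InputValueExtractor handler = new InputValueExtractor();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extract("<HTML><body><input value='text'></input></body></HTML>"));
    }
}
```

Whether an equivalent handler would see the input element and its value attribute when placed behind Tika's HtmlParser depends on the HTML mapper in use, and this sketch does not verify that.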



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)