Posted to dev@tika.apache.org by "Luis Filipe Nassif (JIRA)" <ji...@apache.org> on 2016/07/14 20:39:20 UTC
[jira] [Created] (TIKA-2033) Value attributes of input elements not extracted from HTML
Luis Filipe Nassif created TIKA-2033:
----------------------------------------
Summary: Value attributes of input elements not extracted from HTML
Key: TIKA-2033
URL: https://issues.apache.org/jira/browse/TIKA-2033
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.10
Environment: Windows 7, java8 x64
Reporter: Luis Filipe Nassif
Priority: Minor
The text in the value attribute of input elements is currently not extracted from HTML files, even though browsers render it. I tried using IdentityHtmlMapper and experimented with HtmlSchema, with no luck. A simple test HTML is below:
<HTML><body><input value='text'></input></body></HTML>
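To illustrate the behaviour being requested, here is a minimal sketch of a SAX handler that collects the value attribute of every input element. Tika reports parsed content through SAX ContentHandler events, so a handler of this shape is one way the text could be captured if the input element and its attributes reached the handler. Several things here are assumptions and not part of Tika: the class name InputValueCollector and its extract method are hypothetical, and the JDK's built-in XML parser stands in for Tika's HtmlParser only because the test markup above happens to be well-formed XML. It would not cope with real-world HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not a Tika class: collects value attributes of input elements.
class InputValueCollector extends DefaultHandler {
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // A parser that is not namespace-aware leaves localName empty, so fall back to qName.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        if ("input".equalsIgnoreCase(name)) {
            String value = atts.getValue("value");
            if (value != null) {
                values.add(value);
            }
        }
    }

    List<String> getValues() {
        return values;
    }

    // Assumption: the JDK XML parser is used here only as a stand-in event source.
    static List<String> extract(String markup) throws Exception {
        InputValueCollector handler = new InputValueCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(markup)), handler);
        return handler.getValues();
    }
}
```

Calling InputValueCollector.extract with the test HTML above should return a list containing the single string "text", which is the text a browser would show in the field.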
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)