Posted to dev@tika.apache.org by "Matthew Caruana Galizia (JIRA)" <ji...@apache.org> on 2017/02/23 11:23:44 UTC

[jira] [Created] (TIKA-2274) <title> and <meta name="title"> metadata collision

Matthew Caruana Galizia created TIKA-2274:
---------------------------------------------

             Summary: <title> and <meta name="title"> metadata collision
                 Key: TIKA-2274
                 URL: https://issues.apache.org/jira/browse/TIKA-2274
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.14
            Reporter: Matthew Caruana Galizia
            Priority: Minor

In several different corpuses I've found HTML files which look like the following:

{code}
<html>
<head>
<title>Some title</title>
<meta name="title" content="some other title">
</head>
...
</html>
{code}

This causes the "title" property in the metadata to have two values set, when one would expect that this field is not multivalued.

Perhaps some fields from <meta> tags, like this one, should be namespaced.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
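
To make the reported collision concrete, here is a minimal sketch in Python using only the standard library. It is not Tika's parser code and does not claim to mirror how Tika stores metadata. The class `MetadataCollector`, the function `extract`, and the `meta:` prefix are all hypothetical names chosen for illustration. The sketch assumes a metadata store in which every key can hold several values, and shows how `<title>` and `<meta name="title">` end up under the same key unless `<meta>` keys are namespaced.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetadataCollector(HTMLParser):
    """Toy stand-in for an HTML metadata extractor (hypothetical, not Tika)."""

    def __init__(self, meta_prefix=""):
        super().__init__()
        # Hypothetical namespace prepended to keys that come from <meta> tags.
        self.meta_prefix = meta_prefix
        # Every key may hold several values, so collisions accumulate.
        self.metadata = defaultdict(list)
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            key = self.meta_prefix + attrs["name"]
            self.metadata[key].append(attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text inside <title> is always stored under the plain "title" key.
        if self._in_title and data.strip():
            self.metadata["title"].append(data.strip())


def extract(html, meta_prefix=""):
    """Parse `html` and return the collected metadata as a plain dict."""
    collector = MetadataCollector(meta_prefix)
    collector.feed(html)
    collector.close()
    return dict(collector.metadata)


# The sample document from the report, without the elided body.
SAMPLE = """<html>
<head>
<title>Some title</title>
<meta name="title" content="some other title">
</head>
</html>"""

# Without a namespace, both values land under "title".
print(extract(SAMPLE))
# With a namespace for <meta> keys, "title" stays single-valued.
print(extract(SAMPLE, meta_prefix="meta:"))
```

With an empty prefix the `title` key holds both values, which is the behaviour described in the report. With the illustrative `meta:` prefix the two sources stay separate. Whether Tika should adopt a prefix, and which one, is left open by the report.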