Posted to dev@tika.apache.org by "Matthew Caruana Galizia (JIRA)" <ji...@apache.org> on 2017/02/23 11:23:44 UTC
[jira] [Created] (TIKA-2274) <title> and <meta name="title"> metadata collision
Matthew Caruana Galizia created TIKA-2274:
---------------------------------------------
Summary: <title> and <meta name="title"> metadata collision
Key: TIKA-2274
URL: https://issues.apache.org/jira/browse/TIKA-2274
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.14
Reporter: Matthew Caruana Galizia
Priority: Minor
In several different corpora I've found HTML files that look like the following:
{code}
<html>
<head>
<title>Some title</title>
<meta name="title" content="some other title">
</head>
...
</html>
{code}
This causes the "title" property in the metadata to end up with two values, one from <title> and one from <meta name="title">, even though one would expect this field to be single-valued.
Perhaps some fields from <meta> tags, like this one, should be namespaced.
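
To make the collision and the suggested namespacing concrete, here is a minimal sketch in Python. It is not Tika's parser or API: `MetadataCollector`, `extract` and the `meta:` prefix are all hypothetical names chosen for illustration, and the only assumption is that the extractor stores `<title>` text and `<meta name=...>` content in the same multi-valued map.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetadataCollector(HTMLParser):
    """Toy stand-in for an HTML metadata extractor (not Tika's actual parser)."""

    def __init__(self, meta_prefix=""):
        super().__init__()
        self.meta_prefix = meta_prefix     # hypothetical namespace for <meta> names
        self.metadata = defaultdict(list)  # every key may hold several values
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            # <meta name="X" content="Y"> is stored under prefix + X
            self.metadata[self.meta_prefix + attrs["name"]].append(attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            # The <title> element text is always stored under plain "title"
            self.metadata["title"].append(data)


def extract(html, meta_prefix=""):
    """Return the collected metadata as a plain dict of lists."""
    collector = MetadataCollector(meta_prefix)
    collector.feed(html)
    collector.close()
    return dict(collector.metadata)


SAMPLE = """<html>
<head>
<title>Some title</title>
<meta name="title" content="some other title">
</head>
</html>"""

# Without a prefix both values land under the same key (the reported collision).
print(extract(SAMPLE))
# With a prefix the <meta> value is kept apart from the <title> value.
print(extract(SAMPLE, meta_prefix="meta:"))
```

With an empty prefix the sketch reproduces the behaviour described above; with a prefix, "title" stays single-valued and the `<meta>` value remains available under its own key. Whether a prefix, a separate property, or some precedence rule is the right fix for Tika itself is left open here.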
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)