Posted to dev@tika.apache.org by "Younes (Jira)" <ji...@apache.org> on 2020/06/10 14:11:00 UTC
[jira] [Created] (TIKA-3109) Ingest attachment: failed to extract text from iframe
Younes created TIKA-3109:
----------------------------
Summary: Ingest attachment: failed to extract text from iframe
Key: TIKA-3109
URL: https://issues.apache.org/jira/browse/TIKA-3109
Project: Tika
Issue Type: Bug
Affects Versions: 1.22
Environment: * Apache Tika 1.22
* {{Java}}
{{java 13.0.2 2020-01-14}}
* {{Ubuntu 18.04.1 LTS}}
{{Linux XXXXX 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux}}
Reporter: Younes
This standalone [HTML|https://github.com/elastic/elasticsearch/files/4757855/c0711285-8ab7-46c3-b730-7c0639466537.html.zip] page has all of its CSS, JS, and images embedded.
After indexing it with Elasticsearch, we searched for the keyword *logarithmic*, which does appear in the page, but the search returned no results.
[~dadoonet] was able to reproduce the issue, which is described in full in [elasticsearch|https://github.com/elastic/elasticsearch/issues/57924].
--
This message was sent by Atlassian Jira
(v8.3.4#803005)