Posted to dev@tika.apache.org by "Younes (Jira)" <ji...@apache.org> on 2020/06/10 21:22:00 UTC

[jira] [Comment Edited] (TIKA-3109) Ingest attachment: failed to extract text from iframe

    [ https://issues.apache.org/jira/browse/TIKA-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132723#comment-17132723 ] 

Younes edited comment on TIKA-3109 at 6/10/20, 9:21 PM:
--------------------------------------------------------

[~kkrugler] the main purpose of this HTML was to pinpoint the *iframe* indexing issue.
How about this simple [one|https://github.com/elastic/elasticsearch/issues/57924#issuecomment-642016522] from [~dadoonet]:
{code:html}
<html>
<body>
outside
<iframe srcdoc="<html><body>content iframe</body></html>"></iframe>
</body>
</html>
{code}
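
For reference, the text one would hope to get back from this sample is both {{outside}} and {{content iframe}}. Below is a minimal sketch of an extractor that also descends into the {{srcdoc}} attribute. It uses only Python's standard-library {{html.parser}}, not Tika, so it says nothing about how Tika handles {{srcdoc}} internally; the names {{IframeAwareTextExtractor}} and {{extract_text}} are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class IframeAwareTextExtractor(HTMLParser):
    """Collects text nodes and also parses the srcdoc attribute of iframes.

    Hypothetical helper for illustration only; this is not Tika's parser.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            srcdoc = dict(attrs).get("srcdoc")
            if srcdoc:
                # The attribute value is itself an HTML document,
                # so feed it to a nested parser and keep its text.
                inner = IframeAwareTextExtractor()
                inner.feed(srcdoc)
                inner.close()
                self.chunks.extend(inner.chunks)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html):
    """Return the non-empty text chunks of html, including iframe srcdoc text."""
    parser = IframeAwareTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


SAMPLE = """<html>
<body>
outside
<iframe srcdoc="<html><body>content iframe</body></html>"></iframe>
</body>
</html>"""

print(extract_text(SAMPLE))  # ['outside', 'content iframe']
```

A pre-processing step along these lines could serve as a stopgap on the indexing side, assuming the embedded content lives in {{srcdoc}} rather than behind a remote {{src}} URL.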



> Ingest attachment: failed to extract text from iframe
> -----------------------------------------------------
>
>                 Key: TIKA-3109
>                 URL: https://issues.apache.org/jira/browse/TIKA-3109
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.22
>         Environment: * Apache Tika 1.22
>  * {{Java}}
> {{java 13.0.2 2020-01-14}}
>  * {{Ubuntu 18.04.1 LTS}}
> {{Linux XXXXX 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux}}
>            Reporter: Younes
>            Priority: Major
>
> This standalone [HTML|https://github.com/elastic/elasticsearch/files/4757855/c0711285-8ab7-46c3-b730-7c0639466537.html.zip] page has all of its CSS, JS, and images embedded.
>  After indexing it with Elasticsearch, we searched for the keyword *logarithmic*, which does appear in the page, but the search did not find it.
> [~dadoonet] was able to reproduce the issue, which is fully described in the [elasticsearch issue|https://github.com/elastic/elasticsearch/issues/57924].


