Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2022/03/03 19:14:00 UTC

[jira] [Resolved] (TIKA-3687) Email file detected as text/html

     [ https://issues.apache.org/jira/browse/TIKA-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-3687.
-------------------------------
    Fix Version/s: 2.3.1
       Resolution: Fixed

Thank you!

> Email file detected as text/html
> --------------------------------
>
>                 Key: TIKA-3687
>                 URL: https://issues.apache.org/jira/browse/TIKA-3687
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Thierry Guérin
>            Priority: Minor
>             Fix For: 2.3.1
>
>         Attachments: testRFC822-ARC.eml
>
>
> The attached email (which I redacted from a real email received from Office365) is detected as HTML.
> This is because it contains ARC-* headers, but they are not the first header, so the matcher that looks for ARC headers fails. The matcher for the regular 'From' header also fails because the 'From' header occurs after the first 1024 characters.
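
The failure mode described in the report can be sketched with a simplified model of offset-limited header matching. This is not Tika's actual detection code: the function names, the rule order, and the HTML fallback are assumptions made for illustration, and only the 1024-character window comes from the report.

```python
# Simplified model of offset-limited header matching; NOT Tika's actual code.
# The 1024-byte window comes from the report; all names here are hypothetical.
SCAN_LIMIT = 1024


def detect(data: bytes) -> str:
    """Return a guessed media type for the start of a file."""
    head = data[:SCAN_LIMIT]
    # Rule 1: an ARC-* header must be the very first thing in the file.
    if head.startswith(b"ARC-"):
        return "message/rfc822"
    # Rule 2: a "From:" header must begin a line inside the scan window.
    if head.startswith(b"From:") or b"\nFrom:" in head:
        return "message/rfc822"
    # Fallback: markup anywhere in the data wins (stand-in for HTML detection).
    if b"<html" in data.lower():
        return "text/html"
    return "text/plain"


def build_message(padding_len: int) -> bytes:
    """Build an email whose ARC-* and From headers follow a padded header."""
    return (
        b"Received: " + b"x" * padding_len + b"\r\n"
        b"ARC-Seal: i=1; a=rsa-sha256\r\n"
        b"From: sender@example.com\r\n"
        b"Content-Type: text/html\r\n"
        b"\r\n"
        b"<html><body>Hello</body></html>\r\n"
    )


if __name__ == "__main__":
    print(detect(build_message(10)))    # headers fall inside the window
    print(detect(build_message(2000)))  # headers pushed past the window
```

With a short leading header, the 'From' line falls inside the scan window and the message is recognized as email. Once the leading header pushes both the ARC-* and 'From' lines past the window, neither rule matches and the HTML body decides the result, which mirrors the misdetection reported here.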



--
This message was sent by Atlassian Jira
(v8.20.1#820001)