Posted to dev@tika.apache.org by "Ghenadie (JIRA)" <ji...@apache.org> on 2018/09/05 08:51:00 UTC

[jira] [Created] (TIKA-2723) Issue with parsing .mht container

Ghenadie created TIKA-2723:
------------------------------

             Summary: Issue with parsing .mht container
                 Key: TIKA-2723
                 URL: https://issues.apache.org/jira/browse/TIKA-2723
             Project: Tika
          Issue Type: Bug
          Components: mime
    Affects Versions: 1.17
            Reporter: Ghenadie
             Fix For: 1.17


Hello,

I have a file with the .mht extension. Tika detects this file as an email (Is Email? - true) and uses RFC822Parser to parse it. As a result, the extracted content contains email fields such as From, To, CC, BCC, and Subject.

This is a problem for me, and it appears to be a bug in Tika. Since .mht is a web archive container, it should not be parsed by RFC822Parser (which is an email parser).
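
For context, an .mht (MHTML, RFC 2557) file is itself a MIME message, usually with a top-level Content-Type of multipart/related, which is probably why it is detected as an email. Below is a minimal sketch of a check that could tell the two apart by looking at that header. It uses only the Java standard library; the class name MhtSniffer and the method looksLikeMhtml are hypothetical names made up for this illustration and are not part of Tika's API. The multipart/related test is a heuristic assumption, not how Tika actually detects types.

```java
import java.util.Locale;

public class MhtSniffer {

    private static final String HEADER = "content-type:";

    /**
     * Heuristic (an assumption for illustration, not Tika's logic):
     * returns true when the top-level header block declares
     * Content-Type: multipart/related, as saved web archives usually do.
     */
    public static boolean looksLikeMhtml(String rawMessage) {
        StringBuilder contentType = new StringBuilder();
        boolean inContentType = false;

        for (String line : rawMessage.split("\r?\n", -1)) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            boolean continuation = line.startsWith(" ") || line.startsWith("\t");
            if (continuation) {
                // folded header line: append it to Content-Type if that is the current header
                if (inContentType) {
                    contentType.append(' ').append(line.trim());
                }
                continue;
            }
            inContentType = line.toLowerCase(Locale.ROOT).startsWith(HEADER);
            if (inContentType) {
                contentType.setLength(0); // keep only the most recent Content-Type header
                contentType.append(line.substring(HEADER.length()).trim());
            }
        }
        return contentType.toString().toLowerCase(Locale.ROOT).startsWith("multipart/related");
    }

    public static void main(String[] args) {
        // Made-up sample headers, shaped like a browser-saved web archive
        String mht = "From: <Saved by a browser>\r\n"
                + "Subject: Example page\r\n"
                + "MIME-Version: 1.0\r\n"
                + "Content-Type: multipart/related;\r\n"
                + "\ttype=\"text/html\";\r\n"
                + "\tboundary=\"----=_Part_0\"\r\n"
                + "\r\n"
                + "------=_Part_0\r\n";
        System.out.println(looksLikeMhtml(mht));
    }
}
```

A check like this only reads the header block, so a plain email whose body happens to mention multipart/related would not be misclassified. A real email can also be multipart/related (for example, HTML mail with inline images), so this alone would not be a complete fix.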

Please investigate this issue as soon as possible. 

Please let me know if you have any questions.

 

Thank you,

Ghenadie R.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)