Posted to dev@tika.apache.org by "Ghenadie (JIRA)" <ji...@apache.org> on 2018/10/02 11:33:00 UTC

[jira] [Updated] (TIKA-2723) Issue with parsing .mht container

     [ https://issues.apache.org/jira/browse/TIKA-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ghenadie updated TIKA-2723:
---------------------------
    Description: 
Hello,

I have a file with the .mht extension. Tika treats this file as an email (Is Email? - true) and uses RFC822Parser to parse it.

This is a problem for me, and it looks like a bug in Tika. Since .mht is a web archive container, it should not be parsed by RFC822Parser (which is an email parser).

Please investigate this issue as soon as possible. 

Please let me know if you have any questions.

Thank you,

Ghenadie R.

  was:
Hello,

I have a file with the .mht extension. Tika treats this file as an email (Is Email? - true) and uses RFC822Parser to parse it. As a result, the extracted content contains email fields such as From, To, CC, BCC, and Subject.

This is a problem for me, and it looks like a bug in Tika. Since .mht is a web archive container, it should not be parsed by RFC822Parser (which is an email parser).

Please investigate this issue as soon as possible. 

Please let me know if you have any questions.

Thank you,

Ghenadie R.


> Issue with parsing .mht container
> ---------------------------------
>
>                 Key: TIKA-2723
>                 URL: https://issues.apache.org/jira/browse/TIKA-2723
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.17
>            Reporter: Ghenadie
>            Priority: Major
>              Labels: patch
>             Fix For: 1.17
>
>         Attachments: Sample-excel.mht, [TIKA-2723] Issue with parsing _mht container - ASF JIRA.mht
>
>
> Hello,
> I have a file with the .mht extension. Tika treats this file as an email (Is Email? - true) and uses RFC822Parser to parse it.
> This is a problem for me, and it looks like a bug in Tika. Since .mht is a web archive container, it should not be parsed by RFC822Parser (which is an email parser).
> Please investigate this issue as soon as possible. 
> Please let me know if you have any questions.
>
> Thank you,
> Ghenadie R.
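
For context on the report above: an .mht (MHTML) file is itself a MIME message, typically with a top-level Content-Type of multipart/related, so its first lines look much like an email's headers. The sketch below illustrates one way a caller might tell the two apart before choosing a parser. It is only a heuristic under stated assumptions, not how Tika detects types: the class and method names (MhtSniffer, topLevelContentType, looksLikeMhtml) are hypothetical, it uses no Tika API, and a real email can also be multipart/related, so false positives are possible.

```java
import java.util.Locale;

// Hypothetical helper, not part of Tika: inspects only the top-level header block.
final class MhtSniffer {

    private static final String HEADER = "Content-Type:";

    /** Returns the unfolded top-level Content-Type value, or null if none is present. */
    static String topLevelContentType(String message) {
        StringBuilder value = null;
        for (String line : message.split("\r?\n", -1)) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            boolean continuation = line.charAt(0) == ' ' || line.charAt(0) == '\t';
            if (continuation) {
                // Folded header line: append it only if it continues Content-Type.
                if (value != null) {
                    value.append(' ').append(line.trim());
                }
                continue;
            }
            if (value != null) {
                break; // a new header starts, so Content-Type is complete
            }
            if (line.regionMatches(true, 0, HEADER, 0, HEADER.length())) {
                value = new StringBuilder(line.substring(HEADER.length()).trim());
            }
        }
        return value == null ? null : value.toString();
    }

    /** Heuristic (assumption): treat a top-level multipart/related message as MHTML. */
    static boolean looksLikeMhtml(String message) {
        String contentType = topLevelContentType(message);
        return contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("multipart/related");
    }

    public static void main(String[] args) {
        String mht = "From: <Saved by Blink>\r\n"
                + "Subject: Sample\r\n"
                + "MIME-Version: 1.0\r\n"
                + "Content-Type: multipart/related;\r\n"
                + "\ttype=\"text/html\";\r\n"
                + "\tboundary=\"----b\"\r\n"
                + "\r\n"
                + "body";
        String email = "From: a@example.org\r\n"
                + "To: b@example.org\r\n"
                + "Subject: Hi\r\n"
                + "Content-Type: text/plain; charset=UTF-8\r\n"
                + "\r\n"
                + "Hello";
        System.out.println(looksLikeMhtml(mht));   // prints "true"
        System.out.println(looksLikeMhtml(email)); // prints "false"
    }
}
```

A check like this only reads the headers before the first blank line, so it stays cheap on large archives. Whether such a rule is appropriate for Tika's own detection is a question for the maintainers.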



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)