Posted to dev@tika.apache.org by "Nick Burch (Jira)" <ji...@apache.org> on 2021/06/14 12:45:00 UTC
[jira] [Commented] (TIKA-3445) Extension reading it as eml instead
of txt when headers are not present
[ https://issues.apache.org/jira/browse/TIKA-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362930#comment-17362930 ]
Nick Burch commented on TIKA-3445:
----------------------------------
This file does seem to be a series of emails. Checking with the latest 1.x and 2.x versions of Apache Tika, detection identifies the file as {{message/rfc822}}, which I'd say is correct.
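To illustrate why a mail-like body can outweigh a .txt file name, here is a minimal, self-contained sketch of content sniffing with a file-name fallback. It is not Tika's implementation: the class name SniffSketch, the header list, and the 1024-byte limit are all assumptions made up for this example, and Tika's real magic rules differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;

class SniffSketch {
    // Hypothetical header prefixes for illustration only; NOT Tika's magic table.
    private static final List<String> MAIL_HEADERS =
            List.of("From:", "Received:", "Return-Path:", "Message-ID:");

    // Assumed limit: only a bounded prefix of the content is inspected.
    private static final int SNIFF_LIMIT = 1024;

    static String detect(String resourceName, byte[] content) {
        int len = Math.min(content.length, SNIFF_LIMIT);
        String head = new String(content, 0, len, StandardCharsets.ISO_8859_1);

        // Content check first: any line in the prefix that starts with a mail header wins.
        for (String line : head.split("\r?\n")) {
            for (String header : MAIL_HEADERS) {
                if (line.startsWith(header)) {
                    return "message/rfc822";
                }
            }
        }

        // Only when the content gives no hint does the file name decide.
        if (resourceName != null
                && resourceName.toLowerCase(Locale.ROOT).endsWith(".txt")) {
            return "text/plain";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] mailLike = "some preamble\nFrom: a@example.com\nSubject: hi\n\nbody\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] plain = "just some notes\n".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(detect("test_sample_message (1).txt", mailLike)); // message/rfc822
        System.out.println(detect("notes.txt", plain));                      // text/plain
    }
}
```

In this sketch the header line does not need to be the first line of the file, which is roughly the situation described in the report: the name says .txt, but the content looks like mail.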
> Extension reading it as eml instead of txt when headers are not present
> -----------------------------------------------------------------------
>
> Key: TIKA-3445
> URL: https://issues.apache.org/jira/browse/TIKA-3445
> Project: Tika
> Issue Type: Bug
> Components: core, detector, metadata, mime, parser
> Affects Versions: 1.25, 1.26
> Reporter: Vamsi Molli
> Priority: Blocker
> Fix For: 1.24.1
>
> Attachments: test_sample_message (1).txt
>
>
> The attached .txt file has no headers at the start, yet it is being detected as an .eml file; it should be detected as .txt.
> stream = TikaInputStream.get(fis = new FileInputStream(paths));
> metadata.add(Metadata.RESOURCE_NAME_KEY, paths);
> MediaType mediaType = detector.detect(stream, metadata);
> MediaType detect(InputStream input, Metadata metadata) throws IOException;
--
This message was sent by Atlassian Jira
(v8.3.4#803005)