Posted to dev@tika.apache.org by "Umut Saribiyik (JIRA)" <ji...@apache.org> on 2018/08/14 14:59:00 UTC

[jira] [Comment Edited] (TIKA-2611) Tika mistakenly determines mimetype of .js file as application/x-elc

    [ https://issues.apache.org/jira/browse/TIKA-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579914#comment-16579914 ] 

Umut Saribiyik edited comment on TIKA-2611 at 8/14/18 2:58 PM:
---------------------------------------------------------------

Hello [~Sarhanto],

I had the same problem with a text file. The file contained empty lines. After I removed the empty lines, parsing worked fine for me.

I used the following regex to remove the empty lines before the file was parsed by Tika:

content = content.replaceAll("(?m)^[ \t]*\r?\n", "");
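
A minimal sketch of this workaround as a standalone helper, using only the JDK. The class name BlankLineStripper, the method names, and the UTF-8 assumption are illustrative and not part of the original comment; the cleaned bytes would then be handed to Tika for detection.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: strips blank lines from text before it is passed to Tika.
final class BlankLineStripper {

    // Same regex as in the comment: (?m) makes ^ match at the start of every line,
    // so each line holding only spaces/tabs is removed together with its line break.
    static String stripBlankLines(String content) {
        // Strings are immutable, so the result must be returned (or reassigned).
        return content.replaceAll("(?m)^[ \t]*\r?\n", "");
    }

    // Convenience overload for raw file bytes; assumes the file is UTF-8 encoded.
    static byte[] stripBlankLines(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return stripBlankLines(text).getBytes(StandardCharsets.UTF_8);
    }
}
```

Note that a whitespace-only last line with no trailing line break is left in place, because the regex requires a line break to match.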

 

Kind regards

Umut Saribiyik

 


> Tika mistakenly determines mimetype of .js file as application/x-elc
> --------------------------------------------------------------------
>
>                 Key: TIKA-2611
>                 URL: https://issues.apache.org/jira/browse/TIKA-2611
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.17
>            Reporter: Anto
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: 980x240_edge.js
>
>
> The attached file is misdetected as application/x-elc when it is really just a plain JavaScript file.
> Using:
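> {code:java}
> import java.io.IOException;
> import org.apache.tika.detect.DefaultDetector;
> import org.apache.tika.io.TikaInputStream;
> import org.apache.tika.metadata.Metadata;
>
> private final DefaultDetector mimeTypeDetector = new DefaultDetector();
>
> public String determineMimeType(final byte[] data, final String fileName) {
>     final Metadata metadata = new Metadata();
>     // Pass the file name so detection can use it in addition to the content.
>     metadata.set(Metadata.RESOURCE_NAME_KEY, fileName);
>     // try-with-resources closes the stream even if detection fails.
>     try (TikaInputStream inputStream = TikaInputStream.get(data)) {
>         return mimeTypeDetector.detect(inputStream, metadata).toString();
>     } catch (final IOException e) {
>         throw new ApiException(e); // ApiException is the reporter's own exception type
>     }
> }{code}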



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)