Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2014/03/17 19:29:47 UTC

[jira] [Resolved] (TIKA-1260) Detection result for zero-byte files is text/plain

     [ https://issues.apache.org/jira/browse/TIKA-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-1260.
---------------------------------

       Resolution: Not A Problem
    Fix Version/s:     (was: 1.5)

What you're seeing is the result of Tika using the file name as a hint about the file's type. If the file name ends in {{.txt}} or a similar suffix, the file should probably be treated as a text file, even if it is empty. Only when no such hint is available does Tika fall back to {{application/octet-stream}}. See:

{code}
$ touch empty.txt
$ java -jar tika-app-1.5.jar --detect empty.txt
text/plain
$ java -jar tika-app-1.5.jar --detect < empty.txt
application/octet-stream
{code}
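
The fallback order described above can be sketched in a few lines of plain Java. This is only an illustration of the idea, not Tika's actual detector code: the class name {{NameHintDetector}}, the method {{detectEmpty}}, and the three-entry suffix table are all hypothetical, and a real detector would also inspect the content when there is any.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: NOT Tika's implementation.
// NameHintDetector, detectEmpty and the suffix table are hypothetical names.
final class NameHintDetector {

    // Type reported when neither content nor file name gives any hint.
    private static final String FALLBACK = "application/octet-stream";

    // Tiny hypothetical suffix table; a real registry is far larger.
    private static final Map<String, String> BY_SUFFIX = Map.of(
            "txt", "text/plain",
            "html", "text/html",
            "pdf", "application/pdf");

    // Decide the type of a zero-byte input. With no bytes to inspect,
    // the file name (which may be null, e.g. when reading stdin) is the only hint.
    static String detectEmpty(String fileName) {
        if (fileName != null) {
            int dot = fileName.lastIndexOf('.');
            if (dot >= 0 && dot < fileName.length() - 1) {
                String suffix = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
                String type = BY_SUFFIX.get(suffix);
                if (type != null) {
                    return type;
                }
            }
        }
        return FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(detectEmpty("empty.txt")); // name hint available
        System.out.println(detectEmpty(null));        // no name, as with stdin
    }
}
```

The two calls in {{main}} mirror the two command lines above: the first has a file name to go on, the second (like input redirected from stdin) does not.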

> Detection result for zero-byte files is text/plain
> --------------------------------------------------
>
>                 Key: TIKA-1260
>                 URL: https://issues.apache.org/jira/browse/TIKA-1260
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.5
>         Environment: Linux Mint 16 
>            Reporter: Johan van der Knijff
>            Priority: Minor
>              Labels: empty, zero-length
>
> When running Tika with the -d (detection) option, any zero-byte file is identified as "text/plain". Is this the intended behavior? I know the Unix file tool reports "inode/x-empty" in such cases. Perhaps Tika should do this as well?



--
This message was sent by Atlassian JIRA
(v6.2#6252)