Posted to dev@tika.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/02/17 18:36:00 UTC

[jira] [Commented] (TIKA-3976) Allow users to configure behavior for zero-byte files

    [ https://issues.apache.org/jira/browse/TIKA-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690528#comment-17690528 ] 

ASF GitHub Bot commented on TIKA-3976:
--------------------------------------

tballison merged PR #972:
URL: https://github.com/apache/tika/pull/972




> Allow users to configure behavior for zero-byte files
> -----------------------------------------------------
>
>                 Key: TIKA-3976
>                 URL: https://issues.apache.org/jira/browse/TIKA-3976
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> AutoDetectParser currently throws a ZeroByteFileException whenever the input stream is empty.
> I _think_ we did this for search-system use cases, where sending in a zero-byte file would be exceptional.
> For other use cases, though, especially embedded files, it is fairly normal to have zero-byte content that still carries meaningful metadata.
> Embedded files in general are one such case (as in .ppt, etc.). WARC redirects and HTTPResponse files are other containers where the embedded file may carry meaningful metadata even though its stream is zero bytes.
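
The configurable behavior described above can be sketched in plain Java. This is only an illustration under stated assumptions, not Tika's implementation: the class ZeroByteSketch, the parse method, the throwOnZeroBytes flag, the "sketch:length" key, and the locally defined ZeroByteFileException are all hypothetical stand-ins, and the API added by the merged PR may look different.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names are Tika's API.
class ZeroByteSketch {

    // Local stand-in for the exception named in the issue.
    static class ZeroByteFileException extends IOException {
        ZeroByteFileException(String msg) {
            super(msg);
        }
    }

    // Returns a copy of the caller's metadata plus the observed length.
    // On an empty stream: throws if throwOnZeroBytes is true, otherwise
    // returns the metadata with a length of 0 and does no further parsing.
    static Map<String, String> parse(InputStream in,
                                     Map<String, String> metadata,
                                     boolean throwOnZeroBytes) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 1);
        Map<String, String> result = new LinkedHashMap<>(metadata);
        int first = pb.read(); // -1 means the stream has no bytes at all
        if (first == -1) {
            if (throwOnZeroBytes) {
                throw new ZeroByteFileException("stream is empty");
            }
            result.put("sketch:length", "0");
            return result;
        }
        pb.unread(first); // put the probed byte back before "parsing"
        result.put("sketch:length", String.valueOf(pb.readAllBytes().length));
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> md = new LinkedHashMap<>();
        md.put("warc:target-uri", "http://example.com/"); // illustrative metadata only
        // Lenient mode: the empty body is tolerated and the metadata survives.
        System.out.println(parse(new ByteArrayInputStream(new byte[0]), md, false));
    }
}
```

With the flag set to true the sketch reproduces the strict behavior that suits search-system ingestion; with it set to false, a container parser could still emit the metadata for an embedded entry such as a WARC redirect whose body is empty.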



--
This message was sent by Atlassian Jira
(v8.20.10#820010)