Posted to dev@tika.apache.org by "Nick Burch (Jira)" <ji...@apache.org> on 2021/05/19 08:44:00 UTC

[jira] [Commented] (TIKA-3409) provide isBinary/isText method

    [ https://issues.apache.org/jira/browse/TIKA-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347418#comment-17347418 ] 

Nick Burch commented on TIKA-3409:
----------------------------------

Do you want to know whether Apache Tika can parse the file? Or whether you could make sense of it yourself, e.g. by opening it in a text editor?

For the former, you can ask Tika which MIME types it has parsers for; see [https://cwiki.apache.org/confluence/display/TIKA/Troubleshooting+Tika#TroubleshootingTika-IdentifyingwhatParsersyourTikainstallsupports]

For the latter, you can get most of the way there by asking Tika to detect the type, then asking Tika for the aliases of that type, and checking whether the primary type or any of its aliases starts with `text/`. That should catch pretty much anything that's actually text-based.
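
As a rough sketch of that last check (not Tika code, just the `text/` test itself), something like the following could work. The class and method names here are hypothetical. It assumes you already have the detected primary type and its aliases as strings; in a real application those would come from Tika's detection and from its media type registry (I believe `Tika.detect(...)` and `MediaTypeRegistry.getAliases(...)` are the relevant calls, but check that against the Tika version you use). The type strings in `main` are illustrative inputs only.

```java
import java.util.Collection;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Tika.
public class TextTypeCheck {

    // True if a single MIME type string is in the text/ family, e.g. "text/plain".
    static boolean isTextFamily(String mimeType) {
        return mimeType != null
                && mimeType.trim().toLowerCase(Locale.ROOT).startsWith("text/");
    }

    // True if the primary type or any of its aliases is in the text/ family.
    // Assumption: the caller obtained both from Tika beforehand.
    static boolean looksTextBased(String primaryType, Collection<String> aliases) {
        if (isTextFamily(primaryType)) {
            return true;
        }
        for (String alias : aliases) {
            if (isTextFamily(alias)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative inputs: a non-text primary type with a text/ alias.
        System.out.println(looksTextBased("application/xml", List.of("text/xml")));
        // Illustrative inputs: no text/ type anywhere.
        System.out.println(looksTextBased("application/pdf", List.of()));
    }
}
```

Anything for which this returns false would then be treated as binary, which matches the "most of that" caveat above: it is a heuristic over type names, not an inspection of the bytes.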

> provide isBinary/isText method
> ------------------------------
>
>                 Key: TIKA-3409
>                 URL: https://issues.apache.org/jira/browse/TIKA-3409
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Caleb Cushing
>            Priority: Major
>
> Since tika can detect what kind of file something is, it could also know whether that file type is binary or not, I'd love to have a method  `MimeType::isBinary` or something, so I could know if I could try "parsing" the file.
> related https://stackoverflow.com/q/620993/206466



--
This message was sent by Atlassian Jira
(v8.3.4#803005)