Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/09/30 18:07:33 UTC

[jira] Commented: (TIKA-383) new option for TIKA CLI to get only the languages of a document

    [ https://issues.apache.org/jira/browse/TIKA-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916531#action_12916531 ] 

Jukka Zitting commented on TIKA-383:
------------------------------------

Thanks, and sorry for the long delay in responding! I applied the patch along with some cleanups in revision 1003129.

Currently the Tika parsers don't do language identification by default, so I'm planning to revise the TikaCLI code to explicitly use the LanguageIdentifier features to detect the language of the parsed document.

> new option for TIKA CLI to get only the languages of a document
> ---------------------------------------------------------------
>
>                 Key: TIKA-383
>                 URL: https://issues.apache.org/jira/browse/TIKA-383
>             Project: Tika
>          Issue Type: Improvement
>          Components: cli
>    Affects Versions: 0.7
>            Reporter: Markus Goldbach
>         Attachments: TIKA-383.patch
>
>
> The Tika CLI returns all metadata of a document, but sometimes you only need one part of the metadata. In my case I only need the language. I wrote a small patch, which adds the arguments -l and --language to the CLI; these filter the metadata and return only the language.
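
For readers without the attachment at hand, here is a rough sketch of the filtering idea described in the issue. It is not the code from TIKA-383.patch. The class name, the plain Map standing in for Tika's metadata object, and the "language" key are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the CLI logic; not the actual TikaCLI code.
class LanguageOnlyFilter {
    // Assumed metadata key, chosen for this sketch only.
    static final String LANGUAGE_KEY = "language";

    // True when the command line asks for the language only.
    static boolean wantsLanguageOnly(String[] args) {
        for (String arg : args) {
            if (arg.equals("-l") || arg.equals("--language")) {
                return true;
            }
        }
        return false;
    }

    // Picks the language entry out of the metadata, if there is one.
    static Optional<String> languageOnly(Map<String, String> metadata) {
        return Optional.ofNullable(metadata.get(LANGUAGE_KEY));
    }

    public static void main(String[] args) {
        // Sample metadata; a real run would take this from a parsed document.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Content-Type", "text/plain");
        metadata.put(LANGUAGE_KEY, "en");

        if (wantsLanguageOnly(args)) {
            // With -l or --language, print just the language value.
            languageOnly(metadata).ifPresent(System.out::println);
        } else {
            // Otherwise print every metadata entry.
            metadata.forEach((key, value) -> System.out.println(key + ": " + value));
        }
    }
}
```

Returning an Optional keeps the case where no language is present explicit, which matters here because the parsers do not set a language by default.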
