Posted to dev@tika.apache.org by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2011/09/18 01:26:09 UTC

[jira] [Commented] (TIKA-546) Add ability to create language profiles to tika-app

    [ https://issues.apache.org/jira/browse/TIKA-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107290#comment-13107290 ] 

Jan Høydahl commented on TIKA-546:
----------------------------------

What's the state of this issue? It's marked "unresolved", but something seems to have been committed?

> Add ability to create language profiles to tika-app
> ---------------------------------------------------
>
>                 Key: TIKA-546
>                 URL: https://issues.apache.org/jira/browse/TIKA-546
>             Project: Tika
>          Issue Type: New Feature
>          Components: cli, languageidentifier
>    Affects Versions: 0.7
>            Reporter: Jan Høydahl
>            Assignee: Chris A. Mattmann
>         Attachments: TIKA-546.tikhonov.18042011.PATCH
>
>
> Since TIKA-490, it is supposed to be easy to add new language profiles to Tika. However, the process currently involves running Nutch's NGramProfile tool and editing its output by hand.
> We should port Nutch's profile builder to Tika and make it part of tika-app.jar:
> # See http://wiki.apache.org/nutch/LanguageIdentifier
> # java -jar tika-app.jar --create-profile [--gramsizes=<n>,<n>,...] [--maxlines=<max>] <profile-name> <filename> <encoding>
> Using --gramsizes and --maxlines, we could support both Tika-style and Nutch-style profiles and thus deprecate the Nutch tool. The defaults should be --gramsizes=3 --maxlines=1000.
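
The proposed --create-profile step boils down to counting character n-grams of the requested sizes over at most --maxlines lines of a training file. The Java sketch below illustrates that idea only. The class name ProfileSketch, the lower-casing, the '_' word-boundary padding, the "ngram count" output layout and the <profile-name>.ngp output file name are all hypothetical choices made for illustration; they are not the code committed for TIKA-546 and not a specification of Tika's profile format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed --create-profile step; not Tika's implementation.
public class ProfileSketch {

    // Defaults proposed in the issue: --gramsizes=3 --maxlines=1000
    static final int[] DEFAULT_GRAM_SIZES = {3};
    static final int DEFAULT_MAX_LINES = 1000;

    // Parses a comma-separated list such as "1,2,3" into gram sizes.
    static int[] parseGramSizes(String value) {
        String[] parts = value.split(",");
        int[] sizes = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            sizes[i] = Integer.parseInt(parts[i].trim());
        }
        return sizes;
    }

    // Counts character n-grams of each requested size over at most maxLines lines.
    // Assumption: text is lower-cased, split on non-letters, and each word is
    // padded with '_' as a word-boundary marker.
    static Map<String, Integer> countNGrams(BufferedReader reader, int[] gramSizes, int maxLines)
            throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        String line;
        int linesRead = 0;
        while (linesRead < maxLines && (line = reader.readLine()) != null) {
            linesRead++;
            for (String word : line.toLowerCase().split("[^\\p{L}]+")) {
                if (word.isEmpty()) {
                    continue; // split() yields "" for leading separators
                }
                String padded = "_" + word + "_";
                for (int n : gramSizes) {
                    for (int i = 0; i + n <= padded.length(); i++) {
                        counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    // Formats counts as "ngram count" lines, most frequent first, ties alphabetical.
    static List<String> toProfileLines(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + " " + e.getValue());
        }
        return lines;
    }

    // Usage: [--gramsizes=<n>,<n>,...] [--maxlines=<max>] <profile-name> <filename> <encoding>
    public static void main(String[] args) throws IOException {
        int[] gramSizes = DEFAULT_GRAM_SIZES;
        int maxLines = DEFAULT_MAX_LINES;
        List<String> positional = new ArrayList<>();
        for (String arg : args) {
            if (arg.startsWith("--gramsizes=")) {
                gramSizes = parseGramSizes(arg.substring("--gramsizes=".length()));
            } else if (arg.startsWith("--maxlines=")) {
                maxLines = Integer.parseInt(arg.substring("--maxlines=".length()));
            } else {
                positional.add(arg);
            }
        }
        if (positional.size() != 3) {
            System.err.println("usage: [--gramsizes=<n>,...] [--maxlines=<max>] <profile-name> <filename> <encoding>");
            System.exit(1);
        }
        List<String> profile;
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get(positional.get(1)), Charset.forName(positional.get(2)))) {
            profile = toProfileLines(countNGrams(in, gramSizes, maxLines));
        }
        // Assumption: the profile is written as <profile-name>.ngp in UTF-8.
        Files.write(Paths.get(positional.get(0) + ".ngp"), profile, StandardCharsets.UTF_8);
    }
}
```

For example, `java ProfileSketch --gramsizes=1,2,3 --maxlines=500 xx corpus.txt UTF-8` would, under these assumptions, write the hypothetical file xx.ngp from the first 500 lines of corpus.txt. Passing several gram sizes is what would let one tool emit both single-size (Tika-style) and multi-size (Nutch-style) profiles, as the issue suggests.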

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

Re: [jira] [Commented] (TIKA-546) Add ability to create language profiles to tika-app

Posted by Oleg Tikhonov <ol...@gmail.com>.
Yes, it's resolved; the status just needs to be changed.
