Posted to dev@tika.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/11/06 00:23:27 UTC

[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

    [ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992696#comment-14992696 ] 

ASF GitHub Bot commented on TIKA-1787:
--------------------------------------

GitHub user TaichiHo opened a pull request:

    https://github.com/apache/tika/pull/62

    fix for TIKA-1787 contributed by Yueheng He

    Builds successfully with Java 1.8.0_65.
    To see the effect, create a text file like the following:
    ```
    Good afternoon Rajat Raina, how are you today? Hi, I am Tom Brady. I go to school at Stanford University, which is located in California.
    ```
    Save it as test.ner and pass it to Tika:
    ```
    java -classpath tika-app/target/tika-app-1.12-SNAPSHOT.jar org.apache.tika.cli.TikaCLI -m test.ner
    ```
    The result should look like this:
    ```
    Content-Length: 137
    Content-Type: application/stanford-ner
    LOCATION: [California]
    ORGANIZATION: [Stanford University]
    PERSON: [Rajat Raina, Tom Brady]
    X-Parsed-By: org.apache.tika.parser.DefaultParser
    X-Parsed-By: org.apache.tika.parser.stanfordNer.StanfordNerParser
    resourceName: test.ner
    ```
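
For anyone who wants to check this output from a script or test, a minimal Java sketch along the following lines could read it back into a map. It uses only the standard library and is not part of Tika or of this pull request: the class name `MetadataOutputReader` and its `parse` method are hypothetical, and the `name: [a, b]` list form is assumed from the sample output above. Entity names that themselves contain commas would be split incorrectly by this simple approach.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Tika or of this pull request.
class MetadataOutputReader {

    // Parses "name: value" lines into a map from metadata name to its values.
    // Repeated names (such as X-Parsed-By) accumulate into one list.
    static Map<String, List<String>> parse(String output) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            int sep = line.indexOf(": ");
            if (sep < 0) {
                continue; // skip lines that are not in "name: value" form
            }
            String name = line.substring(0, sep).trim();
            String value = line.substring(sep + 2).trim();
            List<String> values = result.computeIfAbsent(name, k -> new ArrayList<>());
            if (value.startsWith("[") && value.endsWith("]")) {
                // Assumed list form, as in "PERSON: [Rajat Raina, Tom Brady]".
                String inner = value.substring(1, value.length() - 1);
                for (String item : inner.split(",")) {
                    String trimmed = item.trim();
                    if (!trimmed.isEmpty()) {
                        values.add(trimmed);
                    }
                }
            } else {
                values.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample lines copied from the expected output in the description above.
        String sample = String.join("\n",
                "Content-Type: application/stanford-ner",
                "LOCATION: [California]",
                "ORGANIZATION: [Stanford University]",
                "PERSON: [Rajat Raina, Tom Brady]",
                "resourceName: test.ner");
        Map<String, List<String>> metadata = parse(sample);
        System.out.println(metadata.get("PERSON")); // prints [Rajat Raina, Tom Brady]
    }
}
```

The map keeps insertion order, so iterating over it reproduces the order of the printed lines.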

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/TaichiHo/tika TIKA-1787

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/62.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #62
    
----
commit b94331ece262bb8d8408dda7b22b6dc0bb69557e
Author: Taichi <he...@gmail.com>
Date:   2015-11-05T22:47:22Z

    fix for TIKA-1787 contributed by Yueheng He

----


> Include Stanford Name Entity Recognition in Tika
> ------------------------------------------------
>
>                 Key: TIKA-1787
>                 URL: https://issues.apache.org/jira/browse/TIKA-1787
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime, parser
>    Affects Versions: 1.12
>         Environment: Java 1.8, Mac OSX 10.11
>            Reporter: Yueheng He
>              Labels: features, newbie, test
>             Fix For: 1.12
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Using Stanford Named Entity Recognition, Tika will be able to extract named entities such as PERSON, ORGANIZATION, and LOCATION from the given text. The extracted named entities will be added to the metadata.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)