Posted to dev@tika.apache.org by thammegowda <gi...@git.apache.org> on 2015/10/31 05:53:46 UTC

[GitHub] tika pull request: NamedEntityParser

GitHub user thammegowda opened a pull request:

    https://github.com/apache/tika/pull/61

    NamedEntityParser

    Added NamedEntityParser that supports loading of NER implementations at runtime.
    The default NER implementation based on OpenNLP is supplied.
    
    Another implementation, based on Stanford CoreNLP, is located [here](https://github.com/thammegowda/tika-ner-corenlp/blob/master/src/test/java/edu/usc/cs/ir/tika/ner/NamedEntityParserTest.java). It is licensed under GNU GPL 3, so it is kept separate.
    
    @chrismattmann This is not 100% complete; here are a few TODOs:
    1. The NER implementation class name should be read from the Tika config if possible/available. It currently relies on Java Properties. Please suggest how to resolve [this todo](https://github.com/thammegowda/tika/blob/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/NamedEntityParser.java#L70)
    2. The best way to get the parsed text within the NamedEntityParser is unclear (not sure if parser2 can read the output of parser1). Please suggest how to resolve [this todo](https://github.com/thammegowda/tika/blob/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/NamedEntityParser.java#L91)
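
For readers unfamiliar with the pattern described above (choosing an NER implementation at runtime from a class name held in a Java property), the following is a minimal, self-contained sketch. It is not the code from this pull request: the interface `EntityRecogniser`, the toy class `CapitalisedWordRecogniser`, and the property name `example.ner.impl` are all hypothetical names invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RuntimeNerLoading {

    /** Hypothetical recogniser contract; not the interface from the pull request. */
    public interface EntityRecogniser {
        Map<String, Set<String>> recognise(String text);
    }

    /** Toy default implementation: treats every capitalised word as a candidate entity. */
    public static class CapitalisedWordRecogniser implements EntityRecogniser {
        @Override
        public Map<String, Set<String>> recognise(String text) {
            Set<String> found = new TreeSet<>();
            for (String word : text.split("\\s+")) {
                if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                    found.add(word);
                }
            }
            Map<String, Set<String>> result = new HashMap<>();
            result.put("CANDIDATE", found);
            return result;
        }
    }

    /** Hypothetical system property that names the implementation class. */
    public static final String IMPL_PROPERTY = "example.ner.impl";

    /** Loads the class named by the property, falling back to the toy default. */
    public static EntityRecogniser load() {
        String className = System.getProperty(
                IMPL_PROPERTY, CapitalisedWordRecogniser.class.getName());
        try {
            // Resolve the class by name and require that it implements the contract.
            Class<? extends EntityRecogniser> clazz =
                    Class.forName(className).asSubclass(EntityRecogniser.class);
            // Requires an accessible no-argument constructor.
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load recogniser: " + className, e);
        }
    }

    public static void main(String[] args) {
        EntityRecogniser recogniser = load();
        System.out.println(recogniser.recognise("Apache Tika parses documents"));
    }
}
```

Under these assumptions, a different implementation on the classpath could be selected without recompiling, for example with `java -Dexample.ner.impl=<fully.qualified.ClassName> RuntimeNerLoading`.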
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thammegowda/tika trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/61.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #61
    
----
commit e96da2bc28d5eef81d034e39eb05099ed5d38ac1
Author: Thamme Gowda <tg...@gmail.com>
Date:   2015-10-30T21:47:45Z

    Add NamedEntityParser
    
    Add OpenNLPNERecogniser as default

commit a720507a1c1906a501470a7d5c5cec335412fcd3
Author: Thamme Gowda <tg...@gmail.com>
Date:   2015-10-30T22:16:11Z

    Set charset for converting text to stream

commit 6b1a20e681a5d319886464ec147967c876b7e60d
Author: Thamme Gowda <tg...@gmail.com>
Date:   2015-10-31T04:23:43Z

    Automated OpenNLP NER model downloader

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tika pull request: Fix for TIKA-1787 : NamedEntityParser

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/tika/pull/61

