Posted to users@opennlp.apache.org by manjunath nakshathri <na...@gmail.com> on 2018/03/05 10:46:52 UTC

Open NLP : Categorization

Hello There,

We are using OpenNLP document categorization with n-gram features to
categorize our incoming text. For example:

"The shape of water and Frances McDormand rule oscar 2018"

Given this sentence, we would like to arrive at:

Shape of Water : Movie
Frances McDormand : Actress

We are able to achieve this with the following document categorization
training data and n-gram features:

Movie Shape of Water
Actress Frances McDormand

*What is not working:*
If we try to categorize a single word, say Oscar, under an Award category,
the categorizer does not return it. Any idea how we can get this working?
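
One possible cause, which this thread does not confirm, is the n-gram
length range. The sketch below is plain Java, not OpenNLP's feature
generator, and the minimum length of 2 is an assumption because the post
does not give the n-gram settings. It only illustrates that a one-token
sample produces no n-grams when the minimum length is 2, while a minimum
length of 1 keeps the token as a feature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NGramSketch {

    // Returns every contiguous token n-gram whose length lies between
    // minGram and maxGram (inclusive), with tokens joined by a colon.
    static List<String> ngrams(String[] tokens, int minGram, int maxGram) {
        List<String> features = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= tokens.length; start++) {
                String[] window = Arrays.copyOfRange(tokens, start, start + n);
                features.add(String.join(":", window));
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String[] movie = {"Shape", "of", "Water"};
        String[] award = {"Oscar"};

        // Assumed setting: n-grams of length 2 to 3.
        System.out.println(ngrams(movie, 2, 3)); // [Shape:of, of:Water, Shape:of:Water]
        System.out.println(ngrams(award, 2, 3)); // [] (no features for a single token)

        // Lowering the minimum length to 1 keeps the single token.
        System.out.println(ngrams(award, 1, 3)); // [Oscar]
    }
}
```

If this is what happens here, allowing unigrams or adding a bag-of-words
style feature would be the thing to try first.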

*Target training data*
Movie Shape of Water
Actress Frances McDormand
Award Oscar

*Desired Output :*
Shape of Water : Movie
Frances McDormand : Actress
Oscar: Award

Implementation details :
OpenNLP version : 1.8.4
Training algorithm used : Naive Bayes
Iterations set : 100

*General Questions*
Q : Why can't we use NER?
A : We need n-gram feature analysis, which is not possible in NER.

Q : Are we going to build our own training data?
A : Yes

We would really appreciate any help towards solving this issue.

-- 
Thanks and Regards
Manjunath

Re: Open NLP : Categorization

Posted by Joern Kottmann <ko...@gmail.com>.

There is also an n-gram feature generator that can be used with the
Name Finder. You should give it a try to establish a baseline on your
data; you can then tune it and test different feature generation
strategies.

Jörn

On Mon, Mar 5, 2018 at 11:57 AM, Manoj B. Narayanan
<ma...@gmail.com> wrote:
> Hi Manjunath,
>
> The best way is to go with NER.
>
> I don't get what you mean by n-gram feature analysis. It would be helpful
> if you could elaborate.
>
> From your example, I see that all of them are exact matches, so I suggest
> you go with a Dictionary Name Finder.
>
> Thanks,
> Manoj.

Re: Open NLP : Categorization

Posted by "Manoj B. Narayanan" <ma...@gmail.com>.

Hi Manjunath,

The best way is to go with NER.

I don't get what you mean by n-gram feature analysis. It would be helpful
if you could elaborate.

From your example, I see that all of them are exact matches, so I suggest
you go with a Dictionary Name Finder.
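
To illustrate the exact-match idea, here is a minimal plain-Java sketch
of a dictionary lookup over tokens. It is not OpenNLP's Dictionary Name
Finder. The class name, the `tag` method, the lower-casing of phrases,
and the longest-match-first rule are all assumptions made for this
example; the dictionary entries come from the training data in the
original post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class DictionaryLookupSketch {

    // Scans the tokens left to right. At each position it tries the longest
    // phrase first (up to maxLen tokens) and looks it up in the dictionary,
    // ignoring case. A matched phrase is consumed before scanning continues.
    static List<String> tag(String[] tokens, Map<String, String> dictionary, int maxLen) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            for (int len = Math.min(maxLen, tokens.length - i); len >= 1; len--) {
                String phrase = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                String type = dictionary.get(phrase.toLowerCase(Locale.ROOT));
                if (type != null) {
                    found.add(phrase + " : " + type);
                    matched = len;
                    break;
                }
            }
            // Skip past the match, or advance by one token if nothing matched.
            i += Math.max(matched, 1);
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary: lower-cased phrase -> category.
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("shape of water", "Movie");
        dictionary.put("frances mcdormand", "Actress");
        dictionary.put("oscar", "Award");

        String[] tokens =
            "The shape of water and Frances McDormand rule oscar 2018".split(" ");

        // Prints: [shape of water : Movie, Frances McDormand : Actress, oscar : Award]
        System.out.println(tag(tokens, dictionary, 3));
    }
}
```

A lookup like this handles single-word entries such as Oscar the same
way as multi-word ones, but it only finds phrases that are listed in the
dictionary.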

Thanks,
Manoj.

-- 
Regards,
Manoj.