Posted to users@opennlp.apache.org by mark meiklejohn <ma...@yahoo.co.uk> on 2011/07/25 23:07:31 UTC

NLP Instantiation Help!

Hi,

I'm moving from 1.3.1 to 1.5.1. I can get 1.5.1 up and running fine
with the examples, but some features seem to be missing and I'm
wondering how to incorporate or instantiate them.
I typically used the TreebankParser because it gives me a nice structure
to traverse, but it seems to have gone AWOL or been replaced by the
POSModels.

First off, I'm looking to use the 'tagdict' that came with 1.3.1,
together with case-insensitive mode, because I have no control over the
input that I will be processing.

It is entirely possible that the text I receive will be all in
capitals, e.g. "I NEED OPENNLP TO BE ABLE TO PROCESS IN CASE
INSENSITIVE MODE". In this case 1.5.1 typically tags the majority of
these tokens as NNP, as would 1.3.1, which is no good. But since 1.3.1
could process in case-insensitive mode, it gave me a better parse
structure for such input.

I can't just reduce everything to lower case as it comes through,
because that may have knock-on effects. So is there a way to achieve
what I want?

If someone knows how to instantiate what I'm looking for, an example
would be greatly appreciated.

TIA

Mark

Re: NLP Instantiation Help!

Posted by mark meiklejohn <ma...@yahoo.co.uk>.

On 26/07/2011 09:06, Jörn Kottmann wrote:
> On 7/25/11 11:07 PM, mark meiklejohn wrote:
>> Hi,
>>
>> I'm moving from 1.3.1 to 1.5.1. I can get 1.5.1 up and running fine
>> with the examples, but some features seem to be missing and I'm
>> wondering how to incorporate or instantiate them.
>> I typically used the TreebankParser because it gives me a nice
>> structure to traverse, but it seems to have gone AWOL or been
>> replaced by the POSModels.
>>
> Do you need to parse a sentence, or do you only want to do
> part-of-speech tagging? If you only need POS tagging, you should
> use just the POS tagger, because it is much faster.

I agree it is much faster, but I need the full parse.

>
>> First off, I'm looking to use the 'tagdict' that came with 1.3.1,
>> together with case-insensitive mode, because I have no control over
>> the input that I will be processing.
>>
>> It is entirely possible that the text I receive will be all in
>> capitals, e.g. "I NEED OPENNLP TO BE ABLE TO PROCESS IN CASE
>> INSENSITIVE MODE". In this case 1.5.1 typically tags the majority of
>> these tokens as NNP, as would 1.3.1, which is no good. But since
>> 1.3.1 could process in case-insensitive mode, it gave me a better
>> parse structure for such input.
>>
>> I can't just reduce everything to lower case as it comes through,
>> because that may have knock-on effects. So is there a way to achieve
>> what I want?
>>
>> If someone knows how to instantiate what I'm looking for, an example
>> would be greatly appreciated.
>
> I just had a look at the code. It looks like the case-sensitive flag
> doesn't work correctly with the POS dictionary we currently have:
> it is not possible to set it to false.
>
> Do you want to open a Jira issue?

I'll raise an issue through Jira.
>
> It should be fixed for 1.5.2, which will be released soon.
>
> Jörn

Re: NLP Instantiation Help!

Posted by Jörn Kottmann <ko...@gmail.com>.

On 7/25/11 11:07 PM, mark meiklejohn wrote:
> Hi,
>
> I'm moving from 1.3.1 to 1.5.1. I can get 1.5.1 up and running fine
> with the examples, but some features seem to be missing and I'm
> wondering how to incorporate or instantiate them.
> I typically used the TreebankParser because it gives me a nice
> structure to traverse, but it seems to have gone AWOL or been
> replaced by the POSModels.
>
Do you need to parse a sentence, or do you only want to do
part-of-speech tagging? If you only need POS tagging, you should
use just the POS tagger, because it is much faster.

> First off, I'm looking to use the 'tagdict' that came with 1.3.1,
> together with case-insensitive mode, because I have no control over
> the input that I will be processing.
>
> It is entirely possible that the text I receive will be all in
> capitals, e.g. "I NEED OPENNLP TO BE ABLE TO PROCESS IN CASE
> INSENSITIVE MODE". In this case 1.5.1 typically tags the majority of
> these tokens as NNP, as would 1.3.1, which is no good. But since
> 1.3.1 could process in case-insensitive mode, it gave me a better
> parse structure for such input.
>
> I can't just reduce everything to lower case as it comes through,
> because that may have knock-on effects. So is there a way to achieve
> what I want?
>
> If someone knows how to instantiate what I'm looking for, an example
> would be greatly appreciated.

I just had a look at the code. It looks like the case-sensitive flag
doesn't work correctly with the POS dictionary we currently have:
it is not possible to set it to false.
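
To illustrate what such a flag is meant to control, here is a minimal
sketch of a tag dictionary whose lookups can ignore case. It is a
hypothetical stand-in written against the plain JDK, not the OpenNLP
POS dictionary itself; the class name SimpleTagDictionary, its methods,
and the tags listed for "need" are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration only; this is NOT OpenNLP's POS dictionary. */
final class SimpleTagDictionary {
    private final boolean caseSensitive;
    private final Map<String, List<String>> tags = new HashMap<>();

    SimpleTagDictionary(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // Normalize the key in one place so that put and getTags always agree.
    private String key(String word) {
        return caseSensitive ? word : word.toLowerCase(Locale.ROOT);
    }

    /** Registers the tags allowed for a word. */
    void put(String word, String... allowedTags) {
        tags.put(key(word), Arrays.asList(allowedTags));
    }

    /** Returns the allowed tags for a word, or null if the word is unknown. */
    List<String> getTags(String word) {
        return tags.get(key(word));
    }

    public static void main(String[] args) {
        // Illustrative entry; the tag set for "need" is an assumption.
        SimpleTagDictionary insensitive = new SimpleTagDictionary(false);
        insensitive.put("need", "VB", "VBP", "NN");
        System.out.println(insensitive.getTags("NEED")); // prints [VB, VBP, NN]

        SimpleTagDictionary sensitive = new SimpleTagDictionary(true);
        sensitive.put("need", "VB", "VBP", "NN");
        System.out.println(sensitive.getTags("NEED")); // prints null
    }
}
```

With the flag set to false, a lookup for "NEED" finds the entry stored
under "need", so an all-caps token is restricted to the same tags as its
lower-case form. With the flag set to true, the same lookup returns null.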

Do you want to open a Jira issue?

It should be fixed for 1.5.2, which will be released soon.
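
Until then, one possible stopgap is to lower-case only those sentences
that arrive entirely in capitals and to leave mixed-case input
untouched. This is a sketch under the assumption that an all-caps
sentence carries no useful case information anyway; whether it avoids
the knock-on effects mentioned above depends on the application. The
class AllCapsNormalizer and its methods are hypothetical names, not
part of OpenNLP.

```java
import java.util.Locale;

/** Hypothetical pre-processing helper; not part of OpenNLP. */
final class AllCapsNormalizer {

    /** True if the text has at least one letter and no lower-case letters. */
    static boolean isAllCaps(String text) {
        boolean sawLetter = false;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                if (Character.isLowerCase(cp)) {
                    return false; // mixed case: leave the input alone
                }
                sawLetter = true;
            }
            i += Character.charCount(cp);
        }
        return sawLetter;
    }

    /** Lower-cases all-caps sentences; returns any other input unchanged. */
    static String normalize(String sentence) {
        return isAllCaps(sentence) ? sentence.toLowerCase(Locale.ROOT) : sentence;
    }

    public static void main(String[] args) {
        System.out.println(normalize("I NEED OPENNLP")); // prints i need opennlp
        System.out.println(normalize("I need OpenNLP")); // prints I need OpenNLP
    }
}
```

Note that a sentence consisting of a single capitalized token such as
"I" also counts as all caps here, so a real filter would probably want a
minimum length or token count before lower-casing.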

Jörn