Posted to users@opennlp.apache.org by mark meiklejohn <ma...@yahoo.co.uk> on 2011/07/25 23:07:31 UTC
NLP Instantiation Help!
Hi,
I'm moving from 1.3.1 to 1.5.1, and I can get 1.5.1 up and running fine
with the examples. However, some features appear to be missing, and I'm
wondering how I can go about incorporating/instantiating them.
Typically, I used the TreebankParser as it gives me a nice structure to
traverse, but that seems to have gone AWOL or has been replaced by the
POSModels.
First off, I'm looking to use the 'tagdict' that came with 1.3.1, along
with case-insensitive mode. The reason is that I have no control over the
input that I will be processing.
So it is entirely possible that the input I receive will be all in
capitals, e.g. "I NEED OPENNLP TO BE ABLE TO PROCESS IN CASE
INSENSITIVE MODE". In this case 1.5.1 typically tags the majority of
these words as NNPs, as would 1.3.1, which is no good. But since 1.3.1
could process in case-insensitive mode, it gave me a better parse
structure for such input.
Now, I can't just reduce everything to lower case as it comes through,
as this may have knock-on effects. So is there a way to achieve what I
want?
If someone knows how to go about instantiating what I'm looking for, an
example would be greatly appreciated.
TIA
Mark
Re: NLP Instantiation Help!
Posted by mark meiklejohn <ma...@yahoo.co.uk>.
On 26/07/2011 09:06, Jörn Kottmann wrote:
> On 7/25/11 11:07 PM, mark meiklejohn wrote:
>> Hi,
>>
>> I'm moving from 1.3.1 to 1.5.1, and I can get 1.5.1 up and running
>> fine with the examples. However, some features appear to be missing,
>> and I'm wondering how I can go about incorporating/instantiating them.
>> Typically, I used the TreebankParser as it gives me a nice structure
>> to traverse, but that seems to have gone AWOL or has been replaced by
>> the POSModels.
>>
> Do you need to parse a sentence, or do you only want to do
> part-of-speech tagging? If you only need POS tagging, you should
> use just the POS tagger, because it is much faster.
I agree it is much faster, but I need the full parse.
>
>> First off, I'm looking to use the 'tagdict' that came with 1.3.1,
>> along with case-insensitive mode. The reason is that I have no control
>> over the input that I will be processing.
>>
>> So it is entirely possible that the input I receive will be all in
>> capitals, e.g. "I NEED OPENNLP TO BE ABLE TO PROCESS IN CASE
>> INSENSITIVE MODE". In this case 1.5.1 typically tags the majority of
>> these words as NNPs, as would 1.3.1, which is no good. But since 1.3.1
>> could process in case-insensitive mode, it gave me a better parse
>> structure for such input.
>>
>> Now, I can't just reduce everything to lower case as it comes through,
>> as this may have knock-on effects. So is there a way to achieve what I
>> want?
>>
>> If someone knows how to go about instantiating what I'm looking for,
>> an example would be greatly appreciated.
>
> I just had a look at the code. It looks like the case-sensitive flag
> doesn't work correctly with the POS dictionary we currently have:
> it is not possible to set it to false.
>
> Do you want to open a JIRA issue?
I'll raise an issue through JIRA.
>
> It should be fixed for 1.5.2, which will be released soon.
>
> Jörn
Re: NLP Instantiation Help!
Posted by Jörn Kottmann <ko...@gmail.com>.
On 7/25/11 11:07 PM, mark meiklejohn wrote:
> Hi,
>
> I'm moving from 1.3.1 to 1.5.1, and I can get 1.5.1 up and running
> fine with the examples. However, some features appear to be missing,
> and I'm wondering how I can go about incorporating/instantiating them.
> Typically, I used the TreebankParser as it gives me a nice structure
> to traverse, but that seems to have gone AWOL or has been replaced by
> the POSModels.
>
Do you need to parse a sentence, or do you only want to do
part-of-speech tagging? If you only need POS tagging, you should
use just the POS tagger, because it is much faster.
> First off, I'm looking to use the 'tagdict' that came with 1.3.1,
> along with case-insensitive mode. The reason is that I have no control
> over the input that I will be processing.
>
> So it is entirely possible that the input I receive will be all in
> capitals, e.g. "I NEED OPENNLP TO BE ABLE TO PROCESS IN CASE
> INSENSITIVE MODE". In this case 1.5.1 typically tags the majority of
> these words as NNPs, as would 1.3.1, which is no good. But since 1.3.1
> could process in case-insensitive mode, it gave me a better parse
> structure for such input.
>
> Now, I can't just reduce everything to lower case as it comes through,
> as this may have knock-on effects. So is there a way to achieve what I
> want?
>
> If someone knows how to go about instantiating what I'm looking for,
> an example would be greatly appreciated.
I just had a look at the code. It looks like the case-sensitive flag
doesn't work correctly with the POS dictionary we currently have:
it is not possible to set it to false.
Do you want to open a JIRA issue?
It should be fixed for 1.5.2, which will be released soon.
Jörn
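
To illustrate what a case-sensitive flag on a tag dictionary is meant to
control, here is a minimal sketch in plain Java. It is not the OpenNLP
implementation: the class name TagDictionarySketch and its methods are
hypothetical and only assume that a tag dictionary maps a word to its
allowed POS tags. With the flag set to false, an all-capitals token such
as "NEED" finds the same entry as "need", so the tagger would not be left
without candidate tags for it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical tag dictionary for illustration; not OpenNLP's own class. */
final class TagDictionarySketch {
    private final Map<String, List<String>> entries = new HashMap<>();
    private final boolean caseSensitive;

    TagDictionarySketch(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // Normalize the key in one place so put() and getTags() always agree.
    private String key(String word) {
        return caseSensitive ? word : word.toLowerCase(Locale.ROOT);
    }

    void put(String word, String... tags) {
        entries.put(key(word), Arrays.asList(tags));
    }

    /** Returns the allowed tags for a word, or null if the word is unknown. */
    List<String> getTags(String word) {
        return entries.get(key(word));
    }

    public static void main(String[] args) {
        TagDictionarySketch sensitive = new TagDictionarySketch(true);
        sensitive.put("need", "VBP", "NN");

        TagDictionarySketch insensitive = new TagDictionarySketch(false);
        insensitive.put("need", "VBP", "NN");

        System.out.println(sensitive.getTags("NEED"));   // prints "null"
        System.out.println(insensitive.getTags("NEED")); // prints "[VBP, NN]"
    }
}
```

The original token is never modified here; only the lookup key is
lower-cased, which is why this differs from lower-casing the input itself.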