Posted to java-user@lucene.apache.org by Bob Carpenter <ca...@alias-i.com> on 2006/11/22 00:10:21 UTC

Re: part of speech tagger

zzzzz shalev wrote:
> hello all,
>    
>     i would like to retrieve during query time, the part of speech of each word in a query,
>   does anyone know of an implementation of a java part of speech api?

The standard statistical POS taggers, such as
the ones recommended (Brill's, OpenNLP, LingPipe),
use syntactic context to disambiguate. (Aramorph
is the exception.) Some of them, such as ours (LingPipe),
can return multiple answers with confidence scores.
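
To make "multiple answers with confidence scores" concrete, here
is a minimal sketch in plain Java. It is not LingPipe's API; the
class NBestTags, the method rank, and the raw scores are all
made up for illustration. It only shows the idea of turning
per-tag scores into a ranked list of normalized confidences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any real tagger library.
class NBestTags {

    /** Normalizes raw per-tag scores so they sum to 1 and sorts them best-first. */
    static List<Map.Entry<String, Double>> rank(Map<String, Double> rawScores) {
        double total = 0.0;
        for (double score : rawScores.values()) {
            total += score;
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>();
        for (Map.Entry<String, Double> e : rawScores.entrySet()) {
            ranked.add(Map.entry(e.getKey(), e.getValue() / total));
        }
        // Highest confidence first.
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up raw scores for the word "run" in some sentence.
        for (Map.Entry<String, Double> e : rank(Map.of("VERB", 3.0, "NOUN", 1.0))) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

A caller could keep every tag above some threshold rather than
only the top one, which is useful when the input is ambiguous.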

What they can't do is determine the part of speech
of words in a bag of words from a query.  So whether
this will work depends on whether the queries
arrive as whole sentences.

Most of these systems are trained on newswire, too,
so they won't do as well with questions, which have
different syntactic forms in most languages.

For instance, "run home" might be a verb (run)
and noun (home), or the query might be about baseball
and it's really two nouns, "run" and "home" (not to
be confused with "home run", which is a compound
noun with an idiomatic meaning in baseball).
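
The effect of context can be sketched with a toy greedy tagger.
Everything in it is an assumption made for illustration: the
class ToyContextTagger, the four-word lexicon, and the transition
scores are invented, and real taggers use far richer models.
Tokens are assumed to be lowercased already.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy example only; not the algorithm of any tagger named above.
class ToyContextTagger {

    // Hypothetical lexicon: the tags each word may take, default guess first.
    static final Map<String, List<String>> LEXICON = Map.of(
            "i", List.of("PRON"),
            "the", List.of("DET"),
            "run", List.of("VERB", "NOUN"),
            "home", List.of("NOUN"));

    // Hypothetical transition scores, keyed "previousTag>tag". Made-up numbers.
    static final Map<String, Double> TRANSITION = Map.of(
            "PRON>VERB", 0.8, "PRON>NOUN", 0.1,
            "DET>NOUN", 0.9, "DET>VERB", 0.05);

    // Score used when the tag pair is unknown, e.g. at the start of the input.
    static final double DEFAULT_SCORE = 0.1;

    /** Greedy left-to-right tagging: pick the tag that best follows the previous tag. */
    static List<String> tag(List<String> tokens) {
        List<String> tags = new ArrayList<>();
        String previous = "START";
        for (String token : tokens) {
            // Unknown words fall back to NOUN in this sketch.
            List<String> candidates = LEXICON.getOrDefault(token, List.of("NOUN"));
            String best = candidates.get(0);
            double bestScore = -1.0;
            for (String candidate : candidates) {
                double score = TRANSITION.getOrDefault(previous + ">" + candidate, DEFAULT_SCORE);
                // Strict comparison: ties keep the earlier (default) tag.
                if (score > bestScore) {
                    bestScore = score;
                    best = candidate;
                }
            }
            tags.add(best);
            previous = best;
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(tag(List.of("i", "run", "home"))); // [PRON, VERB, NOUN]
        System.out.println(tag(List.of("the", "run")));       // [DET, NOUN]
        System.out.println(tag(List.of("run", "home")));      // [VERB, NOUN]
    }
}
```

In the first two calls the preceding word decides the tag for
"run". In the last call there is nothing before "run", so the
VERB answer is only the lexicon's default guess, which is the
situation a short keyword query puts a tagger in.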

- Bob
