Posted to java-user@lucene.apache.org by zzzzz shalev <zz...@yahoo.com> on 2006/10/20 12:25:17 UTC

part of speech tagger

Hello all,

I would like to retrieve, at query time, the part of speech of each word in a query.
Does anyone know of a Java implementation of a part-of-speech tagging API?

Thanks in advance,
   
   

 		

Re: part of speech tagger

Posted by Grant Ingersoll <gs...@apache.org>.
Search for "Brill tagger" or "Brill part-of-speech tagger". I believe there
is a Java API for it. It is trainable as well.

-Grant

On Oct 20, 2006, at 6:25 AM, zzzzz shalev wrote:

> hello all,
>
>     i would like to retrieve during query time, the part of speech  
> of each word in a query,
>   does anyone know of an implementation of a java part of speech api?
>
>   thanks in advance,

--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: part of speech tagger

Posted by Bob Carpenter <ca...@alias-i.com>.
zzzzz shalev wrote:
> hello all,
>    
>     i would like to retrieve during query time, the part of speech of each word in a query,
>   does anyone know of an implementation of a java part of speech api?

The standard statistical POS taggers, such as
the ones recommended in this thread (Brill's, OpenNLP, LingPipe),
use syntactic context to disambiguate. (Aramorph
is the exception.) Some of them, such as ours (LingPipe),
can return multiple answers with confidence scores.

What they can't do is determine the part of speech
of words in a bag of words from a query. So whether
this will work depends on whether the queries
come in as whole sentences.

Most of these systems are trained on newswire, too,
so they won't do as well with questions, which have
different syntactic forms in most languages.

For instance, "run home" might be a verb (run)
and noun (home), or the query might be about baseball
and it's really two nouns, "run" and "home" (not to
be confused with "home run", which is a compound
noun with an idiomatic meaning in baseball).
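
To illustrate the ambiguity described above, here is a minimal sketch in plain
Java. It is not how Brill's tagger, OpenNLP, or LingPipe work internally; the
class name, the toy lexicon, the tag labels, and the two hand-written context
rules are all assumptions made up for this example. It only shows that a word
looked up in isolation yields several candidate tags, and that some
neighbouring context is needed to pick one.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AmbiguousTagSketch {
    // Hypothetical toy lexicon: each word maps to every tag it could take.
    static final Map<String, List<String>> LEXICON = new HashMap<>();
    static {
        LEXICON.put("run", Arrays.asList("VERB", "NOUN"));
        LEXICON.put("home", Arrays.asList("NOUN", "ADVERB"));
        LEXICON.put("the", Arrays.asList("DET"));
        LEXICON.put("to", Arrays.asList("PART"));
    }

    // Bag-of-words view: without context we can only list the candidates.
    static List<String> candidates(String word) {
        return LEXICON.getOrDefault(word.toLowerCase(), Arrays.asList("UNKNOWN"));
    }

    // Two hand-written context rules (an assumption, not a real tagger):
    // after a determiner prefer NOUN, after "to" prefer VERB.
    // Returns null when the word is still ambiguous.
    static String tagInContext(String previous, String word) {
        List<String> options = candidates(word);
        if (options.size() == 1) {
            return options.get(0);
        }
        if (previous != null) {
            List<String> prev = candidates(previous);
            if (prev.contains("DET") && options.contains("NOUN")) {
                return "NOUN";
            }
            if (prev.contains("PART") && options.contains("VERB")) {
                return "VERB";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(candidates("run"));           // [VERB, NOUN]
        System.out.println(tagInContext(null, "run"));   // null (undecided)
        System.out.println(tagInContext("the", "run"));  // NOUN
        System.out.println(tagInContext("to", "run"));   // VERB
    }
}
```

A two-word query such as "run home" gives this sketch no usable context, so
both words stay ambiguous, which is the situation a bag-of-words query puts a
real tagger in.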

- Bob



Re: part of speech tagger

Posted by Breck Baldwin <br...@alias-i.com>.
LingPipe has one, see a tutorial at:

http://alias-i.com/lingpipe/demos/tutorial/posTags/read-me.html

Also look at our competition page, which lists academic packages that
may or may not have POS taggers. See:

http://alias-i.com/lingpipe/web/competition.html

breck (disclosure--we make LingPipe)

zzzzz shalev wrote:
> hello all,
>    
>     i would like to retrieve during query time, the part of speech of each word in a query,
>   does anyone know of an implementation of a java part of speech api?



Re: part of speech tagger

Posted by Pierrick Brihaye <pi...@free.fr>.
Hi,

zzzzz shalev wrote:

> hello all,
> 
> i would like to retrieve during query time, the part of speech of
> each word in a query, does anyone know of an implementation of a java
> part of speech api?
> 
> thanks in advance,

Aramorph for Java, an Arabic analyzer that provides a Lucene
interface, fills the token type
(http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Token.html#type())
with a (possible) POS label. See
http://www.nongnu.org/aramorph/english/lucene.html.

This token type can be (and actually is) used to filter out "empty" words (stop words).
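
As a rough illustration of filtering on a token type, here is a minimal,
self-contained Java sketch. It does not use Lucene or Aramorph: TypedToken is
a hypothetical stand-in for an analyzer token carrying a type string, and the
labels DET, PREP, CONJ, and NOUN are assumed for the example rather than taken
from Aramorph's actual label set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TokenTypeFilterSketch {
    // Hypothetical stand-in for an analyzer token: term text plus a type label.
    static final class TypedToken {
        final String text;
        final String type;

        TypedToken(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // Assumed labels for "empty" (function) words; a real analyzer defines its own.
    static final Set<String> EMPTY_TYPES =
            new HashSet<>(Arrays.asList("DET", "PREP", "CONJ"));

    // Keep only the text of tokens whose type is not an "empty" word type.
    static List<String> keepContentWords(List<TypedToken> tokens) {
        List<String> kept = new ArrayList<>();
        for (TypedToken token : tokens) {
            if (!EMPTY_TYPES.contains(token.type)) {
                kept.add(token.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<TypedToken> tokens = Arrays.asList(
                new TypedToken("the", "DET"),
                new TypedToken("house", "NOUN"),
                new TypedToken("of", "PREP"),
                new TypedToken("wisdom", "NOUN"));
        System.out.println(keepContentWords(tokens)); // [house, wisdom]
    }
}
```

In a Lucene pipeline the same check would presumably live in a token filter
that inspects each token's type, but that wiring is left out here.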

Cheers,

p.b.





Re: part of speech tagger

Posted by Fredrik Hedberg <fr...@avafan.com>.
http://opennlp.sf.net/

 - Fredrik

2006/10/20, zzzzz shalev <zz...@yahoo.com>:
> hello all,
>
>     i would like to retrieve during query time, the part of speech of each word in a query,
>   does anyone know of an implementation of a java part of speech api?
>
>   thanks in advance,
