Posted to users@opennlp.apache.org by Joe Corneli <ho...@gmail.com> on 2016/01/14 16:39:52 UTC
multiple pos and alternate parses (clojure interface)
I'm using the OpenNLP Clojure interface,
https://github.com/dakrone/clojure-opennlp
In my first attempt at parsing a sentence with the Treebank model, I
tried the following:
(treebank-parser ["What can happen in a second ."])
I got the following answer:
(TOP
  (SBARQ
    (WHNP (WP What))
    (SQ
      (VP (MD can) (VP (VB happen) (PP (IN in) (NP (DT a) (JJ second))))))
    (. .)))
For the most part that seems OK, except that "second" is tagged as an
adjective (JJ) rather than as a noun (NN).
[I'm certainly no linguist, but is it even meaningful to talk about a NP
without a noun in it?]
Anyway, at a technical level, I wonder how I can get the parser (or
tagger) to notice and show me the alternative possibilities (i.e. where
"second" is understood as a noun)?
From looking around online, I'm pretty sure this is possible, though I
don't know if it's directly supported by the Clojure interface! I'd
also appreciate any pointers to how to do it directly in Java, so I know
what sorts of questions to ask next.
Many thanks,
Joe
PS. The issue of indeterminacy is described in "Building a large
annotated corpus of English: the Penn Treebank" as follows:
«Since a major concern of the Treebank is to avoid requiring annotators to
make arbitrary decisions, we allow words to be associated with more than
one POS tag. Such multiple tagging indicates either that the word's part
of speech simply cannot be decided or that the annotator is unsure which
of the alternative tags is the correct one. In principle, annotators can
tag a word with any number of tags, but in practice, multiple tags are
restricted to a small number of recurring two-tag combinations: JJ|NN
(adjective or noun as prenominal modifier), JJ|VBG (adjective or
gerund/present participle), JJ|VBN (adjective or past participle), NN|VBG
(noun or gerund), and RB|RP (adverb or particle).»
- https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html
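To make the multiple-tag notation concrete, here is a minimal Java sketch of how such a combined tag could be split into its alternatives. It assumes the two tags are joined by a vertical bar (e.g. JJ|NN); the class name MultiTag and the method alternatives are made up for illustration and are not part of OpenNLP or the Clojure interface.

```java
import java.util.Arrays;
import java.util.List;

public class MultiTag {
    // Split a combined tag such as "JJ|NN" into its alternative tags.
    // Assumption: alternatives are separated by a vertical bar.
    // A plain tag such as "DT" comes back as a one-element list.
    static List<String> alternatives(String tag) {
        return Arrays.asList(tag.split("\\|"));
    }

    public static void main(String[] args) {
        System.out.println(alternatives("JJ|NN")); // prints [JJ, NN]
        System.out.println(alternatives("DT"));    // prints [DT]
    }
}
```

This only illustrates the annotation format; it does not make the parser or tagger produce alternatives, which is the part I am still asking about.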