Posted to commits@opennlp.apache.org by jo...@apache.org on 2011/01/13 19:12:44 UTC
svn commit: r1058696 - in /incubator/opennlp/trunk/opennlp-docs/src/docbkx: postagger.xml tokenizer.xml
Author: joern
Date: Thu Jan 13 18:12:44 2011
New Revision: 1058696
URL: http://svn.apache.org/viewvc?rev=1058696&view=rev
Log:
OPENNLP-45 Now uses programlisting for text samples
Modified:
incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml
incubator/opennlp/trunk/opennlp-docs/src/docbkx/tokenizer.xml
Modified: incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml
URL: http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml?rev=1058696&r1=1058695&r2=1058696&view=diff
==============================================================================
--- incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml (original)
+++ incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml Thu Jan 13 18:12:44 2011
@@ -44,17 +44,17 @@ $ bin/opennlp POSTagger en-pos-maxent.bi
</screen>
The POS Tagger now reads a tokenized sentence per line from stdin.
Copy these two sentences to the console:
- <literallayout>
+ <programlisting>
<![CDATA[
Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .]]>
- </literallayout>
+ </programlisting>
the POS Tagger will now echo the sentences with pos tags to the console:
- <literallayout>
+ <programlisting>
<![CDATA[
Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ ,_, will_MD join_VB the_DT board_NN as_IN a_DT nonexecutive_JJ director_NN Nov._NNP 29_CD ._.
Mr._NNP Vinken_NNP is_VBZ chairman_NN of_IN Elsevier_NNP N.V._NNP ,_, the_DT Dutch_NNP publishing_VBG group_NN]]>
- </literallayout>
+ </programlisting>
The tag set used by the english pos model is the Penn Treebank tag set. See the link below for a description of the tags.
</para>
</section>
Modified: incubator/opennlp/trunk/opennlp-docs/src/docbkx/tokenizer.xml
URL: http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-docs/src/docbkx/tokenizer.xml?rev=1058696&r1=1058695&r2=1058696&view=diff
==============================================================================
--- incubator/opennlp/trunk/opennlp-docs/src/docbkx/tokenizer.xml (original)
+++ incubator/opennlp/trunk/opennlp-docs/src/docbkx/tokenizer.xml Thu Jan 13 18:12:44 2011
@@ -24,25 +24,25 @@
tokens. Tokens are usually
words, punctuation, numbers, etc.
- <literallayout>
+ <programlisting>
<![CDATA[
Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate.
]]>
- </literallayout>
+ </programlisting>
The following result shows the individual tokens in a whitespace
separated representation.
- <literallayout>
+ <programlisting>
<![CDATA[
Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .
Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a nonexecutive director of this British industrial conglomerate .
A form of asbestos once used to make Kent cigarette filters has caused a high percentage of cancer deaths among a group of workers exposed to it more than 30 years ago , researchers reported .
]]>
- </literallayout>
+ </programlisting>
OpenNLP offers multiple tokenizer implementations:
<itemizedlist>