Posted to commits@opennlp.apache.org by jo...@apache.org on 2011/01/13 18:05:14 UTC

svn commit: r1058665 - in /incubator/opennlp/trunk/opennlp-docs/src/docbkx: opennlp.xml postagger.xml

Author: joern
Date: Thu Jan 13 17:05:13 2011
New Revision: 1058665

URL: http://svn.apache.org/viewvc?rev=1058665&view=rev
Log:
OPENNLP-45 Added first documentation bits for the pos tagger

Added:
    incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml   (with props)
Modified:
    incubator/opennlp/trunk/opennlp-docs/src/docbkx/opennlp.xml

Modified: incubator/opennlp/trunk/opennlp-docs/src/docbkx/opennlp.xml
URL: http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-docs/src/docbkx/opennlp.xml?rev=1058665&r1=1058664&r2=1058665&view=diff
==============================================================================
--- incubator/opennlp/trunk/opennlp-docs/src/docbkx/opennlp.xml (original)
+++ incubator/opennlp/trunk/opennlp-docs/src/docbkx/opennlp.xml Thu Jan 13 17:05:13 2011
@@ -79,7 +79,7 @@ under the License.
 	<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./tokenizer.xml" />
 	<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./namefinder.xml" />
 	<!--xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./doccat.xml" /-->
-	<!--xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./postagger.xml" /-->
+	<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./postagger.xml" />
 	<!--xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./chunker.xml" /-->
 	<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./parser.xml" />
 	<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./corpora.xml" />

Added: incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml
URL: http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml?rev=1058665&view=auto
==============================================================================
--- incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml (added)
+++ incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml Thu Jan 13 17:05:13 2011
@@ -0,0 +1,62 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="tools.postagger">
+<title>Part-of-Speech Tagger</title>
+	<section id="tools.postagger.tagging">
+		<title>Tagging</title>
+		<para>
+		The Part-of-Speech Tagger marks each token with its word type (part of speech),
+		based on the token itself and on the context of the token. The same token can take
+		different POS tags in different contexts. The OpenNLP POS Tagger
+		uses a probability model to predict the most likely POS tag from the tag set.
+		To limit the possible tags for a token, a tag dictionary can be used, which improves
+		both the tagging and the runtime performance of the tagger.
+		</para>
+			<section id="tools.postagger.tagging.cmdline">
+		<title>POS Tagger Tool</title>
+		<para>
+		The easiest way to try out the POS Tagger is the command line tool. The tool is intended only for demonstration and testing.
+		Download the English maxent POS model and start the POS Tagger Tool with this command:
+		<screen>
+			<![CDATA[
+$ bin/opennlp POSTagger en-pos-maxent.bin]]>
+		 </screen>
+		The POS Tagger now reads one tokenized sentence per line from stdin.
+		Copy these two sentences to the console:
+		<literallayout>
+			<![CDATA[
+Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
+Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .]]>
+		 </literallayout>
+		 The POS Tagger will now echo the sentences with POS tags to the console:
+				<literallayout>
+			<![CDATA[
+Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ ,_, will_MD join_VB the_DT board_NN as_IN a_DT nonexecutive_JJ director_NN Nov._NNP 29_CD ._.
+Mr._NNP Vinken_NNP is_VBZ chairman_NN of_IN Elsevier_NNP N.V._NNP ,_, the_DT Dutch_NNP publishing_VBG group_NN ._.]]>
+		 </literallayout> 
+		 The tag set used by the English POS model is the Penn Treebank tag set. See the link below for a description of the tags.
+		</para>
+		</section>
+	</section>
+</chapter>
\ No newline at end of file

Propchange: incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml
------------------------------------------------------------------------------
    svn:mime-type = text/plain