Posted to commits@opennlp.apache.org by jo...@apache.org on 2011/01/13 18:05:14 UTC
svn commit: r1058665 - in /incubator/opennlp/trunk/opennlp-docs/src/docbkx:
opennlp.xml postagger.xml
Author: joern
Date: Thu Jan 13 17:05:13 2011
New Revision: 1058665
URL: http://svn.apache.org/viewvc?rev=1058665&view=rev
Log:
OPENNLP-45 Added first documentation bits for the pos tagger
Added:
incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml (with props)
Modified:
incubator/opennlp/trunk/opennlp-docs/src/docbkx/opennlp.xml
Modified: incubator/opennlp/trunk/opennlp-docs/src/docbkx/opennlp.xml
URL: http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-docs/src/docbkx/opennlp.xml?rev=1058665&r1=1058664&r2=1058665&view=diff
==============================================================================
--- incubator/opennlp/trunk/opennlp-docs/src/docbkx/opennlp.xml (original)
+++ incubator/opennlp/trunk/opennlp-docs/src/docbkx/opennlp.xml Thu Jan 13 17:05:13 2011
@@ -79,7 +79,7 @@ under the License.
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./tokenizer.xml" />
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./namefinder.xml" />
<!--xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./doccat.xml" /-->
- <!--xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./postagger.xml" /-->
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./postagger.xml" />
<!--xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./chunker.xml" /-->
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./parser.xml" />
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./corpora.xml" />
Added: incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml
URL: http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml?rev=1058665&view=auto
==============================================================================
--- incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml (added)
+++ incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml Thu Jan 13 17:05:13 2011
@@ -0,0 +1,62 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="tools.postagger">
+<title>Part-of-Speech Tagger</title>
+ <section id="tools.postagger.tagging">
+ <title>Tagging</title>
+ <para>
+ The Part-of-Speech Tagger marks each token with its corresponding word type,
+ based on the token itself and on its context. The same token can receive
+ different POS tags in different contexts. The OpenNLP POS Tagger
+ uses a probability model to predict the correct POS tag out of the tag set.
+ To limit the possible tags for a token, a tag dictionary can be used, which improves
+ both the tagging accuracy and the runtime performance of the tagger.
+ </para>
+ <section id="tools.postagger.tagging.cmdline">
+ <title>POS Tagger Tool</title>
+ <para>
+ The easiest way to try out the POS Tagger is the command line tool. The tool is intended only for demonstration and testing.
+ Download the English maxent POS model and start the POS Tagger Tool with this command:
+ <screen>
+ <![CDATA[
+$ bin/opennlp POSTagger en-pos-maxent.bin]]>
+ </screen>
+ The POS Tagger now reads one tokenized sentence per line from stdin.
+ Copy these two sentences to the console:
+ <literallayout>
+ <![CDATA[
+Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
+Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .]]>
+ </literallayout>
+ The POS Tagger will now echo the sentences with POS tags to the console:
+ <literallayout>
+ <![CDATA[
+Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ ,_, will_MD join_VB the_DT board_NN as_IN a_DT nonexecutive_JJ director_NN Nov._NNP 29_CD ._.
+Mr._NNP Vinken_NNP is_VBZ chairman_NN of_IN Elsevier_NNP N.V._NNP ,_, the_DT Dutch_NNP publishing_VBG group_NN ._.]]>
+ </literallayout>
+ The tag set used by the English POS model is the Penn Treebank tag set. See the link below for a description of the tags.
+ </para>
+ </section>
+ </section>
+</chapter>
\ No newline at end of file
Propchange: incubator/opennlp/trunk/opennlp-docs/src/docbkx/postagger.xml
------------------------------------------------------------------------------
svn:mime-type = text/plain
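
The tool output documented in postagger.xml pairs each token with its tag as token_TAG, separated by whitespace. The following is a minimal sketch of how such a line could be read back into (token, tag) pairs. It is not part of OpenNLP: the function name parse_tagged_line is hypothetical, and it assumes the tag itself never contains an underscore, which holds for the Penn Treebank tags shown in the example.

```python
def parse_tagged_line(line):
    """Split one line of whitespace-separated token_TAG items into (token, tag) pairs.

    Hypothetical helper for illustration only; assumes tags contain no underscore.
    """
    pairs = []
    for item in line.split():
        # Split on the LAST underscore so that a token which itself
        # contains an underscore stays intact.
        token, sep, tag = item.rpartition("_")
        if not sep:
            raise ValueError(f"no tag separator in {item!r}")
        pairs.append((token, tag))
    return pairs


if __name__ == "__main__":
    sample = "Mr._NNP Vinken_NNP is_VBZ chairman_NN of_IN Elsevier_NNP N.V._NNP ,_, the_DT"
    for token, tag in parse_tagged_line(sample):
        print(f"{token}\t{tag}")
```

Splitting on the last underscore rather than the first is the one design choice here: punctuation items such as ,_, and abbreviations such as N.V._NNP then parse without special cases.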