Posted to commits@opennlp.apache.org by jo...@apache.org on 2011/05/31 13:14:54 UTC

svn commit: r1129626 - /incubator/opennlp/trunk/opennlp-docs/src/docbkx/namefinder.xml

Author: joern
Date: Tue May 31 11:14:54 2011
New Revision: 1129626

URL: http://svn.apache.org/viewvc?rev=1129626&view=rev
Log:
OPENNLP-194 Fixed too long lines

Modified:
    incubator/opennlp/trunk/opennlp-docs/src/docbkx/namefinder.xml

Modified: incubator/opennlp/trunk/opennlp-docs/src/docbkx/namefinder.xml
URL: http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-docs/src/docbkx/namefinder.xml?rev=1129626&r1=1129625&r2=1129626&view=diff
==============================================================================
--- incubator/opennlp/trunk/opennlp-docs/src/docbkx/namefinder.xml (original)
+++ incubator/opennlp/trunk/opennlp-docs/src/docbkx/namefinder.xml Tue May 31 11:14:54 2011
@@ -58,14 +58,16 @@ $bin/opennlp TokenNameFinder en-ner-pers
 				<![CDATA[
 Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
 Mr . Vinken is chairman of Elsevier N.V. , the Dutch publishing group .
-Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a director of this British industrial conglomerate .]]>
+Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields PLC , was named
+    a director of this British industrial conglomerate .]]>
 			 </programlisting>
 			 the name finder will now output the text with markup for person names:
 			<programlisting>
 				<![CDATA[
 <START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
 Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
-<START:person> Rudolph Agnew <END> , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a director of this British industrial conglomerate .
+<START:person> Rudolph Agnew <END> , 55 years old and former chairman of Consolidated Gold Fields PLC ,
+    was named a director of this British industrial conglomerate .
 				]]>
 			 </programlisting>		 
 		</para>
@@ -118,9 +120,14 @@ finally {
 				<![CDATA[
 NameFinderME nameFinder = new NameFinderME(model);]]>
 			</programlisting>
-			The initialization is now finished and the Name Finder can be used. The NameFinderME class is not thread safe, it must only be called from one thread. To use multiple threads multiple NameFinderME instances sharing the same model instance can be created.
+			The initialization is now finished and the Name Finder can be used. The NameFinderME
+			class is not thread safe, it must only be called from one thread. To use multiple threads
+			multiple NameFinderME instances sharing the same model instance can be created.
 			The input text should be segmented into documents, sentences and tokens.
-			To perform entity detection an application calls the find method for every sentence in the document. After every document clearAdaptiveData must be called to clear the adaptive data in the feature generators. Not calling clearAdaptiveData can lead to a sharp drop in the detection rate after a few documents.
+			To perform entity detection an application calls the find method for every sentence in the
+			document. After every document clearAdaptiveData must be called to clear the adaptive data in
+			the feature generators. Not calling clearAdaptiveData can lead to a sharp drop in the detection
+			rate after a few documents.
 			The following code illustrates that:
 			<programlisting language="java">
 				<![CDATA[
@@ -149,8 +156,14 @@ String sentence = new String[]{
 
 Span nameSpans[] = nameFinder.find(sentence);]]>
 			</programlisting>
-			The nameSpans arrays contains now exactly one Span which marks the name Pierre Vinken. The elements between the begin and end offsets are the name tokens. In this case the begin offset is 0 and the end offset is 2. The Span object also knows the type of the entity. In this case its person (defined by the model). It can be retrieved with a call to Span.getType().
-			Additionally to the statistical Name Finder, OpenNLP also offers a dictionary and a regular expression name finder implementation.
+			The nameSpans arrays contains now exactly one Span which marks the name Pierre Vinken. 
+			The elements between the begin and end offsets are the name tokens. In this case the begin 
+			offset is 0 and the end offset is 2. The Span object also knows the type of the entity.
+			In this case its person (defined by the model). It can be retrieved with a call to Span.getType().
+			Additionally to the statistical Name Finder, OpenNLP also offers a dictionary and a regular
+			expression name finder implementation.
+		</para>
+		<para>
 			TODO: Explain how to retrieve probs from the name finder for names and for non recognized names
 		</para>
 	</section>
@@ -158,8 +171,10 @@ Span nameSpans[] = nameFinder.find(sente
 	<section id="tools.namefind.training">
 		<title>Name Finder Training</title>
 		<para>
-			The pre-trained models might not be available for a desired language, can not detect important entities or the performance is not good enough outside the news domain.
-			These are the typical reason to do custom training of the name finder on a new corpus or on a corpus which is extended by private training data taken from the data which should be analyzed.
+			The pre-trained models might not be available for a desired language, can not detect
+			important entities or the performance is not good enough outside the news domain.
+			These are the typical reason to do custom training of the name finder on a new corpus
+			or on a corpus which is extended by private training data taken from the data which should be analyzed.
 		</para>
 		
 		<section id="tools.namefind.training.tool">
@@ -189,7 +204,8 @@ Mr . <START:person> Vinken <END> is chai
 			<screen>
 				<![CDATA[
 $ bin/opennlp TokenNameFinderTrainer
-Usage: opennlp TokenNameFinderTrainer -lang language -encoding charset [-iterations num] [-cutoff num] [-type type] -data trainingData -model model
+Usage: opennlp TokenNameFinderTrainer -lang language -encoding charset [-iterations num] \ 
+[-cutoff num] [-type type] -data trainingData -model model
 -lang language     specifies the language which is being processed.
 -encoding charset  specifies the encoding which should be used for reading and writing text.
 -iterations num    specified the number of training iterations
@@ -210,7 +226,8 @@ $bin/opennlp TokenNameFinderTrainer -enc
 		<section id="tools.namefind.training.api">
 		<title>Training API</title>
 		<para>
-			To train the name finder from within an application its recommended to use the training API instead of the command line tool.
+			To train the name finder from within an application its recommended to use the training
+			API instead of the command line tool.
 			Basically three steps are necessary to train it:
 			<itemizedlist>
 				<listitem>
@@ -266,7 +283,9 @@ AdaptiveFeatureGenerator featureGenerato
            });]]>
 			</programlisting>
 			which is similar to the default feature generator.
-			The javadoc of the feature generator classes explain what the individual feature generators do. To write a custom feature generator please implement the AdaptiveFeatureGenerator interface or if it must not be adaptive extend the FeatureGeneratorAdapter.
+			The javadoc of the feature generator classes explain what the individual feature generators do.
+			To write a custom feature generator please implement the AdaptiveFeatureGenerator interface or
+			if it must not be adaptive extend the FeatureGeneratorAdapter.
 			The train method which should be used is defined as
 			<programlisting language="java">
 				<![CDATA[
@@ -322,7 +341,8 @@ FMeasure result = evaluator.getFMeasure(
 
 System.out.println(result.toString());]]>
 			</programlisting>
-			In the cross validation case all the training arguments must be provided (see the Training API section above).
+			In the cross validation case all the training arguments must be
+			provided (see the Training API section above).
 			To perform cross validation the ObjectStream must be resettable.
 			<programlisting language="java">
 				<![CDATA[
@@ -367,7 +387,7 @@ System.out.println(result.toString());]]
 						</ulink>
 					</para>
 				</listitem>
-				<!-- Add CONLL 2002 link here ...-->
+				<!-- TODO: Add CONLL 2002 link here ...-->
 				<listitem>
 					<para>
 						<ulink  url="http://www.cnts.ua.ac.be/conll2003/ner/annotation.txt">