You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@opennlp.apache.org by aa...@apache.org on 2023/04/25 08:00:22 UTC

[opennlp] branch main updated: OPENNLP-1482 : Documentation for OpenNLP Eval Test Data (#531)

This is an automated email from the ASF dual-hosted git repository.

aarora pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/opennlp.git


The following commit(s) were added to refs/heads/main by this push:
     new 265e643c OPENNLP-1482 : Documentation for OpenNLP Eval Test Data (#531)
265e643c is described below

commit 265e643cd85fdeaea9a13b1e4e861bb1aa8226e3
Author: Atita Arora <at...@users.noreply.github.com>
AuthorDate: Tue Apr 25 10:00:15 2023 +0200

    OPENNLP-1482 : Documentation for OpenNLP Eval Test Data (#531)
    
    * OPENNLP-1482 : Documentation for OpenNLP Eval Test Data
    
    * OPENNLP-1482 : PR Feedback accommodated
---
 opennlp-docs/src/docbkx/evaltest.xml | 80 ++++++++++++++++++++++++++++++++++++
 opennlp-docs/src/docbkx/opennlp.xml  |  1 +
 2 files changed, 81 insertions(+)

diff --git a/opennlp-docs/src/docbkx/evaltest.xml b/opennlp-docs/src/docbkx/evaltest.xml
new file mode 100644
index 00000000..2641fe84
--- /dev/null
+++ b/opennlp-docs/src/docbkx/evaltest.xml
@@ -0,0 +1,80 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+	<section id="opennlp.evaltest.whatisit">
+		<title>What is it ?</title>
+		<para>
+			The evaluation test data is the data used in the tests that evaluate functionality and performance of
+			OpenNLP.
+			These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+		</para>
+		<para>
+			The evaluation tests leverage the k-fold cross-validation procedure.
+			This technique works by dividing the evaluation data into <code>k</code> equally sized parts or folds.
+			The algorithm is then trained on <code>k-1</code> of the folds and tested on the remaining fold.
+			This process is repeated <code>k</code> times, so that each of the k-folds is used exactly once as the test data,
+			and the results of each fold are combined to produce an overall estimate of the algorithm's performance.
+		</para>
+	</section>
+	<section id="opennlp.evaltest.whereisit">
+		<title>Where is it?</title>
+		<para>
+			OpenNLP evaluation tests data is available at <ulink url="https://nightlies.apache.org/opennlp/">
+			https://nightlies.apache.org/opennlp/</ulink> (file name : <code>opennlp-data.zip</code>)
+		</para>
+		<para>
+			Here's a link to the evaluation-tests build on Jenkins:<ulink url="https://builds.apache.org/job/OpenNLP/">
+			https://builds.apache.org/job/OpenNLP/</ulink>
+		</para>
+ 	 </section>
+  	<section id="opennlp.evaltest.howtouseit">
+		<title>How to use the evaluation test data to run test?</title>
+		<para>
+			The evaluation tests data can be downloaded and saved in the desired directory and can be used to run
+			OpenNLP Evaluation Tests as below:
+		<screen>
+			<![CDATA[
+mvn test -DOPENNLP_DATA_DIR=/path/to/opennlp-eval-test-data/ -Peval-tests
+			]]>
+		</screen>
+		</para>
+	</section>
+	<section id="opennlp.evaltest.howtochangeit">
+		<title>How to change evaluation data?</title>
+		<para>
+			OpenNLP Evaluation Tests use <code><ulink url="https://nightlies.apache.org/">nightlies.apache.org</ulink></code> to
+			share data for testing and releasing candidate build.
+			You can also upload the opennlp-data.zip to <code>nightlies.apache.org</code> as below:
+			<screen>
+				<![CDATA[
+curl -u your_asf_username -T ./opennlp-data.zip "https://nightlies.apache.org/opennlp/"
+				]]>
+			</screen>
+			More information about changing the evaluation test data on <code>nightlies.apache.org</code> can be found
+			at :<ulink url="https://nightlies.apache.org/authoring.html">https://nightlies.apache.org/authoring.html
+			</ulink>
+		</para>
+	</section>
+</chapter>
diff --git a/opennlp-docs/src/docbkx/opennlp.xml b/opennlp-docs/src/docbkx/opennlp.xml
index 75184b91..00c55d53 100644
--- a/opennlp-docs/src/docbkx/opennlp.xml
+++ b/opennlp-docs/src/docbkx/opennlp.xml
@@ -92,4 +92,5 @@ under the License.
 	<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./uima-integration.xml" />
 	<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./morfologik-addon.xml" />
 	<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./cli.xml" />
+	<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="evaltest.xml" />
 </book>