Posted to commits@manifoldcf.apache.org by kw...@apache.org on 2014/11/07 09:26:28 UTC

svn commit: r1637310 - in /manifoldcf/trunk: CHANGES.txt site/src/documentation/content/xdocs/en_US/end-user-documentation.xml site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG

Author: kwright
Date: Fri Nov  7 08:26:27 2014
New Revision: 1637310

URL: http://svn.apache.org/r1637310
Log:
Fix for CONNECTORS-1096.

Added:
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG   (with props)
Modified:
    manifoldcf/trunk/CHANGES.txt
    manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml

Modified: manifoldcf/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/CHANGES.txt?rev=1637310&r1=1637309&r2=1637310&view=diff
==============================================================================
--- manifoldcf/trunk/CHANGES.txt (original)
+++ manifoldcf/trunk/CHANGES.txt Fri Nov  7 08:26:27 2014
@@ -3,6 +3,9 @@ $Id$
 
 ======================= 2.0-dev =====================
 
+CONNECTORS-1096: Document boilerplate removal options.
+(Karl Wright)
+
 CONNECTORS-1095: Use https for downloading everywhere.
 (Aeham Abushwashi)
 

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml?rev=1637310&r1=1637309&r2=1637310&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml Fri Nov  7 08:26:27 2014
@@ -953,6 +953,8 @@ curl -XGET http://localhost:9200/index/_
 
             <section id="nulltransformer">
                 <title>Null Transformer</title>
+                <p>The null transformer does nothing other than record activity through the transformer.  It is therefore useful mainly as a model for writing new
+                      transformation connectors and as a diagnostic aid.  It requires no non-standard configuration information, and contributes no tabs to a job that includes it.</p>
             </section>
 
             <section id="tikaextractor">
@@ -964,8 +966,8 @@ curl -XGET http://localhost:9200/index/_
                 <p>As with all document transformers,  more than one Tika Content Extractor transformation filter can be used in a single pipeline.  In the case
                       of the Tika Content Extractor, this does not seem to be of much utility.</p>
                 <p>The Tika Content Extractor transformation connection type does not require anything other than standard configuration information.</p>
-                <p>The Tika Content Extractor transformation connection type contributes two tabs to a job definition.  These are the "Field mapping" tab, and the "Exceptions" tab.
-                      The "Field mapping" tab looks like this:</p>
+                <p>The Tika Content Extractor transformation connection type contributes three tabs to a job definition.  These are the "Field mapping" tab, the "Exceptions" tab,
+                      and the "Boilerplate" tab.  The "Field mapping" tab looks like this:</p>
                 <br/><br/>
                 <figure src="images/en_US/tika-job-field-mapping.PNG" alt="Tika Content Extractor specification, Field Mapping tab" width="80%"/>
                 <br/><br/>
@@ -976,6 +978,12 @@ curl -XGET http://localhost:9200/index/_
                 <figure src="images/en_US/tika-job-exceptions.PNG" alt="Tika Content Extractor specification, Exceptions tab" width="80%"/>
                 <br/><br/>
                 <p>Uncheck the checkbox to allow indexing of document metadata even when Tika fails to extract content from the document.</p>
+                <p>The "Boilerplate" tab looks like this:</p>
+                <br/><br/>
+                <figure src="images/en_US/tika-job-boilerplate.PNG" alt="Tika Content Extractor specification, Boilerplate tab" width="80%"/>
+                <br/><br/>
+                <p>Select the HTML boilerplate removal option you want.  These options are implementations provided by the "Boilerpipe" project.  They are lightly documented,
+                      so you may need to experiment to find the one most appropriate for your application.</p>
             </section>
 
         </section>

Added: manifoldcf/trunk/site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG?rev=1637310&view=auto
==============================================================================
Binary file - no diff available.

Propchange: manifoldcf/trunk/site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream