Posted to commits@manifoldcf.apache.org by kw...@apache.org on 2014/11/07 09:26:28 UTC
svn commit: r1637310 - in /manifoldcf/trunk: CHANGES.txt
site/src/documentation/content/xdocs/en_US/end-user-documentation.xml
site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG
Author: kwright
Date: Fri Nov 7 08:26:27 2014
New Revision: 1637310
URL: http://svn.apache.org/r1637310
Log:
Fix for CONNECTORS-1096.
Added:
manifoldcf/trunk/site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG (with props)
Modified:
manifoldcf/trunk/CHANGES.txt
manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml
Modified: manifoldcf/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/CHANGES.txt?rev=1637310&r1=1637309&r2=1637310&view=diff
==============================================================================
--- manifoldcf/trunk/CHANGES.txt (original)
+++ manifoldcf/trunk/CHANGES.txt Fri Nov 7 08:26:27 2014
@@ -3,6 +3,9 @@ $Id$
======================= 2.0-dev =====================
+CONNECTORS-1096: Document boilerplate removal options.
+(Karl Wright)
+
CONNECTORS-1095: Use https for downloading everywhere.
(Aeham Abushwashi)
Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml?rev=1637310&r1=1637309&r2=1637310&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml Fri Nov 7 08:26:27 2014
@@ -953,6 +953,8 @@ curl -XGET http://localhost:9200/index/_
<section id="nulltransformer">
<title>Null Transformer</title>
+ <p>The null transformer does nothing other than record activity through the transformer. It is thus useful primarily as a coding model, and a diagnostic
+ aid. It requires no non-standard configuration information, and provides no tabs for a job that includes it.</p>
</section>
<section id="tikaextractor">
@@ -964,8 +966,8 @@ curl -XGET http://localhost:9200/index/_
<p>As with all document transformers, more than one Tika Content Extractor transformation filter can be used in a single pipeline. In the case
of the Tika Content Extractor, this does not seem to be of much utility.</p>
<p>The Tika Content Extractor transformation connection type does not require anything other than standard configuration information.</p>
- <p>The Tika Content Extractor transformation connection type contributes two tabs to a job definition. These are the "Field mapping" tab, and the "Exceptions" tab.
- The "Field mapping" tab looks like this:</p>
+ <p>The Tika Content Extractor transformation connection type contributes three tabs to a job definition. These are the "Field mapping" tab, the "Exceptions" tab,
+ and the "Boilerplate" tab. The "Field mapping" tab looks like this:</p>
<br/><br/>
<figure src="images/en_US/tika-job-field-mapping.PNG" alt="Tika Content Extractor specification, Field Mapping tab" width="80%"/>
<br/><br/>
@@ -976,6 +978,12 @@ curl -XGET http://localhost:9200/index/_
<figure src="images/en_US/tika-job-exceptions.PNG" alt="Tika Content Extractor specification, Exceptions tab" width="80%"/>
<br/><br/>
<p>Uncheck the checkbox to allow indexing of document metadata even when Tika fails to extract content from the document.</p>
+ <p>The "Boilerplate" tab looks like this:</p>
+ <br/><br/>
+ <figure src="images/en_US/tika-job-boilerplate.PNG" alt="Tika Content Extractor specification, Boilerplate tab" width="80%"/>
+ <br/><br/>
+ <p>Select the HTML boilerplate removal option you want. These are implementations provided by the "Boilerpipe" project; they are lightly documented,
+ so you will need to experiment with your particular application to find the one most appropriate for your application.</p>
</section>
</section>
Added: manifoldcf/trunk/site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG?rev=1637310&view=auto
==============================================================================
Binary file - no diff available.
Propchange: manifoldcf/trunk/site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream