You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@manifoldcf.apache.org by kw...@apache.org on 2014/06/10 13:29:43 UTC

svn commit: r1601605 - in /manifoldcf/trunk/site/src/documentation/content/xdocs: en_US/ ja_JP/

Author: kwright
Date: Tue Jun 10 11:29:42 2014
New Revision: 1601605

URL: http://svn.apache.org/r1601605
Log:
Add a section on writing transformation connectors.  Part of CONNECTORS-959

Added:
    manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-transformation-connectors.xml   (with props)
    manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-transformation-connectors.xml   (with props)
Modified:
    manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/technical-resources.xml
    manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-output-connectors.xml
    manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-repository-connectors.xml
    manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/technical-resources.xml
    manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-output-connectors.xml
    manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-repository-connectors.xml

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/technical-resources.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/technical-resources.xml?rev=1601605&r1=1601604&r2=1601605&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/technical-resources.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/technical-resources.xml Tue Jun 10 11:29:42 2014
@@ -49,6 +49,7 @@
 	</p> 
 	<ul>
 	    <li><a href="writing-output-connectors.html">How to write an output connector</a></li>
+	    <li><a href="writing-transformation-connectors.html">How to write a transformation connector</a></li>
 	    <li><a href="writing-mapping-connectors.html">How to write a user mapping connector</a></li>
 	    <li><a href="writing-authority-connectors.html">How to write an authority connector</a></li>
 	    <li><a href="writing-repository-connectors.html">How to write a repository connector</a></li>

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-output-connectors.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-output-connectors.xml?rev=1601605&r1=1601604&r2=1601605&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-output-connectors.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-output-connectors.xml Tue Jun 10 11:29:42 2014
@@ -72,18 +72,21 @@
           <p></p>
           <table>
             <tr><th>Method</th><th>What it should do</th></tr>
+            <tr><td><strong>checkMimetypeIndexable()</strong></td><td>Decide whether a document with a given mime type is indexable or not</td></tr>
             <tr><td><strong>checkDocumentIndexable()</strong></td><td>Decide whether a file is indexable or not</td></tr>
-            <tr><td><strong>getOutputDescription()</strong></td><td>Use the supplied output specification to come up with an output version string</td></tr>
+            <tr><td><strong>checkLengthIndexable()</strong></td><td>Decide whether a document of a given length is indexable or not</td></tr>
+            <tr><td><strong>checkURLIndexable()</strong></td><td>Decide whether a document with a given URL is indexable or not</td></tr>
+            <tr><td><strong>getPipelineDescription()</strong></td><td>Use the supplied output specification to come up with an output version string</td></tr>
             <tr><td><strong>addOrReplaceDocument()</strong></td><td>Add or replace the specified document within the target repository, or signal if the document cannot be handled</td></tr>
             <tr><td><strong>removeDocument()</strong></td><td>Remove the specified document from the target repository</td></tr>
             <tr><td><strong>outputConfigurationHeader()</strong></td><td>Output the head-section part of an output connection <em>ConfigParams</em> editing page</td></tr>
             <tr><td><strong>outputConfigurationBody()</strong></td><td>Output the body-section part of an output connection <em>ConfigParams</em> editing page</td></tr>
             <tr><td><strong>processConfigurationPost()</strong></td><td>Receive and process form data from an output connection <em>ConfigParams</em> editing page</td></tr>
             <tr><td><strong>viewConfiguration()</strong></td><td>Output the viewing HTML for an output connection <em>ConfigParams</em> object</td></tr>
-            <tr><td><strong>outputSpecificationHeader()</strong></td><td>Output the head-section part of an <em>OutputSpecification</em> editing page</td></tr>
-            <tr><td><strong>outputSpecificationBody()</strong></td><td>Output the body-section part of an <em>OutputSpecification</em> editing page</td></tr>
-            <tr><td><strong>processSpecificationPost()</strong></td><td>Receive and process form data from an <em>OutputSpecification</em> editing page</td></tr>
-            <tr><td><strong>viewSpecification()</strong></td><td>Output the viewing page for an <em>OutputSpecification</em> object</td></tr>
+            <tr><td><strong>outputSpecificationHeader()</strong></td><td>Output the head-section part of a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>outputSpecificationBody()</strong></td><td>Output the body-section part of a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>processSpecificationPost()</strong></td><td>Receive and process form data from a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>viewSpecification()</strong></td><td>Output the viewing page for a <em>Specification</em> object</td></tr>
           </table>
           <p></p>
           <p>These methods come in three broad classes: (a) functional methods for doing the work of the connector; (b) UI methods for configuring a connection; and (c) UI methods for editing the output specification for a job.  Together they do the heavy lifting of your connector.  But before you can write any code at all, you need to plan things out a bit.</p>

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-repository-connectors.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-repository-connectors.xml?rev=1601605&r1=1601604&r2=1601605&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-repository-connectors.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-repository-connectors.xml Tue Jun 10 11:29:42 2014
@@ -94,10 +94,10 @@
             <tr><td><strong>outputConfigurationBody()</strong></td><td>Output the body-section part of a repository connection <em>ConfigParams</em> editing page</td></tr>
             <tr><td><strong>processConfigurationPost()</strong></td><td>Receive and process form data from a repository connection <em>ConfigParams</em> editing page</td></tr>
             <tr><td><strong>viewConfiguration()</strong></td><td>Output the viewing HTML for a repository connection <em>ConfigParams</em> object</td></tr>
-            <tr><td><strong>outputSpecificationHeader()</strong></td><td>Output the head-section part of an <em>DocumentSpecification</em> editing page</td></tr>
-            <tr><td><strong>outputSpecificationBody()</strong></td><td>Output the body-section part of an <em>DocumentSpecification</em> editing page</td></tr>
-            <tr><td><strong>processSpecificationPost()</strong></td><td>Receive and process form data from an <em>DocumentSpecification</em> editing page</td></tr>
-            <tr><td><strong>viewSpecification()</strong></td><td>Output the viewing page for an <em>DocumentSpecification</em> object</td></tr>
+            <tr><td><strong>outputSpecificationHeader()</strong></td><td>Output the head-section part of a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>outputSpecificationBody()</strong></td><td>Output the body-section part of a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>processSpecificationPost()</strong></td><td>Receive and process form data from a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>viewSpecification()</strong></td><td>Output the viewing page for a <em>Specification</em> object</td></tr>
           </table>
           <p></p>
           <p>These methods come in three broad classes: (a) functional methods for doing the work of the connector; (b) UI methods for configuring a connection; and (c) UI methods for editing the document specification for a job.  Together they do the heavy lifting of your connector.  But before you can write any code at all, you need to plan things out a bit.</p>

Added: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-transformation-connectors.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-transformation-connectors.xml?rev=1601605&view=auto
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-transformation-connectors.xml (added)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-transformation-connectors.xml Tue Jun 10 11:29:42 2014
@@ -0,0 +1,160 @@
+<?xml version="1.0"?>
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" 
+          "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<document> 
+
+  <header> 
+    <title>Writing transformation connectors</title> 
+  </header> 
+
+  <body> 
+    <section>
+      <title>Writing a Transformation Connector</title>
+      <p></p>
+      <p>A transformation connector provides a mechanism by which content that has been fetched from a repository can be modified before it gets handed to a back-end repository for processing.</p>
+      <p></p>
+      <p>As is the case with all connectors under the ManifoldCF umbrella, a transformation connector consists of a single part, which is:</p>
+      <p></p>
+      <ul>
+        <li>A class implementing an interface (in this case, <em>org.apache.manifoldcf.agents.interfaces.ITransformationConnector</em>)</li>
+      </ul>
+      <p></p>
+      <section>
+        <title>Key concepts</title>
+        <p></p>
+        <p>The transformation connector abstraction makes use of, or introduces, the following concepts:</p>
+        <p></p>
+        <table>
+          <tr><th>Concept</th><th>What it is</th></tr>
+          <tr><td>Configuration parameters</td><td>A hierarchical structure, internally represented as an XML document, which describes a specific configuration of a specific
+            transformation connector, i.e. <strong>how</strong> the connector should do its job; see <em>org.apache.manifoldcf.core.interfaces.ConfigParams</em></td></tr>
+          <tr><td>Transformation connection</td><td>An output connector instance that has been furnished with configuration data</td></tr>
+          <tr><td>Repository document</td><td>An object that describes a document's contents, including raw document data (as a stream), metadata (as either strings or
+            streams), and access tokens; see <em>org.apache.manifoldcf.agents.interfaces.RepositoryDocument</em></td></tr>
+          <tr><td>Connection management/threading/pooling model</td><td>How an individual output connector class instance is managed and used</td></tr>
+          <tr><td>Activity infrastructure</td><td>The framework API provided to specific methods allowing those methods to perform specific actions within the framework,
+            e.g. recording activities; see <em>org.apache.manifoldcf.agents.interfaces.IOutputAddActivity</em> and <em>org.apache.manifoldcf.agents.interfaces.IOutputRemoveActivity</em></td></tr>
+          <tr><td>Output specification</td><td>A hierarchical structure, internally represented as an XML document, which describes <strong>what</strong> a specific
+            transformation connector should do in the context of a specific job; see <em>org.apache.manifoldcf.agents.interfaces.OutputSpecification</em></td></tr>
+          <tr><td>Transformation version string</td><td>A simple string, used for comparison purposes, that allows ManifoldCF to figure out if an ingestion operation needs
+            to be repeated as a result of changes to the output specification in effect for a document</td></tr>
+          <tr><td>Service interruption</td><td>A specific kind of exception that signals ManifoldCF that the transformation engine is unavailable, and gives a best estimate
+            of when it might become available again; see <em>org.apache.manifoldcf.agents.interfaces.ServiceInterruption</em></td></tr>
+        </table>
+        <p></p>
+        <p></p>
+      </section>
+      <section>
+        <title>Implementing the Transformation Connector class</title>
+        <p></p>
+        <p>A very good place to start is to read the javadoc for the transformation connector interface.  You will note that the javadoc describes the usage and pooling model for a
+          connector class pretty thoroughly.  It is very important to understand the model thoroughly in order to write reliable connectors!  Use of static variables, for one thing, must
+          be done in a very careful way, to avoid issues that would be hard to detect with a cursory test.</p>
+        <p></p>
+        <p>The second thing to do is to examine the provided transformation connector implementations.  The only currently enabled connector is the null transformation connector,
+          which demonstrate (to some degree) the sorts of techniques you will need for an effective implementation.  You will also note that this connector extends a framework-provided
+          transformation connector base class, found at <em>org.apache.manifoldcf.agents.transformation.BaseTransformationConnector</em>.  This base class furnishes some basic
+          bookkeeping logic for managing the connector pool, as well as default implementations of some of the less typical functionality a connector may have.  For example, connectors
+          are allowed to have database tables of their own, which are instantiated when the connector is registered, and are torn down when the connector is removed.  This is, however,
+          not very typical, and the base implementation reflects that.</p>
+        <p></p>
+        <section>
+          <title>Principle methods</title>
+          <p></p>
+          <p>The principle methods an implementer should be concerned with for creating an output connector are the following:</p>
+          <p></p>
+          <table>
+            <tr><th>Method</th><th>What it should do</th></tr>
+            <tr><td><strong>checkMimetypeIndexable()</strong></td><td>Decide whether a document with a given mime type is indexable or not</td></tr>
+            <tr><td><strong>checkDocumentIndexable()</strong></td><td>Decide whether a file is indexable or not</td></tr>
+            <tr><td><strong>checkLengthIndexable()</strong></td><td>Decide whether a document of a given length is indexable or not</td></tr>
+            <tr><td><strong>checkURLIndexable()</strong></td><td>Decide whether a document with a given URL is indexable or not</td></tr>
+            <tr><td><strong>getPipelineDescription()</strong></td><td>Use the supplied output specification to come up with an output version string</td></tr>
+            <tr><td><strong>addOrReplaceDocument()</strong></td><td>Add or replace the specified document within the target repository, or signal if the document cannot be handled</td></tr>
+            <tr><td><strong>outputConfigurationHeader()</strong></td><td>Output the head-section part of an output connection <em>ConfigParams</em> editing page</td></tr>
+            <tr><td><strong>outputConfigurationBody()</strong></td><td>Output the body-section part of an output connection <em>ConfigParams</em> editing page</td></tr>
+            <tr><td><strong>processConfigurationPost()</strong></td><td>Receive and process form data from an output connection <em>ConfigParams</em> editing page</td></tr>
+            <tr><td><strong>viewConfiguration()</strong></td><td>Output the viewing HTML for an output connection <em>ConfigParams</em> object</td></tr>
+            <tr><td><strong>outputSpecificationHeader()</strong></td><td>Output the head-section part of a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>outputSpecificationBody()</strong></td><td>Output the body-section part of a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>processSpecificationPost()</strong></td><td>Receive and process form data from a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>viewSpecification()</strong></td><td>Output the viewing page for a <em>Specification</em> object</td></tr>
+          </table>
+          <p></p>
+          <p>These methods come in three broad classes: (a) functional methods for doing the work of the connector; (b) UI methods for configuring a connection; and (c) UI methods for editing the output specification for a job.  Together they do the heavy lifting of your connector.  But before you can write any code at all, you need to plan things out a bit.</p>
+          <p></p>
+        </section>
+        <section>
+          <title>Choosing the form of the transformation version string</title>
+          <p></p>
+          <p>The output version string is used by ManifoldCF to determine whether or not the output specification or configuration changed in such a way as to require that the document be reprocessed.  ManifoldCF therefore requests the output version string for any document that is ready for processing, and usually does not process the document again if the returned output version string agrees with the output version string it has stored.</p>
+          <p></p>
+          <p>Thinking about it more carefully, it is clear that what an output connector writer needs to do is include everything in the output version string that could potentially affect how the document gets ingested, save that which is specific to the repository connector.  That may include bits of output connector configuration information, as well as data from the output specification.  When it's time to ingest, it's usually the correct thing to do to obtain the necessary data for ingestion out of the output version string, rather than calculating it or fetching it anew, because that guarantees that the document processing was done in a manner that agrees with its recorded output version string, thus eliminating any chance of ManifoldCF getting confused.</p>
+          <p></p>
+        </section>
+        <section>
+          <title>Notes on connector UI methods</title>
+          <p></p>
+          <p>The crawler UI uses a tabbed layout structure, and thus each of the UI methods must properly implement the tabbed model.  This means that the "header" methods above must add the desired tab names to a specified array, and the "body" methods must provide appropriate HTML which handles both the case where a tab is displayed, and where it is not displayed.  Also, it makes sense to use the appropriate css definitions, so that the connector UI pages have a similar look-and-feel to the rest of ManifoldCF's crawler ui.  We strongly suggest starting with one of the supplied connector's UI code, both for a description of the arguments to each method, and for some decent ideas of ways to organize your connector's UI code.</p>
+          <p></p>
+          <p>Please also note that it is good practice to name the form fields in your HTML in such a way that they cannot collide with form fields that may come from the framework's HTML or any specific repository connector's HTML.  The <em>OutputSpecification</em> HTML especially may be prone to collisions, because within any given job, this HTML is included in the same page as HTML from the chosen repository connector.</p>
+          <p></p>
+          <p></p>
+        </section>
+      </section>
+      <section>
+        <title>Implementation support provided by the framework</title>
+        <p></p>
+        <p>ManifoldCF's framework provides a number of helpful services designed to make the creation of a connector easier.  These services are summarized below.  (This is not an exhaustive list, by any means.)</p>
+        <p></p>
+        <ul>
+          <li>Lock management and synchronization (see <em>org.apache.manifoldcf.core.interfaces.LockManagerFactory</em>)</li>
+          <li>Cache management (see <em>org.apache.manifoldcf.core.interfaces.CacheManagerFactory</em>)</li>
+          <li>Local keystore management (see <em>org.apache.manifoldcf.core.KeystoreManagerFactory</em>)</li>
+          <li>Database management (see <em>org.apache.manifoldcf.core.DBInterfaceFactory</em>)</li>
+        </ul>
+        <p></p>
+        <p>For UI method support, these too are very useful:</p>
+        <p></p>
+        <ul>
+          <li>Multipart form processing (see <em>org.apache.manifoldcf.ui.multipart.MultipartWrapper</em>)</li>
+          <li>HTML encoding (see <em>org.apache.manifoldcf.ui.util.Encoder</em>)</li>
+          <li>HTML formatting (see <em>org.apache.manifoldcf.ui.util.Formatter</em>)</li>
+        </ul>
+        <p></p>
+      </section>
+      <section>
+        <title>DO's and DON'T DO's</title>
+        <p></p>
+        <p>It's always a good idea to make use of an existing infrastructure component, if it's meant for that purpose, rather than inventing your own.  There are, however, some limitations we recommend you adhere to.</p>
+        <p></p>
+        <ul>
+          <li>DO make use of infrastructure components described in the section above</li>
+          <li>DON'T make use of infrastructure components that aren't mentioned, without checking first</li>
+          <li>NEVER write connector code that directly uses framework database tables, other than the ones installed and managed by your connector</li>
+        </ul>
+        <p></p>
+        <p>If you are tempted to violate these rules, it may well mean you don't understand something important.  At the very least, we'd like to know why.  Send email to dev@manifoldcf.apache.org with a description of your problem and how you are tempted to solve it.</p>
+      </section>
+    </section>
+  </body>
+</document>
\ No newline at end of file

Propchange: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-transformation-connectors.xml
------------------------------------------------------------------------------
    svn:eol-style = native

Propchange: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-transformation-connectors.xml
------------------------------------------------------------------------------
    svn:keywords = Id

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/technical-resources.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/technical-resources.xml?rev=1601605&r1=1601604&r2=1601605&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/technical-resources.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/technical-resources.xml Tue Jun 10 11:29:42 2014
@@ -49,6 +49,7 @@
 	</p> 
 	<ul>
 	    <li><a href="writing-output-connectors.html">出力コネクションの作成</a></li>
+	    <li><a href="writing-transformation-connectors.html">How to write a transformation connector</a></li>
 	    <li><a href="writing-mapping-connectors.html">How to write a user mapping connector</a></li>
 	    <li><a href="writing-authority-connectors.html">権限コネクションの作成</a></li>
 	    <li><a href="writing-repository-connectors.html">リポジトリコネクションの作成</a></li>

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-output-connectors.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-output-connectors.xml?rev=1601605&r1=1601604&r2=1601605&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-output-connectors.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-output-connectors.xml Tue Jun 10 11:29:42 2014
@@ -72,18 +72,21 @@
           <p></p>
           <table>
             <tr><th>Method</th><th>What it should do</th></tr>
+            <tr><td><strong>checkMimetypeIndexable()</strong></td><td>Decide whether a document with a given mime type is indexable or not</td></tr>
             <tr><td><strong>checkDocumentIndexable()</strong></td><td>Decide whether a file is indexable or not</td></tr>
-            <tr><td><strong>getOutputDescription()</strong></td><td>Use the supplied output specification to come up with an output version string</td></tr>
+            <tr><td><strong>checkLengthIndexable()</strong></td><td>Decide whether a document of a given length is indexable or not</td></tr>
+            <tr><td><strong>checkURLIndexable()</strong></td><td>Decide whether a document with a given URL is indexable or not</td></tr>
+            <tr><td><strong>getPipelineDescription()</strong></td><td>Use the supplied output specification to come up with an output version string</td></tr>
             <tr><td><strong>addOrReplaceDocument()</strong></td><td>Add or replace the specified document within the target repository, or signal if the document cannot be handled</td></tr>
             <tr><td><strong>removeDocument()</strong></td><td>Remove the specified document from the target repository</td></tr>
             <tr><td><strong>outputConfigurationHeader()</strong></td><td>Output the head-section part of an output connection <em>ConfigParams</em> editing page</td></tr>
             <tr><td><strong>outputConfigurationBody()</strong></td><td>Output the body-section part of an output connection <em>ConfigParams</em> editing page</td></tr>
             <tr><td><strong>processConfigurationPost()</strong></td><td>Receive and process form data from an output connection <em>ConfigParams</em> editing page</td></tr>
             <tr><td><strong>viewConfiguration()</strong></td><td>Output the viewing HTML for an output connection <em>ConfigParams</em> object</td></tr>
-            <tr><td><strong>outputSpecificationHeader()</strong></td><td>Output the head-section part of an <em>OutputSpecification</em> editing page</td></tr>
-            <tr><td><strong>outputSpecificationBody()</strong></td><td>Output the body-section part of an <em>OutputSpecification</em> editing page</td></tr>
-            <tr><td><strong>processSpecificationPost()</strong></td><td>Receive and process form data from an <em>OutputSpecification</em> editing page</td></tr>
-            <tr><td><strong>viewSpecification()</strong></td><td>Output the viewing page for an <em>OutputSpecification</em> object</td></tr>
+            <tr><td><strong>outputSpecificationHeader()</strong></td><td>Output the head-section part of a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>outputSpecificationBody()</strong></td><td>Output the body-section part of a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>processSpecificationPost()</strong></td><td>Receive and process form data from a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>viewSpecification()</strong></td><td>Output the viewing page for a <em>Specification</em> object</td></tr>
           </table>
           <p></p>
           <p>These methods come in three broad classes: (a) functional methods for doing the work of the connector; (b) UI methods for configuring a connection; and (c) UI methods for editing the output specification for a job.  Together they do the heavy lifting of your connector.  But before you can write any code at all, you need to plan things out a bit.</p>

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-repository-connectors.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-repository-connectors.xml?rev=1601605&r1=1601604&r2=1601605&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-repository-connectors.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-repository-connectors.xml Tue Jun 10 11:29:42 2014
@@ -94,10 +94,10 @@
             <tr><td><strong>outputConfigurationBody()</strong></td><td>Output the body-section part of a repository connection <em>ConfigParams</em> editing page</td></tr>
             <tr><td><strong>processConfigurationPost()</strong></td><td>Receive and process form data from a repository connection <em>ConfigParams</em> editing page</td></tr>
             <tr><td><strong>viewConfiguration()</strong></td><td>Output the viewing HTML for a repository connection <em>ConfigParams</em> object</td></tr>
-            <tr><td><strong>outputSpecificationHeader()</strong></td><td>Output the head-section part of an <em>DocumentSpecification</em> editing page</td></tr>
-            <tr><td><strong>outputSpecificationBody()</strong></td><td>Output the body-section part of an <em>DocumentSpecification</em> editing page</td></tr>
-            <tr><td><strong>processSpecificationPost()</strong></td><td>Receive and process form data from an <em>DocumentSpecification</em> editing page</td></tr>
-            <tr><td><strong>viewSpecification()</strong></td><td>Output the viewing page for an <em>DocumentSpecification</em> object</td></tr>
+            <tr><td><strong>outputSpecificationHeader()</strong></td><td>Output the head-section part of a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>outputSpecificationBody()</strong></td><td>Output the body-section part of a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>processSpecificationPost()</strong></td><td>Receive and process form data from a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>viewSpecification()</strong></td><td>Output the viewing page for a <em>Specification</em> object</td></tr>
           </table>
           <p></p>
           <p>These methods come in three broad classes: (a) functional methods for doing the work of the connector; (b) UI methods for configuring a connection; and (c) UI methods for editing the document specification for a job.  Together they do the heavy lifting of your connector.  But before you can write any code at all, you need to plan things out a bit.</p>

Added: manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-transformation-connectors.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-transformation-connectors.xml?rev=1601605&view=auto
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-transformation-connectors.xml (added)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-transformation-connectors.xml Tue Jun 10 11:29:42 2014
@@ -0,0 +1,160 @@
+<?xml version="1.0"?>
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" 
+          "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<document> 
+
+  <header> 
+    <title>Writing transformation connectors</title> 
+  </header> 
+
+  <body> 
+    <section>
+      <title>Writing a Transformation Connector</title>
+      <p></p>
+      <p>A transformation connector provides a mechanism by which content that has been fetched from a repository can be modified before it gets handed to a back-end repository for processing.</p>
+      <p></p>
+      <p>As is the case with all connectors under the ManifoldCF umbrella, a transformation connector consists of a single part, which is:</p>
+      <p></p>
+      <ul>
+        <li>A class implementing an interface (in this case, <em>org.apache.manifoldcf.agents.interfaces.ITransformationConnector</em>)</li>
+      </ul>
+      <p></p>
+      <section>
+        <title>Key concepts</title>
+        <p></p>
+        <p>The transformation connector abstraction makes use of, or introduces, the following concepts:</p>
+        <p></p>
+        <table>
+          <tr><th>Concept</th><th>What it is</th></tr>
+          <tr><td>Configuration parameters</td><td>A hierarchical structure, internally represented as an XML document, which describes a specific configuration of a specific
+            transformation connector, i.e. <strong>how</strong> the connector should do its job; see <em>org.apache.manifoldcf.core.interfaces.ConfigParams</em></td></tr>
+          <tr><td>Transformation connection</td><td>An output connector instance that has been furnished with configuration data</td></tr>
+          <tr><td>Repository document</td><td>An object that describes a document's contents, including raw document data (as a stream), metadata (as either strings or
+            streams), and access tokens; see <em>org.apache.manifoldcf.agents.interfaces.RepositoryDocument</em></td></tr>
+          <tr><td>Connection management/threading/pooling model</td><td>How an individual output connector class instance is managed and used</td></tr>
+          <tr><td>Activity infrastructure</td><td>The framework API provided to specific methods allowing those methods to perform specific actions within the framework,
+            e.g. recording activities; see <em>org.apache.manifoldcf.agents.interfaces.IOutputAddActivity</em> and <em>org.apache.manifoldcf.agents.interfaces.IOutputRemoveActivity</em></td></tr>
+          <tr><td>Output specification</td><td>A hierarchical structure, internally represented as an XML document, which describes <strong>what</strong> a specific
+            transformation connector should do in the context of a specific job; see <em>org.apache.manifoldcf.agents.interfaces.OutputSpecification</em></td></tr>
+          <tr><td>Transformation version string</td><td>A simple string, used for comparison purposes, that allows ManifoldCF to figure out if an ingestion operation needs
+            to be repeated as a result of changes to the output specification in effect for a document</td></tr>
+          <tr><td>Service interruption</td><td>A specific kind of exception that signals ManifoldCF that the transformation engine is unavailable, and gives a best estimate
+            of when it might become available again; see <em>org.apache.manifoldcf.agents.interfaces.ServiceInterruption</em></td></tr>
+        </table>
+        <p></p>
+        <p></p>
+      </section>
+      <section>
+        <title>Implementing the Transformation Connector class</title>
+        <p></p>
+        <p>A very good place to start is to read the javadoc for the transformation connector interface.  You will note that the javadoc describes the usage and pooling model for a
+          connector class pretty thoroughly.  It is very important to understand the model thoroughly in order to write reliable connectors!  Use of static variables, for one thing, must
+          be done in a very careful way, to avoid issues that would be hard to detect with a cursory test.</p>
+        <p></p>
+        <p>The second thing to do is to examine the provided transformation connector implementations.  The only currently enabled connector is the null transformation connector,
+          which demonstrate (to some degree) the sorts of techniques you will need for an effective implementation.  You will also note that this connector extends a framework-provided
+          transformation connector base class, found at <em>org.apache.manifoldcf.agents.transformation.BaseTransformationConnector</em>.  This base class furnishes some basic
+          bookkeeping logic for managing the connector pool, as well as default implementations of some of the less typical functionality a connector may have.  For example, connectors
+          are allowed to have database tables of their own, which are instantiated when the connector is registered, and are torn down when the connector is removed.  This is, however,
+          not very typical, and the base implementation reflects that.</p>
+        <p></p>
+        <section>
+          <title>Principle methods</title>
+          <p></p>
+          <p>The principle methods an implementer should be concerned with for creating an output connector are the following:</p>
+          <p></p>
+          <table>
+            <tr><th>Method</th><th>What it should do</th></tr>
+            <tr><td><strong>checkMimetypeIndexable()</strong></td><td>Decide whether a document with a given mime type is indexable or not</td></tr>
+            <tr><td><strong>checkDocumentIndexable()</strong></td><td>Decide whether a file is indexable or not</td></tr>
+            <tr><td><strong>checkLengthIndexable()</strong></td><td>Decide whether a document of a given length is indexable or not</td></tr>
+            <tr><td><strong>checkURLIndexable()</strong></td><td>Decide whether a document with a given URL is indexable or not</td></tr>
+            <tr><td><strong>getPipelineDescription()</strong></td><td>Use the supplied output specification to come up with an output version string</td></tr>
+            <tr><td><strong>addOrReplaceDocument()</strong></td><td>Add or replace the specified document within the target repository, or signal if the document cannot be handled</td></tr>
+            <tr><td><strong>outputConfigurationHeader()</strong></td><td>Output the head-section part of an output connection <em>ConfigParams</em> editing page</td></tr>
+            <tr><td><strong>outputConfigurationBody()</strong></td><td>Output the body-section part of an output connection <em>ConfigParams</em> editing page</td></tr>
+            <tr><td><strong>processConfigurationPost()</strong></td><td>Receive and process form data from an output connection <em>ConfigParams</em> editing page</td></tr>
+            <tr><td><strong>viewConfiguration()</strong></td><td>Output the viewing HTML for an output connection <em>ConfigParams</em> object</td></tr>
+            <tr><td><strong>outputSpecificationHeader()</strong></td><td>Output the head-section part of a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>outputSpecificationBody()</strong></td><td>Output the body-section part of a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>processSpecificationPost()</strong></td><td>Receive and process form data from a <em>Specification</em> editing page</td></tr>
+            <tr><td><strong>viewSpecification()</strong></td><td>Output the viewing page for a <em>Specification</em> object</td></tr>
+          </table>
+          <p></p>
+          <p>These methods come in three broad classes: (a) functional methods for doing the work of the connector; (b) UI methods for configuring a connection; and (c) UI methods for editing the output specification for a job.  Together they do the heavy lifting of your connector.  But before you can write any code at all, you need to plan things out a bit.</p>
+          <p></p>
+        </section>
+        <section>
+          <title>Choosing the form of the transformation version string</title>
+          <p></p>
+          <p>The output version string is used by ManifoldCF to determine whether or not the output specification or configuration changed in such a way as to require that the document be reprocessed.  ManifoldCF therefore requests the output version string for any document that is ready for processing, and usually does not process the document again if the returned output version string agrees with the output version string it has stored.</p>
+          <p></p>
+          <p>Thinking about it more carefully, it is clear that what an output connector writer needs to do is include everything in the output version string that could potentially affect how the document gets ingested, save that which is specific to the repository connector.  That may include bits of output connector configuration information, as well as data from the output specification.  When it's time to ingest, it's usually the correct thing to do to obtain the necessary data for ingestion out of the output version string, rather than calculating it or fetching it anew, because that guarantees that the document processing was done in a manner that agrees with its recorded output version string, thus eliminating any chance of ManifoldCF getting confused.</p>
+          <p></p>
+        </section>
+        <section>
+          <title>Notes on connector UI methods</title>
+          <p></p>
+          <p>The crawler UI uses a tabbed layout structure, and thus each of the UI methods must properly implement the tabbed model.  This means that the "header" methods above must add the desired tab names to a specified array, and the "body" methods must provide appropriate HTML which handles both the case where a tab is displayed, and where it is not displayed.  Also, it makes sense to use the appropriate css definitions, so that the connector UI pages have a similar look-and-feel to the rest of ManifoldCF's crawler ui.  We strongly suggest starting with one of the supplied connector's UI code, both for a description of the arguments to each method, and for some decent ideas of ways to organize your connector's UI code.</p>
+          <p></p>
+          <p>Please also note that it is good practice to name the form fields in your HTML in such a way that they cannot collide with form fields that may come from the framework's HTML or any specific repository connector's HTML.  The <em>OutputSpecification</em> HTML especially may be prone to collisions, because within any given job, this HTML is included in the same page as HTML from the chosen repository connector.</p>
+          <p></p>
+          <p></p>
+        </section>
+      </section>
+      <section>
+        <title>Implementation support provided by the framework</title>
+        <p></p>
+        <p>ManifoldCF's framework provides a number of helpful services designed to make the creation of a connector easier.  These services are summarized below.  (This is not an exhaustive list, by any means.)</p>
+        <p></p>
+        <ul>
+          <li>Lock management and synchronization (see <em>org.apache.manifoldcf.core.interfaces.LockManagerFactory</em>)</li>
+          <li>Cache management (see <em>org.apache.manifoldcf.core.interfaces.CacheManagerFactory</em>)</li>
+          <li>Local keystore management (see <em>org.apache.manifoldcf.core.KeystoreManagerFactory</em>)</li>
+          <li>Database management (see <em>org.apache.manifoldcf.core.DBInterfaceFactory</em>)</li>
+        </ul>
+        <p></p>
+        <p>For UI method support, these too are very useful:</p>
+        <p></p>
+        <ul>
+          <li>Multipart form processing (see <em>org.apache.manifoldcf.ui.multipart.MultipartWrapper</em>)</li>
+          <li>HTML encoding (see <em>org.apache.manifoldcf.ui.util.Encoder</em>)</li>
+          <li>HTML formatting (see <em>org.apache.manifoldcf.ui.util.Formatter</em>)</li>
+        </ul>
+        <p></p>
+      </section>
+      <section>
+        <title>DO's and DON'T DO's</title>
+        <p></p>
+        <p>It's always a good idea to make use of an existing infrastructure component, if it's meant for that purpose, rather than inventing your own.  There are, however, some limitations we recommend you adhere to.</p>
+        <p></p>
+        <ul>
+          <li>DO make use of infrastructure components described in the section above</li>
+          <li>DON'T make use of infrastructure components that aren't mentioned, without checking first</li>
+          <li>NEVER write connector code that directly uses framework database tables, other than the ones installed and managed by your connector</li>
+        </ul>
+        <p></p>
+        <p>If you are tempted to violate these rules, it may well mean you don't understand something important.  At the very least, we'd like to know why.  Send email to dev@manifoldcf.apache.org with a description of your problem and how you are tempted to solve it.</p>
+      </section>
+    </section>
+  </body>
+</document>
\ No newline at end of file

Propchange: manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-transformation-connectors.xml
------------------------------------------------------------------------------
    svn:eol-style = native

Propchange: manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-transformation-connectors.xml
------------------------------------------------------------------------------
    svn:keywords = Id