Posted to commits@manifoldcf.apache.org by kw...@apache.org on 2013/03/16 22:03:09 UTC

svn commit: r1457305 [3/3] - in /manifoldcf/trunk: ./ connectors/filesystem/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/filesystem/ framework/crawler-ui/src/main/webapp/ framework/pull-agent/src/main/java/org/apache/manifoldcf/craw...

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/programmatic-operation.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/programmatic-operation.xml?rev=1457305&r1=1457304&r2=1457305&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/programmatic-operation.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/programmatic-operation.xml Sat Mar 16 21:03:08 2013
@@ -30,7 +30,9 @@
     <section>
       <title>Programmatic Operation</title>
       <p></p>
-      <p>A certain subset of ManifoldCF users want to think of ManifoldCF as an engine that they can poke from whatever other system they are developing.  While ManifoldCF is not precisely a document indexing engine per se, it can certainly be controlled programmatically.  Right now, there are three principal ways of achieving this control.</p>
+      <p>A certain subset of ManifoldCF users want to think of ManifoldCF as an engine that they can poke from whatever other system they are developing.  While
+        ManifoldCF is not precisely a document indexing engine per se, it can certainly be controlled programmatically.  Right now, there are three principal ways of
+        achieving this control.</p>
       <p></p>
       <section>
         <title>Control by Servlet API</title>
@@ -46,17 +48,17 @@
           <p>http[s]://<em>&lt;server_and_port&gt;</em>/mcf-api-service/json/<em>&lt;resource&gt;</em></p>
           <p></p>
           <p>The servlet ignores request data, except when the PUT or POST verb is used.  In that case, the request data is presumed to be a JSON object.  The servlet
-            responds either with an error response code (either 400 or 500) with an appropriate explanatory message, or with a 200 (OK), 201 (CREATED), or 404 (NOT FOUND)
-            response code along with a response JSON object.</p>
+            responds either with an error response code (either 400 or 500) with an appropriate explanatory message, or with a 200 (OK), 201 (CREATED), or
+            404 (NOT FOUND) response code along with a response JSON object.</p>
           <p></p>
         </section>
         <section>
           <title>JSON equivalents for ManifoldCF</title>
           <p></p>
-          <p>ManifoldCF treats certain JSON forms as equivalent, for the purposes of readability.  For example, the array form <strong>"foo" : [ { ... } ]</strong> is treated equivalently to
-            <strong>"foo" : { }</strong>, whenever there is only one array element.  This gives a coder some flexibility as to how s/he encodes JSON in requests.  Please also be aware that
-            similar compressions will occur in the JSON responses from the API servlet, and your code must be able to deal with this possibility.  The following table
-            describes some of the equivalences:</p>
+          <p>ManifoldCF treats certain JSON forms as equivalent, for the purposes of readability.  For example, the array form <strong>"foo" : [ { ... } ]</strong> is
+            treated equivalently to <strong>"foo" : { }</strong>, whenever there is only one array element.  This gives a coder some flexibility as to how s/he encodes
+            JSON in requests.  Please also be aware that similar compressions will occur in the JSON responses from the API servlet, and your code must be able to deal
+            with this possibility.  The following table describes some of the equivalences:</p>
           <p></p>
           <p></p>
           <p></p>
@@ -107,8 +109,10 @@
             <tr><td>jobstatuses/<em>&lt;job_id&gt;</em></td><td>GET</td><td>Get a specific job's status</td><td>N/A</td><td>{"jobstatus":<em>&lt;job_status_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} </td></tr>
             <tr><td>jobstatusesnocounts/<em>&lt;job_id&gt;</em></td><td>GET</td><td>Get a specific job's status, returning '0' for all counts</td><td>N/A</td><td>{"jobstatus":<em>&lt;job_status_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} </td></tr>
             <tr><td>start/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Start a specified job manually</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>startminimal/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Start a specified job manually, minimal run requested</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
             <tr><td>abort/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Abort a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
             <tr><td>restart/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Stop and start a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>restartminimal/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Stop and start a specified job, minimal run requested</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
             <tr><td>pause/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Pause a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
             <tr><td>resume/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Resume a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
 
@@ -343,6 +347,7 @@
             <tr><td>"year"</td><td>The optional year enumeration object</td></tr>
             <tr><td>"hourofday"</td><td>The optional hour-of-the-day enumeration object</td></tr>
             <tr><td>"minutesofhour"</td><td>The optional minutes-of-the-hour enumeration object</td></tr>
+            <tr><td>"requestminimum"</td><td>Optional flag indicating whether the job run will be minimal or not ("true" means minimal)</td></tr>
           </table>
           <p></p>
           <p>Each enumeration object describes an array of integers using the form:</p>
@@ -434,16 +439,21 @@
       <section>
         <title>Control by direct code</title>
         <p></p>
-        <p>Control by direct java code is quite a reasonable thing to do.  The sources of the above commands should give a pretty clear idea how to proceed, if that's what you want to do.</p>
+        <p>Control by direct java code is quite a reasonable thing to do.  The sources of the above commands should give a pretty clear idea how to proceed, if that's what you
+          want to do.</p>
         <p></p>
         <p></p>
       </section>
       <section>
         <title>Caveats</title>
         <p></p>
-        <p>The above commands know nothing about the differences between connection types.  Instead, they deal with configuration and specification information in the form of XML documents.  Normally, these XML documents are hidden from a system integrator, unless they happen to look into the database with a tool such as psql.  But the API commands above often will require such XML documents to be included as part of the command execution.</p>
+        <p>The above commands know nothing about the differences between connection types.  Instead, they deal with configuration and specification information in the
+          form of XML documents.  Normally, these XML documents are hidden from a system integrator, unless they happen to look into the database with a tool such as
+          psql.  But the API commands above often will require such XML documents to be included as part of the command execution.</p>
         <p></p>
-        <p>This has one major consequence.  Any application that would manipulate connections and jobs directly cannot be connection-type independent - these applications must know the proper form of XML to submit to the command.  So, it is not possible to use these command APIs to write one's own UI wrapper, without sacrificing some of the repository independence that ManifoldCF by itself maintains.</p>
+        <p>This has one major consequence.  Any application that would manipulate connections and jobs directly cannot be connection-type independent - these
+          applications must know the proper form of XML to submit to the command.  So, it is not possible to use these command APIs to write one's own UI wrapper,
+          without sacrificing some of the repository independence that ManifoldCF by itself maintains.</p>
       </section>
     </section>
   </body>
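
The programmatic-operation.xml text above describes two things a client of the API servlet has to get right: the resource URL form (http[s]://<server_and_port>/mcf-api-service/json/<resource>) and the fact that a one-element array may come back compressed to a bare object. What follows is a minimal client-side sketch of both, not code from the ManifoldCF tree. The helper names (resource_url, as_list, error_text), the host name, and the job id are hypothetical, and treating an absent key as an empty list is an assumption rather than something the documentation states.

```python
import json
from urllib.parse import quote


def resource_url(base, resource, *path_args):
    """Build <base>/mcf-api-service/json/<resource>[/<arg>...]."""
    parts = [base.rstrip("/"), "mcf-api-service", "json", resource]
    # Percent-encode each trailing path argument (for example a job id).
    parts.extend(quote(str(arg), safe="") for arg in path_args)
    return "/".join(parts)


def as_list(value):
    """Undo the single-element compression described in the docs.

    A list is returned unchanged and a bare object becomes a one-element
    list.  Assumption: a missing key (None) means "no elements".
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def error_text(response_body):
    """Return the "error" text of a response JSON object, or None."""
    return json.loads(response_body).get("error")
```

A caller wanting the new minimal run would issue a PUT to resource_url("https://mcf.example.org", "startminimal", job_id) and pass the response body to error_text(); any repeated element in a response should be read through as_list() so that both JSON forms are handled the same way.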

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-repository-connectors.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-repository-connectors.xml?rev=1457305&r1=1457304&r2=1457305&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-repository-connectors.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/writing-repository-connectors.xml Sat Mar 16 21:03:08 2013
@@ -73,8 +73,9 @@
           <li>LiveLink (demonstrates use of local keystore infrastructure)</li>
           <li>Meridio (local keystore, web services, result sets)</li>
           <li>SharePoint (local keystore, web services)</li>
-          <li>RSS (local keystore, binning)</li>
-          <li>Web (local database schema, local keystore, binning, events and prerequisites, cache management)</li>
+          <li>RSS (local keystore, binning, fuzzy xml parsing)</li>
+          <li>Web (local database schema, local keystore, binning, events and prerequisites, cache management, fuzzy xml parsing)</li>
+          <li>Wiki (binning, rigorous xml parsing)</li>
         </ul>
         <p></p>
         <p>You will also note that all of these connectors extend a framework-provided repository connector base class, found at <em>org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector</em>.  This base class furnishes some basic bookkeeping logic for managing the connector pool, as well as default implementations of some of the less typical functionality a connector may have.  For example, connectors are allowed to have database tables of their own, which are instantiated when the connector is registered, and are torn down when the connector is removed.  This is, however, not very typical, and the base implementation reflects that.</p>
@@ -109,15 +110,24 @@
           <p></p>
           <table>
             <tr><th>Model</th><th>Description</th></tr>
+            <tr><td><em>MODEL_ALL</em></td><td>The <strong>addSeedDocuments()</strong> method supplies all specified documents on each call</td></tr>
+            <tr><td><em>MODEL_PARTIAL</em></td><td>The <strong>addSeedDocuments()</strong> method does not return a complete list of documents that match the criteria and time interval, because some of those documents are no longer discoverable</td></tr>
             <tr><td><em>MODEL_ADD</em></td><td>The <strong>addSeedDocuments()</strong> method supplies at least all the matching documents that have been added to the repository, within the specified time interval</td></tr>
             <tr><td><em>MODEL_ADD_CHANGE</em></td><td>The <strong>addSeedDocuments()</strong> method supplies at least those matching documents that have been added or changed in the repository, within the specified time interval</td></tr>
             <tr><td><em>MODEL_ADD_CHANGE_DELETE</em></td><td>The <strong>addSeedDocuments()</strong> method supplies at least those matching documents that have been added, changed, or removed in the repository, within the specified time interval</td></tr>
-            <tr><td><em>MODEL_PARTIAL</em></td><td>The <strong>addSeedDocuments()</strong> method does not return a complete list of documents that match the criteria and time interval, because some of those documents are no longer discoverable</td></tr>
+            <tr><td><em>MODEL_CHAINED_ADD</em></td><td>The <strong>addSeedDocuments()</strong> method, plus documents reachable by discovery from seeds, supplies at least all the matching documents that have been added to the repository, within the specified time interval</td></tr>
+            <tr><td><em>MODEL_CHAINED_ADD_CHANGE</em></td><td>The <strong>addSeedDocuments()</strong> method, plus documents reachable by discovery from seeds, supplies at least those matching documents that have been added or changed in the repository, within the specified time interval</td></tr>
+            <tr><td><em>MODEL_CHAINED_ADD_CHANGE_DELETE</em></td><td>The <strong>addSeedDocuments()</strong> method, plus documents reachable by discovery from seeds, supplies at least those matching documents that have been added, changed, or removed in the repository, within the specified time interval</td></tr>
           </table>
           <p></p>
-          <p>Note that the choice of model is actually much more subtle than the above description might indicate.  It may, for one thing, be affected by characteristics of the repository, such as whether the repository considers a document to have been changed if its security information was changed.  This would mean that, even though most document changes are picked up and thus one might be tempted to declare the connector to be <em>MODEL_ADD_CHANGE</em>, the correct choice would in fact be <em>MODEL_ADD</em>.</p>
-          <p></p>
-          <p>Another subtle point is what documents the connector is actually supposed to return by means of the <strong>addSeedDocuments()</strong> method.  The start time and end time parameters handed to the method do not have to be strictly adhered to, for instance; it is always okay to return more documents.  It is never okay for the connector to return fewer documents than were requested, on the other hand.</p>
+          <p>Note that the choice of model is actually much more subtle than the above description might indicate.  It may, for one thing, be affected by characteristics of
+            the repository, such as whether the repository considers a document to have been changed if its security information was changed.  This would mean that,
+            even though most document changes are picked up and thus one might be tempted to declare the connector to be <em>MODEL_ADD_CHANGE</em>, the
+            correct choice would in fact be <em>MODEL_ADD</em>.</p>
+          <p></p>
+          <p>Another subtle point is what documents the connector is actually supposed to return by means of the <strong>addSeedDocuments()</strong> method.  The
+            start time and end time parameters handed to the method do not have to be strictly adhered to, for instance; it is always okay to return more documents.  It is never
+            okay for the connector to return fewer documents than were requested, on the other hand.</p>
           <p></p>
         </section>
         <section>
@@ -139,19 +149,34 @@
         <section>
           <title>Choosing the form of the document version string</title>
           <p></p>
-          <p>The document version string is used by ManifoldCF to determine whether or not the document or configuration changed in such a way as to require that the document be reprocessed.  ManifoldCF therefore requests the version string for any document that is ready for processing, and usually does not process the document again if the returned version string agrees with the version string it has stored.</p>
-          <p></p>
-          <p>Thinking about it more carefully, it is clear that what a connector writer needs to do is include everything in the version string that could potentially affect how the document gets processed.  That may include the version of the document in the repository, bits of configuration information, metadata, and even access tokens (if the underlying repository versions these things independently from the document itself).  Storing all of that information in the version string seems like a lot - but the string is unlimited in length, and it actually serves another useful purpose to do it that way.  Specifically, when it comes time to do the actual processing, it's often the correct thing to do to obtain the necessary data out of the version string, rather than calculating it or fetching it anew.  That way of working guarantees that the document processing was done in a manner that agrees with its recorded version string, thus eliminating any chance of ManifoldCF getting confused.</p>
-          <p></p>
-          <p>For longer data that needs to persist between the <strong>getDocumentVersions()</strong> method call and the <strong>processDocuments()</strong> method call, the connector is welcome to save this information in a temporary disk file.  To help make sure nothing leaks when this approach is used, the IRepositoryConnector interface has a method that will be called to clean up any temporary files that might have been created in the handling of a given document identifier.</p>
+          <p>The document version string is used by ManifoldCF to determine whether or not the document or configuration changed in such a way as to require that the document
+            be reprocessed.  ManifoldCF therefore requests the version string for any document that is ready for processing, and usually does not process the document again if the
+            returned version string agrees with the version string it has stored.</p>
+          <p></p>
+          <p>Thinking about it more carefully, it is clear that what a connector writer needs to do is include everything in the version string that could potentially affect how the
+            document gets processed.  That may include the version of the document in the repository, bits of configuration information, metadata, and even access tokens (if the
+            underlying repository versions these things independently from the document itself).  Storing all of that information in the version string seems like a lot - but the string
+            is unlimited in length, and it actually serves another useful purpose to do it that way.  Specifically, when it comes time to do the actual processing, it's often the correct
+            thing to do to obtain the necessary data out of the version string, rather than calculating it or fetching it anew.  That way of working guarantees that the document
+            processing was done in a manner that agrees with its recorded version string, thus eliminating any chance of ManifoldCF getting confused.</p>
+          <p></p>
+          <p>For longer data that needs to persist between the <strong>getDocumentVersions()</strong> method call and the <strong>processDocuments()</strong> method
+            call, the connector is welcome to save this information in a temporary disk file.  To help make sure nothing leaks when this approach is used, the IRepositoryConnector
+            interface has a method that will be called to clean up any temporary files that might have been created in the handling of a given document identifier.</p>
           <p></p>
         </section>
         <section>
           <title>Notes on connector UI methods</title>
           <p></p>
-          <p>The crawler UI uses a tabbed layout structure, and thus each of these elements must properly implement the tabbed model.  This means that the "header" methods above must add the desired tab names to a specified array, and the "body" methods must provide appropriate HTML which handles both the case where a tab is displayed, and where it is not displayed.  Also, it makes sense to use the appropriate css definitions, so that the connector UI pages have a similar look-and-feel to the rest of ManifoldCF's crawler ui.  We strongly suggest starting with one of the supplied connectors' UI code, both for a description of the arguments to each page, and for some decent ideas of ways to organize your connector's UI code.  </p>
-          <p></p>
-          <p>Please also note that it is good practice to name the form fields in your HTML in such a way that they cannot collide with form fields that may come from the framework's HTML or any specific output connector's HTML.  The <em>DocumentSpecification</em> editing HTML especially may be prone to collisions, because within any given job, this HTML is included in the same page as HTML from the chosen output connector.</p>
+          <p>The crawler UI uses a tabbed layout structure, and thus each of these elements must properly implement the tabbed model.  This means that the "header" methods
+            above must add the desired tab names to a specified array, and the "body" methods must provide appropriate HTML which handles both the case where a tab is
+            displayed, and where it is not displayed.  Also, it makes sense to use the appropriate css definitions, so that the connector UI pages have a similar look-and-feel to the
+            rest of ManifoldCF's crawler ui.  We strongly suggest starting with one of the supplied connectors' UI code, both for a description of the arguments to each page, and
+            for some decent ideas of ways to organize your connector's UI code.  </p>
+          <p></p>
+          <p>Please also note that it is good practice to name the form fields in your HTML in such a way that they cannot collide with form fields that may come from the
+            framework's HTML or any specific output connector's HTML.  The <em>DocumentSpecification</em> editing HTML especially may be prone to collisions, because
+            within any given job, this HTML is included in the same page as HTML from the chosen output connector.</p>
           <p></p>
           <p></p>
         </section>
@@ -159,7 +184,8 @@
       <section>
         <title>Implementation support provided by the framework</title>
         <p></p>
-        <p>ManifoldCF's framework provides a number of helpful services designed to make the creation of a connector easier.  These services are summarized below.  (This is not an exhaustive list, by any means.)</p>
+        <p>ManifoldCF's framework provides a number of helpful services designed to make the creation of a connector easier.  These services are summarized below.
+          (This is not an exhaustive list, by any means.)</p>
         <p></p>
         <ul>
           <li>Lock management and synchronization (see <em>org.apache.manifoldcf.core.interfaces.LockManagerFactory</em>)</li>
@@ -180,7 +206,8 @@
       <section>
         <title>DO's and DON'T DO's</title>
         <p></p>
-        <p>It's always a good idea to make use of an existing infrastructure component, if it's meant for that purpose, rather than inventing your own.  There are, however, some limitations we recommend you adhere to.</p>
+        <p>It's always a good idea to make use of an existing infrastructure component, if it's meant for that purpose, rather than inventing your own.  There are, however,
+          some limitations we recommend you adhere to.</p>
         <p></p>
         <ul>
           <li>DO make use of infrastructure components described in the section above</li>
@@ -188,7 +215,8 @@
           <li>NEVER write connector code that directly uses framework database tables, other than the ones installed and managed by your connector</li>
         </ul>
         <p></p>
-        <p>If you are tempted to violate these rules, it may well mean you don't understand something important.  At the very least, we'd like to know why.  Send email to dev@manifoldcf.apache.org with a description of your problem and how you are tempted to solve it.</p>
+        <p>If you are tempted to violate these rules, it may well mean you don't understand something important.  At the very least, we'd like to know why.  Send email
+          to dev@manifoldcf.apache.org with a description of your problem and how you are tempted to solve it.</p>
       </section>
     </section>
   </body>
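
The "Choosing the form of the document version string" text above recommends putting everything that affects processing into the version string and then reading those values back out of it at processing time. One way to make that round trip safe is to length-prefix each field, so no delimiter character can collide with the data. The sketch below only illustrates that idea. It is written in Python for brevity, a real connector is Java, and the function names and sample field values are hypothetical rather than framework API.

```python
def pack(fields):
    """Encode each string as <length>:<text> so any character is safe."""
    return "".join(f"{len(field)}:{field}" for field in fields)


def unpack(version):
    """Recover the original list of strings from a packed version string."""
    fields = []
    pos = 0
    while pos < len(version):
        colon = version.index(":", pos)      # end of the decimal length
        length = int(version[pos:colon])
        start = colon + 1
        fields.append(version[start:start + length])
        pos = start + length
    return fields


def needs_reprocessing(stored_version, current_version):
    """A document is reprocessed only when its version string differs."""
    return stored_version != current_version


# Hypothetical fields: repository revision, access tokens, a config flag.
current = pack(["rev-17", "acl:alice+bob", "index_metadata=true"])
```

Because unpack(current) returns exactly the values that were packed, the processing step can work from the recorded string instead of fetching the revision, tokens, or configuration again, which keeps the processing consistent with the stored version.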

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/programmatic-operation.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/programmatic-operation.xml?rev=1457305&r1=1457304&r2=1457305&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/programmatic-operation.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/programmatic-operation.xml Sat Mar 16 21:03:08 2013
@@ -30,7 +30,9 @@
     <section>
       <title>Programmatic Operation</title>
       <p></p>
-      <p>A certain subset of ManifoldCF users want to think of ManifoldCF as an engine that they can poke from whatever other system they are developing.  While ManifoldCF is not precisely a document indexing engine per se, it can certainly be controlled programmatically.  Right now, there are three principal ways of achieving this control.</p>
+      <p>A certain subset of ManifoldCF users want to think of ManifoldCF as an engine that they can poke from whatever other system they are developing.  While
+        ManifoldCF is not precisely a document indexing engine per se, it can certainly be controlled programmatically.  Right now, there are three principal ways of
+        achieving this control.</p>
       <p></p>
       <section>
         <title>Control by Servlet API</title>
@@ -46,17 +48,17 @@
           <p>http[s]://<em>&lt;server_and_port&gt;</em>/mcf-api-service/json/<em>&lt;resource&gt;</em></p>
           <p></p>
           <p>The servlet ignores request data, except when the PUT or POST verb is used.  In that case, the request data is presumed to be a JSON object.  The servlet
-            responds either with an error response code (either 400 or 500) with an appropriate explanatory message, or with a 200 (OK), 201 (CREATED), or 404 (NOT FOUND)
-            response code along with a response JSON object.</p>
+            responds either with an error response code (either 400 or 500) with an appropriate explanatory message, or with a 200 (OK), 201 (CREATED), or
+            404 (NOT FOUND) response code along with a response JSON object.</p>
           <p></p>
         </section>
         <section>
           <title>JSON equivalents for ManifoldCF</title>
           <p></p>
-          <p>ManifoldCF treats certain JSON forms as equivalent, for the purposes of readability.  For example, the array form <strong>"foo" : [ { ... } ]</strong> is treated equivalently to
-            <strong>"foo" : { }</strong>, whenever there is only one array element.  This gives a coder some flexibility as to how s/he encodes JSON in requests.  Please also be aware that
-            similar compressions will occur in the JSON responses from the API servlet, and your code must be able to deal with this possibility.  The following table
-            describes some of the equivalences:</p>
+          <p>ManifoldCF treats certain JSON forms as equivalent, for the purposes of readability.  For example, the array form <strong>"foo" : [ { ... } ]</strong> is
+            treated equivalently to <strong>"foo" : { }</strong>, whenever there is only one array element.  This gives a coder some flexibility as to how s/he encodes
+            JSON in requests.  Please also be aware that similar compressions will occur in the JSON responses from the API servlet, and your code must be able to deal
+            with this possibility.  The following table describes some of the equivalences:</p>
           <p></p>
           <p></p>
           <p></p>
@@ -107,8 +109,10 @@
             <tr><td>jobstatuses/<em>&lt;job_id&gt;</em></td><td>GET</td><td>Get a specific job's status</td><td>N/A</td><td>{"jobstatus":<em>&lt;job_status_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} </td></tr>
             <tr><td>jobstatusesnocounts/<em>&lt;job_id&gt;</em></td><td>GET</td><td>Get a specific job's status, returning '0' for all counts</td><td>N/A</td><td>{"jobstatus":<em>&lt;job_status_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} </td></tr>
             <tr><td>start/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Start a specified job manually</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>startminimal/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Start a specified job manually, minimal run requested</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
             <tr><td>abort/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Abort a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
             <tr><td>restart/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Stop and start a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>restartminimal/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Stop and start a specified job, minimal run requested</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
             <tr><td>pause/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Pause a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
             <tr><td>resume/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Resume a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
 
@@ -343,6 +347,7 @@
             <tr><td>"year"</td><td>The optional year enumeration object</td></tr>
             <tr><td>"hourofday"</td><td>The optional hour-of-the-day enumeration object</td></tr>
             <tr><td>"minutesofhour"</td><td>The optional minutes-of-the-hour enumeration object</td></tr>
+            <tr><td>"requestminimum"</td><td>Optional flag indicating whether the job run will be minimal or not ("true" means minimal)</td></tr>
           </table>
           <p></p>
           <p>Each enumeration object describes an array of integers using the form:</p>
@@ -434,16 +439,21 @@
       <section>
         <title>Control by direct code</title>
         <p></p>
-        <p>Control by direct java code is quite a reasonable thing to do.  The sources of the above commands should give a pretty clear idea how to proceed, if that's what you want to do.</p>
+        <p>Control by direct java code is quite a reasonable thing to do.  The sources of the above commands should give a pretty clear idea how to proceed, if that's what you
+          want to do.</p>
         <p></p>
         <p></p>
       </section>
       <section>
         <title>Caveats</title>
         <p></p>
-        <p>The above commands know nothing about the differences between connection types.  Instead, they deal with configuration and specification information in the form of XML documents.  Normally, these XML documents are hidden from a system integrator, unless they happen to look into the database with a tool such as psql.  But the API commands above often will require such XML documents to be included as part of the command execution.</p>
+        <p>The above commands know nothing about the differences between connection types.  Instead, they deal with configuration and specification information in the
+          form of XML documents.  Normally, these XML documents are hidden from a system integrator, unless they happen to look into the database with a tool such as
+          psql.  But the API commands above often will require such XML documents to be included as part of the command execution.</p>
         <p></p>
-        <p>This has one major consequence.  Any application that would manipulate connections and jobs directly cannot be connection-type independent - these applications must know the proper form of XML to submit to the command.  So, it is not possible to use these command APIs to write one's own UI wrapper, without sacrificing some of the repository independence that ManifoldCF by itself maintains.</p>
+        <p>This has one major consequence.  Any application that would manipulate connections and jobs directly cannot be connection-type independent - these
+          applications must know the proper form of XML to submit to the command.  So, it is not possible to use these command APIs to write one's own UI wrapper,
+          without sacrificing some of the repository independence that ManifoldCF by itself maintains.</p>
       </section>
     </section>
   </body>
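
Reviewer sketch (not part of the commit): the JSON API rows touched in programmatic-operation.xml above, such as the resume/<job_id> row that takes a PUT with no input, can be exercised roughly as follows. This is a minimal sketch, assuming Java 11 or later for java.net.http. The class name McfApiSketch, the host and port localhost:8345, the plain http scheme, and the job id are all hypothetical placeholders. Only the /mcf-api-service/json/<resource> layout and the PUT verb come from the documentation itself.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper class; nothing here is part of ManifoldCF itself.
final class McfApiSketch {

  // Builds http://<server_and_port>/mcf-api-service/json/<resource>.
  // The "http" scheme is an assumption; the documentation allows http or https.
  public static URI resourceUri(String serverAndPort, String resource) {
    return URI.create("http://" + serverAndPort + "/mcf-api-service/json/" + resource);
  }

  // Builds (but does not send) a PUT request for resume/<job_id>.
  // The documentation lists the input for this resource as N/A, so no body is attached.
  public static HttpRequest resumeJob(String serverAndPort, String jobId) {
    return HttpRequest.newBuilder(resourceUri(serverAndPort, "resume/" + jobId))
        .PUT(HttpRequest.BodyPublishers.noBody())
        .build();
  }

  public static void main(String[] args) {
    // Host, port, and job id are placeholders for illustration only.
    HttpRequest request = resumeJob("localhost:8345", "1363467787961");
    System.out.println(request.method() + " " + request.uri());
  }
}
```

The request is only constructed here; sending it would need an HttpClient and a running ManifoldCF instance, and the response would be { } or an {"error":...} object as the table describes.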

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-repository-connectors.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-repository-connectors.xml?rev=1457305&r1=1457304&r2=1457305&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-repository-connectors.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/ja_JP/writing-repository-connectors.xml Sat Mar 16 21:03:08 2013
@@ -73,8 +73,9 @@
           <li>LiveLink (demonstrates use of local keystore infrastructure)</li>
           <li>Meridio (local keystore, web services, result sets)</li>
           <li>SharePoint (local keystore, web services)</li>
-          <li>RSS (local keystore, binning)</li>
-          <li>Web (local database schema, local keystore, binning, events and prerequisites, cache management)</li>
+          <li>RSS (local keystore, binning, fuzzy xml parsing)</li>
+          <li>Web (local database schema, local keystore, binning, events and prerequisites, cache management, fuzzy xml parsing)</li>
+          <li>Wiki (binning, rigorous xml parsing)</li>
         </ul>
         <p></p>
         <p>You will also note that all of these connectors extend a framework-provided repository connector base class, found at <em>org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector</em>.  This base class furnishes some basic bookkeeping logic for managing the connector pool, as well as default implementations of some of the less typical functionality a connector may have.  For example, connectors are allowed to have database tables of their own, which are instantiated when the connector is registered, and are torn down when the connector is removed.  This is, however, not very typical, and the base implementation reflects that.</p>
@@ -109,15 +110,24 @@
           <p></p>
           <table>
             <tr><th>Model</th><th>Description</th></tr>
+            <tr><td><em>MODEL_ALL</em></td><td>The <strong>addSeedDocuments()</strong> method supplies all specified documents on each call</td></tr>
+            <tr><td><em>MODEL_PARTIAL</em></td><td>The <strong>addSeedDocuments()</strong> does not return a complete list of documents that match the criteria and time interval, because some of those documents are no longer discoverable</td></tr>
             <tr><td><em>MODEL_ADD</em></td><td>The <strong>addSeedDocuments()</strong> method supplies at least all the matching documents that have been added to the repository, within the specified time interval</td></tr>
             <tr><td><em>MODEL_ADD_CHANGE</em></td><td>The <strong>addSeedDocuments()</strong> method supplies at least those matching documents that have been added or changed in the repository, within the specified time interval</td></tr>
             <tr><td><em>MODEL_ADD_CHANGE_DELETE</em></td><td>The <strong>addSeedDocuments()</strong> method supplies at least those matching documents that have been added, changed, or removed in the repository, within the specified time interval</td></tr>
-            <tr><td><em>MODEL_PARTIAL</em></td><td>The <strong>addSeedDocuments()</strong> does not return a complete list of documents that match the criteria and time interval, because some of those documents are no longer discoverable</td></tr>
+            <tr><td><em>MODEL_CHAINED_ADD</em></td><td>The <strong>addSeedDocuments()</strong> method, plus documents reachable by discovery from seeds, supplies at least all the matching documents that have been added to the repository, within the specified time interval</td></tr>
+            <tr><td><em>MODEL_CHAINED_ADD_CHANGE</em></td><td>The <strong>addSeedDocuments()</strong> method, plus documents reachable by discovery from seeds, supplies at least those matching documents that have been added or changed in the repository, within the specified time interval</td></tr>
+            <tr><td><em>MODEL_CHAINED_ADD_CHANGE_DELETE</em></td><td>The <strong>addSeedDocuments()</strong> method, plus documents reachable by discovery from seeds, supplies at least those matching documents that have been added, changed, or removed in the repository, within the specified time interval</td></tr>
           </table>
           <p></p>
-          <p>Note that the choice of model is actually much more subtle than the above description might indicate.  It may, for one thing, be affected by characteristics of the repository, such as whether the repository considers a document to have been changed if its security information was changed.  This would mean that, even though most document changes are picked up and thus one might be tempted to declare the connector to be <em>MODEL_ADD_CHANGE</em>, the correct choice would in fact be <em>MODEL_ADD</em>.</p>
-          <p></p>
-          <p>Another subtle point is what documents the connector is actually supposed to return by means of the <strong>addSeedDocuments()</strong> method.  The start time and end time parameters handed to the method do not have to be strictly adhered to, for instance; it is always okay to return more documents.  It is never okay for the connector to return fewer documents than were requested, on the other hand.</p>
+          <p>Note that the choice of model is actually much more subtle than the above description might indicate.  It may, for one thing, be affected by characteristics of
+            the repository, such as whether the repository considers a document to have been changed if its security information was changed.  This would mean that,
+            even though most document changes are picked up and thus one might be tempted to declare the connector to be <em>MODEL_ADD_CHANGE</em>, the
+            correct choice would in fact be <em>MODEL_ADD</em>.</p>
+          <p></p>
+          <p>Another subtle point is what documents the connector is actually supposed to return by means of the <strong>addSeedDocuments()</strong> method.  The
+            start time and end time parameters handed to the method do not have to be strictly adhered to, for instance; it is always okay to return more documents.  It is never
+            okay for the connector to return fewer documents than were requested, on the other hand.</p>
           <p></p>
         </section>
         <section>
@@ -139,19 +149,34 @@
         <section>
           <title>Choosing the form of the document version string</title>
           <p></p>
-          <p>The document version string is used by ManifoldCF to determine whether or not the document or configuration changed in such a way as to require that the document be reprocessed.  ManifoldCF therefore requests the version string for any document that is ready for processing, and usually does not process the document again if the returned version string agrees with the version string it has stored.</p>
-          <p></p>
-          <p>Thinking about it more carefully, it is clear that what a connector writer needs to do is include everything in the version string that could potentially affect how the document gets processed.  That may include the version of the document in the repository, bits of configuration information, metadata, and even access tokens (if the underlying repository versions these things independently from the document itself).  Storing all of that information in the version string seems like a lot - but the string is unlimited in length, and it actually serves another useful purpose to do it that way.  Specifically, when it comes time to do the actual processing, it's often the correct thing to do to obtain the necessary data out of the version string, rather than calculating it or fetching it anew.  That way of working guarantees that the document processing was done in a manner that agrees with its recorded version string, thus eliminating any chance of ManifoldCF getting confused.</p>
-          <p></p>
-          <p>For longer data that needs to persist between the <strong>getDocumentVersions()</strong> method call and the <strong>processDocuments()</strong> method call, the connector is welcome to save this information in a temporary disk file.  To help make sure nothing leaks which this approach is used, the IRepositoryConnector interface has a method that will be called to clean up any temporary files that might have been created in the handling of a given document identifier.</p>
+          <p>The document version string is used by ManifoldCF to determine whether or not the document or configuration changed in such a way as to require that the document
+            be reprocessed.  ManifoldCF therefore requests the version string for any document that is ready for processing, and usually does not process the document again if the
+            returned version string agrees with the version string it has stored.</p>
+          <p></p>
+          <p>Thinking about it more carefully, it is clear that what a connector writer needs to do is include everything in the version string that could potentially affect how the
+            document gets processed.  That may include the version of the document in the repository, bits of configuration information, metadata, and even access tokens (if the
+            underlying repository versions these things independently from the document itself).  Storing all of that information in the version string seems like a lot - but the string
+            is unlimited in length, and it actually serves another useful purpose to do it that way.  Specifically, when it comes time to do the actual processing, it's often the correct
+            thing to do to obtain the necessary data out of the version string, rather than calculating it or fetching it anew.  That way of working guarantees that the document
+            processing was done in a manner that agrees with its recorded version string, thus eliminating any chance of ManifoldCF getting confused.</p>
+          <p></p>
+          <p>For longer data that needs to persist between the <strong>getDocumentVersions()</strong> method call and the <strong>processDocuments()</strong> method
+            call, the connector is welcome to save this information in a temporary disk file.  To help make sure nothing leaks which this approach is used, the IRepositoryConnector
+            interface has a method that will be called to clean up any temporary files that might have been created in the handling of a given document identifier.</p>
           <p></p>
         </section>
         <section>
           <title>Notes on connector UI methods</title>
           <p></p>
-          <p>The crawler UI uses a tabbed layout structure, and thus each of these elements must properly implement the tabbed model.  This means that the "header" methods above must add the desired tab names to a specified array, and the "body" methods must provide appropriate HTML which handles both the case where a tab is displayed, and where it is not displayed.  Also, it makes sense to use the appropriate css definitions, so that the connector UI pages have a similar look-and-feel to the rest of ManifoldCF's crawler ui.  We strongly suggest starting with one of the supplied connector's UI code, both for a description of the arguments to each page, and for some decent ideas of ways to organize your connector's UI code.  </p>
-          <p></p>
-          <p>Please also note that it is good practice to name the form fields in your HTML in such a way that they cannot collide with form fields that may come from the framework's HTML or any specific output connector's HTML.  The <em>DocumentSpecification</em> editing HTML especially may be prone to collisions, because within any given job, this HTML is included in the same page as HTML from the chosen output connector.</p>
+          <p>The crawler UI uses a tabbed layout structure, and thus each of these elements must properly implement the tabbed model.  This means that the "header" methods
+            above must add the desired tab names to a specified array, and the "body" methods must provide appropriate HTML which handles both the case where a tab is
+            displayed, and where it is not displayed.  Also, it makes sense to use the appropriate css definitions, so that the connector UI pages have a similar look-and-feel to the
+            rest of ManifoldCF's crawler ui.  We strongly suggest starting with one of the supplied connector's UI code, both for a description of the arguments to each page, and
+            for some decent ideas of ways to organize your connector's UI code.  </p>
+          <p></p>
+          <p>Please also note that it is good practice to name the form fields in your HTML in such a way that they cannot collide with form fields that may come from the
+            framework's HTML or any specific output connector's HTML.  The <em>DocumentSpecification</em> editing HTML especially may be prone to collisions, because
+            within any given job, this HTML is included in the same page as HTML from the chosen output connector.</p>
           <p></p>
           <p></p>
         </section>
@@ -159,7 +184,8 @@
       <section>
         <title>Implementation support provided by the framework</title>
         <p></p>
-        <p>ManifoldCF's framework provides a number of helpful services designed to make the creation of a connector easier.  These services are summarized below.  (This is not an exhaustive list, by any means.)</p>
+        <p>ManifoldCF's framework provides a number of helpful services designed to make the creation of a connector easier.  These services are summarized below.
+          (This is not an exhaustive list, by any means.)</p>
         <p></p>
         <ul>
           <li>Lock management and synchronization (see <em>org.apache.manifoldcf.core.interfaces.LockManagerFactory</em>)</li>
@@ -180,7 +206,8 @@
       <section>
         <title>DO's and DON'T DO's</title>
         <p></p>
-        <p>It's always a good idea to make use of an existing infrastructure component, if it's meant for that purpose, rather than inventing your own.  There are, however, some limitations we recommend you adhere to.</p>
+        <p>It's always a good idea to make use of an existing infrastructure component, if it's meant for that purpose, rather than inventing your own.  There are, however,
+          some limitations we recommend you adhere to.</p>
         <p></p>
         <ul>
           <li>DO make use of infrastructure components described in the section above</li>
@@ -188,7 +215,8 @@
           <li>NEVER write connector code that directly uses framework database tables, other than the ones installed and managed by your connector</li>
         </ul>
         <p></p>
-        <p>If you are tempted to violate these rules, it may well mean you don't understand something important.  At the very least, we'd like to know why.  Send email to dev@manifoldcf.apache.org with a description of your problem and how you are tempted to solve it.</p>
+        <p>If you are tempted to violate these rules, it may well mean you don't understand something important.  At the very least, we'd like to know why.  Send email
+          to dev@manifoldcf.apache.org with a description of your problem and how you are tempted to solve it.</p>
       </section>
     </section>
   </body>
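
Reviewer sketch (not part of the commit): the version-string guidance in writing-repository-connectors.xml above says to put everything that affects processing into the version string, and to read that data back out of the string at processing time. One way this might look is sketched below. The class VersionStringSketch, its pack() and unpack() methods, the length-prefixed encoding, and the sample field values are all hypothetical illustrations. They are not the framework's own helpers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the encoding is an assumption made for illustration.
final class VersionStringSketch {

  // Encodes each field as <length>:<value>, so values may contain any character.
  public static String pack(List<String> fields) {
    StringBuilder sb = new StringBuilder();
    for (String field : fields) {
      sb.append(field.length()).append(':').append(field);
    }
    return sb.toString();
  }

  // Reverses pack(): reads a length, skips the colon, then takes that many characters.
  public static List<String> unpack(String version) {
    List<String> fields = new ArrayList<>();
    int pos = 0;
    while (pos < version.length()) {
      int colon = version.indexOf(':', pos);
      int length = Integer.parseInt(version.substring(pos, colon));
      int start = colon + 1;
      fields.add(version.substring(start, start + length));
      pos = start + length;
    }
    return fields;
  }

  public static void main(String[] args) {
    // Hypothetical inputs: repository revision, a configuration bit, an access token.
    String stored = pack(List.of("rev-17", "ingestAttachments=true", "token:readers"));
    String afterAclChange = pack(List.of("rev-17", "ingestAttachments=true", "token:editors"));

    // A differing string signals that the document needs reprocessing.
    System.out.println("needs reprocessing: " + !stored.equals(afterAclChange));

    // At processing time, the same data is recovered from the string itself.
    System.out.println("fields: " + unpack(afterAclChange));
  }
}
```

Because the processing step reads its inputs from the unpacked string rather than fetching them again, the work done always matches the version that gets recorded.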

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/job-status.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/job-status.PNG?rev=1457305&r1=1457304&r2=1457305&view=diff
==============================================================================
Binary files - no diff available.

Modified: manifoldcf/trunk/tests/filesystem/src/test/java/org/apache/manifoldcf/filesystem_tests/SanityTester.java
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/tests/filesystem/src/test/java/org/apache/manifoldcf/filesystem_tests/SanityTester.java?rev=1457305&r1=1457304&r2=1457305&view=diff
==============================================================================
--- manifoldcf/trunk/tests/filesystem/src/test/java/org/apache/manifoldcf/filesystem_tests/SanityTester.java (original)
+++ manifoldcf/trunk/tests/filesystem/src/test/java/org/apache/manifoldcf/filesystem_tests/SanityTester.java Sat Mar 16 21:03:08 2013
@@ -131,11 +131,11 @@ public class SanityTester
     if (status.getDocumentsProcessed() != 5)
       throw new ManifoldCFException("Wrong number of documents processed - expected 5, saw "+new Long(status.getDocumentsProcessed()).toString());
       
-    // Add a file and recrawl
+    // Add a file and recrawl using minimal crawl
     FileHelper.createFile(new File("testdata/testdir/test4.txt"),"Added file");
 
     // Now, start the job, and wait until it completes.
-    jobManager.manualStart(job.getID());
+    jobManager.manualStart(job.getID(),true);
     instance.waitJobInactiveNative(jobManager,job.getID(),120000L);
 
     status = jobManager.getStatus(job.getID());
@@ -143,11 +143,11 @@ public class SanityTester
     if (status.getDocumentsProcessed() != 6)
       throw new ManifoldCFException("Wrong number of documents processed after add - expected 6, saw "+new Long(status.getDocumentsProcessed()).toString());
 
-    // Change a file, and recrawl
+    // Change a file, and recrawl, once again using minimal
     FileHelper.changeFile(new File("testdata/test1.txt"),"Modified contents");
       
     // Now, start the job, and wait until it completes.
-    jobManager.manualStart(job.getID());
+    jobManager.manualStart(job.getID(),true);
     instance.waitJobInactiveNative(jobManager,job.getID(),120000L);
 
     status = jobManager.getStatus(job.getID());
@@ -159,8 +159,17 @@ public class SanityTester
       
     // Delete a file, and recrawl
     FileHelper.removeFile(new File("testdata/test2.txt"));
-      
-    // Now, start the job, and wait until it completes.
+    
+    // Do a minimal recrawl first; the delete should not be picked up.
+    jobManager.manualStart(job.getID(),true);
+    instance.waitJobInactiveNative(jobManager,job.getID(),120000L);
+
+    status = jobManager.getStatus(job.getID());
+    // The test data area has 4 documents and one directory, and we have to count the root directory too.
+    if (status.getDocumentsProcessed() != 6)
+      throw new ManifoldCFException("Wrong number of documents processed after delete with minimal crawl - expected 6, saw "+new Long(status.getDocumentsProcessed()).toString());
+    
+    // Now, do a complete crawl - the delete should be found now.
     jobManager.manualStart(job.getID());
     instance.waitJobInactiveNative(jobManager,job.getID(),120000L);