Posted to commits@manifoldcf.apache.org by mi...@apache.org on 2014/08/10 09:45:41 UTC

svn commit: r1617058 [5/8] - in /manifoldcf/trunk: connectors/rss/connector/src/main/native2ascii/org/apache/manifoldcf/crawler/connectors/rss/ framework/ui-core/src/main/native2ascii/org/apache/manifoldcf/ui/i18n/ site/src/documentation/content/xdocs/...

Added: manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/included-connectors.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/included-connectors.xml?rev=1617058&view=auto
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/included-connectors.xml (added)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/included-connectors.xml Sun Aug 10 07:45:35 2014
@@ -0,0 +1,61 @@
+<?xml version="1.0" encoding="utf-8"?>
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" 
+          "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<document> 
+
+  <header> 
+    <title></title> 
+  </header> 
+
+  <body> 
+    <section>
+      <title>Connector overview</title>
+      <p>ManifoldCF provides the following connectors:</p>
+      <p></p>
+      <table>
+        <caption>Supported connectors</caption>
+        <tr><th>Connector name</th><th>Connector platform</th><th>Server platform</th><th>Client version</th><th>Server version</th></tr>
+        <tr><td>CMIS</td><td>Java</td><td>Multiple</td><td>CMIS 1.0</td><td>CMIS 1.0</td></tr>
+        <tr><td>DropBox</td><td>Java</td><td>Multiple</td><td>1.5.3</td><td>N/A</td></tr>
+        <tr><td>Email</td><td>Java</td><td>Multiple</td><td>Javamail 1.4</td><td>N/A</td></tr>
+        <tr><td>File System</td><td>Java</td><td>Win/*NIX</td><td>N/A</td><td>N/A</td></tr>
+        <tr><td>Google Drive</td><td>Java</td><td>Multiple</td><td>v2-rev64-1.14.1-beta</td><td>N/A</td></tr>
+        <tr><td>HDFS</td><td>Java</td><td>Multiple</td><td>2.2.0</td><td>1.1.2</td></tr>
+        <tr><td>Windows shares</td><td>Java</td><td> Win, Samba, NetApp, other NAS systems</td><td>N/A</td><td>N/A</td></tr>
+        <tr><td>JDBC</td><td>Java </td><td>Multiple</td><td>Supports JDBC V2, V3, V4; verified with drivers including Oracle 10, JTDS 1.2, PostgreSQL 9.1, and MySQL 5.5</td><td>Multiple</td></tr>
+        <tr><td>Jira</td><td>Java </td><td>Multiple</td><td>N/A</td><td>5.0-6.1</td></tr>
+        <tr><td>RSS</td><td>Java </td><td> N/A </td><td> N/A </td><td>Atom, RSS 2.0, and others</td></tr>
+        <tr><td>Web</td><td>Java </td><td>N/A</td><td> N/A </td><td>HTML 1.0, 1.1, 2.0, Atom, RSS 2.0, and others</td></tr>
+        <tr><td>Wiki</td><td>Java </td><td>N/A</td><td> N/A </td><td>Wiki 1.8 or later</td></tr>
+        <tr><td>LiveLink (OpenText)</td><td>Java </td><td> Win </td><td> LAPI 9.7.1 </td><td>Verified with 9.2.0 - 10.2.0</td></tr>
+        <tr><td>Solr</td><td>Java </td><td> N/A </td><td> N/A</td><td>Verified with Solr 1.4, 3.6.2, 4.0.0, 4.1.0, 4.2.0, 4.3.0</td></tr>
+        <tr><td>OpenSearchServer</td><td>Java </td><td> N/A </td><td> N/A</td><td>Verified with OpenSearchServer 1.2.1, 1.2.2, 1.2.3</td></tr>
+        <tr><td>ElasticSearch</td><td>Java </td><td> N/A </td><td> N/A</td><td>Verified with ElasticSearch 0.18.3, 0.18.4, 0.18.5, 0.18.6, 0.18.7</td></tr>
+        <tr><td>Documentum (EMC)</td><td>Win, RedHat</td><td> Win, RedHat </td><td>Verified with DFC 5.3 SP5</td><td>Verified with 5.3, 6.0, and 6.5 servers</td></tr>
+        <tr><td>SharePoint (MSFT)</td><td>Java </td><td>Win</td><td> N/A </td><td>Verified with SharePoint 2003 (2.0) and 2007 (3.0); 2010 (4.0) verified without Claim Space Auth</td></tr>
+        <tr><td>Meridio (Autonomy)</td><td>Java </td><td> Win </td><td> N/A </td><td>Verified with Meridio 4.1, 5.0</td></tr>
+        <tr><td>FileNet (IBM)</td><td>Java</td><td>Win, RedHat</td><td>Verified with P8 V4.1, V4.5</td><td>Verified with P8 V4.1, V4.5</td></tr>
+      </table>
+    </section>
+  </body>
+</document>
+  

Propchange: manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/included-connectors.xml
------------------------------------------------------------------------------
    svn:executable = *

Propchange: manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/included-connectors.xml
------------------------------------------------------------------------------
    svn:mime-type = text/plain;charse=utf-8

Added: manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/index.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/index.xml?rev=1617058&view=auto
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/index.xml (added)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/index.xml Sun Aug 10 07:45:35 2014
@@ -0,0 +1,45 @@
+<?xml version="1.0" encoding="utf-8"?>
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" 
+          "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<document> 
+
+  <header> 
+    <title>Welcome to ManifoldCF!</title> 
+  </header> 
+
+  <body> 
+    <section>
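+	<title>What is ManifoldCF?</title>
+	<p>ManifoldCF is implemented in Java. It gathers documents and Web pages stored on a variety of servers on the Internet or an intranet and sends them to a search engine. While gathering documents, it also collects each document's permission information in cooperation with authentication systems such as ActiveDirectory, and uses that information to restrict search results. For example, personnel files are shown only when the search is made by the human resources department.</p>
+	<p>The current version of ManifoldCF can gather documents held in commercial systems such as FileNet P8 (IBM), Documentum (EMC), LiveLink (OpenText), Meridio (Autonomy), Windows shares (Microsoft), and SharePoint (Microsoft), and also provides the following general-purpose connectors: the CMIS connector, File System connector, JDBC connector, RSS Feed connector, Wiki connector, and HTML connector. It can also send the gathered documents to Apache Solr, QBase (formerly MetaCarta) GTS, OpenSearchServer, and Elasticsearch. For the full list of supported products and versions, see <a href="included-connectors.html">here</a>.</p>
+	<p>Apache ManifoldCF was developed by MetaCarta, Inc. After five years of iterative development and debugging for delivery to multiple enterprises, its source code was donated to the Apache Software Foundation in December 2009.</p>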
+    </section>
+
+    <section>
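+	<title>About third-party libraries</title>
+	<p>Building the connectors for commercial software that ManifoldCF includes sometimes requires third-party libraries, packages, or the software itself. Although developers must obtain this third-party software themselves, the connectors' own source code can be compiled conditionally and released as part of Apache. We hope, and are working, to bring everything under the Apache license, but the current situation will not change immediately.</p>
+	<p>For the steps to build with third-party software, see the Wiki page.</p>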
+    </section>
+
+  </body>
+
+</document>

Propchange: manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/index.xml
------------------------------------------------------------------------------
    svn:executable = *

Propchange: manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/index.xml
------------------------------------------------------------------------------
    svn:mime-type = text/plain;charse=utf-8

Added: manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/javadoc.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/javadoc.xml?rev=1617058&view=auto
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/javadoc.xml (added)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/javadoc.xml Sun Aug 10 07:45:35 2014
@@ -0,0 +1,64 @@
+<?xml version="1.0"?>
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" 
+          "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<document> 
+
+  <header> 
+    <title>ManifoldCF Javadoc</title> 
+  </header> 
+
+  <body> 
+    <section>
+      <title>ManifoldCF Javadoc</title>
+      <p></p>
+      <p>Javadoc for the latest release of ManifoldCF and its connectors is available through the following links:</p>
+      <p><a href="../api/framework/index.html">ManifoldCF framework</a></p>
+      <p><a href="../api/activedirectory/index.html">Active Directory authority</a></p>
+      <p><a href="../api/alfresco/index.html">Alfresco connector</a></p>
+      <p><a href="../api/cmis/index.html">CMIS authority connector</a></p>
+      <p><a href="../api/documentum/index.html">Documentum authority, connector, and support processes</a></p>
+      <p><a href="../api/dropbox/index.html">Dropbox connector</a></p>
+      <p><a href="../api/email/index.html">Email connector</a></p>
+      <p><a href="../api/filenet/index.html">FileNet connector and support processes</a></p>
+      <p><a href="../api/filesystem/index.html">File system repository and output connectors</a></p>
+      <p><a href="../api/googledrive/index.html">GoogleDrive connector</a></p>
+      <p><a href="../api/gridfs/index.html">GridFS connector</a></p>
+      <p><a href="../api/gts/index.html">qBase GTS output connector</a></p>
+      <p><a href="../api/hdfs/index.html">HDFS repository and output connectors</a></p>
+      <p><a href="../api/jcifs/index.html">CIFS connector</a></p>
+      <p><a href="../api/jira/index.html">JIRA connector and authority</a></p>
+      <p><a href="../api/jdbc/index.html">JDBC connector</a></p>
+      <p><a href="../api/livelink/index.html">LiveLink authority and connector</a></p>
+      <p><a href="../api/meridio/index.html">Meridio authority and connector</a></p>
+      <p><a href="../api/opensearchserver/index.html">OpenSearchServer output connector</a></p>
+      <p><a href="../api/elasticsearch/index.html">ElasticSearch output connector</a></p>
+      <p><a href="../api/nullauthority/index.html">Null authority</a></p>
+      <p><a href="../api/nulloutput/index.html">Null output connector</a></p>
+      <p><a href="../api/regexpmapper/index.html">Regular expression mapping connector</a></p>
+      <p><a href="../api/rss/index.html">RSS connector</a></p>
+      <p><a href="../api/sharepoint/index.html">SharePoint connector</a></p>
+      <p><a href="../api/solr/index.html">Solr output connector</a></p>
+      <p><a href="../api/webcrawler/index.html">Web connector</a></p>
+      <p><a href="../api/wiki/index.html">Wiki connector</a></p>
+    </section>
+  </body>
+</document>

Propchange: manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/javadoc.xml
------------------------------------------------------------------------------
    svn:executable = *

Added: manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/performance-tuning.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/performance-tuning.xml?rev=1617058&view=auto
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/performance-tuning.xml (added)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/performance-tuning.xml Sun Aug 10 07:45:35 2014
@@ -0,0 +1,170 @@
+<?xml version="1.0"?>
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" 
+          "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<document> 
+
+  <header> 
+    <title>Performance tuning</title> 
+  </header> 
+
+  <body> 
+    <section>
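+      <title>Performance tuning</title>
+      <p></p>
+      <p>
+        To get the most functionality and efficiency out of ManifoldCF, you need to understand a few things. First, you need to know how to configure it so that it runs optimally. You also need to know the best hardware configuration. And, of course, you need to judge whether everything you have done is working properly, which requires some data to compare against.
+        This page aims to answer these questions.
+      </p>
+      <section>
+        <title>Configuration for performance</title>
+        <p></p>
+        <p>
+          The goal of ManifoldCF performance tuning is to make maximum use of the system's parallelism and to ensure that no bottleneck slows the system down.
+          The most important foundation of ManifoldCF is the database, because it is the only persistent storage mechanism ManifoldCF uses. Using the database correctly is therefore our first goal.
+        </p>
+        <section>
+          <title>Selecting the database</title>
+          <p>
+            To begin with, PostgreSQL is a better choice than Derby, because Derby has known performance problems when handling deadlocks.
+            In a highly threaded system like ManifoldCF, database deadlocks are to be expected; the risk they pose can be reduced but not eliminated.
+            Derby in particular is prone to deadlock: for example, when a DELETE runs against a table while a SELECT reads the same table, Derby must stall for a while to detect the deadlock.
+            Clearly, this behavior is incompatible with high performance, so use PostgreSQL whenever crawl performance matters to you. See <a href="how-to-build-and-deploy.html">how-to-deploy</a> for how to run ManifoldCF with PostgreSQL.
+          </p>
+          <p>
+            Some PostgreSQL versions are also known to generate bad plans for ManifoldCF queries. When this happens, crawls of any size become extremely slow.
+            The ManifoldCF log starts to include many warnings such as "Query took more than a minute", along with the corresponding plan dump, showing a full table scan of a large table.
+            At that point, you should suspect that you are using a bad version, such as 8.3.12. In contrast, 8.3.7, 8.3.8, and 8.4.5 are all fine.
+          </p>
+        </section>
+        <section>
+          <title>Configuring PostgreSQL correctly</title>
+          <p>
+            The key configuration changes you need to make to PostgreSQL from its default settings are intended to: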
+          </p>
+          <ul>
+            <li>Set PostgreSQL up with enough database handles so that that will not be a bottleneck;</li>
+            <li>Make sure PostgreSQL has enough shared memory allocated to support the number of handles you selected;</li>
+            <li>Turn off autovacuuming.</li>
+          </ul>
+          <p>
+            The <em>postgresql.conf</em> file is where you set most of these options.  Some recommended settings are described in <a href="how-to-build-and-deploy.html">the deployment page</a>.
+            The postgresql.conf file describes the relationship between parameters, especially between the number of database handles and the amount of shared memory allocated.  This can differ significantly
+            from version to version, so it never hurts to read the text in that file, and understand what you are trying to achieve.
+          </p>
+          <p>
+            The number of database handles you need will depend on your ManifoldCF setup.  If you use the Quick Start, for instance, fewer handles are needed, because only one process is used.  The formula relating handle count to other
+            parameters of ManifoldCF is presented below.
+          </p>
+          <p></p>
+          <p>manifoldcf_db_pool_size * number_of_manifoldcf_processes &lt;= maximum_postgresql_database_handles - 2</p>
+          <p></p>
+          <p>
+            The number of processes you might have depends on how you deployed ManifoldCF.  If you used the Quick Start, you will only have one process.  But if you deployed in a more distributed way,
+            you will have at least one process for the agents daemon, as well as one process for each web application.  If you anticipate that a command-line utility could be used at the same time,
+            that's one more process.  These multiply quickly, so the number of database handles you need to make available can get quite large, unless you limit the ManifoldCF pool size artificially
+            instead.
+          </p>
+          <p>Setting the parameters that control the size of the database connection pool is covered in the next section.</p>
+        </section>
+        <section>
+          <title>Setting the ManifoldCF database handle pool size</title>
+          <p>
+            The database handle pool size must be set correctly, or ManifoldCF will not perform well, and may even deadlock waiting to get a database handle.
+            The properties.xml parameter that controls this is <em>org.apache.manifoldcf.database.maxhandles</em>.  The formula you should use to properly set the value is below.
+          </p>
+          <p></p>
+          <p>worker_thread_count + delete_thread_count + expiration_thread_count + cleanup_thread_count + 10 &lt; manifoldcf_db_pool_size</p>
+          <p></p>
+        </section>
+        <section>
+          <title>Setting the number of worker, delete, and expiration threads</title>
+          <p>
+            The number of each variety of thread you choose depends on a number of factors that are specific to the kinds of tasks you expect to do.
+            First, note that constraints based on your hardware may have the effect of setting an upper bound on the total number of threads.  If, for example, memory constraints
+            on your system have the effect of limiting the number of available PostgreSQL handles, the total threads will also be limited as a result of applying the formulas already given.
+          </p>
+          <p>
+            If you do not have any such constraints, then you can choose the number of threads based on other hardware factors.  Typically, the number of processors would be what you'd consider
+            in coming up with the total thread count.  A value of between 12 and 35 threads per processor is typical.  The optimal number for you will require some experimentation.
+          </p>
+          <p>The threads then have to be allocated to the worker, deletion, or expiration category.  If your work load does not require much in the way of deleting documents or expiring them,
+            it is usually adequate to retain the default of 10 deletion and 10 expiration threads, and simply adjust the worker thread count.  The worker thread count parameter is <em>org.apache.manifoldcf.crawler.threads</em>.
+            See <a href="how-to-build-and-deploy.html">the deployment page</a> for a list of all of these parameters.
+          </p>
+        </section>
+        <section>
+          <title>Database maintenance</title>
+          <p>
+            Once you have the database and ManifoldCF configured correctly, you will discover that the performance of the system gradually degrades over time.  This is because PostgreSQL
+            requires periodic maintenance in order to function optimally.  This maintenance is called <em>vacuuming</em>.
+          </p>
+          <p>
+            Our recommendation is to vacuum on a schedule, and to use the "full" variant of the vacuum command (e.g. "VACUUM FULL").  PostgreSQL gives you the option of lesser
+            vacuums, some of which can be done in background, but in our experience these are very expensive performance-wise, and are not very helpful either.  "VACUUM FULL" makes a
+            complete new copy of the database, a table at a time, stored in an optimal way.  It is also reasonably quick, considering what it is doing.
+          </p>
+        </section>
+      </section>
+      <section>
+        <title>Some results</title>
+        <p>
+          We've run performance tests on several systems.  Depending on hardware configuration, we've seen throughput ranging from 16 to 57 documents per second.  We tested with three different systems and ran the test
+          across 306,944 documents.  The table below shows the relevant configurations and results:
+        </p>
+        <table>
+          <tr><th>System</th><th>Processors (2+ Ghz)</th><th>Memory</th><th>Disk drives</th><th>Elapsed time (seconds)</th><th>Documents per second</th></tr>
+          <tr><td>Desktop</td><td>2</td><td>8 GB</td><td>7,200 RPM</td><td>19,492</td><td>16</td></tr>
+          <tr><td>Laptop</td><td>2</td><td>4 GB</td><td>Samsung SSD RBX</td><td>9,230</td><td>33</td></tr>
+          <tr><td>Server</td><td>8</td><td>8 GB</td><td>10,000 RPM</td><td>5,366</td><td>57</td></tr>
+        </table>
+        <p>
+          For these tests, we ran the Quick-Start example configuration from ManifoldCF as is, with the exception of using an external PostgreSQL database instead of the embedded Derby.
+          We altered the ManifoldCF and PostgreSQL configuration from their default settings to maximize system resource usage.  The table below shows the key configuration changes.
+        </p>
+        <table>
+          <tr><th>Workers</th><th>ManifoldCF DB Connections</th><th>PostgreSQL Connections</th><th>Max repository connections</th><th>JVM Memory</th></tr>
+          <tr><td>100</td><td>105</td><td>200</td><td>105</td><td>1024 MB</td></tr>
+        </table>
+        <p>Additionally, we made postgresql.conf changes as shown in the table below:</p>
+        <table>
+          <tr><th>Parameter</th><th>Value</th></tr>
+          <tr><td>shared_buffers</td><td>1024MB</td></tr>
+          <tr><td>checkpoint_segments</td><td>300</td></tr>
+          <tr><td>maintenance_work_mem</td><td>2MB</td></tr>
+          <tr><td>tcpip_socket</td><td>true</td></tr>
+          <tr><td>max_connections</td><td>200</td></tr>
+          <tr><td>checkpoint_timeout</td><td>900</td></tr>
+          <tr><td>datestyle</td><td>ISO,European</td></tr>
+          <tr><td>autovacuum</td><td>off</td></tr>
+        </table>
+        <p>There are some interesting conclusions here, for example the use of Solid State Drives for the laptop.  Even though addressable memory was reduced to 4 GB, the system processed documents about twice as fast as the desktop did with slower disks.  The other interesting fact is that the server had lower-performing disks, but 4 times as many processors, and it was nearly twice as fast as the laptop.</p>
+      </section>
+    </section>
+  </body>
+</document>
+
+
+
+
+
+
+        
\ No newline at end of file
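
The two sizing rules stated in performance-tuning.xml above (the pool-size inequality and the PostgreSQL handle inequality) can be turned into a quick sanity check. The following is a minimal Python sketch and is not part of the commit; the function names and the example thread counts are hypothetical, chosen only for illustration.

```python
def min_pool_size(worker, delete, expiration, cleanup):
    """Smallest integer pool size satisfying the strict inequality
    worker + delete + expiration + cleanup + 10 < pool_size."""
    return worker + delete + expiration + cleanup + 10 + 1


def min_postgresql_handles(pool_size, process_count):
    """Smallest PostgreSQL handle limit satisfying
    pool_size * process_count <= max_handles - 2."""
    return pool_size * process_count + 2


# Hypothetical thread counts, for illustration only.
pool = min_pool_size(worker=30, delete=10, expiration=10, cleanup=10)
print(pool)                             # 71
print(min_postgresql_handles(pool, 1))  # 73
print(min_postgresql_handles(pool, 3))  # 215
```

The one-process figure corresponds to a Quick Start style deployment. Each additional ManifoldCF process multiplies the pool requirement, which is why the page suggests capping the pool size when database handles are scarce.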

Added: manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/programmatic-operation.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/programmatic-operation.xml?rev=1617058&view=auto
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/programmatic-operation.xml (added)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/zh_CN/programmatic-operation.xml Sun Aug 10 07:45:35 2014
@@ -0,0 +1,612 @@
+<?xml version="1.0"?>
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" 
+          "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<document> 
+
+  <header> 
+    <title>Programmatic Operation</title> 
+  </header> 
+
+  <body> 
+    <section>
+      <title>Programmatic Operation</title>
+      <p></p>
+      <p>A certain subset of ManifoldCF users want to think of ManifoldCF as an engine that they can poke from whatever other system they are developing.  While
+        ManifoldCF is not precisely a document indexing engine per se, it can certainly be controlled programmatically.  Right now, there are three principal ways of
+        achieving this control.</p>
+      <p></p>
+      <section>
+        <title>Control by Servlet API</title>
+        <p></p>
+        <p>ManifoldCF provides a servlet-based JSON API that gives you the complete ability to define connections and jobs, and control job execution.  You can read
+          about JSON <a href="http://www.json.org">here</a>.  The API is designed to be RESTful in character.  Thus, it makes full use of the HTTP verbs
+          GET, PUT, POST, and DELETE, and represents objects as URLs.</p>
+        <section>
+          <title>URL format</title>
+          <p></p>
+          <p>The basic format of the JSON servlet resource URLs is as follows:</p>
+          <p></p>
+          <p>http[s]://<em>&lt;server_and_port&gt;</em>/mcf-api-service/json/<em>&lt;resource&gt;</em></p>
+          <p></p>
+          <p>The servlet ignores request data, except when the PUT or POST verb is used.  In that case, the request data is presumed to be a JSON object.  The servlet
+            responds either with an error response code (either 400 or 500) with an appropriate explanatory message, or with a 200 (OK), 201 (CREATED), or
+            404 (NOT FOUND) response code along with a response JSON object.</p>
+          <p></p>
+        </section>
+        <section>
+          <title>JSON equivalents for ManifoldCF</title>
+          <p></p>
+          <p>ManifoldCF treats certain JSON forms as equivalent, for the purposes of readability.  For example, the array form <strong>"foo" : [ { ... } ]</strong> is
+            treated equivalently to <strong>"foo" : { ... }</strong>, whenever there is only one array element.  This gives a coder some flexibility as to how s/he encodes
+            JSON in requests.  Please also be aware that similar compressions will occur in the JSON responses from the API servlet, and your code must be able to deal
+            with this possibility.  The following table describes some of the equivalences:</p>
+          <p></p>
+          <p></p>
+          <p></p>
+          <table>
+            <tr><th>Form</th><th>Equivalent</th></tr>
+            <tr><td>[ { ... } ]</td><td>{ ... }</td></tr>
+            <tr><td>"foo" : { "_value_" : "bar" }</td><td>"foo" : "bar"</td></tr>
+            <tr><td>"_children_" : [ "foo":{ ... }, "foo":{ ... } ]</td><td>"foo" : [ { ... }, { ... } ]</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Resources and commands</title>
+          <p></p>
+          <p>The actual available resources and commands are as follows:</p>
+          <p></p>
+          <p></p>
+          <p></p>
+          <table>
+            <tr><th>Resource</th><th>Verb</th><th>What it does</th><th>Input format/query args</th><th>Output format</th></tr>
+            <tr><td>authorizationdomains</td><td>GET</td><td>List all registered authorization domains</td><td>N/A</td><td>{"authorizationdomain":[<em>&lt;list_of_authorization_domain_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>outputconnectors</td><td>GET</td><td>List all registered output connectors</td><td>N/A</td><td>{"outputconnector":[<em>&lt;list_of_output_connector_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>transformationconnectors</td><td>GET</td><td>List all registered transformation connectors</td><td>N/A</td><td>{"transformationconnector":[<em>&lt;list_of_transformation_connector_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>mappingconnectors</td><td>GET</td><td>List all registered mapping connectors</td><td>N/A</td><td>{"mappingconnector":[<em>&lt;list_of_mapping_connector_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>authorityconnectors</td><td>GET</td><td>List all registered authority connectors</td><td>N/A</td><td>{"authorityconnector":[<em>&lt;list_of_authority_connector_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>repositoryconnectors</td><td>GET</td><td>List all registered repository connectors</td><td>N/A</td><td>{"repositoryconnector":[<em>&lt;list_of_repository_connector_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>authoritygroups</td><td>GET</td><td>List all authority groups</td><td>N/A</td><td>{"authoritygroup":[<em>&lt;list_of_authority_group_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>authoritygroups/<em>&lt;encoded_group_name&gt;</em></td><td>GET</td><td>Get a specific authority group</td><td>N/A</td><td>{"authoritygroup":<em>&lt;authority_group_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>authoritygroups/<em>&lt;encoded_group_name&gt;</em></td><td>PUT</td><td>Save or create an authority group</td><td>{"authoritygroup":<em>&lt;authority_group_object&gt;</em>}</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>authoritygroups/<em>&lt;encoded_group_name&gt;</em></td><td>DELETE</td><td>Delete an authority group</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>outputconnections</td><td>GET</td><td>List all output connections</td><td>N/A</td><td>{"outputconnection":[<em>&lt;list_of_output_connection_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>outputconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Get a specific output connection</td><td>N/A</td><td>{"outputconnection":<em>&lt;output_connection_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>outputconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>PUT</td><td>Save or create an output connection</td><td>{"outputconnection":<em>&lt;output_connection_object&gt;</em>}</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>outputconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>DELETE</td><td>Delete an output connection</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>status/outputconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Check the status of an output connection</td><td>N/A</td><td>{"check_result":<em>&lt;message&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>info/outputconnections/<em>&lt;encoded_connection_name&gt;</em>/<em>&lt;connector_specific_resource&gt;</em></td><td>GET</td><td>Retrieve arbitrary connector-specific resource</td><td>N/A</td><td><em>&lt;response_data&gt;</em> <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} <strong>OR</strong> {"service_interruption":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>transformationconnections</td><td>GET</td><td>List all transformation connections</td><td>N/A</td><td>{"transformationconnection":[<em>&lt;list_of_transformation_connection_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>transformationconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Get a specific transformation connection</td><td>N/A</td><td>{"transformationconnection":<em>&lt;transformation_connection_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>transformationconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>PUT</td><td>Save or create a transformation connection</td><td>{"transformationconnection":<em>&lt;transformation_connection_object&gt;</em>}</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>transformationconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>DELETE</td><td>Delete a transformation connection</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>status/transformationconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Check the status of a transformation connection</td><td>N/A</td><td>{"check_result":<em>&lt;message&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>info/transformationconnections/<em>&lt;encoded_connection_name&gt;</em>/<em>&lt;connector_specific_resource&gt;</em></td><td>GET</td><td>Retrieve arbitrary connector-specific resource</td><td>N/A</td><td><em>&lt;response_data&gt;</em> <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} <strong>OR</strong> {"service_interruption":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>clearversions/<em>&lt;encoded_output_connection_name&gt;</em></td><td>PUT</td><td>Forget previous indexed document versions</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>clearrecords/<em>&lt;encoded_output_connection_name&gt;</em></td><td>PUT</td><td>Remove all previous indexing records</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>mappingconnections</td><td>GET</td><td>List all mapping connections</td><td>N/A</td><td>{"mappingconnection":[<em>&lt;list_of_mapping_connection_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>mappingconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Get a specific mapping connection</td><td>N/A</td><td>{"mappingconnection":<em>&lt;mapping_connection_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>mappingconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>PUT</td><td>Save or create a mapping connection</td><td>{"mappingconnection":<em>&lt;mapping_connection_object&gt;</em>}</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>mappingconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>DELETE</td><td>Delete a mapping connection</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>status/mappingconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Check the status of a mapping connection</td><td>N/A</td><td>{"check_result":<em>&lt;message&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>authorityconnections</td><td>GET</td><td>List all authority connections</td><td>N/A</td><td>{"authorityconnection":[<em>&lt;list_of_authority_connection_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>authorityconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Get a specific authority connection</td><td>N/A</td><td>{"authorityconnection":<em>&lt;authority_connection_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>authorityconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>PUT</td><td>Save or create an authority connection</td><td>{"authorityconnection":<em>&lt;authority_connection_object&gt;</em>}</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>authorityconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>DELETE</td><td>Delete an authority connection</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>status/authorityconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Check the status of an authority connection</td><td>N/A</td><td>{"check_result":<em>&lt;message&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>repositoryconnections</td><td>GET</td><td>List all repository connections</td><td>N/A</td><td>{"repositoryconnection":[<em>&lt;list_of_repository_connection_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>repositoryconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Get a specific repository connection</td><td>N/A</td><td>{"repositoryconnection":<em>&lt;repository_connection_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>repositoryconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>PUT</td><td>Save or create a repository connection</td><td>{"repositoryconnection":<em>&lt;repository_connection_object&gt;</em>}</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>repositoryconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>DELETE</td><td>Delete a repository connection</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>status/repositoryconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Check the status of a repository connection</td><td>N/A</td><td>{"check_result":<em>&lt;message&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>info/repositoryconnections/<em>&lt;encoded_connection_name&gt;</em>/<em>&lt;connector_specific_resource&gt;</em></td><td>GET</td><td>Retrieve arbitrary connector-specific resource</td><td>N/A</td><td><em>&lt;response_data&gt;</em> <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} <strong>OR</strong> {"service_interruption":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>clearhistory/<em>&lt;encoded_repository_connection_name&gt;</em></td><td>PUT</td><td>Clear history linked with repository connection</td><td>N/A</td><td><em>&lt;response_data&gt;</em> <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} <strong>OR</strong> {"service_interruption":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>jobs</td><td>GET</td><td>List all job definitions</td><td>N/A</td><td>{"job":[<em>&lt;list_of_job_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>jobs</td><td>POST</td><td>Create a job</td><td>{"job":<em>&lt;job_object&gt;</em>}</td><td>{"job_id":<em>&lt;job_identifier&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>jobs/<em>&lt;job_id&gt;</em></td><td>GET</td><td>Get a specific job definition</td><td>N/A</td><td>{"job":<em>&lt;job_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>jobs/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Save a job definition</td><td>{"job":<em>&lt;job_object&gt;</em>}</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>jobs/<em>&lt;job_id&gt;</em></td><td>DELETE</td><td>Delete a job definition</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>jobstatuses</td><td>GET</td><td>List all jobs and their status</td><td>maxcount=&lt;maximum_documents_to_count&gt;</td><td>{"jobstatus":[<em>&lt;list_of_job_status_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>jobstatuses/<em>&lt;job_id&gt;</em></td><td>GET</td><td>Get a specific job's status</td><td>maxcount=&lt;maximum_documents_to_count&gt;</td><td>{"jobstatus":<em>&lt;job_status_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} </td></tr>
+            <tr><td>jobstatusesnocounts</td><td>GET</td><td>List all jobs and their status, returning '0' for all counts</td><td>N/A</td><td>{"jobstatus":[<em>&lt;list_of_job_status_objects&gt;</em>]} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>jobstatusesnocounts/<em>&lt;job_id&gt;</em></td><td>GET</td><td>Get a specific job's status, returning '0' for all counts</td><td>N/A</td><td>{"jobstatus":<em>&lt;job_status_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} </td></tr>
+            <tr><td>start/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Start a specified job manually</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>startminimal/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Start a specified job manually, minimal run requested</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>abort/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Abort a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>restart/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Stop and start a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>restartminimal/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Stop and start a specified job, minimal run requested</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>pause/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Pause a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>resume/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Resume a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>reseed/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Reset incremental seeding for a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+
+            <tr><td>repositoryconnectionhistory/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Get a history report</td><td><em>&lt;history_query_parameters&gt;</em></td><td>{"row":[{"column":[{"name":<em>&lt;col_name&gt;</em>,"value":<em>&lt;col_value&gt;</em>}, ...]}, ...]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>repositoryconnectionqueue/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Get a queue report</td><td><em>&lt;queue_query_parameters&gt;</em></td><td>{"row":[{"column":[{"name":<em>&lt;col_name&gt;</em>,"value":<em>&lt;col_value&gt;</em>}, ...]}, ...]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>repositoryconnectionactivities/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Get a list of legal activities for a connection</td><td>N/A</td><td>{"activity":[<em>&lt;activity_name&gt;</em>, ...]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+            <tr><td>repositoryconnectionjobs/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Get a list of jobs for a connection</td><td>N/A</td><td>{"job":[<em>&lt;list_of_job_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+
+          </table>
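+          <p>As an illustrative sketch only, the Python below shows one way a client might form a request for a resource in the table above and interpret the JSON reply. The base URL and the helper names build_request and check are assumptions made for this example and are not defined by this document. No request is actually sent.</p>
+          <source>
```python
import json
from urllib.request import Request

# Assumed location of the JSON API service; adjust for the actual deployment.
BASE_URL = "http://localhost:8345/mcf-api-service/json"


def build_request(resource, method="GET", body=None, base_url=BASE_URL):
    """Prepare (but do not send) a request for one resource from the table."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = Request(base_url + "/" + resource, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def check(response_text):
    """Parse a reply, raising if it carries an error or service interruption."""
    parsed = json.loads(response_text)
    if isinstance(parsed, dict):
        for key in ("error", "service_interruption"):
            if key in parsed:
                raise RuntimeError(key + ": " + str(parsed[key]))
    return parsed
```
+          </source>
+          <p>For example, build_request("start/123", method="PUT") prepares the request that starts a hypothetical job 123, and check("{}") returns an empty object, which the table lists as the success reply for most commands.</p>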
+          <p></p>
+        </section>
+        <section>
+          <title>History query parameters</title>
+          <p></p>
+          <p>The history query parameters and their meanings are as follows:</p>
+          <table>
+            <tr><th>Parameter</th><th>Report type</th><th>Multivalued?</th><th>Meaning</th></tr>
+            <tr><td>report</td><td>All</td><td>No</td><td>The kind of history report desired; legal values are "simple", "maxactivity", "maxbandwidth", and "result"; defaults to "simple"</td></tr>
+            <tr><td>starttime</td><td>All</td><td>No</td><td>Starting time in ms since epoch; defaults to "0"</td></tr>
+            <tr><td>endtime</td><td>All</td><td>No</td><td>Ending time in ms since epoch; defaults to now</td></tr>
+            <tr><td>activity</td><td>All</td><td>Yes</td><td>Which activities you want to see</td></tr>
+            <tr><td>entitymatch</td><td>All</td><td>No</td><td>Regular expression matching entity identifier; defaults to ""</td></tr>
+            <tr><td>entitymatch_insensitive</td><td>All</td><td>No</td><td>Case insensitive version of entitymatch</td></tr>
+            <tr><td>resultcodematch</td><td>All</td><td>No</td><td>Regular expression matching result code; defaults to ""</td></tr>
+            <tr><td>resultcodematch_insensitive</td><td>All</td><td>No</td><td>Case insensitive version of resultcodematch</td></tr>
+            <tr><td>sortcolumn</td><td>All</td><td>Yes</td><td>Result column to sort the result by</td></tr>
+            <tr><td>sortcolumn_direction</td><td>All</td><td>Yes</td><td>Direction to sort the corresponding column ("ascending" or "descending")</td></tr>
+            <tr><td>startrow</td><td>All</td><td>No</td><td>Starting row in resultset to return; defaults to 0</td></tr>
+            <tr><td>rowcount</td><td>All</td><td>No</td><td>Maximum number of rows to return; defaults to 20</td></tr>
+            <tr><td>idbucket</td><td>maxactivity, maxbandwidth, result</td><td>No</td><td>Regular expression selecting which part of the entity identifier to use as an aggregation key; defaults to "()"</td></tr>
+            <tr><td>idbucket_insensitive</td><td>maxactivity, maxbandwidth, result</td><td>No</td><td>Case insensitive version of idbucket</td></tr>
+            <tr><td>resultcodebucket</td><td>result</td><td>No</td><td>Regular expression selecting which part of the result code to use as an aggregation key; defaults to "(.*)"</td></tr>
+            <tr><td>resultcodebucket_insensitive</td><td>result</td><td>No</td><td>Case insensitive version of resultcodebucket</td></tr>
+            <tr><td>interval</td><td>maxactivity, maxbandwidth</td><td>No</td><td>Size of window in milliseconds for assessing rate; defaults to 300000</td></tr>
+          </table>
+          <p></p>
+          <p>Each report type has different return columns, as listed below:</p>
+          <p></p>
+          <table>
+            <tr><th>Report type</th><th>Return columns</th></tr>
+            <tr><td>simple</td><td>starttime, resultcode, resultdesc, identifier, activity, bytes, elapsedtime</td></tr>
+            <tr><td>maxactivity</td><td>starttime, endtime, activitycount, idbucket</td></tr>
+            <tr><td>maxbandwidth</td><td>starttime, endtime, bytecount, idbucket</td></tr>
+            <tr><td>result</td><td>idbucket, resultcodebucket, eventcount</td></tr>
+          </table>
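+          <p>The sketch below, in Python, shows one possible way to assemble history query parameters and to flatten the {"row":[{"column":[...]}]} reply into one dictionary per row. It assumes that a multivalued parameter such as "activity" is passed by repeating it, and that the reply uses arrays exactly as documented above. The function names are hypothetical.</p>
+          <source>
```python
from urllib.parse import urlencode


def history_query(report="simple", activities=(), **params):
    """Build the query string for a repositoryconnectionhistory request."""
    pairs = [("report", report)]
    # Assumption: a multivalued parameter is repeated once per value.
    pairs.extend(("activity", name) for name in activities)
    # Remaining single-valued parameters, sorted for a stable result.
    pairs.extend(sorted(params.items()))
    return urlencode(pairs)


def rows_to_dicts(response):
    """Turn each row's list of name/value columns into a plain dict."""
    return [
        {column["name"]: column["value"] for column in row["column"]}
        for row in response.get("row", [])
    ]
```
+          </source>
+          <p>A "result" report processed this way would yield dictionaries keyed by idbucket, resultcodebucket, and eventcount, matching the return columns listed above.</p>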
+        </section>
+        <section>
+          <title>Queue query parameters</title>
+          <p></p>
+          <p>The queue query parameters and their meanings are as follows:</p>
+          <table>
+            <tr><th>Parameter</th><th>Report type</th><th>Multivalued?</th><th>Meaning</th></tr>
+            <tr><td>report</td><td>All</td><td>No</td><td>The kind of queue report desired; legal values are "document" or "status"; defaults to "document"</td></tr>
+            <tr><td>now</td><td>All</td><td>No</td><td>The time in milliseconds since epoch to perform the queue assessment relative to; defaults to current time</td></tr>
+            <tr><td>idmatch</td><td>All</td><td>No</td><td>Regular expression matching document identifier; defaults to ""</td></tr>
+            <tr><td>idmatch_insensitive</td><td>All</td><td>No</td><td>Case insensitive version of idmatch</td></tr>
+            <tr><td>statematch</td><td>All</td><td>Yes</td><td>State to match; valid values are "neverprocessed", "previouslyprocessed", "outofscope"</td></tr>
+            <tr><td>statusmatch</td><td>All</td><td>Yes</td><td>Status to match; valid values are "inactive", "processing", "expiring", "deleting", "readyforprocessing", "readyforexpiration", "waitingforprocessing", "waitingforexpiration", "waitingforever", and "hopcountexceeded"</td></tr>
+            <tr><td>sortcolumn</td><td>All</td><td>Yes</td><td>Result column to sort the result by</td></tr>
+            <tr><td>sortcolumn_direction</td><td>All</td><td>Yes</td><td>Direction to sort the corresponding column ("ascending" or "descending")</td></tr>
+            <tr><td>startrow</td><td>All</td><td>No</td><td>Starting row in resultset to return; defaults to 0</td></tr>
+            <tr><td>rowcount</td><td>All</td><td>No</td><td>Maximum number of rows to return; defaults to 20</td></tr>
+            <tr><td>idbucket</td><td>status</td><td>No</td><td>Regular expression selecting which part of the document identifier to use as an aggregation key; defaults to "()"</td></tr>
+            <tr><td>idbucket_insensitive</td><td>status</td><td>No</td><td>Case insensitive version of idbucket</td></tr>
+          </table>
+          <p></p>
+          <p>Each report type has different return columns, as listed below:</p>
+          <p></p>
+          <table>
+            <tr><th>Report type</th><th>Return columns</th></tr>
+            <tr><td>document</td><td>identifier, job, state, status, scheduled, action, retrycount, retrylimit</td></tr>
+            <tr><td>status</td><td>idbucket, inactive, processing, expiring, deleting, processready, expireready, processwaiting, expirewaiting, waitingforever, hopcountexceeded</td></tr>
+          </table>
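+          <p>Because "statematch" and "statusmatch" accept only the fixed values listed above, a client may want to validate them before issuing a queue report request. The following Python is a minimal sketch of that idea; the function name is hypothetical, and repeating a multivalued parameter once per value is an assumption.</p>
+          <source>
```python
from urllib.parse import urlencode

# Legal values copied from the queue query parameter table.
STATES = {"neverprocessed", "previouslyprocessed", "outofscope"}
STATUSES = {
    "inactive", "processing", "expiring", "deleting",
    "readyforprocessing", "readyforexpiration",
    "waitingforprocessing", "waitingforexpiration",
    "waitingforever", "hopcountexceeded",
}


def queue_query(report="document", states=(), statuses=(), **params):
    """Build a queue report query string, rejecting undocumented values."""
    if report not in ("document", "status"):
        raise ValueError("unknown report type: " + report)
    bad = [s for s in states if s not in STATES]
    bad += [s for s in statuses if s not in STATUSES]
    if bad:
        raise ValueError("unknown state or status: " + ", ".join(bad))
    pairs = [("report", report)]
    # Assumption: multivalued parameters are repeated once per value.
    pairs += [("statematch", s) for s in states]
    pairs += [("statusmatch", s) for s in statuses]
    pairs += sorted(params.items())
    return urlencode(pairs)
```
+          </source>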
+        </section>
+        <section>
+          <title>Authorization domain objects</title>
+          <p></p>
+          <p>The JSON fields an authorization domain object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"description"</td><td>The optional description of the authorization domain</td></tr>
+            <tr><td>"domain_name"</td><td>The internal name of the authorization domain, i.e. what is sent to the Authority Service</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Output connector objects</title>
+          <p></p>
+          <p>The JSON fields an output connector object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"description"</td><td>The optional description of the connector</td></tr>
+            <tr><td>"class_name"</td><td>The class name of the class implementing the connector</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Transformation connector objects</title>
+          <p></p>
+          <p>The JSON fields a transformation connector object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"description"</td><td>The optional description of the connector</td></tr>
+            <tr><td>"class_name"</td><td>The class name of the class implementing the connector</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Mapping connector objects</title>
+          <p></p>
+          <p>The JSON fields a mapping connector object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"description"</td><td>The optional description of the connector</td></tr>
+            <tr><td>"class_name"</td><td>The class name of the class implementing the connector</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Authority connector objects</title>
+          <p></p>
+          <p>The JSON fields an authority connector object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"description"</td><td>The optional description of the connector</td></tr>
+            <tr><td>"class_name"</td><td>The class name of the class implementing the connector</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Repository connector objects</title>
+          <p></p>
+          <p>The JSON fields a repository connector object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"description"</td><td>The optional description of the connector</td></tr>
+            <tr><td>"class_name"</td><td>The class name of the class implementing the connector</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Authority group objects</title>
+          <p></p>
+          <p>Authority group names, when they are part of a URL, should be encoded as follows:</p>
+          <p></p>
+          <ol>
+            <li>All instances of '.' should be replaced by '..'.</li>
+            <li>All instances of '/' should be replaced by '.+'.</li>
+            <li>The resulting name should be encoded using standard UTF-8-based URL %-encoding.</li>
+          </ol>
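+          <p>The three steps above can be sketched in Python as follows. The function name encode_name is hypothetical, and the sketch assumes that the final step is ordinary percent-encoding of a single path segment. The same steps are listed below for connection names, so the same helper would apply to them.</p>
+          <source>
```python
from urllib.parse import quote


def encode_name(name: str) -> str:
    """Encode a group or connection name for use inside an API URL."""
    # Step 1: double every '.', which frees a single '.' to act as an escape.
    escaped = name.replace(".", "..")
    # Step 2: replace every '/' with '.+'. This must follow step 1,
    # otherwise the '.' introduced here would itself be doubled.
    escaped = escaped.replace("/", ".+")
    # Step 3: UTF-8 percent-encoding, with no reserved characters left as-is.
    return quote(escaped, safe="")
```
+          </source>
+          <p>With this sketch, the name "a.b/c" becomes "a..b.%2Bc", because the '+' introduced in step 2 is itself percent-encoded in step 3.</p>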
+          <p></p>
+          <p>The JSON fields an authority group object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"name"</td><td>The unique name of the group</td></tr>
+            <tr><td>"description"</td><td>The description of the group</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Output connection objects</title>
+          <p></p>
+          <p>Output connection names, when they are part of a URL, should be encoded as follows:</p>
+          <p></p>
+          <ol>
+            <li>All instances of '.' should be replaced by '..'.</li>
+            <li>All instances of '/' should be replaced by '.+'.</li>
+            <li>The resulting name should be encoded using standard UTF-8-based URL %-encoding.</li>
+          </ol>
+          <p></p>
+          <p>The JSON fields an output connection object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"name"</td><td>The unique name of the connection</td></tr>
+            <tr><td>"description"</td><td>The description of the connection</td></tr>
+            <tr><td>"class_name"</td><td>The Java class name of the class implementing the connection</td></tr>
+            <tr><td>"max_connections"</td><td>The total number of outstanding connections allowed to exist at a time</td></tr>
+            <tr><td>"configuration"</td><td>The configuration object for the connection, which is specific to the connection class</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Transformation connection objects</title>
+          <p></p>
+          <p>Transformation connection names, when they are part of a URL, should be encoded as follows:</p>
+          <p></p>
+          <ol>
+            <li>All instances of '.' should be replaced by '..'.</li>
+            <li>All instances of '/' should be replaced by '.+'.</li>
+            <li>The resulting name should be encoded using standard UTF-8-based URL %-encoding.</li>
+          </ol>
+          <p></p>
+          <p>The JSON fields a transformation connection object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"name"</td><td>The unique name of the connection</td></tr>
+            <tr><td>"description"</td><td>The description of the connection</td></tr>
+            <tr><td>"class_name"</td><td>The Java class name of the class implementing the connection</td></tr>
+            <tr><td>"max_connections"</td><td>The total number of outstanding connections allowed to exist at a time</td></tr>
+            <tr><td>"configuration"</td><td>The configuration object for the connection, which is specific to the connection class</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Mapping connection objects</title>
+          <p></p>
+          <p>Mapping connection names, when they are part of a URL, should be encoded as follows:</p>
+          <p></p>
+          <ol>
+            <li>All instances of '.' should be replaced by '..'.</li>
+            <li>All instances of '/' should be replaced by '.+'.</li>
+            <li>The resulting name should be encoded using standard UTF-8-based URL %-encoding.</li>
+          </ol>
+          <p></p>
+          <p>The JSON fields for a mapping connection object are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"name"</td><td>The unique name of the connection</td></tr>
+            <tr><td>"description"</td><td>The description of the connection</td></tr>
+            <tr><td>"class_name"</td><td>The Java class name of the class implementing the connection</td></tr>
+            <tr><td>"max_connections"</td><td>The total number of outstanding connections allowed to exist at a time</td></tr>
+            <tr><td>"configuration"</td><td>The configuration object for the connection, which is specific to the connection class</td></tr>
+            <tr><td>"prerequisite"</td><td>The mapping connection prerequisite, if any</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Authority connection objects</title>
+          <p></p>
+          <p>Authority connection names, when they are part of a URL, should be encoded as follows:</p>
+          <p></p>
+          <ol>
+            <li>All instances of '.' should be replaced by '..'.</li>
+            <li>All instances of '/' should be replaced by '.+'.</li>
+            <li>The resulting name should be encoded using standard UTF-8-based URL %-encoding.</li>
+          </ol>
+          <p></p>
+          <p>The JSON fields for an authority connection object are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"name"</td><td>The unique name of the connection</td></tr>
+            <tr><td>"description"</td><td>The description of the connection</td></tr>
+            <tr><td>"class_name"</td><td>The Java class name of the class implementing the connection</td></tr>
+            <tr><td>"max_connections"</td><td>The total number of outstanding connections allowed to exist at a time</td></tr>
+            <tr><td>"configuration"</td><td>The configuration object for the connection, which is specific to the connection class</td></tr>
+            <tr><td>"prerequisite"</td><td>The mapping connection prerequisite, if any</td></tr>
+            <tr><td>"authdomain"</td><td>The authorization domain for the authority connection, if any</td></tr>
+            <tr><td>"authgroup"</td><td>The required authority group for the authority connection</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Repository connection objects</title>
+          <p></p>
+          <p>Repository connection names, when they are part of a URL, should be encoded as follows:</p>
+          <p></p>
+          <ol>
+            <li>All instances of '.' should be replaced by '..'.</li>
+            <li>All instances of '/' should be replaced by '.+'.</li>
+            <li>The resulting name should be encoded using standard UTF-8-based URL %-encoding.</li>
+          </ol>
+          <p></p>
+          <p>The JSON fields for a repository connection object are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"name"</td><td>The unique name of the connection</td></tr>
+            <tr><td>"description"</td><td>The description of the connection</td></tr>
+            <tr><td>"class_name"</td><td>The Java class name of the class implementing the connection</td></tr>
+            <tr><td>"max_connections"</td><td>The total number of outstanding connections allowed to exist at a time</td></tr>
+            <tr><td>"configuration"</td><td>The configuration object for the connection, which is specific to the connection class</td></tr>
+            <tr><td>"acl_authority"</td><td>The (optional) name of the authority group that will enforce security for this connection</td></tr>
+            <tr><td>"throttle"</td><td>An array of throttle objects, which control how quickly documents can be requested from this connection</td></tr>
+          </table>
+          <p></p>
+          <p>Each throttle object has the following fields:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"match"</td><td>The regular expression which is used to match a document's bins to determine if the throttle should be applied</td></tr>
+            <tr><td>"match_description"</td><td>Optional text describing the meaning of the throttle</td></tr>
+            <tr><td>"rate"</td><td>The maximum fetch rate to use if the throttle applies, in fetches per minute</td></tr>
+          </table>
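+          <p>To make the shape of these objects concrete, the Python sketch below assembles a request body for saving a repository connection with throttles. The helper names are hypothetical, the "configuration" object is left empty because its format is specific to the connection class, and whether numeric fields should be sent as numbers or as strings is not stated here, so values are passed through unchanged.</p>
+          <source>
```python
import json


def throttle(match, rate, description=None):
    """Build one throttle object; rate is in fetches per minute."""
    obj = {"match": match, "rate": rate}
    if description is not None:
        obj["match_description"] = description
    return obj


def repository_connection_body(name, class_name, max_connections,
                               throttles=(), acl_authority=None,
                               description=""):
    """Return the JSON text for a PUT to repositoryconnections/NAME."""
    connection = {
        "name": name,
        "description": description,
        "class_name": class_name,
        "max_connections": max_connections,
        # Connector-specific; left empty in this sketch.
        "configuration": {},
        "throttle": list(throttles),
    }
    if acl_authority is not None:
        # Optional authority group that enforces security.
        connection["acl_authority"] = acl_authority
    return json.dumps({"repositoryconnection": connection})
```
+          </source>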
+          <p></p>
+        </section>
+        <section>
+          <title>Job objects</title>
+          <p></p>
+          <p>The JSON fields for a job are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"id"</td><td>The job's identifier, if present.  If not present, ManifoldCF will create one (and will also create the job when saved).</td></tr>
+            <tr><td>"description"</td><td>Text describing the job</td></tr>
+            <tr><td>"repository_connection"</td><td>The name of the repository connection to use with the job</td></tr>
+            <tr><td>"output_connection"</td><td>The name of the output connection to use with the job</td></tr>
+            <tr><td>"document_specification"</td><td>The document specification object for the job, whose format is repository-connection specific</td></tr>
+            <tr><td>"output_specification"</td><td>The output specification object for the job, whose format is output-connection specific</td></tr>
+            <tr><td>"start_mode"</td><td>The start mode for the job, which can be one of "schedule window start", "schedule window anytime", or "manual"</td></tr>
+            <tr><td>"run_mode"</td><td>The run mode for the job, which can be either "continuous" or "scan once"</td></tr>
+            <tr><td>"hopcount_mode"</td><td>The hopcount mode for the job, which can be one of "accurate", "no delete", or "never delete"</td></tr>
+            <tr><td>"priority"</td><td>The job's priority, typically "5"</td></tr>
+            <tr><td>"recrawl_interval"</td><td>The default time between recrawl of documents (if the job is "continuous"), in milliseconds, or "infinite" for infinity</td></tr>
+            <tr><td>"expiration_interval"</td><td>The time until a document expires (if the job is "continuous"), in milliseconds, or "infinite" for infinity</td></tr>
+            <tr><td>"reseed_interval"</td><td>The time between reseeding operations (if the job is "continuous"), in milliseconds, or "infinite" for infinity</td></tr>
+            <tr><td>"hopcount"</td><td>An array of hopcount objects, describing the link types and associated maximum hops permitted for the job</td></tr>
+            <tr><td>"schedule"</td><td>An array of schedule objects, describing when the job should be started and run</td></tr>
+            <tr><td>"forcedparam"</td><td>An array of forcedparam objects, describing what forced parameters should be set</td></tr>
+            <tr><td>"pipelinestage"</td><td>An array of pipelinestage objects, describing what the transformation pipeline is</td></tr>
+          </table>
+          <p></p>
+          <p>Each pipelinestage object has the following fields:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"stage_connectionname"</td><td>The connection name for the pipeline stage</td></tr>
+            <tr><td>"stage_description"</td><td>A description of the pipeline stage</td></tr>
+            <tr><td>"stage_specification"</td><td>The transformation specification for the pipeline stage</td></tr>
+          </table>
+          <p></p>
+          <p>Each forcedparam object has the following fields:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"paramname"</td><td>The name of the parameter</td></tr>
+            <tr><td>"paramvalue"</td><td>The value of the parameter</td></tr>
+          </table>
+          <p></p>
+          <p>Each hopcount object has the following fields:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"link_type"</td><td>The connection-type-dependent type of a link for which a hop count restriction is specified</td></tr>
+            <tr><td>"count"</td><td>The maximum number of hops allowed for the associated link type, starting at a seed</td></tr>
+          </table>
+          <p></p>
+          <p>Each schedule object has the following fields:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"timezone"</td><td>The optional time zone for the schedule object; if not present the default server time zone is used</td></tr>
+            <tr><td>"duration"</td><td>The optional length of the described time window, in milliseconds; if not present, duration is considered infinite</td></tr>
+            <tr><td>"dayofweek"</td><td>The optional day-of-the-week enumeration object</td></tr>
+            <tr><td>"monthofyear"</td><td>The optional month-of-the-year enumeration object</td></tr>
+            <tr><td>"dayofmonth"</td><td>The optional day-of-the-month enumeration object</td></tr>
+            <tr><td>"year"</td><td>The optional year enumeration object</td></tr>
+            <tr><td>"hourofday"</td><td>The optional hour-of-the-day enumeration object</td></tr>
+            <tr><td>"minutesofhour"</td><td>The optional minutes-of-the-hour enumeration object</td></tr>
+            <tr><td>"requestminimum"</td><td>Optional flag indicating whether the job run will be minimal or not ("true" means minimal)</td></tr>
+          </table>
+          <p></p>
+          <p>Each enumeration object describes an array of integers using the form:</p>
+          <p></p>
+          <p>{"value":[<em>&lt;integer_list&gt;</em>]}</p>
+          <p></p>
+          <p>Each integer is a zero-based index describing which entity is being specified.  For example, for "dayofweek", 0 corresponds to Sunday, etc., and thus "dayofweek":{"value":[0,6]} would describe Saturdays and Sundays.</p>
+          <p></p>
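+          <p>The fields above can be assembled programmatically.  The following is a minimal Python sketch, not a definitive client: the connection names, the description, and the choice to encode "duration" as a number are assumptions made for illustration, and the connection-specific "document_specification" is omitted because its format depends on the repository connector.</p>
+          <source><![CDATA[
```python
import json


def enumeration(values):
    """Wrap zero-based integer indexes in the {"value": [...]} enumeration form."""
    return {"value": sorted(set(values))}


def weekend_schedule(start_hour, duration_ms):
    """Build one schedule object: Saturdays and Sundays, starting on the hour."""
    return {
        "dayofweek": enumeration([0, 6]),       # 0 = Sunday, 6 = Saturday
        "hourofday": enumeration([start_hour]),
        "minutesofhour": enumeration([0]),
        "duration": duration_ms,                # numeric encoding is an assumption
    }


# Hypothetical job; "id" is left out so that ManifoldCF would assign one.
job = {
    "description": "Weekend crawl (hypothetical)",
    "repository_connection": "my_repository",   # hypothetical connection name
    "output_connection": "my_output",           # hypothetical connection name
    "start_mode": "schedule window start",
    "run_mode": "scan once",
    "hopcount_mode": "accurate",
    "priority": "5",
    "schedule": [weekend_schedule(1, 2 * 60 * 60 * 1000)],
}

print(json.dumps(job, indent=2))
```
+]]></source>
+          <p>Building the enumeration objects through one helper keeps every schedule field in the same {"value":[...]} form described above.</p>
+          <p></p>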
+        </section>
+        <section>
+          <title>Job status objects</title>
+          <p></p>
+          <p>The JSON fields of a job status object are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"job_id"</td><td>The job identifier</td></tr>
+            <tr><td>"status"</td><td>The job status, having the possible values: "not yet run", "running", "paused", "done", "waiting", "stopping", "resuming", "starting up", "cleaning up", "error", "aborting", "restarting", "running no connector", and "terminating"</td></tr>
+            <tr><td>"error_text"</td><td>The error text, if the status is "error"</td></tr>
+            <tr><td>"start_time"</td><td>The job start time, in milliseconds since Jan 1, 1970</td></tr>
+            <tr><td>"end_time"</td><td>The job end time, in milliseconds since Jan 1, 1970</td></tr>
+            <tr><td>"documents_in_queue"</td><td>The total number of documents in the queue for the job</td></tr>
+            <tr><td>"documents_outstanding"</td><td>The number of documents for the job that are currently considered 'active'</td></tr>
+            <tr><td>"documents_processed"</td><td>The number of documents in the queue for the job that have been processed at least once</td></tr>
+          </table>
+          <p></p>
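+          <p>As an illustration of how a client might consume a job status object, the following Python sketch validates the "status" value against the list above and converts the millisecond timestamps to UTC dates.  The helper names are hypothetical, and accepting the timestamps as either strings or numbers is an assumption about how they arrive.</p>
+          <source><![CDATA[
```python
from datetime import datetime, timezone

# The status values listed in the table above.
KNOWN_STATUSES = {
    "not yet run", "running", "paused", "done", "waiting", "stopping",
    "resuming", "starting up", "cleaning up", "error", "aborting",
    "restarting", "running no connector", "terminating",
}


def to_utc(millis):
    """Convert milliseconds since Jan 1, 1970 to an aware UTC datetime."""
    if millis is None:
        return None
    return datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)


def summarize(status):
    """Return a small summary dict for one job status object."""
    state = status["status"]
    if state not in KNOWN_STATUSES:
        raise ValueError("unexpected job status: %r" % state)
    summary = {
        "job_id": status["job_id"],
        "status": state,
        "started": to_utc(status.get("start_time")),
        "ended": to_utc(status.get("end_time")),
    }
    if state == "error":
        # error_text is only meaningful when the status is "error"
        summary["error_text"] = status.get("error_text")
    return summary
```
+]]></source>
+          <p>Rejecting unknown status values early makes it obvious when a newer server reports a state that the client does not yet handle.</p>
+          <p></p>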
+        </section>
+        <section>
+          <title>Connection-type-specific objects</title>
+          <p></p>
+          <p>As you may note when trying to use the above JSON API methods, you cannot get very far in defining connections or jobs without knowing the JSON format of a connection's configuration information, or a job's connection-specific document specification and output specification information.  The form of these objects is controlled by the Java implementation of the underlying connector, and is translated directly into JSON, so if you write your own connector you should be able to figure out what it will be in the API.  For connectors already part of ManifoldCF, it remains an ongoing task to document these connector-specific objects.  This task is not yet underway.</p>
+          <p></p>
+          <p>Luckily, it is pretty easy to learn a lot about the objects in question by simply creating connections and jobs in the ManifoldCF crawler UI, and then inspecting the resulting JSON objects through the API.  In this way, it should be possible to do a decent job of coding most API-based integrations.  The one place where difficulties will certainly occur is if you try to completely replace the ManifoldCF crawler UI with one of your own.  This is because most connectors have methods that communicate with their respective back-ends in order to allow the user to select appropriate values.  For example, the path drill-down that is presented by the LiveLink connector requires that the connector interrogate the appropriate LiveLink repository in order to populate its path selection pull-downs.  There is, at this time, only one sanctioned way to accomplish the same job using the API, which is to use the appropriate "<em>connection_type</em>/execute/<em>type-specific_command</em>" command to perform the necessary functions.  Some set of useful functions has been coded for every appropriate connector, but the exact commands for every connector, and their JSON syntax, remain undocumented for now.</p>
+          <p></p>
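+          <p>The inspection approach described above can be scripted.  The following Python sketch fetches a resource and pretty-prints it so that the connector-specific parts are easy to read.  The base URL, the resource names in the usage note, and the use of plain URL quoting for connection names are all assumptions; adjust them to match your deployment.</p>
+          <source><![CDATA[
```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed location of the JSON API service; not taken from this section.
API_BASE = "http://localhost:8345/mcf-api-service/json"


def resource_url(*segments, base=API_BASE):
    """Join path segments onto the API base, URL-quoting each one (assumed encoding)."""
    return base.rstrip("/") + "/" + "/".join(quote(s, safe="") for s in segments)


def fetch(*segments, base=API_BASE):
    """GET a resource and decode the JSON response body."""
    with urlopen(resource_url(*segments, base=base)) as response:
        return json.loads(response.read().decode("utf-8"))


def show(obj):
    """Pretty-print a decoded JSON object for inspection."""
    return json.dumps(obj, indent=2, sort_keys=True)
```
+]]></source>
+          <p>With a server running, something like print(show(fetch("jobs"))) would display the job objects created in the crawler UI, assuming a "jobs" resource is exposed at that location.</p>
+          <p></p>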
+        </section>
+        <section>
+          <title>File system connector</title>
+          <p></p>
+          <p>The file system connector has no configuration information, and no connector-specific commands.  However, it does have document specification information.  The information looks something like this:</p>
+          <p></p>
+          <p>{"startpoint":[{"_attribute_path":"c:\\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc"},{"_attribute_type":"directory","_attribute_match":"*"}],"exclude":["*.mov"]}]}</p>
+          <p></p>
+          <p>As you can see, multiple starting paths are possible, and each starting path can have one or more inclusion and exclusion rules.</p>
+          <p></p>
+          <p></p>
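+          <p>A document specification of this shape can be generated rather than written by hand, which avoids bracket and escaping mistakes.  The following Python sketch is one way to do so; the helper names are hypothetical, and the exclude list simply mirrors the form shown in the example above.</p>
+          <source><![CDATA[
```python
import json


def rule(kind, pattern):
    """One inclusion rule: kind is "file" or "directory", pattern is a wildcard."""
    return {"_attribute_type": kind, "_attribute_match": pattern}


def startpoint(path, includes, excludes):
    """One starting path with its inclusion rules and exclusion patterns."""
    return {
        "_attribute_path": path,
        "include": [rule(kind, pattern) for kind, pattern in includes],
        "exclude": list(excludes),  # same form as in the example above
    }


spec = {
    "startpoint": [
        startpoint(
            "c:\\path_to_files",  # a single backslash, written as a Python escape
            [("file", "*.txt"), ("file", "*.doc"), ("directory", "*")],
            ["*.mov"],
        )
    ]
}

# json.dumps escapes the backslash in the Windows path automatically.
text = json.dumps(spec, separators=(",", ":"))
print(text)
```
+]]></source>
+          <p>Additional starting paths are added by appending further startpoint(...) entries to the "startpoint" list.</p>
+          <p></p>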
+        </section>
+      </section>
+      <section>
+        <title>Control via Commands</title>
+        <p></p>
+        <p>For script writers, there currently exist a number of ManifoldCF execution commands.  These commands mainly cover defining connections and jobs, controlling jobs, and running reports.  The following table lists the current suite.</p>
+        <p></p>
+        <table>
+          <tr><th>Command</th><th>What it does</th></tr>
+          <tr><td>org.apache.manifoldcf.agents.DefineOutputConnection</td><td>Create a new output connection</td></tr>
+          <tr><td>org.apache.manifoldcf.agents.DeleteOutputConnection</td><td>Delete an existing output connection</td></tr>
+          <tr><td>org.apache.manifoldcf.agents.DefineTransformationConnection</td><td>Create a new transformation connection</td></tr>
+          <tr><td>org.apache.manifoldcf.agents.DeleteTransformationConnection</td><td>Delete an existing transformation connection</td></tr>
+          <tr><td>org.apache.manifoldcf.authorities.ChangeAuthSpec</td><td>Modify an authority's configuration information</td></tr>
+          <tr><td>org.apache.manifoldcf.authorities.CheckAll</td><td>Check all authorities to be sure they are functioning</td></tr>
+          <tr><td>org.apache.manifoldcf.authorities.DefineAuthorityConnection</td><td>Create a new authority connection</td></tr>
+          <tr><td>org.apache.manifoldcf.authorities.DeleteAuthorityConnection</td><td>Delete an existing authority connection</td></tr>
+          <tr><td>org.apache.manifoldcf.authorities.DefineMappingConnection</td><td>Create a new mapping connection</td></tr>
+          <tr><td>org.apache.manifoldcf.authorities.DeleteMappingConnection</td><td>Delete an existing mapping connection</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.AbortJob</td><td>Abort a running job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.AddScheduledTime</td><td>Add a schedule record to a job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.ChangeJobDocSpec</td><td>Modify a job's specification information</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.DefineJob</td><td>Create a new job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.DefineRepositoryConnection</td><td>Create a new repository connection</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.DeleteJob</td><td>Delete an existing job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.DeleteRepositoryConnection</td><td>Delete an existing repository connection</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.ExportConfiguration</td><td>Write the complete list of all connection definitions and job specifications to a file</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.FindJob</td><td>Locate a job identifier given a job's name</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.GetJobSchedule</td><td>Find a job's schedule given a job's identifier</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.ImportConfiguration</td><td>Import configuration as written by a previous ExportConfiguration command</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.ListJobStatuses</td><td>List the status of all jobs</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.ListJobs</td><td>List the identifiers for all jobs</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.PauseJob</td><td>Given a job identifier, pause the specified job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RestartJob</td><td>Given a job identifier, restart the specified job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RunDocumentStatus</td><td>Run a document status report</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RunMaxActivityHistory</td><td>Run a maximum activity report</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RunMaxBandwidthHistory</td><td>Run a maximum bandwidth report</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RunQueueStatus</td><td>Run a queue status report</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RunResultHistory</td><td>Run a result history report</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RunSimpleHistory</td><td>Run a simple history report</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.StartJob</td><td>Start a job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.WaitForJobDeleted</td><td>After a job has been deleted, wait until the delete has completed</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.WaitForJobInactive</td><td>After a job has been started or aborted, wait until the job ceases all activity</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.WaitJobPaused</td><td>After a job has been paused, wait for the pause to take effect</td></tr>
+        </table>
+        <p></p>
+      </section>
+      <section>
+        <title>Control by direct code</title>
+        <p></p>
+        <p>Control by direct Java code is quite a reasonable thing to do.  The sources of the above commands should give a pretty clear idea of how to proceed, if that's what you
+          want to do.</p>
+        <p></p>
+        <p></p>
+      </section>
+      <section>
+        <title>Caveats</title>
+        <p></p>
+        <p>The above commands know nothing about the differences between connection types.  Instead, they deal with configuration and specification information in the
+          form of XML documents.  Normally, these XML documents are hidden from a system integrator, unless they happen to look into the database with a tool such as
+          psql.  But the API commands above often will require such XML documents to be included as part of the command execution.</p>
+        <p></p>
+        <p>This has one major consequence.  Any application that would manipulate connections and jobs directly cannot be connection-type independent - these
+          applications must know the proper form of XML to submit to the command.  So, it is not possible to use these command APIs to write one's own UI wrapper,
+          without sacrificing some of the repository independence that ManifoldCF by itself maintains.</p>
+      </section>
+    </section>
+  </body>
+
+</document>