Posted to commits@manifoldcf.apache.org by kw...@apache.org on 2013/03/29 20:21:00 UTC

svn commit: r1462614 - in /manifoldcf/trunk: ./ site/src/documentation/content/xdocs/en_US/ site/src/documentation/resources/images/en_US/

Author: kwright
Date: Fri Mar 29 19:21:00 2013
New Revision: 1462614

URL: http://svn.apache.org/r1462614
Log:
Update Solr connector documentation.  Part of CONNECTORS-665.

Added:
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-solr-type.PNG   (with props)
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-zookeeper.PNG   (with props)
Modified:
    manifoldcf/trunk/CHANGES.txt
    manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-schema.PNG
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-server.PNG

Modified: manifoldcf/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/CHANGES.txt?rev=1462614&r1=1462613&r2=1462614&view=diff
==============================================================================
--- manifoldcf/trunk/CHANGES.txt (original)
+++ manifoldcf/trunk/CHANGES.txt Fri Mar 29 19:21:00 2013
@@ -3,6 +3,9 @@ $Id$
 
 ======================= 1.2-dev =====================
 
+CONNECTORS-665: Update Solr connector type end-user documentation.
+(Karl Wright)
+
 CONNECTORS-667: Fix livelink authority caching to work properly with
 new SSL connection support.
 (David Morana, Karl Wright)

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml?rev=1462614&r1=1462613&r2=1462614&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml Fri Mar 29 19:21:00 2013
@@ -435,41 +435,60 @@
             
             <section id="solroutputconnector">
                 <title>Solr Output Connection</title>
-                <p>The Solr output connection type is designed to allow ManifoldCF to submit documents to an appropriate Solr pipeline, via the Solr
-                       HTTP ingestion API.  The configuration parameters are set to the default Solr values, which can be changed (since Solr's configuration can be changed).
-                       The Solr output connection type furthermore makes no judgment as to whether a given document is indexable or not - it accepts everything, and passes all documents
-                       on to the pipeline, where presumably the configured pipeline will decide if a document should be rejected or not.  (All of that happens without a Solr connection
-                       being aware of it in any way.)</p>
-                <p>Unfortunately, this lack of specificity comes at a cost.  Unless you take care to filter documents properly in each job, large movie files or other opaque
-                       content may well be picked up and sent to Solr for indexing, which will greatly increase the dead load on the overall system.  It is therefore a good idea to review
-                       all crawls done through a Solr connection while they are underway, to be sure there isn't a misconfiguration of this kind.</p>
-                <p>When you create a Solr output connection, five configuration tabs appear.  The "Server" tab allows you to configure the HTTP target of the connection:</p>
+                <p>The Solr output connection type is designed to allow ManifoldCF to submit documents to either an appropriate Apache Solr instance,
+                       via the Solr HTTP API, or alternatively to a SolrCloud cluster.  The configuration parameters are initially set to appropriate default
+                       values for a standalone Solr instance.</p>
+                <p>When you create a Solr output connection, multiple configuration tabs appear.  The first tab is the "Solr type" tab.  Here you select
+                       whether you want your connection to communicate with a standalone Solr instance or with a SolrCloud cluster:</p>
+                <br/><br/>
+                <figure src="images/en_US/solr-configure-solr-type.PNG" alt="Solr Configuration, Solr type tab" width="80%"/>
+                <br/><br/>
+                <p>Select which kind of Solr installation you want to communicate with.  Based on your selection, you can proceed to either the "Server"
+                       tab (if a standalone instance) or to the "ZooKeeper" tab (if a SolrCloud cluster).</p>
+                <p>The "Server" tab allows you to configure the HTTP parameters appropriate for communicating with a standalone Solr instance:</p>
                 <br/><br/>
                 <figure src="images/en_US/solr-configure-server.PNG" alt="Solr Configuration, Server tab" width="80%"/>
                 <br/><br/>
-                <p>Fill in the fields according to your Solr configuration.  The Solr connection type supports only basic authentication at this time; if you have this enabled, supply the credentials
-                       as requested on the bottom part of the form.</p>
-                <p>The second tab is the "Schema" tab, which allows you to specify the name of the Solr field to use as a document identifier.  The Solr connection type will treat
-                       this field as being a unique key for locating the indexed document for further modification or deletion:</p>
+                <p>If your Solr setup is a standalone instance, fill in the fields according to your Solr configuration.  The Solr connection type supports
+                       only basic authentication at this time; if you have this enabled, supply the credentials as requested on the bottom part of the form.</p>
+                <p>The "ZooKeeper" tab allows you to configure the connection type to communicate with a SolrCloud cluster:</p>
+                <br/><br/>
+                <figure src="images/en_US/solr-configure-zookeeper.PNG" alt="Solr Configuration, ZooKeeper tab" width="80%"/>
+                <br/><br/>
+                <p>Here, add each ZooKeeper instance in the SolrCloud cluster to the list of ZooKeeper instances.  The connection comes preconfigured with
+                       "localhost" as a ZooKeeper instance.  You may delete this entry if no ZooKeeper instance runs on that host.</p>
+                <p>The next tab is the "Schema" tab, which allows you to specify the names of various Solr fields into which the Solr connection type will
+                       place built-in document metadata:</p>
                 <br/><br/>
                 <figure src="images/en_US/solr-configure-schema.PNG" alt="Solr Configuration, Schema tab" width="80%"/>
                 <br/><br/>
-                <p>The third tab is the "Arguments" tab, which allows you to specify arbitrary arguments to be sent to Solr. All valid Solr update request parameters
-                       can be specified here. You can for instance add <a href="http://wiki.apache.org/solr/UpdateRequestProcessor">update.chain=myChain</a> to select the document processing pipeline/chain to use for
-                       processing documents in Solr. See the Solr documentation for more valid arguments. The tab looks like:</p>
+                <p>The most important of these is the document identifier field, which MUST be present for the connection type to function.  This field will
+                       be used to uniquely identify the document within Solr, and will contain the document's URL.  The Solr connection type will treat this field as
+                       a unique key for locating the indexed document for further modification or deletion.  The other Solr fields are optional and largely
+                       self-explanatory.</p>
+                <p>The next tab is the "Arguments" tab, which allows you to specify arbitrary arguments to be sent to Solr:</p>
                 <br/><br/>
                 <figure src="images/en_US/solr-configure-arguments.PNG" alt="Solr Configuration, Arguments tab" width="80%"/>
                 <br/><br/>
                 <p>Fill in the argument name and value, and click the "Add" button.  Bear in mind that if you add an argument with the same name as an existing one, it will replace the
                        existing one with the new specified value.  You can delete existing arguments by clicking the "Delete" button next to the argument you want to delete.</p>
-                <p>The fourth tab is the "Documents" tab, which allows you to do document filtering based on size and mime types. By specifying a maximum document length in bytes, you can filter out documents which exceed that size (e.g. 10485760 which is equivalent to 10 MB). If you only want to add documents with specific mime types, you can enter them into the "included mime types" field (e.g. "text/html" for filtering out all documents but HTML). The "excluded mime types" field is for excluding documents with specific mime types (e.g. "image/jpeg" for filtering out JPEG images). The tab looks like:</p>
+                <p>Use this tab to specify any and all desired Solr update request parameters.  You can, for instance, add
+                       <a href="http://wiki.apache.org/solr/UpdateRequestProcessor">update.chain=myChain</a> to select a specific document processing pipeline/chain to
+                       use for processing documents. See the Solr documentation for more valid arguments.</p>
+                <p>The next tab is the "Documents" tab, which allows you to filter documents based on size and MIME type. By specifying a maximum document
+                       length in bytes, you can filter out documents which exceed that size (e.g. 10485760, which is equivalent to 10 MB). If you only want to add
+                       documents with specific MIME types, you can enter them into the "included mime types" field (e.g. "text/html" to filter out all documents except HTML).
+                       The "excluded mime types" field is for excluding documents with specific MIME types (e.g. "image/jpeg" to filter out JPEG images). The tab looks like:</p>
                 <figure src="images/en_US/solr-configure-documents.PNG" alt="Solr Configuration, Documents tab" width="80%"/>
                 <br/><br/>
-                <p>The fifth tab is the "Commits" tab, which allows you to control the commit strategies. As well as committing documents at the end of every job, an option which is enabled by default, you may also commit each document within a certain time in milliseconds (e.g. "10000" for committing within 10 seconds). The <a href="http://wiki.apache.org/solr/CommitWithin">commit within</a> strategy will leave the responsibility to Solr instead of ManifoldCF. The tab looks like:</p>
+                <p>The next tab is the "Commits" tab, which allows you to control the commit strategy. In addition to committing documents at the end of every job, an
+                       option which is enabled by default, you may also have each document committed within a certain time in milliseconds (e.g. "10000" for committing within
+                       10 seconds). The <a href="http://wiki.apache.org/solr/CommitWithin">commit within</a> strategy leaves responsibility for the commit to Solr instead
+                       of ManifoldCF. The tab looks like:</p>
                 <figure src="images/en_US/solr-configure-commits.PNG" alt="Solr Configuration, Documents tab" width="80%"/>
                 <br/><br/>
-                <p>When you are done, don't forget to click the "Save" button to save your changes!  When you do, a connection summary and status screen will be presented, which
-                       may look something like this:</p>
+                <p>When you are done, don't forget to click the "Save" button to save your changes!  When you do, a connection summary and status screen will be
+                       presented, which may look something like this:</p>
                 <br/><br/>
                 <figure src="images/en_US/solr-status.PNG" alt="Solr Status" width="80%"/>
                 <br/><br/>
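
Illustrative note, not part of the commit: the "Documents", "Arguments" and "Commits" settings described in the documentation above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the connector's actual implementation. The function names is_indexable and update_query are hypothetical, and it is an assumption that the arguments and the commit-within time reach Solr as URL query parameters. Only update.chain and commitWithin are real Solr update request parameters.

```python
from urllib.parse import urlencode

# 10 MB expressed in bytes, the example value quoted in the documentation.
MAX_LENGTH_BYTES = 10 * 1024 * 1024


def is_indexable(length_bytes, mime_type, max_length=MAX_LENGTH_BYTES,
                 included=None, excluded=None):
    """Hypothetical filter mirroring the "Documents" tab semantics."""
    # Documents that exceed the maximum length are filtered out.
    if max_length is not None and length_bytes > max_length:
        return False
    # If an include list is given, only the listed MIME types pass.
    if included and mime_type not in included:
        return False
    # MIME types on the exclude list never pass.
    if excluded and mime_type in excluded:
        return False
    return True


def update_query(arguments, commit_within_ms=None):
    """Hypothetical helper that builds an update query string.

    `arguments` is a list of (name, value) pairs. A later pair with the
    same name replaces an earlier one, as described for the "Arguments" tab.
    """
    params = dict(arguments)
    if commit_within_ms is not None:
        # commitWithin asks Solr to commit within the given milliseconds.
        params["commitWithin"] = str(commit_within_ms)
    return urlencode(params)


if __name__ == "__main__":
    print(is_indexable(2048, "text/html", included={"text/html"}))
    print(update_query([("update.chain", "myChain")], commit_within_ms=10000))
```

With the include list set to "text/html", a JPEG image of any size is rejected, and a document of exactly 10485760 bytes still passes because only documents exceeding the limit are filtered out.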

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-schema.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-schema.PNG?rev=1462614&r1=1462613&r2=1462614&view=diff
==============================================================================
Binary files - no diff available.

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-server.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-server.PNG?rev=1462614&r1=1462613&r2=1462614&view=diff
==============================================================================
Binary files - no diff available.

Added: manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-solr-type.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-solr-type.PNG?rev=1462614&view=auto
==============================================================================
Binary file - no diff available.

Propchange: manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-solr-type.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-zookeeper.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-zookeeper.PNG?rev=1462614&view=auto
==============================================================================
Binary file - no diff available.

Propchange: manifoldcf/trunk/site/src/documentation/resources/images/en_US/solr-configure-zookeeper.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream