Posted to commits@lucene.apache.org by ma...@apache.org on 2020/10/12 21:30:02 UTC

[lucene-solr] 02/05: @972 Remove orphaned doc pages.

This is an automated email from the ASF dual-hosted git repository.

markrmiller pushed a commit to branch reference_impl_dev
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git

commit 006d20dbf4fb63c7b25a54cf850c775eff5ee79a
Author: markrmiller@gmail.com <ma...@gmail.com>
AuthorDate: Mon Oct 12 15:19:22 2020 -0500

    @972 Remove orphaned doc pages.
---
 .../adding-custom-plugins-in-solrcloud-mode.adoc   |  333 ------
 solr/solr-ref-guide/src/dataimport-screen.adoc     |   26 -
 ...ta-store-data-with-the-data-import-handler.adoc | 1075 --------------------
 .../src/velocity-response-writer.adoc              |  122 ---
 solr/solr-ref-guide/src/velocity-search-ui.adoc    |   26 -
 5 files changed, 1582 deletions(-)

diff --git a/solr/solr-ref-guide/src/adding-custom-plugins-in-solrcloud-mode.adoc b/solr/solr-ref-guide/src/adding-custom-plugins-in-solrcloud-mode.adoc
deleted file mode 100644
index 29c8bf7..0000000
--- a/solr/solr-ref-guide/src/adding-custom-plugins-in-solrcloud-mode.adoc
+++ /dev/null
@@ -1,333 +0,0 @@
-= Adding Custom Plugins in SolrCloud Mode
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-In SolrCloud mode, custom plugins need to be shared across all nodes of the cluster.
-
-.Deprecated
-[IMPORTANT]
-====
-The functionality here is a subset of the <<package-manager.adoc#package-manager,Package Management>> system.  It will no longer be supported in Solr 9.
-====
-
-When running Solr in SolrCloud mode, using custom code (such as custom analyzers, tokenizers, query parsers, and other plugins) can be cumbersome because the jars must be added to the classpath on all nodes in your cluster. Using the <<blob-store-api.adoc#blob-store-api,Blob Store API>> and special commands with the <<config-api.adoc#config-api,Config API>>, you can upload jars to a special system-level collection and dynamically load plugins from them at runtime without needing to restart Solr.
-
-.This Feature is Disabled By Default
-[IMPORTANT]
-====
-In addition to requiring that Solr is running in <<solrcloud.adoc#solrcloud,SolrCloud>> mode, this feature is also disabled by default unless all Solr nodes are run with the `-Denable.runtime.lib=true` option on startup.
-
-Before enabling this feature, users should carefully consider the issues discussed in the <<Securing Runtime Libraries>> section below.
-====
-
-== Uploading Jar Files
-
-You can host the jars with your own service, or use Solr itself to host them.
-
-Use the <<blob-store-api.adoc#blob-store-api,Blob Store API>> to upload your jar files to Solr. This puts your jars in the `.system` collection and distributes them across your SolrCloud nodes. These jars are added to a separate classloader and are only accessible to components that are configured with the property `runtimeLib=true`. Such components are loaded lazily because the `.system` collection may not yet be loaded when a particular core is loaded.
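-
-For example, a jar can be uploaded to the `.system` collection with a single POST; this mirrors the Blob Store API command shown in Step 5 of <<Securing Runtime Libraries>> below (the blob name `myplugin` and the local file name here are placeholders):
-
-[source,bash]
-----
-# Upload myplugin.jar to the .system collection under the blob name "myplugin"
-curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @myplugin.jar \
-  http://localhost:8983/solr/.system/blob/myplugin
-----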
-
-== Config API Commands to use Jars as Runtime Libraries
-
-The runtime library feature uses a special set of <<config-api.adoc#config-api,Config API>> commands to add jar files from the blob store to the list of runtime libraries, update them, or remove them.
-
-The following commands are used to manage runtime libs:
-
-* `add-runtimelib`
-* `update-runtimelib`
-* `delete-runtimelib`
-
-[.dynamic-tabs]
---
-[example.tab-pane#v1manage-libs]
-====
-[.tab-label]*V1 API*
-
-[source,bash]
-----
-curl http://localhost:8983/solr/techproducts/config -H 'Content-type:application/json' -d '{
-   "add-runtimelib": { "name":"jarblobname", "version":2 },
-   "update-runtimelib": { "name":"jarblobname", "version":3 },
-   "delete-runtimelib": "jarblobname"
-}'
-----
-====
-
-[example.tab-pane#v2manage-libs]
-====
-[.tab-label]*V2 API*
-
-[source,bash]
-----
-curl http://localhost:8983/api/collections/techproducts/config -H 'Content-type:application/json' -d '{
-   "add-runtimelib": { "name":"jarblobname", "version":2 },
-   "update-runtimelib": { "name":"jarblobname", "version":3 },
-   "delete-runtimelib": "jarblobname"
-}'
-----
-====
---
-
-The name to use is the name of the blob that you specified when you uploaded your jar to the blob store. You should also include the version of the jar found in the blob store that you want to use. These details are added to `configoverlay.json`.
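-
-As a rough sketch, after the `add-runtimelib` command above succeeds, the resulting entry in `configoverlay.json` might look something like this (the exact layout can vary between Solr versions):
-
-[source,json]
-----
-{
-  "runtimeLib": {
-    "jarblobname": {
-      "name": "jarblobname",
-      "version": 2
-    }
-  }
-}
-----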
-
-The default `SolrResourceLoader` does not have visibility into the jars that have been defined as runtime libraries. A separate classloader that can access these jars is made available only to components that are specially annotated.
-
-Every pluggable component can have an optional extra attribute called `runtimeLib=true`, which means the component is not loaded at core load time. Instead, it is loaded on demand. If all the dependent jars are not available when the component is loaded, an error is thrown.
-
-This example shows creating a ValueSourceParser using a jar that has been loaded to the Blob store.
-
-[.dynamic-tabs]
---
-[example.tab-pane#v1add-jar]
-====
-[.tab-label]*V1 API*
-
-[source,bash]
-----
-curl http://localhost:8983/solr/techproducts/config -H 'Content-type:application/json' -d '{
-  "create-valuesourceparser": {
-    "name": "nvl",
-    "runtimeLib": true,
-    "class": "solr.org.apache.solr.search.function.NvlValueSourceParser,
-    "nvlFloatValue": 0.0 }
-}'
-----
-====
-
-[example.tab-pane#v2add-jar]
-====
-[.tab-label]*V2 API*
-
-[source,bash]
-----
-curl http://localhost:8983/api/collections/techproducts/config -H 'Content-type:application/json' -d '{
-  "create-valuesourceparser": {
-    "name": "nvl",
-    "runtimeLib": true,
-    "class": "solr.org.apache.solr.search.function.NvlValueSourceParser,
-    "nvlFloatValue": 0.0 }
-}'
-----
-====
---
-
-== Example: Using external service to host your jars
-
-Hosting your jars externally is convenient if you have a reliable server available; there is no need to create and manage a `.system` collection.
-
-Step 1: Download a jar from GitHub to the current directory
-
-[source,bash]
-----
- curl -L -o runtimelibs.jar "https://github.com/apache/lucene-solr/blob/master/solr/core/src/test-files/runtimecode/runtimelibs.jar.bin?raw=true"
-----
-Step 2: Get the `sha512` hash of the jar
-
-[source,bash]
-----
- openssl dgst -sha512 runtimelibs.jar
-----
-
-Step 3: Start Solr with runtime lib enabled
-
-[source,bash]
-----
- bin/solr start -e cloud -a "-Denable.runtime.lib=true" -noprompt
-----
-
-Step 4: Run a local server. Skip this step if you have another place to host your jars. Ensure that the URL used in Step 5 is set appropriately
-
-[source,bash]
-----
- python -m SimpleHTTPServer 8000 &
-----
-
-Step 5: Add the jar to your collection `gettingstarted`
-
-[source,bash]
-----
- curl http://localhost:8983/solr/gettingstarted/config -H 'Content-type:application/json' -d '{
-    "add-runtimelib": { "name" : "testjar",
-    "url":"http://localhost:8000/runtimelibs.jar" ,
-    "sha512" : "d01b51de67ae1680a84a813983b1de3b592fc32f1a22b662fc9057da5953abd1b72476388ba342cad21671cd0b805503c78ab9075ff2f3951fdf75fa16981420"}
-    }'
-----
-
-Step 6: Create a new request handler `/test` for the collection `gettingstarted` using the jar we just added
-
-[source,bash]
-----
-curl http://localhost:8983/solr/gettingstarted/config -H 'Content-type:application/json' -d '{
-    "create-requesthandler": { "name": "/test",
-    "class": "org.apache.solr.core.RuntimeLibReqHandler", "runtimeLib": true }
-    }'
-----
-
-Step 7: Test your request handler
-
-[source,bash]
-----
-curl  http://localhost:8983/solr/gettingstarted/test
-----
-
-Output:
-[source,json]
-----
-{
-  "responseHeader":{
-    "status":0,
-    "QTime":0},
-  "params":{},
-  "context":{
-    "webapp":"/solr",
-    "path":"/test",
-    "httpMethod":"GET"},
-  "class":"org.apache.solr.core.RuntimeLibReqHandler",
-  "loader":"org.apache.solr.core.MemClassLoader"}
-----
-
-=== Updating Remote Jars
-
-Example:
-
-* Host the new jar at a new URL, e.g., http://localhost:8000/runtimelibs_v2.jar
-* Get the `sha512` hash of the new jar.
-* Run the `update-runtimelib` command.
-
-[source,bash]
-----
- curl http://localhost:8983/solr/gettingstarted/config -H 'Content-type:application/json' -d '{
-    "update-runtimelib": { "name" : "testjar",
-    "url":"http://localhost:8000/runtimelibs_v2.jar" ,
-    "sha512" : "<replace-the-new-sha512-digest-here>"}
-    }'
-----
-
-NOTE: Always upload your jar to a new URL, as the Solr cluster is still referring to the old jar. If the existing jar is modified it can cause errors, as the hash may no longer match.
-
-== Securing Runtime Libraries
-
-A drawback of this feature is that it could be used to load malicious executable code into the system. However, it is possible to restrict the system to load only trusted jars using http://en.wikipedia.org/wiki/Public_key_infrastructure[PKI] to verify that the executables loaded into the system are trustworthy.
-
-The following steps will allow you to enable security for this feature. The instructions assume you have started all your Solr nodes with the `-Denable.runtime.lib=true` option.
-
-=== Step 1: Generate an RSA Private Key
-
-The first step is to generate an RSA private key. The example below uses a 512-bit key for brevity, but you should use a key strength appropriate to your needs; 2048 bits or more is recommended for production use.
-
-[source,bash]
-----
-$ openssl genrsa -out priv_key.pem 512
-----
-
-=== Step 2: Output the Public Key
-
-The public portion of the key should be output in DER format so Java can read it.
-
-[source,bash]
-----
-$ openssl rsa -in priv_key.pem -pubout -outform DER -out pub_key.der
-----
-
-=== Step 3: Load the Key to ZooKeeper
-
-The `.der` files output from Step 2 should then be loaded to ZooKeeper under a node `/keys/exe` so they are available to every node. You can load any number of public keys to that node, and all of them are valid. If a key is removed from the directory, signatures made with that key cease to be valid. So, before removing a key, make sure to update your runtime library configurations with valid signatures using the `update-runtimelib` command.
-
-At the current time, you can only use the ZooKeeper `zkCli.sh` (or `zkCli.cmd` on Windows) script to issue these commands (Solr ships a script with the same name, but it is not the same tool). If you already have your own ZooKeeper ensemble running, you can find the script at `$ZK_INSTALL/bin/zkCli.sh` (or `zkCli.cmd` on Windows).
-
-NOTE: If you are running the embedded ZooKeeper that is included with Solr, you *do not* have this script already; in order to use it, you will need to download a copy of ZooKeeper v{ivy-zookeeper-version} from http://zookeeper.apache.org/. Don't worry about configuring the download; you just need the command line utility script. When you start the script, you will connect to the embedded ZooKeeper.
-
-To load the keys, you will need to connect to ZooKeeper with `zkCli.sh`, create the directories, and then create the key file, as in the following example.
-
-[source,bash]
-----
-# Connect to ZooKeeper
-# Replace the server location below with the correct ZooKeeper connect string for your installation.
-$ ./bin/zkCli.sh -server localhost:9983
-
-# After connection, you will interact with the ZK prompt.
-# Create the directories
-[zk: localhost:9983(CONNECTED) 5] create /keys
-[zk: localhost:9983(CONNECTED) 5] create /keys/exe
-
-# Now create the public key file in ZooKeeper
-# The second path is the path to the .der file on your local machine
-[zk: localhost:9983(CONNECTED) 5] create /keys/exe/pub_key.der /myLocal/pathTo/pub_key.der
-----
-
-After this, any attempt to load an unsigned jar will fail. All your jars must be signed with one of your private keys for Solr to trust them. The process to sign your jars and use the signature is outlined in Steps 4-6.
-
-=== Step 4: Sign the jar File
-
-Next you need to sign the SHA-1 digest of your jar file and get the base64-encoded string.
-
-[source,bash]
-----
-$ openssl dgst -sha1 -sign priv_key.pem myjar.jar | openssl enc -base64
-----
-
-The output of this step will be a string that you will need in Step 6 below, when you add the jar to your classpath.
-
-=== Step 5: Load the jar to the Blob Store
-
-Load your jar to the Blob store, using the <<blob-store-api.adoc#blob-store-api,Blob Store API>>. This step does not require a signature; you will need the signature in Step 6 to add it to your classpath.
-
-[source,bash]
-----
-curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @{filename} \
-  http://localhost:8983/solr/.system/blob/{blobname}
-----
-
-The blob name that you give the jar file in this step will be used as the name in the next step.
-
-=== Step 6: Add the jar to the Classpath
-
-Finally, add the jar to the classpath using the Config API as detailed above. In this step, you will need to provide the signature of the jar that you got in Step 4.
-
-[.dynamic-tabs]
---
-[example.tab-pane#v1add-jar2]
-====
-[.tab-label]*V1 API*
-
-[source,bash]
-----
-curl http://localhost:8983/solr/techproducts/config -H 'Content-type:application/json'  -d '{
-  "add-runtimelib": {
-    "name":"blobname",
-    "version":2,
-    "sig":"mW1Gwtz2QazjfVdrLFHfbGwcr8xzFYgUOLu68LHqWRDvLG0uLcy1McQ+AzVmeZFBf1yLPDEHBWJb5KXr8bdbHN/
-           PYgUB1nsr9pk4EFyD9KfJ8TqeH/ijQ9waa/vjqyiKEI9U550EtSzruLVZ32wJ7smvV0fj2YYhrUaaPzOn9g0=" }
-}'
-----
-====
-
-[example.tab-pane#v2add-jar2]
-====
-[.tab-label]*V2 API*
-
-[source,bash]
-----
-curl http://localhost:8983/api/collections/techproducts/config -H 'Content-type:application/json'  -d '{
-  "add-runtimelib": {
-    "name":"blobname",
-    "version":2,
-    "sig":"mW1Gwtz2QazjfVdrLFHfbGwcr8xzFYgUOLu68LHqWRDvLG0uLcy1McQ+AzVmeZFBf1yLPDEHBWJb5KXr8bdbHN/
-           PYgUB1nsr9pk4EFyD9KfJ8TqeH/ijQ9waa/vjqyiKEI9U550EtSzruLVZ32wJ7smvV0fj2YYhrUaaPzOn9g0=" }
-}'
-----
-====
---
diff --git a/solr/solr-ref-guide/src/dataimport-screen.adoc b/solr/solr-ref-guide/src/dataimport-screen.adoc
deleted file mode 100644
index 647814c..0000000
--- a/solr/solr-ref-guide/src/dataimport-screen.adoc
+++ /dev/null
@@ -1,26 +0,0 @@
-= Dataimport Screen
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-The Dataimport screen shows the configuration of the DataImportHandler (DIH) and allows you to start import commands and monitor their status, as defined by the options selected on the screen and defined in the configuration file.
-
-.The Dataimport Screen
-image::images/dataimport-screen/dataimport.png[image,width=485,height=250]
-
-This screen also lets you adjust various options to control how the data is imported to Solr, and view the data import configuration file that controls the import.
-
-For more information about data importing with DIH, see the section on <<uploading-structured-data-store-data-with-the-data-import-handler.adoc#uploading-structured-data-store-data-with-the-data-import-handler,Uploading Structured Data Store Data with the Data Import Handler>>.
diff --git a/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc b/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
deleted file mode 100644
index d9f6cf8..0000000
--- a/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
+++ /dev/null
@@ -1,1075 +0,0 @@
-= Uploading Structured Data Store Data with the Data Import Handler
-:toclevels: 1
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-Many search applications store the content to be indexed in a structured data store, such as a relational database. The Data Import Handler (DIH) provides a mechanism for importing content from a data store and indexing it.
-
-In addition to relational databases, DIH can index content from HTTP-based data sources such as RSS and ATOM feeds, e-mail repositories, and structured XML where an XPath processor is used to generate fields.
-
-== DIH Concepts and Terminology
-
-Descriptions of the Data Import Handler use several familiar terms, such as entity and processor, in specific ways, as explained below.
-
-Datasource::
-As its name suggests, a datasource defines the location of the data of interest. For a database, it's a DSN. For an HTTP datasource, it's the base URL.
-
-Entity::
-Conceptually, an entity is processed to generate a set of documents, containing multiple fields, which (after optionally being transformed in various ways) are sent to Solr for indexing. For an RDBMS data source, an entity is a view or table, which would be processed by one or more SQL statements to generate a set of rows (documents) with one or more columns (fields).
-
-Processor::
-An entity processor does the work of extracting content from a data source, transforming it, and adding it to the index. Custom entity processors can be written to extend or replace the ones supplied.
-
-Transformer::
-Each set of fields fetched by the entity may optionally be transformed. This process can modify the fields, create new fields, or generate multiple rows/documents from a single row. There are several built-in transformers in the DIH, which perform functions such as modifying dates and stripping HTML. It is possible to write custom transformers using the publicly available interface.
-
-== Solr's DIH Examples
-
-The `example/example-DIH` directory contains several collections to demonstrate many of the features of the data import handler. These are available with the `dih` example from the <<solr-control-script-reference.adoc#solr-control-script-reference,Solr Control Script>>:
-
-[source,bash]
-----
-bin/solr -e dih
-----
-
-This launches a standalone Solr instance with several collections that correspond to detailed examples. The available examples are `atom`, `db`, `mail`, `solr`, and `tika`.
-
-All examples in this section assume you are running the DIH example server.
-
-== Configuring DIH
-
-=== Configuring solrconfig.xml for DIH
-
-The Data Import Handler has to be registered in `solrconfig.xml`. For example:
-
-[source,xml]
-----
-<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
-  <lst name="defaults">
-    <str name="config">/path/to/my/DIHconfigfile.xml</str>
-  </lst>
-</requestHandler>
-----
-
-The only required parameter is the `config` parameter, which specifies the location of the DIH configuration file that contains specifications for the data source, how to fetch data, what data to fetch, and how to process it to generate the Solr documents to be posted to the index.
-
-You can have multiple DIH configuration files. Each file would require a separate definition in the `solrconfig.xml` file, specifying a path to the file.
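-
-For example, two independent DIH configurations might be registered like this (the handler names and file paths are illustrative):
-
-[source,xml]
-----
-<requestHandler name="/dataimport-products" class="org.apache.solr.handler.dataimport.DataImportHandler">
-  <lst name="defaults">
-    <str name="config">/path/to/products-data-config.xml</str>
-  </lst>
-</requestHandler>
-
-<requestHandler name="/dataimport-vendors" class="org.apache.solr.handler.dataimport.DataImportHandler">
-  <lst name="defaults">
-    <str name="config">/path/to/vendors-data-config.xml</str>
-  </lst>
-</requestHandler>
-----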
-
-=== Configuring the DIH Configuration File
-
-An annotated configuration file, based on the `db` collection in the `dih` example server, is shown below (this file is located in `example/example-DIH/solr/db/conf/db-data-config.xml`).
-
-This example shows how to extract fields from four tables defining a simple product database. The parameters and options shown here are described in more detail in the sections that follow.
-
-[source,xml]
-----
-<dataConfig>
-
-  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:./example-DIH/hsqldb/ex" --<1>
-    user="sa" password="secret"/> --<2>
-  <document> --<3>
-    <entity name="item" query="select * from item"
-            deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'"> --<4>
-      <field column="NAME" name="name" />
-
-      <entity name="feature"
-              query="select DESCRIPTION from FEATURE where ITEM_ID='${item.ID}'"
-              deltaQuery="select ITEM_ID from FEATURE where last_modified > '${dataimporter.last_index_time}'"
-              parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}"> --<5>
-        <field name="features" column="DESCRIPTION" />
-      </entity>
-
-      <entity name="item_category"
-              query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'"
-              deltaQuery="select ITEM_ID, CATEGORY_ID from item_category where last_modified > '${dataimporter.last_index_time}'"
-              parentDeltaQuery="select ID from item where ID=${item_category.ITEM_ID}">
-        <entity name="category"
-                query="select DESCRIPTION from category where ID = '${item_category.CATEGORY_ID}'"
-                deltaQuery="select ID from category where last_modified > '${dataimporter.last_index_time}'"
-                parentDeltaQuery="select ITEM_ID, CATEGORY_ID from item_category where CATEGORY_ID=${category.ID}">
-          <field column="description" name="cat" />
-        </entity>
-      </entity>
-    </entity>
-  </document>
-</dataConfig>
-----
-<1> The first element is the `dataSource`, in this case an HSQLDB database. The path to the JDBC driver and the JDBC URL and login credentials are all specified here. Other permissible attributes include whether or not to autocommit to Solr, the `batchSize` used in the JDBC connection, and a `readOnly` flag.
-<2> The password attribute is optional if there is no password set for the DB. Alternately, the password can be encrypted; the section <<Encrypting a Database Password>> below describes how to do this.
-<3> A `document` element follows, containing multiple `entity` elements. Note that `entity` elements can be nested, and this allows the entity relationships in the sample database to be mirrored here, so that we can generate a denormalized Solr record which may include multiple features for one item, for instance.
-<4> The possible attributes for the `entity` element are described in later sections. Entity elements may contain one or more `field` elements, which map the data source field names to Solr fields, and optionally specify per-field transformations. This entity is the `root` entity.
-<5> This entity is nested and reflects the one-to-many relationship between an item and its multiple features. Note the use of variables; `${item.ID}` is the value of the column 'ID' for the current item (`item` referring to the entity name).
-
-Datasources can still be specified in `solrconfig.xml`. These must be specified in the defaults section of the handler in `solrconfig.xml`. However, these are not parsed until the main configuration is loaded.
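-
-A sketch of a datasource declared in the handler defaults is shown below; the attribute names mirror those of the `dataSource` element in the DIH configuration file, and the values here follow the `db` example:
-
-[source,xml]
-----
-<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
-  <lst name="defaults">
-    <str name="config">/path/to/my/DIHconfigfile.xml</str>
-    <lst name="datasource">
-      <str name="driver">org.hsqldb.jdbcDriver</str>
-      <str name="url">jdbc:hsqldb:./example-DIH/hsqldb/ex</str>
-      <str name="user">sa</str>
-      <str name="password">secret</str>
-    </lst>
-  </lst>
-</requestHandler>
-----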
-
-The entire configuration itself can be passed as a request parameter using the `dataConfig` parameter rather than using a file. When configuration errors are encountered, the error message is returned in XML format.  Due to security concerns, this only works if you start Solr with `-Denable.dih.dataConfigParam=true`.
-
-A `reload-config` command is also supported, which is useful for validating a new configuration file, or if you want to specify a file, load it, and not have it reloaded again on import. If there is an XML mistake in the configuration, a user-friendly message is returned in XML format. You can then fix the problem and run `reload-config` again.
-
-TIP: You can also view the DIH configuration in the Solr Admin UI from the <<dataimport-screen.adoc#dataimport-screen,Dataimport Screen>>. It includes an interface to import content.
-
-==== DIH Request Parameters
-
-Request parameters can be substituted in configuration with placeholder `${dataimporter.request._paramname_}`, as in this example:
-
-[source,xml]
-----
-<dataSource driver="org.hsqldb.jdbcDriver"
-            url="${dataimporter.request.jdbcurl}"
-            user="${dataimporter.request.jdbcuser}"
-            password="${dataimporter.request.jdbcpassword}" />
-----
-
-These parameters can then be passed to the `full-import` command or defined in the `<defaults>` section in `solrconfig.xml`. This example shows the parameters with the full-import command:
-
-[source,bash]
-http://localhost:8983/solr/dih/dataimport?command=full-import&jdbcurl=jdbc:hsqldb:./example-DIH/hsqldb/ex&jdbcuser=sa&jdbcpassword=secret
-
-==== Encrypting a Database Password
-
-The database password can be encrypted if necessary to avoid plaintext passwords being exposed in unsecured files. To do this, we will replace the password in `data-config.xml` with an encrypted password. We will use the `openssl` tool for the encryption, and the encryption key will be stored in a file which is readable only by the `solr` process. Please follow these steps:
-
-. Create a strong encryption password and store it in a file. Then make sure it is readable only by the `solr` user. Example commands:
-+
-[source,text]
-echo -n "a-secret" > /var/solr/data/dih-encryptionkey
-chown solr:solr /var/solr/data/dih-encryptionkey
-chmod 600 /var/solr/data/dih-encryptionkey
-+
-CAUTION: Note that we use the `-n` argument to `echo` to avoid including a newline character at the end of the password. If you use another method to generate the encrypted password, make sure to avoid newlines as well.
-
-. Encrypt the JDBC database password using `openssl` as follows:
-+
-[source,text]
-echo -n "my-jdbc-password" | openssl enc -aes-128-cbc -a -salt -md md5 -pass file:/var/solr/data/dih-encryptionkey
-+
-The output of the command will be a long string such as `U2FsdGVkX18QMjY0yfCqlfBMvAB4d3XkwY96L7gfO2o=`. You will use this as `password` in your `data-config.xml` file.
-
-. In your `data-config.xml`, you'll add the `password` and `encryptKeyFile` parameters to the `<datasource>` configuration, as in this example:
-+
-[source,xml]
-<dataSource driver="org.hsqldb.jdbcDriver"
-    url="jdbc:hsqldb:./example-DIH/hsqldb/ex"
-    user="sa"
-    password="U2FsdGVkX18QMjY0yfCqlfBMvAB4d3XkwY96L7gfO2o="
-    encryptKeyFile="/var/solr/data/dih-encryptionkey" />
-
-== DataImportHandler Commands
-
-DIH commands are sent to Solr via an HTTP request. The following operations are supported.
-
-abort::
-Aborts an ongoing operation. For example: `\http://localhost:8983/solr/dih/dataimport?command=abort`.
-
-delta-import::
-For incremental imports and change detection. Only the <<The SQL Entity Processor,SqlEntityProcessor>> supports delta imports.
-+
-For example: `\http://localhost:8983/solr/dih/dataimport?command=delta-import`.
-+
-This command supports the same `clean`, `commit`, `optimize`, and `debug` parameters as the `full-import` command described below.
-
-full-import::
-A Full Import operation can be started with a URL such as `\http://localhost:8983/solr/dih/dataimport?command=full-import`. The command returns immediately.
-+
-The operation will be started in a new thread and the _status_ attribute in the response should be shown as _busy_. The operation may take some time depending on the size of the dataset. Queries to Solr are not blocked during full-imports.
-+
-When a `full-import` command is executed, it stores the start time of the operation in a file located at `conf/dataimport.properties`. This stored timestamp is used when a `delta-import` operation is executed.
-+
-Parameters available with `full-import` are as follows (an example invocation appears after this command list):
-+
-clean:::
-Default is true. Tells whether to clean up the index before the indexing is started.
-commit:::
-Default is true. Tells whether to commit after the operation.
-debug:::
-Default is false. Runs the command in debug mode and is used by the interactive development mode.
-+
-Note that in debug mode, documents are never committed automatically. If you want to run debug mode and commit the results too, add `commit=true` as a request parameter.
-entity:::
-The name of an entity directly under the `<document>` tag in the configuration file. Use this to execute one or more entities selectively.
-+
-Multiple "entity" parameters can be passed on to run multiple entities at once. If nothing is passed, all entities are executed.
-optimize:::
-Default is true. Tells Solr whether to optimize after the operation.
-synchronous:::
-Blocks the request until the import is completed. Default is false.
-
-reload-config::
-If the configuration file has been changed and you wish to reload it without restarting Solr, run the command `\http://localhost:8983/solr/dih/dataimport?command=reload-config`.
-
-status::
-This command returns statistics on the number of documents created, deleted, queries run, rows fetched, status, and so on. For example:  `\http://localhost:8983/solr/dih/dataimport?command=status`.
-
-show-config::
-This command responds with the current configuration: `\http://localhost:8983/solr/dih/dataimport?command=show-config`.
-
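-As an illustration, a `full-import` restricted to a single entity, without cleaning the index first, might be invoked like this (the entity name `item` is taken from the `db` example above):
-
-[source,bash]
-----
-curl 'http://localhost:8983/solr/dih/dataimport?command=full-import&clean=false&commit=true&entity=item'
-----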
-
-== Property Writer
-
-The `propertyWriter` element defines the date format and locale for use with delta queries. It is an optional configuration. Add the element to the DIH configuration file, directly under the `dataConfig` element.
-
-[source,xml]
-----
-<propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss" type="SimplePropertiesWriter"
-                directory="data" filename="my_dih.properties" locale="en-US" />
-----
-
-The parameters available are:
-
-dateFormat::
-A `java.text.SimpleDateFormat` to use when converting the date to text. The default is `yyyy-MM-dd HH:mm:ss`.
-
-type::
-The implementation class. Use `SimplePropertiesWriter` for non-SolrCloud installations. If using SolrCloud, use `ZKPropertiesWriter`.
-+
-If this is not specified, it will default to the appropriate class depending on whether SolrCloud mode is enabled.
-
-directory::
-Used with the `SimplePropertiesWriter` only. The directory for the properties file. If not specified, the default is `conf`.
-
-filename::
-Used with the `SimplePropertiesWriter` only. The name of the properties file.
-+
-If not specified, the default is the requestHandler name (as defined in `solrconfig.xml`) with ".properties" appended (for example, `dataimport.properties`).
-
-locale::
-The locale. If not defined, the ROOT locale is used. It must be specified as language-country (https://tools.ietf.org/html/bcp47[BCP 47 language tag]). For example, `en-US`.
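-
-For reference, the file written by `SimplePropertiesWriter` is an ordinary Java properties file. A sketch of its contents, assuming the `item` entity from the `db` example (the timestamps will differ, and colons are escaped per the properties format):
-
-[source,text]
-----
-#Mon Oct 12 15:19:22 CDT 2020
-last_index_time=2020-10-12 15\:19\:22
-item.last_index_time=2020-10-12 15\:19\:22
-----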
-
-== Data Sources
-
-A data source specifies the origin of data and its type. Somewhat confusingly, some data sources are configured within the associated entity processor. Data sources can also be specified in `solrconfig.xml`, which is useful when you have multiple environments (for example, development, QA, and production) differing only in their data sources.
-
-You can create a custom data source by writing a class that extends `org.apache.solr.handler.dataimport.DataSource`.
-
-The mandatory attributes for a data source definition are its name and type. The name identifies the data source to an Entity element.
-
-The types of data sources available are described below.
-
-=== ContentStreamDataSource
-
-This takes the POST data as the data source. This can be used with any EntityProcessor that uses a `DataSource<Reader>`.
-
-=== FieldReaderDataSource
-
-This can be used where a database field contains XML which you wish to process using the XPathEntityProcessor. You would set up a configuration with both JDBC and FieldReader data sources, and two entities, as follows:
-
-[source,xml]
-----
-<dataSource name="a1" driver="org.hsqldb.jdbcDriver" ...  />
-<dataSource name="a2" type="FieldReaderDataSource" />
-<document>
-
-  <!-- processor for database -->
-  <entity name ="e1" dataSource="a1" processor="SqlEntityProcessor" pk="docid"
-          query="select * from t1 ...">
-
-    <!-- nested XpathEntity; the field in the parent which is to be used for
-         XPath is set in the "dataField" attribute in place of the "url" attribute -->
-    <entity name="e2" dataSource="a2" processor="XPathEntityProcessor"
-            dataField="e1.fieldToUseForXPath">
-
-      <!-- XPath configuration follows -->
-      ...
-    </entity>
-  </entity>
-</document>
-----
-
-The `FieldReaderDataSource` can take an `encoding` parameter, which will default to "UTF-8" if not specified.
-
-=== FileDataSource
-
-This can be used like a <<URLDataSource>>, but is used to fetch content from files on disk. The only difference from `URLDataSource`, when accessing disk files, is how a pathname is specified.
-
-This data source accepts these optional attributes.
-
-basePath::
-The base path relative to which the value is evaluated if it is not absolute.
-
-encoding::
-Defines the character encoding to use. If not defined, UTF-8 is used.
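-
-A minimal declaration might look like this (the `basePath` value is illustrative):
-
-[source,xml]
-----
-<dataSource type="FileDataSource" basePath="/var/data/import" encoding="UTF-8"/>
-----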
-
-=== JdbcDataSource
-
-This is the default datasource. It's used with the <<The SQL Entity Processor,SqlEntityProcessor>>. See the example in the <<FieldReaderDataSource>> section for details on configuration. `JdbcDataSource` supports at least the following attributes:
-
-driver, url, user, password, encryptKeyFile::
-Usual JDBC connection properties.
-
-batchSize::
-Passed to `Statement#setFetchSize`; the default value is 500.
-+
-The MySQL driver does not honor `fetchSize` and pulls the whole result set, which often leads to an `OutOfMemoryError`.
-+
-In this case, set `batchSize=-1`; this passes `setFetchSize(Integer.MIN_VALUE)` to the driver and switches the result set to fetching row by row.
-
-All of these attributes support property substitution via `$\{placeholders}`.
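-
-For instance, a MySQL datasource configured to stream rows might be sketched like this (the driver class, URL, and credential placeholders are illustrative):
-
-[source,xml]
-----
-<dataSource driver="com.mysql.jdbc.Driver"
-            url="jdbc:mysql://localhost:3306/mydb"
-            user="${dataimporter.request.jdbcuser}"
-            password="${dataimporter.request.jdbcpassword}"
-            batchSize="-1"/>
-----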
-
-=== URLDataSource
-
-This data source is often used with <<The XPathEntityProcessor,XPathEntityProcessor>> to fetch content from an underlying `file://` or `http://` location. Here's an example:
-
-[source,xml]
-----
-<dataSource name="a"
-            type="URLDataSource"
-            baseUrl="http://host:port/"
-            encoding="UTF-8"
-            connectionTimeout="5000"
-            readTimeout="10000"/>
-----
-
-The URLDataSource type accepts these optional parameters:
-
-baseUrl::
-Specifies a new `baseUrl` for pathnames. You can use this to specify host/port changes between Dev/QA/Prod environments. Using this attribute isolates the changes to be made to `solrconfig.xml`.
-
-connectionTimeout::
-Specifies the length of time in milliseconds after which the connection should time out. The default value is 5000ms.
-
-encoding::
-By default the encoding in the response header is used. You can use this property to override the default encoding.
-
-readTimeout::
-Specifies the length of time in milliseconds after which a read operation should time out. The default value is 10000ms.
-
-
-== Entity Processors
-
-Entity processors extract data, transform it, and add it to a Solr index. Examples of entities include views or tables in a data store.
-
-Each processor has its own set of attributes, described in its own section below. In addition, there are several attributes common to all entities which may be specified:
-
-dataSource::
-The name of a data source. If there are multiple data sources defined, use this attribute with the name of the data source for this entity.
-
-name::
-Required. The unique name used to identify an entity.
-
-pk::
-The primary key for the entity. It is optional, and required only when using delta-imports. It has no relation to the uniqueKey defined in `schema.xml` but they can both be the same.
-+
-This attribute is mandatory if you do delta-imports and then refer to the column name in `${dataimporter.delta.<column-name>}`, which is used as the primary key.
-
-processor::
-Default is <<The SQL Entity Processor,SqlEntityProcessor>>. Required only if the datasource is not an RDBMS.
-
-onError::
-Defines what to do if an error is encountered.
-+
-Permissible values are:
-
-abort::: Stops the import.
-
-skip::: Skips the current document.
-
-continue::: Ignores the error and processing continues.
-
-preImportDeleteQuery::
-Before a `full-import` command, use this query to clean up the index instead of using `\*:*`. This is honored only on an entity that is an immediate sub-child of `<document>`.
-
-postImportDeleteQuery::
-Similar to `preImportDeleteQuery`, but it executes after the import has completed.
-
-rootEntity::
-By default the entities immediately under `<document>` are root entities. If this attribute is set to false, the entity directly falling under that entity will be treated as the root entity (and so on). For every row returned by the root entity, a document is created in Solr.
-
-transformer::
-Optional. One or more transformers to be applied on this entity.
-
-cacheImpl::
-Optional. A class (which must implement `DIHCache`) to use for caching this entity when doing lookups from an entity which wraps it. The provided implementation is `SortedMapBackedCache`.
-
-cacheKey::
-The name of a property of this entity to use as a cache key if `cacheImpl` is specified.
-
-cacheLookup::
-An entity + property name that will be used to look up cached instances of this entity if `cacheImpl` is specified.
-
-where::
-An alternative way to specify `cacheKey` and `cacheLookup` concatenated with '='.
-+
-For example, `where="CODE=People.COUNTRY_CODE"` is equivalent to `cacheKey="CODE" cacheLookup="People.COUNTRY_CODE"`
-
-child="true"::
-Enables indexing document blocks, aka <<uploading-data-with-index-handlers.adoc#uploading-data-with-index-handlers,Nested Child Documents>>, for searching with <<other-parsers.adoc#other-parsers,Block Join Query Parsers>>. It can only be specified on an `<entity>` element under another root entity. It switches from the default behavior (merging field values) to nesting documents as child documents.
-+
-Note: the parent `<entity>` should add a field which is used as a parent filter at query time.
-
-join="zipper"::
-Enables merge join, aka the "zipper" algorithm, for joining parent and child entities without a cache. It should be specified on the child (nested) `<entity>`. It requires that the parent and child queries return results ordered by their keys; otherwise, an exception is thrown. Keys should be specified either with the `where` attribute or with `cacheKey` and `cacheLookup`.
-
-=== Entity Caching
-Caching of entities in DIH is provided to avoid repeated lookups for the same entities. The default `SortedMapBackedCache` is a `HashMap` where a key is a field in the row and the value is a bunch of rows for that same key.
-
-In the example below, each `manufacturer` entity is cached using the `id` property as a cache key. Cache lookups will be performed for each `product` entity based on the product's `manu` property. When the cache has no data for a particular key, the query is run and the cache is populated.
-
-[source,xml]
-----
-<entity name="product" query="select description,sku, manu from product" >
-  <entity name="manufacturer" query="select id, name from manufacturer"
-          cacheKey="id" cacheLookup="product.manu" cacheImpl="SortedMapBackedCache"/>
-</entity>
-----
-
-=== The SQL Entity Processor
-
-The SqlEntityProcessor is the default processor. The associated <<JdbcDataSource>> should be configured with a JDBC URL.
-
-The entity attributes specific to this processor are described below. These are in addition to the attributes common to all entity processors described above.
-
-query::
-Required. The SQL query used to select rows.
-
-deltaQuery::
-SQL query used if the operation is `delta-import`. This query selects the primary keys of the rows which will be part of the delta update. The primary keys will be available to the `deltaImportQuery` through the variable `${dataimporter.delta.<column-name>}`.
-
-parentDeltaQuery::
-SQL query used if the operation is `delta-import`.
-
-deletedPkQuery::
-SQL query used if the operation is `delta-import`.
-
-deltaImportQuery::
-SQL query used if the operation is `delta-import`. If this is not present, DIH tries to construct the import query by modifying the 'query' after identifying the delta (this is error-prone).
-+
-There is a namespace `${dataimporter.delta.<column-name>}` which can be used in this query. For example, `select * from tbl where id=${dataimporter.delta.id}`.
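-
-Putting these together, a delta-capable entity might be sketched as follows (table and column names follow the `db` example above):
-
-[source,xml]
-----
-<entity name="item" pk="ID"
-        query="select * from item"
-        deltaQuery="select ID from item where last_modified > '${dataimporter.last_index_time}'"
-        deltaImportQuery="select * from item where ID='${dataimporter.delta.ID}'"/>
-----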
-
-=== The XPathEntityProcessor
-
-This processor is used when indexing XML formatted data. The data source is typically <<URLDataSource>> or <<FileDataSource>>. XPath can also be used with the <<The FileListEntityProcessor,FileListEntityProcessor>> described below, to generate a document from each file.
-
-The entity attributes unique to this processor are shown below. These are in addition to the attributes common to all entity processors described above.
-
-processor::
-Required. Must be set to `XPathEntityProcessor`.
-
-url::
-Required. The HTTP URL or file location.
-
-stream::
-Optional. Set to true for a large file or download.
-
-forEach::
-Required unless you define `useSolrAddSchema`. The XPath expression which demarcates each record. This will be used to set up the processing loop.
-
-xsl::
-Optional. Its value (a URL or filesystem path) is the name of a resource used as a preprocessor for applying the XSL transformation.
-
-useSolrAddSchema::
-Set this to true if the content is in the form of the standard Solr update XML schema.
-
-Each `<field>` element in the entity can have the following attributes as well as the default ones.
-
-xpath::
-Required. The XPath expression which will extract the content from the record for this field. Only a subset of XPath syntax is supported.
-
-commonField::
-Optional. If true, then when this field is encountered in a record it will be copied to future records when creating a Solr document.
-
-flatten::
-Optional. If set to true, then any children text nodes are collected to form the value of a field.
-+
-[WARNING]
-The default value is false, meaning that if there are any sub-elements of the node pointed to by the XPath expression, they will be quietly omitted.
-
-Here is an example from the `atom` collection in the `dih` example (data-config file found at `example/example-DIH/solr/atom/conf/atom-data-config.xml`):
-
-[source,xml]
-----
-<dataConfig>
-  <dataSource type="URLDataSource"/>
-  <document>
-
-    <entity name="stackoverflow"
-            url="https://stackoverflow.com/feeds/tag/solr"
-            processor="XPathEntityProcessor"
-            forEach="/feed|/feed/entry"
-            transformer="HTMLStripTransformer,RegexTransformer">
-
-      <!-- Pick this value up from the feed level and apply to all documents -->
-      <field column="lastchecked_dt" xpath="/feed/updated" commonField="true"/>
-
-      <!-- Keep only the final numeric part of the URL -->
-      <field column="id" xpath="/feed/entry/id" regex=".*/" replaceWith=""/>
-
-      <field column="title"    xpath="/feed/entry/title"/>
-      <field column="author"   xpath="/feed/entry/author/name"/>
-      <field column="category" xpath="/feed/entry/category/@term"/>
-      <field column="link"     xpath="/feed/entry/link[@rel='alternate']/@href"/>
-
-      <!-- Use transformers to convert HTML into plain text.
-        There is also an UpdateRequestProcessor to trim remaining spaces.
-      -->
-      <field column="summary" xpath="/feed/entry/summary" stripHTML="true" regex="( |\n)+" replaceWith=" "/>
-
-      <!-- Ignore namespaces when matching XPath -->
-      <field column="rank" xpath="/feed/entry/rank"/>
-
-      <field column="published_dt" xpath="/feed/entry/published"/>
-      <field column="updated_dt" xpath="/feed/entry/updated"/>
-    </entity>
-
-  </document>
-</dataConfig>
-----
-
-=== The MailEntityProcessor
-
-The MailEntityProcessor uses the Java Mail API to index email messages using the IMAP protocol.
-
-The MailEntityProcessor works by connecting to a specified mailbox using a username and password, fetching the email headers for each message, and then fetching the full email contents to construct a document (one document for each mail message).
-
-The entity attributes unique to the MailEntityProcessor are shown below. These are in addition to the attributes common to all entity processors described above.
-
-processor::
-Required. Must be set to `MailEntityProcessor`.
-
-user::
-Required. Username for authenticating to the IMAP server; this is typically the email address of the mailbox owner.
-
-password::
-Required. Password for authenticating to the IMAP server.
-
-host::
-Required. The IMAP server to connect to.
-
-protocol::
-Required. The IMAP protocol to use, valid values are: imap, imaps, gimap, and gimaps.
-
-fetchMailsSince::
-Optional. Date/time used to set a filter to import messages that occur after the specified date; expected format is: `yyyy-MM-dd HH:mm:ss`.
-
-folders::
-Required. Comma-delimited list of folder names to pull messages from, such as "inbox".
-
-recurse::
-Optional. Default is true. Flag to indicate if the processor should recurse all child folders when looking for messages to import.
-
-include::
-Optional. Comma-delimited list of folder patterns to include when processing folders (can be a literal value or regular expression).
-
-exclude::
-Optional. Comma-delimited list of folder patterns to exclude when processing folders (can be a literal value or regular expression). Excluded folder patterns take precedence over include folder patterns.
-
-processAttachement or processAttachments::
-Optional. Default is true. Use Tika to process message attachments.
-
-includeContent::
-Optional. Default is true. Include the message body when constructing Solr documents for indexing.
-
-Here is an example from the `mail` collection of the `dih` example (data-config file found at `example/example-DIH/mail/conf/mail-data-config.xml`):
-
-[source,xml]
-----
-<dataConfig>
-  <document>
-      <entity processor="MailEntityProcessor"
-              user="email@gmail.com"
-              password="password"
-              host="imap.gmail.com"
-              protocol="imaps"
-              fetchMailsSince="2014-06-30 00:00:00"
-              batchSize="20"
-              folders="inbox"
-              processAttachement="false"
-              name="mail_entity"/>
-  </document>
-</dataConfig>
-----
-
-==== Importing New Emails Only
-
-After running a full import, the MailEntityProcessor keeps track of the timestamp of the previous import so that subsequent imports can use the fetchMailsSince filter to only pull new messages from the mail server. This occurs automatically using the DataImportHandler `dataimport.properties` file (stored in `conf`).
-
-For instance, if you set `fetchMailsSince="2014-08-22 00:00:00"` in your `mail-data-config.xml`, then all mail messages that occur after this date will be imported on the first run of the importer. Subsequent imports will use the date of the previous import as the `fetchMailsSince` filter, so that only new emails since the last import are indexed each time.
-
-==== GMail Extensions
-
-When connecting to a GMail account, you can improve the efficiency of the MailEntityProcessor by setting the protocol to *gimap* or *gimaps*.
-
-This allows the processor to send the `fetchMailsSince` filter to the GMail server to have the date filter applied on the server, which means the processor only receives new messages from the server. However, GMail only supports date granularity, so the server-side filter may return previously seen messages if run more than once a day.
-
-=== The TikaEntityProcessor
-
-The TikaEntityProcessor uses Apache Tika to process incoming documents. This is similar to <<uploading-data-with-solr-cell-using-apache-tika.adoc#uploading-data-with-solr-cell-using-apache-tika,Uploading Data with Solr Cell using Apache Tika>>, but using DataImportHandler options instead.
-
-The parameters for this processor are described below. These are in addition to the attributes common to all entity processors described above.
-
-dataSource::
-This parameter defines the data source and an optional name which can be referred to in later parts of the configuration if needed. This is the same `dataSource` explained in the description of general entity processor attributes above.
-+
-The available data source types for this processor are:
-+
-* BinURLDataSource: used for HTTP resources, but can also be used for files.
-* BinContentStreamDataSource: used for uploading content as a stream.
-* BinFileDataSource: used for content on the local filesystem.
-
-url::
-Required. The path to the source file(s), as a file path or a traditional internet URL.
-
-htmlMapper::
-Optional. Allows control of how Tika parses HTML. If this parameter is defined, it must be either *default* or *identity*; if it is absent, "default" is assumed.
-+
-The "default" mapper strips much of the HTML from documents while the "identity" mapper passes all HTML as-is with no modifications.
-
-format::
-The output format. The options are *text*, *xml*, *html* or *none*. The default is "text" if not defined. The format "none" can be used if metadata only should be indexed and not the body of the documents.
-
-parser::
-Optional. The default parser is `org.apache.tika.parser.AutoDetectParser`. If a custom or other parser should be used, it should be entered as the fully qualified class name.
-
-fields::
-The list of fields from the input documents and how they should be mapped to Solr fields. If the attribute `meta` is defined as "true", the field will be obtained from the metadata of the document and not parsed from the body of the main text.
-
-extractEmbedded::
-Instructs the TikaEntityProcessor to extract embedded documents or attachments when *true*. If false, embedded documents and attachments will be ignored.
-
-onError::
-By default, the TikaEntityProcessor will stop processing documents if it finds one that generates an error. If you define `onError` to "skip", the TikaEntityProcessor will instead skip documents that fail processing and log a message that the document was skipped.
-
-Here is an example from the `tika` collection of the `dih` example (data-config file found in `example/example-DIH/tika/conf/tika-data-config.xml`):
-
-[source,xml]
-----
-<dataConfig>
-  <dataSource type="BinFileDataSource"/>
-  <document>
-    <entity name="file" processor="FileListEntityProcessor" dataSource="null"
-            baseDir="${solr.install.dir}/example/exampledocs" fileName=".*pdf"
-            rootEntity="false">
-
-      <field column="file" name="id"/>
-
-      <entity name="pdf" processor="TikaEntityProcessor"
-              url="${file.fileAbsolutePath}" format="text">
-
-        <field column="Author" name="author" meta="true"/>
-        <!-- in the original PDF, the Author meta-field name is upper-cased,
-          but in Solr schema it is lower-cased
-         -->
-
-        <field column="title" name="title" meta="true"/>
-        <field column="dc:format" name="format" meta="true"/>
-
-        <field column="text" name="text"/>
-
-      </entity>
-    </entity>
-  </document>
-</dataConfig>
-----
-
-=== The FileListEntityProcessor
-
-This processor is basically a wrapper, and is designed to generate a set of files satisfying conditions specified in the attributes which can then be passed to another processor, such as the <<The XPathEntityProcessor,XPathEntityProcessor>>.
-
-The entity information for this processor would be nested within the FileListEntityProcessor entry. It generates five implicit fields: `fileAbsolutePath`, `fileDir`, `fileSize`, `fileLastModified`, and `file`, which can be used in the nested processor. This processor does not use a data source.
-
-The attributes specific to this processor are described below:
-
-fileName::
-Required. A regular expression pattern to identify files to be included.
-
-baseDir::
-Required. The base directory (absolute path).
-
-recursive::
-Whether to search directories recursively. Default is 'false'.
-
-excludes::
-A regular expression pattern to identify files which will be excluded.
-
-newerThan::
-A date in the format `yyyy-MM-ddHH:mm:ss` or a date math expression (`NOW - 2YEARS`).
-
-olderThan::
-A date, using the same formats as newerThan.
-
-rootEntity::
-This should be set to false. This ensures that each row (filepath) emitted by this processor is considered to be a document.
-
-dataSource::
-Must be set to null.
-
-The example below shows the combination of the FileListEntityProcessor with another processor which will generate a set of fields from each file found.
-
-[source,xml]
-----
-<dataConfig>
-  <dataSource type="FileDataSource"/>
-  <document>
-    <!-- this outer processor generates a list of files satisfying the conditions
-         specified in the attributes -->
-    <entity name="f" processor="FileListEntityProcessor"
-            fileName=".*xml"
-            newerThan="'NOW-30DAYS'"
-            recursive="true"
-            rootEntity="false"
-            dataSource="null"
-            baseDir="/my/document/directory">
-
-      <!-- this processor extracts content using XPath from each file found -->
-
-      <entity name="nested" processor="XPathEntityProcessor"
-              forEach="/rootelement" url="${f.fileAbsolutePath}" >
-        <field column="name" xpath="/rootelement/name"/>
-        <field column="number" xpath="/rootelement/number"/>
-      </entity>
-    </entity>
-  </document>
-</dataConfig>
-----
-
-=== LineEntityProcessor
-
-This EntityProcessor reads all content from the data source line by line and returns a field called `rawLine` for each line read. The content is not parsed in any way; however, you may add transformers to manipulate the data within the `rawLine` field, or to create other additional fields.
-
-The lines read can be filtered by two regular expressions specified with the `acceptLineRegex` and `omitLineRegex` attributes.
-
-The LineEntityProcessor has the following attributes:
-
-url::
-A required attribute that specifies the location of the input file in a way that is compatible with the configured data source. If this value is relative and you are using FileDataSource or URLDataSource, it is assumed to be relative to `baseLoc`.
-
-acceptLineRegex::
-An optional attribute that, if present, discards any line which does not match the regular expression.
-
-omitLineRegex::
-An optional attribute that is applied after any `acceptLineRegex` and that discards any line which matches this regular expression.
-
-For example:
-
-[source,xml]
-----
-<entity name="jc"
-        processor="LineEntityProcessor"
-        acceptLineRegex="^.*\.xml$"
-        omitLineRegex="/obsolete"
-        url="file:///Volumes/ts/files.lis"
-        rootEntity="false"
-        dataSource="myURIreader1"
-        transformer="RegexTransformer,DateFormatTransformer">
-</entity>
-----
-
-While there are use cases where you might need to create a Solr document for each line read from a file, it is expected that in most cases the lines read by this processor will consist of a pathname, which in turn will be consumed by another entity processor, such as the XPathEntityProcessor.
-
-=== PlainTextEntityProcessor
-
-This EntityProcessor reads all content from the data source into a single implicit field called `plainText`. The content is not parsed in any way; however, you may add <<Transformers,transformers>> to manipulate the data within the `plainText` field as needed, or to create other additional fields.
-
-For example:
-
-[source,xml]
-----
-<entity processor="PlainTextEntityProcessor" name="x" url="http://abc.com/a.txt" dataSource="data-source-name">
-  <!-- copies the text to a field called 'text' in Solr-->
-  <field column="plainText" name="text"/>
-</entity>
-----
-
-Ensure that the dataSource is of type `DataSource<Reader>` (`FileDataSource`, `URLDataSource`).
-
-=== SolrEntityProcessor
-
-This EntityProcessor imports data from different Solr instances and cores. The data is retrieved based on a specified filter query. This EntityProcessor is useful in cases where you want to copy your Solr index and modify the data in the target index.
-
-The SolrEntityProcessor can only copy fields that are stored in the source index.
-
-The SolrEntityProcessor supports the following parameters:
-
-url::
-Required. The URL of the source Solr instance and/or core.
-
-query::
-Required. The main query to execute on the source index.
-
-fq::
-Any filter queries to execute on the source index. If more than one filter query is defined, they must be separated by a comma.
-
-rows::
-The number of rows to return for each iteration. The default is 50 rows.
-
-fl::
-A comma-separated list of fields to fetch from the source index. Note, these fields must be stored in the source Solr instance.
-
-qt::
-The search handler to use, if not the default.
-
-wt::
-The response format to use, either *javabin* or *xml*.
-
-timeout::
-The query timeout in seconds. The default is 5 minutes (300 seconds).
-
-cursorMark="true"::
-Use this to enable cursor for efficient result set scrolling.
-
-sort="id asc"::
-This should be used to specify a sort parameter referencing the uniqueKey field of the source Solr instance. See <<pagination-of-results.adoc#pagination-of-results,Pagination of Results>> for details.
-
-Here is a simple example of a SolrEntityProcessor:
-
-[source,xml]
-----
-<dataConfig>
-  <document>
-    <entity name="sep" processor="SolrEntityProcessor"
-            url="http://127.0.0.1:8983/solr/db"
-            query="*:*"
-            fl="*,orig_version_l:_version_,ignored_price_c:price_c"/>
-  </document>
-</dataConfig>
-----
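-
-As a further sketch, the parameters above can be combined to stream a filtered copy of an index with cursor-based paging. The source URL, filter query, and field list below are illustrative assumptions:
-
-[source,xml]
-----
-<dataConfig>
-  <document>
-    <entity name="sep" processor="SolrEntityProcessor"
-            url="http://127.0.0.1:8983/solr/db"
-            query="*:*"
-            fq="inStock:true"
-            rows="500"
-            fl="id,name,price"
-            wt="javabin"
-            cursorMark="true"
-            sort="id asc"/>
-  </document>
-</dataConfig>
-----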
-
-== Transformers
-
-Transformers manipulate the fields in a document returned by an entity. A transformer can create new fields or modify existing ones. You must tell the entity which transformers your import operation will be using, by adding a `transformer` attribute containing a comma-separated list to the `<entity>` element.
-
-[source,xml]
-----
-<entity name="abcde" transformer="org.apache.solr....,my.own.transformer,..." />
-----
-
-Specific transformation rules are then added to the attributes of a `<field>` element, as shown in the examples below. The transformers are applied in the order in which they are specified in the transformer attribute.
-
-The DataImportHandler contains several built-in transformers.
-You can also write your own custom transformers if necessary.
-The ScriptTransformer described below offers an alternative method for writing your own transformers.
-
-=== ClobTransformer
-
-You can use the ClobTransformer to create a string out of a CLOB in a database. A http://en.wikipedia.org/wiki/Character_large_object[CLOB] is a character large object: a collection of character data typically stored in a separate location that is referenced in the database.
-
-The ClobTransformer accepts these attributes:
-
-clob::
-Boolean value to signal if ClobTransformer should process this field or not. If this attribute is omitted, then the corresponding field is not transformed.
-
-sourceColName::
-The source column to be used as input. If this is absent, the source and target are the same.
-
-Here's an example of invoking the ClobTransformer.
-
-[source,xml]
-----
-<entity name="example" transformer="ClobTransformer" ...>
-  <field column="hugeTextField" clob="true" />
-  ...
-</entity>
-----
-
-=== The DateFormatTransformer
-
-This transformer converts dates from one format to another. This would be useful, for example, in a situation where you wanted to convert a field with a fully specified date/time into a less precise date format, for use in faceting.
-
-DateFormatTransformer applies only to fields with an attribute `dateTimeFormat`. Other fields are not modified.
-
-This transformer recognizes the following attributes:
-
-dateTimeFormat::
-The format used for parsing this field. This must comply with the syntax of the {java-javadocs}java/text/SimpleDateFormat.html[Java SimpleDateFormat] class.
-
-sourceColName::
-The column on which the dateFormat is to be applied. If this is absent, the source and target are the same.
-
-locale::
-The locale to use for date transformations. If not defined, the ROOT locale is used. It must be specified as language-country (https://tools.ietf.org/html/bcp47[BCP 47 language tag]). For example, `en-US`.
-
-Here is example code that reduces a full date to month precision, e.g., "2007-JUL":
-
-[source,xml]
-----
-<entity name="en" pk="id" transformer="DateFormatTransformer" ... >
-  ...
-  <field column="date" sourceColName="fulldate" dateTimeFormat="yyyy-MMM"/>
-</entity>
-----
-
-=== The HTMLStripTransformer
-
-You can use this transformer to strip HTML out of a field.
-
-There is one attribute for this transformer, `stripHTML`, which is a boolean value (true or false) to signal if the HTMLStripTransformer should process the field or not.
-
-For example:
-
-[source,xml]
-----
-<entity name="e" transformer="HTMLStripTransformer" ... >
-  <field column="htmlText" stripHTML="true" />
-  ...
-</entity>
-----
-
-=== The LogTransformer
-
-You can use this transformer to log data to the console or log files. For example:
-
-[source,xml]
-----
-<entity ...
-        transformer="LogTransformer"
-        logTemplate="The name is ${e.name}" logLevel="debug">
-  ....
-</entity>
-----
-
-Unlike other transformers, the LogTransformer does not apply to any field, so the attributes are applied on the entity itself.
-
-=== The NumberFormatTransformer
-
-Use this transformer to parse a number from a string, converting it into the specified format, and optionally using a different locale.
-
-NumberFormatTransformer will be applied only to fields with an attribute `formatStyle`.
-
-This transformer recognizes the following attributes:
-
-formatStyle::
-The format used for parsing this field. The value of the attribute must be one of `number`, `percent`, `integer`, or `currency`. This uses the semantics of the Java NumberFormat class.
-
-sourceColName::
-The column on which the NumberFormat is to be applied. If this attribute is absent, the source column and the target column are the same.
-
-locale::
-The locale to be used for parsing the strings. If not defined, the ROOT locale is used. It must be specified as language-country (https://tools.ietf.org/html/bcp47[BCP 47 language tag]). For example, `en-US`.
-
-For example:
-
-[source,xml]
-----
-<entity name="en" pk="id" transformer="NumberFormatTransformer" ...>
-  ...
-
-  <!-- treat this field as UK pounds -->
-
-  <field name="price_uk" column="price" formatStyle="currency" locale="en-UK"/>
-</entity>
-----
-
-=== The RegexTransformer
-
-The regex transformer helps extract or manipulate values from fields (from the source) using regular expressions. The actual class name is `org.apache.solr.handler.dataimport.RegexTransformer`, but since it belongs to the default package, the package name can be omitted.
-
-The table below describes the attributes recognized by the regex transformer.
-
-regex::
-The regular expression used to match against the value(s) of the column or `sourceColName`. If `replaceWith` is absent, each regex _group_ is taken as a value and a list of values is returned.
-
-sourceColName::
-The column on which the regex is to be applied. If not present, then the source and target are identical.
-
-splitBy::
-Used to split a string. It returns a list of values. Note, this is a regular expression, so certain characters may need to be escaped (e.g., via backslashes).
-
-groupNames::
-A comma-separated list of field column names, used where the regex contains groups and each group is to be saved to a different field. If some groups are not to be named, leave a space between the commas.
-
-replaceWith::
-Used along with regex. It is equivalent to the method `new String(<sourceColVal>).replaceAll(<regex>, <replaceWith>)`.
-
-Here is an example of configuring the regex transformer:
-
-[source,xml]
-----
-<entity name="foo" transformer="RegexTransformer"
-        query="select full_name, emailids from foo"> --<1>
-  <field column="full_name"/> --<2>
-  <field column="firstName" regex="Mr(\w*)\b.*" sourceColName="full_name"/>
-  <field column="lastName" regex="Mr.*?\b(\w*)" sourceColName="full_name"/>
-
-  <!-- another way of doing the same -->
-
-  <field column="fullName" regex="Mr(\w*)\b(.*)" groupNames="firstName,lastName"/>
-  <field column="mailId" splitBy="," sourceColName="emailids"/> --<3>
-</entity>
-----
-
-<1> In this example, `regex` and `sourceColName` are custom attributes used by the transformer.
-<2> The transformer reads the field `full_name` from the result set and transforms it to two new target fields, `firstName` and `lastName`. Even though the query returned only one column, `full_name`, in the result set, the Solr document gets two extra "derived" fields, `firstName` and `lastName`. These new fields are only created if the regex matches.
-<3> The `emailids` field in the table can be a comma-separated value. It ends up producing one or more email IDs, and we expect the `mailId` to be a multivalued field in Solr.
-
-Note that this transformer can be used to either split a string into tokens based on a `splitBy` pattern, perform a string substitution as per `replaceWith`, or assign groups within a pattern to a list of `groupNames`. It decides what to do based on the attributes `splitBy`, `replaceWith`, and `groupNames`, which are checked in that order. The first one found is acted upon and any other unrelated attributes are ignored.
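-
-For example, a `replaceWith` substitution might look like the following sketch; the `phone` column and `bar` table are illustrative assumptions:
-
-[source,xml]
-----
-<entity name="bar" transformer="RegexTransformer"
-        query="select phone from bar">
-  <!-- strip every non-digit character from the phone number -->
-  <field column="phone" regex="[^0-9]" replaceWith=""/>
-</entity>
-----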
-
-=== The ScriptTransformer
-
-The script transformer allows arbitrary transformer functions to be written in any scripting language supported by Java, such as JavaScript, JRuby, Jython, Groovy, or BeanShell. JavaScript is integrated into Java by default; you'll need to integrate other languages yourself.
-
-Each function you write must accept a row variable (which corresponds to a Java `Map<String,Object>`, thus permitting `get`, `put`, and `remove` operations). Thus you can modify the value of an existing field or add new fields. The row returned by the function is what is used downstream.
-
-The script is inserted into the DIH configuration file at the top level and is called once for each row.
-
-Here is a simple example.
-
-[source,xml]
-----
-<dataConfig>
-
-  <!-- simple script to generate a new row, converting a temperature from Fahrenheit to Centigrade -->
-
-  <script><![CDATA[
-    function f2c(row) {
-      var tempf, tempc;
-      tempf = row.get('temp_f');
-      if (tempf != null) {
-        tempc = (tempf - 32.0)*5.0/9.0;
-        row.put('temp_c', tempc);
-      }
-      return row;
-    }
-    ]]>
-  </script>
-  <document>
-
-    <!-- the function is specified as an entity attribute -->
-
-    <entity name="e1" pk="id" transformer="script:f2c" query="select * from X">
-      ....
-    </entity>
-  </document>
-</dataConfig>
-----
-
-=== The TemplateTransformer
-
-You can use the template transformer to construct or modify a field value, perhaps using the value of other fields. You can insert extra text into the template.
-
-[source,xml]
-----
-<entity name="en" pk="id" transformer="TemplateTransformer" ...>
-  ...
-  <!-- generate a full address from fields containing the component parts -->
-  <field column="full_address" template="${en.street},${en.city},${en.zip}" />
-</entity>
-----
-
-== Special Commands for DIH
-
-You can pass special commands to the DIH by adding any of the variables listed below to any row returned by any component:
-
-$skipDoc::
-Skip the current document; that is, do not add it to Solr. The value can be the string `true` or `false`.
-
-$skipRow::
-Skip the current row. The document will be added with rows from other entities. The value can be the string `true` or `false`.
-
-$deleteDocById::
-Delete a document from Solr with this ID. The value has to be the `uniqueKey` value of the document.
-
-$deleteDocByQuery::
-Delete documents from Solr using this query. The value must be a Solr Query.
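-
-For example, a ScriptTransformer can set `$skipDoc` on a row to filter documents during import. The sketch below is illustrative only: the `status` column and `items` table are assumptions, not part of any shipped example:
-
-[source,xml]
-----
-<dataConfig>
-
-  <!-- mark rows as skipped so DIH does not index them -->
-
-  <script><![CDATA[
-    function skipInactive(row) {
-      if (row.get('status') != 'ACTIVE') {
-        row.put('$skipDoc', 'true');
-      }
-      return row;
-    }
-    ]]>
-  </script>
-  <document>
-    <entity name="e" transformer="script:skipInactive"
-            query="select id, status from items">
-    </entity>
-  </document>
-</dataConfig>
-----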
diff --git a/solr/solr-ref-guide/src/velocity-response-writer.adoc b/solr/solr-ref-guide/src/velocity-response-writer.adoc
deleted file mode 100644
index c493440..0000000
--- a/solr/solr-ref-guide/src/velocity-response-writer.adoc
+++ /dev/null
@@ -1,122 +0,0 @@
-= Velocity Response Writer
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-The VelocityResponseWriter is an optional plugin available in the `contrib/velocity` directory. It powers the `/browse` user interfaces when using some example configurations such as "techproducts" and "example/files".
-
-[IMPORTANT]
-====
-The VelocityResponseWriter has been deprecated and may be removed in a future version of Solr.
-====
-
-Its JAR and dependencies must be added (via `<lib>` or solr/home lib inclusion), and must be registered in `solrconfig.xml` like this:
-
-[source,xml]
-----
-<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter">
-  <str name="template.base.dir">${velocity.template.base.dir:}</str>
-
-<!--
-  <str name="init.properties.file">velocity-init.properties</str>
-  <lst name="tools">
-    <str name="mytool">com.example.MyCustomTool</str>
-  </lst>
--->
-</queryResponseWriter>
-----
-
-== Configuration & Usage
-
-=== Template Rendering Protections
-
-Velocity template rendering is largely controlled by the `trusted` configset flag. Templates built into the component library (the `/browse` ones) are always available with this component. In a trusted configset, templates in the `velocity/` subdirectory of the configset are renderable, and when `template.base.dir` is specified, the templates in that directory are renderable as well.
-
-=== VelocityResponseWriter Initialization Parameters
-
-`template.base.dir`::
-If specified and exists as a file system directory, a file resource loader will be added for this directory. Templates in this directory will override "solr" resource loader templates.
-
-`init.properties.file`::
-Specifies a properties file name which must exist in the Solr `conf/` directory (*not* under a `velocity/` subdirectory) or the root of a JAR file in a `<lib>` directive.
-
-`tools`::
-External "tools" can be specified as list of string name/value (tool name / class name) pairs. Tools, in the Velocity context, are simply Java objects. Tool classes are constructed using a no-arg constructor (or a single-SolrCore-arg constructor if it exists) and added to the Velocity context with the specified name.
-+
-A custom registered tool can override the built-in context objects with the same name, except for `$request`, `$response`, `$page`, and `$debug` (these tools are designed to not be overridden).
-
-=== VelocityResponseWriter Request Parameters
-
-`v.template`::
-Specifies the name of the template to render.
-
-`v.layout`::
-Specifies a template name to use as the layout around the main template specified by `v.template`.
-+
-The main template is rendered into a string value included into the layout rendering as `$content`.
-
-`v.layout.enabled`::
-Determines if the main template should have a layout wrapped around it. The default is `true`, but this requires `v.layout` to be specified as well.
-
-`v.contentType`::
-Specifies the content type used in the HTTP response. If not specified, the default will depend on whether `v.json` is specified or not.
-+
-The default without `v.json=wrf`: `text/html;charset=UTF-8`.
-+
-The default with `v.json=wrf`: `application/json;charset=UTF-8`.
-
-`v.json`::
-Specifies a function name to wrap around the response rendered as JSON. If specified, the content type used in the response will be "application/json;charset=UTF-8", unless overridden by `v.contentType`.
-+
-Output will be in this format (with `v.json=wrf`):
-+
-`wrf("result":"<Velocity generated response string, with quotes and backslashes escaped>")`
-
-`v.locale`::
-Locale to use with the `$resource` tool and other LocaleConfig implementing tools. The default locale is `Locale.ROOT`. Localized resources are loaded from standard Java resource bundles named `resources[_locale-code].properties`.
-+
-Resource bundles can be added by providing a JAR file visible by the SolrResourceLoader with resource bundles under a velocity sub-directory. Resource bundles are not loadable under `conf/`, as only the class loader aspect of SolrResourceLoader can be used here.
-
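-As an illustration, a request combining several of these parameters might look like the following; the collection name `mycollection` and template name `results` are placeholders, not shipped examples:
-
-`\http://localhost:8983/solr/mycollection/select?q=*:*&wt=velocity&v.template=results&v.layout.enabled=false&v.json=wrf`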
-
-=== VelocityResponseWriter Context Objects
-
-// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
-
-[cols="30,70",options="header"]
-|===
-|Context Reference |Description
-|`request` |{solr-javadocs}solr-core/org/apache/solr/request/SolrQueryRequest.html[SolrQueryRequest] javadocs
-|`response` |{solr-javadocs}solr-core/org/apache/solr/response/SolrQueryResponse.html[QueryResponse] most of the time, but in some cases where QueryResponse doesn't like the request handler's output (https://cwiki.apache.org/confluence/display/solr/AnalysisRequestHandler[AnalysisRequestHandler], for example, causes a ClassCastException parsing "response"), the response will be a SolrResponseBase object.
-|`esc` |A Velocity http://velocity.apache.org/tools/{ivy-velocity-tools-version}/tools-summary.html#EscapeTool[EscapeTool] instance
-|`date` |A Velocity http://velocity.apache.org/tools/{ivy-velocity-tools-version}/tools-summary.html#ComparisonDateTool[ComparisonDateTool] instance
-|`math` |A Velocity http://velocity.apache.org/tools/{ivy-velocity-tools-version}/tools-summary.html#MathTool[MathTool] instance
-|`number` |A Velocity http://velocity.apache.org/tools/{ivy-velocity-tools-version}/tools-summary.html#NumberTool[NumberTool] instance
-|`sort` |A Velocity http://velocity.apache.org/tools/{ivy-velocity-tools-version}/tools-summary.html#SortTool[SortTool] instance
-|`display` |A Velocity http://velocity.apache.org/tools/{ivy-velocity-tools-version}/tools-summary.html#DisplayTool[DisplayTool] instance
-|`resource` |A Velocity http://velocity.apache.org/tools/{ivy-velocity-tools-version}/tools-summary.html#ResourceTool[ResourceTool] instance
-|`engine` |The current VelocityEngine instance
-|`page` |An instance of Solr's PageTool (only included if the response is a QueryResponse where paging makes sense)
-|`debug` |A shortcut to the debug part of the response, or null if debug is not on. This is handy for having debug-only sections in a template using `#if($debug)...#end`
-|`content` |The rendered output of the main template, when rendering the layout (`v.layout.enabled=true` and `v.layout=<template>`).
-|[custom tool(s)] |Tools provided by the optional "tools" list of the VelocityResponseWriter registration are available by their specified name.
-|===
-
-=== VelocityResponseWriter Usage
-
-To see results in an HTML user interface on your own collection, try `\http://localhost:8983/solr/<my collection>/select?q=*:*&wt=velocity&v.template=browse&v.layout=layout`.
-
-Or try `/browse` in the `techproducts` or `example/files` examples.
diff --git a/solr/solr-ref-guide/src/velocity-search-ui.adoc b/solr/solr-ref-guide/src/velocity-search-ui.adoc
deleted file mode 100644
index 96d7f93..0000000
--- a/solr/solr-ref-guide/src/velocity-search-ui.adoc
+++ /dev/null
@@ -1,26 +0,0 @@
-= Velocity Search UI
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-Solr includes a sample search UI based on the <<response-writers.adoc#velocity-writer,VelocityResponseWriter>> (also known as Solritas) that demonstrates several useful features, such as searching, faceting, highlighting, autocomplete, and geospatial search.
-
-When using the `sample_techproducts_configs` configset, you can access the Velocity sample Search UI: `\http://localhost:8983/solr/techproducts/browse`
-
-.The Velocity Search UI
-image::images/velocity-search-ui/techproducts_browse.png[image,width=500]
-
-For more information about the Velocity Response Writer, see the <<response-writers.adoc#velocity-writer,Response Writer page>>.