Posted to commits@lucene.apache.org by cp...@apache.org on 2017/05/12 13:42:57 UTC

[04/50] [abbrv] lucene-solr:jira/solr-8668: squash merge jira/solr-10290 into master

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc b/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
new file mode 100644
index 0000000..0fd66ab
--- /dev/null
+++ b/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
@@ -0,0 +1,940 @@
+= Uploading Structured Data Store Data with the Data Import Handler
+:page-shortname: uploading-structured-data-store-data-with-the-data-import-handler
+:page-permalink: uploading-structured-data-store-data-with-the-data-import-handler.html
+:toclevels: 1
+
+Many search applications store the content to be indexed in a structured data store, such as a relational database. The Data Import Handler (DIH) provides a mechanism for importing content from a data store and indexing it. In addition to relational databases, DIH can index content from HTTP-based data sources such as RSS and ATOM feeds, e-mail repositories, and structured XML where an XPath processor is used to generate fields.
+
+The `example/example-DIH` directory contains several collections that demonstrate many of the features of the Data Import Handler. To run this "```dih```" example:
+
+[source,bash]
+----
+bin/solr -e dih
+----
+
+For more information about the Data Import Handler, see https://wiki.apache.org/solr/DataImportHandler.
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-ConceptsandTerminology]]
+== Concepts and Terminology
+
+Descriptions of the Data Import Handler use several familiar terms, such as entity and processor, in specific ways, as explained in the table below.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Term |Definition
+|Datasource |As its name suggests, a datasource defines the location of the data of interest. For a database, it's a DSN. For an HTTP datasource, it's the base URL.
+|Entity |Conceptually, an entity is processed to generate a set of documents, containing multiple fields, which (after optionally being transformed in various ways) are sent to Solr for indexing. For an RDBMS data source, an entity is a view or table, which would be processed by one or more SQL statements to generate a set of rows (documents) with one or more columns (fields).
+|Processor |An entity processor does the work of extracting content from a data source, transforming it, and adding it to the index. Custom entity processors can be written to extend or replace the ones supplied.
+|Transformer |Each set of fields fetched by the entity may optionally be transformed. This process can modify the fields, create new fields, or generate multiple rows/documents from a single row. There are several built-in transformers in the DIH, which perform functions such as modifying dates and stripping HTML. It is possible to write custom transformers using the publicly available interface.
+|===
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-Configuration]]
+== Configuration
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-Configuringsolrconfig.xml]]
+=== Configuring `solrconfig.xml`
+
+The Data Import Handler has to be registered in `solrconfig.xml`. For example:
+
+[source,xml]
+----
+<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
+  <lst name="defaults">
+    <str name="config">/path/to/my/DIHconfigfile.xml</str>
+  </lst>
+</requestHandler>
+----
+
+The only required parameter is the `config` parameter, which specifies the location of the DIH configuration file that contains specifications for the data source, how to fetch data, what data to fetch, and how to process it to generate the Solr documents to be posted to the index.
+
+You can have multiple DIH configuration files. Each file would require a separate definition in the `solrconfig.xml` file, specifying a path to the file.
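+
+For example, two separate DIH configurations could be registered as two request handlers (the handler names and file names here are purely illustrative):
+
+[source,xml]
+----
+<requestHandler name="/dataimport-products" class="org.apache.solr.handler.dataimport.DataImportHandler">
+  <lst name="defaults">
+    <str name="config">dih-products.xml</str>
+  </lst>
+</requestHandler>
+<requestHandler name="/dataimport-invoices" class="org.apache.solr.handler.dataimport.DataImportHandler">
+  <lst name="defaults">
+    <str name="config">dih-invoices.xml</str>
+  </lst>
+</requestHandler>
+----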
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-ConfiguringtheDIHConfigurationFile]]
+=== Configuring the DIH Configuration File
+
+An annotated configuration file, based on the "```db```" collection in the `dih` example server, is shown below (`example/example-DIH/solr/db/conf/db-data-config.xml`). It extracts fields from the four tables defining a simple product database. More information about the parameters and options shown here is given in the sections following.
+
+[source,xml]
+----
+<dataConfig>
+<!-- The first element is the dataSource, in this case an HSQLDB database.
+     The path to the JDBC driver and the JDBC URL and login credentials are all specified here.
+     Other permissible attributes include whether or not to autocommit to Solr, the batchSize
+     used in the JDBC connection, and a 'readOnly' flag.
+     The password attribute is optional if there is no password set for the DB.
+-->
+  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:./example-DIH/hsqldb/ex" user="sa" password="secret"/>
+<!--
+Alternatively, the password can be encrypted as follows. This is the value obtained as a result of the command
+openssl enc -aes-128-cbc -a -salt -in pwd.txt
+password="U2FsdGVkX18QMjY0yfCqlfBMvAB4d3XkwY96L7gfO2o="
+When the password is encrypted, you must provide an extra attribute
+encryptKeyFile="/location/of/encryptionkey"
+This file should be a text file with a single line containing the encrypt/decrypt password
+
+-->
+<!-- A 'document' element follows, containing multiple 'entity' elements.
+     Note that 'entity' elements can be nested, and this allows the entity
+     relationships in the sample database to be mirrored here, so that we can
+     generate a denormalized Solr record which may include multiple features
+     for one item, for instance -->
+  <document>
+
+<!-- The possible attributes for the entity element are described below.
+     Entity elements may contain one or more 'field' elements, which map
+     the data source field names to Solr fields, and optionally specify
+     per-field transformations -->
+<!-- this entity is the 'root' entity. -->
+    <entity name="item" query="select * from item"
+            deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
+      <field column="NAME" name="name" />
+
+<!-- This entity is nested and reflects the one-to-many relationship between an item and its multiple features.
+     Note the use of variables; ${item.ID} is the value of the column 'ID' for the current item
+     ('item' referring to the entity name)  -->
+      <entity name="feature"
+              query="select DESCRIPTION from FEATURE where ITEM_ID='${item.ID}'"
+              deltaQuery="select ITEM_ID from FEATURE where last_modified > '${dataimporter.last_index_time}'"
+              parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}">
+        <field name="features" column="DESCRIPTION" />
+      </entity>
+      <entity name="item_category"
+              query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'"
+              deltaQuery="select ITEM_ID, CATEGORY_ID from item_category where last_modified > '${dataimporter.last_index_time}'"
+              parentDeltaQuery="select ID from item where ID=${item_category.ITEM_ID}">
+        <entity name="category"
+                query="select DESCRIPTION from category where ID = '${item_category.CATEGORY_ID}'"
+                deltaQuery="select ID from category where last_modified > '${dataimporter.last_index_time}'"
+                parentDeltaQuery="select ITEM_ID, CATEGORY_ID from item_category where CATEGORY_ID=${category.ID}">
+          <field column="description" name="cat" />
+        </entity>
+      </entity>
+    </entity>
+  </document>
+</dataConfig>
+----
+
+Datasources can still be specified in `solrconfig.xml`; they must be defined in the defaults section of the handler. However, they are not parsed until the main configuration is loaded.
+
+The entire configuration itself can be passed as a request parameter using the `dataConfig` parameter rather than using a file. When configuration errors are encountered, the error message is returned in XML format.
+
+A `reload-config` command is also supported, which is useful for validating a new configuration file, or if you want to specify a file, load it, and not have it reloaded again on import. If there is an XML mistake in the configuration, a user-friendly message is returned in XML format. You can then fix the problem and do a `reload-config`.
+
+[TIP]
+====
+
+You can also view the DIH configuration in the Solr Admin UI, which also provides an interface for importing content.
+
+====
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-RequestParameters]]
+==== Request Parameters
+
+Request parameters can be substituted in the configuration with the placeholder `${dataimporter.request.paramname}`.
+
+[source,xml]
+----
+<dataSource driver="org.hsqldb.jdbcDriver"
+            url="${dataimporter.request.jdbcurl}"
+            user="${dataimporter.request.jdbcuser}"
+            password="${dataimporter.request.jdbcpassword}" />
+----
+
+Then, these parameters can be passed to the full-import command or defined in the `<defaults>` section in `solrconfig.xml`. This example shows the parameters with the full-import command:
+
+`dataimport?command=full-import&jdbcurl=jdbc:hsqldb:./example-DIH/hsqldb/ex&jdbcuser=sa&jdbcpassword=secret`
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-DataImportHandlerCommands]]
+== Data Import Handler Commands
+
+DIH commands are sent to Solr via an HTTP request. The following operations are supported.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Command |Description
+|`abort` |Aborts an ongoing operation. The URL is `\http://<host>:<port>/solr/<collection_name>/dataimport?command=abort`.
+|`delta-import` |For incremental imports and change detection. The command is of the form `\http://<host>:<port>/solr/<collection_name>/dataimport?command=delta-import`. It supports the same clean, commit, optimize, and debug parameters as the full-import command. Only the SqlEntityProcessor supports delta imports.
+|`full-import` |A Full Import operation can be started with a URL of the form `\http://<host>:<port>/solr/<collection_name>/dataimport?command=full-import`. The command returns immediately. The operation will be started in a new thread and the _status_ attribute in the response should be shown as _busy_. The operation may take some time depending on the size of dataset. Queries to Solr are not blocked during full-imports. When a full-import command is executed, it stores the start time of the operation in a file located at `conf/dataimport.properties`. This stored timestamp is used when a delta-import operation is executed. For a list of parameters that can be passed to this command, see below.
+|`reload-config` a|
+If the configuration file has been changed and you wish to reload it without restarting Solr, run the command
+
+`\http://<host>:<port>/solr/<collection_name>/dataimport?command=reload-config`
+
+|`status` |The URL is `\http://<host>:<port>/solr/<collection_name>/dataimport?command=status`. It returns statistics on the number of documents created, deleted, queries run, rows fetched, status, and so on.
+|`show-config` |Responds with the current configuration.
+|===
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-Parametersforthefull-importCommand]]
+=== Parameters for the `full-import` Command
+
+The `full-import` command accepts the following parameters:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Parameter |Description
+|clean |Default is true. Tells whether to clean up the index before the indexing is started.
+|commit |Default is true. Tells whether to commit after the operation.
+|debug |Default is false. Runs the command in debug mode. It is used by the interactive development mode. Note that in debug mode, documents are never committed automatically. If you want to run debug mode and commit the results too, add `commit=true` as a request parameter.
+|entity |The name of an entity directly under the `<document>` tag in the configuration file. Use this to execute one or more entities selectively. Multiple "entity" parameters can be passed on to run multiple entities at once. If nothing is passed, all entities are executed.
+|optimize |Default is true. Tells Solr whether to optimize after the operation.
+|synchronous |Blocks request until import is completed. Default is `false`.
+|===
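+
+These are ordinary request parameters, so they can either be appended to the command URL or preset in the handler's `<defaults>` section in `solrconfig.xml`. A sketch of the latter (the values shown are only an illustration):
+
+[source,xml]
+----
+<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
+  <lst name="defaults">
+    <str name="config">/path/to/my/DIHconfigfile.xml</str>
+    <!-- preset full-import behavior: keep existing documents and skip the optimize step -->
+    <str name="clean">false</str>
+    <str name="optimize">false</str>
+  </lst>
+</requestHandler>
+----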
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-PropertyWriter]]
+== Property Writer
+
+The `propertyWriter` element defines the date format and locale for use with delta queries. It is an optional configuration. Add the element to the DIH configuration file, directly under the `dataConfig` element.
+
+[source,xml]
+----
+<propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss" type="SimplePropertiesWriter"
+                directory="data" filename="my_dih.properties" locale="en-US" />
+----
+
+The parameters available are:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Parameter |Description
+|dateFormat |A java.text.SimpleDateFormat to use when converting the date to text. The default is "yyyy-MM-dd HH:mm:ss".
+|type |The implementation class. Use `SimplePropertiesWriter` for non-SolrCloud installations. If using SolrCloud, use `ZKPropertiesWriter`. If this is not specified, it will default to the appropriate class depending on if SolrCloud mode is enabled.
+|directory |Used with the `SimplePropertiesWriter` only. The directory for the properties file. If not specified, the default is "conf".
+|filename |Used with the `SimplePropertiesWriter` only. The name of the properties file. If not specified, the default is the requestHandler name (as defined in `solrconfig.xml`) appended by ".properties" (i.e., "dataimport.properties").
+|locale |The locale. If not defined, the ROOT locale is used. It must be specified as language-country (https://tools.ietf.org/html/bcp47[BCP 47 language tag]). For example, `en-US`.
+|===
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-DataSources]]
+== Data Sources
+
+A data source specifies the origin of data and its type. Somewhat confusingly, some data sources are configured within the associated entity processor. Data sources can also be specified in `solrconfig.xml`, which is useful when you have multiple environments (for example, development, QA, and production) differing only in their data sources.
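+
+As a sketch (assuming the datasource properties are supplied as a named `<lst>` inside the handler defaults, mirroring the attributes of the `<dataSource>` element; the driver and URL are taken from the HSQLDB example above):
+
+[source,xml]
+----
+<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
+  <lst name="defaults">
+    <str name="config">db-data-config.xml</str>
+    <lst name="datasource">
+      <str name="driver">org.hsqldb.jdbcDriver</str>
+      <str name="url">jdbc:hsqldb:./example-DIH/hsqldb/ex</str>
+      <str name="user">sa</str>
+    </lst>
+  </lst>
+</requestHandler>
+----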
+
+You can create a custom data source by writing a class that extends `org.apache.solr.handler.dataimport.DataSource`.
+
+The mandatory attributes for a data source definition are its name and type. The name identifies the data source to an Entity element.
+
+The types of data sources available are described below.
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-ContentStreamDataSource]]
+=== ContentStreamDataSource
+
+This takes the POST data as the data source. This can be used with any EntityProcessor that uses a `DataSource<Reader>`.
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-FieldReaderDataSource]]
+=== FieldReaderDataSource
+
+This can be used where a database field contains XML which you wish to process using the XPathEntityProcessor. You would set up a configuration with both JDBC and FieldReader data sources, and two entities, as follows:
+
+[source,xml]
+----
+<dataSource name="a1" driver="org.hsqldb.jdbcDriver" ...  />
+<dataSource name="a2" type="FieldReaderDataSource" />
+<document>
+
+  <!-- processor for database -->
+
+  <entity name ="e1" dataSource="a1" processor="SqlEntityProcessor" pk="docid"
+          query="select * from t1 ...">
+
+    <!-- nested XpathEntity; the field in the parent which is to be used for
+         Xpath is set in the "datafield" attribute in place of the "url" attribute -->
+
+    <entity name="e2" dataSource="a2" processor="XPathEntityProcessor"
+            dataField="e1.fieldToUseForXPath">
+
+      <!-- Xpath configuration follows -->
+      ...
+    </entity>
+  </entity>
+</document>
+----
+
+The FieldReaderDataSource can take an `encoding` parameter, which will default to "UTF-8" if not specified.
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-FileDataSource]]
+=== FileDataSource
+
+This can be used like a <<UploadingStructuredDataStoreDatawiththeDataImportHandler-URLDataSource,URLDataSource>>, but is used to fetch content from files on disk. The only difference from URLDataSource, when accessing disk files, is how a pathname is specified.
+
+This data source accepts these optional attributes.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Optional Attribute |Description
+|basePath |The base path relative to which the value is evaluated if it is not absolute.
+|encoding |Defines the character encoding to use. If not defined, UTF-8 is used.
+|===
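+
+For example, a FileDataSource that resolves relative paths against a base directory and reads files as UTF-8 could be declared as follows (the path is illustrative):
+
+[source,xml]
+----
+<dataSource type="FileDataSource" basePath="/data/feeds" encoding="UTF-8"/>
+----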
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-JdbcDataSource]]
+=== JdbcDataSource
+
+This is the default datasource. It's used with the <<UploadingStructuredDataStoreDatawiththeDataImportHandler-TheSQLEntityProcessor,SqlEntityProcessor>>. See the example in the <<UploadingStructuredDataStoreDatawiththeDataImportHandler-FieldReaderDataSource,FieldReaderDataSource>> section for details on configuration. `JdbcDataSource` supports at least the following attributes:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attributes |Description
+|driver, url, user, password, encryptKeyFile |The usual JDBC connection properties.
+|`batchSize`
+ a|
+Passed to `Statement#setFetchSize`; the default value is 500.
+
+The MySQL driver does not honor `fetchSize` and pulls the entire result set, which often leads to an `OutOfMemoryError`. In that case, set `batchSize="-1"`; this passes `setFetchSize(Integer.MIN_VALUE)` to the driver and switches the result set to row-by-row streaming. See the sketch following this table.
+
+|===
+
+All of these attributes support property substitution via `${placeholder}` syntax.
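+
+As a sketch, a JdbcDataSource for MySQL that switches to row-by-row streaming might look like the following (the driver class, URL, and credentials are illustrative):
+
+[source,xml]
+----
+<dataSource type="JdbcDataSource"
+            driver="com.mysql.jdbc.Driver"
+            url="jdbc:mysql://localhost/mydb"
+            user="db_user"
+            password="${dataimporter.request.jdbcpassword}"
+            batchSize="-1"/>
+----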
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-URLDataSource]]
+=== URLDataSource
+
+This data source is often used with XPathEntityProcessor to fetch content from an underlying `file://` or `http://` location. Here's an example:
+
+[source,xml]
+----
+<dataSource name="a"
+            type="URLDataSource"
+            baseUrl="http://host:port/"
+            encoding="UTF-8"
+            connectionTimeout="5000"
+            readTimeout="10000"/>
+----
+
+The URLDataSource type accepts these optional parameters:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Optional Parameter |Description
+|baseUrl |Specifies a new baseUrl for pathnames. You can use this to specify host/port changes between Dev/QA/Prod environments. Using this attribute isolates the changes that need to be made to the `solrconfig.xml`.
+|connectionTimeout |Specifies the length of time in milliseconds after which the connection should time out. The default value is 5000ms.
+|encoding |By default the encoding in the response header is used. You can use this property to override the default encoding.
+|readTimeout |Specifies the length of time in milliseconds after which a read operation should time out. The default value is 10000ms.
+|===
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors]]
+== Entity Processors
+
+Entity processors extract data, transform it, and add it to a Solr index. Examples of entities include views or tables in a data store.
+
+Each processor has its own set of attributes, described in its own section below. In addition, there are non-specific attributes common to all entities which may be specified.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Use
+|dataSource |The name of a data source. If there are multiple data sources defined, use this attribute with the name of the data source for this entity.
+|name |Required. The unique name used to identify an entity.
+|pk |The primary key for the entity. It is optional, and required only when using delta-imports. It has no relation to the uniqueKey defined in `schema.xml`, but they can both be the same. It is mandatory for delta-imports, where it refers to the column name used as the primary key in `${dataimporter.delta.<column-name>}`.
+|processor |Default is SqlEntityProcessor. Required only if the data source is not an RDBMS.
+|onError |Permissible values are `abort`, `skip`, and `continue`. The default value is 'abort'. 'skip' skips the current document, and 'continue' ignores the error and continues processing.
+|preImportDeleteQuery |Before a full-import command, use this query to clean up the index instead of using '*:*'. This is honored only on an entity that is an immediate sub-child of `<document>`.
+|postImportDeleteQuery |Similar to the above, but executed after the import has completed.
+|rootEntity |By default the entities immediately under `<document>` are root entities. If this attribute is set to false, the entity directly under that entity will be treated as the root entity (and so on). For every row returned by the root entity, a document is created in Solr.
+|transformer |Optional. One or more transformers to be applied on this entity.
+|cacheImpl |Optional. A class (which must implement `DIHCache`) to use for caching this entity when doing lookups from an entity which wraps it. The provided implementation is "```SortedMapBackedCache```".
+|cacheKey |The name of a property of this entity to use as a cache key if `cacheImpl` is specified.
+|cacheLookup |An entity + property name that will be used to lookup cached instances of this entity if `cacheImpl` is specified.
+|where |An alternative way to specify `cacheKey` and `cacheLookup` concatenated with '='. For example, `where="CODE=People.COUNTRY_CODE"` is equivalent to `cacheKey="CODE" cacheLookup="People.COUNTRY_CODE"`.
+|child="true" |Enables indexing of document blocks, also known as <<uploading-data-with-index-handlers.adoc#uploading-data-with-index-handlers,Nested Child Documents>>, for searching with <<other-parsers.adoc#other-parsers,Block Join Query Parsers>>. It can only be specified on an `<entity>` nested under another root entity. It switches from the default behavior (merging field values) to nesting documents as child documents. Note: the parent `<entity>` should add a field that can be used as a parent filter at query time. See the sketch after this table.
+|join="zipper" |Enables the merge join, also known as the "zipper" algorithm, for joining parent and child entities without a cache. It should be specified on the child (nested) `<entity>`. It requires that the parent and child queries return results ordered by their keys; otherwise an exception is thrown. Keys should be specified either with the `where` attribute or with `cacheKey` and `cacheLookup`.
+|===
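+
+As a minimal sketch of block indexing with `child="true"` (the table and column names are illustrative), the nested entity below produces nested child documents instead of merged field values:
+
+[source,xml]
+----
+<entity name="blog" query="select id, title from blog">
+  <field column="title" name="title"/>
+  <entity name="comment" child="true"
+          query="select id, body from comment where blog_id='${blog.id}'">
+    <field column="body" name="comment_body"/>
+  </entity>
+</entity>
+----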
+
+Caching of entities in DIH is provided to avoid repeated lookups for the same entities. The default `SortedMapBackedCache` is a `HashMap` where the key is a field in the row and the value is the list of rows sharing that key.
+
+In the example below, each `manufacturer` entity is cached using the '`id`' property as a cache key. Cache lookups will be performed for each `product` entity based on the product's "```manu```" property. When the cache has no data for a particular key, the query is run and the cache is populated.
+
+[source,xml]
+----
+<entity name="product" query="select description,sku, manu from product" >
+  <entity name="manufacturer" query="select id, name from manufacturer"
+          cacheKey="id" cacheLookup="product.manu" cacheImpl="SortedMapBackedCache"/>
+</entity>
+----
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheSQLEntityProcessor]]
+=== The SQL Entity Processor
+
+The SqlEntityProcessor is the default processor. The associated <<UploadingStructuredDataStoreDatawiththeDataImportHandler-JdbcDataSource,data source>> should be a JDBC URL.
+
+The entity attributes specific to this processor are shown in the table below.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Use
+|query |Required. The SQL query used to select rows.
+|deltaQuery |SQL query used for delta-import. This query selects the primary keys of the rows which will be part of the delta-update. The primary keys are made available to the `deltaImportQuery` through the variable `${dataimporter.delta.<column-name>}`.
+|parentDeltaQuery |SQL query used for delta-import. It selects the primary keys of the parent rows that must be re-indexed when a row of this (child) entity changes.
+|deletedPkQuery |SQL query used for delta-import. It selects the primary keys of rows that have been deleted from the data store, so that the corresponding documents can be removed from the index.
+|deltaImportQuery |SQL query used for delta-import. If this is not present, DIH tries to construct the import query by modifying the 'query' (after identifying the delta), which is error prone. The namespace `${dataimporter.delta.<column-name>}` can be used in this query. For example, `select * from tbl where id=${dataimporter.delta.id}`. See the sketch after this table.
+|===
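+
+Building on the `db` example above, here is a sketch of an entity that declares an explicit `deltaImportQuery` rather than letting DIH rewrite the main query (the column names follow the example schema):
+
+[source,xml]
+----
+<entity name="item" pk="ID"
+        query="select * from item"
+        deltaQuery="select ID from item where last_modified > '${dataimporter.last_index_time}'"
+        deltaImportQuery="select * from item where ID='${dataimporter.delta.ID}'">
+  <field column="NAME" name="name"/>
+</entity>
+----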
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheXPathEntityProcessor]]
+=== The XPathEntityProcessor
+
+This processor is used when indexing XML formatted data. The data source is typically <<UploadingStructuredDataStoreDatawiththeDataImportHandler-URLDataSource,URLDataSource>> or <<UploadingStructuredDataStoreDatawiththeDataImportHandler-FileDataSource,FileDataSource>>. XPath can also be used with the <<UploadingStructuredDataStoreDatawiththeDataImportHandler-TheFileListEntityProcessor,FileListEntityProcessor>> described below, to generate a document from each file.
+
+The entity attributes unique to this processor are shown below.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Use
+|processor |Required. Must be set to "XPathEntityProcessor".
+|url |Required. HTTP URL or file location.
+|stream |Optional: Set to true for a large file or download.
+|forEach |Required unless you define `useSolrAddSchema`. The XPath expression which demarcates each record. This will be used to set up the processing loop.
+|xsl |Optional: Its value (a URL or filesystem path) is the name of a resource used as a preprocessor for applying the XSL transformation.
+|useSolrAddSchema |Set this to true if the content is in the form of the standard Solr update XML schema.
+|===
+
+Each field element in the entity can have the following attributes as well as the default ones.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Use
+|xpath |Required. The XPath expression which will extract the content from the record for this field. Only a subset of XPath syntax is supported.
+|commonField |Optional. If true, then when this field is encountered in a record it will be copied to future records when creating a Solr document.
+|flatten |Optional. If set to true, then any child text nodes are collected to form the value of the field. The default value is false, meaning that if there are any sub-elements of the node pointed to by the XPath expression, they will be quietly omitted.
+|===
+
+Here is an example from the "```rss```" collection in the `dih` example (`example/example-DIH/solr/rss/conf/rss-data-config.xml`):
+
+[source,xml]
+----
+<!-- slashdot RSS Feed -->
+<dataConfig>
+  <dataSource type="HttpDataSource" />
+    <document>
+      <entity name="slashdot"
+              pk="link"
+              url="http://rss.slashdot.org/Slashdot/slashdot"
+              processor="XPathEntityProcessor"
+              transformer="DateFormatTransformer"
+              forEach="/RDF/channel | /RDF/item" >
+          <!-- NOTE: forEach sets up a processing loop ; here there are two expressions -->
+      <field column="source" xpath="/RDF/channel/title" commonField="true" />
+      <field column="source-link" xpath="/RDF/channel/link" commonField="true"/>
+      <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
+      <field column="title" xpath="/RDF/item/title" />
+      <field column="link" xpath="/RDF/item/link" />
+      <field column="description" xpath="/RDF/item/description" />
+      <field column="creator" xpath="/RDF/item/creator" />
+      <field column="item-subject" xpath="/RDF/item/subject" />
+      <field column="date" xpath="/RDF/item/date"
+             dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" />
+      <field column="slash-department" xpath="/RDF/item/department" />
+      <field column="slash-section" xpath="/RDF/item/section" />
+      <field column="slash-comments" xpath="/RDF/item/comments" />
+    </entity>
+  </document>
+</dataConfig>
+----
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheMailEntityProcessor]]
+=== The MailEntityProcessor
+
+The MailEntityProcessor uses the Java Mail API to index email messages using the IMAP protocol. The MailEntityProcessor works by connecting to a specified mailbox using a username and password, fetching the email headers for each message, and then fetching the full email contents to construct a document (one document for each mail message).
+
+Here is an example from the "```mail```" collection of the `dih` example (`example/example-DIH/solr/mail/conf/mail-data-config.xml`):
+
+[source,xml]
+----
+<dataConfig>
+  <document>
+      <entity processor="MailEntityProcessor"
+              user="email@gmail.com"
+              password="password"
+              host="imap.gmail.com"
+              protocol="imaps"
+              fetchMailsSince="2009-09-20 00:00:00"
+              batchSize="20"
+              folders="inbox"
+              processAttachement="false"
+              name="sample_entity"/>
+  </document>
+</dataConfig>
+----
+
+The entity attributes unique to the MailEntityProcessor are shown below.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Use
+|processor |Required. Must be set to "MailEntityProcessor".
+|user |Required. Username for authenticating to the IMAP server; this is typically the email address of the mailbox owner.
+|password |Required. Password for authenticating to the IMAP server.
+|host |Required. The IMAP server to connect to.
+|protocol |Required. The IMAP protocol to use, valid values are: imap, imaps, gimap, and gimaps.
+|fetchMailsSince |Optional. Date/time used to set a filter to import messages that occur after the specified date; expected format is: `yyyy-MM-dd HH:mm:ss`.
+|folders |Required. Comma-delimited list of folder names to pull messages from, such as "inbox".
+|recurse |Optional (default is true). Flag to indicate if the processor should recurse all child folders when looking for messages to import.
+|include |Optional. Comma-delimited list of folder patterns to include when processing folders (can be a literal value or regular expression).
+|exclude |Optional. Comma-delimited list of folder patterns to exclude when processing folders (can be a literal value or regular expression); excluded folder patterns take precedence over include folder patterns.
+|processAttachement (or processAttachments) |Optional (default is true). Use Tika to process message attachments.
+|includeContent |Optional (default is true). Include the message body when constructing Solr documents for indexing.
+|===
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-ImportingNewEmailsOnly]]
+==== Importing New Emails Only
+
+After running a full import, the MailEntityProcessor keeps track of the timestamp of the previous import so that subsequent imports can use the fetchMailsSince filter to only pull new messages from the mail server. This occurs automatically using the Data Import Handler dataimport.properties file (stored in conf). For instance, if you set `fetchMailsSince="2014-08-22 00:00:00"` in your `mail-data-config.xml`, then all mail messages that occur after this date will be imported on the first run of the importer. Subsequent imports will use the date of the previous import as the fetchMailsSince filter, so that only new emails since the last import are indexed each time.
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-GMailExtensions]]
+==== GMail Extensions
+
+When connecting to a GMail account, you can improve the efficiency of the MailEntityProcessor by setting the protocol to *gimap* or *gimaps*. This allows the processor to send the fetchMailsSince filter to the GMail server to have the date filter applied on the server, which means the processor only receives new messages from the server. However, GMail only supports date granularity, so the server-side filter may return previously seen messages if run more than once a day.
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheTikaEntityProcessor]]
+=== The TikaEntityProcessor
+
+The TikaEntityProcessor uses Apache Tika to process incoming documents. This is similar to <<uploading-data-with-solr-cell-using-apache-tika.adoc#uploading-data-with-solr-cell-using-apache-tika,Uploading Data with Solr Cell using Apache Tika>>, but using the DataImportHandler options instead.
+
+Here is an example from the "```tika```" collection of the `dih` example (`example/example-DIH/solr/tika/conf/tika-data-config.xml`):
+
+[source,xml]
+----
+<dataConfig>
+  <dataSource type="BinFileDataSource" />
+  <document>
+    <entity name="tika-test" processor="TikaEntityProcessor"
+            url="../contrib/extraction/src/test-files/extraction/solr-word.pdf" format="text">
+      <field column="Author" name="author" meta="true"/>
+      <field column="title" name="title" meta="true"/>
+      <field column="text" name="text"/>
+    </entity>
+  </document>
+</dataConfig>
+----
+
+The parameters for this processor are described in the table below:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Use
+|dataSource a|
+This parameter defines the data source and an optional name which can be referred to in later parts of the configuration if needed. This is the same dataSource explained in the description of general entity processor attributes above. The available data source types for this processor are:
+
+* BinURLDataSource: used for HTTP resources, but can also be used for files.
+* BinContentStreamDataSource: used for uploading content as a stream.
+* BinFileDataSource: used for content on the local filesystem.
+
+|url |The path to the source file(s), as a file path or a traditional internet URL. This parameter is required.
+|htmlMapper |Allows control of how Tika parses HTML. The "default" mapper strips much of the HTML from documents while the "identity" mapper passes all HTML as-is with no modifications. If this parameter is defined, it must be either *default* or *identity*; if it is absent, "default" is assumed.
+|format |The output format. The options are *text*, *xml*, *html* or *none*. The default is "text" if not defined. The format "none" can be used if metadata only should be indexed and not the body of the documents.
+|parser |The default parser is `org.apache.tika.parser.AutoDetectParser`. If a custom or other parser should be used, it should be entered as a fully-qualified name of the class and path.
+|fields |The list of fields from the input documents and how they should be mapped to Solr fields. If the attribute `meta` is defined as "true", the field will be obtained from the metadata of the document and not parsed from the body of the main text.
+|extractEmbedded |Instructs the TikaEntityProcessor to extract embedded documents or attachments when *true*. If false, embedded documents and attachments will be ignored.
+|onError |By default, the TikaEntityProcessor will stop processing documents if it finds one that generates an error. If you define `onError` to "skip", the TikaEntityProcessor will instead skip documents that fail processing and log a message that the document was skipped.
+|===
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheFileListEntityProcessor]]
+=== The FileListEntityProcessor
+
+This processor is basically a wrapper, and is designed to generate a set of files satisfying conditions specified in the attributes, which can then be passed to another processor, such as the <<UploadingStructuredDataStoreDatawiththeDataImportHandler-TheXPathEntityProcessor,XPathEntityProcessor>>. The entity information for this processor would be nested within the FileListEntity entry. It generates five implicit fields: `fileAbsolutePath`, `fileDir`, `fileSize`, `fileLastModified`, and `file`, which can be used in the nested processor. This processor does not use a data source.
+
+The attributes specific to this processor are described in the table below:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Use
+|fileName |Required. A regular expression pattern to identify files to be included.
+|baseDir |Required. The base directory (absolute path).
+|recursive |Whether to search directories recursively. Default is 'false'.
+|excludes |A regular expression pattern to identify files which will be excluded.
+|newerThan |A date in the format `yyyy-MM-dd HH:mm:ss` or a date math expression (`NOW - 2YEARS`).
+|olderThan |A date, using the same formats as newerThan.
+|rootEntity |This should be set to false. This ensures that each row (filepath) emitted by this processor is considered to be a document.
+|dataSource |Must be set to null.
+|===
+
+The example below shows the combination of the FileListEntityProcessor with another processor which will generate a set of fields from each file found.
+
+[source,xml]
+----
+<dataConfig>
+  <dataSource type="FileDataSource"/>
+  <document>
+    <!-- this outer processor generates a list of files satisfying the conditions
+         specified in the attributes -->
+    <entity name="f" processor="FileListEntityProcessor"
+            fileName=".*xml"
+            newerThan="'NOW-30DAYS'"
+            recursive="true"
+            rootEntity="false"
+            dataSource="null"
+            baseDir="/my/document/directory">
+
+      <!-- this processor extracts content using Xpath from each file found -->
+
+      <entity name="nested" processor="XPathEntityProcessor"
+              forEach="/rootelement" url="${f.fileAbsolutePath}" >
+        <field column="name" xpath="/rootelement/name"/>
+        <field column="number" xpath="/rootelement/number"/>
+      </entity>
+    </entity>
+  </document>
+</dataConfig>
+----
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-LineEntityProcessor]]
+=== LineEntityProcessor
+
+This EntityProcessor reads all content from the data source on a line by line basis and returns a field called `rawLine` for each line read. The content is not parsed in any way; however, you may add transformers to manipulate the data within the `rawLine` field, or to create other additional fields.
+
+The lines read can be filtered by two regular expressions specified with the `acceptLineRegex` and `omitLineRegex` attributes. The table below describes the LineEntityProcessor's attributes:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Description
+|url |A required attribute that specifies the location of the input file in a way that is compatible with the configured data source. If this value is relative and you are using FileDataSource or URLDataSource, it is assumed to be relative to baseLoc.
+|acceptLineRegex |An optional attribute that, if present, discards any line which does not match the regular expression.
+|omitLineRegex |An optional attribute that is applied after any acceptLineRegex and that discards any line which matches the regular expression.
+|===
+
+For example:
+
+[source,xml]
+----
+<entity name="jc"
+        processor="LineEntityProcessor"
+        acceptLineRegex="^.*\.xml$"
+        omitLineRegex="/obsolete"
+        url="file:///Volumes/ts/files.lis"
+        rootEntity="false"
+        dataSource="myURIreader1"
+        transformer="RegexTransformer,DateFormatTransformer">
+  ...
+----
+
+While there are use cases where you might need to create a Solr document for each line read from a file, it is expected that in most cases the lines read by this processor will consist of a pathname, which in turn will be consumed by another EntityProcessor, such as XPathEntityProcessor.
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-PlainTextEntityProcessor]]
+=== PlainTextEntityProcessor
+
+This EntityProcessor reads all content from the data source into a single implicit field called `plainText`. The content is not parsed in any way; however, you may add transformers to manipulate the data within the `plainText` field as needed, or to create other additional fields.
+
+For example:
+
+[source,xml]
+----
+<entity processor="PlainTextEntityProcessor" name="x" url="http://abc.com/a.txt" dataSource="data-source-name">
+  <!-- copies the text to a field called 'text' in Solr-->
+  <field column="plainText" name="text"/>
+</entity>
+----
+
+Ensure that the dataSource is of type `DataSource<Reader>` (`FileDataSource`, `URLDataSource`).
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-SolrEntityProcessor]]
+=== SolrEntityProcessor
+
+This processor uses another Solr instance as a data source; see https://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor. In addition, SolrEntityProcessor supports the following parameters:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Parameter |Description
+|cursorMark="true" |Specify this to enable a cursor for efficient result set scrolling.
+|sort="id asc" |Used with `cursorMark`; you usually need to specify a sort parameter referencing the uniqueKey field. See <<pagination-of-results.adoc#pagination-of-results,Pagination of Results>> for details.
+|===
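+
+A minimal sketch of an entity that pulls documents from another Solr instance with a cursor (the URL and query are illustrative; the `url` and `query` attributes are documented on the wiki page linked above):
+
+[source,xml]
+----
+<entity name="remote" processor="SolrEntityProcessor"
+        url="http://localhost:8983/solr/source_collection"
+        query="*:*"
+        cursorMark="true"
+        sort="id asc"/>
+----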
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-Transformers]]
+== Transformers
+
+Transformers manipulate the fields in a document returned by an entity. A transformer can create new fields or modify existing ones. You must tell the entity which transformers your import operation will be using, by adding an attribute containing a comma-separated list to the `<entity>` element.
+
+[source,xml]
+----
+<entity name="abcde" transformer="org.apache.solr....,my.own.transformer,..." />
+----
+
+Specific transformation rules are then added to the attributes of a `<field>` element, as shown in the examples below. The transformers are applied in the order in which they are specified in the transformer attribute.
+
+The Data Import Handler contains several built-in transformers. You can also write your own custom transformers, as described in the Solr Wiki (see http://wiki.apache.org/solr/DIHCustomTransformer). The ScriptTransformer (described below) offers an alternative method for writing your own transformers.
+
+Solr includes the following built-in transformers:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="40,60",options="header"]
+|===
+|Transformer Name |Use
+|<<UploadingStructuredDataStoreDatawiththeDataImportHandler-ClobTransformer,ClobTransformer>> |Used to create a string out of a CLOB type in the database.
+|<<UploadingStructuredDataStoreDatawiththeDataImportHandler-TheDateFormatTransformer,DateFormatTransformer>> |Parse date/time instances.
+|<<UploadingStructuredDataStoreDatawiththeDataImportHandler-TheHTMLStripTransformer,HTMLStripTransformer>> |Strip HTML from a field.
+|<<UploadingStructuredDataStoreDatawiththeDataImportHandler-TheLogTransformer,LogTransformer>> |Used to log data to log files or a console.
+|<<UploadingStructuredDataStoreDatawiththeDataImportHandler-TheNumberFormatTransformer,NumberFormatTransformer>> |Uses the NumberFormat class in Java to parse a string into a number.
+|<<UploadingStructuredDataStoreDatawiththeDataImportHandler-TheRegexTransformer,RegexTransformer>> |Use regular expressions to manipulate fields.
+|<<UploadingStructuredDataStoreDatawiththeDataImportHandler-TheScriptTransformer,ScriptTransformer>> |Write transformers in Javascript or any other scripting language supported by Java.
+|<<UploadingStructuredDataStoreDatawiththeDataImportHandler-TheTemplateTransformer,TemplateTransformer>> |Transform a field using a template.
+|===
+
+These transformers are described below.
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-ClobTransformer]]
+=== ClobTransformer
+
+You can use the ClobTransformer to create a string out of a CLOB in a database. A CLOB is a character large object: a collection of character data typically stored in a separate location that is referenced in the database. See http://en.wikipedia.org/wiki/Character_large_object. Here's an example of invoking the ClobTransformer.
+
+[source,xml]
+----
+<entity name="e" transformer="ClobTransformer" ...>
+  <field column="hugeTextField" clob="true" />
+  ...
+</entity>
+----
+
+The ClobTransformer accepts these attributes:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Description
+|clob |Boolean value to signal if ClobTransformer should process this field or not. If this attribute is omitted, then the corresponding field is not transformed.
+|sourceColName |The source column to be used as input. If this is absent, the source and target are the same.
+|===
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheDateFormatTransformer]]
+=== The DateFormatTransformer
+
+This transformer converts dates from one format to another. This would be useful, for example, in a situation where you wanted to convert a field with a fully specified date/time into a less precise date format, for use in faceting.
+
+DateFormatTransformer applies only on the fields with an attribute `dateTimeFormat`. Other fields are not modified.
+
+This transformer recognizes the following attributes:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Description
+|dateTimeFormat |The format used for parsing this field. This must comply with the syntax of the http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html[Java SimpleDateFormat] class.
+|sourceColName |The column on which the dateFormat is to be applied. If this is absent, the source and target are the same.
+|locale |The locale to use for date transformations. If not defined, the ROOT locale is used. It must be specified as language-country (https://tools.ietf.org/html/bcp47[BCP 47 language tag]). For example, `en-US`.
+|===
+
+Here is example code that returns the date rounded up to the month "2007-JUL":
+
+[source,xml]
+----
+<entity name="en" pk="id" transformer="DateFormatTransformer" ... >
+  ...
+  <field column="date" sourceColName="fulldate" dateTimeFormat="yyyy-MMM"/>
+</entity>
+----
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheHTMLStripTransformer]]
+=== The HTMLStripTransformer
+
+You can use this transformer to strip HTML out of a field. For example:
+
+[source,xml]
+----
+<entity name="e" transformer="HTMLStripTransformer" ... >
+  <field column="htmlText" stripHTML="true" />
+  ...
+</entity>
+----
+
+There is one attribute for this transformer, `stripHTML`, which is a boolean value (true/false) to signal if the HTMLStripTransformer should process the field or not.
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheLogTransformer]]
+=== The LogTransformer
+
+You can use this transformer to log data to the console or log files. For example:
+
+[source,xml]
+----
+<entity ...
+        transformer="LogTransformer"
+        logTemplate="The name is ${e.name}" logLevel="debug">
+  ....
+</entity>
+----
+
+Unlike other transformers, the LogTransformer does not apply to any field, so the attributes are applied on the entity itself.
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheNumberFormatTransformer]]
+=== The NumberFormatTransformer
+
+Use this transformer to parse a number from a string, converting it into the specified format, and optionally using a different locale.
+
+NumberFormatTransformer will be applied only to fields with an attribute `formatStyle`.
+
+This transformer recognizes the following attributes:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Description
+|formatStyle |The format used for parsing this field. The value of the attribute must be one of (`number\|percent\|integer\|currency`). This uses the semantics of the Java NumberFormat class.
+|sourceColName |The column on which the NumberFormat is to be applied. If this attribute is absent, the source column and the target column are the same.
+|locale |The locale to be used for parsing the strings. If not defined, the ROOT locale is used. It must be specified as language-country (https://tools.ietf.org/html/bcp47[BCP 47 language tag]). For example, `en-US`.
+|===
+
+For example:
+
+[source,xml]
+----
+<entity name="en" pk="id" transformer="NumberFormatTransformer" ...>
+  ...
+
+  <!-- treat this field as UK pounds -->
+
+  <field name="price_uk" column="price" formatStyle="currency" locale="en-GB"/>
+</entity>
+----
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheRegexTransformer]]
+=== The RegexTransformer
+
+The regex transformer helps in extracting or manipulating values from fields (from the source) using Regular Expressions. The actual class name is `org.apache.solr.handler.dataimport.RegexTransformer`. However, because it belongs to the built-in DIH transformer package, the package name can be omitted.
+
+The table below describes the attributes recognized by the regex transformer.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Description
+|regex |The regular expression that is used to match against the column or sourceColName's value(s). If replaceWith is absent, each regex _group_ is taken as a value and a list of values is returned.
+|sourceColName |The column on which the regex is to be applied. If not present, then the source and target are identical.
+|splitBy |Used to split a string. It returns a list of values. Note: this is a regular expression, so it may need to be escaped (e.g., via backslashes).
+|groupNames |A comma-separated list of field column names, used where the regex contains groups and each group is to be saved to a different field. If some groups are not to be named, leave a space between commas.
+|replaceWith |Used along with `regex`. It is equivalent to the method `new String(<sourceColVal>).replaceAll(<regex>, <replaceWith>)`.
+|===
+
+Here is an example of configuring the regex transformer:
+
+[source,xml]
+----
+<entity name="foo" transformer="RegexTransformer"
+        query="select full_name, emailids from foo">
+  <field column="full_name"/>
+  <field column="firstName" regex="Mr(\w*)\b.*" sourceColName="full_name"/>
+  <field column="lastName" regex="Mr.*?\b(\w*)" sourceColName="full_name"/>
+
+  <!-- another way of doing the same -->
+
+  <field column="fullName" regex="Mr(\w*)\b(.*)" groupNames="firstName,lastName"/>
+  <field column="mailId" splitBy="," sourceColName="emailids"/>
+</entity>
+----
+
+In this example, regex and sourceColName are custom attributes used by the transformer. The transformer reads the field `full_name` from the resultset and transforms it to two new target fields, `firstName` and `lastName`. Even though the query returned only one column, `full_name`, in the result set, the Solr document gets two extra fields `firstName` and `lastName` which are "derived" fields. These new fields are only created if the regexp matches.
+
+The emailids field in the table can be a comma-separated value. It ends up producing one or more email IDs, and we expect the `mailId` to be a multivalued field in Solr.
+
+Note that this transformer can either be used to split a string into tokens based on a splitBy pattern, or to perform a string substitution as per replaceWith, or it can assign groups within a pattern to a list of groupNames. It decides what to do based upon the attributes `splitBy`, `replaceWith`, and `groupNames`, which are looked for in that order. The first one found is acted upon, and the other unrelated attributes are ignored.
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheScriptTransformer]]
+=== The ScriptTransformer
+
+The script transformer allows arbitrary transformer functions to be written in any scripting language supported by Java, such as JavaScript, JRuby, Jython, Groovy, or BeanShell. JavaScript is integrated into Java 8; you'll need to integrate other languages yourself.
+
+Each function you write must accept a row variable (which corresponds to a Java `Map<String,Object>`, thus permitting `get`, `put`, and `remove` operations). Thus you can modify the value of an existing field or add new fields. The return value of the function is the returned object.
+
+The script is inserted into the DIH configuration file at the top level and is called once for each row.
+
+Here is a simple example.
+
+[source,xml]
+----
+<dataConfig>
+
+  <!-- simple script to generate a new row, converting a temperature from Fahrenheit to Centigrade -->
+
+  <script><![CDATA[
+    function f2c(row) {
+      var tempf, tempc;
+      tempf = row.get('temp_f');
+      if (tempf != null) {
+        tempc = (tempf - 32.0)*5.0/9.0;
+        row.put('temp_c', tempc);
+      }
+      return row;
+    }
+    ]]>
+  </script>
+  <document>
+
+    <!-- the function is specified as an entity attribute -->
+
+    <entity name="e1" pk="id" transformer="script:f2c" query="select * from X">
+      ....
+    </entity>
+  </document>
+</dataConfig>
+----
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheTemplateTransformer]]
+=== The TemplateTransformer
+
+You can use the template transformer to construct or modify a field value, perhaps using the value of other fields. You can insert extra text into the template.
+
+[source,xml]
+----
+<entity name="en" pk="id" transformer="TemplateTransformer" ...>
+  ...
+  <!-- generate a full address from fields containing the component parts -->
+  <field column="full_address" template="${en.street},${en.city},${en.zip}" />
+</entity>
+----
+
+[[UploadingStructuredDataStoreDatawiththeDataImportHandler-SpecialCommandsfortheDataImportHandler]]
+== Special Commands for the Data Import Handler
+
+You can pass special commands to the DIH by adding any of the variables listed below to any row returned by any component:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Variable |Description
+|$skipDoc |Skip the current document; that is, do not add it to Solr. The value can be the string `true\|false`.
+|$skipRow |Skip the current row. The document will be added with rows from other entities. The value can be the string `true\|false`.
+|$deleteDocById |Delete a document from Solr with this ID. The value has to be the `uniqueKey` value of the document.
+|$deleteDocByQuery |Delete documents from Solr using this query. The value must be a Solr Query.
+|===

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/using-javascript.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/using-javascript.adoc b/solr/solr-ref-guide/src/using-javascript.adoc
new file mode 100644
index 0000000..025aaa3
--- /dev/null
+++ b/solr/solr-ref-guide/src/using-javascript.adoc
@@ -0,0 +1,13 @@
+= Using JavaScript
+:page-shortname: using-javascript
+:page-permalink: using-javascript.html
+
+Using Solr from JavaScript clients is so straightforward that it deserves a special mention: there is no client API. You don't need to install any packages or configure anything.
+
+HTTP requests can be sent to Solr using the standard `XMLHttpRequest` mechanism.
+
+Out of the box, Solr can send <<response-writers.adoc#ResponseWriters-JSONResponseWriter,JavaScript Object Notation (JSON) responses>>, which are easily interpreted in JavaScript. Just add `wt=json` to the request URL to have responses sent as JSON.
+
+For more information and an excellent example, read the SolJSON page on the Solr Wiki:
+
+http://wiki.apache.org/solr/SolJSON

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/using-jmx-with-solr.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/using-jmx-with-solr.adoc b/solr/solr-ref-guide/src/using-jmx-with-solr.adoc
new file mode 100644
index 0000000..e69df27
--- /dev/null
+++ b/solr/solr-ref-guide/src/using-jmx-with-solr.adoc
@@ -0,0 +1,108 @@
+= Using JMX with Solr
+:page-shortname: using-jmx-with-solr
+:page-permalink: using-jmx-with-solr.html
+
+http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html[Java Management Extensions (JMX)] is a technology that makes it possible for complex systems to be controlled by tools without the systems and tools having any previous knowledge of each other. In essence, it is a standard interface by which complex systems can be viewed and manipulated.
+
+Solr, like any other good citizen of the Java universe, can be controlled via a JMX interface. You can enable JMX support by adding lines to `solrconfig.xml`. You can use a JMX client, like jconsole, to connect with Solr. Check out the Wiki page http://wiki.apache.org/solr/SolrJmx for more information. You may also find the following overview of JMX to be useful: http://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html.
+
+[[UsingJMXwithSolr-ConfiguringJMX]]
+== Configuring JMX
+
+JMX configuration is provided in `solrconfig.xml`. Please see the http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html[JMX Technology Home Page] for more details.
+
+A `rootName` attribute can be used when configuring `<jmx />` in `solrconfig.xml`. If this attribute is set, Solr uses it as the root name for all the MBeans that Solr exposes via JMX. The default name is "solr" followed by the core name.
+
+[IMPORTANT]
+====
+
+Enabling/disabling JMX and securing access to MBeanServers is left up to the user by specifying appropriate JVM parameters and configuration. Please explore the http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html[JMX Technology Home Page] for more details.
+
+====
+
+[[UsingJMXwithSolr-ConfiguringanExistingMBeanServer]]
+=== Configuring an Existing MBeanServer
+
+The command:
+
+[source,xml]
+----
+<jmx />
+----
+
+enables JMX support in Solr if and only if an existing MBeanServer is found. Use this if you want to configure JMX with JVM parameters. Remove this to disable exposing Solr configuration and statistics to JMX. If this is specified, Solr will try to list all available MBeanServers and use the first one to register MBeans.
+
+[[UsingJMXwithSolr-ConfiguringanExistingMBeanServerwithagentId]]
+=== Configuring an Existing MBeanServer with agentId
+
+The command:
+
+[source,xml]
+----
+<jmx agentId="myMBeanServer" />
+----
+
+enables JMX support in Solr if and only if an existing MBeanServer is found matching the given agentId. If multiple servers are found, the first one is used. If none is found, an exception is raised and depending on the configuration, Solr may refuse to start.
+
+[[UsingJMXwithSolr-ConfiguringaNewMBeanServer]]
+=== Configuring a New MBeanServer
+
+The command:
+
+[source,xml]
+----
+<jmx serviceUrl="service:jmx:rmi:///jndi/rmi://localhost:9999/solrjmx" />
+----
+
+creates a new MBeanServer exposed for remote monitoring at the specified service URL. If the JMXConnectorServer can't be started (probably because the serviceUrl is bad), an exception is thrown.
+
+[[UsingJMXwithSolr-Example]]
+==== Example
+
+Solr's `sample_techproducts_configs` config set uses the simple `<jmx />` configuration option. If you start the example with the necessary JVM system properties to launch an internal MBeanServer, Solr will register with it and you can connect using a tool like `jconsole`:
+
+1.  Launch the `techproducts` example with JMX enabled:
++
+[source,bash]
+----
+bin/solr -e techproducts -Dcom.sun.management.jmxremote
+----
+2.  Start `jconsole` (provided with the Sun JDK in the bin directory).
+3.  Connect to the "`start.jar`" shown in the list of local processes.
+4.  Switch to the "MBeans" tab. You should be able to see "`solr/techproducts`" listed there, at which point you can drill down and see details of every Solr plugin.
+
+[[UsingJMXwithSolr-ConfiguringaRemoteConnectiontoSolrJMX]]
+=== Configuring a Remote Connection to Solr JMX
+
+If you need to attach a JMX-enabled Java profiling tool, such as JConsole or VisualVM, to a remote Solr server, then you need to enable remote JMX access when starting the Solr server. Simply change the `ENABLE_REMOTE_JMX_OPTS` property in the include file to true. You’ll also need to choose a port for the JMX RMI connector to bind to, such as 18983. For example, if your Solr include script sets:
+
+[source,bash]
+----
+ENABLE_REMOTE_JMX_OPTS=true
+RMI_PORT=18983
+----
+
+The JMX RMI connector will allow Java profiling tools to attach to port 18983. When enabled, the following properties are passed to the JVM when starting Solr:
+
+[source,plain]
+----
+-Dcom.sun.management.jmxremote \
+-Dcom.sun.management.jmxremote.local.only=false \
+-Dcom.sun.management.jmxremote.ssl=false \
+-Dcom.sun.management.jmxremote.authenticate=false \
+-Dcom.sun.management.jmxremote.port=18983 \
+-Dcom.sun.management.jmxremote.rmi.port=18983
+----
+
+We don’t recommend enabling remote JMX access in production, but it can sometimes be useful when doing performance and user-acceptance testing prior to going into production.
+
+For more information about these settings, see:
+
+http://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html
+
+[IMPORTANT]
+====
+
+Making JMX connections into machines running behind NATs (e.g. Amazon's EC2 service) is not a simple task. The `java.rmi.server.hostname` system property may help, but running `jconsole` on the server itself and using a remote desktop is often the simplest solution. See http://web.archive.org/web/20130525022506/http://jmsbrdy.com/monitoring-java-applications-running-on-ec2-i.
+
+====

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/using-python.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/using-python.adoc b/solr/solr-ref-guide/src/using-python.adoc
new file mode 100644
index 0000000..00c9305
--- /dev/null
+++ b/solr/solr-ref-guide/src/using-python.adoc
@@ -0,0 +1,61 @@
+= Using Python
+:page-shortname: using-python
+:page-permalink: using-python.html
+
+Solr includes an output format specifically for <<response-writers.adoc#ResponseWriters-PythonResponseWriter,Python>>, but <<response-writers.adoc#ResponseWriters-JSONResponseWriter,JSON output>> is a little more robust.
+
+[[UsingPython-SimplePython]]
+== Simple Python
+
+Making a query is a simple matter. First, tell Python you will need to make HTTP connections.
+
+[source,python]
+----
+from urllib2 import *
+----
+
+Now open a connection to the server and get a response. The `wt` query parameter tells Solr to return results in a format that Python can understand.
+
+[source,python]
+----
+connection = urlopen('http://localhost:8983/solr/collection_name/select?q=cheese&wt=python')
+response = eval(connection.read())
+----
+
+Now interpreting the response is just a matter of pulling out the information that you need.
+
+[source,python]
+----
+print response['response']['numFound'], "documents found."
+
+# Print the name of each document.
+
+for document in response['response']['docs']:
+  print "  Name =", document['name']
+----
+
+[[UsingPython-PythonwithJSON]]
+== Python with JSON
+
+JSON is a more robust response format, but you will need to add a Python package in order to use it. At a command line, install the simplejson package like this:
+
+[source,bash]
+----
+sudo easy_install simplejson
+----
+
+Once that is done, making a query is nearly the same as before. However, notice that the `wt` query parameter is now `json`, and the response is now digested by `simplejson.load()`.
+
+[source,python]
+----
+from urllib2 import *
+import simplejson
+connection = urlopen('http://localhost:8983/solr/collection_name/select?q=cheese&wt=json')
+response = simplejson.load(connection)
+print response['response']['numFound'], "documents found."
+
+# Print the name of each document.
+
+for document in response['response']['docs']:
+  print "  Name =", document['name']
+----

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/using-solr-from-ruby.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/using-solr-from-ruby.adoc b/solr/solr-ref-guide/src/using-solr-from-ruby.adoc
new file mode 100644
index 0000000..21d290c
--- /dev/null
+++ b/solr/solr-ref-guide/src/using-solr-from-ruby.adoc
@@ -0,0 +1,100 @@
+= Using Solr From Ruby
+:page-shortname: using-solr-from-ruby
+:page-permalink: using-solr-from-ruby.html
+
+Solr has an optional Ruby response format that extends the <<response-writers.adoc#ResponseWriters-JSONResponseWriter,JSON output>> to allow the response to be safely eval'd by Ruby's interpreter.
+
+This Ruby response format differs from JSON in the following ways:
+
+* Ruby's single-quoted strings are used to prevent possible string exploits.
+** `\` and `'` are the only two characters escaped.
+** Unicode escapes are not used; data is written as raw UTF-8.
+* `nil` is used for `null`.
+* `\=>` is used as the key/value separator in maps.
+
+Here's an example Ruby response from Solr, for a request like `\http://localhost:8983/solr/techproducts/select?q=iPod&wt=ruby&indent=on` (with Solr launched using `bin/solr start -e techproducts`):
+
+[source,ruby]
+----
+{
+  'responseHeader'=>{
+    'status'=>0,
+    'QTime'=>0,
+    'params'=>{
+      'q'=>'iPod',
+      'indent'=>'on',
+      'wt'=>'ruby'}},
+  'response'=>{'numFound'=>3,'start'=>0,'docs'=>[
+      {
+        'id'=>'IW-02',
+        'name'=>'iPod & iPod Mini USB 2.0 Cable',
+        'manu'=>'Belkin',
+        'manu_id_s'=>'belkin',
+        'cat'=>['electronics',
+          'connector'],
+        'features'=>['car power adapter for iPod, white'],
+        'weight'=>2.0,
+        'price'=>11.5,
+        'price_c'=>'11.50,USD',
+        'popularity'=>1,
+        'inStock'=>false,
+        'store'=>'37.7752,-122.4232',
+        'manufacturedate_dt'=>'2006-02-14T23:55:59Z',
+        '_version_'=>1491038048794705920},
+      {
+        'id'=>'F8V7067-APL-KIT',
+        'name'=>'Belkin Mobile Power Cord for iPod w/ Dock',
+        'manu'=>'Belkin',
+        'manu_id_s'=>'belkin',
+        'cat'=>['electronics',
+          'connector'],
+        'features'=>['car power adapter, white'],
+        'weight'=>4.0,
+        'price'=>19.95,
+        'price_c'=>'19.95,USD',
+        'popularity'=>1,
+        'inStock'=>false,
+        'store'=>'45.18014,-93.87741',
+        'manufacturedate_dt'=>'2005-08-01T16:30:25Z',
+        '_version_'=>1491038048792608768},
+      {
+        'id'=>'MA147LL/A',
+        'name'=>'Apple 60 GB iPod with Video Playback Black',
+        'manu'=>'Apple Computer Inc.',
+        'manu_id_s'=>'apple',
+        'cat'=>['electronics',
+          'music'],
+        'features'=>['iTunes, Podcasts, Audiobooks',
+          'Stores up to 15,000 songs, 25,000 photos, or 150 hours of video',
+          '2.5-inch, 320x240 color TFT LCD display with LED backlight',
+          'Up to 20 hours of battery life',
+          'Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video',
+          'Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication'],
+        'includes'=>'earbud headphones, USB cable',
+        'weight'=>5.5,
+        'price'=>399.0,
+        'price_c'=>'399.00,USD',
+        'popularity'=>10,
+        'inStock'=>true,
+        'store'=>'37.7752,-100.0232',
+        'manufacturedate_dt'=>'2005-10-12T08:00:00Z',
+        '_version_'=>1491038048799948800}]
+  }}
+----
+
+Here is a simple example of how one may query Solr using the Ruby response format:
+
+[source,ruby]
+----
+require 'net/http'
+
+h = Net::HTTP.new('localhost', 8983)
+http_response = h.get('/solr/techproducts/select?q=iPod&wt=ruby')
+rsp = eval(http_response.body)
+
+puts 'number of matches = ' + rsp['response']['numFound'].to_s
+#print out the name field for each returned document
+rsp['response']['docs'].each { |doc| puts 'name field = ' + doc['name'] }
+----
+
+For simple interactions with Solr, this may be all you need! If you are building complex interactions with Solr, then consider the libraries mentioned at https://wiki.apache.org/solr/Ruby%20Response%20Format

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/using-solrj.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/using-solrj.adoc b/solr/solr-ref-guide/src/using-solrj.adoc
new file mode 100644
index 0000000..39cd440
--- /dev/null
+++ b/solr/solr-ref-guide/src/using-solrj.adoc
@@ -0,0 +1,145 @@
+= Using SolrJ
+:page-shortname: using-solrj
+:page-permalink: using-solrj.html
+
+{solr-javadocs}/solr-solrj/[SolrJ] is an API that makes it easy for Java applications to talk to Solr. SolrJ hides a lot of the details of connecting to Solr and allows your application to interact with Solr with simple high-level methods.
+
+The center of SolrJ is the `org.apache.solr.client.solrj` package, which contains just five main classes. Begin by creating a {solr-javadocs}/solr-solrj/org/apache/solr/client/solrj/SolrClient.html[`SolrClient`], which represents the Solr instance you want to use. Then send a `SolrRequest` or a `SolrQuery` and get back a `SolrResponse`.
+
+`SolrClient` is abstract, so to connect to a remote Solr instance, you'll actually create an instance of either {solr-javadocs}/solr-solrj/org/apache/solr/client/solrj/impl/HttpSolrClient.html[`HttpSolrClient`] or {solr-javadocs}/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html[`CloudSolrClient`]. Both communicate with Solr via HTTP; the difference is that `HttpSolrClient` is configured using an explicit Solr URL, while `CloudSolrClient` is configured using the zkHost String for a <<solrcloud.adoc#solrcloud,SolrCloud>> cluster.
+
+
+.Single node Solr client
+[source,java]
+----
+String urlString = "http://localhost:8983/solr/techproducts";
+SolrClient solr = new HttpSolrClient.Builder(urlString).build();
+----
+
+.SolrCloud client
+[source,java]
+----
+String zkHostString = "zkServerA:2181,zkServerB:2181,zkServerC:2181/solr";
+SolrClient solr = new CloudSolrClient.Builder().withZkHost(zkHostString).build();
+----
+
+Once you have a `SolrClient`, you can use it by calling methods like `query()`, `add()`, and `commit()`.
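+
+A minimal end-to-end sketch might look like the following; the URL, core name, field values, and class name are illustrative, not the only way to structure such code:
+
+[source,java]
+----
+import org.apache.solr.client.solrj.SolrClient;
+import org.apache.solr.client.solrj.SolrQuery;
+import org.apache.solr.client.solrj.impl.HttpSolrClient;
+import org.apache.solr.client.solrj.response.QueryResponse;
+import org.apache.solr.common.SolrInputDocument;
+
+public class SolrJExample {
+  public static void main(String[] args) throws Exception {
+    // Assumes a core named "techproducts" is running on a local node.
+    try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build()) {
+      SolrInputDocument doc = new SolrInputDocument();
+      doc.addField("id", "solrj-example-1");
+      doc.addField("name", "SolrJ smoke test");
+      solr.add(doc);     // index one document
+      solr.commit();     // make it searchable
+
+      QueryResponse rsp = solr.query(new SolrQuery("name:SolrJ"));
+      System.out.println("Found " + rsp.getResults().getNumFound() + " document(s).");
+    } // try-with-resources closes the client and releases its HTTP resources
+  }
+}
+----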
+
+[[UsingSolrJ-BuildingandRunningSolrJApplications]]
+== Building and Running SolrJ Applications
+
+The SolrJ API is included with Solr, so you do not have to download or install anything else. However, in order to build and run applications that use SolrJ, you have to add some libraries to the classpath.
+
+At build time, the examples presented with this section require `solr-solrj-x.y.z.jar` to be in the classpath.
+
+At run time, the examples in this section require the libraries found in the `dist/solrj-lib` directory.
+
+The Ant script bundled with this section's examples includes the libraries as appropriate when building and running.
+
+You can sidestep a lot of the messing around with the JAR files by using Maven instead of Ant. All you will need to do to include SolrJ in your application is to put the following dependency in the project's `pom.xml`:
+
+[source,xml]
+----
+<dependency>
+  <groupId>org.apache.solr</groupId>
+  <artifactId>solr-solrj</artifactId>
+  <version>x.y.z</version>
+</dependency>
+----
+
+If you are worried about the SolrJ libraries expanding the size of your client application, you can use a code obfuscator like http://proguard.sourceforge.net/[ProGuard] to remove APIs that you are not using.
+
+[[UsingSolrJ-SettingXMLResponseParser]]
+== Setting XMLResponseParser
+
+SolrJ uses a binary format, rather than XML, as its default response format. If you are trying to mix Solr and SolrJ versions where one is version 1.x and the other is 3.x or later, then you MUST use the XML response parser. The binary format changed in 3.x, and the two javabin versions are entirely incompatible. The following code will make this change:
+
+[source,java]
+----
+solr.setParser(new XMLResponseParser());
+----
+
+[[UsingSolrJ-PerformingQueries]]
+== Performing Queries
+
+Use `query()` to have Solr search for results. You have to pass a `SolrQuery` object that describes the query, and you will get back a `QueryResponse` (from the `org.apache.solr.client.solrj.response` package).
+
+`SolrQuery` has methods that make it easy to add parameters to choose a request handler and send parameters to it. Here is a very simple example that uses the default request handler and sets the query string:
+
+[source,java]
+----
+SolrQuery query = new SolrQuery();
+query.setQuery(mQueryString);
+----
+
+To choose a different request handler, there is a specific method available in SolrJ version 4.0 and later:
+
+[source,java]
+----
+query.setRequestHandler("/spellCheckCompRH");
+----
+
+You can also set arbitrary parameters on the query object. The first two code lines below are equivalent to each other, and the third shows how to use an arbitrary parameter `q` to set the query string:
+
+[source,java]
+----
+query.set("fl", "category,title,price");
+query.setFields("category", "title", "price");
+query.set("q", "category:books");
+----
+
+Once you have your `SolrQuery` set up, submit it with `query()`:
+
+[source,java]
+----
+QueryResponse response = solr.query(query);
+----
+
+The client makes a network connection and sends the query. Solr processes the query, and the response is sent and parsed into a `QueryResponse`.
+
+The `QueryResponse` is a collection of documents that satisfy the query parameters. You can retrieve the documents directly with `getResults()` and you can call other methods to find out information about highlighting or facets.
+
+[source,java]
+----
+SolrDocumentList list = response.getResults();
+----
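+
+As an illustration (not the only approach), the documents can be iterated directly, and if the query also requested facets, for example with `query.addFacetField("category")` where `category` is a hypothetical field, the counts can be read back from the same response:
+
+[source,java]
+----
+// Walk the matching documents; the field names here are illustrative.
+for (SolrDocument doc : response.getResults()) {
+  System.out.println(doc.getFieldValue("id") + " : " + doc.getFieldValue("name"));
+}
+
+// Read facet counts back, if faceting on "category" was requested in the query.
+FacetField categoryFacet = response.getFacetField("category");
+if (categoryFacet != null) {
+  for (FacetField.Count count : categoryFacet.getValues()) {
+    System.out.println(count.getName() + " (" + count.getCount() + ")");
+  }
+}
+----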
+
+[[UsingSolrJ-IndexingDocuments]]
+== Indexing Documents
+
+Other operations are just as simple. To index (add) a document, all you need to do is create a `SolrInputDocument` and pass it along to the `SolrClient`'s `add()` method. This example assumes that the `SolrClient` object called `solr` is already created based on the examples shown earlier.
+
+[source,java]
+----
+SolrInputDocument document = new SolrInputDocument();
+document.addField("id", "552199");
+document.addField("name", "Gouda cheese wheel");
+document.addField("price", "49.99");
+UpdateResponse response = solr.add(document);
+
+// Remember to commit your changes!
+
+solr.commit();
+----
+
+[[UsingSolrJ-UploadingContentinXMLorBinaryFormats]]
+=== Uploading Content in XML or Binary Formats
+
+SolrJ lets you upload content in binary format instead of the default XML format. Use the following code to upload using binary format, which is the same format SolrJ uses to fetch results. If you are trying to mix Solr and SolrJ versions where one is version 1.x and the other is 3.x or later, then you MUST stick with the XML request writer. The binary format changed in 3.x, and the two javabin versions are entirely incompatible.
+
+[source,java]
+----
+solr.setRequestWriter(new BinaryRequestWriter());
+----
+
+[[UsingSolrJ-UsingtheConcurrentUpdateSolrClient]]
+=== Using the ConcurrentUpdateSolrClient
+
+When implementing Java applications that will be bulk loading a lot of documents at once, {solr-javadocs}/solr-solrj/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.html[`ConcurrentUpdateSolrClient`] is an alternative to consider instead of `HttpSolrClient`. The `ConcurrentUpdateSolrClient` buffers all added documents and writes them into open HTTP connections. This class is thread safe. Although any SolrClient request can be made with this implementation, it is only recommended to use `ConcurrentUpdateSolrClient` for `/update` requests.
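+
+A minimal sketch of bulk loading with `ConcurrentUpdateSolrClient` follows; the URL, queue size, thread count, and the `docsToIndex` collection are illustrative assumptions, not recommendations:
+
+[source,java]
+----
+ConcurrentUpdateSolrClient bulkClient =
+    new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/techproducts")
+        .withQueueSize(100)    // capacity of the internal buffer queue
+        .withThreadCount(4)    // background threads writing to Solr
+        .build();
+
+for (SolrInputDocument doc : docsToIndex) {  // docsToIndex is a hypothetical collection
+  bulkClient.add(doc);                       // buffered, then streamed over open connections
+}
+bulkClient.commit();
+bulkClient.close();                          // flushes anything still buffered
+----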
+
+[[UsingSolrJ-EmbeddedSolrServer]]
+== EmbeddedSolrServer
+
+The {solr-javadocs}/solr-core/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html[`EmbeddedSolrServer`] class provides an implementation of the `SolrClient` client API that talks directly to a micro-instance of Solr running inside your Java application. This embedded approach is not recommended in most cases and is fairly limited in the set of features it supports; in particular, it cannot be used with <<solrcloud.adoc#solrcloud,SolrCloud>> or <<index-replication.adoc#index-replication,Index Replication>>. `EmbeddedSolrServer` exists primarily to help facilitate testing.
+
+For information on how to use `EmbeddedSolrServer` please review the SolrJ JUnit tests in the `org.apache.solr.client.solrj.embedded` package of the Solr source release.
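+
+For illustration only, the following sketch shows how a test might create an embedded instance; the Solr home path and core name are hypothetical, and the Solr home must already contain a `solr.xml` and a configured core:
+
+[source,java]
+----
+// Hypothetical Solr home and core name.
+Path solrHome = Paths.get("/path/to/solr/home");
+EmbeddedSolrServer embedded = new EmbeddedSolrServer(solrHome, "mycore");
+
+SolrInputDocument doc = new SolrInputDocument();
+doc.addField("id", "embedded-1");
+embedded.add(doc);
+embedded.commit();
+embedded.close();  // shuts down the embedded core container
+----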

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/using-the-solr-administration-user-interface.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/using-the-solr-administration-user-interface.adoc b/solr/solr-ref-guide/src/using-the-solr-administration-user-interface.adoc
new file mode 100644
index 0000000..9cfb196
--- /dev/null
+++ b/solr/solr-ref-guide/src/using-the-solr-administration-user-interface.adoc
@@ -0,0 +1,34 @@
+= Using the Solr Administration User Interface
+:page-shortname: using-the-solr-administration-user-interface
+:page-permalink: using-the-solr-administration-user-interface.html
+:page-children: overview-of-the-solr-admin-ui, getting-assistance, logging, cloud-screens, collections-core-admin, java-properties, thread-dump, collection-specific-tools, core-specific-tools
+
+This section discusses the Solr Administration User Interface ("Admin UI").
+
+
+The <<overview-of-the-solr-admin-ui.adoc#overview-of-the-solr-admin-ui,Overview of the Solr Admin UI>> explains the basic features of the user interface, what's on the initial Admin UI page, and how to configure the interface. In addition, there are pages describing each screen of the Admin UI:
+
+* *<<getting-assistance.adoc#getting-assistance,Getting Assistance>>* shows you how to get more information about the UI.
+* *<<logging.adoc#logging,Logging>>* shows recent messages logged by this Solr node and provides a way to change logging levels for specific classes.
+* *<<cloud-screens.adoc#cloud-screens,Cloud Screens>>* display information about nodes when running in SolrCloud mode.
+* *<<collections-core-admin.adoc#collections-core-admin,Collections / Core Admin>>* explains how to get management information about each core.
+* *<<java-properties.adoc#java-properties,Java Properties>>* shows the Java information about each core.
+* *<<thread-dump.adoc#thread-dump,Thread Dump>>* lets you see detailed information about each thread, along with state information.
+
+* *<<collection-specific-tools.adoc#collection-specific-tools,Collection-Specific Tools>>* is a section explaining additional screens available for each collection.
+// TODO: SOLR-10655 BEGIN: refactor this into a 'collection-screens-list.include.adoc' file for reuse
+** <<analysis-screen.adoc#analysis-screen,Analysis>> - lets you analyze the data found in specific fields.
+** <<dataimport-screen.adoc#dataimport-screen,Dataimport>> - shows you information about the current status of the Data Import Handler.
+** <<documents-screen.adoc#documents-screen,Documents>> - provides a simple form allowing you to execute various Solr indexing commands directly from the browser.
+** <<files-screen.adoc#files-screen,Files>> - shows the current core configuration files such as `solrconfig.xml`.
+** <<query-screen.adoc#query-screen,Query>> - lets you submit a structured query about various elements of a core.
+** <<stream-screen.adoc#stream-screen,Stream>> - allows you to submit streaming expressions and see results and parsing explanations.
+** <<schema-browser-screen.adoc#schema-browser-screen,Schema Browser>> - displays schema data in a browser window.
+// TODO: SOLR-10655 END
+* *<<core-specific-tools.adoc#core-specific-tools,Core-Specific Tools>>* is a section explaining additional screens available for each named core.
+// TODO: SOLR-10655 BEGIN: refactor this into a 'core-screens-list.include.adoc' file for reuse
+** <<ping.adoc#ping,Ping>> - lets you ping a named core and determine whether the core is active.
+** <<plugins-stats-screen.adoc#plugins-stats-screen,Plugins/Stats>> - shows statistics for plugins and other installed components.
+** <<replication-screen.adoc#replication-screen,Replication>> - shows you the current replication status for the core, and lets you enable/disable replication.
+** <<segments-info.adoc#segments-info,Segments Info>> - provides a visualization of the underlying Lucene index segments.
+// TODO: SOLR-10655 END

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/using-zookeeper-to-manage-configuration-files.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/using-zookeeper-to-manage-configuration-files.adoc b/solr/solr-ref-guide/src/using-zookeeper-to-manage-configuration-files.adoc
new file mode 100644
index 0000000..57d7f46
--- /dev/null
+++ b/solr/solr-ref-guide/src/using-zookeeper-to-manage-configuration-files.adoc
@@ -0,0 +1,81 @@
+= Using ZooKeeper to Manage Configuration Files
+:page-shortname: using-zookeeper-to-manage-configuration-files
+:page-permalink: using-zookeeper-to-manage-configuration-files.html
+
+With SolrCloud your configuration files are kept in ZooKeeper.
+
+These files are uploaded in either of the following cases:
+
+* When you start a SolrCloud example using the `bin/solr` script.
+* When you create a collection using the `bin/solr` script.
+* When you explicitly upload a configuration set to ZooKeeper.
+
+[[UsingZooKeepertoManageConfigurationFiles-StartupBootstrap]]
+== Startup Bootstrap
+
+When you try SolrCloud for the first time using `bin/solr -e cloud`, the related configset gets uploaded to ZooKeeper automatically and is linked with the newly created collection.
+
+The command below starts SolrCloud with the default collection name (`gettingstarted`) and the default configset (`data_driven_schema_configs`) uploaded and linked to it.
+
+[source,bash]
+----
+bin/solr -e cloud -noprompt
+----
+
+You can also explicitly upload a configuration directory when creating a collection using the `bin/solr` script with the `-d` option, such as:
+
+[source,bash]
+----
+bin/solr create -c mycollection -d data_driven_schema_configs
+----
+
+The create command will upload a copy of the `data_driven_schema_configs` configuration directory to ZooKeeper under `/configs/mycollection`. Refer to the <<solr-control-script-reference.adoc#solr-control-script-reference,Solr Control Script Reference>> page for more details about the create command for creating collections.
+
+Once a configuration directory has been uploaded to ZooKeeper, you can update the files in it using the <<solr-control-script-reference.adoc#solr-control-script-reference,Solr Control Script>>.
+
+[IMPORTANT]
+====
+
+It's a good idea to keep these files under version control.
+
+====
+
+
+[[UsingZooKeepertoManageConfigurationFiles-UploadingConfigurationFilesusingbin_solrorSolrJ]]
+== Uploading Configuration Files using `bin/solr` or SolrJ
+
+In production situations, <<config-sets.adoc#config-sets,Config Sets>> can also be uploaded to ZooKeeper independent of collection creation using either Solr's <<solr-control-script-reference.adoc#solr-control-script-reference,Solr Control Script>> or the {solr-javadocs}/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html[CloudSolrClient.uploadConfig] Java method.
+
+The command below uploads a new configset using the `bin/solr` script.
+
+[source,bash]
+----
+bin/solr zk upconfig -n <name for configset> -d <path to directory with configset>
+----
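+
+The same upload can also be done programmatically with SolrJ. A minimal sketch, assuming ZooKeeper is reachable at `localhost:2181` and using an illustrative configset directory and name:
+
+[source,java]
+----
+// Upload the local directory ./myconfig as a configset named "myconfig" (illustrative names).
+try (CloudSolrClient cloudClient =
+         new CloudSolrClient.Builder().withZkHost("localhost:2181").build()) {
+  cloudClient.connect();  // ensure the client is connected to ZooKeeper
+  cloudClient.uploadConfig(Paths.get("myconfig"), "myconfig");
+}
+----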
+
+It is strongly recommended that the configurations be kept in a version control system such as Git or SVN.
+
+[[UsingZooKeepertoManageConfigurationFiles-ManagingYourSolrCloudConfigurationFiles]]
+== Managing Your SolrCloud Configuration Files
+
+To update or change your SolrCloud configuration files:
+
+1.  Download the latest configuration files from ZooKeeper, using the source control checkout process.
+2.  Make your changes.
+3.  Commit your changed file to source control.
+4.  Push the changes back to ZooKeeper.
+5.  Reload the collection so that the changes will be in effect.
+
+[[UsingZooKeepertoManageConfigurationFiles-PreparingZooKeeperbeforefirstclusterstart]]
+== Preparing ZooKeeper before first cluster start
+
+If you will share the same ZooKeeper instance with other applications, you should use a _chroot_ in ZooKeeper. Please see <<taking-solr-to-production.adoc#TakingSolrtoProduction-ZooKeeperchroot,ZooKeeper chroot>> for instructions.
+
+There are certain configuration files containing cluster-wide configuration. Since some of these are crucial for the cluster to function properly, you may need to upload such files to ZooKeeper before starting your Solr cluster for the first time. Examples of such configuration files (not exhaustive) are `solr.xml`, `security.json` and `clusterprops.json`.
+
+If, for example, you would like to keep your `solr.xml` in ZooKeeper to avoid having to copy it to every node's `solr_home` directory, you can push it to ZooKeeper with the `bin/solr` utility (Unix example):
+
+[source,bash]
+----
+bin/solr zk cp file:local/file/path/to/solr.xml zk:/solr.xml -z localhost:2181
+----