You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@lucene.apache.org by tf...@apache.org on 2017/05/12 23:38:41 UTC

[14/58] [abbrv] lucene-solr:jira/solr-10233: squash merge jira/solr-10290 into master

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/overview-of-searching-in-solr.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/overview-of-searching-in-solr.adoc b/solr/solr-ref-guide/src/overview-of-searching-in-solr.adoc
new file mode 100644
index 0000000..c432f20
--- /dev/null
+++ b/solr/solr-ref-guide/src/overview-of-searching-in-solr.adoc
@@ -0,0 +1,45 @@
+= Overview of Searching in Solr
+:page-shortname: overview-of-searching-in-solr
+:page-permalink: overview-of-searching-in-solr.html
+
+Solr offers a rich, flexible set of features for search. To understand the extent of this flexibility, it's helpful to begin with an overview of the steps and components involved in a Solr search.
+
+When a user runs a search in Solr, the search query is processed by a *request handler*. A request handler is a Solr plug-in that defines the logic to be used when Solr processes a request. Solr supports a variety of request handlers. Some are designed for processing search queries, while others manage tasks such as index replication.
+
+Search applications select a particular request handler by default. In addition, applications can be configured to allow users to override the default selection and choose a different request handler.
+
+To process a search query, a request handler calls a *query parser*, which interprets the terms and parameters of a query. Different query parsers support different syntax. Solr's default query parser is known as the <<the-standard-query-parser.adoc#the-standard-query-parser,Standard Query Parser>>, or more commonly just the "lucene" query parser. Solr also includes the <<the-dismax-query-parser.adoc#the-dismax-query-parser,DisMax>> query parser, and the <<the-extended-dismax-query-parser.adoc#the-extended-dismax-query-parser,Extended DisMax>> (eDisMax) query parser. The <<the-standard-query-parser.adoc#the-standard-query-parser,standard>> query parser's syntax allows for greater precision in searches, but the DisMax query parser is much more tolerant of errors. The DisMax query parser is designed to provide an experience similar to that of popular search engines such as Google, which rarely display syntax errors to users. The Extended DisMax query parser is an improved version of DisMax that handles the full Lucene query syntax while still tolerating syntax errors. It also includes several additional features.
+
+In addition, there are <<common-query-parameters.adoc#common-query-parameters,common query parameters>> that are accepted by all query parsers.
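+
+One of those common parameters is `defType`, which selects the query parser for the main query. As a sketch of how a client might ask for the eDisMax parser (assuming an already-built `SolrClient` named `solrClient` and example field names), using SolrJ:
+
+[source,java]
+----
+// A minimal sketch: choose the eDisMax query parser for this request.
+// "name" and "features" are example field names, not requirements.
+SolrQuery query = new SolrQuery("ipod power");
+query.set("defType", "edismax");   // use the Extended DisMax query parser
+query.set("qf", "name features");  // eDisMax "query fields" parameter
+QueryResponse rsp = solrClient.query(query);
+----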
+
+Input to a query parser can include:
+
+* search strings---that is, _terms_ to search for in the index
+* _parameters for fine-tuning the query_ by increasing the importance of particular strings or fields, by applying Boolean logic among the search terms, or by excluding content from the search results
+* _parameters for controlling the presentation of the query response_, such as specifying the order in which results are to be presented or limiting the response to particular fields of the search application's schema.
+
+Search parameters may also specify a *filter query*. As part of a search response, a filter query runs a query against the entire index and caches the results. Because Solr allocates a separate cache for filter queries, the strategic use of filter queries can improve search performance. (Despite their similar names, query filters are not related to analysis filters. Filter queries perform queries at search time against data already in the index, while analysis filters, such as tokenizers, parse content for indexing, following specified rules.)
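+
+For example, assuming an existing `SolrClient` named `solrClient` and some illustrative field names, a filter query can be added to a request with SolrJ's `addFilterQuery` method:
+
+[source,java]
+----
+// A minimal sketch: the main query is scored, while each fq clause only restricts
+// the result set and is cached separately from the main query.
+SolrQuery query = new SolrQuery("memory");
+query.addFilterQuery("inStock:true");      // fq=inStock:true
+query.addFilterQuery("price:[0 TO 100]");  // fq=price:[0 TO 100]
+QueryResponse rsp = solrClient.query(query);
+----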
+
+A search query can request that certain terms be highlighted in the search response; that is, the selected terms will be displayed in colored boxes so that they "jump out" on the screen of search results. <<highlighting.adoc#highlighting,*Highlighting*>> can make it easier to find relevant passages in long documents returned in a search. Solr supports multi-term highlighting. Solr includes a rich set of search parameters for controlling how terms are highlighted.
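+
+As a sketch, a few of the common highlighting parameters can be set through SolrJ (again assuming an existing `solrClient` and example field names):
+
+[source,java]
+----
+// A minimal sketch: enable highlighting on one field and read the fragments back.
+SolrQuery query = new SolrQuery("solid state drive");
+query.setHighlight(true);             // hl=true
+query.addHighlightField("features");  // hl.fl=features
+query.setHighlightSnippets(2);        // hl.snippets=2
+QueryResponse rsp = solrClient.query(query);
+// Highlighted fragments are keyed by document id, then by field name.
+Map<String, Map<String, List<String>>> highlighting = rsp.getHighlighting();
+----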
+
+Search responses can also be configured to include *snippets* (document excerpts) featuring highlighted text. Popular search engines such as Google and Yahoo! return snippets in their search results: 3-4 lines of text offering a description of a search result.
+
+To help users zero in on the content they're looking for, Solr supports two special ways of grouping search results to aid further exploration: faceting and clustering.
+
+<<faceting.adoc#faceting,*Faceting*>> is the arrangement of search results into categories (which are based on indexed terms). Within each category, Solr reports on the number of hits for each relevant term, which is called a facet constraint. Faceting makes it easy for users to explore search results on sites such as movie sites and product review sites, where there are many categories and many items within a category.
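+
+For instance, a request can ask for facet counts on one or more fields; the SolrJ sketch below assumes an existing `solrClient` and hypothetical field names:
+
+[source,java]
+----
+// A minimal sketch of field faceting: Solr returns a count per value of each facet field.
+SolrQuery query = new SolrQuery("camera");
+query.setFacet(true);          // facet=true
+query.addFacetField("manu");   // facet.field=manu
+query.addFacetField("cat");    // facet.field=cat
+query.setFacetMinCount(1);     // omit values with zero hits
+QueryResponse rsp = solrClient.query(query);
+for (FacetField.Count count : rsp.getFacetField("manu").getValues()) {
+  System.out.println(count.getName() + ": " + count.getCount());
+}
+----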
+
+The screen shot below shows an example of faceting from the CNET Web site (CBS Interactive Inc.), which was the first site to use Solr.
+
+image::images/overview-of-searching-in-solr/worddav88969a784fb8a63d8c46e9c043f5f953.png[image,width=600,height=300]
+
+Faceting makes use of fields defined when the search application's content was indexed. In the example above, these fields include categories of information that are useful for describing digital cameras: manufacturer, resolution, and zoom range.
+
+*Clustering* groups search results by similarities discovered when a search is executed, rather than when content is indexed. The results of clustering often lack the neat hierarchical organization found in faceted search results, but clustering can be useful nonetheless. It can reveal unexpected commonalities among search results, and it can help users rule out content that isn't pertinent to what they're really searching for.
+
+Solr also supports a feature called <<morelikethis.adoc#morelikethis,MoreLikeThis>>, which enables users to submit new queries that focus on particular terms returned in an earlier query. MoreLikeThis queries can make use of faceting or clustering to provide additional aid to users.
+
+A Solr component called a <<response-writers.adoc#response-writers,*response writer*>> manages the final presentation of the query response. Solr includes a variety of response writers, including an <<response-writers.adoc#ResponseWriters-TheStandardXMLResponseWriter,XML Response Writer>> and a <<response-writers.adoc#ResponseWriters-JSONResponseWriter,JSON Response Writer>>.
+
+The diagram below summarizes some key elements of the search process.
+
+image::images/overview-of-searching-in-solr/worddav16392965e726e04513a21641fabad474.png[image,width=624,height=401]

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/overview-of-the-solr-admin-ui.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/overview-of-the-solr-admin-ui.adoc b/solr/solr-ref-guide/src/overview-of-the-solr-admin-ui.adoc
new file mode 100644
index 0000000..1f96ccc
--- /dev/null
+++ b/solr/solr-ref-guide/src/overview-of-the-solr-admin-ui.adoc
@@ -0,0 +1,22 @@
+= Overview of the Solr Admin UI
+:page-shortname: overview-of-the-solr-admin-ui
+:page-permalink: overview-of-the-solr-admin-ui.html
+
+Solr features a Web interface that makes it easy for Solr administrators and programmers to view <<files-screen.adoc#files-screen,Solr configuration>> details, run <<query-screen.adoc#query-screen,queries and analyze>> document fields in order to fine-tune a Solr configuration, and access <<getting-assistance.adoc#getting-assistance,online documentation>> and other help.
+
+.Solr Dashboard
+image::images/overview-of-the-solr-admin-ui/dashboard.png[image,height=400]
+
+
+Accessing the URL `\http://hostname:8983/solr/` will show the main dashboard, which is divided into two parts.
+
+The left side of the screen is a menu under the Solr logo that provides navigation through the screens of the UI. The first set of links is for system-level information and configuration and provides access to <<logging.adoc#logging,Logging>>, <<collections-core-admin.adoc#collections-core-admin,Collection/Core Administration>>, and <<java-properties.adoc#java-properties,Java Properties>>, among other things. At the end of this information is at least one pulldown listing Solr cores configured for this instance. On <<solrcloud.adoc#solrcloud,SolrCloud>> nodes, an additional pulldown list shows all collections in this cluster. Clicking on a collection or core name shows secondary menus of information for the specified collection or core, such as a <<schema-browser-screen.adoc#schema-browser-screen,Schema Browser>>, <<files-screen.adoc#files-screen,Config Files>>, <<plugins-stats-screen.adoc#plugins-stats-screen,Plugins & Statistics>>, and an ability to perform <<query-screen.adoc#query-screen,Queries>> on indexed data.
+
+The center of the screen shows the detail of the option selected. This may include sub-navigation for the option, or a text or graphical representation of the requested data. See the sections in this guide for each screen for more details.
+
+Under the covers, the Solr Admin UI re-uses the same HTTP APIs that are available to all clients; it is simply an external interface driven by the Solr-related data those APIs return.
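+
+For example, the core status information shown in the UI can also be fetched directly; the SolrJ sketch below assumes a local Solr instance and an example core named "gettingstarted":
+
+[source,java]
+----
+// A minimal sketch: the same Core Admin API that backs parts of the Admin UI
+// can be called from any client.
+HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
+CoreAdminResponse status = CoreAdminRequest.getStatus("gettingstarted", client);
+System.out.println(status.getCoreStatus("gettingstarted"));
+client.close();
+----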
+
+[TIP]
+====
+The path to the Solr Admin UI given above is `\http://hostname:port/solr`, which redirects to `\http://hostname:port/solr/\#/` in the current version. A convenience redirect is also supported, so simply accessing the Admin UI at `\http://hostname:port/` will also redirect to `\http://hostname:port/solr/#/`.
+====

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/pagination-of-results.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/pagination-of-results.adoc b/solr/solr-ref-guide/src/pagination-of-results.adoc
new file mode 100644
index 0000000..98d750f
--- /dev/null
+++ b/solr/solr-ref-guide/src/pagination-of-results.adoc
@@ -0,0 +1,253 @@
+= Pagination of Results
+:page-shortname: pagination-of-results
+:page-permalink: pagination-of-results.html
+
+
+In most search applications, the "top" matching results (sorted by score, or some other criteria) are displayed to some human user.
+
+In many applications, the UI for these sorted results is displayed to the user in "pages" containing a fixed number of matching results, and users don't typically look at results past the first few pages' worth of results.
+
+== Basic Pagination
+In Solr, this basic paginated searching is supported using the `start` and `rows` parameters, and performance of this common behaviour can be tuned by utilizing the <<query-settings-in-solrconfig.adoc#QuerySettingsinSolrConfig-queryResultCache,`queryResultCache`>> and adjusting the <<query-settings-in-solrconfig.adoc#QuerySettingsinSolrConfig-queryResultWindowSize,`queryResultWindowSize`>> configuration options based on your expected page sizes.
+
+=== Basic Pagination Examples
+
+The easiest way to think about simple pagination is to multiply the page number you want (treating the "first" page number as "0") by the number of rows per page, as in the following pseudo-code:
+
+[source,plain]
+----
+function fetch_solr_page($page_number, $rows_per_page) {
+  $start = $page_number * $rows_per_page
+  $params = [ q => $some_query, rows => $rows_per_page, start => $start ]
+  return fetch_solr($params)
+}
+----
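+
+The same logic in SolrJ (a sketch, assuming an existing `SolrClient` named `solrClient`) uses the `setStart` and `setRows` methods:
+
+[source,java]
+----
+// A minimal sketch of fetching one "page" of results.
+int page = 2;          // zero-based page number
+int rowsPerPage = 10;
+SolrQuery query = new SolrQuery("*:*");
+query.setStart(page * rowsPerPage);  // absolute offset into the sorted results
+query.setRows(rowsPerPage);          // page size
+QueryResponse rsp = solrClient.query(query);
+----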
+
+=== How Basic Pagination is Affected by Index Updates
+
+The `start` param specified in a request to Solr indicates an *absolute* "offset" in the complete sorted list of matches that the client wants Solr to use as the beginning of the current "page".
+
+If an index modification (such as adding or removing documents) which affects the sequence of ordered documents matching a query occurs in between two requests from a client for subsequent pages of results, then it is possible that these modifications can result in the same document being returned on multiple pages, or documents being "skipped" as the result set shrinks or grows.
+
+For example, consider an index containing 26 documents like so:
+
+[options="header",%autowidth,width="30%"]
+|===
+|id |name
+|1 |A
+|2 |B
+| ... |
+|26 |Z
+|===
+
+Now consider the following requests and index modifications, interleaved:
+
+* A client requests `q=*:*&rows=5&start=0&sort=name asc`
+** documents with the ids `1-5` will be returned to the client
+* Document id `3` is deleted
+* The client requests "page #2" using `q=*:*&rows=5&start=5&sort=name asc`
+** Documents `7-11` will be returned
+** Document `6` has been skipped, since it is now the 5th document in the sorted set of all matching results – it would be returned on a new request for "page #1"
+* 3 new documents are now added with the ids `90`, `91`, and `92`; All three documents have a name of `A`
+* The client requests "page #3" using `q=*:*&rows=5&start=10&sort=name asc`
+** Documents `9-13` will be returned
+** Documents `9`, `10`, and `11` have now been returned on both page #2 and page #3 since they moved farther back in the list of sorted results
+
+In typical situations these impacts from index changes on paginated searching don't significantly affect user experience -- either because they happen extremely infrequently in fairly static collections, or because the users recognize that the collection of data is constantly evolving and expect to see documents shift up and down in the result sets.
+
+=== Performance Problems with "Deep Paging"
+
+In some situations, the results of a Solr search are not destined for a simple paginated user interface.
+
+When you wish to fetch a very large number of sorted results from Solr to feed into an external system, using very large values for the `start` or `rows` parameters can be very inefficient. Pagination using `start` and `rows` not only requires Solr to compute (and sort) in memory all of the matching documents that should be fetched for the current page, but also all of the documents that would have appeared on previous pages.
+
+A request for `start=0&rows=1000000` is obviously inefficient because it requires Solr to maintain & sort in memory a set of 1 million documents, but a request for `start=999000&rows=1000` is equally inefficient for the same reasons: Solr can't compute which matching document is the 999001st result in sorted order without first determining what the first 999000 matching sorted results are.
+
+If the index is distributed, which is common when running in SolrCloud mode, then 1 million documents are retrieved from *each shard*. For a ten shard index, ten million entries must be retrieved and sorted to figure out the 1000 documents that match those query parameters.
+
+== Fetching A Large Number of Sorted Results: Cursors
+
+As an alternative to increasing the "start" parameter to request subsequent pages of sorted results, Solr supports using a "Cursor" to scan through results.
+
+A cursor in Solr is a logical concept that doesn't involve caching any state information on the server. Instead, the sort values of the last document returned to the client are used to compute a "mark" representing a logical point in the ordered space of sort values. That "mark" can be specified in the parameters of subsequent requests to tell Solr where to continue.
+
+=== Using Cursors
+
+To use a cursor with Solr, specify a `cursorMark` parameter with the value of `\*`. You can think of this as being analogous to `start=0` as a way to tell Solr "start at the beginning of my sorted results" except that it also informs Solr that you want to use a Cursor.
+
+In addition to returning the top N sorted results (where you can control N using the `rows` parameter) the Solr response will also include an encoded String named `nextCursorMark`. You then take the `nextCursorMark` String value from the response, and pass it back to Solr as the `cursorMark` parameter for your next request. You can repeat this process until you've fetched as many docs as you want, or until the `nextCursorMark` returned matches the `cursorMark` you've already specified -- indicating that there are no more results.
+
+=== Constraints when using Cursors
+
+There are a few important constraints to be aware of when using the `cursorMark` parameter in a Solr request:
+
+. `cursorMark` and `start` are mutually exclusive parameters.
+* Your requests must either not include a `start` parameter, or it must be specified with a value of "```0```".
+. `sort` clauses must include the uniqueKey field (either `asc` or `desc`).
+* If `id` is your uniqueKey field, then sort params like `id asc` and `name asc, id desc` would both work fine, but `name asc` by itself would not.
+. Sorts including <<working-with-dates.adoc#working-with-dates,Date Math>> based functions that involve calculations relative to `NOW` will cause confusing results, since every document will get a new sort value on every subsequent request. This can easily result in cursors that never end, and constantly return the same documents over and over – even if the documents are never updated.
++
+In this situation, choose & re-use a fixed value for the <<working-with-dates.adoc#WorkingwithDates-NOW,`NOW` request param>> in all of your cursor requests.
+
+Cursor mark values are computed based on the sort values of each document in the result, which means multiple documents with identical sort values will produce identical Cursor mark values if one of them is the last document on a page of results. In that situation, the subsequent request using that `cursorMark` would not know which of the documents with the identical mark values should be skipped. Requiring that the uniqueKey field be used as a clause in the sort criteria guarantees that a deterministic ordering will be returned, and that every `cursorMark` value will identify a unique point in the sequence of documents.
+
+=== Cursor Examples
+
+==== Fetch All Docs
+
+The pseudo-code shown here shows the basic logic involved in fetching all documents matching a query using a cursor:
+
+[source,plain]
+----
+// when fetching all docs, you might as well use a simple id sort
+// unless you really need the docs to come back in a specific order
+$params = [ q => $some_query, sort => 'id asc', rows => $r, cursorMark => '*' ]
+$done = false
+while (not $done) {
+  $results = fetch_solr($params)
+  // do something with $results
+  if ($params[cursorMark] == $results[nextCursorMark]) {
+    $done = true
+  }
+  $params[cursorMark] = $results[nextCursorMark]
+}
+----
+
+Using SolrJ, this pseudo-code would be:
+
+[source,java]
+----
+SolrQuery q = (new SolrQuery(some_query)).setRows(r).setSort(SortClause.asc("id"));
+String cursorMark = CursorMarkParams.CURSOR_MARK_START;
+boolean done = false;
+while (! done) {
+  q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
+  QueryResponse rsp = solrServer.query(q);
+  String nextCursorMark = rsp.getNextCursorMark();
+  doCustomProcessingOfResults(rsp);
+  if (cursorMark.equals(nextCursorMark)) {
+    done = true;
+  }
+  cursorMark = nextCursorMark;
+}
+----
+
+If you wanted to do this by hand using curl, the sequence of requests would look something like this:
+
+[source,plain]
+----
+$ curl '...&rows=10&sort=id+asc&cursorMark=*'
+{
+  "response":{"numFound":32,"start":0,"docs":[
+    // ... 10 docs here ...
+  ]},
+  "nextCursorMark":"AoEjR0JQ"}
+$ curl '...&rows=10&sort=id+asc&cursorMark=AoEjR0JQ'
+{
+  "response":{"numFound":32,"start":0,"docs":[
+    // ... 10 more docs here ...
+  ]},
+  "nextCursorMark":"AoEpVkRCREIxQTE2"}
+$ curl '...&rows=10&sort=id+asc&cursorMark=AoEpVkRCREIxQTE2'
+{
+  "response":{"numFound":32,"start":0,"docs":[
+    // ... 10 more docs here ...
+  ]},
+  "nextCursorMark":"AoEmbWF4dG9y"}
+$ curl '...&rows=10&sort=id+asc&cursorMark=AoEmbWF4dG9y'
+{
+  "response":{"numFound":32,"start":0,"docs":[
+    // ... 2 docs here because we've reached the end.
+  ]},
+  "nextCursorMark":"AoEpdmlld3Nvbmlj"}
+$ curl '...&rows=10&sort=id+asc&cursorMark=AoEpdmlld3Nvbmlj'
+{
+  "response":{"numFound":32,"start":0,"docs":[
+    // no more docs here, and note that the nextCursorMark
+    // matches the cursorMark param we used
+  ]},
+  "nextCursorMark":"AoEpdmlld3Nvbmlj"}
+----
+
+==== Fetch First _N_ docs, Based on Post Processing
+
+Since the cursor is stateless from Solr's perspective, your client code can stop fetching additional results as soon as you have decided you have enough information:
+
+[source,java]
+----
+while (! done) {
+  q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
+  QueryResponse rsp = solrServer.query(q);
+  String nextCursorMark = rsp.getNextCursorMark();
+  boolean hadEnough = doCustomProcessingOfResults(rsp);
+  if (hadEnough || cursorMark.equals(nextCursorMark)) {
+    done = true;
+  }
+  cursorMark = nextCursorMark;
+}
+----
+
+=== How Cursors are Affected by Index Updates
+
+Unlike basic pagination, Cursor pagination does not rely on using an absolute "offset" into the completed sorted list of matching documents. Instead, the `cursorMark` specified in a request encapsulates information about the *relative* position of the last document returned, based on the *absolute* sort values of that document. This means that the impact of index modifications is much smaller when using a cursor compared to basic pagination. Consider the same example index described when discussing basic pagination:
+
+[options="header",%autowidth,width="30%"]
+|===
+|id |name
+|1 |A
+|2 |B
+| ... |
+|26 |Z
+|===
+
+* A client requests `q=*:*&rows=5&start=0&sort=name asc, id asc&cursorMark=*`
+** Documents with the ids `1-5` will be returned to the client in order
+* Document id `3` is deleted
+* The client requests 5 more documents using the `nextCursorMark` from the previous response
+** Documents `6-10` will be returned -- the deletion of a document that's already been returned doesn't affect the relative position of the cursor
+* 3 new documents are now added with the ids `90`, `91`, and `92`; All three documents have a name of `A`
+* The client requests 5 more documents using the `nextCursorMark` from the previous response
+** Documents `11-15` will be returned -- the addition of new documents with sort values already past does not affect the relative position of the cursor
+* Document id `1` is updated to change its 'name' to `Q`
+* Document id 17 is updated to change its 'name' to `A`
+* The client requests 5 more documents using the `nextCursorMark` from the previous response
+** The resulting documents are `16,1,18,19,20` in that order
+** Because the sort value of document `1` changed so that it is _after_ the cursor position, the document is returned to the client twice
+** Because the sort value of document `17` changed so that it is _before_ the cursor position, the document has been "skipped" and will not be returned to the client as the cursor continues to progress
+
+In a nutshell: When fetching all results matching a query using `cursorMark`, the only way index modifications can result in a document being skipped, or returned twice, is if the sort value of the document changes.
+
+[TIP]
+====
+One way to ensure that a document will never be returned more than once is to use the uniqueKey field as the primary (and therefore only significant) sort criterion.
+
+In this situation, you will be guaranteed that each document is only returned once, no matter how it may be modified during the use of the cursor.
+====
+
+=== "Tailing" a Cursor
+
+Because Cursor requests are stateless, and the cursorMark values encapsulate the absolute sort values of the last document returned from a search, it's possible to "continue" fetching additional results from a cursor that has already reached its end if new documents are added (or existing documents are updated) to the end of the results.
+
+You can think of this as similar to using something like "tail -f" in Unix. The most common example of how this can be useful is when you have a "timestamp" field recording when a document has been added/updated in your index. Client applications can continuously poll a cursor using `sort=timestamp asc, id asc` for documents matching a query, and always be notified when a document matching the request criteria is added or updated.
+
+Another common example is when you have uniqueKey values that always increase as new documents are created, and you can continuously poll a cursor using `sort=id asc` to be notified about new documents.
+
+The pseudo-code for tailing a cursor is only a slight modification from our earlier example for processing all docs matching a query:
+
+[source,plain]
+----
+while (true) {
+  $doneForNow = false
+  while (not $doneForNow) {
+    $results = fetch_solr($params)
+    // do something with $results
+    if ($params[cursorMark] == $results[nextCursorMark]) {
+      $doneForNow = true
+    }
+    $params[cursorMark] = $results[nextCursorMark]
+  }
+  sleep($some_configured_delay)
+}
+----
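+
+In SolrJ, a sketch of the same tailing loop might look like the following. The `timestamp` field, the `doCustomProcessingOfResults` method, and the `someConfiguredDelay` value are assumptions for illustration:
+
+[source,java]
+----
+// A minimal sketch of "tailing" a cursor: keep polling even after the cursor stops advancing.
+SolrQuery q = (new SolrQuery(some_query)).setRows(100)
+    .setSort(SortClause.asc("timestamp")).addSort(SortClause.asc("id"));
+String cursorMark = CursorMarkParams.CURSOR_MARK_START;
+while (true) {
+  q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
+  QueryResponse rsp = solrServer.query(q);
+  doCustomProcessingOfResults(rsp);
+  String nextCursorMark = rsp.getNextCursorMark();
+  if (cursorMark.equals(nextCursorMark)) {
+    Thread.sleep(someConfiguredDelay);  // reached the end for now; wait before polling again
+  }
+  cursorMark = nextCursorMark;
+}
+----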
+
+TIP: For certain specialized cases, the <<exporting-result-sets.adoc#exporting-result-sets,/export handler>> may be an option.

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/parallel-sql-interface.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/parallel-sql-interface.adoc b/solr/solr-ref-guide/src/parallel-sql-interface.adoc
new file mode 100644
index 0000000..d4bb2c0
--- /dev/null
+++ b/solr/solr-ref-guide/src/parallel-sql-interface.adoc
@@ -0,0 +1,418 @@
+= Parallel SQL Interface
+:page-shortname: parallel-sql-interface
+:page-permalink: parallel-sql-interface.html
+:page-children: solr-jdbc-dbvisualizer, solr-jdbc-squirrel-sql, solr-jdbc-apache-zeppelin, solr-jdbc-python-jython, solr-jdbc-r
+
+Solr's Parallel SQL Interface brings the power of SQL to SolrCloud.
+
+The SQL interface seamlessly combines SQL with Solr's full-text search capabilities. Both MapReduce style and JSON Facet API aggregations are supported, which means the SQL interface can be used to support both *high query volume* and *high cardinality* use cases.
+
+== SQL Architecture
+
+The SQL interface allows sending a SQL query to Solr and getting documents streamed back in response. Under the covers, Solr's SQL interface uses the Apache Calcite SQL engine to translate SQL queries to physical query plans implemented as <<streaming-expressions.adoc#streaming-expressions,Streaming Expressions>>.
+
+=== Solr Collections and DB Tables
+
+In a standard `SELECT` statement such as `SELECT <expressions> FROM <table>`, the table names correspond to Solr collection names. Table names are case insensitive.
+
+Column names in the SQL query map directly to fields in the Solr index for the collection being queried. These identifiers are case sensitive. Aliases are supported, and can be referenced in the `ORDER BY` clause.
+
+The `*` syntax to indicate all fields is not supported in either limited or unlimited queries. The `score` field can be used only with queries that contain a `LIMIT` clause.
+
+For example, we could index Solr's sample documents and then construct an SQL query like this:
+
+[source,sql]
+----
+SELECT manu as mfr, price as retail FROM techproducts
+----
+
+The collection in Solr we are using is "techproducts", and we've asked for the "manu" and "price" fields to be returned and aliased with new names. While this example does not use those aliases, we could build on this to ORDER BY one or more of those fields.
+
+More information about how to structure SQL queries for Solr is included in the section <<Solr SQL Syntax>> below.
+
+=== Aggregation Modes
+
+The SQL feature of Solr can work with aggregations (grouping of results) in two ways:
+
+* `facet`: This is the *default* aggregation mode, which uses the JSON Facet API or StatsComponent for aggregations. In this scenario the aggregations logic is pushed down into the search engine and only the aggregates are sent across the network. This is Solr's normal mode of operation. This is fast when the cardinality of GROUP BY fields is low to moderate. But it breaks down when you have high cardinality fields in the GROUP BY clause.
+* `map_reduce`: This implementation shuffles tuples to worker nodes and performs the aggregation on the worker nodes. It involves sorting and partitioning the entire result set and sending it to worker nodes. In this approach the tuples arrive at the worker nodes sorted by the GROUP BY fields. The worker nodes can then rollup the aggregates one group at a time. This allows for unlimited cardinality aggregation, but you pay the price of sending the entire result set across the network to worker nodes.
+
+These modes are defined with the `aggregationMode` property when sending the request to Solr.
+
+As noted, the choice between aggregation modes depends on the cardinality of the fields you are working with. If you have low-to-moderate cardinality in the fields you are grouping by, the `facet` aggregation mode will give you higher performance because only the final groups are returned, very similar to how facets work today. If, however, you have high cardinality in the fields, the `map_reduce` aggregation mode with worker nodes provides a much more performant option.
+
+== Configuration
+
+The request handlers used for the SQL interface are configured to load implicitly, meaning there is little to do to start using this feature.
+
+[[sql-request-handler]]
+=== /sql Request Handler
+
+The `/sql` handler is the front end of the Parallel SQL interface. All SQL queries are sent to the `/sql` handler to be processed. The handler also coordinates the distributed MapReduce jobs when running `GROUP BY` and `SELECT DISTINCT` queries in `map_reduce` mode. By default the `/sql` handler will choose worker nodes from its own collection to handle the distributed operations. In this default scenario the collection where the `/sql` handler resides acts as the default worker collection for MapReduce queries.
+
+By default, the `/sql` request handler is configured as an implicit handler, meaning that it is always enabled in every Solr installation and no further configuration is required.
+
+[IMPORTANT]
+====
+As described below in the section <<Best Practices>>, you may want to set up a separate collection for parallelized SQL queries. If you have high cardinality fields and a large amount of data, please be sure to review that section and consider using a separate collection.
+====
+
+=== /stream and /export Request Handlers
+
+The Streaming API is an extensible parallel computing framework for SolrCloud. <<streaming-expressions.adoc#streaming-expressions,Streaming Expressions>> provide a query language and a serialization format for the Streaming API.
+
+The Streaming API provides support for fast MapReduce allowing it to perform parallel relational algebra on extremely large data sets. Under the covers the SQL interface parses SQL queries using the Apache Calcite SQL Parser. It then translates the queries to the parallel query plan. The parallel query plan is expressed using the Streaming API and Streaming Expressions.
+
+Like the `/sql` request handler, the `/stream` and `/export` request handlers are configured as implicit handlers, and no further configuration is required.
+
+=== Fields
+
+In some cases, fields used in SQL queries must be configured as DocValues fields. If queries are unlimited, all fields must be DocValues fields. If queries are limited (with the `LIMIT` clause) then fields do not have to have DocValues enabled.
+
+=== Sending Queries
+
+The SQL Interface provides a basic JDBC driver and an HTTP interface to perform queries.
+
+=== JDBC Driver
+
+The JDBC Driver ships with SolrJ. Below is sample code for creating a connection and executing a query with the JDBC driver:
+
+[source,java]
+----
+Connection con = null;
+Statement stmt = null;
+ResultSet rs = null;
+try {
+    con = DriverManager.getConnection("jdbc:solr://" + zkHost + "?collection=collection1&aggregationMode=map_reduce&numWorkers=2");
+    stmt = con.createStatement();
+    rs = stmt.executeQuery("SELECT a_s, sum(a_f) as sum FROM collection1 GROUP BY a_s ORDER BY sum desc");
+
+    while(rs.next()) {
+        String a_s = rs.getString("a_s");
+        double s = rs.getDouble("sum");
+    }
+} finally {
+    rs.close();
+    stmt.close();
+    con.close();
+}
+----
+
+The connection URL must contain the `zkHost` and the `collection` parameters. The collection must be a valid SolrCloud collection at the specified ZooKeeper host. The collection must also be configured with the `/sql` handler. The `aggregationMode` and `numWorkers` parameters are optional.
+
+=== HTTP Interface
+
+Solr accepts parallel SQL queries through the `/sql` handler.
+
+Below is a sample curl command performing a SQL aggregate query in facet mode:
+
+[source,bash]
+----
+curl --data-urlencode 'stmt=SELECT to, count(*) FROM collection4 GROUP BY to ORDER BY count(*) desc LIMIT 10' http://localhost:8983/solr/collection4/sql?aggregationMode=facet
+----
+
+Below is sample result set:
+
+[source,json]
+----
+{"result-set":{"docs":[
+   {"count(*)":9158,"to":"pete.davis@enron.com"},
+   {"count(*)":6244,"to":"tana.jones@enron.com"},
+   {"count(*)":5874,"to":"jeff.dasovich@enron.com"},
+   {"count(*)":5867,"to":"sara.shackleton@enron.com"},
+   {"count(*)":5595,"to":"steven.kean@enron.com"},
+   {"count(*)":4904,"to":"vkaminski@aol.com"},
+   {"count(*)":4622,"to":"mark.taylor@enron.com"},
+   {"count(*)":3819,"to":"kay.mann@enron.com"},
+   {"count(*)":3678,"to":"richard.shapiro@enron.com"},
+   {"count(*)":3653,"to":"kate.symes@enron.com"},
+   {"EOF":"true","RESPONSE_TIME":10}]}
+}
+----
+
+Notice that the result set is an array of tuples with key/value pairs that match the SQL column list. The final tuple contains the EOF flag which signals the end of the stream.
+
+== Solr SQL Syntax
+
+Solr supports a broad range of SQL syntax.
+
+.SQL Parser is Case Insensitive
+[IMPORTANT]
+====
+The SQL parser being used by Solr to translate the SQL statements is case insensitive. However, for ease of reading, all examples on this page use capitalized keywords.
+====
+
+=== Escaping Reserved Words
+
+The SQL parser will return an error if a reserved word is used in the SQL query. Reserved words can be escaped and included in the query using the back tick. For example:
+
+[source,sql]
+----
+select `from` from emails
+----
+
+=== SELECT Statements
+
+Solr supports limited and unlimited select queries. The syntax between the two types of queries is identical except for the `LIMIT` clause in the SQL statement. However, they have very different execution plans and different requirements for how the data is stored. The sections below explore both types of queries.
+
+==== Basic SELECT statement with LIMIT
+
+A limited select query follows this basic syntax:
+
+[source,sql]
+----
+SELECT fieldA as fa, fieldB as fb, fieldC as fc FROM tableA WHERE fieldC = 'term1 term2' ORDER BY fa desc LIMIT 100
+----
+
+We've covered many syntax options with this example, so let's walk through what's possible below.
+
+=== `WHERE` Clause and Boolean Predicates
+
+[IMPORTANT]
+====
+The WHERE clause must have a field on one side of the predicate. Comparing two constants (`5 < 10`) or two fields (`fielda > fieldb`) is not supported. Subqueries are also not supported.
+====
+
+The `WHERE` clause allows Solr's search syntax to be injected into the SQL query. In the example:
+
+[source,sql]
+----
+WHERE fieldC = 'term1 term2'
+----
+
+The predicate above will execute a full text search for the phrase 'term1 term2' in fieldC.
+
+To execute a non-phrase query, simply add parens inside of the single quotes. For example:
+
+[source,sql]
+----
+WHERE fieldC = '(term1 term2)'
+----
+
+The predicate above searches for `term1` OR `term2` in `fieldC`.
+
+The Solr range query syntax can be used as follows:
+
+[source,sql]
+----
+WHERE fieldC = '[0 TO 100]'
+----
+
+Complex boolean queries can be specified as follows:
+
+[source,sql]
+----
+WHERE ((fieldC = 'term1' AND fieldA = 'term2') OR (fieldB = 'term3'))
+----
+
+To specify NOT queries, you use the `AND NOT` syntax as follows:
+
+[source,sql]
+----
+WHERE (fieldA = 'term1') AND NOT (fieldB = 'term2')
+----
+
+==== Supported `WHERE` Operators
+
+The parallel SQL interface supports and pushes down most common SQL operators, specifically:
+
+[width="100%",options="header",]
+|===
+|Operator |Description |Example |Solr Query
+|= |Equals |`fielda = 10` |`fielda:10`
+|<> |Does not equal |`fielda <> 10` |`-fielda:10`
+|!= |Does not equal |`fielda != 10` |`-fielda:10`
+|> |Greater than |`fielda > 10` | `fielda:{10 TO *]`
+|>= |Greater than or equals |`fielda >= 10` | `fielda:[10 TO *]`
+|< |Less than |`fielda < 10` | `fielda:[* TO 10}`
+|<= |Less than or equals |`fielda <= 10` | `fielda:[* TO 10]`
+|===
+
+Some operators that are not supported are BETWEEN, LIKE and IN. However, there are workarounds for BETWEEN and LIKE.
+
+* BETWEEN can be supported with a range query, such as `field = [50 TO 100]`.
+* A simplistic LIKE can be used with a wildcard, such as `field = 'sam*'`.
+
+=== `ORDER BY` Clause
+
+The `ORDER BY` clause maps directly to Solr fields. Multiple `ORDER BY` fields and directions are supported.
+
+The `score` field is accepted in the `ORDER BY` clause in queries where a limit is specified.
+
+If the `ORDER BY` clause contains the exact fields in the `GROUP BY` clause, then there is no limit placed on the returned results. If the `ORDER BY` clause contains different fields than the `GROUP BY` clause, a limit of 100 is automatically applied. To increase this limit you must specify a value in the `LIMIT` clause.
+
+Order by fields are case sensitive.
+
+=== `LIMIT` Clause
+
+Limits the result set to the specified size. In the example above the clause `LIMIT 100` will limit the result set to 100 records.
+
+There are a few differences to note between limited and unlimited queries:
+
+* Limited queries support `score` in the field list and `ORDER BY`. Unlimited queries do not.
+* Limited queries allow any stored field in the field list. Unlimited queries require the fields to be stored as a DocValues field.
+* Limited queries allow any indexed field in the `ORDER BY` list. Unlimited queries require the fields to be stored as a DocValues field.
+
+=== `SELECT DISTINCT` Queries
+
+The SQL interface supports both MapReduce and Facet implementations for `SELECT DISTINCT` queries.
+
+The MapReduce implementation shuffles tuples to worker nodes where the Distinct operation is performed. This implementation can perform the Distinct operation over extremely high cardinality fields.
+
+The Facet implementation pushes down the Distinct operation into the search engine using the JSON Facet API. This implementation is designed for high performance, high QPS scenarios on low-to-moderate cardinality fields.
+
+The `aggregationMode` parameter is available in both the JDBC driver and the HTTP interface to choose the underlying implementation (`map_reduce` or `facet`). The SQL syntax is identical for both implementations:
+
+[source,sql]
+----
+SELECT distinct fieldA as fa, fieldB as fb FROM tableA ORDER BY fa desc, fb desc
+----
+
+=== Statistics
+
+The SQL interface supports simple statistics calculated on numeric fields. The supported functions are `count(*)`, `min`, `max`, `sum`, and `avg`.
+
+Because these functions never require data to be shuffled, the aggregations are pushed down into the search engine and are generated by the <<the-stats-component.adoc#the-stats-component,StatsComponent>>.
+
+[source,sql]
+----
+SELECT count(*) as count, sum(fieldB) as sum FROM tableA WHERE fieldC = 'Hello'
+----
+
+=== `GROUP BY` Aggregations
+
+The SQL interface also supports `GROUP BY` aggregate queries.
+
+As with `SELECT DISTINCT` queries, the SQL interface supports both a MapReduce implementation and a Facet implementation. The MapReduce implementation can build aggregations over extremely high cardinality fields. The Facet implementation provides high performance aggregation over fields with moderate levels of cardinality.
+
+==== Basic `GROUP BY` with Aggregates
+
+Here is a basic example of a GROUP BY query that requests aggregations:
+
+[source,sql]
+----
+SELECT fieldA as fa, fieldB as fb, count(*) as count, sum(fieldC) as sum, avg(fieldY) as avg FROM tableA WHERE fieldC = 'term1 term2'
+GROUP BY fa, fb HAVING sum > 1000 ORDER BY sum asc LIMIT 100
+----
+
+Let's break this down into pieces:
+
+==== Column Identifiers and Aliases
+
+The Column Identifiers can contain both fields in the Solr index and aggregate functions. The supported aggregate functions are:
+
+* `count(*)`: Counts the number of records over a set of buckets.
+* `sum(field)`: Sums a numeric field over a set of buckets.
+* `avg(field)`: Averages a numeric field over a set of buckets.
+* `min(field)`: Returns the min value of a numeric field over a set of buckets.
+* `max(field)`: Returns the max value of a numeric field over a set of buckets.
+
+The non-function fields in the field list determine the fields to calculate the aggregations over.
+
+==== `GROUP BY` Clause
+
+The `GROUP BY` clause can contain up to 4 fields in the Solr index. These fields should correspond with the non-function fields in the field list.
+
+=== `HAVING` Clause
+
+The `HAVING` clause may contain any function listed in the field list. Complex `HAVING` clauses such as this are supported:
+
+[source,sql]
+----
+SELECT fieldA, fieldB, count(*), sum(fieldC), avg(fieldY)
+FROM tableA
+WHERE fieldC = 'term1 term2'
+GROUP BY fieldA, fieldB
+HAVING ((sum(fieldC) > 1000) AND (avg(fieldY) <= 10))
+ORDER BY sum(fieldC) asc
+LIMIT 100
+----
+
+== Best Practices
+
+=== Separate Collections
+
+It makes sense to create a separate SolrCloud collection just for the `/sql` handler. This collection can be created using SolrCloud's standard collection API.
+
+Since this collection only exists to handle `/sql` requests and provide a pool of worker nodes, this collection does not need to hold any data. Worker nodes are selected randomly from the entire pool of available nodes in the `/sql` handler's collection. So, to grow this collection dynamically, replicas can be added to existing shards. New replicas will automatically be put to work after they've been added.
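+
+As a sketch, such a worker collection could be created from SolrJ with the Collections API; the collection name, config set name, and shard/replica counts below are only example values:
+
+[source,java]
+----
+// A minimal sketch: create an empty collection whose nodes will serve as SQL workers.
+// "workers" and "myConfig" are example names; use a config set that exists in your cluster.
+CollectionAdminRequest.Create create =
+    CollectionAdminRequest.createCollection("workers", "myConfig", 1, 4);
+create.process(cloudSolrClient);  // cloudSolrClient is an assumed, already-built CloudSolrClient
+----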
+
+== Parallel SQL Queries
+
+An earlier section describes how the SQL interface translates the SQL statement to a streaming expression. One of the parameters of the request is the `aggregationMode`, which defines if the query should use a MapReduce-like shuffling technique or push the operation down into the search engine.
+
+=== Parallelized Queries
+
+The Parallel SQL architecture consists of three logical tiers: a *SQL* tier, a *Worker* tier, and a *Data Table* tier. By default the SQL and Worker tiers are collapsed into the same physical SolrCloud collection.
+
+==== SQL Tier
+
+The SQL tier is where the `/sql` handler resides. The `/sql` handler takes the SQL query and translates it to a parallel query plan. It then selects worker nodes to execute the plan and sends the query plan to each worker node to be run in parallel.
+
+Once the query plan has been executed by the worker nodes, the `/sql` handler then performs the final merge of the tuples returned by the worker nodes.
+
+==== Worker Tier
+
+The workers in the worker tier receive the query plan from the `/sql` handler and execute the parallel query plan. The parallel execution plan includes the queries that need to be made on the Data Table tier and the relational algebra needed to satisfy the query. Each worker node assigned to the query is shuffled 1/N of the tuples from the Data Tables. The worker nodes execute the query plan and stream tuples back to the `/sql` handler.
+
+==== Data Table Tier
+
+The Data Table tier is where the tables reside. Each table is its own SolrCloud collection. The Data Table layer receives queries from the worker nodes and emits tuples (search results). The Data Table tier also handles the initial sorting and partitioning of tuples sent to the workers. This means the tuples are always sorted and partitioned before they hit the network. The partitioned tuples are sent directly to the correct worker nodes in the proper sort order, ready to be reduced.
+
+.How Parallel SQL Queries are Distributed
+image::images/parallel-sql-interface/cluster.png[image,width=492,height=250]
+
+The image above shows the three tiers broken out into different SolrCloud collections for clarity. In practice the `/sql` handler and worker collection by default share the same collection.
+
+*Note:* The image shows the network flow for a single Parallel SQL Query (SQL over MapReduce). This network flow is used when `map_reduce` aggregation mode is used for `GROUP BY` aggregations or the `SELECT DISTINCT` query. The traditional SolrCloud network flow (without workers) is used when the `facet` aggregation mode is used.
+
+Below is a description of the flow:
+
+. The client sends a SQL query to the `/sql` handler. The request is handled by a single `/sql` handler instance.
+. The `/sql` handler parses the SQL query and creates the parallel query plan.
+. The query plan is sent to worker nodes (in green).
+. The worker nodes execute the plan in parallel. The diagram shows each worker node contacting a collection in the Data Table tier (in blue).
+. The collection in the Data Table tier is the table from the SQL query. Notice that the collection has five shards each with 3 replicas.
+. Notice that each worker contacts one replica from each shard. Because there are 5 workers, each worker is returned 1/5 of the search results from each shard. The partitioning is done inside of the Data Table tier so there is no duplication of data across the network.
+. Also notice that with this design ALL replicas in the data layer are shuffling (sorting & partitioning) data simultaneously. As the number of shards, replicas and workers grows, this design allows for a massive amount of computing power to be applied to a single query.
+. The worker nodes process the tuples returned from the Data Table tier in parallel. The worker nodes perform the relational algebra needed to satisfy the query plan.
+. The worker nodes stream tuples back to the `/sql` handler where the final merge is done, and finally the tuples are streamed back to the client.
+
+== SQL Clients and Database Visualization Tools
+
+The SQL interface supports queries sent from SQL clients and database visualization tools such as DbVisualizer and Apache Zeppelin.
+
+=== Generic Clients
+
+For most Java-based clients, the following jars will need to be placed on the client classpath:
+
+* all .jars found in `$SOLR_HOME/dist/solrj-libs`
+* the SolrJ .jar found at `$SOLR_HOME/dist/solr-solrj-<version>.jar`
+
+If you are using Maven, the `org.apache.solr.solr-solrj` artifact contains the required jars.
+
+Once the jars are available on the classpath, the Solr JDBC driver name is `org.apache.solr.client.solrj.io.sql.DriverImpl` and a connection can be made with the following connection string format:
+
+[source,plain]
+----
+jdbc:solr://SOLR_ZK_CONNECTION_STRING?collection=COLLECTION_NAME
+----
+
+There are other parameters that can be optionally added to the connection string like `aggregationMode` and `numWorkers`.
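+
+Putting those pieces together, below is a minimal sketch of a generic Java client; the ZooKeeper address, collection name, and query are example values:
+
+[source,java]
+----
+// A minimal sketch: load the Solr JDBC driver (may be optional if the driver
+// registers itself via JDBC 4 auto-loading) and run a query.
+Class.forName("org.apache.solr.client.solrj.io.sql.DriverImpl");
+String url = "jdbc:solr://localhost:9983?collection=techproducts&aggregationMode=facet";
+try (Connection con = DriverManager.getConnection(url);
+     Statement stmt = con.createStatement();
+     ResultSet rs = stmt.executeQuery(
+         "SELECT manu, count(*) as cnt FROM techproducts GROUP BY manu ORDER BY cnt desc LIMIT 10")) {
+  while (rs.next()) {
+    System.out.println(rs.getString("manu") + " " + rs.getString("cnt"));
+  }
+}
+----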
+
+=== DbVisualizer
+
+A step-by-step guide for setting up https://www.dbvis.com/[DbVisualizer] is in the section <<solr-jdbc-dbvisualizer.adoc#solr-jdbc-dbvisualizer,Solr JDBC - DbVisualizer>>.
+
+=== SQuirreL SQL
+
+A step-by-step guide for setting up http://squirrel-sql.sourceforge.net[SQuirreL SQL] is in the section <<solr-jdbc-squirrel-sql.adoc#solr-jdbc-squirrel-sql,Solr JDBC - SQuirreL SQL>>.
+
+=== Apache Zeppelin (incubating)
+
+A step-by-step guide for setting up http://zeppelin.apache.org/[Apache Zeppelin] is in the section <<solr-jdbc-apache-zeppelin.adoc#solr-jdbc-apache-zeppelin,Solr JDBC - Apache Zeppelin>>.
+
+=== Python/Jython
+
+Examples of using Python and Jython for connecting to Solr with the Solr JDBC driver are available in the section <<solr-jdbc-python-jython.adoc#solr-jdbc-python-jython,Solr JDBC - Python/Jython>>.
+
+=== R
+
+Examples of using R for connecting to Solr with the Solr JDBC driver are available in the section <<solr-jdbc-r.adoc#solr-jdbc-r,Solr JDBC - R>> .

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/parameter-reference.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/parameter-reference.adoc b/solr/solr-ref-guide/src/parameter-reference.adoc
new file mode 100644
index 0000000..4942f57
--- /dev/null
+++ b/solr/solr-ref-guide/src/parameter-reference.adoc
@@ -0,0 +1,51 @@
+= Parameter Reference
+:page-shortname: parameter-reference
+:page-permalink: parameter-reference.html
+
+== Cluster Parameters
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,20,60"]
+|===
+|`numShards` |Defaults to 1 |The number of shards to hash documents to. There must be one leader per shard and each leader can have N replicas.
+|===
+
+== SolrCloud Instance Parameters
+
+These are set in `solr.xml`, but by default the `host` and `hostContext` parameters are set up to also work with system properties.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,20,60"]
+|===
+|`host` |Defaults to the first local host address found |If the wrong host address is found automatically, you can override the host address with this parameter.
+|`hostPort` |Defaults to the port specified via `bin/solr -p <port>`, or `8983` if not specified. |The port that Solr is running on. This value is only used when `-DzkRun` is specified without a value (see below), to calculate the default port on which embedded ZooKeeper will run. In the `solr.xml` shipped with Solr, the `hostPort` system property is not referenced, and so is ignored. If you want to run Solr on a non-default port, use `bin/solr -p <port>` rather than specifying `-DhostPort`.
+|`hostContext` |Defaults to `solr` |The context path for the Solr web application.
+|===
+
+== SolrCloud Instance ZooKeeper Parameters
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,20,60"]
+|===
+|`zkRun` |Defaults to `localhost:<hostPort+1000>` |Causes Solr to run an embedded version of ZooKeeper. Set to the address of ZooKeeper on this node; this allows us to know who you are in the list of addresses in the `zkHost` connect string. Use `-DzkRun` (with no value) to get the default value.
+|`zkHost` |No default |The host address for ZooKeeper. Usually this is a comma-separated list of addresses to each node in your ZooKeeper ensemble.
+|`zkClientTimeout` |Defaults to 15000 |The time a client is allowed to not talk to ZooKeeper before its session expires.
+|===
+
+`zkRun` and `zkHost` are set up using system properties. `zkClientTimeout` is set up in `solr.xml` by default, but can also be set using a system property.
+
+== SolrCloud Core Parameters
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,20,60"]
+|===
+|`shard` |Defaults to being automatically assigned based on numShards |Specifies which shard this core acts as a replica of.
+|===
+
+`shard` can be specified in the <<defining-core-properties.adoc#defining-core-properties,`core.properties`>> for each core.
+
+Additional cloud-related parameters are discussed in <<format-of-solr-xml.adoc#format-of-solr-xml,Format of solr.xml>>.

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/pdf/SolrRefGuide-all.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/pdf/SolrRefGuide-all.adoc b/solr/solr-ref-guide/src/pdf/SolrRefGuide-all.adoc
new file mode 100644
index 0000000..667fc3b
--- /dev/null
+++ b/solr/solr-ref-guide/src/pdf/SolrRefGuide-all.adoc
@@ -0,0 +1,15 @@
+= Apache Solr Reference Guide: For Solr {solr-docs-version}
+:toc:
+:toc-title: Table of Contents
+
+[discrete]
+= Licenses
+Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements.  See the NOTICE file distributed with this work for additional information regarding copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.  You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the License for the specific language governing permissions and limitations under the License.
+
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Apache Lucene, Apache Solr and their respective logos are trademarks of the Apache Software Foundation. Please see the http://www.apache.org/foundation/marks/[Apache Trademark Policy] for more information.
+
+<<<
+
+include::_data/pdf-main-body.adoc[]

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/pdf/themes/refguide-theme.yml
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/pdf/themes/refguide-theme.yml b/solr/solr-ref-guide/src/pdf/themes/refguide-theme.yml
new file mode 100644
index 0000000..5e272ce
--- /dev/null
+++ b/solr/solr-ref-guide/src/pdf/themes/refguide-theme.yml
@@ -0,0 +1,259 @@
+font:
+  catalog:
+    # Noto Sans supports Latin, Latin-1 Supplement, Latin Extended-A, Greek, Cyrillic, Vietnamese & an assortment of symbols
+    # Noto Sans used for body text
+    Noto Sans:
+      normal: Noto_Sans/NotoSans-Regular.ttf
+      bold: Noto_Sans/NotoSans-Bold.ttf
+      italic: Noto_Sans/NotoSans-Italic.ttf
+      bold_italic: Noto_Sans/NotoSans-BoldItalic.ttf
+    # Inconsolata used for monospaced text & code blocks
+    Inconsolata:
+      normal: Inconsolata/Inconsolata-Regular.ttf
+      bold: Inconsolata/Inconsolata-Bold.ttf
+      italic: Inconsolata/Inconsolata-Regular.ttf
+      bold_italic: Inconsolata/Inconsolata-Bold.ttf
+    # M+1mn is from the asciidoctor-pdf project
+    # Provides the glyphs for callout numbers (see conum section below)
+    # This is a fallback font, and will only be used when it can fill in missing glyphs from other fonts
+    M+1mn Fallback:
+      normal: mplus1mn/mplus1mn-regular-ascii-conums.ttf
+      bold: mplus1mn/mplus1mn-bold-ascii.ttf
+      italic: mplus1mn/mplus1mn-italic-ascii.ttf
+      bold_italic: mplus1mn/mplus1mn-bold_italic-ascii.ttf
+  fallbacks:
+    - M+1mn Fallback
+# page-level settings apply to the entire page
+page:
+  background_color: '#ffffff'
+  layout: portrait
+  margin: [0.5in, 0.75in, 0.75in, 0.67in]
+  size: LETTER
+# base-level settings are defaults for other elements
+base:
+  align: left
+  # color as hex string (leading # is optional)
+  font_color: '#333333'
+  font_family: Noto Sans
+  font_size: 10
+  line_height_length: 11
+  line_height: $base_line_height_length / $base_font_size
+  font_size_large: round($base_font_size * 1.25)
+  font_size_small: round($base_font_size * 0.85)
+  font_size_min: $base_font_size * 0.75
+  font_style: normal
+  border_color: '#eeeeee'
+  border_radius: 4
+  border_width: 0.5
+vertical_rhythm: $base_line_height_length
+horizontal_rhythm: $base_line_height_length
+vertical_spacing: $vertical_rhythm
+link:
+  font_color: '#428bca'
+# literal is used for inline monospaced strings in prose and in table cells
+literal:
+  font_color: '#333333'
+  font_family: Inconsolata
+  background_color: '#f5f5f5'
+# code is used for source code blocks
+code:
+  font_color: $base_font_color
+  font_family: $literal_font_family
+  font_size: ceil($base_font_size)
+  padding: $code_font_size
+  line_height: 1.25
+  background_color: '#f5f5f5'
+  border_color: '#cccccc'
+  border_radius: $base_border_radius
+  border_width: 0.75
+# headings
+heading:
+  font_color: '#d9411e'
+  font_family: $base_font_family
+  font_style: bold
+  # h1 is used for part titles; equivalent to 22pt due to rounding
+  h1_font_size: floor($base_font_size * 2.1)
+  # h2 is used for chapter titles; equivalent to 20pt due to rounding
+  h2_font_size: floor($base_font_size * 1.9)
+  # equivalent to 16pt due to rounding
+  h3_font_size: ceil($base_font_size * 1.53)
+  # equivalent to 13.125pt
+  h4_font_size: $base_font_size_large
+  # equivalent to 10.5pt
+  h5_font_size: $base_font_size
+  # equivalent to 8.925pt
+  h6_font_size: $base_font_size_small
+  line_height: 1.2
+  margin_top: $vertical_rhythm * 0.2
+  margin_bottom: $vertical_rhythm * 0.8
+  padding: [0, 0, 5, 0]
+# Applies only to the title page
+title_page:
+  align: right
+  logo:
+    top: 10%
+    image: image:../../solr-sunOnly-small.png[pdfwidth=10%]
+    align: right
+  title:
+    top: 55%
+    font_size: $heading_h1_font_size
+    font_color: '#d9411e'
+    line_height: 0.9
+  subtitle:
+    font_size: $heading_h3_font_size
+    font_style: bold_italic
+    line_height: 1
+  authors:
+    margin_top: $base_font_size * 1.25
+    font_size: $base_font_size_large
+    font_color: '#181818'
+  revision:
+    margin_top: $base_font_size * 1.25
+block:
+  margin_top: 0
+  margin_bottom: $vertical_rhythm
+caption:
+  align: left
+  font:
+    style: italic
+    color: 7a2518
+  margin_inside: $vertical_rhythm / 3
+  margin_outside: 0
+lead:
+  font_size: $base_font_size_large
+  line_height: 1.4
+abstract:
+  font_color: '#5c6266'
+  font_size: $lead_font_size
+  line_height: $lead_line_height
+  font_style: italic
+admonition:
+  border_color: $base_border_color
+  border_width: $base_border_width
+  padding: [0, $horizontal_rhythm, 0, $horizontal_rhythm]
+  icon:
+    tip:
+      name: fa-lightbulb-o
+      stroke_color: '#428b30'
+    note:
+      name: fa-info-circle
+      stroke_color: '#19407c'
+    warning:
+      name: fa-exclamation-triangle
+      stroke_color: '#bf0000'
+    caution:
+      name: fa-fire
+      stroke_color: '#bf6900'
+    information:
+      name: fa-bolt
+      stroke_color: '#eeea74'
+blockquote:
+  font_color: $base_font_color
+  font_size: $base_font_size_large
+  border_color: $base_border_color
+  border_width: 5
+  padding: [$vertical_rhythm / 2, $horizontal_rhythm, $vertical_rhythm / -2, $horizontal_rhythm + $blockquote_border_width / 2]
+  cite_font_size: $base_font_size_small
+  cite_font_color: '#999999'
+#conums are used for inline callouts
+conum:
+  font_family: M+1mn Fallback
+  font_color: $literal_font_color
+  font_size: $base_font_size_large
+  line_height: 4 / 3
+example:
+  border_color: $base_border_color
+  border_radius: $base_border_radius
+  border_width: 0.75
+  background_color: transparent
+  padding: [$vertical_rhythm, $horizontal_rhythm, 0, $horizontal_rhythm]
+image:
+  align: left
+  border_color: $base_border_color
+  border_radius: $base_border_radius
+  border_width: 0.75
+prose:
+  margin_top: 0
+  margin_bottom: $vertical_rhythm
+sidebar:
+  border_color: $page_background_color
+  border_radius: $base_border_radius
+  border_width: $base_border_width
+  background_color: '#eeeeee'
+  padding: [$vertical_rhythm, $vertical_rhythm * 1.25, 0, $vertical_rhythm * 1.25]
+  title:
+    align: center
+    font_color: $heading_font_color
+    font_family: $heading_font_family
+    font_size: $heading_h4_font_size
+    font_style: $heading_font_style
+thematic_break:
+  border_color: $base_border_color
+  border_style: solid
+  border_width: $base_border_width
+  margin_top: $vertical_rhythm * 0.5
+  margin_bottom: $vertical_rhythm * 1.5
+description_list:
+  term_font_style: italic
+  term_spacing: $vertical_rhythm / 4
+  description_indent: $horizontal_rhythm * 1.25
+outline_list:
+  indent: $horizontal_rhythm * 1.5
+  # NOTE item_spacing applies to list items that do not have complex content
+  item_spacing: $vertical_rhythm / 2
+table:
+  background_color: $page_background_color
+  head_background_color: '#e6e7e8'
+  #head_font_color: $base_font_color
+  head_font_style: bold
+  even_row_background_color: '#f9f9f9'
+  #odd_row_background_color: <hex value>
+  foot_background_color: '#f0f0f0'
+  border_color: '#dddddd'
+  border_width: $base_border_width
+  cell_padding: [3, 3, 6, 3]
+toc:
+  dot_leader_color: '#dddddd'
+  #dot_leader_content: '. '
+  indent: $horizontal_rhythm
+  line_height: 1.4
+header:
+  font:
+    size: $base_font_size
+    color: $base_font_color
+    style: bold_italic
+  height: $base_line_height_length * 2.5
+  recto_content:
+    left: 'Apache Solr Reference Guide {solr-docs-version}'
+    right: 'Page {page-number} of {page-count}'
+  verso_content:
+    left: 'Page {page-number} of {page-count}'
+    right: 'Apache Solr Reference Guide {solr-docs-version}'
+footer:
+  font_size: $base_font_size_small
+  font_color: $base_font_color
+  # NOTE if background_color is set, background and border will span width of page
+  border_color: '#dddddd'
+  border_width: 0.25
+  height: $base_line_height_length * 2.5
+  line_height: 1
+  padding: [$base_line_height_length / 2, 1, 0, 1]
+  vertical_align: top
+  #image_vertical_align: <alignment> or <number>
+  # additional attributes for content:
+  # * {page-count}
+  # * {page-number}
+  # * {document-title}
+  # * {document-subtitle}
+  # * {chapter-title}
+  # * {section-title}
+  # * {section-or-chapter-title}
+  # We have added some custom variables from the build, see _config.yml.template
+  recto_content:
+    right: 'Published: {build-date}'
+    left: '(C) {build-year}, Apache Software Foundation'
+  verso_content:
+    left: 'Published: {build-date}'
+    right: '(C) {build-year}, Apache Software Foundation'
+colophon:
+  font_size: $base_font_size_small

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/performance-statistics-reference.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/performance-statistics-reference.adoc b/solr/solr-ref-guide/src/performance-statistics-reference.adoc
new file mode 100644
index 0000000..c0dfa1b
--- /dev/null
+++ b/solr/solr-ref-guide/src/performance-statistics-reference.adoc
@@ -0,0 +1,119 @@
+= Performance Statistics Reference
+:page-shortname: performance-statistics-reference
+:page-permalink: performance-statistics-reference.html
+
+This page explains some of the <<using-jmx-with-solr.adoc#using-jmx-with-solr,JMX>> statistics that Solr exposes.
+
+The same statistics are also exposed via the <<mbean-request-handler.adoc#mbean-request-handler,MBean Request Handler>> when statistics are requested.
+
+These statistics are per core. When you are running in SolrCloud mode, these statistics correspond to the performance of an individual replica.
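+
+For example, the MBean Request Handler can return the full set of per-core statistics over HTTP. A sketch, assuming a core named `techproducts` running on the default port:
+
+[source,bash]
+----
+curl "http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&wt=json&indent=true"
+----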
+
+== Request Handlers
+
+=== Update Request Handler
+
+The update request handler is an endpoint to send data to Solr. Its statistics show how many update requests are being received, how quickly they are being processed, and other valuable information about the requests.
+
+*Path:* `/solr/<core>/update`
+
+=== Search Request Handler
+
+These statistics are useful for measuring and tracking the number of search queries, response times, and so on. If you are not using the “select” handler, the path needs to be changed appropriately. Similarly, if you are using the “sql” handler, the “export” handler, the realtime “get” handler, or any other handler, similar statistics can be found for that handler as well.
+
+*Path*: `/solr/<core>/select`
+
+The Update Request Handler and the Search Request Handler, along with handlers like “sql”, “export”, and the realtime “get” handler, provide the following attributes in their statistics.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Description
+|15minRateReqsPerSecond |Requests per second received over the past 15 minutes.
+|5minRateReqsPerSecond |Requests per second received over the past 5 minutes.
+|75thPcRequestTime |Request processing time in milliseconds for the request which belongs to the 75th Percentile. E.g., if 100 requests are received, then the 75th fastest request time will be reported by this statistic.
+|95thPcRequestTime |Request processing time in milliseconds for the request which belongs to the 95th Percentile. E.g., if 80 requests are received, then the 76th fastest request time will be reported in this statistic.
+|999thPcRequestTime |Request processing time in milliseconds for the request which belongs to the 99.9th Percentile. E.g., if 1000 requests are received, then the 999th fastest request time will be reported in this statistic.
+|99thPcRequestTime |Request processing time in milliseconds for the request which belongs to the 99th Percentile. E.g., if 200 requests are received, then the 198th fastest request time will be reported in this statistic.
+|avgRequestsPerSecond |Average number of requests received per second.
+|avgTimePerRequest |Average time taken for processing the requests. This parameter will decay over time, with a bias toward activity in the last 5 minutes.
+|errors |Number of errors encountered by the handler.
+|clientErrors |Number of syntax or parse errors made by clients while making requests.
+|handlerStart |Epoch time when the handler was registered.
+|medianRequestTime |Median of all request processing times.
+|requests |Total number of requests made since the Solr process was started.
+|serverErrors |Number of errors thrown by the server while executing the request.
+|timeouts |Number of responses received with partial results.
+|totalTime |The sum of all request processing times since the Solr process was started.
+|===
+
+== Update Handler
+
+*Update Handler:* This section has information on the total number of adds and on how many commits have been fired against a Solr core.
+
+*Path:* `/solr/<core>/updateHandler/DirectUpdateHandler2`
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Description
+|adds |Total number of “add” requests since last commit.
+|autocommit maxTime |Maximum time between two auto-commit executions.
+|autocommits |Total number of auto-commits executed.
+|cumulative_adds |Number of “effective” additions executed over the lifetime. The counter is incremented when an “add” command is executed and decremented when “rollback” is executed.
+|cumulative_deletesById |Number of document deletions executed by ID over the lifetime. The counter is incremented when a “delete” command is executed and decremented when “rollback” is executed.
+|cumulative_deletesByQuery |Number of document deletions executed by query over the lifetime. The counter is incremented when a “delete” command is executed and decremented when “rollback” is executed.
+|cumulative_errors |Number of error messages received by Update Handler while performing addition/deletion action on documents over the lifetime.
+|deletesById |Currently uncommitted deletions by ID.
+|deletesByQuery |Currently uncommitted deletions by query.
+|docsPending |Number of documents which are pending commit.
+|errors |Number of error messages received by Update Handler while performing addition/deletion/commit/rollback action on documents over the lifetime.
+|expungeDeletes |Number of commit commands issued with expunge deletes.
+|optimizes |Number of explicit optimize commands issued.
+|rollbacks |Number of rollbacks executed.
+|soft autocommit maxTime |Maximum time between two soft auto-commits.
+|soft autocommits |Number of soft commits executed.
+|transaction_logs_total_number |Number of TLogs created since the Solr instance started. This is equivalent to the number of hard commits executed.
+|transaction_logs_total_size |Total size of all TLogs created since the Solr instance started.
+|===
+
+== Caches
+
+=== Document Cache
+
+This cache holds Lucene Document objects (the stored fields for each document). Since Lucene internal document IDs are transient, this cache cannot be auto-warmed.
+
+*Path:* `/solr/<cache>/documentCache`
+
+=== Query Result Cache
+
+This cache holds the results of previous searches: ordered lists of document IDs based on a query, a sort, and the range of documents requested.
+
+*Path:* `/solr/<cache>/queryResultCache`
+
+=== Filter Cache
+
+This cache is used for filters, which are unordered sets of all documents that match a query.
+
+*Path:* `/solr/<cache>/filterCache`
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Attribute |Description
+|cumulative_evictions |Number of cache evictions across all caches since this node has been running.
+|cumulative_hitratio |Ratio of cache hits to lookups across all the caches since this node has been running.
+|cumulative_hits |Number of cache hits across all the caches since this node has been running.
+|cumulative_inserts |Number of cache insertions across all the caches since this node has been running.
+|cumulative_lookups |Number of cache lookups across all the caches since this node has been running.
+|evictions |Number of cache evictions for the current index searcher.
+|hitratio |Ratio of cache hits to lookups for the current index searcher.
+|hits |Number of hits for the current index searcher.
+|inserts |Number of inserts into the cache.
+|lookups |Number of lookups against the cache.
+|size |Size of the cache at that particular instance (in KBs).
+|warmupTime |Warm-up time for the registered index searcher. This time is taken into account for the “auto-warming” of caches.
+|===
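+
+These cache statistics can also be retrieved over HTTP through the MBean Request Handler. A sketch, assuming a core named `techproducts` and that the handler's `cat` parameter accepts the `CACHE` category name:
+
+[source,bash]
+----
+curl "http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&cat=CACHE&wt=json&indent=true"
+----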
+
+More information on Solr caches is available in the section <<query-settings-in-solrconfig.adoc#query-settings-in-solrconfig,Query Settings in SolrConfig>>.

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/phonetic-matching.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/phonetic-matching.adoc b/solr/solr-ref-guide/src/phonetic-matching.adoc
new file mode 100644
index 0000000..1f2f3c0
--- /dev/null
+++ b/solr/solr-ref-guide/src/phonetic-matching.adoc
@@ -0,0 +1,114 @@
+= Phonetic Matching
+:page-shortname: phonetic-matching
+:page-permalink: phonetic-matching.html
+
+Phonetic matching algorithms may be used to encode tokens so that two different spellings that are pronounced similarly will match.
+
+For overviews of and comparisons between algorithms, see http://en.wikipedia.org/wiki/Phonetic_algorithm and http://ntz-develop.blogspot.com/2011/03/phonetic-algorithms.html
+
+
+[[PhoneticMatching-Beider-MorsePhoneticMatching_BMPM_]]
+== Beider-Morse Phonetic Matching (BMPM)
+
+For examples of how to use this encoding in your analyzer, see <<filter-descriptions.adoc#FilterDescriptions-Beider-MorseFilter,Beider Morse Filter>> in the Filter Descriptions section.
+
+Beider-Morse Phonetic Matching (BMPM) is a "soundalike" tool that lets you search using a new phonetic matching system. BMPM helps you search for personal names (or just surnames) in a Solr/Lucene index, and is far superior to the existing phonetic codecs, such as regular soundex, metaphone, caverphone, etc.
+
+In general, phonetic matching lets you search a name list for names that are phonetically equivalent to the desired name. BMPM is similar to a soundex search in that an exact spelling is not required. Unlike soundex, it does not generate a large quantity of false hits.
+
+From the spelling of the name, BMPM attempts to determine the language. It then applies phonetic rules for that particular language to transliterate the name into a phonetic alphabet. If it is not possible to determine the language with a fair degree of certainty, it uses generic phonetic rules instead. Finally, it applies language-independent rules regarding such things as voiced and unvoiced consonants and vowels to further ensure the reliability of the matches.
+
+For example, assume that the matches found when searching for Stephen in a database are "Stefan", "Steph", "Stephen", "Steve", "Steven", "Stove", and "Stuffin". "Stefan", "Stephen", and "Steven" are probably relevant, and are names that you want to see. "Stuffin", however, is probably not relevant. Also rejected were "Steph", "Steve", and "Stove". Of those, "Stove" is probably not one that we would have wanted. But "Steph" and "Steve" are possibly ones that you might be interested in.
+
+For Solr, BMPM searching is available for the following languages:
+
+* English
+* French
+* German
+* Greek
+* Hebrew written in Hebrew letters
+* Hungarian
+* Italian
+* Polish
+* Romanian
+* Russian written in Cyrillic letters
+* Russian transliterated into English letters
+* Spanish
+* Turkish
+
+The name matching is also applicable to non-Jewish surnames from the countries in which those languages are spoken.
+
+For more information, see http://stevemorse.org/phoneticinfo.htm and http://stevemorse.org/phonetics/bmpm.htm[http://stevemorse.org/phonetics/bmpm.htm].
+
+== Daitch-Mokotoff Soundex
+
+To use this encoding in your analyzer, see <<filter-descriptions.adoc#FilterDescriptions-Daitch-MokotoffSoundexFilter,Daitch-Mokotoff Soundex Filter>> in the Filter Descriptions section.
+
+The Daitch-Mokotoff Soundex algorithm is a refinement of the Russell and American Soundex algorithms, yielding greater accuracy in matching, especially for Slavic and Yiddish surnames with similar pronunciation but different spellings.
+
+The main differences compared to the other soundex variants are:
+
+* coded names are 6 digits long
+* initial character of the name is coded
+* rules to encode multi-character n-grams
+* multiple possible encodings for the same name (branching)
+
+Note: the implementation used by Solr (commons-codec's http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/DaitchMokotoffSoundex.html[`DaitchMokotoffSoundex`]) has additional branching rules compared to the original description of the algorithm.
+
+For more information, see http://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex and http://www.avotaynu.com/soundex.htm
+
+== Double Metaphone
+
+To use this encoding in your analyzer, see <<filter-descriptions.adoc#FilterDescriptions-DoubleMetaphoneFilter,Double Metaphone Filter>> in the Filter Descriptions section. Alternatively, you may specify `encoding="DoubleMetaphone"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>, but note that the Phonetic Filter version will *not* provide the second ("alternate") encoding that is generated by the Double Metaphone Filter for some tokens.
+
+Encodes tokens using the double metaphone algorithm by Lawrence Philips. See the original article at http://www.drdobbs.com/the-double-metaphone-search-algorithm/184401251?pgno=2
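+
+As an illustration, a field type that applies this encoding could be added through the Schema API. This is a sketch only, assuming a core named `techproducts` with a managed schema; the field type name `phonetic_en` and the analyzer chain are illustrative choices:
+
+[source,bash]
+----
+curl -X POST -H 'Content-type:application/json' --data-binary '{
+  "add-field-type": {
+    "name": "phonetic_en",
+    "class": "solr.TextField",
+    "analyzer": {
+      "tokenizer": { "class": "solr.StandardTokenizerFactory" },
+      "filters": [
+        { "class": "solr.LowerCaseFilterFactory" },
+        { "class": "solr.DoubleMetaphoneFilterFactory", "inject": true }
+      ]
+    }
+  }
+}' "http://localhost:8983/solr/techproducts/schema"
+----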
+
+== Metaphone
+
+To use this encoding in your analyzer, specify `encoding="Metaphone"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>.
+
+Encodes tokens using the Metaphone algorithm by Lawrence Philips, described in "Hanging on the Metaphone" in Computer Language, Dec. 1990.
+
+Another reference for more information is http://www.drdobbs.com/the-double-metaphone-search-algorithm/184401251?pgno=2[Double Metaphone Search Algorithm], by Lawrence Philips.
+
+
+== Soundex
+
+To use this encoding in your analyzer, specify `encoding="Soundex"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>.
+
+Encodes tokens using the Soundex algorithm, which is used to relate similar names, but can also be used as a general purpose scheme to find words with similar phonemes.
+
+See also http://en.wikipedia.org/wiki/Soundex.
+
+== Refined Soundex
+
+To use this encoding in your analyzer, specify `encoding="RefinedSoundex"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>.
+
+Encodes tokens using an improved version of the Soundex algorithm.
+
+See http://en.wikipedia.org/wiki/Soundex.
+
+== Caverphone
+
+To use this encoding in your analyzer, specify `encoding="Caverphone"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>.
+
+Caverphone is an algorithm created by the Caversham Project at the University of Otago. The algorithm is optimised for accents present in the southern part of the city of Dunedin, New Zealand.
+
+See http://en.wikipedia.org/wiki/Caverphone and the Caverphone 2.0 specification at http://caversham.otago.ac.nz/files/working/ctp150804.pdf
+
+== Kölner Phonetik a.k.a. Cologne Phonetic
+
+To use this encoding in your analyzer, specify `encoding="ColognePhonetic"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>.
+
+The Kölner Phonetik, an algorithm published by Hans Joachim Postel in 1969, is optimized for the German language.
+
+See http://de.wikipedia.org/wiki/K%C3%B6lner_Phonetik
+
+== NYSIIS
+
+To use this encoding in your analyzer, specify `encoding="Nysiis"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>.
+
+NYSIIS is an encoding used to relate similar names, but can also be used as a general purpose scheme to find words with similar phonemes.
+
+See http://en.wikipedia.org/wiki/NYSIIS and http://www.dropby.com/NYSIIS.html

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/ping.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/ping.adoc b/solr/solr-ref-guide/src/ping.adoc
new file mode 100644
index 0000000..d0a6039
--- /dev/null
+++ b/solr/solr-ref-guide/src/ping.adoc
@@ -0,0 +1,66 @@
+= Ping
+:page-shortname: ping
+:page-permalink: ping.html
+
+Choosing Ping under a core name issues a `ping` request to check whether the core is up and responding to requests.
+
+.Ping Option in Core Dropdown
+image::images/ping/ping.png[image,width=171,height=195]
+
+The search executed by a Ping is configured with the <<request-parameters-api.adoc#request-parameters-api,Request Parameters API>>. See <<implicit-requesthandlers.adoc#implicit-requesthandlers,Implicit RequestHandlers>> for the paramset to use for the `/admin/ping` endpoint.
+
+The Ping option doesn't open a page, but the status of the request can be seen on the core overview page shown when clicking on a collection name. The length of time the request has taken is displayed next to the Ping option, in milliseconds.
+
+== API Examples
+
+While the UI screen makes it easy to see the ping response time, the underlying ping command can be more useful when executed by remote monitoring tools:
+
+*Input*
+
+[source,text]
+----
+http://localhost:8983/solr/<core-name>/admin/ping
+----
+
+This command will ping the named core for a response.
+
+*Input*
+
+[source,text]
+----
+http://localhost:8983/solr/<collection-name>/admin/ping?wt=json&distrib=true&indent=true
+----
+
+This command will ping all replicas of the given collection name for a response.
+
+*Sample Output*
+
+[source,xml]
+----
+<response>
+   <lst name="responseHeader">
+      <int name="status">0</int>
+      <int name="QTime">13</int>
+      <lst name="params">
+         <str name="q">{!lucene}*:*</str>
+         <str name="distrib">false</str>
+         <str name="df">_text_</str>
+         <str name="rows">10</str>
+         <str name="echoParams">all</str>
+      </lst>
+   </lst>
+   <str name="status">OK</str>
+</response>
+----
+
+Both API calls have the same output. A status=OK indicates that the nodes are responding.
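+
+Because the response is predictable, the ping endpoint is easy to wire into a monitoring script. A minimal sketch, assuming `curl` is available and a core named `techproducts` (the JSON response format is selected with `wt=json`):
+
+[source,bash]
+----
+#!/bin/bash
+# Exit non-zero if the core does not report an OK ping status.
+if curl -s "http://localhost:8983/solr/techproducts/admin/ping?wt=json" | grep -q '"status":"OK"'; then
+  echo "Solr core is up"
+else
+  echo "Solr ping failed" >&2
+  exit 1
+fi
+----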
+
+*SolrJ Example*
+
+[source,java]
+----
+SolrPing ping = new SolrPing();
+ping.getParams().add("distrib", "true"); // make it a distributed request against a collection
+SolrPingResponse rsp = ping.process(solrClient, collectionName); // solrClient is an existing SolrClient instance
+int status = rsp.getStatus();
+----

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/plugins-stats-screen.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/plugins-stats-screen.adoc b/solr/solr-ref-guide/src/plugins-stats-screen.adoc
new file mode 100644
index 0000000..e32ca05
--- /dev/null
+++ b/solr/solr-ref-guide/src/plugins-stats-screen.adoc
@@ -0,0 +1,12 @@
+= Plugins & Stats Screen
+:page-shortname: plugins-stats-screen
+:page-permalink: plugins-stats-screen.html
+
+The Plugins screen shows information and statistics about the status and performance of various plugins running in each Solr core. You can find information about the performance of the Solr caches, the state of Solr's searchers, and the configuration of Request Handlers and Search Components.
+
+Choose an area of interest on the right, and then drill down into more specifics by clicking on one of the names that appear in the central part of the window. In this example, we've chosen to look at the Searcher stats, from the Core area:
+
+.Searcher Statistics
+image::images/plugins-stats-screen/plugin-searcher.png[image,width=462,height=250]
+
+The display is a snapshot taken when the page is loaded. You can get updated status by choosing to either *Watch Changes* or *Refresh Values*. Watching the changes will highlight those areas that have changed, while refreshing the values will reload the page with updated information.

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/95968c69/solr/solr-ref-guide/src/post-tool.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/post-tool.adoc b/solr/solr-ref-guide/src/post-tool.adoc
new file mode 100644
index 0000000..f6d084b
--- /dev/null
+++ b/solr/solr-ref-guide/src/post-tool.adoc
@@ -0,0 +1,167 @@
+= Post Tool
+:page-shortname: post-tool
+:page-permalink: post-tool.html
+
+Solr includes a simple command line tool for POSTing various types of content to a Solr server.
+
+The tool is `bin/post`, a Unix shell script; for Windows (non-Cygwin) usage, see the <<PostTool-WindowsSupport,Windows section>> below.
+
+To run it, open a window and enter:
+
+[source,bash]
+----
+bin/post -c gettingstarted example/films/films.json
+----
+
+This will contact the server at `localhost:8983`. Specifying the `collection/core name` is *mandatory*. The `-help` (or simply `-h`) option will output information on its usage (i.e., `bin/post -help`).
+
+
+== Using the bin/post Tool
+
+Specifying either the `collection/core name` or the full update `url` is *mandatory* when using `bin/post`.
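+
+For example, the following two invocations are equivalent ways to target a local core named `gettingstarted` (a sketch; adjust the host, port, and core name for your installation):
+
+[source,bash]
+----
+bin/post -c gettingstarted example/films/films.json
+bin/post -url http://localhost:8983/solr/gettingstarted/update example/films/films.json
+----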
+
+The basic usage of `bin/post` is:
+
+[source,plain]
+----
+$ bin/post -h
+Usage: post -c <collection> [OPTIONS] <files|directories|urls|-d ["...",...]>
+    or post -help
+
+   collection name defaults to DEFAULT_SOLR_COLLECTION if not specified
+
+OPTIONS
+=======
+  Solr options:
+    -url <base Solr update URL> (overrides collection, host, and port)
+    -host <host> (default: localhost)
+    -p or -port <port> (default: 8983)
+    -commit yes|no (default: yes)
+    -u or -user <user:pass> (sets BasicAuth credentials)
+
+  Web crawl options:
+    -recursive <depth> (default: 1)
+    -delay <seconds> (default: 10)
+
+
+  Directory crawl options:
+    -delay <seconds> (default: 0)
+
+  stdin/args options:
+    -type <content/type> (default: application/xml)
+
+
+  Other options:
+    -filetypes <type>[,<type>,...] (default: xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
+    -params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
+    -out yes|no (default: no; yes outputs Solr response to console)
+...
+----
+
+[[bin_post_examples]]
+== Examples
+
+There are several ways to use `bin/post`. This section presents several examples.
+
+=== Indexing XML
+
+Add all documents with file extension `.xml` to collection or core named `gettingstarted`.
+
+[source,bash]
+----
+bin/post -c gettingstarted *.xml
+----
+
+Add all documents with file extension `.xml` to the `gettingstarted` collection/core on Solr running on port `8984`.
+
+[source,bash]
+----
+bin/post -c gettingstarted -p 8984 *.xml
+----
+
+Send XML arguments to delete a document from `gettingstarted`.
+
+[source,bash]
+----
+bin/post -c gettingstarted -d '<delete><id>42</id></delete>'
+----
+
+=== Indexing CSV
+
+Index all CSV files into `gettingstarted`:
+
+[source,bash]
+----
+bin/post -c gettingstarted *.csv
+----
+
+Index a tab-separated file into `gettingstarted`:
+
+[source,bash]
+----
+bin/post -c signals -params "separator=%09" -type text/csv data.tsv
+----
+
+The content type (`-type`) parameter is required so the file is treated as the proper type; otherwise it will be ignored and a WARNING logged, since the tool does not know what type of content a `.tsv` file is. The <<uploading-data-with-index-handlers.adoc#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates,CSV handler>> supports the `separator` parameter, which is passed through using the `-params` setting.
+
+=== Indexing JSON
+
+Index all JSON files into `gettingstarted`.
+
+[source,bash]
+----
+bin/post -c gettingstarted *.json
+----
+
+=== Indexing Rich Documents (PDF, Word, HTML, etc)
+
+Index a PDF file into `gettingstarted`.
+
+[source,bash]
+----
+bin/post -c gettingstarted a.pdf
+----
+
+Automatically detect content types in a folder, and recursively scan it for documents for indexing into `gettingstarted`.
+
+[source,bash]
+----
+bin/post -c gettingstarted afolder/
+----
+
+Automatically detect content types in a folder, but limit it to PPT and HTML files and index into `gettingstarted`.
+
+[source,bash]
+----
+bin/post -c gettingstarted -filetypes ppt,html afolder/
+----
+
+=== Indexing to a Password Protected Solr (Basic Auth)
+
+Index a PDF file as the user `solr` with password `SolrRocks`:
+
+[source,bash]
+----
+bin/post -u solr:SolrRocks -c gettingstarted a.pdf
+----
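+
+=== Indexing Web Pages
+
+The web crawl options listed in the usage output can be combined in the same way. A sketch only (the URL, crawl depth, and delay are illustrative values; crawl only sites you have permission to fetch):
+
+[source,bash]
+----
+bin/post -c gettingstarted -recursive 1 -delay 1 http://example.com/
+----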
+
+[[PostTool-WindowsSupport]]
+== Windows Support
+
+`bin/post` currently exists only as a Unix shell script; however, it delegates its work to a cross-platform Java program. The <<SimplePostTool>> can be run directly in supported environments, including Windows.
+
+== SimplePostTool
+
+The `bin/post` script currently delegates to a standalone Java program called `SimplePostTool`.
+
+This tool, bundled into an executable JAR, can be run directly using `java -jar example/exampledocs/post.jar`. See the help output and take it from there to post files, recurse a website or file system folder, or send direct commands to a Solr server.
+
+[source,plain]
+----
+$ java -jar example/exampledocs/post.jar -h
+SimplePostTool version 5.0.0
+Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]
+.
+.
+.
+----
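+
+For example, the standalone tool takes its target collection from a Java system property rather than a command-line flag. A sketch, assuming the `c` property names the collection (see the full help output for the supported properties) and the example documents shipped with Solr:
+
+[source,bash]
+----
+java -Dc=gettingstarted -jar example/exampledocs/post.jar example/exampledocs/*.xml
+----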